Peter Kairouz
Peter Kairouz is a researcher interested in machine learning, security, and privacy. At Google, he is a Research Scientist working on decentralized and privacy-preserving machine learning algorithms. Before joining Google, his doctoral and postdoctoral research focused largely on building decentralized technologies for anonymous broadcasting over complex networks, understanding the fundamental trade-off between data privacy and utility, and leveraging state-of-the-art deep generative models for data-driven privacy. You can learn more about his background and research on his Stanford webpage. Some of his recent Google publications are listed below.
          
        
        
      Authored Publications
    
  
  
  
    
    
  
      
    
    
        
          
            
Differentially Private Insights into AI Use
Daogao Liu, Pritish Kamath, Alexander Knop, Adam Sealfon, Da Yu, Chiyuan Zhang
Conference on Language Modeling (COLM), 2025
          
          
        
        
        
          
We introduce Urania, a novel framework for generating insights about LLM chatbot interactions with rigorous differential privacy (DP) guarantees. The framework employs a private clustering mechanism and innovative keyword extraction methods, including frequency-based, TF-IDF-based, and LLM-guided approaches. By leveraging DP tools such as clustering, partition selection, and histogram-based summarization, Urania provides end-to-end privacy protection. Our evaluation assesses lexical and semantic content preservation, pair similarity, and LLM-based metrics, benchmarking against a non-private method inspired by CLIO (Tamkin et al., 2024). Moreover, we develop a simple empirical privacy evaluation that demonstrates the enhanced robustness of our DP pipeline. The results show the framework's ability to extract meaningful conversational insights while maintaining stringent user privacy, effectively balancing data utility with privacy preservation.
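As a toy illustration of the histogram-based summarization step described in the abstract, the sketch below releases keyword counts via the Laplace mechanism and suppresses small noisy counts as a crude stand-in for private partition selection. This is an illustrative sketch, not the Urania implementation; the function name and the threshold value are assumptions.

```python
import random
from collections import Counter

def dp_keyword_histogram(keywords, epsilon, threshold=5.0):
    """Release noisy keyword counts under epsilon-DP.

    Assumes each user contributes exactly one keyword, so the L1
    sensitivity of the count vector is 1. Laplace(1/epsilon) noise
    (sampled as the difference of two exponentials) is added to each
    count, and small noisy counts are dropped as a crude form of
    partition selection.
    """
    counts = Counter(keywords)
    released = {}
    for word, count in counts.items():
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        value = count + noise
        if value >= threshold:
            released[word] = value
    return released

hist = dp_keyword_histogram(["travel"] * 40 + ["coding"] * 25 + ["rare"], epsilon=1.0)
```

The thresholding step matters because releasing even the presence of a rare keyword can identify a user; suppressing low counts is the simplest version of the partition-selection idea.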
          
        
      
    
        
        
          
Differentially private (DP) synthetic data is a versatile tool for enabling the analysis of private data. With the rise of foundation models, a number of new synthetic data algorithms privately finetune the weights of foundation models to improve over existing approaches to generating private synthetic data. In this work, we propose two algorithms for using API access only to generate DP tabular synthetic data. We extend the Private Evolution algorithm (Lin et al., 2023; Xie et al., 2024) to the tabular data domain, define a workload-based distance measure, and propose a family of algorithms that use one-shot API access to LLMs.
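One way to picture a workload-based distance measure is as the worst-case gap between real and synthetic answers over a fixed set of counting queries. The sketch below is a hypothetical illustration, not the measure defined in the paper; here `workload` is simply a list of row predicates.

```python
def workload_distance(real_rows, synth_rows, workload):
    """Worst-case absolute gap between the fraction of real rows and
    the fraction of synthetic rows satisfying each counting query.

    Each query in `workload` is a predicate over a single row; a
    smaller distance means the synthetic data answers the workload
    more faithfully.
    """
    def frac(rows, query):
        return sum(1 for row in rows if query(row)) / len(rows)
    return max(abs(frac(real_rows, q) - frac(synth_rows, q)) for q in workload)

# Example: a one-query workload asking for the fraction of adults.
real = [{"age": 34}, {"age": 71}, {"age": 15}, {"age": 42}]
synth = [{"age": 30}, {"age": 12}, {"age": 11}, {"age": 60}]
d = workload_distance(real, synth, [lambda r: r["age"] >= 18])
```

In a Private Evolution-style loop, a distance like this would score candidate synthetic rows so that the population evolves toward data that answers the analyst's queries well.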
          
        
      
    
        
        
          
          
        
      
    
        
        
          
Service providers of large language model (LLM) applications collect user instructions in the wild and use them in further aligning LLMs with users' intentions. These instructions, which potentially contain sensitive information, are annotated by human workers in the process. This poses a new privacy risk not addressed by typical private optimization. To this end, we propose using synthetic instructions to replace real instructions in data annotation and model fine-tuning. Formal differential privacy is guaranteed by generating those synthetic instructions using privately fine-tuned generators. Crucial to achieving the desired utility is our novel filtering algorithm that matches the distribution of the synthetic instructions to that of the real ones. In both supervised fine-tuning and reinforcement learning from human feedback, our extensive experiments demonstrate the high utility of the final set of synthetic instructions by showing comparable results to real instructions. In supervised fine-tuning, models trained with private synthetic instructions outperform leading open-source models such as Vicuna.
          
        
      
    
        
          
            
Improved Communication-Privacy Trade-offs in L2 Mean Estimation under Streaming Differential Privacy
Wei-Ning Chen, Albert No, Sewoong Oh, Zheng Xu
International Conference on Machine Learning (ICML), 2024
          
          
        
        
        
          
We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: first, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty$ geometry and rely on random rotation or Kashin's representation to adapt to $L_2$ geometry, resulting in suboptimal leading constants in mean squared errors (MSEs); second, schemes achieving order-optimal communication-privacy trade-offs do not extend seamlessly to streaming differential privacy (DP) settings (e.g., tree aggregation or matrix factorization), rendering them incompatible with DP-FTRL type optimizers.

In this work, we tackle these issues by introducing a novel privacy accounting method for the sparsified Gaussian mechanism that incorporates the randomness inherent in sparsification into the DP noise. Unlike previous approaches, our accounting algorithm directly operates in $L_2$ geometry, yielding MSEs that converge quickly to those of the uncompressed Gaussian mechanism. Additionally, we extend the sparsification scheme to the matrix factorization framework under streaming DP and provide a precise accountant tailored for DP-FTRL type optimizers. Empirically, our method demonstrates at least a 100x compression improvement for DP-SGD across various FL tasks.
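The compression idea underlying the abstract can be sketched as: each client keeps a random subset of coordinates (rescaled to stay unbiased) and the server adds Gaussian noise to the aggregate before averaging. This is a simplified sketch with assumed names and parameters; the paper's contribution, the accounting that credits the sparsification randomness in the DP analysis, is not reproduced here.

```python
import random

def sparsified_gaussian_mean(vectors, k, sigma):
    """Toy sparsified Gaussian mechanism for L2 mean estimation.

    Each client transmits only k randomly chosen coordinates, rescaled
    by d/k so the aggregate stays unbiased; the server adds Gaussian
    noise with standard deviation sigma to each summed coordinate and
    averages over the n clients.
    """
    d = len(vectors[0])
    n = len(vectors)
    total = [0.0] * d
    for v in vectors:
        kept = random.sample(range(d), k)       # client-side sparsification
        for i in kept:
            total[i] += v[i] * (d / k)          # rescale for unbiasedness
    return [(t + random.gauss(0.0, sigma)) / n for t in total]

est = sparsified_gaussian_mean([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]], k=2, sigma=0.1)
```

With k = d and sigma = 0 this reduces to the exact mean, which makes the unbiasedness of the rescaling easy to check.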
          
        
      
    
        
          
            
Federated Learning of Gboard Language Models with Differential Privacy
Zheng Xu, Yanxiang Zhang, Galen Andrew, Christopher Choquette, Jesse Rosenstock, Yuanbo Zhang
ACL 2023, industry track (to appear)
          
          
        
        
        
          
We train language models (LMs) with federated learning (FL) and differential privacy (DP) in the Google Keyboard (Gboard). We apply the DP-Follow-the-Regularized-Leader (DP-FTRL) algorithm (Kairouz et al., 2021) to achieve meaningfully formal DP guarantees without requiring uniform sampling of client devices. To provide favorable privacy-utility trade-offs, we introduce a new client participation criterion and discuss the implications of its configuration in large-scale systems. We show how quantile-based clip estimation (Andrew et al., 2019) can be combined with DP-FTRL to adaptively choose the clip norm during training or to reduce hyperparameter tuning in preparation for training. With the help of pretraining on public data, we train and deploy more than twenty Gboard LMs that achieve high utility and $\rho$-zCDP privacy guarantees with $\rho \in (0.2, 2)$, with two models additionally trained with secure aggregation (Bonawitz et al., 2017). We are happy to announce that all the next word prediction neural network LMs in Gboard now have DP guarantees, and all future launches of Gboard neural network LMs will require DP guarantees. We summarize our experience and provide concrete suggestions on DP training for practitioners.
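Per-client clipping, the step that quantile-based clip estimation tunes, can be sketched as follows. This is a generic sketch, not Gboard's implementation; the adaptive update shown is a simplified, non-private version of the geometric-update idea behind adaptive clipping.

```python
import math

def clip_update(update, clip_norm):
    """Scale a client model update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= clip_norm:
        return list(update)
    return [x * clip_norm / norm for x in update]

def adapt_clip(clip_norm, update_norms, target_quantile=0.5, lr=0.2):
    """Move clip_norm toward a target quantile of observed update norms.

    Non-private sketch: if more than the target fraction of updates
    already fit under the clip, shrink it geometrically; if fewer fit,
    grow it. The real method privatizes the fit fraction.
    """
    fit = sum(1 for n in update_norms if n <= clip_norm) / len(update_norms)
    return clip_norm * math.exp(-lr * (fit - target_quantile))
```

A too-large clip norm wastes privacy budget on noise scaled to updates that never occur, while a too-small one biases training, which is why adapting the clip during training helps the privacy-utility trade-off.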
          
        
      
    
        
          
            
Practical and Private (Deep) Learning without Sampling or Shuffling
Om Thakkar, Abhradeep Thakurta, Zheng Xu
38th International Conference on Machine Learning (ICML 2021) (to appear)
          
          
        
        
        
          
Building privacy-preserving systems for machine learning and data science on decentralized data.
          
        
      
    
        
          
            
A Field Guide to Federated Optimization
Jianyu Wang, Zheng Xu, Gauri Joshi, Maruan Al-Shedivat, Galen Andrew, A. Salman Avestimehr, Katharine Daly, Deepesh Data, Suhas Diggavi, Hubert Eichner, Advait Gadhikar, Antonious M. Girgis, Filip Hanzely, Chaoyang He, Samuel Horvath, Martin Jaggi, Tara Javidi, Satyen Chandrakant Kale, Sai Praneeth Karimireddy, Jakub Konečný, Sanmi Koyejo, Tian Li, Peter Richtarik, Karan Singhal, Virginia Smith, Mahdi Soltanolkotabi, Weikang Song, Sebastian Stich, Ameet Talwalkar, Hongyi Wang, Blake Woodworth, Honglin Yuan, Manzil Zaheer, Mi Zhang, Tong Zhang, Chunxiang (Jake) Zheng, Chen Zhu
arXiv (2021)
          
          
        
        
        
          
Federated learning and analytics are distributed approaches for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and other constraints that are not primary considerations in other problem settings. This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms through concrete examples and practical implementation, with a focus on conducting effective simulations to infer real-world performance. The goal of this work is not to survey the current literature, but to inspire researchers and practitioners to design federated learning algorithms that can be used in various practical applications.
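The basic aggregation step that federated optimization algorithms build on, FedAvg-style example-weighted averaging, can be sketched as below. The function name and the flat-list weight representation are assumptions made for illustration.

```python
def fedavg_aggregate(client_weights, client_sizes):
    """Example-weighted average of client models (FedAvg server step).

    Each client trains locally and reports its model as a flat list of
    parameters; the server averages the models weighted by each
    client's local dataset size.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    aggregate = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        for i in range(dim):
            aggregate[i] += (size / total) * weights[i]
    return aggregate

# Two clients, the second holding 3x more data, so it dominates the average.
new_model = fedavg_aggregate([[1.0, 1.0], [3.0, 3.0]], client_sizes=[1, 3])
```

Data heterogeneity, one of the paper's central themes, shows up precisely here: when client datasets differ in distribution, this weighted average can drift away from the minimizer of the global objective, motivating the algorithmic variants the guide surveys.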
          
        
      
    
        
          
            
Privacy-first Health Research with Federated Learning
Adam Sadilek, Dung Nguyen, Methun Kamruzzaman, Benjamin Rader, Stefan Mellem, Elaine O. Nsoesie, Jamie MacFarlane, Anil Vullikanti, Madhav Marathe, Paul C. Eastham, John S. Brownstein
npj Digital Medicine (2021)
          
          
        
        
        
          
Privacy protection is paramount in conducting health research. However, studies often rely on data stored in a centralized repository, where analysis is done with full access to the sensitive underlying content. Recent advances in federated learning enable building complex machine-learned models that are trained in a distributed fashion. These techniques facilitate the calculation of research study endpoints such that private data never leaves a given device or healthcare system. We show, on a diverse set of single and multi-site health studies, that federated models can achieve similar accuracy, precision, and generalizability, and lead to the same interpretation as standard centralized statistical models while achieving considerably stronger privacy protections and without significantly raising computational costs. This work is the first to apply modern and general federated learning methods that explicitly incorporate differential privacy to clinical and epidemiological research, across a spectrum of units of federation, model architectures, complexity of learning tasks and diseases. As a result, it enables health research participants to remain in control of their data and still contribute to advancing science, aspects that used to be at odds with each other.