Parfait: Enabling private AI with research tools
January 22, 2025
Chloé Kiddon, Software Engineer, and Prem Eruvbetine, Senior Product Manager
We introduce Parfait, which stands for “private aggregation & retrieval, federated, analytics, inference, & training”, a new GitHub organization from Google Research that showcases technologies for private AI.
As generative AI advances, building AI systems that protect users' data and give them control throughout the AI lifecycle remains a top priority. At Google, we bake privacy into how we develop and use AI. Because some AI systems rely on user data to perform helpful tasks, such as understanding a user’s surroundings or acting on personal information, advancing privacy-preserving technologies is essential to safeguarding personal data and fostering trust in the technologies that drive progress.
We’re excited to announce Parfait — which stands for “private aggregation & retrieval, federated, analytics, inference, & training” — a GitHub organization (i.e., a shared account where businesses and open-source projects can securely collaborate across many projects at once) that we have developed at Google to demonstrate our state-of-the-art methods across four privacy pillars:
- Transparency, which shows what data are being used and how
- Data minimization, which includes federated learning, federated analytics, and secure aggregation
- Data anonymization, which includes differential privacy algorithms for model training, model fine-tuning, heavy hitter discovery, and histogram estimation
- External verifiability, which uses trusted execution environment (TEE) workflows that allow users or other external parties to verify privacy claims
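To make the data anonymization pillar concrete, here is a minimal sketch of differentially private histogram estimation using the Gaussian mechanism. This is our own illustrative toy, not code from the Parfait repositories, and it assumes each user contributes to at most one bucket:

```python
import math
import random

def dp_histogram(counts, epsilon=1.0, delta=1e-5, sensitivity=1.0):
    """Release bucket counts with (epsilon, delta)-differential privacy.

    Assumes each user contributes to at most one bucket, so the L2
    sensitivity of the count vector is `sensitivity`.
    """
    # Standard Gaussian-mechanism noise scale.
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return [c + random.gauss(0.0, sigma) for c in counts]
```

Because the noise scale depends only on the sensitivity and the privacy parameters, the released histogram protects each individual contribution while remaining accurate for buckets with large counts.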
Parfait provides research and production code for Google deployments of federated learning and analytics, from Gboard to Android’s Private Compute Core to Google Maps. We are releasing Parfait’s open-source repositories to advance private AI: they help define and execute machine learning (ML) and analytics computations and workflows in a variety of settings, enabling strong privacy claims consistent with users’ privacy expectations. This blog post explains how and why Parfait was created, describes its repositories, and highlights real-world Parfait use cases.
The Parfait journey
Parfait evolved from technologies for federated learning and analytics. Federated learning, which Google introduced in 2016, is a privacy-enhancing approach that enables developers to train ML models across many devices without centralized data collection, ensuring that raw data stays on the user's device. Since then, federated learning has been used to enhance the privacy of many experiences, such as expressions in Gboard and smart reply suggestions in Android Messages.
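The core idea of federated learning can be sketched in a few lines of plain Python. The example below is an illustrative toy version of federated averaging for a one-parameter model, not a production implementation; real deployments use frameworks like tensorflow-federated:

```python
def local_update(w, data, lr=0.1, steps=5):
    # One client's local SGD on squared error; raw examples never leave the device.
    for _ in range(steps):
        for x, y in data:
            grad = 2.0 * (w * x - y) * x
            w -= lr * grad
    return w

def federated_round(w, client_datasets):
    # The server receives only model updates and averages them,
    # weighted by each client's dataset size.
    updates = [local_update(w, data) for data in client_datasets]
    sizes = [len(data) for data in client_datasets]
    return sum(u * n for u, n in zip(updates, sizes)) / sum(sizes)
```

Repeating `federated_round` converges the shared model while the server only ever observes aggregated model parameters, never the clients' training examples.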
In 2020, Google introduced federated analytics, which builds on federated learning by applying data science methods to analyze raw data stored locally on users’ devices, in private and secure ways.
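One of the techniques that makes such analysis possible is secure aggregation, in which pairwise random masks cancel in the server's sum, so the server learns only the aggregate. The toy sketch below is our simplification of the idea; the real protocol also handles client dropouts and derives masks via key agreement rather than a shared seed:

```python
import random

MOD = 2 ** 32  # arithmetic over a finite group so masks wrap cleanly

def masked_inputs(values, seed):
    # Each pair of clients (i, j) shares a random mask m: client i adds it,
    # client j subtracts it, so all masks cancel in the sum.
    n = len(values)
    rng = random.Random(seed)
    offsets = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            m = rng.randrange(MOD)
            offsets[i] += m
            offsets[j] -= m
    return [(v + o) % MOD for v, o in zip(values, offsets)]

def server_sum(masked_values):
    # The server sees only masked values; the pairwise masks cancel,
    # revealing the aggregate and nothing about any individual input.
    return sum(masked_values) % MOD
```

Each individual masked value is statistically indistinguishable from random noise, yet the sum of all masked values equals the true aggregate.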
Since then, Google has opened the door to greater collaboration with the external privacy community by open sourcing TensorFlow Federated, a framework for federated computation, along with libraries and a reference architecture for cross-device federated computation, and by collaborating on TensorFlow Privacy.
These building blocks and references led to the development of a Google Cloud architecture for cross-silo and cross-device federated learning and Privacy Sandbox’s Federated Compute server for on-device personalization.
Recognizing the increased value Google technologies provide when interconnected and evolved in unison, we created Parfait to support these deployments and the research advances driving them.
Parfait repositories
Parfait repositories, which have more than 100 contributors, show some of Google’s key privacy-preserving technologies in practice.
The repositories include:
- federated-language: Our core language for concisely expressing novel federated algorithms with distributed communication operators within a strongly typed functional programming environment. It is usable with any ML framework (e.g., JAX, TensorFlow). Previously part of our TensorFlow Federated framework, this foundational piece, upon which our open-sourced learning and analytics algorithms rest, has been fully decoupled from TensorFlow, making it truly platform independent.
- tensorflow-federated: A set of high-level interfaces, algorithms, and ML platform integrations that allow developers to apply the included implementations of federated learning or federated analytics to their existing TensorFlow or JAX models.
- federated-compute: Code for executing cross-device federated programs and computations, including Android client libraries, as well as a reference end-to-end demo that lays out the core pieces of a cross-device architecture for federated compute. Check out our federated learning at scale whitepaper.
- confidential-federated-compute: Publicly verifiable components that run within TEEs and interact with user data to enable federated learning and analytics using confidential computing. Check out our Confidential Federated Computations white paper for more information.
- trusted-computations-platform: Publicly verifiable components that run within secure enclaves and interact with user data to enable stateful, rollback-protected, replicated computations.
- raft-rs: Raft distributed consensus algorithm implemented in Rust, used by trusted-computations-platform.
- dataset_grouper: Libraries for efficient and scalable group-structured dataset pipelines.
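To give a flavor of the placement-typed programming model behind federated-language and tensorflow-federated, here is a loose plain-Python analogy (ours, not the actual API): values are tagged with a placement, and communication operators move them between CLIENTS and SERVER:

```python
from dataclasses import dataclass

CLIENTS, SERVER = "CLIENTS", "SERVER"

@dataclass
class FederatedValue:
    value: object     # a list of per-client values at CLIENTS, one value at SERVER
    placement: str

def federated_map(fn, fv):
    # Apply fn independently on every client; the result stays at CLIENTS.
    assert fv.placement == CLIENTS
    return FederatedValue([fn(v) for v in fv.value], CLIENTS)

def federated_mean(fv):
    # A communication operator: aggregate client values into a SERVER value.
    assert fv.placement == CLIENTS
    return FederatedValue(sum(fv.value) / len(fv.value), SERVER)

def fraction_over(threshold, client_temps):
    # A "federated computation": what fraction of clients report a value
    # above the threshold? Only the aggregate reaches the server.
    flags = federated_map(lambda t: 1.0 if t > threshold else 0.0, client_temps)
    return federated_mean(flags)
```

Typing values by placement lets the framework check, at the language level, that raw client data never flows to the server except through explicit aggregation operators.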
Parfait in action
While Parfait remains an evergreen space for research advancements to be driven into products (at Google and beyond), Google product teams are already using it in real-world deployments. For example, Gboard has used technologies in Parfait to improve user experiences, launching the first neural-network models trained with federated learning and formal differential privacy guarantees and expanding that approach since. The team also continues to use federated analytics to discover out-of-vocabulary words for less common languages in Gboard.
The on-device personalization module, currently in a limited testing phase as part of the Privacy Sandbox initiative on Android, helps protect user information from businesses with which users haven’t interacted. It provides an open-source federated compute platform that coordinates cross-device ML and statistical analysis for its adopters. The module’s team, referencing and depending on different parts of Parfait, has launched a preview version of an open-source federated compute service that can be deployed on a TEE-based cloud service.
More recently, we previewed a novel approach that uses CPU TEEs to let Android devices verify the exact version of the server-side software that may decrypt uploaded messages. This approach builds on Project Oak and a software keystore hosted on our new trusted computations platform. The platform guarantees that uploaded data can be decrypted only by the expected server-side workflow (anonymizing aggregation), running in an expected virtual machine inside a TEE backed by a CPU’s cryptographic attestation (e.g., AMD or Intel). Parfait’s confidential-federated-compute repository implements this code, leveraging state-of-the-art differential privacy aggregation primitives from the tensorflow-federated repository.
Conclusion
As part of our commitment to privacy-preserving technology, we hope Parfait makes it easier for researchers and developers to see how these key techniques work in practice, and that it inspires future collaborations and advances in other frameworks.
We believe strong, formal privacy guarantees are increasingly practical in real-world deployments, and we are committed to making our approaches and innovations open to the public. We encourage privacy engineers and researchers outside of Google to make their approaches public, too, and are excited about the potential for continued and further collaborations across industry and with academia.
Acknowledgements
Special thanks to Michael Reneer for his critical contributions in setting up Parfait. Direct contributors to work on Parfait repositories include Galen Andrew, Isha Arkatkar, Sean Augenstein, Amlan Chakraborty, Zachary Charles, Stanislav Chiknavaryan, DeWitt Clinton, Taylor Cramer, Katharine Daly, Stefan Dierauf, Randy Dodgen, Hubert Eichner, Nova Fallen, Ken Franko, Zachary Garrett, Emily Glanz, Zoe Gong, Suxin Guo, Wolfgang Grieskamp, Mira Holford, Dzmitry Huba, Vladimir Ivanov, Peter Kairouz, Yash Katariya, Jakub Konečný, Artem Lagzdin, Hui Li, Stefano Mazzocchi, Brett McLarnon, Sania Nagpal, Krzysztof Ostrowski, Michael Reneer, Jason Roselander, Keith Rush, Karan Singhal, Maya Spivak, Rakshita Tandon, Hardik Vala, Timon Van Overveldt, Scott Wegner, Shanshan Wu, Yu Xiao, Zheng Xu, Ren Yi, Chunxiang Zheng, and Wennan Zhu. We would also like to thank external contributors and collaborators on TensorFlow Federated throughout the years. The private AI research program represented in these repositories is steered by Daniel Ramage and Brendan McMahan, with sponsorship from Corinna Cortes, Blaise Aguera y Arcas, and Yossi Matias.