Toward provably private insights into AI use

October 30, 2025

Artem Lagzdin, Software Engineer, and Daniel Ramage, Research Director, Google Research

We detail how confidential federated analytics technology is used to understand how on-device generative AI features are used, while ensuring strong transparency into how user data is handled and analyzed.

Generative AI (GenAI) enables personalized experiences and powers the creation of unstructured data, including summaries, transcriptions, and more. Insights into real-world AI use [1, 2] can help GenAI developers enhance their tools by understanding common applications and identifying failure modes. And especially when those tools are applied to on-device data, our goal is to offer increasingly robust privacy guarantees during the insight generation process. This post introduces provably private insights (PPI), a new north star for generating dynamic insights into how people use LLMs and GenAI tools while guaranteeing that individual data is not inspectable and that aggregate insights are anonymous.

Today we announce a first-of-its-kind PPI system that leverages the power of large language models (LLMs), differential privacy (DP), and trusted execution environments (TEEs) to analyze unstructured GenAI data. This system proves that server-side processing is limited to privacy-preserving computations that can be fully inspected externally. With our system, GenAI tool developers can analyze interactions using a “data expert” LLM, tasked with answering questions like “what topic is being discussed?” or “is the user frustrated?” The LLM’s answers are aggregated with DP to provide a comprehensive view of GenAI feature usage across the user population without exposing unaggregated data. The “data expert” LLM itself resides within the TEE. PPI is enabled by confidential federated analytics (CFA), a technique first deployed in Gboard, where open-source analysis software runs in TEEs, offering complete transparency into the mechanisms and privacy properties of server-side data processing. Our deployment of PPI in the Recorder application for Pixel uses Google’s latest open-source Gemma models as the “data expert” to offer insights into Recorder usage.

To encourage the external community to verify our claims, we’ve open-sourced LLM-powered privacy-preserving insights as part of Google Parfait, along with the rest of our TEE-hosted confidential federated analytics stack.

How provably private insights are possible

Google’s CFA leverages confidential computing to protect unaggregated user data during processing, and only releases outputs with a formal (user-level) DP guarantee. CFA provides strong data isolation and anonymization guarantees regardless of what query an analyst runs.

In this technique, user devices first decide what data should be uploaded for analysis. Devices encrypt and upload this data, along with the processing steps that the server is authorized to use for decryption. Uploaded data is protected with encryption keys managed by a TEE-hosted key management service, which releases decryption keys only to device-approved processing steps. Devices can verify that the key management service is the expected open source code (included in a public, tamper-resistant transparency log, Rekor), and that the code is running in a properly configured TEE that is inaccessible to Google. The key management service in turn verifies that the approved, public processing steps are running in TEEs. No other analyses can be performed on the data and no human can access data from individual devices.
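
To make this flow concrete, here is a minimal sketch of how a device-chosen access policy can gate key release to approved, attested processing steps. All names and structures below are hypothetical illustrations, not the Parfait or Oak implementation, and the digest comparison stands in for real hardware-rooted attestation verification.

```python
# Illustrative sketch only: device-specified access policies gating decryption-key
# release to approved, attested processing steps. Hypothetical names throughout.
import hashlib
from dataclasses import dataclass, field

def measurement(step_binary: bytes) -> str:
    """Digest standing in for a TEE attestation measurement of a processing step."""
    return hashlib.sha256(step_binary).hexdigest()

@dataclass
class AccessPolicy:
    # Digests of the open-source processing steps the device authorizes.
    approved_step_digests: set

@dataclass
class KeyManagementService:
    """TEE-hosted keystore: holds decryption keys, releases them only per policy."""
    keys: dict = field(default_factory=dict)       # upload_id -> key material
    policies: dict = field(default_factory=dict)   # upload_id -> AccessPolicy

    def register_upload(self, upload_id, key, policy):
        self.keys[upload_id] = key
        self.policies[upload_id] = policy

    def release_key(self, upload_id, attested_step_digest):
        # The real service verifies hardware-rooted attestation evidence; here we
        # compare digests only, to illustrate the policy check.
        if attested_step_digest in self.policies[upload_id].approved_step_digests:
            return self.keys[upload_id]
        raise PermissionError("processing step not authorized by the device")

# Device side: authorize only the structured-summarization + DP-aggregation pipeline.
pipeline_binary = b"open-source structured summarization + DP histogram step"
policy = AccessPolicy({measurement(pipeline_binary)})

kms = KeyManagementService()
kms.register_upload("upload-123", key=b"per-upload symmetric key", policy=policy)

# A TEE running the approved step can obtain the key ...
assert kms.release_key("upload-123", measurement(pipeline_binary)) is not None
# ... but any other analysis cannot.
try:
    kms.release_key("upload-123", measurement(b"ad-hoc query"))
except PermissionError as e:
    print(e)
```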

[Figure: Overview of the provably private insights system]

Private insights are derived by passing the data through a well-defined set of processing steps. First, unstructured raw data is analyzed by an LLM tasked with extracting the answer to a specific question, such as the category or topic of the input (“structured summarization”); in this deployment, an open-source Gemma 3 model classifies transcripts into categories of interest. These classes are then summed to compute a histogram of topics, with differentially private noise guaranteeing that the output histogram cannot be strongly influenced by any one user. The LLM’s prompt can be changed frequently, because the DP guarantee applies to the aggregation algorithm regardless of the LLM prompt. Even if the developer asked a question designed to single out one user, the differentially private statistics would not reveal it.
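
As a rough outline of this structured-summarization-plus-DP-histogram pipeline, the sketch below uses a stand-in classifier and a simple Laplace mechanism. The deployed system runs Gemma inside the TEE and uses its own aggregation algorithm, so treat the topics, classifier, and noise mechanism here as illustrative assumptions only.

```python
# Conceptual sketch: classify each record with an LLM, then release only a
# differentially private histogram of the answers. Not the deployed CFA aggregator.
import random
from collections import Counter

TOPICS = ["note to self", "reminder", "business meeting", "interview", "other"]

def classify_with_llm(transcript: str) -> str:
    # Stand-in for prompting the "data expert" LLM (e.g., a TEE-hosted Gemma model)
    # to answer with exactly one label from TOPICS.
    return random.choice(TOPICS)

def laplace_noise(scale: float) -> float:
    # The difference of two exponential samples is Laplace-distributed.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def dp_topic_histogram(per_user_transcripts, epsilon=1.0):
    # Bound each user's influence: count at most one transcript per user, so a single
    # user can change any bin by at most 1 (user-level sensitivity 1).
    labels = [classify_with_llm(t[0]) for t in per_user_transcripts if t]
    counts = Counter(labels)
    # Add Laplace noise with scale sensitivity / epsilon to every bin before release.
    return {topic: counts.get(topic, 0) + laplace_noise(1 / epsilon) for topic in TOPICS}

users = [["transcript A"], ["transcript B"], ["transcript C", "transcript D"]]
print(dp_topic_histogram(users, epsilon=1.0))
```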

All privacy-relevant parts of our system are open source and reproducibly buildable, from the private aggregation algorithm to the TEE stack, and the LLM itself is also open source. The signatures of the workflows used to analyze the data are also public. Combined with the TEEs’ ability to attest to the state of the system running the software, every part of the data processing pipeline can be verifiably linked to published code, which gives external parties the ability to verify our privacy claims. This commitment to end-to-end verifiability is how the system makes progress toward being provable: third parties can inspect the open-source code and confirm that it is exactly the code we claim to run, thereby proving to clients that this is the only code their data will be processed with, subject to known weaknesses in current-generation TEEs.
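
As a simplified illustration of that verification chain, the sketch below checks that a TEE-attested measurement matches the digest of a reproducibly built binary that appears in a public transparency log. The helpers are hypothetical; the real flow relies on the Oak attestation verification libraries and the Rekor log rather than a bare hash comparison.

```python
# Simplified verification sketch: reproducible build -> digest -> transparency log,
# and the TEE's attested measurement must match that same digest.
import hashlib

def digest(binary: bytes) -> str:
    return hashlib.sha256(binary).hexdigest()

def verify(attested_measurement: str, rebuilt_binary: bytes, log_entries: set) -> bool:
    rebuilt_digest = digest(rebuilt_binary)
    in_log = rebuilt_digest in log_entries                 # published, tamper-resistant record
    matches_tee = attested_measurement == rebuilt_digest   # what the TEE actually runs
    return in_log and matches_tee

published_binary = b"bit-for-bit reproducible build of the open-source pipeline"
transparency_log = {digest(published_binary)}

# An honest server attesting the published binary passes the check ...
assert verify(digest(published_binary), published_binary, transparency_log)
# ... while any modified binary fails it.
assert not verify(digest(b"patched binary"), published_binary, transparency_log)
```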

In short, provably private insights can be generated by an LLM-powered structured summarization workflow in confidential federated analytics. The combination of structured summarization with differentially private histogram generation enables a deeper understanding of how GenAI tools are used in the real world, all while guaranteeing privacy. Technical details of the system can be found in the whitepaper.

How provably private insights are used in Recorder

Google’s Recorder app on Pixel offers powerful AI features, such as transcription, summarization, and speaker labeling. A key challenge for the application developers is to understand how users interact with these features. For instance, are users creating "Notes to self," "Reminders," or recording "Business meetings"? Traditional count-based analytics are insufficient to analyze such data without the help of structured summarization or another form of classification. In a traditional setting, a system would log these transcripts to a central server for classification, and then run (differentially) private count queries on the results. PPI operates in a similar way but without the risk of data being used for any other purpose.

In the Recorder application, a subset of transcripts (from users who have enabled “Improve for everyone” in settings) are encrypted with a public key managed by a central TEE-hosted keystore, protected via Google’s Project Oak attestation stack running on AMD Secure Encrypted Virtualization-Secure Nested Paging (SEV-SNP) CPUs. The keystore ensures that the uploaded data can be decrypted only by pre-approved processing steps, which are themselves attested as running the expected code in TEEs. A Gemma 3 4B model running within the AMD SEV-SNP TEE classifies the transcripts into topics, which are then aggregated with differential privacy. External parties can verify that raw transcripts never leave the secure environment of the TEE, and that only private sums of the summarized output categories are released to Google.
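
To give a flavor of what this classification step might look like inside the TEE, here is a hedged sketch of a structured summarization prompt and a label-normalization helper. The topic list, prompt wording, and parsing are assumptions for illustration; only the resulting category label, never the transcript, would feed the DP aggregation.

```python
# Illustrative sketch of turning a Recorder transcript into one category label in the TEE.
RECORDER_TOPICS = ["Note to self", "Reminder", "Business meeting", "Lecture", "Other"]

def build_prompt(transcript: str) -> str:
    options = "\n".join(f"- {t}" for t in RECORDER_TOPICS)
    return (
        "You are a data expert. Classify the recording transcript below into exactly "
        f"one of these categories:\n{options}\n\nTranscript:\n{transcript}\n\nCategory:"
    )

def to_category(model_output: str) -> str:
    # Coerce free-form model output onto the closed label set before aggregation,
    # so only a fixed, enumerable set of values can ever be counted.
    cleaned = model_output.strip().lower()
    for topic in RECORDER_TOPICS:
        if topic.lower() in cleaned:
            return topic
    return "Other"

# The TEE-hosted Gemma model would receive build_prompt(transcript); its reply is
# normalized with to_category() before the DP histogram step.
print(to_category("  note to self\n"))  # -> "Note to self"
```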


An example differentially private distribution of Recorder transcripts across various topics, as categorized by Gemma. Inner rectangle size is proportional to relative topic frequency.

PPI can also help evaluate the performance of on-device GenAI features, such as the accuracy of summaries generated by Recorder. Instead of relying solely on synthetic data, which may not accurately represent real-world use, CFA can run an LLM auto-rater as a part of the structured summarization component. This auto-rater LLM also resides within the TEE and can assess the results of the on-device model, ensuring a more accurate and privacy-preserving evaluation. This allows developers to fine-tune the on-device model based on real user interactions without compromising individual privacy.
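
A sketch of how such an auto-rater could be framed as another structured summarization question is shown below. The rubric, prompt wording, and parsing are illustrative assumptions rather than the deployed evaluation; the key point is that only a bucketed rating, aggregated with DP like any other label, would ever leave the TEE.

```python
# Illustrative auto-rater sketch: grade the on-device summary against the transcript,
# emitting only a small, fixed set of rating buckets for DP aggregation.
RATING_BUCKETS = ["1", "2", "3", "4", "5"]  # 1 = poor summary, 5 = excellent

def build_rater_prompt(transcript: str, on_device_summary: str) -> str:
    return (
        "Rate how faithfully the summary captures the transcript on a 1-5 scale. "
        "Answer with a single digit.\n"
        f"Transcript:\n{transcript}\n\nSummary:\n{on_device_summary}\n\nRating:"
    )

def parse_rating(model_output: str) -> str:
    # Map free-form output onto the closed bucket set; fall back to the middle bucket.
    digits = [c for c in model_output if c in RATING_BUCKETS]
    return digits[0] if digits else "3"

# The resulting per-record ratings would be aggregated exactly like topic labels:
# summed into a histogram and released only with user-level differential privacy.
print(parse_rating(" 4 - mostly faithful"))
```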

The configuration we’re running in Recorder is available in our GitHub repository, which can be connected to the specific code paths and privacy guarantees by following these instructions. The Recorder configuration guarantees that whatever LLM query is run, its results are passed through the auto-tuned DP histogram aggregator with strict privacy guarantees (user-level ε = 1 used in the figure above).
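
For a small numerical illustration of what a user-level ε = 1 guarantee means, assume (purely for illustration) a Laplace mechanism with each user's contribution bounded to a single record: then any released bin value is at most a factor of e ≈ 2.72 more likely with a given user's data included than without it.

```python
# Numerical illustration of the epsilon = 1 likelihood-ratio bound, assuming a
# Laplace mechanism with per-user sensitivity 1 (an illustrative simplification).
import math

epsilon = 1.0  # user-level privacy parameter used for the Recorder histogram

def laplace_pdf(x: float, mean: float, scale: float) -> float:
    return math.exp(-abs(x - mean) / scale) / (2 * scale)

# Suppose one user's data changes a topic bin's true count from 41 to 42.
observed_bin_value = 44.0
p_with_user = laplace_pdf(observed_bin_value, 42, 1 / epsilon)
p_without_user = laplace_pdf(observed_bin_value, 41, 1 / epsilon)

# The DP guarantee bounds this likelihood ratio by exp(epsilon).
print(f"{p_with_user / p_without_user:.4f} <= {math.exp(epsilon):.4f}")
```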

What’s next?

This work demonstrates that provably private insights are possible: real-world GenAI tool use is analyzed with LLMs and then aggregated into differentially private statistics, all with full transparency into the server-side processing steps. Every step of the insight generation process has been designed to offer state-of-the-art data isolation and anonymization, and external verifiers can check the source code of the methods and the proof that we run them.

Moreover, we’ve shared LLM-powered structured summarization as a first application. We expect others, including differentially private clustering and synthetic data generation, to follow, all with the same level of verifiability and confidentiality. And with future work to enable confidential use of higher-throughput accelerators such as Google TPUs, richer analyses will become possible, including detailed transcript analysis and auto-rating. Insight generation is now possible without exposing sensitive user data outside of the confidential computation boundary, and with strong user-level DP guarantees for generated insights. We are excited that the technology for provably private insights is maturing just as GenAI tools are beginning to be applied to on-device and sensitive-data experiences.

Acknowledgements

We thank the teams at Google that helped with algorithm design, infrastructure implementation, and production maintenance of this system, in particular teams led by Marco Gruteser, Peter Kairouz, and Timon Van Overveldt, with product manager Prem Eruvbetine, including: Albert Cheu, Brett McLarnon, Chunxiang (Jake) Zheng, Edo Roth, Emily Glanz, Grace Ni, James Bell-Clark, Katharine Daly, Krzysztof Ostrowski, Maya Spivak, Mira Holford, Nova Fallen, Rakshita Tandon, Ren Yi, Stanislav Chiknavaryan, Stefan Dierauf, Steve He, and Zoe Gong. We also thank close partners who supported this system through technologies and the Recorder integration, including: Allen Su, Austin Hsu, Console Chen, Daniel Minare Ho, Dennis Cheng, Jan Wassenberg, Kristi Bradford, Ling Li, Mina Askari, Miranda Huang, Tam Le, Yao-Nan Chen, and Zhimin Yao. This work was supported by Corinna Cortes, Jay Yagnik, Ramanan Rajeswaran, Seang Chau, and Yossi Matias. We additionally thank Peter Kairouz, Marco Gruteser, Mark Simborg, and Kimberly Schwede for feedback and contributions to the writing of this post.