Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Preview abstract
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution[1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
Preview abstract
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
Unprecedented Insights into Maternal Sleep: A Large-scale Longitudinal Analysis of Real-world Wearable Device Data Before, During, and After Pregnancy
Nichole Young-Lin
Conor Heneghan
Logan Schneider
Logan Niehaus
Ariel Haney
Karla Gleichauf
Jacqueline Shreibati
Belen Lafon
Lancet eBioMedicine (2025)
Preview abstract
Introduction: Current understanding of pregnancy and postpartum sleep is driven by limited lab or self-reported data. Consumer wearable devices may help reveal longitudinal, real-world sleep patterns.
Methods: We analyzed de-identified wearable device data from 2,540 users in the United States and Canada who met strict wear-time requirements (≥80% daily usage for ≥80% of the time periods of interest [12 weeks prepregnancy, throughout pregnancy, and 20 weeks immediately postpartum]). We tracked sleep time and staging using Fitbit devices.
Results: Compared to prepregnancy, total sleep time (TST) increased from an average of 425.3±43.5 min to a peak of 447.6±47.6 min at gestational week 10 with ongoing declines throughout pregnancy. Time in bed (TIB) followed a similar pattern. Increased light sleep drove the initial TST rise. Deep and REM sleep decreased significantly throughout pregnancy, with maximum reductions of 19.2±13.8 min (p<0.01) and 9.0±19.2 min (p<0.01) respectively by pregnancy end. Sleep efficiency also declined slightly during pregnancy (median drop from 88.3% to 86.8%). After delivery, TIB remained below the prepregnancy baseline by 14.7±45.7 min at one year postpartum and 15.2±47.7 min at 1.5 years postpartum.
Conclusion: This unprecedented look at large-scale, real-world sleep and pregnancy patterns revealed a previously unquantified initial increase in sleep followed by decreases in both quantity and quality as pregnancy progresses. Sleep deficits persist for at least 1.5 years postpartum. These quantified trends can assist clinicians and patients in understanding what to expect.
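The wear-time requirement in the Methods can be read as a two-level filter, sketched below. This is only one interpretation and not the authors' code: it assumes a day counts as compliant when the device was worn for at least 80% of that day, and a user is kept when at least 80% of days in every period of interest are compliant. The function name `meets_wear_time` and the input layout are hypothetical.

```python
from typing import Dict, List


def meets_wear_time(
    daily_wear_by_period: Dict[str, List[float]],
    day_threshold: float = 0.8,
    period_threshold: float = 0.8,
) -> bool:
    """Return True if the user passes the assumed two-level wear-time filter.

    daily_wear_by_period maps a period name (for example "prepregnancy")
    to a list of per-day wear fractions in [0, 1]. Hypothetical layout.
    """
    for fractions in daily_wear_by_period.values():
        if not fractions:
            # A period with no data cannot satisfy the requirement.
            return False
        compliant_days = sum(1 for f in fractions if f >= day_threshold)
        if compliant_days / len(fractions) < period_threshold:
            return False
    return True


# Illustrative, made-up users.
kept = {"prepregnancy": [0.9] * 8 + [0.5] * 2, "pregnancy": [0.95] * 10}
dropped = {"prepregnancy": [0.9] * 7 + [0.5] * 3, "pregnancy": [0.95] * 10}
print(meets_wear_time(kept), meets_wear_time(dropped))
```

Under a different reading (for example, 80% wear averaged over each period) the filter would keep a different set of users, so the thresholds are exposed as parameters.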
ECG-Nest-FM: A Frequency-Focused ECG Foundation Model with Nested Embeddings
Abhishek Sharma
Lin Yang
ICLR 2025 Workshop MLGenX (2025)
Preview abstract
Electrocardiograms (ECGs) are fundamental to cardiac diagnostics, providing noninvasive insights into cardiovascular conditions. Recent advancements in deep learning have led to foundation models (FMs) capable of learning powerful representations of ECG signals. However, these models often fail to fully exploit the periodic nature and diagnostic frequency bands of ECGs, leading to inefficiencies in computational cost and interpretability. We propose a novel ECG foundation model that learns nested embeddings, where each subset of dimensions encodes progressively higher-frequency information. By explicitly modeling frequency structures and applying a correlation penalty, the method achieves compact, high-rank representations that reduce model size without sacrificing performance. We evaluate our approach on two large-scale datasets for embedding redundancy and prediction performance on downstream clinical tasks such as arrhythmia classification and cardiac condition detection. We observe similar prediction performance (AUROC scores) and lower embedding redundancy, offering a computationally efficient and interpretable framework for ECG analysis. Finally, the representations obtained from our model in UK Biobank data capture known cardiovascular variants and detect novel loci, which can be applied to drug discovery.
Shadow Hamiltonian Simulation
Rolando Somma
Robbie King
Tom O'Brien
Nature Communications, 16 (2025), pp. 2690
Preview abstract
Simulating quantum dynamics is one of the most important applications of quantum computers. Traditional approaches for quantum simulation involve preparing the full evolved state of the system and then measuring some physical quantity. Here, we present a different and novel approach to quantum simulation that uses a compressed quantum state that we call the "shadow state". The amplitudes of this shadow state are proportional to the time-dependent expectations of a specific set of operators of interest, and it evolves according to its own Schrödinger equation. This evolution can be simulated on a quantum computer efficiently under broad conditions. Applications of this approach to quantum simulation problems include simulating the dynamics of exponentially large systems of free fermions or free bosons, the latter example recovering a recent algorithm for simulating exponentially many classical harmonic oscillators. These simulations are hard for classical methods and also for traditional quantum approaches, as preparing the full states would require exponential resources. Shadow Hamiltonian simulation can also be extended to simulate expectations of more complex operators such as two-time correlators or Green's functions, and to study the evolution of operators themselves in the Heisenberg picture.
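One way to see why the shadow state can obey its own Schrödinger equation is sketched below. The sketch assumes a closure condition that the abstract does not state: the commutator of the Hamiltonian H with each operator O_m of interest is a linear combination of the same operators, with a Hermitian coefficient matrix h. The symbols M, A, h and H_S are illustrative notation, and ħ is set to 1.

```latex
% Shadow state: amplitudes proportional to the time-dependent expectations
|\rho; O\rangle = \frac{1}{\sqrt{A}} \sum_{m=1}^{M} \langle O_m(t) \rangle \, |m\rangle ,
\qquad
A = \sum_{m=1}^{M} \left| \langle O_m(t) \rangle \right|^2 .

% Assumed closure condition (h Hermitian)
[H, O_m] = - \sum_{m'=1}^{M} h_{m m'} \, O_{m'} .

% Heisenberg equation of motion for each expectation
\frac{d}{dt} \langle O_m(t) \rangle
  = i \, \langle [H, O_m(t)] \rangle
  = -i \sum_{m'=1}^{M} h_{m m'} \, \langle O_{m'}(t) \rangle .

% Hence the shadow state evolves under the M x M Hamiltonian H_S = h
i \, \frac{d}{dt} |\rho; O\rangle = H_S \, |\rho; O\rangle ,
\qquad (H_S)_{m m'} = h_{m m'} .
```

Because h is taken to be Hermitian, the evolution is unitary and the normalization A stays constant in time. The dimension of the shadow state is M, the number of operators, rather than the dimension of the full system.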
Ethical Co-Development of AI Applications with Indigenous Communities
Claudio Pinhanez
Edem Wornyo
(2025) (to appear)
Preview abstract
This course explores how researchers and practitioners can engage ethically with Indigenous communities when developing AI- and data-intensive applications. Some key issues such as fair engagement, legal constraints, reciprocity, and informed consent are discussed based on examples drawn from the instructors’ experience. The course also examines good practices in terms of co-designing and co-development processes, data governance and sovereignty issues and systems, decolonial software licensing, and processes of technology transfer and appropriation. In its practical part, the course critically discusses examples and cases gathered from the audience to explore the diversity of issues and solutions when working with Indigenous communities.
Preview abstract
The quest to identify quantum advantages, where quantum physics truly outperforms classical physics, lies at the heart of quantum technology. While quantum devices promise extraordinary capabilities, from exponential computational speedups to unprecedented measurement precision, distinguishing genuine advantages from mere illusions remains a formidable challenge. In this endeavor, quantum theorists are like prophets trying to foretell a future where quantum technologies reign supreme. Yet, the boundary between visionary insight and unfounded fantasy is perilously thin. In this perspective, we explore the properties defining an ideal quantum advantage and examine our mathematical tools for navigating the vast world of quantum advantages across computation, learning, sensing, communication, and beyond. We show that some quantum advantages are inherently unpredictable using classical resources alone, suggesting a landscape far richer than what we can currently foresee. While mathematical rigor remains our indispensable guide in this exploration, the ultimate power of quantum technologies may emerge from the quantum advantages we cannot yet conceive.
Preview abstract
Differentially private (DP) synthetic data is a versatile tool for enabling the analysis of private data. With the rise of foundation models, a number of new synthetic data algorithms privately finetune the weights of foundation models to improve over existing approaches to generating private synthetic data. In this work, we propose two algorithms for using API access only to generate DP tabular synthetic data. We extend the Private Evolution algorithm (Lin et al., 2023; Xie et al., 2024) to the tabular data domain, define a workload-based distance measure, and propose a family of algorithms that use one-shot API access to LLMs.
Preview abstract
Visual in-context learning (VICL), as a new paradigm in computer vision, allows the model to rapidly adapt to various tasks with only a handful of prompts and examples. While effective, the existing VICL paradigm exhibits poor generalizability under distribution shifts. In this work, we propose test-time visual in-context tuning (VICT), a method that can learn adaptive VICL models on the fly with a single test sample. Specifically, we flip the role between task prompts and the test sample and use a cycle consistency loss to reconstruct the original task prompt output. Our key insight is that a model should be aware of a new test distribution if it can successfully recover the original task prompts. Extensive experiments on seven representative vision tasks with 15 corruptions demonstrate that our VICT can improve the generalizability of VICL to unseen new domains.
AndroidWorld: An Open World for Autonomous Agents
Jonathan Waltz
Marybeth Fair
Daniel Toyama
Will Bishop
Sarah Clinckemaillie
Timothy Lillicrap
Chris Rawles
Robert Berry
Gabrielle Lau
Divya Tyam
Yifan Chang
Alice Li
Folawiyo Campbell-Ajala
Wei Li
ICLR 2025 (2025)
Preview abstract
Autonomous computer control agents that execute human tasks by controlling user interfaces (UIs) are emerging. Such agents would be valuable for humans, and progress in the field will be driven by realistic and reproducible benchmarks. We present AndroidWorld, a fully-functioning Android environment that provides reward signals across 20 apps on 114 programmatic tasks. Instead of a static test set, the tasks in AndroidWorld are parameterized, allowing for unlimited variation in language and task parameters. Reward signals are derived from Android system state, making them highly durable and extensible across different applications. To demonstrate AndroidWorld's extensibility, we integrate the popular MiniWoB++ into it. To evaluate AndroidWorld, we introduce a new multimodal autonomous agent for Android, M3A. Our agent achieves a 27% success rate, leaving ample room for future work. Furthermore, we adapt a popular desktop web agent for Android, which we find to be less effective on mobile, suggesting future research is needed to build universal, cross-domain agents. Finally, we conduct robustness testing by testing M3A against a suite of real-world variations on a representative subset of tasks. AndroidWorld and the experiments in this paper are available at https://github.com/google-research/android-world.
Synthesizing and Adapting Error Correction Data for Mobile Large Language Model Applications
Yanxiang Zhang
Zheng Xu
Yuanbo Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track) (2025)
Preview abstract
Error correction is an important capability when applying large language models (LLMs) to facilitate user typing on mobile devices. In this paper, we use LLMs to synthesize a high-quality dataset of error correction pairs to evaluate and improve LLMs for mobile applications. We first prompt LLMs with error correction domain knowledge to build a scalable and reliable addition to the existing data synthesis pipeline. We then adapt the synthetic data distribution to match the mobile application domain by reweighting the samples. The reweighting model is learnt by predicting (a handful of) live A/B test metrics when deploying LLMs in production, given the LLM performance on offline evaluation data and scores from a small privacy-preserving on-device language model. Finally, we present best practices for mixing our synthetic data with other data sources to improve model performance on error correction in both offline evaluation and production live A/B testing.
Inside-Out: Hidden Factual Knowledge in LLMs
Eyal Ben David
Eran Ofek
Hadas Orgad
Zorik Gekhman
Roi Reichart
Yonatan Belinkov
2025
Preview abstract
This work presents a framework for assessing whether large language models (LLMs) encode more factual knowledge in their parameters than what they express in their outputs. While a few studies hint at this possibility, none has clearly defined or demonstrated this phenomenon. We first propose a formal definition of knowledge, quantifying it for a given question as the fraction of correct-incorrect answer pairs where the correct one is ranked higher. This gives rise to external and internal knowledge, depending on the information used to score individual answer candidates: either the model’s observable token-level probabilities or its intermediate computations. Hidden knowledge arises when internal knowledge exceeds external knowledge. We then present a case study, applying this framework to three popular open-weights LLMs in a closed-book QA setup. Our results indicate that: (1) LLMs consistently encode more factual knowledge internally than what they express externally, with an average gap of 40%. (2) Surprisingly, some knowledge is so deeply hidden that a model can internally know an answer perfectly, yet fail to generate it even once, despite large-scale repeated sampling of 1,000 answers. This reveals fundamental limitations in the generation capabilities of LLMs, which (3) puts a practical constraint on scaling test-time compute via repeated answer sampling in closed-book QA: significant performance improvements remain inaccessible because some answers are practically never sampled, yet if they were, we would be guaranteed to rank them first.
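The pairwise definition of knowledge in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `knowledge_score`, the answer candidates, and all scores are made up, and ties are counted as failures here, which is an assumption.

```python
from itertools import product
from typing import Dict, Set


def knowledge_score(scores: Dict[str, float], correct: Set[str]) -> float:
    """Fraction of (correct, incorrect) answer pairs in which the correct
    answer receives the strictly higher score."""
    right = [s for answer, s in scores.items() if answer in correct]
    wrong = [s for answer, s in scores.items() if answer not in correct]
    if not right or not wrong:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = sum(1 for r, w in product(right, wrong) if r > w)
    return wins / (len(right) * len(wrong))


# Made-up scores for one question whose correct answer is "Paris".
correct = {"Paris"}
external = {"Paris": 0.2, "Lyon": 0.5, "Nice": 0.1}  # e.g. token probabilities
internal = {"Paris": 0.9, "Lyon": 0.4, "Nice": 0.1}  # e.g. a probe on hidden states

k_ext = knowledge_score(external, correct)
k_int = knowledge_score(internal, correct)
print(k_ext, k_int, k_int - k_ext)
```

In this toy case the internal scorer ranks the correct answer above both incorrect ones while the external scorer does so for only one of the two pairs, so the gap between the two scores is positive, which is what the abstract calls hidden knowledge.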
An Empirical Study of Time of Day Breakpoints in Traffic Light Plans
Eliav Buchnik
Tom Kalvari
Jack Haddad
Dan Karliner
Danny Veikherman
Shai Ferster
Ori Rottenstreich
2025
Preview abstract
The fixed-time strategy is a common approach in traffic signal control in which signal plans are simple and periodic, enjoying easy implementation without detection mechanisms. A traffic light is associated with several daily plans, each applied to several consecutive hours. Time-of-day breakpoints (TODs) refer to the times over the day at which the plan is changed. TODs are often selected based on traffic, aiming to divide the day into groups of consecutive hours with similar traffic characteristics within each group of hours. We present a methodology to study time-of-day breakpoints in practice. We use this methodology to estimate and analyze time-of-day breakpoints in the city of Rio de Janeiro, Brazil based on traffic properties derived from traffic trajectories. Our study examines over 900 of the city's intersections. We refer to properties such as the number of daily plans and the times at which plans start. We also provide traffic-aware insights on the potential improvement in the selection of TODs and identify key intersections where adjusting TODs could reduce average delay times. We identify potential improvements in over 8% of the examined intersections. These findings provide valuable insights for traffic engineers seeking to optimize signal timing.
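The idea of dividing the day into groups of consecutive hours with similar traffic can be sketched as a one-dimensional segmentation problem. The code below is an illustration only, not the methodology of the paper: it assumes a single traffic measure per hour, treats the day as linear rather than circular, and picks breakpoints that minimize the within-group sum of squared deviations using dynamic programming. The function name `tod_breakpoints` and the sample volumes are hypothetical.

```python
from typing import List


def tod_breakpoints(volumes: List[float], num_plans: int) -> List[int]:
    """Split the sequence into num_plans contiguous groups with minimal
    within-group squared deviation; return the start index of every group
    after the first (the breakpoints)."""
    n = len(volumes)
    if not 1 <= num_plans <= n:
        raise ValueError("num_plans must be between 1 and len(volumes)")

    # Prefix sums give the squared-deviation cost of any segment in O(1).
    ps, ps2 = [0.0], [0.0]
    for v in volumes:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)

    def cost(i: int, j: int) -> float:
        """Sum of squared deviations from the mean over hours i..j-1."""
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(num_plans + 1)]
    back = [[0] * (n + 1) for _ in range(num_plans + 1)]
    best[0][0] = 0.0
    for k in range(1, num_plans + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    # Walk back through the table to recover where each group starts.
    starts, j = [], n
    for k in range(num_plans, 0, -1):
        j = back[k][j]
        starts.append(j)
    return sorted(starts)[1:]  # drop the first group's start (index 0)


# Made-up hourly volumes with three visibly different regimes.
hourly = [1, 1, 1, 10, 10, 10, 3, 3]
print(tod_breakpoints(hourly, 3))
```

Comparing breakpoints computed this way with the breakpoints actually deployed at an intersection is one simple way to flag plans whose timing may not match observed traffic.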
Beyond Retrieval: Generating Narratives in Conversational Recommender Systems
Krishna Sayana
Raghavendra Vasudeva
Yuri Vasilevski
Kun Su
Liam Hebert
James Pine
Hubert Pham
Ambarish Jash
Sukhdeep Sodhi
(2025)
Preview abstract
Large Language Models (LLMs) have shown remarkable progress in generating human-quality text and engaging in complex reasoning. This presents a unique opportunity to revolutionize conversational recommender systems by enabling them to generate rich, engaging and personalized narratives that go beyond recommendations. However, the lack of suitable datasets limits research in this area. This paper addresses this challenge by making two key contributions.
First, we introduce REGEN (Reviews Enhanced with GEnerative Narratives), a new dataset extending the Amazon Product Reviews with rich user narratives. Furthermore, we perform an extensive automated evaluation of the dataset using a rater LLM. Second, the paper introduces a fusion architecture (CF model with an LLM) which serves as a baseline for REGEN. To the best of our knowledge, this represents the first attempt to analyze the capabilities of LLMs in understanding recommender signals and generating rich narratives. We demonstrate that LLMs can effectively learn from simple fusion architectures utilizing interaction-based CF embeddings, and this can be further enhanced using the metadata and personalization data associated with items. Our experiments show that combining CF and content embeddings leads to improvements of 4-12% in key language metrics compared to using either type of embedding individually. We also provide an analysis to interpret their contributions to this new generative task.
Scaling Wearable Foundation Models
Girish Narayanswamy
Kumar Ayush
Yuzhe Yang
Orson Xu
Shun Liao
Shyam Tailor
Jake Sunshine
Tim Althoff
Shrikanth (Shri) Narayanan
Jiening Zhan
Mark Malhotra
Shwetak Patel
Samy Abdel-Ghaffar
Daniel McDuff
2025
Preview abstract
Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data. However, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful representations from vast amounts of text, image, video, or audio data, we investigate the scaling properties of wearable sensor foundation models across compute, data, and model size. Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, accelerometer, electrodermal activity, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date. Our results establish the scaling laws of LSM for tasks such as imputation, interpolation and extrapolation across both time and sensor modalities. Moreover, we highlight how LSM enables sample-efficient downstream learning for tasks including exercise and activity recognition.