Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10822 publications
Preview abstract
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution[1-3]. We combine existing mass-production results with modern approaches for loading classical data using ``quantum read-only memory.'' We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order or magnitude or more for a variety reasonably-sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
View details
Preview abstract
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
View details
Data Selection for ERMs
Alexander Shlimovich
Steve Hanneke
Amir Yehudayoff
Shay Moran
2025
Preview abstract
Learning theory has traditionally followed a model-centric approach, focusing on designing optimal algorithms for a fixed natural learning task (e.g., linear classification or regression). In this paper, we adopt a complementary data-centric perspective, whereby we fix a natural learning rule and focus on optimizing the training data. Specifically, we study the following question: given a learning rule \(\mathcal{A}\) and a data selection budget \(n\), how well can \(\mathcal{A}\) perform when trained on at most \(n\) data points selected from a population of \(N\) points? We investigate when it is possible to select \(n \ll N\) points and achieve performance comparable to training on the entire population.
We address this question across a variety of empirical risk minimizers. Our results include optimal data-selection bounds for mean estimation, linear classification, and linear regression. Additionally, we establish two general results: a taxonomy of error rates in binary classification and in stochastic convex optimization. Finally, we propose several open questions and directions for future research.
View details
User-Centered Delivery of AI-Powered Health Care Technologies in Clinical Settings: Mixed Methods Case Study
Randall Brandt
Hien Brown
Christine Silva
JMIR Human Factors (2025)
Preview abstract
Background:
Providers spend a large percentage of their day using electronic health record (EHR) technology and frequently report frustration when EHR tasks are time-consuming and effortful. To solve these challenges, artificial intelligence (AI)–based enhancements to EHR technology are increasingly being deployed. However, AI-based implementations for EHR features often lack user-centered evaluation.
Objective:
This study evaluates, using a user-centered approach, the implementation of an AI-powered search and clinical discovery tool within an EHR system.
Methods:
We conducted a mixed methods study consisting of interviews, observations, and surveys for 5 months.
Results:
High adoption rates for the AI-based features (163/176, 93% users after 3 months) and significant increases across key metrics, including user satisfaction (U=49; P<.001) and perception of time saved (U=49; P<.001), demonstrated that the AI-based features were not only successfully integrated into various clinical workflows but also improved the user experience for clinicians.
Conclusions:
Our results underscore the feasibility and effectiveness of using a user-centered approach for the deployment of clinical AI tools. High adoption rates and positive user experiences were driven by our user-centered research program, which emphasized close collaboration with users, rapid incorporation of feedback, and tailored user training. This study program can be used as a starting framework for the design and integration of human-centered research methods for AI tool deployment in clinical settings.
View details
Zero-Shot Mono-to-Binaural Speech Synthesis
Alon Levkovitch
Nadav Bar
Bastiaan Kleijn
Interspeech (2025), pp. 4168-4172
Preview abstract
We present ZeroBAS, a neural method to synthesize binaural speech from monaural speech recordings and positional information without training on any binaural data. To our knowledge, this is the first published zero-shot neural approach to mono-to-binaural speech synthesis. Specifically, we show that a parameter-free geometric time warping and amplitude scaling based on source location suffices to get an initial binaural synthesis that can be refined by iteratively applying a pretrained denoising vocoder. Furthermore, we find this leads to generalization across room conditions, which we measure by introducing a new dataset, TUT Mono-to-Binaural, to evaluate state-of-the-art monaural-to-binaural synthesis methods on unseen conditions. Our zero-shot method is perceptually on-par with the performance of supervised methods on previous standard mono-to-binaural dataset, and even surpasses them on our out-of-distribution TUT Mono-to-Binaural dataset.
View details
Safe Coding: Rigorous Modular Reasoning about Software Safety
Queue, 23(5) (2025)
Preview abstract
Many persistent and dangerous software vulnerabilities, including memory safety violations and code injection, arise from a common root cause: Developers inadvertently violate the implicit safety preconditions of widely-used programming constructs. These preconditions—such as pointer validity, array-access bounds, and the trustworthy provenance of code fragments to be evaluated as SQL, HTML, or JavaScript—are traditionally the developer's responsibility to ensure. In complex systems, meeting these obligations often relies on non-local, whole-program invariants that are notoriously difficult to reason about correctly, leading to vulnerabilities that are difficult to detect after the fact.
This article introduces Safe Coding, a collection of software design patterns and practices designed to cost-effectively provide a high degree of assurance against entire classes of such vulnerabilities. The core principle of Safe Coding is to shift responsibility for safety from individual developers to the programming language, software libraries, and frameworks. This is achieved by systematically eliminating the direct use of risky operations—those with complex safety preconditions—in application code. Instead, these operations are encapsulated within safe abstractions: modules with public APIs that are safe by design, whose implementations fully ensure all module-internal safety preconditions through a combination of local runtime checks and by elevating safety preconditions into type invariants.
Safe Coding facilitates a modular and compositional approach to whole-program safety: Difficult reasoning is localized to the implementation of safe abstractions, which undergo focused expert scrutiny. The composition of these abstractions with the majority of the codebase (which is kept free of risky operations) is then automatically verified by the language’s type checker. This form of compositional reasoning, drawing from patterns used in formal software verification, can be viewed as a semi-formal approach that balances rigor with broad applicability to large industrial codebases. We discuss the successful application of these practices at Google, where they have nearly eliminated vulnerabilities such as Cross-Site Scripting (XSS) and SQL injection, and their critical role in ensuring memory safety in Rust, collectively demonstrating a favorable cost-assurance tradeoff for achieving software safety at scale.
View details
Preview abstract
A growing body of research has demonstrated that the behavior of large language models can be effectively controlled at inference time by directly modifying their internal states, either through vector additions to their activations or through updates to their weight matrices. These techniques, while powerful, are often guided by empirical heuristics, such as deriving steering vectors from the average activations of contrastive prompts. This work provides a theoretical foundation for these interventions, explaining how they emerge from the fundamental computations of the transformer architecture. Building on the recent finding that a prompt's influence can be mathematically mapped to implicit weight updates (Dherin et al., 2025), we generalize this theory to deep, multi-block transformers. We show how the information contained in any chunk of a user prompt is represented and composed internally through weight vectors and weight matrices. We then derive a principled method for condensing this information into token-independent thought vectors and thought matrices. These constructs provide a theoretical explanation for existing vector- and matrix-based model editing techniques and offer a direct, computationally-grounded method for transmuting textual input into reusable weight updates.
View details
Beyond Digital Literacy: Building Youth Digital Resilience Through Existing “Information Sensibility” Practices
Mia Hassoun
Ian Beacock
Todd Carmody
Patrick Gage Kelley
Beth Goldberg
Devika Kumar
Laura Murray
Rebekah Park
Behzad Sarmadi
Social Sciences Journal, 14(4) (2025)
Preview abstract
Youth media consumption and disordered eating practices have historically been subjects of moral panics, often resulting in protective, deficit-based interventions like content removal. We argue for interventions which instead equip youth to evaluate and manage risks in their online environments, building upon their existing “information sensibility” practices. Drawing upon ethnographic research and intervention testing with 77 participants in the US and India, we analyze how youth (aged 13–26), including those with diverse political perspectives and those recovering from disordered eating (DE), engage with online news and health information. Participants generally algorithmically encountered (rather than searched for) information online, and their engagement was shaped more by social motivations—like belonging—than truth seeking. Participants interpreted online information collaboratively, relying on social cues and peer validation within their online communities. They demonstrated preference for personal testimonies and relatable sources, particularly those with similar social identities. We propose resilience-building interventions that build upon these youth online information practices by: (1) leveraging peer networks, promoting critical information engagement through collaborative learning and peer-to-peer support within online communities; (2) developing social media sensibility, equipping youth to critically evaluate information sources in situ; (3) providing pathways offline, connecting youth to desired in-person communities; and (4) encouraging probabilistic thinking.
View details
AndroidWorld: An Open World for Autonomous Agents
Jonathan Waltz
Marybeth Fair
Daniel Toyama
Will Bishop
Sarah Clinckemaillie
Timothy Lillicrap
Chris Rawles
Robert Berry
Gabrielle Lau
Divya Tyam
Yifan Chang
Alice Li
Folawiyo Campbell-Ajala
Wei Li
ICLR 2025 (2025)
Preview abstract
Autonomous computer control agents that execute human tasks by controlling user interfaces (UIs) are emerging. Such agents would be valuable for humans, and progress in the field will be driven by realistic and reproducible benchmarks. We present AndroidWorld, a fully-functioning Android environment that pro-vides reward signals across 20 apps on 114 programmatic tasks. Instead of a static test set, the tasks in AndroidWorld parameterized, allowing for unlimited variation in language and task parameters. Reward signals are derived from An-droid system state, making them highly durable and extensible across different applications. To demonstrate AndroidWorld's extensibility, we integrate the popular MiniWoB++ into it.To evaluate AndroidWorld, we introduce a new multimodal autonomous agent for Android, M3A. Our agent achieves a 27% success rate leaving ample room for future work. Furthermore, we adapt a popular desktop web agent for Android, which we find to be less effective on mobile, suggesting future research is needed to build universal, cross-domain agents. Finally, we conduct robustness testing by testing M3A against a suite of real-world variations on a representative subset of tasks. AndroidWorld and the experiments in this paper are available at https://https//github.com/google-research/android-world:
View details
Preview abstract
We introduce sum-of-squares spectral amplification (SOSSA), a framework for improving quantum simulation algorithms relevant to low-energy problems. SOSSA first represents the Hamiltonian as a sum-of-squares and then applies spectral amplification to amplify the low-energy spectrum. The sum-of-squares representation can be obtained using semidefinite programming. We show that SOSSA can improve the efficiency of traditional methods in several simulation tasks involving low-energy states. Specifically, we provide fast quantum algorithms for energy and phase estimation that improve over the state-of-the-art in both query and gate complexities, complementing recent results on fast time evolution of low-energy states. To further illustrate the power of SOSSA, we apply it to the Sachdev-Ye-Kitaev model, a representative strongly correlated system, where we demonstrate asymptotic speedups by a factor of the square root of the system size. Notably, SOSSA was recently used in [G.H. Low \textit{et al.}, arXiv:2502.15882 (2025)] to achieve state-of-art costs for phase estimation of real-world quantum chemistry systems.
View details
Participatory AI Considerations for Advancing Racial Health Equity
Andrea G. Parker
Jatin Alla
Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI) (2025) (to appear)
Matryoshka Model Learning for Improved Elastic Student Models
Cho-Jui Hsieh
Chetan Verma
Inderjit Dhillon
Xin Liu
Wen Chen
Ngot Bui
Yang Zhang
2025
Preview abstract
Production machine learning models in the industry are often devel-oped with a primary focus on maximizing model quality. However,these models must ultimately operate within the resource con-straints of their serving infrastructure, including limitations on com-pute, memory and bandwidth. The rapid evolution of serving hard-ware, particularly with advancements in accelerator technology,necessitates periodic retraining to leverage newer, more efficientinfrastructure. This cyclical retraining process is resource-intensive,demanding significant model development time and incurring sub-stantial training costs. This challenge is further amplified by thetrend towards increasingly complex models, which inherently re-quire greater computational resources for training and deployment.While prior work has explored techniques like supernet sub-modelextraction to address training efficiency, a critical gap remains: theefficient generation of a spectrum of high-quality models froman existing production model, a common requirement in diverseindustrial applications. To bridge this gap, we introduce a novel ap-proach leveraging a "Teaching Assistant" (TA) model, derived froma given production model (referred to as the Student model). Wedemonstrate that through co-training the Student and TA modelswith Matryoshka structure while using online distillation, we notonly enhance the Student model’s performance but also enable theflexible creation of a model family offering a compelling trade-offbetween model quality and model size.
View details
Preview abstract
In-context Ranking (ICR) is an emerging paradigm for Information Retrieval (IR), which leverages contextual understanding of LLMs by directly incorporating the task description, candidate documents, and the query into the model's input prompt and tasking the LLM to identify relevant document(s). While it is effective, efficiency is a significant challenge in this paradigm, especially as the candidate list grows due to quadratic/super-linear scaling of attention operation with context length. To this end, this paper first identifies inherent and exploitable structures in the attention of LLMs finetuned for ICR: (1) inter-document block sparsity: attention is dense within each document block but sparse across different documents in the context; and (2) query-document block relevance: the attention scores from certain query tokens to a document block in middle layers strongly correlate with that document's actual relevance. Motivated by these observations, we introduce BlockRank (Blockwise In-context Ranking), a novel method that adapts the attention operation in an LLM by (a) architecturally enforcing the observed inter-document block sparsity, reducing attention complexity from quadratic to linear without loss in performance, and (b) optimizing query-document block relevance for true relevant documents during fine-tuning using an auxiliary contrastive training objective, improving retrieval in attention. Experiments on BEIR, MSMarco and NQ with Mistral-7B demonstrate that BlockRank Mistral matches or outperforms existing SOTA listwise rankers and controlled fine-tuned baseline while being significantly more efficient at inference (4.7x for 100 MSMarco documents in context) and scaling gracefully to long-context shortlists, around 500 documents in-context (approximately 100K context length) within a second, presenting a scalable and effective solution for ICR.
View details
ECG-Nest-FM: A Frequency-Focused ECG Foundation Model with Nested Embeddings
Abhishek Sharma
Lin Yang
ICLR 2025 Workshop MLGenX (2025)
Preview abstract
Electrocardiograms (ECGs) are fundamental to cardiac diagnostics, providing noninvasive insights into cardiovascular conditions. Recent advancements in deep learning have led to foundation models (FMs) capable of learning powerful representations of ECG signals. However, these models often fail to fully exploit the periodic nature and diagnostic frequency bands of ECGs, leading to inefficiencies in computational cost and interpretability. We propose a novel ECG foundation model that learns nested embeddings, where each subset of dimensions encodes progressively higher-frequency information. By explicitly modeling frequency structures and applying a correlation penalty, the method achieves compact, high-rank representations that reduce model size without sacrificing performance. We evaluate our approach on two large-scale datasets for embedding redundancy and prediction performance on downstream clinical tasks such as arrhythmia classification, and cardiac condition detection. We observe similar prediction performance AUROC scores and lower embedding redundancy, offering a computationally efficient and interpretable framework for ECG analysis. Finally, the representations obtained from our model in UK Biobank data capture known cardiovascular variants and detect novel loci, which can be applied to drug discovery.
View details
Preview abstract
A critical component in the trustworthiness of LLMs is reliable uncertainty communication, yet LLMs often use assertive language when conveying false claims, leading to over-reliance and eroded trust. We present the first systematic study of faithful confidence calibration of LLMs, benchmarking models' ability to use linguistic expressions of uncertainty that faithfully reflect their intrinsic uncertainty, across a comprehensive array of models, datasets, and prompting strategies. Our results demonstrate that LLMs largely fail at this task, and that existing interventions are insufficient: standard prompt approaches provide only marginal gains, and existing, factuality-based calibration techniques can even harm faithful calibration. To address this critical gap, we introduce MetaFaith, a novel prompt-based calibration approach inspired by human metacognition. We show that MetaFaith robustly improves faithful calibration across diverse models and task domains, enabling up to 61% improvement in faithfulness and achieving an 83% win rate over original generations as judged by humans.
View details