Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution [1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass-production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
mmMUSE: An mmWave-based Motion-resilient Universal Speech Enhancement System
Chenming He
Yanyong Zhang
Kai Wang
Dequan Wang
Lingyu Wang
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), ACM (2026) (to appear)
Voice-based smart systems can greatly enhance user experiences by allowing higher-quality interactions through better voice perception. Speech enhancement can benefit such systems by isolating noise from speech. Recently, integrating millimeter-wave (mmWave) with audio for speech perception has gained increasing attention due to microphones' limitations in noisy environments. However, mmWave-based vocal extraction is severely affected by motion, which disperses vocal signals across ranges and introduces distortions. In this paper, we propose an mmWave-based motion-resilient universal speech enhancement system called mmMUSE, which fuses mmWave and audio signals. To mitigate motion interference, we develop a Doppler-based method for motion-robust vocal signal extraction. Moreover, by introducing the Vocal-Noise-Ratio metric to assess the prominence of vocal signals from mmWave, we achieve real-time voice activity detection that gains 3.81 dB of SISDR in noisy speech. Additionally, we design a two-stage complex-valued network that includes an attention-based fusion network for cross-modal complementing and a time-frequency masking network for correcting the amplitude and phase of speech to isolate noise.
Using mmWave and audio datasets from 46 participants, mmMUSE outperforms the state-of-the-art speech enhancement models, achieving an average SISDR improvement of 3.12 dB. Additionally, mmMUSE achieves SISDR improvements of 16.51 dB, 17.93 dB, 14.93 dB, and 18.95 dB in controlled environments involving intense noise, extensive motion, multiple speakers, and various obstructive materials, respectively. Finally, we evaluate mmMUSE in real-world scenarios including running, public spaces, and driving, maintaining a word error rate (WER) below 10%.
Intuitively, the more complex a software system is, the harder it is to maintain. Statistically, it is not clear which complexity measures correlate with maintenance effort; in fact, it is not even clear how to objectively measure maintenance burden, so that developers’ sentiment and intuition can be supported by numbers. Without effective complexity and maintenance measures, it remains difficult to objectively monitor maintenance, control complexity, or justify refactoring. In this paper, we report a large-scale study of 1200+ projects written in C++ and Java from Google LLC. In this study, we collected three categories of measures: (1) architectural complexity, measured using propagation cost (PC), decoupling level (DL), and structural anti-patterns; (2) maintenance activity, measured using the number of changes, lines of code (LOC) written, and active coding time (ACT) spent on feature-addition vs. bug-fixing, and (3) developer sentiment on complexity and productivity, collected from 7200 survey responses. We statistically analysed the correlations among these measures and obtained significant evidence of the following findings: 1) the more complex the architecture is (higher propagation cost, more instances of anti-patterns), the more LOC is spent on bug-fixing, rather than adding new features; 2) developers who commit more changes for features, spend more lines of code on features, or spend more time on features also feel that they are less hindered by technical debt and complexity. To the best of our knowledge, this is the first large-scale empirical study establishing the statistical correlation among architectural complexity, maintenance activity, and developer sentiment. 
The implication is that, instead of solely relying upon developer sentiment and intuitions to detect degraded structure or increased burden to evolve, it is possible to objectively and continuously measure and monitor architectural complexity and maintenance difficulty, increasing feature delivery efficiency by reducing architectural complexity and anti-patterns.
A Simplified Version of the Quantum OTOC2 Problem
Robbie King
Kostyantyn Kechedzhi
Tom O'Brien
Vadim Smelyanskiy
arXiv:2510.19751 (2025)
This note presents a simplified version of the OTOC2 problem that was recently experimentally implemented by Google Quantum AI and collaborators. We present a formulation of the problem for growing input size and hope this spurs further theoretical work on the problem.
SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling
Xinyun Chen
Transactions on Machine Learning Research (TMLR) (2025)
Recent advancements in Large Language Models (LLMs) have created new opportunities to enhance performance on complex reasoning tasks by leveraging test-time computation. However, existing scaling methods have key limitations: parallel methods like repeated sampling are often inefficient and quickly saturate, while sequential methods like SELF-REFINE struggle to improve after a few rounds. Although combining these approaches shows promise, current methods require fine-tuned reward and revision models. This paper proposes Self-Enhanced Test-Time Scaling (SETS), a simple yet effective approach that overcomes these limitations by strategically combining parallel and sequential techniques and fully leveraging LLMs' self-improvement abilities. SETS exploits the inherent self-verification and self-correction capabilities of LLMs, unifying sampling, verification, and correction within a single framework. This facilitates efficient and scalable test-time computation for enhanced performance on complex tasks without any model training. Our comprehensive experimental results on challenging benchmarks spanning planning, reasoning, math, and coding demonstrate that SETS achieves significant performance improvements and more advantageous test-time scaling behavior than the alternatives.
Quantum learning advantage on a scalable photonic platform
Jens A. H. Nielsen
Changhun Oh
Senrui Chen
Yat Wong
Robert Huang
Zhenghao Liu
Liang Jiang
Oscar Cordero
John Preskill
Axel B. Bregnsbo
Romain Jeremie Baptiste Brunel
Jonas S. Neergaard-Nielsen
Sisi Zhou
Emil E. B. Ostergaard
Ulrik L. Andersen
Science (2025)
Recent advancements in quantum technologies have opened new horizons for exploring the physical world in ways once deemed impossible. Central to these breakthroughs is the concept of quantum advantage, where quantum systems outperform their classical counterparts in solving specific tasks. While much attention has been devoted to computational speedups, quantum advantage in learning physical systems remains a largely untapped frontier. Here, we present a photonic implementation of a quantum-enhanced protocol for learning the probability distribution of a multimode bosonic displacement channel. By harnessing the unique properties of continuous-variable quantum entanglement, we achieve high-precision reconstruction of the displacement distribution using multiple orders of magnitude fewer experiments compared to methods that do not employ entangled resources. Specifically, with approximately 5 dB of two-mode squeezing, corresponding to imperfect Einstein-Podolsky-Rosen (EPR) entanglement, we successfully reconstruct a 100-mode bosonic displacement channel, requiring $10^{11}$ fewer experiments than a conventional measurement scheme. Our results demonstrate that even with non-ideal, noisy entanglement, a significant quantum advantage can be realized in continuous-variable quantum systems. This marks an important step towards practical quantum-enhanced learning protocols with implications for quantum metrology, certification and machine learning.
Fine-grained Measurement of Vehicle Delay Fairness
Eliav Buchnik
Tom Kalvari
Jack Haddad
Dan Karliner
Danny Veikherman
Ron Tsibulsky
Shai Ferster
Ori Rottenstreich
2025
Optimizing signal timing in traffic lights helps to improve traffic flow and reduce emissions by reducing delays. At intersections, vehicles from different movements experience different delays depending on the traffic light plan. This paper analyzes delay fairness among vehicles at intersections. We study three cities, Rio de Janeiro, Hamburg, and Seattle, with a total of over 5100 intersections. We present an intuitive methodology to compute delay fairness based on the Gini index, a common fairness measure in economics. We evaluate fairness based on real traffic data and provide insights into how fairness relates to time of day and traffic demand. We also examine real changes in traffic light plans that occurred in practice to check whether improving delay is often aligned with increasing fairness.
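As a rough illustration of the fairness measure named in this abstract, the sketch below computes the standard Gini index over a list of per-movement delays. The abstract does not describe the paper's exact methodology, so the function name `gini`, the sample delay values, and the choice of one delay value per movement are assumptions made only for this example.

```python
def gini(values):
    """Gini index of non-negative values: 0 means all equal, values near 1 mean highly unequal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-rank formula: G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, ranks i start at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical average delays (seconds) for four movements at one intersection.
equal_delays = [10, 10, 10, 10]
skewed_delays = [0, 0, 0, 40]
print(gini(equal_delays), gini(skewed_delays))
```

With n values, the index ranges from 0 (every movement sees the same delay) up to (n - 1) / n (all delay falls on a single movement).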
H2E: Hand, Head, Eye: A Multimodal Cascade of Natural Inputs
Khushman Patel
Hans Gellersen
Ken Pfeuffer
IEEE VR (2025)
Eye-based interaction techniques for extended reality, such as gaze and pinch, are simple to use but suffer from input precision issues. We present H2E, a fine and coarse-grained pointing technique that cascades Hand, Head, and Eye inputs. As users initiate a pinch gesture, a cursor appears at the gaze point that can be dragged by head pointing before pinch confirmation. This has the potential advantage that it can add a precision component without changing the semantics of the technique. In this paper, we describe the design and implementation of the technique. Furthermore, we present an evaluation of our method in a Fitts-based user study, exploring the speed-accuracy trade-offs against a gaze and pinch interaction baseline.
Project Euphonia: Advancing Inclusive Speech Recognition through Expanded Data Collection and Evaluation
Alicia Martín
Bob MacDonald
Katrin Tomanek
Frontiers in Language Sciences (2025)
Speech recognition models, predominantly trained on standard speech, often exhibit lower accuracy for individuals with accents, dialects, or speech impairments. This disparity is particularly pronounced for economically or socially marginalized communities, including those with disabilities or diverse linguistic backgrounds. Project Euphonia, a Google initiative originally launched in English dedicated to improving Automatic Speech Recognition (ASR) of disordered speech, is expanding its data collection and evaluation efforts to include international languages like Spanish, Japanese, French and Hindi, in a continued effort to enhance inclusivity. This paper presents an overview of the extension of processes and methods used for English data collection to more languages and locales, progress on the collected data, and details about our model evaluation process, focusing on meaning preservation based on Generative AI.
Life at the Boundary of Chemical Kinetics and Program Execution
Thomas Fischbacher
Physical Review E (2025)
This work introduces a generic quantitative framework for studying processes that involve interactions of polymer sequences. Possible applications range from quantitative studies of the reaction kinetics of polymerization processes to explorations of the behavior of chemical implementations of computational - including basic life-like - processes. This way, we establish a bridge between thermodynamic and computational aspects of systems that are defined in terms of sequence interactions. As a by-product of these investigations, we clarify some common confusion around the notion of "autocatalysis".
Using a Markov process model of polymer sequence composition and dynamical evolution of the Markov process's parameters via an ODE that arises when taking the double "chemical" many-particle limit as well as "rarefied interactions" limit, this approach enables - for example - accurate quantitative explorations of entropy generation in systems where computation is driven by relaxation to thermodynamic equilibrium.
The computational framework internally utilizes the Scheme programming language's intrinsic continuation mechanisms to provide nondeterministic evaluation primitives that allow the user to specify example systems in straight purely functional code, making exploration of all possible relevant sequence composition constellations - which would be otherwise tedious to write code for - automatic and hidden from the user.
As the original motivation for this work came from investigations into emergent program evolution that arises in computational substrates of the form discussed in recent work on "Computational Life" \cite{alakuijala2024computational}, a major focus of attention is on giving a deeper explanation of key requirements for the possible emergence of self-replicators, especially in settings whose behavior is governed by real-world physics rather than ad-hoc rules that may be difficult to implement in a physical system. A collection of fully worked-out examples elucidates how this modeling approach is quantitatively related to Metropolis Monte Carlo based simulations as well as exact or approximate analytic approaches, and how it can be utilized to study a broad range of different systems. These examples can also serve as starting points for further explorations.
Differentially Private Insights into AI Use
Daogao Liu
Pritish Kamath
Alexander Knop
Adam Sealfon
Da Yu
Chiyuan Zhang
Conference on Language Modeling (COLM) 2025 (2025)
We introduce Urania, a novel framework for generating insights about LLM chatbot interactions with rigorous differential privacy (DP) guarantees. The framework employs a private clustering mechanism and innovative keyword extraction methods, including frequency-based, TF-IDF-based, and LLM-guided approaches. By leveraging DP tools such as clustering, partition selection, and histogram-based summarization, Urania provides end-to-end privacy protection. Our evaluation assesses lexical and semantic content preservation, pair similarity, and LLM-based metrics, benchmarking against a non-private method inspired by CLIO (Tamkin et al., 2024). Moreover, we develop a simple empirical privacy evaluation that demonstrates the enhanced robustness of our DP pipeline. The results show the framework’s ability to extract meaningful conversational insights while maintaining stringent user privacy, effectively balancing data utility with privacy preservation.
Protecting user privacy in financial transactions is desirable, yet in the classical world it is effectively impossible without hardware assumptions. Most existing quantum money schemes also fail to guarantee anonymity. We introduce a construction of single-use quantum tokens that give users the ability to detect whether the issuing authority is tracking them, for which we prove unconditional security. Our tokens do not require quantum communication from the users themselves, making them relatively practical to deploy.
We discuss potential applications including one-time payment tokens, anonymous one-time pads and voting.
A growing body of research has demonstrated that the behavior of large language models can be effectively controlled at inference time by directly modifying their internal states, either through vector additions to their activations or through updates to their weight matrices. These techniques, while powerful, are often guided by empirical heuristics, such as deriving "steering vectors" from the average activations of contrastive prompts. This work provides a theoretical foundation for these interventions, explaining how they emerge from the fundamental computations of the transformer architecture. Building on the recent finding that a prompt's influence can be mathematically mapped to implicit weight updates (Dherin et al., 2025), we generalize this theory to deep, multi-block transformers. We show how the information contained in any chunk of a user prompt is represented and composed internally through virtual weight vectors and virtual weight matrices. We then derive a principled method for condensing this information into token-independent thought vectors and thought matrices. These constructs provide a theoretical explanation for existing vector- and matrix-based model editing techniques and offer a direct, computationally grounded method for transforming textual input into reusable weight updates.
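The empirical heuristic mentioned in this abstract, deriving a steering vector from the average activations of contrastive prompts, can be sketched as a difference of mean activations that is then added to a hidden state. This is a minimal illustration of that commonly used heuristic, not the paper's thought-vector construction. The function names, the `strength` parameter, and the toy two-dimensional activations are all hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(positive_acts, negative_acts):
    """Difference between mean activations of two contrastive prompt sets (assumed heuristic)."""
    pos = mean_vector(positive_acts)
    neg = mean_vector(negative_acts)
    return [p - q for p, q in zip(pos, neg)]


def steer(activation, vector, strength=1.0):
    """Add the scaled steering vector to one activation vector."""
    return [a + strength * v for a, v in zip(activation, vector)]


# Toy activations standing in for hidden states collected at one layer.
positive = [[1.0, 2.0], [3.0, 4.0]]
negative = [[0.0, 0.0], [2.0, 2.0]]
v = steering_vector(positive, negative)
print(v, steer([0.0, 0.0], v, strength=0.5))
```

In a real model the activations would be collected from a chosen layer for each prompt; which layer and which scaling to use are typically tuned empirically.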
Several resource allocation settings involve agents with unequal entitlements represented by weights. We analyze weighted fair division from an asymptotic perspective: if m items are divided among n agents whose utilities are independently sampled from a probability distribution, when is it likely that a fair allocation exists? We show that if the ratio between the weights is bounded, a weighted envy-free allocation exists with high probability provided that m = Ω(n log n / log log n), generalizing a prior unweighted result. For weighted proportionality, we establish a sharp threshold of m = n/(1 − μ) for the transition from non-existence to existence, where μ ∈ (0, 1) denotes the mean of the distribution. In addition, we prove that for two agents, a weighted envy-free (and weighted proportional) allocation is likely to exist if m = ω(√r), where r denotes the ratio between the two weights.
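To make the fairness notion concrete, the sketch below checks weighted envy-freeness for a given allocation. The abstract does not spell out the definition, so the code assumes the one commonly used in the weighted fair division literature with additive utilities: agent i does not envy agent j when u_i(A_i) / w_i ≥ u_i(A_j) / w_j. The function name and the small instance are illustrative only.

```python
def is_weighted_envy_free(utilities, weights, allocation):
    """Check weighted envy-freeness under additive utilities.

    utilities[i][g] is agent i's utility for item g, weights[i] is agent i's
    positive entitlement, and allocation[i] lists the items given to agent i.
    """
    n = len(weights)

    def value(agent, bundle):
        return sum(utilities[agent][g] for g in bundle)

    for i in range(n):
        own = value(i, allocation[i])
        for j in range(n):
            other = value(i, allocation[j])
            # Cross-multiplied form of own / w_i >= other / w_j, avoiding division.
            if own * weights[j] < other * weights[i]:
                return False
    return True


# Two agents, three identical items; agent 0 has twice the entitlement of agent 1.
utilities = [[1, 1, 1], [1, 1, 1]]
weights = [2, 1]
print(is_weighted_envy_free(utilities, weights, [[0, 1], [2]]))
print(is_weighted_envy_free(utilities, weights, [[0], [1, 2]]))
```

Giving two items to the agent with weight 2 and one item to the agent with weight 1 satisfies the condition with equality, while the reversed split leaves the higher-weight agent envious.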