Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 11056 publications
Preview abstract
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution[1-3]. We combine existing mass-production results with modern approaches for loading classical data using ``quantum read-only memory.'' We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order or magnitude or more for a variety reasonably-sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
View details
Preview abstract
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
View details
CrossCheck: Input Validation for WAN Control Systems
Rishabh Iyer
Isaac Keslassy
Sylvia Ratnasamy
Networked Systems Design and Implementation (NSDI) (2026) (to appear)
Preview abstract
We present CrossCheck, a system that validates inputs to the Software-Defined Networking (SDN) controller in a Wide Area Network (WAN). By detecting incorrect inputs—often stemming from bugs in the SDN control infrastructure—CrossCheck alerts operators before they trigger network outages.
Our analysis at a large-scale WAN operator identifies invalid inputs as a leading cause of major outages, and we show how CrossCheck would have prevented those incidents. We deployed CrossCheck as a shadow validation system for four weeks in a production WAN, during which it accurately detected the single incident of invalid inputs that occurred while sustaining a 0% false positive rate under normal operation, hence imposing little additional burden on operators. In addition, we show through simulation that CrossCheck reliably detects a wide range of invalid inputs (e.g., detecting demand perturbations as small as 5% with 100% accuracy) and maintains a near-zero false positive rate for realistic levels of noisy, missing, or buggy telemetry data (e.g., sustaining zero false positives with up to 30% of corrupted telemetry data).
View details
Preview abstract
How many T gates are needed to approximate an arbitrary n-qubit quantum state to within
a given precision ϵ? Improving prior work of Low, Kliuchnikov and Schaeffer, we show that the
optimal asymptotic scaling is Θ(sqrt{2^n log(1/ε)} + log(1/ε)) if we allow an unlimited number of ancilla qubits. We also show that this is the optimal T-count for implementing an arbitrary
diagonal n-qubit unitary to within error ϵ. We describe an application to batched synthesis of
single-qubit unitaries: we can approximate a tensor product of m = O(log log(1/ϵ)) arbitrary
single-qubit unitaries to within error ϵ with the same asymptotic T-count as is required to
approximate just one single-qubit unitary.
View details
Preview abstract
Semantic data models express high-level business concepts and metrics, capturing the business logic needed to query a database correctly. Most data modeling solutions are built as layers above SQL query engines, with bespoke query languages or APIs. The layered approach means that semantic models can’t be used directly in SQL queries. This paper focuses on an open problem in this space – can we define semantic models in SQL, and make them naturally queryable in SQL?
In parallel, graph query is becoming increasingly popular, including in SQL. SQL/PGQ extends SQL with an embedded subset of the GQL graph query language, adding property graph views and making graph traversal queries easy.
We explore a surprising connection: semantic data models are graphs, and defining graphs is a data modeling problem. In both domains, users start by defining a graph model, and need query language support to easily traverse edges in the graph, which means doing joins in the underlying data.
We propose some useful SQL extensions that make it easier to use higher-level data model abstractions in queries. Users can define a “semantic data graph” view of their data, encapsulating the complex business logic required to query the underlying tables correctly. Then they can query that semantic graph model easily with SQL.
Our SQL extensions are useful independently, simplifying many queries – particularly, queries with joins. We make declared foreign key relationships usable for joins at query time – a feature that seems obvious but is notably missing in standard SQL.
In combination, these extensions provide a practical approach to extend SQL incrementally, bringing semantic modeling and graph query together with the relational model and SQL.
View details
Reasoning-SQL: Reinforcement Learning with Partial Rewards for Reasoning-Enhanced Text-to-SQL
Mohammadreza Pourreza
Shayan Talaei
Hailong Li
Azalia Mirhoseini
Amin Saberi
Conference on Language Modeling (COLM) (2025) (to appear)
Preview abstract
Text-to-SQL is a challenging task involving multiple reasoning-intensive subtasks, including natural language understanding, database schema comprehension, and precise SQL query formulation. Existing approaches often rely on handcrafted reasoning paths with inductive biases that can limit their overall effectiveness. Motivated by the recent success of reasoning-enhanced models such as DeepSeek R1 and OpenAI o1, which effectively leverage reward-driven self-exploration to enhance reasoning capabilities and generalization, we propose a novel set of partial rewards tailored specifically for the Text-to-SQL task. Our reward set includes schema-linking, AI feedback, n-gram similarity, and syntax check, explicitly designed to address the reward sparsity issue prevalent in reinforcement learning (RL). Leveraging group relative policy optimization (GRPO), our approach explicitly encourages large language models (LLMs) to develop intrinsic reasoning skills necessary for accurate SQL query generation. With models of different sizes, we demonstrate that RL-only training with our proposed rewards consistently achieves higher accuracy and superior generalization compared to supervised fine-tuning (SFT). Remarkably, our RL-trained 14B-parameter model significantly outperforms larger proprietary models, e.g. o3-mini by 4% and Gemini-1.5-Pro-002 by 3% on the BIRD benchmark. These highlight the efficacy of our proposed RL-training framework with partial rewards for enhancing both accuracy and reasoning capabilities in Text-to-SQL tasks.
View details
Preview abstract
Intuitively, the more complex a software system is, the harder it is to maintain. Statistically, it is not clear which complexity measures correlate with maintenance effort; in fact, it is not even clear how to objectively measure maintenance burden, so that developers’ sentiment and intuition can be supported by numbers. Without effective complexity and maintenance measures, it remains difficult to objectively monitor maintenance, control complexity, or justify refactoring. In this paper, we report a large-scale study of 1200+ projects written in C++ and Java from Google LLC. In this study, we collected three categories of measures: (1) architectural complexity, measured using propagation cost (PC), decoupling level (DL), and structural anti-patterns; (2) maintenance activity, measured using the number of changes, lines of code (LOC) written, and active coding time (ACT) spent on feature-addition vs. bug-fixing, and (3) developer sentiment on complexity and productivity, collected from 7200 survey responses. We statistically analysed the correlations among these measures and obtained significant evidence of the following findings: 1) the more complex the architecture is (higher propagation cost, more instances of anti-patterns), the more LOC is spent on bug-fixing, rather than adding new features; 2) developers who commit more changes for features, spend more lines of code on features, or spend more time on features also feel that they are less hindered by technical debt and complexity. To the best of our knowledge, this is the first large-scale empirical study establishing the statistical correlation among architectural complexity, maintenance activity, and developer sentiment. The implication is that, instead of solely relying upon developer sentiment and intuitions to detect degraded structure or increased burden to evolve, it is possible to objectively and continuously measure and monitor architectural complexity and maintenance difficulty, increasing feature delivery efficiency by reducing architectural complexity and anti-patterns.
View details
VaultGemma
Lynn Chua
Prem Eruvbetine
Chiyuan Zhang
Thomas Mesnard
Borja De Balle Pigem
Daogao Liu
Amer Sinha
Pritish Kamath
Yangsibo Huang
Christopher A. Choquette-Choo
George Kaissis
Armand Joulin
Da Yu
Ryan McKenna
arxiv (2025)
Preview abstract
In this work, we present VaultGemma 1B, a model based on the Gemma family of models fully trained with differential privacy. VaultGemma 1B is 1 billion parameter pretrained model based on the Gemma 2 series of models and uses the same dataset for training. We will be releasing a tech report and the weights of this model.
View details
Circadian rhythm of heart rate and activity: a cross-sectional study
Maryam Khalid
Logan Schneider
Aravind Natarajan
Conor Heneghan
Karla Gleichauf
Chronobiology International (2025)
Preview abstract
ABSTRACT
Background: Circadian rhythms are commonly observed in a number of physiological processes. Consumer wearable devices have made it possible to obtain continuous time series data from a large number of individuals. We study circadian rhythms from measurements of heart rate, movement, and sleep, from a cohort of nearly 20,000 participants over the course of 30 days.
Methods: Participation was restricted to Fitbit users of age 21 years or older residing in the United States or Canada. Participants were enrolled through a recruitment banner shown on the Fitbit App. The advertisement was shown to 531,359 Fitbit users, and 23,239 enrolled in the program. Of these, we obtained heart rate data from 19,350 participants. We obtain the underlying circadian rhythm from time series heart rate by modeling the circadian rhythm as a sum over the first two Fourier harmonics. The first Fourier harmonic accounts for the 24-hour rhythmicity, while the second harmonic accounts for non-sinusoidal perturbations.
Findings: We observe a circadian rhythm in both heart rate and acceleration. From the diurnal modulation, we obtain the following circadian parameters: (i) amplitude of modulation, (ii) bathyphase, (iii) acrophase, (iv) non-sinusoidal fraction, and (v) fraction of day when the heart rate is greater than the mean. The amplitude, bathyphase, and acrophase depend on sex, and decrease with age. The waketime on average, follows the bathyphase by 2.4 hours. In most individuals, the circadian rhythm of heart rate lags the circadian rhythm of activity.
Interpretation: Circadian metrics for heart rate and activity can be reliably obtained from commercially available wearable devices. Distributions of circadian metrics can be valuable tools for individual-level interpretation.
View details
Synthesizing and Adapting Error Correction Data for Mobile Large Language Model Applications
Yanxiang Zhang
Zheng Xu
Yuanbo Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track) (2025)
Preview abstract
Error correction is an important capability when applying large language models (LLMs) to facilitate user typing on mobile devices. In this paper, we use LLMs to synthesize a high-quality dataset of error correction pairs to evaluate and improve LLMs for mobile applications. We first prompt LLMs with error correction domain knowledge to build a scalable and reliable addition to the existing data synthesis pipeline. We then adapt the synthetic data distribution to match the mobile application domain by reweighting the samples. The reweighting model is learnt by predicting (a handful of) live A/B test metrics when deploying LLMs in production, given the LLM performance on offline evaluation data and scores from a small privacy-preserving on-device language model. Finally, we present best practices for mixing our synthetic data with other data sources to improve model performance on error correction in both offline evaluation and production live A/B testing.
View details
The Cost of Consistency: Submodular Maximization with Constant Recourse
Paul Duetting
Federico Fusco
Ashkan Norouzi Fard
Ola Svensson
Proceedings of the 57th Annual ACM Symposium on Theory of Computing (2025), 1406–1417
Preview abstract
In this work, we study online submodular maximization and how the requirement of maintaining a stable solution impacts the approximation. In particular, we seek bounds on the best-possible
approximation ratio that is attainable when the algorithm is allowed to make, at most, a constant number of updates per step. We show a tight information-theoretic bound of $2/3$ for general monotone submodular functions and an improved (also tight) bound of $3/4$ for coverage functions. Since both these bounds are attained by non poly-time algorithms, we also give a poly-time randomized algorithm that achieves a $0.51$-approximation. Combined with an
information-theoretic hardness of $1/2$ for deterministic algorithms from prior work, our work thus shows a separation between deterministic and randomized algorithms, both information theoretically and for poly-time algorithms.
View details
Round Elimination via Self-Reduction: Closing Gaps for Distributed Maximal Matching
Seri Khoury
Aaron Schild
2025
Preview abstract
We show that there is no randomized LOCAL algorithm for maximal matching (MM) that takes o(min(log D, sqrt(log n))) rounds, even on regular graphs and trees. This improves upon the KMW bound from 21 years ago and shows a surprising separation between MM and MIS on trees, among other implications.
View details
The JPEG XL Image Codec: History, Features, Coding Tools, Design Rationale, and Future
Jon Sneyers
Luca Versari
Zoltan Szabadka
Amnon Cohen-Tidhar
Moritz Firsching
Evgenii Kliuchnikov
Tal Lev-Ami
Eric Portis
Thomas Richter
WATANABE Osamu
arxiv (2025) (to appear)
Preview abstract
This article provides an extensive overview of the JPEG XL codec, describing its features and coding tools,
the design rationale behind it, as well as its performance, history and potential future. It can be used as a
companion document to the standard (ISO/IEC 18181), or as a standalone article to better understand the
codec, either at a high level or in considerable technical detail.
View details
Magic state cultivation on a superconducting quantum processor
Emma Rosenfeld
Gabrielle Roberts
Alexis Morvan
Nathan Lacroix
Alexandre Bourassa
arXiv (2025)
Preview abstract
Fault-tolerant quantum computing requires a universal gate set, but the necessary non-Clifford gates represent a significant resource cost for most quantum error correction architectures. Magic state cultivation offers an efficient alternative to resource-intensive distillation protocols; however, testing the proposal's assumptions represents a challenging departure from quantum memory experiments. We present an experimental study of magic state cultivation on a superconducting quantum processor. We implement cultivation, including code-switching into a surface code, and develop a fault-tolerant measurement protocol to bound the magic state fidelity. Cultivation reduces the error by a factor of 40, with a state fidelity of 0.9999(1) (retaining 8% of attempts). Our results experimentally establish magic state cultivation as a viable solution to one of quantum computing's most significant challenges.
View details
Preview abstract
We initiate the study of contextual dynamic pricing with a heterogeneous population of buyers, where a seller repeatedly (over T rounds) posts prices that depend on the observable
dimensional context and receives binary purchase feedback. Unlike prior work assuming homogeneous buyer types, in our setting the buyer's valuation type is drawn from an unknown distribution with finite support K*. We develop a contextual pricing algorithm based on Optimistic Posterior Sampling with regret K* sqrt(dT), which we prove to be tight in d, T up to logarithmic terms. Finally, we refine our analysis for the non-contextual pricing case, proposing a variance-aware Zooming algorithm that achieves the optimal dependence on K*.
.
View details