Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
1 - 15 of 11054 publications
Preview abstract
How many T gates are needed to approximate an arbitrary $n$-qubit quantum state to within a given precision $\epsilon$? Improving prior work of Low, Kliuchnikov and Schaeffer, we show that the optimal asymptotic scaling is $\Theta\left(\sqrt{2^n \log(1/\epsilon)} + \log(1/\epsilon)\right)$ if we allow an unlimited number of ancilla qubits. We also show that this is the optimal T-count for implementing an arbitrary diagonal $n$-qubit unitary to within error $\epsilon$. We describe an application to batched synthesis of single-qubit unitaries: we can approximate a tensor product of $m = O(\log\log(1/\epsilon))$ arbitrary single-qubit unitaries to within error $\epsilon$ with the same asymptotic T-count as is required to approximate just one single-qubit unitary.
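To get a feel for the scaling stated above, the following minimal sketch evaluates the two terms inside the Θ(·) bound separately. It is an illustration only: the hidden constant factors are ignored, the base-2 logarithm is an assumption (the asymptotic statement does not depend on the base), and the function names are hypothetical. The square-root term dominates exactly when 2^n ≥ log(1/ϵ).

```python
import math
from typing import Tuple


def t_count_terms(n_qubits: int, epsilon: float) -> Tuple[float, float]:
    """Return (sqrt_term, log_term) from the bound, ignoring constant factors."""
    log_term = math.log2(1.0 / epsilon)  # base 2 is an assumption
    sqrt_term = math.sqrt((2 ** n_qubits) * log_term)
    return sqrt_term, log_term


def dominant_term(n_qubits: int, epsilon: float) -> str:
    """Report which of the two terms is larger for the given parameters."""
    sqrt_term, log_term = t_count_terms(n_qubits, epsilon)
    return "sqrt" if sqrt_term >= log_term else "log"
```

For example, `dominant_term(10, 1e-6)` reports the square-root term, while for very few qubits and the same precision the logarithmic term is the larger one.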
CrossCheck: Input Validation for WAN Control Systems
Rishabh Iyer
Isaac Keslassy
Sylvia Ratnasamy
Networked Systems Design and Implementation (NSDI) (2026) (to appear)
Preview abstract
We present CrossCheck, a system that validates inputs to the Software-Defined Networking (SDN) controller in a Wide Area Network (WAN). By detecting incorrect inputs—often stemming from bugs in the SDN control infrastructure—CrossCheck alerts operators before they trigger network outages.
Our analysis at a large-scale WAN operator identifies invalid inputs as a leading cause of major outages, and we show how CrossCheck would have prevented those incidents. We deployed CrossCheck as a shadow validation system for four weeks in a production WAN, during which it accurately detected the single incident of invalid inputs that occurred while sustaining a 0% false positive rate under normal operation, hence imposing little additional burden on operators. In addition, we show through simulation that CrossCheck reliably detects a wide range of invalid inputs (e.g., detecting demand perturbations as small as 5% with 100% accuracy) and maintains a near-zero false positive rate for realistic levels of noisy, missing, or buggy telemetry data (e.g., sustaining zero false positives with up to 30% of corrupted telemetry data).
Preview abstract
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution [1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
Preview abstract
Semantic data models express high-level business concepts and metrics, capturing the business logic needed to query a database correctly. Most data modeling solutions are built as layers above SQL query engines, with bespoke query languages or APIs. The layered approach means that semantic models can’t be used directly in SQL queries. This paper focuses on an open problem in this space – can we define semantic models in SQL, and make them naturally queryable in SQL?
In parallel, graph query is becoming increasingly popular, including in SQL. SQL/PGQ extends SQL with an embedded subset of the GQL graph query language, adding property graph views and making graph traversal queries easy.
We explore a surprising connection: semantic data models are graphs, and defining graphs is a data modeling problem. In both domains, users start by defining a graph model, and need query language support to easily traverse edges in the graph, which means doing joins in the underlying data.
We propose some useful SQL extensions that make it easier to use higher-level data model abstractions in queries. Users can define a “semantic data graph” view of their data, encapsulating the complex business logic required to query the underlying tables correctly. Then they can query that semantic graph model easily with SQL.
Our SQL extensions are useful independently, simplifying many queries – particularly, queries with joins. We make declared foreign key relationships usable for joins at query time – a feature that seems obvious but is notably missing in standard SQL.
In combination, these extensions provide a practical approach to extend SQL incrementally, bringing semantic modeling and graph query together with the relational model and SQL.
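The point about foreign keys can be illustrated with the status quo in standard SQL. The sketch below uses Python's built-in `sqlite3` module with two hypothetical tables (`customers`, `orders`); it does not show the extensions proposed in the paper, whose syntax the abstract does not give. Even though the foreign key is declared in the schema, the query must restate the join condition by hand.

```python
import sqlite3

# In-memory database with a declared foreign key (hypothetical example schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);
""")

# The ON clause repeats what the REFERENCES constraint already declares.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```

Here `rows` holds one total per customer; the join logic lives in every query rather than in the model, which is the gap the abstract describes.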
Preview abstract
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
Develop High-performance Quantum Hardware
Yu Chen
(2025)
Preview abstract
A review of hardware development in QAI, based on existing publications.
Databases in the Era of Memory-Centric Computing
Yannis Chronis
Anastasia Ailamaki
Lawrence Benson
Jana Gičeva
Eric Seldar
Lisa Wu Wills
(2025)
Preview abstract
The increasing disparity between processor core counts and memory bandwidth, coupled with the rising cost and underutilization of memory, introduces a performance and cost Memory Wall and presents a significant challenge to the scalability of database systems. We argue that current processor-centric designs are unsustainable, and we advocate for a shift towards memory-centric computing, where disaggregated memory pools enable cost-effective scaling and robust performance. Database systems are uniquely positioned to leverage memory-centric systems because of their intrinsic data-centric nature. We demonstrate how memory-centric database operations can be realized with current hardware, paving the way for more efficient and scalable data management in the cloud.
Reducing Symbiosis Bias through Better A/B Tests of Recommendation Algorithms
Yahu Cong
Yiwei Yu
Lina Lin
Yajun Peng
Changping Meng
Ningren (Peter) Han
David Holtz
Proceedings of WWW'25 (2025)
Preview abstract
It is increasingly common in digital environments to use A/B tests to compare the performance of recommendation algorithms. However, such experiments often violate the stable unit treatment value assumption (SUTVA), particularly SUTVA's "no hidden treatments" assumption, because the algorithms being compared share data. This results in a novel form of bias, which we term "symbiosis bias," where the performance of each algorithm is influenced by the training data generated by its competitor. In this paper, we investigate three experimental designs (cluster-randomized, data-diverted, and user-corpus co-diverted experiments) aimed at mitigating symbiosis bias. We present a theoretical model of symbiosis bias and simulate the impact of each design in dynamic recommendation environments. Our results show that while each design reduces symbiosis bias to some extent, they also introduce new challenges, such as reduced training data in data-diverted experiments. We further validate the existence of symbiosis bias using data from a large-scale A/B test conducted on a global recommender system, demonstrating that symbiosis bias affects treatment effect estimates in the field. Our findings provide actionable insights for researchers and practitioners seeking to design experiments that accurately capture algorithmic performance without bias in treatment effect estimates introduced by shared data.
Preview abstract
Large-scale video generative models, capable of creating realistic videos of diverse visual concepts, are strong candidates for general-purpose physical world simulators. However, their adherence to physical commonsense across real-world actions (e.g., playing tennis, backflips) remains unclear. Existing benchmarks suffer from limitations such as limited size, lack of human evaluation, sim-to-real gaps, and absence of fine-grained physical rule analysis. To address this, we introduce VideoPhy-2, an action-centric dataset for evaluating physical commonsense in generated videos. We curate 200 diverse actions and detailed prompts for video synthesis from modern generative models. We perform human evaluation that assesses semantic adherence, physical commonsense, and grounding of physical rules in the generated videos. Our findings reveal major shortcomings, with even the best model achieving only 22% joint performance (i.e., high semantic and physical commonsense adherence) on the hard subset of VideoPhy-2. We find that the models particularly struggle with conservation laws like mass and momentum. Finally, we also train VideoPhy-AutoEval, an automatic evaluator for fast, reliable assessment on our dataset. Overall, VideoPhy-2 serves as a rigorous benchmark, exposing critical gaps in video generative models and guiding future research in physically-grounded video generation. The data and code are available at https://videophy2.github.io/
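The "joint performance" figure can be read as the fraction of videos that score high on both criteria at once. A minimal sketch of such a metric follows; the 1-to-5 rating scale, the threshold of 4, and the function name are assumptions for illustration and are not taken from the abstract.

```python
def joint_performance(ratings, threshold=4):
    """Fraction of videos whose semantic AND physical scores both meet the threshold.

    `ratings` is a list of (semantic_score, physical_score) pairs.
    The scale and threshold are hypothetical.
    """
    if not ratings:
        return 0.0
    passed = sum(
        1 for semantic, physical in ratings
        if semantic >= threshold and physical >= threshold
    )
    return passed / len(ratings)
```

Because both conditions must hold for the same video, the joint score can be much lower than either individual pass rate.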
The Anatomy of a Personal Health Agent
Ahmed Metwally
Ken Gu
Jiening Zhan
Kumar Ayush
Hong Yu
Amy Lee
Qian He
Zhihan Zhang
Isaac Galatzer-Levy
Xavi Prieto
Andrew Barakat
Ben Graef
Yuzhe Yang
Daniel McDuff
Brent Winslow
Shwetak Patel
Girish Narayanswamy
Conor Heneghan
Max Xu
Jacqueline Shreibati
Mark Malhotra
Orson Xu
Tim Althoff
Tony Faranesh
Nova Hammerquist
Vidya Srinivas
arXiv (2025)
Preview abstract
Health is a fundamental pillar of human wellness, and the rapid advancements in large language models (LLMs) have driven the development of a new generation of health agents. However, how to meet the diverse needs of individuals in daily non-clinical settings remains underexplored. In this work, we aim to build a comprehensive personal health assistant that is able to reason about multimodal data from everyday consumer devices and personal health records. To understand end users’ needs when interacting with such an assistant, we conducted an in-depth analysis of query data from users, alongside qualitative insights from users and experts gathered through a user-centered design process. Based on these findings, we identified three major categories of consumer health needs, each of which is supported by a specialist subagent: (1) a data science agent that analyzes both personal and population-level time-series wearable and health record data to provide numerical health insights, (2) a health domain expert agent that integrates users’ health and contextual data to generate accurate, personalized insights based on medical and contextual user knowledge, and (3) a health coach agent that synthesizes data insights, drives multi-turn user interactions and interactive goal setting, guiding users using a specified psychological strategy and tracking users’ progress. Furthermore, we propose and develop a multi-agent framework, Personal Health Insight Agent Team (PHIAT), that enables dynamic, personalized interactions to address individual health needs. To evaluate these individual agents and the multi-agent system, we develop a set of N benchmark tasks and conduct both automated and human evaluations, involving hundreds of hours of evaluation from health experts and hundreds of hours of evaluation from end users.
Our work establishes a strong foundation towards the vision of a personal health assistant accessible to everyone in the future and represents the most comprehensive evaluation of a consumer AI health agent to date.
A Strategic Framework for AI Product Development and Evaluation in Enterprise Software
International Journal of Computer Engineering and Technology (IJCET), Volume 16, Issue 1 (2025)
Preview abstract
This article presents a comprehensive framework for developing and evaluating AI products in enterprise software systems, addressing the critical challenges organizations face during AI transformation initiatives. The article introduces a structured approach to decision-making for AI integration, encompassing ROI evaluation, user value assessment, and business impact analysis. It establishes distinct methodologies for both assistive and autonomous AI systems, providing detailed metrics for measuring success and performance across different implementation scenarios. Across various industries, the framework has shown potential in reducing implementation time, increasing user adoption rates, and enhancing overall project success rates, highlighting its practical applicability. The article's methodology combines theoretical analysis with practical case studies, resulting in a flexible yet robust framework that can adapt to various organizational contexts. The framework's primary contribution lies in its practical approach to bridging the gap between theoretical AI capabilities and real-world implementation challenges, offering product leaders a systematic methodology for AI product development and evaluation. By addressing both current implementation challenges and future scalability requirements, this framework provides organizations with a foundational tool for navigating their AI transformation journey while maintaining a focus on measurable business outcomes and user value creation.
Life at the Boundary of Chemical Kinetics and Program Execution
Thomas Fischbacher
Physical Review E (2025)
Preview abstract
This work introduces a generic quantitative framework for studying processes that involve interactions of polymer sequences. Possible applications range from quantitative studies of the reaction kinetics of polymerization processes to explorations of the behavior of chemical implementations of computational (including basic life-like) processes. This way, we establish a bridge between thermodynamic and computational aspects of systems that are defined in terms of sequence interactions. As a by-product of these investigations, we clarify some common confusion around the notion of "autocatalysis".
Using a Markov process model of polymer sequence composition and dynamical evolution of the Markov process's parameters via an ODE that arises when taking the double "chemical" many-particle limit as well as the "rarefied interactions" limit, this approach enables, for example, accurate quantitative explorations of entropy generation in systems where computation is driven by relaxation to thermodynamic equilibrium.
The computational framework internally utilizes the Scheme programming language's intrinsic continuation mechanisms to provide nondeterministic evaluation primitives that allow the user to specify example systems in straight purely functional code, making exploration of all possible relevant sequence composition constellations (which would otherwise be tedious to write code for) automatic and hidden from the user.
As the original motivation for this work came from investigations into emergent program evolution that arises in computational substrates of the form discussed in recent work on "Computational Life" \cite{alakuijala2024computational}, a major focus of attention is on giving a deeper explanation of key requirements for the possible emergence of self-replicators, especially in settings whose behavior is governed by real world physics rather than ad-hoc rules that may be difficult to implement in a physical system. A collection of fully worked out examples elucidates how this modeling approach is quantitatively related to Metropolis Monte Carlo based simulations as well as exact or approximate analytic approaches, and how it can be utilized to study a broad range of different systems. These examples can also serve as starting points for further explorations.
On the Differential Privacy and Interactivity of Privacy Sandbox Reports
Charlie Harrison
Pritish Kamath
Alexander Knop
Ethan Leeman
Vikas Sahu
PETS (2025)
Preview abstract
The Privacy Sandbox initiative from Google includes APIs for enabling privacy-preserving advertising functionalities as part of the effort to limit third-party cookies. In particular, the Private Aggregation API (PAA) and the Attribution Reporting API (ARA) can be used for ad measurement while providing different guardrails for safeguarding user privacy, including a framework for satisfying differential privacy (DP). In this work, we provide an abstract model for analyzing the privacy of these APIs and show that they satisfy a formal DP guarantee under certain assumptions. Our analysis handles the case where both the queries and database can change interactively based on previous responses from the API.
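For readers unfamiliar with what a formal DP guarantee on an aggregate looks like, the sketch below shows the textbook Laplace mechanism applied to a clipped sum. This is a generic illustration under stated assumptions, not the implementation or the abstract model of either API: the abstract does not describe the noise mechanism, and the function names, the clipping range, and the parameters are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_sum(values, sensitivity, epsilon, rng=None):
    """Release a sum with epsilon-DP, assuming each user contributes one value.

    Each value is clipped to [0, sensitivity], so adding or removing one
    user changes the true sum by at most `sensitivity`.
    """
    rng = rng or random.Random()
    clipped = [min(max(v, 0.0), sensitivity) for v in values]
    return sum(clipped) + laplace_noise(sensitivity / epsilon, rng)
```

A smaller `epsilon` means a larger noise scale and therefore stronger privacy at the cost of accuracy; the interactive setting analyzed in the paper, where later queries depend on earlier noisy answers, is not modeled here.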
Ghost Points Matter: Far-Range Vehicle Detection with a Single mmWave Radar in Tunnel
Chengzhen Meng
Chenming He
Jianmin Ji
Yanyong Zhang
Haojie Ren
Dequan Wang
Rui Xia
MobiCom 2025: The 31st Annual International Conference on Mobile Computing and Networking, ACM
Preview abstract
Vehicle detection in tunnels is crucial for traffic monitoring and accident response, yet remains underexplored. In this paper, we develop mmTunnel, a millimeter-wave radar system that achieves far-range vehicle detection in tunnels. The main challenge here is coping with ghost points caused by multi-path reflections, which lead to severe localization errors and false alarms. Instead of merely removing ghost points, we propose correcting them to true vehicle positions by recovering their signal reflection paths, thus preserving more data points and improving detection performance, even in occlusion scenarios. However, recovering complex 3D reflection paths from limited 2D radar points is highly challenging. To address this problem, we develop a multi-path ray tracing algorithm that leverages the ground plane constraint and identifies the most probable reflection path based on signal path loss and spatial distance. We also introduce a curve-to-plane segmentation method to simplify tunnel surface modeling such that we can significantly reduce the computational delay and achieve real-time processing.
We have evaluated mmTunnel with comprehensive experiments. In two test tunnels, we conducted controlled experiments in various scenarios with cars and trucks. Our system achieves an average F1 score of 93.7% for vehicle detection while maintaining real-time processing. Even in the challenging occlusion scenarios, the F1 score remains above 91%. Moreover, we collected extensive data from a public tunnel with occasionally heavy traffic and show that our method can achieve an F1 score of 91.5% in real-world traffic conditions.
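The F1 scores quoted above combine precision and recall into a single number. A minimal sketch of the standard definition follows, with hypothetical detection counts as inputs; how detections are matched to ground-truth vehicles is not specified in the abstract and is left out here.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall from raw detection counts."""
    if true_positives == 0:
        return 0.0  # no correct detections: precision and recall are both zero
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

Ghost points that are merely discarded reduce false positives but can also cost true detections; correcting them instead is the way the abstract proposes to keep recall high.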
Advancing seasonal prediction of tropical cyclone activity with a hybrid AI-physics climate model
Gan Zhang
Megha Rao
Janni Yuval
Ming Zhao
Environmental Research Letters (2025)
Preview abstract
Machine learning (ML) models are successful at weather forecasting and have shown progress in climate simulations, yet leveraging them for useful climate predictions needs exploration. Here we show this feasibility using the neural general circulation model (NeuralGCM), a hybrid ML-physics atmospheric model developed by Google, for seasonal predictions of large-scale atmospheric variability and Northern Hemisphere tropical cyclone (TC) activity. Inspired by physical model studies, we simplify boundary conditions, assuming sea surface temperature and sea ice follow their climatological cycle but persist anomalies present at the initialization time. With such forcings, NeuralGCM can generate 100 simulation days in ∼8 min with a single graphics processing unit while simulating realistic atmospheric circulation and TC climatology patterns. This configuration yields useful seasonal predictions (July–November) for the tropical atmosphere and various TC activity metrics. Notably, the predicted and observed TC frequencies in the North Atlantic and East Pacific basins are significantly correlated during 1990–2023 (r ≈ 0.7), suggesting prediction skill comparable to existing physical GCMs. Despite challenges associated with model resolution and simplified boundary forcings, the model-predicted interannual variations demonstrate significant correlations with the observed sub-basin TC tracks (p < 0.1) and basin-wide accumulated cyclone energy (ACE) (p < 0.01) of the North Atlantic and North Pacific basins. These findings highlight the promise of leveraging ML models with physical insights to model TC risks and deliver seamless weather-climate predictions.
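The skill measure quoted above is a correlation coefficient between predicted and observed yearly values. A minimal sketch of the Pearson correlation follows; the function name is hypothetical, and any input series used with it here would be made-up illustrative numbers, not data from the study.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length numeric series."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_p * var_o)
```

A value near 1 means the predicted year-to-year ups and downs track the observed ones closely; it says nothing about bias in the absolute counts, which is why correlation is usually reported alongside other metrics.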