Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.



    Productionizing Quantum Mass Production
    Bill Huggins
    Nathan Wiebe
    arXiv (2026), to appear
    Abstract: For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform an operation many times in parallel for a cost comparable to that of a single execution [1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass production techniques offer no benefit under a cost model that counts only the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a cost reduction of an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data-loading step.
    FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration
    Diganta Misra
    Yanqi Luo
    Anjali Sridhar
    Justine Gehring
    Silvio Soares Ribeiro Junior
    2026
    Abstract: AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative, but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
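To make the success criterion concrete, here is a minimal sketch of how a migration benchmark of this kind might verify an agent's output. Everything here is an assumption for illustration (the repo layout, the use of Maven, the JDK path); the abstract does not describe FreshBrew's actual harness at this level of detail.

```python
import subprocess
from pathlib import Path

def verify_migration(repo: Path, jdk17_home: str) -> bool:
    """Hypothetical check: a migrated project counts as successful if it
    builds and passes its tests under JDK 17. Sketch only; the real
    harness may use different tools and success criteria."""
    env = {"JAVA_HOME": jdk17_home, "PATH": f"{jdk17_home}/bin:/usr/bin"}
    result = subprocess.run(
        ["mvn", "-q", "verify"],          # build + run tests
        cwd=repo, env=env,
        capture_output=True, text=True, timeout=1800,
    )
    return result.returncode == 0

# Success rate over a benchmark of candidate-migrated repositories:
# repos = [Path(p) for p in ...]
# rate = sum(verify_migration(r, "/opt/jdk-17") for r in repos) / len(repos)
```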
    Linear Elastic Caching via Ski Rental
    Todd Lipcon
    Conference on Innovative Data Systems Research (CIDR) (2025)
    Abstract: In this work we study the Linear Elastic Caching problem, where the goal is to minimize the total cost of a cache, inclusive of not just its misses but also its memory footprint integrated over time. We demonstrate a theoretical connection to the classic ski rental problem and propose a practical algorithm that combines online caching algorithms with ski rental policies. We also introduce a lightweight machine learning-based algorithm for ski rental that is optimized for production workloads and is easy to integrate within existing database systems. Evaluations on both production workloads in Google Spanner and publicly available traces show that the proposed elastic caching approach can significantly reduce the total cache cost compared to traditional fixed-size cache policies.
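The ski rental connection can be made concrete with the classic break-even rule. The sketch below is my own illustration of that rule applied to eviction, not the paper's actual algorithm: holding an idle entry accrues "rent" (memory cost over time); once accrued rent reaches the cost of re-fetching the entry (the "buy" price), evict it.

```python
import time

class ElasticCache:
    """Sketch of a ski-rental eviction rule (illustrative assumptions,
    not the paper's exact policy or cost model)."""

    def __init__(self, mem_cost_per_byte_sec: float, miss_cost: float):
        self.mem_cost = mem_cost_per_byte_sec
        self.miss_cost = miss_cost
        self.entries = {}  # key -> (value, size_bytes, last_access_time)

    def get(self, key):
        hit = self.entries.get(key)
        if hit is not None:
            value, size, _ = hit
            self.entries[key] = (value, size, time.monotonic())
            return value
        return None  # caller fetches and put()s, paying miss_cost

    def put(self, key, value, size_bytes: int):
        self.entries[key] = (value, size_bytes, time.monotonic())

    def evict_expired(self):
        now = time.monotonic()
        for key, (_, size, last) in list(self.entries.items()):
            rent = (now - last) * size * self.mem_cost
            if rent >= self.miss_cost:   # break-even: stop renting, "buy" the miss
                del self.entries[key]
```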
    A personal health large language model for sleep and fitness coaching
    Anastasiya Belyaeva
    Zhun Yang
    Nick Furlotte
    Chace Lee
    Erik Schenck
    Yojan Patel
    Jian Cui
    Logan Schneider
    Robby Bryant
    Ryan Gomes
    Allen Jiang
    Roy Lee
    Javier Perez
    Jamie Rogers
    Cathy Speed
    Shyam Tailor
    Megan Walker
    Jeffrey Yu
    Tim Althoff
    Conor Heneghan
    Mark Malhotra
    Shwetak Patel
    Shravya Shetty
    Jiening Zhan
    Daniel McDuff
    Nature Medicine (2025)
    Abstract: Although large language models (LLMs) show promise for clinical healthcare applications, their utility for personalized health monitoring using wearable device data remains underexplored. Here we introduce the Personal Health Large Language Model (PH-LLM), designed for applications in sleep and fitness. PH-LLM is a version of the Gemini LLM that was finetuned for text understanding and reasoning when applied to aggregated daily-resolution numerical sensor data. We created three benchmark datasets to assess multiple complementary aspects of sleep and fitness: expert domain knowledge, generation of personalized insights and recommendations, and prediction of self-reported sleep quality from longitudinal data. PH-LLM achieved scores that exceeded a sample of human experts on multiple-choice examinations in sleep medicine (79% versus 76%) and fitness (88% versus 71%). In a comprehensive evaluation involving 857 real-world case studies, PH-LLM performed similarly to human experts for fitness-related tasks and improved over the base Gemini model in providing personalized sleep insights. Finally, PH-LLM effectively predicted self-reported sleep quality using a multimodal encoding of wearable sensor data, further demonstrating its ability to effectively contextualize wearable modalities. This work highlights the potential of LLMs to revolutionize personal health monitoring via tailored insights and predictions from wearable data and provides datasets, rubrics and benchmark performance to further accelerate personal health-related LLM research.
    Neural Speech and Audio Coding
    Minje Kim
    IEEE Signal Processing Magazine, 41 (2025), pp. 85-93
    Abstract: This paper explores the integration of model-based and data-driven approaches within the realm of neural speech and audio coding systems. It highlights the challenges posed by the subjective evaluation processes of speech and audio codecs and discusses the limitations of purely data-driven approaches, which often require inefficiently large architectures to match the performance of model-based methods. The study presents hybrid systems as a viable solution, offering significant improvements to the performance of conventional codecs through meticulously chosen design enhancements. Specifically, it introduces a neural network-based signal enhancer designed to post-process existing codecs' output, along with autoencoder-based end-to-end models and LPCNet, hybrid systems that combine linear predictive coding (LPC) with neural networks. Furthermore, the paper delves into predictive models operating within custom feature spaces (TF-Codec) or predefined transform domains (MDCTNet) and examines the use of psychoacoustically calibrated loss functions to train end-to-end neural audio codecs. Through these investigations, the paper demonstrates the potential of hybrid systems to advance the field of speech and audio coding by bridging the gap between traditional model-based approaches and modern data-driven techniques.
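The "signal enhancer that post-processes a codec's output" idea can be sketched as a small residual network over decoded audio. The architecture and layer sizes below are illustrative assumptions of mine, not a design from the paper.

```python
import torch
import torch.nn as nn

class PostFilter(nn.Module):
    """Toy neural post-processor in the spirit of codec enhancement:
    take decoded audio and predict a residual correction that is added
    back to reduce coding artifacts. Sizes are illustrative only."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=9, padding=4),
        )

    def forward(self, decoded: torch.Tensor) -> torch.Tensor:
        # decoded: (batch, 1, samples) from a conventional codec
        return decoded + self.net(decoded)  # enhanced = decoded + residual

# enhanced = PostFilter()(torch.randn(8, 1, 16000))
```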
    Abstract: Class imbalance remains a major challenge in machine learning, especially in multi-class problems with long-tailed distributions. Existing methods, such as data resampling, cost-sensitive techniques, and logistic loss modifications, though popular and often effective, lack solid theoretical foundations. As an example, we demonstrate that cost-sensitive methods are not Bayes-consistent. This paper introduces a novel theoretical framework for analyzing generalization in imbalanced classification. We propose a new class-imbalanced margin loss function for both binary and multi-class settings, prove its strong $H$-consistency, and derive corresponding learning guarantees based on empirical loss and a new notion of class-sensitive Rademacher complexity. Leveraging these theoretical results, we devise novel and general learning algorithms, IMMAX (Imbalanced Margin Maximization), which incorporate confidence margins and are applicable to various hypothesis sets. While our focus is theoretical, we also present extensive empirical results demonstrating the effectiveness of our algorithms compared to existing baselines.
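The paper's exact loss is not reproduced in the abstract, so the following is only a sketch in the spirit described: a softmax cross-entropy with per-class confidence margins, where rarer classes demand larger margins. The margin schedule and all constants are my assumptions.

```python
import numpy as np

def imbalanced_margin_loss(logits, labels, class_counts, base_margin=0.5):
    """Sketch of a class-imbalanced margin loss (my construction, not
    the paper's IMMAX loss): rarer classes get larger confidence
    margins, so the correct class must win by more before its loss
    vanishes."""
    counts = np.asarray(class_counts, dtype=float)
    margins = base_margin * (counts.max() / counts) ** 0.25  # rarer => bigger
    z = logits.copy()
    rows = np.arange(len(labels))
    z[rows, labels] -= margins[labels]        # demand a margin on the true class
    z -= z.max(axis=1, keepdims=True)         # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()

# loss = imbalanced_margin_loss(np.random.randn(32, 10),
#                               np.random.randint(0, 10, 32),
#                               class_counts=[5000] * 5 + [50] * 5)
```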
    Consensus or Conflict? Fine-Grained Evaluation of Conflicting Answers in Question-Answering
    Eviatar Nachshoni
    Arie Cattan
    Shmuel Amar
    Ori Shapira
    Ido Dagan
    2025
    Abstract: Large Language Models (LLMs) have demonstrated strong performance in question answering (QA) tasks. However, Multi-Answer Question Answering (MAQA), where a question may have several valid answers, remains challenging. Traditional QA settings often assume consistency across evidences, but MAQA can involve conflicting answers. Constructing datasets that reflect such conflicts is costly and labor-intensive, while existing benchmarks often rely on synthetic data, restrict the task to yes/no questions, or apply unverified automated annotation. To advance research in this area, we extend the conflict-aware MAQA setting to require models not only to identify all valid answers, but also to detect specific conflicting answer pairs, if any. To support this task, we introduce a novel cost-effective methodology for leveraging fact-checking datasets to construct NATCONFQA, a new benchmark for realistic, conflict-aware MAQA, enriched with detailed conflict labels for all answer pairs. We evaluate eight high-end LLMs on NATCONFQA, revealing their fragility in handling various types of conflicts and the flawed strategies they employ to resolve them.
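Since the task asks models to flag specific conflicting answer pairs, one natural way to score it is F1 over unordered pairs. The metric below is my own illustration; the paper's actual evaluation protocol may differ.

```python
def conflict_pair_f1(predicted_pairs, gold_pairs):
    """Sketch of scoring conflict detection as F1 over unordered
    answer pairs labeled as conflicting (illustrative metric choice)."""
    pred = {frozenset(p) for p in predicted_pairs}
    gold = {frozenset(p) for p in gold_pairs}
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# conflict_pair_f1([("ans1", "ans2")],
#                  [("ans2", "ans1"), ("ans1", "ans3")])  # -> 0.666...
```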
    Abstract: Computer use agents (CUAs) need to plan long-horizon task workflows grounded in diverse, ever-changing applications and environments, but learning is hindered by the scarcity of large-scale, high-quality training data. Existing datasets are small, domain-specific, and costly to annotate, while current synthetic data generation methods often yield brittle, simplistic, or misaligned task demonstrations. We introduce Watch & Learn (W&L), a framework that transforms human demonstration videos available on the Internet into executable UI trajectories at scale. Inspired by robotics, we train an inverse dynamics model that accurately predicts user actions from consecutive screens, bypassing the need for complex heuristics. To scale to the web, we curate a large state-transition corpus and design a retrieval framework that identifies relevant video tutorials, enabling automatic conversion of raw videos into structured UI trajectories without requiring manual annotations. Beyond training data, we show that the generated UI trajectories can also serve as in-context exemplars, providing CUAs with long-horizon priors and domain-specific knowledge at inference time. On the challenging OSWorld and Mind2Web benchmarks, UI trajectories extracted with W&L consistently improve both general-purpose and state-of-the-art frameworks when used in-context, and deliver stronger gains for open-source models when used in training. These results highlight web-scale human demonstration videos as a practical and scalable foundation for advancing CUAs towards real-world deployment.
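The inverse dynamics idea borrowed from robotics is simple to sketch: given two consecutive screenshots, predict the user action that caused the transition. The architecture and action space below are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class InverseDynamicsModel(nn.Module):
    """Sketch of an inverse dynamics model for UI videos (architecture
    and action vocabulary are assumptions): two consecutive screens in,
    logits over user actions (e.g. click, scroll, type, drag) out."""
    def __init__(self, num_actions: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(            # shared screen encoder
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(2 * 64, num_actions)

    def forward(self, screen_t, screen_t1):
        h = torch.cat([self.encoder(screen_t), self.encoder(screen_t1)], dim=-1)
        return self.head(h)                      # action logits

# logits = InverseDynamicsModel()(torch.randn(2, 3, 224, 224),
#                                 torch.randn(2, 3, 224, 224))
```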
    Abstract: Generative AI (GenAI), particularly Large Language Models (LLMs), offers powerful capabilities for interpreting the complex data landscape in healthcare. In this paper, we present a comprehensive overview of the capabilities, requirements and applications of GenAI for deriving clinical insights and improving clinical efficiency. We first provide some background on the forms and sources of patient data, namely real-time Remote Patient Monitoring (RPM) streams and traditional Electronic Health Records (EHR). The sheer volume and heterogeneity of this combined data present significant challenges to clinicians and contribute to information overload. In addition, we explore the potential of LLM-powered applications for improving clinical efficiency. These applications can enhance navigation of longitudinal patient data and provide actionable clinical decision support through natural language dialogue. We discuss the opportunities this presents for streamlining clinician workflows and personalizing care, alongside critical challenges such as data integration complexity, ensuring data quality and RPM data reliability, maintaining patient privacy, validating AI outputs for clinical safety, mitigating bias, and ensuring clinical acceptance. We believe this work represents the first summary of GenAI techniques for managing clinician data overload arising from combined RPM/EHR data.
    Automatic Synthesis of Specialized Hash Functions
    Renato B Hoffmann
    Leonardo G Fae
    Fernando Magno Quintao Pereira
    Dalvan Griebler
    2025
    Abstract: Hashing is a fundamental operation in various computer science applications. Despite the prevalence of specific key formats like social security numbers, MAC addresses, plate numbers, and URLs, hashing libraries typically treat them as general byte sequences. This paper introduces a technique for synthesizing specialized hash functions tailored to particular byte formats. The proposed code generation method leverages three prevalent patterns: (i) fixed-length keys, (ii) keys with common subsequences, and (iii) keys ranging over predetermined sequences of bytes. The code generation process involves two algorithms: one identifies relevant regular expressions within key examples, and the other generates specialized hash functions based on these expressions. This approach, straightforward to implement, showcases improvements over highly optimized hash function implementations. Comparative analysis demonstrates that our synthetic functions outperform counterparts in the C++ Standard Template Library and the Google Abseil Library, achieving speedups ranging from 2% to 11%, depending on the key format.
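To illustrate why a format-aware hash can beat a general one, here is a hand-written sketch for one key format (MAC addresses). The paper synthesizes such functions automatically from key examples, targeting C++; this Python version and its mixing constants are illustrative assumptions only.

```python
def general_hash(key: str) -> int:
    """Baseline: treat the key as an opaque byte sequence (FNV-1a)."""
    h = 0xCBF29CE484222325
    for b in key.encode():
        h = ((h ^ b) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h

def mac_hash(key: str) -> int:
    """Specialized hash for the fixed format 'AA:BB:CC:DD:EE:FF':
    the six hex pairs carry all the entropy, so pack them into one
    integer and mix once, skipping the ':' separators entirely."""
    packed = int(key.replace(":", ""), 16)        # 48 significant bits
    return (packed * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF

# mac_hash("3C:22:FB:5A:01:9D")  # one multiply vs. 17 FNV iterations
```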
    Data Selection for ERMs
    Alexander Shlimovich
    Steve Hanneke
    Amir Yehudayoff
    Shay Moran
    2025
    Abstract: Learning theory has traditionally followed a model-centric approach, focusing on designing optimal algorithms for a fixed natural learning task (e.g., linear classification or regression). In this paper, we adopt a complementary data-centric perspective, whereby we fix a natural learning rule and focus on optimizing the training data. Specifically, we study the following question: given a learning rule \(\mathcal{A}\) and a data selection budget \(n\), how well can \(\mathcal{A}\) perform when trained on at most \(n\) data points selected from a population of \(N\) points? We investigate when it is possible to select \(n \ll N\) points and achieve performance comparable to training on the entire population. We address this question across a variety of empirical risk minimizers. Our results include optimal data-selection bounds for mean estimation, linear classification, and linear regression. Additionally, we establish two general results: a taxonomy of error rates in binary classification and in stochastic convex optimization. Finally, we propose several open questions and directions for future research.
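For intuition on the mean-estimation case, the sketch below greedily picks \(n\) points whose running mean tracks the population mean. This is a heuristic of my own for illustration, not the paper's optimal selection procedure or its bounds.

```python
import numpy as np

def select_for_mean(population: np.ndarray, n: int) -> np.ndarray:
    """Greedy illustration of data selection for mean estimation:
    repeatedly add the point that moves the mean of the selected set
    closest to the population mean."""
    target = population.mean()
    remaining = list(population)
    chosen = []
    for _ in range(n):
        best = min(remaining,
                   key=lambda x: abs(np.mean(chosen + [x]) - target))
        chosen.append(best)
        remaining.remove(best)
    return np.array(chosen)

# pop = np.random.randn(1000)
# sel = select_for_mean(pop, 10)   # tiny subset whose mean ~= pop.mean()
```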
    Multi-Agent Combinatorial Contracts
    Paul Duetting
    Tomer Ezra
    Michal Feldman
    Thomas Kesselheim
    Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2025), pp. 1857 - 1891
    Abstract: Combinatorial contracts are emerging as a key paradigm in algorithmic contract design, paralleling the role of combinatorial auctions in algorithmic mechanism design. In this paper we study natural combinatorial contract settings involving teams of agents, each capable of performing multiple actions. This scenario extends two fundamental special cases previously examined in the literature, namely the single-agent combinatorial action model of [Duetting et al., 2021] and the multi-agent binary-action model of [Babaioff et al., 2012, Duetting et al., 2023]. We study the algorithmic and computational aspects of these settings, highlighting the unique challenges posed by the absence of certain monotonicity properties essential for analyzing the previous special cases. To navigate these complexities, we introduce a broad set of novel tools that deepen our understanding of combinatorial contract environments and yield good approximation guarantees. Our main result is a constant-factor approximation for submodular multi-agent multi-action problems with value and demand oracle access. This result is tight: we show that this problem admits no PTAS (even under binary actions). As a side product of our main result, we devise an FPTAS, with value and demand oracles, for single-agent combinatorial action scenarios with general reward functions, which is of independent interest. We also provide bounds on the gap between the optimal welfare and the principal's utility. We show that, for subadditive rewards, perhaps surprisingly, this gap scales only logarithmically (rather than linearly) in the size of the action space.
    DialogLab: Authoring, Simulating, and Testing Dynamic Group Conversations in Hybrid Human-AI Conversations
    Erzhen Hu
    Mingyi Li
    Alex Olwal
    Seongkook Heo
    UIST '25: Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology, ACM (2025), 210:1-20
    Abstract: Designing compelling multi-party conversations involving both humans and AI agents presents significant challenges, particularly in balancing scripted structure with emergent, human-like interactions. We introduce DialogLab, a prototyping toolkit for authoring, simulating, and testing hybrid human-AI dialogues. DialogLab provides a unified interface to configure conversational scenes, define agent personas, manage group structures, specify turn-taking rules, and orchestrate transitions between scripted narratives and improvisation. Crucially, DialogLab allows designers to introduce controlled deviations from the script, through configurable agents that emulate human unpredictability, to systematically probe how conversations adapt and recover. DialogLab facilitates rapid iteration and evaluation of complex, dynamic multi-party human-AI dialogues. An evaluation with both end users and domain experts demonstrates that DialogLab supports efficient iteration and structured verification, with applications in training, rehearsal, and research on social dynamics. Our findings show the value of integrating real-time, human-in-the-loop improvisation with structured scripting to support more realistic and adaptable multi-party conversation design.
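A scene configuration of the kind the abstract describes (personas, turn-taking rules, scripted beats, controlled deviation) might look like the sketch below. All field names and values are hypothetical; this is not DialogLab's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class AgentPersona:
    """Hypothetical persona schema for a hybrid human-AI scene."""
    name: str
    role: str
    improvisation: float = 0.0   # 0 = fully scripted, 1 = free improv

@dataclass
class Scene:
    """Hypothetical scene schema: who talks, in what order, about what."""
    agents: list = field(default_factory=list)
    turn_taking: str = "round_robin"             # or "moderator", "free_for_all"
    script: list = field(default_factory=list)   # scripted beats to hit

scene = Scene(
    agents=[
        AgentPersona("Alex", "facilitator", improvisation=0.2),
        AgentPersona("Sam", "skeptic", improvisation=0.8),  # probes recovery
    ],
    script=["introductions", "pitch", "objections", "wrap-up"],
)
```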
    Earth AI: Unlocking Geospatial Insights with Foundation Models and Cross-Modal Reasoning
    Aaron Bell
    Aviad Barzilai
    Roy Lee
    Gia Jung
    Charles Elliott
    Adam Boulanger
    Amr Helmy
    Jacob Bien
    Ruth Alcantara
    Nadav Sherman
    Hassler Thurston
    Yotam Gigi
    Bolous Jaber
    Vered Silverman
    Luke Barrington
    Tim Thelin
    Elad Aharoni
    Kartik Hegde
    Yuval Carny
    Shravya Shetty
    Yehonathan Refael
    Stone Jiang
    David Schottlander
    Juliet Rothenberg
    Luc Houriez
    Yochai Blau
    Joydeep Paul
    Yang Chen
    Yael Maguire
    Aviv Slobodkin
    Shlomi Pasternak
    Alex Ottenwess
    Jamie McPike
    Per Bjornsson
    Natalie Williams
    Reuven Sayag
    Thomas Turnbull
    Ali Ahmadalipour
    David Andre
    Amit Aides
    Ean Phing VanLee
    Niv Efron
    Monica Bharel
    arXiv preprint arXiv:2510.18318 (2025), https://doi.org/10.48550/arXiv.2510.18318
    Abstract: Geospatial data offers immense potential for understanding our planet. However, the sheer volume and diversity of this data, along with its varied resolutions, timescales, and sparsity, pose significant challenges for thorough analysis and interpretation. The emergence of Foundation Models (FMs) and Large Language Models (LLMs) offers an unprecedented opportunity to tackle some of this complexity, unlocking novel and profound insights into our planet. This paper introduces a comprehensive approach to developing Earth AI solutions, built upon foundation models across three key domains (Planet-scale Imagery, Population, and Environment) and an intelligent Gemini-powered reasoning engine. We present rigorous benchmarks showcasing the power and novel capabilities of our foundation models and validate that they provide complementary value to improve geospatial inference. We show that the synergy between these models unlocks superior predictive capabilities. To handle complex, multi-step queries, we developed a Gemini-powered agent that jointly reasons over our multiple foundation models along with large geospatial data sources and tools to unlock novel geospatial insights. On a new benchmark of real-world crisis scenarios, our agent demonstrates the ability to deliver critical and timely insights, effectively bridging the gap between raw geospatial data and actionable understanding.
    Abstract: We give a new privacy amplification analysis for truncated Poisson sampling, a Poisson sampling variant that truncates a batch if it exceeds a given maximum batch size.
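The sampling scheme itself is simple to sketch: each example joins the batch independently with probability q (Poisson sampling), and the batch is truncated if it exceeds the maximum size. Using uniform down-sampling for the truncation step is my assumption; the paper's analysis, not reproduced here, is what handles the privacy amplification.

```python
import numpy as np

def truncated_poisson_batch(num_examples: int, q: float, max_batch: int,
                            rng: np.random.Generator) -> np.ndarray:
    """Sketch of truncated Poisson sampling: Bernoulli(q) inclusion per
    example, then truncate to max_batch if the batch came out too big."""
    mask = rng.random(num_examples) < q
    batch = np.flatnonzero(mask)
    if len(batch) > max_batch:
        batch = rng.choice(batch, size=max_batch, replace=False)
    return batch

# rng = np.random.default_rng(0)
# idx = truncated_poisson_batch(10_000, q=0.01, max_batch=128, rng=rng)
```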