Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

    Productionizing Quantum Mass Production
    Bill Huggins
    Nathan Wiebe
    arXiv (2026) (to appear)
    For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost comparable to a single execution [1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass production techniques offer no benefit under a cost model that counts only the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to reduce costs by an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data-loading step.
    FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration
    Diganta Misra
    Yanqi Luo
    Anjali Sridhar
    Justine Gehring
    Silvio Soares Ribeiro Junior
    2026
    AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable guidance on their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
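    As a rough illustration of how such a benchmark harness could be wired up (a sketch only: the repository list, the agent's migrate() method, the JDK 17 install path, and the "build passes" success criterion are all assumptions, since the abstract does not describe FreshBrew's actual interface):

```python
import os
import subprocess

def builds_under_jdk17(repo_dir: str) -> bool:
    """Return True if the Maven build passes under JDK 17 (assumed success criterion)."""
    env = {**os.environ, "JAVA_HOME": "/usr/lib/jvm/java-17-openjdk"}  # path is an assumption
    result = subprocess.run(["mvn", "-B", "-q", "verify"],
                            cwd=repo_dir, env=env, capture_output=True)
    return result.returncode == 0

def migration_success_rate(agent, repos: list[str]) -> float:
    """Fraction of repositories the agent migrates to a passing JDK 17 build."""
    successes = 0
    for repo_dir in repos:
        agent.migrate(repo_dir)          # hypothetical agent interface
        if builds_under_jdk17(repo_dir):
            successes += 1
    return successes / len(repos)
```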
    Tesseract is a Most-Likely-Error decoder designed for quantum error-correcting codes. Tesseract conducts a search through a graph on the set of all subsets of errors to find the lowest-cost subset of errors consistent with the input syndrome. Although this set is exponentially large, the search can be made efficient in practice for random errors using A* along with a variety of pruning heuristics. We show through benchmark circuits for surface, color, and bivariate-bicycle codes that Tesseract is competitive with integer-programming-based decoders at moderate physical error rates. Finally, we compare surface and bivariate-bicycle codes using most-likely-error decoding.
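    The core idea, a best-first search over subsets of errors, can be illustrated with a toy decoder. This is a plain Dijkstra sketch (A* with a zero heuristic) without Tesseract's pruning heuristics; the bitmask encoding of detectors and the log-likelihood weights are standard conventions, not the paper's implementation:

```python
import heapq

def decode(H: list[int], weights: list[float], syndrome: int):
    """Best-first search for the minimum-weight error subset matching the syndrome.

    H[i] is a bitmask of the detectors flipped by error i, weights[i] is its
    log-likelihood cost log((1 - p_i) / p_i), and `syndrome` is the observed
    detector bitmask.
    """
    n = len(H)
    frontier = [(0.0, 0, syndrome, ())]  # (cost, next error index, residual syndrome, chosen errors)
    best = {}
    while frontier:
        cost, i, s, chosen = heapq.heappop(frontier)
        if s == 0:
            return cost, chosen          # residual syndrome cleared: most likely error found
        if i == n or best.get((i, s), float("inf")) <= cost:
            continue                     # errors exhausted, or a cheaper route to this state exists
        best[(i, s)] = cost
        heapq.heappush(frontier, (cost, i + 1, s, chosen))                             # skip error i
        heapq.heappush(frontier, (cost + weights[i], i + 1, s ^ H[i], chosen + (i,)))  # include error i
    return None

# Toy example with two detectors: the single error flipping both (cost 3.0)
# beats the pair of single-detector errors (cost 4.0).
print(decode(H=[0b01, 0b10, 0b11], weights=[2.0, 2.0, 3.0], syndrome=0b11))  # (3.0, (2,))
```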
    Mg(OH)2 holds potential as an alkalinity source for Ocean Alkalinity Enhancement (OAE). It is a current byproduct of desalination treatment, produced through the alkalinity exchange of electrochemically derived NaOH with the Mg-rich reverse osmosis reject brine. Characterization found no difference in chemical composition between seawater-precipitated and industrially sourced Mg(OH)2, with both having high (>98%) purity. Differences were found in crystallinity, with industrial sources exhibiting a higher degree of crystallinity (0.83-0.85) compared to 0.16-0.33 for seawater-precipitated paste. Mg(OH)2 with a higher degree of crystallinity (>80%) had significantly slower dissolution rates than Mg(OH)2 with a lower degree of crystallinity (<20%). Results revealed a strong inverse relation between the degree of crystallinity and the dissolution rate of both seawater-precipitated and industrially sourced Mg(OH)2. Seawater-precipitated Mg(OH)2, with its similar purity to industrial sources yet faster and more complete dissolution and alkalinity release, could hold an advantage over other alkalinity sources for OAE applications given its seemingly tunable dissolution kinetics.
    Large language models and large vision models are increasingly capable of solving compositional reasoning tasks, as measured by breakthroughs in visual question answering benchmarks. However, state-of-the-art solutions often involve careful construction of large pre-training and fine-tuning datasets, which can be expensive. The use of external tools, whether other ML models, search engines, or APIs, can significantly improve performance by breaking down high-level reasoning questions into sub-questions that are answerable by individual tools, but this approach has similar dataset construction costs to teach fine-tuned models how to use the available tools. We propose a technique in which existing training sets can be directly used for constructing computational environments with task metrics as rewards. This enables a model to autonomously teach itself to use itself or another model as a tool. By doing so, we augment training sets by integrating external signals. The proposed method starts with zero-shot prompts and iteratively refines them by selecting few-shot examples that maximize the task metric on the training set. Our experiments showcase how Gemini learns how to use itself, or another smaller and specialized model such as ScreenAI, to iteratively improve performance on training sets. Our approach successfully generalizes and improves upon zero-shot performance on charts, infographics, and document visual question-answering datasets.
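    A minimal sketch of the iterative refinement loop described above, with hypothetical stand-ins for the pieces the abstract leaves abstract: model(prompt, x) returns an answer, metric(predictions, labels) is the task metric, and the greedy example-selection rule is one plausible instantiation:

```python
def refine_prompt(model, metric, train_set, candidate_examples, rounds=3):
    """Greedily grow a few-shot prompt, keeping examples that raise the train metric."""
    prompt = "Answer the question about the image.\n"     # zero-shot starting point (assumed)
    labels = [y for _, y in train_set]
    best_score = metric([model(prompt, x) for x, _ in train_set], labels)
    for _ in range(rounds):
        for example in candidate_examples:
            trial = prompt + f"Example:\n{example}\n"      # candidate few-shot prompt
            score = metric([model(trial, x) for x, _ in train_set], labels)
            if score > best_score:                         # keep only strict improvements
                prompt, best_score = trial, score
    return prompt, best_score
```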
    The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
    Pratik Fegade
    Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (2025), pp. 5-17
    A chief enabler of large-scale deep learning is the distribution of computation across multiple interconnected hardware accelerators. In order to unlock the maximum possible performance, a compiler must first select a reasonable strategy to parallelize a model's operations. Since neural network architectures admit multiple flavors of parallelism, determining the proper strategy for each instruction is a critical (albeit non-trivial) task. To solicit new ideas toward solving this challenging combinatorial optimization problem, we organized the ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning, a multi-month competition focused on advancing the state of the art in model partitioning algorithms. In this paper, we offer a retrospective of this event, including the basic problem formulation, key challenges & opportunities, our new benchmark suite, and the quality of submissions received.
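    The contest targets a hard combinatorial problem on general computation graphs, but the objective is easy to state concretely: pick one strategy per operator to minimize per-operator compute cost plus pairwise resharding cost. For the special case of a simple operator chain, that objective is solvable exactly with a Viterbi-style dynamic program, sketched below with toy cost tables (not the benchmark's actual format):

```python
def best_strategies(node_cost, reshard_cost):
    """node_cost[i][s]: cost of op i under strategy s;
    reshard_cost[i][s][t]: cost of converting op i's layout s to op i+1's layout t."""
    n, k = len(node_cost), len(node_cost[0])
    dp = [list(node_cost[0])]                 # dp[i][t]: best cost of the prefix ending in strategy t
    parent = []
    for i in range(1, n):
        row, back = [], []
        for t in range(k):
            best_s = min(range(k), key=lambda s: dp[-1][s] + reshard_cost[i - 1][s][t])
            row.append(dp[-1][best_s] + reshard_cost[i - 1][best_s][t] + node_cost[i][t])
            back.append(best_s)
        dp.append(row)
        parent.append(back)
    # Backtrack the optimal strategy choice per operator.
    s = min(range(k), key=lambda t: dp[-1][t])
    choice = [s]
    for back in reversed(parent):
        s = back[s]
        choice.append(s)
    return list(reversed(choice)), min(dp[-1])

# Two ops, two strategies each: staying in layout 0 or 1 avoids a resharding penalty.
print(best_strategies([[1, 5], [5, 1]], [[[0, 10], [10, 0]]]))
```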
    Bipartite ranking is a fundamental supervised learning problem, with the goal of learning a ranking over instances with maximal area under the ROC curve (AUC) against a single binary target label. However, one may often observe multiple binary target labels, e.g., from distinct human annotators. How can one synthesize such labels into a single coherent ranking? In this work, we formally analyze two approaches to this problem—loss aggregation and label aggregation—by characterizing their Bayes-optimal solutions. We show that while both approaches can yield Pareto-optimal solutions, loss aggregation can exhibit label dictatorship: one can inadvertently (and undesirably) favor one label over others. This suggests that label aggregation can be preferable to loss aggregation, which we empirically verify.
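    A toy numerical illustration of the two notions (an illustration of the terminology only, not the paper's construction): loss aggregation here averages per-annotator AUCs, while label aggregation first merges the annotators' labels by majority vote and evaluates once:

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the rank statistic, ties counted as half."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    return (pos[:, None] > neg[None, :]).mean() + 0.5 * (pos[:, None] == neg[None, :]).mean()

rng = np.random.default_rng(0)
scores = rng.normal(size=200)                      # a fixed ranking function's scores
probs = 1 / (1 + np.exp(-scores))
annotators = (rng.random((3, 200)) < probs).astype(int)   # three noisy binary labelers

loss_agg = np.mean([auc(scores, y) for y in annotators])            # average of per-label AUCs
label_agg = auc(scores, (annotators.mean(axis=0) >= 0.5).astype(int))  # AUC vs merged label
print(f"loss aggregation: {loss_agg:.3f}, label aggregation: {label_agg:.3f}")
```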
    Evaluating the Robustness of a Production Malware Detection System to Transferable Adversarial Attacks
    Milad Nasr
    Yanick Fratantonio
    Ange Albertini
    Loua Farah
    Alexandre Petit-Bianco
    Nicholas Carlini
    ACM Conference on Computer and Communications Security (CCS) (2025)
    As deep learning models become widely deployed as components within larger production systems, their individual shortcomings can create system-level vulnerabilities with real-world impact. This paper studies how adversarial attacks targeting an ML component can degrade or bypass an entire production-grade malware detection system, performing a case study analysis of Gmail's pipeline, where file-type identification relies on an ML model. The malware detection pipeline in use by Gmail contains a machine learning model that routes each potential malware sample to a specialized malware classifier to improve accuracy and performance. This model, called Magika, has been open sourced. By designing adversarial examples that fool Magika, we can cause the production malware service to incorrectly route malware to an unsuitable malware detector, thereby increasing our chance of evading detection. Specifically, by changing just 13 bytes of a malware sample, we can successfully evade Magika in 90% of cases, allowing us to send malware files over Gmail. We then turn our attention to defenses, and develop an approach to mitigate the severity of these types of attacks. For our defended production model, a highly resourced adversary requires 50 bytes to achieve just a 20% attack success rate. We implement this defense, and, thanks to a collaboration with Google engineers, it has already been deployed in production for the Gmail classifier.
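    To make the threat model concrete, here is a generic black-box byte-flipping search of the kind such an attack could use. This is not the paper's attack and does not use the real Magika API; classify(data) is a stub for any content-type model returning its predicted label and confidence:

```python
import random

def greedy_byte_attack(classify, data: bytes, target_label: str, budget: int = 13):
    """Try to change at most `budget` bytes so the classifier outputs target_label."""
    data = bytearray(data)
    rng = random.Random(0)
    for _ in range(budget):
        label, _ = classify(bytes(data))
        if label == target_label:
            return bytes(data)                       # evasion succeeded: misrouted
        best_pos, best_val, best_conf = None, None, float("inf")
        for pos in rng.sample(range(len(data)), k=min(64, len(data))):
            original = data[pos]
            for val in rng.sample(range(256), k=8):  # a few candidate byte values
                data[pos] = val
                _, conf = classify(bytes(data))      # confidence in the current label
                if conf < best_conf:
                    best_pos, best_val, best_conf = pos, val, conf
            data[pos] = original
        if best_pos is not None:
            data[best_pos] = best_val                # commit the most confidence-reducing flip
    label, _ = classify(bytes(data))
    return bytes(data) if label == target_label else None
```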
    We propose a principled method to synthesize high-quality multi-turn function calling trajectories to align large language model (LLM)-based agents. We start by iteratively building a function calling graph and defining node operations to increase its complexity. This enables us to construct a reliable reference. Then, based on the synthesized function calling graph, we adopt back-and-forth translation to first construct multi-turn user queries and then fill in the function arguments with information in the query. We sample positive trajectories that distill the function graph reference and negative trajectories that contrast with the positive trajectories in targeted loss patterns in multi-turn scenarios. Training on the positive trajectories with supervised fine-tuning and preference optimization against the negative trajectories, we obtain 67.42 on BFCL and 71.7 on ToolQuery with an open-source 14B-parameter model, surpassing the performance of strong proprietary models like o1.
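    One stage of the pipeline, representing a function calling graph and reading off a coherent multi-turn call order, can be sketched directly; the function names are invented, and the paper's node operations, back-and-forth translation, and trajectory sampling are not reproduced here:

```python
from graphlib import TopologicalSorter

# Hypothetical function calling graph: an edge u -> v means v consumes u's
# output as an argument (each node maps to its set of predecessors).
call_graph = {
    "search_flights": set(),
    "get_price": {"search_flights"},
    "book_flight": {"search_flights", "get_price"},
}

# Any topological order of the graph is a coherent multi-turn call sequence,
# against which synthesized user queries can be aligned turn by turn.
order = list(TopologicalSorter(call_graph).static_order())
print(order)  # ['search_flights', 'get_price', 'book_flight']
```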
    Streaming Attention Approximation via Discrepancy Theory
    Michael Kapralov
    Insu Han
    Ekaterina Kochetkova
    Kshiteej Sheth
    Amir Zandieh
    2025
    Large language models (LLMs) have achieved impressive success, but their high memory requirements present challenges for long-context token generation. In this paper we study the streaming complexity of attention approximation, a key computational primitive underlying token generation. Our main contribution is BalanceKV, a streaming algorithm for ϵ-approximating attention computations based on a geometric process for selecting a balanced collection of key and value tokens, following Banaszczyk's vector balancing theory. We complement our algorithm with space lower bounds for streaming attention computation. Besides strong theoretical guarantees, BalanceKV exhibits empirically validated performance improvements over existing methods, both for attention approximation and end-to-end performance on various long-context benchmarks.
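    The flavor of the balancing idea can be shown with a much simpler stand-in than the paper's algorithm: greedily sign each key vector to keep the running signed sum short (a weak substitute for Banaszczyk-style balancing) and retain one half as the compressed cache. This illustrates discrepancy-based selection, not BalanceKV itself:

```python
import numpy as np

def balanced_subset(keys: np.ndarray) -> np.ndarray:
    """Split keys into two halves whose sums nearly agree; return indices of one half."""
    running = np.zeros(keys.shape[1])
    signs = np.empty(len(keys), dtype=int)
    for i, k in enumerate(keys):
        # Choose the sign that shrinks the running signed sum.
        signs[i] = -1 if np.dot(running, k) > 0 else 1
        running += signs[i] * k
    return np.flatnonzero(signs == 1)

rng = np.random.default_rng(0)
keys = rng.normal(size=(1024, 64))
keep = balanced_subset(keys)
# The kept half's aggregate statistics track the full set's: this residual is small.
print(np.linalg.norm(keys.sum(axis=0) - 2 * keys[keep].sum(axis=0)))
```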
    Correlated Error Bursts in a Gap-Engineered Superconducting Qubit Array
    John Mark Kreikebaum
    Leigh Martin
    Vlad Kurilovich
    Wojtek Mruczkiewicz
    Lara Faoro
    Gabrielle Roberts
    Alex Opremcak
    Alec Eickbusch
    Igor Aleiner
    Yu Chen
    arXiv (2025)
    One of the roadblocks towards the implementation of a fault-tolerant superconducting quantum processor is the impact of ionizing radiation on the qubit substrate. Such impacts temporarily elevate the density of quasiparticles (QPs) across the device, leading to correlated qubit error bursts. The most damaging errors – T1 errors – stem from QP tunneling across the qubit Josephson junctions (JJs). Recently, we demonstrated that this type of error can be strongly suppressed by engineering the profile of the superconducting gap at the JJs in a way that prevents QP tunneling. In this work, we identify a new type of correlated error that persists in the presence of gap engineering. We observe that impacts shift the frequencies of the affected qubits, and thus lead to correlated phase errors. The frequency shifts are systematically negative, reach values up to 3 MHz, and last for ~1 ms. We provide evidence that the shifts originate from QP-qubit interactions in the JJ region. Further, we experimentally demonstrate that these correlated phase errors are detrimental to the performance of quantum error correction protocols.
    The solution of linear systems of equations is the basis of many other quantum algorithms, and recent results provided an algorithm with optimal scaling in both the condition number κ and the allowable error ϵ [PRX Quantum 3, 040303 (2022)]. That work was based on the discrete adiabatic theorem, and worked out an explicit constant factor for an upper bound on the complexity. Here we show via numerical testing on random matrices that the constant factor is in practice about 1,200 times smaller than the upper bound found in the previous results. That means that this approach is far more efficient than might naively be expected from the upper bound. In particular, it is over an order of magnitude more efficient than a randomized approach from [arXiv:2305.11352] that claimed to be more efficient.
    When dividing items among agents, two of the most widely studied fairness notions are envy-freeness and proportionality. We consider a setting where m chores are allocated to n agents and the disutility of each chore for each agent is drawn from a probability distribution. We show that an envy-free allocation exists with high probability provided that m ≥ 2n, and moreover, m must be at least n + Θ(n) in order for the existence to hold. On the other hand, we prove that a proportional allocation is likely to exist as long as m = ω(1), and this threshold is asymptotically tight. Our results reveal a clear contrast with the allocation of goods, where a larger number of items is necessary to ensure existence for both notions.
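    For intuition about the definitions (not the asymptotics, which require large n), one can brute-force tiny instances: draw i.i.d. disutilities and check whether any assignment of chores to agents is envy-free, i.e., no agent would prefer another agent's bundle:

```python
import itertools
import numpy as np

def envy_free_exists(d: np.ndarray) -> bool:
    """d[i, j]: agent i's disutility for chore j (n agents, m chores)."""
    n, m = d.shape
    for assignment in itertools.product(range(n), repeat=m):
        # cost[i][k]: agent i's disutility for agent k's bundle.
        cost = [[sum(d[i, j] for j in range(m) if assignment[j] == k) for k in range(n)]
                for i in range(n)]
        if all(cost[i][i] <= min(cost[i]) for i in range(n)):  # nobody envies anybody
            return True
    return False

rng = np.random.default_rng(0)
n = 3
for m in (n, 2 * n):   # compare below vs. at the m >= 2n regime from the abstract
    hits = sum(envy_free_exists(rng.random((n, m))) for _ in range(10))
    print(f"m={m}: envy-free allocation found in {hits}/10 random instances")
```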
    Score-based Causal Representation Learning: Linear and General Transformations
    Abhishek Kumar
    Emre Acarturk
    Ali Tajer
    Burak Varici
    Journal of Machine Learning Research (JMLR) (2025)
    This paper addresses intervention-based causal representation learning (CRL) under a general nonparametric latent causal model and an unknown transformation that maps the latent variables to the observed variables. Linear and general transformations are investigated. The paper addresses both the identifiability and achievability aspects. Identifiability refers to determining algorithm-agnostic conditions that ensure recovering the true latent causal variables and the latent causal graph underlying them. Achievability refers to the algorithmic aspects and addresses designing algorithms that achieve identifiability guarantees. By drawing novel connections between score functions (i.e., the gradients of the logarithm of density functions) and CRL, this paper designs a score-based class of algorithms that ensures both identifiability and achievability. First, the paper focuses on linear transformations and shows that one stochastic hard intervention per node suffices to guarantee identifiability. It also provides partial identifiability guarantees for soft interventions, including identifiability up to ancestors for general causal models and perfect latent graph recovery for sufficiently non-linear causal models. Second, it focuses on general transformations and shows that two stochastic hard interventions per node suffice for identifiability. Notably, one does not need to know which pair of interventional environments have the same node intervened.
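    For orientation, the abstract's central object can be stated explicitly. Loosely (this is a summary of the mechanism, not the paper's precise statements), interventions leave a sparse fingerprint on the score:

```latex
\[
  s(z) \;=\; \nabla_z \log p(z), \qquad
  s(z) - s^{(i)}(z) \ \text{is supported on only a few coordinates},
\]
```

    where $s^{(i)}$ denotes the score of the environment in which node $i$ is intervened. Pulling these scores back through the unknown transformation is what lets score-based algorithms localize interventions and recover the latent graph.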
    Enhanced $H$-Consistency Bounds
    Anqi Mao
    Proceedings of the 36th International Conference on Algorithmic Learning Theory (ALT 2025)
    Recent research has introduced a key notion of $H$-consistency bounds for surrogate losses. These bounds offer finite-sample guarantees, quantifying the relationship between the zero-one estimation error (or other target loss) and the surrogate loss estimation error for a specific hypothesis set. However, previous bounds were derived under the condition that a lower bound of the surrogate loss conditional regret is given as a convex function of the target conditional regret, without non-constant factors depending on the predictor or input instance. Can we derive finer and more favorable $H$-consistency bounds? In this work, we relax this condition and present a general framework for establishing *enhanced $H$-consistency bounds* based on more general inequalities relating conditional regrets. Our theorems not only subsume existing results as special cases but also enable the derivation of more favorable bounds in various scenarios. These include standard multi-class classification, binary and multi-class classification under Tsybakov noise conditions, and bipartite ranking.
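    For orientation, the general shape of an $H$-consistency bound in this line of work (stated schematically; the paper's enhanced bounds refine the function relating the two sides): for a target loss $\ell$, surrogate loss $\Phi$, hypothesis set $H$, and a non-decreasing concave $\Gamma$,

```latex
\[
  R_{\ell}(h) - R^{*}_{\ell}(H) + M_{\ell}(H)
  \;\le\;
  \Gamma\!\left( R_{\Phi}(h) - R^{*}_{\Phi}(H) + M_{\Phi}(H) \right),
\]
```

    where $M_{\ell}(H)$ and $M_{\Phi}(H)$ are the minimizability gaps. The enhanced bounds allow the inequality relating conditional regrets to carry non-constant, instance-dependent factors, yielding tighter $\Gamma$ in the settings listed above.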