Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10822 publications
Preview abstract
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution[1-3]. We combine existing mass-production results with modern approaches for loading classical data using ``quantum read-only memory.'' We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order or magnitude or more for a variety reasonably-sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
View details
Preview abstract
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
View details
Machine learning the effects of many quantum measurements
Wanda Hou
Samuel Garratt
Norhan Eassa
Yi-Zhuang You
Ehud Altman
arXiv (2025)
Preview abstract
Measurements are essential for the processing and protection of information in quantum computers. They can also induce long-range entanglement between unmeasured qubits. However, when post-measurement states depend on many non-deterministic measurement outcomes, there is a barrier to observing and using the entanglement induced by prior measurements. Here we demonstrate a new approach for detecting such measurement-induced entanglement. We create short-range entangled states of one- and two-dimensional arrays of qubits in a superconducting quantum processor, and aim to characterize the long-range entanglement induced between distant pairs of qubits when we measure all of the others. To do this we use unsupervised training of neural networks on observations to create computational models for post-measurement states and, by correlating these models with experimental data, we reveal measurement-induced entanglement. Our results additionally demonstrate a transition in the ability of a classical agent to accurately model the experimental data; this is closely related to a measurement-induced phase transition. We anticipate that our work can act as a basis for future experiments on quantum error correction and more general problems in quantum control.
View details
Preview abstract
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative-but their effectiveness has not been systematically evaluated. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI agents on project-level Java migrations, with a specific focus on measuring an agent's ability to preserve program semantics and avoid reward hacking, which we argue requires projects with high test coverage for a rigorous and reliable evaluation. We benchmark several state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 52.3 percent of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. Our empirical study reveals failure modes of current AI agents in realistic Java modernization tasks, providing a foundation for evaluating trustworthy code-migration systems. By releasing FreshBrew, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
View details
RemapRoute: Local Remapping of Internet Path Changes
renata cruz teixeira
Christophe Diot
italo cunha
Elverton Fazzion
Darryl Veitch
2025
Preview abstract
Several systems rely on traceroute to track a large number of Internet paths as they change over time. Monitoring systems perform this task by remapping paths periodically or whenever a change is detected. This paper shows that such complete remapping is inefficient, because most path changes are localized to a few hops of a path. We develop RemapRoute, a tool to remap a path locally given the previously known path and a change point. RemapRoute sends targeted probes to locate and remap the often few hops that have changed. Our evaluation with trace-driven simulations and in a real deployment shows that local remapping reduces the average number of probes issued during remapping by 63% and 79%, respectively, when compared with complete remapping. At the same time, our results show that local remapping has little impact on the accuracy of inferred paths.
View details
Integer Programming for Generalized Causal Bootstrap Designs
Adel Javanmard
Nick Doudchenko
Proceedings of the 42 nd International Conference on Machine Learning (2025)
Preview abstract
In experimental causal inference, we distinguish between two sources of uncertainty: design uncertainty, due to the treatment assignment mechanism, and sampling uncertainty, when the sample is drawn from a super-population. This distinction matters in settings with small fixed samples and heterogeneous treatment effects, as in geographical experiments. The standard bootstrap procedure most often used by practitioners primarily estimates sampling uncertainty, and the causal bootstrap procedure, which accounts for design uncertainty, was developed for the completely randomized design and the difference-in-means estimator, whereas non-standard designs and estimators are often used in these low-power regimes. We address this gap by proposing an integer program which computes numerically the worst-case copula used as an input to the causal bootstrap method, in a wide range of settings. Specifically, we prove the asymptotic validity of our approach for unconfounded, conditionally unconfounded,
and individualistic with bounded confoundedness assignments, as well as generalizing to any linear-in-treatment and quadratic-in-treatment estimators. We demonstrate the refined confidence intervals achieved through simulations of small geographical experiments.
View details
Preview abstract
Virtual hand representation in Head-Mounted Displays (HMDs) offers immersive and intuitive interactions in Virtual Reality (VR). However, current hand tracking algorithms are prone to errors, which can disrupt the user experience and hinder task performance. This paper presents a novel method for providing users with visual feedback when the quality of hand tracking decreases. Our approach employs a notification modal that warns users of potential failures. We identified three common hand tracking failure scenarios and evaluated the effectiveness of our method in two distinct VR tasks: object manipulation and complex assembly tasks. Results show that our early warning system reduces task completion time, lowers hand-tracking failures by up to 83%, decreases errors, improves system usability, and reduces cognitive load. This work contributes to the development of more robust and user-friendly VR HMD applications by enhancing hand tracking reliability, usability, and workload.
View details
Bridging Fairness and Uncertainty: Theoretical Insights and Practical Strategies for Equalized Coverage in GNNs
Longfeng Wu
Yao Zhou
Jian Kang
Dawei Zhou
2025
Preview abstract
Graph Neural Networks (GNNs) have become indispensable tools in many domains, such as social network analysis, financial fraud detection, and drug discovery. Prior research primarily concentrated on improving prediction accuracy while overlooking how reliable the model predictions are. Conformal prediction on graphs emerges as a promising solution, offering statistically sound uncertainty estimates with a pre-defined coverage level. Despite the promising progress, existing works only focus on achieving model coverage guarantees without considering fairness in the coverage within different demographic groups. To bridge the gap between conformal prediction and fair coverage across different groups, we pose the fundamental question: Can fair GNNs enable the uncertainty estimates to be fairly applied across demographic groups? To answer this question, we provide a comprehensive analysis of the uncertainty estimation in fair GNNs employing various strategies. We prove theoretically that fair GNNs can enforce consistent uncertainty bounds across different demographic groups, thereby minimizing bias in uncertainty estimates. Furthermore, we conduct extensive experiments on five commonly used datasets across seven state-of-the-art fair GNN models to validate our theoretical findings. Additionally, based on the theoretical and empirical insights, we identify and analyze the key strategies from various fair GNN models that contribute to ensuring equalized uncertainty estimates. Our work estimates a solid foundation for future exploration of the practical implications and potential adjustments needed to enhance fairness in GNN applications across various domains.
View details
Development and Evaluation of ML Models for Cardiotocography Interpretation
Nicole Chiou
Nichole Young-Lin
Abdoulaye Diack
Christopher Kelly
Sanmi Koyejo
NPJ Women's Health (2025)
Preview abstract
The inherent variability in the visual interpretation of cardiotocograms (CTGs) by obstetric clinical experts, both intra- and inter-observer, presents a substantial challenge in obstetric care. In response, we investigate automated CTG interpretation as a potential solution to enhance the early detection of fetal hypoxia during labor, thereby reducing unnecessary operative interventions and improving overall maternal and neonatal care. This study employs deep learning techniques to reduce the subjectivity associated with visual CTG interpretation. Our results demonstrate that employing objective cord blood pH measurements, rather than clinician-defined Apgar scores, yields more consistent and robust model performance. Additionally, through a series of ablation studies, we investigate the impact of temporal distribution shifts on the performance of these deep learning models. We examine tradeoffs between performance and fairness, specifically evaluating performance across demographic and clinical subgroups. Finally, we discuss the practical implications of our findings for the real-world deployment of such systems, emphasizing their potential utility in medical settings with limited resources.
View details
Preview abstract
Despite exceptional achievements, training neural networks remains computationally expensive and is often plagued by instabilities that can degrade convergence. While learning rate schedules can help mitigate these issues, finding optimal schedules is time-consuming and resource-intensive. This work explores theoretical issues concerning training stability in the constant-learning-rate (i.e., without schedule) and small-batch-size regime. Surprisingly, we show that the order of gradient updates affects stability and convergence in gradient-based optimizers. We illustrate this new line of thinking using backward-SGD, which processes batch gradient updates like SGD but in reverse order. Our theoretical analysis shows that in contractive regions (e.g., around minima) backward-SGD converges to a point while the standard forward-SGD generally only converges to a distribution. This leads to improved stability and convergence which we demonstrate experimentally. While full backward-SGD is computationally intensive in practice, it highlights opportunities to exploit reverse training dynamics (or more generally alternate iteration orders) to improve training. To our knowledge, this represents a new and unexplored avenue in deep learning optimization.
View details
Preview abstract
During remote communication, participants often share both digital and physical content, such as product designs, digital assets, and environments, to enhance mutual understanding. Recent advances in augmented communication have facilitated users to swiftly create and share digital 2D copies of physical objects from video feeds into a shared space. However, conventional 2D representations of digital objects limits spatial referencing in immersive environments. To address this, we propose Thing2Reality, an Extended Reality (XR) meeting platform that facilitates spontaneous discussions of both digital and physical items during remote sessions. With Thing2Reality, users can quickly materialize ideas or objects in immersive environments and share them as conditioned multiview renderings or 3D Gaussians. Thing2Reality enables users to interact with remote objects or discuss concepts in a collaborative manner. Our user studies revealed that the ability to interact with and manipulate 3D representations of objects significantly enhances the efficiency of discussions, with the potential to augment discussion of 2D artifacts.
View details
Preview abstract
Audio description (AD) narrates important visual details which are played during dialogue gaps in video soundtracks, making them accessible to blind and low vision (BLV) audiences. AD professionals (producers, writers, narrators, mixers, and quality control specialists) possess expert knowledge of AD development and the constraints that affect their work. However, their perspectives remain largely absent in AD research. We present interviews with 17 AD professionals (8 BLV), detailing their workflows to produce AD for recorded media and live theater. We additionally explore their perspectives on recent changes impacting their work, revealing tensions between advocacy for culturally-competent AD and the rise of automations--some beneficial, others with concerning implications for AD quality. Highlighting these tensions, we offer research directions to support AD professionals, and we pose guiding questions for AD and AI innovators on preserving the high-quality human touch professionals consider fundamental to the accessibility provision.
View details
Amplifying Trans and Nonbinary Voices: A Community-Centred Harm Taxonomy for LLMs
Eddie Ungless
Beka Gulotta
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (2025)
Preview abstract
We explore large language model (LLM) responses that may negatively impact the transgender and nonbinary (TGNB) community and introduce the Transing Transformers Toolkit, T3, which provides resources for identifying such harmful response behaviors. The heart of T3 is a community-centred taxonomy of harms, developed in collaboration with the TGNB community, which we complement with, amongst other guidance, suggested heuristics for evaluation. To develop the taxonomy, we adopted a multi-method approach that included surveys and focus groups with community experts. The contribution highlights the importance of community-centred approaches in mitigating harm, and outlines pathways for LLM developers to improve how their models handle TGNB-related topics.
View details
Preview abstract
Browser fingerprinting is an online tracking technique that is being increasingly adopted for profiling and ad targeting purposes. While prior work has analyzed the prevalence and impact of browser fingerprinting on the Web, they have traditionally relied on large-scale automated crawls. Naturally, these cannot replicate real-human interactions, e.g., solve CAPTCHAs, evade bot detectors, or operate behind login pages and paywalls. This prompts the question as to whether or not the fingerprinting ecosystem is appreciably different in real-world browsing sessions. In this paper, we begin to address this question by designing and conducting a user study aimed at collecting actual telemetry data from real browsing sessions of 30 users.
We find that almost half of the fingerprinting websites identified from real user browsing sessions are missed by equivalent automated crawls. This is mainly due to the inability of automated crawls to identify and visit authentication pages, being blocked by bot detectors, and/or failing to perform user interactions that specifically trigger browser fingerprinting scripts. We also find new fingerprinting vectors that are consistently present in fingerprinting scripts captured by real user browsing sessions yet missing from automated crawls. Finally, we assess the feasibility of collecting fingerprinting training data in a privacy-preserving way. We conclude that private models built on real user browsing sessions can detect browser fingerprinting more effectively than models trained on automated crawls alone, while simultaneously providing strong privacy guarantees to users.
View details
Preview abstract
Measuring productivity is equivalent to building a model. All models are wrong, but some are useful. Productivity models are often “worryingly selective” (wrong because of omissions). Worrying selectivity can be combated by taking a holistic approach that includes multiple measurements of multiple outcomes. Productivity models should include multiple outcomes, metrics, and methods.
View details