Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
1 - 15 of 10820 publications
Preview abstract
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass-production techniques, which can sometimes allow us to perform operations many times in parallel for a cost comparable to a single execution [1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass-production techniques offer no benefit under a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to reduce costs by an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data-loading step.
View details
Preview abstract
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative, but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
View details
mmMUSE: An mmWave-based Motion-resilient Universal Speech Enhancement System
Chenming He
Yanyong Zhang
Kai Wang
Dequan Wang
Lingyu Wang
the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), ACM (2026) (to appear)
Preview abstract
Voice-based smart systems can greatly enhance user experiences by allowing higher-quality interactions through better voice perception. Speech enhancement can benefit such systems by isolating noise from speech. Recently, integrating millimeter-wave (mmWave) sensing with audio for speech perception has gained increasing attention due to microphones' limitations in noisy environments. However, mmWave-based vocal extraction is severely affected by motion, which disperses vocal signals across ranges and introduces distortions. In this paper, we propose an mmWave-based motion-resilient universal speech enhancement system called mmMUSE, which fuses mmWave and audio signals. To mitigate motion interference, we develop a Doppler-based method for motion-robust vocal signal extraction. Moreover, by introducing the Vocal-Noise-Ratio metric to assess the prominence of vocal signals from mmWave, we achieve real-time voice activity detection that gains 3.81 dB of SISDR in noisy speech. Additionally, we design a two-stage complex-valued network that includes an attention-based fusion network for cross-modal complementing and a time-frequency masking network that corrects the amplitude and phase of speech to isolate noise.
Using mmWave and audio datasets from 46 participants, mmMUSE outperforms the state-of-the-art speech enhancement models, achieving an average SISDR improvement of 3.12 dB. Additionally, mmMUSE achieves SISDR improvements of 16.51 dB, 17.93 dB, 14.93 dB, and 18.95 dB in controlled environments involving intense noise, extensive motion, multiple speakers, and various obstructive materials, respectively. Finally, we evaluate mmMUSE in real-world scenarios including running, public spaces, and driving, maintaining a word error rate (WER) below 10%.
View details
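The SISDR numbers above are scale-invariant signal-to-distortion ratios. As a reference point, a minimal implementation of the standard SI-SDR definition follows (a generic sketch, not code from the paper): project the estimate onto the reference to find the scaled target component, then compare target energy to residual energy.

```python
import numpy as np

def si_sdr(reference, estimate):
    """Scale-invariant signal-to-distortion ratio (SISDR) in dB."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Optimal scaling of the reference: project the estimate onto it.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference    # scaled target component
    residual = estimate - target  # everything SISDR counts as distortion
    return 10.0 * np.log10(np.sum(target**2) / np.sum(residual**2))
```

Because of the projection, rescaling the estimate leaves the score unchanged, which is the scale invariance the metric is named for.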
Preview abstract
When large language models (LLMs) use in-context learning (ICL) to solve a new task, they must infer latent concepts from demonstration examples. This raises the question of whether and how transformers represent latent structures as part of their computation. Our work experiments with several controlled tasks, studying this question using mechanistic interpretability. First, we show that in transitive reasoning tasks with a latent, discrete concept, the model successfully identifies the latent concept and does step-by-step concept composition. This builds upon prior work that analyzes single-step reasoning. Then, we consider tasks parameterized by a latent numerical concept. We discover low-dimensional subspaces in the model's representation space, where the geometry cleanly reflects the underlying parameterization. Overall, we show that small and large models can indeed disentangle and utilize latent concepts that they learn in-context from a handful of abbreviated demonstrations.
View details
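The "low-dimensional subspaces whose geometry reflects the parameterization" can be illustrated with a toy probe. Here the hidden states are synthetic stand-ins (we assume a linear encoding of the latent numeric concept plus noise, which is our illustration, not the paper's setup), and the top principal component recovers the concept direction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for hidden states: by assumption, each prompt's
# representation encodes its latent numeric concept along one fixed
# direction in a 64-dimensional space, plus isotropic noise.
d, n = 64, 200
concept = rng.uniform(-1.0, 1.0, size=n)  # latent numeric concept per prompt
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)    # unit concept direction
reps = np.outer(concept, direction) + 0.05 * rng.normal(size=(n, d))

# Probe for a low-dimensional subspace: the top right singular vector of
# the centered representations (i.e., the first principal component).
centered = reps - reps.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
projection = centered @ vt[0]

# If the geometry reflects the parameterization, the 1-D projection
# correlates (up to sign) with the latent concept.
corr = abs(np.corrcoef(projection, concept)[0, 1])
print(corr)  # close to 1
```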
Preview abstract
This paper investigates the theoretical underpinnings of the widely successful pretrain-then-adapt strategy for foundation models. We introduce a Bayesian model selection criterion, termed the downstream free energy, which quantifies the adaptability of a pretrained checkpoint by measuring, under the downstream data distribution, the concentration of favorable solutions near the checkpoint. However, minimizing this downstream free energy is infeasible without access to downstream data. To address this, we show that under certain conditions, minimizing the upstream free energy (which can be estimated using only upstream data) can serve as a reliable proxy. We validate this theoretical insight through preliminary experiments, showing that commonly used pretraining heuristics effectively lower upstream free energy, leading to better downstream performance.
View details
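Stated abstractly, a free-energy criterion of this kind is a local log-marginal likelihood. A generic formalization consistent with the description above (the neighborhood $B(\theta_0)$ around the checkpoint $\theta_0$, the sample size $n$, the loss $L_{\text{down}}$, and the prior $\pi$ are our notational assumptions, not necessarily the paper's exact definitions):

$$F_{\text{down}}(\theta_0) = -\log \int_{B(\theta_0)} e^{-n\,L_{\text{down}}(\theta)}\,\pi(\theta)\,d\theta$$

A checkpoint with low $F_{\text{down}}$ has a large mass of low-downstream-loss parameters nearby, matching the idea of favorable solutions concentrating near the checkpoint; the upstream proxy replaces $L_{\text{down}}$ with an upstream loss $L_{\text{up}}$.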
Preview abstract
Google has a long tradition of open-source software, which encompasses the field of operations research with OR-Tools. In development since 2008, it offers several solvers useful to many OR practitioners:
- PDLP, a revolutionary first-order linear solver that is reshaping the landscape of linear optimisation;
- CP-SAT, an award-winning constraint-programming solver;
- Glop, an accurate linear solver;
- Routing, a vehicle routing solver underpinning Google Maps Platform Route Optimization.
OR-Tools has long had its features accessible from other languages: the core algorithms are implemented in C++ for performance, but users can tap into them in Python, Java, C#, or Go.
It has recently become available in Julia too, with a current focus on the linear and constraint solvers, usable either locally or remotely.
We provide a wrapper for our solvers that brings them to JuMP.jl through MathOptInterface.jl.
This tutorial will walk you through the features of OR-Tools and its solvers, then show examples of using OR-Tools from within Julia, either through JuMP or a lower-level interface.
We will also share our experience of C++-Julia interop.
View details
Digital Shadow AI Risk Theoretical Framework (DART): A Framework for Managing Data Disclosure and Privacy Risks of AI tools at Work
Master's Thesis (2025) (to appear)
Preview abstract
The accelerated integration of generative AI technologies and agentic AI tools, particularly those like ChatGPT, into workplace settings has introduced complex challenges concerning data governance, regulatory compliance, and organizational privacy (GDPR 2016; CCPA/CPRA). This study introduces the Digital Shadow AI Risk Theoretical Framework (DART)—a novel theoretical framework designed to systematically identify, classify, and address the latent risks arising from the widespread, and often unregulated, use of AI systems in professional environments (NIST, 2023; OECD AI Policy Observatory, 2023). DART introduces six original, interrelated constructs developed in this study: Unintentional Disclosure Risk, Trust-Dependence Paradox, Data Sovereignty Conflict, Knowledge Dilution Phenomenon, Ethical Black Box Problem, and Organizational Feedback Loops. Each construct reflects a unique dimension of risk that emerges as organizations increasingly rely on AI-driven tools for knowledge work and decision-making.
The framework is empirically tested through a mixed-methods research design involving hypothesis testing and statistical analysis of behavioral data gathered from cross-sectional surveys of industry professionals. Two cross-industry surveys (Survey-1: 416 responses, 374 analyzed; Survey-2: 203 responses, 179 analyzed) and CB-SEM tests supported seven of eight hypotheses; H4 (sovereignty) was not significant; H7 (knowledge dilution) was confirmed in replication. The findings highlight critical gaps in employee training, policy awareness, and risk mitigation strategies—underscoring the urgent need for updated governance frameworks, comprehensive AI-use policies, and targeted educational interventions. This paper contributes to emerging scholarship by offering a robust model for understanding and mitigating digital risks in AI-enabled workplaces, providing practical implications for compliance officers, risk managers, and organizational leaders aiming to harness the benefits of generative AI responsibly and securely. The novelty of DART lies in its explicit theorization of workplace-level behavioral risks—especially Shadow AI, which unlike Shadow IT externalizes organizational knowledge into adaptive systems—thereby offering a unified framework that bridges fragmented literatures and grounds them in empirical evidence.
View details
Principled Algorithms for Optimizing Generalized Metrics in Binary Classification
Anqi Mao
Proceedings of the 42nd International Conference on Machine Learning (ICML 2025)
Preview abstract
In applications with significant class imbalance or asymmetric costs, metrics such as the $F_\beta$-measure, AM measure, Jaccard similarity coefficient, and weighted accuracy offer more suitable evaluation criteria than the standard binary classification loss. However, optimizing these metrics presents significant computational and statistical challenges. Existing approaches often rely on the characterization of the Bayes-optimal classifier and use threshold-based methods that first estimate class probabilities and then seek an optimal threshold. This leads to algorithms that are not tailored to restricted hypothesis sets and lack finite-sample performance guarantees. In this work, we introduce principled algorithms for optimizing generalized metrics, supported by $H$-consistency and finite-sample generalization bounds. Our approach reformulates metric optimization as a generalized cost-sensitive learning problem, enabling the design of novel surrogate loss functions with provable $H$-consistency guarantees. Leveraging this framework, we develop new algorithms, METRO (*Metric Optimization*), with strong theoretical performance guarantees. We report the results of experiments demonstrating the effectiveness of our methods compared to prior baselines.
View details
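For contrast with the surrogate-loss approach, the threshold-based baseline the abstract critiques is easy to sketch: estimate class probabilities, then sweep a decision threshold for the best $F_\beta$. A minimal sketch (names and data are ours, not the paper's):

```python
import numpy as np

def f_beta(y_true, y_pred, beta=1.0):
    """F_beta score from binary labels and predictions."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom > 0 else 0.0

def best_threshold(probs, y_true, beta=1.0):
    """Threshold-based baseline: sweep candidate thresholds over the
    estimated class probabilities and keep the F_beta-maximizing one."""
    best_t, best_f = 0.5, -1.0
    for t in np.unique(probs):
        f = f_beta(y_true, (probs >= t).astype(int), beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```

The point of the paper is that this two-step recipe ignores the restricted hypothesis set and carries no finite-sample guarantee, which the surrogate-loss formulation addresses.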
Fast Tensor Completion via Approximate Richardson Iteration
Mehrdad Ghadiri
Yunbum Kook
Ali Jadbabaie
Proceedings of the 42nd International Conference on Machine Learning (2025)
Preview abstract
We study tensor completion (TC) through the lens of low-rank tensor decomposition (TD). Many TD algorithms use fast alternating minimization methods, which solve highly structured linear regression problems at each step (e.g., for CP, Tucker, and tensor-train decompositions). However, such algebraic structure is lost in TC regression problems, making direct extensions unclear. To address this, we propose a lifting approach that approximately solves TC regression problems using structured TD regression algorithms as blackbox subroutines, enabling sublinear-time methods. We theoretically analyze the convergence rate of our algorithm based on approximate Richardson iteration, and we demonstrate on real-world tensors that its running time can be 100x faster than direct methods for CP completion.
View details
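The base scheme the lifting approach builds on is classical Richardson iteration. A minimal dense version is below; the paper's contribution is replacing the exact residual updates with approximate solves by structured TD regression subroutines, which this sketch does not include.

```python
import numpy as np

def richardson(A, b, omega, iters=500):
    """Plain Richardson iteration x_{k+1} = x_k + omega * (b - A x_k)
    for a square system A x = b. Converges when the spectral radius of
    I - omega * A is below 1 (e.g., omega < 2 / lambda_max for SPD A)."""
    x = np.zeros(len(b))
    for _ in range(iters):
        x = x + omega * (b - A @ x)
    return x
```

In the paper's setting, the exact matrix-vector work is what loses structure under completion, so the iteration is run with an approximate, structured solver in the loop instead.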
Preview abstract
Creativity in software development is frequently overlooked, particularly in the design of developer tools, which often focus on productivity. This is likely because creativity is not always seen as a goal in software engineering; in part, this can be explained by the unique way in which software engineers relate to creativity, centered on reusability rather than novelty. However, creativity is a critical aspect of software engineering, and importantly, there is a clear possibility for AI to impact creativity in both positive and negative ways. In this article, we explore how the goals of designing AI tools for productivity differ from those for creativity, and we propose strategies to elevate creativity in the software engineering workflow. Specifically, we apply seamful design to AI-powered software development, considering how seamfulness in development workflows can support creativity.
View details
Preview abstract
Differential privacy can be achieved in a distributed manner, where multiple parties add independent noise such that their sum protects the overall dataset with differential privacy. A common technique here is for each party to sample their noise from the decomposition of an infinitely divisible distribution. We introduce two novel mechanisms in this setting: 1) the generalized discrete Laplace (GDL) mechanism, whose distribution (which is closed under summation) follows from differences of i.i.d. negative binomial shares, and 2) the multi-scale discrete Laplace (MSDLap) mechanism, which follows the sum of multiple i.i.d. discrete Laplace shares at different scales. The mechanisms can be parameterized to have $O(\Delta^3 e^{-\varepsilon})$ and $O(\min(\Delta^3 e^{-\varepsilon}, \Delta^2 e^{-2\varepsilon/3}))$ MSE, respectively, where the latter bound matches known optimality results. Furthermore, the MSDLap mechanism has the optimal MSE, including constants, as $\varepsilon \to \infty$. We also show a transformation from the discrete setting to the continuous setting, which allows us to transform both mechanisms to the continuous setting and thereby achieve the optimal $O(\Delta^2 e^{-2\varepsilon/3})$ MSE. To our knowledge, these are the first infinitely divisible additive noise mechanisms that achieve order-optimal MSE under pure differential privacy in either the discrete or continuous setting, so our work shows formally that there is no separation in utility when query-independent noise-adding mechanisms are restricted to infinitely divisible noise. For the continuous setting, our result improves upon Pagh and Stausholm's Arete distribution, which gives an MSE of $O(\Delta^2 e^{-\varepsilon/4})$ [35]. We apply our results to improve a state-of-the-art multi-message shuffle DP protocol from [3] in the high-$\varepsilon$ regime.
View details
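The GDL construction can be sketched directly from its description: the negative binomial distribution is infinitely divisible, so each party can sample a difference of two NB(r / n_parties, p) draws, and the shares sum to a difference of two NB(r, p) variables. A toy sketch with illustrative parameters (not the paper's $(\varepsilon, \Delta)$ calibration):

```python
import numpy as np

rng = np.random.default_rng(0)

def party_share(r, p, n_parties):
    """One party's noise share: a difference of two i.i.d. negative
    binomial draws with shape r / n_parties. Summing the n_parties
    shares yields a difference of two NB(r, p) variables, the GDL
    shape described above. NumPy's generator accepts real-valued
    shape parameters, which the split requires."""
    a = rng.negative_binomial(r / n_parties, p)
    b = rng.negative_binomial(r / n_parties, p)
    return int(a) - int(b)

def aggregate_noise(r, p, n_parties):
    """Total noise the aggregator sees: the sum of all parties' shares."""
    return sum(party_share(r, p, n_parties) for _ in range(n_parties))
```

With r = 1 each share is a difference of geometric variables, recovering the familiar discrete Laplace as a special case.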
Preview abstract
As the demand for data and digital services continues to escalate, data centers are becoming key players in the global energy consumption landscape, and the need for sustainability and energy efficiency in these facilities has driven the integration of Artificial Intelligence (AI) technologies. This paper explores emerging AI trends that are shaping sustainable data centers, focusing on optimization, predictive analytics, and machine learning applications, along with their implications for operational efficiency and environmental impact. Innovations such as AI-driven energy optimization, renewable energy integration, and advanced cooling technologies are reshaping the industry, aiming to reduce energy consumption, minimize carbon footprints, and enhance operational efficiency. By leveraging AI, data centers can predict maintenance needs, optimize energy usage, and adapt to real-time demands. The paper highlights how these advancements contribute to a more eco-friendly and efficient future for data centers.
View details
Scalability of Generative AI Models: Challenges and Opportunities in Large-Scale Data Generation and Training
International Journal of Computer Science and Information Technology Research (IJCSITR) (2025)
Preview abstract
Scalability of Generative AI Models: Challenges and Opportunities in Large-Scale Data Generation and Training
View details
Preview abstract
This tutorial examines the progress and scaling limitations of IM-DD based optical technologies and explores how coherent technology optimized for datacenter use cases, including a newly proposed polarization-folding, time-diversity approach and a novel single-sideband coherent detection technique, can address some of these challenges.
View details
Preview abstract
In-context Ranking (ICR) is an emerging paradigm for Information Retrieval (IR), which leverages the contextual understanding of LLMs by directly incorporating the task description, candidate documents, and the query into the model's input prompt and tasking the LLM to identify relevant document(s). While effective, this paradigm poses a significant efficiency challenge, especially as the candidate list grows, due to the quadratic/super-linear scaling of the attention operation with context length. To this end, this paper first identifies inherent and exploitable structures in the attention of LLMs finetuned for ICR: (1) inter-document block sparsity: attention is dense within each document block but sparse across different documents in the context; and (2) query-document block relevance: the attention scores from certain query tokens to a document block in middle layers strongly correlate with that document's actual relevance. Motivated by these observations, we introduce BlockRank (Blockwise In-context Ranking), a novel method that adapts the attention operation in an LLM by (a) architecturally enforcing the observed inter-document block sparsity, reducing attention complexity from quadratic to linear without loss in performance, and (b) optimizing query-document block relevance for truly relevant documents during fine-tuning, using an auxiliary contrastive training objective that improves retrieval in attention. Experiments on BEIR, MSMarco and NQ with Mistral-7B demonstrate that BlockRank Mistral matches or outperforms existing SOTA listwise rankers and a controlled fine-tuned baseline while being significantly more efficient at inference (4.7x for 100 MSMarco documents in context) and scaling gracefully to long-context shortlists of around 500 in-context documents (approximately 100K context length) within a second, presenting a scalable and effective solution for ICR.
View details
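The inter-document block sparsity in (a) can be pictured as an attention mask that is dense inside each candidate document's block while query tokens attend across the whole context. A hypothetical layout sketch (documents first, then the query; this is our illustration, not BlockRank's actual kernel):

```python
import numpy as np

def blockwise_mask(doc_lengths, query_len):
    """Boolean attention mask with inter-document block sparsity.
    Document tokens attend only within their own document's block;
    query tokens (placed last) attend to the full context."""
    total = sum(doc_lengths) + query_len
    mask = np.zeros((total, total), dtype=bool)
    start = 0
    for length in doc_lengths:
        mask[start:start + length, start:start + length] = True  # dense intra-document block
        start += length
    mask[start:, :] = True  # query rows: attend everywhere
    return mask
```

With B documents of fixed length L and a query of length q, the attended pairs number B·L² + q·(B·L + q), which grows linearly in B, versus (B·L + q)² for dense attention.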