Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10822 publications
Preview abstract
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution[1-3]. We combine existing mass-production results with modern approaches for loading classical data using ``quantum read-only memory.'' We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order or magnitude or more for a variety reasonably-sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
View details
mmMUSE: An mmWave-based Motion-resilient Universal Speech Enhancement System
Chenming He
Yanyong Zhang
Kai Wang
Dequan Wang
Lingyu Wang
the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), ACM (2026) (to appear)
Preview abstract
Voice-based smart systems can greatly enhance user experiences by allowing higher-quality interactions through better voice perception. Speech enhancement can benefit such systems by isolating noise from speech. Recently, integrating millimeter-wave (mmWave) with audio for speech perception has gained increasing attention due to microphones' limitations in noisy environments. However, mmWave-based vocal extraction is severely affected by motion, which disperses vocal signals across ranges and introduces distortions. In this paper, we propose an mmWave-based motion-resilient universal speech enhancement system called mmMUSE, which fuses mmWave and audio signals. To mitigate motion interference, we develop a Doppler-based method for motion-robust vocal signal extraction. Moreover, by introducing the Vocal-Noise-Ratio metric to assess the prominence of vocal signals from mmWave, we achieve real-time voice activity detection that gains 3.81 dB of SISDR in noisy speeches. Additionally, we design a two-stage complex-valued network that includes an attention-based fusion network for cross-modal complementing and a time-frequency masking network for correcting amplitude and phase of speech to isolate noises.
Using mmWave and audio datasets from 46 participants, mmMUSE outperforms the state-of-the-art speech enhancement models, achieving an average SISDR improvement of 3.12 dB. Additionally, mmMUSE achieves SISDR improvements of 16.51 dB, 17.93 dB, 14.93 dB, and 18.95 dB in controlled environments involving intense noise, extensive motion, multiple speakers, and various obstructive materials, respectively. Finally, we evaluate mmMUSE in real-world scenarios including running, public spaces, and driving, maintaining a word error rate (WER) below 10%.
View details
Preview abstract
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
View details
Toward Sensor-In-the-Loop LLM Agent: Benchmarks and Implications
Zhiwei Ren
Junbo Li
Minjia Zhang
Di Wang
Longfei Shangguan
SenSys 2025 - The 23rd ACM Conference on Embedded Networked Sensor Systems (2025)
Preview abstract
This paper advocates for sensor-informed personal agents that can take advantage of sensor hints on wearables to enhance the personal agent's response. We demonstrate that such a sensor-in-the-loop design paradigm can be easily integrated into existing LLM agents by building a prototype named WellMax based on existing well-developed techniques such as structured prompt tuning and few-shot prompting. The head-to-head comparison with a non-sensor-informed agent across five use scenarios demonstrates that this sensor-in-the-loop design can effectively improve users' needs and their overall experience. The deep-dive into agents' replies and participants' feedback further reveals that sensor-in-the-loop agents not only provide more contextually relevant responses but also exhibit a greater understanding of user priorities and situational nuances. We further conduct two case studies to examine the potential pitfalls and distill key insights from this sensor-in-the-loop agent. We believe this work sets the stage for more intelligent, empathetic, and effective interactions in future AI-driven personal assistants.
View details
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Zhanke Zhou
Xuan Li
Zhaocheng Zhu
Michael Galkin
Xiao Feng
Sanmi Koyejo
Jian Tang
Bo Han
Reasoning and Planning for LLMs @ ICLR 2025
Preview abstract
Numerous applications of large language models (LLMs) rely on their ability to perform step-by-step reasoning. However, the reasoning behavior of LLMs remains poorly understood, posing challenges to research, development, and safety. To address this gap, we introduce landscape of thoughts-the first visualization tool for users to inspect the reasoning paths of chain-of-thought and its derivatives on any multi-choice dataset. Specifically, we represent the states in a reasoning path as feature vectors that quantify their distances to all answer choices. These features are then visualized in two-dimensional plots using t-SNE. Qualitative analysis shows that the landscape of thoughts effectively distinguishes between strong and weak models, correct and incorrect answers, as well as different reasoning tasks. It also uncovers undesirable reasoning patterns, such as low consistency and high uncertainty. Additionally, users can adapt our tool to a neural model that predicts any property they observe. We showcase this advantage by adapting our tool to a lightweight verifier, which significantly improves reasoning by evaluating the correctness of reasoning paths.
View details
Day-of-the-week Awareness in Time of Day Breakpoints for Traffic Light Plans
Ori Rottenstreich
Eliav Buchnik
Shai Ferster
Tom Kalvari
Ron Tsibulsky
Danny Veikherman
Jack Haddad
2025
Preview abstract
Time-of-day breakpoints (TODs) refer to the times over the day in which the plan of a traffic light is changed. Traditionally, TODs are selected jointly for all weekdays (Monday-Friday), typically with additional TODs dedicated to weekends. In this paper, we present an alternative approach motivated by traffic characteristics that can differ among the weekdays Monday-Friday and consider TODs which are day-of-the-week aware. The traffic-aware approach studies similarities among days and computes TODs that can be shared among days with similar characteristics but can also have other forms for weekdays with unique characteristics. Based on traffic properties derived from anonymized trajectories, we apply the new methodology to compute time-of-day breakpoints that are day-of-the-week aware in the city of Rio de Janeiro, Brazil and estimate the impact of the new methodology.
View details
Improved Lower Bound for Differentially Private Facility Location
Information Processing Letters, 187 (2025)
Preview abstract
We consider the differentially private (DP) facility location problem in the so called super-set output setting proposed by Gupta et al. [GLM+10]. The current best known expected approximation ratio for an ε-DP algorithm is O(log n / √ε) due to Cohen-Addad et al. [CEF+22] where n denote the size of the metric space, meanwhile the best known lower bound is Ω(1/√ε) [EGLW19].
In this short note, we give a lower bound of Ω(min{log n, √(log n/ε)}) on the expected approximation ratio of any ε-DP algorithm, which is the first evidence that the approximation ratio has to grow with the size of the metric space.
View details
PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving
Mihir Parmar
Chitta Baral
Mingyang Ling
2025
Preview abstract
Recently, decomposing complex problems into simple subtasks--a crucial part of human-like natural planning--to solve the given problem has significantly boosted the performance of large language models (LLMs). However, leveraging such planning structures during post-training to boost the performance of smaller open-source LLMs remains underexplored. Motivated by this, we introduce Plan-Tuning, a unified post-training framework that (i) distills synthetic task decompositions (termed “planning trajectories”) from large-scale LLMs and (ii) fine-tunes smaller models via supervised and reinforcement-learning objectives designed to mimic these planning processes to improve complex reasoning. On GSM8k and the MATH benchmarks, plan-tuned models outperform strong baselines by an average ~7%. Furthermore, plan-tuned models show better generalization capabilities on out-of-domain datasets, with average ~10% and ~12% performance improvements on OlympiadBench and AIME 2024, respectively. Our detailed analysis demonstrates how planning trajectories improves complex reasoning capabilities, showing that Plan-Tuning is an effective strategy for improving task-specific performance of smaller LLMs.
View details
Mitigating Clinician Information Overload: Generative AI for Integrated EHR and RPM Data Analysis
Aman Raj
IEEE Compsac 2025 (2025)
Preview abstract
Generative AI (GenAI), particularly Large Language Models (LLMs), offer powerful capabilities for interpreting the complex data landscape in healthcare. In this paper, we present a comprehensive overview of the capabilities, requirements and applications of GenAI for deriving clinical insights and improving clinical efficiency. We first provide some background on the forms and sources of patient data, namely real-time Remote Patient Monitoring (RPM) streams and traditional Electronic Health Records (EHR). The sheer volume and heterogeneity of this combined data present significant challenges to clinicians and contribute to information overload.
In addition, we explore the potential of LLM-powered applications for improving clinical efficiency. These applications can enhance navigation of longitudinal patient data and provide actionable clinical decision support through natural language dialogue. We discuss the opportunities this presents for streamlining clinician workflows and personalizing care, alongside critical challenges such as data integration complexity, ensuring data quality and RPM data reliability, maintaining patient privacy, validating AI outputs for clinical safety, mitigating bias, and ensuring clinical acceptance. We believe this work represents the first summarization of GenAI techniques for managing clinician data overload due to combined RPM / EHR data complexities.
View details
Preview abstract
Measuring software development can help drive impactful change. However, it’s a complex task, and getting started can be daunting as it involves understanding what you should measure, and determining what you can measure. This article provides a guide to selecting a framework that aligns with organizational measurement strategy.
View details
Fast ACS: Low-Latency File-Based Ordered Message Delivery at Scale
Anil Raghunath Iyer
Neel Bagora
Chang Yu
Olivier Pomerleau
Vivek Kumar
Prunthaban Kanthakumar
Usenix Annual Technical Conference (2025)
Preview abstract
Low-latency message delivery is crucial for real-time systems. Data originating from a producer must be delivered to consumers, potentially distributed in clusters across metropolitan and continental boundaries. With the growing scale of computing, there can be several thousand consumers of the data. Such systems require a robust messaging system capable of transmitting messages containing data across clusters and efficiently delivering them to consumers. The system must offer guarantees like ordering and at-least-once delivery while avoiding overload on consumers, allowing them to consume messages at their own pace.
This paper presents the design of Fast ACS (an abbreviation for Ads Copy Service), a file-based ordered message delivery system that leverages a combination of two-sided (inter-cluster) and one-sided (intra-cluster) communication primitives—namely, Remote Procedure Call and Remote Direct Memory Access, respectively—to deliver messages. The system has been successfully deployed to dozens of production clusters and scales to accommodate several thousand consumers within each cluster, which amounts to Tbps-scale intra-cluster consumer traffic at peak. Notably, Fast ACS delivers messages to consumers across the globe within a few seconds or even sub-seconds (p99) based on the message volume and consumer scale, at a low resource cost.
View details
Preview abstract
The Latin script is often used informally to write languages with non-Latin native scripts. In many cases (e.g., most languages in India), there is no orthography, meaning that there is no conventional spelling of words in the Latin script, hence there will be high spelling variability in written text. Such romanization can render languages that are normally easily distinguished based on script highly confusable, such as Hindi and Urdu. In this work, we present methods to improve language identification of romanized text by improving methods to synthesize training sets. We find that training on synthetic samples which incorporate natural spelling variation yields higher language identification system accuracy than including available naturally occurring examples in the training set or even training higher capacity models. We demonstrate new state-of-the-art language identification performance on romanized text from 20 Indic languages in the Bhasha-Abhijnaanam evaluation set (Madhani et al., 2023a), improving test F1 from the reported 74.7% (using a pretrained neural model) to 85.4% using a linear classifier trained solely on synthetic data and 88.2% when also training on available harvested text.
View details
Preview abstract
Most contextual bandit algorithms assume that rewards are bounded uniformly, or with a high probability for each context and action. Furthermore, the theoretical regret, as well as empirical performance of most prevailing methods scales polynomially with this reward scale. In this work, we use ideas from robust mean estimation to design contextual bandit algorithms where the regret has only a mild logarithmic scaling on the reward scale, with a polynomial dependence on the second moment rather than the maximum reward value. Our algorithm is based on Catoni's mean estimator, is applicable to arbitrary non-linear function classes, and we present both variance aware and variance agnostic versions of our approach.
View details
Spherical dimension
Bogdan Chornomaz
Shay Moran
Tom Waknine
2025
Preview abstract
We introduce and study the \emph{spherical dimension}, a natural topological relaxation of the VC dimension that unifies several results in learning theory where topology plays a key role in the proofs. The spherical dimension is defined by extending the set of realizable datasets (used to define the VC dimension) to the continuous space of realizable distributions. In this space, a shattered set of size d (in the VC sense) is completed into a continuous object, specifically a d-dimensional sphere of realizable distributions. The spherical dimension is then defined as the dimension of the largest sphere in this space. Thus, the spherical dimension is at least the VC dimension.
The spherical dimension serves as a common foundation for leveraging the Borsuk-Ulam theorem and related topological tools. We demonstrate the utility of the spherical dimension in diverse applications, including disambiguations of partial concept classes, reductions from classification to stochastic convex optimization, stability and replicability, and sample compression schemes. Perhaps surprisingly, we show that the open question posed by Alon, Hanneke, Holzman, and Moran (FOCS 2021) of whether there exist non-trivial disambiguations for halfspaces with margin is equivalent to the basic open question of whether the VC and spherical dimensions are finite together.
View details
Preview abstract
Browser fingerprinting enables persistent cross-site user tracking via subtle techniques that often evade conventional defenses or cause website breakage when script-level blocking countermeasures are applied. Addressing these challenges requires detection methods offering both function-level precision to minimize breakage and inherent robustness against code obfuscation and URL manipulation.
We introduce ByteDefender, the first system leveraging V8 engine bytecode to detect fingerprinting operations specifically at the JavaScript function level. A Transformer-based classifier, trained offline on bytecode sequences, accurately identifies functions exhibiting fingerprinting behavior. We develop and evaluate lightweight signatures derived from this model to enable low-overhead, on-device matching against function bytecode during compilation but prior to execution, which only adds a 4% (average) latency to the page load time. This mechanism facilitates targeted, real-time prevention of fingerprinting function execution, thereby preserving legitimate script functionality. Operating directly on bytecode ensures inherent resilience against common code obfuscation and URL-based evasion. Our evaluation on the top 100k websites demonstrates high detection accuracy at both function- and script-level, with substantial improvements over state-of-the-art AST-based methods, particularly in robustness against obfuscation. ByteDefender offers a practical framework for effective, precise, and robust fingerprinting mitigation.
View details