Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 10501 publications
Differentiable Approximations for Distance Queries
David M. Mount
Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)
Preview abstract
The widespread use of gradient-based optimization has motivated the adaptation of various classical algorithms into differentiable solvers compatible with learning pipelines. In this paper, we investigate enhancing traditional geometric query problems so that the result consists of both the value of the geometric function and its gradient. Specifically, we study the fundamental problem of distance queries against a set of points P in R^d, which also underlies various similarity measures for learning algorithms.
The main result of this paper is a multiplicative (1+epsilon)-approximation of the Euclidean distance to P which is differentiable at all points in R^d \ P with asymptotically optimal bounds on the norms of its gradient and Hessian, from a data structure with storage and query time matching state-of-the-art results for approximate nearest-neighbor searching. The approximation is realized as a regularized distance through a partition-of-unity framework, which efficiently blends multiple local approximations, over a suitably defined covering of space, into a smooth global approximation. In order to obtain the local distance approximations in a manner that facilitates blending, we develop a new approximate Voronoi diagram based on a simple point-location data structure, simplifying away both the lifting transformation and ray shooting.
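As a rough illustration of the partition-of-unity construction described above (the covering, weight functions, and local approximations used in the paper are not reproduced here), the blended approximation has the form

d~(q) = sum_i w_i(q) * d_i(q),   with sum_i w_i(q) = 1 and each w_i >= 0 smooth,

where each d_i is a local (1+epsilon)-approximate distance valid on the i-th cell of the covering and each w_i is a smooth bump function supported on that cell; the smoothness of the weights is what yields differentiability of d~ on R^d \ P even where cells overlap.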
View details
The Case for Leveraging Transport Signals to Improve Internet Speed Test Efficiency
Cristina Leon
Computer Communication Review (2025) (to appear)
Preview abstract
Internet speed tests are an important tool to enable consumers and regulators to monitor the quality of Internet access. However, increased Internet speeds to the home and an increased demand for speed testing pose scaling challenges to providers of speed tests, who must maintain costly infrastructure to keep up with this demand. In recent years, this has led the popular NDT speed test to limit data transfer to a total of 250MB, which comes at the cost of accuracy for high bandwidth speed test clients.
In this paper, we observe that the NDT speed test server’s congestion control algorithm (BBRv1) is also trying to estimate the capacity of the connection. We leverage this observation and signals from BBR to improve the accuracy and efficiency of speed tests. We first show how leveraging signals from BBR can more than double the accuracy of a 10MB test (from 17% to 43%) for clients with speeds over 400Mbps.
We then show how using BBR signals to adaptively end the speed test reduces data transfer by 36% and increases accuracy by 13% for high bandwidth clients, relative to a 100MB fixed-length test. Even accounting for clients that never observe enough samples to utilize the BBR signal, this adaptive approach still uses 25% less data than a fixed 100MB test with 37-44% higher accuracy.
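A minimal sketch of the adaptive-termination idea (the way the BBR bandwidth estimate is sampled, the window size, and the stability tolerance below are illustrative assumptions, not the parameters used in the paper):

def run_adaptive_speed_test(sample_bbr_bandwidth, max_bytes=100_000_000,
                            window=5, rel_tolerance=0.05):
    """Stop a speed test early once the server-side BBR bandwidth estimate
    stabilizes, instead of always transferring a fixed amount of data.

    sample_bbr_bandwidth: callable returning (bytes_sent_so_far, bbr_bw_bps);
    how these signals are exposed to the client is deployment-specific.
    """
    estimates = []
    while True:
        bytes_sent, bw = sample_bbr_bandwidth()
        estimates.append(bw)
        if bytes_sent >= max_bytes:
            # Never stabilized: fall back to fixed-length behaviour.
            return max(estimates)
        if len(estimates) >= window:
            recent = estimates[-window:]
            spread = (max(recent) - min(recent)) / max(max(recent), 1)
            if spread <= rel_tolerance:
                # Recent estimates agree within tolerance: end the test early.
                return sum(recent) / window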
View details
Automated loss of pulse detection on a commercial smartwatch
Kamal Shah
Yiwen Chen
Anthony Stange
Lawrence Cai
Matt Wimmer
Pramod Rudrapatna
Shelten Yuen
Anupam Pathak
Shwetak Patel
Mark Malhotra
Marc Stogaitis
Jeanie Phan
Ali Connell
Jim Taylor
Jacqueline Shreibati
Daniel McDuff
Tajinder Gadh
Jake Sunshine
Nature, 642 (2025), pp. 174-181
Preview abstract
Out-of-hospital cardiac arrest is a time-sensitive emergency that requires prompt identification and intervention: sudden, unwitnessed cardiac arrest is nearly unsurvivable. A cardinal sign of cardiac arrest is sudden loss of pulse. Automated biosensor detection of unwitnessed cardiac arrest, and dispatch of medical assistance, may improve survivability given the substantial prognostic role of time, but only if the false-positive burden on public emergency medical systems is minimized. Here we show that a multimodal, machine learning-based algorithm on a smartwatch can reach performance thresholds making it deployable at a societal scale. First, we show that wearable photoplethysmography measurements of peripheral pulselessness (induced through an arterial occlusion model) manifest similarly to pulselessness caused by a common cardiac arrest arrhythmia, ventricular fibrillation. On the basis of the similarity of the photoplethysmography signal (from ventricular fibrillation or arterial occlusion), we developed and validated a loss of pulse detection algorithm using data from peripheral pulselessness and free-living conditions. Following its development, we evaluated the end-to-end algorithm prospectively: there was 1 unintentional emergency call per 21.67 user-years across two prospective studies; the sensitivity was 67.23% (95% confidence interval of 64.32% to 70.05%) in a prospective arterial occlusion cardiac arrest simulation model. These results indicate an opportunity, deployable at scale, for wearable-based detection of sudden loss of pulse while minimizing societal costs of excess false detections.
View details
Preview abstract
Cloud platforms have been virtualizing storage devices like flash-based solid-state drives (SSDs) to make effective use of storage resources. They enable either software-isolated or hardware-isolated instances to facilitate storage sharing between multi-tenant applications. However, for decades they have had to combat the fundamental tussle between performance isolation and resource utilization, suffering from either long tail latency caused by weak isolation or low storage utilization caused by strong isolation.
In this paper, we present FleetIO, a learning-based storage virtualization framework that employs reinforcement learning (RL) for managing virtualized SSDs. FleetIO explores the unique features of RL to handle the dynamic changes of application workloads and storage states, and integrates storage scheduling into the RL decision-making process. It achieves both performance isolation and improved storage utilization by enabling dynamic fine-grained storage harvesting across co-located application instances, while minimizing its negative impact on their service-level objectives (SLOs). FleetIO clusters workloads into different types (e.g., latency-sensitive and bandwidth-intensive) based on the collected I/O traces at runtime, and fine-tunes the RL reward functions for each type of workload. We implement FleetIO on a real programmable SSD board and evaluate it with diverse cloud applications. We show that FleetIO improves the overall storage utilization of the shared SSD by up to 1.4×, and decreases the tail latency of I/O requests by 1.5× on average, compared to the state-of-the-art storage sharing approaches.
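A toy sketch of the per-workload-type reward shaping that the abstract alludes to (the actual FleetIO state/action spaces, reward terms, and weights are not given here; everything below is an illustrative assumption):

def shaped_reward(workload_type, tail_latency_ms, slo_latency_ms,
                  achieved_bw_mbps, harvested_bw_mbps):
    """Illustrative RL reward: emphasize SLO compliance for latency-sensitive
    tenants, and throughput plus harvested spare bandwidth for
    bandwidth-intensive tenants."""
    # Negative when the observed tail latency exceeds the SLO, zero otherwise.
    slo_term = min(0.0, (slo_latency_ms - tail_latency_ms) / slo_latency_ms)
    if workload_type == "latency_sensitive":
        return 2.0 * slo_term + 0.1 * harvested_bw_mbps
    return 0.5 * slo_term + 1.0 * achieved_bw_mbps + 0.5 * harvested_bw_mbps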
View details
AI Agents for Cloud Reliability: Autonomous Threat Detection and Mitigation Aligned with Site Reliability Engineering Principles
Karan Anand
Mourya Chigurupati
2025
Sufficient Context: A New Lens on Retrieval Augmented Generation Systems
Hailey Joren
Jianyi Zhang
Chun-Sung Ferng
Ankur Taly
International Conference on Learning Representations (ICLR) (2025)
Preview abstract
Augmenting LLMs with context leads to improved performance across many applications. Despite much research on Retrieval Augmented Generation (RAG) systems, an open question is whether errors arise because LLMs fail to utilize the context from retrieval or because the context itself is insufficient to answer the query. To shed light on this, we develop a new notion of sufficient context, along with a method to classify instances that have enough information to answer the query. We then use sufficient context to analyze several models and datasets. By stratifying errors based on context sufficiency, we find that larger models with higher baseline performance (Gemini 1.5 Pro, GPT 4o, Claude 3.5) excel at answering queries when the context is sufficient, but often output incorrect answers instead of abstaining when the context is not. On the other hand, smaller models with lower baseline performance (Llama 3.1, Mistral 3, Gemma 2) hallucinate or abstain often, even with sufficient context. We further categorize cases where the context is useful and improves accuracy, even though it does not fully answer the query and the model errs without the context. Building on our findings, we explore ways to reduce hallucinations in RAG systems, including a new selective generation method that leverages sufficient context information for guided abstention. Our method improves the fraction of correct answers among the cases where the model responds by 2-10% for Gemini, GPT, and Gemma.
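A minimal sketch of selective generation guided by a sufficient-context signal (the classifier, the threshold, and the abstention wording below are illustrative assumptions rather than the paper's exact method):

def answer_or_abstain(query, retrieved_context, generate, is_sufficient,
                      threshold=0.5):
    """Abstain when the retrieved context is judged insufficient.

    generate(query, context) -> model answer string
    is_sufficient(query, context) -> estimated probability that the context
    contains enough information to answer the query (e.g., from an autorater).
    """
    if is_sufficient(query, retrieved_context) < threshold:
        # Guided abstention: prefer "I don't know" over a likely hallucination.
        return "I don't know based on the provided context."
    return generate(query, retrieved_context)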
View details
On the Design of the Binaural Rendering Library for Eclipsa Audio Immersive Audio Container
Tomasz Rudzki
Gavin Kearney
AES 158th Convention of the Audio Engineering Society (2025)
Preview abstract
Immersive Audio Media and Formats (IAMF), also known as Eclipsa Audio, is an open-source audio container developed to accommodate multichannel and scene-based audio formats. Headphone-based delivery of IAMF audio requires efficient binaural rendering. This paper introduces the Open Binaural Renderer (OBR), which is designed to render IAMF audio. It discusses the core rendering algorithm, the binaural filter design process, and the real-time implementation of the renderer in the form of an open-source C++ rendering library. Designed for multi-platform compatibility, the renderer incorporates a novel approach to binaural audio processing, leveraging a combination of a spherical harmonic (SH)-based virtual listening room model and anechoic binaural filters. Through its design, the IAMF binaural renderer provides a robust solution for delivering high-quality immersive audio across diverse platforms and applications.
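At a very high level, rendering a scene-based (Ambisonics/SH-domain) signal to binaural output amounts to convolving each SH channel with a pair of binaural filters and summing; a minimal numpy sketch of that final step, with the filter design, room model, and the actual OBR C++ API out of scope:

import numpy as np
from scipy.signal import fftconvolve

def sh_to_binaural(sh_signal, left_filters, right_filters):
    """Render an SH-domain signal to binaural stereo.

    sh_signal:     (n_channels, n_samples) Ambisonics channels
    left_filters:  (n_channels, filter_len) per-channel filters, left ear
    right_filters: (n_channels, filter_len) per-channel filters, right ear
    Returns a (2, n_samples + filter_len - 1) stereo buffer.
    """
    left = sum(fftconvolve(ch, f) for ch, f in zip(sh_signal, left_filters))
    right = sum(fftconvolve(ch, f) for ch, f in zip(sh_signal, right_filters))
    return np.stack([left, right])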
View details
Day-of-the-week Awareness in Time of Day Breakpoints for Traffic Light Plans
Ori Rottenstreich
Eliav Buchnik
Shai Ferster
Tom Kalvari
Ron Tsibulsky
Danny Veikherman
Jack Haddad
2025
Preview abstract
Time-of-day breakpoints (TODs) refer to the times over the day at which the plan of a traffic light is changed. Traditionally, TODs are selected jointly for all weekdays (Monday-Friday), typically with additional TODs dedicated to weekends. In this paper, we present an alternative approach, motivated by the observation that traffic characteristics can differ among the weekdays Monday-Friday, and consider TODs that are day-of-the-week aware. The approach studies similarities among days and computes TODs that can be shared by days with similar characteristics while taking other forms for weekdays with unique characteristics. Based on traffic properties derived from anonymized trajectories, we apply the new methodology to compute day-of-the-week-aware time-of-day breakpoints in the city of Rio de Janeiro, Brazil, and estimate the impact of the new methodology.
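A toy sketch of the day-similarity step (grouping weekdays whose traffic profiles are close so that each group can share one set of TOD breakpoints; the features, distance measure, and breakpoint computation used in the paper are not reproduced here):

import numpy as np

def group_similar_weekdays(hourly_volume_by_day, threshold=0.1):
    """Greedily group weekdays with similar normalized hourly traffic profiles.

    hourly_volume_by_day: dict such as {"Mon": [24 hourly volumes], ...}
    Returns groups of day names; each group would share TOD breakpoints.
    """
    profiles = {day: np.asarray(v, dtype=float) / max(np.sum(v), 1e-9)
                for day, v in hourly_volume_by_day.items()}
    groups = []
    for day, profile in profiles.items():
        for group in groups:
            representative = profiles[group[0]]
            if np.abs(profile - representative).sum() <= threshold:  # L1 distance
                group.append(day)
                break
        else:
            groups.append([day])
    return groups  # e.g. [["Mon", "Tue", "Wed", "Thu"], ["Fri"]]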
View details
In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialog Agents
Zhen Tan
George Lee
Anand Iyer
Tianlong Chen
Huan Liu
ACL 2025
Preview abstract
Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their inability to retain and retrieve relevant information from long-term interactions limits their effectiveness in applications requiring sustained personalization. External memory mechanisms have been proposed to address this limitation, enabling LLMs to maintain conversational continuity. However, existing approaches struggle with two key challenges. First, rigid memory granularity fails to capture the natural semantic structure of conversations, leading to fragmented and incomplete representations. Second, fixed retrieval mechanisms cannot adapt to diverse dialogue contexts and user interaction patterns. In this work, we propose Reflective Memory Management (RMM), a novel mechanism for long-term dialogue agents, integrating forward- and backward-looking reflections: (1) Prospective Reflection, which dynamically summarizes interactions across granularities—utterances, turns, and sessions—into a personalized memory bank for effective future retrieval, and (2) Retrospective Reflection, which iteratively refines the retrieval in an online reinforcement learning (RL) manner based on LLMs’ cited evidence. Experiments show that RMM demonstrates consistent improvement across various metrics and benchmarks. For example, RMM shows more than 10% accuracy improvement over the baseline without memory management on the LongMemEval dataset.
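A compact sketch of the two reflection loops described above (the memory granularities, retriever, and update rule below are illustrative placeholders, not the RMM implementation):

class ReflectiveMemory:
    """Toy memory bank: prospective reflection stores multi-granularity
    summaries; retrospective reflection re-weights retrieval using which
    memories the LLM actually cited in its response."""

    def __init__(self, summarize, embed):
        self.summarize = summarize   # text -> summary string
        self.embed = embed           # text -> list of floats
        self.entries = []            # [embedding, summary, weight]

    def prospective_reflect(self, session_turns):
        # Summarize at turn and session granularity for future retrieval.
        for unit in session_turns + [" ".join(session_turns)]:
            summary = self.summarize(unit)
            self.entries.append([self.embed(summary), summary, 1.0])

    def retrieve(self, query, k=3):
        qv = self.embed(query)
        score = lambda e: e[2] * sum(x * y for x, y in zip(e[0], qv))
        return sorted(self.entries, key=score, reverse=True)[:k]

    def retrospective_reflect(self, retrieved, cited_summaries):
        # Online update: boost memories the model cited, decay the rest.
        for entry in retrieved:
            entry[2] *= 1.1 if entry[1] in cited_summaries else 0.9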
View details
GOALIE (GOAL oriented IntErventions) Proactive Multimodal Agent to Assist Augmented Reality
Saptarashmi Bandyopadhyay
Vikas Bahirwani
Lavisha Aggarwal
Bhanu Guda
Lin Li
Qin Liu
Tom Goldstein
John Dickerson
Andrea Colaco
2025
Preview abstract
Multimodal AI Agents are helpful for assisting and guiding users in completing real-time tasks such as cooking, robotics, and manufacturing. An emerging form of multimodal communication is Augmented Reality (AR), where an AI Agent can enhance the user experience with step-by-step guidance of tasks by observing the user's vision and language inputs. Current LLM- or VLM-based agents are reactive, waiting for a user query before responding. Proactive AI Agents in AR focus on detecting when the AI Agent should autonomously intervene to fix mistakes or follow up on an instruction. Our GOALIE (GOAL-oriented IntErvention) Agent is the first multimodal proactive AR agent that guides the user step-by-step on its own. We build an innovative Zero-Shot Prompting framework, PSoS (Proactive Sequence of Steps), with the context of abstract past user actions, the agent's previous responses, and the user's granular goals and actions before it is detected that the AI Agent should intervene. We use PSoS for Supervised Finetuning (SFT), Direct Preference Optimization (DPO), and Group-Relative Policy Optimization (GRPO) finetuning of our AI agent to improve the quality of the agent's proactive interventions. We also propose a new algorithmic framework, Bagged group Relative Policy Optimization (BRPO), to reduce the variance in rewards of generation groups, to adapt the finetuning algorithm for multimodal proactive interventions by the AI Agent, and to enable real-time finetuning of the AI model. We compare the step-by-step intervention quality and efficiency of the GOALIE Agent with Gemma-3 models and other VLMs on task execution with human expert labels. We conduct human evaluation of the proactive interventions, demonstrating user satisfaction with the GOALIE Agent's proactive interventions. We will release the code, model and human evaluation data.
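The abstract does not spell out BRPO; the sketch below only illustrates the stated idea of damping reward variance by averaging group-relative advantages over several random "bags" of generations (the bag size, number of bags, and normalization are all assumptions):

import random

def bagged_group_relative_advantages(rewards, n_bags=4, bag_size=None):
    """Advantage of each generation relative to baselines computed from
    random bags of the group, averaged over bags to reduce variance.

    rewards: one scalar reward per sampled generation in the group.
    """
    n = len(rewards)
    bag_size = bag_size or max(1, n // 2)
    advantages = [0.0] * n
    for _ in range(n_bags):
        bag = random.sample(range(n), bag_size)
        baseline = sum(rewards[i] for i in bag) / bag_size
        for i in range(n):
            advantages[i] += (rewards[i] - baseline) / n_bags
    return advantages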
View details
Preview abstract
Visual in-context learning (VICL), as a new paradigm in computer vision, allows the model to rapidly adapt to various tasks with only a handful of prompts and examples. While effective, the existing VICL paradigm exhibits poor generalizability under distribution shifts. In this work, we propose test-time visual in-context tuning (VICT), a method that can learn adaptive VICL models on the fly with a single test sample. Specifically, we flip the role between task prompts and the test sample and use a cycle consistency loss to reconstruct the original task prompt output. Our key insight is that a model should be aware of a new test distribution if it can successfully recover the original task prompts. Extensive experiments on seven representative vision tasks with 15 corruptions demonstrate that our VICT can improve the generalizability of VICL to unseen new domains.
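A schematic of the role-flipping cycle-consistency objective described above, in PyTorch style (the model call signature, loss, and optimizer step are illustrative assumptions):

import torch
import torch.nn.functional as F

def vict_test_time_step(model, prompt_in, prompt_out, test_image, optimizer):
    """One test-time tuning step on a single test sample.

    model(context_input, context_output, query) -> predicted output tensor.
    """
    # Standard VICL pass: predict the test sample's output from the prompt.
    test_pred = model(prompt_in, prompt_out, test_image)
    # Flip roles: use the test pair as the prompt and recover the original
    # task prompt output via a cycle consistency loss.
    recovered = model(test_image, test_pred, prompt_in)
    cycle_loss = F.mse_loss(recovered, prompt_out)
    optimizer.zero_grad()
    cycle_loss.backward()
    optimizer.step()
    return cycle_loss.item()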
View details
Heterogeneous Swarms: Jointly Optimizing Model Roles and Weights for Multi-LLM Systems
Shangbin Feng
Yike Wang
Weijia Shi
Huang Xia
Luke Zettlemoyer
Yulia Tsvetkov
NeurIPS 2025
Preview abstract
We propose Heterogeneous Swarms, an algorithm to discover and adapt multi-LLM systems by jointly optimizing model roles and weights. Given a pool of LLM experts and a utility function, Heterogeneous Swarms employs two iterative steps: role-step and weight-step. For role-step, we interpret model roles as input-output relationships and optimize the directed acyclic graph (DAG) of LLMs representing a multi-LLM system. Starting from a swarm of randomly initialized continuous adjacency matrices, we decode them into discrete DAGs, call the LLMs in topological order with message passing, evaluate on the utility function, and optimize the adjacency matrices with swarm intelligence based on the utility score. For weight-step, we define JFK-score to evaluate the contribution of individual LLMs in the best-found DAG of the role-step, then optimize model weights with swarm intelligence based on the JFK-score. Extensive experiments demonstrate that Heterogeneous Swarms outperforms 15 baselines spanning role-based and weight-based approaches by 18.5% on average across 12 tasks and contexts. Further analysis reveals that Heterogeneous Swarms discovers multi-LLM systems with heterogeneous model roles and substantial collaborative gains, and benefits from the diversity of initial LLMs.
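A small sketch of the role-step decoding described above (continuous adjacency matrix -> DAG -> topological-order LLM calls); the decoding rule and message format are illustrative assumptions:

import numpy as np

def decode_dag(adjacency, threshold=0.5):
    """Decode a continuous adjacency matrix into a DAG by ranking nodes
    (heuristically, by total incoming weight) and keeping above-threshold
    edges that point from earlier- to later-ranked nodes."""
    order = list(np.argsort(adjacency.sum(axis=0)))
    rank = {node: i for i, node in enumerate(order)}
    n = len(order)
    edges = [(i, j) for i in range(n) for j in range(n)
             if adjacency[i, j] > threshold and rank[i] < rank[j]]
    return order, edges

def run_swarm(llms, adjacency, task_input):
    """Call the LLMs in topological order, passing each node the outputs of
    its predecessors as extra context; llms[i] is a callable(text) -> text."""
    order, edges = decode_dag(adjacency)
    outputs = {}
    for node in order:
        predecessor_outputs = [outputs[i] for i, j in edges if j == node]
        outputs[node] = llms[node]("\n".join([task_input] + predecessor_outputs))
    return outputs[order[-1]]  # output of the final node in the pipeline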
View details
Preview abstract
Virtual hand representation in Head-Mounted Displays (HMDs) offers immersive and intuitive interactions in Virtual Reality (VR). However, current hand tracking algorithms are prone to errors, which can disrupt the user experience and hinder task performance. This paper presents a novel method for providing users with visual feedback when the quality of hand tracking decreases. Our approach employs a notification modal that warns users of potential failures. We identified three common hand tracking failure scenarios and evaluated the effectiveness of our method in two distinct VR tasks: object manipulation and complex assembly tasks. Results show that our early warning system reduces task completion time, lowers hand-tracking failures by up to 83%, decreases errors, improves system usability, and reduces cognitive load. This work contributes to the development of more robust and user-friendly VR HMD applications by enhancing hand tracking reliability, usability, and workload.
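A trivial sketch of the early-warning loop (the confidence source, threshold, and debounce interval are assumptions; the study's actual failure detector and modal design are not described here):

import time

def monitor_hand_tracking(get_confidence, show_warning, hide_warning,
                          low_threshold=0.6, debounce_s=0.5):
    """Show a warning modal when hand-tracking confidence stays below a
    threshold for longer than a debounce interval; hide it on recovery."""
    below_since = None
    warning_visible = False
    while True:
        confidence = get_confidence()        # e.g. per-frame value in [0, 1]
        now = time.monotonic()
        if confidence < low_threshold:
            below_since = below_since if below_since is not None else now
            if not warning_visible and now - below_since >= debounce_s:
                show_warning()
                warning_visible = True
        else:
            below_since = None
            if warning_visible:
                hide_warning()
                warning_visible = False
        time.sleep(1 / 60)                   # roughly once per frame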
View details
Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts
Marc Stogaitis
Tajinder Gadh
Richard Allen
Alexei Barski
Robert Bosch
Patrick Robertson
Youngmin Cho
Nivetha Thiruverahan
Aman Raj
Geophysical Journal International (2025), ggae436
Preview abstract
This paper presents a novel approach for estimating ground shaking intensity using real-time social media data and CCTV footage. Employing Gemini 1.5 Pro (Reid et al. 2024), a multi-modal language model, we demonstrate the ability to extract relevant information from unstructured data utilizing generative AI and natural language processing. The model's output, in the form of Modified Mercalli Intensity (MMI) values, aligns well with independent observational data. Furthermore, our results suggest that beyond its advanced visual and auditory understanding abilities, Gemini appears to utilize additional sources of knowledge, including a simplified understanding of the general relationship between earthquake magnitude, distance, and MMI intensity, which it presumably acquired during its training, in its reasoning and decision-making processes. These findings raise intriguing questions about the extent of Gemini's general understanding of the physical world and its phenomena. Gemini's ability to generate results consistent with established scientific knowledge highlights the potential of LLMs like Gemini in augmenting our understanding of complex physical phenomena such as earthquakes. More specifically, the results of this study highlight the potential of LLMs like Gemini to revolutionize citizen seismology by enabling rapid, effective, and flexible analysis of crowdsourced data from eyewitness accounts for assessing earthquake impact and providing crisis situational awareness. This approach holds great promise for improving early warning systems, disaster response, and overall resilience in earthquake-prone regions. This study provides a significant step toward harnessing the power of social media and AI for earthquake disaster mitigation.
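A hedged sketch of how such a multimodal prompt might be assembled (the actual prompts, output schema, and model invocation used in the paper are not shown; call_multimodal_model below is a hypothetical placeholder, not a real API):

def build_mmi_prompt(post_text, media_description, location, timestamp):
    """Assemble a prompt asking a multimodal model to estimate Modified
    Mercalli Intensity (MMI) from a social media post or CCTV clip."""
    return (
        "You are assessing earthquake shaking intensity.\n"
        f"Post text: {post_text}\n"
        f"Attached media (supplied separately): {media_description}\n"
        f"Location: {location}; time: {timestamp}\n"
        "Estimate the Modified Mercalli Intensity (MMI, 1-12) experienced at "
        "this location and briefly justify the estimate."
    )

# Hypothetical usage; call_multimodal_model stands in for whatever client is used.
# mmi_estimate = call_multimodal_model(build_mmi_prompt(text, media, loc, ts), media)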
View details
Global earthquake detection and warning using Android phones
Marc Stogaitis
Youngmin Cho
Richard Allen
Boone Spooner
Patrick Robertson
Micah Berman
Greg Wimpey
Robert Bosch
Nivetha Thiruverahan
Steve Malkos
Alexei Barski
Science, 389 (2025), pp. 254-259
Preview abstract
Earthquake early-warning systems are increasingly being deployed as a strategy to reduce losses in earthquakes, but the regional seismic networks they require do not exist in many earthquake-prone countries. We use the global Android smartphone network to develop an earthquake detection capability, an alert delivery system, and a user feedback framework. Over 3 years of operation, the system detected an average of 312 earthquakes per month with magnitudes from M 1.9 to M 7.8 in Türkiye. Alerts were delivered in 98 countries for earthquakes with M ≥4.5, corresponding to ~60 events and 18 million alerts per month. User feedback shows that 85% of people receiving an alert felt shaking, and 36, 28, and 23% received the alert before, during, and after shaking, respectively. We show how smartphone-based earthquake detection algorithms can be implemented at scale and improved through postevent analysis.
View details