Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 10501 publications
    Differentiable Approximations for Distance Queries
    David M. Mount
    Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)
    Preview abstract The widespread use of gradient-based optimization has motivated the adaptation of various classical algorithms into differentiable solvers compatible with learning pipelines. In this paper, we investigate the enhancement of traditional geometric query problems such that the result consists of both the geometric function as well as its gradient. Specifically, we study the fundamental problem of distance queries against a set of points P in R^d, which also underlies various similarity measures for learning algorithms. The main result of this paper is a multiplicative (1+epsilon)-approximation of the Euclidean distance to P which is differentiable at all points in R^d \ P with asymptotically optimal bounds on the norms of its gradient and Hessian, from a data structure with storage and query time matching state-of-the-art results for approximate nearest-neighbor searching. The approximation is realized as a regularized distance through a partition-of-unity framework, which efficiently blends multiple local approximations, over a suitably defined covering of space, into a smooth global approximation. In order to obtain the local distance approximations in a manner that facilitates blending, we develop a new approximate Voronoi diagram based on a simple point-location data structure, simplifying away both the lifting transformation and ray shooting. View details
    Preview abstract Internet speed tests are an important tool to enable consumers and regulators to monitor the quality of Internet access. However, increased Internet speeds to the home and an increased demand for speed testing pose scaling challenges to providers of speed tests, who must maintain costly infrastructure to keep up with this demand. In recent years, this has led the popular NDT speed test to limit data transfer to a total of 250MB, which comes at the cost of accuracy for high bandwidth speed test clients. In this paper, we observe that the NDT speed test server’s congestion control algorithm (BBRv1) is also trying to estimate the capacity of the connection. We leverage this observation and signals from BBR to improve the accuracy and efficiency of speed tests. We first show how leveraging signals from BBR can more than double the accuracy of a 10MB test–from 17% to 43%–for clients with speeds over 400Mbps. We then show how using BBR signals to adaptively end the speed test reduces data transfer by 36% and increased accuracy by 13% for high bandwidth clients, relative to a 100MB fixed length test. Even accounting for clients that never observe enough samples to utilize the BBR signal, this adaptive approach still uses 25% less data than a fixed 100MB test with 37-44% higher accuracy. View details
    Automated loss of pulse detection on a commercial smartwatch
    Kamal Shah
    Yiwen Chen
    Anthony Stange
    Lawrence Cai
    Matt Wimmer
    Pramod Rudrapatna
    Shelten Yuen
    Anupam Pathak
    Shwetak Patel
    Mark Malhotra
    Marc Stogaitis
    Jeanie Phan
    Ali Connell
    Jim Taylor
    Jacqueline Shreibati
    Daniel McDuff
    Tajinder Gadh
    Jake Sunshine
    Nature, 642 (2025), pp. 174-181
    Preview abstract Out-of-hospital cardiac arrest is a time-sensitive emergency that requires prompt identification and intervention: sudden, unwitnessed cardiac arrest is nearly unsurvivable. A cardinal sign of cardiac arrest is sudden loss of pulse. Automated biosensor detection of unwitnessed cardiac arrest, and dispatch of medical assistance, may improve survivability given the substantial prognostic role of time, but only if the false-positive burden on public emergency medical systems is minimized. Here we show that a multimodal, machine learning-based algorithm on a smartwatch can reach performance thresholds making it deployable at a societal scale. First, using photoplethysmography, we show that wearable photoplethysmography measurements of peripheral pulselessness (induced through an arterial occlusion model) manifest similarly to pulselessness caused by a common cardiac arrest arrhythmia, ventricular fibrillation. On the basis of the similarity of the photoplethysmography signal (from ventricular fibrillation or arterial occlusion), we developed and validated a loss of pulse detection algorithm using data from peripheral pulselessness and free-living conditions. Following its development, we evaluated the end-to-end algorithm prospectively: there was 1 unintentional emergency call per 21.67 user-years across two prospective studies; the sensitivity was 67.23% (95% confidence interval of 64.32% to 70.05%) in a prospective arterial occlusion cardiac arrest simulation model. These results indicate an opportunity, deployable at scale, for wearable-based detection of sudden loss of pulse while minimizing societal costs of excess false detections. View details
    Preview abstract Cloud platforms have been virtualizing storage devices like flash-based solid-state drives (SSDs) to make effective use of storage resources. They enable either software-isolated instance or hardware-isolated instance for facilitating the storage sharing between multi-tenant applications. However, for decades, they have to combat the fundamental tussle between the performance isolation and resource utilization. They suffer from either long tail latency caused by weak isolation or low storage utilization caused by strong isolation. In this paper, we present FleetIO, a learning-based storage virtualization framework that employs reinforcement learning (RL) for managing virtualized SSDs. FleetIO explores the unique features of RL to handle the dynamic changes of application workloads and storage states, and integrates the storage scheduling into the RL decision-making process. It achieves both performance isolation and improved storage utilization by enabling dynamic fine-grained storage harvesting across co-located application instances, while minimizing its negative impact on their service-level objectives (SLOs). FleetIO clusters workloads into different types (e.g., latency-sensitive and bandwidth-intensive) based on the collected I/O traces at runtime, and fine-tunes the RL reward functions for each type of workloads. We implement FleetIO on a real programmable SSD board and evaluate it with diverse cloud applications. We show that FleetIO improves the overall storage utilization of the shared SSD by up to 1.4×, and decreases the tail latency of I/O requests by 1.5× on average, compared to the state-of-the-art storage sharing approaches. View details
    Sufficient Context: A New Lens on Retrieval Augmented Generation Systems
    Hailey Joren
    Jianyi Zhang
    Chun-Sung Ferng
    Ankur Taly
    International Conference on Learning Representations (ICLR) (2025)
    Preview abstract Augmenting LLMs with context leads to improved performance across many applications. Despite much research on Retrieval Augmented Generation (RAG) systems, an open question is whether errors arise because LLMs fail to utilize the context from retrieval or the context itself is insufficient to answer the query. To shed light on this, we develop a new notion of sufficient context, along with a method to classify instances that have enough information to answer the query. We then use sufficient context to analyze several models and datasets. By stratifying errors based on context sufficiency, we find that larger models with higher baseline performance (Gemini 1.5 Pro, GPT 4o, Claude 3.5) excel at answering queries when the context is sufficient, but often output incorrect answers instead of abstaining when the context is not. On the other hand, smaller models with lower baseline performance (Llama 3.1, Mistral 3, Gemma 2) hallucinate or abstain often, even with sufficient context. We further categorize cases when the context is useful, and improves accuracy, even though it does not fully answer the query and the model errs without the context. Building on our findings, we explore ways to reduce hallucinations in RAG systems, including a new selective generation method that leverages sufficient context information for guided abstention. Our method improves the fraction of correct answers among times where the model responds by 2--10% for Gemini, GPT, and Gemma. View details
    On the Design of the Binaural Rendering Library for Eclipsa Audio Immersive Audio Container
    Tomasz Rudzki
    Gavin Kearney
    AES 158th Convention of the Audio Engineering Society (2025)
    Preview abstract Immersive Audio Media and Formats (IAMF), also known as Eclipsa Audio, is an open-source audio container developed to accommodate multichannel and scene-based audio formats. Headphone-based delivery of IAMF audio requires efficient binaural rendering. This paper introduces the Open Binaural Renderer (OBR), which is designed to render IAMF audio. It discusses the core rendering algorithm, the binaural filter design process as well as real-time implementation of the renderer in a form of an open-source C++ rendering library. Designed for multi-platform compatibility, the renderer incorporates a novel approach to binaural audio processing, leveraging a combination of spherical harmonic (SH) based virtual listening room model and anechoic binaural filters. Through its design, the IAMF binaural renderer provides a robust solution for delivering high-quality immersive audio across diverse platforms and applications. View details
    Day-of-the-week Awareness in Time of Day Breakpoints for Traffic Light Plans
    Ori Rottenstreich
    Eliav Buchnik
    Shai Ferster
    Tom Kalvari
    Ron Tsibulsky
    Danny Veikherman
    Jack Haddad
    2025
    Preview abstract Time-of-day breakpoints (TODs) refer to the times over the day in which the plan of a traffic light is changed. Traditionally, TODs are selected jointly for all weekdays (Monday-Friday), typically with additional TODs dedicated to weekends. In this paper, we present an alternative approach motivated by traffic characteristics that can differ among the weekdays Monday-Friday and consider TODs which are day-of-the-week aware. The traffic-aware approach studies similarities among days and computes TODs that can be shared among days with similar characteristics but can also have other forms for weekdays with unique characteristics. Based on traffic properties derived from anonymized trajectories, we apply the new methodology to compute time-of-day breakpoints that are day-of-the-week aware in the city of Rio de Janeiro, Brazil and estimate the impact of the new methodology. View details
    Preview abstract Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their inability to retain and retrieve relevant information from long-term interactions limits their effectiveness in applications requiring sustained personalization. External memory mechanisms have been proposed to address this limitation, enabling LLMs to maintain conversational continuity. However, existing approaches struggle with two key challenges. First, rigid memory granularity fails to capture the natural semantic structure of conversations, leading to fragmented and incomplete representations. Second, fixed retrieval mechanisms cannot adapt to diverse dialogue contexts and user interaction patterns. In this work, we propose Reflective Memory Management (RMM), a novel mechanism for long-term dialogue agents, integrating forward- and backward-looking reflections: (1) Prospective Reflection, which dynamically summarizes interactions across granularities—utterances, turns, and sessions—into a personalized memory bank for effective future retrieval, and (2) Retrospective Reflection, which iteratively refines the retrieval in an online reinforcement learning (RL) manner based on LLMs’ cited evidence. Experiments show that RMM demonstrates consistent improvement across various metrics and benchmarks. For example, RMM shows more than 10% accuracy improvement over the baseline without memory management on the LongMemEval dataset. View details
    GOALIE (GOAL oriented IntErventions) Proactive Multimodal Agent to Assist Augmented Reality
    Saptarashmi Bandyopadhyay
    Vikas Bahirwani
    Lavisha Aggarwal
    Bhanu Guda
    Lin Li
    Qin Liu
    Tom Goldstein
    John Dickerson
    Andrea Colaco
    2025
    Preview abstract Multimodal AI Agents are helpful to assist and guide users in completing real-time tasks like cooking, robotics, manufacturing. An emerging form of multimodal communication is Augmented Reality (AR), where an AI Agent can enhance user experience with step-by-step guidance of tasks by observing the user's vision and language inputs. Current LLM or VLM based agents are reactive, waiting for an user query before responding. Proactive AI Agents in AR focus on detecting when the AI Agent should autonomously intervene to fix mistakes or followup any instruction. Our GOALIE (GOAL-oriented IntErvention) Agent is the first multimodal proactive AR agent which guides the user step-by-step on its own. We build an innovative Zero-Shot Prompting framework PSoS (Proactive Sequence of Steps) with the context of abstract past user actions, the agent's previous responses, and the user's granular goals and actions before it is detected that the AI Agent should intervene. We use PSoS for Supervised Finetuning (SFT), Direct Preference Optimization (DPO) and Group-Relative Policy Optimization (GRPO) finetuning of our AI agent to improve the quality of the agent's proactive intervention. We also propose a new algorithmic framework, Bagged group Relative Policy Optimization (BRPO), to reduce the variance in rewards of generation groups, to adapt the finetuning algorithm for multimodal proactive interventions by the AI Agent and to enable real-time finetuning of the AI model. We compare the step-by-step intervention quality and efficiency of the GOALIE Agent with Gemma-3 models along with other VLMs for task execution with human expert labels. We conduct human evaluation of the proactive interventions, demonstrating user satisfaction with the GOALIE Agent's proactive interventions. We will release the code, model and human evaluation data. View details
    Test-Time Visual In-Context Tuning
    Jiahao Xie
    Nathalie Rauschmayr
    Bernt Schiele
    2025
    Preview abstract Visual in-context learning (VICL), as a new paradigm in computer vision, allows the model to rapidly adapt to various tasks with only a handful of prompts and examples. While effective, the existing VICL paradigm exhibits poor generalizability under distribution shifts. In this work, we propose test-time visual in-context tuning (VICT), a method that can learn adaptive VICL models on the fly with a single test sample. Specifically, We flip the role between task prompts and the test sample and use a cycle consistency loss to reconstruct the original task prompt output. Our key insight is that a model should be aware of a new test distribution if it can successfully recover the original task prompts. Extensive experiments on seven representative vision tasks with 15 corruptions demonstrate that our VICT can improve the generalizability of VICL to unseen new domains View details
    Preview abstract We propose Heterogeneous Swarms, an algorithm to discover and adapt multi-LLM systems by jointly optimizing model roles and weights. Given a pool of LLM experts and a utility function, Heterogeneous Swarms employs two iterative steps: role-step and weight-step. For role-step, we interpret model roles as input-output relationships and optimize the directed acyclic graph (DAG) of LLMs representing a multi-LLM system. Starting from a swarm of randomly initialized continuous adjacency matrices, we decode them into discrete DAGs, call the LLMs in topological order with message passing, evaluate on the utility function, and optimize the adjacency matrices with swarm intelligence based on the utility score. For weight-step, we define JFK-score to evaluate the contribution of individual LLMs in the best-found DAG of the role-step, then optimize model weights with swarm intelligence based on the JFK-score. Extensive experiments demonstrate that Heterogeneous Swarms outperforms 15 baselines spanning role-based and weight-based approaches by 18.5% on average across 12 tasks and contexts. Further analysis reveals that Heterogeneous Swarms discovers multi-LLM systems with heterogeneous model roles and substantial collaborative gains, and benefits from the diversity of initial LLMs. View details
    Preview abstract Virtual hand representation in Head-Mounted Displays (HMDs) offers immersive and intuitive interactions in Virtual Reality (VR). However, current hand tracking algorithms are prone to errors, which can disrupt the user experience and hinder task performance. This paper presents a novel method for providing users with visual feedback when the quality of hand tracking decreases. Our approach employs a notification modal that warns users of potential failures. We identified three common hand tracking failure scenarios and evaluated the effectiveness of our method in two distinct VR tasks: object manipulation and complex assembly tasks. Results show that our early warning system reduces task completion time, lowers hand-tracking failures by up to 83%, decreases errors, improves system usability, and reduces cognitive load. This work contributes to the development of more robust and user-friendly VR HMD applications by enhancing hand tracking reliability, usability, and workload. View details
    Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts
    Marc Stogaitis
    Tajinder Gadh
    Richard Allen
    Alexei Barski
    Robert Bosch
    Patrick Robertson
    Youngmin Cho
    Nivetha Thiruverahan
    Aman Raj
    Geophysical Journal International (2025), ggae436
    Preview abstract This paper presents a novel approach for estimating the ground shaking intensity using real-time social media data and CCTV footage. Employing the Gemini 1.5 Pro’s (Reid et al. 2024) model, a multi-modal language model, we demonstrate the ability to extract relevant information from unstructured data utilizing generative AI and natural language processing. The model’s output, in the form of Modified Mercalli Intensity (MMI) values, align well with independent observational data. Furthermore, our results suggest that beyond its advanced visual and auditory understanding abilities, Gemini appears to utilize additional sources of knowledge, including a simplified understanding of the general relationship between earthquake magnitude, distance, and MMI intensity, which it presumably acquired during its training, in its reasoning and decision-making processes. These findings raise intriguing questions about the extent of Gemini's general understanding of the physical world and its phenomena. Gemini’s ability to generate results consistent with established scientific knowledge highlights the potential of LLMs like Gemini in augmenting our understanding of complex physical phenomena such as earthquakes. More specifically, the results of this study highlight the potential of LLMs like Gemini to revolutionize citizen seismology by enabling rapid, effective, and flexible analysis of crowdsourced data from eyewitness accounts for assessing earthquake impact and providing crisis situational awareness. This approach holds a great promise for improving early warning systems, disaster response, and overall resilience in earthquake-prone regions. This study provides a significant step toward harnessing the power of social media and AI for earthquake disaster mitigation. View details
    Global earthquake detection and warning using Android phones
    Marc Stogaitis
    Youngmin Cho
    Richard Allen
    Boone Spooner
    Patrick Robertson
    Micah Berman
    Greg Wimpey
    Robert Bosch
    Nivetha Thiruverahan
    Steve Malkos
    Alexei Barski
    Science, 389 (2025), pp. 254-259
    Preview abstract Earthquake early-warning systems are increasingly being deployed as a strategy to reduce losses in earthquakes, but the regional seismic networks they require do not exist in many earthquake-prone countries. We use the global Android smartphone network to develop an earthquake detection capability, an alert delivery system, and a user feedback framework. Over 3 years of operation, the system detected an average of 312 earthquakes per month with magnitudes from M 1.9 to M 7.8 in Türkiye. Alerts were delivered in 98 countries for earthquakes with M ≥4.5, corresponding to ~60 events and 18 million alerts per month. User feedback shows that 85% of people receiving an alert felt shaking, and 36, 28, and 23% received the alert before, during, and after shaking, respectively. We show how smartphone-based earthquake detection algorithms can be implemented at scale and improved through postevent analysis. View details