Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 10483 publications
Study of Arterials in the City of Rio de Janeiro for Traffic Coordination
Ori Rottenstreich
Eliav Buchnik
Danny Veikherman
Dan Karliner
Tom Kalvari
Shai Ferster
Ron Tsibulsky
Jack Haddad
2025
Urban traffic congestion is a growing challenge, and optimizing signal timing strategies is crucial for improving traffic flow and reducing emissions. The coordination of signalized intersections improves both traffic operations and environmental outcomes. Coordination is particularly important along arterials: sequences of signalized intersections that serve as primary routes and carry high traffic volumes. In this paper we analyze real data from the city of Rio de Janeiro to study properties of arterials. We examine their length, the distance between intersections, and properties of the traffic light plans, such as cycle time. We then study their in-practice level of coordination in terms of the number of stops and the common locations of those stops along the arterials. We examine particular arterials in depth and provide insights that can inform the efficient design of arterials in additional cities. Based on this analysis, we show how simple traffic properties can indicate the potential of coordinating two adjacent intersections within an arterial to improve traffic performance.
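The green-wave intuition behind coordinating adjacent signals can be sketched with a toy offset computation. This is an illustrative sketch, not the paper's method: it assumes a one-way arterial, a shared cycle time, and a fixed progression speed, and sets each signal's offset so a platoon released at the first green arrives at each downstream signal as its green begins.

```python
def green_wave_offsets(distances_m, speed_mps, cycle_s):
    """Signal offsets (seconds) for a one-way green wave along an arterial.

    distances_m: distance from each intersection to the next downstream one.
    Offsets repeat every cycle, so each is taken modulo the cycle time.
    """
    offsets = [0.0]          # reference signal starts its green at t = 0
    travel = 0.0
    for d in distances_m:
        travel += d / speed_mps          # cumulative travel time downstream
        offsets.append(travel % cycle_s)  # start of green at that signal
    return offsets
```

For example, with 400 m and 600 m spacings, a 10 m/s progression speed, and a 90 s cycle, the third signal's ideal offset wraps around the cycle: `green_wave_offsets([400, 600], 10.0, 90.0)` gives `[0.0, 40.0, 10.0]`.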
Perceptual Evaluation of a Mix Presentation for Immersive Audio with IAMF
Carlos Tejeda-Ocampo
Toni Hirvonen
Ema Souza-Blanes
Mahmoud Namazi
AES 158th Convention of the Audio Engineering Society (2025)
Immersive audio mix presentations involve transmitting and rendering several audio elements simultaneously. This enables next-generation applications such as personalized playback. Using immersive loudspeaker and headphone MUSHRA tests, we investigate bitrate vs. quality for a typical mix presentation use case of a foreground stereo element plus a background Ambisonics scene. For coding, we use Immersive Audio Model and Formats, a recently proposed system for Next-Generation Audio. Excellent quality is achieved at 384 kbit/s, even with a reasonable amount of personalization. We also propose a framework for content-aware analysis that can significantly reduce the bitrate when using underlying legacy audio coding instances.
Online-EYE: Multimodal Implicit Eye Tracking Calibration for XR
Baosheng James Hou
Lucy Abramyan
Prasanthi Gurumurthy
Khushman Patel
Haley Adams
Andrea Colaco
Ken Pfeuffer
Hans Gellersen
Karan Ahuja
2025
Unlike other VR inputs that work out of the box, eye tracking typically requires custom calibration per user or session. We present a multimodal-input approach for implicit calibration of eye trackers in VR, leveraging UI interaction for continuous, background calibration. Our method analyzes gaze data alongside controller interaction with UI elements and, employing ML techniques, continuously refines the calibration matrix without interrupting users' current tasks, potentially eliminating the need for explicit calibration. We demonstrate the accuracy and effectiveness of this implicit approach across various tasks and real-time applications, achieving eye tracking accuracy comparable to native, explicit calibration.
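The simplest version of refining a calibration from implicit interaction data can be sketched as a least-squares fit. This is an assumption-laden illustration, not the paper's ML method: it supposes we have pairs of raw gaze points and known UI target positions (e.g., gaze samples captured at the moment the user clicks a button) and fits a 2D affine correction.

```python
import numpy as np

def fit_affine_calibration(raw_gaze, targets):
    """Least-squares affine correction mapping raw 2D gaze to known 2D targets.

    raw_gaze, targets: (N, 2) point arrays collected implicitly during UI use.
    Returns a 2x3 matrix M such that corrected = M @ [x, y, 1].
    """
    raw = np.asarray(raw_gaze, dtype=float)
    tgt = np.asarray(targets, dtype=float)
    A = np.hstack([raw, np.ones((raw.shape[0], 1))])  # homogeneous coordinates
    M, *_ = np.linalg.lstsq(A, tgt, rcond=None)       # (3, 2) solution
    return M.T                                        # (2, 3) calibration matrix

def apply_calibration(M, raw_point):
    """Correct a single raw gaze point with the fitted matrix."""
    x, y = raw_point
    return M @ np.array([x, y, 1.0])
```

In a continuous setting, one would refit (or incrementally update) this matrix as new gaze/interaction pairs stream in, which is the spirit of the background calibration the abstract describes.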
Scalability of Generative AI Models: Challenges and Opportunities in Large-Scale Data Generation and Training
International Journal of Computer Science and Information Technology Research (IJCSITR) (2025)
Data-Driven Mechanism Design: Jointly Eliciting Preferences and Information
Dirk Bergemann
Marek Bojko
Paul Duetting
Haifeng Xu
EC '25: Proceedings of the 26th ACM Conference on Economics and Computation (2025), pp. 507
We study mechanism design when agents have private preferences and private information about a common payoff-relevant state. We show that standard message-driven mechanisms cannot implement socially efficient allocations when agents have multidimensional types, even under favorable conditions.
To overcome this limitation, we propose data-driven mechanisms that leverage additional post-allocation information, modeled as an estimator of the payoff-relevant state. Our data-driven mechanisms extend the classic Vickrey-Clarke-Groves class. We show that they achieve exact implementation in posterior equilibrium when the state is either fully revealed or the utility is affine in an unbiased estimator. We also show that they achieve approximate implementation with a consistent estimator, converging to exact implementation as the estimator converges, and present bounds on the convergence rate.
We demonstrate applications to digital advertising auctions and large language model (LLM)-based mechanisms, where user engagement naturally reveals relevant information.
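The paper's data-driven mechanisms extend the Vickrey-Clarke-Groves (VCG) class. As background only (this is the classic textbook rule, not the paper's extension), the single-item VCG mechanism allocates to the highest bidder and charges the externality imposed on the others, i.e., the second-highest bid:

```python
def vcg_single_item(bids):
    """Classic VCG for one item: highest bidder wins and pays the
    externality imposed on others (the second-highest bid).
    Returns (winner_index, payment)."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    payment = max(b for i, b in enumerate(bids) if i != winner) if len(bids) > 1 else 0.0
    return winner, payment
```

With bids `[3, 7, 5]`, bidder 1 wins and pays 5. The paper's contribution lies in what this class cannot do alone: augmenting such payments with post-allocation estimates of the payoff-relevant state.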
As large language models (LLMs) improve in their capacity to serve as personal AI assistants, their ability to output uniquely tailored, personalized responses that align with the soft preferences of their users is imperative for maximizing user satisfaction and retention. However, lay users are notoriously bad at prompt specification and often struggle to convey their latent preferences to AI assistants. To resolve this, we demonstrate that activation steering, an inference-time method, can effectively control the responses of LLMs toward expressing different preferences. In contrast to memory-based personalization methods that require long user histories, steering is extremely lightweight and easily controllable via an interpretable linear strength factor. We further conduct a within-subjects user study (n=14) to investigate how end users personalize their conversations through three different steerable chatbot interfaces. The results demonstrate the effectiveness of preference-based steering for aligning real-world conversations with user preferences, and we discuss qualitative findings on how diverse values around control, transparency, and usability of personalization lead users to prefer different interfaces.
Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models
Fei Wang
The Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) (2025) (to appear)
Retrieval-Augmented Generation (RAG), while effective in integrating external knowledge to address the limitations of large language models (LLMs), can be undermined by imperfect retrieval, which may introduce irrelevant, misleading, or even malicious information. Despite its importance, previous studies have rarely explored RAG behavior through a joint analysis of how errors from imperfect retrieval arise and propagate, and how potential conflicts emerge between the LLMs' internal knowledge and external sources. Through controlled analysis under realistic conditions, we find that imperfect retrieval augmentation may be inevitable and quite harmful. We identify the knowledge conflicts between LLM-internal and external knowledge from retrieval as a bottleneck to overcome in the post-retrieval stage of RAG. To render LLMs resilient to imperfect retrieval, we propose Astute RAG, a novel RAG approach that adaptively elicits essential information from LLMs' internal knowledge, iteratively consolidates internal and external knowledge with source-awareness, and finalizes the answer according to information reliability. Our experiments using Gemini and Claude demonstrate that Astute RAG significantly outperforms previous robustness-enhanced RAG methods. Notably, Astute RAG is the only approach that matches or exceeds the performance of LLMs without RAG under worst-case scenarios. Further analysis reveals that Astute RAG effectively resolves knowledge conflicts, improving the reliability and trustworthiness of RAG systems.
Supporting the Digital Safety of At-Risk Users: Lessons Learned from 9+ Years of Research and Training
Tara Matthews
Patrick Gage Kelley
Lea Kissner
Andreas Kramm
Andrew Oplinger
Andy Schou
Stephan Somogyi
Dalila Szostak
Jill Woelfer
Lawrence You
Izzie Zahorian
ACM Transactions on Computer-Human Interaction, 32(3) (2025), pp. 1-39
Creating information technologies intended for broad use that allow everyone to participate safely online—which we refer to as inclusive digital safety—requires understanding and addressing the digital-safety needs of a diverse range of users who face elevated risk of technology-facilitated attacks or disproportionate harm from such attacks—i.e., at-risk users. This article draws from more than 9 years of our work at Google to understand and support the digital safety of at-risk users—including survivors of intimate partner abuse, people involved with political campaigns, content creators, youth, and more—in technology intended for broad use. Among our learnings is that designing for inclusive digital safety across widely varied user needs and dynamic contexts is a wicked problem with no “correct” solution. Given this, we describe frameworks and design principles we have developed to help make at-risk research findings practically applicable to technologies intended for broad use and lessons we have learned about communicating them to practitioners.
Reconfigurable Stream Network Architecture
Chengyue Wang
Jason Cong
James Hoe
International Symposium on Computer Architecture (ISCA) (2025)
As AI systems grow increasingly specialized and complex, managing hardware heterogeneity becomes a pressing challenge. How can we efficiently coordinate and synchronize heterogeneous hardware resources to achieve high utilization? How can we minimize the friction of transitioning between diverse computation phases, reducing costly stalls from initialization, pipeline setup, or drain? Our insight is that a network abstraction at the ISA level naturally unifies heterogeneous resource orchestration and phase transitions. This paper presents a Reconfigurable Stream Network Architecture (RSN), a novel ISA abstraction designed for the DNN domain. RSN models the datapath as a circuit-switched network with stateful functional units as nodes and data streaming on the edges. Programming a computation corresponds to triggering a path. Software is explicitly exposed to the compute and communication latency of each functional unit, enabling precise control over data movement for optimizations such as compute-communication overlap and layer fusion. As nodes in a network naturally differ, the RSN abstraction can efficiently virtualize heterogeneous hardware resources by separating control from the data plane, enabling low instruction-level intervention. We build a proof-of-concept design RSN-XNN on VCK190, a heterogeneous platform with FPGA fabric and AI engines. Compared to the SOTA solution on this platform, it reduces latency by 6.1x and improves throughput by 2.4x-3.2x. Compared to the T4 GPU with the same FP32 performance, it matches latency with only 18% of the memory bandwidth. Compared to the A100 GPU at the same 7nm process node, it achieves 2.1x higher energy efficiency in FP32.
SSDTrain: Faster Large Language Model Training Using SSD-Based Activation Offloading
Kun Wu
Jeongmin Brian Park
Mert Hidayetoğlu
Vikram Sharma Mailthody
Sitao Huang
Steven Lumetta
Wen-mei Hwu
Design Automation Conference (DAC) (2025)
The scaling up of Large Language Models (LLMs) demands more memory than current GPUs can provide, hindering the training process. To address this challenge, we propose SSDTrain to efficiently offload activations, the intermediate tensors produced during LLM training, to SSDs. This approach reduces GPU memory usage without impacting performance by adaptively overlapping data transfers with computation. SSDTrain is compatible with popular deep learning frameworks like PyTorch, Megatron, and DeepSpeed, and it employs techniques such as tensor deduplication, forwarding, and adaptive offloading to further enhance efficiency. We conduct extensive experiments on Llama, BERT, and T5. Results demonstrate that SSDTrain reduces peak activation memory usage by 45% and fully overlaps the I/O with computation without introducing a performance penalty. SSDTrain achieves a performance boost of up to 31% over the conventional training strategy on the same GPU systems.
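The core idea of overlapping offload I/O with computation can be sketched as a producer-consumer pattern. This is a minimal illustration of the general technique, not SSDTrain's implementation: the main thread keeps computing while a background thread drains a queue of activations and writes them out (e.g., to SSD); the `compute_fn` and `offload_fn` names are hypothetical stand-ins.

```python
import queue
import threading

def train_with_offload(steps, compute_fn, offload_fn):
    """Overlap activation offload with computation: a background thread
    writes queued tensors while the main thread continues the forward pass."""
    q = queue.Queue()

    def writer():
        while True:
            item = q.get()
            if item is None:       # sentinel: no more activations
                break
            offload_fn(item)       # e.g., write activation tensor to SSD

    t = threading.Thread(target=writer, daemon=True)
    t.start()
    for step in range(steps):
        activation = compute_fn(step)  # forward pass produces an activation
        q.put(activation)              # hand off to the I/O thread, keep going
    q.put(None)                        # signal completion
    t.join()
```

Real systems add bounded queues for back-pressure and pinned-memory staging buffers, but the overlap principle is the same.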
Dynamical-generative downscaling of climate model ensembles
Tapio Schneider
John Anderson
Proceedings of the National Academy of Sciences, 122 (2025), e2420288122
Regional high-resolution climate projections are crucial for many applications, such as agriculture, hydrology, and natural hazard risk assessment. Dynamical downscaling, the state-of-the-art method to produce localized future climate information, involves running a regional climate model (RCM) driven by an Earth System Model (ESM), but it is too computationally expensive to apply to large climate projection ensembles. We propose an approach combining dynamical downscaling with generative AI to reduce the cost and improve the uncertainty estimates of downscaled climate projections. In our framework, an RCM dynamically downscales ESM output to an intermediate resolution, followed by a generative diffusion model that further refines the resolution to the target scale. This approach leverages the generalizability of physics-based models and the sampling efficiency of diffusion models, enabling the downscaling of large multimodel ensembles. We evaluate our method against dynamically downscaled climate projections from the Coupled Model Intercomparison Project 6 (CMIP6) ensemble. Our results demonstrate its ability to provide more accurate uncertainty bounds on future regional climate than alternatives such as dynamical downscaling of smaller ensembles, or traditional empirical statistical downscaling methods. We also show that dynamical-generative downscaling results in significantly lower errors than popular statistical downscaling techniques, and captures more accurately the spectra, tail dependence, and multivariate correlations of meteorological fields. These characteristics make the dynamical-generative framework a flexible, accurate, and efficient way to downscale large ensembles of climate projections, currently out of reach for pure dynamical downscaling.
Deep Researcher with Test-time Diffusion
Rujun Han
Zoey CuiZhu
Guan Sun
Yuanjun (Sophia) Bi
Weiming Wen
Hui Wan
Chunfeng Wen
Solène Maître
George Lee
Vishy Tirumalashetty
Emily Xue
Burak Gokturk
2025
Deep research agents, powered by Large Language Models (LLMs), are rapidly advancing; yet, their performance often plateaus when generating complex, long-form research reports using generic test-time scaling algorithms. Drawing inspiration from the iterative nature of human research, which involves cycles of searching, reasoning, and revision, we propose the Test-Time Diffusion Deep Researcher (TTD-DR). This novel framework conceptualizes research report generation as a diffusion process. TTD-DR initiates this process with a preliminary draft, an updatable skeleton that serves as an evolving foundation to guide the research direction. The draft is then iteratively refined through a "denoising" process, which is dynamically informed by a retrieval mechanism that incorporates external information at each step. The core process is further enhanced by a self-evolutionary algorithm applied to each component of the agentic workflow, ensuring the generation of high-quality context for the diffusion process. This draft-centric design guides the report writing process to be more timely and coherent while reducing information loss during the iterative search process. We demonstrate that our TTD-DR achieves state-of-the-art results on a wide array of benchmarks that require intensive search and multi-hop reasoning, significantly outperforming existing deep research agents.
Fine-grained Measurement of Vehicle Delay Fairness
Eliav Buchnik
Tom Kalvari
Jack Haddad
Dan Karliner
Danny Veikherman
Ron Tsibulsky
Shai Ferster
Ori Rottenstreich
2025
Optimizing signal timing in traffic lights helps improve traffic flow and reduce emissions by reducing delays. At intersections, vehicles from different movements observe different delays, shaped by the traffic light plan. This paper analyzes delay fairness among vehicles at intersections. We examine three cities: Rio de Janeiro, Hamburg, and Seattle, with a total of over 5,100 intersections. We present an intuitive methodology for computing delay fairness based on the Gini index, a common fairness measure in economics. We evaluate fairness on real traffic data and provide insights into how fairness relates to time of day and traffic demand. We also examine real changes to traffic light plans deployed in practice to check whether improving delay is typically aligned with increasing fairness.
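The Gini index itself is standard and easy to compute; here is a minimal sketch of applying it to per-vehicle delays (the paper's exact methodology may differ, e.g., in how delays are aggregated per movement). A value of 0 means all vehicles experience equal delay; values approaching 1 mean delay is concentrated on a few vehicles.

```python
import numpy as np

def gini(delays):
    """Gini index of per-vehicle delays: 0 = perfectly equal, -> 1 = maximally unequal."""
    d = np.sort(np.asarray(delays, dtype=float))
    n = d.size
    if n == 0 or d.sum() == 0:
        return 0.0
    # Standard formula: G = 2 * sum(i * d_i) / (n * sum(d)) - (n + 1) / n, i = 1..n
    i = np.arange(1, n + 1)
    return float(2 * np.sum(i * d) / (n * d.sum()) - (n + 1) / n)
```

For instance, four vehicles with equal delays give a Gini index of 0, while all delay falling on one of four vehicles gives 0.75.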
Development and Evaluation of ML Models for Cardiotocography Interpretation
Nicole Chiou
Nichole Young-Lin
Abdoulaye Diack
Christopher Kelly
Sanmi Koyejo
NPJ Women's Health (2025)
The inherent variability in the visual interpretation of cardiotocograms (CTGs) by obstetric clinical experts, both intra- and inter-observer, presents a substantial challenge in obstetric care. In response, we investigate automated CTG interpretation as a potential solution to enhance the early detection of fetal hypoxia during labor, thereby reducing unnecessary operative interventions and improving overall maternal and neonatal care. This study employs deep learning techniques to reduce the subjectivity associated with visual CTG interpretation. Our results demonstrate that employing objective cord blood pH measurements, rather than clinician-defined Apgar scores, yields more consistent and robust model performance. Additionally, through a series of ablation studies, we investigate the impact of temporal distribution shifts on the performance of these deep learning models. We examine tradeoffs between performance and fairness, specifically evaluating performance across demographic and clinical subgroups. Finally, we discuss the practical implications of our findings for the real-world deployment of such systems, emphasizing their potential utility in medical settings with limited resources.
The problem of contract design addresses the challenge of moral hazard in principal-agent setups. The agent exerts costly effort that produces a random outcome with an associated reward for the principal. Moral hazard refers to the tension that the principal cannot observe the agent's effort level and hence must incentivize the agent only through rewarding the realized outcome of effort, i.e., the contract. Bayesian contract design studies the principal's problem of designing an optimal contract when facing an unknown agent characterized by a private Bayesian type. In its most general form, the agent's type is inherently "multi-parameter" and can arbitrarily affect both the agent's productivity and effort costs. In contrast, a natural single-parameter setting of much recent interest simplifies the agent's type to a single value describing the agent's cost per unit of effort, with agents' efforts assumed to be equally productive.
The main result of this paper is an almost approximation-preserving polynomial-time reduction from the most general multi-parameter Bayesian contract design (BCD) to single-parameter BCD. That is, for any multi-parameter BCD instance I^M, we construct a single-parameter instance I^S such that any β-approximate contract (resp. menu of contracts) of I^S can in turn be converted to a (β − ϵ)-approximate contract (resp. menu of contracts) of I^M. The reduction is in time polynomial in the input size and log(1/ϵ); moreover, when β = 1 (i.e., the given single-parameter solution is exactly optimal), the dependence on 1/ϵ can be removed, leading to a polynomial-time exact reduction. This efficient reduction is somewhat surprising because in the closely related problem of Bayesian mechanism design, a polynomial-time reduction from multi-parameter to single-parameter setting is believed to not exist. Our result demonstrates the intrinsic difficulty of addressing moral hazard in Bayesian contract design, regardless of being single-parameter or multi-parameter.
As byproducts, our reduction answers two open questions in recent literature of algorithmic contract design: (a) it implies that optimal contract design in single-parameter BCD is not in APX unless P=NP even when the agent’s type distribution is regular, answering the open question of [3] in the negative; (b) it implies that the principal’s (order-wise) tight utility gap between using a menu of contracts and a single contract is Θ(n) where n is the number of actions, answering the major open question of [27] for the single-parameter case.