Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

This tutorial examines the progress and scaling limitations of IM-DD-based optical technologies and explores how coherent technology optimized for datacenter use cases, including a newly proposed polarization-folding, time-diversity approach and a novel single-sideband coherent detection technology, can address some of these challenges.
Study of Arterials in the City of Rio de Janeiro for Traffic Coordination
Ori Rottenstreich
Eliav Buchnik
Danny Veikherman
Dan Karliner
Tom Kalvari
Shai Ferster
Ron Tsibulsky
Jack Haddad
2025
Urban traffic congestion is a growing challenge, and optimizing signal timing strategies is crucial for improving traffic flow and reducing emissions. The coordination of signalized intersections improves both traffic operations and environmental outcomes. Coordination is particularly important along arterials, sequences of signalized intersections that serve as primary routes and carry a high volume of traffic. In this paper we analyze real data from the city of Rio de Janeiro to study properties of arterials, examining their length, the distance between intersections, and properties of the traffic light plans such as cycle time. We then study their in-practice level of coordination in terms of the number of stops and their common locations along the arterials. We dive into particular arterials and provide insights that can be useful for the efficient design of arterials in other cities. Based on the analysis, we show how simple traffic properties can indicate the potential of coordinating two adjacent intersections along an arterial to improve traffic performance.
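As a rough editorial illustration of how simple traffic properties relate to coordination, the sketch below computes an ideal green-wave offset for two adjacent signalized intersections from their spacing, travel speed, and cycle time. The function names and the "coordination potential" heuristic are assumptions for illustration, not quantities from the paper.

# Illustrative sketch (not from the paper): relating simple arterial properties
# to the coordination of two adjacent signalized intersections.

def green_wave_offset(distance_m: float, speed_mps: float, cycle_s: float) -> float:
    """Ideal signal offset so a platoon released at the first intersection
    arrives at the second on green, folded into one cycle."""
    travel_time = distance_m / speed_mps
    return travel_time % cycle_s

def coordination_potential(distance_m: float, speed_mps: float, cycle_s: float) -> float:
    """Crude indicator in [0, 1]: short travel times relative to the cycle
    suggest more to gain from coordinating the pair (an assumed heuristic)."""
    travel_time = distance_m / speed_mps
    return max(0.0, 1.0 - travel_time / (2 * cycle_s))

if __name__ == "__main__":
    # Hypothetical arterial segment: intersections 400 m apart, 40 km/h, 90 s cycle.
    print(green_wave_offset(400, 40 / 3.6, 90))       # ~36 s offset
    print(coordination_potential(400, 40 / 3.6, 90))  # ~0.8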
Continuous Evaluation: Using CI Techniques For Experimentation At Scale
IEEE, pp. 157-157
Continuous Integration (CI) is an essential software development practice that establishes processes to minimize bugs and errors in production. In a similar vein, experimentation on software products is vital for evaluating user satisfaction, quality, performance, and other key business metrics. Experimentation allows product owners to evaluate the user impact of changes, which helps them make informed decisions about feature launches. Experimentation also allows developers to tweak internal processes and algorithms to maximize the impact of new features and changes. Additionally, it can sometimes detect errors not caught by CI.
Unlike CI systems, experimentation platforms are meant to closely imitate production and usually run the system under test (SUT) against a large scale of input. Despite this, experimentation platforms have a lot in common with CI systems. The mechanisms for continuously integrating and testing changes can be modified and applied to experimentation platforms.
Google Search's experimentation platform started as a command line tool many years ago. Over time, this tool has evolved into a platform that serves the evaluation needs for many of Google's products like Search, Assistant, YouTube, Play, Lens, etc., running thousands of large experiments every day.
In this workshop, we will present the evolution of Google Search's experimentation platform and how it was transformed from a simple CLI tool into a platform that works at scale, fulfills continuous experimentation needs and provides many CI-like functionalities to its users.
CountQA: How Well Do MLLMs Count in the Wild?
Jayant Tamarapalli
Rynaa Grover
Nilay Pande
Sahiti Yerramilli
(2025)
While Multimodal Large Language Models (MLLMs) display a remarkable fluency in describing visual scenes, their ability to perform the fundamental task of object counting remains poorly understood. This paper confronts this issue by introducing CountQA, a challenging new benchmark composed of over 1,500 question-answer pairs centered on images of everyday, real-world objects, often in cluttered and occluded arrangements. Our evaluation of 15 prominent MLLMs on CountQA systematically investigates this weakness, revealing a critical failure of numerical grounding: the models consistently struggle to translate raw visual information into an accurate quantity. By providing a dedicated tool to probe this foundational weakness, CountQA paves the way for the development of more robust and truly capable MLLMs that are spatially aware and numerically grounded.
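As a hedged sketch of how a counting benchmark of this kind might be scored, the snippet below computes exact-match accuracy and mean absolute relative error over question-answer pairs. The data format and model interface are assumptions; this is not the released CountQA tooling.

# Hypothetical scoring sketch for a visual counting benchmark such as CountQA.
# The example schema and model callable are placeholders, not the paper's code.

from typing import Callable, Dict, List

def evaluate_counting(
    examples: List[Dict],            # each: {"image": ..., "question": str, "count": int}
    predict: Callable[[Dict], str],  # model call returning a free-form answer string
) -> Dict[str, float]:
    exact, abs_rel_err = 0, 0.0
    for ex in examples:
        answer = predict(ex)
        digits = "".join(ch for ch in answer if ch.isdigit())
        pred = int(digits) if digits else -1  # no number in the answer counts as wrong
        exact += int(pred == ex["count"])
        abs_rel_err += abs(pred - ex["count"]) / max(ex["count"], 1)
    n = len(examples)
    return {"exact_match": exact / n, "mean_abs_rel_error": abs_rel_err / n}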
Preventing Network Bottlenecks: Accelerating Datacenter Services with Hotspot-Aware Placement for Compute and Storage
Hamid Bazzaz
Yingjie Bi
Weiwu Pang
Minlan Yu
Ramesh Govindan
Chloe Tsai
Chris DeForeest
Charlie Carver
Jan Kopański
2025
Datacenter network hotspots, defined as links with persistently high utilization, can lead to performance bottlenecks. In this work, we study hotspots in Google’s datacenter networks. We find that these hotspots occur most frequently at ToR switches and can persist for hours. They are caused mainly by bandwidth demand-supply imbalance, largely due to high demand from network-intensive services, or demand exceeding available bandwidth when compute/storage upgrades outpace ToR bandwidth upgrades. Compounding this issue is bandwidth-independent task/data placement by datacenter compute and storage schedulers. We quantify the performance impact of hotspots, and find that they can degrade the end-to-end latency of some distributed applications by over 2× relative to low utilization levels. Finally, we describe simple improvements we deployed. In our cluster scheduler, adding hotspot-aware task placement reduced the number of hot ToRs by 90%; in our distributed file system, adding hotspot-aware data placement reduced p95 network latency by more than 50%. While congestion control, load balancing, and traffic engineering can efficiently utilize paths for a fixed placement, we find hotspot-aware placement, placing tasks and data under ToRs with higher available bandwidth, is crucial for achieving consistently good performance.
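To make the placement idea concrete, here is a minimal sketch of hotspot-aware task placement that greedily assigns each task to the ToR with the most available uplink bandwidth. The data structures and greedy policy are illustrative assumptions; they do not reflect the production scheduler described in the paper.

# Minimal sketch of hotspot-aware task placement: prefer racks (ToRs) with the
# most spare uplink bandwidth. All names and fields here are illustrative.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ToR:
    name: str
    uplink_gbps: float
    used_gbps: float = 0.0

    @property
    def available_gbps(self) -> float:
        return self.uplink_gbps - self.used_gbps

def place_tasks(tasks: List[Dict], tors: List[ToR]) -> Dict[str, str]:
    """Greedily assign each task (largest demand first) to the ToR with the
    most available bandwidth, updating utilization as we go."""
    assignment = {}
    for task in sorted(tasks, key=lambda t: t["demand_gbps"], reverse=True):
        best = max(tors, key=lambda t: t.available_gbps)
        best.used_gbps += task["demand_gbps"]
        assignment[task["name"]] = best.name
    return assignment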
Opportunities and Applications of GenAI in Smart Cities: A User-Centric Survey
Shashank Kapoor
Aman Raj
IEEE COINS (2025)
The proliferation of IoT in cities, combined with Digital Twins, creates a rich data foundation for Smart Cities aimed at improving urban life and operations. Generative AI (GenAI) significantly enhances this potential, moving beyond traditional AI analytics by processing multimodal content and generating novel outputs like text and simulations. Using specialized or foundational models, GenAI's natural language abilities such as Natural Language Understanding (NLU) and Generation (NLG) can power tailored applications and unified interfaces, dramatically lowering barriers for users interacting with complex smart city systems. In this paper, we focus on GenAI applications based on conversational interfaces within the context of three critical user archetypes in a Smart City: Citizens, Operators, and Planners. We identify and review GenAI models and techniques that have been proposed or deployed for various urban subsystems in the contexts of these user archetypes. We also consider how GenAI can be built on the existing data foundation of official city records, IoT data streams, and Urban Digital Twins. We believe this work represents the first comprehensive summary of GenAI techniques for Smart Cities through the lens of these critical users.
Beyond Touchscreens: Dynamic and Multimodal Interaction Needs
Melissa Barnhart Wantland
Mai Kobori
Universal Access in Human-Computer Interaction, Springer-Verlag (2025)
Today’s smartphone interactions are typically designed with one primary preset, accompanied by customization settings that can be manually adjusted. To promote the creation of contextually aware experiences, researchers have highlighted the factors that influence mobile device usage in the ability-based design framework. This paper expands upon existing frameworks and contributes to an empirical understanding of smartphone accessibility. Through a 10-day longitudinal diary study and video interviews with 24 individuals who do and do not identify as having a disability, the research also illustrates the reactions of reattempt, adaptation, and avoidance used in response to a lack of smartphone accessibility. Despite experiencing scenarios where accessibility settings could have been leveraged, 20 out of 24 participants did not use accessibility settings on their smartphones. A total of 12 out of 24 participants had tried accessibility settings on their smartphones but concluded that they were not for them. This work highlights the need to shift current design practices to better serve the accessibility community.
How Unique is Whose Web Browser? The role of demographics in browser fingerprinting
Pritish Kamath
Robin Lassonde
2025
Web browser fingerprinting can be used to identify and track users across the Web, even without cookies, by collecting attributes from users' devices to create unique "fingerprints". This technique and resulting privacy risks have been studied for over a decade. Yet further research is limited because prior studies did not openly publish their data. Additionally, data in prior studies had biases and lacked user demographics.
Here we publish a first-of-its-kind open dataset that includes browser attributes alongside users' demographics, collected from 8,400 US study participants with their informed consent. As part of data collection, we also conducted an experiment, with survey responses from a total of 12,461 participants, to study what affects users' likelihood of sharing browser data for open research and to inform future data collection efforts. Female participants were significantly less likely to share their browser data, as were participants who were shown the browser data we asked to collect.
In addition, we demonstrate how fingerprinting risks differ across demographic groups. For example, we find that lower-income users are more at risk, and that as users' age increases, they are both more likely to be concerned about fingerprinting and more likely to actually be at risk of it. Furthermore, we demonstrate an overlooked risk: user demographics, such as gender, age, income level, ethnicity, and race, can be inferred from browser attributes commonly used for fingerprinting, and we identify which browser attributes contribute most to this risk.
Overall, we show the important role of user demographics in ongoing work that aims to assess fingerprinting risks and improve user privacy, with findings that can inform future privacy-enhancing browser developments. The dataset and data collection tool we openly publish can be used to study further research questions not addressed in this work.
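As background for the uniqueness and attribute-contribution analyses described in the abstract above, the sketch below shows a standard way to measure anonymity-set sizes and per-attribute Shannon entropy over a set of browser records. The attribute names and toy records are placeholders, not the published dataset's schema.

# Sketch of a standard fingerprint-uniqueness analysis (anonymity-set sizes and
# per-attribute Shannon entropy). Attribute names and records are illustrative.

import math
from collections import Counter
from typing import Dict, List

def anonymity_set_sizes(records: List[Dict[str, str]], attrs: List[str]) -> Counter:
    """Group users by their combined attribute values ("fingerprints")."""
    return Counter(tuple(r[a] for a in attrs) for r in records)

def attribute_entropy(records: List[Dict[str, str]], attr: str) -> float:
    """Shannon entropy (bits) of a single attribute across the population."""
    counts = Counter(r[attr] for r in records)
    n = len(records)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy usage; a real analysis would run over the released dataset.
records = [
    {"user_agent": "A", "timezone": "UTC-5", "fonts": "f1"},
    {"user_agent": "A", "timezone": "UTC-5", "fonts": "f2"},
    {"user_agent": "B", "timezone": "UTC+1", "fonts": "f1"},
]
sizes = anonymity_set_sizes(records, ["user_agent", "timezone", "fonts"])
unique_share = sum(1 for s in sizes.values() if s == 1) / len(records)
ua_bits = attribute_entropy(records, "user_agent")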
Augmenting LLMs with context leads to improved performance across many applications. Despite much research on Retrieval Augmented Generation (RAG) systems, an open question is whether errors arise because LLMs fail to utilize the retrieved context or because the context itself is insufficient to answer the query. To shed light on this, we develop a new notion of sufficient context, along with a way to classify instances that have enough information to answer the query. We then use sufficient context to analyze several models and datasets. By stratifying errors based on context sufficiency, we find that proprietary LLMs (Gemini, GPT, Claude) excel at answering queries when the context is sufficient, but often output incorrect answers instead of abstaining when the context is not. On the other hand, open-source LLMs (Llama, Mistral, Gemma) hallucinate or abstain often, even with sufficient context. We further categorize cases where the context is useful and improves accuracy even though it does not fully answer the query and the model errs without it. Building on our findings, we explore ways to reduce hallucinations in RAG systems, including a new selective generation method that leverages sufficient context information for guided abstention. Our method improves the fraction of correct answers among cases where the model responds by 2-10% for Gemini, GPT, and Gemma.
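A minimal sketch of the guided-abstention idea, assuming a placeholder sufficiency classifier and LLM call; it shows the general shape of selective generation, not the paper's specific method.

# Hedged sketch of selective generation guided by a context-sufficiency signal.
# The classifier, LLM call, and prompt format are placeholder assumptions.

from typing import Callable

def selective_answer(
    query: str,
    context: str,
    is_sufficient: Callable[[str, str], bool],  # e.g., a sufficiency classifier/autorater
    generate: Callable[[str], str],             # LLM call
    abstain_text: str = "I don't know.",
) -> str:
    if not is_sufficient(query, context):
        # Abstain rather than risk an unsupported answer on insufficient context.
        return abstain_text
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)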
In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialog Agents
Zhen Tan
George Lee
Anand Iyer
Tianlong Chen
Huan Liu
ACL 2025
Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their inability to retain and retrieve relevant information from long-term interactions limits their effectiveness in applications requiring sustained personalization. External memory mechanisms have been proposed to address this limitation, enabling LLMs to maintain conversational continuity. However, existing approaches struggle with two key challenges. First, rigid memory granularity fails to capture the natural semantic structure of conversations, leading to fragmented and incomplete representations. Second, fixed retrieval mechanisms cannot adapt to diverse dialogue contexts and user interaction patterns. In this work, we propose Reflective Memory Management (RMM), a novel mechanism for long-term dialogue agents, integrating forward- and backward-looking reflections: (1) Prospective Reflection, which dynamically summarizes interactions across granularities—utterances, turns, and sessions—into a personalized memory bank for effective future retrieval, and (2) Retrospective Reflection, which iteratively refines the retrieval in an online reinforcement learning (RL) manner based on LLMs’ cited evidence. Experiments show that RMM demonstrates consistent improvement across various metrics and benchmarks. For example, RMM shows more than 10% accuracy improvement over the baseline without memory management on the LongMemEval dataset.
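To illustrate the two reflection loops at a high level, the sketch below maintains a memory bank whose entries are written by summarizing finished sessions (prospective reflection) and re-weighted according to which retrieved entries the LLM actually cites (retrospective reflection). The summarizer, scorer, and update rule are stand-ins, not the paper's implementation.

# Illustrative sketch of a reflective memory bank; all components are placeholders.

from typing import Callable, Dict, List

class MemoryBank:
    def __init__(self, summarize: Callable[[str], str]):
        self.summarize = summarize
        self.entries: List[Dict] = []  # each: {"text": summary, "weight": float}

    def prospective_reflect(self, session_turns: List[str]) -> None:
        """Summarize a finished session into a retrievable memory entry."""
        summary = self.summarize(" ".join(session_turns))
        self.entries.append({"text": summary, "weight": 1.0})

    def retrieve(self, score: Callable[[str], float], k: int = 3) -> List[Dict]:
        """Rank entries by a relevance score modulated by learned weights."""
        ranked = sorted(self.entries, key=lambda e: e["weight"] * score(e["text"]), reverse=True)
        return ranked[:k]

    def retrospective_reflect(self, retrieved: List[Dict], cited: List[Dict], lr: float = 0.1) -> None:
        """Reinforce entries the LLM actually cited; decay the rest (a bandit-style update)."""
        for entry in retrieved:
            reward = 1.0 if entry in cited else -1.0
            entry["weight"] = max(0.1, entry["weight"] + lr * reward)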
Introducing the DORA AI Capabilities Model: 7 keys to succeeding in AI-assisted software development
Artificial intelligence is rapidly transforming software development. But simply adopting AI tools isn’t a guarantee of success. Across the industry, tech leaders and developers are asking the same critical questions: How do we move from just using AI to truly succeeding with it? How do we ensure our investment in AI delivers better, faster, and more reliable software?
The DORA research team has developed the inaugural DORA AI Capabilities Model to provide data-backed guidance for organizations grappling with these questions. This is not just another report on AI adoption trends; it is a guide to the specific technical and cultural practices that amplify the benefits of AI.
Heterogeneous graph neural networks for species distribution modeling
Christine Kaeser-Chen
Keith Anderson
Michelangelo Conserva
Elise Kleeman
Maxim Neumann
Matt Overlan
Millie Chapman
Drew Purves
arxiv (2025)
Species distribution models (SDMs) are necessary for measuring and predicting occurrences and habitat suitability of species and their relationship with environmental factors. We introduce a novel presence-only SDM with graph neural networks (GNN). In our model, species and locations are treated as two distinct node sets, and the learning task is predicting detection records as the edges that connect locations to species. Using GNN for SDM allows us to model fine-grained interactions between species and the environment. We evaluate the potential of this methodology on the six-region dataset compiled by the National Center for Ecological Analysis and Synthesis (NCEAS) for benchmarking SDMs. For each of the regions, the heterogeneous GNN model is comparable to or outperforms previously benchmarked single-species SDMs as well as a feed-forward neural network baseline model.
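As a toy illustration of the bipartite formulation (species nodes, location nodes, detections as edges), the snippet below scores a candidate species-location edge with a dot-product decoder over embeddings. A real model would learn these representations with a heterogeneous GNN over environmental features; everything here, including the dimensions, is a placeholder.

# Toy edge-prediction setup: score how likely a species is detected at a location.

import numpy as np

rng = np.random.default_rng(0)
n_species, n_locations, dim = 100, 500, 32

species_emb = rng.normal(size=(n_species, dim))
location_emb = rng.normal(size=(n_locations, dim))  # would come from environmental features

def detection_score(species_id: int, location_id: int) -> float:
    """Predicted affinity of a species for a location (dot-product decoder + sigmoid)."""
    logit = species_emb[species_id] @ location_emb[location_id]
    return float(1.0 / (1.0 + np.exp(-logit)))  # probability of a detection edge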
Visual in-context learning (VICL), as a new paradigm in computer vision, allows the model to rapidly adapt to various tasks with only a handful of prompts and examples. While effective, the existing VICL paradigm exhibits poor generalizability under distribution shifts. In this work, we propose test-time visual in-context tuning (VICT), a method that can learn adaptive VICL models on the fly with a single test sample. Specifically, we flip the roles of the task prompts and the test sample and use a cycle consistency loss to reconstruct the original task prompt output. Our key insight is that a model should be aware of a new test distribution if it can successfully recover the original task prompts. Extensive experiments on seven representative vision tasks with 15 corruptions demonstrate that VICT improves the generalizability of VICL to unseen new domains.
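A conceptual sketch of the cycle-consistency objective described above, assuming placeholder model and loss callables: the model first predicts the test sample's output from the task prompt, then the (test sample, prediction) pair is used as the prompt to try to reconstruct the original prompt output.

# Conceptual sketch only; the model interface and loss are assumptions.

from typing import Any, Callable, Tuple

def vict_cycle_loss(
    model: Callable[[Any, Any, Any], Any],  # (prompt_in, prompt_out, query) -> prediction
    prompt: Tuple[Any, Any],                # (task input, task output) example pair
    test_image: Any,
    loss_fn: Callable[[Any, Any], float],
) -> float:
    prompt_in, prompt_out = prompt
    # Forward: predict the test sample's output from the original task prompt.
    test_pred = model(prompt_in, prompt_out, test_image)
    # Cycle: swap roles, using the (test image, prediction) pair as the prompt,
    # and try to recover the original prompt output.
    prompt_reconstruction = model(test_image, test_pred, prompt_in)
    # Minimizing this loss at test time adapts the model to the new distribution.
    return loss_fn(prompt_reconstruction, prompt_out)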
Perceptual Audio Coding: A 40-Year Historical Perspective
Juergen Herre
Schuyler Quackenbush
Minje Kim
2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2025)
In the history of audio and acoustic signal processing, perceptual audio coding has certainly excelled as a bright success story through its ubiquitous deployment in virtually all digital media devices, such as computers, tablets, mobile phones, set-top boxes, and digital radios. From a technology perspective, perceptual audio coding has undergone tremendous development, from the first very basic perceptually driven coders (including the popular mp3 format) to today’s full-blown integrated coding/rendering systems. This paper provides a historical overview of this research journey by pinpointing the pivotal development steps in the evolution of perceptual audio coding. Finally, it offers thoughts about future directions in this area.
As large language models (LLMs) improve in their capacity to serve as personal AI assistants, their ability to output uniquely tailored, personalized responses that align with the soft preferences of their users is imperative for maximizing user satisfaction and retention. However, lay users are notoriously bad at prompt specification and often struggle with conveying their latent preferences to AI assistants. To resolve this, we demonstrate that activation steering, an inference-time method, can effectively control the responses of LLMs toward expressing different preferences. In contrast to memory-based personalization methods that require long user history, steering is extremely lightweight and easily controllable via an interpretable linear strength factor. We further conduct a within-subjects user study (n=14) to investigate how end users personalize their conversations through three different steerable chatbot interfaces. The results demonstrate the effectiveness of preference-based steering for aligning real-world conversations with user preferences, and we discuss qualitative findings on how diverse values around control, transparency, and usability of personalization lead users to prefer different interfaces.
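As a minimal, hedged sketch of the activation-steering mechanism (not the study's implementation): a preference direction, scaled by an interpretable strength factor, is added to a hidden activation at inference time. The steering vector here is a placeholder; in practice it would be derived, for example, from contrastive prompts.

# Minimal sketch of inference-time activation steering with a strength factor.

import numpy as np

def steer(hidden_state: np.ndarray, steering_vector: np.ndarray, strength: float) -> np.ndarray:
    """Shift a residual-stream activation toward the (normalized) preference direction."""
    direction = steering_vector / np.linalg.norm(steering_vector)
    return hidden_state + strength * direction

# Toy usage: a larger strength factor expresses the preference more strongly.
h = np.zeros(8)
v = np.arange(8, dtype=float)  # placeholder steering vector
steered = steer(h, v, strength=2.0)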