Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 10496 publications
    Permission Rationales in the Web Ecosystem: An Exploration of Rationale Text and Design Patterns
    Yusra Elbitar
    Soheil Khodayari
    Marian Harbach
    Gianluca De Stefano
    Balazs Engedy
    Giancarlo Pellegrino
    Sven Bugiel
    CHI 2025, ACM
    Preview abstract Modern web applications rely on features like camera and geolocation for personalized experiences, requiring user permission via browser prompts. To explain these requests, applications provide rationales—contextual information on why permissions are needed. Despite their importance, little is known about how rationales appear on the web or their influence on user decisions. This paper presents the first large-scale study of how the web ecosystem handles permission rationales, covering three areas: (i) identifying webpages that use permissions, (ii) detecting and classifying permission rationales, and (iii) analyzing their attributes to understand their impact on user decisions. We examined over 770K webpages from Chrome telemetry, finding 3.6K unique rationale texts and 749 rationale UIs across 85K pages. We extracted key rationale attributes and assessed their effect on user behavior by cross-referencing them with Chrome telemetry data. Our findings reveal nine key insights, providing the first evidence of how different rationales affect user decisions. View details
    YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks
    Saptarashmi Bandyopadhyay
    Vikas Bahirwani
    Lavisha Aggarwal
    Bhanu Guda
    Lin Li
    Andrea Colaco
    2025
    Preview abstract Multimodal AI Agents are AI models that have the capability of interactively and cooperatively assisting human users to solve day-to-day tasks. Augmented Reality (AR) head worn devices can uniquely improve the user experience of solving procedural day-to-day tasks by providing egocentric multimodal (audio and video) observational capabilities to AI Agents. Such AR capabilities can help the AI Agents see and listen to actions that users take which can relate to multimodal capabilities of human users. Existing AI Agents, either Large Language Models (LLMs) or Multimodal Vision-Language Models (VLMs) are reactive in nature, which means that models cannot take an action without reading or listening to the human user's prompts. Proactivity of AI Agents, on the other hand, can help the human user detect and correct any mistakes in agent observed tasks, encourage users when they do tasks correctly, or simply engage in conversation with the user - akin to a human teaching or assisting a user. Our proposed YET to Intervene (YETI) multimodal Agent focuses on the research question of identifying circumstances that may require the Agent to intervene proactively. This allows the Agent to understand when it can intervene in a conversation with human users that can help the user correct mistakes on tasks, like cooking, using Augmented Reality. Our YETI Agent learns scene understanding signals based on interpretable notions of Structural Similarity (SSIM) on consecutive video frames. We also define the alignment signal which the AI Agent can learn to identify if the video frames corresponding to the user's actions on the task are consistent with expected actions. These signals are used by our AI Agent to determine when it should proactively intervene. We compare our results on the instances of proactive intervention in the HoloAssist multimodal benchmark for an expert agent guiding an user agent to complete procedural tasks. View details
    Preview abstract We discuss the challenges posed by growing machine learning workloads on datacenter networks and present how Google’s Jupiter network fabrics effectively support diverse traffic. View details
    GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning
    Sahiti Yerramilli
    Nilay Pande
    Rynaa Grover
    Jayant Tamarapalli
    (2025)
    Preview abstract This paper introduces GeoChain, a large-scale benchmark for evaluating step-by-step geographic reasoning in multimodal large language models (MLLMs). Leveraging 1.46 million Mapillary street-level images, GeoChain pairs each image with a 21-step chain-of-thought (CoT) question sequence (over 30 million Q&A pairs). These sequences guide models from coarse attributes to fine-grained localization across four reasoning categories - visual, spatial, cultural, and precise geolocation - annotated by difficulty. Images are also enriched with semantic segmentation (150 classes) and a visual locatability score. Our benchmarking of contemporary MLLMs (GPT-4.1 variants, Claude 3.7, Gemini 2.5 variants) on a diverse 2,088-image subset reveals consistent challenges: models frequently exhibit weaknesses in visual grounding, display erratic reasoning, and struggle to achieve accurate localization, especially as the reasoning complexity escalates. GeoChain offers a robust diagnostic methodology, critical for fostering significant advancements in complex geographic reasoning within MLLMs. View details
    Circadian rhythm of heart rate and activity: a cross-sectional study
    Maryam Khalid
    Logan Schneider
    Aravind Natarajan
    Conor Heneghan
    Karla Gleichauf
    Chronobiology International (2025)
    Preview abstract ABSTRACT Background: Circadian rhythms are commonly observed in a number of physiological processes. Consumer wearable devices have made it possible to obtain continuous time series data from a large number of individuals. We study circadian rhythms from measurements of heart rate, movement, and sleep, from a cohort of nearly 20,000 participants over the course of 30 days. Methods: Participation was restricted to Fitbit users of age 21 years or older residing in the United States or Canada. Participants were enrolled through a recruitment banner shown on the Fitbit App. The advertisement was shown to 531,359 Fitbit users, and 23,239 enrolled in the program. Of these, we obtained heart rate data from 19,350 participants. We obtain the underlying circadian rhythm from time series heart rate by modeling the circadian rhythm as a sum over the first two Fourier harmonics. The first Fourier harmonic accounts for the 24-hour rhythmicity, while the second harmonic accounts for non-sinusoidal perturbations. Findings: We observe a circadian rhythm in both heart rate and acceleration. From the diurnal modulation, we obtain the following circadian parameters: (i) amplitude of modulation, (ii) bathyphase, (iii) acrophase, (iv) non-sinusoidal fraction, and (v) fraction of day when the heart rate is greater than the mean. The amplitude, bathyphase, and acrophase depend on sex, and decrease with age. The waketime on average, follows the bathyphase by 2.4 hours. In most individuals, the circadian rhythm of heart rate lags the circadian rhythm of activity. Interpretation: Circadian metrics for heart rate and activity can be reliably obtained from commercially available wearable devices. Distributions of circadian metrics can be valuable tools for individual-level interpretation. View details
    Preview abstract Estimating Origin-Destination (OD) travel demand is vital for effective urban planning and traffic management. Developing universally applicable OD estimation methodologies is significantly challenged by the pervasive scarcity of high-fidelity traffic data and the difficulty in obtaining city-specific prior OD estimates (or seed ODs), which are often prerequisite for traditional approaches. Our proposed method directly estimates OD travel demand by systematically leveraging aggregated, anonymized statistics from Google Maps Traffic Trends, obviating the need for conventional census or city-provided OD data. The OD demand is estimated by formulating a single-level, one-dimensional, continuous nonlinear optimization problem with nonlinear equality and bound constraints to replicate highway path travel times. The method achieves efficiency and scalability by employing a differentiable analytical macroscopic network model. This model by design is computationally lightweight, distinguished by its parsimonious parameterization that requires minimal calibration effort and its capacity for instantaneous evaluation. These attributes ensure the method's broad applicability and practical utility across diverse cities globally. Using segment sensor counts from Los Angeles and San Diego highway networks, we validate our proposed approach, demonstrating a two-thirds to three-quarters improvement in the fit to segment count data over a baseline. Beyond validation, we establish the method's scalability and robust performance in replicating path travel times across diverse highway networks, including Seattle, Orlando, Denver, Philadelphia, and Boston. In these expanded evaluations, our method not only aligns with simulation-based benchmarks but also achieves an average 13% improvement in it's ability to fit travel time data compared to the baseline during afternoon peak hours. View details
    GOALIE (GOAL oriented IntErventions) Proactive Multimodal Agent to Assist Augmented Reality
    Saptarashmi Bandyopadhyay
    Vikas Bahirwani
    Lavisha Aggarwal
    Bhanu Guda
    Lin Li
    Qin Liu
    Tom Goldstein
    John Dickerson
    Andrea Colaco
    2025
    Preview abstract Multimodal AI Agents are helpful to assist and guide users in completing real-time tasks like cooking, robotics, manufacturing. An emerging form of multimodal communication is Augmented Reality (AR), where an AI Agent can enhance user experience with step-by-step guidance of tasks by observing the user's vision and language inputs. Current LLM or VLM based agents are reactive, waiting for an user query before responding. Proactive AI Agents in AR focus on detecting when the AI Agent should autonomously intervene to fix mistakes or followup any instruction. Our GOALIE (GOAL-oriented IntErvention) Agent is the first multimodal proactive AR agent which guides the user step-by-step on its own. We build an innovative Zero-Shot Prompting framework PSoS (Proactive Sequence of Steps) with the context of abstract past user actions, the agent's previous responses, and the user's granular goals and actions before it is detected that the AI Agent should intervene. We use PSoS for Supervised Finetuning (SFT), Direct Preference Optimization (DPO) and Group-Relative Policy Optimization (GRPO) finetuning of our AI agent to improve the quality of the agent's proactive intervention. We also propose a new algorithmic framework, Bagged group Relative Policy Optimization (BRPO), to reduce the variance in rewards of generation groups, to adapt the finetuning algorithm for multimodal proactive interventions by the AI Agent and to enable real-time finetuning of the AI model. We compare the step-by-step intervention quality and efficiency of the GOALIE Agent with Gemma-3 models along with other VLMs for task execution with human expert labels. We conduct human evaluation of the proactive interventions, demonstrating user satisfaction with the GOALIE Agent's proactive interventions. We will release the code, model and human evaluation data. View details
    Supporting the Digital Safety of At-Risk Users: Lessons Learned from 9+ Years of Research and Training
    Tara Matthews
    Patrick Gage Kelley
    Lea Kissner
    Andreas Kramm
    Andrew Oplinger
    Andy Schou
    Stephan Somogyi
    Dalila Szostak
    Jill Woelfer
    Lawrence You
    Izzie Zahorian
    ACM Transactions on Computer-Human Interaction, 32(3) (2025), pp. 1-39
    Preview abstract Creating information technologies intended for broad use that allow everyone to participate safely online—which we refer to as inclusive digital safety—requires understanding and addressing the digital-safety needs of a diverse range of users who face elevated risk of technology-facilitated attacks or disproportionate harm from such attacks—i.e., at-risk users. This article draws from more than 9 years of our work at Google to understand and support the digital safety of at-risk users—including survivors of intimate partner abuse, people involved with political campaigns, content creators, youth, and more—in technology intended for broad use. Among our learnings is that designing for inclusive digital safety across widely varied user needs and dynamic contexts is a wicked problem with no “correct” solution. Given this, we describe frameworks and design principles we have developed to help make at-risk research findings practically applicable to technologies intended for broad use and lessons we have learned about communicating them to practitioners. View details
    Preview abstract This IEEE Spectrum article reflects on advocacy for U.S. technological leadership during my Congressional visit through IEEE-USA. Leading an expert group of other distinguished IEEE members, we urged lawmakers to support critical initiatives. Key priorities included sustained funding for federal research institutions like NIST, NASA, and the NSF, reauthorizing the SBIR/STTR programs vital for small business innovation, and passing the CREATE AI Act to democratize AI resources by establishing the National AI Research Resource (NAIRR). We also emphasized strengthening the STEM talent pipeline through the CHIPS and Science Act and expanding high-skilled immigrant visas. We highlighted rapid AI advancements, such as autonomous vehicles, the surge in FDA-approved AI based medical devices, as underscoring the need for these strategic investments and policy actions. The article conveys a sense of urgency, calling for concrete congressional action to ensure the U.S. maintains its technological edge while also sharing my personal experiences. View details
    Data-Driven Mechanism Design: Jointly Eliciting Preferences and Information
    Dirk Bergemann
    Marek Bojko
    Paul Duetting
    Haifeng Xu
    EC '25: Proceedings of the 26th ACM Conference on Economics and Computation (2025), pp. 507
    Preview abstract We study mechanism design when agents have private preferences and private information about a common payoff-relevant state. We show that standard message-driven mechanisms cannot implement socially efficient allocations when agents have multidimensional types, even under favorable conditions. To overcome this limitation, we propose data-driven mechanisms that leverage additional post-allocation information, modeled as an estimator of the payoff-relevant state. Our data-driven mechanisms extend the classic Vickrey-Clarke-Groves class. We show that they achieve exact implementation in posterior equilibrium when the state is either fully revealed or the utility is affine in an unbiased estimator. We also show that they achieve approximate implementation with a consistent estimator, converging to exact implementation as the estimator converges, and present bounds on the convergence rate. We demonstrate applications to digital advertising auctions and large language model (LLM)-based mechanisms, where user engagement naturally reveals relevant information. View details
    Preview abstract Generative AI's potential for hallucinations and inaccuracies are by far the most discussed limitation in AI-assisted software development. But, whether developers have other concerns about using generative AI in their coding practice has not been thoroughly explored. This article describes the results of in-depth interviews with developers about their other concerns about generative AI in coding, beyond the tools accuracy, and discusses related policy implications for organizations developing software. View details
    Preview abstract Large Language Models (LLMs) are revolutionizing many areas of AI, but their substantial resource requirements limit their deployment on mobile and edge devices. This survey paper provides a comprehensive overview of techniques for compressing LLMs to enable efficient inference in resource-constrained environments. We examine three primary approaches: knowledge distillation, model quantization and model pruning. For each technique, we discuss the underlying principles, present different forms, and provide examples of successful applications. We also briefly discuss complementary techniques like mixture-of-experts and early exit strategies and highlight the promising future directions. We aim to provide a valuable resource for both researchers and practitioners seeking to optimize LLMs for edge deployment. To the best of our knowledge, this is the first paper that provides a focused survey of LLM compression techniques from the lens of resource-constrained environments. View details
    Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
    Zilong Wang
    Steven Zheng
    Swaroop Mishra
    Yuwei Zhang
    Anush Mattapalli
    Ankur Taly
    Jingbo Shang
    ICLR 2025
    Preview abstract Retrieval augmented generation (RAG) has attracted a lot of attention across both academia and industry due to its capability in inserting timely and accurate evidence to the generation by large language models. However, the introduction of retrieved evidence largely makes the input prompt longer, which would harm the understanding quality of large language models and make it slower in actual usage scenarios. To solve these issues, we propose SpeculativeRAG, which leverages a smaller LLM to conduct the retrieval augmented generation for a larger LLM. The smaller LLM can digest a few pieces of evidence and generate multiple pieces of drafts in parallel rapidly, and these drafts will be verified by a large LLM to guarantee the quality. We achieve a higher speed as well as a better quality in the RAG results. View details
    Procurement Auctions via Approximate Submodular Optimization
    Amin Karbasi
    Grigoris Velegkas
    Forty-second International Conference on Machine Learning (2025)
    Preview abstract We study the problem of procurement auctions, in which an auctioneer seeks to acquire services from a group of strategic sellers with private costs. The quality of the services is measured through some \emph{submodular} function that is known to the auctioneer. Our goal is to design \emph{computationally efficient} procurement auctions that (approximately) maximize the difference between the quality of the acquired services and the total cost of the sellers, in a way that is incentive compatible (IC) and individual rational (IR) for the sellers, and generates non-negative surplus (NAS) for the auctioneer. Leveraging recent results from the literature of \emph{non-positive} submodular function maximization, we design computationally efficient frameworks that transform submodular function optimization algorithms to \emph{mechanisms} that are IC and IR for the sellers, NAS for the auctioneer, and \emph{approximation-preserving}. Our frameworks are general and work both in the \emph{offline} setting where the auctioneer can observe the bids and the services of all the sellers simultaneously, and in the \emph{online} setting where the sellers arrive in an adversarial order and the auctioneer has to make an irrevocable decision whether to purchase their service or not. We further investigate whether it is possible to convert state-of-art submodular optimization algorithms into a descending auction. We focurs in the adversarial setting, meaning that the schedule of the descending prices is determined by an advesary. We show that a submodular optimization algorithm satisfying bi-criteria $(\alpha, 1)$-approximation in welfare can be effectively converted to a descending auction in the adversarial setting in if and only if $\alpha \leq \frac 1 2$. Our result highlights the importance of a carefully designed schedule of descending prices to effectively convert a submodular optimization algorithm satisfying bi-criteria $(\alpha, 1)$-approximation in welfare with $\alpha > \frac 1 2$ to a descending auction. We also further establish a connection between descending auctions and online submodular optimization algorithms. We demonstrate the practical applications of our frameworks by instantiating them with different state-of-the-art submodular optimization algorithms and comparing their welfare performance through empirical experiments on publicly available datasets that consist of thousands of sellers. View details
    Calibration Properties of Time-Series Foundation Models: An Empirical Analysis
    Coen Adler
    Samar Abdi
    Yuxin Chang
    Padhraic Smyth
    2025
    Preview abstract Recent development of foundation models for time series data has generated considerable interest in using such models across a variety of applications. Although they achieve state-of-the-art predictive performance, the ability to produce well-calibrated probabilistic distributions is critical for practical applications and is relatively underexplored. In this paper, we investigate the calibration-related properties of five recent time series foundation models and two competitive baselines. We perform systematic evaluations and identify significant variation in calibration performances across models. View details