Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 10492 publications
    Preview abstract Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their inability to retain and retrieve relevant information from long-term interactions limits their effectiveness in applications requiring sustained personalization. External memory mechanisms have been proposed to address this limitation, enabling LLMs to maintain conversational continuity. However, existing approaches struggle with two key challenges. First, rigid memory granularity fails to capture the natural semantic structure of conversations, leading to fragmented and incomplete representations. Second, fixed retrieval mechanisms cannot adapt to diverse dialogue contexts and user interaction patterns. In this work, we propose Reflective Memory Management (RMM), a novel mechanism for long-term dialogue agents, integrating forward- and backward-looking reflections: (1) Prospective Reflection, which dynamically summarizes interactions across granularities—utterances, turns, and sessions—into a personalized memory bank for effective future retrieval, and (2) Retrospective Reflection, which iteratively refines the retrieval in an online reinforcement learning (RL) manner based on LLMs’ cited evidence. Experiments show that RMM demonstrates consistent improvement across various metrics and benchmarks. For example, RMM shows more than 10% accuracy improvement over the baseline without memory management on the LongMemEval dataset. View details
    Preview abstract We discuss the challenges posed by growing machine learning workloads on datacenter networks and present how Google’s Jupiter network fabrics effectively support diverse traffic. View details
    Closing the AI generalisation gap by adjusting for dermatology condition distribution differences across clinical settings
    Rajeev Rikhye
    Aaron Loh
    Grace Hong
    Margaret Ann Smith
    Vijaytha Muralidharan
    Doris Wong
    Michelle Phung
    Nicolas Betancourt
    Bradley Fong
    Rachna Sahasrabudhe
    Khoban Nasim
    Alec Eschholz
    Basil Mustafa
    Jan Freyberg
    Terry Spitz
    Kat Chou
    Peggy Bui
    Justin Ko
    Steven Lin
    The Lancet eBioMedicine (2025)
    Preview abstract Background: Generalisation of artificial intelligence (AI) models to a new setting is challenging. In this study, we seek to understand the robustness of a dermatology (AI) model and whether it generalises from telemedicine cases to a new setting including both patient-submitted photographs (“PAT”) and clinician-taken photographs in-clinic (“CLIN”). Methods: A retrospective cohort study involving 2500 cases previously unseen by the AI model, including both PAT and CLIN cases, from 22 clinics in the San Francisco Bay Area, spanning November 2015 to January 2021. The primary outcome measure for the AI model and dermatologists was the top-3 accuracy, defined as whether their top 3 differential diagnoses contained the top reference diagnosis from a panel of dermatologists per case. Findings: The AI performed similarly between PAT and CLIN images (74% top-3 accuracy in CLIN vs. 71% in PAT), however, dermatologists were more accurate in PAT images (79% in CLIN vs. 87% in PAT). We demonstrate that demographic factors were not associated with AI or dermatologist errors; instead several categories of conditions were associated with AI model errors (p < 0.05). Resampling CLIN and PAT to match skin condition distributions to the AI development dataset reduced the observed differences (AI: 84% CLIN vs. 79% PAT; dermatologists: 77% CLIN vs. 89% PAT). We demonstrate a series of steps to close the generalisation gap, requiring progressively more information about the new dataset, ranging from the condition distribution to additional training data for rarer conditions. When using additional training data and testing on the dataset without resampling to match AI development, we observed comparable performance from end-to-end AI model fine tuning (85% in CLIN vs. 83% in PAT) vs. fine tuning solely the classification layer on top of a frozen embedding model (86% in CLIN vs. 84% in PAT). Interpretation: AI algorithms can be efficiently adapted to new settings without additional training data by recalibrating the existing model, or with targeted data acquisition for rarer conditions and retraining just the final layer. View details
    Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models
    Fei Wang
    The Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) (2025) (to appear)
    Preview abstract Retrieval-Augmented Generation (RAG), while effective in integrating external knowledge to address the limitations of large language models (LLMs), can be undermined by imperfect retrieval, which may introduce irrelevant, misleading, or even malicious information. Despite its importance, previous studies have rarely explored the behavior of RAG through joint analysis on how errors from imperfect retrieval attribute and propagate, and how potential conflicts arise between the LLMs' internal knowledge and external sources. We find that imperfect retrieval augmentation might be inevitable and quite harmful, through controlled analysis under realistic conditions. We identify the knowledge conflicts between LLM-internal and external knowledge from retrieval as a bottleneck to overcome in the post-retrieval stage of RAG. To render LLMs resilient to imperfect retrieval, we propose Astute RAG, a novel RAG approach that adaptively elicits essential information from LLMs' internal knowledge, iteratively consolidates internal and external knowledge with source-awareness, and finalizes the answer according to information reliability. Our experiments using Gemini and Claude demonstrate that Astute RAG significantly outperforms previous robustness-enhanced RAG methods. Notably, Astute RAG is the only approach that matches or exceeds the performance of LLMs without RAG under worst-case scenarios. Further analysis reveals that Astute RAG effectively resolves knowledge conflicts, improving the reliability and trustworthiness of RAG systems. View details
    Zero-Shot Image Moderation in Google Ads with LLM-Assisted Textual Descriptions and Cross-modal Co-embeddings
    Jimin Li
    Eric Xiao
    Katie Warren
    Enming Luo
    Krishna Viswanathan
    Ariel Fuxman
    Bill Li
    Yintao Liu
    (2025)
    Preview abstract We present a scalable and agile approach for ads image content moderation at Google, addressing the challenges of moderating massive volumes of ads with diverse content and evolving policies. The proposed method utilizes human-curated textual descriptions and cross-modal text-image co-embeddings to enable zero-shot classification of policy violating ads images, bypassing the need for extensive supervised training data and human labeling. By leveraging large language models (LLMs) and user expertise, the system generates and refines a comprehensive set of textual descriptions representing policy guidelines. During inference, co-embedding similarity between incoming images and the textual descriptions serves as a reliable signal for policy violation detection, enabling efficient and adaptable ads content moderation. Evaluation results demonstrate the efficacy of this framework in significantly boosting the detection of policy violating content. View details
    GOALIE (GOAL oriented IntErventions) Proactive Multimodal Agent to Assist Augmented Reality
    Saptarashmi Bandyopadhyay
    Vikas Bahirwani
    Lavisha Aggarwal
    Bhanu Guda
    Lin Li
    Qin Liu
    Tom Goldstein
    John Dickerson
    Andrea Colaco
    2025
    Preview abstract Multimodal AI Agents are helpful to assist and guide users in completing real-time tasks like cooking, robotics, manufacturing. An emerging form of multimodal communication is Augmented Reality (AR), where an AI Agent can enhance user experience with step-by-step guidance of tasks by observing the user's vision and language inputs. Current LLM or VLM based agents are reactive, waiting for an user query before responding. Proactive AI Agents in AR focus on detecting when the AI Agent should autonomously intervene to fix mistakes or followup any instruction. Our GOALIE (GOAL-oriented IntErvention) Agent is the first multimodal proactive AR agent which guides the user step-by-step on its own. We build an innovative Zero-Shot Prompting framework PSoS (Proactive Sequence of Steps) with the context of abstract past user actions, the agent's previous responses, and the user's granular goals and actions before it is detected that the AI Agent should intervene. We use PSoS for Supervised Finetuning (SFT), Direct Preference Optimization (DPO) and Group-Relative Policy Optimization (GRPO) finetuning of our AI agent to improve the quality of the agent's proactive intervention. We also propose a new algorithmic framework, Bagged group Relative Policy Optimization (BRPO), to reduce the variance in rewards of generation groups, to adapt the finetuning algorithm for multimodal proactive interventions by the AI Agent and to enable real-time finetuning of the AI model. We compare the step-by-step intervention quality and efficiency of the GOALIE Agent with Gemma-3 models along with other VLMs for task execution with human expert labels. We conduct human evaluation of the proactive interventions, demonstrating user satisfaction with the GOALIE Agent's proactive interventions. We will release the code, model and human evaluation data. View details
    Preview abstract Many everyday tasks ranging from fixing appliances, cooking recipes to car maintenance require expert knowledge, especially when tasks are complex and multi-step. Despite growing interest in AI agents, there is a scarcity of dialogue-video datasets grounded for real world task assistance. In this paper, we propose a simple yet effective approach that transforms single-person instructional videos into task-guidance two-person dialogues, aligned with fine grained steps and video-clips. Our fully automatic approach, powered by large language models, offers an efficient alternative to the substantial cost and effort required for manual data collection. Using this technique, we build HowToDIV, a large-scale dataset containing 507 conversations, 6636 question-answer pairs and 24 hours of videoclips across diverse tasks in cooking, mechanics, and planting. Each session includes multi-turn conversation where an expert teaches a novice user how to perform a task step by step, while observing user's surrounding through a camera and microphone equipped wearable device. We establish the baseline benchmark performance on HowToDIV dataset through Gemma-3 model, for future research on this new task of dialogues for procedural-task assistance. Our dataset and code are publicly available at our project page: https://github.com/google/howtodiv. View details
    Fast ACS: Low-Latency File-Based Ordered Message Delivery at Scale
    Anil Raghunath Iyer
    Neel Bagora
    Chang Yu
    Olivier Pomerleau
    Vivek Kumar
    Prunthaban Kanthakumar
    Usenix Annual Technical Conference (2025)
    Preview abstract Low-latency message delivery is crucial for real-time systems. Data originating from a producer must be delivered to consumers, potentially distributed in clusters across metropolitan and continental boundaries. With the growing scale of computing, there can be several thousand consumers of the data. Such systems require a robust messaging system capable of transmitting messages containing data across clusters and efficiently delivering them to consumers. The system must offer guarantees like ordering and at-least-once delivery while avoiding overload on consumers, allowing them to consume messages at their own pace. This paper presents the design of Fast ACS (an abbreviation for Ads Copy Service), a file-based ordered message delivery system that leverages a combination of two-sided (inter-cluster) and one-sided (intra-cluster) communication primitives—namely, Remote Procedure Call and Remote Direct Memory Access, respectively—to deliver messages. The system has been successfully deployed to dozens of production clusters and scales to accommodate several thousand consumers within each cluster, which amounts to Tbps-scale intra-cluster consumer traffic at peak. Notably, Fast ACS delivers messages to consumers across the globe within a few seconds or even sub-seconds (p99) based on the message volume and consumer scale, at a low resource cost. View details
    Preview abstract Initially conceived as a way to explain memory sharing in romantic couples, the concept of transactive memory systems (TMS) has been adopted by organizational psychology, information management, and other fields of study to examine team performance in corporate settings. While findings highlight a clear advantage for humans teams with TMS, it's not evident if AI-human teams could also develop such a psychological dynamic. This paper considers AI-human interaction through the lens of TMS and identifies potential opportunities for improvement in this area. View details
    Preview abstract Delay monitoring is a commonly arising problem in applications such as queue management systems, scheduling, and traffic monitoring. Motivated by such applications, we formulate a queue monitoring problem, where there is a FIFO queue with arbitrary arrivals and departures, and a server needs to monitor the length of a queue by using (decentralized) pings from packets in the queue. Packets can send pings informing the server about the number of packets ahead of them in the queue. Via novel online algorithms and lower bounds, we tightly characterize the trade-off between the number of pings sent and the accuracy of the server's real time estimates. Further, our approximate estimates can be made to be accurate to an arbitrary precision. View details
    Collaborative Diffusion Model for Recommender System
    Gyuseok Lee
    Yaochen Zhu
    Hwanjo Yu
    Yao Zhou
    Jundong Li
    2025
    Preview abstract Diffusion-based recommender systems (DR) have gained increasing attention for their advanced generative and denoising capabilities. However, existing DR face two central limitations: (i) a trade-off between enhancing generative capacity via noise injection and retaining the loss of personalized information. (ii) the underutilization of rich item-side information. To address these challenges, we present a Collaborative Diffusion model for Recommender System (CDiff4Rec). Specifically, CDiff4Rec generates pseudo-users from item features and leverages collaborative signals from both real and pseudo personalized neighbors identified through behavioral similarity, thereby effectively reconstructing nuanced user preferences. Experimental results on three public datasets show that CDiff4Rec outperforms competitors by effectively mitigating the loss of personalized information through the integration of item content and collaborative signals. View details
    Preview abstract Generative AI (GenAI), particularly Large Language Models (LLMs), offer powerful capabilities for interpreting the complex data landscape in healthcare. In this paper, we present a comprehensive overview of the capabilities, requirements and applications of GenAI for deriving clinical insights and improving clinical efficiency. We first provide some background on the forms and sources of patient data, namely real-time Remote Patient Monitoring (RPM) streams and traditional Electronic Health Records (EHR). The sheer volume and heterogeneity of this combined data present significant challenges to clinicians and contribute to information overload. In addition, we explore the potential of LLM-powered applications for improving clinical efficiency. These applications can enhance navigation of longitudinal patient data and provide actionable clinical decision support through natural language dialogue. We discuss the opportunities this presents for streamlining clinician workflows and personalizing care, alongside critical challenges such as data integration complexity, ensuring data quality and RPM data reliability, maintaining patient privacy, validating AI outputs for clinical safety, mitigating bias, and ensuring clinical acceptance. We believe this work represents the first summarization of GenAI techniques for managing clinician data overload due to combined RPM / EHR data complexities. View details
    On the Design of the Binaural Rendering Library for Eclipsa Audio Immersive Audio Container
    Tomasz Rudzki
    Gavin Kearney
    AES 158th Convention of the Audio Engineering Society (2025)
    Preview abstract Immersive Audio Media and Formats (IAMF), also known as Eclipsa Audio, is an open-source audio container developed to accommodate multichannel and scene-based audio formats. Headphone-based delivery of IAMF audio requires efficient binaural rendering. This paper introduces the Open Binaural Renderer (OBR), which is designed to render IAMF audio. It discusses the core rendering algorithm, the binaural filter design process as well as real-time implementation of the renderer in a form of an open-source C++ rendering library. Designed for multi-platform compatibility, the renderer incorporates a novel approach to binaural audio processing, leveraging a combination of spherical harmonic (SH) based virtual listening room model and anechoic binaural filters. Through its design, the IAMF binaural renderer provides a robust solution for delivering high-quality immersive audio across diverse platforms and applications. View details
    Preview abstract The dominant paradigm in image retrieval systems today is to search large databases using global image features, and re-rank those initial results with local image feature matching techniques. This design, dubbed \emph{global-to-local}, stems from the computational cost of local matching approaches, which can only be afforded for a small number of retrieved images. However, emerging efficient local feature search approaches have opened up new possibilities, in particular enabling detailed retrieval at large scale, to find partial matches which are often missed by global feature search. In parallel, global feature-based re-ranking has shown promising results with high computational efficiency. In this work, we leverage these building blocks to introduce a \emph{local-to-global} retrieval paradigm, where efficient local feature search meets effective global feature re-ranking. Critically, we propose a re-ranking method where global features are computed on-the-fly, based on the local feature retrieval similarities. Such re-ranking-only global features, dubbed \emph{similarity embeddings}, leverage multidimensional scaling techniques to create embeddings which respect the local similarities obtained during search, enabling a significant re-ranking boost. Experimentally, we demonstrate unprecedented retrieval performance on the Revisited Oxford and Paris datasets, setting new state-of-the-art results. View details
    Quantum simulation with sum-of-squares spectral amplification
    Robbie King
    Guang Hao Low
    Rolando Somma
    arXiv:2505.01528 (2025)
    Preview abstract We introduce sum-of-squares spectral amplification (SOSSA), a framework for improving quantum simulation algorithms relevant to low-energy problems. SOSSA first represents the Hamiltonian as a sum-of-squares and then applies spectral amplification to amplify the low-energy spectrum. The sum-of-squares representation can be obtained using semidefinite programming. We show that SOSSA can improve the efficiency of traditional methods in several simulation tasks involving low-energy states. Specifically, we provide fast quantum algorithms for energy and phase estimation that improve over the state-of-the-art in both query and gate complexities, complementing recent results on fast time evolution of low-energy states. To further illustrate the power of SOSSA, we apply it to the Sachdev-Ye-Kitaev model, a representative strongly correlated system, where we demonstrate asymptotic speedups by a factor of the square root of the system size. Notably, SOSSA was recently used in [G.H. Low \textit{et al.}, arXiv:2502.15882 (2025)] to achieve state-of-art costs for phase estimation of real-world quantum chemistry systems. View details