Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10595 publications
Towards Conversational AI for Disease Management
Khaled Saab
David Stutz
Kavita Kulkarni
Sara Mahdavi
Joelle Barral
James Manyika
Ryutaro Tanno
Adam Rodman
arXiv (2025)
Preview abstract
While large language models (LLMs) have shown promise in diagnostic dialogue, their capabilities for effective management reasoning - including disease progression, therapeutic response, and safe medication prescription - remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE) through a new LLM-based agentic system optimised for clinical management and dialogue, incorporating reasoning over the evolution of disease and multiple patient visit encounters, response to therapy, and professional competence in medication prescription. To ground its reasoning in authoritative clinical knowledge, AMIE leverages Gemini's long-context capabilities, combining in-context retrieval with structured reasoning to align its output with relevant and up-to-date clinical practice guidelines and drug formularies. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) study, AMIE was compared to 21 primary care physicians (PCPs) across 100 multi-visit case scenarios designed to reflect UK NICE Guidance and BMJ Best Practice guidelines. AMIE was non-inferior to PCPs in management reasoning as assessed by specialist physicians and scored better in both preciseness of treatments and investigations, and in its alignment with and grounding of management plans in clinical guidelines. To benchmark medication reasoning, we developed RxQA, a multiple-choice question benchmark derived from two national drug formularies (US, UK) and validated by board-certified pharmacists. While AMIE and PCPs both benefited from the ability to access external drug information, AMIE outperformed PCPs on higher difficulty questions. While further research would be needed before real-world translation, AMIE's strong performance across evaluations marks a significant step towards conversational AI as a tool in disease management.
View details
Preview abstract
We discuss the challenges posed by growing machine learning workloads on
datacenter networks and present how Google’s Jupiter network fabrics effectively support
diverse traffic.
View details
Preview abstract
The dominant paradigm in image retrieval systems today is to search large databases using global image features, and re-rank those initial results with local image feature matching techniques.
This design, dubbed \emph{global-to-local}, stems from the computational cost of local matching approaches, which can only be afforded for a small number of retrieved images.
However, emerging efficient local feature search approaches have opened up new possibilities, in particular enabling detailed retrieval at large scale, to find partial matches which are often missed by global feature search.
In parallel, global feature-based re-ranking has shown promising results with high computational efficiency.
In this work, we leverage these building blocks to introduce a \emph{local-to-global} retrieval paradigm, where efficient local feature search meets effective global feature re-ranking.
Critically, we propose a re-ranking method where global features are computed on-the-fly, based on the local feature retrieval similarities.
Such re-ranking-only global features, dubbed \emph{similarity embeddings}, leverage multidimensional scaling techniques to create embeddings which respect the local similarities obtained during search, enabling a significant re-ranking boost.
Experimentally, we demonstrate unprecedented retrieval performance on the Revisited Oxford and Paris datasets, setting new state-of-the-art results.
View details
Preview abstract
During remote communication, participants often share both digital and physical content, such as product designs, digital assets, and environments, to enhance mutual understanding. Recent advances in augmented communication have facilitated users to swiftly create and share digital 2D copies of physical objects from video feeds into a shared space. However, conventional 2D representations of digital objects limits spatial referencing in immersive environments. To address this, we propose Thing2Reality, an Extended Reality (XR) meeting platform that facilitates spontaneous discussions of both digital and physical items during remote sessions. With Thing2Reality, users can quickly materialize ideas or objects in immersive environments and share them as conditioned multiview renderings or 3D Gaussians. Thing2Reality enables users to interact with remote objects or discuss concepts in a collaborative manner. Our user studies revealed that the ability to interact with and manipulate 3D representations of objects significantly enhances the efficiency of discussions, with the potential to augment discussion of 2D artifacts.
View details
Ransomware over Modern Web Browsers: A Novel Strain and A New Defense Mechanism
Harun Oz
Ahmet Aris
Leonardo Babun
Selcuk Uluagac
Abbas Acar
ACM Transactions on the Web (2025)
Preview abstract
Ransomware is an increasingly prevalent form of malware targeting end-users, governments, and businesses. As it has evolved,
adversaries added new capabilities to their arsenal. Throughout the ransomware evolution, the adversaries propose a next-generation
browser-based ransomware, RøB, that performs its malicious actions via emerging web technologies, File System Access API (FSA) and
WebAssembly (Wasm). RøB uses this API through the victims’ browsers; hence, it does not require the victims to download and install
malicious binaries. We performed extensive evaluations with 3 different OSs, 23 file formats, 29 distinct directories, 5 cloud providers,
and 4 antivirus solutions. Our evaluations show that RøB can encrypt various types of files in the local and cloud-integrated directories,
external storage devices, and network-shared folders of victims. Our experiments also reveal that popular cloud solutions, Box
Individual and Apple iCloud can be severely affected by RøB. Moreover, we conducted tests with commercial antivirus software such
as AVG, Avast, Kaspersky, Malware Bytes that perform sensitive directory and suspicious behavior monitoring against ransomware.
We verified that RøB can evade these antivirus software and encrypt victim files. Moreover, existing ransomware detection solutions
in the literature also cannot be a remedy against RøB due to its distinct features. Therefore, in this paper, we also propose broguard,
a new detection system for RøB-like attacks. broguard monitors the web applications that use the FSA API via function hooking and
uses a machine learning classifier to detect RøB-like attacks in real-time without any file loss. Performance evaluations of broguard
on a comprehensive dataset show that broguard can detect RøB-like browser-based ransomware attacks with over 99% accuracy and
minimal overhead.
View details
Leveraging Per-Example Privacy for Machine Unlearning
Nazanin Mohammadi Sepahvand
Anvith Thudi
Ashmita Bhattacharyya
Nicolas Papernot
Eleni Triantafillou
Daniel M. Roy
Karolina Dziugaite
International Conference on Machine Learning (ICML) (2025)
Preview abstract
This work focuses on developing fine-grained theoretical insights to quantify unlearning difficulty at the level of individual data points for fine-tuning-based unlearning. Unlike other unlearning methods that lack theoretical guarantees for non-convex models, our approach builds on recent advances in differential privacy to provide per-instance guarantees using Rényi divergence. While our theoretical analysis applies to Langevin dynamics, we empirically demonstrate that the derived guarantees—and their trends—continue to hold for fine-tuning, even in the absence of explicit noise. Our results show that per-instance privacy levels computed from training dynamics reliably predict unlearning difficulty, offering a principled and practical way to assess unlearning performance. Furthermore, our method identifies harder-to-unlearn data more effectively than existing heuristics, providing a more precise tool for guiding unlearning strategies. These findings pave the way for adaptive and efficient unlearning methods tailored to the properties of specific data points.
View details
DORA 2025 State of AI-assisted Software Development Report
Derek DeBellis
Matt Beane
Edward Fraser
Ben Good
Eirini Kalliamvakou
Gene Kim
Daniella Villalba
DORA, Google (2025)
Preview abstract
In 2025, the central question for technology leaders is no longer if they should adopt AI, but how to realize its value. DORA’s research includes more than 100 hours of qualitative data and survey responses from nearly 5,000 technology professionals from around the world. The research reveals a critical truth: AI’s primary role in software development is that of an amplifier. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.
View details
Preview abstract
In this article, we describe our human-centered research focused on understanding the role of collaboration and teamwork in productive software development. We describe creation of a logs-based metric to identify collaboration through observable events and a survey-based multi-item scale to assess team functioning.
View details
Preview abstract
Browser fingerprinting is an online tracking technique that is being increasingly adopted for profiling and ad targeting purposes. While prior work has analyzed the prevalence and impact of browser fingerprinting on the Web, they have traditionally relied on large-scale automated crawls. Naturally, these cannot replicate real-human interactions, e.g., solve CAPTCHAs, evade bot detectors, or operate behind login pages and paywalls. This prompts the question as to whether or not the fingerprinting ecosystem is appreciably different in real-world browsing sessions. In this paper, we begin to address this question by designing and conducting a user study aimed at collecting actual telemetry data from real browsing sessions of 30 users.
We find that almost half of the fingerprinting websites identified from real user browsing sessions are missed by equivalent automated crawls. This is mainly due to the inability of automated crawls to identify and visit authentication pages, being blocked by bot detectors, and/or failing to perform user interactions that specifically trigger browser fingerprinting scripts. We also find new fingerprinting vectors that are consistently present in fingerprinting scripts captured by real user browsing sessions yet missing from automated crawls. Finally, we assess the feasibility of collecting fingerprinting training data in a privacy-preserving way. We conclude that private models built on real user browsing sessions can detect browser fingerprinting more effectively than models trained on automated crawls alone, while simultaneously providing strong privacy guarantees to users.
View details
Scalable Private Partition Selection via Adaptive Weighting
Justin Y. Chen
Forty-second International Conference on Machine Learning (2025)
Preview abstract
In the differentially private partition selection problem (a.k.a. private set union, private key discovery), users hold subsets of items from an unbounded universe. The goal is to output as many items as possible from the union of the users' sets while maintaining user-level differential privacy. Solutions to this problem are a core building block for many privacy-preserving ML applications including vocabulary extraction in a private corpus, computing statistics over categorical data and learning embeddings over user-provided items.
We propose an algorithm for this problem, MaxAdaptiveDegree(MAD), which adaptively reroutes weight from items with weight far above the threshold needed for privacy to items with smaller weight, thereby increasing the probability that less frequent items are output. Our algorithm can be efficiently implemented in massively parallel computation systems allowing scalability to very large datasets. We prove that our algorithm stochastically dominates the standard parallel algorithm for this problem. We also develop a two-round version of our algorithm, MAD2R, where results of the computation in the first round are used to bias the weighting in the second round to maximize the number of items output. In experiments, our algorithms provide the best results across the board among parallel algorithms and scale to datasets with hundreds of billions of items, up to three orders of magnitude larger than those analyzed by prior sequential algorithms.
View details
Shadow Hamiltonian Simulation
Rolando Somma
Robbie King
Tom O'Brien
Nature Communications, 16 (2025), pp. 2690
Preview abstract
Simulating quantum dynamics is one of the most important applications of quantum computers. Traditional approaches for quantum simulation involve preparing the full evolved state of the system and then measuring some physical quantity. Here, we present a different and novel approach to quantum simulation that uses a compressed quantum state that we call the "shadow state". The amplitudes of this shadow state are proportional to the time-dependent expectations of a specific set of operators of interest, and it evolves according to its own Schrödinger equation. This evolution can be simulated on a quantum computer efficiently under broad conditions. Applications of this approach to quantum simulation problems include simulating the dynamics of exponentially large systems of free fermions or free bosons, the latter example recovering a recent algorithm for simulating exponentially many classical harmonic oscillators. These simulations are hard for classical methods and also for traditional quantum approaches, as preparing the full states would require exponential resources. Shadow Hamiltonian simulation can also be extended to simulate expectations of more complex operators such as two-time correlators or Green's functions, and to study the evolution of operators themselves in the Heisenberg picture.
View details
Reasoning-SQL: Reinforcement Learning with Partial Rewards for Reasoning-Enhanced Text-to-SQL
Mohammadreza Pourreza
Shayan Talaei
Hailong Li
Azalia Mirhoseini
Amin Saberi
Conference on Language Modeling (COLM) (2025) (to appear)
Preview abstract
Text-to-SQL is a challenging task involving multiple reasoning-intensive subtasks, including natural language understanding, database schema comprehension, and precise SQL query formulation. Existing approaches often rely on handcrafted reasoning paths with inductive biases that can limit their overall effectiveness. Motivated by the recent success of reasoning-enhanced models such as DeepSeek R1 and OpenAI o1, which effectively leverage reward-driven self-exploration to enhance reasoning capabilities and generalization, we propose a novel set of partial rewards tailored specifically for the Text-to-SQL task. Our reward set includes schema-linking, AI feedback, n-gram similarity, and syntax check, explicitly designed to address the reward sparsity issue prevalent in reinforcement learning (RL). Leveraging group relative policy optimization (GRPO), our approach explicitly encourages large language models (LLMs) to develop intrinsic reasoning skills necessary for accurate SQL query generation. With models of different sizes, we demonstrate that RL-only training with our proposed rewards consistently achieves higher accuracy and superior generalization compared to supervised fine-tuning (SFT). Remarkably, our RL-trained 14B-parameter model significantly outperforms larger proprietary models, e.g. o3-mini by 4% and Gemini-1.5-Pro-002 by 3% on the BIRD benchmark. These highlight the efficacy of our proposed RL-training framework with partial rewards for enhancing both accuracy and reasoning capabilities in Text-to-SQL tasks.
View details
Preview abstract
Virtual hand representation in Head-Mounted Displays (HMDs) offers immersive and intuitive interactions in Virtual Reality (VR). However, current hand tracking algorithms are prone to errors, which can disrupt the user experience and hinder task performance. This paper presents a novel method for providing users with visual feedback when the quality of hand tracking decreases. Our approach employs a notification modal that warns users of potential failures. We identified three common hand tracking failure scenarios and evaluated the effectiveness of our method in two distinct VR tasks: object manipulation and complex assembly tasks. Results show that our early warning system reduces task completion time, lowers hand-tracking failures by up to 83%, decreases errors, improves system usability, and reduces cognitive load. This work contributes to the development of more robust and user-friendly VR HMD applications by enhancing hand tracking reliability, usability, and workload.
View details
Preview abstract
The quest to identify quantum advantages, where quantum physics truly outperforms classical physics, lies at the heart of quantum technology. While quantum devices promise extraordinary capabilities, from exponential computational speedups to unprecedented measurement precision, distinguishing genuine advantages from mere illusions remains a formidable challenge. In this endeavor, quantum theorists are like prophets trying to foretell a future where quantum technologies reign supreme. Yet, the boundary between visionary insight and unfounded fantasy is perilously thin. In this perspective, we explore the properties defining an ideal quantum advantage and examine our mathematical tools for navigating the vast world of quantum advantages across computation, learning, sensing, communication, and beyond. We show that some quantum advantages are inherently unpredictable using classical resources alone, suggesting a landscape far richer than what we can currently foresee. While mathematical rigor remains our indispensable guide in this exploration, the ultimate power of quantum technologies may emerge from the quantum advantages we cannot yet conceive.
View details
Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers
Gilad Yehudai
Maya Bechler-Speicher
Orr Fischer
Ran Gilad-Bachrach
2025
Preview abstract
In particular, they can be used to solve complex algorithmic problems, including graph-based tasks. In such algorithmic tasks a key question is what is the minimal size of a transformer that can implement a task. Recent work has begun to explore this problem for graph-based tasks, showing that for sub-linear embedding dimension (i.e., model width) logarithmic depth suffices. However, an open question, which we address here, is what happens if width is allowed to grow linearly. Here we analyze this setting, and provide the surprising result that with linear width, constant depth suffices for solving a host of graph-based problems. This suggests that a moderate increase in width can allow much shallower models, which are advantageous in terms of inference time. For other problems, we show that quadratic width is required. Our results demonstrate the complex and intriguing landscape of transformer implementations of graph-based algorithms. We support our theoretical results with empirical evaluations.
View details