Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Triaging mammography with artificial intelligence: an implementation study
Samantha Winter
Atilla Kiraly
Scott Mayer McKinney
Jie Yang
Krish Eswaran
Shravya Shetty
Timo Kohlberger
Stacey Caron
Fereshteh Mahvar
David Melnick
Sonya Bhole
Arnav Agharwal
David V. Schacht
Dipti Gupta
Basil Mustafa
Alejandra Maciel
Martha Sevenich
Sarah M. Friedewald
Mozziyar Etemadi
Sunny Jansen
Shiro Kadowaki
Gavin Duggan
Rubin Zhang
Luca Speroni
Breast Cancer Research and Treatment (2025)
Abstract
Purpose
Many breast centers are unable to provide immediate results at the time of screening mammography, which results in delayed patient care. Implementing artificial intelligence (AI) could identify patients who may have breast cancer and accelerate the time to diagnostic imaging and biopsy diagnosis.
Methods
In this prospective randomized, unblinded, controlled implementation study we enrolled 1000 screening participants between March 2021 and May 2022. The experimental group used an AI system to prioritize a subset of cases for same-visit radiologist evaluation, and same-visit diagnostic workup if necessary. The control group followed the standard of care. The primary operational endpoints were time to additional imaging (TA) and time to biopsy diagnosis (TB).
Results
The final cohort included 463 experimental and 392 control participants. The one-sided Mann-Whitney U test was employed for analysis of TA and TB. In the control group, the mean TA was 25.6 days [95% CI 22.0–29.9] and the mean TB was 55.9 days [95% CI 45.5–69.6]. In comparison, the experimental group's mean TA was reduced by 25% (6.4 fewer days [one-sided 95% CI > 0.3], p<0.001) and mean TB was reduced by 30% (16.8 fewer days [one-sided 95% CI > 5.1], p=0.003). The time reduction was more pronounced for AI-prioritized participants in the experimental group. All participants eventually diagnosed with breast cancer were prioritized by the AI.
Conclusions
Implementing AI prioritization can accelerate care timelines for patients requiring additional workup, while maintaining the efficiency of delayed interpretation for most participants. Reducing diagnostic delays could contribute to improved patient adherence, decreased anxiety, and reduced disparities in access to timely care.
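For readers who want the shape of the endpoint analysis, the sketch below runs the same kind of one-sided Mann-Whitney U test with SciPy on synthetic placeholder data; the gamma parameters are illustrative stand-ins, not the study's measurements.

```python
# Sketch: one-sided Mann-Whitney U test comparing time-to-additional-imaging
# (TA) between arms. The samples are illustrative placeholders, not study data.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
ta_control = rng.gamma(shape=2.0, scale=13.0, size=392)      # ~26-day mean
ta_experimental = rng.gamma(shape=2.0, scale=9.6, size=463)  # ~25% lower

# One-sided test: is TA in the experimental arm stochastically smaller?
stat, p_value = mannwhitneyu(ta_experimental, ta_control, alternative="less")
print(f"U = {stat:.0f}, one-sided p = {p_value:.4f}")
```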
Linear Elastic Caching via Ski Rental
Todd Lipcon
The biennial Conference on Innovative Data Systems Research (2025)
Abstract
In this work we study the Linear Elastic Caching problem, where the goal is to minimize the total cost of a cache inclusive of not just its misses, but also its memory footprint integrated over time. We demonstrate a theoretical connection to the classic ski rental problem and propose a practical algorithm that combines online caching algorithms with ski rental policies. We also introduce a lightweight machine learning-based algorithm for ski rental that is optimized for production workloads and is easy to integrate within existing database systems. Evaluations on both production workloads in Google Spanner and publicly available traces show that the proposed elastic caching approach can significantly reduce the total cache cost compared to traditional fixed-size cache policies.
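The ski-rental connection can be made concrete with the classic break-even rule: keep paying memory "rent" for an idle cache entry until the accumulated rent matches the miss ("buy") cost, then evict. The following is a minimal sketch under assumed unit costs, not the paper's algorithm.

```python
# Break-even ski-rental eviction for elastic caching: an idle entry pays
# memory "rent" each time step; once accumulated rent reaches the miss cost,
# evicting becomes the safer bet (the 2-competitive ski-rental threshold).
# Costs below are assumed for illustration.

MISS_COST = 100.0      # assumed cost of refetching an evicted entry
RENT_PER_STEP = 1.0    # assumed memory cost of holding one entry per step

class ElasticEntry:
    def __init__(self, key):
        self.key = key
        self.idle_rent = 0.0   # rent paid since the last access

    def on_access(self):
        self.idle_rent = 0.0   # an access resets the ski-rental clock

    def on_tick(self):
        """Charge one step of rent; return True if the entry should be evicted."""
        self.idle_rent += RENT_PER_STEP
        return self.idle_rent >= MISS_COST
```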
H2E: Hand, Head, Eye: A Multimodal Cascade of Natural Inputs
Ken Pfeuffer
Hans Gellersen
Khushman Patel
IEEE VR (2025)
Abstract
Eye-based interaction techniques for extended reality, such as gaze and pinch, are simple to use but suffer from input precision issues. We present H2E, a fine- and coarse-grained pointing technique that cascades Hand, Head, and Eye inputs. As users initiate a pinch gesture, a cursor appears at the gaze point that can be dragged by head pointing before pinch confirmation. This has the potential advantage of adding a precision component without changing the semantics of the technique. In this paper, we describe the design and implementation of the technique. Furthermore, we present an evaluation of our method in a Fitts-based user study, exploring the speed-accuracy trade-offs against a gaze-and-pinch interaction baseline.
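One plausible reading of the cascade is the small state machine below; the event names and head-pointing gain are assumptions for illustration, not the authors' implementation.

```python
# Sketch of the cascaded pointing flow: gaze sets a coarse cursor on pinch
# start, head motion refines it while the pinch is held, and release confirms.
# Event names and the gain value are assumed, not from the paper.

class H2ECursor:
    def __init__(self):
        self.active = False
        self.position = None  # current target point (tuple of coordinates)

    def on_pinch_start(self, gaze_point):
        # Coarse stage: cursor appears at the current gaze point.
        self.active = True
        self.position = tuple(gaze_point)

    def on_head_delta(self, delta, gain=0.3):
        # Fine stage: head pointing drags the cursor with reduced gain.
        if self.active:
            self.position = tuple(p + gain * d
                                  for p, d in zip(self.position, delta))

    def on_pinch_release(self):
        # Confirmation: releasing the pinch commits the refined target.
        self.active = False
        return self.position
```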
From Few to Many: Self-Improving Many-Shot Reasoners Through Iterative Optimization and Generation
Ke Jiang
International Conference on Learning Representations (ICLR) (2025) (to appear)
Abstract
Recent advances in long-context large language models (LLMs) have led to the emerging paradigm of many-shot in-context learning (ICL), where scaling to many more demonstration examples beyond the conventional few-shot setup can lead to performance benefits. However, despite its promise, it is unclear which aspects dominate these benefits and whether simply scaling to more examples is the most effective way to improve many-shot ICL. In this work, we first analyze the factors driving many-shot ICL, and we find that 1) many-shot performance can often be attributed to a few disproportionately influential examples and 2) identifying such influential examples ("optimize") and using them as demonstrations to regenerate new examples ("generate") can lead to further improvements. Inspired by these findings, we propose BRIDGE, an algorithm that alternates between an optimize step, which uses Bayesian optimization to discover the influential sets of examples, and a generate step, which reuses this set to expand the reasoning paths of the examples back to the many-shot regime automatically. On Gemini, Claude, and Mistral LLMs of different sizes, we show that BRIDGE leads to significant improvements across a diverse set of tasks, including symbolic reasoning, numerical reasoning, and code generation.
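A schematic of the alternation as described in the abstract: here `score` and `generate` are hypothetical stand-ins for the paper's Bayesian-optimization objective and LLM-based reasoning-path generation, and a random subset search replaces the actual optimizer so the sketch stays self-contained.

```python
# Schematic of BRIDGE's optimize/generate loop (a reading of the abstract,
# not the paper's code). `score` rates a candidate demonstration set;
# `generate` turns seed examples into new reasoning-annotated ones.
import random

def bridge(examples, score, generate, rounds=3, k=4, budget=50):
    shots = list(examples)  # requires len(examples) >= k
    for _ in range(rounds):
        # "Optimize": search small subsets for an influential k-shot set.
        # (The paper uses Bayesian optimization; random search is a stand-in.)
        candidates = [random.sample(shots, k) for _ in range(budget)]
        influential = max(candidates, key=score)
        # "Generate": expand the influential set back to the many-shot regime
        # by regenerating new reasoning paths from it.
        shots = influential + generate(influential, n_new=len(shots) - k)
    return shots
```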
Avoid global outages by partitioning cloud applications to reduce blast radius
https://cloud.google.com/ (2025)
Abstract
Cloud application development faces the inherent challenge of balancing rapid innovation with high availability. This blog post details how Google Workspace's Site Reliability Engineering team addresses this conflict by implementing vertical partitioning of serving stacks. By isolating application servers and storage into distinct partitions, the "blast radius" of code changes and updates is significantly reduced, minimizing the risk of global outages. This approach, which complements canary deployments, enhances service availability, provides flexibility for experimentation, and facilitates data localization. While challenges such as data model complexities and inter-service partition misalignment exist, the benefits of improved reliability and controlled deployments make partitioning a crucial strategy for maintaining robust cloud applications.
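As a rough illustration of the mechanism (not Workspace's actual scheme), a stable hash can pin each user to one vertical partition, so a partition-by-partition rollout bounds how far a bad release can spread.

```python
# Illustrative partitioning sketch: stable user-to-partition assignment plus
# a staged rollout. Names and the partition count are assumed for the example.
import hashlib

NUM_PARTITIONS = 8  # illustrative; real deployments choose this per product

def partition_for(user_id: str) -> int:
    """Stable assignment: a user always lands in the same partition,
    keeping their serving path and storage co-located."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

def staged_rollout(release, partitions=range(NUM_PARTITIONS)):
    """Push a release one partition at a time; a regression caught in the
    first partition never reaches the rest (bounded blast radius)."""
    for p in partitions:
        yield (p, release)  # deploy `release` to partition p, then verify
```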
Abstract
We present a scalable and agile approach for ads image content moderation at Google, addressing the challenges of moderating massive volumes of ads with diverse content and evolving policies. The proposed method utilizes human-curated textual descriptions and cross-modal text-image co-embeddings to enable zero-shot classification of policy-violating ads images, bypassing the need for extensive supervised training data and human labeling. By leveraging large language models (LLMs) and user expertise, the system generates and refines a comprehensive set of textual descriptions representing policy guidelines. During inference, co-embedding similarity between incoming images and the textual descriptions serves as a reliable signal for policy violation detection, enabling efficient and adaptable ads content moderation. Evaluation results demonstrate the efficacy of this framework in significantly boosting the detection of policy-violating content.
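In outline, the inference step reduces to a similarity test between an image embedding and a set of policy-description embeddings. A minimal sketch, assuming caller-supplied `embed_image`/`embed_text` functions standing in for the cross-modal model, with an illustrative threshold:

```python
# Zero-shot policy check via text-image co-embedding similarity.
# `embed_image` and `embed_text` are assumed stand-ins for the co-embedding
# model; the threshold is illustrative and would be tuned on held-out data.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def violates_policy(image, policy_descriptions, embed_image, embed_text,
                    threshold=0.3):
    """Flag the image if it is similar enough to any policy description."""
    img_vec = embed_image(image)
    scores = [cosine(img_vec, embed_text(d)) for d in policy_descriptions]
    return max(scores) >= threshold, scores
```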
Beyond Touchscreens: Designing for Co-Occurring Accessibility Needs
Melissa Barnhart Wantland
Mai Kobori
Universal Access in Human-Computer Interaction, Springer-Verlag (2025) (to appear)
Abstract
Today’s smartphone interactions are typically designed with one primary preset, accompanied by customization settings that can be manually adjusted. To promote the creation of contextually aware experiences, researchers have highlighted the factors that influence mobile device usage in the ability-based design framework. This paper expands upon existing frameworks and contributes to an empirical understanding of smartphone accessibility. Through a 10-day longitudinal diary study and video interviews with 24 individuals who do and do not identify as having a disability, the research illustrates the reactions of reattempt, adaptation, and avoidance used in response to a lack of smartphone accessibility. Despite experiencing scenarios where accessibility settings could have helped, 20 of the 24 participants did not use accessibility settings on their smartphones. Twelve of the 24 participants had tried accessibility settings on their smartphones but concluded that they were not for them. This work highlights the need to shift current design practices to better serve the accessibility community.
Governance, Risk and Compliance (GRC) Engineering: Data, AI, Automation, and the Future of Compliance to Audits
Eric Zhang
Ruchi Khurana
Vikram Khare
2025
Abstract
In today's rapidly evolving business landscape, Governance, Risk, and Compliance (GRC) leaders in large, complex organizations face unprecedented challenges. The cloud has revolutionized how businesses operate, offering enormous scalability, flexibility, cost-efficiency, and added security and resilience. However, this transformation also presents new challenges for GRC professionals. In a cloud-native world, where applications are built and deployed in dynamic, distributed environments, traditional on-prem GRC approaches, manual processes, and spreadsheets struggle to keep pace. The key to success lies in embracing a data-driven GRC strategy that leverages the power of the cloud to enhance agility, visibility, and resilience.
Improving simulation-based origin-destination demand calibration using sample segment counts data
Yechen Li
Arwa Alanqary
The 12th Triennial Symposium on Transportation Analysis conference (TRISTAN XII), Okinawa, Japan (2025) (to appear)
Abstract
This paper introduces a novel approach to demand estimation that utilizes partial observations of segment-level track counts. Building on established simulation-based demand estimation methods, we present a modified formulation that integrates sample track counts as a regularization term. This approach effectively addresses the underdetermination challenge in demand estimation, moving beyond the conventional reliance on a prior OD matrix. The proposed formulation aims to preserve the distribution of the observed track counts while optimizing the demand to align with observed path-level travel times. We tested this approach on Seattle's highway network with various congestion levels. Our findings reveal significant enhancements in the solution quality, particularly in accurately recovering ground truth demand patterns at both the OD and segment levels.
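One plausible reading of the modified formulation, with assumed notation (d is the OD demand vector, t(d) the simulated path travel times with observations t̂, and c(d) the simulated segment counts with sampled observations ĉ), is a travel-time fitting term plus the sample-count regularizer:

```latex
% Assumed notation; the paper's exact formulation may differ.
\min_{d \ge 0} \; \big\| t(d) - \hat{t} \big\|_2^2
  \;+\; \lambda \, \big\| c(d) - \hat{c} \big\|_2^2
```

Here the regularization weight λ trades off matching observed travel times against preserving the distribution of the sampled track counts, which is what removes the need for a prior OD matrix.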
Abstract
We study the existence of almost fair and near-optimal solutions to a routing problem as defined in the seminal work of Rosenthal. We focus on the setting where multiple alternative routes are available for each potential request (which corresponds to a potential user of the network). This model captures a collection of diverse applications such as packet routing in communication networks, routing in road networks with multiple alternative routes, and the economics of transportation of goods.
Our recommended routes have provable guarantees in terms of both the total cost and fairness concepts such as approximate envy-freeness. We employ and appropriately combine tools from algorithmic game theory and fair division. Our results apply to two distinct models: the splittable case, where the request is split among the selected paths (e.g., routing a fleet of trucks), and the unsplittable case, where the request is assigned to one of its designated paths (e.g., a single user request). Finally, we conduct an empirical analysis to test the performance of our approach against simpler baselines using the real-world road network of New York City.
Online Bidding under RoS Constraints without Knowing the Value
Sushant Vijayan
Swati Padmanabhan
The Web Conference (2025)
Abstract
We consider the problem of auto-bidding in online advertising from the perspective of a single advertiser. The goal of the advertiser is to maximize their value under the Return-on-Spend (RoS) constraint, with performance measured in terms of regret against the optimal offline solution that knows all queries a priori. Importantly, the value of the item is unknown to the bidder ahead of time. The goal of the bidder is to quickly identify the optimal bid while simultaneously satisfying budget and RoS constraints. Using a simple UCB-style algorithm, we provide the first result that achieves optimal regret and constraint violation for this problem.
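To give the flavor of a UCB-style approach (a generic sketch, not the paper's algorithm), a bidder can keep optimistic per-bid value estimates and restrict itself to bids whose optimistic return-on-spend still clears the target:

```python
# Generic UCB-style bidder sketch: optimistic value estimates per bid level,
# filtered by an optimistic RoS feasibility check. Illustrative only.
import math

class UCBBidder:
    def __init__(self, bid_levels, ros_target):
        self.bids = list(bid_levels)
        self.ros_target = ros_target
        self.n = {b: 0 for b in self.bids}        # times each bid was tried
        self.value = {b: 0.0 for b in self.bids}  # mean observed value
        self.spend = {b: 0.0 for b in self.bids}  # mean observed spend
        self.t = 0

    def choose_bid(self):
        self.t += 1
        def ucb(b):
            if self.n[b] == 0:
                return float("inf")  # try every bid level at least once
            return self.value[b] + math.sqrt(2 * math.log(self.t) / self.n[b])
        # Optimistic feasibility: keep bids whose optimistic RoS clears target.
        feasible = [b for b in self.bids
                    if self.n[b] == 0
                    or ucb(b) / max(self.spend[b], 1e-9) >= self.ros_target]
        return max(feasible or self.bids, key=ucb)

    def update(self, bid, value, spend):
        # Incremental means of observed value and spend for this bid level.
        self.n[bid] += 1
        k = self.n[bid]
        self.value[bid] += (value - self.value[bid]) / k
        self.spend[bid] += (spend - self.spend[bid]) / k
```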
Scaling Laws for Downstream Task Performance in Machine Translation
Hussein Hazimeh
Natalia Ponomareva
Sanmi Koyejo
International Conference on Learning Representations (ICLR) (2025) (to appear)
Abstract
Scaling laws provide important insights that can guide the design of large language models (LLMs). Existing work has primarily focused on studying scaling laws for pretraining (upstream) loss. However, in transfer learning settings, in which LLMs are pretrained on an unsupervised dataset and then finetuned on a downstream task, we often also care about the downstream performance. In this work, we study the scaling behavior in a transfer learning setting, where LLMs are finetuned for machine translation tasks. Specifically, we investigate how the choice of the pretraining data and its size affect downstream performance (translation quality) as judged by downstream cross-entropy and by translation quality metrics such as BLEU and COMET scores. Our experiments indicate that the size of the finetuning dataset and the distribution alignment between the pretraining and downstream data significantly influence the scaling behavior. With sufficient alignment, both downstream cross-entropy and translation quality scores improve monotonically with more pretraining data. In such cases, we show that it is possible to predict the downstream translation quality metrics with good accuracy using a log-law. However, there are cases where moderate misalignment causes the downstream translation scores to fluctuate or get worse with more pretraining, whereas downstream cross-entropy monotonically improves. By analyzing these cases, we provide new practical insights for choosing appropriate pretraining data.
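On one reading of the abstract, the log-law for translation quality as a function of pretraining data size D_p has the following shape (constants fit per task; the paper's exact parameterization may differ):

```latex
% Assumed form: f is a quality metric (e.g., BLEU), D_p the pretraining
% data size, and A, \alpha, \beta fitted constants.
f(D_p) \;=\; \left( \log\!\big( A \cdot D_p^{\alpha} \big) \right)^{\beta}
```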
PreFix: Optimizing the Performance of Heap-Intensive Applications
Chaitanya Mamatha Ananda
Rajiv Gupta
Han Shen
CGO 2025: International Symposium on Code Generation and Optimization, Las Vegas, NV, USA (to appear)
Abstract
Analyses of heap-intensive applications show that a small fraction of heap objects account for the majority of heap accesses and data cache misses. Prior works like HDS and HALO have shown that allocating hot objects in separate memory regions can improve spatial locality, leading to better application performance. However, these techniques are constrained in two primary ways, limiting their gains. First, they suffer from imperfect separation, polluting the hot memory region with several cold objects. Second, reordering of objects across allocations is not possible, as the original object allocation order is preserved. This paper presents a novel technique that achieves near-perfect separation of hot objects via a new context mechanism that efficiently identifies hot objects with high precision. This technique, named PreFix, is based upon Preallocating memory for a Fixed small number of hot objects. The program, guided by profiles, is instrumented to compute context information derived from dynamic object identifiers that precisely identifies hot object allocations, which are then placed at predetermined locations in the preallocated memory. The preallocated memory region for hot objects provides the flexibility to reorder objects across allocations and allows colocation of objects that are part of a hot data stream (HDS), improving spatial locality. The runtime overhead of identifying hot objects is not significant, as this optimization focuses on a small number of static hot allocation sites and dynamic hot objects. While there is an increase in the program’s memory footprint, it is manageable and can be controlled by limiting the size of the preallocated memory. In addition, PreFix incorporates an object-recycling optimization that reuses the same preallocated space to store different objects whose lifetimes are not expected to overlap. Our experiments with 13 heap-intensive applications yield reductions in execution times ranging from 2.77% to 74%. On average, PreFix reduces execution time by 21.7%, compared to 7.3% by HDS and 14% by HALO. This is due to PreFix’s precision in hot object identification, hot object colocation, and low runtime overhead.
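The preallocation-plus-recycling idea can be modeled in a few lines; the sketch below is an illustrative simulation of the policy, not the paper's instrumentation, and treats allocation contexts as opaque keys.

```python
# Illustrative model of PreFix's policy: profile-identified hot allocation
# contexts get fixed slots in a preallocated region, and slots are recycled
# across non-overlapping object lifetimes. Not the paper's implementation.

class PreallocatedPool:
    def __init__(self, num_slots):
        self.free_slots = list(range(num_slots))
        self.slot_of = {}  # live hot object id -> slot index

    def allocate(self, context, obj_id, hot_contexts):
        # Only profile-identified hot contexts use the pool; everything
        # else falls through to the regular allocator.
        if context in hot_contexts and self.free_slots:
            slot = self.free_slots.pop()      # may reuse a recycled slot
            self.slot_of[obj_id] = slot
            return ("pool", slot)
        return ("heap", None)

    def free(self, obj_id):
        # Recycling: a dead hot object's slot becomes available for the
        # next hot allocation, keeping the pool footprint bounded.
        slot = self.slot_of.pop(obj_id, None)
        if slot is not None:
            self.free_slots.append(slot)
```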
Context is Key for Agent Security
Eugene Bagdasaryan
Lillian Tsai
arXiv (2025)
Abstract
Judging the safety of an action, whether taken by a human or a system, must take into account the context in which the action takes place. For example, deleting an email from a user's mailbox may or may not be appropriate depending on the email's content, the user's goals, or even available space. Systems today that make these judgements---providing security against harmful or inappropriate actions---rely on manually-crafted policies or user confirmation for each relevant context. With the upcoming deployment of systems like generalist agents, we argue that we must rethink security designs to adapt to the scale of contexts and capabilities of these systems. As a first step, this paper explores contextual security in the domain of agents and proposes contextual security for agents (Conseca), a framework to generate just-in-time, contextual, and human-verifiable security policies.
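A schematic of what a just-in-time, human-verifiable policy check might look like, with assumed interfaces (the abstract does not specify Conseca's actual APIs):

```python
# Assumed shapes for a contextual-security check in the spirit of Conseca:
# generate a context-specific policy, surface it for human verification,
# then enforce it on the proposed action. Interfaces are illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Policy:
    description: str                 # human-readable, verifiable text
    allows: Callable[[dict], bool]   # predicate over a proposed action

def enforce(action: dict, context: dict,
            generate_policy: Callable[[dict], Policy],
            human_approves: Callable[[Policy], bool]) -> bool:
    policy = generate_policy(context)   # just-in-time, contextual policy
    if not human_approves(policy):      # human-verifiable step
        return False
    return policy.allows(action)        # enforce before the action runs
```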