Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11268 publications
    CoDaS: AI Co-Data-Scientist for Biomarker Discovery via Wearable Sensors
    Juro Gottweis
    CJ Park
    Salman Rahman
    Ahmed Metwally
    Hong Yu
    Ivor Rendulic
    Yuzhe Yang
    Petar Sirkovic
    Daniel McDuff
    Shwetak Patel
    Nicolas Stroppa
    Yubin Kim
    Mark Malhotra
    Orson Xu
    Sam Schmidgall
    Tim Althoff
    Elahe Vedadi
    Cynthia Breazeal
    Hae Won Park
    (2026)
    Preview abstract Enterprise service delivery platforms, while vital for HR operations, create significant challenges in managing the risks of Personally Identifiable Information (PII) exposure. The integration of Generative AI offers new efficiencies but also amplifies these risks. Existing solutions—ranging from manual redaction and rule-based Data Loss Prevention (DLP) to inflexible data masking—fail to provide a nuanced, integrated approach. This paper introduces the Dual-Mode Privacy Guard (DMPG), a conceptual framework that establishes a model for Augmented Compliance. The framework provides a "defense-in-depth" strategy built on three pillars: (1) a Zero-Trust AI Foundation leveraging a verifiable, non-retention API gateway to ensure data privacy; (2) a proactive "Guardrail" that uses AI to detect and flag potential PII for human-in-the-loop review; and (3) an on-demand "Tool" that allows users to create securely anonymized data assets. By differentiating between proactive monitoring and reactive utility, the DMPG shifts the compliance paradigm from a manual burden to an AI-assisted process that enhances, rather than replaces, human oversight. This paper details the framework’s platform-agnostic architecture, using Salesforce as a reference implementation, and argues for its novelty as a model for operationalizing privacy principles within modern enterprise systems. View details
    Preview abstract Multimodal large language models (LLMs) integrate and process information from multiple modalities such as text, images, audio, and video, enabling complex tasks such as audio translation and visual question answering. While powerful, this complexity introduces novel vulnerabilities to sophisticated adversarial attacks. This survey paper provides a comprehensive overview of this rapidly expanding field, systematically categorizing attacks that range from manipulations of single modalities (e.g., perturbed images or audio) to those exploiting cross-modal interactions. We overview how these attacks exploit weaknesses in model fusion, attention mechanisms, and representation learning and provided analyses on their potential for real-world consequences. View details
    Preview abstract In modern Kubernetes environments, eBPF (Extended Berkeley Packet Filter) has become the de facto standard for high-performance dataplane enforcement. However, this architecture introduces a complex distributed state problem: the asynchronous synchronization between the Kubernetes control plane (Intent) and the kernel-space BPF maps (Reality). A critical failure mode, termed “Silent Divergence,” occurs when the control plane believes a network policy or identity is applied, but the underlying kernel state is missing or corrupted. In this “Gray Failure” state, standard observability tools—including logs, liveness probes, and agent status checks—report health, while the network silently drops traffic. This paper introduces eBPF-Auditor, a specialized consistency verification framework. Unlike standard agents that rely on event-based reconciliation, eBPF-Auditor performs a periodic “Two-Way State Audit” that mathematically verifies the intersection of Kubernetes Intent and BPF Reality. We demonstrate through fault injection and benchmarks on 5,000 pods that this approach successfully detects state drift with 100% accuracy and negligible sub-millisecond overhead (ms), making it a viable solution for high-frequency runtime verification in production hyperscale clusters. View details
    Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini
    Benjamin Hersh
    Jiahao Ren
    Xingyue Chen
    Robert Timothy Bettridge
    Faraz Faruqi
    Anthony 'Xiang' Chen
    Steve Toh
    Google XR, Google (2026)
    Preview abstract While large language models have accelerated software development through "vibe coding", prototyping intelligent Extended Reality (XR) experiences remains inaccessible due to the friction of complex game engines and low-level sensor integration. To bridge this gap, we contribute XR Blocks, an open-source, modular WebXR framework that abstracts spatial computing complexities into high-level, human-centered primitives. Building upon this foundation, we present Vibe Coding XR, an end-to-end rapid prototyping workflow that leverages LLMs to translate natural language intent directly into functional XR software. Using a web-based interface, creators can transform high-level prompts (e.g., "create a dandelion that reacts to hand") into interactive WebXR applications in under a minute. We provide a preliminary technical evaluation on a pilot dataset (VCXR60) alongside diverse application scenarios highlighting mixed-reality realism, multi-modal interaction, and generative AI integrations. By democratizing spatial software creation, this work empowers practitioners to bypass low-level hurdles and rapidly move from "idea to reality." Code and live demos are available at https://xrblocks.github.io/gem and https://github.com/google/xrblocks. View details
    Preview abstract Object-Counting for remote-sensing (RS) imagery is raising increasing research interest due to its crucial role in a wide and diverse set of applications. While several promising methods for RS object-counting have been proposed, existing methods focus on a closed, pre-defined set of object classes. This limitation necessitates costly re-annotation and model re-training to adapt current approaches for counting of novel objects that have not been seen during training, and severely inhibits their application in dynamic, real-world monitoring scenarios. To address this gap, in this work we propose RS-OVC - an adaptation of existing work for Open Vocabulary Counting (OVC) approach from general computer vision to the RS domain. We show that our model is capable of accurate counting of novel object classes, that are unseen during training, based solely on textual and/or visual conditioning. View details
    Preview abstract Despite advances in high performance computing, accurate numerical simulations of global atmospheric dynamics remain a challenge. The resolution required to fully resolve the vast range scales as well as the strong coupling with—often not fully-understood—physics renders such simulations computationally infeasible over time horizons relevant for long-term climate risk assessment. While data-driven parameterizations have shown some promise of alleviating these obstacles, the scarcity of high-quality training data and their lack of long-term stability typically hinders their ability to capture the risk of rare extreme events. In this work we present a general strategy for training variational (probabilistic) neural network models to non-intrusively correct under-resolved long-time simulations of turbulent climate systems. The approach is based on the paradigm introduced by Barthel Sorensen et al. (2024, https://doi.org/10.1029/2023ms004122) which involves training a post-processing correction operator on under-resolved simulations nudged toward a high-fidelity reference. Our variational framework enables us to learn the dynamics of the underlying system from very little training data and thus drastically improve the extrapolation capabilities of the previous deterministic state-of-the art—even when the statistics of that training data are far from converged. We investigate and compare three recently introduced variational network architectures and illustrate the benefits of our approach on an anisotropic quasi-geostrophic flow. For this prototype model our approach is able to not only accurately capture global statistics, but also the anistropic regional variation and the statistics of multiple extreme event metrics—demonstrating significant improvement over previously introduced deterministic architectures. View details
    Compact Conformal Subgraphs
    Kamesh Munagala
    Aravindan Vijayaraghavan
    ICML (2026)
    Preview abstract Conformal prediction provides rigorous uncertainty guarantees for model outputs but can produce prohibitively large prediction sets in structured domains such as routing, planning, or sequential recommendation. We introduce \emph{graph-based conformal compression}, a framework for constructing compact subgraphs that preserve the statistical validity of conformal prediction while reducing structural complexity. We study a formulation that selects a smallest subgraph capturing a prescribed fraction of conditional probability mass, and reduce to a weighted version of densest $k$-subgraphs in hypergraphs, in the regime where the subgraph has a large fraction of edges. We design efficient approximation algorithms that achieve constant factor coverage and size trade-offs. Our results highlight an algorithmic regime, distinct from classical densest-$k$-subgraph hardness settings, where the problem can be approximated efficiently, bridging conformal prediction with combinatorial graph compression. We finally validate our algorithmic approach on synthetic and real-world instances of trip planning and navigation, showing in each case that our approach handily beats natural baselines. View details
    Preview abstract The rapid expansion of the Internet of Things (IoT) and smart home ecosystems has led to a fragmented landscape of user data management across consumer electronics (CE) such as Smart TVs, gaming consoles, and set-top boxes. Current onboarding processes on these devices are characterized by high friction due to manual data entry and opaque data-sharing practices. This paper introduces the User Data Sharing System (UDSS), a platform-agnostic framework designed to facilitate secure, privacy-first PII (Personally Identifiable Information) exchange between device platforms and third-party applications. Our system implements a Contextual Scope Enforcement (CSE) mechanism that programmatically restricts data exposure based on user intent—specifically distinguishing between Sign-In and Sign-Up workflows. Unlike cloud-anchored identity standards such as FIDO2/WebAuthn, UDSS is designed for shared, device-centric CE environments where persistent user-to-device bind-ing cannot be assumed. We further propose a tiered access model that balances developer needs with regulatory compliance (GDPR/CCPA). A proof-of-concept implementation on a reference ARMv8 Linux-based middleware demonstrates that UDSS reduces user onboarding latency by 65% and measurably reduces PII over-exposure risk through protocol-enforced data minimization. This framework provides a standardized approach to identity management in the heterogeneous CE market. View details
    Preview abstract In "Elephants, Goldfish and the New Golden Age of Software Engineering," the author discusses how AI is changing knowledge work, especially software development. Written from the perspective of April 2026, the article points out that while AI speeds up coding, it can also quickly generate a lot of mistakes and messy code if it isn't carefully managed by human oversight and clear processes. The paper outlines a practical approach to working with AI, broken down into three main sections: * **Using AI as a Tool, Not a Toy:** The author notes that people often get poor results by asking AI to do everything in a single prompt. Instead, users should have back-and-forth conversations with AI to question assumptions, set clear grading rules, and guide the research. The main point is that humans must still provide the final judgment; AI is simply a way to speed up and record that thinking. * **The Elephant-Goldfish Model:** As AI creates more code than humans can easily read, written design documents become more important than the code itself. To keep AI on track, the author suggests a two-part method: * **The Elephant:** A long chat session where the human and AI discuss ideas and write a detailed design document *before* any code is written. This session holds all of the project's background information and decisions. * **The Goldfish:** A brand-new AI chat session with no memory. The human asks this "goldfish" to read the design document. If the goldfish cannot understand the plan based only on that document, the document needs more details. * Only after the design document is clear enough for the goldfish to understand does the human ask the AI to write the code based on those strict instructions. * **Managing AI and the Future of Work:** The author expects that regular employees will soon act like managers, overseeing multiple AI helpers. Because of this, workers need to learn basic management skills, like how to delegate tasks and set clear boundaries. Also, since AI will handle routine chores, humans will need to practice focusing for longer periods to do deeper, harder thinking. Ultimately, a worker's value will come from their planning and decision-making skills, rather than their ability to type code. View details
    Who Controls the Curriculum for AI? The Limits of Participatory Design for Educational AI
    Michael Madaio
    Learning Under Algorithmic Conditions, University of Minnesota Press (2026)
    Preview abstract Participatory design is a long-standing effort to shift control over technology design from technologists to users and communities impacted by technologies. For educational AI, this means involving students, families, teachers, and other stakeholders in shaping the design of AI systems. While promising, in this article, I situate the recent calls for participatory design of educational AI systems within a different historical tradition—that of contests over local control of educational curricula. I argue that approaches that attempt to steer the design and development of educational AI through participatory methods may inadvertently reproduce the history of political contestation of educational curricula, in ways that may privilege the most powerful communities, rather than those inequitably impacted. What might it look like to treat participatory AI design as a site for political contestation? How might these approaches avoid reproducing the same majoritarian tendencies that led to educational inequities in the first place? View details
    From Correctness to Collaboration: A Human-Centered Taxonomy of AI Agent Behavior in Software Engineering
    Sherry Y. Shi
    Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA ’26), ACM, New York, NY, USA (2026)
    Preview abstract The ongoing transition of Large Language Models in software engineering from code generators into autonomous agents requires a shift in how we define and measure success. While models are becoming more capable, the industry lacks a clear understanding of the behavioral norms that make an agent effective in collaborative software development in the enterprise. This work addresses this gap by presenting a taxonomy of desirable agent behaviors, synthesized from 91 sets of user-defined rules for coding agents. We identify four core expectations: Adhere to Standards and Processes, Ensure Code Quality and Reliability, Solve Problems Effectively, and Collaborate with the User. These findings offer a concrete vocabulary for agent behavior, enabling researchers to move beyond correctness-only benchmarks and design evaluations that reflect the realities of professional software development in large enterprises. View details
    The Perfection Paradox: From Architect to Curator in AI-Assisted API Design
    JJ Geewax
    David R Karger
    Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26), ACM, Barcelona, Spain, TBD
    Preview abstract Enterprise API design is often bottlenecked by the tension between rapid feature delivery and the rigorous maintenance of usability standards. We present an industrial case study evaluating an AI-assisted design workflow trained on API Improvement Proposals(AIPs). Through a controlled study with 16 industry experts, we compared AI-generated API specifications against human-authored ones. While quantitative results indicated AI superiority in 10 of 11 usability dimensions and an 87% reduction in authoring time, qualitative analysis revealed a paradox: experts frequently misidentified AI work as human (19% accuracy) yet described the designs as unsettlingly “perfect.” We characterize this as a “Perfection Paradox”—where hyper-consistency signals a lack of pragmatic human judgment. We discuss the implications of this perfection paradox, proposing a shift in the human designer’s role from the “drafter” of specifications to the “curator” of AI-generated patterns. View details
    Neural general circulation models for modeling precipitation
    Stephan Hoyer
    Dmitrii Kochkov
    Janni Yuval
    Ian Langmore
    Science Advances (2026)
    Preview abstract Climate models struggle to accurately simulate precipitation, particularly extremes and the diurnal cycle. While hybrid models combining machine learning and physics have emerged with the premise of improving precipitation simulations, none have proven sufficiently skillful or stable enough to outperform existing models in simulating precipitation. Here, we present the first hybrid model that is trained directly on precipitation observations. The model runs at 2.8 degrees resolution and is built on the differentiable NeuralGCM framework. This model is stable for decadal simulations and demonstrates significant improvements over existing GCMs, ERA5 reanalysis, and a Global Cloud-Resolving Model in simulating precipitation. Our approach yields reduced biases, a more realistic precipitation distribution, improved representation of extremes, and a more accurate diurnal cycle. Furthermore, it outperforms the ECMWF ensemble for mid-range weather forecasting. This advance paves the way for more reliable simulations of current climate and for the ability to fully utilize the abundance of existing observations to further improve GCMs. View details
    Preview abstract Online financial scams represent a long-standing and serious threat for which people seek help. We present a study to understand people’s in situ motivations for engaging with scams and the help needs they express before, during, and after encountering a scam. We identify the main emotions scammers exploited (e.g., fear, hope) and characterize how they did so. We examine factors—such as financial insecurity and legal precarity—which elevate people’s risk of engaging with specific scams and experiencing harm. We indicate when people sought help and describe their help-seeking needs and emotions at different stages of the scam. We discuss how these needs could be met through the design of contextually-specific prevention, diagnostic, mitigation, and recovery interventions. View details
    ×