Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11268 publications
    Preview abstract This framework manages AI agents by establishing behavioral boundaries and a persistent identity. It uses a multi-layered stack, combining safety rules with brand guidelines, to shape an agent's reasoning. Features include authority decay to limit power if confidence drops and memory segmentation to prevent data tampering. Centralized oversight ensures these digital representatives remain aligned with company policies through continuous monitoring and testing. View details
    TDXRay: Microarchitectural Side-Channel Analysis of Intel TDX for Real-World Workloads
    Tristan Hornetz
    Hosein Yavarzadeh
    Albert Cheu
    Adria Gascon
    Lukas Gerlach
    Michael Schwarz
    Ruiyi Zhang
    IEEE Security & Privacy (S&P) (2026)
    Preview abstract Confidential computing with VM-based trusted execution environments (TEEs) promises to protect code and data from a privileged cloud operator, enabling privacy-preserving workloads ranging from medical analytics to AI inference. However, most deployments exclude microarchitectural side channels from their threat model, shifting the burden to application developers who lack practical, general-purpose tools to assess (let alone mitigate) leakage. This gap is problematic: host-observable effects such as page-fault patterns, shared-cache contention, performance-counter surrogates (where available), and fine-grained timing primitives (e.g., MWAIT) can still reveal high-level secrets even when memory remains encrypted. We present TDXRay, an open-source framework that systematizes the evaluation of side-channel risk for confidential VMs in Intel TDX. TDXRay exposes unified interfaces to exercise and measure several attack primitives—including controlled-channel attacks via page tables, cache-based contention/occupancy probes, performance-counter–derived signals, and timing channels—against unmodified guest workloads. Using TDXRay, we build two end-to-end case studies: (1) a classic AES T-table attack in which a malicious hypervisor recovers the secret key from access-pattern leakage, and (2) an LLaMA inference attack in which the host infers user prompts by monitoring memory accesses during tokenization and embedding lookups. Across both, we show that a host with no direct access to guest memory can reconstruct sensitive information by observing only externalized microarchitectural signals. View details
    Phoenix: Rowhammer Attacks on DDR5 with Self-Correcting Synchronization
    Michele Marazzi
    Kaveh Razavi
    Salman Qazi
    Diego Meyer
    Patrick Jattke
    IEEE Security & Privacy (S&P) (2026)
    An Empirical Study of Tablet Ergonomics: The Interplay of Temperature, Orientation, and Use Behaviors
    Carmen Van Ommen
    Mikki Phan
    Arun Raghupathy
    Daniel Huynh
    Barbara Chaparro
    Ergonomics in Design: The Quarterly of Human Factors Applications (2026)
    Preview abstract To balance computational performance with thermal comfort, this study explores a consolidated hotspot architecture at the top center of a tablet. We tested hotspot (39°C, 43°C, 45°C, 47°C) and ambient temperatures (25°C, 35°C) with 60 participants, measuring perception, action likelihood, and expectation. The hotspot was observed away from high contact areas, with 43°C identified as the threshold for significant discomfort. Discomfort increased with portrait mode use and higher device and ambient temperatures, while active use duration influenced acceptability. The findings underscore the importance of thermal mapping and contextual sensing, with direct applications for software throttling thresholds of coated aluminum enclosures. View details
    Preview abstract Multimodal large language models (LLMs) integrate and process information from multiple modalities such as text, images, audio, and video, enabling complex tasks such as audio translation and visual question answering. While powerful, this complexity introduces novel vulnerabilities to sophisticated adversarial attacks. This survey paper provides a comprehensive overview of this rapidly expanding field, systematically categorizing attacks that range from manipulations of single modalities (e.g., perturbed images or audio) to those exploiting cross-modal interactions. We overview how these attacks exploit weaknesses in model fusion, attention mechanisms, and representation learning and provided analyses on their potential for real-world consequences. View details
    Pixel Watch: Robust Heart Rate Sensing from Multipath PPG and On-Device Deep Learning Trained on 10,000 hours of Free-Living and Fitness Data
    Megan Walker
    Yojan Patel
    Shyam Tailor
    Matt Wimmer
    Brennan Garrett
    Dan Howe
    Abhinuv Pitale
    Hamed Vavadi
    Tien Le
    Steve Diamond
    Oleksiy Vyalov
    Vik Sharma
    Pete Richards
    Tracy Giest
    Erika Siegel
    Tuan Phan
    Sam Mravca
    Derrick Vickers
    Benjamin Stone
    Katarina Vukosavljević
    Justin Phillips
    YongSuk Cho
    Stefanie Hollidge
    Antony Siahaan
    Soren Brage
    Shwetak Patel
    Robert Harle
    IEEE Sensors Letters (2026)
    Preview abstract The Pixel Watch 2 (PW2) is the first Google smartwatch to combine multipath photoplethysmography (PPG) with deep learning-based heart rate inference, designed to significantly improve sensing accuracy during motion-heavy activities. The device processes 10 optical channels using an on-device, 15-layer temporally dilated convolutional neural network (~300K parameters) to yield a 1 Hz heart rate output. Crucial to this model's performance was its training on a massive dataset comprising 10,000 hours of data from 962 participants, curated from a broader corpus of controlled and free-living activities. We evaluated the PW2's sensing performance across two independent validation sets: an in-house fitness dataset (229 participants, 250 hours) and an external free-living dataset (27 participants, 1000+ hours). The system achieved 95% Limits of Agreement of -10.34 to 8.66 BPM during exercise and -6.57 to 7.48 BPM during free-living activities, demonstrating substantially tighter error margins than previous Google devices. Finally, we discuss key design lessons, emphasizing that large-scale deep learning was instrumental in fully leveraging multipath PPG hardware over traditional signal processing approaches. View details
    Preview abstract In modern Kubernetes environments, eBPF (Extended Berkeley Packet Filter) has become the de facto standard for high-performance dataplane enforcement. However, this architecture introduces a complex distributed state problem: the asynchronous synchronization between the Kubernetes control plane (Intent) and the kernel-space BPF maps (Reality). A critical failure mode, termed “Silent Divergence,” occurs when the control plane believes a network policy or identity is applied, but the underlying kernel state is missing or corrupted. In this “Gray Failure” state, standard observability tools—including logs, liveness probes, and agent status checks—report health, while the network silently drops traffic. This paper introduces eBPF-Auditor, a specialized consistency verification framework. Unlike standard agents that rely on event-based reconciliation, eBPF-Auditor performs a periodic “Two-Way State Audit” that mathematically verifies the intersection of Kubernetes Intent and BPF Reality. We demonstrate through fault injection and benchmarks on 5,000 pods that this approach successfully detects state drift with 100% accuracy and negligible sub-millisecond overhead (ms), making it a viable solution for high-frequency runtime verification in production hyperscale clusters. View details
    Preview abstract This disclosure describes systems and methods for a multi-agent framework that can automate and scale cognitive work. The framework can, for example, use a cognitive assembly line of specialized computational agents to perform tasks such as research and drafting. A beneficial component could be an adversarial review panel (ARP), which is a multi-agent review system where distinct agent personas critique a generated draft from varied perspectives. The structured feedback from the ARP can be used to automatically iterate on and refine the work product. This approach can improve the intellectual rigor of generated content and reduce the time required for production, which may allow human operators to focus on activities such as strategic oversight and final validation. View details
    Approximate vs Precise: An experiment in what impacts user choice when apps request location access
    Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA ’26), April 13–17, 2026, Barcelona, Spain (2026)
    Preview abstract User location data is highly sensitive, yet commonly requested by mobile apps for both core functionality and monetization. To improve user privacy, the major mobile platforms, Android and iOS, made changes so that when apps request precise location access, users can choose to share only their approximate location. However, the platforms have diverging interfaces: Android offers a side-by-side choice and iOS offers a corner toggle. This study evaluates which factors impact users’ choices when apps request location access via a randomized controlled experiment with 2579 US Android users. We tested the impact of app type, whether a reason for the request was provided, and the quality and content of the reason, including monetization. We do not find the reasons have an effect. Instead, we find users’ choices are impacted by app type and user demographics. We find that when users are given a side-by-side choice to allow approximate versus precise location access, they make reasonable choices. Of users who allowed access, the vast majority (90.7%) chose precise for a rideshare app versus the majority (71.3%) chose approximate for a local news app. Concerningly, the majority also allowed location access to a wallpaper app, and older users were significantly more likely to allow apps precise location access. We conclude by discussing implications for app platforms and future work. View details
    Preview abstract The remarkable success of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in 2D computer vision has catalyzed significant research into their adaptation for the complex domain of 3D analysis. However, a fundamental dichotomy exists between the regular, dense grid of 2D images and the irregular, sparse nature of 3D data formats such as point clouds and meshes. This paper provides a comprehensive survey and a novel intellectual framework for navigating this burgeoning field. Our core contribution is a new taxonomy that organizes adaptation strategies into three distinct families: (1) Data-centric methods, which project 3D data into 2D formats to leverage off-the-shelf 2D models; (2) Architecture-centric methods, which design intrinsic network modules to directly process 3D data; and (3) Hybrid methods, which synergistically combine pre-trained 2D features with 3D modeling processing pipelines to benefit from both rich visual priors and explicit geometric reasoning. Through this taxonomic lens, we conduct a systematic review and qualitative synthesis of the field. We illuminate the fundamental trade-offs between these families concerning computational complexity, reliance on large-scale pre-training, and the preservation of geometric inductive biases. Based on this analysis, we identify and discuss critical open challenges and chart promising future research directions, including the development of 3D foundation models, advancements in self-supervised learning for geometric data, and the deeper integration of multi-modal signals. This survey serves as an essential resource and roadmap for researchers seeking to understand and advance the state-of-the-art in 3D computer vision. View details
    Preview abstract In "Elephants, Goldfish and the New Golden Age of Software Engineering," the author discusses how AI is changing knowledge work, especially software development. Written from the perspective of April 2026, the article points out that while AI speeds up coding, it can also quickly generate a lot of mistakes and messy code if it isn't carefully managed by human oversight and clear processes. The paper outlines a practical approach to working with AI, broken down into three main sections: * **Using AI as a Tool, Not a Toy:** The author notes that people often get poor results by asking AI to do everything in a single prompt. Instead, users should have back-and-forth conversations with AI to question assumptions, set clear grading rules, and guide the research. The main point is that humans must still provide the final judgment; AI is simply a way to speed up and record that thinking. * **The Elephant-Goldfish Model:** As AI creates more code than humans can easily read, written design documents become more important than the code itself. To keep AI on track, the author suggests a two-part method: * **The Elephant:** A long chat session where the human and AI discuss ideas and write a detailed design document *before* any code is written. This session holds all of the project's background information and decisions. * **The Goldfish:** A brand-new AI chat session with no memory. The human asks this "goldfish" to read the design document. If the goldfish cannot understand the plan based only on that document, the document needs more details. * Only after the design document is clear enough for the goldfish to understand does the human ask the AI to write the code based on those strict instructions. * **Managing AI and the Future of Work:** The author expects that regular employees will soon act like managers, overseeing multiple AI helpers. Because of this, workers need to learn basic management skills, like how to delegate tasks and set clear boundaries. Also, since AI will handle routine chores, humans will need to practice focusing for longer periods to do deeper, harder thinking. Ultimately, a worker's value will come from their planning and decision-making skills, rather than their ability to type code. View details
    Preview abstract Despite advances in high performance computing, accurate numerical simulations of global atmospheric dynamics remain a challenge. The resolution required to fully resolve the vast range scales as well as the strong coupling with—often not fully-understood—physics renders such simulations computationally infeasible over time horizons relevant for long-term climate risk assessment. While data-driven parameterizations have shown some promise of alleviating these obstacles, the scarcity of high-quality training data and their lack of long-term stability typically hinders their ability to capture the risk of rare extreme events. In this work we present a general strategy for training variational (probabilistic) neural network models to non-intrusively correct under-resolved long-time simulations of turbulent climate systems. The approach is based on the paradigm introduced by Barthel Sorensen et al. (2024, https://doi.org/10.1029/2023ms004122) which involves training a post-processing correction operator on under-resolved simulations nudged toward a high-fidelity reference. Our variational framework enables us to learn the dynamics of the underlying system from very little training data and thus drastically improve the extrapolation capabilities of the previous deterministic state-of-the art—even when the statistics of that training data are far from converged. We investigate and compare three recently introduced variational network architectures and illustrate the benefits of our approach on an anisotropic quasi-geostrophic flow. For this prototype model our approach is able to not only accurately capture global statistics, but also the anistropic regional variation and the statistics of multiple extreme event metrics—demonstrating significant improvement over previously introduced deterministic architectures. View details
    Preview abstract The current pursuit of robust machine intelligence is largely predicated on a substrate independent, computational functionalist view of cognition, where sufficiently complex computational processing is expected to eventually yield generalized reasoning. This paper explores the ontological distinctions between these computational frameworks and biological cognition, specifically how these differences impact the capacity for semantic understanding. By analyzing phenomena such as the "reversal curse" where models fail to generalize the symmetry in identity relations (A=B implies B=A), and performance on novel reasoning benchmarks (e.g., ARC-AGI), this paper examines whether current model limitations are transient artifacts of scale or indicative of a distinct architectural category. Integrating Stevan Harnad’s “symbol grounding problem” with Evan Thompson’s biological model of “intrinsic normativity,” I investigate whether robust general intelligence might require sense-making: a process distinct from information processing, whereby an agent’s internal states are causally coupled with its environment via survival or system-wide stakes which grounds symbols in meaning. Current Large Language Models (LLMs) appear to lack this intrinsic normativity, and consequently may operate primarily as epistemic instruments rather than ontic agents. By introducing the concept of “ontic grounding”, this paper presents a potential framework for distinguishing between the simulation of reasoning and true understanding, which could have implications for AI safety and governance. View details
    Preview abstract This talk addresses the challenges of operating Google's monitoring systems at scale, handling terabytes of telemetry data and preventing overload from diverse workloads. We'll explore how Google's internal client library and Monarch, its planet-scale time-series database, work together for cost-effective data collection. Key principles include a distributed push model, dynamic client-side data reduction, centralized retention, and periodic metric analysis. The session will then bridge these concepts to the open-source world, discussing our work with OpenTelemetry's OpAMP protocol to achieve similar scalable and efficient telemetry collection. Attendees will gain insights into adapting these principles for cost savings and learn about our collaboration with the OpAMP SIG to benefit the broader community. View details
    Fair Allocation of Indivisible Goods with Variable Groups
    Paul Golz
    Warut Suksompong
    Ayumi Igarashi
    AAAI (2026)
    Preview abstract We study the fair allocation of indivisible goods with variable groups. In this model, the goal is to partition the agents into groups of given sizes and allocate the goods to the groups in a fair manner. We show that for any number of groups and corresponding sizes, there always exists an envy-free up to one good (EF1) outcome, thereby generalizing an important result from the individual setting. Our result holds for arbitrary monotonic utilities and comes with an efficient algorithm. We also prove that the EF1 existence can be guaranteed even when the goods lie on a path and each group must receive a connected bundle. In addition, we consider a probabilistic model where the utilities are additive and drawn randomly from a distribution. We show that if there are n agents and the number of goods m is divisible by the number of groups k, then an envy-free outcome exists with high probability if m = ω(log n), and this bound is tight. On the other hand, if m is not divisible by k, then an envy-free outcome is unlikely to exist as long as m = o(√n). View details
    ×