Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.


1 - 15 of 11292 publications
    Preview abstract As artificial intelligence (AI) is rapidly integrated into healthcare, ensuring that this innovation helps to combat health inequities requires engaging marginalized communities in health AI futuring. However, little research has examined Black populations’ perspectives on the use of AI in health contexts, despite the widespread health inequities they experience–inequities that are already perpetuated by AI. Addressing this research gap, through qualitative workshops with 18 Black adults, we characterize participants’ cautious optimism for health AI addressing structural well-being barriers (e.g., by providing second opinions that introduce fairness into an unjust healthcare system), and their concerns that AI will worsen health inequities (e.g., through health AI biases they deemed inevitable and the problematic reality of having to trust healthcare providers to use AI equitably). We advance health AI research by articulating previously-unreported health AI perspectives from a population experiencing significant health inequities, and presenting key considerations for future work. View details
    Preview abstract We introduce ALPS (Activation-based Length Prediction for Scheduling), a method for predicting LLM generation length from prefill activations before any tokens are generated. Unlike existing approaches that require model fine-tuning or complex entropy-weighted pooling, ALPS uses a simple linear probe on the last-token activation at intermediate layers. We discover that generation length is encoded in prefill representations: a ridge regression probe achieves R-squared > 0.85 across three model families. Validation across Llama-3.1-8B, Gemma-2-9B, and Qwen-2.5-7B demonstrates: (1) intermediate layers generally perform well, with some architectural variation; (2) simple last-token extraction outperforms complex pooling strategies; (3) activations improve substantially over surface-feature baselines (24 percentage points over input length plus lexical features). The best models achieve R-squared = 0.943 (Gemma), R-squared = 0.880 (Llama), and R-squared = 0.857 (Qwen) with MAE of 38-80 tokens. All test prompts terminated naturally (100% EOS), eliminating truncation confounds. While our evaluation uses 200 curated prompts—sufficient for demonstrating the phenomenon but requiring broader validation—cross-validation confirms generalization beyond training data. ALPS enables practical applications including budget-constrained inference, request scheduling, and resource allocation. The probe adds negligible overhead (~16KB direction vector, single dot product), making ALPS practical for production deployment. View details
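The linear-probe idea in the ALPS abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the activations here are synthetic random vectors standing in for last-token prefill activations, and the names `ridge_fit`, `predict_length`, `r_squared`, and `make_example`, the feature dimension, and the noise level are all assumptions chosen for illustration. It only shows the general shape of the technique: fit a ridge regression on activation vectors, then predict a length with a single dot product.

```python
import random

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: solve (X^T X + lam*I) w = X^T y."""
    d = len(X[0])
    # Build the augmented matrix [X^T X + lam*I | X^T y].
    M = []
    for i in range(d):
        row = [sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0)
               for j in range(d)]
        row.append(sum(x[i] * t for x, t in zip(X, y)))
        M.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(d):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][d] / M[i][i] for i in range(d)]

def predict_length(w, activation):
    """Prediction is a single dot product with the learned direction."""
    return sum(wi * ai for wi, ai in zip(w, activation))

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def make_example(rng, true_w, bias):
    # Hypothetical stand-in for one prompt: a synthetic "activation" and a
    # generation length that depends linearly on it, plus noise.
    act = [rng.gauss(0, 1) for _ in true_w]
    length = sum(w * a for w, a in zip(true_w, act)) + bias + rng.gauss(0, 5)
    return act + [1.0], length  # trailing 1.0 acts as an intercept feature

rng = random.Random(0)
true_w = [rng.uniform(-50, 50) for _ in range(8)]
train = [make_example(rng, true_w, 200.0) for _ in range(200)]
test = [make_example(rng, true_w, 200.0) for _ in range(50)]
X_train, y_train = [x for x, _ in train], [t for _, t in train]
X_test, y_test = [x for x, _ in test], [t for _, t in test]

# Note: for simplicity the intercept feature is regularized too.
w = ridge_fit(X_train, y_train, lam=1.0)
r2 = r_squared(y_test, [predict_length(w, x) for x in X_test])
print(f"held-out R-squared on synthetic data: {r2:.3f}")
```

Because the synthetic lengths are linear in the features by construction, the held-out score here says nothing about real models; it only demonstrates the fit-then-dot-product workflow.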
    Preview abstract When managing complex, unpredictable (non-deterministic) AI agents using simple, fixed control systems (like finite state machines), operational failures and accountability issues often arise. This document introduces a probabilistic governance and telemetry framework to resolve these problems. Instead of following a rigid sequence of steps, this framework defines a multi-dimensional operational boundary, a 'behavioral volume', and assigns the agent a goal. This allows the agent to use its own reasoning to achieve the goal while remaining within the defined boundaries. A separate telemetry layer monitors the agent's actions by calculating metrics, such as alignment scores and drift velocity, to measure how much the agent deviates from its intended behavior. This system provides a method for guiding, monitoring, and securing autonomous agents, effectively managing the performance and security of an unpredictable AI workforce in complex environments. View details
    SAC133 - SSAC Comments on Proposed Root KSK Algorithm Rollover
    Wes Hardaker
    Internet Corporation for Assigned Names and Numbers (ICANN), ICANN Security and Stability Advisory Committee (SSAC) Reports and Advisories (2026), pp. 9
    Preview abstract The SSAC supports the transition from RSA with SHA-256 (Algorithm 8) to ECDSA P-256 with SHA-256 (Algorithm 13) as the cryptographic algorithm for the Root KSK. The root zone has relied on RSA-based algorithms since DNSSEC signing began in 2010. The algorithm did not change during the first KSK rollover in 2018 or during the second rollover currently underway and scheduled to complete in October 2026. Establishing a clear and predictable process for algorithm transitions is essential to the long-term security of the root zone, and the SSAC observes that the proposal addresses Recommendation 23 of the SSR2 Review accordingly. The SSAC notes that the proposal builds upon the Root Zone DNSSEC Algorithm Rollover Study published by ICANN in May 2024, which assessed resolver and authoritative server support for alternative algorithms, analyzed rollover methodologies, and evaluated operational risks. The SSAC finds that the proposal implements the study’s recommendations. The SSAC also notes that this proposal is consistent with the SSAC’s prior work on DNSSEC key rollover, including SAC063, SAC073, SAC102, and SAC108. The SSAC encourages ICANN to proceed with this rollover. Specific comments on the proposal’s methodology, timeline, and operational readiness follow. View details
    CrossCheck: Input Validation for WAN Control Systems
    Rishabh Iyer
    Isaac Keslassy
    Sylvia Ratnasamy
    Networked Systems Design and Implementation (NSDI) (2026) (to appear)
    Preview abstract We present CrossCheck, a system that validates inputs to the Software-Defined Networking (SDN) controller in a Wide Area Network (WAN). By detecting incorrect inputs—often stemming from bugs in the SDN control infrastructure—CrossCheck alerts operators before they trigger network outages. Our analysis at a large-scale WAN operator identifies invalid inputs as a leading cause of major outages, and we show how CrossCheck would have prevented those incidents. We deployed CrossCheck as a shadow validation system for four weeks in a production WAN, during which it accurately detected the single incident of invalid inputs that occurred while sustaining a 0% false positive rate under normal operation, hence imposing little additional burden on operators. In addition, we show through simulation that CrossCheck reliably detects a wide range of invalid inputs (e.g., detecting demand perturbations as small as 5% with 100% accuracy) and maintains a near-zero false positive rate for realistic levels of noisy, missing, or buggy telemetry data (e.g., sustaining zero false positives with up to 30% of corrupted telemetry data). View details
    An Empirical Study of Tablet Ergonomics: The Interplay of Temperature, Orientation, and Use Behaviors
    Carmen Van Ommen
    Mikki Phan
    Arun Raghupathy
    Daniel Huynh
    Barbara Chaparro
    Ergonomics in Design: The Quarterly of Human Factors Applications Journal (2026)
    Preview abstract To balance computational performance with thermal comfort, this study explores a consolidated hotspot architecture at the top center of a tablet. We tested hotspot (39°C, 43°C, 45°C, 47°C) and ambient temperatures (25°C, 35°C) with 60 participants, measuring perception, action likelihood, and expectation. The hotspot was observed away from high contact areas, with 43°C identified as the threshold for significant discomfort. Discomfort increased with portrait mode use and higher device and ambient temperatures, while active use duration influenced acceptability. The findings underscore the importance of thermal mapping and contextual sensing, with direct applications for software throttling thresholds of coated aluminum enclosures. View details
    Preview abstract In modern Kubernetes environments, eBPF (Extended Berkeley Packet Filter) has become the de facto standard for high-performance dataplane enforcement. However, this architecture introduces a complex distributed state problem: the asynchronous synchronization between the Kubernetes control plane (Intent) and the kernel-space BPF maps (Reality). A critical failure mode, termed “Silent Divergence,” occurs when the control plane believes a network policy or identity is applied, but the underlying kernel state is missing or corrupted. In this “Gray Failure” state, standard observability tools—including logs, liveness probes, and agent status checks—report health, while the network silently drops traffic. This paper introduces eBPF-Auditor, a specialized consistency verification framework. Unlike standard agents that rely on event-based reconciliation, eBPF-Auditor performs a periodic “Two-Way State Audit” that mathematically verifies the intersection of Kubernetes Intent and BPF Reality. We demonstrate through fault injection and benchmarks on 5,000 pods that this approach successfully detects state drift with 100% accuracy and negligible sub-millisecond overhead (ms), making it a viable solution for high-frequency runtime verification in production hyperscale clusters. View details
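The "Two-Way State Audit" described in the eBPF-Auditor abstract above can be pictured as a pair of set differences. The sketch below is an assumption-laden illustration, not the paper's implementation: it makes no real Kubernetes or BPF calls, and `PolicyEntry`, `two_way_audit`, and the sample pod names are hypothetical. It assumes both the control-plane intent and the kernel map contents can be flattened into comparable entries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PolicyEntry:
    """Hypothetical flattened record: one policy applied to one pod."""
    pod: str
    policy_id: int

def two_way_audit(intent, reality):
    """Compare desired state (intent) with observed kernel state (reality)."""
    intent, reality = set(intent), set(reality)
    return {
        # Control plane believes it is applied, but the kernel has no entry.
        "missing_in_kernel": intent - reality,
        # Kernel holds an entry the control plane no longer wants.
        "stale_in_kernel": reality - intent,
        # Entries on which both sides agree.
        "consistent": intent & reality,
    }

# Illustrative data only.
intent = [PolicyEntry("pod-a", 1), PolicyEntry("pod-b", 7)]
reality = [PolicyEntry("pod-a", 1), PolicyEntry("pod-c", 3)]

report = two_way_audit(intent, reality)
for kind, entries in report.items():
    print(kind, sorted(entries, key=lambda e: (e.pod, e.policy_id)))
```

Checking both directions matters: a one-way check from intent to reality would catch the missing `pod-b` entry but not the stale `pod-c` entry left behind in the kernel.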
    FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration
    Victor May
    Diganta Misra
    Yanqi Luo
    Anjali Sridhar
    Justine Gehring
    Silvio Soares Ribeiro Junior
    2026
    Preview abstract AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization. View details
    DeduBB: Binary Code Size Reduction via Post-Link Basic Block De-duplication
    Chaitanya Mamatha Ananda
    Rajiv Gupta
    Mahbod Afarin
    Han Shen
    LCTES (Languages, Compilers, Tools and Theory of Embedded Systems) (2026) (to appear)
    Preview abstract Binary sizes of newer versions of software applications tend to be larger, primarily due to feature bloat. This poses various challenges, particularly for mobile applications. It affects upgrade rates, directly impacting revenues, increases the maintenance costs of supporting multiple versions, and prevents some users from getting critical security fixes. Code bloat also poses a problem for large warehouse-scale applications. Such applications experience performance degradation when their code size exceeds what smaller and more efficient code models can handle. In this paper, we introduce a post-link optimization technique called DeduBB, which deduplicates basic blocks of an application across procedure boundaries. While prior techniques used function outlining to de-duplicate redundant code sequences, they missed many opportunities because outlining cannot handle code that manipulates the program stack. In addition, previous techniques were either limited to the scope of a module or lacked scalable implementations required to handle large warehouse-scale applications. Our technique, DeduBB, handles all types of code duplication as we use a novel save-and-jump code pattern to execute de-duplicated code blocks. In addition, DeduBB has been designed to work on scalable post-link optimizers and can even be applied to large warehouse-scale datacenter applications. Finally, DeduBB is profile-guided and can be applied selectively to infrequently executed cold basic blocks so as not to affect application performance. In fact, in several cases, the performance of the smaller application binary improves due to reductions in its hot working set size. We have implemented our technique on the state-of-the-art post-link optimizers, BOLT and Propeller.
    Experiments show that we can significantly reduce the code size of several benchmarks by 1.55% to 18.63%, on both Arm and x86 platforms, and on binaries that have already been heavily optimized for size using existing code size reduction features. Furthermore, aided by profiles, our technique can retain more than 80% of the maximal code size savings without affecting performance. View details
    Preview abstract Object counting in remote-sensing (RS) imagery is attracting increasing research interest due to its crucial role in a wide and diverse set of applications. While several promising methods for RS object counting have been proposed, existing methods focus on a closed, pre-defined set of object classes. This limitation necessitates costly re-annotation and model re-training to adapt current approaches to counting novel objects that have not been seen during training, and severely inhibits their application in dynamic, real-world monitoring scenarios. To address this gap, in this work we propose RS-OVC, an adaptation of an existing Open Vocabulary Counting (OVC) approach from general computer vision to the RS domain. We show that our model is capable of accurately counting novel object classes that are unseen during training, based solely on textual and/or visual conditioning. View details
    Compact Conformal Subgraphs
    Kamesh Munagala
    Aravindan Vijayaraghavan
    ICML (2026)
    Preview abstract Conformal prediction provides rigorous uncertainty guarantees for model outputs but can produce prohibitively large prediction sets in structured domains such as routing, planning, or sequential recommendation. We introduce \emph{graph-based conformal compression}, a framework for constructing compact subgraphs that preserve the statistical validity of conformal prediction while reducing structural complexity. We study a formulation that selects a smallest subgraph capturing a prescribed fraction of conditional probability mass, and reduce to a weighted version of densest $k$-subgraphs in hypergraphs, in the regime where the subgraph has a large fraction of edges. We design efficient approximation algorithms that achieve constant factor coverage and size trade-offs. Our results highlight an algorithmic regime, distinct from classical densest-$k$-subgraph hardness settings, where the problem can be approximated efficiently, bridging conformal prediction with combinatorial graph compression. We finally validate our algorithmic approach on synthetic and real-world instances of trip planning and navigation, showing in each case that our approach handily beats natural baselines. View details
    Beyond Vector Similarity: Hierarchical Context-Aware Graph RAG vs Standard RAG in Enterprise Code Migration
    Suddhasatwa Bhaumik
    Nilesh Jaiswal
    Arjit Shukla
    Divya Malhotra
    Aniket Agrawal
    Saurabh Garg
    Suchit Puri
    Google Cloud India, Google, S. No, AP81, 83, N Main Rd, near Hard Rock Cafe, Koregaon Park Annexe, Mundhwa, Pune, Maharashtra 411036 (2026)
    Preview abstract As enterprises modernize legacy systems (e.g., monolithic Java architectures to Python microservices), Large Language Models (LLMs) have become instrumental in automated code translation. However, traditional vector-based Retrieval-Augmented Generation (Standard RAG) struggles with topological relationships, fetching isolated text chunks that frequently sever inheritance chains and lead to high compilation failure rates. This paper presents a comparative analysis between Standard RAG and a novel Hierarchical Context-Resident Graph (HCRG) methodology. Our pipeline utilizes tree-sitter for polyglot Abstract Syntax Tree (AST) extraction, mapping architectural edges into a Google Cloud Spanner Property Graph, and serializing this structure into a Gemini (on Vertex AI) Context Cache to enable topological, parent-first code translation. By shifting evaluation from naive text-overlap to a custom 7-metric framework measuring Software Engineering (SE) utility, empirical evaluations on the spring-petclinic-genai repository demonstrate significant structural improvements. Graph RAG decisively mitigates dependency loss, dropping the API hallucination rate from 56.4% to 16.2%. Furthermore, it improves Dependency Resolution Quality (DRQ) from 34.8% to 65.9% and enhances Parent-Child Consistency (PCC) from 26.7% to 45.5%. Interestingly, traditional lexical metrics fail to capture this divergence; both methodologies achieved an identical 91% average CodeBLEU score, effectively masking Standard RAG’s structural failures behind syntactically plausible but broken code. However, the results indicate that Graph RAG is not strictly superior across all dimensions. 
    Providing the LLM with dense, global structural context introduces new vulnerabilities: Graph RAG suffers a severe degradation in Cyclomatic Complexity Consistency (dropping from Standard RAG’s 71.6% to 46.7%) due to defensive over-engineering by the LLM, alongside a slight drop in Docstring Preservation (67.0% down to 61.0%) caused by prompt attention dilution. Ultimately, this research validates that while Graph RAG trades an increase in code complexity for critical reductions in API hallucinations, it offers a substantially more viable and architecturally sound path for automated enterprise codebase modernisation. View details
    An experimental evaluation of an AI-powered interactive learning platform
    Nicole Miller
    Yael Haramaty
    Lidan Hackmon
    Lior Belinsky
    Abraham Oritz Tapia
    Lucy Tootill
    Scott Siebert
    Frontiers in Artificial Intelligence (2026) (to appear)
    Preview abstract Generative AI, which is capable of transforming static content into dynamic learning experiences, holds the potential to revolutionize student engagement in educational contexts. However, questions still remain around whether or not these tools are effective at facilitating student learning. In this research, we test the effectiveness of an AI-powered platform incorporating multiple representations and assessment through Learn Your Way, an experimental research platform that transforms textbook chapters into dynamic visual and audio representations. Through a between-subjects, mixed methods experiment with 60 US-based students, we demonstrate that students who used Learn Your Way had a more positive learning experience and had better learning outcomes compared to students learning the same content through a digital textbook. These findings indicate that AI-driven tools, capable of providing choice among interactive representations of content, constitute an effective and promising method for enhancing student learning. View details
    Preview abstract As AI redefines identity verification in high stakes systems, it introduces novel risks like deepfake fraud and algorithmic bias, creating a critical trust deficit. This session will provide a practical framework for ethical governance, equipping leaders to build and manage secure, fair, and fundamentally trustworthy AI systems by design. View details
    What does your wearable know about the festive season?
    Justin Phillips
    Katarina Vukosavljević
    Abram Schönfeldt
    YongSuk Cho
    Conor Heneghan
    Robert Harle
    (2026)
    Preview abstract As we reach the end of the year and people look forward to spending quality time with loved ones, here at Fitbit, we wonder what our Pixel watches and Fitbit trackers can tell us about how we are spending the festive season. We looked at the data of 11.8 million of our users all over the world between January 2022 and July 2025. Here are the key stats we wanted to share with you! View details