Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 10855 publications
    FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration
    Diganta Misra
    Yanqi Luo
    Anjali Sridhar
    Justine Gehring
    Silvio Soares Ribeiro Junior
    2026
    Preview abstract AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization. View details
    CrossCheck: Input Validation for WAN Control Systems
    Bharath Modhipalli
    Rishabh Iyer
    Isaac Keslassy
    Sylvia Ratnasamy
    Networked Systems Design and Implementation (NSDI) (2026) (to appear)
    Preview abstract We present CrossCheck, a system that validates inputs to the Software-Defined Networking (SDN) controller in a Wide Area Network (WAN). By detecting incorrect inputs—often stemming from bugs in the SDN control infrastructure—CrossCheck alerts operators before they trigger network outages. Our analysis at a large-scale WAN operator identifies invalid inputs as a leading cause of major outages, and we show how CrossCheck would have prevented those incidents. We deployed CrossCheck as a shadow validation system for four weeks in a production WAN, during which it accurately detected the single incident of invalid inputs that occurred while sustaining a 0% false positive rate under normal operation, hence imposing little additional burden on operators. In addition, we show through simulation that CrossCheck reliably detects a wide range of invalid inputs (e.g., detecting demand perturbations as small as 5% with 100% accuracy) and maintains a near-zero false positive rate for realistic levels of noisy, missing, or buggy telemetry data (e.g., sustaining zero false positives with up to 30% of corrupted telemetry data). View details
    Productionizing Quantum Mass Production
    Bill Huggins
    Nathan Wiebe
    arXiv for now (2026) (to appear)
    Preview abstract For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution [1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step. View details
    Preview abstract We consider the Coalition Structure Learning (CSL) problem in multi-agent systems, motivated by the existence of coalitions in many real-world systems, e.g., trading platforms and auction systems. In this problem, there is a hidden coalition structure within a set of $n$ agents, which affects the behavior of the agents in games. Our goal is to actively design a sequence of games for the agents to play, such that observations in these games can be used to learn the hidden coalition structure. In particular, we consider the setting where in each round, we design and present a game together with a strategy profile to the agents, and receive a multiple-bit observation -- for each agent, we observe whether or not they would like to deviate from the specified strategy in this given game. Our contributions are three-fold: First, we show that we can learn the coalition structure in $O(\log n)$ rounds if we are allowed to choose any normal-form game in each round, matching the information-theoretical lower bound, and the result can be extended to congestion games. Second, in a more restricted setting where we can only choose a graphical game with degree limit $d$, we develop an algorithm to learn the coalition structure in $O(n/d+\log d)$ rounds. Third, when we can only learn the coalition structure through running second-price auctions with personalized reserve prices, we show that the coalition structure can be learned in $O(c\log n)$ rounds, where $c$ is the size of the largest coalition. View details
    Pragmatic Fairness: Evaluating ML Fairness Within the Constraints of Industry
    Jessie Smith
    Michael Madaio
    Robin Burke
    Casey Fiesler
    2025
    Preview abstract Machine learning (ML) fairness evaluation in real-world, industry settings presents unique challenges due to business-driven constraints that influence decision-making processes. While prior research has proposed fairness frameworks and evaluation methodologies, these approaches often focus on idealized conditions and may lack consideration for the practical realities faced by industry practitioners. To understand these practical realities, we conducted a semi-structured interview study with 21 experts from academia and industry specializing in ML fairness. Through this study, we explore three constraints of ML fairness evaluation in industry (balancing competing interests, lacking power/access, and getting buy-in) and how these constraints lead to satisficing, that is, seeking satisfactory rather than ideal outcomes. We define the path from these constraints to satisficing as pragmatic fairness. Using recommender systems as a case study, we explore how practitioners navigate these constraints and highlight actionable strategies to improve fairness evaluations within these business-minded boundaries. This paper provides practical insights to guide fairness evaluations in industry while also showcasing how the FAccT community can better align research goals with the operational realities of practitioners. View details
    Preview abstract Storage on Android has evolved significantly over the years, with each new Android version introducing changes aimed at enhancing usability, security, and privacy. While these updates typically help with restricting app access to storage through various mechanisms, they may occasionally introduce new complexities and vulnerabilities. A prime example is the introduction of scoped storage in Android 10, which fundamentally changed how apps interact with files. While intended to enhance user privacy by limiting broad access to shared storage, scoped storage has also presented developers with new challenges and potential vulnerabilities to address. However, despite its significance for user privacy and app functionality, no systematic studies have examined Android’s scoped storage in depth from a security perspective. In this paper, we present the first systematic security analysis of the scoped storage mechanism. To this end, we design and implement a testing tool, named ScopeVerif, that relies on differential analysis to uncover security issues and implementation inconsistencies in Android’s storage. Specifically, ScopeVerif takes a list of security properties and checks if there are any file operations that violate any security properties defined in the official Android documentation. Additionally, we conduct a comprehensive analysis across different Android versions as well as a cross-OEM analysis to identify discrepancies in different implementations and their security implications. Our study identifies both known and unknown issues of scoped storage. Our cross-version analysis highlights undocumented changes as well as partially fixed security loopholes across versions. Additionally, we discovered several vulnerabilities in scoped storage implementations by different OEMs. These vulnerabilities stem from deviations from the documented and correct behavior, which potentially pose security risks. The affected OEMs and Google have acknowledged our findings and offered us bug bounties in response. View details
    XR Blocks: Accelerating Human-Centered AI + XR Innovation
    Nels Numan
    Evgenii Alekseev
    Alex Cooper
    Min Xia
    Scott Chung
    Jeremy Nelson
    Xiuxiu Yuan
    Jolica Dias
    Tim Bettridge
    Benjamin Hersh
    Michelle Huynh
    Konrad Piascik
    Ricardo Cabello
    Google, XR, XR Labs (2025)
    Preview abstract We are on the cusp where Artificial Intelligence (AI) and Extended Reality (XR) are converging to unlock new paradigms of interactive computing. However, a significant gap exists between the ecosystems of these two fields: while AI research and development is accelerated by mature frameworks like PyTorch and benchmarks like LMArena, prototyping novel AI-driven XR interactions remains a high-friction process, often requiring practitioners to manually integrate disparate, low-level systems for perception, rendering, and interaction. To bridge this gap, we present XR Blocks, a cross-platform framework designed to accelerate human-centered AI + XR innovation. XR Blocks provides a modular architecture with plug-and-play components for core abstraction in AI + XR: user, world, peers; interface, context, and agents. Crucially, it is designed with the mission of "minimum code from idea to reality", accelerating rapid prototyping of complex AI + XR apps. Built upon accessible technologies (WebXR, three.js, TensorFlow, Gemini), our toolkit lowers the barrier to entry for XR creators. We demonstrate its utility through a set of open-source templates, samples, and advanced demos, empowering the community to quickly move from concept to interactive prototype. View details
    Preview abstract The integration of vector search into databases, driven by advancements in embedding models, semantic search, and Retrieval-Augmented Generation (RAG), enables powerful combined querying of structured and unstructured data. This paper focuses on filtered vector search (FVS), a core operation where relational predicates restrict the dataset before or during the vector similarity search (top-k). While approximate near neighbor (ANN) indices are commonly used to accelerate vector search by trading latency for recall, the addition of filters complicates performance optimization and makes achieving stable, declarative recall guarantees challenging. Filters alter the effective dataset size and distribution, impacting the search effort required. We discuss the primary FVS execution strategies – pre-filtering, post-filtering, and inline-filtering – whose efficiencies depend on factors like filter selectivity, cardinality, and data correlation. We review existing approaches that modify index structures and search algorithms (e.g., iterative post-filtering, filter-aware index traversal) to enhance FVS performance. This tutorial provides a comprehensive overview of filtered vector search, discussing its use cases, classifying current solutions and their trade-offs, and highlighting crucial research challenges and future directions for developing efficient and accurate FVS systems.   View details
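To make the pre-filtering and post-filtering strategies named in the abstract above concrete, here is a minimal sketch in Python. It is an illustration only, not the tutorial's method: it uses an exact brute-force scan in place of an ANN index, and the row layout, the `overfetch` parameter, and all function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pre_filter_search(rows, query, predicate, k):
    """Apply the relational predicate first, then rank only the survivors."""
    candidates = [r for r in rows if predicate(r)]
    candidates.sort(key=lambda r: cosine(r["vec"], query), reverse=True)
    return [r["id"] for r in candidates[:k]]

def post_filter_search(rows, query, predicate, k, overfetch=2):
    """Rank all rows, keep the top k * overfetch, then apply the predicate.

    The result can contain fewer than k ids when the filter is selective.
    """
    ranked = sorted(rows, key=lambda r: cosine(r["vec"], query), reverse=True)
    shortlist = ranked[: k * overfetch]
    return [r["id"] for r in shortlist if predicate(r)][:k]

# Hypothetical toy table: an id, one relational column, and a 2-d embedding.
ROWS = [
    {"id": 1, "category": "a", "vec": [1.0, 0.0]},
    {"id": 2, "category": "b", "vec": [0.9, 0.1]},
    {"id": 3, "category": "b", "vec": [0.8, 0.2]},
    {"id": 4, "category": "b", "vec": [0.7, 0.3]},
    {"id": 5, "category": "a", "vec": [0.1, 0.9]},
    {"id": 6, "category": "a", "vec": [0.5, 0.5]},
]

def is_category_a(row):
    return row["category"] == "a"

QUERY = [1.0, 0.0]

print(pre_filter_search(ROWS, QUERY, is_category_a, k=2))
print(post_filter_search(ROWS, QUERY, is_category_a, k=2, overfetch=2))
print(post_filter_search(ROWS, QUERY, is_category_a, k=2, overfetch=3))
```

On this toy data, post-filtering with `overfetch=2` returns only one result because three of the four nearest rows fail the predicate. Raising `overfetch` to 3 recovers the same top-2 as pre-filtering at the cost of ranking a larger shortlist. This is one way filter selectivity and data correlation can affect the search effort needed for a given recall.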
    A Novel CI Coding Strategy Based on a Cochlear Model and Deep Neural Network
    Maryam Hosseini
    Tim Brochier
    Zachary Smith
    Brett Swanson
    Andrew Vandali
    Alan Kan
    Fadwa Alnafjan
    Kat Fernandez
    Conference on Implantable Auditory Prostheses 2025
    Preview abstract Objective: Many CI recipients face difficulties in understanding speech in noisy environments and express frustration with the quality of music. This may be partly due to the simple filter banks used in current CI technology, which do not fully replicate the natural processes of the cochlea. This project aims to improve CI perception by more accurately mimicking the responses of the auditory nerve. Method: Audio signals were applied to CARFAC (Cascade of Asymmetric Resonators with Fast-Acting Compression) [1] to produce a representation of the auditory nerve response, known as a normal hearing (NH) “neurogram”. The NH neurogram was down-sampled and applied to a deep neural network (DNN) to produce 22 electrode stimulation currents. These currents were applied to an electrical hearing (EH) model incorporating current spread, neural adaptation, and refractoriness, to produce a CI neurogram. The DNN was trained on sentences from the TIMIT database to minimise the difference between the NH and CI neurograms. Results: The CI neurograms produced by the CARFAC-DNN strategy were more similar to the NH neurograms than the CI neurograms produced by the Nucleus ACE strategy. Similarity was quantified by the structural similarity index and mean squared error. Conclusions: The CARFAC-DNN strategy may provide a more natural auditory nerve response than traditional CI sound coding strategies. A sound-booth study with CI recipients is planned. This work was funded by Google through the Australian Future Hearing Initiative. References: [1]  Lyon, R. F. (2017). Human and machine hearing. Cambridge University Press. View details
    Preview abstract This post delves into the shift within enterprise AI, moving from traditional Large Language Models (LLMs) to advanced, goal-oriented AI Agents and sophisticated Multi-Agent Systems (MAS). While individual agents, such as the "Data Agent" in Looker Conversational Analytics, excel at querying specific, governed datasets, they often fall short when addressing complex business challenges that span diverse, isolated systems across departments like Sales, Marketing, and Operations. To overcome these "data silos," we introduce and detail the architecture of a Multi-Agent System. This system, built on the Gemini Enterprise platform and utilizing the Agent Development Kit (ADK), features a Master Agent that coordinates various specialized Sub-Agents (including Data, Jira, and Salesforce agents). This coordination enables the system to independently break down intricate queries, gather validated information from disparate sources, and generate a cohesive, data-driven insight. This innovative architectural approach significantly boosts employee efficiency and effectiveness by automating the laborious process of data integration, thereby empowering users with a unified and intelligent platform. These AI Agents are designed to reason, plan, utilize tools, and autonomously complete complex, multi-step business tasks, with or without human involvement. Organizations globally are prioritizing the integration of AI Agents to enhance the efficiency and effectiveness of their workforce. View details
    A Foot in the Backdoor
    Richard Bondi
    Ruben Barroso
    Garrett Holthaus
    John P. Thomas
    (2025)
    Preview abstract We applied systems-theory control loops to the 2024 cyberattack (https://nvd.nist.gov/vuln/detail/CVE-2024-3094) in which a backdoor was inserted into Linux distros by modifying the xz utils compression package. Our work illustrates how to apply STAMP, CAST, and STPA to cyberattacks, and their advantages over traditional threat modeling. View details
    Calibration Properties of Time-Series Foundation Models: An Empirical Analysis
    Coen Adler
    Samar Abdi
    Yuxin Chang
    Padhraic Smyth
    2025
    Preview abstract The recent development of foundation models for time series data has generated considerable interest in using such models across a variety of applications. Although these models achieve state-of-the-art predictive performance, their ability to produce well-calibrated probabilistic distributions, which is critical for practical applications, remains relatively underexplored. In this paper, we investigate the calibration-related properties of five recent time series foundation models and two competitive baselines. We perform systematic evaluations and identify significant variation in calibration performance across models. View details
    Preview abstract This paper discusses the migration of data orchestration workflows from a legacy tool like Autosys to a modern, cloud-based solution, Google Cloud Composer. It explores the transition from traditional job scheduling to Directed Acyclic Graph (DAG)-based workflows using Apache Airflow, culminating in the deployment and management of these workflows in Cloud Composer. The benefits and challenges of this migration are examined, highlighting the advantages of scalability, flexibility, and cloud integration offered by Cloud Composer. View details
    SMaCk: Efficient Instruction Cache Attacks via Self-Modifying Code Conflicts
    Seonghun Son
    Berk Gulmezoglu
    ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2025)
    Preview abstract Self-modifying code (SMC) allows programs to alter their own instructions, optimizing performance and functionality on x86 processors. Despite its benefits, SMC introduces unique microarchitectural behaviors that can be exploited for malicious purposes. In this paper, we explore the security implications of SMC by examining how specific x86 instructions affecting instruction cache lines lead to measurable timing discrepancies between cache hits and misses. These discrepancies facilitate refined cache attacks, making them less noisy and more effective. We introduce novel attack techniques that leverage these timing variations to enhance existing methods such as Prime+Probe and Flush+Reload. Our advanced techniques allow adversaries to more precisely attack cryptographic keys and create covert channels akin to Spectre across various x86 platforms. Finally, we propose a dynamic detection methodology utilizing hardware performance counters to mitigate these enhanced threats. View details