Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.



    FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration
    Diganta Misra
    Yanqi Luo
    Anjali Sridhar
    Justine Gehring
    Silvio Soares Ribeiro Junior
    2026
    AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative, but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals the critical strengths and limitations of current agentic approaches, offering actionable guidance on their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
    mmMUSE: An mmWave-based Motion-resilient Universal Speech Enhancement System
    Chenming He
    Yanyong Zhang
    Kai Wang
    Dequan Wang
    Lingyu Wang
    the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), ACM (2026) (to appear)
    Voice-based smart systems can greatly enhance user experiences by allowing higher-quality interactions through better voice perception. Speech enhancement can benefit such systems by isolating noise from speech. Recently, integrating millimeter-wave (mmWave) with audio for speech perception has gained increasing attention due to microphones' limitations in noisy environments. However, mmWave-based vocal extraction is severely affected by motion, which disperses vocal signals across ranges and introduces distortions. In this paper, we propose an mmWave-based motion-resilient universal speech enhancement system called mmMUSE, which fuses mmWave and audio signals. To mitigate motion interference, we develop a Doppler-based method for motion-robust vocal signal extraction. Moreover, by introducing the Vocal-Noise-Ratio metric to assess the prominence of vocal signals from mmWave, we achieve real-time voice activity detection that gains 3.81 dB of SISDR in noisy speech. Additionally, we design a two-stage complex-valued network that includes an attention-based fusion network for cross-modal complementing and a time-frequency masking network for correcting the amplitude and phase of speech to isolate noise. Using mmWave and audio datasets from 46 participants, mmMUSE outperforms the state-of-the-art speech enhancement models, achieving an average SISDR improvement of 3.12 dB. Additionally, mmMUSE achieves SISDR improvements of 16.51 dB, 17.93 dB, 14.93 dB, and 18.95 dB in controlled environments involving intense noise, extensive motion, multiple speakers, and various obstructive materials, respectively. Finally, we evaluate mmMUSE in real-world scenarios including running, public spaces, and driving, maintaining a word error rate (WER) below 10%.
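The SISDR figures quoted above refer to the scale-invariant signal-to-distortion ratio. As a rough illustration of how that metric is computed (a standard formulation, not the authors' implementation), a minimal NumPy sketch:

```python
import numpy as np

def si_sdr(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Scale-Invariant Signal-to-Distortion Ratio in dB.

    Projects the estimate onto the reference to factor out any scaling,
    then compares the energy of that target component against the
    energy of the residual distortion.
    """
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling of the reference toward the estimate.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    distortion = estimate - target
    return 10.0 * np.log10(np.sum(target**2) / np.sum(distortion**2))
```

Because of the projection step, rescaling the estimate leaves the score unchanged, which is the property that makes the metric scale-invariant.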
    Productionizing Quantum Mass Production
    Bill Huggins
    Nathan Wiebe
    arXiv for now (2026) (to appear)
    For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution [1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
    In my talk, I will present a historical overview of the different numerical methods used to find optimal contraction paths. I will introduce TNSA, which has been used for the optimization of the contraction used for [Nature, 634 (8033), 2024], and TNCO, which is an ongoing effort to build upon TNSA.
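Contraction-path optimization of this kind can be illustrated with NumPy's built-in path search, shown here only as a toy analogue of TNSA/TNCO-style optimizers:

```python
import numpy as np

# Three tensors chained as a matrix product; the contraction order
# determines the size of the intermediates and hence the FLOP count.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 32))
B = rng.standard_normal((32, 8))
C = rng.standard_normal((8, 64))

# einsum_path searches over contraction orders ("greedy" or "optimal")
# and reports the chosen pairwise contraction sequence and its cost.
path, report = np.einsum_path("ij,jk,kl->il", A, B, C, optimize="optimal")
result = np.einsum("ij,jk,kl->il", A, B, C, optimize=path)
print(path)    # e.g. ['einsum_path', (0, 1), (0, 1)]
print(report)  # human-readable cost estimate for the chosen order
```

Here contracting A with B first produces a small 8x8 intermediate, while a poor order would materialize a much larger one; exhaustive search is only feasible for small networks, which is why heuristic optimizers like those discussed in the talk exist.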
    The need for characterizing global variability of atmospheric carbon dioxide (CO2) is quickly increasing, with a growing urgency for tracking greenhouse gasses with sufficient resolution, precision and accuracy so as to support independent verification of CO2 fluxes at local to global scales. The current generation of space-based sensors, however, can only provide sparse observations in space and/or in time, by design. While upcoming missions may address some of these challenges, most are still years away from launch. This challenge has fueled interest in the potential use of data from existing missions originally developed for other applications for inferring global greenhouse gas variability. The Advanced Baseline Imager (ABI) onboard the Geostationary Operational Environmental Satellite (GOES-East), operational since 2017, provides full coverage of much of the western hemisphere at 10-minute intervals from geostationary orbit at 16 wavelengths. We leverage this high temporal resolution by developing a single-pixel, fully-connected neural network to estimate dry-air column CO2 mole fractions (XCO2). The model employs a time series of GOES-East's 16 spectral bands, which aids in disentangling atmospheric CO2 from surface reflectance, alongside ECMWF ERA5 lower tropospheric meteorology, solar angles, and day of year. Training used collocated GOES-East and OCO-2/OCO-3 observations (2017-2020, within 5 km and 10 minutes), with validation and testing performed on 2021 data. The model successfully captures monthly latitudinal XCO2 gradients and shows reasonable agreement with ground-based TCCON measurements. Furthermore, we demonstrate the model's ability to detect elevated XCO2 signals from high-emitting power plants, particularly over low-reflectance surfaces. We also confirm that removing bands 5 (1.6 µm) and 16 (13.3 µm) substantially decreases performance, indicating that the model is able to extract useful information from these bands.
Although GOES-East derived XCO2 precision may not rival dedicated instruments, its unprecedented combination of contiguous geographic coverage, 10-minute temporal frequency, and multi-year record offers the potential to observe aspects of atmospheric CO2 variability currently unseen from space, with further potential through spatio-temporal aggregation.
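The single-pixel, fully-connected architecture described above can be sketched as follows; the layer sizes, the six-step band time series, and the eight meteorology/geometry features are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-pixel inputs: a 6-step time series of the 16 ABI
# bands (flattened), plus a handful of meteorology/geometry scalars.
n_bands, n_steps, n_met = 16, 6, 8
n_features = n_bands * n_steps + n_met

def init_mlp(sizes, rng):
    """He-initialized fully-connected layers as (weights, bias) pairs."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """ReLU hidden layers, linear scalar output (XCO2 in ppm)."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

params = init_mlp([n_features, 64, 32, 1], rng)
pixel = rng.standard_normal(n_features)
print(forward(params, pixel))  # one (untrained) XCO2 estimate per pixel
```

Training against collocated OCO-2/OCO-3 retrievals, as in the paper, would add a regression loss and an optimizer on top of this forward pass.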
    Inference-time scaling has been successful in enhancing large language model (LLM) performance by increasing computation at test time, but it often relies on external verifiers or is not optimized for manageable computational budgets. We propose DynScaling, which addresses these limitations through two primary innovations: an integrated parallel-sequential sampling strategy and a bandit-based dynamic budget allocation framework. The integrated sampling strategy unifies parallel and sequential sampling by constructing synthetic sequential reasoning chains from initially independent parallel responses, promoting diverse and coherent reasoning trajectories. The dynamic budget allocation framework formulates the allocation of computational resources as a multi-armed bandit problem, adaptively distributing the inference budget across queries based on the uncertainty of previously sampled responses, thereby maximizing computational efficiency. By synergizing these components, DynScaling effectively improves LLM performance under practical resource constraints without the need for external verifiers. Experimental results demonstrate that DynScaling consistently surpasses existing verifier-free inference scaling baselines in both task performance and computational cost.
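Bandit-based budget allocation of this kind can be sketched as below. This is an illustrative UCB-style allocator using response disagreement as the uncertainty proxy; the function names and the exact scoring rule are assumptions for illustration, not the paper's algorithm:

```python
import math
import random
from collections import Counter

def allocate_budget(queries, sample_fn, total_budget, min_samples=2):
    """Bandit-style allocation of a fixed inference budget across queries.

    Every query gets `min_samples` responses up front; each remaining
    sample goes to the query with the highest UCB score, where the
    'reward' proxy is disagreement among that query's responses.
    """
    responses = {q: [sample_fn(q) for _ in range(min_samples)] for q in queries}
    spent = min_samples * len(queries)
    t = spent

    def uncertainty(answers):
        top = Counter(answers).most_common(1)[0][1]
        return 1.0 - top / len(answers)  # 0.0 when all samples agree

    while spent < total_budget:
        t += 1
        # UCB: exploit uncertain queries, explore under-sampled ones.
        pick = max(queries, key=lambda q: uncertainty(responses[q])
                   + math.sqrt(2.0 * math.log(t) / len(responses[q])))
        responses[pick].append(sample_fn(pick))
        spent += 1

    # Majority vote over each query's responses as the final answer.
    return {q: Counter(a).most_common(1)[0][0] for q, a in responses.items()}
```

With a real system, `sample_fn` would invoke the LLM; queries whose responses already agree stop attracting budget, which is the efficiency effect the abstract describes.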
    The accelerating pace of innovation is fundamentally reshaping product development, creating a complex environment that demands rapid decision-making and efficient information management. To remain competitive, organizations must integrate Generative AI (GenAI) tools into their Product Lifecycle Management (PLM) processes. This integration is crucial because traditional PLM systems, often built on decades-old architectures, struggle to manage modern product complexity, vast data volumes, and interconnected supply chains. Limitations such as data silos, inflexible change management, and inadequate collaboration capabilities hinder the agility required today. GenAI offers transformative potential by automating complex tasks, enhancing data analysis, and facilitating more dynamic design and collaboration within the PLM ecosystem. This integration is not merely an upgrade but an essential evolution to overcome the inherent architectural and process constraints of legacy systems, which impede the speed and data fluidity necessary in the current market.
    This short paper describes a new circuit to measure surface codes, which allows them to be implemented on the heavy-square lattice. The circuits perform far worse than the usual surface code, but are more efficient in terms of the distance they can achieve for a given number of qubits and couplers. Paper Abstract: We present and benchmark an interesting subfamily of circuits within the LUCI framework, which we refer to as diamond circuits, that implement a surface code on a Lieb or "Heavy-Square" lattice. This makes them more qubit- and measurement-efficient than previous constructions. These circuits are built around a mid-cycle state that resembles a Bravyi-Bacon-Shor surface code on the data and measurement qubits. These circuits preserve the spacelike distance of the code, but suffer a penalty in timelike distance. This could be useful in regimes where quantum computers are limited by the number of control lines or frequency collisions.
    Continuous Integration (CI) is an essential software development practice that establishes processes to minimize bugs and errors in production. In a similar vein, experimentation of software products is vital for evaluating user satisfaction, quality, performance and other key business metrics. Experimentation allows product owners to evaluate the user impact of changes. This can help make informed decisions regarding feature launches. Experimentation also allows developers to tweak internal processes and algorithms to maximize the impact of new features and changes. Additionally, it can sometimes detect errors not detected by CI. Unlike CI systems, experimentation platforms are meant to closely imitate production and usually run the system under test (SUT) against a large volume of input. Despite this, experimentation platforms have a lot in common with CI systems. The mechanisms for continuously integrating and testing changes can be modified and applied to experimentation platforms. Google Search's experimentation platform started as a command line tool many years ago. Over time, this tool has evolved into a platform that serves the evaluation needs of many of Google's products like Search, Assistant, YouTube, Play, Lens, etc., running thousands of large experiments every day. In this workshop, we will present the evolution of Google Search's experimentation platform and how it was transformed from a simple CLI tool into a platform that works at scale, fulfills continuous experimentation needs and provides many CI-like functionalities to its users.
    This perspective outlines promising pathways and critical obstacles on the road to developing useful quantum computing applications, drawing on insights from the Google Quantum AI team. We propose a five-stage framework for this process, spanning from theoretical explorations of quantum advantage to the practicalities of compilation and resource estimation. For each stage, we discuss key trends, milestones, and inherent scientific and sociological impediments. We argue that two central stages -- identifying concrete problem instances expected to exhibit quantum advantage, and connecting such problems to real-world use cases -- represent essential and currently under-resourced challenges. Throughout, we touch upon related topics, including the promise of generative artificial intelligence for aspects of this research, criteria for compelling demonstrations of quantum advantage, and the future of compilation as we enter the era of early fault-tolerant quantum computing.
    PageFlex: Flexible and Efficient User-space Delegation of Linux Paging Policies with eBPF
    Kan Wu
    Zhiyuan Guo
    Suli Yang
    Rajath Shashidhara
    Wei Xu
    Alex Snoeren
    Kim Keeton
    2025
    To increase platform memory efficiency, hyperscalers like Google and Meta transparently demote "cold" application data to cheaper cost-per-byte memory tiers like compressed memory and NVMe SSDs. These systems rely on standard kernel paging policies and mechanisms to maximize the achievable memory savings without hurting application performance. Although the literature promises better policies, implementing and deploying them within the Linux kernel is challenging. Delegating policies and mechanisms to user space, through userfaultfd or library-based approaches, incurs overheads and may require modifying application code. We present PageFlex, a framework for delegating Linux paging policies to user space with minimal overhead and full compatibility with existing real-world deployments. PageFlex uses eBPF to delegate policy decisions while providing low-overhead access to in-kernel memory state and access information, thus balancing flexibility and performance. Additionally, PageFlex supports different paging strategies for distinct memory regions and application phases. We show that PageFlex can delegate existing kernel-based policies with little (< 1%) application slowdown, effectively realizing the benefits of state-of-the-art policies like Hyperbolic caching and Leap prefetching, and unlocking application-specific benefits through region- and phase-aware policy specialization.
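Hyperbolic caching, one of the policies named above, scores each cached item by its hit count divided by its time in cache and evicts the lowest-scoring item from a small random sample. A toy user-space version of the policy itself (PageFlex's eBPF delegation machinery is not shown) might look like this; the class and its logical clock are illustrative assumptions:

```python
import random

class HyperbolicCache:
    """Sketch of hyperbolic caching: evict the item with the lowest
    hits / time-in-cache ratio, chosen from a small random sample
    rather than a full scan. Uses a logical clock for determinism;
    a real implementation would use wall-clock time.
    """
    def __init__(self, capacity, sample_size=5, seed=0):
        self.capacity = capacity
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        self.clock = 0
        self.store = {}   # key -> value
        self.hits = {}    # key -> access count
        self.birth = {}   # key -> insertion time

    def _priority(self, key):
        age = max(self.clock - self.birth[key], 1)
        return self.hits[key] / age

    def get(self, key):
        self.clock += 1
        if key in self.store:
            self.hits[key] += 1
            return self.store[key]
        return None

    def put(self, key, value):
        self.clock += 1
        if key not in self.store and len(self.store) >= self.capacity:
            # Evict the lowest-priority key among a random sample.
            sample = self.rng.sample(list(self.store),
                                     min(self.sample_size, len(self.store)))
            victim = min(sample, key=self._priority)
            for d in (self.store, self.hits, self.birth):
                del d[victim]
        self.store[key] = value
        self.hits.setdefault(key, 1)
        self.birth.setdefault(key, self.clock)
```

Frequently accessed ("hot") items accumulate hits faster than their age grows, so cold items are evicted first, which is the behavior the policy is designed to approximate over page-access data.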
    PolarQuant: Quantizing KV Caches with Polar Transformation
    Insu Han
    Amin Karbasi
    Praneeth Kacham
    Amir Zandieh
    2025
    Large language models (LLMs) require significant memory to store Key-Value (KV) embeddings in their KV cache, especially when handling long-range contexts. Quantization of these KV embeddings is a common technique to reduce memory consumption. This work introduces PolarQuant, a novel quantization method employing random preconditioning and polar transformation. Our method first preconditions the embedding vectors using a random projection matrix. Then, we transform these vectors into polar coordinates and quantize the resulting polar representation. Our key insight is that, after random preconditioning, the angles in the polar representation exhibit a tightly bounded and concentrated distribution with an analytically computable form. This eliminates the need for explicit normalization, a computationally expensive step required by traditional quantization methods. Normalization introduces significant memory overhead because quantization parameters (e.g., zero point and scale) must be stored in full precision for each data block. This can add 1 to 2 bits per quantized value, depending on the block size. PolarQuant bypasses this normalization step, enabling substantial memory savings. Empirical evaluation demonstrates that PolarQuant achieves lower memory overheads than existing normalization-based KV quantization techniques. Moreover, it improves performance across various generation tasks, particularly those involving long-context understanding. View details
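A toy version of the polar-coordinate idea, random orthogonal preconditioning followed by quantizing pair-wise angles on a fixed grid so that no per-block scale or zero point is stored, might look like the following. This is a simplified illustration under assumed details (pairing of consecutive coordinates, 4-bit uniform angle codes), not the paper's scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(d, rng):
    """Random orthogonal preconditioner via QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def to_polar_pairs(x):
    """View consecutive coordinate pairs of x as (radius, angle)."""
    pairs = x.reshape(-1, 2)
    return np.hypot(pairs[:, 0], pairs[:, 1]), np.arctan2(pairs[:, 1], pairs[:, 0])

def quantize_angles(theta, bits=4):
    """Uniform codes over the fixed range [-pi, pi]; because the range
    is known a priori, no per-block scale/zero point is stored."""
    levels = 2 ** bits
    codes = np.round((theta + np.pi) / (2 * np.pi) * (levels - 1)).astype(np.uint8)
    dequantized = codes / (levels - 1) * 2 * np.pi - np.pi
    return codes, dequantized

d = 64
Q = random_orthogonal(d, rng)
key = rng.standard_normal(d)            # a stand-in KV-cache key vector
radius, theta = to_polar_pairs(Q @ key)
codes, theta_hat = quantize_angles(theta)
# Reconstruct the preconditioned vector from radii and quantized angles.
recon = np.stack([radius * np.cos(theta_hat),
                  radius * np.sin(theta_hat)], axis=1).reshape(-1)
```

The fixed quantization range is the point: value-space quantizers must store a scale and zero point per block, whereas angles always live in [-pi, pi].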
    The proliferation of IoT in cities, combined with Digital Twins, creates a rich data foundation for Smart Cities aimed at improving urban life and operations. Generative AI (GenAI) significantly enhances this potential, moving beyond traditional AI analytics by processing multimodal content and generating novel outputs like text and simulations. Using specialized or foundational models, GenAI's natural language abilities such as Natural Language Understanding (NLU) and Generation (NLG) can power tailored applications and unified interfaces, dramatically lowering barriers for users interacting with complex smart city systems. In this paper, we focus on GenAI applications based on conversational interfaces within the context of three critical user archetypes in a Smart City - Citizens, Operators and Planners. We identify and review GenAI models and techniques that have been proposed or deployed for various urban subsystems in the contexts of these user archetypes. We also consider how GenAI can be built on the existing data foundation of official city records, IoT data streams and Urban Digital Twins. We believe this work represents the first comprehensive summarization of GenAI techniques for Smart Cities from the lens of the critical users in a Smart City.
    We give a new privacy amplification analysis for truncated Poisson sampling, a Poisson sampling variant that truncates a batch if it exceeds a given maximum batch size.
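The sampling scheme itself is simple to state; a minimal sketch follows (the privacy amplification analysis is the paper's contribution and is not reproduced here, and the choice of keeping a uniformly random subset on truncation is an assumed detail):

```python
import numpy as np

def truncated_poisson_sample(n, q, max_batch, rng):
    """Poisson-sample each of n records independently with probability q;
    if the resulting batch exceeds max_batch, keep a uniformly random
    subset of max_batch of the sampled indices."""
    sampled = np.flatnonzero(rng.random(n) < q)
    if sampled.size > max_batch:
        sampled = rng.choice(sampled, size=max_batch, replace=False)
    return sampled

rng = np.random.default_rng(0)
batch = truncated_poisson_sample(n=1000, q=0.01, max_batch=100, rng=rng)
print(batch.size)  # at most 100; around n*q = 10 in expectation
```

The cap makes batch sizes, and hence memory use, predictable, which is what distinguishes this variant from plain Poisson sampling in DP training pipelines.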