Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

    Reasoning-SQL: Reinforcement Learning with Partial Rewards for Reasoning-Enhanced Text-to-SQL
    Mohammadreza Pourreza
    Shayan Talaei
    Hailong Li
    Azalia Mirhoseini
    Amin Saberi
    Conference on Language Modeling (COLM) (2025) (to appear)
    Text-to-SQL is a challenging task involving multiple reasoning-intensive subtasks, including natural language understanding, database schema comprehension, and precise SQL query formulation. Existing approaches often rely on handcrafted reasoning paths with inductive biases that can limit their overall effectiveness. Motivated by the recent success of reasoning-enhanced models such as DeepSeek R1 and OpenAI o1, which effectively leverage reward-driven self-exploration to enhance reasoning capabilities and generalization, we propose a novel set of partial rewards tailored specifically for the Text-to-SQL task. Our reward set includes schema-linking, AI feedback, n-gram similarity, and syntax check, explicitly designed to address the reward sparsity issue prevalent in reinforcement learning (RL). Leveraging group relative policy optimization (GRPO), our approach explicitly encourages large language models (LLMs) to develop intrinsic reasoning skills necessary for accurate SQL query generation. With models of different sizes, we demonstrate that RL-only training with our proposed rewards consistently achieves higher accuracy and superior generalization compared to supervised fine-tuning (SFT). Remarkably, our RL-trained 14B-parameter model significantly outperforms larger proprietary models, e.g., o3-mini by 4% and Gemini-1.5-Pro-002 by 3%, on the BIRD benchmark. These results highlight the efficacy of our proposed RL-training framework with partial rewards for enhancing both accuracy and reasoning capabilities in Text-to-SQL tasks.
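    To make the idea of partial rewards combined with group-relative scoring concrete, here is a minimal sketch. It is not the paper's implementation: the component names, the equal default weights, and the use of the population standard deviation are all assumptions made for illustration. It only shows the commonly described GRPO step of normalizing each sampled candidate's reward against the other candidates in its group.

```python
from statistics import mean, pstdev


def combine_partial_rewards(components, weights=None):
    """Weighted sum of partial reward components.

    `components` maps a hypothetical component name (e.g. "syntax_check")
    to a score. Equal weights are an assumption, not the paper's choice.
    """
    if weights is None:
        weights = {name: 1.0 for name in components}
    return sum(weights[name] * value for name, value in components.items())


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled candidates.

    Each advantage is (reward - group mean) / (group std + eps).
    Population std is assumed here; eps avoids division by zero
    when every candidate receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two hypothetical SQL candidates sampled for the same question.
candidates = [
    {"schema_linking": 0.5, "syntax_check": 1.0, "ngram_similarity": 0.5},
    {"schema_linking": 1.0, "syntax_check": 1.0, "ngram_similarity": 1.0},
]
rewards = [combine_partial_rewards(c) for c in candidates]
advantages = group_relative_advantages(rewards)
```

    Because the components give graded credit, a candidate that is syntactically valid but not fully correct still receives a non-zero reward, which is how partial rewards are meant to ease reward sparsity.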
    Fast ACS: Low-Latency File-Based Ordered Message Delivery at Scale
    Anil Raghunath Iyer
    Neel Bagora
    Chang Yu
    Olivier Pomerleau
    Vivek Kumar
    Prunthaban Kanthakumar
    USENIX Annual Technical Conference (2025)
    Low-latency message delivery is crucial for real-time systems. Data originating from a producer must be delivered to consumers, potentially distributed in clusters across metropolitan and continental boundaries. With the growing scale of computing, there can be several thousand consumers of the data. Such systems require a robust messaging system capable of transmitting messages containing data across clusters and efficiently delivering them to consumers. The system must offer guarantees like ordering and at-least-once delivery while avoiding overload on consumers, allowing them to consume messages at their own pace. This paper presents the design of Fast ACS (an abbreviation for Ads Copy Service), a file-based ordered message delivery system that leverages a combination of two-sided (inter-cluster) and one-sided (intra-cluster) communication primitives (namely, Remote Procedure Call and Remote Direct Memory Access, respectively) to deliver messages. The system has been successfully deployed to dozens of production clusters and scales to accommodate several thousand consumers within each cluster, which amounts to Tbps-scale intra-cluster consumer traffic at peak. Notably, Fast ACS delivers messages to consumers across the globe within a few seconds or even sub-seconds (p99) based on the message volume and consumer scale, at a low resource cost.
    Generative AI is revolutionizing content creation and holds promise for real-time, personalized educational experiences. We investigated the effectiveness of converting textbook chapters into AI-generated podcasts and explored the impact of personalizing these podcasts for individual learner profiles. We conducted a 3x3 user study with 180 college students in the United States, comparing traditional textbook reading with both generalized and personalized AI-generated podcasts across three textbook subjects. The personalized podcasts were tailored to students’ majors, interests, and learning styles. Our findings show that students found the AI-generated podcast format to be more enjoyable than textbooks and that personalized podcasts led to significantly improved learning outcomes, although this was subject-specific. These results highlight that AI-generated podcasts can offer an engaging and effective modality transformation of textbook material, with personalization enhancing content relevance. We conclude with design recommendations for leveraging AI in education, informed by student feedback.
    This IEEE Spectrum article reflects on advocacy for U.S. technological leadership during my Congressional visit through IEEE-USA. Leading an expert group of other distinguished IEEE members, we urged lawmakers to support critical initiatives. Key priorities included sustained funding for federal research institutions like NIST, NASA, and the NSF, reauthorizing the SBIR/STTR programs vital for small business innovation, and passing the CREATE AI Act to democratize AI resources by establishing the National AI Research Resource (NAIRR). We also emphasized strengthening the STEM talent pipeline through the CHIPS and Science Act and expanding high-skilled immigrant visas. We highlighted rapid AI advancements, such as autonomous vehicles and the surge in FDA-approved AI-based medical devices, as underscoring the need for these strategic investments and policy actions. The article conveys a sense of urgency, calling for concrete congressional action to ensure the U.S. maintains its technological edge while also sharing my personal experiences.
    The rapid emergence of generative AI models and AI-powered systems has surfaced a variety of concerns around responsibility, safety, and inclusion. Some of these concerns address specific vulnerable communities, including people with disabilities. At the same time, these systems may introduce harms upon disabled users that do not fit neatly into existing accessibility classifications, and may not be addressed by current accessibility practices. In this paper, we investigate how stakeholders across a variety of job types are encountering and addressing potentially negative impacts of AI on users with disabilities. Through interviews with 25 practitioners, we identify emerging challenges related to AI’s impact on disabled users, systemic obstacles that contribute to problems, and effective strategies for effecting change. Based on these findings, we offer suggestions for improving existing processes for creating AI-powered systems and supporting practitioners in developing skills to address these emerging challenges.
    In the differentially private partition selection problem (a.k.a. private set union, private key discovery), users hold subsets of items from an unbounded universe. The goal is to output as many items as possible from the union of the users' sets while maintaining user-level differential privacy. Solutions to this problem are a core building block for many privacy-preserving ML applications including vocabulary extraction in a private corpus, computing statistics over categorical data, and learning embeddings over user-provided items. We propose an algorithm for this problem, MaxAdaptiveDegree (MAD), which adaptively reroutes weight from items with weight far above the threshold needed for privacy to items with smaller weight, thereby increasing the probability that less frequent items are output. Our algorithm can be efficiently implemented in massively parallel computation systems allowing scalability to very large datasets. We prove that our algorithm stochastically dominates the standard parallel algorithm for this problem. We also develop a two-round version of our algorithm, MAD2R, where results of the computation in the first round are used to bias the weighting in the second round to maximize the number of items output. In experiments, our algorithms provide the best results across the board among parallel algorithms and scale to datasets with hundreds of billions of items, up to three orders of magnitude larger than those analyzed by prior sequential algorithms.
    Summary: "Silent Data Corruption by 10x Test Escapes Threatens Reliable Computing" highlights a critical issue: manufacturing defects, dubbed "test escapes," are evading current testing methods at an alarming rate, ten times higher than industry targets. These defects lead to Silent Data Corruption (SDC), where applications produce incorrect outputs without error indications, costing companies significantly in debugging, data recovery, and service disruptions. The paper proposes a three-pronged approach: quick diagnosis of defective chips directly from system-level behaviors, in-field detection using advanced testing and error detection techniques like CASP, and new, rigorous test experiments to validate these solutions and improve manufacturing testing practices.
    GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities
    Diganta Misra
    Nizar Islah
    Brice Rauby
    Zihan Wang
    Justine Gehring
    Antonio Orvieto
    Muawiz Chaudhary
    Eilif Muller
    Irina Rish
    Samira Ebrahimi Kahou
    Massimo Caccia
    2025
    The rapid evolution of software libraries poses a considerable hurdle for code generation, necessitating continuous adaptation to frequent version updates while preserving backward compatibility. While existing code evolution benchmarks provide valuable insights, they typically lack execution-based evaluation for generating code compliant with specific library versions. To address this, we introduce GitChameleon 2.0, a novel, meticulously curated dataset comprising 328 Python code completion problems, each conditioned on specific library versions and accompanied by executable unit tests. GitChameleon 2.0 rigorously evaluates the capacity of contemporary large language models (LLMs), LLM-powered agents, code assistants, and RAG systems to perform version-conditioned code generation that demonstrates functional accuracy through execution. Our extensive evaluations indicate that state-of-the-art systems encounter significant challenges with this task, with enterprise models achieving baseline success rates in the 48-51% range, underscoring the intricacy of the problem. By offering an execution-based benchmark emphasizing the dynamic nature of code libraries, GitChameleon 2.0 enables a clearer understanding of this challenge and helps guide the development of more adaptable and dependable AI code generation methods.
    As large language models (LLMs) improve in their capacity to serve as personal AI assistants, their ability to output uniquely tailored, personalized responses that align with the soft preferences of their users is imperative for maximizing user satisfaction and retention. However, lay users are notoriously bad at prompt specification and often struggle with conveying their latent preferences to AI assistants. To resolve this, we demonstrate that activation steering, an inference-time method, can effectively steer LLM responses toward expressing different preferences. In contrast to memory-based personalization methods that require long user history, steering is extremely lightweight and easily controllable via an interpretable linear strength factor. We further conduct a within-subjects user study (n=14) to investigate how end users personalize their conversations through three different steerable chatbot interfaces. The results demonstrate the effectiveness of preference-based steering for aligning real-world conversations with user preferences, and we discuss qualitative findings on how diverse values around control, transparency, and usability of personalization lead users to prefer different interfaces.
    We investigate Learning from Label Proportions (LLP), a partial information setting where examples in a training set are grouped into bags, and only aggregate label values in each bag are available. Despite the partial observability, the goal is still to achieve small regret at the level of individual examples. We give results on the sample complexity of LLP under square loss, showing that our sample complexity is essentially optimal. From an algorithmic viewpoint, we rely on carefully designed variants of Empirical Risk Minimization and Stochastic Gradient Descent algorithms, combined with ad hoc variance reduction techniques. On one hand, our theoretical results improve in important ways on the existing literature on LLP, specifically in the way the sample complexity depends on the bag size. On the other hand, we validate our algorithmic solutions on several datasets, demonstrating improved empirical performance (better accuracy with fewer samples) against recent baselines.
    Heterogeneous graph neural networks for species distribution modeling
    Christine Kaeser-Chen
    Keith Anderson
    Michelangelo Conserva
    Elise Kleeman
    Maxim Neumann
    Matt Overlan
    Millie Chapman
    Drew Purves
    arXiv (2025)
    Species distribution models (SDMs) are necessary for measuring and predicting occurrences and habitat suitability of species and their relationship with environmental factors. We introduce a novel presence-only SDM with graph neural networks (GNNs). In our model, species and locations are treated as two distinct node sets, and the learning task is predicting detection records as the edges that connect locations to species. Using GNNs for SDM allows us to model fine-grained interactions between species and the environment. We evaluate the potential of this methodology on the six-region dataset compiled by the National Center for Ecological Analysis and Synthesis (NCEAS) for benchmarking SDMs. For each of the regions, the heterogeneous GNN model is comparable to or outperforms previously benchmarked single-species SDMs as well as a feed-forward neural network baseline model.
    The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
    Pratik Fegade
    Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (2025), pp. 5-17
    A chief enabler of large-scale deep learning is the distribution of computation across multiple interconnected hardware accelerators. In order to unlock the maximum possible performance, a compiler must first select a reasonable strategy to parallelize a model's operations. Since neural network architectures admit multiple flavors of parallelism, determining the proper strategy for each instruction is a critical (albeit non-trivial) task. To solicit new ideas toward solving this challenging combinatorial optimization problem, we organized the ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning, a multi-month competition focused on advancing the state-of-the-art for model partitioning algorithms. In this paper, we offer a retrospective of this event, including the basic problem formulation, key challenges & opportunities, our new benchmark suite, and the quality of submissions received.
    On the Design of the Binaural Rendering Library for Eclipsa Audio Immersive Audio Container
    Tomasz Rudzki
    Gavin Kearney
    AES 158th Convention of the Audio Engineering Society (2025)
    Immersive Audio Media and Formats (IAMF), also known as Eclipsa Audio, is an open-source audio container developed to accommodate multichannel and scene-based audio formats. Headphone-based delivery of IAMF audio requires efficient binaural rendering. This paper introduces the Open Binaural Renderer (OBR), which is designed to render IAMF audio. It discusses the core rendering algorithm, the binaural filter design process, and the real-time implementation of the renderer in the form of an open-source C++ rendering library. Designed for multi-platform compatibility, the renderer incorporates a novel approach to binaural audio processing, leveraging a combination of a spherical harmonic (SH) based virtual listening room model and anechoic binaural filters. Through its design, the IAMF binaural renderer provides a robust solution for delivering high-quality immersive audio across diverse platforms and applications.
    Visualizing Dynamics of Charges and Strings in (2+1)D Lattice Gauge Theories
    Tyler Cochran
    Bernhard Jobst
    Yuri Lensky
    Gaurav Gyawali
    Norhan Eassa
    Melissa Will
    Aaron Szasz
    Dmitry Abanin
    Rajeev Acharya
    Laleh Beni
    Trond Andersen
    Markus Ansmann
    Frank Arute
    Kunal Arya
    Abe Asfaw
    Juan Atalaya
    Brian Ballard
    Alexandre Bourassa
    Michael Broughton
    David Browne
    Brett Buchea
    Bob Buckley
    Tim Burger
    Nicholas Bushnell
    Anthony Cabrera
    Juan Campero
    Hung-Shen Chang
    Jimmy Chen
    Benjamin Chiaro
    Jahan Claes
    Agnetta Cleland
    Josh Cogan
    Roberto Collins
    Paul Conner
    William Courtney
    Alex Crook
    Ben Curtin
    Sayan Das
    Laura De Lorenzo
    Agustin Di Paolo
    Paul Donohoe
    Ilya Drozdov
    Andrew Dunsworth
    Alec Eickbusch
    Aviv Elbag
    Mahmoud Elzouka
    Vinicius Ferreira
    Ebrahim Forati
    Austin Fowler
    Brooks Foxen
    Suhas Ganjam
    Robert Gasca
    Élie Genois
    William Giang
    Dar Gilboa
    Raja Gosula
    Alejo Grajales Dau
    Dietrich Graumann
    Alex Greene
    Steve Habegger
    Monica Hansen
    Sean Harrington
    Paula Heu
    Oscar Higgott
    Jeremy Hilton
    Robert Huang
    Ashley Huff
    Bill Huggins
    Cody Jones
    Chaitali Joshi
    Pavol Juhas
    Hui Kang
    Amir Karamlou
    Kostyantyn Kechedzhi
    Trupti Khaire
    Bryce Kobrin
    Alexander Korotkov
    Fedor Kostritsa
    John Mark Kreikebaum
    Vlad Kurilovich
    Dave Landhuis
    Tiano Lange-Dei
    Brandon Langley
    Kim Ming Lau
    Justin Ledford
    Kenny Lee
    Loick Le Guevel
    Wing Li
    Alexander Lill
    Will Livingston
    Daniel Lundahl
    Aaron Lunt
    Sid Madhuk
    Ashley Maloney
    Salvatore Mandra
    Leigh Martin
    Orion Martin
    Cameron Maxfield
    Seneca Meeks
    Anthony Megrant
    Reza Molavi
    Sebastian Molina
    Shirin Montazeri
    Ramis Movassagh
    Charles Neill
    Michael Newman
    Murray Ich Nguyen
    Chia Ni
    Kris Ottosson
    Alex Pizzuto
    Rebecca Potter
    Orion Pritchard
    Ganesh Ramachandran
    Matt Reagor
    David Rhodes
    Gabrielle Roberts
    Kannan Sankaragomathi
    Henry Schurkus
    Mike Shearn
    Aaron Shorter
    Noah Shutty
    Vladimir Shvarts
    Vlad Sivak
    Spencer Small
    Clarke Smith
    Sofia Springer
    George Sterling
    Jordan Suchard
    Alex Sztein
    Doug Thor
    Mert Torunbalci
    Abeer Vaishnav
    Justin Vargas
    Sergey Vdovichev
    Guifre Vidal
    Steven Waltman
    Shannon Wang
    Brayden Ware
    Kristi Wong
    Cheng Xing
    Jamie Yao
    Ping Yeh
    Bicheng Ying
    Juhwan Yoo
    Grayson Young
    Yaxing Zhang
    Ningfeng Zhu
    Yu Chen
    Vadim Smelyanskiy
    Adam Gammon-Smith
    Frank Pollmann
    Michael Knap
    Nature, 642 (2025), 315–320
    Lattice gauge theories (LGTs) can be used to understand a wide range of phenomena, from elementary particle scattering in high-energy physics to effective descriptions of many-body interactions in materials. Studying dynamical properties of emergent phases can be challenging, as it requires solving many-body problems that are generally beyond perturbative limits. Here we investigate the dynamics of local excitations in an LGT using a two-dimensional lattice of superconducting qubits. We first construct a simple variational circuit that prepares low-energy states that have a large overlap with the ground state; then we create charge excitations with local gates and simulate their quantum dynamics by means of a discretized time evolution. As the electric field coupling constant is increased, our measurements show signatures of transitioning from deconfined to confined dynamics. For confined excitations, the electric field induces a tension in the string connecting them. Our method allows us to experimentally image string dynamics in a (2+1)D LGT, from which we uncover two distinct regimes inside the confining phase: for weak confinement, the string fluctuates strongly in the transverse direction, whereas for strong confinement, transverse fluctuations are effectively frozen. We also demonstrate a resonance condition at which dynamical string breaking is facilitated. Our LGT implementation on a quantum processor presents a new set of techniques for investigating emergent excitations and string dynamics.
    Linear Elastic Caching via Ski Rental
    Todd Lipcon
    Conference on Innovative Data Systems Research (CIDR) (2025)
    In this work we study the Linear Elastic Caching problem, where the goal is to minimize the total cost of a cache inclusive of not just its misses, but also its memory footprint integrated over time. We demonstrate a theoretical connection to the classic ski rental problem and propose a practical algorithm that combines online caching algorithms with ski rental policies. We also introduce a lightweight machine learning-based algorithm for ski rental that is optimized for production workloads and is easy to integrate within existing database systems. Evaluations on both production workloads in Google Spanner and publicly available traces show that the proposed elastic caching approach can significantly reduce the total cache cost compared to traditional fixed-size cache policies.
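    For readers unfamiliar with the classic ski rental problem mentioned above, the following is a minimal sketch of the textbook break-even policy: keep renting until the rent paid so far would reach the purchase price, then buy. It illustrates only the classic problem; the function names are hypothetical, and how the paper maps this policy onto cache residency and misses is not shown here.

```python
def break_even_ski_rental(rent_cost, buy_cost, days_skied):
    """Total cost paid by the break-even policy.

    The policy does not know `days_skied` in advance. Each day it rents,
    unless paying one more rent would bring cumulative rent up to the
    purchase price, in which case it buys instead and stops paying.
    """
    paid_rent = 0.0
    for _ in range(days_skied):
        if paid_rent + rent_cost >= buy_cost:
            return paid_rent + buy_cost  # buy today
        paid_rent += rent_cost  # rent today
    return paid_rent  # season ended before buying


def offline_optimum(rent_cost, buy_cost, days_skied):
    """Cost of the best decision when the season length is known upfront."""
    return min(rent_cost * days_skied, buy_cost)


short_season = break_even_ski_rental(1, 10, 5)
long_season = break_even_ski_rental(1, 10, 100)
```

    With a rent of 1 and a purchase price of 10, the policy rents for nine days and buys on the tenth, so its cost never exceeds twice the offline optimum.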