Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 10129 publications
    Practical Performance Guarantees for Pipelined DNN Inference
    Kuikui Liu
    Proceedings of the 41st International Conference on Machine Learning (2024), pp. 1655-1671
    Preview abstract This work optimizes pipeline parallelism of machine learning model inference by partitioning computation graphs into $k$ stages and minimizing the running time of the bottleneck stage. We design practical algorithms for this NP-complete problem and prove they are nearly optimal in practice by comparing against lower bounds obtained from solving novel mixed-integer programming (MIP) formulations. We apply these algorithms and lower-bound techniques to production models to achieve substantial improvements in the approximation guarantees, compared to simple combinatorial lower bounds. For example, our new MIP formulations improve the lower bounds enough to drop the geometric mean approximation ratio from $2.175$ to $1.082$ across production data with $k=16$ pipeline stages. This work shows that while bottleneck partitioning is theoretically hard, in practice we have a handle on the algorithmic side of the problem and much of the remaining challenge is in developing more accurate cost models to give to the partitioning algorithms. View details
    AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation
    Yuanwen Yue
    Sabarinath Mahadevan
    Jonas Schult
    Francis Engelmann
    Bastian Leibe
    Konrad Schindler
    Theodora Kontogianni
    ICLR (2024)
    Preview abstract During interactive segmentation, a model and a user work together to delineate objects of interest in a 3D point cloud. In an iterative process, the model assigns each data point to an object (or the background), while the user corrects errors in the resulting segmentation and feeds them back into the model. The current best practice formulates the problem as binary classification and segments objects one at a time. The model expects the user to provide positive clicks to indicate regions wrongly assigned to the background and negative clicks on regions wrongly assigned to the object. Sequentially visiting objects is wasteful since it disregards synergies between objects: a positive click for a given object can, by definition, serve as a negative click for nearby objects. Moreover, a direct competition between adjacent objects can speed up the identification of their common boundary. We introduce AGILE3D, an efficient, attention-based model that (1) supports simultaneous segmentation of multiple 3D objects, (2) yields more accurate segmentation masks with fewer user clicks, and (3) offers faster inference. Our core idea is to encode user clicks as spatial-temporal queries and enable explicit interactions between click queries as well as between them and the 3D scene through a click attention module. Every time new clicks are added, we only need to run a lightweight decoder that produces updated segmentation masks. In experiments with four different 3D point cloud datasets, AGILE3D sets a new state-of-the-art. Moreover, we also verify its practicality in real-world setups with real user studies. Project page: https://ywyue.github.io/AGILE3D. View details
    FP-Fed: Privacy-Preserving Federated Detection of Browser Fingerprinting
    Meenatchi Sundaram Muthu Selva Annamalai
    Emiliano De Cristofaro
    Network and Distributed System Security (NDSS) Symposium (2024)
    Preview abstract Browser fingerprinting often provides an attractive alternative to third-party cookies for tracking users across the web. In fact, the increasing restrictions on third-party cookies placed by common web browsers and recent regulations like the GDPR may accelerate the transition. To counter browser fingerprinting, previous work proposed several techniques to detect its prevalence and severity. However, these rely on 1) centralized web crawls and/or 2) computationally intensive operations to extract and process signals (e.g., information-flow and static analysis). To address these limitations, we present FP-Fed, the first distributed system for browser fingerprinting detection. Using FP-Fed, users can collaboratively train on-device models based on their real browsing patterns, without sharing their training data with a central entity, by relying on Differentially Private Federated Learning (DP-FL). To demonstrate its feasibility and effectiveness, we evaluate FP-Fed’s performance on a set of 18.3k popular websites with different privacy levels, numbers of participants, and features extracted from the scripts. Our experiments show that FP-Fed achieves reasonably high detection performance and can perform both training and inference efficiently, on-device, by only relying on runtime signals extracted from the execution trace, without requiring any resource-intensive operation. View details
    Towards Conversational Diagnostic AI
    Anil Palepu
    Khaled Saab
    Jan Freyberg
    Ryutaro Tanno
    Amy Wang
    Brenna Li
    Nenad Tomašev
    Karan Singhal
    Le Hou
    Albert Webson
    Kavita Kulkarni
    Sara Mahdavi
    Juro Gottweis
    Joelle Barral
    Kat Chou
    Arxiv (2024) (to appear)
    Preview abstract At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue. AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI. View details
    Preview abstract Cardiovascular diseases (CVDs) are responsible for a large proportion of premature deaths in low- and middle-income countries. Early CVD detection and intervention is critical in these populations, yet many existing CVD risk scores require a physical examination or lab measurements, which can be challenging in such health systems due to limited accessibility. We investigated the potential to use photoplethysmography (PPG), a sensing technology available on most smartphones that can potentially enable large-scale screening at low cost, for CVD risk prediction. We developed a deep learning PPG-based CVD risk score (DLS) to predict the probability of having major adverse cardiovascular events (MACE: non-fatal myocardial infarction, stroke, and cardiovascular death) within ten years, given only age, sex, smoking status and PPG as predictors. We compare the DLS with the office-based refit-WHO score, which adopts the shared predictors from WHO and Globorisk scores (age, sex, smoking status, height, weight and systolic blood pressure) but refitted on the UK Biobank (UKB) cohort. All models were trained on a development dataset (141,509 participants) and evaluated on a geographically separate test (54,856 participants) dataset, both from UKB. DLS’s C-statistic (71.1%, 95% CI 69.9–72.4) is non-inferior to office-based refit-WHO score (70.9%, 95% CI 69.7–72.2; non-inferiority margin of 2.5%, p<0.01) in the test dataset. The calibration of the DLS is satisfactory, with a 1.8% mean absolute calibration error. Adding DLS features to the office-based score increases the C-statistic by 1.0% (95% CI 0.6–1.4). DLS predicts ten-year MACE risk comparable with the office-based refit-WHO score. Interpretability analyses suggest that the DLS-extracted features are related to PPG waveform morphology and are independent of heart rate. Our study provides a proof-of-concept and suggests the potential of a PPG-based approach strategies for community-based primary prevention in resource-limited regions. View details
    API Governance at Scale
    Mak Ahmad
    JJ Geewax
    David R Karger
    Kwan-Liu Ma
    ICSE 2024 Software Engineering in Practice (2024)
    Preview abstract API Governance, the process of applying standardized sets of policies and guardrails to the design and development of APIs, has only grown in importance and prominence given the continued growth in APIs being produced. In this paper, we present an Action Research style approach to investigate and understand the utility of a multi-faceted API Governance process being adopted inside Google. We first reflect on past research around API Governance, and then introduce three new components, 1. API Improvement Proposals (AIPs) the documented source of truth for API design rules, 2. API Linter, an automated analysis tool which checks for adherence to / violations of AIPs, and 3. API Readability, a program to educate and certify API design experts. These three components are designed to build upon pre-existing processes to scale and improve API design. Through a mixed-methods research strategy, containing both a survey and a series of interviews, we evaluate the utility of these approaches in supporting API Producers. Our research shows that API Producers have positive sentiment towards API Governance, validating the general direction of the program. Specifically, our study participants highlighted the positive impact of API Governance on the quality of the APIs they produced, via consistency in both the outcome and approach. This paper also discusses future research opportunities to enhance API Governance, specifically with regards to newer API Producers, who reported worse sentiment towards the program than their more experienced peers. View details
    Preview abstract Effective model calibration is a critical and indispensable component in developing Media Mix Models (MMMs). One advantage of Bayesian-based MMMs lies in their capacity to accommodate the information from experiment results and the modelers' domain knowledge about the ad effectiveness by setting priors for the model parameters. However, it remains ambiguous about how and which Bayesian priors should be tuned for calibration purpose. In this paper, we propose a new calibration method through model reparameterization. The reparameterized model includes Return on Ads Spend (ROAS) as a model parameter, enabling straightforward adjustment of its prior distribution to align with either experiment results or the modeler's prior knowledge. The proposed method also helps address several key challenges regarding combining MMMs and incrementality experiments. We use simulations to demonstrate that our approach can significantly reduce the bias and uncertainty in the resultant posterior ROAS estimates. View details
    Preview abstract We present an approach to modeling an image-space prior on scene motion. Our prior is learned from a collection of motion trajectories extracted from real video sequences depicting natural, oscillatory dynamics such as trees, flowers, candles, and clothes swaying in the wind. We model this dense, long-term motion prior in the Fourier domain:given a single image, our trained model uses a frequency-coordinated diffusion sampling process to predict a spectral volume, which can be converted into a motion texture that spans an entire video. Along with an image-based rendering module, these trajectories can be used for a number of downstream applications, such as turning still images into seamlessly looping videos, or allowing users to realistically interact with objects in real pictures by interpreting the spectral volumes as image-space modal bases, which approximate object dynamics. View details
    Solidarity not Charity! Empowering Local Communities for Disaster Relief during COVID-19 through Grassroots Support
    Jeongwon Jo
    Oluwafunke Alliyu
    John M. Carroll
    Computer Supported Cooperative Work (2024) (2024)
    Preview abstract The COVID-19 pandemic brought wide-ranging, unanticipated societal changes as communities rushed to slow the spread of the novel coronavirus. In response, mutual aid groups bloomed online across the United States to fill in the gaps in social services and help local communities cope with infrastructural breakdowns. Unlike many previous disasters, the long-haul nature of COVID-19 necessitates sustained disaster relief efforts. In this paper, we conducted an interview study with online mutual aid group administrators to understand how groups facilitated disaster relief, and how disaster relief initiatives developed and maintained over the course of the first year of COVID-19. Our findings suggest that the groups were crucial sources of community-based support for immediate needs, innovated long-term solutions for chronic community issues and grew into a vehicle for justice-centered work. Our insights shed light on the strength of mutual aid as a community capacity that can support communities to collectively be more prepared for future long-haul disasters than they were with COVID-19. View details
    Solving olympiad geometry without human demonstrations
    Trieu Trinh
    Yuhuai Tony Wu
    He He
    Nature, 625 (2024), pp. 476-482
    Preview abstract Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning, owing to their reputed difficulty among the world’s best talents in pre-university mathematics. Current machine-learning approaches, however, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges, resulting in severe scarcity of training data. We propose AlphaGeometry, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems, AlphaGeometry solves 25, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 2004. View details
    RewriteLM: An Instruction-Tuned Large LanguageModel for Text Rewriting
    Yun Zhu
    Simon Tong
    Lei Meng
    Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18970-18980 (2024)
    Preview abstract In recent years, Large Language Models (LLMs) have demonstrated impressive zero-shot capabilities in text generation tasks expressed through natural language instructions. However, text rewriting is a challenging task, and unintended modifications can negatively impact the system's performance. To address this challenge, we introduce a novel benchmark for text rewriting that covers a wide variety of rewriting types expressed through natural language instructions. Unlike previous benchmarks, which were primarily focused on limited rewrite styles and sentence-level rewriting, our benchmark is specifically designed to facilitate open-ended rewriting of long-form text. Additionally, we present a strong baseline model, RewriteLM, which is an instruction-tuned large language model for text rewriting. The model is trained using supervised fine-tuning, reward training, and reinforcement learning. To minimize human intervention in the data collection process, we develop new data generation strategies: (1) utilizing high-quality, long-form edits from Wikipedia as our primary natural training data source, (2) generating a synthetic dataset that includes diverse edit types and non-Wiki domains using chain-of-thoughts and the capabilities of LLMs, and (3) employing human-designed heuristic rankers to generate preference data. Our experiments demonstrate the effectiveness of our proposed benchmark and baseline model, as well as the benefits of our data collection strategies in minimizing human intervention. View details
    Stable quantum-correlated many-body states through engineered dissipation
    Xiao Mi
    Alexios Michailidis
    Sara Shabani
    Jerome Lloyd
    Rajeev Acharya
    Igor Aleiner
    Trond Andersen
    Markus Ansmann
    Frank Arute
    Kunal Arya
    Juan Atalaya
    Gina Bortoli
    Alexandre Bourassa
    Leon Brill
    Michael Broughton
    Bob Buckley
    Tim Burger
    Nicholas Bushnell
    Jimmy Chen
    Benjamin Chiaro
    Desmond Chik
    Charina Chou
    Josh Cogan
    Roberto Collins
    Paul Conner
    William Courtney
    Alex Crook
    Ben Curtin
    Alejo Grajales Dau
    Dripto Debroy
    Agustin Di Paolo
    ILYA Drozdov
    Andrew Dunsworth
    Lara Faoro
    Edward Farhi
    Reza Fatemi
    Vinicius Ferreira
    Ebrahim Forati
    Brooks Foxen
    Élie Genois
    William Giang
    Dar Gilboa
    Raja Gosula
    Steve Habegger
    Michael Hamilton
    Monica Hansen
    Sean Harrington
    Paula Heu
    Markus Hoffmann
    Trent Huang
    Ashley Huff
    Bill Huggins
    Sergei Isakov
    Justin Iveland
    Cody Jones
    Pavol Juhas
    Kostyantyn Kechedzhi
    Marika Kieferova
    Alexei Kitaev
    Andrey Klots
    Alexander Korotkov
    Fedor Kostritsa
    John Mark Kreikebaum
    Dave Landhuis
    Pavel Laptev
    Kim Ming Lau
    Lily Laws
    Joonho Lee
    Kenny Lee
    Yuri Lensky
    Alexander Lill
    Wayne Liu
    Orion Martin
    Amanda Mieszala
    Shirin Montazeri
    Alexis Morvan
    Ramis Movassagh
    Wojtek Mruczkiewicz
    Charles Neill
    Ani Nersisyan
    Michael Newman
    JiunHow Ng
    Murray Ich Nguyen
    Tom O'Brien
    Alex Opremcak
    Andre Petukhov
    Rebecca Potter
    Leonid Pryadko
    Charles Rocque
    Negar Saei
    Kannan Sankaragomathi
    Henry Schurkus
    Christopher Schuster
    Mike Shearn
    Aaron Shorter
    Noah Shutty
    Vladimir Shvarts
    Jindra Skruzny
    Clarke Smith
    Rolando Somma
    George Sterling
    Doug Strain
    Marco Szalay
    Alfredo Torres
    Guifre Vidal
    Cheng Xing
    Jamie Yao
    Ping Yeh
    Juhwan Yoo
    Grayson Young
    Yaxing Zhang
    Ningfeng Zhu
    Jeremy Hilton
    Anthony Megrant
    Yu Chen
    Vadim Smelyanskiy
    Dmitry Abanin
    Science, 383 (2024), pp. 1332-1337
    Preview abstract Engineered dissipative reservoirs have the potential to steer many-body quantum systems toward correlated steady states useful for quantum simulation of high-temperature superconductivity or quantum magnetism. Using up to 49 superconducting qubits, we prepared low-energy states of the transverse-field Ising model through coupling to dissipative auxiliary qubits. In one dimension, we observed long-range quantum correlations and a ground-state fidelity of 0.86 for 18 qubits at the critical point. In two dimensions, we found mutual information that extends beyond nearest neighbors. Lastly, by coupling the system to auxiliaries emulating reservoirs with different chemical potentials, we explored transport in the quantum Heisenberg model. Our results establish engineered dissipation as a scalable alternative to unitary evolution for preparing entangled many-body states on noisy quantum processors. View details
    Preview abstract Google Cloud SQL customers encounter PostgreSQL bugs corrupting databases, rarely but reproducibly. This talk will cover use of tools, especially amcheck, to grasp these bugs sufficiently to write fixes and test cases. Those fixes are now part of core PostgreSQL. It will include lessons for avoiding such bugs in future PostgreSQL development. Finally, it will share a diagnostic feature wish list. View details
    Generative Powers of Ten
    Xiaojuan Wang
    Steve Seitz
    Ben Mildenhall
    Pratul Srinivasan
    Dor Verbin
    Aleksander Hołyński
    Preview abstract We present a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches. This representation allows us to render continuously zooming videos, or explore different scales of the scene interactively. We achieve this through a joint multi-scale diffusion sampling approach that encourages consistency across different scales while preserving the integrity of each individual sampling process. Since each generated scale is guided by a different text prompt, our method enables deeper levels of zoom than traditional super-resolution methods that may struggle to create new contextual structure at vastly different scales. We compare our method qualitatively with alternative techniques in image super-resolution and outpainting, and show that our method is most effective at generating consistent multi-scale content. View details
    Preview abstract Misgendering refers to the act of incorrectly identifying or addressing someone's gender. While misgendering is both a factual inaccuracy and a toxic act of identity erasure, research on fact-checking and toxicity detection does not address it. We are the first to bridge this gap by introducing a dataset, \dataset, to assist in developing interventions for misgendering. The misgendering interventions task can be divided into two sub-tasks: (i) detecting misgendering, followed by (ii) editing misgendering where misgendering is present, in domains where editing is appropriate. We introduce a dataset containing a total of 3806 instances of tweets, YouTube comments, and LLM-generated text about 30 non-cisgender individuals annotated for whether they contain misgendering or not. LLM-generated text is also annotated for edits required to fix misgendering. Using this dataset, we set initial benchmarks by evaluating existing NLP systems and highlight challenges for future models to address. Additionally, we conducted a survey of non-cisgender individuals in the US to understand opinions about automated interventions for text-based misgendering. We find interest for interventions along with concerns for potential harm. View details