Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

    API Governance at Scale
    Mak Ahmad
    JJ Geewax
    David R Karger
    Kwan-Liu Ma
    ICSE 2024 Software Engineering in Practice (2024)
    API Governance, the process of applying standardized sets of policies and guardrails to the design and development of APIs, has only grown in importance and prominence given the continued growth in the number of APIs being produced. In this paper, we present an Action Research style approach to investigate and understand the utility of a multi-faceted API Governance process being adopted inside Google. We first reflect on past research around API Governance and then introduce three new components: (1) API Improvement Proposals (AIPs), the documented source of truth for API design rules; (2) API Linter, an automated analysis tool that checks for adherence to or violations of AIPs; and (3) API Readability, a program to educate and certify API design experts. These three components are designed to build upon pre-existing processes to scale and improve API design. Through a mixed-methods research strategy, comprising both a survey and a series of interviews, we evaluate the utility of these approaches in supporting API Producers. Our research shows that API Producers have positive sentiment towards API Governance, validating the general direction of the program. Specifically, our study participants highlighted the positive impact of API Governance on the quality of the APIs they produced, via consistency in both outcome and approach. This paper also discusses future research opportunities to enhance API Governance, specifically with regard to newer API Producers, who reported worse sentiment towards the program than their more experienced peers.
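As a rough illustration of the kind of check an automated linter performs, the sketch below flags field names that break a single naming rule. The rule (field names in lower_snake_case), the function name `lint_field_names`, and the message text are hypothetical simplifications chosen for this example; this is not the API Linter's actual rule set or interface.

```python
import re

# Hypothetical, simplified naming rule in the spirit of an AIP check:
# field names are expected to be lower_snake_case (e.g. "display_name").
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def lint_field_names(field_names):
    """Return a list of (field_name, message) tuples, one per violation."""
    violations = []
    for name in field_names:
        if not SNAKE_CASE.fullmatch(name):
            violations.append((name, "field names should be lower_snake_case"))
    return violations


print(lint_field_names(["display_name", "createTime", "page_size"]))
```

Running it reports a single violation, for `createTime`; a real linter would apply many such rules to parsed API definitions rather than to bare strings.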
    Stereotypes are oversimplified beliefs and ideas about particular groups of people. These cognitive biases are omnipresent in our language, reflected in human-generated datasets, and potentially learned and perpetuated by language technologies. Although mitigating stereotypes in language technologies is necessary for preventing harms, stereotypes can pose varying levels of risk for targeted individuals and social groups depending on the contexts in which they appear. Technical challenges in detecting stereotypes are rooted in the societal nuances of stereotyping, making it impossible to capture all the intertwined interactions of social groups across diverse cultural contexts in one generic benchmark. This paper delves into the nuances of detecting stereotypes in an annotation task with humans from various regions of the world. We iteratively disambiguate our definition of the task, refining it as detecting "generalizing language", and contribute a multilingual, annotated dataset consisting of sentences mentioning a wide range of social identities in 9 languages, labeled on whether they make broad statements and assumptions about those groups. We experiment with training generalizing language detection models, which provide insight into the linguistic contexts in which stereotypes can appear, facilitating future research on the dynamic, social aspects of stereotypes.
    Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?
    Minghan Li
    Jimmy Lin
    Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24) (2024)
    Query expansion has been widely used to improve the search results of first-stage retrievers, yet its influence on second-stage cross-encoder rankers remains under-explored. A recent study shows that current expansion techniques benefit weaker models but harm stronger rankers. In this paper, we re-examine this conclusion and raise the following question: Can query expansion improve generalization of strong cross-encoder rankers? To answer this question, we first apply popular query expansion methods to different cross-encoder rankers and verify the deteriorated zero-shot effectiveness. We identify two vital steps in the experiment: high-quality keyword generation and minimally disruptive query modification. We show that it is possible to improve the generalization of a strong neural ranker by generating keywords through a reasoning chain and aggregating the ranking results of each expanded query via self-consistency, reciprocal rank weighting, and fusion. Experiments on BEIR and TREC Deep Learning 2019/2020 show that the nDCG@10 scores of both MonoT5 and RankT5 improve when these steps are followed, which points to a direction for applying query expansion to strong cross-encoder rankers.
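The abstract names reciprocal rank weighting and fusion as the way per-query rankings are aggregated. As a stand-in, here is a minimal sketch of standard reciprocal rank fusion (RRF), which scores each document as the sum of 1/(k + rank) over all ranked lists. The constant k = 60, the function name, and the toy document IDs are assumptions for illustration; the paper's exact weighting scheme may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with standard RRF.

    rankings: list of lists, each ordered best-first (rank 1 = first item).
    k: smoothing constant; 60 is a commonly used default for RRF.
    Returns document IDs sorted by fused score, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break exact ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings produced for three expanded versions of one query.
runs = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
print(reciprocal_rank_fusion(runs))
```

For these toy rankings, `d2` comes first because it is ranked first in two of the three lists; documents that appear in only one list (`d4`) sink to the bottom.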
    Modern code review is a process in which incremental code contributions made by one software developer are reviewed by one or more peers before they are committed to the version control system. An important element of modern code review is verifying that the code under review adheres to the style guidelines and best practices of the corresponding programming language. Some of these rules are universal and can be checked automatically or enforced via code formatters. Other rules, however, are context-dependent, and the corresponding checks are commonly left to developers who are experts in the given programming language and whose time is expensive. Many automated systems have been developed that attempt to detect various rule violations without any human intervention. Historically, such systems implemented targeted analyses and were themselves expensive to develop. This paper presents AutoCommenter, a system that uses a state-of-the-art large language model to automatically learn and enforce programming language best practices. We implemented AutoCommenter for four programming languages: C++, Java, Python, and Go. We evaluated its performance and adoption in a large industrial setting. Our evaluation shows that a model that automatically learns language best practices is feasible and has a measurable positive impact on the developer workflow. Additionally, we present the challenges we faced when deploying such a model to tens of thousands of developers and provide lessons learned for any practitioners who would like to replicate the work or build on top of it.
    Human Language to Analog Layout Using Glayout Layout Automation
    Ali Hammoud
    Chetanya Goyal
    Sakib Pathen
    Arlene Dai
    Anhang Li
    Mehdi Saligane
    Current approaches to Analog Layout Automation apply ML techniques such as Graph Convolutional Neural Networks (GCN) to translate netlists to layouts. While these ML approaches have proven effective, they lack the powerful reasoning capabilities, intuitive human interface, and standard evaluation benchmarks that have been improving rapidly in Large Language Models (LLMs). The GLayout framework introduced in this work translates analog layout into an expressive, technology-generic, compact text representation. An LLM is then taught to understand analog layout through fine-tuning and in-context learning using Retrieval Augmented Generation (RAG). The LLM is able to successfully lay out unseen circuits based on new information provided in-context. We train 3.8, 7, and 22 billion parameter quantized LLMs on a dataset of fewer than 50 unique circuits, together with text documents providing layout knowledge. The 22B parameter model is tuned in 2 hours on a single NVIDIA A100 GPU. The open-source evaluation set, which ranges from 2-transistor circuits to a ∆Σ ADC, is proposed as a benchmark for LLM layout automation tasks. The 22B model completes 70% of the tasks in the evaluation set and is able to pass DRC and LVS verification on unseen 4-transistor blocks.
    Broadly Enabling KLEE to Effortlessly Find Unrecoverable Errors
    Ying Zhang
    Peng Li
    Lingxiang Wang
    Na Meng
    Dan Williams
    (2024)
    Rust is a general-purpose programming language designed for performance and safety. Unrecoverable errors (e.g., Divide by Zero) in Rust programs are critical, as they signal bad program states and terminate programs abruptly. Previous work has used KLEE, a dynamic symbolic execution engine, to verify that a program will not panic. However, it is difficult for engineers who lack domain expertise to write the test code correctly. Moreover, the effectiveness of KLEE in finding panics in production Rust code has not been evaluated. We created an approach, called PanicCheck, to hide the complexity of verifying Rust programs with KLEE. Using PanicCheck, engineers only need to annotate the function to verify with #[panic_check]. The annotation guides PanicCheck to generate test code, compile the function together with the tests, and execute KLEE for verification. After applying PanicCheck to 21 open-source and 2 closed-source projects, we found 61 test inputs that triggered panics; 60 of the 61 panics have been addressed by developers so far. Our research shows promising verification results with KLEE, while revealing technical challenges in using it. Our experience will shed light on future practice and research in program verification.
    KATch: A Fast Symbolic Verifier for NetKAT
    Mark Moeller
    Jules Jacobs
    Olivier Savary Belanger
    David Darais
    Cole Schlesinger
    Nate Foster
    Alexandra Silva
    Programming Language Design and Implementation (PLDI) (2024) (to appear)
    We develop new data structures and algorithms for checking verification queries in NetKAT, a domain-specific language for specifying the behavior of network data planes. Our results extend the techniques obtained in prior work on symbolic automata and provide a framework for building efficient and scalable verification tools. We present KATch, an implementation of these ideas in Scala, including extended logical operators that are useful for expressing network-wide specifications, and optimizations that either construct a bisimulation quickly or generate a counterexample showing that none exists. We evaluate the performance of our implementation on real-world and synthetic benchmarks, verifying properties such as reachability and slice isolation, typically returning a result in well under a second, which is orders of magnitude faster than previous approaches.
    BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse
    Garrett Casto
    Mingge Deng
    Rushabh Desai
    Thibaud Hottelier
    Amir Hormati
    Jeff Johnson
    Dawid Kurzyniec
    Prem Ramanathan
    Gaurav Saxena
    Vidya Shanmugam
    Yuri Volobuev
    SIGMOD (2024)
    BigQuery’s cloud-native disaggregated architecture has allowed Google Cloud to evolve the system to meet several customer needs across the analytics and AI/ML workload spectrum. A key customer requirement for BigQuery centers on the unification of data lake and enterprise data warehousing workloads. This approach combines (1) the need for core data management primitives provided by an enterprise data warehouse, e.g., security, governance, common runtime metadata, performance acceleration, and ACID transactions, with (2) harnessing the flexibility of open-source formats and the analytics ecosystem, along with new workload types such as AI/ML over unstructured data on object storage. In addition, there is a strong requirement to support BigQuery as a multi-cloud offering, given that cloud customers are opting for a multi-cloud footprint by default. This paper describes BigLake, an evolution of BigQuery toward a multi-cloud lakehouse that addresses these customer requirements in novel ways. We describe three main innovations in this space. We first present BigLake tables, which make open-source table formats (e.g., Apache Parquet, Iceberg) first-class citizens, providing fine-grained governance enforcement and performance acceleration over these formats to BigQuery and other open-source analytics engines. Next, we cover the design and implementation of BigLake Object tables, which allow BigQuery to integrate AI/ML for inferencing and processing over unstructured data. Finally, we present Omni, a platform for deploying BigQuery on non-GCP clouds, focusing on the infrastructure and operational innovations we made to provide an enterprise lakehouse product regardless of the cloud provider hosting the data.
    VideoPoet: A Large Language Model for Zero-Shot Video Generation
    Dan Kondratyuk
    Xiuye Gu
    Jonathan Huang
    Grant Schindler
    Rachel Hornung
    Vighnesh Birodkar
    Jimmy Yan
    Ming-Chang Chiu
    Hassan Akbari
    Josh Dillon
    Agrim Gupta
    Meera Hahn
    Anja Hauth
    David Hendon
    Alonso Martinez
    Kihyuk Sohn
    Xuan Yang
    Huisheng Wang
    Lu Jiang
    ICML (2024)
    We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs, including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model's state-of-the-art capabilities in zero-shot video generation, specifically highlighting VideoPoet's ability to generate high-fidelity motions. Project page: http://sites.research.google/videopoet/
    Interaction with Extended Reality Head-Mounted Device (XR HMD) applications requires precise, intuitive, and efficient input methods. Current approaches rely either on power-intensive sensors, such as cameras for hand tracking, or on specialized hardware in the form of handheld controllers. As an alternative, past work has explored using devices the user already carries, such as smartphones and smartwatches, as practical input solutions. However, this approach risks interaction overload: how can one determine whether the user’s gestures on the watch face or phone screen are directed at the mobile device itself or at the XR device? To this end, we propose a novel framework for cross-device input routing and device arbitration that employs the Inertial Measurement Units (IMUs) within these devices. We validate our approach in a user study with six participants. By making use of the relative orientation between the headset and the target input device, we can estimate the intended device of interaction with 93.7% accuracy. Our method offers a seamless, energy-efficient alternative for input management in XR, enhancing user experience through natural and ergonomic interactions.
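One way to picture orientation-based arbitration is a simple angular test: a phone or watch whose screen faces back along the headset's viewing direction is probably the one being looked at. The sketch below assumes both orientations are already expressed as direction vectors in a shared world frame and uses a hypothetical 45° threshold; the function `pick_target` and its heuristic are illustrative only and are not the framework described in the paper.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def pick_target(headset_forward, device_normals, threshold_deg=45.0):
    """Hypothetical heuristic: route input to the handheld device whose screen
    faces the viewer, otherwise to the XR headset ("xr").

    headset_forward: headset viewing direction (shared world frame, assumed).
    device_normals: dict of device name -> outward screen normal.
    threshold_deg: max angle between the screen normal and the reversed view ray.
    """
    f = normalize(headset_forward)
    best, best_angle = "xr", threshold_deg
    for name, normal in device_normals.items():
        n = normalize(normal)
        # A screen being looked at points back toward the viewer: normal ~ -forward.
        cos_angle = -sum(a * b for a, b in zip(f, n))
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        if angle < best_angle:
            best, best_angle = name, angle
    return best


# Looking down at a phone held up toward the face; the watch is turned sideways.
print(pick_target((0, -0.8, 0.6), {"phone": (0, 0.8, -0.6), "watch": (1, 0, 0)}))
```

The example selects `phone`; if no device screen falls within the threshold, input stays with the headset. A real system would also need to handle IMU drift and frame alignment between devices, which this sketch ignores.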
    Measurement is one of the essential components of quantum algorithms, and for superconducting qubits it is often the most error-prone. Here, we demonstrate a model-based readout optimization that achieves low measurement errors while avoiding detrimental side effects. For simultaneous and mid-circuit measurements across 17 qubits, we observe 1.5% error per qubit with an end-to-end duration of 500 ns and minimal excess reset error from residual resonator photons. We also suppress measurement-induced state transitions and achieve a qubit leakage rate limited by natural heating. This technique can scale to hundreds of qubits and can be used to enhance the performance of error-correcting codes as well as near-term applications.
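To put the per-qubit figure in context, the snippet below estimates how often all 17 simultaneous readouts would be correct. It assumes errors are independent across qubits and that every qubit has exactly the 1.5% error rate, neither of which the abstract states; under those assumptions the result is about 77%.

```python
# Assumptions (not stated in the abstract): readout errors are independent
# across qubits, and each qubit has exactly the reported 1.5% error rate.
per_qubit_error = 0.015
num_qubits = 17

# Probability that every one of the simultaneous readouts is correct.
p_all_correct = (1 - per_qubit_error) ** num_qubits
print(f"P(all {num_qubits} readouts correct) ~ {p_all_correct:.3f}")
```

The compounding effect is why per-qubit readout error matters so much as devices grow: the same per-qubit rate over many more qubits makes an error-free joint readout increasingly rare.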
    Scalable Learning of Segment-Level Traffic Congestion Functions
    Shushman Choudhury
    Aboudy Kreidieh
    Alexandre Bayen
    IEEE Intelligent Transportation Systems Conference (2024)
    We propose and study a data-driven framework for identifying traffic congestion functions (numerical relationships between observations of traffic variables) at global scale and segment-level granularity. In contrast to methods that estimate a separate set of parameters for each roadway, ours learns a single black-box function over all roadways in a metropolitan area. First, we pool traffic data from all segments into one dataset, combining static attributes with dynamic time-dependent features. Second, we train a feed-forward neural network on this dataset, which we can then use on any segment in the area. We evaluate how well our framework identifies congestion functions on observed segments and how it generalizes to unobserved segments and predicts segment attributes on a large dataset covering multiple cities worldwide. For identification error on observed segments, our single data-driven congestion function compares favorably to segment-specific model-based functions on highway roads, but has room to improve on arterial roads. For generalization, our approach shows strong performance across cities and road types: both on unobserved segments in the same city and on zero-shot transfer learning between cities. Finally, for predicting segment attributes, we find that our approach can approximate critical densities for individual segments using their static properties.
    ASTRA-5G: Automated Over-the-Air Security Testing and Research Architecture for 5G SA Devices
    Aanjhan Ranganathan
    Christina Pöpper
    Evangelos Bitsikas
    Michele Guerra
    Syed Khandker
    WiSec '24: Proceedings of the 17th ACM Conference on Security and Privacy in Wireless and Mobile Networks, ACM (2024)
    Despite the widespread deployment of 5G technologies, there exists a critical gap in security testing for 5G Standalone (SA) devices. Existing methods, largely manual and labor-intensive, are ill-equipped to fully uncover the state of security in the implementations of 5G SA protocols and standards on devices, severely limiting the ability to conduct comprehensive evaluations. To address this issue, we introduce a novel, open-source framework that automates the security testing process for 5G SA devices. By leveraging enhanced functionalities of 5G SA core and Radio Access Network (RAN) software, our framework offers a streamlined approach to generating, executing, and evaluating test cases, specifically focusing on the Non-Access Stratum (NAS) layer. Our application of this framework across multiple 5G SA devices provides in-depth security insights, significantly improving testing efficiency and breadth.
    Federated Variational Inference: Towards Improved Personalization and Generalization
    Elahe Vedadi
    Josh Dillon
    Philip Mansfield
    Karan Singhal
    Arash Afkanpour
    Warren Morningstar
    AAAI Federated Learning on the Edge Symposium (2024)
    Conventional federated learning algorithms train a single global model by leveraging all participating clients' data. However, due to heterogeneity in client generative distributions and predictive models, these approaches may not appropriately approximate the predictive process, converge to an optimal state, or generalize to new clients. We study personalization and generalization in stateless cross-device federated learning setups assuming heterogeneity in client data distributions and predictive models. We first propose a hierarchical generative model and formalize it using Bayesian inference. We then approximate this process using variational inference to train our model efficiently. We call this algorithm Federated Variational Inference (FedVI). We use PAC-Bayes analysis to provide generalization bounds for FedVI. We evaluate our model on FEMNIST and CIFAR-100 image classification and show that FedVI outperforms the state of the art on both tasks.
    WindowMirror is a framework for using XR headsets in productivity scenarios. The toolkit provides users with simulated, extended screen real estate. It allows users to interact with multiple desktop applications in real time within an XR environment. Our architecture has two main modules, a Unity package and a Python backend, which makes it easy to use and extend. WindowMirror supports traditional desktop interaction methods such as mouse and keyboard, as well as hand tracking. Furthermore, it features a Cylindrical Window Layout, an emerging design pattern that is particularly effective for single-user, egocentric perspectives. The introduction of WindowMirror aims to set a foundation for future research in XR screen-focused productivity scenarios.
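A cylindrical window layout can be sketched as placing windows at equal angular steps along an arc of a cylinder centred on the user, so every window sits at the same distance and faces the user. The radius, arc width, height, axis conventions, and the function `cylindrical_layout` below are hypothetical choices for illustration, not WindowMirror's actual parameters or Unity code.

```python
import math


def cylindrical_layout(num_windows, radius=1.5, arc_deg=120.0, height=1.4):
    """Place windows at equal angular steps on an arc of a cylinder around the user.

    Conventions (assumed for this sketch): the user is at the origin, y is up,
    and the arc is centred on the +z axis. Returns one (x, y, z, yaw_deg) tuple
    per window. Rotating a window that sits at (0, height, radius) facing the
    origin by yaw_deg about the y axis moves it to (x, height, z) while keeping
    it facing the origin.
    """
    if num_windows <= 0:
        return []
    if num_windows == 1:
        angles = [0.0]
    else:
        step = arc_deg / (num_windows - 1)
        angles = [-arc_deg / 2 + i * step for i in range(num_windows)]
    layout = []
    for yaw in angles:
        t = math.radians(yaw)
        # Point on the circle of the given radius at azimuth `yaw`.
        layout.append((radius * math.sin(t), height, radius * math.cos(t), yaw))
    return layout


for x, y, z, yaw in cylindrical_layout(3):
    print(f"x={x:+.2f} y={y:.2f} z={z:.2f} yaw={yaw:+.1f}")
```

Keeping every window at a fixed radius means each one is equally far from the user's eyes, which is the property that makes a cylindrical arrangement comfortable for a single, egocentric viewpoint.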