Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10128 publications
Media Mix Model Calibration With Bayesian Priors
Mike Wurm
Brenda Price
Ying Liu
research.google (2024)
Preview abstract
Effective model calibration is a critical and indispensable component in developing Media Mix Models (MMMs). One advantage of Bayesian-based MMMs lies in their capacity to accommodate the information from experiment results and the modelers' domain knowledge about the ad effectiveness by setting priors for the model parameters. However, it remains ambiguous about how and which Bayesian priors should be tuned for calibration purpose. In this paper, we propose a new calibration method through model reparameterization. The reparameterized model includes Return on Ads Spend (ROAS) as a model parameter, enabling straightforward adjustment of its prior distribution to align with either experiment results or the modeler's prior knowledge. The proposed method also helps address several key challenges regarding combining MMMs and incrementality experiments. We use simulations to demonstrate that our approach can significantly reduce the bias and uncertainty in the resultant posterior ROAS estimates.
View details
Resolving Code Review Comments with Machine Learning
Alexander Frömmgen
Peter Choy
Elena Khrapko
Marcus Revaj
2024 IEEE/ACM 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP) (to appear)
Preview abstract
Code reviews are a critical part of the software development process, taking a significant amount of the code authors’ and the code reviewers’ time. As part of this process, the reviewer inspects the proposed code and asks the author for code changes through comments written in natural language. At Google, we see millions of reviewer comments per year, and authors require an average of ∼60 minutes active shepherding time between sending changes for review and finally submitting the change. In our measurements, the required active work time that the code author must devote to address reviewer comments grows almost linearly with the number of comments. However, with machine learning (ML), we have an opportunity to automate and streamline the code-review process, e.g., by proposing code changes based on a comment’s text.
We describe our application of recent advances in large sequence models in a real-world setting to automatically resolve code-review comments in the day-to-day development workflow at Google. We present the evolution of this feature from an asynchronous generation of suggested edits after the reviewer sends feedback, to an interactive experience that suggests code edits to the reviewer at review time. In deployment, code-change authors at Google address 7.5% of all reviewer comments by applying an ML-suggested edit. The impact of this will be to reduce the time spent on code reviews by hundreds of thousands of engineer hours annually at Google scale. Unsolicited, very positive feedback highlights that the impact of ML-suggested code edits increases Googlers’ productivity and allows them to focus on more creative and complex tasks.
View details
Creative ML Assemblages: The Interactive Politics of People, Processes, and Products
Ramya Malur Srinivasan
Katharina Burgdorf
Jennifer Lena
ACM Conference on Computer Supported Cooperative Work and Social Computing (2024) (to appear)
Preview abstract
Creative ML tools are collaborative systems that afford artistic creativity through their myriad interactive relationships. We propose using ``assemblage thinking" to support analyses of creative ML by approaching it as a system in which the elements of people, organizations, culture, practices, and technology constantly influence each other. We model these interactions as ``coordinating elements" that give rise to the social and political characteristics of a particular creative ML context, and call attention to three dynamic elements of creative ML whose interactions provide unique context for the social impact a particular system as: people, creative processes, and products. As creative assemblages are highly contextual, we present these as analytical concepts that computing researchers can adapt to better understand the functioning of a particular system or phenomena and identify intervention points to foster desired change. This paper contributes to theorizing interactions with AI in the context of art, and how these interactions shape the production of algorithmic art.
View details
Developer Ecosystems for Software Safety
ACM Queue, 22 (2024), 73–99
Preview abstract
This paper reflects on work at Google over the past decade to address common types of software safety and security defects. Our experience has shown that software safety is an emergent property of the software and tooling ecosystem it is developed in and the production environment into which it is deployed. Thus, to effectively prevent common weaknesses at scale, we need to shift-left the responsibility for ensuring safety and security invariants to the end-to-end developer ecosystem, that is, programming languages, software libraries, application frameworks, build and deployment tooling, the production platform and its configuration surfaces, and so forth.
Doing so is practical and cost effective when developer ecosystems are designed with application archetypes in mind, such as web or mobile apps: The design of the developer ecosystem can address threat model aspects that apply commonly to all applications of the respective archetype, and investments to ensure safety invariants at the ecosystem level amortize across many applications.
Applying secure-by-design principles to developer ecosystems at Google has achieved drastic reduction and in some cases near-zero residual rates of common classes of defects, across hundreds of applications being developed by thousands of developers.
View details
Experiencing InstructPipe: Building Multi-modal AI Pipelines via Prompting LLMs and Visual Programming
Zhongyi Zhou
Jing Jin
Xiuxiu Yuan
Jun Jiang
Jingtao Zhou
Yiyi Huang
Kristen Wright
Jason Mayes
Mark Sherwood
Ram Iyengar
Na Li
Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, ACM, pp. 5
Preview abstract
Foundational multi-modal models have democratized AI access, yet the construction of complex, customizable machine learning pipelines by novice users remains a grand challenge. This paper demonstrates a visual programming system that allows novices to rapidly prototype multimodal AI pipelines. We first conducted a formative study with 58 contributors and collected 236 proposals of multimodal AI pipelines that served various practical needs. We then distilled our findings into a design matrix of primitive nodes for prototyping multimodal AI visual programming pipelines, and implemented a system with 65 nodes. To support users' rapid prototyping experience, we built InstructPipe, an AI assistant based on large language models (LLMs) that allows users to generate a pipeline by writing text-based instructions. We believe InstructPipe enhances novice users onboarding experience of visual programming and the controllability of LLMs by offering non-experts a platform to easily update the generation.
View details
UI Mobility Control in XR: Switching UI Positionings between Static, Dynamic, and Self Entities
Siyou Pei
Yang Zhang
Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, ACM, pp. 12 (to appear)
Preview abstract
Extended reality (XR) has the potential for seamless user interface (UI) transitions across people, objects, and environments. However, the design space, applications, and common practices of 3D UI transitions remain underexplored. To address this gap, we conducted a need-finding study with 11 participants, identifying and distilling a taxonomy based on three types of UI placements --- affixed to static, dynamic, or self entities. We further surveyed 113 commercial applications to understand the common practices of 3D UI mobility control, where only 6.2% of these applications allowed users to transition UI between entities. In response, we built interaction prototypes to facilitate UI transitions between entities. We report on results from a qualitative user study (N=14) on 3D UI mobility control using our FingerSwitches technique, which suggests that perceived usefulness is affected by types of entities and environments. We aspire to tackle a vital need in UI mobility within XR.
View details
Preview abstract
Situationally Induced Impairments and Disabilities (SIIDs) can significantly hinder user experience in everyday activities. Despite their prevalence, existing adaptive systems predominantly cater to specific tasks or environments and fail to accommodate the diverse and dynamic nature of SIIDs. We introduce Human I/O, a real-time system that detects SIIDs by gauging the availability of human input/output channels. Leveraging egocentric vision, multimodal sensing and reasoning with large language models, Human I/O achieves good performance in availability prediction across 60 in-the-wild egocentric videos in 32 different scenarios. Further, while the core focus of our work is on the detection of SIIDs rather than the creation of adaptive user interfaces, we showcase the utility of our prototype via a user study with 10 participants. Findings suggest that Human I/O significantly reduces effort and improves user experience in the presence of SIIDs, paving the way for more adaptive and accessible interactive systems in the future.
View details
KATch: A Fast Symbolic Verifier for NetKAT
Mark Moeller
Jules Jacobs
Olivier Savary Belanger
David Darais
Cole Schlesinger
Nate Foster
Alexandra Silva
Programming Languages and Implementation (PLDI) (2024) (to appear)
Preview abstract
We develop new data structures and algorithms for checking verification queries in NetKAT, a domain-specific language for specifying the behavior of network data planes. Our results extend the techniques obtained in prior work on symbolic automata and provide a framework for building efficient and scalable verification tools. We present \KATch, an implementation of these ideas in Scala, including extended logical operators that are useful for expressing network-wide specifications and optimizations that construct a bisimulation quickly or generate a counter-example showing that none exists. We evaluate the performance of our implementation on real-world and synthetic benchmarks, verifying properties such as reachability and slice isolation, typically returning a result in well under a second, which is orders of magnitude faster than previous approaches.
View details
Preview abstract
2022 marked the 50th anniversary of memory safety vulnerabilities, first reported by Anderson et al. Half a century later, we are still dealing with memory safety bugs despite substantial investments to improve memory unsafe languages.
Like others', Google’s data and internal vulnerability research show that memory safety bugs are widespread and one of the leading causes of vulnerabilities in memory-unsafe codebases. Those vulnerabilities endanger end users, our industry, and the broader society.
At Google, we have decades of experience addressing, at scale, large classes of vulnerabilities that were once similarly prevalent as memory safety issues. Based on this experience we expect that high assurance memory safety can only be achieved via a Secure-by-Design approach centered around comprehensive adoption of languages with rigorous memory safety guarantees.
We see no realistic path for an evolution of C++ into a language with rigorous memory safety guarantees that include temporal safety. As a consequence, we are considering a gradual transition of C++ code at Google towards other languages that are memory safe.
Given the large volume of pre-existing C++, we believe it is nonetheless necessary to improve the safety of C++ to the extent practicable. We are considering transitioning to a safer C++ subset, augmented with hardware security features like MTE.
View details
An artificial neural network to estimate the foliar and ground cover input variables of the Rangeland Hydrology and Erosion Model
Mahmoud Saeedimoghaddam
David Goodrich
Mariano Hernandez
David Phillip Guertin
Loretta J. Metz
Guillermo Ponce-Campos
Haiyan Wei
Shea Burns
Sarah E. McCord
Mark A. Nearing
C. Jason Williams
Carrie-Ann Houdeshell
Mashrekur Rahman
Menberu B. Meles
Steve Barker
Journal of Hydrology (2024)
Preview abstract
Models like the Rangeland Hydrology and Erosion Model (RHEM) are useful for estimating soil erosion, however, they rely on input parameters that are sometimes difficult or expensive to measure. Specifically, RHEM requires information about foliar and ground cover fractions that generally must be measured in situ, which makes it difficult to use models like RHEM to produce erosion or soil risk maps for areas exceeding the size of a hillslope such as a large watershed. We previously developed a deep learning emulator of RHEM that has low computational expense and can, in principle, be run over large areas (e.g., over the continental US). In this paper, we develop a deep learning model to estimate the RHEM ground cover inputs from remote sensing time series, reducing the need for extensive field surveys to produce erosion maps. We achieve a prediction accuracy on hillslope runoff of r2=0.9, and on soil loss and sediment yield of r2 = 0.4 at 66,643 field locations within the US. We demonstrate how this approach can be used for mapping by developing runoff, soil loss, and sediment yield maps over a 1356 km2 region of interest in Nebraska.
View details
Generative AI in Creative Practice: ML-Artist Folk Theories of T2I Use, Harm, and Harm-Reduction
Shalaleh Rismani
Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), Association for Computing Machinery (2024), pp. 1-17 (to appear)
Preview abstract
Understanding how communities experience algorithms is necessary to mitigate potential harmful impacts. This paper presents folk theories of text-to-image (T2I) models to enrich understanding of how artist communities experience creative machine learning (ML) systems. This research draws on data collected from a workshop with 15 artists from 10 countries who incorporate T2I models in their creative practice. Through reflexive thematic analysis of workshop data, we highlight theorization of T2I use, harm, and harm-reduction. Folk theories of use envision T2I models as an artistic medium, a mundane tool, and locate true creativity as rising above model affordances. Theories of harm articulate T2I models as harmed by engineering efforts to eliminate glitches and product policy efforts to limit functionality. Theories of harm-reduction orient towards protecting T2I models for creative practice through transparency and distributed governance. We examine how these theories relate, and conclude by discussing how folk theorization informs responsible AI efforts.
View details
Preview abstract
We present a method for generating Streetscapes --- long sequences of views through an on-the-fly synthesized city-scale scene. Our generation is conditioned by language input (e.g., city name, weather), as well as an underlying map/layout hosting the desired trajectory. Compared to recent models for video generation or 3D view synthesis, our method can scale to much longer-range camera trajectories, spanning several city blocks, while maintaining visual quality and consistency. To achieve this goal, we build on recent work on video diffusion, used within an autoregressive framework that can easily scale to long sequences. In particular, we introduce a new temporal imputation method that prevents our autoregressive approach from drifting from the distribution of realistic city imagery. We train our Streetscapes system on a compelling source of data-posed imagery from Google Street View, along with contextual map data-which allows users to generate city views conditioned on any desired city layout, with controllable camera poses.
View details
Density-based User Representation through Gaussian Process Regression for Multi-interest Personalized Retrieval
Haolun Wu
Xue Liu
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS-24), Vancouver (2024)
Preview abstract
Personalized recommendation systems are increasingly essential in our information-rich society, aiding users in navigating the expansive online realm. However, accurately modeling the diverse and dynamic interests of the users remains a formidable challenge. Existing user modeling methods, like Single-point User Representation (SUR) and Multi-point User Representation (MUR), have their limitations in terms of accuracy, diversity, computation cost, and adaptability. To overcome these challenges, we introduce a novel model, the Density-based User Representation (DUR), leveraging Gaussian Process Regression (GPR), which has not been extensively explored in multi-interest recommendation and retrieval. Our approach inherently captures user interest dynamics without manual tuning, provides uncertainty-awareness, and is more efficient than point-based representation methods. This paper outlines the development and implementation of GPR4DUR, details its evaluation protocols, and presents extensive analysis demonstrating its effectiveness and efficiency. Experiments on real-world offline datasets confirm our method’s adaptability and efficiency. Further online experiments simulating user behavior illuminate the benefits of our method in the exploration-exploitation trade-off by effectively utilizing model uncertainty.
View details
CodeQueries: A Dataset of Semantic Queries over Code
Surya Prakash Sahu
Madhurima Mandal
Shikhar Bharadwaj
Aditya Kanade
Shirish Shevade
Innovations in Software Engineering (ISEC), ACM, Bangalore, India (2024)
Preview abstract
Developers often have questions about semantic aspects of code
they are working on, e.g., “Is there a class whose parent classes
declare a conflicting attribute?”. Answering them requires understanding code semantics such as attributes and inheritance relation
of classes. An answer to such a question should identify code spans
constituting the answer (e.g., the declaration of the subclass) as well
as supporting facts (e.g., the definitions of the conflicting attributes).
The existing work on question-answering over code has considered
yes/no questions or method-level context. We contribute a labeled
dataset, called CodeQueries, of semantic queries over Python code.
Compared to the existing datasets, in CodeQueries, the queries
are about code semantics, the context is file level and the answers
are code spans. We curate the dataset based on queries supported
by a widely-used static analysis tool, CodeQL, and include both
positive and negative examples, and queries requiring single-hop
and multi-hop reasoning.
To assess the value of our dataset, we evaluate baseline neural
approaches. We study a large language model (GPT3.5-Turbo) in
zero-shot and few-shot settings on a subset of CodeQueries. We
also evaluate a BERT style model (CuBERT) with fine-tuning. We
find that these models achieve limited success on CodeQueries.
CodeQueries is thus a challenging dataset to test the ability of
neural models, to understand code semantics, in the extractive
question-answering setting
View details
Take it, Leave it, or Fix it: Measuring Productivity and Trust in Human-AI Collaboration
29th International Conference on Intelligent User Interfaces (IUI ’24), ACM, New York, NY, USA (2024)
Preview abstract
Although recent developments in generative AI have greatly enhanced the capabilities of conversational agents such as Google's Bard or OpenAI's ChatGPT, it's unclear whether the usage of these agents aids users across various contexts. To better understand how access to conversational AI affects productivity and trust, we conducted a mixed-methods, task-based user study, observing 76 software engineers (N=76) as they completed a programming exam with and without access to Bard. Effects on performance, efficiency, satisfaction, and trust vary depending on user expertise, question type (open-ended "solve" questions vs. definitive "search" questions), and measurement type (demonstrated vs. self-reported). Our findings include evidence of automation complacency, increased reliance on the AI over the course of the task, and increased performance for novices on “solve”-type questions when using the AI. We discuss common behaviors, design recommendations, and impact considerations to improve collaborations with conversational AI.
View details