Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

    Broadly Enabling KLEE to Effortlessly Find Unrecoverable Errors
    Ying Zhang
    Peng Li
    Lingxiang Wang
    Na Meng
    Dan Williams
    (2024)
    Rust is a general-purpose programming language designed for performance and safety. Unrecoverable errors (e.g., Divide by Zero) in Rust programs are critical, as they signal bad program states and terminate programs abruptly. Previous work has contributed to utilizing KLEE, a dynamic symbolic test engine, to verify that a program will not panic. However, it is difficult for engineers who lack domain expertise to write test code correctly, and the effectiveness of KLEE in finding panics in production Rust code has not been evaluated. We created an approach, called PanicCheck, to hide the complexity of verifying Rust programs with KLEE. Using PanicCheck, engineers only need to annotate the function-to-verify with #[panic_check]. The annotation guides PanicCheck to generate test code, compile the function together with tests, and execute KLEE for verification. After applying PanicCheck to 21 open-source and 2 closed-source projects, we found 61 test inputs that triggered panics; 60 of the 61 panics have been addressed by developers so far. Our research shows promising verification results by KLEE, while revealing technical challenges in using KLEE. Our experience will shed light on future practice and research in program verification.
    Resolving Code Review Comments with Machine Learning
    Alexander Frömmgen
    Peter Choy
    Elena Khrapko
    Marcus Revaj
    2024 IEEE/ACM 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP) (to appear)
    Code reviews are a critical part of the software development process, taking a significant amount of the code authors' and the code reviewers' time. As part of this process, the reviewer inspects the proposed code and asks the author for code changes through comments written in natural language. At Google, we see millions of reviewer comments per year, and authors require an average of ∼60 minutes of active shepherding time between sending changes for review and finally submitting the change. In our measurements, the active work time that the code author must devote to addressing reviewer comments grows almost linearly with the number of comments. However, with machine learning (ML), we have an opportunity to automate and streamline the code-review process, e.g., by proposing code changes based on a comment's text. We describe our application of recent advances in large sequence models in a real-world setting to automatically resolve code-review comments in the day-to-day development workflow at Google. We present the evolution of this feature from asynchronous generation of suggested edits after the reviewer sends feedback to an interactive experience that suggests code edits to the reviewer at review time. In deployment, code-change authors at Google address 7.5% of all reviewer comments by applying an ML-suggested edit; at Google scale, this stands to reduce code-review effort by hundreds of thousands of engineer hours annually. Unsolicited, very positive feedback highlights that ML-suggested code edits increase Googlers' productivity and allow them to focus on more creative and complex tasks.
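    The following is a minimal, hypothetical sketch of the framing (the prompt format and the edit representation are invented for illustration; the paper does not specify either): a reviewer comment plus the surrounding code is serialized as input to a sequence-to-sequence model, and a predicted edit is applied as a plain span replacement that the author reviews before accepting.

    ```python
    from dataclasses import dataclass

    @dataclass
    class SuggestedEdit:
        start: int        # character offset where the edit begins
        end: int          # character offset where the edit ends (exclusive)
        replacement: str  # text proposed by the model

    def build_model_input(code: str, comment: str, line: int) -> str:
        """Serialize the code and the reviewer comment, anchored at its
        line, into one prompt string (the tag format is invented)."""
        lines = code.splitlines()
        marked = [f"<comment>{comment}</comment>\n{l}" if i == line else l
                  for i, l in enumerate(lines)]
        return "\n".join(marked)

    def apply_edit(code: str, edit: SuggestedEdit) -> str:
        """Apply a model-predicted edit as a span replacement."""
        return code[:edit.start] + edit.replacement + code[edit.end:]

    code = "def mean(xs):\n    return sum(xs) / len(xs)\n"
    prompt = build_model_input(code, "Guard against empty input.", line=1)
    # Pretend the model predicted this edit for the prompt:
    edit = SuggestedEdit(start=code.index("return"), end=len(code) - 1,
                         replacement="return sum(xs) / len(xs) if xs else 0.0")
    print(apply_edit(code, edit))
    ```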
    Dynamic Inference of Likely Symbolic Tensor Shapes in Python Machine Learning Programs
    Koushik Sen
    International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP) (2024) (to appear)
    In machine learning programs, it is often tedious to annotate the dimensions of the tensors that get created during execution. We present a dynamic likely tensor shape inference analysis that annotates the dimensions of tensor expressions with symbolic dimension values. Such annotations can be used for understanding machine learning code written in popular frameworks such as TensorFlow, PyTorch, and JAX, and for finding bugs related to tensor shape mismatches.
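    As a rough illustration of the idea (a toy analogue of my own, not the paper's analysis), the sketch below represents dimensions as named symbols and propagates them through an op-level shape rule; a mismatch between symbolic inner dimensions surfaces as an annotated error.

    ```python
    class Dim:
        """A symbolic dimension such as 'batch', compared by name."""
        def __init__(self, name):
            self.name = name
        def __repr__(self):
            return self.name
        def __eq__(self, other):
            return isinstance(other, Dim) and self.name == other.name

    def matmul_shape(a, b):
        """Shape rule for a 2-D matmul: (m, k) @ (k, n) -> (m, n)."""
        (m, k1), (k2, n) = a, b
        if k1 != k2:
            raise TypeError(f"shape mismatch: inner dims {k1} vs {k2}")
        return (m, n)

    batch, d_in, d_out = Dim("batch"), Dim("d_in"), Dim("d_out")
    x = (batch, d_in)   # e.g. observed concretely as (32, 128) at run time
    w = (d_in, d_out)   # e.g. observed concretely as (128, 10)
    print(matmul_shape(x, w))  # (batch, d_out)
    try:
        matmul_shape(x, (d_out, d_in))
    except TypeError as e:
        print(e)  # shape mismatch: inner dims d_in vs d_out
    ```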
    This paper reflects on work at Google over the past decade to address common types of software safety and security defects. Our experience has shown that software safety is an emergent property of the software and tooling ecosystem it is developed in and the production environment into which it is deployed. Thus, to effectively prevent common weaknesses at scale, we need to shift-left the responsibility for ensuring safety and security invariants to the end-to-end developer ecosystem, that is, programming languages, software libraries, application frameworks, build and deployment tooling, the production platform and its configuration surfaces, and so forth. Doing so is practical and cost effective when developer ecosystems are designed with application archetypes in mind, such as web or mobile apps: the design of the developer ecosystem can address threat-model aspects that apply commonly to all applications of the respective archetype, and investments to ensure safety invariants at the ecosystem level amortize across many applications. Applying secure-by-design principles to developer ecosystems at Google has achieved drastic reductions, and in some cases near-zero residual rates, of common classes of defects across hundreds of applications developed by thousands of developers.
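    To make the idea concrete, here is a toy illustration of my own (not an API from the paper) of a safe-by-construction library type: untrusted text can only become SafeHtml through an escaping factory, so the no-XSS invariant holds for every instance and application code never concatenates raw strings into markup. Python's privacy is advisory, so this only sketches the discipline a stricter language or ecosystem can enforce.

    ```python
    import html

    _FACTORY_TOKEN = object()  # blocks direct construction outside this module

    class SafeHtml:
        """A string wrapper that is HTML-safe by construction."""
        def __init__(self, markup: str, token=None):
            if token is not _FACTORY_TOKEN:
                raise TypeError("construct via SafeHtml.from_text()")
            self.markup = markup

        @staticmethod
        def from_text(untrusted: str) -> "SafeHtml":
            # Escaping happens at the only entry point, so every instance
            # upholds the invariant -- no caller can forget to escape.
            return SafeHtml(html.escape(untrusted), _FACTORY_TOKEN)

    print(SafeHtml.from_text("<script>alert(1)</script>").markup)
    # &lt;script&gt;alert(1)&lt;/script&gt;
    ```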
    Secure by Design at Google
    Google Security Engineering (2024)
    This whitepaper provides an overview of Google's approach to secure design.
    This article summarizes the unique challenges and strategies required for a successful go-to-market (GTM) strategy in the enterprise world. We cover how the enterprise PM function differs from regular PM, and why enterprise PMs must treat distribution as an inherent part of the product process. We also share a framework for thinking about the various components of a GTM strategy. Key aspects include customer segmentation; account acquisition strategies; product packaging, positioning, and marketing; and technical enablement and content distribution.
    Measuring the productivity of software developers is inherently difficult; it requires measuring humans doing complex, creative work. Developers are affected by both technological and sociological aspects of their job, and these need to be evaluated in concert to deeply understand developer productivity.
    What Breaks Google?
    Abhayendra Singh
    Avi Kondareddy
    (2023)
    In an ideal Continuous Integration (CI) workflow, all potentially impacted builds and tests would be run before submission for every proposed change. In large-scale environments like Google's mono-repository, this is not feasible in terms of either latency or compute cost, given the frequency of change requests and the overall size of the codebase. Instead, the compromise is to run more comprehensive testing at later stages of the development lifecycle, batched after submission. These runs detect newly broken tests, and automated culprit finders determine which change broke each test: the culprit change. To make testing at Google more efficient and lower latency, TAP Postsubmit is developing a new scheduling algorithm that uses bug prediction metrics, features from the change, and historical information about the tests and targets to predict and rank targets by their likelihood to fail. This presentation examines the association between some of our selected features and culprits in Google3.
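    The presentation does not spell out the model, so the sketch below is only a hedged stand-in: it combines a few plausible signals (historical failure rate, proximity of the target to the change, recent churn; the feature names and weights are my assumptions) into a logistic score and schedules targets in descending order of predicted failure likelihood.

    ```python
    import math

    def failure_score(target, weights=(2.0, 1.5, 0.8)):
        """Combine simple signals into a pseudo-likelihood of failure."""
        w_hist, w_prox, w_churn = weights
        z = (w_hist * target["historical_failure_rate"]
             + w_prox * target["change_proximity"]   # e.g. inverse dependency distance
             + w_churn * target["recent_churn"])     # e.g. recent edits near the target
        return 1.0 / (1.0 + math.exp(-z))            # logistic squash to (0, 1)

    targets = [
        {"name": "//net:stack_test", "historical_failure_rate": 0.30,
         "change_proximity": 0.9, "recent_churn": 0.5},
        {"name": "//ui:render_test", "historical_failure_rate": 0.02,
         "change_proximity": 0.1, "recent_churn": 0.1},
    ]
    # Schedule the targets most likely to fail first.
    for t in sorted(targets, key=failure_score, reverse=True):
        print(t["name"], round(failure_score(t), 3))
    ```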
    Change Management in Physical Network Lifecycle Automation
    Virginia Beauregard
    Kevin Grant
    Angus Griffith
    Jahangir Hasan
    Chen Huang
    Quan Leng
    Jiayao Li
    Alexander Lin
    Zhoutao Liu
    Ahmed Mansy
    Bill Martinusen
    Nikil Mehta
    Andrew Narver
    Anshul Nigham
    Melanie Obenberger
    Sean Smith
    Kurt Steinkraus
    Sheng Sun
    Edward Thiele
    Proc. 2023 USENIX Annual Technical Conference (USENIX ATC 23)
    Automated management of a physical network's lifecycle is critical for large networks. At Google, we manage network design, construction, evolution, and management via multiple automated systems. In our experience, one of the primary challenges is to reliably and efficiently manage change in this domain -- additions of new hardware and connectivity, planning and sequencing of topology mutations, introduction of new architectures, new software systems and fixes to old ones, etc. We especially have learned the importance of supporting multiple kinds of change in parallel without conflicts or mistakes (which cause outages) while also maintaining parallelism between different teams and between different processes. We now know that this requires automated support. This paper describes some of our network lifecycle goals, the automation we have developed to meet those goals, and the change-management challenges we encountered. We then discuss in detail our approaches to several specific kinds of change management: (1) managing conflicts between multiple operations on the same network; (2) managing conflicts between operations spanning the boundaries between networks; (3) managing representational changes in the models that drive our automated systems. These approaches combine both novel software systems and software-engineering practices. While this paper reports on our experience with large-scale datacenter network infrastructures, we are also applying the same tools and practices in several adjacent domains, such as the management of WAN systems, of machines, and of datacenter physical designs. Our approaches are likely to be useful at smaller scales, too.
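    As a toy sketch of the flavor of approach (1), conflict management between operations on the same network (my framing, far simpler than the production systems the paper describes): each operation declares the set of network entities it will mutate, and a coordinator admits it only if that footprint is disjoint from every in-flight operation's footprint.

    ```python
    class ChangeCoordinator:
        """Admits a change operation only if the network entities it will
        mutate are disjoint from every in-flight operation's footprint."""
        def __init__(self):
            self.in_flight = {}  # operation name -> set of touched entities

        def try_admit(self, op, footprint):
            for other, held in self.in_flight.items():
                overlap = footprint & held
                if overlap:
                    print(f"{op} blocked by {other} on {sorted(overlap)}")
                    return False
            self.in_flight[op] = set(footprint)
            return True

        def finish(self, op):
            self.in_flight.pop(op, None)

    c = ChangeCoordinator()
    print(c.try_admit("add-rack-42", {"switch/a1", "link/a1-b2"}))   # True
    print(c.try_admit("drain-links", {"link/a1-b2", "link/c3-d4"}))  # False
    c.finish("add-rack-42")
    print(c.try_admit("drain-links", {"link/a1-b2", "link/c3-d4"}))  # True
    ```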
    Improving Design Reviews at Google
    Ben Greenberg
    International Conference on Automated Software Engineering, IEEE/ACM (2023)
    Design review is an important initial phase of the software development life-cycle where stakeholders gain and discuss early insights into the design's viability, discover potentially costly mistakes, and identify inconsistencies and inadequacies. For improved development velocity, it is important that design owners get their designs approved as quickly as possible. In this paper, we discuss how engineering design reviews are typically conducted at Google, and propose a novel, structured, automated solution to improve design review velocity. Based on data collected on 141,652 approved documents authored by 41,030 users over four years, we show that our proposed solution decreases median time-to-approval by 25%, and provides further gains when used consistently. We also provide qualitative data to demonstrate our solution's success, discuss factors that impact design review latency, propose strategies to tackle them, and share lessons learned from the usage of our solution.
    Flake Aware Culprit Finding
    Eric Nickell
    Collin Johnston
    Avi Kondareddy
    Proceedings of the 16th IEEE International Conference on Software Testing, Verification and Validation (ICST 2023), IEEE (to appear)
    When a change introduces a bug into a large software repository, there is often a delay between when the change is committed and when the bug is detected. This is true even when the bug causes an existing test to fail! These delays are caused by resource constraints that prevent the organization from running all of the tests on every change, so a Continuous Integration (CI) system needs to locate buggy commits after the fact. Locating them is complicated by flaky tests that pass and fail non-deterministically; such tests introduce noise into the CI system, requiring costly reruns to determine whether a failure was caused by a bad code change or by non-deterministic test behavior. This paper presents an algorithm, Flake Aware Culprit Finding, that locates buggy commits more accurately than a traditional bisection search. The algorithm is based on Bayesian inference and noisy binary search, utilizing prior information about which changes are most likely to contain the bug. A large-scale empirical study conducted at Google on more than 13,000 test breakages evaluates the accuracy and cost of the new algorithm versus a traditional deflaked bisection search.
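    The paper's algorithm is not reproduced here, but the sketch below shows the general shape of the technique as the abstract describes it: maintain a posterior over which suspect commit is the culprit, probe at the posterior median (noisy binary search), and update with a likelihood that accounts for flaky failures and missed detections. The flake and miss rates, the uniform prior, and the stopping threshold are all illustrative assumptions.

    ```python
    import random

    def update_posterior(posterior, probe, failed, flake=0.05, miss=0.02):
        """Bayesian update. A test run at commit `probe` exercises the code
        as of that commit, so it fails (modulo noise) iff the culprit index
        is <= probe. The `flake` and `miss` rates are illustrative guesses."""
        new = []
        for i, p in enumerate(posterior):
            bug_present = i <= probe
            if failed:
                like = (1 - miss) if bug_present else flake
            else:
                like = miss if bug_present else (1 - flake)
            new.append(p * like)
        total = sum(new)
        return [p / total for p in new]

    def next_probe(posterior):
        """Noisy binary search: probe at the posterior median commit."""
        acc = 0.0
        for i, p in enumerate(posterior):
            acc += p
            if acc >= 0.5:
                return i
        return len(posterior) - 1

    # Simulate 32 suspect commits with the true culprit at index 20.
    random.seed(0)
    n, culprit = 32, 20
    posterior = [1.0 / n] * n  # uniform prior; the paper uses informed priors
    for _ in range(200):
        if max(posterior) >= 0.99:
            break
        probe = next_probe(posterior)
        bug_present = culprit <= probe
        failed = random.random() < (0.98 if bug_present else 0.05)
        posterior = update_posterior(posterior, probe, failed)
    print("most likely culprit:", posterior.index(max(posterior)))
    ```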
    Beyond the repo: best practices for open source ecosystems researchers
    Julia Ferraioli
    Juniper Lovato
    ACM Queue, vol. 21 (2023), pp. 14-34
    Industry and scientific researchers using data collected from open source ecosystems need better guidelines and best practices to work responsibly with communities. When researchers fail to consider the human element of open source, it harms open source ecosystems by:
    - increasing work and emotional stress on volunteer groups;
    - impacting costly infrastructure systems not designed to support research use cases;
    - treating critical open source systems as test beds for scholarly research into known problems, without the consent of the community or contributions back to correct these problems.
    This article presents best practices and guidelines for researchers working in open source to have a greater impact while respecting the production and sociotechnical environment they are observing.
    This chapter explores the possibility of building a unified assessment methodology for software reliability and security. The fault injection methodology originally designed for reliability assessment is extended to quantify and characterize the security defenses of native applications, i.e., system software written in C/C++. Specifically, software fault injection is used to measure the portion of injected software faults caught by the built-in error detection mechanisms of a target program (e.g., the detection coverage of assertions). To automatically activate as many injected faults as possible, a gray-box fuzzing technique is used. Using dynamic analyzers during fuzzing further helps catch the critical error propagation paths of injected (but undetected) faults and identify code fragments as targets for security hardening. Because conducting software fault injection experiments with fuzzing is expensive, a novel, locality-based fault selection algorithm is presented. The algorithm increases the fuzzing failure ratios by 3–19 times, accelerating the experiments. The case studies use all of the above experimental techniques to compare the effectiveness of fuzzing and testing, and consequently to assess the security defenses of native benchmark programs.
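    The abstract does not give the selection algorithm, so the following is only a guess at the flavor of locality-based selection: prefer candidate fault sites near sites whose injected faults the fuzzer has already activated, on the assumption that nearby code is likely reachable too. The file names, distance metric, and scoring are all invented for illustration.

    ```python
    def locality_score(site, activated, scale=50):
        """Score a candidate fault site (file, line) by its distance to
        the nearest already-activated site in the same file."""
        best = 0.0
        for a_file, a_line in activated:
            if a_file == site[0]:
                best = max(best, 1.0 / (1.0 + abs(site[1] - a_line) / scale))
        return best

    candidates = [("parser.c", 120), ("parser.c", 905), ("util.c", 40)]
    activated = [("parser.c", 131)]  # faults the fuzzer already triggered
    ranked = sorted(candidates,
                    key=lambda s: locality_score(s, activated), reverse=True)
    print(ranked)  # parser.c:120 ranks first -- nearest an activated site
    ```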
    A Model-based, Quality Attribute-guided Architecture Re-Design Process at Google
    Qin Jia
    Yuanfang Cai
    2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)
    Communicating and justifying design decisions is difficult, especially when the architecture design has to evolve. In this paper, we report our experiences using formal but lightweight design models to communicate, justify, and analyze the quality trade-offs of an architecture revision plan for Monarch, a large-scale legacy system at Google. We started from a few critical user scenarios and their associated quality attribute scenarios, which keeps these models lightweight and concise, expressing high-level abstractions only. We also separated static views from dynamic views so that each diagram can be precise and suitable for analyzing different types of quality attributes. The combination of scenarios, quality attributes, and lightweight modeling was well received by the team as an effective way to analyze and communicate the trade-offs. A few days after we presented and shared this process, two new projects within the Monarch team adopted component and sequence diagrams in their design documents, and two other product areas within Google began to learn and adopt the process as well. Our experience indicates that these architecture modeling and analysis techniques can be integrated into the software development process to communicate and assess features, quality attributes, and design decisions continuously and iteratively.
    The execution behavior of a program often depends on external resources, such as program inputs or file contents, and so cannot be run in isolation. Nevertheless, software developers benefit from fast iteration loops where automated tools identify errors as early as possible, even before programs can be compiled and run. This presents an interesting machine learning challenge: can we predict runtime errors in a "static" setting, where program execution is not possible? Here, we introduce a real-world dataset and task for predicting runtime errors, which we show is difficult for generic models like Transformers. As an alternative, we develop an interpreter-inspired architecture with an inductive bias towards mimicking program executions, which models exception handling and "learns to execute" descriptions of the contents of external resources. Surprisingly, we show that the model can also predict the location of the error, despite being trained only on labels indicating the presence/absence and kind of error. In total, we present a practical and difficult-yet-approachable challenge problem related to learning program execution, and we demonstrate promising new capabilities of interpreter-inspired machine learning models for code.
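    The sketch below caricatures the interpreter-inspired inductive bias in a few lines of numpy (the sizes, the bag-of-tokens line embedding, and the single recurrent "step" are toy stand-ins, not the paper's architecture): each program line is embedded, a learned state is updated line by line as if execution were being stepped, and the final state is classified into error kinds.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB, D, N_ERR = 1000, 64, 4          # token ids, hidden size, error kinds
    E = rng.normal(0, 0.1, (VOCAB, D))     # token embedding table
    W = rng.normal(0, 0.1, (2 * D, D))     # recurrent "interpreter step" weights
    C = rng.normal(0, 0.1, (D, N_ERR))     # error-kind classifier

    def embed_line(token_ids):
        return E[token_ids].mean(axis=0)   # bag-of-tokens line embedding

    def run(program_lines):
        state = np.zeros(D)                # learned analogue of machine state
        for line in program_lines:         # "execute" one line at a time
            x = np.concatenate([state, embed_line(line)])
            state = np.tanh(x @ W)         # one interpreter-like step
        logits = state @ C
        return np.exp(logits) / np.exp(logits).sum()  # P(error kind)

    program = [[17, 42, 311], [42, 99], [7, 7, 123]]  # toy token ids per line
    print(run(program))                    # untrained, so roughly uniform
    ```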