Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
1 - 15 of 223 publications
PreFix: Optimizing the Performance of Heap-Intensive Applications
Chaitanya Mamatha Ananda
Rajiv Gupta
Han Shen
CGO 2025: International Symposium on Code Generation and Optimization, Las Vegas, NV, USA (to appear)
Analyses of heap-intensive applications show that a small fraction of heap objects account for the majority of heap accesses and data cache misses. Prior works like HDS and HALO have shown that allocating hot objects in separate memory regions can improve spatial locality, leading to better application performance. However, these techniques are constrained in two primary ways, limiting their gains. First, they suffer from imperfect separation, polluting the hot memory region with several cold objects. Second, reordering of objects across allocations is not possible, as the original object allocation order is preserved. This paper presents a novel technique that achieves near-perfect separation of hot objects via a new context mechanism that efficiently identifies hot objects with high precision. The technique, named PreFix, is based upon Preallocating memory for a Fixed small number of hot objects. Guided by profiles, the program is instrumented to compute context information, derived from dynamic object identifiers, that precisely identifies hot object allocations; these objects are then placed at predetermined locations in the preallocated memory. The preallocated memory region for hot objects provides the flexibility to reorder objects across allocations and allows colocation of objects that are part of a hot data stream (HDS), improving spatial locality. The runtime overhead of identifying hot objects is not significant, as this optimization focuses only on a small number of static hot allocation sites and dynamic hot objects. While there is an increase in the program's memory footprint, it is manageable and can be controlled by limiting the size of the preallocated memory. In addition, PreFix incorporates an object-recycling optimization that reuses the same preallocated space to store different objects whose lifetimes are not expected to overlap. Our experiments with 13 heap-intensive applications yield reductions in execution time ranging from 2.77% to 74%. On average, PreFix reduces execution time by 21.7%, compared to 7.3% for HDS and 14% for HALO. This is due to PreFix's precision in hot-object identification, hot-object colocation, and low runtime overhead.
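To make the context mechanism concrete, here is a minimal Python simulation of the idea as we read it from the abstract; it is not the paper's implementation, and names such as SLOT_SIZE, NUM_SLOTS, and HOT_CONTEXTS are hypothetical stand-ins for profile-derived data.

```python
# Toy sketch of PreFix-style hot-object placement: a context derived
# from the call path selects a fixed slot in a preallocated arena.
import hashlib
import traceback

SLOT_SIZE = 64          # bytes reserved per hot object (assumed)
NUM_SLOTS = 1024        # fixed number of preallocated hot slots
arena = bytearray(SLOT_SIZE * NUM_SLOTS)  # the preallocated region

# Profile-derived mapping: context id -> arena slot. In PreFix this
# comes from offline profiling; here it is a stand-in dictionary.
HOT_CONTEXTS = {}

def context_id():
    """Derive a context from the current call path, standing in for
    the dynamic-object-identifier contexts described in the paper."""
    frames = "".join(f"{f.filename}:{f.lineno};"
                     for f in traceback.extract_stack()[:-1])
    return hashlib.md5(frames.encode()).hexdigest()

def allocate(size):
    ctx = context_id()
    slot = HOT_CONTEXTS.get(ctx)
    if slot is not None and size <= SLOT_SIZE:
        # Hot object: place it at a predetermined arena offset, which
        # colocates objects belonging to the same hot data stream.
        return memoryview(arena)[slot * SLOT_SIZE:(slot + 1) * SLOT_SIZE]
    return bytearray(size)  # cold object: fall back to the normal heap
```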
Explainable Artificial Intelligence Techniques for Software Development Lifecycle: A phase-specific survey
Aman Raj
IEEE COMPSAC (2025)
Artificial Intelligence (AI) is rapidly expanding and integrating more into daily life to automate tasks, guide decision-making and enhance efficiency. However, complex AI models, which make decisions without providing clear explanations (known as the "black-box problem"), currently restrict trust and widespread adoption of AI.
Explainable Artificial Intelligence (XAI) has emerged to address the black-box problem by making AI systems more interpretable and transparent, so that stakeholders can trust, verify, and act upon AI-based outcomes. Researchers have developed various techniques to foster XAI across the Software Development Lifecycle, yet gaps remain in how XAI is applied to its individual phases: the literature shows that 68% of XAI-in-Software-Engineering research focuses on maintenance, as opposed to 8% on software management and requirements [7].
In this paper, we present a comprehensive survey of the applications of XAI methods (e.g., concept-based explanations, LIME/SHAP, rule extraction, attention mechanisms, counterfactual explanations, example-based explanations) to the different phases of the Software Development Lifecycle (SDLC), namely requirements elicitation, design and development, testing and deployment, and evolution.
To the best of our knowledge, this paper presents the first comprehensive survey of XAI techniques for every phase of the Software Development Life Cycle (SDLC). In doing so, we aim to promote explainable AI in Software Engineering and facilitate the use of complex AI models in AI-driven software development.
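For readers unfamiliar with the techniques the survey covers, here is a minimal hedged example of one of them, SHAP; the model and dataset are illustrative only and do not come from the paper.

```python
# Explain a black-box classifier with SHAP (Shapley value attributions).
import shap  # https://github.com/shap/shap
from sklearn.ensemble import RandomForestClassifier

X, y = shap.datasets.adult()  # demo dataset bundled with shap
model = RandomForestClassifier(n_estimators=50).fit(X, y)

# TreeExplainer computes per-feature contributions that explain each
# individual prediction of the tree-ensemble model.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])
shap.summary_plot(shap_values, X.iloc[:100])  # global importance view
```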
Measuring software development can help drive impactful change. However, it's a complex task, and getting started can be daunting: it involves understanding what you should measure and determining what you can measure. This article provides a guide to selecting a framework that aligns with your organization's measurement strategy.
Safe Coding: Rigorous Modular Reasoning about Software Safety (Extended Version)
Google Security Engineering (2025)
Many persistent and dangerous software vulnerabilities, including memory safety violations and code injection, arise from a common root cause: Developers inadvertently violate the implicit safety preconditions of widely-used programming constructs. These preconditions—such as pointer validity, array-access bounds, and the trustworthy provenance of code fragments to be evaluated as SQL, HTML, or JavaScript—are traditionally the developer's responsibility to ensure. In complex systems, meeting these obligations often relies on non-local, whole-program invariants that are notoriously difficult to reason about correctly, leading to vulnerabilities that are difficult to detect after the fact.
This article introduces Safe Coding, a collection of software design patterns and practices designed to cost-effectively provide a high degree of assurance against entire classes of such vulnerabilities. The core principle of Safe Coding is to shift responsibility for safety from individual developers to the programming language, software libraries, and frameworks. This is achieved by systematically eliminating the direct use of risky operations—those with complex safety preconditions—in application code. Instead, these operations are encapsulated within safe abstractions: modules with public APIs that are safe by design, whose implementations fully ensure all module-internal safety preconditions through a combination of local runtime checks and by elevating safety preconditions into type invariants.
Safe Coding facilitates a modular and compositional approach to whole-program safety: Difficult reasoning is localized to the implementation of safe abstractions, which undergo focused expert scrutiny. The composition of these abstractions with the majority of the codebase (which is kept free of risky operations) is then automatically verified by the language’s type checker. This form of compositional reasoning, drawing from patterns used in formal software verification, can be viewed as a semi-formal approach that balances rigor with broad applicability to large industrial codebases. We discuss the successful application of these practices at Google, where they have nearly eliminated vulnerabilities such as Cross-Site Scripting (XSS) and SQL injection, and their critical role in ensuring memory safety in Rust, collectively demonstrating a favorable cost-assurance tradeoff for achieving software safety at scale.
This extended version explores the formal underpinnings of Safe Coding in detail, examining how concepts such as function contracts and modular proofs are pragmatically adapted for industrial-scale use.
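As an illustration of the safe-abstraction pattern described above, here is a minimal Python sketch, assuming a hypothetical TrustedSql type and sql() builder; it is not Google's implementation, but it shows how a safety precondition (trustworthy provenance of SQL text) becomes a type invariant that application code cannot violate.

```python
# A safe abstraction: the risky operation (executing SQL) only accepts
# a type whose construction guarantees the injection precondition.

_PRIVATE = object()  # construction token, hidden inside the module

class TrustedSql:
    """SQL whose text is guaranteed to come from developer-written
    literals; untrusted data can only enter as bound parameters."""
    def __init__(self, token, text, params):
        assert token is _PRIVATE, "use sql() to construct TrustedSql"
        self.text, self.params = text, params

def sql(literal_template: str, *params) -> TrustedSql:
    # The template is expected to be a compile-time literal; runtime
    # values are passed separately and bound by the database driver,
    # never spliced into the query text.
    return TrustedSql(_PRIVATE, literal_template, params)

def run_query(db, query: TrustedSql):
    # Risky operation encapsulated here: only TrustedSql is accepted,
    # so safety holds by construction, with no local security review
    # needed at call sites.
    return db.execute(query.text, query.params)

# Application code then composes safely, e.g.:
#   run_query(db, sql("SELECT name FROM users WHERE id = ?", user_id))
```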
This paper outlines a grammar of data analysis, as distinct from grammars of data manipulation. The primitives of this grammar are metrics and dimensions. We describe a Python implementation of this grammar called Meterstick, which is agnostic to the underlying data source, whether a DataFrame or a SQL database.
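To make the metric/dimension grammar concrete, below is a minimal Python sketch of the idea under our own naming; it is a toy illustration, not Meterstick's actual API.

```python
# A grammar of data analysis in miniature: metrics are composable
# primitives that can be computed over any set of dimensions.
import pandas as pd

class Metric:
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def compute_on(self, df, dimensions=None):
        if dimensions:
            return df.groupby(dimensions).apply(self.fn).rename(self.name)
        return self.fn(df)

    def __truediv__(self, other):  # metrics compose into new metrics
        return Metric(f"{self.name}/{other.name}",
                      lambda d: self.fn(d) / other.fn(d))

Sum = lambda col: Metric(f"sum({col})", lambda d: d[col].sum())

df = pd.DataFrame({"country": ["US", "US", "DE"],
                   "clicks": [3, 1, 2], "impressions": [10, 5, 8]})
ctr = Sum("clicks") / Sum("impressions")        # a derived metric
print(ctr.compute_on(df, dimensions=["country"]))
```

The same metric object could, in principle, be lowered to SQL instead of pandas, which is the source-agnosticism the abstract describes.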
Rust is a general-purpose programming language designed for performance and safety. Unrecoverable errors (e.g., divide by zero) in Rust programs are critical, as they signal bad program states and terminate programs abruptly. Prior work has explored using KLEE, a dynamic symbolic execution engine, to verify that programs will not panic. However, it is difficult for engineers who lack domain expertise to write such test code correctly, and the effectiveness of KLEE at finding panics in production Rust code has not been evaluated. We created an approach, called PanicCheck, to hide the complexity of verifying Rust programs with KLEE. Using PanicCheck, engineers only need to annotate the function to verify with #[panic_check]. The annotation guides PanicCheck to generate test code, compile the function together with the tests, and execute KLEE for verification. After applying PanicCheck to 21 open-source and 2 closed-source projects, we found 61 test inputs that triggered panics; 60 of the 61 panics have been addressed by developers so far. Our research shows promising verification results from KLEE while revealing technical challenges in using it. Our experience will shed light on future practice and research in program verification.
Resolving Code Review Comments with Machine Learning
Alexander Frömmgen
Jacob Austin
Peter Choy
Elena Khrapko
Marcus Revaj
Satish Chandra
2024 IEEE/ACM 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP) (to appear)
Code reviews are a critical part of the software development process, taking a significant amount of the code authors’ and the code reviewers’ time. As part of this process, the reviewer inspects the proposed code and asks the author for code changes through comments written in natural language. At Google, we see millions of reviewer comments per year, and authors require an average of ∼60 minutes active shepherding time between sending changes for review and finally submitting the change. In our measurements, the required active work time that the code author must devote to address reviewer comments grows almost linearly with the number of comments. However, with machine learning (ML), we have an opportunity to automate and streamline the code-review process, e.g., by proposing code changes based on a comment’s text.
We describe our application of recent advances in large sequence models in a real-world setting to automatically resolve code-review comments in the day-to-day development workflow at Google. We present the evolution of this feature from an asynchronous generation of suggested edits after the reviewer sends feedback, to an interactive experience that suggests code edits to the reviewer at review time. In deployment, code-change authors at Google address 7.5% of all reviewer comments by applying an ML-suggested edit. At Google's scale, we estimate this reduces time spent on code reviews by hundreds of thousands of engineer hours annually. Unsolicited, very positive feedback highlights that ML-suggested code edits increase Googlers' productivity and allow them to focus on more creative and complex tasks.
Dynamic Inference of Likely Symbolic Tensor Shapes in Python Machine Learning Programs
Koushik Sen
International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP) (2024) (to appear)
In machine learning programs, it is often tedious to annotate the shapes of the various tensors that get created during execution. We present a dynamic likely tensor shape inference analysis that annotates the dimensions of tensor expressions with symbolic dimension values. Such annotations can be used for understanding machine learning code written in popular frameworks such as TensorFlow, PyTorch, and JAX, and for finding bugs related to tensor shape mismatches.
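A minimal sketch of what such a dynamic analysis might look like, assuming our own record/symbolize helpers rather than the paper's tool: run the program on a few inputs, record each expression's concrete shapes, and name a dimension symbolically when its size varies across runs.

```python
# Dynamic likely-shape inference, toy version: constant dimensions
# stay literal; varying dimensions get symbolic names like "d0".
from collections import defaultdict
import numpy as np

observed = defaultdict(list)   # expression label -> list of shapes

def record(label, tensor):
    observed[label].append(tensor.shape)
    return tensor

def symbolize(shapes_by_label):
    annotations = {}
    for label, shapes in shapes_by_label.items():
        dims = []
        for pos, sizes in enumerate(zip(*shapes)):
            dims.append(str(sizes[0]) if len(set(sizes)) == 1
                        else f"d{pos}")
        annotations[label] = "(" + ", ".join(dims) + ")"
    return annotations

for batch in (4, 8):           # two executions with differing batch size
    x = np.zeros((batch, 28, 28))
    h = record("flatten", x.reshape(batch, -1))
    record("logits", h @ np.zeros((784, 10)))

print(symbolize(observed))     # {'flatten': '(d0, 784)', 'logits': '(d0, 10)'}
```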
Developer Ecosystems for Software Safety
ACM Queue, 22 (2024), 73–99
This paper reflects on work at Google over the past decade to address common types of software safety and security defects. Our experience has shown that software safety is an emergent property of the software and tooling ecosystem it is developed in and the production environment into which it is deployed. Thus, to effectively prevent common weaknesses at scale, we need to shift-left the responsibility for ensuring safety and security invariants to the end-to-end developer ecosystem, that is, programming languages, software libraries, application frameworks, build and deployment tooling, the production platform and its configuration surfaces, and so forth.
Doing so is practical and cost effective when developer ecosystems are designed with application archetypes in mind, such as web or mobile apps: The design of the developer ecosystem can address threat model aspects that apply commonly to all applications of the respective archetype, and investments to ensure safety invariants at the ecosystem level amortize across many applications.
Applying secure-by-design principles to developer ecosystems at Google has achieved drastic reduction and in some cases near-zero residual rates of common classes of defects, across hundreds of applications being developed by thousands of developers.
Secure by Design at Google
Google Security Engineering (2024)
This whitepaper provides an overview of Google's approach to secure design.
Crafting a go-to-market strategy in enterprise product management
Aqsa Fulara
Ketaki Vaidya
Mind The Product (2024)
The article summarizes the unique challenges and strategies required for a successful go-to-market (GTM) strategy in the enterprise world. We cover how the enterprise PM function differs from regular product management, and why enterprise PMs must treat distribution as an inherent part of the product process. We also share a framework for thinking about the various components of a GTM strategy. Key aspects include customer segmentation, account acquisition strategies, product packaging, positioning, and marketing, as well as technical enablement and content distribution.
Characterizing a Memory Allocator at Warehouse Scale
Zhuangzhuang Zhou
Nilay Vaish
Patrick Xia
Christina Delimitrou
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Association for Computing Machinery, La Jolla, CA, USA (2024), 192–206
Memory allocation constitutes a substantial component of warehouse-scale computation. Optimizing the memory allocator not only reduces the datacenter tax, but also improves application performance, leading to significant cost savings.
We present the first comprehensive characterization study of TCMalloc, a warehouse-scale memory allocator used in our production fleet. Our characterization reveals a profound diversity in the memory allocation patterns, allocated object sizes and lifetimes, for large-scale datacenter workloads, as well as in their performance on heterogeneous hardware platforms. Based on these insights, we redesign TCMalloc for warehouse-scale environments. Specifically, we propose optimizations for each level of its cache hierarchy that include usage-based dynamic sizing of allocator caches, leveraging hardware topology to mitigate inter-core communication overhead, and improving allocation packing algorithms based on statistical data. We evaluate these design choices using benchmarks and fleet-wide A/B experiments in our production fleet, resulting in a 1.4% improvement in throughput and a 3.4% reduction in RAM usage for the entire fleet. At our scale, even a single percent CPU or memory improvement translates to significant savings in server costs.
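As a toy illustration of one of these ideas, usage-based dynamic sizing of an allocator cache, here is a Python sketch; it is not TCMalloc's code, and the thresholds are assumed for exposition.

```python
# A per-CPU free-list cache whose capacity adapts to observed usage:
# grow when it keeps missing, shrink when it is mostly idle.
class PerCpuCache:
    def __init__(self, capacity=64, min_cap=16, max_cap=1024):
        self.free_list = []
        self.capacity, self.min_cap, self.max_cap = capacity, min_cap, max_cap
        self.hits = self.misses = 0

    def allocate(self):
        if self.free_list:
            self.hits += 1
            return self.free_list.pop()
        self.misses += 1
        return object()  # stand-in for a refill from a central free list

    def deallocate(self, obj):
        if len(self.free_list) < self.capacity:
            self.free_list.append(obj)
        # else: the object would flow back to the central free list

    def resize(self):
        """Called periodically; the 0.2 / 0.01 thresholds are assumed."""
        total = self.hits + self.misses
        if total:
            miss_rate = self.misses / total
            if miss_rate > 0.2:      # thrashing: grow toward max_cap
                self.capacity = min(self.capacity * 2, self.max_cap)
            elif miss_rate < 0.01:   # oversized: shrink and release
                self.capacity = max(self.capacity // 2, self.min_cap)
                del self.free_list[self.capacity:]
        self.hits = self.misses = 0
```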
Productive Coverage: Improving the Actionability of Code Coverage
Gordon
Luka Kalinovcic
Marko Ivanković
Mateusz Lewko
Rene Just
Yana Kulizhskaya
ICSE-SEIP '24: Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (2024) (to appear)
Code coverage is an intuitive and established test adequacy measure. However, not all parts of the code base are equally important, and hence additional testing may be critical for some uncovered code, whereas it may not be worthwhile for other uncovered code. As a result, simply visualizing uncovered code is not reliably actionable. To make code coverage actionable and further improve code coverage in our codebase, we developed Productive Coverage, a novel approach to code coverage that guides developers to uncovered code that should be tested by (unit) tests. Specifically, Productive Coverage identifies uncovered code that is similar to existing code, which in turn is tested and/or frequently executed in production. We implemented and evaluated Productive Coverage for four programming languages (C++, Java, Go, and Python). The evaluation shows: (1) developer sentiment, measured at the point of use, is strongly positive; (2) Productive Coverage meaningfully increases code coverage above a strong baseline; (3) Productive Coverage has no negative effect on code-authoring efficiency; (4) Productive Coverage modestly improves code-review efficiency; (5) Productive Coverage directly improves code quality and prevents bugs from being introduced, in addition to improving test quality.
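A rough sketch of the similarity-ranking idea as we read it from the abstract, not Google's implementation: score uncovered functions by their similarity to functions that are tested or hot in production (TF-IDF similarity here is our stand-in for whatever measure the paper uses).

```python
# Rank uncovered functions by their best similarity to reference code
# that is known to be tested and/or frequently executed in production.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_uncovered(uncovered_funcs, reference_funcs):
    """Both arguments are dicts of function name -> source text."""
    names = list(uncovered_funcs)
    docs = list(reference_funcs.values()) + [uncovered_funcs[n] for n in names]
    matrix = TfidfVectorizer(token_pattern=r"[A-Za-z_]\w*").fit_transform(docs)
    ref = matrix[:len(reference_funcs)]
    unc = matrix[len(reference_funcs):]
    # An uncovered function that closely resembles well-tested code is
    # likely worth testing too; take each function's best match score.
    scores = cosine_similarity(unc, ref).max(axis=1)
    return sorted(zip(names, scores), key=lambda p: -p[1])
```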
This is the seventh installment of the Developer Productivity for Humans column. This installment focuses on software quality: what it means, how developers see it, how we break it down into four types of quality, and the impact these types have on each other.
AI-powered software development tooling is changing the way developers interact with tools and write code. However, the ability of AI to truly transform software development depends on developers' level of trust in these tools. In this work, we take a mixed-methods approach to measuring the factors that influence developers' trust in AI-powered code completion. We find that familiarity with AI suggestions, the quality of the suggestion, and the developer's level of expertise with the language all increase the acceptance rate of AI-powered suggestions, while suggestion length and presence in a test file decrease acceptance rates. Based on these findings, we propose recommendations for the design of AI-powered development tools to improve trust.