Goran Petrovic
Goran Petrovic has been working at Google since 2012. His main focus areas are Mutation Testing and Engineering Productivity.
Authored Publications
Please fix this mutant: How do developers resolve mutants surfaced during code review?
Gordon Fraser
René Just
2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)
Abstract
This paper studies the effects of surfacing undetected mutants during code review. Based on a dataset of 633 merge requests and 78,000 mutants, it answers three research questions around the change in mutant location over the course of a merge request, how often mutants are resolved during code review, and the observed changes after mutant intervention. The results show that (1) for 64% of mutants, the mutated code changes as the merge request evolves; (2) overall, 38% of all mutants and 60% of productive mutants are resolved via code changes or test additions; (3) unresolved productive mutants stem from developers questioning the value of adding tests for surfaced mutants, mutants being later resolved in deferred code changes (atomicity of merge requests), and false positives (mutants being resolved by tests not considered in the experiment infrastructure); (4) resolved productive mutants are associated with more test and code changes, compared to unproductive mutants.
MuRS: Suppressing and Ranking Mutants with Identifier Templates
Malgorzata (Gosia) Salawa
René Just
Zimin Chen
ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (2023), 1798–1808
Abstract
Diff-based mutation testing is a mutation testing approach that only generates mutants in changed lines. At Google, we execute more than 150,000,000 tests and submit more than 40,000 commits per day, and we have successfully integrated mutation testing into our code review process. Over the years, we have continuously gathered developer feedback on the surfaced mutants and measured the negative feedback rate. To enhance the developer experience, we manually implemented a large number of static rules, which are used to suppress certain mutants. In this paper, we propose MuRS, an automatic tool that finds patterns in the source code under test and uses these patterns to rank and suppress future mutants based on the historical performance of similar mutants. Because MuRS learns mutant suppression rules fully automatically, it significantly reduces the build and maintenance cost of the mutation testing system. To evaluate the effectiveness of MuRS, we conducted an A/B experiment in which mutants in the experiment group were ranked and suppressed by MuRS, and mutants in the control group were randomly shuffled. The experiment showed a statistically significantly lower negative feedback rate of 11.45% in the experiment group versus 12.41% in the control group. Furthermore, we found that statement removal mutants received both the most positive and the most negative developer feedback, suggesting a need for further investigation to identify valuable statement removal mutants.
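The ranking-and-suppression idea in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the actual MuRS implementation: the identifier-masking template, the feedback history, and the suppression threshold are all hypothetical stand-ins.

```python
import keyword
import re
from typing import Dict, List, Tuple

def template_of(mutated_line: str) -> str:
    """Mask identifiers (but not keywords) to form a crude line template."""
    return re.sub(
        r"[A-Za-z_]\w*",
        lambda m: m.group(0) if keyword.iskeyword(m.group(0)) else "ID",
        mutated_line,
    )

def rank_mutants(
    mutants: List[str],
    history: Dict[str, Tuple[int, int]],
    suppress_above: float = 0.5,
) -> List[str]:
    """Drop mutants whose template's historical negative-feedback rate
    exceeds the threshold; return the rest, least-disliked first."""
    def rate(mutant: str) -> float:
        negative, total = history.get(template_of(mutant), (0, 0))
        return negative / total if total else 0.0
    return sorted((m for m in mutants if rate(m) <= suppress_above), key=rate)

# Illustrative history: (negative reports, total reports) per template.
history = {"ID = ID + ID": (9, 10), "return ID": (1, 10)}
print(rank_mutants(["total = total + delta", "return result"], history))
# → ['return result']
```

The arithmetic mutant is suppressed because nine of ten mutants with the same template drew negative feedback; the return-statement mutant survives and is surfaced first.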
Practical Mutation Testing at Scale: A view from Google
Gordon Fraser
René Just
IEEE Transactions on Software Engineering (2021)
Abstract
Mutation analysis assesses a test suite's adequacy by measuring its ability to detect small artificial faults, systematically seeded into the tested program. Mutation analysis is considered one of the strongest test-adequacy criteria. Mutation testing builds on top of mutation analysis and is a testing technique that uses mutants as test goals to create or improve a test suite. Mutation testing has long been considered intractable because the sheer number of mutants that can be created represents an insurmountable problem, both in terms of human and computational effort. This has hindered the adoption of mutation testing as an industry standard. For example, Google has a codebase of two billion lines of code and more than 150,000,000 tests are executed on a daily basis. The traditional approach to mutation testing does not scale to such an environment; even existing solutions to speed up mutation analysis are insufficient to make it computationally feasible at such a scale. To address these challenges, this paper presents a scalable approach to mutation testing based on the following main ideas: (1) mutation testing is done incrementally, mutating only changed code during code review, rather than the entire code base; (2) mutants are filtered, removing mutants that are likely to be irrelevant to developers, and limiting the number of mutants per line and per code review process; (3) mutants are selected based on the historical performance of mutation operators, further eliminating irrelevant mutants and improving mutant quality. This paper empirically validates the proposed approach by analyzing its effectiveness in a code-review-based setting, used by more than 24,000 developers on more than 1,000 projects. The results show that the proposed approach produces orders of magnitude fewer mutants and that context-based mutant filtering and selection improve mutant quality and actionability.
Overall, the proposed approach represents a mutation testing framework that seamlessly integrates into the software development workflow and is applicable to industrial settings of any size.
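The three scaling ideas above (mutate only changed lines, cap mutants per line, select operators by historical performance) can be sketched as follows. The operator names and productivity scores are illustrative assumptions, not Google's actual operators or data.

```python
from typing import Iterable, List, Tuple

# Hypothetical per-operator productivity: the fraction of surfaced mutants
# that developers historically judged useful. Purely illustrative numbers.
OPERATOR_PRODUCTIVITY = {
    "negate_conditional": 0.8,
    "delete_statement": 0.6,
    "swap_arithmetic": 0.3,
}

def select_mutants(
    changed_lines: Iterable[Tuple[int, List[str]]],
    mutants_per_line: int = 2,
) -> List[Tuple[int, str]]:
    """For each changed line (incremental, diff-based mutation), keep at
    most `mutants_per_line` mutants, preferring historically productive
    operators."""
    selected = []
    for line_no, operators in changed_lines:
        ranked = sorted(
            operators,
            key=lambda op: OPERATOR_PRODUCTIVITY.get(op, 0.0),
            reverse=True,
        )
        selected.extend((line_no, op) for op in ranked[:mutants_per_line])
    return selected

# A diff touching lines 10 and 42, with the operators applicable to each:
diff = [(10, ["swap_arithmetic", "negate_conditional", "delete_statement"]),
        (42, ["delete_statement"])]
print(select_mutants(diff))
# → [(10, 'negate_conditional'), (10, 'delete_statement'), (42, 'delete_statement')]
```

Only three mutants survive for the whole diff, rather than one per applicable operator per line, which is the order-of-magnitude reduction the abstract describes.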
Long Term Effects of Mutation Testing
Gordon Fraser
René Just
2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), IEEE, pp. 910-921
Abstract
Various proxy metrics for test quality have been defined in order to guide developers when writing tests. Code coverage is particularly well established in practice, even though the question of how coverage relates to test quality is a matter of ongoing debate. Mutation testing offers a promising alternative: artificial defects can identify holes in a test suite, and thus provide concrete suggestions for additional tests. Despite the obvious advantages of mutation testing, it is not yet well established in practice. Until recently, mutation testing tools and techniques simply did not scale to complex systems. Although they now do scale, a remaining obstacle is lack of evidence that writing tests for mutants actually improves test quality. In this paper, we fill this gap. We analyze a large dataset of 15 million mutants and investigate how the mutants influenced developers over time, and how the mutants relate to real faults. Our analyses suggest that developers using mutation testing write more tests, and actively improve their test suites with high quality tests such that fewer mutants remain. By analyzing a dataset of historic fixes of real faults we further provide evidence that mutants are indeed coupled with real faults. In other words, had mutation testing been used for the changes introducing the faults, it would have reported a live mutant that could have prevented the bug.
Code coverage at Google
René Just
Gordon Fraser
Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ACM, pp. 955-963
Abstract
Code coverage is a measure of the degree to which a test suite exercises a software system. Although coverage is well established in software engineering research, deployment in industry is often inhibited by the perceived usefulness and the computational costs of analyzing coverage at scale. At Google, coverage information is computed for one billion lines of code daily, for seven programming languages. A key aspect of making coverage information actionable is to apply it at the level of changesets and code review.

This paper describes Google's code coverage infrastructure and how the computed code coverage information is visualized and used. It also describes the challenges and solutions for adopting code coverage at scale. To study how code coverage is adopted and perceived by developers, this paper analyzes adoption rates, error rates, and average code coverage ratios over a five-year period, and it reports on 512 responses, received from surveying 3000 developers. Finally, this paper provides concrete suggestions for how to implement and use code coverage in an industrial setting.
An Industrial Application of Mutation Testing: Lessons, Challenges, and Research Directions
Robert Kurtz
Paul Ammann
René Just
Proceedings of the 13th International Workshop on Mutation Analysis (Mutation 2018)
Abstract
Mutation analysis evaluates a testing or debugging technique by measuring how well it detects mutants, which are systematically seeded, artificial faults. Mutation analysis is inherently expensive due to the large number of mutants it generates and due to the fact that many of these generated mutants are not effective; they are redundant, equivalent, or simply uninteresting and waste computational resources. A large body of research has focused on improving the scalability of mutation analysis and proposed numerous optimizations to, e.g., select effective mutants or efficiently execute a large number of tests against a large number of mutants. However, comparatively little research has focused on the costs and benefits of mutation testing, in which mutants are presented as testing goals to a developer, in the context of an industrial-scale software development process. This paper aims to fill that gap. Specifically, it first reports on a case study from an open source context, which quantifies the costs of achieving a mutation adequate test set. The results suggest that achieving mutation adequacy is neither practical nor desirable. This paper then draws on an industrial application of mutation testing, involving more than 30,000 developers and 1,890,442 change sets, written in 4 programming languages. It shows that mutation testing does not add a significant overhead to the software development process and reports on mutation testing benefits perceived by developers. Finally, this paper describes lessons learned from these studies, highlights the current challenges of efficiently and effectively applying mutation testing in an industrial-scale software development process, and outlines research directions.
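The mutation-adequacy notion the case study quantifies is conventionally expressed as the mutation score: the fraction of non-equivalent mutants a test set kills, with a score of 1.0 meaning the test set is mutation adequate. A minimal sketch (function name and numbers are illustrative):

```python
def mutation_score(killed: int, total: int, equivalent: int = 0) -> float:
    """Fraction of non-equivalent mutants detected (killed) by a test set.
    A score of 1.0 means the test set is mutation adequate."""
    effective = total - equivalent
    return killed / effective if effective else 1.0

# 200 mutants generated, 20 of them equivalent (undetectable by any test):
print(mutation_score(killed=180, total=200, equivalent=20))  # → 1.0 (adequate)
print(mutation_score(killed=150, total=200, equivalent=20))  # ≈ 0.833
```

The denominator is the hard part in practice: deciding which mutants are equivalent is undecidable in general, which is one reason the paper finds full adequacy impractical as a goal.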
State of Mutation Testing at Google
Proceedings of the 40th International Conference on Software Engineering (SEIP) (2018)
Abstract
Mutation testing assesses test suite efficacy by inserting small faults into programs and measuring the ability of the test suite to detect them. It is widely considered the strongest test criterion in terms of finding the most faults, and it subsumes a number of other coverage criteria. Traditional mutation analysis is computationally prohibitive, which hinders its adoption as an industry standard. In order to alleviate the computational issues, we present a diff-based probabilistic approach to mutation analysis that drastically reduces the number of mutants by omitting lines of code without statement coverage and lines that are determined to be uninteresting; we dub these arid lines. Furthermore, by reducing the number of mutants and carefully selecting only the most interesting ones, we make it easier for humans to understand and evaluate the result of mutation analysis. We propose a heuristic for judging whether a node is arid or not, conditioned on the programming language. We focus on a code-review based approach and consider the effects of surfacing mutation results on developer attention. The described system is used by 6,000 engineers at Google on all code changes they author or review, affecting in total more than 14,000 code authors as part of the mandatory code review process. The system processes about 30% of all diffs across Google that have statement coverage calculated.
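The arid-line idea can be sketched as a per-statement predicate over the program's AST. The predicate below is purely illustrative (flagging logging and print calls as arid); the paper's actual heuristic is curated and conditioned on the programming language.

```python
import ast

# Hypothetical arid check: statements whose only effect is a call with one
# of these prefixes are treated as uninteresting and never mutated.
ARID_CALL_PREFIXES = ("logging.", "print")

def is_arid(stmt: ast.stmt) -> bool:
    """Crude per-statement arid predicate on a Python AST node."""
    if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
        call_target = ast.unparse(stmt.value.func)
        return call_target.startswith(ARID_CALL_PREFIXES)
    return False

code = "logging.info('start')\ntotal = a + b\nprint(total)"
tree = ast.parse(code)
print([is_arid(s) for s in tree.body])  # → [True, False, True]
```

Only the assignment on the middle line would be mutated; mutants in the two logging/printing statements would be judged uninteresting to developers and skipped, which is how arid-line filtering cuts mutant counts before any test is run.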