The DaCapo Benchmarks: Java Benchmarking Development and Analysis

Stephen M. Blackburn
Robin Garner
Chris Hoffmann
Asjad M. Khan
Kathryn S. McKinley
Rotem Bentzur
Amer Diwan
Daniel Feinberg
Daniel Frampton
Samuel Z. Guyer
Martin Hirzel
Antony Hosking
Maria Jump
Han Lee
J. Eliot B. Moss
Aashish Phansalkar
Darko Stefanović
Thomas VanDrunen
Daniel von Dincklage
Ben Wiedermann
Proceedings of OOPSLA, ACM (2006)

Abstract

Since benchmarks drive computer science research and industry
product development, which ones we use and how we evaluate
them are key questions for the community. Despite complex runtime
tradeoffs due to dynamic compilation and garbage collection
required for Java programs, many evaluations still use methodologies
developed for C, C++, and Fortran. SPEC, the dominant purveyor
of benchmarks, compounded this problem by institutionalizing
these methodologies for their Java benchmark suite. This paper
recommends benchmarking selection and evaluation methodologies,
and introduces the DaCapo benchmarks, a set of open source,
client-side Java benchmarks. We demonstrate that the complex interactions
of (1) architecture, (2) compiler, (3) virtual machine, (4)
memory management, and (5) application require more extensive
evaluation than C, C++, and Fortran, which stress (4) much less and
do not require (3). We use and introduce new value, time-series,
and statistical metrics for static and dynamic properties such as
code complexity, code size, heap composition, and pointer mutations.
No benchmark suite is definitive, but these metrics show that
DaCapo improves over SPEC Java in a variety of ways, including
more complex code, richer object behaviors, and more demanding
memory system requirements. This paper takes a step towards improving
methodologies for choosing and evaluating benchmarks to
foster innovation in system design and implementation for Java and
other managed languages.
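
To make the abstract's methodological point concrete: dynamic compilation means that early iterations of a Java workload pay JIT costs that later iterations do not, so timing a single run (the C, C++, and Fortran habit the paper critiques) conflates compile time with steady-state performance. The sketch below is illustrative only and not from the paper; the workload, iteration count, and class name are arbitrary choices.

    import java.util.Arrays;
    import java.util.Random;

    // Illustrative sketch (not from the paper): timing repeated
    // in-process iterations of one workload exposes JIT warmup.
    // Early iterations are typically slower while hot methods are
    // compiled; later iterations approach steady state.
    public class WarmupDemo {
        public static void main(String[] args) {
            int[] data = new Random(42).ints(1_000_000).toArray();
            for (int iteration = 1; iteration <= 10; iteration++) {
                int[] copy = Arrays.copyOf(data, data.length);
                long start = System.nanoTime();
                Arrays.sort(copy);
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("iteration %2d: %d ms%n", iteration, elapsedMs);
            }
        }
    }

Reporting first-iteration and steady-state times separately, rather than as one number, is the kind of practice the abstract argues managed languages require.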
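
The call for statistical metrics can likewise be illustrated with a minimal summary over repeated runs. This sketch is mine, not the paper's harness: the run times are made up, and the 1.96 z-value assumes a normal approximation for a 95% confidence interval.

    // Hedged sketch: summarize repeated benchmark runs as a mean
    // with a 95% confidence interval instead of a single number.
    // The sample times below are hypothetical.
    public class RunSummary {
        static double mean(double[] xs) {
            double sum = 0;
            for (double x : xs) sum += x;
            return sum / xs.length;
        }

        static double sampleStdDev(double[] xs, double mean) {
            double sumSq = 0;
            for (double x : xs) sumSq += (x - mean) * (x - mean);
            return Math.sqrt(sumSq / (xs.length - 1));
        }

        public static void main(String[] args) {
            double[] runTimesMs = {812, 798, 805, 821, 809}; // hypothetical
            double m = mean(runTimesMs);
            double halfWidth = 1.96 * sampleStdDev(runTimesMs, m)
                    / Math.sqrt(runTimesMs.length);
            System.out.printf("mean = %.1f ms +/- %.1f ms (95%% CI)%n",
                    m, halfWidth);
        }
    }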