The DaCapo Benchmarks: Java Benchmarking Development and Analysis
Abstract
Since benchmarks drive computer science research and industry
product development, which ones we use and how we evaluate
them are key questions for the community. Despite complex runtime
tradeoffs due to dynamic compilation and garbage collection
required for Java programs, many evaluations still use methodologies
developed for C, C++, and Fortran. SPEC, the dominant purveyor
of benchmarks, compounded this problem by institutionalizing
these methodologies for their Java benchmark suite. This paper
recommends benchmarking selection and evaluation methodologies,
and introduces the DaCapo benchmarks, a set of open source,
client-side Java benchmarks. We demonstrate that the complex interactions
of (1) architecture, (2) compiler, (3) virtual machine, (4)
memory management, and (5) application require more extensive
evaluation than C, C++, and Fortran, which stress (4) much less and
do not require (3). We use and introduce new value, time-series,
and statistical metrics for static and dynamic properties such as
code complexity, code size, heap composition, and pointer mutations.
No benchmark suite is definitive, but these metrics show that
DaCapo improves over SPEC Java in a variety of ways, including
more complex code, richer object behaviors, and more demanding
memory system requirements. This paper takes a step towards improving
methodologies for choosing and evaluating benchmarks to
foster innovation in system design and implementation for Java and
other managed languages.