Hannes Payer
Authored Publications
A collaborative approach to reclaiming memory in heterogeneous software systems.
Concurrent Marking of Shape-Changing Objects
Ulan Degenbaev
Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management, ACM, New York, NY, USA, pp. 89-102
Efficient garbage collection is a key goal in engineering high-performance runtime systems. To reduce pause times, many collector designs traverse the object graph concurrently with the application, an optimization known as concurrent marking. Traditional concurrent marking imposes strict invariants on the object shapes: 1) static type layout of objects, 2) static object memory locations, 3) static object sizes. High performance virtual machines for dynamic languages, for example, the V8 JavaScript virtual machine used in the Google Chrome web browser, generally violate these constraints in pursuit of high throughput for a single thread. Taking V8 as an example, we show that some object shape changes are safe and can be handled by traditional concurrent marking algorithms. For unsafe shape changes, we introduce novel wait-free object snapshotting and lock-based concurrent marking algorithms and prove that they preserve key invariants. We implemented both algorithms in V8 and achieved performance improvements on various JavaScript benchmark suites and real-world web workloads. Concurrent marking of shape-changing objects using the wait-free object snapshotting algorithm is enabled by default in Chrome since version 64.
Cross-Component Garbage Collection
Ulan Degenbaev
Proceedings of the ACM on Programming Languages, vol. 2, issue OOPSLA (2018), 151:1-151:24
Embedding a modern language runtime as a component in a larger software system is popular these days. Communication between these systems often requires keeping references to each other's objects. In this paper we present and discuss the problem of cross-component memory management where reference cycles across component boundaries may lead to memory leaks and premature reclamation of objects may lead to dangling cross-component references. We provide a generic algorithm for effective, efficient, and safe garbage collection over component boundaries, which we call cross-component tracing. We designed and implemented cross-component tracing in the Chrome web browser where the JavaScript virtual machine V8 is embedded into the rendering engine Blink. Cross-component tracing from V8's JavaScript heap to Blink's C++ heap improves garbage collection latency and eliminates long-standing memory leaks for real websites in Chrome. We show how cross-component tracing can help web developers to reason about reachability and retainment of objects spanning both V8 and Blink components based on Chrome's heap snapshot memory tool. Cross-component tracing was enabled by default for all websites in Chrome version 57 and is also deployed in other widely used software systems such as Opera, Cobalt, and Electron.
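The leak described in this abstract can be illustrated with a toy model. The sketch below is not the algorithm from the paper: the `Node` class, the function names, and the object names are all hypothetical, and the "isolated" collector is a simplified stand-in for collecting each component separately while treating incoming cross-component references as roots. It only shows why a reference cycle that spans two components survives isolated collection but is reclaimed when a single trace follows references across the boundary.

```python
class Node:
    """A heap object owned by one component (component labels are illustrative)."""

    def __init__(self, name, component):
        self.name = name
        self.component = component
        self.refs = []


def unified_trace(roots):
    """Mark everything reachable from the roots, crossing component boundaries."""
    marked, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node not in marked:
            marked.add(node)
            stack.extend(node.refs)
    return marked


def isolated_trace(heap, roots, component):
    """Collect one component alone: objects referenced from the other
    component are conservatively treated as additional roots."""
    incoming = [
        target
        for node in heap
        if node.component != component
        for target in node.refs
        if target.component == component
    ]
    marked = set()
    stack = [r for r in roots if r.component == component] + incoming
    while stack:
        node = stack.pop()
        if node.component == component and node not in marked:
            marked.add(node)
            stack.extend(node.refs)
    return marked


def names(nodes):
    return sorted(node.name for node in nodes)


# A live pair (app -> element) and an unreachable cross-component cycle.
app = Node("app", "V8")
element = Node("element", "Blink")
wrapper = Node("wrapper", "V8")
detached = Node("detached", "Blink")
app.refs.append(element)
wrapper.refs.append(detached)
detached.refs.append(wrapper)
heap = [app, element, wrapper, detached]

live_unified = names(unified_trace([app]))
live_isolated = names(
    isolated_trace(heap, [app], "V8") | isolated_trace(heap, [app], "Blink")
)
```

In this model the unified trace keeps only `app` and `element`, while the two isolated traces together keep all four objects, so the `wrapper`/`detached` cycle is never reclaimed.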
Over the last years, web browsing has been steadily shifting from desktop computers to mobile devices like smartphones and tablets. However, mobile browsers available today have mainly focused on performance rather than power consumption, although the battery life of a mobile device is one of the most important usability metrics. This is because many of these browsers have originated in the desktop domain and have been ported to the mobile domain. Such browsers have multiple power hungry components such as the rendering engine, and the JavaScript engine, and generate high workload without considering the capabilities and the power consumption characteristics of the underlying hardware platform. Also, the lack of coordination between a browser application and the power manager in the operating system (such as Android) results in poor power savings. In this paper, we propose a power manager that takes into account the internal state of a browser – that we refer to as a phase – and show with Google’s Chrome running on Android that up to 57.4% more energy can be saved over Android’s default power managers. We implemented and evaluated our technique on a heterogeneous multi-processing (HMP) ARM big.LITTLE platform such as the ones found in most modern smartphones.
Idle Time Garbage Collection Scheduling
Ulan Degenbaev
Manfred Ernst
37th annual ACM SIGPLAN conference on Programming Language Design and Implementation, ACM, New York, NY, USA (2016), pp. 570-583
Efficient garbage collection is increasingly important in today's managed language runtime systems that demand low latency, low memory consumption, and high throughput. Garbage collection may pause the application for many milliseconds to identify live memory, free unused memory, and compact fragmented regions of memory, even when employing concurrent garbage collection. In animation-based applications that require 60 frames per second, these pause times may be observable, degrading user experience. This paper introduces idle time garbage collection scheduling to increase the responsiveness of applications by hiding expensive garbage collection operations inside of small, otherwise unused idle portions of the application's execution, resulting in smoother animations. Additionally we take advantage of idleness to reduce memory consumption while allowing higher memory use when high throughput is required. We implemented idle time garbage collection scheduling in V8, an open-source, production JavaScript virtual machine running within Chrome. We present performance results on various benchmarks running popular webpages and show that idle time garbage collection scheduling can significantly improve latency and memory consumption. Furthermore, we introduce a new metric called frame time discrepancy to quantify the quality of the user experience and precisely measure the improvements that idle time garbage collection scheduling provides for a WebGL-based game benchmark. Idle time garbage collection scheduling is shipped and enabled by default in Chrome.
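The core idea of fitting garbage collection work into idle time can be sketched as a small budgeting problem. The following is a minimal illustration under stated assumptions, not V8's scheduler: the `GcTask` type, the task names, the cost estimates, and the greedy selection policy are all hypothetical. A task runs only when its estimated duration fits into the idle time left before the next frame is due.

```python
from dataclasses import dataclass


@dataclass
class GcTask:
    name: str
    estimated_ms: float  # hypothetical cost estimate for this unit of GC work


def schedule_idle_gc(pending, idle_ms):
    """Greedily pick pending GC tasks that fit into the idle budget.

    Returns (tasks_to_run, tasks_deferred). Tasks that do not fit are
    deferred so that the next frame is not delayed.
    """
    run, deferred = [], []
    remaining = idle_ms
    for task in pending:
        if task.estimated_ms <= remaining:
            run.append(task)
            remaining -= task.estimated_ms
        else:
            deferred.append(task)
    return run, deferred


FRAME_BUDGET_MS = 1000 / 60  # about 16.7 ms per frame at 60 frames per second

pending = [
    GcTask("young-generation collection", 3.0),
    GcTask("incremental marking step", 5.0),
    GcTask("full compaction", 40.0),
]

# Assume the frame's own work took 6 ms, leaving roughly 10.7 ms idle.
idle_ms = FRAME_BUDGET_MS - 6.0
run, deferred = schedule_idle_gc(pending, idle_ms)
```

Here the two short tasks fit into the idle period and the long compaction is deferred to a later, longer idle period.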
Web Browser Workload Characterization for Power Management on HMP Platforms
Nadja Peters
Samarjit Chakraborty
Sangyoung Park
Proceedings of the Eighth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '12) (2016)
The volume of mobile web browsing traffic has significantly increased as well as the complexity of the mobile websites mandating high-performance JavaScript engines such as Google’s V8 to be used on mobile devices. Although there has been a significant improvement in performance of JavaScript engine on mobile phones in recent years, the power consumption reduction has not been addressed much. This paper presents a case study for power management of JavaScript engine V8 from Google in web browsers on a heterogeneous multi-processing (HMP) platform. We analyze the detailed traces of the thread workload generated by the web browser and JavaScript engine, and discuss the power saving potentials in relation to power management policies on Android. We believe that this work will lead to development of practical power management techniques considering thread allocation, dynamic voltage and frequency scaling (DVFS) and power-gating.
Memento Mori: Dynamic Allocation-site-based Optimizations
Michael Stanton
Ben L. Titzer
Proceedings of the 2015 ACM SIGPLAN International Symposium on Memory Management, ACM, New York, NY, USA, pp. 105-117
Languages that lack static typing are ubiquitous in the world of mobile and web applications. The rapid rise of larger applications like interactive web GUIs, games, and cryptography presents a new range of implementation challenges for modern virtual machines to close the performance gap between typed and untyped languages. While all languages can benefit from efficient automatic memory management, languages like JavaScript present extra thrill with innocent-looking but difficult features like dynamically-sized arrays, deletable properties, and prototypes. Optimizing such languages requires complex dynamic techniques with more radical object layout strategies such as dynamically evolving representations for arrays. This paper presents a general approach for gathering temporal allocation site feedback that tackles both the general problem of object lifetime estimation and improves optimization of these problematic language features. We introduce a new implementation technique where allocation mementos processed by the garbage collector and runtime system efficiently tie objects back to allocation sites in the program and dynamically estimate object lifetime, representation, and size to inform three optimizations: pretenuring, pretransitioning, and presizing. Unlike previous work on pretenuring, our system utilizes allocation mementos to achieve fully dynamic allocation-site-based pretenuring in a production system. We implement all of our techniques in V8, a high performance virtual machine for JavaScript, and demonstrate solid performance improvements across a range of benchmarks.
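Allocation-site feedback for pretenuring can be pictured as per-site bookkeeping. The sketch below is an assumption-laden illustration, not the memento mechanism implemented in V8: the class name, the site labels, and the 0.85 threshold are hypothetical. It counts how many objects each allocation site created and how many of them survived a young-generation collection, then recommends pretenuring for sites whose objects mostly survive.

```python
from collections import defaultdict


class AllocationSiteFeedback:
    """Toy per-site survival statistics (all names and values are illustrative)."""

    PRETENURE_THRESHOLD = 0.85  # hypothetical survival ratio cutoff

    def __init__(self):
        self.allocated = defaultdict(int)
        self.survived = defaultdict(int)

    def record_allocation(self, site):
        self.allocated[site] += 1

    def record_survivor(self, site):
        # Called when the collector finds a live object created at this site.
        self.survived[site] += 1

    def survival_ratio(self, site):
        created = self.allocated[site]
        return self.survived[site] / created if created else 0.0

    def should_pretenure(self, site):
        # Long-lived sites would allocate directly in the old generation.
        return self.survival_ratio(site) >= self.PRETENURE_THRESHOLD


feedback = AllocationSiteFeedback()
for _ in range(100):
    feedback.record_allocation("cache.js:12")
    feedback.record_allocation("loop.js:40")
for _ in range(95):
    feedback.record_survivor("cache.js:12")
for _ in range(3):
    feedback.record_survivor("loop.js:40")

long_lived = feedback.should_pretenure("cache.js:12")
short_lived = feedback.should_pretenure("loop.js:40")
```

With these made-up counts, the first site would be pretenured and the second would keep allocating in the young generation.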
Allocation Folding Based on Dominance
Michael Starzinger
Ben L. Titzer
Proceedings of the 2014 International Symposium on Memory Management, ACM, New York, NY, USA
Memory management system performance is of increasing importance in today's managed languages. Two lingering sources of overhead are the direct costs of memory allocations and write barriers. This paper introduces allocation folding, an optimization technique where the virtual machine automatically folds multiple memory allocation operations in optimized code together into a single, larger allocation group. An allocation group comprises multiple objects and requires just a single bounds check in a bump-pointer style allocation, rather than a check for each individual object. More importantly, all objects allocated in a single allocation group are guaranteed to be contiguous after allocation and thus exist in the same generation, which makes it possible to statically remove write barriers for reference stores involving objects in the same allocation group. Unlike object inlining, object fusing, and object colocation, allocation folding requires no special connectivity or ownership relation between the objects in an allocation group. We present our analysis algorithm to determine when it is safe to fold allocations together and discuss our implementation in V8, an open-source, production JavaScript virtual machine. We present performance results for the Octane and Kraken benchmark suites and show that allocation folding is a strong performance improvement, even in the presence of some heap fragmentation. Additionally, we use four hand-selected benchmarks JPEGEncoder, NBody, Soft3D, and Textwriter where allocation folding has a large impact.
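The saving in bounds checks can be made concrete with a toy bump-pointer allocator. This is a minimal sketch under stated assumptions, not the compiler analysis or the V8 implementation from the paper: the `BumpAllocator` class, its method names, and the object sizes are hypothetical. Folding three allocations into one group replaces three limit checks with one and yields the same contiguous addresses.

```python
class BumpAllocator:
    """Toy bump-pointer allocator over a fixed-size region (sizes in bytes)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.top = 0            # next free offset in the region
        self.bounds_checks = 0  # number of limit checks performed so far

    def allocate(self, size):
        """Allocate one object: one bounds check, then bump the pointer."""
        self.bounds_checks += 1
        if self.top + size > self.capacity:
            raise MemoryError("region exhausted; a real VM would start a GC here")
        address = self.top
        self.top += size
        return address

    def allocate_group(self, sizes):
        """Folded allocation: reserve the total size with a single check,
        then carve the individual objects out of the contiguous block."""
        offset = self.allocate(sum(sizes))
        addresses = []
        for size in sizes:
            addresses.append(offset)
            offset += size
        return addresses


sizes = [16, 32, 24]

unfolded = BumpAllocator(1024)
separate = [unfolded.allocate(size) for size in sizes]

folded = BumpAllocator(1024)
grouped = folded.allocate_group(sizes)
```

Both allocators hand out the offsets 0, 16, and 48, but the folded version performs a single bounds check instead of three.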
ACDC-JS: explorative benchmarking of javascript memory management
Martin Aigner
Thomas Huetter
Christoph M. Kirsch
Alexander Miller
Mario Preishuber
Proceedings of the 10th ACM Symposium on Dynamic Languages, ACM, New York, NY, USA (2014), pp. 67-78
We present ACDC-JS, an open-source JavaScript memory management benchmarking tool. ACDC-JS incorporates a heap model based on real web applications and may be configured to expose virtually any relevant performance characteristics of JavaScript memory management systems. ACDC-JS is based on ACDC, a benchmarking tool for C/C++ that models periodic allocation and deallocation behavior (AC) as well as persistent memory (DC). We identify important characteristics of JavaScript mutator behavior and propose a configurable heap model based on typical distributions of these characteristics as foundation for ACDC-JS. We describe heap analyses of 13 real web applications extending existing work on JavaScript behavior analysis. Our experimental results show that ACDC-JS enables performance benchmarking and debugging of state-of-the-art JavaScript virtual machines such as V8 and SpiderMonkey by exposing key aspects of their memory management performance.