Hannes Payer

Authored Publications
    Concurrent Marking of Shape-Changing Objects
    Ulan Degenbaev
    Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management, ACM, New York, NY, USA, pp. 89-102
    Efficient garbage collection is a key goal in engineering high-performance runtime systems. To reduce pause times, many collector designs traverse the object graph concurrently with the application, an optimization known as concurrent marking. Traditional concurrent marking imposes strict invariants on object shapes: 1) static type layout of objects, 2) static object memory locations, and 3) static object sizes. High-performance virtual machines for dynamic languages, for example the V8 JavaScript virtual machine used in the Google Chrome web browser, generally violate these constraints in pursuit of high throughput for a single thread. Taking V8 as an example, we show that some object shape changes are safe and can be handled by traditional concurrent marking algorithms. For unsafe shape changes, we introduce novel wait-free object snapshotting and lock-based concurrent marking algorithms and prove that they preserve key invariants. We implemented both algorithms in V8 and achieved performance improvements on various JavaScript benchmark suites and real-world web workloads. Concurrent marking of shape-changing objects using the wait-free object snapshotting algorithm has been enabled by default in Chrome since version 64.
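    As background for the "traditional concurrent marking" invariants mentioned above, the following is a minimal, single-threaded sketch of tri-color marking with a Dijkstra-style insertion write barrier, where mutator steps are interleaved with marking steps by hand. It is not the paper's wait-free snapshotting or lock-based algorithm and does not model shape changes; the classes `Obj` and `Marker` and the `demo` function are hypothetical names chosen for illustration.

```python
# Simplified, single-threaded simulation of tri-color marking with a
# Dijkstra-style insertion write barrier. This is the traditional
# technique, not the paper's snapshotting algorithm; names are hypothetical.
from collections import deque

WHITE, GREY, BLACK = "white", "grey", "black"


class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = []      # outgoing references
        self.color = WHITE


class Marker:
    def __init__(self, roots):
        self.worklist = deque()
        for root in roots:
            self.shade(root)

    def shade(self, obj):
        # White objects turn grey and are queued for scanning.
        if obj.color == WHITE:
            obj.color = GREY
            self.worklist.append(obj)

    def step(self):
        # Scan one grey object; returns False when no grey objects remain.
        if not self.worklist:
            return False
        obj = self.worklist.popleft()
        for child in obj.fields:
            self.shade(child)
        obj.color = BLACK
        return True

    def write_barrier(self, holder, value):
        # Mutator store plus barrier: shading the stored value ensures a
        # black holder never ends up pointing to a white object.
        holder.fields.append(value)
        self.shade(value)


def demo(use_barrier=True):
    root, a, b = Obj("root"), Obj("a"), Obj("b")
    root.fields.append(a)
    a.fields.append(b)
    marker = Marker([root])
    marker.step()                  # root is black, a is grey
    # The mutator runs between marking steps and moves b from a to root.
    if use_barrier:
        marker.write_barrier(root, b)
    else:
        root.fields.append(b)      # unsafe store without a barrier
    a.fields.remove(b)
    while marker.step():
        pass
    return {o.name: o.color for o in (root, a, b)}


print(demo())                      # b ends up black: correctly retained
print(demo(use_barrier=False))     # b stays white although it is reachable
```

    Without the barrier, `b` is reachable from the already-scanned `root` but is never shaded, so a sweeper would free a live object; this is the kind of invariant violation that unsafe object shape changes can also provoke.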
    A collaborative approach to reclaiming memory in heterogeneous software systems.
    Cross-Component Garbage Collection
    Ulan Degenbaev
    Proceedings of the ACM on Programming Languages, vol. 2, issue OOPSLA (2018), 151:1-151:24
    Embedding a modern language runtime as a component in a larger software system is popular these days. Communication between these systems often requires keeping references to each other's objects. In this paper we present and discuss the problem of cross-component memory management, where reference cycles across component boundaries may lead to memory leaks and premature reclamation of objects may lead to dangling cross-component references. We provide a generic algorithm for effective, efficient, and safe garbage collection over component boundaries, which we call cross-component tracing. We designed and implemented cross-component tracing in the Chrome web browser, where the JavaScript virtual machine V8 is embedded into the rendering engine Blink. Cross-component tracing from V8's JavaScript heap to Blink's C++ heap improves garbage collection latency and eliminates long-standing memory leaks for real websites in Chrome. We show how cross-component tracing can help web developers reason about reachability and retainment of objects spanning both V8 and Blink components, based on Chrome's heap snapshot memory tool. Cross-component tracing was enabled by default for all websites in Chrome version 57 and is also deployed in other widely used software systems such as Opera, Cobalt, and Electron.
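    To illustrate why reference cycles across a component boundary leak when each component collects on its own, here is a toy model contrasting a single traversal that follows edges across the boundary with a baseline in which each component conservatively treats every incoming cross-component reference as a root. The `Node` class, the component labels `"js"` and `"cpp"`, and both tracing functions are hypothetical simplifications, not V8's or Blink's actual data structures or APIs.

```python
# Hypothetical model of two components ("js" and "cpp") whose objects may
# reference each other; not V8's or Blink's actual data structures.
class Node:
    def __init__(self, name, component):
        self.name = name
        self.component = component
        self.refs = []


def trace_unified(roots):
    """Cross-component style: one traversal follows edges across the boundary."""
    marked, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node in marked:
            continue
        marked.add(node)
        stack.extend(node.refs)
    return marked


def trace_component(component, roots, all_nodes):
    """Baseline: a component traces only its own heap and conservatively
    treats every incoming cross-component reference as a root."""
    stack = [r for r in roots if r.component == component]
    stack += [t for n in all_nodes for t in n.refs
              if n.component != component and t.component == component]
    marked = set()
    while stack:
        node = stack.pop()
        if node in marked or node.component != component:
            continue
        marked.add(node)
        stack.extend(node.refs)
    return marked


def demo():
    global_obj = Node("global", "js")
    wrapper_a, node_a = Node("wrapper_a", "js"), Node("node_a", "cpp")
    wrapper_b, node_b = Node("wrapper_b", "js"), Node("node_b", "cpp")
    global_obj.refs.append(wrapper_a)
    # Live cycle across the boundary, reachable from the root.
    wrapper_a.refs.append(node_a)
    node_a.refs.append(wrapper_a)
    # Garbage cycle across the boundary, unreachable from any root.
    wrapper_b.refs.append(node_b)
    node_b.refs.append(wrapper_b)
    nodes = [global_obj, wrapper_a, node_a, wrapper_b, node_b]
    unified = trace_unified([global_obj])
    separate = (trace_component("js", [global_obj], nodes)
                | trace_component("cpp", [global_obj], nodes))
    return {n.name for n in unified}, {n.name for n in separate}


live_unified, live_separate = demo()
print(sorted(live_unified))    # the unreachable b-cycle is not marked
print(sorted(live_separate))   # the b-cycle is retained: a leak
```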
    Over the last few years, web browsing has been steadily shifting from desktop computers to mobile devices like smartphones and tablets. However, mobile browsers available today have mainly focused on performance rather than power consumption, although the battery life of a mobile device is one of the most important usability metrics. This is because many of these browsers originated in the desktop domain and have been ported to the mobile domain. Such browsers have multiple power-hungry components, such as the rendering engine and the JavaScript engine, and generate high workload without considering the capabilities and power consumption characteristics of the underlying hardware platform. Also, the lack of coordination between a browser application and the power manager in the operating system (such as Android) results in poor power savings. In this paper, we propose a power manager that takes into account the internal state of a browser, which we refer to as a phase, and show with Google's Chrome running on Android that up to 57.4% more energy can be saved over Android's default power managers. We implemented and evaluated our technique on a heterogeneous multi-processing (HMP) ARM big.LITTLE platform such as the ones found in most modern smartphones.
    Web Browser Workload Characterization for Power Management on HMP Platforms
    Nadja Peters
    Samarjit Chakraborty
    Sangyoung Park
    Proceedings of the Eighth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '12) (2016)
    Both the volume of mobile web browsing traffic and the complexity of mobile websites have increased significantly, mandating the use of high-performance JavaScript engines such as Google's V8 on mobile devices. Although the performance of JavaScript engines on mobile phones has improved significantly in recent years, reducing their power consumption has received little attention. This paper presents a case study of power management for Google's V8 JavaScript engine in web browsers on a heterogeneous multi-processing (HMP) platform. We analyze detailed traces of the thread workload generated by the web browser and the JavaScript engine, and discuss the power-saving potential in relation to power management policies on Android. We believe that this work will lead to the development of practical power management techniques that consider thread allocation, dynamic voltage and frequency scaling (DVFS), and power gating.
    Idle Time Garbage Collection Scheduling
    Ulan Degenbaev
    Manfred Ernst
    37th Annual ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM, New York, NY, USA (2016), pp. 570-583
    Efficient garbage collection is increasingly important in today's managed language runtime systems that demand low latency, low memory consumption, and high throughput. Garbage collection may pause the application for many milliseconds to identify live memory, free unused memory, and compact fragmented regions of memory, even when employing concurrent garbage collection. In animation-based applications that require 60 frames per second, these pause times may be observable, degrading user experience. This paper introduces idle time garbage collection scheduling to increase the responsiveness of applications by hiding expensive garbage collection operations inside of small, otherwise unused idle portions of the application's execution, resulting in smoother animations. Additionally, we take advantage of idleness to reduce memory consumption while allowing higher memory use when high throughput is required. We implemented idle time garbage collection scheduling in V8, an open-source, production JavaScript virtual machine running within Chrome. We present performance results on various benchmarks running popular webpages and show that idle time garbage collection scheduling can significantly improve latency and memory consumption. Furthermore, we introduce a new metric called frame time discrepancy to quantify the quality of the user experience and precisely measure the improvements that idle time garbage collection scheduling provides for a WebGL-based game benchmark. Idle time garbage collection scheduling is shipped and enabled by default in Chrome.
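    The core idea of fitting garbage collection work into idle portions of a 60 frames-per-second frame can be sketched as a greedy scheduler: compute the idle time left in the frame, then run only those tasks whose estimated duration fits. This is a simplified illustration under assumptions: the task names, the duration estimates, the greedy policy, and the helper `marking_step_bytes` (which sizes an incremental marking step from an assumed marking speed) are hypothetical and are not taken from V8's implementation.

```python
from dataclasses import dataclass

FRAME_BUDGET_MS = 1000 / 60   # one frame at 60 frames per second


@dataclass
class GcTask:
    name: str
    estimated_ms: float       # hypothetical duration estimate


def idle_time_ms(frame_work_ms):
    """Idle time left in the current frame after rendering work is done."""
    return max(0.0, FRAME_BUDGET_MS - frame_work_ms)


def schedule_idle_tasks(pending, idle_ms):
    """Greedy sketch: run tasks in queue order if their estimate fits the
    remaining idle time; defer the rest to a later idle period."""
    ran, deferred, remaining = [], [], idle_ms
    for task in pending:
        if task.estimated_ms <= remaining:
            ran.append(task.name)
            remaining -= task.estimated_ms
        else:
            deferred.append(task.name)
    return ran, deferred


def marking_step_bytes(idle_ms, marking_speed_bytes_per_ms):
    """Size an incremental marking step so it should finish within the idle time."""
    return int(idle_ms * marking_speed_bytes_per_ms)


pending = [
    GcTask("incremental_marking_step", 2.0),
    GcTask("scavenge", 4.0),
    GcTask("mark_compact", 30.0),
]
idle = idle_time_ms(frame_work_ms=10.0)         # about 6.67 ms of idle time
ran, deferred = schedule_idle_tasks(pending, idle)
print(ran)       # ['incremental_marking_step', 'scavenge']
print(deferred)  # ['mark_compact']
```

    A long operation such as a full mark-compact does not fit into a single frame's idle time here, so it is deferred rather than allowed to overrun the frame deadline and cause a visible jank.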
    Memento Mori: Dynamic Allocation-site-based Optimizations
    Michael Stanton
    Ben L. Titzer
    Proceedings of the 2015 ACM SIGPLAN International Symposium on Memory Management, ACM, New York, NY, USA, pp. 105-117
    Languages that lack static typing are ubiquitous in the world of mobile and web applications. The rapid rise of larger applications like interactive web GUIs, games, and cryptography presents a new range of implementation challenges for modern virtual machines to close the performance gap between typed and untyped languages. While all languages can benefit from efficient automatic memory management, languages like JavaScript present extra thrill with innocent-looking but difficult features like dynamically-sized arrays, deletable properties, and prototypes. Optimizing such languages requires complex dynamic techniques with more radical object layout strategies such as dynamically evolving representations for arrays. This paper presents a general approach for gathering temporal allocation site feedback that tackles both the general problem of object lifetime estimation and improves optimization of these problematic language features. We introduce a new implementation technique where allocation mementos processed by the garbage collector and runtime system efficiently tie objects back to allocation sites in the program and dynamically estimate object lifetime, representation, and size to inform three optimizations: pretenuring, pretransitioning, and presizing. Unlike previous work on pretenuring, our system utilizes allocation mementos to achieve fully dynamic allocation-site-based pretenuring in a production system. We implement all of our techniques in V8, a high performance virtual machine for JavaScript, and demonstrate solid performance improvements across a range of benchmarks.
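    Allocation-site-based pretenuring can be sketched as per-site counters: each young-generation allocation records a memento, survivors found during a scavenge are credited back to their site, and a site whose survival ratio is high enough switches to allocating directly in old space. The sketch below is a hedged illustration only; the `AllocationSite` class, the 0.85 survival threshold, and the 100-allocation minimum sample size are assumptions for illustration, not values confirmed by the paper.

```python
from dataclasses import dataclass

# Both thresholds are assumptions for illustration, not V8's tuned values.
PRETENURE_RATIO = 0.85
MIN_ALLOCATIONS = 100


@dataclass
class AllocationSite:
    name: str
    allocated: int = 0      # young objects allocated with a memento
    survived: int = 0       # mementos found behind survivors during a scavenge
    tenured: bool = False   # allocate directly in old space once set

    def record_allocation(self):
        self.allocated += 1

    def record_survivor(self):
        self.survived += 1

    def decide_after_scavenge(self):
        """Pretenure the site if enough of its objects survived; then reset the window."""
        if (self.allocated >= MIN_ALLOCATIONS
                and self.survived / self.allocated >= PRETENURE_RATIO):
            self.tenured = True
        self.allocated = self.survived = 0
        return self.tenured


def simulate(site, allocations, survivors):
    for _ in range(allocations):
        site.record_allocation()
    for _ in range(survivors):
        site.record_survivor()
    return site.decide_after_scavenge()


print(simulate(AllocationSite("long_lived_cache"), 200, 190))  # True
print(simulate(AllocationSite("temp_array"), 200, 20))         # False
print(simulate(AllocationSite("rare_site"), 10, 10))           # False: too few samples
```

    The minimum sample size guards against pretenuring a site on the basis of a handful of allocations; the real tradeoff is that pretenured objects skip the cheap young-generation collection and must be reclaimed by the more expensive old-generation collector if the prediction is wrong.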
    ACDC-JS: Explorative Benchmarking of JavaScript Memory Management
    Martin Aigner
    Thomas Huetter
    Christoph M. Kirsch
    Alexander Miller
    Mario Preishuber
    Proceedings of the 10th ACM Symposium on Dynamic Languages, ACM, New York, NY, USA (2014), pp. 67-78
    We present ACDC-JS, an open-source JavaScript memory management benchmarking tool. ACDC-JS incorporates a heap model based on real web applications and may be configured to expose virtually any relevant performance characteristics of JavaScript memory management systems. ACDC-JS is based on ACDC, a benchmarking tool for C/C++ that models periodic allocation and deallocation behavior (AC) as well as persistent memory (DC). We identify important characteristics of JavaScript mutator behavior and propose a configurable heap model, based on typical distributions of these characteristics, as the foundation for ACDC-JS. We describe heap analyses of 13 real web applications, extending existing work on JavaScript behavior analysis. Our experimental results show that ACDC-JS enables performance benchmarking and debugging of state-of-the-art JavaScript virtual machines such as V8 and SpiderMonkey by exposing key aspects of their memory management performance.
    Allocation Folding Based on Dominance
    Michael Starzinger
    Ben L. Titzer
    Proceedings of the 2014 International Symposium on Memory Management, ACM, New York, NY, USA
    Memory management system performance is of increasing importance in today's managed languages. Two lingering sources of overhead are the direct costs of memory allocations and write barriers. This paper introduces allocation folding, an optimization technique where the virtual machine automatically folds multiple memory allocation operations in optimized code together into a single, larger allocation group. An allocation group comprises multiple objects and requires just a single bounds check in a bump-pointer style allocation, rather than a check for each individual object. More importantly, all objects allocated in a single allocation group are guaranteed to be contiguous after allocation and thus exist in the same generation, which makes it possible to statically remove write barriers for reference stores involving objects in the same allocation group. Unlike object inlining, object fusing, and object colocation, allocation folding requires no special connectivity or ownership relation between the objects in an allocation group. We present our analysis algorithm to determine when it is safe to fold allocations together and discuss our implementation in V8, an open-source, production JavaScript virtual machine. We present performance results for the Octane and Kraken benchmark suites and show that allocation folding is a strong performance improvement, even in the presence of some heap fragmentation. Additionally, we use four hand-selected benchmarks where allocation folding has a large impact: JPEGEncoder, NBody, Soft3D, and Textwriter.
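    The bounds-check saving described above can be made concrete with a toy bump-pointer space: allocating three objects separately costs three bounds checks, while folding them into one group costs one, and the group's objects land at consecutive addresses. The `BumpAllocator` class and the two helper functions are hypothetical illustrations; they do not model V8's dominance-based analysis that decides when folding is safe.

```python
class BumpAllocator:
    """Hypothetical bump-pointer space: [top, limit) is free memory."""

    def __init__(self, capacity):
        self.top = 0
        self.limit = capacity
        self.bounds_checks = 0

    def allocate(self, size):
        """Reserve size bytes; returns the start address, or None when full."""
        self.bounds_checks += 1
        if self.top + size > self.limit:
            return None  # a real VM would take a slow path or trigger a GC here
        start = self.top
        self.top += size
        return start


def allocate_unfolded(space, sizes):
    # One bounds check per object.
    return [space.allocate(size) for size in sizes]


def allocate_folded(space, sizes):
    # One bounds check for the whole group; objects are carved out of the
    # group at fixed offsets, so they are contiguous and in the same space.
    base = space.allocate(sum(sizes))
    if base is None:
        return None
    addresses, offset = [], base
    for size in sizes:
        addresses.append(offset)
        offset += size
    return addresses


folded_space, unfolded_space = BumpAllocator(1024), BumpAllocator(1024)
print(allocate_folded(folded_space, [16, 24, 32]), folded_space.bounds_checks)        # [0, 16, 40] 1
print(allocate_unfolded(unfolded_space, [16, 24, 32]), unfolded_space.bounds_checks)  # [0, 16, 40] 3
```

    Because the folded objects come from one reservation, they are guaranteed to sit in the same space, which is what lets a compiler drop generational write barriers for stores between them.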