Ross McIlroy

Authored Publications
    Over the last years, web browsing has been steadily shifting from desktop computers to mobile devices like smartphones and tablets. However, mobile browsers available today have mainly focused on performance rather than power consumption, although the battery life of a mobile device is one of the most important usability metrics. This is because many of these browsers have originated in the desktop domain and have been ported to the mobile domain. Such browsers have multiple power hungry components such as the rendering engine, and the JavaScript engine, and generate high workload without considering the capabilities and the power consumption characteristics of the underlying hardware platform. Also, the lack of coordination between a browser application and the power manager in the operating system (such as Android) results in poor power savings. In this paper, we propose a power manager that takes into account the internal state of a browser – that we refer to as a phase – and show with Google’s Chrome running on Android that up to 57.4% more energy can be saved over Android’s default power managers. We implemented and evaluated our technique on a heterogeneous multi-processing (HMP) ARM big.LITTLE platform such as the ones found in most modern smartphones.
    Idle Time Garbage Collection Scheduling
    Ulan Degenbaev
    Manfred Ernst
    37th annual ACM SIGPLAN conference on Programming Language Design and Implementation, ACM, New York, NY, USA (2016), pp. 570-583
    Efficient garbage collection is increasingly important in today's managed language runtime systems that demand low latency, low memory consumption, and high throughput. Garbage collection may pause the application for many milliseconds to identify live memory, free unused memory, and compact fragmented regions of memory, even when employing concurrent garbage collection. In animation-based applications that require 60 frames per second, these pause times may be observable, degrading user experience. This paper introduces idle time garbage collection scheduling to increase the responsiveness of applications by hiding expensive garbage collection operations inside of small, otherwise unused idle portions of the application's execution, resulting in smoother animations. Additionally we take advantage of idleness to reduce memory consumption while allowing higher memory use when high throughput is required. We implemented idle time garbage collection scheduling in V8, an open-source, production JavaScript virtual machine running within Chrome. We present performance results on various benchmarks running popular webpages and show that idle time garbage collection scheduling can significantly improve latency and memory consumption. Furthermore, we introduce a new metric called frame time discrepancy to quantify the quality of the user experience and precisely measure the improvements that idle time garbage collection scheduling provides for a WebGL-based game benchmark. Idle time garbage collection scheduling is shipped and enabled by default in Chrome.
    A JVM for the Barrelfish operating system
    Martin Maas
    Proceedings of the 2nd Workshop on Systems for Future Multicore Architectures (2012)
    AC: composable asynchronous IO for native languages
    Tim Harris
    Rebecca Isaacs
    OOPSLA (2011), pp. 903-920
    Hera-JVM: a runtime system for heterogeneous multi-core architectures
    Joe Sventek
    OOPSLA '10 Proceedings of the ACM international conference on Object Oriented Programming Systems Languages and Applications, ACM (2010), pp. 205-222
    Heterogeneous multi-core processors, such as the IBM Cell processor, can deliver high performance. However, these processors are notoriously difficult to program: different cores support different instruction set architectures, and the processor as a whole does not provide coherence between the different cores' local memories. We present Hera-JVM, an implementation of the Java Virtual Machine which operates over the Cell processor, thereby making this platform more readily accessible to mainstream developers. Hera-JVM supports the full Java language; threads from an unmodified Java application can be simultaneously executed on both the main PowerPC-based core and on the additional SPE accelerator cores. Migration of threads between these cores is transparent from the point of view of the application, requiring no modification to Java source code or bytecode. Hera-JVM supports the existing Java Memory Model, even though the underlying hardware does not provide cache coherence between the different core types. We examine Hera-JVM's performance under a series of real-world Java benchmarks from the SpecJVM, Java Grande and Dacapo benchmark suites. These benchmarks show a wide variation in relative performance on the different core types of the Cell processor, depending upon the nature of their workload. Execution of these benchmarks on Hera-JVM can achieve speedups of up to 2.25x by using one of the Cell processor's SPE accelerator cores, compared to execution on the main PowerPC-based core. When all six SPE cores are exploited, parallel workloads can achieve speedups of up to 13x compared to execution on the single PowerPC core.
    Helios: heterogeneous multiprocessing with satellite kernels
    Edmund B. Nightingale
    Orion Hodson
    Chris Hawblitzel
    Galen Hunt
    Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP), ACM (2009), pp. 221-234
    Helios is an operating system designed to simplify the task of writing, deploying, and tuning applications for heterogeneous platforms. Helios introduces satellite kernels, which export a single, uniform set of OS abstractions across CPUs of disparate architectures and performance characteristics. Access to I/O services such as file systems is made transparent via remote message passing, which extends a standard microkernel message-passing abstraction to a satellite kernel infrastructure. Helios retargets applications to available ISAs by compiling from an intermediate language. To simplify deploying and tuning application performance, Helios exposes an affinity metric to developers. Affinity provides a hint to the operating system about whether a process would benefit from executing on the same platform as a service it depends upon. We developed satellite kernels for an XScale programmable I/O card and for cache-coherent NUMA architectures. We offloaded several applications and operating system components, often by changing only a single line of metadata. We show up to a 28% performance improvement by offloading tasks to the XScale I/O card. On a mail-server benchmark, we show a 39% improvement in performance by automatically splitting the application among multiple NUMA domains.
    XenoTiny: Emulating wireless sensor networks on Xen
    Joseph Sventek
    Alasdair Maclean
    Grzegorz Miłoś
    University of Glasgow (2009)
    The large scale and inaccessibility of deployed wireless sensor networks mandate that the code installed in sensor nodes be rigorously tested prior to deployment. Such testing is primarily achieved using discrete event simulators designed to provide “high fidelity” simulation of the communications between nodes. Discrete event simulators, by their very nature, mask race conditions in the code since simulated interrupts never interrupt running code; an additional limitation of most such simulators is the requirement that all simulated nodes execute the same application code, at variance with common practice in actual deployments. Since both of these problems reduce confidence in the deployed system, the focus of this work is to eliminate these problems via complete emulation of wireless sensor networks using virtualization techniques. In particular, a version of TinyOS is described, XenoTiny, which can be executed as a guest domain over the Xen virtualization hypervisor. XenoTiny is well integrated with the TinyOS build process. Since each node runs independently in its own guest domain, race conditions are able to manifest themselves, and each node can run a node-appropriate application. The hardware emulation is performed at the lowest possible hardware abstraction layer, thus maximizing the amount of actual TinyOS code that is executed during emulation. Finally, a novel Xen-specific radio model mechanism has been introduced, easing the introduction of different radio models for use during emulation runs.
    Efficient Dynamic Heap Allocation of Scratch-Pad Memory
    Peter Dickman
    Joe Sventek
    Proc. 7th International Symposium on Memory Management, ACM, Tucson (2008), pp. 31-40