![Ross McIlroy](https://storage.googleapis.com/gweb-research2023-media/pubtools/643.png)
Ross McIlroy
Authored Publications
Over the last few years, web browsing has been steadily shifting from desktop computers to mobile devices such as smartphones and tablets. However, the mobile browsers available today have mainly focused on performance rather than power consumption, although the battery life of a mobile device is one of the most important usability metrics. This is because many of these browsers originated in the desktop domain and were later ported to the mobile domain. Such browsers have multiple power-hungry components, such as the rendering engine and the JavaScript engine, and generate high workloads without considering the capabilities and power consumption characteristics of the underlying hardware platform. Also, the lack of coordination between a browser application and the power manager in the operating system (such as Android) results in poor power savings. In this paper, we propose a power manager that takes into account the internal state of a browser, which we refer to as a phase, and show with Google’s Chrome running on Android that up to 57.4% more energy can be saved over Android’s default power managers. We implemented and evaluated our technique on a heterogeneous multi-processing (HMP) ARM big.LITTLE platform, such as the ones found in most modern smartphones.
Idle Time Garbage Collection Scheduling
Ulan Degenbaev
Manfred Ernst
37th annual ACM SIGPLAN conference on Programming Language Design and Implementation, ACM, New York, NY, USA (2016), pp. 570-583
Efficient garbage collection is increasingly important in today's managed language runtime systems that demand low latency, low memory consumption, and high throughput. Garbage collection may pause the application for many milliseconds to identify live memory, free unused memory, and compact fragmented regions of memory, even when employing concurrent garbage collection. In animation-based applications that require 60 frames per second, these pause times may be observable, degrading user experience. This paper introduces idle time garbage collection scheduling to increase the responsiveness of applications by hiding expensive garbage collection operations inside small, otherwise unused idle portions of the application's execution, resulting in smoother animations. Additionally, we take advantage of idleness to reduce memory consumption while allowing higher memory use when high throughput is required. We implemented idle time garbage collection scheduling in V8, an open-source, production JavaScript virtual machine running within Chrome. We present performance results on various benchmarks running popular webpages and show that idle time garbage collection scheduling can significantly improve latency and memory consumption. Furthermore, we introduce a new metric called frame time discrepancy to quantify the quality of the user experience and precisely measure the improvements that idle time garbage collection scheduling provides for a WebGL-based game benchmark. Idle time garbage collection scheduling is shipped and enabled by default in Chrome.
A JVM for the Barrelfish operating system
Martin Maas
Proceedings of the 2nd Workshop on Systems for Future Multicore Architectures (2012)
AC: composable asynchronous IO for native languages
Hera-JVM: a runtime system for heterogeneous multi-core architectures
Joe Sventek
OOPSLA '10 Proceedings of the ACM international conference on Object Oriented Programming Systems Languages and Applications, ACM (2010), pp. 205-222
Heterogeneous multi-core processors, such as the IBM Cell processor, can deliver high performance. However, these processors are notoriously difficult to program: different cores support different instruction set architectures, and the processor as a whole does not provide coherence between the different cores' local memories.
We present Hera-JVM, an implementation of the Java Virtual Machine which operates over the Cell processor, thereby making this platform more readily accessible to mainstream developers. Hera-JVM supports the full Java language; threads from an unmodified Java application can be simultaneously executed on both the main PowerPC-based core and on the additional SPE accelerator cores. Migration of threads between these cores is transparent from the point of view of the application, requiring no modification to Java source code or bytecode. Hera-JVM supports the existing Java Memory Model, even though the underlying hardware does not provide cache coherence between the different core types.
We examine Hera-JVM's performance under a series of real-world Java benchmarks from the SPECjvm, Java Grande, and DaCapo benchmark suites. These benchmarks show a wide variation in relative performance on the different core types of the Cell processor, depending upon the nature of their workload. Execution of these benchmarks on Hera-JVM can achieve speedups of up to 2.25x by using one of the Cell processor's SPE accelerator cores, compared to execution on the main PowerPC-based core. When all six SPE cores are exploited, parallel workloads can achieve speedups of up to 13x compared to execution on the single PowerPC core.
Helios: heterogeneous multiprocessing with satellite kernels
Edmund B. Nightingale
Orion Hodson
Chris Hawblitzel
Galen Hunt
Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP), ACM (2009), pp. 221-234
Helios is an operating system designed to simplify the task of writing, deploying, and tuning applications for heterogeneous platforms. Helios introduces satellite kernels, which export a single, uniform set of OS abstractions across CPUs of disparate architectures and performance characteristics. Access to I/O services such as file systems is made transparent via remote message passing, which extends a standard microkernel message-passing abstraction to a satellite kernel infrastructure. Helios retargets applications to available ISAs by compiling from an intermediate language. To simplify deploying and tuning application performance, Helios exposes an affinity metric to developers. Affinity provides a hint to the operating system about whether a process would benefit from executing on the same platform as a service it depends upon.
We developed satellite kernels for an XScale programmable I/O card and for cache-coherent NUMA architectures. We offloaded several applications and operating system components, often by changing only a single line of metadata. We show up to a 28% performance improvement by offloading tasks to the XScale I/O card. On a mail-server benchmark, we show a 39% improvement in performance by automatically splitting the application among multiple NUMA domains.
XenoTiny: Emulating wireless sensor networks on Xen
The large scale and inaccessibility of deployed wireless sensor networks mandate that the code installed in sensor nodes be rigorously tested prior to deployment. Such testing is primarily achieved using discrete event simulators designed to provide “high fidelity” simulation of the communications between nodes. Discrete event simulators, by their very nature, mask race conditions in the code since simulated interrupts never interrupt running code; an additional limitation of most such simulators is the requirement that all simulated nodes execute the same application code, at variance with common practice in actual deployments. Since both of these problems reduce confidence in the deployed system, the focus of this work is to eliminate these problems via complete emulation of wireless sensor networks using virtualization techniques. In particular, a version of TinyOS is described, XenoTiny, which can be executed as a guest domain over the Xen virtualization hypervisor. XenoTiny is well integrated with the TinyOS build process. Since each node runs independently in its own guest domain, race conditions are able to manifest themselves, and each node can run a node-appropriate application. The hardware emulation is performed at the lowest possible hardware abstraction layer, thus maximizing the amount of actual TinyOS code that is executed during emulation. Finally, a novel Xen-specific radio model mechanism has been introduced, easing the introduction of different radio models for use during emulation runs.
Efficient Dynamic Heap Allocation of Scratch-Pad Memory
Peter Dickman
Joe Sventek
Proc. 7th International Symposium on Memory Management, ACM, Tucson (2008), pp. 31-40