Michael R. Marty
Mike Marty is an engineer at Google currently working on advanced technologies for its computing platform. His interests include computer architecture, technical infrastructure, distributed software infrastructure, and data center networking. Marty received a PhD in computer science from the University of Wisconsin-Madison.
Authored Publications
Snap: a Microkernel Approach to Host Networking
Jacob Adriaens
Sean Bauer
Carlo Contavalli
Mike Dalton
William C. Evans
Nicholas Kidd
Roman Kononov
Carl Mauer
Emily Musick
Lena Olson
Mike Ryan
Erik Rubow
Kevin Springborn
Valas Valancius
In ACM SIGOPS 27th Symposium on Operating Systems Principles, ACM, New York, NY, USA (2019) (to appear)
This paper presents our design and experience with a microkernel-inspired approach to host networking called Snap. Snap is a userspace networking system that supports Google’s rapidly evolving needs with flexible modules that implement a range of network functions, including edge packet switching, virtualization for our cloud platform, traffic shaping policy enforcement, and a high-performance reliable messaging and RDMA-like service. Snap has been running in production for over three years, supporting the extensible communication needs of several large and critical systems.
Snap enables fast development and deployment of new networking features, leveraging the benefits of address space isolation and the productivity of userspace software development together with support for transparently upgrading networking services without migrating applications off of a machine. At the same time, Snap achieves compelling performance through a modular architecture that promotes principled synchronization with minimal state sharing, and supports real-time scheduling with dynamic scaling of CPU resources through a novel kernel/userspace CPU scheduler co-design. Our evaluation demonstrates over 3x Gbps/core improvement compared to a kernel networking stack for RPC workloads, software-based RDMA-like performance of up to 5M IOPS/core, and transparent upgrades that are largely imperceptible to user applications. Snap is deployed to over half of our fleet of machines and supports the needs of numerous teams.
Low-Overhead Network-on-Chip Support for Location-Oblivious Task Placement
Gwangsun Kim
Lee, M.M.-J.
John Kim
Dennis Abts
IEEE Transactions on Computers, Volume 63, Issue 6 (2014), pp. 1487 - 1500
Many-core processors will have many processing cores with a network-on-chip (NoC) that provides access to shared resources such as main memory and on-chip caches. However, locally fair arbitration in a multi-stage NoC can lead to globally unfair access to shared resources and can impact system-level performance depending on where each task is physically placed. In this work, we propose an arbitration scheme that provides equality of service (EoS) in the network and supports location-oblivious task placement. We propose using probabilistic arbitration combined with distance-based weights to achieve EoS and overcome the limitations of a round-robin arbiter. However, the complexity of probabilistic arbitration results in high area and long latency, which negatively impact performance. To reduce the hardware complexity, we propose a hybrid arbiter that switches between a simple arbiter at low load and a complex arbiter at high load. The hybrid arbiter is enabled by the observation that arbitration only impacts overall performance and global fairness at high load. We evaluate our arbitration scheme with synthetic traffic patterns and GPGPU benchmarks. Our results show that a hybrid arbiter that combines a round-robin arbiter with probabilistic distance-based arbitration reduces performance variation as task placement is varied and also improves average IPC.
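As a rough illustration of the hybrid idea in this abstract, the Python sketch below switches between a round-robin arbiter at low load and a distance-weighted probabilistic arbiter at high load. The class name, the `load_threshold` switch point, the use of raw hop count as the weight, and the way load is passed in are all assumptions made for illustration. The abstract does not specify them, and a hardware arbiter would be structured quite differently.

```python
import random


class HybridArbiter:
    """Round-robin at low load, distance-weighted random choice at high load."""

    def __init__(self, num_ports, load_threshold=0.5, seed=None):
        self.num_ports = num_ports
        self.load_threshold = load_threshold  # hypothetical switch point
        self.next_port = 0                    # round-robin pointer
        self.rng = random.Random(seed)

    def grant(self, requests, load):
        """Return the winning port, or None if nobody is requesting.

        requests maps port -> hop count travelled by that port's packet;
        load is a 0..1 utilization estimate supplied by the caller.
        """
        if not requests:
            return None
        if load < self.load_threshold:
            # Simple arbiter: first requesting port at or after the pointer.
            for offset in range(self.num_ports):
                port = (self.next_port + offset) % self.num_ports
                if port in requests:
                    self.next_port = (port + 1) % self.num_ports
                    return port
        # Complex arbiter: grant probability proportional to a
        # distance-based weight (illustrative: the hop count, at least 1).
        ports = sorted(requests)
        weights = [max(1, requests[p]) for p in ports]
        return self.rng.choices(ports, weights=weights, k=1)[0]
```

For example, `HybridArbiter(num_ports=4).grant({0: 1, 2: 3}, load=0.1)` takes the round-robin path, while the same call with `load=0.9` favors port 2, whose packet has travelled farther.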
Probabilistic Distance-based Arbitration: Providing Equality of Service for Many-core CMPs
Michael M. Lee
John Kim
Dennis Abts
Jae W. Lee
MICRO43: Proceedings of the 43rd Annual International Symposium on Microarchitecture, IEEE/ACM (2010)
Emerging many-core chip multiprocessors will integrate dozens of small processing cores with an on-chip interconnect consisting of point-to-point links. The interconnect enables the processing cores not only to communicate, but also to share common resources such as main memory resources and I/O controllers. In this work, we propose an arbitration scheme to enable equality of service (EoS) in access to a chip's shared resources. That is, we seek to remove any bias in a core's access to a shared resource based on its location within the CMP.
We propose using probabilistic arbitration combined with distance-based weights to achieve EoS and overcome the limitations of a conventional round-robin arbiter. We describe how nonlinear weights need to be used with probabilistic arbiters and propose three different arbitration weight metrics -- fixed weight, constantly increasing weight, and variably increasing weight. By only modifying the arbitration of an on-chip router, we do not require any additional buffers or virtual channels and create a simple, low-cost mechanism for achieving EoS. We evaluate our arbitration scheme across a wide range of traffic patterns. In addition to providing EoS, the proposed arbitration has additional benefits, which include providing quality-of-service features (such as differentiated service) and providing fairness in terms of both throughput and latency that approaches the global fairness achieved with age-based arbitration -- thus providing a more stable network by achieving high sustained throughput beyond saturation.
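The core mechanism in this abstract, a probabilistic grant biased by distance, can be sketched in a few lines of Python. This is only a minimal sketch. The exponential `distance_weight` function is a hypothetical stand-in for a nonlinear weight and is not one of the three metrics named in the abstract, and the function names and the `(source_id, hops)` request format are assumptions.

```python
import random
from collections import Counter


def distance_weight(hops, base=2.0):
    """Hypothetical nonlinear weight that grows with hop count."""
    return base ** hops


def arbitrate(requests, rng):
    """Pick one requester at random, biased toward distant sources.

    requests is a list of (source_id, hops) pairs competing for one output.
    """
    sources = [src for src, _ in requests]
    weights = [distance_weight(hops) for _, hops in requests]
    return rng.choices(sources, weights=weights, k=1)[0]


# Two requesters compete for the same output over many arbitration rounds.
rng = random.Random(0)
requests = [("near", 1), ("far", 4)]
grants = Counter(arbitrate(requests, rng) for _ in range(10_000))
```

With these illustrative weights the distant source wins most rounds. The intent is to offset the advantage that a nearby source would otherwise gain from locally fair arbitration at each hop.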
Energy Proportional Datacenter Networks
Dennis Abts
Peter Klausler
Proceedings of the International Symposium on Computer Architecture, ACM (2010), pp. 338-347
Numerous studies have shown that datacenter computers rarely operate at full utilization, leading to a number of proposals for creating servers that are energy proportional with respect to the computation that they are performing. In this paper, we show that as servers themselves become more energy proportional, the datacenter network can become a significant fraction (up to 50%) of cluster power. We propose several ways to design a high-performance datacenter network whose power consumption is more proportional to the amount of traffic it is moving --- that is, we propose energy proportional datacenter networks.
We first show that a flattened butterfly topology itself is inherently more power efficient than the other commonly proposed topology for high-performance datacenter networks. We then exploit the characteristics of modern plesiochronous links to adjust their power and performance envelopes dynamically. Using a network simulator, driven by both synthetic workloads and production datacenter traces, we characterize and understand design tradeoffs, and demonstrate an 85% reduction in power --- which approaches the ideal energy-proportionality of the network.
Our results also demonstrate two challenges for the designers of future network switches: 1) there is a significant power advantage to having independent control of each unidirectional channel comprising a network link, since many traffic patterns show very asymmetric use, and 2) system designers should work to optimize the high-speed channel designs to be more energy efficient by choosing optimal data rate and equalization technology. Given these assumptions, we demonstrate that energy proportional datacenter communication is indeed possible.
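To make the idea of dynamically adjusting a link's performance envelope more concrete, here is a small Python sketch that picks the slowest link rate able to carry the offered traffic, independently for each direction of a link. The set of rates, the `target_util` headroom parameter, and the function name are hypothetical and do not come from the paper. They are chosen only to show the shape of such a policy.

```python
def pick_link_rate(offered_gbps, rates_gbps=(2.5, 5.0, 10.0, 20.0, 40.0),
                   target_util=0.5):
    """Return the slowest rate that keeps utilization at or below target_util.

    Falls back to the fastest available rate when the offered load is too
    high for any rate to meet the target.
    """
    for rate in sorted(rates_gbps):
        if offered_gbps <= target_util * rate:
            return rate
    return max(rates_gbps)


# Each unidirectional channel is tuned on its own, so an asymmetric traffic
# pattern lets the lightly used direction run at a lower rate.
offered = {"tx": 12.0, "rx": 0.8}
chosen = {direction: pick_link_rate(load) for direction, load in offered.items()}
```

In this example the busy direction ends up at a much higher rate than the idle one, which is the kind of asymmetry that motivates independent control of each channel.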
Preview abstract
Augmenting Amdahl’s Law with a corollary for multicore hardware makes it relevant to future generations of chips with multiple processor cores. Obtaining optimal multicore performance will require further research in both extracting more parallelism and making sequential cores faster.
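The abstract does not state the corollary itself, so the Python sketch below is only an illustration. It shows classic Amdahl's Law, followed by one possible multicore variant in which a chip has a budget of `n` base-core equivalents, each core is built from `r` of them, and a core built from `r` units delivers `sqrt(r)` times the sequential performance of a base core. That cost model, the `perf` function, and all names here are assumptions rather than content taken from the text above.

```python
import math


def amdahl_speedup(f, n):
    """Classic Amdahl's Law: fraction f is parallelizable across n processors."""
    return 1.0 / ((1.0 - f) + f / n)


def symmetric_multicore_speedup(f, n, r, perf=math.sqrt):
    """Speedup of a chip with n/r identical cores, each costing r units.

    Assumed model: the sequential part runs on one core at speed perf(r),
    and the parallel part runs on all n/r cores, each at speed perf(r).
    """
    sequential_time = (1.0 - f) / perf(r)
    parallel_time = f * r / (perf(r) * n)
    return 1.0 / (sequential_time + parallel_time)


# With half the work sequential, a few big cores can beat many small ones.
many_small = symmetric_multicore_speedup(f=0.5, n=16, r=1)
one_big = symmetric_multicore_speedup(f=0.5, n=16, r=16)
```

Under this assumed model, setting `r=1` reduces to classic Amdahl's Law. Larger cores help when the sequential fraction dominates, which is consistent with the abstract's point that both more parallelism and faster sequential cores matter.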