Michael R. Marty

Mike Marty is an engineer at Google currently working on advanced technologies for its computing platform. His interests include computer architecture, technical infrastructure, distributed software infrastructure, and data center networking. Marty received a PhD in computer science from the University of Wisconsin-Madison.
Authored Publications
    Snap: a Microkernel Approach to Host Networking
    Jacob Adriaens
    Sean Bauer
    Carlo Contavalli
    Mike Dalton
    William C. Evans
    Nicholas Kidd
    Roman Kononov
    Carl Mauer
    Emily Musick
    Lena Olson
    Mike Ryan
    Erik Rubow
    Kevin Springborn
    Valas Valancius
    In ACM SIGOPS 27th Symposium on Operating Systems Principles, ACM, New York, NY, USA (2019) (to appear)
    This paper presents our design and experience with a microkernel-inspired approach to host networking called Snap. Snap is a userspace networking system that supports Google’s rapidly evolving needs with flexible modules that implement a range of network functions, including edge packet switching, virtualization for our cloud platform, traffic shaping policy enforcement, and a high-performance reliable messaging and RDMA-like service. Snap has been running in production for over three years, supporting the extensible communication needs of several large and critical systems. Snap enables fast development and deployment of new networking features, leveraging the benefits of address space isolation and the productivity of userspace software development together with support for transparently upgrading networking services without migrating applications off of a machine. At the same time, Snap achieves compelling performance through a modular architecture that promotes principled synchronization with minimal state sharing, and supports real-time scheduling with dynamic scaling of CPU resources through a novel kernel/userspace CPU scheduler co-design. Our evaluation demonstrates over 3x Gbps/core improvement compared to a kernel networking stack for RPC workloads, software-based RDMA-like performance of up to 5M IOPS/core, and transparent upgrades that are largely imperceptible to user applications. Snap is deployed to over half of our fleet of machines and supports the needs of numerous teams.
    Low-Overhead Network-on-Chip Support for Location-Oblivious Task Placement
    Gwangsun Kim
    M. M.-J. Lee
    John Kim
    Dennis Abts
    IEEE Transactions on Computers, vol. 63, no. 6 (2014), pp. 1487-1500
    Many-core processors will have many processing cores with a network-on-chip (NoC) that provides access to shared resources such as main memory and on-chip caches. However, locally-fair arbitration in a multi-stage NoC can lead to globally unfair access to shared resources and impact system-level performance depending on where each task is physically placed. In this work, we propose an arbitration scheme to provide equality-of-service (EoS) in the network and provide support for location-oblivious task placement. We propose using probabilistic arbitration combined with distance-based weights to achieve EoS and overcome the limitation of a round-robin arbiter. However, the complexity of probabilistic arbitration results in high area and long latency, which negatively impacts performance. In order to reduce the hardware complexity, we propose a hybrid arbiter that switches between a simple arbiter at low load and a complex arbiter at high load. The hybrid arbiter is enabled by the observation that arbitration only impacts the overall performance and global fairness at a high load. We evaluate our arbitration scheme with synthetic traffic patterns and GPGPU benchmarks. Our results show that a hybrid arbiter that combines a round-robin arbiter with probabilistic distance-based arbitration reduces performance variation as task placement is varied and also improves average IPC.
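The claim that locally fair arbitration can be globally unfair can be illustrated with a toy model. The sketch below is not the paper's evaluation setup. It assumes a simple chain of routers leading to one shared resource, where every router uses a two-input round-robin arbiter to merge its local source with the link from farther nodes, and every source is saturated. The function name `round_robin_chain_shares` is hypothetical.

```python
from fractions import Fraction


def round_robin_chain_shares(num_nodes):
    """Return each node's share of a shared resource's bandwidth.

    Toy model (an assumption, not taken from the paper): node 0 sits next
    to the resource and node num_nodes - 1 is farthest away. Each router
    splits its output evenly between its local source and the link that
    carries traffic from all farther nodes.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    shares = []
    remaining = Fraction(1)  # bandwidth not yet assigned to a node
    for node in range(num_nodes):
        if node == num_nodes - 1:
            # The farthest node has no upstream competitor at its router.
            shares.append(remaining)
        else:
            # Round-robin gives half to the local source, half upstream.
            remaining /= 2
            shares.append(remaining)
    return shares


if __name__ == "__main__":
    for node, share in enumerate(round_robin_chain_shares(4)):
        print(f"node {node}: {share}")
```

Under these assumptions a four-node chain yields shares of 1/2, 1/4, 1/8, and 1/8, so a task's bandwidth depends strongly on where it is placed, even though every individual arbiter is fair.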
    Energy Proportional Datacenter Networks
    Dennis Abts
    Peter Klausler
    Proceedings of the International Symposium on Computer Architecture, ACM (2010), pp. 338-347
    Numerous studies have shown that datacenter computers rarely operate at full utilization, leading to a number of proposals for creating servers that are energy proportional with respect to the computation that they are performing. In this paper, we show that as servers themselves become more energy proportional, the datacenter network can become a significant fraction (up to 50%) of cluster power. We propose several ways to design a high-performance datacenter network whose power consumption is more proportional to the amount of traffic it is moving; that is, we propose energy proportional datacenter networks. We first show that a flattened butterfly topology itself is inherently more power efficient than the other commonly proposed topology for high-performance datacenter networks. We then exploit the characteristics of modern plesiochronous links to adjust their power and performance envelopes dynamically. Using a network simulator, driven by both synthetic workloads and production datacenter traces, we characterize and understand design tradeoffs, and demonstrate an 85% reduction in power, which approaches the ideal energy-proportionality of the network. Our results also demonstrate two challenges for the designers of future network switches: 1) We show that there is a significant power advantage to having independent control of each unidirectional channel comprising a network link, since many traffic patterns show very asymmetric use, and 2) system designers should work to optimize the high-speed channel designs to be more energy efficient by choosing optimal data rate and equalization technology. Given these assumptions, we demonstrate that energy proportional datacenter communication is indeed possible.
    Probabilistic Distance-based Arbitration: Providing Equality of Service for Many-core CMPs
    Michael M. Lee
    John Kim
    Dennis Abts
    Jae W. Lee
    MICRO43: Proceedings of the 43rd Annual International Symposium on Microarchitecture, IEEE/ACM (2010)
    Emerging many-core chip multiprocessors will integrate dozens of small processing cores with an on-chip interconnect consisting of point-to-point links. The interconnect enables the processing cores to not only communicate, but to share common resources such as main memory resources and I/O controllers. In this work, we propose an arbitration scheme to enable equality of service (EoS) in access to a chip's shared resources. That is, we seek to remove any bias in a core's access to a shared resource based on its location within the CMP. We propose using probabilistic arbitration combined with distance-based weights to achieve EoS and overcome the limitation of a conventional round-robin arbiter. We describe how nonlinear weights need to be used with probabilistic arbiters and propose three different arbitration weight metrics: fixed weight, constantly increasing weight, and variably increasing weight. By only modifying the arbitration of an on-chip router, we do not require any additional buffers or virtual channels and create a simple, low-cost mechanism for achieving EoS. We evaluate our arbitration scheme across a wide range of traffic patterns. In addition to providing EoS, the proposed arbitration has additional benefits which include providing quality-of-service features (such as differentiated service) and providing fairness in terms of both throughput and latency that approaches the global fairness achieved with age-based arbitration, thus providing a more stable network by achieving high sustained throughput beyond saturation.
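The basic idea of probabilistic arbitration with distance-based weights can be sketched in a few lines. This is a minimal illustration, not the paper's design. The exponential `weight` function and its `base` parameter are assumptions chosen only to show a nonlinear weight, and they are not one of the three weight metrics named in the abstract. The function names are hypothetical.

```python
import random


def weight(hops, base=2.0):
    """Hypothetical nonlinear weight that grows with hops traveled."""
    return base ** hops


def probabilistic_arbitrate(hop_counts, rng):
    """Pick the index of the winning request.

    Each competing request is described by the number of hops its packet
    has already traveled. A request wins with probability proportional to
    its weight, so packets from distant sources are favored without
    starving nearby ones.
    """
    if not hop_counts:
        raise ValueError("at least one request is required")
    weights = [weight(h) for h in hop_counts]
    candidates = list(range(len(hop_counts)))
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs behave the same
    requests = [0, 1, 2]    # hops traveled by three competing packets
    wins = [0] * len(requests)
    for _ in range(70_000):
        wins[probabilistic_arbitrate(requests, rng)] += 1
    print([w / 70_000 for w in wins])
```

With weights of 1, 2, and 4, the three requests should win roughly 1/7, 2/7, and 4/7 of the time. A round-robin arbiter would instead give each input one third, regardless of how far its packets had traveled.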
    Amdahl's Law in the Multicore Era
    Mark D. Hill
    IEEE Computer, vol. 41 (2008), pp. 33-38
    Augmenting Amdahl’s Law with a corollary for multicore hardware makes it relevant to future generations of chips with multiple processor cores. Obtaining optimal multicore performance will require further research in both extracting more parallelism and making sequential cores faster.
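For reference, the classic form of Amdahl's Law and a commonly cited form of the symmetric-multicore corollary can be written as short functions. The abstract above does not give the formulas, so this is a sketch under stated assumptions: a chip has `n` base-core-equivalent resources, each core is built from `r` of them, and a core built from `r` resources has sequential performance `sqrt(r)`. That square-root performance model is a modeling assumption. The function names are hypothetical.

```python
import math


def amdahl_speedup(f, n):
    """Classic Amdahl's Law: fraction f is parallel across n processors."""
    if not 0.0 <= f <= 1.0 or n < 1:
        raise ValueError("need 0 <= f <= 1 and n >= 1")
    return 1.0 / ((1.0 - f) + f / n)


def symmetric_multicore_speedup(f, n, r):
    """Speedup for a chip of n/r identical cores, each using r resources.

    Assumes perf(r) = sqrt(r) for a core built from r base-core
    equivalents (a modeling assumption, not a measured value).
    """
    if not 0.0 <= f <= 1.0 or not 1 <= r <= n:
        raise ValueError("need 0 <= f <= 1 and 1 <= r <= n")
    perf = math.sqrt(r)
    sequential_time = (1.0 - f) / perf     # one core runs the serial part
    parallel_time = (f * r) / (perf * n)   # n/r cores share the parallel part
    return 1.0 / (sequential_time + parallel_time)


if __name__ == "__main__":
    print(amdahl_speedup(0.5, 2))
    print(symmetric_multicore_speedup(0.5, 16, 16))
```

With `r = 1` the symmetric form reduces to the classic law. Larger `r` trades parallel throughput for faster sequential execution, which is the tension the abstract points to.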
    Virtual Hierarchies
    Mark D. Hill
    IEEE Micro, vol. 28 (2008), pp. 99-109
    LogTM-SE: Decoupling Hardware Transactional Memory from Caches
    Luke Yen
    Jayaram Bobba
    Kevin E. Moore
    Haris Volos
    Mark D. Hill
    Michael M. Swift
    David A. Wood
    In HPCA 13 (2007), pp. 261-272
    Virtual Hierarchies to Support Server Consolidation
    Mark D. Hill
    Proceedings of the 34th annual International Symposium on Computer Architecture (ISCA), ACM (2007), pp. 46-56
    Coherence Ordering for Ring-based Chip Multiprocessors
    Mark D. Hill
    IEEE/ACM International Symposium on Microarchitecture (MICRO) (2006), pp. 309-320
    ASR: Adaptive Selective Replication for CMP Caches
    Bradford M. Beckmann
    David A. Wood
    IEEE/ACM International Symposium on Microarchitecture (MICRO) (2006), pp. 443-454
    Improving Multiple-CMP Systems Using Token Coherence
    Jesse D. Bingham
    Mark D. Hill
    Alan J. Hu
    Milo M. K. Martin
    David A. Wood
    International Symposium on High-Performance Computer Architecture (HPCA) (2005), pp. 328-339
    Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
    Milo M. K. Martin
    Daniel J. Sorin
    Bradford M. Beckmann
    Min Xu
    Alaa R. Alameldeen
    Kevin E. Moore
    Mark D. Hill
    David A. Wood
    ACM SIGARCH Computer Architecture News, vol. 33 (2005), pp. 92-99