Jichuan Chang

Authored Publications
    Profiling Hyperscale Big Data Processing
    Aasheesh Kolli
    Abraham Gonzalez
    Samira Khan
    Sihang Liu
    Krste Asanovic
    ISCA (2023)
    Computing demand continues to grow exponentially, largely driven by "big data" processing on hyperscale data stores. At the same time, the slowdown in Moore's law is leading the industry to embrace custom computing in large-scale systems. Taken together, these trends motivate the need to characterize live production traffic on these large data processing platforms and to understand the opportunity for acceleration at scale. This paper addresses this key need. We characterize three important production distributed database and data analytics platforms at Google to identify key hardware acceleration opportunities, and we perform a comprehensive limits study to understand the trade-offs among various hardware acceleration strategies. We observe that hyperscale data processing platforms spend significant time on distributed storage and other remote work across distributed workers. Therefore, optimizing storage and remote work, in addition to accelerating compute, is critical for these platforms. We present a detailed breakdown of the compute-intensive functions in these platforms and identify the dominant data operations related to datacenter and systems taxes. We observe that no single accelerator can provide a significant benefit, but collectively, a sea of accelerators can speed up many of these smaller, platform-specific functions. We demonstrate the potential gains of the sea-of-accelerators proposal with a limits study and an analytical model. We perform a comprehensive study to understand the trade-offs between accelerator location (on-chip/off-chip) and invocation model (synchronous/asynchronous). We propose and evaluate a chained accelerator execution model in which the identified compute-intensive functions are accelerated and pipelined to avoid invocation from the core, achieving a 3x improvement over the baseline system while nearly matching the performance of an ideal fully asynchronous execution model.
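Why no single accelerator helps much while many together do follows from Amdahl's law. The sketch below is illustrative only: the per-function runtime fractions and the 10x speedups are hypothetical, not figures from the paper, and it assumes accelerated regions are disjoint and run serially with no invocation overhead (so it does not model the chained or asynchronous execution discussed above).

```python
def amdahl_speedup(fractions, speedups):
    """Overall speedup when each runtime fraction f_i is sped up by s_i.

    Assumes the accelerated regions are disjoint and execute serially with
    the rest of the program (no overlap, no invocation overhead).
    """
    if len(fractions) != len(speedups):
        raise ValueError("fractions and speedups must have the same length")
    if sum(fractions) > 1.0:
        raise ValueError("fractions must sum to at most 1")
    remaining = 1.0 - sum(fractions)                      # unaccelerated time
    accelerated = sum(f / s for f, s in zip(fractions, speedups))
    return 1.0 / (remaining + accelerated)


# Hypothetical profile: ten functions, each 5% of runtime, each sped up 10x.
fractions = [0.05] * 10
one = amdahl_speedup(fractions[:1], [10.0])
many = amdahl_speedup(fractions, [10.0] * 10)
print(f"one accelerator:  {one:.2f}x")   # one accelerator:  1.05x
print(f"ten accelerators: {many:.2f}x")  # ten accelerators: 1.82x
```

Even a 10x accelerator on a 5% function yields under 5% end to end, while covering ten such functions brings the total to roughly 1.8x under these assumptions.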
    Software-defined far memory in warehouse-scale computers
    Andres Lagar-Cavilla
    Suleiman Souhlal
    Neha Agarwal
    Radoslaw Burny
    Shakeel Butt
    Junaid Shahid
    Greg Thelen
    Kamil Adam Yurtsever
    Yu Zhao
    International Conference on Architectural Support for Programming Languages and Operating Systems (2019)
    Increasing memory demand and slowdown in technology scaling pose important challenges to the total cost of ownership (TCO) of warehouse-scale computers (WSCs). One promising idea to reduce memory TCO is to add a cheaper, but slower, "far memory" tier and use it to store infrequently accessed (or cold) data. However, introducing a far memory tier brings new challenges around dynamically responding to workload diversity and churn, minimizing stranding of capacity, and addressing brownfield (legacy) deployments. We present a novel software-defined approach to far memory that proactively compresses cold memory pages to effectively create a far memory tier in software. Our end-to-end system design encompasses new methods to define performance service-level objectives (SLOs), a mechanism to identify cold memory pages while meeting the SLO, and our implementation in the OS kernel and node agent. Additionally, we design learning-based autotuning to periodically adapt our design to fleet-wide changes without a human in the loop. Our system has been successfully deployed across Google's WSCs since 2016, serving thousands of production services. Our software-defined far memory is significantly cheaper (67% or higher memory cost reduction) at relatively good access speeds (6 µs) and allows us to store a significant fraction of infrequently accessed data (on average, 20%), translating to significant TCO savings at warehouse scale.
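The core mechanism, compressing cold pages in place to form a software far-memory tier, can be sketched as a toy model. This is a minimal sketch under stated assumptions, not the deployed kernel implementation: a page counts as "cold" once it has been idle for at least a hypothetical `cold_age_threshold_s`, Python's `zlib` stands in for the kernel's compressor, incompressible pages are left alone, and touching a compressed page "promotes" it by decompressing. A real system would choose the threshold to meet a performance SLO (for example, by bounding the promotion rate, an assumption here).

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Optional

PAGE_SIZE = 4096  # bytes; assumed 4 KiB pages


@dataclass
class Page:
    data: Optional[bytes]          # uncompressed contents (None while compressed)
    last_access_s: float           # time of the most recent access
    compressed: Optional[bytes] = None


class FarMemorySketch:
    """Hypothetical model: pages idle for at least `cold_age_threshold_s`
    seconds are compressed in place, forming a software far-memory tier."""

    def __init__(self, cold_age_threshold_s: float):
        self.cold_age_threshold_s = cold_age_threshold_s
        self.pages: Dict[int, Page] = {}
        self.promotions = 0  # accesses that had to decompress a cold page

    def add(self, page_id: int, data: bytes, now_s: float) -> None:
        self.pages[page_id] = Page(data=data, last_access_s=now_s)

    def access(self, page_id: int, now_s: float) -> bytes:
        page = self.pages[page_id]
        if page.compressed is not None:
            # Promotion: decompress on access, paying the far-memory latency.
            page.data = zlib.decompress(page.compressed)
            page.compressed = None
            self.promotions += 1
        page.last_access_s = now_s
        return page.data

    def reclaim(self, now_s: float) -> int:
        """Compress every cold page; return the number of bytes saved."""
        saved = 0
        for page in self.pages.values():
            idle_s = now_s - page.last_access_s
            if page.compressed is None and idle_s >= self.cold_age_threshold_s:
                blob = zlib.compress(page.data)
                if len(blob) >= len(page.data):
                    continue  # incompressible: keep it in near memory
                saved += len(page.data) - len(blob)
                page.compressed, page.data = blob, None
        return saved


fm = FarMemorySketch(cold_age_threshold_s=120.0)
fm.add(0, b"\x00" * PAGE_SIZE, now_s=0.0)   # stays idle
fm.add(1, b"\x00" * PAGE_SIZE, now_s=0.0)
fm.access(1, now_s=100.0)                   # page 1 stays warm
saved = fm.reclaim(now_s=150.0)             # only page 0 is >= 120 s idle
fm.access(0, now_s=160.0)                   # touching page 0 promotes it
print(saved > 0, fm.promotions)             # True 1
```

The threshold embodies the trade-off the abstract describes: a lower value moves more memory into the compressed tier but raises the promotion rate, and with it the performance cost.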
    The explosion in workload complexity and the recent slowdown in Moore's law scaling call for new approaches to efficient computing. Researchers are now beginning to apply recent advances in machine learning to software optimization, augmenting or replacing traditional heuristics and data structures. However, the space of machine learning for computer hardware architecture is only lightly explored. In this paper, we demonstrate the potential of deep learning to address the von Neumann bottleneck of memory performance. We focus on the critical problem of learning memory access patterns, with the goal of constructing accurate and efficient memory prefetchers. We relate contemporary prefetching strategies to n-gram models in natural language processing, and show how recurrent neural networks can serve as a drop-in replacement. On a suite of challenging benchmark datasets, we find that neural networks consistently demonstrate superior performance in terms of precision and recall. This work represents the first step towards practical neural-network-based prefetching, and opens a wide range of exciting directions for machine learning in computer architecture research.
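The analogy between table-based prefetchers and n-gram models can be made concrete with a toy delta-correlation prefetcher: the "vocabulary" is the set of address deltas, the context is the last `n` deltas, and the prediction is the most frequent next delta seen after that context. This is a minimal sketch of the n-gram view only (the class name and parameters are hypothetical); it is not the recurrent neural network the paper proposes as a replacement.

```python
from collections import Counter, defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class NGramDeltaPrefetcher:
    """Predicts the next address from the last `n` address deltas,
    like an n-gram language model over a vocabulary of deltas."""

    def __init__(self, n: int = 2):
        self.n = n
        self.table: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
        self.history: Deque[int] = deque(maxlen=n)
        self.last_addr: Optional[int] = None

    def observe(self, addr: int) -> Optional[int]:
        """Record an access and return a predicted next address (or None)."""
        if self.last_addr is not None:
            delta = addr - self.last_addr
            if len(self.history) == self.n:
                # Count the transition: context (previous n deltas) -> delta.
                self.table[tuple(self.history)][delta] += 1
            self.history.append(delta)
        self.last_addr = addr
        if len(self.history) < self.n:
            return None  # not enough context yet
        counts = self.table.get(tuple(self.history))
        if not counts:
            return None  # context never seen before
        next_delta, _ = counts.most_common(1)[0]
        return addr + next_delta


# A repeating stride pattern of +64, +64, +128 bytes.
pf = NGramDeltaPrefetcher(n=2)
addrs = [0]
for d in [64, 64, 128] * 3:
    addrs.append(addrs[-1] + d)
preds = [pf.observe(a) for a in addrs]
print(preds)  # [None, None, None, None, None, 512, 576, 640, 768, 832]
```

After one period of warm-up, every prediction matches the next address in the stream. A table like this cannot generalize to contexts it has never seen, which is one limitation a learned sequence model aims to address.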
    Near-Data Processing: Insights from a MICRO-46 Workshop
    Rajeev Balasubramonian
    Troy Manning
    Jaime H. Moreno
    Richard Murphy
    Ravi Nair
    Steven Swanson
    IEEE Micro (Special Issue on Big Data), vol. 34 (2014), pp. 36-43
    The cost of data movement in big-data systems motivates careful examination of near-data processing (NDP) frameworks. The concept of NDP was actively researched in the 1990s, but gained little commercial traction. After a decade-long dormancy, interest in this topic has spiked. A workshop on NDP was organized at MICRO-46 and was well attended. Given the interest, the organizers and keynote speakers have attempted to capture the key insights from the workshop into an article that can be widely disseminated. This article describes the many reasons why NDP is compelling today and identifies key upcoming challenges in realizing the potential of NDP.
    Hardware acceleration for similarity measurement in natural language processing
    Prateek Tandon
    Vahed Qazvinian
    Ronald G. Dreslinski
    Thomas F. Wenisch
    ISLPED (2013), pp. 409-414
    A limits study of benefits from nanostore-based future data-centric system architectures
    Trevor N. Mudge
    David Roberts
    Mehul A. Shah
    Kevin T. Lim
    Conf. Computing Frontiers (2012), pp. 33-42
    FREE-p: A Practical End-to-End Nonvolatile Memory Protection Mechanism
    Doe Hyun Yoon
    Naveen Muralimanohar
    Norman P. Jouppi
    Mattan Erez
    IEEE Micro, vol. 32 (2012), pp. 79-87
    (Re)Designing Data-Centric Data Centers
    IEEE Micro, vol. 32 (2012), pp. 66-70
    Totally green: evaluating and designing servers for lifecycle environmental impact
    Justin Meza
    Amip Shah
    Rocky Shih
    Cullen Bash
    ASPLOS (2012), pp. 25-36
    Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management
    Justin Meza
    HanBin Yoon
    Onur Mutlu
    Computer Architecture Letters, vol. 11 (2012), pp. 61-64
    System-level implications of disaggregated memory
    Kevin T. Lim
    Yoshio Turner
    Jose Renato Santos
    Alvin AuYoung
    Thomas F. Wenisch
    HPCA (2012), pp. 189-200
    BOOM: Enabling mobile memory based low-power server DIMMs
    Doe Hyun Yoon
    Naveen Muralimanohar
    ISCA (2012), pp. 25-36
    FREE-p: Protecting non-volatile memory against both hard and soft errors
    Doe Hyun Yoon
    Naveen Muralimanohar
    Norman P. Jouppi
    Mattan Erez
    HPCA (2011), pp. 466-477
    System-level integrated server architectures for scale-out datacenters
    Sheng Li
    Kevin T. Lim
    Paolo Faraboschi
    Norman P. Jouppi
    MICRO (2011), pp. 260-271
    Saving the World, One Server at a Time, Together
    IEEE Computer, vol. 44 (2011), pp. 91-93
    Server Designs for Warehouse-Computing Environments
    Kevin T. Lim
    Chandrakant D. Patel
    Trevor N. Mudge
    Steven K. Reinhardt
    IEEE Micro, vol. 29 (2009), pp. 41-49
    Disaggregated memory for expansion and sharing in blade servers
    Kevin T. Lim
    Trevor N. Mudge
    Steven K. Reinhardt
    Thomas F. Wenisch
    ISCA (2009), pp. 267-278
    Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments
    Kevin T. Lim
    Chandrakant D. Patel
    Trevor N. Mudge
    Steven K. Reinhardt
    ISCA (2008), pp. 315-326