Google Research

TRAM: Optimizing Fine-grained Communication with Topological Routing and Aggregation of Messages

  • Lukasz Wesolowski
  • Ramprasad Venkataraman
  • A Gupta
  • Jae-Seung Yeom
  • Keith Bisset
  • Yanhua Sun
  • Pritish Jetley
  • Thomas Quinn
  • Laxmikant Kale
International Conference on Parallel Processing (2014)

Abstract

Fine-grained communication in supercomputing applications often limits performance through high communication overhead and poor utilization of network bandwidth. This paper presents the Topological Routing and Aggregation Module (TRAM), a library that optimizes fine-grained communication performance by routing and dynamically combining short messages. TRAM collects units of fine-grained communication from the application and combines them into aggregated messages with a common intermediate destination. It routes these messages along a virtual mesh topology mapped onto the physical topology of the network. TRAM improves network bandwidth utilization and reduces communication overhead. It is particularly effective in optimizing patterns with global communication and large message counts, such as all-to-all and many-to-many, as well as sparse, irregular, dynamic, or data-dependent patterns. We demonstrate how TRAM improves performance through theoretical analysis and experimental verification using benchmarks and scientific applications. We present speedups on petascale systems of 6x for communication benchmarks and up to 4x for applications.
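To make the routing-and-aggregation idea concrete, here is a minimal single-process Python sketch. It is not TRAM's API (TRAM is part of Charm++). All names (`to_coords`, `next_hop`, `route`, `peer_count`, `Aggregator`, `buffer_size`, `send`, `deliver`) are hypothetical. The sketch assumes dimension-ordered routing over a virtual mesh, where each hop corrects one coordinate, and a fixed per-destination buffer size that triggers sending. Items bound for the same intermediate destination share a buffer. An intermediate rank unpacks what it receives and re-aggregates the items that still need to travel. A side effect of this scheme is that each rank communicates directly only with the ranks that differ from it in exactly one coordinate: for dimension sizes n_1, ..., n_d that is sum(n_i - 1) peers instead of N - 1.

```python
from collections import defaultdict


def to_coords(rank, dims):
    """Map a linear rank to virtual-mesh coordinates (last dimension varies fastest)."""
    coords = []
    for size in reversed(dims):
        coords.append(rank % size)
        rank //= size
    return tuple(reversed(coords))


def to_rank(coords, dims):
    """Inverse of to_coords."""
    rank = 0
    for c, size in zip(coords, dims):
        rank = rank * size + c
    return rank


def next_hop(src, dst, dims):
    """Dimension-ordered routing (assumed): fix the first coordinate that differs."""
    s = list(to_coords(src, dims))
    d = to_coords(dst, dims)
    for i in range(len(dims)):
        if s[i] != d[i]:
            s[i] = d[i]
            return to_rank(s, dims)
    return src  # already at the destination


def route(src, dst, dims):
    """Full sequence of ranks an item visits from src to dst."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst, dims))
    return path


def peer_count(dims):
    """Ranks reachable in one hop: those differing in exactly one coordinate."""
    return sum(n - 1 for n in dims)


class Aggregator:
    """Per-rank buffers keyed by next hop; a buffer is sent when it fills."""

    def __init__(self, rank, dims, buffer_size, send, deliver):
        self.rank = rank
        self.dims = dims
        self.buffer_size = buffer_size
        self.send = send        # send(hop, [(dst, payload), ...])
        self.deliver = deliver  # deliver(payload) for items addressed to this rank
        self.buffers = defaultdict(list)

    def insert(self, dst, payload):
        if dst == self.rank:
            self.deliver(payload)
            return
        hop = next_hop(self.rank, dst, self.dims)
        buf = self.buffers[hop]
        buf.append((dst, payload))
        if len(buf) >= self.buffer_size:
            del self.buffers[hop]  # detach before sending to stay safe under re-entry
            self.send(hop, buf)

    def receive(self, items):
        """Deliver local items and re-aggregate the rest toward their next hop."""
        for dst, payload in items:
            self.insert(dst, payload)

    def flush(self):
        """Send every partially filled buffer (e.g. at the end of a phase)."""
        pending, self.buffers = self.buffers, defaultdict(list)
        for hop, buf in pending.items():
            if buf:
                self.send(hop, buf)
```

For example, on a 4 x 4 virtual mesh `route(0, 15, (4, 4))` visits ranks 0, 12, and 15, and `peer_count((4, 4))` is 6 rather than 15. Larger buffers amortize per-message overhead further but delay delivery, which is why a flush for partially filled buffers is needed.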
