Distributed Data Processing for Large-Scale Simulations on Cloud

Lily Hu

Qing Wang

Stephan Hoyer

TJ Lu

Yi-fan Chen

2021 IEEE INTERNATIONAL SYMPOSIUM ON ELECTROMAGNETIC COMPATIBILITY, SIGNAL & POWER INTEGRITY (2021) (to appear)

Abstract

In this work, we propose a distributed data pipeline for large-scale simulations that uses libraries and frameworks available on Cloud services. The pipeline is designed around the characteristics of the simulation data and is implemented with Apache Beam and Zarr. Beam is a unified, open-source programming model for building both batch and streaming data-parallel processing pipelines. With Beam, one can focus on the logical composition of the data processing task and bypass the low-level details of distributed computing. The orchestration of distributed processing is fully managed by the runner, which in this work is Dataflow on Google Cloud. Because Beam separates the programming layer from the runtime layer, the proposed pipeline can be executed on various runners. The output tensor of the pipeline is stored in the Zarr format, which allows concurrent reading and writing, storage on a file system, and data compression before storage. The performance of the pipeline is analyzed with an example in which the simulation data is obtained from an in-house computational fluid dynamics solver running in parallel on Tensor Processing Unit (TPU) clusters. The analysis demonstrates good storage and computational efficiency of the proposed data pipeline.
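
The abstract relies on two ideas: composing the task as independent per-chunk steps, and storing the output as separately compressed chunks that can be written concurrently. These can be illustrated with a minimal standard-library sketch. It deliberately uses neither Apache Beam nor Zarr: a thread pool stands in for the runner, and zlib-compressed files stand in for a chunked store. All names here (`split_into_chunks`, `write_chunk`, `run_pipeline`, the `chunk.N` file layout) are hypothetical and are not taken from the paper.

```python
import os
import struct
import tempfile
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Yield (chunk_index, chunk_values) pairs, one per independent work item."""
    for start in range(0, len(values), chunk_size):
        yield start // chunk_size, values[start:start + chunk_size]


def write_chunk(store_dir, index, chunk):
    """Compress one chunk and write it to its own file (assumed layout: chunk.N)."""
    raw = struct.pack(f"<{len(chunk)}d", *chunk)  # little-endian float64 values
    path = os.path.join(store_dir, f"chunk.{index}")
    with open(path, "wb") as f:
        f.write(zlib.compress(raw))
    return path


def read_chunk(store_dir, index):
    """Read back and decompress a single chunk without touching the others."""
    with open(os.path.join(store_dir, f"chunk.{index}"), "rb") as f:
        raw = zlib.decompress(f.read())
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


def run_pipeline(values, store_dir, chunk_size=4, workers=4):
    """Write all chunks concurrently; each worker owns a distinct file."""
    chunks = list(split_into_chunks(values, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda item: write_chunk(store_dir, *item), chunks))
    return len(chunks)


def read_all(store_dir, n_chunks):
    """Reassemble the full array by concatenating chunks in index order."""
    out = []
    for i in range(n_chunks):
        out.extend(read_chunk(store_dir, i))
    return out


if __name__ == "__main__":
    store = tempfile.mkdtemp()
    field = [float(i) for i in range(10)]  # stand-in for simulation output
    n = run_pipeline(field, store)
    print(n, read_all(store, n) == field)
```

Because every chunk maps to its own file, the writers never contend for the same bytes, which is the property that makes concurrent writes safe in this sketch. In an actual deployment, the partitioning and scheduling would be handled by the pipeline runner rather than a local thread pool.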
