
Minisymposium Presentation

Efficient Training of GNN-based Material Science Applications at Scale: An Orchestration of Data Movement Approach

Tuesday, June 4, 2024, 12:00 - 12:30 CEST
Climate, Weather and Earth Sciences
Chemistry and Materials
Computer Science and Applied Mathematics
Humanities and Social Sciences
Engineering
Life Sciences
Physics

Presenter

Khaled Ibrahim - Lawrence Berkeley National Laboratory

Khaled Ibrahim is a staff scientist with the Applied Mathematics and Computational Research Division at Lawrence Berkeley National Laboratory. His research interests include performance modeling and code optimization for high performance computing and machine learning applications. He has also done extensive research in communication runtime optimizations.

Description

Scalable data management techniques are crucial for processing large volumes of scientific data on HPC platforms when training distributed deep learning (DL) models. Because stochastic optimizers access data randomly and frequently, in-memory distributed storage, which keeps the dataset in the local memory of each compute node, is widely adopted over file-based I/O for its speed. In this presentation, we discuss the tradeoffs of various data-exchange mechanisms. We present a hybrid in-memory data loader with multiple communication backends for distributed graph neural network training, and we introduce a model-driven performance estimator that switches between communication mechanisms automatically at runtime. The performance estimator uses the Tree of Parzen Estimators (TPE), a Bayesian optimization method, to tune model parameters and dynamically select the most efficient communication method for data loading. We evaluate our approach on two US DOE supercomputers, NERSC Perlmutter and OLCF Summit, across a wide set of runtime configurations. Our optimized implementation outperforms a baseline using single-backend loaders by up to 2.83x and predicts the suitable communication method with an average success rate of 96.3% on Perlmutter and 94.3% on Summit.
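To make the selection mechanism concrete, the sketch below shows how a TPE-based estimator could be wired around a data loader with swappable communication backends. This is a minimal illustration rather than the presented implementation: the backend names, the load_batch placeholder, and its timing constants are hypothetical, and only the hyperopt library's TPE interface (fmin, tpe.suggest, hp) is an actual API.

import time

from hyperopt import Trials, fmin, hp, tpe

# Hypothetical communication backends a hybrid loader could switch between.
BACKENDS = ["mpi_allgather", "nccl_broadcast", "file_io"]


def load_batch(backend: str, batch_size: int) -> None:
    """Placeholder: fetch one training batch with the given backend.

    A real loader would issue MPI/NCCL collectives or file reads here;
    the per-sample costs below are made up for illustration only.
    """
    per_sample = {"mpi_allgather": 0.5e-3,
                  "nccl_broadcast": 0.2e-3,
                  "file_io": 1.5e-3}
    time.sleep(per_sample[backend] * batch_size)


def loading_cost(params: dict) -> float:
    """Objective for TPE: measured wall-clock time of one sample load."""
    start = time.perf_counter()
    load_batch(params["backend"], int(params["batch_size"]))
    return time.perf_counter() - start


# Search space: which backend to use and how large a batch to fetch.
space = {
    "backend": hp.choice("backend", BACKENDS),
    "batch_size": hp.quniform("batch_size", 32, 256, 32),
}

trials = Trials()
best = fmin(fn=loading_cost, space=space, algo=tpe.suggest,
            max_evals=30, trials=trials)

# hp.choice reports the index of the winning option, not its name.
print("selected backend:", BACKENDS[best["backend"]])
print("selected batch size:", int(best["batch_size"]))

In a deployment of this kind, the loader would keep using the selected backend for subsequent epochs and could re-run the estimator whenever the runtime configuration (node count, batch size, dataset partition) changes.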

Authors