Minisymposium Presentation
HydraGNN: Scalable Machine Learning and Generative AI for Accelerating Materials Design
Description
We discuss the challenges involved in developing large-scale training for generative AI models aimed at materials design. We employ HydraGNN, a scalable graph neural network (GNN) framework, alongside DDStore, a distributed in-memory data store, to facilitate large-scale data distribution across the supercomputing resources provided by the US Department of Energy (DOE). Our discussion includes insights into our implementation and the notable decrease in I/O overhead it achieves in HPC environments. The effectiveness of HydraGNN and DDStore is showcased through their application to molecular design, where a GNN model learns to predict the ultraviolet-visible spectrum from a dataset comprising over 10 million molecules. By enabling efficient training scale-up to thousands of GPUs on the Summit and Perlmutter supercomputers, DDStore has delivered a significant boost in deep learning training speed, achieving up to a 6.15x speedup over our initial approach. We will discuss the performance advancements on the new Frontier supercomputer at Oak Ridge National Laboratory (ORNL), highlighting the evolving landscape of supercomputing in AI research.
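As a rough illustration of the data-distribution idea underlying a store like DDStore (this is a minimal sketch, not the actual DDStore API), the sketch below shows round-robin sharding of a dataset across ranks, so each rank touches only its own subset of samples rather than every rank reading the full dataset:

```python
# Minimal sketch (hypothetical helper, not the DDStore API): round-robin
# sharding of a molecular dataset across distributed ranks. Each rank
# owns a disjoint subset of sample indices, which is the basic mechanism
# for cutting per-rank I/O in distributed GNN training.

def shard_indices(num_samples: int, rank: int, world_size: int) -> list:
    """Return the sample indices owned by `rank` under round-robin sharding."""
    return list(range(rank, num_samples, world_size))

# Example: 10 molecules distributed across 4 ranks.
shards = [shard_indices(10, r, 4) for r in range(4)]

# Every sample is owned by exactly one rank, and shard sizes are balanced
# to within one sample.
assert sorted(i for s in shards for i in s) == list(range(10))
assert max(len(s) for s in shards) - min(len(s) for s in shards) <= 1
```

In a real deployment the per-rank subset would be served from an in-memory store with batched reads, but the partitioning logic is the part that determines how I/O scales with the number of ranks.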