
Minisymposium Presentation

Scaling Coupled Simulation and AI Workflows on Aurora with Dragon

Tuesday, June 4, 2024
17:30 - 18:00 CEST
Climate, Weather and Earth Sciences
Chemistry and Materials
Computer Science and Applied Mathematics
Humanities and Social Sciences
Engineering
Life Sciences
Physics

Description

The advent of exascale computing has enabled workflows that couple simulation and AI at unprecedented scale and complexity. At this scale, however, efficiently distributing data among compute tasks spread across large node counts becomes a major challenge. In this talk, we present Dragon, an open-source library for designing and executing data-intensive scientific workflows on modern HPC systems. In particular, Dragon's sharded memory model lets compute tasks access data stored in memory regardless of which node holds it, using automated RDMA transfers exposed through high-level data-transfer APIs in C, C++, and Python. This allows interdependent data to move between workflow components without costly filesystem I/O or the deployment of a database. We demonstrate Dragon's use and performance on the Aurora supercomputer at the Argonne Leadership Computing Facility with a workflow designed to identify new cancer drug candidates by combining simulation with ML training and inference to accelerate high-throughput screening of 22 billion molecular compounds.
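To make the idea of a sharded memory model more concrete, here is a minimal single-process sketch that assumes a simple hash-based placement of keys across shards. `ShardedStore`, `put`, `get`, and the `mol-…` keys are hypothetical names for illustration only. This is not Dragon's API: plain Python dicts stand in for per-node memory, so no RDMA or multi-node communication takes place.

```python
import hashlib
from typing import Any, Dict, List


class ShardedStore:
    """Hypothetical toy sharded in-memory store (not Dragon's API).

    Each shard plays the role of one node's memory. A key is assigned to a
    shard by hashing, so any caller can locate a value without knowing in
    advance which "node" holds it.
    """

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[str, Any]] = [{} for _ in range(num_shards)]

    def _shard_for(self, key: str) -> int:
        # Stable hash: unlike built-in hash(), sha256 is not salted per process,
        # so every process would map the same key to the same shard.
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value: Any) -> None:
        self.shards[self._shard_for(key)][key] = value

    def get(self, key: str) -> Any:
        # In a real multi-node system, this lookup could trigger a remote
        # (e.g. RDMA) transfer from the node that owns the shard.
        return self.shards[self._shard_for(key)][key]


# Usage: a "simulation" stage writes placeholder results under molecule IDs...
store = ShardedStore(num_shards=4)
for i in range(10):
    store.put(f"mol-{i}", {"score": -float(i)})

# ...and a "training" stage reads them back without knowing which shard holds each key.
batch = [store.get(f"mol-{i}")["score"] for i in range(10)]
```

The design choice illustrated here is that placement is a pure function of the key, so producer and consumer tasks agree on where a value lives without extra coordination or a central database.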

Authors