
Minisymposium Presentation

TorchFort: A Library for Online Deep Learning in Fortran HPC Programs

Tuesday, June 4, 2024, 17:00 - 17:30 CEST
Climate, Weather and Earth Sciences
Chemistry and Materials
Computer Science and Applied Mathematics
Humanities and Social Sciences
Engineering
Life Sciences
Physics

Presenter

Thorsten Kurth - NVIDIA Inc.

Thorsten works at NVIDIA on optimizing scientific codes for GPU-based supercomputers. His main focus is on providing optimized deep learning applications for HPC systems, including MLPerf HPC benchmark applications. This work includes end-to-end optimizations such as input-pipeline and I/O tuning as well as distributed training. In 2018 he was awarded the Gordon Bell Prize for the first deep learning application to exceed 1 ExaOp of peak performance, achieved on the OLCF Summit HPC system. In 2020 he was awarded the Gordon Bell Special Prize for HPC-Based COVID-19 Research for efficiently generating large ensembles of scientifically relevant spike trimer conformations using the AI-driven MD simulation workflow DeepDriveMD.

Description

Across a broad range of scientific applications, deep learning (DL) has shown promise both for reducing computational cost and as an alternative approach to modeling physical phenomena. In these domains, the data are typically produced by numerical simulation programs implemented in C, C++, or, still quite often, Fortran. Popular deep learning frameworks, by contrast, are driven from Python. A common source of friction is therefore how to couple the simulation program efficiently with the DL framework for training or inference.

In this talk, we discuss TorchFort, a library for online DL training and inference implemented with LibTorch, the C++ backend used by PyTorch. The library can be called directly from Fortran, C, or C++, so data arrays can be shared transparently between the simulation program and the DL framework, with everything running inside the simulation process. We will discuss the library's design and present implementation examples that illustrate the opportunities this tight coupling creates for DL applications.
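Sharing a Fortran array with a C++ DL backend without copying raises one layout detail: Fortran stores arrays in column-major order, whereas PyTorch/LibTorch tensors are by default contiguous in row-major (C) order. The sketch below does not use TorchFort's API. It is a language-neutral illustration, written in Python with only the standard library, of the general fact that a contiguous Fortran array `A(n1, n2)` occupies exactly the same memory as a row-major array of shape `(n2, n1)`. The helpers `fortran_offset` and `reversed_rowmajor_view` and the example field are hypothetical, and whether TorchFort uses this dimension-reversal convention is an assumption not stated in the abstract.

```python
from array import array


def fortran_offset(i: int, j: int, n1: int) -> int:
    """Flat offset of Fortran element A(i, j) (1-based) in an n1 x n2 column-major array."""
    return (i - 1) + (j - 1) * n1


def reversed_rowmajor_view(buf: array, n1: int, n2: int) -> memoryview:
    """Row-major (C-order) view of shape (n2, n1) over the same memory; nothing is copied.

    memoryview.cast can only reshape from a byte format, hence the detour via 'B'.
    """
    return memoryview(buf).cast("B").cast(buf.typecode, [n2, n1])


# Hypothetical simulation field A(n1, n2), filled in Fortran's column-major layout.
n1, n2 = 3, 2
field = array("d", [0.0] * (n1 * n2))
for j in range(1, n2 + 1):
    for i in range(1, n1 + 1):
        field[fortran_offset(i, j, n1)] = 10.0 * i + j  # A(i, j) = 10*i + j

view = reversed_rowmajor_view(field, n1, n2)

# Fortran A(i, j) is row-major view[j - 1, i - 1]: same bytes, axes reversed.
print(view[1, 2])  # A(3, 2)

# A write through the view lands in the simulation's own buffer.
view[0, 0] = -1.0
print(field[0])
```

In a compiled coupling the equivalent step would be passing the buffer pointer together with the reversed shape to the C++ side rather than wrapping it in a `memoryview`. The design trade-off is that the hand-off stays zero-copy, while the DL model sees the array axes in reverse order.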

Authors