Minisymposium

MS3E - In Situ Coupling of Simulations and AI/ML for HPC: Software, Methodologies, and Applications - Part I

Tuesday, June 4, 2024
11:00 - 13:00 CEST
HG E 3

Session Chair

Thorsten Kurth (NVIDIA Inc.)

Description

Motivated by the remarkable success of artificial intelligence (AI) and machine learning (ML) in computer vision and natural language processing, the last decade has seen a host of successful applications of AI/ML across scientific domains. In most cases, the models are trained using the traditional offline (or post hoc) approach, wherein the training data is produced, assembled, and curated separately before training begins. While more straightforward, the offline training workflow imposes some important restrictions on the adoption of ML models for scientific applications. To overcome these limitations, in situ (or online) ML approaches, wherein ML tasks are performed concurrently with the ongoing simulation, have recently emerged as an attractive new paradigm. In this minisymposium, we explore novel approaches to coupling state-of-the-art simulation codes with different AI/ML techniques. We discuss the open-source software libraries being developed to address the software engineering challenges of in situ ML workflows, the methodologies adopted to scale on modern HPC systems, and their applications to complex problems in different computational science domains.

Presentations

11:00 - 11:30 CEST
A Technical Overview of SmartSim and its Use Cases

Since its release in 2021, SmartSim has been gaining momentum in several scientific domains, including climate modeling and CFD. SmartSim's unique set of features allows researchers to perform in situ data analysis, extend the capabilities of well-established numerical software with cutting-edge AI techniques, and orchestrate the launch of applications on HPC systems. This integration not only streamlines modern scientific workflows but also addresses the longstanding challenge of language and paradigm disparity in scientific computing. In our talk, we will delve into the core functionalities of SmartSim, highlight its latest enhancements, and discuss how they are used in state-of-the-art projects developed at the intersection of HPC and AI.
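The in situ exchange that SmartSim enables can be sketched as follows. This is an illustrative sketch only: a plain dictionary stands in for SmartSim's in-memory Orchestrator database, and the publish/fetch steps stand in for the SmartRedis client's tensor put/get calls; names such as `simulation_step` and `ml_consumer` are hypothetical.

```python
# Illustrative sketch of the in situ coupling pattern. A dict stands in
# for SmartSim's in-memory Orchestrator; a real coupling would use the
# SmartRedis client from inside the simulation and the ML process.

store = {}  # stand-in for the Orchestrator key/value store

def simulation_step(step):
    """Produce one step of 'simulation' state and publish it."""
    state = [0.1 * step * i for i in range(4)]  # toy field values
    store[f"state_{step}"] = state              # publish (put-tensor-like)
    return state

def ml_consumer(step):
    """Fetch the published state and return an 'inference' result."""
    state = store[f"state_{step}"]              # fetch (get-tensor-like)
    return sum(state) / len(state)              # stand-in for model output

def run_coupled(n_steps):
    """Alternate simulation and ML work without touching the filesystem."""
    results = []
    for step in range(n_steps):
        simulation_step(step)
        results.append(ml_consumer(step))       # in situ: no files on disk
    return results
```

The point of the pattern is that simulation output reaches the ML consumer through an in-memory store rather than through files on disk, which is what makes concurrent (in situ) analysis practical on HPC systems.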

Alessandro Rigazzi and Andrew Shao (HPE)
With Thorsten Kurth (NVIDIA Inc.)
11:30 - 12:00 CEST
Relexi: Reinforcement Learning for Applications in Computational Fluid Dynamics

Relexi is a powerful tool that enables existing simulation codes to be used as training environments for reinforcement learning (RL) on high-performance computing (HPC) systems. The framework makes it possible to apply RL to problems that typically require HPC hardware, such as computational fluid dynamics (CFD) and related fields. To this end, Relexi builds on the SmartSim library, which manages the individual simulation environments on HPC systems and provides an efficient communication channel between Relexi and the simulation code. In this talk, we demonstrate two specific applications of RL in CFD. First, we apply the framework to a task in active flow control: the RL agent is trained to minimize the drag of the flow around a two-dimensional cylinder using blowing and suction jets at the cylinder's poles, and is demonstrated to reduce the experienced drag by about 15%. Second, Relexi is applied to turbulence modeling in large eddy simulation, where it was found to outperform traditional models while remaining robust to changes in resolution and Reynolds number, and even when applied to heavily deformed meshes.
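The episode structure such an RL-for-CFD framework relies on can be sketched with a toy stand-in. Everything below is hypothetical: an analytic drag function replaces the CFD solver, and a trivial hill-climbing update replaces the actual RL algorithm; only the loop shape (run an episode, observe a drag-based reward, improve the control policy) mirrors the framework described above.

```python
# Toy sketch of an RL-style training loop for drag minimization.
# The CFD environment is an analytic function and the learning rule
# is plain hill climbing; neither is Relexi's real implementation.

def drag(jet_amplitude):
    """Stand-in for a cylinder-flow simulation returning the drag."""
    optimal = 0.8                      # hypothetical best jet setting
    return 1.0 + (jet_amplitude - optimal) ** 2

def train(episodes=50, step=0.05):
    """Improve the jet setting episode by episode (reward = -drag)."""
    action = 0.0                       # start with the jets off
    best = drag(action)
    history = [best]
    for _ in range(episodes):
        for candidate in (action + step, action - step):
            d = drag(candidate)        # one "simulation episode"
            if d < best:               # keep the action if drag dropped
                best, action = d, candidate
        history.append(best)
    return action, history
```

In the real setting each `drag` evaluation is a full parallel simulation, which is why managing many concurrent environments on HPC hardware is the hard part.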

Marius Kurz (University of Stuttgart), Philipp Offenhäuser (HPE), Benjamin Sanderse (Centrum Wiskunde & Informatica (CWI)), and Andrea Beck (University of Stuttgart)
12:00 - 12:30 CEST
Machine-Learning Emulation of the Radiative Transfer Module in a Surface Continental Model ORCHIDEE

The ORCHIDEE land surface model is one of the components of the IPSL Earth System Model. The radiative transfer portion of the land model calculates reflected, absorbed, and transmitted light at multiple canopy levels. This calculation is crucial to the climate system, but it is also the most time-consuming in ORCHIDEE. To address this problem, we present the results of a random-forest-based emulator that represents the calculation in a fast and accurate way. The emulator closely mimics the original numerics-based model, with relative errors of < 10% and correlations > 0.9, while taking ~50% less computational time. A second challenge is integrating this emulator online within ORCHIDEE. We describe how the emulator is integrated into the Fortran-based model using the open-source SmartSim machine-learning library, and we show initial results and performance benchmarks that demonstrate the future viability of hybrid HPC/AI climate modelling.
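The emulator idea can be illustrated with a minimal sketch. The "expensive" routine below is a made-up Beer-Lambert-style attenuation, not ORCHIDEE's radiative transfer scheme, and a piecewise-linear lookup table stands in for the random forest; the structure is the point: sample the expensive routine offline, then answer queries from the cheap surrogate.

```python
# Sketch of the emulation workflow: replace an expensive, repeatedly
# called routine with a cheap surrogate fitted offline. All functions
# here are hypothetical stand-ins, not the ORCHIDEE scheme.
import math

def radiative_transfer(lai):
    """'Expensive' stand-in: transmitted light fraction through a canopy."""
    return math.exp(-0.5 * lai)        # Beer-Lambert-style attenuation

def fit_emulator(n_samples=64, lai_max=8.0):
    """Tabulate the expensive routine on a grid (the 'training' phase)."""
    xs = [i * lai_max / (n_samples - 1) for i in range(n_samples)]
    ys = [radiative_transfer(x) for x in xs]

    def emulator(lai):
        # piecewise-linear interpolation between tabulated points
        i = min(int(lai / lai_max * (n_samples - 1)), n_samples - 2)
        t = (lai - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    return emulator
```

The online-integration challenge described in the abstract is then to serve such a surrogate to a running Fortran model, which is where a coupling library like SmartSim comes in.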

Xiaoni Wang (CNRS, Le Laboratoire des Sciences du Climat et de l'Environnement); Andrew Shao (HPE); Mandresy Rasolonjatovo (UVSQ, Le Laboratoire des Sciences du Climat et de l'Environnement); Fabienne Maignan (CEA, Le Laboratoire des Sciences du Climat et de l'Environnement); and Philippe Peylin (CNRS, Le Laboratoire des Sciences du Climat et de l'Environnement)
12:30 - 13:00 CEST
Challenges and Opportunities in Combining LLMs with Conventional Simulation Workflows

The effectiveness of AI for a variety of scientific tasks has rapidly improved over the last few years, changing the way we can perform scientific workflows on HPC. Our recent work deploying a workflow around a large language model (LLM) for generating protein sequences exemplifies many of these opportunities and challenges. Embedding the LLM within a larger protein-screening workflow enabled us to target simulations more effectively and find better sequences faster, but this was not easily accomplished with conventional workflow tools. Interleaving AI predictions required expressing dynamic actions within the workflow application; the size of the data moved around the workflow required adding a secondary data transfer fabric; and the evolving nature of the tasks deployed on HPC required particular attention to caching elements of workflow tasks. In this presentation, we will discuss how we addressed these and other challenges.
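The generate, screen, simulate, and feed-back loop described above can be sketched in a few lines. All names and scoring functions here are hypothetical stand-ins: a seeded random generator plays the role of the LLM and a toy function the role of the HPC simulation; what the sketch shows is how cheap generation and screening focus the expensive simulations on a shortlist.

```python
# Minimal sketch of an LLM-in-the-loop screening campaign. The "LLM",
# the screen, and the "simulation" are all toy stand-ins; only the
# workflow shape mirrors the approach described in the talk.
import random

AMINO = "ACDEFGHIKLMNPQRSTVWY"

def generate(rng, n, length=12):
    """Stand-in for LLM sequence generation."""
    return ["".join(rng.choice(AMINO) for _ in range(length))
            for _ in range(n)]

def cheap_screen(seq):
    """Fast filter run on every candidate (e.g. a small scoring model)."""
    return seq.count("K") + seq.count("R")    # toy 'charge' heuristic

def expensive_simulate(seq):
    """Stand-in for an HPC simulation; run only on screened candidates."""
    return cheap_screen(seq) + seq.count("H") * 0.5

def campaign(rounds=3, batch=20, top_k=3, seed=0):
    """Interleave cheap generation/screening with targeted simulation."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(rounds):
        candidates = generate(rng, batch)
        shortlist = sorted(candidates, key=cheap_screen,
                           reverse=True)[:top_k]
        for seq in shortlist:                  # expensive step, targeted
            best = max(best, expensive_simulate(seq))
    return best
```

The dynamic part that strains conventional workflow tools is that each round's shortlist, and hence the set of simulation tasks, depends on results produced earlier in the same run.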

Logan Ward, Gautham Dharuman, Arvind Ramanathan, and Ian Foster (Argonne National Laboratory)