Minisymposium

MS4A - GPU Acceleration in Earth System Modeling: Strategies, Automated Refactoring Tools, Benefits and Challenges

Tuesday, June 4, 2024
16:00 - 18:00 CEST
HG F 1

Session Chair

Thorsten Kurth (NVIDIA Inc.)

Description

Earth System models simulate the complex interactions of the atmosphere, oceans, land and sea ice, providing valuable insights for short-term weather forecasting and long-term climate research, both of which are important for understanding and mitigating the impacts of weather-related disasters and climate change. As the complexity and computational demands of these models grow, the need for Graphics Processing Unit (GPU) acceleration becomes increasingly apparent. GPUs are computational architectures that efficiently support massive parallelism. Whilst several studies have demonstrated promising computational performance by porting Earth System models to GPUs, thereby enabling higher-resolution simulations, some have also discussed the challenges of adapting existing codes to run on GPUs. To address these refactoring and portability issues, automated code-refactoring tools have been developed to make porting code to GPUs more efficient and to improve portability and maintainability. This minisymposium aims to bring together scientists, computational researchers, and model developers to explore the role of GPU acceleration in optimizing Earth System models, to share experiences, and to look to the future. Topics include optimization strategies (e.g., parallelization techniques, memory management, data transfer), automated code-refactoring tools (e.g., PSyclone), and benefits and challenges (e.g., speedup, memory constraints, code management).

Presentations

16:00 - 16:30 CEST
Porting NEMO to GPUs with PSyclone

PSyclone is a source-to-source code-generation and transformation system designed to enable performance portability and code maintainability for weather and climate codes written in Fortran. To achieve this, it separates the scientific code, written in Fortran, from the optimisation and parallelisation steps, which are encoded as Python scripts. HPC experts can then prepare the PSyclone recipes needed to take advantage of each hardware platform without altering the domain-science code. PSyclone is being used to optimise unaltered directly-addressed MPI applications, such as NEMO, and to offload their computations to GPUs. In this talk I will demonstrate the use and performance of PSyclone for a production configuration of NEMO used by the UK Met Office, and I will provide an update on the integration of PSyclone into the NEMO build system and its use by the NEMO community.

Sergi Siso (Science and Technology Facilities Council)
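As a hedged illustration of the recipe approach described in this abstract, the sketch below shows the general shape of a PSyclone transformation script that wraps outermost loops in OpenACC kernels regions. The module paths, class names, and trans() entry-point signature are assumptions modelled on recent PSyclone releases and vary between versions:

```python
# Sketch of a PSyclone GPU-offload recipe. Module paths, class names and the
# trans() signature are assumptions based on recent PSyclone releases.
from psyclone.psyir.nodes import Loop
from psyclone.psyir.transformations import TransformationError
from psyclone.transformations import ACCKernelsTrans


def trans(psyir):
    """Wrap each outermost loop in an OpenACC 'kernels' region."""
    acc_kernels = ACCKernelsTrans()
    for loop in psyir.walk(Loop):
        # Inner loops are already covered by the region around their ancestor.
        if loop.ancestor(Loop) is None:
            try:
                acc_kernels.apply(loop)
            except TransformationError:
                pass  # leave loops that cannot legally be offloaded untouched
```

A recipe along these lines is passed to the psyclone command-line tool (via its -s flag) together with the unmodified Fortran source, so the science code itself is never edited by hand.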
16:30 - 17:00 CEST
Porting and Optimizing Momentum, CASIM and SOCRATES for GPU Architectures

Exploiting GPUs is both an opportunity and a challenge for weather and climate codes. They present an opportunity because their massive parallelism can allow these codes to achieve very high computational performance. They present a challenge because exploiting this parallelism can require refactoring many thousands of lines of science code and adopting a programming model new to existing CPU code bases. In this presentation, we will describe the development of the GPU-enabled cloud microphysics scheme CASIM and radiation scheme SOCRATES, and the Domain Specific Language the Met Office is using to achieve performance portability for its new weather and climate model, LFRic, and the wider modelling system known as Momentum. We will show how the Met Office is using PSyclone, a Domain Specific Compiler, to keep a single-source science code whilst targeting multiple programming models for different processor architectures. The presentation will conclude with the strategy and progress for porting and optimizing Momentum for GPUs, and with how the PSyclone approach builds on ORNL's experience of porting and optimizing CASIM and SOCRATES on GPUs.

Christopher Maynard (Met Office); Wei Zhang, Matthew Norman, Min Xu, Salil Mahajan, and Katherine Evans (Oak Ridge National Laboratory); and Jonathan Wilkinson, James Manners, and Ben Shipway (Met Office)
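To make the single-source, multiple-programming-model idea concrete, here is a hedged sketch of one way a recipe could select a backend per target architecture. The TARGET_ARCH environment variable is a hypothetical knob, and the transformation classes are assumptions modelled on recent PSyclone releases, not the Met Office's actual recipes:

```python
# Illustrative single-source, multi-target recipe: the same science code is
# offloaded with OpenACC for GPUs or parallelised with OpenMP for CPUs.
# TARGET_ARCH is a hypothetical build knob; class names are assumptions.
import os

from psyclone.psyir.nodes import Loop
from psyclone.psyir.transformations import TransformationError
from psyclone.transformations import ACCKernelsTrans, OMPParallelLoopTrans

TARGET = os.environ.get("TARGET_ARCH", "gpu")


def trans(psyir):
    """Apply a GPU (OpenACC) or CPU (OpenMP) recipe to the same source."""
    transform = ACCKernelsTrans() if TARGET == "gpu" else OMPParallelLoopTrans()
    for loop in psyir.walk(Loop):
        if loop.ancestor(Loop) is None:  # outermost loops only
            try:
                transform.apply(loop)
            except TransformationError:
                pass  # skip loops that cannot legally be parallelised
```

The point is the division of labour: swapping the recipe retargets the build, while the Fortran science source stays identical across architectures.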
17:00 - 17:30 CEST
Progress Report on the GPU Adaptation of IFS and the Destination Earth DTs

The use of GPUs and accelerator architectures has become widespread in high-performance computing as a route to unprecedented throughput and computational model performance. The efficient use of GPU architectures has long been envisioned at ECMWF, but it requires significant, often invasive code refactoring that can harm operational model performance on CPUs. In this talk we will describe the ongoing efforts at ECMWF to prepare the Integrated Forecasting System (IFS) for GPU accelerators through a combination of library development, data-structure refactoring and source-to-source translation. In close collaboration with ECMWF member states, and supported by the Destination Earth initiative, we aim to restructure core model components of the IFS, as well as various technical infrastructure packages, to allow hybrid CPU-GPU execution. The focus is on sustainable solutions based on modern software engineering methods that allow the code to be adapted to multiple architectures for continuous performance evaluation. We will report on the progress of the GPU adaptation of the IFS via dedicated build modes, provide an update on the adaptation of various sub-components, highlight specific code characteristics and the challenges they pose, and present initial performance results that assess the potential gains on current and future architectures.

Michael Lange (ECMWF)
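The abstract mentions source-to-source translation without going into detail. Purely as a toy illustration of the idea, and emphatically not ECMWF's actual tooling (its Loki package parses and transforms the code properly rather than pattern-matching text), a pass that wraps simple indexed Fortran DO loops in OpenACC directives might look like this:

```python
# Toy source-to-source pass: textually wraps outermost indexed Fortran DO
# loops in an OpenACC 'kernels' region. Handles only simple, unlabelled
# loops; real tools work on a proper parse tree instead of raw text.
import re

ACC_OPEN = "!$acc kernels\n"
ACC_CLOSE = "!$acc end kernels\n"


def offload_do_loops(fortran_source: str) -> str:
    """Insert OpenACC directives around each top-level DO loop."""
    out, depth = [], 0
    for line in fortran_source.splitlines(keepends=True):
        stripped = line.strip().lower()
        if re.match(r"do\s+\w+\s*=", stripped):
            if depth == 0:  # annotate only the outermost loop of a nest
                out.append(ACC_OPEN)
            depth += 1
        out.append(line)
        if depth and (stripped.startswith("end do") or stripped == "enddo"):
            depth -= 1
            if depth == 0:
                out.append(ACC_CLOSE)
    return "".join(out)
```

Even this toy makes the trade-off visible: the translation must be regenerated whenever the source changes, which is why sustainable tooling and build integration feature so prominently in the talk.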
17:30 - 18:00 CEST
A GPU-Accelerated Implementation of the Semi-Implicit Barotropic Mode Solver for MPAS-Ocean

A semi-implicit barotropic mode solver for the Model for Prediction Across Scales-Ocean (MPAS-Ocean), the ocean component of the Energy Exascale Earth System Model (E3SM), has been ported to GPUs using OpenACC directives. Since the semi-implicit solver in MPAS-Ocean consists of a linear iterative solver and a preconditioner that requires linear algebra operations, we introduced the Matrix Algebra on GPU and Multicore Architectures (MAGMA) and cuBLAS libraries, collections of linear algebra routines for heterogeneous architectures. We applied several methodologies, such as algorithmic changes to the iterative solver, refactoring of loops, and GPU-aware Message Passing Interface calls for the global all-to-all node communications, to obtain optimized GPU performance. For the runtime of the main solver iterations, including data staging, we achieved a 5.4x speedup on 20 Summit nodes and a 1.4x speedup on 100 nodes. We will also show the performance of the GPU-accelerated solver using Cray LibSci_ACC, which supports the AMD MI250X GPUs on Frontier. We will briefly discuss a recent update to MPAS-Ocean that changed the baroclinic time-stepping method from forward-backward to second-order Adams-Bashforth, and its impact on computational efficiency and model accuracy. This research is still underway, so the methodologies may be further improved for better computational performance on GPUs.

Hyun-Gyu Kang, Youngsung Kim, and Sarat Sreepathi (Oak Ridge National Laboratory) and Luke Van Roekel (Los Alamos National Laboratory)
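The abstract does not name the iterative method or preconditioner, so as a representative stand-in for the solver pattern it describes, the sketch below shows a Jacobi-preconditioned conjugate gradient written against the NumPy API. Swapping numpy for cupy, which mirrors the NumPy API with cuBLAS-backed kernels, would run the same iteration on a GPU:

```python
# Representative sketch of a preconditioned iterative solver: Jacobi-
# preconditioned conjugate gradient for a symmetric positive-definite A.
# Illustrative only; the actual MPAS-Ocean solver and preconditioner differ.
import numpy as np


def pcg(A, b, tol=1e-8, max_iter=500):
    """Solve A x = b with conjugate gradient and a Jacobi preconditioner."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    M_inv = 1.0 / np.diag(A)          # Jacobi preconditioner: diag(A)^-1
    z = M_inv * r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break                     # residual small enough: converged
        z = M_inv * r                 # apply the preconditioner
        rz_new = r @ z
        p = z + (rz_new / rz) * p     # new search direction
        rz = rz_new
    return x
```

The distributed-memory machinery the abstract highlights, such as halo exchanges and the global all-to-all via GPU-aware MPI, sits around an iteration like this and is outside the scope of the sketch.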