Minisymposium
MS4A - GPU Acceleration in Earth System Modeling: Strategies, Automated Refactoring Tools, Benefits and Challenges
Description
Earth System models simulate the complex interactions of the atmosphere, oceans, land and sea ice, providing valuable insights for short-term weather forecasts and long-term climate research, which are important for understanding and mitigating the impacts of weather-related disasters and climate change. As the complexity and computational demands of these models increase, the need for Graphics Processing Unit (GPU) acceleration becomes increasingly apparent. GPUs are processors designed to support massive parallelism efficiently. Whilst several studies have shown promising computational performance from porting Earth System models to GPUs, thereby enabling higher-resolution simulations, some have also discussed the challenges of adapting existing codes to run on GPUs. To address these refactoring and portability issues, automated code-refactoring tools have been developed to make porting code to GPUs more efficient and to improve portability and maintainability. This minisymposium aims to bring together scientists, computational researchers, and model developers to explore the role of GPU acceleration in optimizing Earth System models, share experiences, and look to the future. Topics include optimization strategies (e.g., parallelization techniques, memory management, data transfer), automated code-refactoring tools (e.g., PSyclone), and the associated benefits and challenges (e.g., speedups, memory constraints, code management).
Presentations
PSyclone is a source-to-source code-generation and transformation system designed to enable performance portability and code maintainability for weather and climate codes written in Fortran. To achieve this, it separates the scientific source code from the optimisation and parallelisation steps, which are encoded as Python scripts. HPC experts can then prepare the PSyclone recipes needed to take advantage of each hardware platform without altering the domain-science code. PSyclone is being used to optimise unaltered, directly-addressed MPI applications, such as NEMO, and to offload their computations to GPUs. In this talk I will demonstrate the use and performance of PSyclone for a production configuration of NEMO used by the UK Met Office, and I will provide an update on the integration of PSyclone into the NEMO build system and its use by the NEMO community.
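To illustrate the separation described above, a PSyclone optimisation recipe is an ordinary Python script that walks the intermediate representation of the Fortran source and applies transformations to it. The sketch below is a minimal, hypothetical example of such a recipe, not the recipe used for NEMO; the trans() entry-point signature and the class and module names are assumptions that differ between PSyclone versions.

```python
# Minimal sketch of a PSyclone transformation recipe (illustrative only:
# the trans() signature and the class/module names are assumptions and
# may differ between PSyclone versions).
from psyclone.psyir.nodes import Loop
from psyclone.psyir.transformations import TransformationError
from psyclone.transformations import ACCKernelsTrans


def trans(psyir):
    """Enclose outermost loop nests in OpenACC 'kernels' regions."""
    acc_kernels = ACCKernelsTrans()
    for loop in psyir.walk(Loop):
        if loop.ancestor(Loop):
            # Inner loops are already covered by the enclosing region.
            continue
        try:
            acc_kernels.apply(loop)
        except TransformationError:
            # Loops that cannot be safely offloaded stay on the CPU.
            pass
```

A recipe of this kind is typically passed to the psyclone command via its -s option, leaving the Fortran science source untouched.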
Exploiting GPUs is both an opportunity and a challenge for weather and climate codes. They present an opportunity because their massive parallelism can allow these codes to achieve very high computational performance. They present a challenge because exploiting this parallelism can require the refactoring of many thousands of lines of science code and a programming model different from that of existing CPU code bases. In this presentation, we will describe the development of the GPU-enabled cloud microphysics scheme (CASIM) and radiation scheme (SOCRATES), and the Domain-Specific Language the Met Office is using to achieve performance portability for its new weather and climate model, LFRic, and the wider modelling system known as Momentum. We will show how the Met Office is using PSyclone, a domain-specific compiler, to keep a single source of science code whilst targeting multiple programming models for different processor architectures. The presentation will conclude with the strategy and progress for porting and optimizing Momentum for GPUs, and with how the PSyclone approach follows on from ORNL's experience of porting and optimizing CASIM and SOCRATES on GPUs.
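To make the single-source idea concrete, the sketch below shows how two different parallelisation strategies, OpenMP threading for CPUs and OpenACC offload for GPUs, could be selected for the same unmodified science code. The same caveats as the previous sketch apply: the build-time switch, the trans() signature and the transformation class names are assumptions, not the Met Office's actual recipes.

```python
# Illustrative sketch only: one science source, two optimisation strategies.
# The environment-variable switch, trans() signature and class names are
# assumptions for illustration.
import os

from psyclone.psyir.nodes import Loop
from psyclone.transformations import ACCKernelsTrans, OMPParallelLoopTrans

# Hypothetical build-time switch; in practice the build system would select
# a different recipe script for each target architecture.
TARGET = os.environ.get("TARGET_ARCH", "cpu")


def trans(psyir):
    """Parallelise outermost loops for the selected target architecture."""
    transform = ACCKernelsTrans() if TARGET == "gpu" else OMPParallelLoopTrans()
    for loop in psyir.walk(Loop):
        if loop.ancestor(Loop):
            continue
        transform.apply(loop)
```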
The use of GPUs and accelerator architectures has become widespread in high-performance computing to achieve unprecedented throughput and computational model performance. The efficient use of GPU architectures has long been envisioned at ECMWF, but it requires significant, often invasive code refactoring that can harm operational model performance on CPUs. In this talk we will describe the ongoing efforts at ECMWF to prepare the Integrated Forecasting System (IFS) for GPU accelerators through a combination of library development, data-structure refactoring and source-to-source translation. In close collaboration with ECMWF member states and supported by the Destination Earth initiative, we are aiming to restructure core model components of the IFS, and various technical infrastructure packages, to allow hybrid CPU-GPU execution. The focus is on sustainable solutions through modern software-engineering methods that allow adaptation of the code to multiple architectures for continuous performance evaluation. We will report on the progress of the GPU adaptation of the IFS, delivered via specific build modes, and provide an update on the adaptation of various sub-components. We will highlight specific code characteristics and the challenges they pose, and present initial performance results to assess the potential performance gains on current and future architectures.
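As a schematic illustration of what source-to-source translation means in this context (and not a description of ECMWF's actual tooling), the toy pass below takes a small, made-up Fortran kernel and inserts an OpenACC directive above its outer loop nest, so that the same source can be compiled for CPUs, where the directive is ignored, or offloaded to a GPU.

```python
# Toy source-to-source pass (illustrative only): wrap the outermost loop
# of a made-up Fortran kernel in an OpenACC 'parallel loop' directive.
import re

FORTRAN_KERNEL = """\
subroutine saturation_adjustment(nlev, ncol, t, q)
  integer, intent(in) :: nlev, ncol
  real, intent(inout) :: t(ncol, nlev), q(ncol, nlev)
  integer :: jk, jl
  do jk = 1, nlev
    do jl = 1, ncol
      t(jl, jk) = t(jl, jk) + 0.1 * q(jl, jk)
    end do
  end do
end subroutine saturation_adjustment
"""


def add_openacc(source: str) -> str:
    """Insert '!$acc parallel loop' before the first (outermost) 'do' loop."""
    out, done = [], False
    for line in source.splitlines():
        if not done and re.match(r"\s*do\b", line):
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f"{indent}!$acc parallel loop gang vector collapse(2)")
            done = True
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(add_openacc(FORTRAN_KERNEL))
```

Real translation tools operate on a proper parse tree rather than on text, but the end result is the same idea: the annotated code remains a single source that builds for both CPU and GPU.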
A semi-implicit barotropic mode solver for the Model for Prediction Across Scales Ocean (MPAS-Ocean), the ocean component of the Energy Exascale Earth System Model (E3SM), has been ported to GPUs using OpenACC directives. Since the semi-implicit solver in MPAS-Ocean consists of a linear iterative solver and a preconditioner that require linear algebra operations, we introduced the Matrix Algebra on GPU and Multicore Architectures (MAGMA) and cuBLAS libraries, which provide linear algebra routines for heterogeneous architectures. We applied several methodologies, such as algorithmic changes to the iterative solver, refactoring of loops, and GPU-aware Message Passing Interface (MPI) for the global all-to-all node communications, to obtain optimized GPU performance. For the runtime of the main solver iterations, including data staging, we achieved a 5.4x (1.4x) speedup on 20 (100) Summit nodes. We will also show the GPU-accelerated solver performance using Cray LibSci_ACC, which supports the AMD MI250X GPUs on Frontier. We will briefly discuss a recent update to MPAS-Ocean that changed the baroclinic time-stepping method from a forward-backward scheme to second-order Adams-Bashforth, and its impact on computational efficiency and model accuracy. This research is still underway, so the methodologies may be further improved for better computational performance on GPUs.
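To illustrate the structure of the solver described above (not the MPAS-Ocean implementation itself, whose Krylov method and preconditioner differ in detail), the sketch below shows a Jacobi-preconditioned conjugate gradient iteration in NumPy. In a GPU port, the vector and matrix operations of this kind are what get handed to OpenACC loops and libraries such as MAGMA or cuBLAS, and the global communication steps are where GPU-aware MPI enters.

```python
# Illustrative preconditioned conjugate gradient (PCG) sketch in NumPy;
# the test matrix, preconditioner and tolerances are arbitrary examples.
import numpy as np


def pcg(A, b, M_inv, tol=1e-8, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A with preconditioner M_inv."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv @ r              # apply the preconditioner
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv @ r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x


if __name__ == "__main__":
    n = 100
    # 1-D Laplacian-like tridiagonal test matrix (symmetric positive definite).
    A = (np.diag(2.0 * np.ones(n))
         + np.diag(-1.0 * np.ones(n - 1), 1)
         + np.diag(-1.0 * np.ones(n - 1), -1))
    b = np.ones(n)
    M_inv = np.diag(1.0 / np.diag(A))   # Jacobi preconditioner
    x = pcg(A, b, M_inv)
    print("residual norm:", np.linalg.norm(A @ x - b))
```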