
Minisymposium

MS5A - Bridging the Gap: Addressing Software Engineering Challenges for High Resolution Weather and Climate Simulations

Fully booked
Wednesday, June 5, 2024
9:00–11:00 CEST
HG F 1



Description

HPC is driving scientific progress in fields as diverse as weather forecasting, life sciences, and physics simulations. However, adapting to evolving hardware diversity is a major challenge for the climate, weather, and geoscience community, as their codes are often very large and have evolved over decades. This minisymposium focuses on the challenges faced by weather and climate model developers:

1. Optimising performance and parallel programming on diverse supercomputers, despite programming standards that often require manual recoding when transitioning between systems.
2. Exploring new tools and languages that offer better productivity and portability across different hardware without compromising performance.
3. Improving code modularity and software practices for large scientific code bases, adapting to evolving hardware and scientific needs.
4. Bridging the gap between domain scientists and research software engineers, balancing coding expertise with performance optimisation.

The minisymposium is twinned with the session “High resolution simulations on large HPC systems”, which takes a closer look at recent results gained on (pre-)exascale systems for different weather and climate models.

Presentations

9:00–9:30 CEST
Managing Complexity of Weather and Climate Code with Diversity of Skills and Workflows

LFRic is the new weather and climate model developed by the Met Office to replace the existing Unified Model (UM). LFRic is at the core of Momentum®, a new Unified Earth Environment Prediction Framework created by the Met Office and its partners to deliver a seamless modelling capability that meets the challenges of exascale computing. LFRic relies on fundamentally different data structures from the UM. These, together with the increase in resolution, lead to complex technical challenges, such as compiler support, high data volumes, and exploiting the opportunities presented by heterogeneous architectures. These fundamental changes mean changing several components of the operational forecast workflow. The key elements for the success of any complex programme are the expertise and skills of the people working on it. People working in NGMS need a wide range of skills at the intersection of several STEM-related areas, which poses challenges in identifying and developing technical expertise. In NGMS we draw on resources from multiple teams, as well as on collaboration with external partners. We are continuously widening and diversifying our talent pool, developing it through training and support, and improving our recruitment process to support this.

Iva Kavcic (Met Office)
With Thorsten Kurth (NVIDIA Inc.)
9:30–10:00 CEST
Portable Programming Approaches in the ICON Climate Model

The most powerful supercomputers in the (pre-)exascale class are often based on heterogeneous architectures and are produced by different vendors. To make use of these varied hardware resources, scientific applications must continuously increase their portability. However, the ICON model has been monolithic and Fortran-based for decades, with rather limited GPU support via OpenACC. Firstly, this prevents scientists from running the model on specific High Performance Computing nodes whose architecture is not yet supported by Fortran compilers. Secondly, it hinders the ability to use newer heterogeneous frameworks and application programming interfaces, which simplify development by ensuring portability without rewriting the code for each targeted architecture. To enhance the (performance) portability of the model, we investigated several C++ parallel programming approaches and propose a prototype solution that enables ICON to use different backends via a common interface. Additionally, this was aligned with the modularisation efforts, which allow incremental upgrades without disrupting existing functionality. In this talk, we show evaluation results in terms of performance, portability, and productivity using a standalone microphysics code extracted from ICON; furthermore, we discuss ways of integrating heterogeneous code into the Fortran repository.
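The "different backends via a common interface" pattern described above can be sketched in a few lines. The names and the toy kernel below are purely illustrative assumptions, not ICON's actual API: the point is only that model code depends on an abstract interface, so an architecture-specific backend can be swapped in without touching the numerics.

```python
# Illustrative sketch of a common-interface/backend pattern (hypothetical
# names, not ICON's actual interface): kernels are written once against an
# abstract class and dispatched to an architecture-specific backend.
import numpy as np

class Backend:
    """Abstract compute backend; concrete backends supply the kernels."""
    def saturation_adjustment(self, t, qv):
        raise NotImplementedError

class CpuBackend(Backend):
    def saturation_adjustment(self, t, qv):
        # Toy stand-in for a microphysics kernel: condense vapour above a
        # fixed saturation threshold and release latent heat.
        qsat = 0.01
        excess = np.maximum(qv - qsat, 0.0)
        return t + 2.5e3 * excess, qv - excess

def run_microphysics(backend: Backend, t, qv):
    # Model code depends only on the abstract interface; swapping in a GPU
    # backend would require no changes here.
    return backend.saturation_adjustment(t, qv)

t = np.array([280.0, 285.0])
qv = np.array([0.005, 0.02])
t_new, qv_new = run_microphysics(CpuBackend(), t, qv)
```

A GPU backend would subclass `Backend` with device arrays, which is the kind of incremental, non-disruptive upgrade the modularisation effort targets.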

Georgiana Mania (DKRZ)
10:00–10:30 CEST
What if Weather and Climate Models were Written in Python?

As of today, the majority of weather and climate models are implemented in Fortran and extended with a selection of compiler directives to enable execution on multicore CPUs, vector units, or graphics processing units (GPUs). The consequence is that developers struggle to keep pace with a rapidly evolving HPC software and hardware ecosystem and are unable to efficiently leverage current and emerging leadership-class supercomputing infrastructures. Developer productivity in resolving problems or introducing new features is also generally low.

But what if weather and climate models were implemented in Python? We present Pace, a Python-based, performance-portable implementation of a subset of the x-SHiELD model of NOAA/GFDL. We demonstrate scaling Pace to 4,000 GPUs on the Piz Daint supercomputer at CSCS and achieving a 3.9x speedup over the Fortran reference code on CPUs. Pace is a proof of concept that high-level languages like Python can achieve performance portability in atmospheric models and provide a more productive development environment. Additionally, Pace enables entirely novel use cases and workflows, such as easy integration of machine learning components, taking full advantage of the Python ecosystem. We finish with an outlook for the ICON model.
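To make the idea concrete, here is a minimal NumPy sketch of the high-level stencil style that Python-based models use: the numerics are written once, free of hand-coded loops and directives. This is a toy assumption for illustration only; Pace itself expresses such stencils in the GT4Py framework, which compiles them for CPU or GPU backends.

```python
# Toy NumPy sketch of a horizontal diffusion stencil written at a high
# level, independent of the target hardware (not Pace's actual code).
import numpy as np

def laplacian(field):
    # 5-point Laplacian evaluated on the interior of a 2-D field.
    return (field[:-2, 1:-1] + field[2:, 1:-1]
            + field[1:-1, :-2] + field[1:-1, 2:]
            - 4.0 * field[1:-1, 1:-1])

def diffuse(field, alpha, steps):
    # Explicit diffusion: in a DSL like GT4Py, a definition of this shape
    # is compiled into optimised backend-specific loops.
    out = field.copy()
    for _ in range(steps):
        out[1:-1, 1:-1] += alpha * laplacian(out)
    return out

f = np.zeros((5, 5))
f[2, 2] = 1.0          # a single spike in the centre
f = diffuse(f, alpha=0.1, steps=1)  # spreads mass to the 4 neighbours
```

Because the stencil is ordinary high-level code, it also composes naturally with the Python machine learning ecosystem mentioned in the abstract.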

Oliver Fuhrer (MeteoSwiss, ETH Zurich)
10:30–11:00 CEST
Loki: A Source-To-Source Translation Tool for Numerical Weather Prediction Codes and More

Established numerical weather prediction (NWP) and climate modeling codes, such as ECMWF's Integrated Forecasting System (IFS), have been developed over decades and comprise a large monolithic code base. Over the course of their lifetime, compute architectures have evolved from vector computers to distributed memory multi-processors and hybrid accelerated supercomputers. Not least thanks to the systems procured by the EuroHPC JU, NWP codes are run on a larger variety of systems and have to target diverse hardware architectures today. Meanwhile, the objective of performance portability using a single programming model remains elusive and accommodating multiple bespoke and sometimes conflicting optimisations for specific hardware architectures becomes increasingly unsustainable.

We present Loki, an open-source Python package purpose-built for the IFS that offers source-to-source translation capabilities for Fortran code. It provides experts with a freely-programmable API and inter-procedural analysis features to encode custom transformations that are applied programmatically across large source trees. Loki is a cornerstone of the GPU adaptation strategy for ECMWF's IFS and has been deployed successfully to adapt multiple components of the forecast model for GPU execution. Recently, other users in the weather and climate community have started evaluating Loki for their own code adaptation work.
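The mechanical flavour of source-to-source translation can be illustrated with a toy transformation that wraps Fortran loops in OpenACC directives. This is plain string processing written for this sketch, not Loki's actual API (Loki works on a full intermediate representation with inter-procedural analysis), but it shows the kind of edit such a tool applies programmatically across thousands of routines.

```python
# Toy source-to-source pass (illustrative only, not Loki's API): insert
# an OpenACC "parallel loop" directive around each Fortran DO loop.
import re

def add_acc_parallel(fortran_src: str) -> str:
    out = []
    for line in fortran_src.splitlines():
        stripped = line.strip().lower()
        indent = line[: len(line) - len(line.lstrip())]
        if re.match(r"do\s+\w+\s*=", stripped):
            # Directive goes immediately before the loop header.
            out.append(indent + "!$acc parallel loop")
        out.append(line)
        if stripped == "end do":
            # Close the construct after the matching END DO.
            out.append(indent + "!$acc end parallel loop")
    return "\n".join(out)

src = """subroutine saxpy(n, a, x, y)
  do i = 1, n
    y(i) = a * x(i) + y(i)
  end do
end subroutine saxpy"""
transformed = add_acc_parallel(src)
```

A real tool applies such transformations from an abstract syntax tree rather than text, so the same recipe can be retargeted (e.g. to a different directive set) without rewriting the source by hand.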

Balthasar Reuter and Michael Lange (ECMWF)