Minisymposium

MS6D - What the FORTRAN? Lost in Formula Translation

Wednesday, June 5, 2024
11:30 - 13:30 CEST
HG E 1.2

Session Chair

Thorsten Kurth (NVIDIA Inc.)

Description

Fortran, the primary programming language underpinning many operational weather and climate codes, was built around the fundamental principle that performance optimisation is left to the compiler. However, with the emergence of GPU accelerators, significant refactoring, often beyond the simple addition of pragmas, is needed to achieve good GPU performance on established, operational, vectorised CPU code. This has led to the rise of DSLs and source-to-source methods that often use elements from compiler theory to bridge the CPU-GPU gap, leaving one unspoken question unanswered: Why the FORTRAN does my compiler not do this for me? In this minisymposium we aim to explore this question by looking at ECMWF's CLOUDSC benchmark - an NWP mini-app designed to assess (and torture) compilers. For this benchmark, many GPU-optimised flavours exist, including Fortran- and C-based offload flavours (OpenACC, OpenMP, CUDA, HIP), which provide an established performance baseline on different GPU architectures. Instead of further optimising these with more intrusive code changes, we ask the question: "How close to the original vector-style Fortran code can we get without sacrificing performance?" We aim to explore this question with technologists and compiler enthusiasts from across the HPC and academic spectrum.
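
The abstracts below refer repeatedly to vector-style Fortran and to pragma-based offload. As a rough, hedged illustration (a hypothetical saturation-style kernel, not actual CLOUDSC code), the following sketch shows the NPROMA-blocked loop shape typical of such codes, with an OpenACC directive standing in for the "simple addition of pragmas" mentioned above:

    subroutine satur_sketch(nproma, nlev, t, q, qsat)
      ! Hypothetical CLOUDSC-style loop nest: vector-style Fortran with NPROMA blocking.
      ! The OpenACC directive is the minimally intrusive offload step; operational
      ! kernels usually need further restructuring to run well on GPUs.
      implicit none
      integer, intent(in)  :: nproma, nlev
      real,    intent(in)  :: t(nproma, nlev), q(nproma, nlev)
      real,    intent(out) :: qsat(nproma, nlev)
      integer :: jk, jl

      !$acc parallel loop collapse(2) copyin(t, q) copyout(qsat)
      do jk = 1, nlev
        do jl = 1, nproma   ! innermost loop vectorises on CPU, maps to GPU threads
          ! Placeholder arithmetic standing in for the real microphysics.
          qsat(jl, jk) = q(jl, jk) * exp(0.05 * (t(jl, jk) - 273.15))
        end do
      end do
    end subroutine satur_sketch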

Presentations

11:30 - 12:00 CEST
The Role of Optimising Compilers in the GPU Programming Evolution

GPU programming has traditionally centered on established languages like C/C++, Fortran and, more recently, Python. Despite this, a diverse range of programming models and APIs exists for specifying computation offloading to GPUs. Initially, GPU programming relied on target-specific offload APIs such as CUDA. However, a notable shift has occurred with the emergence of higher-level pragma-based models like OpenACC. These models empower developers to abstract away from intricate hardware details, enabling them to write code that expresses the essential properties of an application rather than dictating how it should be mapped to heterogeneous hardware. This evolution extends to standard Fortran, which now provides language features that facilitate offloading to GPUs. Such a rise in the level of programming abstraction is enabled by advances in optimizing compilers, which perform common code optimizations automatically and reduce the need for routine manual code tuning for each target platform. This presentation will focus on NVIDIA's HPC Compilers, exploring the supported programming models and emphasizing the need for different programming abstractions, e.g. standard Fortran, OpenACC, or CUDA Fortran, as well as the compiler's role in supporting each of them. To exemplify the differences in programming experience, references will be made to CLOUDSC and other relevant weather and climate codes.
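
As a minimal sketch of the "standard Fortran" end of this abstraction spectrum (assuming the nvfortran -stdpar=gpu compilation path; the kernel and variable names are hypothetical), the same kind of loop can be written as a plain do concurrent and left entirely to the compiler to map onto the GPU:

    program stdpar_sketch
      ! Standard-Fortran parallelism: no directives, no GPU details in the source.
      ! With the NVIDIA HPC compilers this can be offloaded via: nvfortran -stdpar=gpu
      implicit none
      integer, parameter :: nproma = 128, nlev = 137
      real :: t(nproma, nlev), q(nproma, nlev), qsat(nproma, nlev)
      integer :: jk, jl

      t = 280.0
      q = 1.0e-3

      do concurrent (jk = 1:nlev, jl = 1:nproma)
        ! Placeholder arithmetic standing in for the real microphysics.
        qsat(jl, jk) = q(jl, jk) * exp(0.05 * (t(jl, jk) - 273.15))
      end do

      print *, 'qsat(1,1) =', qsat(1, 1)
    end program stdpar_sketch

OpenACC and CUDA Fortran sit at progressively lower abstraction levels, exposing first the parallelism mapping and then the full kernel and data-movement details to the programmer.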

Anastasia Stulova (NVIDIA Inc.)
12:00 - 12:30 CEST
Making Fortran Fly on AMD Instinct Accelerators

Fortran has played, and continues to play, a vital role in the HPC ecosystem, especially in the domain of weather forecasting. AMD is a major contributor to the LLVM/Flang compiler infrastructure and plays a key role in establishing an LLVM-based Fortran compiler for both CPUs and GPUs. In this talk, we will briefly recap the AMD EPYC(tm) processor architecture and the AMD Instinct(tm) GPU and APU platforms, as well as the available AMD software ecosystem. We will then turn to how AMD GPUs and APUs can be used from Fortran code and how compute kernels can be offloaded to them. In particular, we will focus on the ECMWF CLOUDSC microbenchmark, a set of resource-hungry (register-hungry) compute kernels, as our barometer of offloaded kernel compute performance. We will show some of the tuning opportunities available to programmers and compare performance with the native HIP implementation on the MI250X and MI300A architectures.
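
As a hedged sketch of the OpenMP target offload path discussed here (hypothetical kernel; the compiler invocation is an assumption, e.g. flang -fopenmp --offload-arch=gfx90a for MI250X or gfx942 for MI300A):

    subroutine satur_omp_sketch(nproma, nlev, t, q, qsat)
      ! OpenMP target offload of a CLOUDSC-like loop nest to an AMD Instinct GPU/APU.
      implicit none
      integer, intent(in)  :: nproma, nlev
      real,    intent(in)  :: t(nproma, nlev), q(nproma, nlev)
      real,    intent(out) :: qsat(nproma, nlev)
      integer :: jk, jl

      !$omp target teams distribute parallel do collapse(2) map(to: t, q) map(from: qsat)
      do jk = 1, nlev
        do jl = 1, nproma
          ! Placeholder arithmetic; the real CLOUDSC kernels keep far more state
          ! live per grid point, which is what makes them register hungry.
          qsat(jl, jk) = q(jl, jk) * exp(0.05 * (t(jl, jk) - 273.15))
        end do
      end do
    end subroutine satur_omp_sketch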

Michael Klemm (AMD, OpenMP) and Paul Mullowney (AMD)
12:30 - 13:00 CEST
Impact of Code Refactoring for Heterogeneous Platforms to Performance Portability

We investigate how code refactoring of the CLOUDSC benchmark impacts performance portability across different hardware platforms and code revisions. For hardware platforms, we choose Intel® Xeon® processor CPUs and Intel® Data Center GPU Max Series GPUs. For code revisions, we choose the Fortran with OpenMP, Fortran with OpenMP offload, and SYCL versions of the CLOUDSC benchmark. Besides performance, we discuss the main challenges identified in cross-platform comparisons and propose possible methods to resolve them.
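
To make the compared revisions concrete, a minimal hedged sketch of the CPU-side "Fortran with OpenMP" variant is given below (hypothetical loop; the offload revision applies target directives to the same nest, as in the sketch above, and the SYCL revision expresses it in C++):

    subroutine satur_cpu_sketch(nproma, nlev, t, q, qsat)
      ! CPU revision: OpenMP threads across the outer loop, SIMD across the NPROMA block.
      implicit none
      integer, intent(in)  :: nproma, nlev
      real,    intent(in)  :: t(nproma, nlev), q(nproma, nlev)
      real,    intent(out) :: qsat(nproma, nlev)
      integer :: jk, jl

      !$omp parallel do private(jl)
      do jk = 1, nlev
        !$omp simd
        do jl = 1, nproma
          ! Placeholder arithmetic standing in for the real microphysics.
          qsat(jl, jk) = q(jl, jk) * exp(0.05 * (t(jl, jk) - 273.15))
        end do
      end do
    end subroutine satur_cpu_sketch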

Mikko Byckling, Camilo Moreno, and Jacob Poulsen (Intel Corporation)
13:00 - 13:30 CEST
Same Fortran, New Performance Portability

Fortran is the dominant language on many supercomputers. Fortran codes are highly tuned for CPU performance but struggle to achieve comparable performance on GPUs, despite the increasing adoption of GPUs in supercomputers all over the world. To better support Fortran applications in HPC, we automatically translate Fortran to a flexible, data-centric representation, preserving the rich semantic information the language offers. We leverage this information, along with data-centric transformations, to express both functional and data parallelism and to generate efficient code for both CPUs and GPUs. Using a representative code from an operational weather forecasting model as a case study, we create a perfectly data-parallel program representation and, without changing the Fortran code, generate CPU and GPU implementations whose runtimes outperform both serial and OpenMP-parallel CPU implementations and are on par with the best manually written GPU versions.
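
As a hedged illustration of what "perfectly data-parallel" means for this kind of code (a hypothetical example, not the actual translated representation), the horizontal dimension of a column-physics kernel is fully independent even when the vertical dimension carries a dependence; this is the structure that automatic translation can expose and map to both CPUs and GPUs:

    subroutine column_sketch(nproma, nlev, src, flux)
      ! Each horizontal point (column) is independent, so the jl loop can become a
      ! parallel map, while the jk recurrence stays sequential within each column.
      implicit none
      integer, intent(in)    :: nproma, nlev
      real,    intent(in)    :: src(nproma, nlev)
      real,    intent(inout) :: flux(nproma, nlev)
      integer :: jk, jl

      do concurrent (jl = 1:nproma)
        do jk = 2, nlev
          flux(jl, jk) = flux(jl, jk - 1) + src(jl, jk)
        end do
      end do
    end subroutine column_sketch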

Alexandru Calotoiu and Torsten Hoefler (ETH Zurich)