
Minisymposium Presentation

Making Fortran Fly on AMD Instinct Accelerators

Wednesday, June 5, 2024, 12:00 - 12:30 CEST
Climate, Weather and Earth Sciences
Chemistry and Materials
Computer Science and Applied Mathematics
Humanities and Social Sciences
Engineering
Life Sciences
Physics

Presenter

Michael Klemm - AMD

Dr. Michael Klemm is a Principal Member of Technical Staff in the Compilers, Languages, Runtimes & Tools team of the Machine Learning & Software Engineering group at AMD. He is part of the OpenMP compiler team, focusing on application and kernel performance for AMD Instinct accelerators for High Performance and Throughput Computing. He holds an M.Sc. in Computer Science and a Doctor of Engineering degree (Dr.-Ing.) in Computer Science from the Friedrich-Alexander-University Erlangen-Nuremberg, Germany. Michael's research focus is on compilers and runtime optimizations for distributed systems. His areas of interest include compiler construction, design of programming languages, parallel programming, and performance analysis and tuning. Michael is the Chief Executive Officer of the OpenMP Architecture Review Board.

Description

Fortran has played, and continues to play, a vital role in the HPC ecosystem, especially in the domain of weather forecasting. AMD is a key contributor to the LLVM/Flang compiler infrastructure and plays a key role in establishing an LLVM-based Fortran compiler for both CPUs and GPUs. In this talk, we will briefly recap the AMD EPYC(tm) processor architecture and the AMD Instinct(tm) GPU and APU platforms, as well as the available AMD software ecosystem. We will then turn to how AMD GPUs and APUs can be used from Fortran code and how compute kernels can be offloaded to them. In particular, we will focus on the ECMWF CloudSC microbenchmarks, a set of resource-hungry (register-hungry) compute kernels, as our barometer of offloaded kernel compute performance. We will show some of the tuning opportunities available to programmers and compare performance with a native HIP implementation on the MI250X and MI300A architectures.
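
To illustrate the kind of directive-based offloading the talk discusses, here is a minimal, hypothetical Fortran sketch that offloads a simple elementwise loop to a GPU with OpenMP target directives. The program name, arrays, and build line are illustrative assumptions and are not taken from the CloudSC benchmark.

! Minimal sketch (assumption, not CloudSC code): offloading a simple
! elementwise kernel with OpenMP target directives. A possible build line
! with an LLVM/Flang-based compiler could look like
!   amdflang -fopenmp --offload-arch=gfx90a offload_sketch.f90
! though the exact compiler name and flags depend on the ROCm release.
program offload_sketch
  implicit none
  integer, parameter :: n = 1000000
  integer :: i
  real(8), allocatable :: a(:), b(:), c(:)

  allocate(a(n), b(n), c(n))
  a = 1.5d0
  b = 2.0d0

  ! Map the inputs to the device, run the loop there, copy the result back.
  !$omp target teams distribute parallel do map(to: a, b) map(from: c)
  do i = 1, n
     c(i) = a(i) * b(i) + sqrt(a(i))
  end do
  !$omp end target teams distribute parallel do

  print *, 'c(1) =', c(1)
  deallocate(a, b, c)
end program offload_sketch

For register-hungry kernels such as those in CloudSC, standard OpenMP clauses like num_teams, thread_limit, and collapse on the target construct are among the tuning knobs of the kind the talk refers to.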

Authors