Minisymposium

MS6B - Motif-Based Automated Performance Engineering for HPC

Fully booked

Wednesday, June 5, 2024

11:30

13:30

CEST

HG F 3

Session Chair

Phillip

Colella

Lawrence Berkeley National Laboratory

Description

This minisymposium describes domain-specific libraries (DSLs) that express mathematical and programming motifs (data objects and the operations on those data objects), along with software back-ends that translate the library calls into high-performance code. With a motif-aware software stack, the scientific application code is much smaller than fully optimized code, and the application-level code remains unchanged when moving between platforms, which leads to a less expensive development process. The four talks cover multiple motifs and different approaches to supporting motif-based DSLs. (1) George Bisbas (ICL) will talk about an approach to structured-grid DSLs based on lowering abstractions written in Python to the LLVM Multi-Level Intermediate Representation (MLIR). (2) Het Mankad (CMU / ORNL) will talk about Proto / ProtoX, a DSL for the structured-grid motif that targets CPUs and GPUs, based on the Spiral toolchain. (3) Sanil Rao (CMU) will talk about FFTX, which supports FFTs on CPU and GPU systems, based on the Spiral toolchain. (4) Sam Reeve (ORNL) will talk about Cabana, a DSL for grid-free particle methods and hybrid particle / mesh methods on GPUs, based on the Kokkos run-time libraries for GPU parallelism.

Presentations

11:30

12:00

CEST

A Shared Compilation Stack for HPC Stencil DSLs

Domain Specific Languages can massively improve computational science productivity and also provide high performance. High-level DSLs capture abstractions that the DSL compiler can exploit to target current- and next-generation supercomputers. Consequently, there have been many DSL projects, notably in finite-difference stencil computations - but implementations fail to share code and fail to harness combined developer effort. A large portion of their code base is dedicated to reasoning about generic HPC concepts, such as generation of directives for shared-memory parallelism, message-passing communications for distributed-memory parallelism, vectorization, arithmetic (factorization, sub-expression elimination), and loop optimizations (blocking, fusion, fission). These general-purpose optimizations are often combined with domain-specific ones to maximize performance. This talk presents joint work spanning three stencil DSL projects, which aims to realize this lost potential - in Devito, Psyclone, and the Open Earth Compiler. We present a) how we tailor the widely adopted MLIR compiler framework to support optimizations suitable for FD-stencil computations and generate HPC-ready code running on multi-node CPUs and single-node GPUs, and b) how, for example, in Devito, we can leverage these HPC-enabling contributed abstractions in MLIR and achieve better or on-par performance by building on MLIR dialects and transformations that are shared across all three projects.

George Bisbas (Imperial College London); Anton Lydike, Emilien Bauer, Nick Brown, and Mathieu Fehr (University of Edinburgh); Paul H.J. Kelly (Imperial College London); and Tobias Grosser (University of Cambridge)

With Thorsten Kurth (NVIDIA Inc.)

12:00

12:30

CEST

ProtoX: A Code Generation Framework for Stencil Operations

ProtoX is a code generation framework for stencil and pointwise operations - the key components in numerically approximating the solution to various partial differential equations (PDEs). The frontend for ProtoX uses Proto - a C++-based domain-specific library that provides a high level of abstraction and an intuitive interface that streamlines the design and scheduling of algorithms for solving PDEs numerically on structured grids. The high-level abstractions used in Proto can be fused together to improve on its current performance. However, a compiler cannot easily perform this abstraction fusion. To overcome this issue, ProtoX uses SPIRAL as its backend. SPIRAL is a code generation system that focuses on generating highly optimized target code in C/C++. The resulting performance gain in ProtoX is demonstrated on examples such as the 2D Poisson problem and the 2D and 3D Euler equations used in the study of gas dynamics. Results from CPU and GPU implementations will be discussed.
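As a rough illustration of the idea in this abstract, the sketch below contrasts a chain of separate stencil and pointwise operations with a hand-fused loop for one Jacobi sweep of the 2D Poisson problem. It is plain Python written for this listing, not the Proto or ProtoX API, and the names `laplacian_5pt`, `axpy`, `jacobi_unfused`, and `jacobi_fused` are hypothetical. The fusion is done by hand here; it only shows the kind of transformation a code generator might automate.

```python
# Illustrative sketch only: plain Python, NOT the Proto/ProtoX/SPIRAL API.
# All function names here are hypothetical.

def laplacian_5pt(u, h):
    """Stencil operation: 5-point Laplacian on interior points (0 on the boundary)."""
    n, m = len(u), len(u[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            s = u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
            out[i][j] = (s - 4.0 * u[i][j]) / (h * h)
    return out


def axpy(a, x, y):
    """Pointwise operation: a * x + y, element by element."""
    return [[a * xv + yv for xv, yv in zip(xr, yr)] for xr, yr in zip(x, y)]


def jacobi_unfused(u, f, h):
    """One Jacobi sweep for Lap(u) = f as three separate passes with temporaries."""
    lap = laplacian_5pt(u, h)              # pass 1: stencil, temporary grid
    resid = axpy(-1.0, f, lap)             # pass 2: pointwise, lap - f
    cand = axpy(h * h / 4.0, resid, u)     # pass 3: pointwise, u + (h^2/4) * resid
    out = [row[:] for row in u]            # boundary values are kept from u
    for i in range(1, len(u) - 1):
        for j in range(1, len(u[0]) - 1):
            out[i][j] = cand[i][j]
    return out


def jacobi_fused(u, f, h):
    """Same sweep with stencil and pointwise steps fused into one loop, no temporaries."""
    out = [row[:] for row in u]
    c = h * h / 4.0
    for i in range(1, len(u) - 1):
        for j in range(1, len(u[0]) - 1):
            s = u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
            lap = (s - 4.0 * u[i][j]) / (h * h)
            out[i][j] = c * (lap - f[i][j]) + u[i][j]
    return out
```

Both functions compute the same update; the fused version reads the grid once and allocates no intermediate grids, which is the kind of saving that fusing high-level abstractions is meant to deliver.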

Het Yagnesh Mankad (Oak Ridge National Laboratory), Sanil Rao (Carnegie Mellon University), Phillip Colella and Brian Van Straalen (Lawrence Berkeley National Laboratory), and Franz Franchetti (Carnegie Mellon University)

With Thorsten Kurth (NVIDIA Inc.)

12:30

13:00

CEST

Scalable and Performance Portable Particle and Structured-Mesh Simulation with Cabana

We present Cabana, a performance portable library for building scientific applications, including mesh-free techniques from atomistic (molecular dynamics) to cosmology (N-body), hybrid particle-mesh (e.g. particle-in-cell), and structured grid simulation. Cabana was created through the U.S. Department of Energy Exascale Computing Project to enable particle simulations on exascale supercomputers, as well as local workstations. Cabana uses a Kokkos+MPI strategy to separate the concerns of the application physics from the threaded parallelism and vendor backends, as well as from domain decomposition and distributed parallelism. Cabana implements data structure, parallelism, and algorithmic extensions to Kokkos for both particles and structured grids, as well as MPI communication for both. Examples of performance engineering on leadership supercomputers across critical application kernels will be presented for mesh-free, hybrid particle-mesh, and structured mesh applications. We will next discuss how to create Cabana-based applications or to adopt it within an existing application and highlight recent scientific results obtained with Cabana codes across fracture mechanics, materials manufacturing, and plasma physics.

Sam Reeve, Lance Bullerwell, Kwitae Chong, David Joy, John Coleman, Pablo Seleson, Steve DeWitt, Matt Rolchigo, Jamie Stump, Wenjun Ge, Tim Younkin, and Stuart Slattery (Oak Ridge National Laboratory)

With Thorsten Kurth (NVIDIA Inc.)

13:00

13:30

CEST

FFTX, SpectralPack and Beyond

We present the design of the API and runtime environment of FFTX as well as future project directions. FFTX is developed as part of the DOE ExaScale effort by LBL, Carnegie Mellon University, and SpiralGen, Inc. We aim to translate the LAPACK/BLAS approach from the numerical linear algebra world to the spectral algorithm domain. FFTX extends and updates FFTW for the exascale era and beyond while providing backwards compatibility. Unlike traditional math libraries, FFTX uses the SPIRAL code generation system and runtime compilation for its backend implementation and execution. A key innovation is the concept of "integrated algorithms" that allows for cross-library call optimization. We will discuss how we are leveraging the FFTX software stack post-ECP for cross-motif applications.

Sanil Rao (Carnegie Mellon University)

With Thorsten Kurth (NVIDIA Inc.)
