Minisymposium
MS6B - Motif-Based Automated Performance Engineering for HPC
Description
This minisymposium describes domain-specific libraries (DSLs) that express mathematical and programming motifs (data objects and the operations on those data objects), along with the software back-ends that translate library calls into high-performance code. With a motif-aware software stack, the scientific application code is much smaller than fully optimized code, and the application-level code remains unchanged when moving between platforms, which makes development less expensive. The four talks cover multiple motifs and different approaches to supporting motif-based DSLs. (1) George Bisbas (ICL) will talk about an approach to structured-grid DSLs that lowers abstractions written in Python to the LLVM Multi-Level Intermediate Representation (MLIR). (2) Het Mankad (CMU / ORNL) will talk about Proto / ProtoX, a DSL for the structured-grid motif that targets CPUs and GPUs, based on the Spiral toolchain. (3) Sanil Rao (CMU) will talk about FFTX, which supports FFTs on CPU and GPU systems, based on the Spiral toolchain. (4) Sam Reeve (ORNL) will talk about Cabana, a DSL for grid-free particle methods and hybrid particle / mesh methods on GPUs, based on the Kokkos run-time libraries for GPU parallelism.
Presentations
Domain-specific languages (DSLs) can massively improve computational science productivity and also provide high performance. High-level DSLs capture abstractions that the DSL compiler can exploit to target current- and next-generation supercomputers. Consequently, there have been many DSL projects, notably in finite-difference (FD) stencil computations, but implementations fail to share code and fail to harness combined developer effort. A large portion of their code base is dedicated to reasoning about generic HPC concepts, such as generation of directives for shared-memory parallelism, message-passing communications for distributed-memory parallelism, vectorization, arithmetic optimizations (factorization, sub-expression elimination), and loop optimizations (blocking, fusion, fission). These general-purpose optimizations are often combined with domain-specific ones to maximize performance. This talk presents joint work spanning three stencil DSL projects (Devito, PSyclone, and the Open Earth Compiler) that aims to realize this lost potential. We present a) how we tailor the widely adopted MLIR compiler framework to support optimizations suitable for FD-stencil computations and generate HPC-ready code running on multi-node CPUs and single-node GPUs, and b) how, for example, in Devito, we can leverage these HPC-enabling contributed abstractions in MLIR and achieve better or on-par performance by building on MLIR dialects and transformations that are shared across all three projects.
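As a rough illustration of one of the generic loop optimizations named in this abstract, the sketch below applies loop blocking (tiling) to a 5-point finite-difference stencil in plain Python. It is not code from Devito, PSyclone, the Open Earth Compiler, or MLIR; the function names and the `block` parameter are hypothetical, and the example only shows that the blocked traversal visits the same points and produces the same result as the plain loop nest.

```python
def five_point(u, i, j):
    """Unscaled 5-point Laplacian stencil evaluated at interior point (i, j)."""
    return u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1] - 4.0 * u[i][j]


def laplacian(u):
    """Reference version: one plain loop nest over all interior points."""
    n, m = len(u), len(u[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = five_point(u, i, j)
    return out


def laplacian_blocked(u, block=4):
    """Blocked version: the interior is traversed tile by tile.

    The outer two loops pick the tile origin; the inner two loops sweep
    one tile, clipped at the grid boundary. `block` is a hypothetical
    tuning parameter (the tile edge length).
    """
    n, m = len(u), len(u[0])
    out = [[0.0] * m for _ in range(n)]
    for ii in range(1, n - 1, block):
        for jj in range(1, m - 1, block):
            for i in range(ii, min(ii + block, n - 1)):
                for j in range(jj, min(jj + block, m - 1)):
                    out[i][j] = five_point(u, i, j)
    return out
```

In a compiled setting the tile size would be chosen so that one tile's working set fits in cache; here the point is only that the transformation preserves the result, which is what allows a compiler to apply it automatically.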
ProtoX is a code generation framework for stencil and pointwise operations, the key components in numerically approximating solutions to partial differential equations (PDEs). Its frontend is Proto, a C++-based domain-specific library whose high-level abstractions and intuitive interface streamline the design and scheduling of algorithms for solving PDEs numerically on structured grids. The high-level abstractions used in Proto can be fused together to improve performance. However, a compiler cannot easily perform this abstraction fusion. To overcome this issue, ProtoX uses SPIRAL as its backend. SPIRAL is a code generation system that focuses on generating highly optimized target code in C/C++. The resulting performance gain in ProtoX is demonstrated on examples such as the 2D Poisson problem and the 2D and 3D Euler equations used in the study of gas dynamics. Results from CPU and GPU implementations will be discussed.
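To make the idea of fusing a stencil operation with a pointwise operation concrete, here is a minimal sketch in plain Python of one Jacobi sweep for the 2D Poisson problem. This is an assumption about what fusion can look like in the simplest case, not Proto or ProtoX code and not SPIRAL output; the function names are hypothetical. The unfused version runs a stencil pass that writes a temporary grid and then a pointwise pass; the fused version does both in a single sweep with no temporary.

```python
def jacobi_unfused(u, f, h):
    """One Jacobi sweep for laplacian(u) = f, as two separate passes."""
    n, m = len(u), len(u[0])
    # Pass 1 (stencil): sum of the four neighbors, stored in a temporary grid.
    nbr = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            nbr[i][j] = u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
    # Pass 2 (pointwise): combine the neighbor sum with the right-hand side.
    new = [row[:] for row in u]  # boundary values are carried over unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (nbr[i][j] - h * h * f[i][j])
    return new


def jacobi_fused(u, f, h):
    """Same sweep with the stencil and pointwise steps fused into one loop."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]  # boundary values are carried over unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (
                u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                - h * h * f[i][j]
            )
    return new
```

The fused form removes one full pass over memory and the temporary grid. Doing this by hand is easy for two operators; the abstract's point is that doing it automatically across library-level abstractions is hard for a general-purpose compiler.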
We present Cabana, a performance-portable library for building scientific applications, including mesh-free techniques from atomistic (molecular dynamics) to cosmology (N-body), hybrid particle-mesh (e.g. particle-in-cell), and structured-grid simulation. Cabana was created through the U.S. Department of Energy Exascale Computing Project to enable particle simulations on exascale supercomputers, as well as local workstations. Cabana uses a Kokkos+MPI strategy to separate the concerns of the application physics from the threaded parallelism and vendor backends, as well as from domain decomposition and distributed parallelism. Cabana implements data structure, parallelism, and algorithmic extensions to Kokkos for both particles and structured grids, as well as MPI communication for both. Examples of performance engineering on leadership supercomputers across critical application kernels will be presented for mesh-free, hybrid particle-mesh, and structured-mesh applications. We will then discuss how to create Cabana-based applications or adopt Cabana within an existing application, and highlight recent scientific results obtained with Cabana codes across fracture mechanics, materials manufacturing, and plasma physics.
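For readers unfamiliar with the hybrid particle-mesh motif mentioned in this abstract, the sketch below shows a serial, 1D cloud-in-cell deposition of particle weights onto a periodic grid, which is the particle-to-grid step of a particle-in-cell method. It is written in plain Python and does not use or imitate the Cabana or Kokkos APIs; the function name and its parameters are hypothetical.

```python
import math


def deposit_cic(positions, weights, n_cells, dx):
    """Deposit particle weights onto a periodic 1D grid (cloud-in-cell).

    Each particle shares its weight linearly between the two nearest grid
    nodes, so the total deposited weight equals the total particle weight.
    """
    grid = [0.0] * n_cells
    for x, w in zip(positions, weights):
        s = x / dx             # position in units of the grid spacing
        i = math.floor(s)      # index of the node to the left of the particle
        frac = s - i           # fractional distance from that node, in [0, 1)
        grid[i % n_cells] += w * (1.0 - frac)
        grid[(i + 1) % n_cells] += w * frac   # wraps around at the boundary
    return grid
```

A usage example: `deposit_cic([0.25, 1.5, 3.75], [1.0, 2.0, 1.0], 4, 1.0)` returns `[1.5, 1.25, 1.0, 0.25]`, which sums to the total particle weight of 4.0. In a threaded or GPU setting the two `+=` updates would need atomic or otherwise conflict-free accumulation, which is one of the concerns a performance-portability layer takes off the application code.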
We present the design of the API and runtime environment of FFTX as well as future project directions. FFTX is developed as part of the DOE ExaScale effort by LBL, Carnegie Mellon University, and SpiralGen, Inc. We aim to translate the LAPACK/BLAS approach from the numerical linear algebra world to the spectral algorithm domain. FFTX extends and updates FFTW for the exascale era and beyond while providing backwards compatibility. Unlike traditional math libraries, FFTX uses the SPIRAL code generation system and runtime compilation as its backend for implementation and execution. A key innovation is the concept of "integrated algorithms", which allows for cross-library call optimization. We will discuss how we are leveraging the FFTX software stack post-ECP for cross-motif applications.
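A typical spectral algorithm is a short chain of transform calls with pointwise work in between, which is the kind of multi-call pattern that optimization across library calls could target. The sketch below, in plain Python, computes a circular convolution as forward transform, pointwise multiply, inverse transform. It is not the FFTX or FFTW API: all function names are hypothetical, and a naive O(n^2) DFT stands in for a real FFT so that the example stays self-contained.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT library call)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)  # 1/n normalization on the inverse
    return out


def circular_convolve_spectral(a, b):
    """Circular convolution as three stages: forward, pointwise, inverse."""
    fa = dft(a)                                  # stage 1: forward transforms
    fb = dft(b)
    prod = [p * q for p, q in zip(fa, fb)]       # stage 2: pointwise multiply
    result = dft(prod, inverse=True)             # stage 3: inverse transform
    return [z.real for z in result]              # real inputs give a real result


def circular_convolve_direct(a, b):
    """Direct O(n^2) circular convolution, used here only as a cross-check."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]
```

When each stage is a separate library call, every intermediate array is written to and read back from memory. Treating the whole chain as one unit is what would let a code generator merge the stages, under the assumption that this is the kind of pattern "integrated algorithms" refers to.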