Paper
Enabling Performance Portability for Shallow Water Equations on CPUs, GPUs, and FPGAs with SYCL
Presenter
Description
In order to make the best use of the diverse hardware architectures in present and future high-performance computers, developers and maintainers of scientific simulation codes strive for performance portability. The goal is to reach a good fraction of the hardware-specific practically achievable performance while maintaining a largely unified codebase. In benchmarks and first production codes, SYCL has been demonstrated to be a promising programming model for this purpose when targeting different CPU and GPUs. In this work, we utilize SYCL to develop a performance portable implementation of the 2D shallow water equations, discretized on unstructured triangular meshes using the discontinuous Galerkin method with polynomial orders zero, one, and two. In addition to GPUs from three and CPUs from two vendors, we also broaden the scope of target architectures by including Intel Stratix FPGAs with a fundamentally different execution model. We show that with a few targeted and encapsulated specializations, it is possible to adapt the execution flow to the respective targets. The performance analysis shows how FPGAs complement the other two architectures with particularly good performance for small problem sizes.