AP2B - ACM Papers Session 2B
In numerical weather prediction and high-performance computing, the primary computational bottleneck has gradually shifted from floating-point arithmetic to the throughput of data to and from storage. This phenomenon is commonly referred to as the I/O performance gap. We present MultIO, a set of software libraries that provide two mechanisms to mitigate this effect: an asynchronous I/O-server that decouples data output from model computations, and user-programmable processing pipelines that operate on model output directly. MultIO is a metadata-driven, message-based system: the I/O-server and processing pipelines fundamentally handle and operate on discrete, self-describing messages. The behaviour of the I/O-server, the data-routing decisions, and the selection of actions undertaken are all driven by the metadata attached to each message. The user controls the type and amount of post-processing by setting the message metadata via the Fortran/C/Python APIs and by configuring a processing pipeline of actions; users can also implement custom actions to be incorporated into the pipelines. The MultIO system has been used with the NEMOv4 model to produce the upcoming ocean re-analysis dataset, which will feed into the production runs of the next-generation global re-analysis, ERA6. It has also been used to move computation closer to the model for climate runs at scale in the nextGEMS and Destination Earth projects.
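The metadata-driven pipeline design described above can be illustrated with a minimal sketch. This is not MultIO's actual API; the `Message` class, the action functions, and the metadata keys are all hypothetical, chosen only to show how self-describing messages can drive the selection of processing actions.

```python
# Illustrative sketch of a metadata-driven, message-based pipeline
# (hypothetical names, not MultIO's API): each message carries
# metadata that determines which actions are applied to it.

class Message:
    def __init__(self, metadata, payload):
        self.metadata = metadata  # self-describing: drives routing/actions
        self.payload = payload

def compute_statistic(msg):
    # Action keyed off metadata, e.g. a temporal mean over the payload
    if msg.metadata.get("operation") == "average":
        msg.payload = sum(msg.payload) / len(msg.payload)
    return msg

def encode(msg):
    # Placeholder for an encoding step (e.g. to an archive format)
    msg.metadata["encoded"] = True
    return msg

def run_pipeline(msg, actions):
    # Apply the configured actions in order; each inspects the metadata
    for action in actions:
        msg = action(msg)
    return msg

msg = Message({"param": "sst", "operation": "average"}, [2.0, 4.0, 6.0])
out = run_pipeline(msg, [compute_statistic, encode])
print(out.payload)  # 4.0
```

In the real system the pipeline configuration and the user-set metadata play the roles of the `actions` list and the `metadata` dictionary here, and custom actions slot into the same chain.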
Operational Numerical Weather Prediction (NWP) workflows are highly data-intensive. Data volumes have increased by many orders of magnitude over the last 40 years and are expected to continue growing, especially given the upcoming adoption of Machine Learning in forecast processes. Parallel POSIX-compliant file systems have been the dominant paradigm for data storage and exchange in HPC workflows for many years. This paper presents ECMWF's move beyond the POSIX paradigm, implementing a backend for their storage library to support DAOS, a novel high-performance object store designed for massively distributed Non-Volatile Memory. The system is demonstrated to outperform the highly mature and optimised POSIX backend under high load and contention, matching the I/O patterns typical of forecast workflows. This work constitutes a significant step beyond the performance constraints imposed by POSIX semantics.
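The semantic difference the abstract leans on can be sketched in a few lines. This is not the DAOS API (which is a C library with its own object model); it is a generic contrast, under assumed toy interfaces, between shared-file offset-based POSIX writes and keyed object puts that need no file-level coordination between writers.

```python
# Illustrative contrast (hypothetical interfaces, not the DAOS API):
# POSIX funnels concurrent writers through one shared file with
# offset arithmetic and file-level consistency; an object store
# addresses each record by key, so writers are independent.

import os
import tempfile

# POSIX-style: one shared file, writers coordinate via offsets
posix_path = os.path.join(tempfile.mkdtemp(), "fields.dat")
with open(posix_path, "wb") as f:
    f.seek(0)
    f.write(b"field-0")
    f.seek(16)          # second writer must know the layout
    f.write(b"field-1")

# Object-store-style: self-contained key -> value records
class ObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects[key]

store = ObjectStore()
store.put(("step=0", "param=t2m"), b"field-0")
store.put(("step=0", "param=msl"), b"field-1")
print(store.get(("step=0", "param=t2m")))  # b'field-0'
```

Under high contention, the keyed model avoids the locking and ordering guarantees POSIX must provide for the shared file, which is the constraint the paper's object-store backend steps around.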
The cryosphere plays a significant role in Earth's climate system, so accurate simulation of sea ice is important for improving climate projections. To enable higher-resolution simulations, graphics processing units (GPUs) have become increasingly attractive, as they offer higher peak floating-point performance and better energy efficiency than CPUs. However, exploiting this theoretical peak performance, which rests on massive data parallelism, usually requires more care and effort in the implementation. In recent years, a number of frameworks have become available that promise to simplify general-purpose GPU programming. In this work, we compare several such frameworks, including CUDA, SYCL, Kokkos and PyTorch, for the parallelization of neXtSIM-DG, a finite-element-based dynamical core for sea ice. We evaluate the different approaches in terms of usability and performance.
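The data-parallel pattern these frameworks all target can be sketched abstractly. The per-element update below is hypothetical (not neXtSIM-DG's dynamics); it only shows that when mesh elements can be updated independently, a serial loop and a "parallel for" compute the same result, which is the property CUDA, SYCL, Kokkos and PyTorch exploit to spread the work over GPU threads.

```python
# Illustrative sketch (hypothetical update rule, not neXtSIM-DG code):
# an independent per-element update is the data-parallel kernel that
# GPU frameworks dispatch across thousands of threads.

def update_element(state, dt=0.1):
    # Toy update: relax each element's velocity toward its forcing
    velocity, forcing = state
    return velocity + dt * (forcing - velocity)

elements = [(0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]

# Serial reference loop over mesh elements ...
serial = [update_element(e) for e in elements]

# ... and the equivalent order-independent map: because elements do
# not depend on each other, the map can run in parallel on a GPU.
parallel = list(map(update_element, elements))

assert serial == parallel
```

The frameworks compared in the paper differ mainly in how this kernel is expressed (explicit CUDA/SYCL kernels, a Kokkos `parallel_for`, or whole-array PyTorch tensor operations) rather than in the underlying parallel pattern.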