P49 - The Task-Based GPU-Enabled Distributed Eigensolver available in DLA-Future
Description
DLA-Future implements an efficient GPU-enabled distributed eigenvalue solver using asynchronous methods based on the C++ std::execution API. A task-based approach reduces the number of synchronization points and makes it simple to overlap communication and computation, which helps improve performance relative to the fork-join parallelism found in other libraries such as LAPACK and ScaLAPACK. In certain cases, when multiple algorithms with suitable problem sizes are run independently, they can be co-scheduled to run at the same time, producing noticeable improvements in time to solution. We present results of our task-based generalized eigensolver and show the current optimization status on both multicore-only and GPU-enabled systems (including both NVIDIA and AMD devices). We also present full application results generated with CP2K and SIRIUS, where DLA-Future support was easily added thanks to the provided C API, which is compatible with the widely used ScaLAPACK interface.
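The co-scheduling idea can be illustrated with a minimal sketch. It is not DLA-Future code and does not use the std::execution API: as a simplifying assumption it uses std::async and std::future from the C++ standard library, and toy_algorithm, run_one_after_another and run_co_scheduled are hypothetical names introduced only for illustration. Two independent computations are either run back to back, or launched as separate tasks with a single synchronization point where both results are needed.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one independent algorithm (for example, one
// solver call). Here it only computes the sum of squares of its input.
double toy_algorithm(const std::vector<double>& data) {
    return std::inner_product(data.begin(), data.end(), data.begin(), 0.0);
}

// Baseline: the second algorithm starts only after the first has finished.
double run_one_after_another(const std::vector<double>& a,
                             const std::vector<double>& b) {
    const double result_a = toy_algorithm(a);
    const double result_b = toy_algorithm(b);
    return result_a + result_b;
}

// Task-based variant: both algorithms are launched as independent tasks and
// may run at the same time. The only synchronization point is the pair of
// get() calls, where both results are actually needed.
double run_co_scheduled(const std::vector<double>& a,
                        const std::vector<double>& b) {
    auto future_a = std::async(std::launch::async, [&a] { return toy_algorithm(a); });
    auto future_b = std::async(std::launch::async, [&b] { return toy_algorithm(b); });
    return future_a.get() + future_b.get();
}
```

Both functions return the same value. Whether the co-scheduled variant is faster depends on the problem sizes and on the available hardware, which is why the abstract limits the claim to suitable problem sizes.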
Presenter(s)
Presenter
Alberto Invernizzi has worked since 2019 as a software engineer at ETH Zürich within CSCS (Swiss National Supercomputing Centre), located in Lugano (Switzerland). There, he is part of the team developing DLA-Future, a task-based distributed linear algebra library that aims to provide a full generalized eigensolver able to exploit modern HPC node architectures. Previously, he spent four years in the computer vision field, working with stereo camera-based devices for 3D reconstruction with metrological value, and as a research assistant at the University of Milano-Bicocca (Italy) in the machine perception and mobile robotics laboratory (IRALab). He received his Master's degree in Computer Science from the University of Milano-Bicocca (Italy) in 2014, with a thesis on pedestrian dynamics simulations under the supervision of Prof. Giuseppe Vizzari and Luca Crociani, PhD.