Paper
Hybrid Multi-GPU Distributed Octrees Construction for Massively Parallel Code Coupling Applications
Presenter
Robin Cazalbou received the MS degree in high-performance computing and simulation from the Ecole Normale Supérieure (ENS) Paris-Saclay, France. He is currently working toward the PhD degree with ONERA, Palaiseau, France, and CERFACS, Toulouse, France. His research interests include applied mathematics and high-performance GPU computing.
Description
This paper presents two new hybrid MPI-GPU algorithms for building distributed octrees. The first algorithm redistributes data between processes and is used to globally sort the points on which the octree is built according to their space-filling curve (SFC) codes. The second algorithm takes a bottom-up approach, merging leaves from the maximum depth up to their final level so that each leaf contains no more than Nmax points. This approach is better suited to GPU implementation because it maximises parallelism from the very start of the algorithm. The methods have been implemented in the CWIPI library to reduce the execution time of the point-in-mesh location algorithm, which is performed repeatedly when moving, non-coincident meshes are used. Tests on large cases have shown speedups of up to 120× over a conventional CPU version, with scaling as good as that of the full CPU version.
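The bottom-up idea can be illustrated with a minimal serial sketch. It rests on several assumptions. It uses Morton (Z-order) codes as the SFC and places points in the unit cube. It also assumes a hypothetical merge rule: a group of sibling nodes is merged into its parent when none of them has been frozen and their combined point count is at most Nmax. The names `morton_code` and `bottom_up_leaves` are illustrative only. The MPI redistribution and GPU kernels described in the paper are not shown, and this is not the CWIPI implementation.

```python
from collections import Counter, defaultdict


def morton_code(point, depth):
    """Interleave the bits of the integer cell coordinates of a point in [0, 1)^3."""
    n = 1 << depth
    ix, iy, iz = (min(int(c * n), n - 1) for c in point)
    code = 0
    for bit in range(depth - 1, -1, -1):  # most significant bit first
        code = (code << 3) | (((ix >> bit) & 1) << 2) | (((iy >> bit) & 1) << 1) | ((iz >> bit) & 1)
    return code


def bottom_up_leaves(points, max_depth, n_max):
    """Return octree leaves as (level, prefix, count) tuples in SFC order.

    Hypothetical merge rule: siblings merge into their parent when no sibling
    has been frozen and their combined count is at most n_max. A cell at
    max_depth holding more than n_max points cannot be split, so it stays a leaf.
    """
    # Local sort by Morton code (stands in for the distributed SFC sort).
    active = Counter(sorted(morton_code(p, max_depth) for p in points))
    blocked = set()  # prefixes at the current level that are internal nodes
    leaves = []
    for level in range(max_depth, 0, -1):
        groups = defaultdict(list)  # parent prefix -> [(child prefix, count)]
        for prefix, count in active.items():
            groups[prefix >> 3].append((prefix, count))
        blocked_parents = {b >> 3 for b in blocked}
        next_active = {}
        for parent, children in groups.items():
            total = sum(count for _, count in children)
            if parent not in blocked_parents and total <= n_max:
                next_active[parent] = total  # merge siblings into the parent
            else:
                # Children become final leaves; the parent stays internal.
                leaves.extend((level, prefix, count) for prefix, count in children)
                blocked_parents.add(parent)
        active, blocked = next_active, blocked_parents
    leaves.extend((0, prefix, count) for prefix, count in active.items())  # root leaf
    # Order leaves by the first max-depth Morton code each one covers.
    return sorted(leaves, key=lambda leaf: leaf[1] << (3 * (max_depth - leaf[0])))


pts = [(0.1, 0.1, 0.1), (0.15, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)]
print(bottom_up_leaves(pts, max_depth=2, n_max=3))  # [(1, 0, 3), (1, 7, 1)]
```

In this sketch, every sibling group at a given level is handled independently of the others. At the maximum depth there is one work item per occupied cell, which hints at why a bottom-up scheme exposes parallelism from the first step.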