Paper

Performance Analysis and Optimizations of ERO2.0 Fusion Code

Monday, June 3, 2024, 17:00 - 17:30 CEST
Climate, Weather and Earth Sciences
Chemistry and Materials
Computer Science and Applied Mathematics
Humanities and Social Sciences
Engineering
Life Sciences
Physics

Presenter

Marta Garcia-Gasulla, Barcelona Supercomputing Center

I have been a researcher in the Computer Science department of the Barcelona Supercomputing Center (BSC) since 2006. My research interests include load balancing, parallel programming, performance analysis, and optimization. I co-lead the Best Practices for Performance and Productivity (BePPP) group at BSC. BePPP aims to be the bridge between scientific domain researchers and computer scientists, promoting best practices that help programmers productively (re)structure their codes for high efficiency and portability. The group also captures fundamental co-design input and forwards it to the appropriate system software and architecture teams, so that they can target their developments in the most useful direction.

Description

In this paper, we present a thorough performance analysis of ERO2.0, a highly parallel Monte Carlo code for modeling global erosion and redeposition in fusion devices. The study shows that the main bottleneck preventing the code from using resources efficiently is load imbalance at different levels. This load imbalance is inherent to the problem being solved: particle transport and deposition. Based on the findings of the analysis, we also describe the optimizations implemented in the code to improve its performance on HPC clusters. The proposed optimizations rely on MPI and OpenMP features, which makes them portable across architectures, and they achieve a 3.34x speedup.

Authors