Paper
Lockstep-Parallel Dualization of Surface Triangulations
Description
We present a massively parallel lockstep algorithm for dualizing large numbers of surface triangulation graphs, and an effective implementation for CPU, GPU and multi-GPU. The algorithm is fully combinatorial, i.e., it does not require or use a planar or spatial embedding, only the graph.
This work is motivated by a wish to perform computational chemistry experiments on entire isomerspaces of polyhedral molecules, comprising billions of distinct molecules, each represented by a cubic graph. However, the algorithm applies not only to triangulations of the sphere, but to any triangulations of oriented surfaces of any genus, for example toroidal topologies.
Our multi-vendor implementation in SYCL outperforms the previous sequential state-of-the-art by 4 orders of magnitude on our consumer NVIDIA RTX3080 Graphics Processing Unit (GPU), with average throughput 37ps(+/- 0.1ps) per vertex (varying from 50ps to 31ps for C72-C200). Thus, dualizing e.g. all 214,127,742 C200 fullerene molecules adds a mere 1.49s(+/- 0.01s) to the total processing time, negligible compared to the two hours required to generate the graphs. We subsequently perform extreme multi-node-multi-GPU scaling experiments on the LUMI-G supercomputer, achieving near-perfect scaling up to 1024 MI250x Graphics Compute Dies (GCD), in total 14.5 million cores. Calculations show that dualization has moved from a bottle-neck to being ready to contribute to our planned large-scale chemical experiments for all 2.7 x 10^12 fullerene molecules from C20 through C400.