AP1E - ACM Papers Session 1E
In computational fluid dynamics, the Boussinesq approximation is a popular model for the numerical simulation of natural convection problems. Although using the Boussinesq approximation leads to significant performance gains over a full-fledged compressible flow simulation, the model is only plausible for scenarios where the temperature differences are relatively small, which limits its applicability. This paper bridges the gap between Boussinesq flow and compressible flow via deep learning: we introduce a computationally efficient CNN-based framework that corrects Boussinesq flow simulations by learning from the full compressible model. Based on a modified U-Net architecture and incorporating a weighted physics penalty loss, our model is trained and evaluated on a specific natural convection problem. Our results show that by correcting Boussinesq simulations using the trained network, we can enhance the accuracy of velocity, temperature, and pressure variables over the Boussinesq baseline—even for cases beyond the regime of validity of the Boussinesq approximation.
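The weighted physics penalty loss mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, for illustration only, that the physics term penalizes the divergence of the corrected 2-D velocity field (Boussinesq flow is incompressible) and that the weight `lam` is a hypothetical hyperparameter.

```python
import numpy as np

def physics_penalty_loss(pred, target, u, v, lam=0.1, h=1.0):
    """Data-fitting loss plus a weighted physics penalty (illustrative sketch).

    pred, target : predicted and reference fields (matching shapes)
    u, v         : corrected 2-D velocity components on a uniform grid
    lam          : penalty weight (hypothetical value)
    h            : grid spacing
    """
    # Data term: mean squared error against the compressible reference.
    data_loss = np.mean((pred - target) ** 2)

    # Physics term: penalize the divergence du/dx + dv/dy, approximated
    # with central finite differences on interior grid points.
    du_dx = (u[1:-1, 2:] - u[1:-1, :-2]) / (2.0 * h)
    dv_dy = (v[2:, 1:-1] - v[:-2, 1:-1]) / (2.0 * h)
    physics_loss = np.mean((du_dx + dv_dy) ** 2)

    return data_loss + lam * physics_loss
```

In a training loop, `pred` and `target` would be network output and compressible-simulation data, with the same expression written in the framework's differentiable operations so gradients flow through both terms.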
Hardware accelerators are used to speed up computationally expensive
applications. Offloading
tasks to accelerator cards requires data to be transferred between
the memory of the host and the external memory of the accelerator
card; this data movement often becomes the bottleneck that limits
accelerator performance. Here, we explore the use
of a software cache to optimize communication and alleviate the
data-movement bottleneck by transparently exploiting locality and
data reuse. We present a generic, application-agnostic framework,
dubbed SoftCache, that can be used with GPU and FPGA accelerator
cards. SoftCache exploits locality to optimize data movement
in a non-intrusive manner (i.e., no algorithmic changes are
necessary) and allows the programmer to tune the cache size,
organization, and replacement policy toward the application needs.
Each cache line can store data of any size, thereby eliminating the
need for separate caches for different data types. We used a phylogenetic
application to showcase SoftCache. Phylogenetics studies
the evolutionary history and relationships among different species
or groups of organisms. The phylogenetic application implements
a tree-search algorithm to create and evaluate phylogenetic trees,
while hardware accelerators are used to reduce the computation
time of probability vectors at every tree node. Using SoftCache,
we observed that the total number of bytes transferred during a
complete run of the application was reduced by as much as 89%,
resulting in up to 1.7x (81% of the theoretical peak) and 3.5x (75%
of the theoretical peak) higher accelerator performance (as seen by
the application) for a GPU and an FPGA accelerator, respectively.
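The caching idea described above — variable-size cache lines that avoid redundant host-to-accelerator transfers — can be sketched in a few lines. This toy is not the SoftCache implementation; it assumes an LRU replacement policy, a byte-based capacity, and hypothetical names, purely to show how hits skip the transfer while misses pay for it.

```python
from collections import OrderedDict

class ToyCache:
    """Toy software cache: variable-size lines, LRU replacement,
    and hit/miss accounting to estimate bytes transferred."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.lines = OrderedDict()   # key -> payload bytes (LRU order)
        self.used = 0
        self.hits = 0
        self.misses = 0
        self.bytes_transferred = 0

    def get(self, key, fetch):
        """Return the payload for key; invoke fetch() (standing in for
        the host-to-accelerator transfer) only on a miss."""
        if key in self.lines:
            self.hits += 1
            self.lines.move_to_end(key)       # mark most recently used
            return self.lines[key]
        self.misses += 1
        payload = fetch()
        self.bytes_transferred += len(payload)
        # Evict least-recently-used lines until the new payload fits;
        # lines may have any size, so eviction is by total bytes.
        while self.used + len(payload) > self.capacity and self.lines:
            _, evicted = self.lines.popitem(last=False)
            self.used -= len(evicted)
        if len(payload) <= self.capacity:
            self.lines[key] = payload
            self.used += len(payload)
        return payload
```

Repeated lookups of the same probability vector would then hit in the cache and skip the transfer entirely, which is the locality the abstract's 89% reduction in transferred bytes exploits.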