Minisymposium

MS5E - Julia for HPC: Tools and Applications - Part I

Wednesday, June 5, 2024
9:00
-
11:00
CEST
HG E 3


Session Chair

Thorsten Kurth (NVIDIA Inc.)
Description

Performance portability and scalability on large-scale heterogeneous hardware represent crucial aspects challenging current scientific software development. Beyond software engineering considerations, workflows making further use of large datasets to constrain physical models are also emerging and are indispensable to develop, e.g., digital twins. GPU computing and differentiable programming constitute leading-edge tools that provide a promising way to combine physics-based simulations with novel machine learning and AI based methods to address interdisciplinary problems in science. The Julia language leverages both tools, as it includes first-class support for various accelerator types and an advanced compiler interface that supports native automatic differentiation capabilities. Julia makes it possible to differentiate efficiently through both CPU and GPU code without significant impact on performance. The goal of this minisymposium is to bring together scientists who work on or show interest in large-scale Julia HPC development, with a particular focus on the necessary tool stack for automatic differentiation and machine learning in the Julia GPU ecosystem, and on applications built on top of it. The selection of speakers, with expertise spanning from computer to domain science, offers a unique opportunity to learn about the latest development of Julia for HPC to drive discoveries in natural sciences.
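As a minimal sketch of the portability described above, a generic Julia function written against the array abstraction runs unchanged on host arrays and, when a GPU package is loaded, on device arrays (the commented-out path assumes CUDA.jl and an NVIDIA GPU):

```julia
# Generic code: no array type is hard-coded, so the compiler specializes
# this function for whatever array type it receives.
relu_sum(x) = sum(max.(x, zero(eltype(x))))

x_cpu = randn(1024)
relu_sum(x_cpu)          # compiled for and executed on the CPU

# With CUDA.jl loaded (assumption: an NVIDIA GPU is available):
# using CUDA
# x_gpu = CuArray(x_cpu)
# relu_sum(x_gpu)        # same source, compiled for the GPU
```

The same mechanism is what lets AD tools differentiate through CPU and GPU code alike: the differentiated code is just more generic Julia code.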

Presentations

9:00
-
9:30
CEST
Enzyme.jl: High-Performance, Cross-Language, and Parallel Automatic Differentiation in Julia

Automatic differentiation (AD) is key to training neural networks, Bayesian inference, and scientific computing. Applying these techniques requires rewriting code in a specific machine learning framework or manually providing derivatives. This talk presents Enzyme, a high-performance automatic differentiation compiler plugin for the low-level virtual machine (LLVM) compiler capable of synthesizing gradients of programs expressed in the LLVM intermediate representation (IR). Enzyme differentiates programs in any language whose compiler targets LLVM, including C/C++, Fortran, Julia, Rust, JAX, Swift, etc., thereby providing native AD capabilities in these languages with state-of-the-art performance. Unlike traditional tools, Enzyme performs AD on optimized IR. We show that AD on optimized IR achieves a geometric mean speedup of 4.2x over AD on IR before optimization, and orders of magnitude speedups on GPU accelerator codes.

This talk will discuss AD through the lens of Enzyme.jl, the Julia bindings for Enzyme. While Enzyme is applicable to any LLVM-based programming language, working within Julia presents several opportunities and challenges. Julia makes it easy to write generic code that can be automatically retargeted to any backend, without the programmer needing to become an expert in those programming models. This flexibility, however, comes at the cost of just-in-time compilation and garbage collection.
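As an illustrative sketch (not taken from the talk), reverse-mode differentiation of an ordinary Julia function with Enzyme.jl might look as follows, assuming the package is installed; `Active` marks arguments whose derivatives are requested:

```julia
using Enzyme

# A standard test function for optimization and AD
rosenbrock(x, y) = (1 - x)^2 + 100 * (y - x^2)^2

# Reverse-mode AD: the first element of the result holds one derivative
# per Active argument, here (df/dx, df/dy) evaluated at (1.0, 2.0).
dx, dy = autodiff(Reverse, rosenbrock, Active, Active(1.0), Active(2.0))[1]
```

No framework-specific rewrite of `rosenbrock` is needed; Enzyme works on the compiled LLVM IR of the plain Julia function.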

William Moses (University of Illinois Urbana-Champaign)
With Thorsten Kurth (NVIDIA Inc.)
9:30
-
10:00
CEST
Towards High-Performant, Large-Scale Tensor Network Simulations

Tensor networks have been at the center of recent refutations of quantum advantage claims. Achieving quantum advantage, the predicted moment when quantum computers beat classical computers on some problems, is key for the general adoption of quantum computers, but proving it theoretically has turned out to be a task of great difficulty. Researchers have therefore attacked the problem from an empirical point of view, by seeking a problem that is solvable on a quantum computer but intractable on a supercomputer. Tensor networks are a mathematical framework for multilinear operators that originated in the condensed matter community but have also proven very powerful in quantum information/computation and machine learning tasks. In this talk, I will speak about our efforts to develop a modular tensor network and quantum simulation framework for high-performance, large-scale simulations on some of the top supercomputers in the world.
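As background for readers unfamiliar with the technique (this sketch is not from the speaker's framework), the basic primitive of tensor network simulation is pairwise tensor contraction, which reduces to a matrix multiplication after reshaping:

```julia
using LinearAlgebra

# Contract two order-3 tensors A[i,j,k] and B[k,l,m] over the shared
# index k: C[i,j,l,m] = sum_k A[i,j,k] * B[k,l,m].
function contract(A::Array{T,3}, B::Array{T,3}) where {T}
    i, j, k = size(A)
    k2, l, m = size(B)
    @assert k == k2 "shared index dimensions must match"
    # Flatten (i,j) and (l,m) into single indices, multiply, then unflatten.
    C = reshape(A, i * j, k) * reshape(B, k, l * m)
    return reshape(C, i, j, l, m)
end

A = rand(2, 3, 4)
B = rand(4, 5, 2)
C = contract(A, B)   # an order-4 tensor of size (2, 3, 5, 2)
```

A full network contraction repeats this step in some order; choosing that order well is what makes large-scale simulations feasible, since the intermediate tensor sizes depend strongly on it.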

Sergio Sánchez Ramírez (Barcelona Supercomputing Center)
With Thorsten Kurth (NVIDIA Inc.)
10:00
-
10:30
CEST
AMD GPU Programming in Julia for High-Performance Real-Time Neural Rendering

AMD GPU programming in Julia has seen significant improvements in performance and stability over the past year, transitioning from two disjoint runtime APIs to a single stack, improving device support, and adding new features. In this presentation, we demonstrate key changes that have been made to the AMDGPU.jl package, which provides support for programming AMD GPUs, enabling a host of new applications.

To showcase the impact of these changes, we present state-of-the-art real-time neural rendering algorithms developed entirely in Julia in a backend-agnostic manner. Given a set of images, these algorithms reconstruct an environment in a matter of minutes and allow the user to interact with it during training and evaluation. To achieve real-time performance, these implementations incorporate optimized hand-written GPU kernels that integrate with automatic differentiation systems, offering a high-level interface without sacrificing performance.

Simplicity of implementation, seamless support for multiple backends, and real-time performance of these algorithms position the Julia language as a strong candidate for high-performance computing.
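One common way to write backend-agnostic kernels of the kind described above is KernelAbstractions.jl (this minimal sketch assumes that package is installed; the talk's own kernels may be written differently). The same kernel definition runs on `CPU()` here, and could be launched on `AMDGPU.ROCBackend()` or `CUDA.CUDABackend()` without modification:

```julia
using KernelAbstractions

# A backend-agnostic AXPY kernel: y[i] += a * x[i]
@kernel function axpy!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] += a * x[i]
end

backend = CPU()                       # swap for a GPU backend to retarget
x = ones(Float32, 1024)
y = zeros(Float32, 1024)

# Instantiate the kernel for the backend (workgroup size 64) and launch it
# over the whole array, then wait for completion.
axpy!(backend, 64)(y, 2.0f0, x; ndrange = length(y))
KernelAbstractions.synchronize(backend)
```

The host code stays identical across vendors; only the backend object changes, which is what makes multi-backend support in packages like AMDGPU.jl practical.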

Anton Smirnov (AMD)
10:30
-
11:00
CEST
Multi-GPU Optimization of a Large-Scale Cortical Model of Human-Like Gaze Behaviour

We introduce a large-scale biophysical model for dynamic visual target selection, mimicking human gaze behavior, optimized using Julia programming on multiple GPUs. Our dynamic mean-field model sequentially generates visual targets, accommodating network sizes up to 25600 neural populations, with connectivity matrices reaching up to 25600x25600 neural connections, totaling over 600 million connections. To achieve human-like behavior, we employ Bayesian optimization for parameter tuning, enabling efficient optimization through iterative updates of a probabilistic surrogate model. This enables the model to generate temporally accurate visual targets at relevant scene locations. Optimization procedures are executed in parallel on 96 instances of the network via GPU supercomputing, simulating over 60 billion neural connections. One iteration of optimizing the largest model takes 70 seconds using 96 GPUs with 99% parallel efficiency. The implementation relies on Julia, accessing highly optimized vendor libraries for matrix-vector operations and fast Fourier transforms (cuBLAS and cuFFT for NVIDIA GPUs), and utilizing ParallelStencil.jl for stencil computations. MPI enables distributed-memory parallelization without communication during function evaluation. We unify the codebase using ParallelStencil.jl to enable both single-CPU prototyping and large-scale GPU or CPU runs. This multi-GPU application achieves near-optimal performance and scales efficiently to thousands of NVIDIA Tesla P100 GPUs at CSCS.
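To illustrate the single-codebase approach mentioned above (a hedged sketch assuming ParallelStencil.jl is installed, not code from the authors' model), the same `@parallel` kernel runs multithreaded on the CPU here; changing the first argument of `@init_parallel_stencil` to `CUDA` would retarget it to GPUs:

```julia
using ParallelStencil
using ParallelStencil.FiniteDifferences1D

# Select the backend once; the kernel below is backend-agnostic.
@init_parallel_stencil(Threads, Float64, 1)

# One explicit Euler step of 1D diffusion on the interior points:
# T2 = T + dt * λ * d²T/dx²
@parallel function step!(T2, T, λ, dt, dx)
    @inn(T2) = @inn(T) + dt * λ * @d2(T) / dx^2
    return
end

nx = 16
T  = @zeros(nx)
T[8] = 1.0            # initial spike
T2 = copy(T)
@parallel step!(T2, T, 1.0, 0.1, 1.0)
```

The stencil macros (`@inn`, `@d2`) hide the index arithmetic, so the same source serves for single-CPU prototyping and large-scale GPU runs.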

Vaishnavi Narayanan (Maastricht University), Samuel Omlin (ETH Zurich / CSCS), and Mario Senden (Maastricht University)