
Minisymposium

MS4B - Machine Learning Support for the Lifetime of Software (ML4SW)

Tuesday, June 4, 2024
16:00 - 18:00 CEST
HG F 3


Session Chair

Thorsten Kurth (NVIDIA Inc.)

Description

Scientific simulations running on High Performance Computing (HPC) systems play a critical role in advancing science and engineering. The HPC community stands to gain significantly by applying cutting-edge AI technologies, such as Large Language Models (LLMs), Deep Neural Networks (DNNs), or Transformers, in various aspects of scientific software development and execution. The Machine Learning Support for the Lifetime of Software (ML4SW) minisymposium aims to establish a platform where scientists, developers, and system programmers can come together to exchange ideas and explore how artificial intelligence can help in the effective use of future systems, as well as how Scientific Machine Learning can be scaled on HPC systems.

Presentations

16:00 - 16:30 CEST
Machine Learning for Performance Engineering Across Applications

Developing fast and portable HPC codes is an evolving process that spans the entire lifetime of an application, adapting to changes in target hardware and new software optimization techniques over time. Unfortunately, because of subtle differences in codes, the lessons learned from optimizing one application are often re-learned in others, despite sharing similar optimization spaces. This talk will shine a spotlight on ML-based techniques to identify optimization opportunities across applications. Using fuzzy matching, performance embedding spaces, and transfer tuning, performance engineers are now able to cluster subprograms by static and dynamic characteristics. We will discuss the productivity benefits of state-of-the-art methods and the potential computational savings (within and across applications), turning hundreds of hours of auto-tuning into a "performance database" query and local search, using a real-world case study in an atmospheric model. Lastly, we will discuss future directions in the field and how we can potentially leverage machine learning to aid performance engineering more generally.
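As a rough illustration of the clustering idea only (not the embedding pipeline presented in the talk), the sketch below groups code regions by hypothetical static and dynamic feature vectors using k-means; regions that land in the same cluster would be candidates for sharing tuning results. All region names and feature values are invented for illustration.

```python
# Illustrative sketch: cluster code regions by performance-related features.
# The feature values and region names are hypothetical; a real pipeline would
# derive embeddings from static analysis and profiling data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Each row: [arithmetic intensity, memory footprint (MB), branch rate, loop depth]
regions = ["stencil_update", "halo_exchange", "tracer_advection", "reduction"]
features = np.array([
    [0.8, 512.0, 0.02, 3],
    [0.1,  64.0, 0.10, 1],
    [0.7, 480.0, 0.03, 3],
    [0.2,   8.0, 0.05, 1],
])

X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Regions in the same cluster are candidates for transferring tuning results.
for name, label in zip(regions, labels):
    print(f"{name}: cluster {label}")
```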

Tal Ben-Nun (Lawrence Livermore National Laboratory)
16:30 - 17:00 CEST
Large Language Models for Parallel and HPC Code

Large Language Model-based coding assistants have already proven to be extremely useful tools for improving the efficiency and correctness of software developers' work. Adopting these tools in scientific software development will greatly improve the quality and quantity of scientific code being developed, leading to the creation of more robust and dependable software tools and frameworks. However, scientific code often comes with requirements such as parallelism, performance, and fidelity that LLMs do not yet handle well. In this presentation, we discuss the current state of the art in LLMs for parallel and HPC code and how to improve their performance in assisting with scientific software development.
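As a minimal sketch of the kind of workflow involved (not the evaluation setup from the talk), the snippet below asks an instruction-tuned code model, served through the Hugging Face transformers text-generation pipeline, for an OpenMP-parallel kernel; the model name and prompt are placeholders.

```python
# Illustrative sketch: asking a code LLM for an OpenMP-parallel kernel.
# The model name and prompt are placeholders; any instruction-tuned code model
# available through the transformers text-generation pipeline would work similarly.
from transformers import pipeline

generator = pipeline("text-generation", model="codellama/CodeLlama-7b-Instruct-hf")

prompt = (
    "Write a C function `void saxpy(int n, float a, const float *x, float *y)` "
    "that computes y[i] += a * x[i] using an OpenMP parallel for loop."
)

completion = generator(prompt, max_new_tokens=256, do_sample=False)
print(completion[0]["generated_text"])
```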

Daniel Nichols (University of Maryland)
17:00 - 17:30 CEST
High Performance Kernel Code Generation Using Generative AI

Generative Artificial Intelligence (AI) technologies, such as GPT and Llama, have shown promise in facilitating code generation across a variety of programming languages. However, the domain of high-performance scientific computing, which demands specialized expertise, presents unique challenges that have led to mixed results in terms of both performance and correctness when applying Generative AI. This presentation will delve into our experiments with employing Generative AI to develop established high-performance computing kernels, such as AXPY, GEMV, and GEMM. We examine the deployment of these AI models across various parallel programming models and languages, including C++ (with OpenMP, OpenMP Offload, OpenACC, CUDA, HIP), Fortran (with OpenMP, OpenMP Offload, OpenACC), Python (via NumPy, Numba, PyCUDA, CuPy), and Julia (via Threads, CUDA.jl, AMDGPU.jl). Our analysis aims to assess the efficacy and correctness of Generative AI in generating scientific computing kernels, as well as its adaptability to the specialized requirements of high-performance scientific computing. Through this exploration, we intend to illuminate the potential of Generative AI as a tool for innovation within scientific computing, highlighting its capabilities and identifying the challenges that must be overcome to fully leverage its potential.
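For concreteness, here is a minimal AXPY (y = a*x + y) in one of the Python variants the abstract mentions (NumPy plus Numba); this is the sort of reference implementation a generated kernel could be checked against, not code from the study itself.

```python
# Illustrative AXPY (y = a*x + y) in Numba with explicit parallelism, plus a
# NumPy reference used to check correctness of a generated kernel.
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def axpy(a, x, y):
    for i in prange(x.shape[0]):
        y[i] = a * x[i] + y[i]

n = 1_000_000
a = 2.5
x = np.random.rand(n)
y = np.random.rand(n)
y_ref = a * x + y          # NumPy reference result

axpy(a, x, y)              # Numba kernel updates y in place
assert np.allclose(y, y_ref)
```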

Pedro Valero-Lara, William Godoy, and Keita Teranishi (Oak Ridge National Laboratory); Mustafa Al Lali and Alexis Huante (Texas A&M University); and Prasanna Balaprakash and Jeffery Vetter (Oak Ridge National Laboratory)
17:30 - 18:00 CEST
Learning to Predict and Improve Build Successes in Package Ecosystems Using Graph Neural Networks

Modern software has reached an unprecedented level of complexity, consisting of tens or even hundreds of dependencies on various packages. To tackle this complexity, software ecosystems rely on automated package managers to analyze compatibility constraints among different packages and select a compatible set of package versions to install. Current approaches rely on experts with in-depth knowledge of packages and constraints to identify compatible versions. In practice, users often have to explore different choices of package versions to find one that builds successfully. In this talk, we present a tool, called BuildCheck, to understand build incompatibilities, predict bad configurations, and assist developers in managing version constraints. We combine the capabilities of Graph Neural Networks and advanced package management technologies to offer solutions for managing package dependencies. BuildCheck, evaluated on the E4S software ecosystem with 45,837 data points, can predict build outcomes with 91% accuracy, eliminating very expensive trial-and-error exercises to find working builds. Furthermore, our novel self-supervised pre-training method using masked modeling was shown to improve prediction accuracy when only a limited amount of data is available.
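As a rough sketch of the general shape of such a model (not BuildCheck's actual architecture), the snippet below assumes PyTorch Geometric: a package dependency graph with per-package features is encoded by graph convolutions and pooled into a single build-success probability. The toy graph and feature sizes are invented for illustration.

```python
# Illustrative sketch of a graph classifier over a package dependency graph.
# The node features and two-layer GCN are assumptions for illustration; they
# are not BuildCheck's actual architecture.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class BuildOutcomeGNN(torch.nn.Module):
    def __init__(self, num_features, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(num_features, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.out = torch.nn.Linear(hidden, 1)

    def forward(self, x, edge_index, batch):
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        h = global_mean_pool(h, batch)          # one vector per dependency graph
        return torch.sigmoid(self.out(h))       # probability the build succeeds

# Toy graph: 3 packages, edges encode "depends on", 4 features per package
x = torch.rand(3, 4)
edge_index = torch.tensor([[0, 1], [1, 2]], dtype=torch.long)  # edges 0->1, 1->2
batch = torch.zeros(3, dtype=torch.long)

model = BuildOutcomeGNN(num_features=4)
print(model(x, edge_index, batch))
```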

Harshitha Menon (Lawrence Livermore National Laboratory)