Minisymposium
MS2E - Complex Autonomous Workflows that Interconnect AI/ML, Modeling and Simulation, and Experimental Instruments
Session Chair
Description
Recent advancements in edge computing, automation, and artificial intelligence (AI) have spurred the development of "smart" autonomous laboratories and facilities that integrate complex workflows across experimental instruments and distributed, heterogeneous computational resources. The primary objective of our minisymposium is to showcase recent advances aimed at establishing research ecosystem capabilities that seamlessly integrate real-time experimental and computational workflows across edge devices, cloud computing, and high-performance computing (HPC) resources. While edge computing enables laboratories to rapidly process data at its source to reduce latency and improve real-time decision-making, AI/ML and HPC simulations are the fundamental components that allow autonomous laboratories to rapidly learn, adapt, and optimize workflow processes. Our minisymposium presentations offer a distinct perspective that highlights common smart autonomous science workflows transcending disciplinary boundaries, fostering cross-disciplinary collaboration and knowledge exchange. The full integration of AI/ML, modeling and simulation, and experimental instruments is an impossible task for a single group. Hence, our minisymposium is intended both to inform attendees of recent complex workflow efforts and to build new collaborations that tackle future challenges in building an interoperable ecosystem for complex workflows that enable autonomous laboratories and facilities.
Presentations
As higher-precision experimental instruments become available, scientific facilities are shifting the way they run experiments. With the increase in experimental data rates, scientists have had to rely more on large HPC resources for post-processing of experimental data. One of the goals in developing the Superfacility model at the National Energy Research Scientific Computing Center (NERSC) was to build up the ecosystem and expertise to facilitate connecting remote science experiments with supercomputing resources. In this talk I will highlight the work going into connecting some of these projects to our systems, including experimental beamlines, electron microscopes, fusion experiments, and more. I will go into detail about how these projects can use our API to automatically drive their computations, as well as how the knowledge and expertise gained on one experiment are transferred to other experiments through the NERSC Science Acceleration Program (NESAP). I will also provide insight into plans for expanding the API's capabilities in preparation for our next system, N10, which is being designed specifically to enhance scientific workflows.
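To make the pattern above concrete, here is a minimal, illustrative Python sketch of how an experiment-side workflow might call a facility REST API such as the NERSC Superfacility API to launch and monitor post-processing on an HPC system. The endpoint paths, payload fields, and token handling shown here are assumptions for illustration, not a transcription of the official API.

"""Illustrative sketch (not official NERSC code): submitting and polling a
post-processing job through a Superfacility-style REST API. Endpoint paths,
payload fields, and the token flow are assumptions for illustration only."""
import time
import requests

API_BASE = "https://api.nersc.gov/api/v1.2"   # assumed base URL
TOKEN = "..."                                  # OAuth access token obtained separately
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def submit_post_processing(machine: str, script_path: str) -> str:
    """Ask the facility API to submit a batch job that post-processes newly
    acquired detector data; returns a task identifier (payload shape assumed)."""
    resp = requests.post(
        f"{API_BASE}/compute/jobs/{machine}",
        headers=HEADERS,
        data={"job": script_path, "isPath": True},
    )
    resp.raise_for_status()
    return resp.json()["task_id"]

def wait_for_completion(task_id: str, poll_seconds: int = 30) -> dict:
    """Poll the API until the submitted job reaches a terminal state."""
    while True:
        resp = requests.get(f"{API_BASE}/tasks/{task_id}", headers=HEADERS)
        resp.raise_for_status()
        status = resp.json()
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)

if __name__ == "__main__":
    task = submit_post_processing("perlmutter", "/global/cfs/project/postprocess.sh")
    print(wait_for_completion(task))

In practice, the same pattern lets an instrument-side agent trigger analysis automatically whenever new data lands, which is the automation the Superfacility model is meant to enable.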
This presentation delves into cutting-edge developments in the creation of interoperable workflows that bridge the gap between experimental instruments and diverse computational platforms. It highlights the push towards an integrated research ecosystem, with a focus on seamlessly integrating experiments with sophisticated modeling and simulations across the edge-to-cloud-HPC continuum. These workflows, spanning various scientific domains, require the intricate melding of AI/ML, computational modeling, and complex multi-domain simulations. The presentation will showcase recent initiatives to enable the integration of distributed and diverse scientific systems, underscoring the necessity of establishing a common, interoperable ecosystem to support both current and future scientific workflows.
Scientific workflows are becoming more complex. While computations were previously focused on a single computational domain, users now commonly combine traditional HPC simulations with high-performance data analytics and machine learning algorithms, and require complex mechanisms to manage their execution (fault tolerance, heterogeneity, …). In this presentation, we will show our current efforts towards simplifying the lifecycle management of HPC, data analytics, and AI workflows for both scientific and industrial applications. The concept of HPC Workflows as a Service will be introduced, and its application will be shown in the domains of prototyping of complex manufactured objects, adaptive workflows for climate and tropical cyclone studies, and urgent computing for natural hazards (in particular, earthquakes and their associated tsunamis). We will also show our efforts towards adapting such complex workflows to the computing continuum (cloud, edge, and IoT infrastructures) through the design of a meta-OS, with applications in personalised healthcare (stroke detection). Finally, we will introduce our newest proposal, a programming model and architecture for the easy development of applications in swarm computing environments.
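As a generic illustration of the kind of composition described above (and not the presenters' actual system), the following Python sketch couples a simulation stage, a data-analytics stage, and an ML stage, with a simple retry wrapper standing in for the fault-tolerance mechanisms that a real workflow manager would provide. All function names and the retry policy are placeholders.

"""Generic sketch: a three-stage workflow (simulation -> analytics -> ML)
with a minimal retry wrapper as a stand-in for workflow-manager fault tolerance."""
from concurrent.futures import ProcessPoolExecutor
import random

def run_simulation(params: dict) -> list:
    # Placeholder for an HPC simulation (normally launched via a batch scheduler).
    return [params["x"] * i for i in range(10)]

def analyze(samples: list) -> dict:
    # Placeholder for a high-performance data-analytics step.
    return {"mean": sum(samples) / len(samples)}

def train_surrogate(features: list) -> dict:
    # Placeholder for training an ML surrogate on the analysis results.
    return {"model": "surrogate", "n_samples": len(features)}

def with_retries(fn, *args, attempts: int = 3):
    """Very simple fault tolerance: retry a task a few times before failing."""
    for attempt in range(attempts):
        try:
            return fn(*args)
        except Exception:
            if attempt == attempts - 1:
                raise

if __name__ == "__main__":
    param_sweep = [{"x": random.random()} for _ in range(4)]
    with ProcessPoolExecutor() as pool:
        sims = list(pool.map(run_simulation, param_sweep))   # parallel simulations
    analyses = [with_retries(analyze, s) for s in sims]       # analytics stage
    print(train_surrogate(analyses))                          # ML stage

A workflow-as-a-service layer of the kind discussed in the talk would express the same pipeline declaratively and handle scheduling, heterogeneity, and failure recovery on the user's behalf.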
In modern experimental facilities, advanced detectors gather data at rates of multiple gigabytes per second. To manage this massive influx of data effectively, experiment-time analysis techniques that allow tailoring of an experiment run (e.g., filtering out irrelevant data elements) are required. Achieving such automation at scale necessitates the development of distributed computing pipelines that connect instruments, computers (for analysis, simulation, and AI model training), edge computing systems (for analysis), data repositories, and metadata catalogs. Further, the resulting data or information needs to be delivered in a form that can be easily discovered and accessed by end users. The Globus platform (globus.org) offers various capabilities to that end: reliable and secure data and compute management and a managed task orchestration system for automation, all underpinned by sophisticated, standards-compliant security that bridges distributed systems. In this talk, we will describe these capabilities and their application to accelerating time to science. Drawing from our experience processing data from various beamlines at Argonne National Laboratory's Advanced Photon Source using capabilities at the Argonne Leadership Computing Facility and other services, we discuss common patterns and implementations, along with the implications of these methods for both facility operators and scientists.
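As one simplified illustration of this pipeline pattern, the Python sketch below uses the Globus Compute SDK to fan analysis tasks out to a remote endpoint (for example, one running at an HPC facility) as new files arrive from an instrument. The endpoint UUID, file paths, and the analysis function are placeholders; the real beamline pipelines described in the talk also involve data transfer, cataloging, and flow-orchestration steps not shown here.

"""Sketch: dispatching per-file analysis tasks to a remote Globus Compute
endpoint as data arrives from an instrument. Endpoint ID, paths, and the
analysis function are placeholders for illustration."""
from globus_compute_sdk import Executor

REMOTE_ENDPOINT_ID = "00000000-0000-0000-0000-000000000000"  # placeholder UUID

def analyze_frame(path: str) -> dict:
    """Example analysis task run on the remote endpoint; it must be
    self-contained because it is serialized and shipped to the endpoint."""
    import os
    return {"file": path, "size_bytes": os.path.getsize(path)}

def process_new_files(paths: list) -> list:
    """Fan analysis tasks out to the remote endpoint and gather the results."""
    with Executor(endpoint_id=REMOTE_ENDPOINT_ID) as gce:
        futures = [gce.submit(analyze_frame, p) for p in paths]
        return [f.result() for f in futures]

if __name__ == "__main__":
    print(process_new_files(["/data/detector/frame_0001.h5"]))

In production, the same remote-execution step would typically be one stage of a managed flow that also moves data, registers metadata, and publishes results for end users.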