
Minisymposium Presentation

Optimizing Dataflow Pipelines from Self-Driving Labs to the Cloud

Monday, June 3, 2024, 15:30 - 16:00 CEST
Climate, Weather and Earth Sciences
Chemistry and Materials
Computer Science and Applied Mathematics
Humanities and Social Sciences
Engineering
Life Sciences
Physics

Presenter

Michela Taufer - University of Tennessee

Michela Taufer, an ACM Distinguished Scientist, holds the Dongarra Professorship in High-Performance Computing at the University of Tennessee, Knoxville. She received her PhD in Computer Science from ETH Zurich. Taufer was a postdoctoral fellow at UC San Diego and The Scripps Research Institute, focusing on interdisciplinary projects in computational chemistry and computer systems. Her research spans high-performance computing, cloud computing, and volunteer computing, aiming to enhance AI for science. Taufer has held leadership roles at major HPC conferences and is the editor-in-chief of Future Generation Computer Systems. She is an advocate for diversity in academia and interdisciplinary research.

Description

Rapid advances in cloud computing and the integration of experimental facilities, including self-driving labs, have ushered in an era in which scientists can generate unprecedented amounts of data and conduct more extensive analyses across various scientific domains, including chemistry, materials science, molecular biology, and drug design. This capability enables a broader exploration of natural phenomena but also introduces significant challenges in effectively composing and scaling dataflow pipelines. This talk addresses these challenges by presenting innovative solutions for optimizing dataflow pipelines across cloud resources, thereby enhancing the study and application of scientific dataflows.

This talk will cover three main research components of our work on optimizing dataflow pipelines from self-driving labs to the cloud. First, we establish a taxonomy of common dataflow motifs, ranging from simple producer-consumer pairs to complex multi-scale pipelines, and apply these motifs to real-world use cases. Second, we discuss methods to mitigate data loss and pipeline inefficiencies, especially those that arise when pipelines traditionally executed on high-performance computing systems are moved to the cloud. Last, we highlight our efforts to train and build a community of experts, emphasizing the development of tailored data analytics material across scientific domains.

Authors