Minisymposium Presentation
Foundational Models and Workflows: Enhancing Deep Learning Comparisons in Drug Response Studies
Presenter
Dr. Neeraj Kumar is a Chief Data Scientist in the Advanced Computing, Mathematics, and Data Division at the Pacific Northwest National Laboratory (PNNL). Deeply committed to advancing data science, computing, and applied mathematics, he has spent over a decade expanding the horizons of applied machine learning, artificial intelligence, probabilistic programming, natural language processing, quantum computing, and innovative modeling and simulation methods. Dr. Kumar's expertise extends beyond theoretical frameworks into practical applications in science and engineering missions. He tackles both fundamental and applied scientific challenges, with a strategic focus on developing scalable AI/ML products, advancing computational chemistry and materials science, and pioneering digital molecular discovery through advanced analytics and high-performance computing. He has published many peer-reviewed articles, highlights, workshop papers, and technical conference proceedings. His leadership and expertise have guided numerous data science research programs, underpinning multidisciplinary team efforts to push the boundaries of scientific discovery and innovation.
Description
In the evolving field of computational drug design and discovery, accurately predicting drug responses with deep learning models remains a significant challenge because of varying methodologies in model implementation and validation. This inconsistency hampers objective assessment of model capabilities across different drug representation methods, architectures, and datasets. As models grow more complex and datasets more diverse, standardized model comparison methodologies become imperative. Traditional comparison approaches, which typically rely on performance scores drawn from disparate studies, yield incomparable and inconsistent results, obscuring the factors critical to predictive performance. To address this issue, I will discuss our results based on foundational models and the large-scale CMP-CV workflow, an automated cross-validation framework designed for consistent training and evaluation of multiple deep learning models. By employing standardized datasets, preprocessing techniques, and performance metrics, CMP-CV enables controlled experimentation while allowing systematic variation of model hyperparameters and architectures. The framework also supports custom analysis functions, enabling deeper investigation of model representations and their associated uncertainties, thereby establishing a more standardized and comprehensive approach to model comparison in drug response prediction.
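To make the core idea concrete, the minimal Python sketch below illustrates the kind of controlled comparison the abstract describes: every candidate model is trained and scored on identical cross-validation folds with identical metrics, so performance differences can be attributed to the models rather than to the evaluation protocol. This is not the actual CMP-CV API; the model choices, synthetic data, and metric set here are illustrative assumptions only.

```python
# Hypothetical sketch of a CMP-CV-style comparison loop (not the real
# framework): shared folds, shared metrics, multiple candidate models.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))   # stand-in for drug/cell-line features
y = rng.normal(size=500)         # stand-in for measured drug response

models = {                        # candidate models to compare
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)  # one shared split
results = {name: [] for name in models}

for train_idx, test_idx in cv.split(X):
    for name, model in models.items():
        # Identical preprocessing, folds, and metrics for every model
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        results[name].append({
            "rmse": mean_squared_error(y[test_idx], pred) ** 0.5,
            "r2": r2_score(y[test_idx], pred),
        })

for name, folds in results.items():
    rmse = np.mean([f["rmse"] for f in folds])
    r2 = np.mean([f["r2"] for f in folds])
    print(f"{name}: RMSE={rmse:.3f}  R2={r2:.3f}")
```

In a CMP-CV-like setting, the per-fold results dictionary is also the natural hook for the custom analysis functions mentioned above, e.g., aggregating fold-to-fold variance as a simple uncertainty estimate for each model.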