Minisymposium Presentation

Transferring a Molecular Foundation Model for Polymer Property Predictions

Tuesday, June 4, 2024

12:30 - 13:00 CEST

Climate, Weather and Earth Sciences

Chemistry and Materials

Computer Science and Applied Mathematics

Engineering

Life Sciences

Physics

Presenter

John Gounley

Oak Ridge National Laboratory

John Gounley is a senior computational scientist and group leader in the Computational Sciences and Engineering Division at Oak Ridge National Laboratory, where he works on the scalability of algorithms for biomedical simulations and data.


Description

Transformer-based large language models have remarkable potential to accelerate design optimization for applications such as drug development and material discovery. Self-supervised pretraining of transformer models requires large-scale data sets, but data are often sparse in specialized areas such as polymer science. State-of-the-art approaches for polymers use data augmentation to generate additional samples, which unavoidably incurs extra computational cost. In contrast, large-scale open-source data sets are available for small molecules and offer a potential solution to data scarcity through transfer learning. In this presentation, we discuss using transformers pretrained on small molecules and fine-tuned on polymer properties. We find that this approach achieves accuracy comparable to that of models trained on augmented polymer data sets for a series of benchmark prediction tasks.
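The transfer-learning idea in the description can be illustrated with a toy sketch. It is not the presenters' method or model. A mean-pooled character embedding stands in for the pretrained transformer encoder, the "pretrained" weights are simulated with a seeded random generator instead of being loaded from a checkpoint, only a new linear head is trained while the encoder stays frozen (a simpler variant than full fine-tuning), and the target is a synthetic quantity rather than a measured polymer property. All names, including `make_pretrained_embeddings` and `fine_tune_head`, are hypothetical.

```python
import random

DIM = 8  # embedding width; an arbitrary choice for this sketch


def tokenize(smiles):
    """Character-level tokens; real models use richer SMILES tokenizers."""
    return list(smiles)


def make_pretrained_embeddings(vocab, seed=0):
    """Stand-in for loading encoder weights pretrained on small molecules."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1, 1) for _ in range(DIM)] for tok in vocab}


def encode(smiles, emb):
    """Frozen encoder: mean-pool token embeddings into one feature vector."""
    vecs = [emb[t] for t in tokenize(smiles)]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def fine_tune_head(features, targets, lr=0.05, epochs=500):
    """Train only a linear head (weights and bias) by gradient descent on MSE."""
    w, b = [0.0] * DIM, 0.0
    n = len(features)
    losses = []
    for _ in range(epochs):
        grad_w, grad_b, loss = [0.0] * DIM, 0.0, 0.0
        for x, y in zip(features, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            loss += err * err / n
            grad_b += 2 * err / n
            for i in range(DIM):
                grad_w[i] += 2 * err * x[i] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
        losses.append(loss)  # loss measured before this epoch's update
    return w, b, losses


# Polymer repeat units written as SMILES, with "*" marking attachment points.
polymers = ["*CC*", "*CC(C)*", "*CC(c1ccccc1)*", "*CC(Cl)*", "*OCC*"]

# Synthetic target (NOT a real property): fraction of carbon characters.
targets = [sum(ch in "Cc" for ch in s) / len(s) for s in polymers]

# In practice the vocabulary would come with the pretrained checkpoint.
vocab = sorted(set("".join(polymers)))
emb = make_pretrained_embeddings(vocab)

features = [encode(s, emb) for s in polymers]
w, b, losses = fine_tune_head(features, targets)
print(f"MSE before training: {losses[0]:.4f}, near the end: {losses[-1]:.4f}")
```

In a realistic setting the encoder would be a transformer loaded from a checkpoint trained on small-molecule data, and some or all of its weights would typically be updated along with the head during fine-tuning.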