Minisymposium Presentation
Transferring a Molecular Foundation Model for Polymer Property Predictions
Presenter
John Gounley is a senior computational scientist and group leader in the Computational Sciences and Engineering Division at Oak Ridge National Laboratory, where he works on the scalability of algorithms for biomedical simulations and data.
Description
Transformer-based large language models have remarkable potential to accelerate design optimization for applications such as drug development and material discovery. Self-supervised pretraining of transformer models requires large-scale data sets, but such data are often scarce in topical areas such as polymer science. State-of-the-art approaches for polymers use data augmentation to generate additional samples, which unavoidably incurs extra computational cost. In contrast, large-scale open-source data sets are available for small molecules and offer a potential solution to data scarcity through transfer learning. In this presentation, we discuss transformers that are pretrained on small molecules and then fine-tuned on polymer properties. We find that this approach achieves accuracy comparable to that of models trained on augmented polymer data sets across a series of benchmark prediction tasks.