Minisymposium Presentation
Julia-Based Multitask Surrogate Models for Heterogeneous Data Generated by Physical Models
Description
Physical data is increasingly openly accessible, though it can be challenging to definitively rank the accuracy of different information sources. We demonstrate that multitask Gaussian process regression can leverage “datasets of opportunity” to construct surrogate models efficiently. In particular, we consider training sets constructed from coupled-cluster (CC) data and from density functional theory (DFT) data generated with multiple exchange-correlation functional approximations. The cost of a CC calculation scales as N^7, where N is the number of atoms in the system, while DFT exhibits relatively tractable N^3 scaling. We report that multitask surrogates can predict at CC-level accuracy while reducing data generation cost by over an order of magnitude. This interdisciplinary effort has been facilitated by Julia packages for atomistic computation and for the custom design of optimization and Gaussian process models. If time permits, we will discuss extending our computational models to produce calibrated uncertainty indicators for each prediction.
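As a rough illustration of the multitask idea, the sketch below builds a two-task Gaussian process in which a few "expensive" samples and many "cheap" samples share one covariance model. It is a minimal sketch under stated assumptions, not the presenters' implementation. It is written in Python with NumPy for brevity rather than with the Julia packages mentioned above. It uses an intrinsic coregionalization kernel (a fixed task covariance matrix times a squared-exponential kernel), which is one common multitask construction and is assumed here. The function names, the hand-picked hyperparameters, and the synthetic target functions are all hypothetical stand-ins for CC and DFT data.

```python
import numpy as np


def rbf(xa, xb, length_scale=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def multitask_gp_predict(x_train, task_train, y_train, x_test, task_test,
                         task_cov, length_scale=1.0, noise=1e-6):
    """Posterior mean of a zero-mean multitask GP (assumed ICM kernel).

    The covariance between (x, task i) and (x', task j) is
    task_cov[i, j] * rbf(x, x'). Hyperparameters are fixed by hand here;
    in practice they would be optimized.
    """
    K = task_cov[np.ix_(task_train, task_train)] * rbf(x_train, x_train, length_scale)
    K = K + noise * np.eye(len(x_train))  # jitter / observation noise
    K_star = task_cov[np.ix_(task_test, task_train)] * rbf(x_test, x_train, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return K_star @ alpha


# Synthetic stand-ins (NOT data from the talk): task 0 plays the role of the
# expensive source, task 1 the role of a cheaper, correlated source.
def f_expensive(x):
    return np.sin(x)


def f_cheap(x):
    return 0.8 * np.sin(x) + 0.1


x_hi = np.array([0.5, 3.0, 5.5])      # few expensive samples
x_lo = np.linspace(0.0, 6.0, 8)       # more cheap samples
x_train = np.concatenate([x_hi, x_lo])
task_train = np.concatenate([np.zeros(len(x_hi), dtype=int),
                             np.ones(len(x_lo), dtype=int)])
y_train = np.concatenate([f_expensive(x_hi), f_cheap(x_lo)])

# Assumed task covariance: unit variances, correlation 0.9 between tasks.
task_cov = np.array([[1.0, 0.9],
                     [0.9, 1.0]])

x_test = np.linspace(0.0, 6.0, 50)
task_test = np.zeros(len(x_test), dtype=int)  # predict the expensive task

pred_multi = multitask_gp_predict(x_train, task_train, y_train,
                                  x_test, task_test, task_cov)
pred_single = multitask_gp_predict(x_hi, np.zeros(len(x_hi), dtype=int),
                                   f_expensive(x_hi), x_test, task_test, task_cov)


def rmse(pred):
    return float(np.sqrt(np.mean((pred - f_expensive(x_test)) ** 2)))


print("multitask RMSE:  ", rmse(pred_multi))
print("single-task RMSE:", rmse(pred_single))
```

Comparing the two printed errors shows how much the cheap task contributes on this toy problem. The result depends on the assumed task correlation and length scale, so it should not be read as evidence for the cost reduction reported in the abstract.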