Minisymposium Presentation
Leveraging Large Datasets to Assess the Potential of Machine Learning for Drug Target Prediction through Reverse Screening
Presenter
Description
Estimating the protein targets of compounds based on the similarity principle, whereby structurally similar molecules tend to share targets, is a long-standing strategy in drug discovery. Building upon a prior quantification of this principle, its predictive power was assessed at large scale using an unprecedentedly large external test set of more than 300,000 active small molecules, screened against a separate bioactivity set of more than 500,000 compounds. Machine learning was found to assign the highest probability, among 2069 proteins, to a correct target for more than 51% of the external molecules. The strong enrichment thus obtained demonstrates the usefulness of reverse screening in supporting phenotypic screens, polypharmacology, or drug repurposing. Moreover, the impact of the bioactivity knowledge available for each protein, in terms of the number and diversity of its known actives, was investigated. This study advocates for application-oriented benchmarking strategies, to prevent accidental overestimation of predictive ability, and for the use of large, high-quality, non-overlapping datasets.
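To make the evaluation idea concrete, the following is a minimal sketch of similarity-based reverse screening scored by top-1 success rate. It is not the model used in the study: it substitutes a plain nearest-neighbour Tanimoto ranking for the machine-learning model, represents fingerprints as toy sets of bit indices, and uses hypothetical target names (`TARGET_A`, etc.) and invented toy data. The function names `tanimoto`, `rank_targets`, and `top1_success_rate` are likewise assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Toy stand-in for a molecular fingerprint: the set of "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_targets(
    query: Fingerprint, reference: Dict[str, List[Fingerprint]]
) -> List[Tuple[str, float]]:
    """Score each target by its most similar known active, best first.

    Ties are broken alphabetically by target name so the ranking is stable.
    """
    scores = {
        target: max(tanimoto(query, fp) for fp in actives)
        for target, actives in reference.items()
        if actives  # skip targets with no known actives
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def top1_success_rate(
    test_set: List[Tuple[Fingerprint, Set[str]]],
    reference: Dict[str, List[Fingerprint]],
) -> float:
    """Fraction of test molecules whose top-ranked target is a true target."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    hits = sum(
        1
        for fp, true_targets in test_set
        if rank_targets(fp, reference)[0][0] in true_targets
    )
    return hits / len(test_set)


# Hypothetical reference bioactivity set: target -> fingerprints of known actives.
reference = {
    "TARGET_A": [frozenset({1, 2, 3, 4}), frozenset({1, 2, 5})],
    "TARGET_B": [frozenset({10, 11, 12}), frozenset({10, 13})],
    "TARGET_C": [frozenset({20, 21})],
}

# Hypothetical external test set, kept disjoint from the reference molecules.
test_set = [
    (frozenset({1, 2, 3}), {"TARGET_A"}),
    (frozenset({10, 11}), {"TARGET_B"}),
    (frozenset({1, 2, 20}), {"TARGET_C"}),  # resembles TARGET_A actives more
]

rate = top1_success_rate(test_set, reference)
print(f"top-1 success rate: {rate:.2f}")
```

In this toy setup two of the three test molecules have their true target ranked first. Keeping the test molecules out of the reference set mirrors, in miniature, the non-overlapping external validation that the abstract argues for.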