Understanding active learning of molecular docking and its applications
CoRR (2024)
Abstract
With the advancing capabilities of computational methodologies and resources,
ultra-large-scale virtual screening via molecular docking has emerged as a
prominent strategy for in silico hit discovery. Given the exhaustive nature of
ultra-large-scale virtual screening, active learning methodologies have
garnered attention as a means to mitigate computational cost through iterative
small-scale docking and machine learning model training. While the efficacy of
active learning methodologies has been empirically validated in the
literature, a critical open question is how surrogate models can
predict docking scores without considering three-dimensional structural
features, such as receptor conformation and binding poses. In this paper, we
thus investigate how active learning methodologies effectively predict docking
scores using only 2D structures and under what circumstances they may work
particularly well through benchmark studies encompassing six receptor targets.
Our findings suggest that surrogate models tend to memorize structural patterns
prevalent in high docking-scored compounds obtained during the acquisition steps.
Despite this tendency, surrogate models demonstrate utility in virtual
screening, as exemplified by the identification of actives from the DUD-E dataset
and high docking-scored compounds from the EnamineReal library, a significantly
larger set than the initial screening pool. Our comprehensive analysis
underscores the reliability and potential applicability of active learning
methodologies in virtual screening campaigns.
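
The iterative loop described in the abstract (dock a small batch, retrain a surrogate on 2D features, acquire the next batch) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: molecules are random bit-set fingerprints, `toy_docking_score` is a hypothetical stand-in for a real docking run, the surrogate is a simple k-nearest-neighbour average under Tanimoto similarity, and acquisition is greedy.

```python
import random

# Hypothetical "privileged" bits: a stand-in for a substructure that docks well.
PRIVILEGED = frozenset(range(5))


def make_library(n=1000, n_bits=32, on_bits=8, seed=0):
    """Random 2D-fingerprint-like bit sets (toy data, not real molecules)."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(n_bits), on_bits)) for _ in range(n)]


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def toy_docking_score(fp):
    """Placeholder for a docking run (assumption); lower scores are better."""
    return -float(len(fp & PRIVILEGED))


def predict(fp, labeled, k=5):
    """k-NN surrogate: mean score of the k most similar docked compounds."""
    nearest = sorted(labeled, key=lambda item: tanimoto(fp, item[0]), reverse=True)[:k]
    return sum(score for _, score in nearest) / len(nearest)


def active_learning(library, n_init=50, batch=50, rounds=4, seed=0):
    """Dock a random initial batch, then repeatedly dock the predicted-best batch."""
    rng = random.Random(seed)
    pool = list(range(len(library)))
    rng.shuffle(pool)
    # Initial round: dock a random subset of the library.
    docked = {i: toy_docking_score(library[i]) for i in pool[:n_init]}
    pool = pool[n_init:]
    for _ in range(rounds):
        # "Retrain" the surrogate: k-NN simply stores the labeled set.
        labeled = [(library[i], score) for i, score in docked.items()]
        # Greedy acquisition: lowest predicted score first.
        pool.sort(key=lambda i: predict(library[i], labeled))
        for i in pool[:batch]:
            docked[i] = toy_docking_score(library[i])
        pool = pool[batch:]
    return docked


if __name__ == "__main__":
    library = make_library()
    docked = active_learning(library)
    best = min(docked.values())
    print(f"docked {len(docked)} of {len(library)} compounds; best score {best}")
```

Because the surrogate in this sketch sees only which bits are set, any enrichment it achieves comes from recognising bit patterns shared with already-docked high scorers, which is loosely analogous to the pattern-memorization behaviour the abstract reports.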