Trading-off Accuracy with Computational Cost : Adaptive Algorithms to Reduce Time to Clinical Insight

Semantic Scholar (2018)

Abstract
The efficacy of drug treatments depends on how tightly small molecules bind to their target proteins. Quantifying the strength of these interactions (the so-called 'binding affinity') is a grand challenge of computational chemistry, surmounting which could revolutionize drug design and provide the platform for patient-specific medicine. Recently, evidence from blind challenge predictions and retrospective validation studies has suggested that molecular dynamics (MD) can now achieve useful predictive accuracy (≤ 1 kcal/mol). This accuracy is sufficient to greatly accelerate hit-to-lead and lead optimization. Translating these advances in predictive accuracy into impact on clinical and/or industrial decision making requires that binding free energy results be turned around on timescales of hours without loss of accuracy. This demands advances in algorithms, scalable software systems, and intelligent and efficient utilization of supercomputing resources. Specifically, it necessitates refining algorithms and developing technologies to marshal huge simulation campaigns. This work is motivated by the real-world problem of providing insight from drug candidate data on a time scale that is as short as possible. Specifically, we reproduce results from a collaborative project between UCL and GlaxoSmithKline to study a congeneric series of drug candidates binding to the BRD4 protein, inhibitors of which have shown promising preclinical efficacy in pathologies ranging from cancer to inflammation. We demonstrate the use of a framework called HTBAC, designed to support the aforementioned requirements of accurate and rapid drug binding affinity calculations. HTBAC facilitates the execution of the large numbers of simulations required while supporting the adaptive execution of algorithms. Furthermore, HTBAC enables the selection of simulation parameters during runtime, which can, in principle, optimize the use of computational resources whilst producing results within a target uncertainty.
I. SCIENTIFIC MOTIVATION

Bromodomain-containing proteins, and in particular the four members of the BET (bromodomain and extra-terminal domain) family, are currently a major focus of research in the pharmaceutical industry. Small-molecule inhibitors of these proteins have shown promising preclinical efficacy in pathologies ranging from cancer to inflammation. Indeed, several compounds are progressing through early-stage clinical trials and are showing exciting early results [1]. One of the most extensively studied targets in this family is the first bromodomain of bromodomain-containing protein 4 (BRD4-BD1), for which extensive crystallographic and ligand-binding data are available [2].

Fig. 1. (L) Cartoon representation of BRD4 bound to an inhibitor shown in chemical representation (based on PDB:4BJX). (R) Ligand in cartoon representation with the tetrahydroquinoline scaffold highlighted in magenta. The regions which are modified between the ligands investigated are labelled 1 to 4.

We have investigated a congeneric series of ligands binding to BRD4-BD1 (which we shall from now on refer to simply as BRD4). This was performed in the context of a blind test of the protocols in collaboration with GlaxoSmithKline [3]. The goal was to benchmark free energy calculations in a realistic drug discovery scenario. In the original study, we investigated the chemical structures of 16 ligands based on a single tetrahydroquinoline (THQ) scaffold. These studies employed two different algorithms (simulation protocols), known as TIES and ESMACS [4], both based on multiple simulations of the same system. Drug design projects have limited resources, so initially large numbers of compounds must be cheaply screened to eliminate poor binders (using ESMACS) before more accurate methods (such as TIES) are applied as good binders are refined and improved.
In order to support such investigations, in addition to scale, the protocols must be executed using flexible resource management schemes in which, based upon intermediate results at runtime, resources can be (re-)allocated between instances of different protocols or systems, for example when one calculation has converged whilst another has not. Such adaptability makes it easier to manage complex programs where efficient use of resources is required in order to achieve a time to completion of studies comparable to that of high-throughput chemistry. This work is motivated by the real-world problem of providing insight from drug candidate data on a time scale that is as short as possible. We demonstrate the use of a framework, HTBAC, designed to support the aforementioned requirements of accurate and rapid drug binding affinity calculations. HTBAC facilitates the execution of the large numbers of simulations required while supporting the adaptive execution of algorithms. Furthermore, HTBAC enables the selection of simulation parameters during runtime, which can in principle optimize the use of computational resources whilst producing results within a target uncertainty.

II. METHODS AND MODELS

In this section we outline the computational methods employed and the physical system (drug candidates) studied. We discuss how the computational methods have been co-designed with the software systems to support scalable approaches on the largest supercomputers.

A. Binding Affinity Calculation Protocols

Computing accurate protein-drug binding affinities (also known as binding free energies) requires a simulation technique which captures the chemical detail of the system. MD simulation is the time-dependent numerical integration of the classical equations of motion for molecular systems. Application of MD to atomistic models of proteins and their ligands can be used to answer questions about the properties of a specific system, often more readily than experiments on the actual system.
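The adaptive (re-)allocation of resources between protocol instances described above can be illustrated with a minimal, self-contained sketch. This is not the HTBAC API: the sampler callables are hypothetical stand-ins for blocks of MD simulation, and the standard-error-of-the-mean convergence criterion is an illustrative assumption. At each step, one simulation block is spent on whichever calculation currently has the largest statistical uncertainty, until every instance falls below a target error or the budget is exhausted.

```python
import numpy as np

def adaptive_allocation(samplers, target_err, budget):
    """Toy sketch of runtime resource (re-)allocation: repeatedly give one
    simulation 'block' to the instance whose uncertainty is largest, and
    stop once all instances have converged below target_err.

    samplers: list of zero-argument callables, each returning one new block
        of scalar samples for that calculation (stand-ins for MD runs).
    Returns (estimates, standard errors, blocks spent) per instance.
    """
    data = [[] for _ in samplers]
    spent = [0] * len(samplers)
    for _ in range(budget):
        # Standard error of the mean per instance; infinite until sampled.
        errs = [np.std(d, ddof=1) / np.sqrt(len(d)) if len(d) > 1 else np.inf
                for d in data]
        if max(errs) <= target_err:      # every instance has converged
            break
        worst = int(np.argmax(errs))     # reallocate resources to it
        data[worst].extend(samplers[worst]())
        spent[worst] += 1
    means = [float(np.mean(d)) for d in data]
    errs = [float(np.std(d, ddof=1) / np.sqrt(len(d))) for d in data]
    return means, errs, spent

# Two mock calculations with different noise levels: the noisier one
# should receive more blocks before both reach the target precision.
rng = np.random.default_rng(2)
fast = lambda: list(rng.normal(-7.0, 0.2, 50))   # converges quickly
slow = lambda: list(rng.normal(-9.0, 1.0, 50))   # needs more sampling
means, errs, spent = adaptive_allocation([fast, slow],
                                         target_err=0.02, budget=200)
```

The design point is that the allocation decision is made at runtime from intermediate statistics, exactly the kind of information a framework can inspect between simulation stages, rather than being fixed in advance.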
Free-energy calculations in the framework of MD simulations not only yield quantitative estimates of binding strength but also provide insights into the most important interactions driving the process. Evidence from blind challenge predictions and retrospective validation studies has suggested that MD can now achieve useful predictive accuracy (≤ 1 kcal/mol) [5], [6]. This accuracy is sufficient to greatly accelerate lead optimization [7]. Statistical mechanics provides the prescription for calculating such macroscopic quantities as ensemble averages over microscopic states. Traditionally, these macroscopic properties have been calculated from the time average of a single "long" duration trajectory. An intuitive and potentially more time-efficient way to capture the mixing dynamics required to describe an equilibrium thermodynamic state is to use an ensemble of separate trajectories [8]. The major sources of error in free energy calculations are the representation of the system chemistry encoded in the force field used, finite sampling, and the free energy estimator. Protocols developed in the Coveney labs have obtained accurate and precise results which successfully reproduce experimental binding free energies for a wide range of systems [9], [3], [10], [11]. Comparisons of results obtained for a large set of sequences will provide valuable insights into the importance of the choices made in simulation and analysis for the overall accuracy and predictive power of free energy calculations, and facilitate the refinement of our protocols. Most methods for calculating binding affinities fall into one of two broad classes: so-called alchemical and endpoint methodologies. Alchemical free energy calculations employ unphysical ("alchemical") intermediates to calculate changes in free energies between two systems.
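The ensemble-of-trajectories idea can be made concrete with a short sketch, using synthetic data in place of real MD output. The observable is time-averaged within each independent replica, and the uncertainty is obtained by bootstrapping over replicas: frames within one trajectory are time-correlated, so resampling them directly would understate the error, whereas the replica means are statistically independent.

```python
import numpy as np

def ensemble_average(trajectories, n_boot=2000, seed=0):
    """Ensemble estimate of an observable with a bootstrap error bar.

    trajectories: array of shape (n_replicas, n_frames) holding the
        observable recorded along each independent trajectory.
    Returns (mean, bootstrap standard error over replicas).
    """
    traj = np.asarray(trajectories, dtype=float)
    replica_means = traj.mean(axis=1)       # time average per replica
    rng = np.random.default_rng(seed)
    n = replica_means.size
    # Resample whole replicas with replacement; each bootstrap draw
    # yields one plausible ensemble mean.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = replica_means[idx].mean(axis=1)
    return replica_means.mean(), boot_means.std(ddof=1)

# Synthetic ensemble: 25 replicas, 500 frames each, true value 3.0.
rng = np.random.default_rng(1)
trajs = 3.0 + rng.normal(0.0, 0.5, size=(25, 500))
mean, err = ensemble_average(trajs)
```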
It is common in these methods to refer to a variable, λ, which describes the path taken to transform one protein sequence (or ligand) into another. Endpoint methods, as the name suggests, consider the difference in energy between the bound and unbound structures. Obtaining information on the differences in binding affinity of different sequences for a panel of kinase inhibitors requires the deployment of various strategies, incorporating both alchemical and endpoint approaches. In this work we deploy approaches from both of these classes.

1) Alchemical Protocol (TIES): Alchemical methods employ MD simulations of unphysical, alchemical intermediate states that attenuate the interactions of the small molecule with its environment. These alchemical intermediate states include both the fully interacting complex as well as states in which the ligand does not interact with its environment, and allow the total free energy of binding, including entropic and enthalpic contributions, to be efficiently computed. Typically, the alchemical path between the states of interest is described by a parameter, λ, which varies between 0 for the initial and 1 for the final state of the transformation of interest. Sampling is then performed at a series of points along this path, and the gradient of the energy with respect to λ is integrated to calculate the binding affinity. Simulations conducted at a given λ value are said to be sampling a λ window at that point. The TIES (thermodynamic integration with enhanced sampling) protocol, developed within the Coveney lab, employs ensemble sampling at each λ window to yield reproducible, accurate, and precise relative binding affinities [3]. Based on the direct calculation of ensemble averages, it allows us to determine statistically meaningful results along with complete control of errors. As currently designed, TIES computes the change in binding affinity between two related systems (termed the 'relative binding affinity').
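The thermodynamic integration underlying this protocol evaluates ΔG = ∫₀¹ ⟨∂U/∂λ⟩ dλ from the ensemble-averaged gradients at each λ window. The following is a minimal numerical sketch using synthetic ⟨∂U/∂λ⟩ samples rather than real MD output; the array layout and the trapezoidal quadrature are illustrative assumptions, not the TIES implementation. Each replica's gradient profile is integrated separately so that the spread across the ensemble yields a statistically meaningful error bar.

```python
import numpy as np

def ti_estimate(dudl_samples, lambdas):
    """Relative free energy by thermodynamic integration over an ensemble.

    dudl_samples: array of shape (n_lambdas, n_replicas, n_frames) holding
        dU/dlambda samples at each lambda window.
    lambdas: 1-D array of lambda values in [0, 1].
    Returns (delta_G, standard error) in the same energy units.
    """
    dudl = np.asarray(dudl_samples, dtype=float)
    lam = np.asarray(lambdas, dtype=float)
    # Time average within each replica at every lambda window.
    per_replica = dudl.mean(axis=2)              # (n_lambdas, n_replicas)
    dlam = np.diff(lam)
    # Trapezoidal integration of each replica's gradient profile.
    per_replica_g = np.array([
        np.sum(0.5 * (g[1:] + g[:-1]) * dlam) for g in per_replica.T
    ])
    n = per_replica_g.size
    # Mean over replicas equals integrating the ensemble-mean gradient;
    # the scatter across replicas gives the standard error.
    return per_replica_g.mean(), per_replica_g.std(ddof=1) / np.sqrt(n)

# Synthetic data: 11 lambda windows, 5 replicas, 1000 frames each, with a
# known linear gradient profile 2*lambda - 0.5 (integrates exactly to 0.5).
rng = np.random.default_rng(0)
lambdas = np.linspace(0.0, 1.0, 11)
profile = 2.0 * lambdas - 0.5
samples = profile[:, None, None] + rng.normal(0.0, 1.0, (11, 5, 1000))
dg, dg_err = ti_estimate(samples, lambdas)
```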
2) Endpoint Protocol (ESMACS): Endpoint methods, which are computationally cheaper but less rigorous, have been used to directly compute the binding strength of a drug to the target protein from MD simulations (as opposed to differences in affinity). We have developed an ensemble-based endpoint protocol called ESMACS (enhanced sampling of molecular dynamics with approximation of continuum solvent).