# Deep neural network in QSAR studies using deep belief network

Applied Soft Computing, pp. 251-258, 2018.

Abstract:

- A novel QSAR network to improve the biological activity prediction is proposed.
- In contrast with previous common methods, it is appropriate for high throughput screening.
- Deep belief network (DBN) is suggested to solve QSAR problems such as over-fitting.
- DBN was exploited to select the initial parameters of deep neural network (DNN).
- The...

Introduction

- Machine learning is a computer programming technique applicable in statistical and mathematical research.
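- Some conventional techniques include artificial neural networks (ANN) [2,3], k-nearest neighbors (KNN) [4,5,6], random forest (RF) [7], support vector machines (SVM) [8,9,10], multiple linear regression (MLR) [11], one-against-one [12], the Bayes classifier [13], and kernel-based methods such as Gaussian processes [14,15].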
- These methods mostly suffer from the same drawbacks, i.e. relying on a small number of ligands and a limited selection of descriptors.
- The depth of an architecture, i.e. its number of levels of nonlinear operations, refers to the longest path from an input node to an output node.
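
The "longest path from an input node to an output one" in the last point can be made concrete with a small sketch. The toy graph, node names and the `depth` helper below are illustrative assumptions, not taken from the paper; the graph is assumed to be acyclic.

```python
from functools import lru_cache

def depth(graph, inputs, output):
    """Length in edges of the longest path from any input node to `output`.

    `graph` maps each node to the list of nodes it feeds into (must be acyclic).
    """
    @lru_cache(maxsize=None)
    def longest_from(node):
        if node == output:
            return 0
        best = float("-inf")  # -inf marks "output not reachable from here"
        for nxt in graph.get(node, ()):
            best = max(best, 1 + longest_from(nxt))
        return best

    return max(longest_from(n) for n in inputs)

# Hypothetical computation graph: x -> h1 -> h2 -> y, plus a shortcut x -> h2.
toy_graph = {"x": ["h1", "h2"], "h1": ["h2"], "h2": ["y"]}
toy_depth = depth(toy_graph, ["x"], "y")
```

Here the shortcut path has two edges but the longest path has three, so the depth is determined by the path through both hidden nodes.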

Highlights

- Many machine learning algorithms have been applied in drug design
- Deep architectures become essential when large amounts of data are being processed
- Experimental results show that the DBN approach can be used to determine the initial parameters (biases and weights) of a deep neural network (DNN), and that the resulting model outperforms the plain DNN method in most cases
- The results indicated that the DBN-DNN model performed well: its mean correlation over all data sets was greater than that of the other algorithms mentioned, while its range of variation was smaller
- The results revealed that optimizing the initialization improves the ability of a DNN to provide high-quality predictive models

Methods

- The proposed model was executed in Matlab (2016).
- The main aim of the DBN is to initialize the weights of a deep neural network so that it yields a better model than one started from random weights.
- This approach makes the predictions extremely effective.
- The DBN can be used effectively to perform layer-by-layer pre-training, which initializes the subsequent training by the backpropagation algorithm.
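
The layer-by-layer pre-training described above can be sketched as follows. This is a minimal illustration in Python rather than the Matlab implementation the summary mentions, and it assumes binary RBMs trained with one-step contrastive divergence (CD-1); the function names, toy data, layer sizes, learning rate and epoch count are all hypothetical and are not the paper's settings.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, c):
    """P(h_j = 1 | v) for weights W[i][j] and hidden biases c."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    """P(v_i = 1 | h) for visible biases b."""
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b))]

def train_rbm(data, n_hidden, epochs=50, lr=0.1, rng=None):
    """Train a single RBM with CD-1; returns (weights, hidden biases)."""
    rng = rng or random.Random(0)
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    b = [0.0] * n_visible
    c = [0.0] * n_hidden
    for _ in range(epochs):
        for v0 in data:
            ph0 = hidden_probs(v0, W, c)                          # positive phase
            h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]  # sample hidden units
            v1 = visible_probs(h0, W, b)                          # reconstruction (probabilities)
            ph1 = hidden_probs(v1, W, c)                          # negative phase
            for i in range(n_visible):
                for j in range(n_hidden):
                    W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
                b[i] += lr * (v0[i] - v1[i])
            for j in range(n_hidden):
                c[j] += lr * (ph0[j] - ph1[j])
    return W, c

def pretrain_dbn(data, layer_sizes, **kwargs):
    """Greedy layer-wise pre-training: each RBM is trained on the hidden
    activations of the layer below. Returns one (W, hidden_bias) pair per layer."""
    params, layer_input = [], data
    for n_hidden in layer_sizes:
        W, c = train_rbm(layer_input, n_hidden, **kwargs)
        params.append((W, c))
        layer_input = [hidden_probs(v, W, c) for v in layer_input]
    return params

# Made-up binary "descriptors"; two hidden layers of sizes 3 and 2.
toy_data = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
init_params = pretrain_dbn(toy_data, layer_sizes=[3, 2], epochs=20)
```

In a DBN-DNN setup, the returned weights and biases would serve as the starting point for the hidden layers of the network, in place of random values, before supervised fine-tuning with backpropagation.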

Results

- Seventy-five percent of each group, including descriptors and activity values, was randomly extracted as the training set.
- These data were used by the DBN to find optimal initial parameters for the DNN.
- Four different results were averaged to achieve a single prediction
- This procedure was repeated thirty times.
- All thirty results were averaged to predict the molecular activities
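
The repeated random-split procedure above can be outlined in a short sketch. It covers only the 75% training split and the averaging over thirty repetitions; the model is replaced by a hypothetical mean-value baseline, the score by a mean absolute error, and the data are made up, so none of these choices should be read as the paper's actual setup.

```python
import random
from statistics import mean

def split_75_25(n_samples, rng):
    """Randomly assign 75% of the sample indices to training, the rest to testing."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    cut = int(0.75 * n_samples)
    return idx[:cut], idx[cut:]

def repeated_evaluation(descriptors, activities, fit_predict, score,
                        n_repeats=30, seed=0):
    """Repeat the random split n_repeats times and average the per-run scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        train_idx, test_idx = split_75_25(len(activities), rng)
        x_train = [descriptors[i] for i in train_idx]
        y_train = [activities[i] for i in train_idx]
        x_test = [descriptors[i] for i in test_idx]
        y_test = [activities[i] for i in test_idx]
        y_pred = fit_predict(x_train, y_train, x_test)
        scores.append(score(y_test, y_pred))
    return mean(scores), scores

def mean_baseline(x_train, y_train, x_test):
    """Hypothetical stand-in for the trained model: predicts the training mean."""
    return [mean(y_train)] * len(x_test)

def mean_abs_error(y_true, y_pred):
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])

# Made-up data: one descriptor per molecule, activity proportional to it.
toy_descriptors = [[float(i)] for i in range(20)]
toy_activities = [2.0 * i for i in range(20)]
avg_score, run_scores = repeated_evaluation(
    toy_descriptors, toy_activities, mean_baseline, mean_abs_error)
```

Passing the model and the score as functions keeps the split-and-average loop independent of which learner is being evaluated.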

Conclusion

- The main purpose of this research was to examine the proper parameter initialization in deep neural networks using deep belief networks.
- The RBM is the method best suited for use in each layer of the network.
- This combination was assumed to be an efficient novel answer to the problems pointed out earlier.
- An important advantage of this solution is that it avoids output variation from one input to another.
- It helps to control problems such as over-fitting and getting stuck in local minima

- Table 1: Fifteen different Kaggle data sets utilized in this study [14]
- Table 2: Squared correlation of all targets with their standard deviations for MLR, RF, ANN, DNN and DBN-DNN
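
The squared correlation reported in the second table can be computed along the following lines. The sketch assumes the metric is the squared Pearson correlation between observed and predicted activities, which the summary does not state explicitly; the function name and the example values are hypothetical.

```python
def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

# Made-up values: predictions that are an exact linear function of the observations.
r2_example = squared_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Because the measure is invariant to linear rescaling of the predictions, it rewards correct ranking and trend rather than exact agreement in absolute value.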

Related work

- Recently, the number of biologically active molecules and molecular descriptors has risen exponentially. Despite this growth, the deep neural network (DNN), which is a multilayer perceptron (MLP) network with many hidden layers and plenty of nodes in each layer, could not overcome its proneness to over-fitting and to getting stuck in local minima, in drug discovery as in other research areas such as image processing and speech processing [1,2]. Hinton et al. [16] introduced a fast, greedy algorithm to improve each layer using an RBM. This method was used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm [16]. They showed that this algorithm could prevent the over-fitting problem. Later, Bengio in 2009 proposed deep architectures in which single-layer models such as the RBM were exploited as unsupervised learning building blocks to construct deeper models such as the DBN [17]. After that, Hinton (2012) introduced a practical guide for constructing the RBM algorithm step by step [18]. In 2014, Srivastava et al. introduced a new technique named dropout to prevent the over-fitting problem [19]. Nowadays, deep learning (DL) has been successfully applied in different processing fields such as computer vision, speech processing, image processing and chemo-informatics [20].

Funding

- This work was supported by the vice chancellery of research at Isfahan University of Medical Sciences.

Study subjects and analysis

targets of Kaggle data sets: 15

In the current study, a deep belief network is exploited to initialize deep neural networks. All fifteen targets of the Kaggle data sets, containing more than 70k molecules, were used to investigate the model performance. The results revealed that optimizing the parameter initialization improves the ability of deep neural networks to provide high-quality model predictions.

data sets: 15

To compare the outputs of the two models, the differences in RMSE between them were calculated for all of the targets. Fig. 6 shows that the RMSE of DBN-DNN, averaged over all 15 data sets, is 0.658 greater in value than that of DNN. The red lines indicate the average RMSE obtained by the DNN and DBN-DNN methods, respectively.
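
The per-target RMSE comparison described here can be sketched as below. The helper names, target labels and numbers are hypothetical and serve only to show the calculation; they are not values from the paper.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted activities."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def rmse_differences(rmse_a, rmse_b):
    """Per-target difference rmse_a - rmse_b and its mean over all targets."""
    diffs = {target: rmse_a[target] - rmse_b[target] for target in rmse_a}
    return diffs, sum(diffs.values()) / len(diffs)

# Made-up per-target RMSE values for two models A and B.
model_a = {"target_1": 0.5, "target_2": 0.7}
model_b = {"target_1": 0.4, "target_2": 0.9}
per_target, mean_diff = rmse_differences(model_a, model_b)
```

A positive mean difference means model A has the larger average error, so the sign convention should be stated whenever such a figure is reported.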

Reference

- J.P. Ceron-Carrasco, T. Coronado-Parra, B. Imbernón-Tudela, A.J. Banegas-Luna, F. Ghasemi, J.M. Vegara-Meseguer, I. Luque, S. Sik, S. Trædal-Henden, H. Pérez-Sánchez, Application of Computational Drug Discovery Techniques for Designing New Drugs Against Zika Virus, Open Access, Drug Designing, 2016, pp. 1–2.
- M. Shahlaei, A. Fassihi, L. Saghaie, Application of PC-ANN and PC-LS-SVM in QSAR of CCR1 antagonist compounds: a comparative study, Eur. J. Med. Chem. 45 (2010) 1572–1582.
- H. Pérez-Sánchez, G. Cano, J. García-Rodríguez, Improving drug discovery using hybrid softcomputing methods, Appl. Soft Comput. 20 (2014) 119–126.
- S. Ajmani, K. Jadhav, S.A. Kulkarni, Three-dimensional QSAR using the k-nearest neighbor method and its interpretation, J. Chem. Inf. Model. 46 (2006) 24–31.
- P. Itskowitz, A. Tropsha, k nearest neighbors QSAR modeling as a variational problem: theory and applications, J. Chem. Inf. Model. 45 (2005) 777–785.
- F. Nigsch, A. Bender, B. van Buuren, J. Tissen, E. Nigsch, J.B. Mitchell, Melting point prediction employing k-nearest neighbor algorithms and genetic parameter optimization, J. Chem. Inf. Model. 46 (2006) 2412–2422.
- V.E. Kuz’min, P.G. Polishchuk, A.G. Artemenko, S.A. Andronati, Interpretation of QSAR models based on random forest methods, Mol. Inf. 30 (2011) 593–603.
- M. Shahlaei, A. Fassihi, QSAR analysis of some 1-(3, 3-diphenylpropyl)-piperidinyl amides and ureas as CCR5 inhibitors using genetic algorithm-least square support vector machine, Med. Chem. Res. 22 (2013) 4384–4400.
- M. Shahlaei, R. Sabet, M.B. Ziari, B. Moeinifard, A. Fassihi, R. Karbakhsh, QSAR study of anthranilic acid sulfonamides as inhibitors of methionine aminopeptidase-2 using LS-SVM and GRNN based on principal components, Eur. J. Med. Chem. 45 (2010) 4499–4508.
- G. Cano, J. García-Rodríguez, H. Pérez-Sánchez, Improvement of virtual screening predictions using computational intelligence methods, Lett. Drug Des. Discov. 11 (2014) 33–39.
- M. Shahlaei, A. Madadkar-Sobhani, A. Fassihi, L. Saghaie, D. Shamshirian, H. Sakhi, Comparative quantitative structure–activity relationship study of some 1-aminocyclopentyl-3-carboxyamides as CCR2 inhibitors using stepwise MLR, FA-MLR, and GA-PLS, Med. Chem. Res. 21 (2012) 100–115.
- F. Ghasemi, A. Mehri, J. Pena-García, H. den-Haan, A. Pérez-Garrido, H. Fassihi, Improving activity prediction of adenosine A2B receptor antagonists by nonlinear models, in: International Conference on Bioinformatics and Biomedical Engineering, Springer, 2015, pp. 635–644.
- A. Koutsoukas, R. Lowe, Y. KalantarMotamedi, H.Y. Mussa, W. Klaffke, J.B. Mitchell, R.C. Glen, A. Bender, In silico target predictions: defining a benchmarking data set and comparison of performance of the multiclass naïve bayes and parzen-rosenblatt window, J. Chem. Inf. Model. 53 (2013) 1957–1966.
- F.R. Burden, Quantitative structure-activity relationship studies using gaussian processes, J. Chem. Inf. Comput. Sci. 41 (2001) 830–835.
- R. Lowe, H.Y. Mussa, J.B. Mitchell, R.C. Glen, Classifying molecules using a sparse probabilistic kernel binary classifier, J. Chem. Inf. Model. 51 (2011) 1539–1544.
- G.E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (2006) 1527–1554.
- Y. Bengio, Learning Deep Architectures for AI, vol. 2, Foundations and trends® in Machine Learning, 2009, pp. 1–127.
- G. Hinton, A practical guide to training restricted Boltzmann machines, Momentum 9 (2010) 926.
- N. Srivastava, G.E. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 (2014) 1929–1958.
- G. Hinton, L. Deng, D. Yu, G.E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T.N. Sainath, B. Kingsbury, Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, Signal Process. Mag. IEEE 29 (2012) 82–97.
- A. Lusci, G. Pollastri, P. Baldi, Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules, J. Chem. Inf. Model. 53 (2013) 1563–1575.
- Y. Wang, J. Zeng, Predicting drug-target interactions using restricted Boltzmann machines, Bioinformatics 29 (2013) i126–i134.
- T. Unterthiner, A. Mayr, G. Klambauer, M. Steijaert, J.K. Wegner, H. Ceulemans, S. Hochreiter, Deep Learning for Drug Target Prediction, 2014.
- J. Ma, R.P. Sheridan, A. Liaw, G.E. Dahl, V. Svetnik, Deep neural nets as a method for quantitative structure–activity relationships, J. Chem. Inf. Model. 55 (2015) 263–274.
- T.B. Hughes, G.P. Miller, S.J. Swamidass, Modeling epoxidation of drug-like molecules with a deep machine learning network, ACS Cent. Sci. 1 (2015) 168–180.
- F. Ghasemi, A. Fassihi, H. Pérez-Sánchez, A. Mehri Dehnavi, The role of different sampling methods in improving biological activity prediction using deep belief network, J. Comput. Chem. (2016).
- V. Svetnik, A. Liaw, C. Tong, J.C. Culberson, R.P. Sheridan, B.P. Feuston, Random forest: a classification and regression tool for compound classification and QSAR modeling, J. Chem. Inf. Comput. Sci. 43 (2003) 1947–1958.
- A. Liaw, M. Wiener, Classification and regression by randomForest, R news 2 (2002) 18–22.
- L.S. Aiken, S.G. West, S.C. Pitts, Multiple Linear Regression, Handbook of Psychology, 2003.
- A.R. Leach, V.J. Gillet, An Introduction to Chemoinformatics, Springer Science & Business Media, 2007.
