Dimensionality Reduction Using Neural Networks
MSRA (2007)
Abstract
A multi-layer neural network with multiple hidden layers was trained as an autoencoder using steepest descent, scaled conjugate gradient, and Alopex algorithms. These algorithms were used in different combinations, with steepest descent and Alopex used as pretraining algorithms followed by training with scaled conjugate gradient. All the algorithms were also used to train the autoencoders without any pretraining. Three datasets were used for training: USPS digits, MNIST digits, and Olivetti faces. The results were compared with those of Hinton et al. (Hinton and Salakhutdinov, 2006) for the MNIST and Olivetti face datasets. The results confirm that pretraining is important for obtaining good reconstructions, although the pretraining approach of Hinton et al. obtains a lower RMSE than the other methods. Scaled conjugate gradient, however, turned out to be the fastest computationally.

1. INTRODUCTION

Dimensionality reduction represents information from a high-dimensional feature space using a smaller number of intrinsic dimensions. Reducing the dimensionality of high-dimensional data benefits classification, regression, and the presentation and visualization of data. Recently, Hinton et al. (Hinton and Salakhutdinov, 2006) used a deep autoencoder for dimensionality reduction on several datasets. An autoencoder is a multi-layer neural network trained to reproduce its input, i.e., to approximate the identity mapping f(x) = x, where x is a multidimensional input vector to the network. They argue that deep autoencoders can be trained effectively using a gradient descent method provided the initial weights are near a good solution. They claimed that pretraining yields such a set of initial weights, and that the fine-tuning which follows it reduces the data dimensionality very efficiently. Their results support these arguments; however, several areas still require further study.

First, they reported that pretrained deep autoencoders show significant improvement over those trained without pretraining (see the supporting material of (Hinton and Salakhutdinov, 2006) for details). However, they studied a conjugate gradient algorithm with line search for fine-tuning, so their results cannot be extended to other gradient-based methods that do not use line search. Second, their pretraining approach is quite complicated: it assigns a probability to every possible image via an energy function. It is therefore pertinent to ask whether a deep autoencoder can be pretrained with a less complicated approach. Third, one of the problems identified with training multi-layer neural networks using gradient-based algorithms is that of local minima. Keeping this …
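To make the identity-mapping objective and the RMSE figure of merit concrete, the following is a minimal sketch of a single-hidden-layer autoencoder trained by plain steepest descent on synthetic data. The layer sizes, learning rate, step count, and data are illustrative assumptions only; the autoencoders studied in the paper are deep networks with multiple hidden layers trained on the image datasets above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; the paper's deep autoencoders are much larger.
n_in, n_code = 64, 8        # input dimension and bottleneck ("code") dimension
lr, n_steps = 0.1, 2000     # steepest-descent step size and iteration count

# Toy data standing in for image vectors (one example per row).
X = rng.normal(size=(256, n_in))

# Encoder weights W1 and decoder weights W2, trained so the network
# reproduces its input, i.e. approximates the identity map f(x) = x.
W1 = rng.normal(scale=0.1, size=(n_in, n_code))
W2 = rng.normal(scale=0.1, size=(n_code, n_in))

def forward(X, W1, W2):
    H = np.tanh(X @ W1)     # low-dimensional code
    return H, H @ W2        # reconstruction of the input

for step in range(n_steps):
    H, X_hat = forward(X, W1, W2)
    err = X_hat - X                       # reconstruction error
    # Steepest descent on the mean squared reconstruction loss.
    grad_W2 = H.T @ err / len(X)
    grad_H = (err @ W2.T) * (1.0 - H**2)  # backprop through tanh
    grad_W1 = X.T @ grad_H / len(X)
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

_, X_hat = forward(X, W1, W2)
rmse = np.sqrt(np.mean((X_hat - X) ** 2))  # the metric compared in the paper
print(f"reconstruction RMSE: {rmse:.4f}")
```

In the combined schemes described in the abstract, weights obtained from a pretraining run like this (or from Alopex) would then serve as the initialization for fine-tuning with scaled conjugate gradient.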
Keywords
neural network, steepest descent