# Deep Anomaly Detection Using Geometric Transformations

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), pp. 9758-9769, 2018.

Abstract:

We consider the problem of anomaly detection in images, and present a new detection technique. Given a sample of images, all known to belong to a "normal" class (e.g., dogs), we show how to train a deep neural model that can detect out-of-distribution images (i.e., non-dog objects). The main idea behind our scheme is to train a multi-clas...

Introduction

- Future machine learning applications such as self-driving cars or domestic robots will, inevitably, encounter various kinds of risks including statistical uncertainties.
- To be usable, these applications should be as robust as possible to such risks.
- The well-known problem of anomaly/novelty detection highlights some of these risks, and its resolution is of the utmost importance to mission critical machine learning applications.
- In machine vision applications, currently available novelty detection methods can perform poorly on some problems, as the paper's experiments demonstrate.

Highlights

- We present a novel method for anomaly detection in images, which learns a meaningful representation of the training data in a fully discriminative fashion
- The proposed method is computationally efficient, and as simple to implement as a multi-class classification task
- Unlike the best-known methods to date, our approach completely removes the need for a generative component
- Our method significantly advances the state-of-the-art by offering a dramatic improvement over the best available anomaly detection methods
- Our algorithm significantly outperformed the other methods
- It is important to develop a theory that grounds the use of geometric transformations
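
The highlights describe the method as a multi-class classification task built on geometric transformations. The following is a minimal sketch of how such a self-labeled training set could be assembled, assuming a small illustrative set of eight transformations (an optional horizontal flip followed by a rotation by a multiple of 90 degrees). The transformation set, the function names, and the simple score at the end are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def make_transformations():
    """Return 8 hypothetical transformations: optional flip, then k * 90 degree rotation."""
    transforms = []
    for flip in (False, True):
        for k in range(4):
            def transform(img, flip=flip, k=k):
                # img has shape (H, W, C); flip along the width axis if requested.
                out = img[:, ::-1] if flip else img
                # Rotate in the (H, W) plane; square images keep their shape.
                return np.rot90(out, k=k, axes=(0, 1))
            transforms.append(transform)
    return transforms


def build_self_labeled_set(images):
    """Apply every transformation to every image.

    images: array of shape (N, H, W, C) with H == W.
    Returns (x, y) where y[i] is the index of the transformation applied to x[i].
    """
    xs, ys = [], []
    for label, transform in enumerate(make_transformations()):
        for img in images:
            xs.append(transform(img))
            ys.append(label)
    return np.stack(xs), np.array(ys)


def simple_normality_score(probs):
    """Assumed scoring rule, for illustration only.

    probs[i, j] is a classifier's predicted probability of transformation j
    for one test image after transformation i was applied. The score is the
    mean probability assigned to the correct transformation.
    """
    return float(np.mean(np.diag(probs)))
```

A classifier trained on `(x, y)` only ever sees normal images, so the intuition is that it identifies the applied transformation less reliably on anomalous images, which yields a lower score.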

Methods

- The authors compare the method to state-of-the-art deep learning approaches as well as a few classic methods.

- One-Class SVM. The one-class support vector machine (OC-SVM) is a classic and popular kernel-based method for novelty detection [29, 30].
- It is typically employed with an RBF kernel, and learns a collection of closed sets in the input space, containing most of the training samples.
- The convolutional autoencoder (CAE) is chosen to have a similar architecture to that of DCGAN [26], where the encoder is adapted from the discriminator, and the decoder is adapted from the generator.
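
As a rough illustration of the OC-SVM baseline described above, the sketch below fits an RBF-kernel one-class SVM on feature vectors from the normal class and turns its decision function into an anomaly score. It assumes scikit-learn is available; the helper names and the default values of `nu` and `gamma` are placeholders, not the settings used in the paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def fit_ocsvm(train_features, nu=0.1, gamma="scale"):
    """Fit an RBF-kernel one-class SVM on normal-class features of shape (N, D)."""
    model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
    model.fit(train_features)
    return model


def anomaly_scores(model, features):
    """Higher means more anomalous.

    decision_function is positive inside the learned region and negative
    outside it, so the sign is flipped here.
    """
    return -model.decision_function(features)
```

The features can be raw pixels flattened to vectors or the latent codes of a convolutional autoencoder, which would correspond to the RAW and CAE variants named in the Table 1 caption.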

Results

- The authors describe the experimental setup and evaluation method, the baseline algorithms used for comparison, the datasets, and the implementation details of the technique.
- The first row of Table 1 contains the results for an anomaly detection problem where the normal class is class 0 in CIFAR-10, and the anomalous instances are images from all other classes in CIFAR-10.
- This row reports the average AUROC over five runs, with the corresponding standard error of the mean, for all baseline methods.
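
The evaluation described above (one normal class against all others, AUROC averaged over five runs with its standard error of the mean) can be sketched as follows. This is a minimal NumPy version written for illustration; the function names are hypothetical, and the AUROC is computed through the rank-sum (Mann-Whitney) identity with average ranks for ties.

```python
import numpy as np


def auroc(scores, is_anomaly):
    """Area under the ROC curve, where a higher score means more anomalous."""
    scores = np.asarray(scores, dtype=float)
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores), dtype=float)
    i = 0
    while i < len(scores):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        # Every member of the run gets the average 1-based rank.
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    n_pos = int(is_anomaly.sum())
    n_neg = len(scores) - n_pos
    u = ranks[is_anomaly].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def mean_and_sem(values):
    """Mean and standard error of the mean over repeated runs."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(len(values))
```

For the one-vs-rest setup, the anomaly labels for a given normal class would be `test_labels != normal_class`, and `mean_and_sem` would be applied to the AUROC values collected from the individual runs.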

Conclusion

- The authors presented a novel method for anomaly detection in images, which learns a meaningful representation of the training data in a fully discriminative fashion.
- It would be interesting to study the possibility of selecting transformations that would best serve a given training set, possibly with prior knowledge of the anomalous samples.
- Another avenue is explicitly optimizing the set of transformations.
- It would also be interesting to consider using these techniques in settings where additional unlabeled “contaminated” data is provided, perhaps within a transductive learning framework [10].

Objectives

- The authors aim to learn a scoring function n_S in a discriminative fashion.

- Table 1: Average area under the ROC curve in % with SEM (over 5 runs) of anomaly detection methods. For all datasets, each model was trained on a single class and tested against all other classes. The E2E column is taken from [27]. OC-SVM hyperparameters in the RAW and CAE variants were optimized with hindsight knowledge. The best performing method in each experiment is in bold.

Related work

- The literature related to anomaly detection is extensive and beyond the scope of this paper (see, e.g., [5, 42] for wider scope surveys). Our focus is on anomaly detection in the context of images and deep learning. In this scope, most published works rely, implicitly or explicitly, on some form of (unsupervised) reconstruction learning. These methods can be roughly categorized into two approaches.

Reconstruction-based anomaly score. These methods assume that anomalies possess different visual attributes than their non-anomalous counterparts, so it will be difficult to compress and reconstruct them based on a reconstruction scheme optimized for single-class data. Motivated by this assumption, the anomaly score for a new sample is given by the quality of the reconstructed image, which is usually measured by the ℓ2 distance between the original and reconstructed image. Classic methods belonging to this category include Principal Component Analysis (PCA) [18] and Robust PCA [4]. In the context of deep learning, various forms of deep autoencoders are the main tool used for reconstruction-based anomaly scoring. Xia et al. [37] use a convolutional autoencoder with a regularizing term that encourages outlier samples to have a large reconstruction error. An and Cho [1] use a variational autoencoder: they estimate the reconstruction probability through Monte Carlo sampling and extract an anomaly score from it. Another related method, which scores an unseen sample based on the ability of the model to generate a similar one, uses Generative Adversarial Networks (GANs) [16]. Schlegl et al. [28] use this approach on optical coherence tomography images of the retina. Deecke et al. [7] employ a variation of this model called ADGAN, reporting slightly superior results on CIFAR-10 [21] and MNIST [22].
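
To make the reconstruction-based score concrete, here is a minimal sketch using plain PCA, the simplest of the methods listed above: a low-dimensional subspace is fitted to the normal data, and the anomaly score of a sample is the ℓ2 distance between the sample and its reconstruction from that subspace. The function names and the choice of the number of components are assumptions for illustration.

```python
import numpy as np


def fit_pca(train, n_components):
    """Fit PCA on normal data of shape (N, D); return the mean and top components."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_score(x, mean, components):
    """l2 distance between each row of x and its PCA reconstruction."""
    centered = x - mean
    # Project onto the subspace, then map back to the input space.
    reconstructed = centered @ components.T @ components
    return np.linalg.norm(centered - reconstructed, axis=1)
```

Samples that lie close to the subspace spanned by the normal data receive a score near zero, while samples with a large component outside that subspace receive a high score. A deep autoencoder plays the same role, with the linear projection replaced by a learned encoder and decoder.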

Funding

- This research was partially supported by the Israel Science Foundation (grant No. 710/18).

Reference

- [1] J. An and S. Cho. Variational autoencoder based anomaly detection using reconstruction probability. SNU Data Mining Center, Tech. Rep., 2015.
- [2] G. Blanchard, G. Lee, and C. Scott. Semi-supervised novelty detection. Journal of Machine Learning Research, 11(Nov):2973–3009, 2010.
- [3] E. Burnaev, P. Erofeev, and D. Smolyakov. Model selection for anomaly detection. In Eighth International Conference on Machine Vision (ICMV 2015), volume 9875, page 987525. International Society for Optics and Photonics, 2015.
- [4] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):11, 2011.
- [5] V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3):15, 2009.
- [6] J. Davis and M. Goadrich. The relationship between precision-recall and roc curves. In Proceedings of the 23rd international conference on Machine learning, pages 233–240. ACM, 2006.
- [7] L. Deecke, R. Vandermeulen, L. Ruff, S. Mandt, and M. Kloft. Anomaly detection with generative adversarial networks, 2018. URL https://openreview.net/forum?id=S1EfylZ0Z.
- [8] R. El-Yaniv and M. Nisenson. Optimal single-class classification strategies. In Advances in Neural Information Processing Systems, pages 377–384, 2007.
- [9] R. El-Yaniv and M. Nisenson. On the foundations of adversarial single-class classification. CoRR, abs/1010.4466, 2010. URL http://arxiv.org/abs/1010.4466.
- [10] R. El-Yaniv and D. Pechyony. Transductive rademacher complexity and its applications. In Learning Theory, 20th Annual Conference on Learning Theory, (COLT), pages 157–171, 2007.
- [11] J. Elson, J. J. Douceur, J. Howell, and J. Saul. Asirra: A captcha that exploits interest-aligned manual image categorization. In Proceedings of 14th ACM Conference on Computer and Communications Security (CCS). Association for Computing Machinery, Inc., October 2007.
- [12] Y. Geifman and R. El-Yaniv. Deep active learning over the long tail. CoRR, 2017. URL http://arxiv.org/abs/1711.00941.
- [13] Y. Geifman and R. El-Yaniv. Selective classification for deep neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 4878–4887. 2017.
- [14] Y. Geifman and R. El-Yaniv. Deep active learning with a neural architecture search. CoRR, abs/1811.07579, 2018. URL http://arxiv.org/abs/1811.07579.
- [15] Y. Geifman, G. Uziel, and R. El-Yaniv. Boosting uncertainty estimation for deep neural classifiers. CoRR, 2018. URL http://arxiv.org/abs/1805.08206.
- [16] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
- [17] T. Iwata and M. Yamada. Multi-view anomaly detection via robust probabilistic latent variable models. In Advances In Neural Information Processing Systems, pages 1136–1144, 2016.
- [19] J. Kim and C. D. Scott. Robust kernel density estimation. Journal of Machine Learning Research, 13(Sep):2529–2565, 2012.
- [20] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- [21] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, 2009.
- [22] Y. LeCun, C. Cortes, and C. Burges. Mnist handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010.
- [23] S. Liang, Y. Li, and R. Srikant. Enhancing the reliability of out-of-distribution image detection in neural networks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=H1VGkIxRZ.
- [24] T. Minka. Estimating a dirichlet distribution, 2000.
- [25] E. Parzen. On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3):1065–1076, 1962.
- [26] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
- [27] L. Ruff, N. Görnitz, L. Deecke, S. A. Siddiqui, R. Vandermeulen, A. Binder, E. Müller, and M. Kloft. Deep one-class classification. In International Conference on Machine Learning, pages 4390–4399, 2018.
- [28] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International Conference on Information Processing in Medical Imaging, pages 146–157.
- [29] B. Schölkopf, R. C. Williamson, A. J. Smola, J. Shawe-Taylor, and J. C. Platt. Support vector method for novelty detection. In Advances in Neural Information Processing Systems, pages 582–588, 2000.
- [30] D. M. Tax and R. P. Duin. Support vector data description. Machine learning, 54(1):45–66, 2004.
- [31] A. Taylor, S. Leblanc, and N. Japkowicz. Anomaly detection in automobile control network data with long short-term memory networks. In Data Science and Advanced Analytics (DSAA), 2016 IEEE International Conference on, pages 130–139. IEEE, 2016.
- [32] P. Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011.
- [33] S. Wang, Q. Liu, E. Zhu, F. Porikli, and J. Yin. Hyperparameter selection of one-class support vector machine by self-adaptive data shifting. Pattern Recognition, 74:198–211, 2018.
- [34] N. Wicker, J. Muller, R. K. R. Kalathur, and O. Poch. A maximum likelihood approximation method for dirichlet’s parameter estimation. Computational statistics & data analysis, 52(3): 1315–1322, 2008.
- [35] Y. Wiener and R. El-Yaniv. Agnostic selective classification. In Advances in Neural Information Processing Systems (NIPS), pages 1665–1673. 2011.
- [36] Y. Wiener and R. El-Yaniv. Pointwise tracking the optimal regression function. In Advances in Neural Information Processing Systems (NIPS), pages 2051–2059, 2012.
- [37] Y. Xia, X. Cao, F. Wen, G. Hua, and J. Sun. Learning discriminative reconstructions for unsupervised outlier removal. In Proceedings of the IEEE International Conference on Computer Vision, pages 1511–1519, 2015.
- [38] H. Xiao, K. Rasul, and R. Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017.
- [39] J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson. Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579, 2015.
- [40] S. Zagoruyko and N. Komodakis. Wide residual networks. In BMVC, 2016.
- [41] S. Zhai, Y. Cheng, W. Lu, and Z. Zhang. Deep structured energy based models for anomaly detection. In Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML’16, pages 1100–1109. JMLR.org, 2016.
- [42] A. Zimek, E. Schubert, and H.-P. Kriegel. A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(5):363–387, 2012.
- [43] B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen. Deep autoencoding gaussian mixture model for unsupervised anomaly detection. In International Conference on Learning Representations, 2018.
