Post-training Iterative Hierarchical Data Augmentation for Deep Networks

NeurIPS 2020

Abstract

In this paper, we propose a new iterative hierarchical data augmentation (IHDA) method to fine-tune trained deep neural networks to improve their generalization performance. The IHDA is motivated by three key insights: (1) Deep networks (DNs) are good at learning multi-level representations from data. (2) Performing data augmentation (DA)...

Introduction
  • Despite the tremendous success of deep neural networks in solving discriminative tasks, improving the generalization ability of these models remains one of the most difficult challenges.
  • The second class of approaches, on the other hand, inflates the training data through deep learning.
  • These include feature space augmentation [6, 7, 8], adversarial training [9, 10], generative-adversarial-network-based augmentation [11, 12, 13, 14], and meta-learning data augmentation (MLDA) [15, 16, 17, 18, 19].
Highlights
  • Despite the tremendous success of deep neural networks in solving discriminative tasks, improving the generalization ability of these models remains one of the most difficult challenges
  • Extensive empirical evaluations on several competitive image and non-image classification benchmarks showed that the iterative hierarchical data augmentation (IHDA) consistently improved the generalization performance of popular deep networks and outperformed the existing state-of-the-art data augmentation (DA) algorithms for deep networks
  • Between IHDA and IHDA+, the latter provided slightly better performance, which shows that the proposed IHDA algorithm benefited from doing simple augmentations in the input space
  • The results indicate that IHDA can be used to improve the generalization performance of deep networks for different domains
  • We proposed a domain-agnostic, post-training data augmentation (DA) method, called IHDA, to fine-tune trained deep networks to improve their generalization performance
  • We proposed a new data augmentation technique to improve the generalization of any deep network, making our work general enough to be applied to a large variety of supervised learning problems
Methods
  • Let F_θ : x_i → y_i be a deep network, parameterized by θ, trained on D with respect to a supervised loss, such as cross entropy, and an optimization procedure, such as stochastic gradient descent.
  • The goal of this work is to implement an augmentation method A that generates new training samples for fine-tuning F_θ such that E_{F_θ} ≥ E_{F_θ^A}, where E_{F_θ^A} is the error of the model that is initially trained on D and then fine-tuned using the augmented data generated by A (a minimal sketch of this setup follows below).
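To make this setup concrete, here is a minimal post-training fine-tuning sketch in PyTorch. It illustrates the problem statement only, not the authors' implementation: the augmentation method A appears as a placeholder callable `augment`, and the loss, optimizer, and hyperparameters shown are assumptions.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def finetune_with_augmentation(model, augment, base_loader, val_loader,
                               epochs=5, lr=1e-3, device="cpu"):
    """Fine-tune an already-trained model F_θ on samples produced by an
    augmentation method A (the placeholder `augment`)."""
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()            # supervised loss from the setup
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

    for _ in range(epochs):
        for x, y in base_loader:
            x_aug, y_aug = augment(x, y)         # A: generate new training samples
            x_aug, y_aug = x_aug.to(device), y_aug.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x_aug), y_aug)
            loss.backward()
            optimizer.step()

    # Estimate E_{F_θ^A}: error of the fine-tuned model on held-out data.
    model.eval()
    errors, total = 0, 0
    with torch.no_grad():
        for x, y in val_loader:
            preds = model(x.to(device)).argmax(dim=1).cpu()
            errors += (preds != y).sum().item()
            total += y.numel()
    return errors / total
```

Comparing the returned error with the error of the model before fine-tuning corresponds to checking whether E_{F_θ} ≥ E_{F_θ^A}.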
Results
  • Observe that the IHDA and the IHDA+ consistently improved the models’ performance by iteratively fine-tuning them on the generated data.
  • Their performance is better than that of the existing DA solutions for these networks in all cases, except for Wide-ResNet-28-10, for which AAA gave the best results.
  • The authors confirm this fact by implementing IHDA+ using the DA policies of AA [17]; the results are presented in Section 4.4
Conclusion
  • The authors proposed a domain-agnostic, post-training data augmentation (DA) method, called IHDA, to fine-tune trained deep networks to improve their generalization performance.
  • For effective DA, the IHDA synthesizes new samples in hard-to-learn regions by analyzing each data point’s neighborhood properties.
  • The new representations, thus generated, are used to fine-tune the parameters of the subsequent layers (a brief sketch of this loop follows this list).
  • Superior results on three image classification datasets and one activity classification dataset demonstrate the effectiveness of IHDA in improving the generalization performance of deep networks
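The sketch below reconstructs this loop under stated assumptions rather than reproducing the authors' algorithm: k-nearest-neighbour label disagreement stands in as the "hard-to-learn" criterion, and same-class interpolation stands in for whatever generative model IHDA actually uses to synthesize new representations.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hard_sample_mask(feats, labels, k=5, threshold=0.4):
    """Flag samples whose neighbourhood contains many points of other classes
    (an illustrative proxy for 'hard-to-learn' regions)."""
    nn_index = NearestNeighbors(n_neighbors=k + 1).fit(feats)
    _, idx = nn_index.kneighbors(feats)          # idx[:, 0] is the point itself
    disagreement = (labels[idx[:, 1:]] != labels[:, None]).mean(axis=1)
    return disagreement >= threshold

def synthesize(feats, labels, mask, rng):
    """Create new same-class representations near the flagged hard samples.
    `rng` is a numpy Generator, e.g. np.random.default_rng()."""
    new_feats, new_labels = [], []
    for c in np.unique(labels):
        hard_c = feats[mask & (labels == c)]
        if len(hard_c) < 2:
            continue
        a, b = rng.choice(len(hard_c), size=(2, len(hard_c)))
        lam = rng.uniform(0.2, 0.8, size=(len(hard_c), 1))
        new_feats.append(lam * hard_c[a] + (1 - lam) * hard_c[b])
        new_labels.append(np.full(len(hard_c), c))
    if not new_feats:                            # nothing hard enough to augment
        return feats[:0], labels[:0]
    return np.concatenate(new_feats), np.concatenate(new_labels)
```

An IHDA-style procedure would apply such a step to the representations at a chosen hidden layer, fine-tune the subsequent layers on the synthesized features, and repeat at the next level of the hierarchy.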
Tables
  • Table 1: Test set error (%) of IHDA on CIFAR 10 with different models. Lower is better. We conducted five independent experiments, and report the mean values along with their standard deviations. The best results are bold-faced; whereas the second best results are italic-faced. For ISDA, PBA, AA and AAA, where available, we report the results from [20], [16], [17] and [29], respectively. “Baseline (B)” represents the initial accuracy of IHDA+
  • Table 2: Test set error (%) of IHDA on CIFAR 100 with different models. The details are the same as those of Table 1. Note that the results of Shake-Shake are for its (26, 2x96d) implementation
  • Table 3: Validation set Top-1/Top-5 accuracy (%) of IHDA on ImageNet with different models. Higher is better. We conducted three independent experiments, and report the mean values. The best results are bold-faced; whereas the second best results are italic-faced. Results for baseline and AA are taken from [17] and those of AAA are taken from [29]
  • Table 4: Test set error (%) of the ablation study of IHDA on CIFAR datasets for ResNet-110
Related work
  • Several methods have emerged over the past decade to reduce overfitting and improve the generalization performance of deep neural networks. These include dropout [23], batch normalization [24], transfer learning [25, 26], pretraining [27], few-shot learning [28], and DA. The focus of this work is DA, which can be classified and described in various ways.

    Explicit or Implicit: DA can be explicit, such that it combats the overfitting problem by artificially increasing the size of training data through data warping or oversampling. Data warping-based DA generates new data by transforming existing data points while keeping their class labels preserved [1, 2, 3, 4, 5, 15, 16, 17, 18, 19]. The transformations could include elastic distortions, scaling, translation, rotation, mirroring, or color shift, etc. Oversampling-based DA works by expanding the training data through synthesizing new samples [11, 12, 13, 14, 22].
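As a concrete illustration of the explicit, label-preserving warps listed above, a standard torchvision pipeline might look as follows; the particular transforms and parameter values are generic examples, not taken from any of the cited methods.

```python
from torchvision import transforms

# Typical data-warping augmentations: flips, small rotations/translations,
# scaling, and colour shifts. Parameter values are arbitrary examples.
warp = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
])
# Applied per sample at load time, e.g. datasets.CIFAR10(..., transform=warp).
```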
Funding
  • Extensive empirical evaluations on several competitive image and non-image classification benchmarks showed that the IHDA consistently improved the generalization performance of popular deep networks and outperformed the existing state-of-the-art DA algorithms for deep networks
  • Our contribution is two-fold: (a) we propose the first post-training DA approach based on generative models, which performs DA iteratively in difficult regions of the learned representations to improve the generalization of deep networks; (b) we achieve better results than the state-of-the-art (SOTA) DA approaches on public benchmarks
  • After fine-tuning the model with the IHDA, the performance improved to 92% ± 0.05%
Study subjects and analysis
key observations: 3
Thus, θ is the union of the parameters of the network’s two components, and F_θ is trained with respect to a supervised loss on D, optimizing the parameters of both components. The proposed augmentation method is based on three key observations. Firstly, it is known that deep networks have the ability to exploit the hidden structure of the input to learn/discover meaningful representations at multiple levels, such that the higher-level features are defined in terms of lower-level features [21]
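To make the two-component view concrete, a deep network can be written as a feature extractor followed by a classifier head, with θ the union of both parameter sets. The split and the names `phi`/`psi` below are illustrative assumptions; the authors' exact decomposition is not recoverable from this summary.

```python
import torch.nn as nn

class TwoPartNet(nn.Module):
    """F_θ viewed as a composition psi(phi(x)): `phi` produces intermediate
    (lower-level) representations, `psi` maps them to class scores, and
    θ is the union of the parameters of `phi` and `psi`."""
    def __init__(self, in_dim=3 * 32 * 32, hidden=256, num_classes=10):
        super().__init__()
        self.phi = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, hidden), nn.ReLU())
        self.psi = nn.Linear(hidden, num_classes)

    def forward(self, x):
        return self.psi(self.phi(x))
```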

samples: 50000
For CIFAR datasets, the validation set had 5000 images, which were taken from the training set. For ImageNet, we used its reduced subset, which was created by randomly choosing 150 classes and 50,000 samples. From this reduced subset, we held out 5000 images for the validation set to tune the hyperparameters
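The reduced ImageNet subset described here could be recreated roughly as sketched below; the 150-class, 50,000-sample, and 5,000-image validation numbers come from the text, while the directory layout, file extension, and seeding are assumptions.

```python
import random
from pathlib import Path

def sample_reduced_imagenet(train_root, num_classes=150, num_samples=50_000,
                            num_val=5_000, seed=0):
    """Pick `num_classes` classes, draw `num_samples` images from them, and
    hold out `num_val` of those images as a validation set."""
    rng = random.Random(seed)
    class_dirs = sorted(p.name for p in Path(train_root).iterdir() if p.is_dir())
    classes = rng.sample(class_dirs, num_classes)
    images = [p for c in classes for p in (Path(train_root) / c).glob("*.JPEG")]
    images = rng.sample(images, num_samples)
    rng.shuffle(images)
    return images[num_val:], images[:num_val]    # (training subset, validation hold-out)
```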

image classification datasets: 3
The new representations, thus generated, are used to fine-tune the parameters of the subsequent layers. Superior results on three image classification datasets and one (non-image) activity classification dataset demonstrate the effectiveness of IHDA in improving the generalization performance of deep networks. In this paper, we proposed a new data augmentation technique to improve the generalization of any deep network, making our work general enough to be applied to a large variety of supervised learning problems

Reference
  • Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages 1097–1105, 2012.
  • Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • G. Huang, Z. Liu, G. Pleiss, L. Van Der Maaten, and K. Weinberger. Convolutional networks with dense connectivity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
  • Tomohiko Konno and Michiaki Iwazume. Icing on the cake: An easy and quick post-learnig method you can try after deep learning. ArXiv, abs/1807.06540, 2018.
  • Terrance Devries and Graham W. Taylor. Dataset augmentation in feature space. In ICLR, 2017.
  • S. C. Wong, A. Gatt, V. Stamatescu, and M. D. McDonnell. Understanding data augmentation for classification: When to warp? In DICTA, pages 1–6, 2016.
  • Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2015.
  • Shuangtao Li, Yuanke Chen, Yanlin Peng, and Lin Bai. Learning more robust features with adversarial training. ArXiv, abs/1804.07757, 2018.
  • Christopher Bowles, Liang Chen, Ricardo Guerrero, Paul Bentley, Roger N. Gunn, Alexander Hammers, David Alexander Dickie, Maria del C. Valdés Hernández, Joanna M. Wardlaw, and Daniel Rueckert. Gan augmentation: Augmenting training data using generative adversarial networks. ArXiv, abs/1810.10863, 2018.
  • Luis Perez and Jason Wang. The effectiveness of data augmentation in image classification using deep learning. ArXiv, abs/1712.04621, 2017.
  • Seongkyu Mun, Sangwook Park, David K Han, and Hanseok Ko. Generative adversarial network based acoustic scene training set augmentation and selection using SVM hyper-plane. In DCASE, pages 93–102, 2017.
  • Xinyue Zhu, Yifan Liu, Zengchang Qin, and Jiahong Li. Data augmentation in emotion classification using generative adversarial networks. ArXiv, abs/1711.00648, 2017.
  • Joseph Lemley, Shabab Bazrafkan, and Peter Corcoran. Smart augmentation learning an optimal data augmentation strategy. IEEE Access, 5:5858–5869, 2017.
  • Daniel Ho, Eric Liang, Xi Chen, Ion Stoica, and Pieter Abbeel. Population based augmentation: Efficient learning of augmentation policy schedules. In ICML, pages 2731–2741, 2019.
  • Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V. Le. Autoaugment: Learning augmentation policies from data. CoRR, abs/1805.09501, 2018.
  • Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V. Le. Randaugment: Practical automated data augmentation with a reduced search space. ArXiv, abs/1909.13719, 2019.
  • Sungbin Lim, Ildoo Kim, Taesup Kim, Chiheon Kim, and Sungwoong Kim. Fast autoaugment. In NIPS, pages 6665–6675, 2019.
  • Yulin Wang, Xuran Pan, Shiji Song, Hong Zhang, Cheng Wu, and Gao Huang. Implicit semantic data augmentation for deep networks. In NIPS, 2019.
  • Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8): 1798–1828, 2013.
  • S. Barua, M. M. Islam, X. Yao, and K. Murase. Mwmote–majority weighted minority oversampling technique for imbalanced data set learning. IEEE Transactions on Knowledge and Data Engineering, 26(2):405–425, 2014.
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(56):1929–1958, 2014.
  • Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, pages 448–456, 2015.
  • Karl R. Weiss, Taghi M. Khoshgoftaar, and Dingding Wang. A survey of transfer learning. Journal of Big Data, 3(9):1–40, 2016.
  • L. Shao, F. Zhu, and X. Li. Transfer learning for visual categorization: A survey. IEEE Transactions on Neural Networks and Learning Systems, 26(5):1019–1034, 2015.
  • Dumitru Erhan, Aaron Courville, Yoshua Bengio, and Pascal Vincent. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 9:625–660, 2010.
  • Mark Palatucci, Dean Pomerleau, Geoffrey Hinton, and Tom M. Mitchell. Zero-shot learning with semantic output codes. In NIPS, pages 1410–1418, 2009.
  • Xinyu Zhang, Qiang Wang, Jian Zhang, and Zhao Zhong. Adversarial autoaugment. In ICLR, 2020.
  • Boyi Li, Felix Wu, Ser-Nam Lim, Serge Belongie, and Kilian Q. Weinberger. On feature normalization and data augmentation. ArXiv, 2020.
  • Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. In ICLR, 2014.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009.
  • Lin Wang, Hristijan Gjoreski, Mathias Ciliberto, Sami Mekki, Stefan Valentin, and Daniel Roggen. Benchmarking the SHL recognition challenge with classical and deep-learning pipelines. In HASCA, pages 1626–1635, 2018.
Author
Adil Khan
Khadija Fraz