Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness

NeurIPS 2020


Abstract

Adversarial data augmentation has shown promise for training robust deep neural networks against unforeseen data shifts or corruptions. However, it is difficult to define heuristics to generate effective fictitious target distributions containing "hard" adversarial perturbations that are largely different from the source distribution. I…
Introduction
  • Deep neural networks can achieve good performance on the condition that the training and testing data are drawn from the same distribution.
  • This condition might not hold true in practice.
  • Information Bottleneck Principle.
  • The Information Bottleneck (IB) [61] is a principled way to seek a latent representation Z that captures the information an input variable X contains about an output Y.
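For reference, the IB objective referred to as "(1)" in the Methods section is commonly written as a trade-off between compressing X and preserving information about Y; our transcription of the standard Lagrangian form, with β controlling the trade-off, is:

```latex
\min_{p(z \mid x)} \; I(X;Z) \;-\; \beta \, I(Y;Z)
```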
Highlights
  • Deep neural networks can achieve good performance on the condition that the training and testing data are drawn from the same distribution
  • We develop an efficient maximum-entropy regularizer to achieve the same goal by making the following contributions: (i) to the best of our knowledge, ours is the first work to investigate adversarial data augmentation from an information-theoretic perspective, addressing the previously unstudied problem of generating "hard" adversarial perturbations from the Information Bottleneck (IB) principle; (ii) we theoretically show that the IB principle can be bounded by a maximum-entropy regularization term in the maximization phase of adversarial data augmentation, which yields a notable improvement over [68]; (iii) we show that our formulation holds in an approximate sense under certain non-deterministic conditions
  • After incorporating Bayesian Neural Networks (BNNs), performance improves further. We believe this is because the BNN provides a better estimate of the predictive uncertainty in the maximization phase
  • We introduced a maximum-entropy technique that regularizes adversarial data augmentation
  • Experimental results on three standard benchmarks demonstrate that our method consistently outperforms the existing state of the art by a statistically significant margin
  • It encourages the model to learn with fictitious target distributions by producing “hard” adversarial perturbations that enlarge predictive uncertainty of the current model
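To make the notion of predictive uncertainty above concrete: the quantity the "hard" perturbations enlarge is the Shannon entropy of the model's predictive distribution. A minimal numpy sketch (the helper name is ours, not from the paper):

```python
import numpy as np

def predictive_entropy(p):
    """Shannon entropy H(p) = -sum_k p_k log p_k of a predictive
    distribution; larger values mean a more uncertain prediction."""
    p = np.asarray(p, dtype=float)
    return float(-(p * np.log(p + 1e-12)).sum())

confident = predictive_entropy([0.97, 0.01, 0.01, 0.01])  # near 0
uncertain = predictive_entropy([0.25, 0.25, 0.25, 0.25])  # log(4), the maximum
```

A perturbation that pushes the softmax output toward the uniform distribution maximizes this entropy, which is the sense in which it is "hard" for the current model.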
Methods
  • The authors' main idea is to incorporate the IB principle into adversarial data augmentation so as to improve model robustness to large domain shifts.
  • The authors start by adapting the IB Lagrangian (1) to supervised-learning scenarios so that the latent representation Z can be leveraged for classification purposes
  • To this end, the authors modify the IB Lagrangian (1) following [1, 2, 5] to LIB(θ; X, Y ) := LCE(θ; X, Y ) + βI(X; Z), where the constraint on I(Y ; Z) is replaced with the risk associated to the prediction according to the loss function LCE.
  • The network parameters are updated by the loss function LIB evaluated on the adversarial examples generated from the maximization phase
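The two-phase procedure described above can be sketched on a toy problem. The snippet below uses a fixed linear softmax classifier and finite-difference gradient ascent; the objective combines the classification loss, a predictive-entropy term standing in for the maximum-entropy regularizer, and a distance penalty keeping the fictitious sample near its source. All names, the model, and the hyperparameters are illustrative, not the paper's implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def max_phase_objective(x, x0, W, y, lam=1.0, gamma=1.0):
    """Sketch of a maximization-phase objective: cross-entropy on the
    true label + lam * predictive entropy - gamma * distance to source."""
    p = softmax(W @ x)
    ce = -np.log(p[y] + 1e-12)
    ent = -(p * np.log(p + 1e-12)).sum()
    return ce + lam * ent - gamma * np.sum((x - x0) ** 2)

def generate_adversarial(x0, W, y, steps=30, lr=0.05, eps=1e-5):
    """Gradient ascent on the objective via central finite differences."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        g = np.zeros_like(x)
        for i in range(x.size):
            d = np.zeros_like(x)
            d[i] = eps
            g[i] = (max_phase_objective(x + d, x0, W, y)
                    - max_phase_objective(x - d, x0, W, y)) / (2 * eps)
        x = x + lr * g
    return x

W = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]])  # toy 3-class linear model
x0 = np.array([1.0, 0.0])                             # source sample, label 0
x_adv = generate_adversarial(x0, W, 0)
```

In the minimization phase, the network parameters would then be updated on the generated examples with the loss LIB described above.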
Results
  • The authors' method achieves the best performance and improves on the previous state of the art by a large margin (5% accuracy on CIFAR-10-C and 4% on CIFAR-100-C)
  • These gains are achieved across different architectures and on both datasets.
  • From the Fourier perspective [74], the performance gains from the adversarial perturbations lie primarily in the high-frequency domain, where common image corruptions tend to concentrate
  • These results demonstrate that the maximum-entropy term can regularize networks to be more robust to common image corruptions
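The Fourier-domain claim above can be illustrated by measuring how much of a perturbation's spectral energy lies outside a low-frequency window. A small numpy sketch (the cutoff and helper name are our choices, not from [74]):

```python
import numpy as np

def high_freq_ratio(img, cutoff=4):
    """Fraction of an image's spectral energy outside a centered
    (2*cutoff x 2*cutoff) low-frequency window of its 2-D DFT."""
    F = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(F) ** 2
    cy, cx = img.shape[0] // 2, img.shape[1] // 2
    low = power[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff].sum()
    return float(1.0 - low / power.sum())

smooth = np.outer(np.hanning(32), np.hanning(32))               # low-frequency image
checker = (np.indices((32, 32)).sum(axis=0) % 2).astype(float)  # Nyquist-rate pattern
```

A high-frequency pattern such as the checkerboard scores far higher than the smooth window, mirroring the observation that the learned perturbations act mainly in high-frequency bands.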
Conclusion
  • The authors introduced a maximum-entropy technique that regularizes adversarial data augmentation.
  • It encourages the model to learn with fictitious target distributions by producing “hard” adversarial perturbations that enlarge predictive uncertainty of the current model.
  • The authors demonstrate that the technique obtains state-of-the-art performance on MNIST, PACS, and CIFAR-10/100-C, and is extremely simple to implement.
  • One major limitation of the method is that it cannot be directly applied to regression problems since the maximum-entropy lower bound is still difficult to compute in this case.
  • The authors' future work might consider alternative measurements of information [49, 63] that are more suited for general machine learning applications
Summary
  • Objectives:

    Motivated by this conceptual observation, the authors aim to regularize adversarial data augmentation through maximizing the IB function.
Tables
  • Table1: Average classification accuracy (%) and standard deviation of models trained on MNIST [40] and evaluated on SVHN [48], MNIST-M [22], SYN [22] and USPS [15]. The results are averaged over ten runs. Best performances are highlighted in bold. The results of PAR are obtained from [73]
  • Table2: Classification accuracy (%) of our approach on the PACS dataset [41] in comparison with the previously reported state-of-the-art results. Bold numbers indicate the best performance (two sets, one for each scenario engaging or forgoing domain identifications, respectively)
  • Table3: Average classification accuracy (%). Across several architectures, our approach obtains CIFAR-10-C and CIFAR-100-C corruption robustness that exceeds the previous state of the art by a large margin. Best performances are highlighted in bold
  • Table4: The settings of different target domains on PACS
  • Table5: The settings of different network architectures on CIFAR-10-C and CIFAR-100-C
Funding
  • Experimental results on three standard benchmarks demonstrate that our method consistently outperforms the existing state of the art by a statistically significant margin
  • We note that our method achieves the best performance among techniques forgoing domain identifications
  • Our method achieves the best performance and improves on the previous state of the art by a large margin (5% accuracy on CIFAR-10-C and 4% on CIFAR-100-C)
Study subjects and analysis
datasets: 4
Other digit datasets, including SVHN [48], MNIST-M [22], SYN [22] and USPS [15], are leveraged for evaluating model performance. These four datasets contain large domain shifts from MNIST in terms of backgrounds, shapes and textures. PACS [41] is a recent dataset with different object style depictions and a more challenging domain shift than the MNIST experiment

samples: 10000
We follow the setup of [68] in experimenting with MNIST dataset. We use 10,000 samples from MNIST for training and evaluate prediction accuracy on the respective test sets of SVHN, MNIST-M, SYN and USPS. In order to work with comparable datasets, we resize all the images to 32 × 32, and treat images from MNIST and USPS as RGB images
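The resizing and channel handling described above might look as follows in numpy (a sketch; the helper name and nearest-neighbor interpolation are our choices, and a real pipeline would typically use a library resize such as PIL or torchvision):

```python
import numpy as np

def to_rgb32(img):
    """Resize a 28x28 grayscale digit to 32x32 (nearest neighbor) and
    replicate it across three channels so it can be treated as RGB."""
    h, w = img.shape
    rows = np.arange(32) * h // 32   # nearest-neighbor source row indices
    cols = np.arange(32) * w // 32   # nearest-neighbor source column indices
    resized = img[np.ix_(rows, cols)]
    return np.stack([resized] * 3, axis=-1)

out = to_rgb32(np.zeros((28, 28)))   # shape (32, 32, 3)
```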

Reference
  • Alessandro Achille and Stefano Soatto. Emergence of invariance and disentanglement in deep representations. Journal of Machine Learning Research, 19(1):1947–1980, 2018.
  • Alessandro Achille and Stefano Soatto. Information dropout: Learning optimal representations through noisy computation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 40(12):2897–2905, 2018.
  • Alexander A. Alemi, Ian Fischer, and Joshua V. Dillon. Uncertainty in the variational information bottleneck. In Proceedings of the Conference on Uncertainty in Artificial Intelligence Workshops, 2018.
  • Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, and Kevin Murphy. Deep variational information bottleneck. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
  • Rana Ali Amjad and Bernhard Claus Geiger. Learning representations for neural network-based classification using the information bottleneck principle. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019.
  • András Antos and Ioannis Kontoyiannis. Convergence properties of functional estimates for discrete distributions. Random Structures & Algorithms, 19(3-4):163–193, 2001.
  • Yogesh Balaji, Swami Sankaranarayanan, and Rama Chellappa. MetaReg: Towards domain generalization using meta-regularization. In Advances in Neural Information Processing Systems (NeurIPS), pages 998–1008, 2018.
  • Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, and Devon Hjelm. Mutual information neural estimation. In Proceedings of the International Conference on Machine Learning (ICML), pages 531–540, 2018.
  • Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural network. In Proceedings of the International Conference on Machine Learning (ICML), pages 1613–1622, 2015.
  • Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, and Dumitru Erhan. Domain separation networks. In Advances in Neural Information Processing Systems (NeurIPS), pages 343–351, 2016.
  • Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004.
  • Hao Cheng, Dongze Lian, Shenghua Gao, and Yanlin Geng. Utilizing information bottleneck to evaluate the capability of deep neural networks for image classification. Entropy, 21(5):456, 2019.
  • Thomas M. Cover and Joy A. Thomas. Elements of information theory. John Wiley & Sons, 2012.
  • Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V. Le. AutoAugment: Learning augmentation strategies from data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 113–123, 2019.
  • John S. Denker, W. R. Gardner, Hans Peter Graf, Donnie Henderson, Richard E. Howard, W. Hubbard, Lawrence D. Jackel, Henry S. Baird, and Isabelle Guyon. Neural network recognizer for hand-written zip code digits. In Advances in Neural Information Processing Systems (NeurIPS), pages 323–331, 1989.
  • Terrance DeVries and Graham W. Taylor. Improved regularization of convolutional neural networks with Cutout. arXiv preprint arXiv:1708.04552, 2017.
  • Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, and Marcus Rohrbach. Uncertainty-guided continual learning with bayesian neural networks. In Proceedings of the International Conference on Learning Representations (ICLR), 2020.
  • Adar Elad, Doron Haviv, Yochai Blau, and Tomer Michaeli. Direct validation of the information bottleneck principle for deep nets. In Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019.
  • Carlos Florensa, Yan Duan, and Pieter Abbeel. Stochastic neural networks for hierarchical reinforcement learning. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
  • Yarin Gal and Zoubin Ghahramani. Bayesian convolutional neural networks with bernoulli approximate variational inference. arXiv preprint arXiv:1506.02158, 2015.
  • Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 1050–1059, 2016.
  • Yaroslav Ganin and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. In Proceedings of the International Conference on Machine Learning (ICML), pages 1180–1189, 2015.
  • Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
  • Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
  • Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. In Proceedings of the International Conference on Learning Representations (ICLR), 2019.
  • Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan. AugMix: A simple data processing method to improve robustness and uncertainty. In Proceedings of the International Conference on Learning Representations (ICLR), 2020.
  • Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4700–4708, 2017.
  • Daniel Kang, Yi Sun, Dan Hendrycks, Tom Brown, and Jacob Steinhardt. Testing robustness against unforeseen adversaries. arXiv preprint arXiv:1908.08016, 2019.
  • Alex Kendall and Yarin Gal. What uncertainties do we need in bayesian deep learning for computer vision? In Advances in Neural Information Processing Systems (NeurIPS), pages 5574–5584, 2017.
  • Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. In Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • Artemy Kolchinsky, Brendan D. Tracey, and Steven Van Kuyk. Caveats for information bottleneck in deterministic scenarios. In Proceedings of the International Conference on Learning Representations (ICLR), 2019.
  • Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009.
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NeurIPS), pages 1097–1105, 2012.
  • Anders Krogh and John A. Hertz. A simple weight decay can improve generalization. In Advances in Neural Information Processing Systems (NeurIPS), pages 950–957, 1992.
  • Solomon Kullback and Richard A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22(1):79–86, 1951.
  • Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
  • Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems (NeurIPS), pages 6402–6413, 2017.
  • Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989.
  • Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy M. Hospedales. Deeper, broader and artier domain generalization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 5542–5550, 2017.
  • Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy M. Hospedales. Learning to generalize: Meta-learning for domain generalization. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2018.
  • Da Li, Jianshu Zhang, Yongxin Yang, Cong Liu, Yi-Zhe Song, and Timothy M. Hospedales. Episodic training for domain generalization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 1446–1455, 2019.
  • Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  • Massimiliano Mancini, Samuel Rota Bulò, Barbara Caputo, and Elisa Ricci. Best sources forward: domain generalization through source-specific nets. In Proceedings of the IEEE International Conference on Image Processing (ICIP), pages 1353–1357, 2018.
  • Colin McDiarmid. On the method of bounded differences, pages 148–188. London Mathematical Society Lecture Note Series. Cambridge University Press, 1989.
  • Radford M. Neal. Bayesian learning for neural networks, volume 118. Springer Science & Business Media, 2012.
  • Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NeurIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
  • Sherjil Ozair, Corey Lynch, Yoshua Bengio, Aaron Van den Oord, Sergey Levine, and Pierre Sermanet. Wasserstein dependency measure for representation learning. In Advances in Neural Information Processing Systems (NeurIPS), pages 15578–15588, 2019.
  • Liam Paninski. Estimation of entropy and mutual information. Neural Computation, 15(6):1191–1253, 2003.
  • Fengchun Qiao, Long Zhao, and Xi Peng. Learning to learn single domain generalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 12556–12565, 2020.
  • Tim Salimans and Durk P. Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In Advances in Neural Information Processing Systems (NeurIPS), pages 901–909, 2016.
  • Ohad Shamir, Sivan Sabato, and Naftali Tishby. Learning and generalization with the information bottleneck. Theoretical Computer Science, 411(29-30):2696–2711, 2010.
  • Aman Sinha, Hongseok Namkoong, and John Duchi. Certifying some distributional robustness with principled adversarial training. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
  • Jasper Snoek, Yaniv Ovadia, Emily Fertig, Balaji Lakshminarayanan, Sebastian Nowozin, D. Sculley, Joshua Dillon, Jie Ren, and Zachary Nado. Can you trust your model's uncertainty? evaluating predictive uncertainty under dataset shift. In Advances in Neural Information Processing Systems (NeurIPS), pages 13969–13980, 2019.
  • Jiaming Song and Stefano Ermon. Understanding the limitations of variational mutual information estimators. In Proceedings of the International Conference on Learning Representations (ICLR), 2020.
  • Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. In Proceedings of the International Conference on Learning Representations Workshops, 2014.
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
  • DJ Strouse and David J. Schwab. The deterministic information bottleneck. Neural Computation, 29(6):1611–1630, 2017.
  • Charlie Tang and Russ R. Salakhutdinov. Learning stochastic feedforward neural networks. In Advances in Neural Information Processing Systems (NeurIPS), pages 530–538, 2013.
  • Naftali Tishby, Fernando C. Pereira, and William Bialek. The information bottleneck method. In Proceedings of the Annual Allerton Conference on Communication, Control, and Computing, pages 368–377, 1999.
  • Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. In Proceedings of the IEEE Information Theory Workshop (ITW), pages 1–5, 2015.
  • Michael Tschannen, Josip Djolonga, Paul K. Rubenstein, Sylvain Gelly, and Mario Lucic. On mutual information maximization for representation learning. In Proceedings of the International Conference on Learning Representations (ICLR), 2020.
  • Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7167–7176, 2017.
  • Gregory Valiant and Paul Valiant. Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In Proceedings of the Annual ACM Symposium on Theory of Computing (STOC), pages 685–694, 2011.
  • Vladimir N. Vapnik. Statistical Learning Theory. Wiley, 1998.
  • Matias Vera, Pablo Piantanida, and Leonardo Rey Vega. The role of the information bottleneck in representation learning. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), pages 1580–1584, 2018.
  • Riccardo Volpi, Hongseok Namkoong, Ozan Sener, John Duchi, Vittorio Murino, and Silvio Savarese. Generalizing to unseen domains via adversarial data augmentation. In Advances in Neural Information Processing Systems (NeurIPS), pages 5339–5349, 2018.
  • Haohan Wang, Songwei Ge, Zachary Lipton, and Eric P. Xing. Learning robust global representations by penalizing local predictive power. In Advances in Neural Information Processing Systems (NeurIPS), pages 10506–10518, 2019.
  • Haohan Wang, Zexue He, Zachary C. Lipton, and Eric P. Xing. Learning robust representations by projecting superficial statistics out. In Proceedings of the International Conference on Learning Representations (ICLR), 2019.
  • Yihong Wu and Pengkun Yang. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Transactions on Information Theory, 62(6):3702–3720, 2016.
  • Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1492–1500, 2017.
  • Zhenlin Xu, Deyi Liu, Junlin Yang, and Marc Niethammer. Robust and generalizable visual representation learning via random convolutions. arXiv preprint arXiv:2007.13003, 2020.
  • Dong Yin, Raphael Gontijo Lopes, Jon Shlens, Ekin Dogus Cubuk, and Justin Gilmer. A fourier perspective on model robustness in computer vision. In Advances in Neural Information Processing Systems (NeurIPS), pages 13255–13265, 2019.
  • Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. CutMix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 6023–6032, 2019.
  • Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In Proceedings of the British Machine Vision Conference (BMVC), 2016.
  • Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
  • Long Zhao, Xi Peng, Yuxiao Chen, Mubbasir Kapadia, and Dimitris N. Metaxas. Knowledge as priors: Cross-modal knowledge generalization for datasets without superior knowledge. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6528–6537, 2020.
  • Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, and Dimitris N. Metaxas. Semantic graph convolutional networks for 3D human pose regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3425–3435, 2019.