Viewmaker Networks: Learning Views for Unsupervised Representation Learning

Alex Tamkin

International Conference on Learning Representations (ICLR), 2020.

TL;DR: We present a new generative model that produces views for contrastive learning, matching or outperforming hand-crafted views on image, speech, and wearable sensor datasets.

Abstract:

Many recent methods for unsupervised representation learning involve training models to be invariant to different "views," or transformed versions of an input. However, designing these views requires considerable human expertise and experimentation, hindering widespread adoption of unsupervised representation learning methods across domains and modalities.
Introduction
  • Unsupervised representation learning has made significant recent strides, including in computer vision, where view-based methods have enabled strong performance on benchmark tasks (Wu et al, 2018; Oord et al, 2018; Bachman et al, 2019; Zhuang et al, 2019; Misra & Maaten, 2020; He et al, 2020; Chen et al, 2020a).
  • In contrastive learning of visual representations, models are trained to maximize the mutual information between different views of an image, where views are produced by transformations such as cropping, blurring, added noise, and changes to color and contrast (Bachman et al, 2019; Chen et al, 2020a); a minimal sketch of such a contrastive objective follows this list.
  • Much work has investigated the space of possible image views and their effects on transfer learning (Chen et al, 2020a; Wu et al, 2020; Tian et al, 2019; Purushwalkam & Gupta, 2020).
  • Previous studies have investigated the properties of good views through the lens of mutual information (Tian et al, 2020; Wu et al, 2020), but a broadly applicable approach for learning views remains unstudied.
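To make the contrastive objective referenced above concrete, here is a minimal sketch of a SimCLR-style NT-Xent loss in PyTorch. The temperature, batch size, and embedding dimension are illustrative placeholders, not settings taken from this paper.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss for a batch of paired views.

    z1, z2: [N, D] embeddings of two "views" (augmented versions) of the same N inputs.
    """
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2N, D] unit-norm embeddings
    sim = z @ z.t() / temperature                        # [2N, 2N] scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                    # exclude each embedding's self-similarity
    # The positive for row i is its other view: i <-> i + N across the two halves of the batch.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Toy usage: random embeddings standing in for encoder outputs on two views per image.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent_loss(z1, z2).item())
```

Training the encoder to minimize this loss pushes embeddings of two views of the same input together while pushing apart embeddings of different inputs, which is the sense in which mutual information between views is (approximately) maximized.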
Highlights
  • Unsupervised representation learning has made significant recent strides, including in computer vision, where view-based methods have enabled strong performance on benchmark tasks (Wu et al, 2018; Oord et al, 2018; Bachman et al, 2019; Zhuang et al, 2019; Misra & Maaten, 2020; He et al, 2020; Chen et al, 2020a).
  • In contrastive learning of visual representations, models are trained to maximize the mutual information between different views of an image, where views are produced by transformations such as cropping, blurring, added noise, and changes to color and contrast (Bachman et al, 2019; Chen et al, 2020a).
  • On time-series data from wearable sensors, our model significantly outperforms baseline views on the task of human activity recognition
  • We introduce a method for learning views for contrastive learning, demonstrating its effectiveness across image, speech, and wearable sensor modalities
  • It is interesting to consider what happens as the viewmaker networks increase in size: do we see performance gains or robustness-accuracy trade-offs (Raghunathan et al, 2019)? Our work is a step towards reducing the amount of expertise, time, and compute needed to make unsupervised learning work for a much wider variety of domains.
Methods
Results
  • Table 1 shows the results, indicating comparable overall performance with SimCLR and InstDisc, all without use of human-crafted view functions.
  • This is noteworthy as the views cannot implement cropping-and-rescaling, which was shown to be the most important view function in Chen et al (2020a).
  • An interesting possibility is that the worse performance of viewmaker views may result from the model being able to identify and ablate spurious correlations in the spectrograms.
Conclusion
  • The authors introduce a method for learning views for contrastive learning, demonstrating its effectiveness across image, speech, and wearable sensor modalities.
  • The authors' novel generative model—viewmaker networks—enables them to efficiently learn views as part of the representation learning process, as opposed to relying on domain-specific knowledge or many costly pretraining runs.
  • Viewmaker networks may find use in supervised learning, through the lens of data augmentation and robustness.
  • It is interesting to consider what happens as the viewmaker networks increase in size: do the authors see performance gains or robustness-accuracy trade-offs (Raghunathan et al, 2019)? The work is a step towards reducing the amount of expertise, time, and compute needed to make unsupervised learning work for a much wider variety of domains.
Summary
  • Introduction:

    Unsupervised representation learning has made significant recent strides, including in computer vision, where view-based methods have enabled strong performance on benchmark tasks (Wu et al, 2018; Oord et al, 2018; Bachman et al, 2019; Zhuang et al, 2019; Misra & Maaten, 2020; He et al, 2020; Chen et al, 2020a).
  • In contrastive learning of visual representations, models are trained to maximize the mutual information between different views of an image, where views are produced by transformations such as cropping, blurring, added noise, and changes to color and contrast (Bachman et al, 2019; Chen et al, 2020a).
  • Much work has investigated the space of possible image views and their effects on transfer learning (Chen et al, 2020a; Wu et al, 2020; Tian et al, 2019; Purushwalkam & Gupta, 2020).
  • Previous studies have investigated the properties of good views through the lens of mutual information (Tian et al, 2020; Wu et al, 2020), but a broadly applicable approach for learning views remains unstudied.
  • Methods:

    The authors' work is related to and inspired by work on adversarial methods, including the ℓp balls studied in adversarial robustness (Szegedy et al, 2013; Madry et al, 2017; Raghunathan et al, 2018) and training networks with adversarial objectives (Goodfellow et al, 2014; Xiao et al, 2018); a simplified sketch of this adversarial view-learning setup appears at the end of this summary.
  • Learning views: Outside of adversarial approaches, the work is related to other studies that seek to learn data augmentation strategies by composing existing human-designed augmentations (Ratner et al, 2017; Cubuk et al, 2018; Zhang et al, 2019; Ho et al, 2019; Lim et al, 2019; Cubuk et al, 2020) or by modeling variations specific to the data distribution (Tran et al, 2017; Wong & Kolter, 2020).
  • The authors' work requires no human-defined view functions, does not require pretraining a generative model, and can generate perturbations beyond the variation observed in the training data.
  • Results:

    Table 1 shows the results, indicating comparable overall performance with SimCLR and InstDisc, all without use of human-crafted view functions.
  • This is noteworthy as the views cannot implement cropping-and-rescaling, which was shown to be the most important view function in Chen et al (2020a).
  • An interesting possibility is that the worse performance of viewmaker views may result from the model being able to identify and ablate spurious correlations in the spectrograms.
  • Conclusion:

    The authors introduce a method for learning views for contrastive learning, demonstrating its effectiveness across image, speech, and wearable sensor modalities.
  • The authors' novel generative model—viewmaker networks—enables them to efficiently learn views as part of the representation learning process, as opposed to relying on domain-specific knowledge or many costly pretraining runs.
  • Viewmaker networks may find use in supervised learning, through the lens of data augmentation and robustness.
  • It is interesting to consider what happens as the viewmaker networks increase in size: do the authors see performance gains or robustness-accuracy trade-offs (Raghunathan et al, 2019)? The work is a step towards reducing the amount of expertise, time, and compute needed to make unsupervised learning work for a much wider variety of domains.
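As referenced in the Methods summary above, the viewmaker is trained adversarially against the contrastive objective. Below is a heavily simplified sketch of one plausible training step: a small viewmaker network produces an input-dependent perturbation, the perturbation is rescaled to an ℓ1 budget controlled by ε, the encoder minimizes a contrastive loss on the resulting views, and the viewmaker maximizes it via gradient reversal. The architectures, norm choice, budget scaling, and optimization details here are assumptions for illustration, not taken verbatim from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def nt_xent(z1, z2, t=0.5):
    # Compact SimCLR-style contrastive loss (see the earlier sketch for a commented version).
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / t
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool, device=z.device), float('-inf'))
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

class TinyViewmaker(nn.Module):
    """Toy stand-in for a viewmaker: maps an input plus a noise channel to a perturbation."""
    def __init__(self, channels=3, eps=0.05):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, x):
        noise = torch.rand_like(x[:, :1])                 # extra noise channel -> stochastic views
        delta = self.net(torch.cat([x, noise], dim=1)).flatten(1)
        # Rescale the perturbation to an L1 budget proportional to eps and the input size
        # (one plausible distortion constraint; the paper's exact constraint may differ).
        budget = self.eps * x[0].numel()
        delta = budget * delta / (delta.abs().sum(dim=1, keepdim=True) + 1e-8)
        return torch.clamp(x + delta.view_as(x), 0.0, 1.0)

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))   # toy encoder
viewmaker = TinyViewmaker()
opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-3)
opt_vm = torch.optim.Adam(viewmaker.parameters(), lr=1e-3)

x = torch.rand(8, 3, 32, 32)                                         # toy batch of images in [0, 1]
views1, views2 = viewmaker(x), viewmaker(x)                          # two stochastic learned views
loss = nt_xent(encoder(views1), encoder(views2))

opt_enc.zero_grad(); opt_vm.zero_grad()
loss.backward()
for p in viewmaker.parameters():                                     # encoder minimizes the loss,
    if p.grad is not None:                                           # viewmaker maximizes it: flip
        p.grad.neg_()                                                 # its gradients before stepping
opt_enc.step(); opt_vm.step()
```

The gradient flip is one simple way to realize the min-max objective with a single backward pass; alternating or separate updates would be an equally valid design choice.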
Tables
  • Table1: Our learned views enable comparable transfer performance to human-engineered views on CIFAR-10. Suite of transfer tasks using pretrained representations from CIFAR-10 for both the SimCLR and InstDisc pretraining setups. Numbers are percent accuracy with the exception of CelebA which is F1. FaMNIST stands for FashionMNIST
  • Table2: Our learned views significantly outperform existing views for speech transfer tasks. Linear evaluation accuracy for SimCLR models trained on LibriSpeech. Left: ResNet18 + LibriSpeech 100 hour; Right: ResNet50 + LibriSpeech 960 hour. "Time" refers to view functions applied in the time domain (Kharitonov et al, 2020), while "Spec." refers to view functions applied directly to the spectrogram (Park et al, 2019). 0.05 and 0.1 denote viewmaker distortion bounds ε.
  • Table3: Our learned views significantly outperform existing views for activity recognition on wearable sensor data. Our method learns superior representations across a large range of distortion budgets ε, although budgets that are too strong prevent learning. Linear evaluation accuracy for ResNet18 models trained on Pamap2 with SimCLR. "Spectral" refers to view functions applied directly to the spectrogram (Park et al, 2019).
  • Table4: Our method enables superior results in a semi-supervised setting where labels for data from only one participant are available. Validation accuracy for activity recognition on Pamap2. Supervised Learning refers to training a randomly initialized model on the labeled data until convergence. Pretrain & Transfer refers to training a linear classifier on top of the best pretrained model above (a minimal sketch of this linear-evaluation protocol follows this list). 1 or 7 Participants refers to the number of participants comprising the training set.
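The tables above report linear evaluation accuracy. For readers unfamiliar with the protocol, here is a minimal sketch: freeze the pretrained encoder and train only a linear classifier on its features. The toy encoder, feature dimension, and optimizer settings are placeholders rather than the paper's actual configuration.

```python
import torch
import torch.nn as nn

def linear_evaluation(encoder, train_loader, num_classes, feat_dim, epochs=10):
    """Train a linear classifier on frozen features (standard linear-evaluation protocol)."""
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad_(False)                      # freeze the pretrained encoder

    clf = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in train_loader:
            with torch.no_grad():
                feats = encoder(x)                   # features from the frozen encoder
            loss = loss_fn(clf(feats), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return clf

# Toy usage with placeholder data and a placeholder encoder.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
data = [(torch.rand(16, 3, 32, 32), torch.randint(0, 10, (16,))) for _ in range(4)]
clf = linear_evaluation(encoder, data, num_classes=10, feat_dim=128, epochs=1)
```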
Related work
  • Unsupervised representation learning: Learning useful representations from unlabeled data is a fundamental problem in machine learning (Pan & Yang, 2009; Bengio et al, 2013). A recently successful framework for unsupervised representation learning of images involves training a model to be invariant to various data transformations (Bachman et al, 2019; Misra & Maaten, 2020), although the idea has much earlier roots (Becker & Hinton, 1992; Hadsell et al, 2006; Dosovitskiy et al, 2014). This idea has been expanded by a number of contrastive learning approaches, which push embeddings of different views, or transformed inputs, closer together while pushing other pairs apart (Tian et al, 2019; He et al, 2020; Chen et al, 2020a;b;c). Related but more limited setups have been explored for speech, where data augmentation strategies are less well developed (Oord et al, 2018; Kharitonov et al, 2020).

    Understanding and designing views: Several works have studied the role of views in contrastive learning, including from a mutual-information perspective (Wu et al, 2020), in relation to specific transfer tasks (Tian et al, 2019), with respect to different kinds of invariances (Purushwalkam & Gupta, 2020), or via careful empirical studies (Chen et al, 2020a). Outside of a contrastive learning framework, Gontijo-Lopes et al (2020) study how data augmentation aids generalization in vision models. Much work has explored different handcrafted data augmentation methods for supervised learning of images (Hendrycks et al, 2020; Lopes et al, 2019; Perez & Wang, 2017; Yun et al, 2019; Zhang et al, 2017), speech (Park et al, 2019; Kovacs et al, 2017; Toth et al, 2018; Kharitonov et al, 2020), or in feature space (DeVries & Taylor, 2017).
Funding
  • Our views significantly outperform baseline augmentations in the speech (+9% absolute) and wearable sensor (+17% absolute) domains
  • On speech data, our model significantly outperforms existing human-defined views on a range of speech recognition transfer tasks
  • On time-series data from wearable sensors, our model significantly outperforms baseline views on the task of human activity recognition (e.g. cycling, running, jumping rope)
  • We also compare against a variant of these views with spectrogram noise removed, which we find improves performance of this baseline
  • Our views significantly outperform spectral masking by 12.8% absolute when using the same ε = 0.05 as image and speech, and by 16.7% absolute when using a larger ε = 0.5 (Table 3)
Study subjects and analysis
Participants: 7
Pretrain (Ours) & Transfer, evaluated with labels from 1 Participant vs. 7 Participants (cf. Table 3)

speech classification datasets: 3
5.2 SPEECH CLASSIFICATION RESULTS. We evaluate on three speech classification datasets: Fluent Speech Commands (Lugosch et al, 2019), Google Speech Commands (Warden, 2018), and spoken digit classification (Becker et al, 2018).

participants: 9
6.1 SELF-SUPERVISED LEARNING SETUP. We consider the Pamap2 dataset (Reiss & Stricker, 2012), a dataset of 12 different activities performed by 9 participants. Each activity contains 52 different time series, including heart rate data, as well as accelerometer, gyroscope, and magnetometer data collected from sensors on the ankle, hand, and chest (all sampled at 100Hz, except heart rate which is sampled at approximately 9Hz)
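Since the baseline "Spectral" views (and the reported setup) operate on spectrograms, here is a small sketch of one plausible way to turn a multichannel Pamap2 window into per-channel log-magnitude spectrograms with an STFT. The window length, hop size, and the assumption that all channels (including the ~9Hz heart rate) have been resampled to a common rate are illustrative choices, not the paper's preprocessing.

```python
import numpy as np
from scipy.signal import stft

def sensor_spectrograms(window, fs=100, nperseg=64, noverlap=32):
    """Convert a [channels, time] sensor window into per-channel log-magnitude spectrograms.

    window: np.ndarray of shape [C, T], all channels assumed resampled to a common rate fs.
    Returns: np.ndarray of shape [C, freq_bins, time_frames].
    """
    specs = []
    for channel in window:
        _, _, zxx = stft(channel, fs=fs, nperseg=nperseg, noverlap=noverlap)
        specs.append(np.log1p(np.abs(zxx)))          # log-magnitude compresses the dynamic range
    return np.stack(specs)

# Toy usage: a 10-second window of 52 channels sampled at 100Hz.
window = np.random.randn(52, 1000)
print(sensor_spectrograms(window).shape)             # (52, freq_bins, time_frames)
```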

participants: 7
Here, we show that our method can enable strong performance when labels for only a single participant (Participant 1) out of seven are available. We compare simple supervised learning on Participant 1’s labels against linear evaluation of our best pretrained model, which was trained on unlabeled data from all 7 participants. The model architectures and training procedures are otherwise identical to the previous section

participants: 7
The model architectures and training procedures are otherwise identical to the previous section. As Figure 4 shows, pretraining with our method on unlabeled data enables significant gains over pure supervised learning when data is scarce, and even slightly outperforms the hand-crafted views trained on all 7 participants (Cf. Table 3)

Reference
  • Nasir Ahmed, T Natarajan, and Kamisetty R Rao. Discrete cosine transform. IEEE transactions on Computers, 100(1):90–93, 1974.
  • Antreas Antoniou, Amos Storkey, and Harrison Edwards. Data augmentation generative adversarial networks, 2017.
  • Philip Bachman, R Devon Hjelm, and William Buchwalter. Learning representations by maximizing mutual information across views. In Advances in Neural Information Processing Systems, pp. 15535–15545, 2019.
  • Suzanna Becker and Geoffrey E Hinton. Self-organizing neural network that discovers surfaces in random-dot stereograms. Nature, 355(6356):161–163, 1992.
  • Soren Becker, Marcel Ackermann, Sebastian Lapuschkin, Klaus-Robert Muller, and Wojciech Samek. Interpreting and explaining deep neural networks for classification of audio signals, 2018.
  • Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence, 35(8):1798–1828, 2013.
  • Lukas Biewald. Experiment tracking with weights and biases, 2020. URL https://www.wandb.com/. Software available from wandb.com.
  • Avishek Joey Bose, Huan Ling, and Yanshuai Cao. Adversarial contrastive estimation. arXiv preprint arXiv:1805.03642, 2018.
  • Christopher Bowles, Liang Chen, Ricardo Guerrero, Paul Bentley, Roger Gunn, Alexander Hammers, David Alexander Dickie, Maria Valdes Hernandez, Joanna Wardlaw, and Daniel Rueckert. Gan augmentation: Augmenting training data using generative adversarial networks, 2018.
  • Olivier Chapelle, Jason Weston, Leon Bottou, and Vladimir Vapnik. Vicinal risk minimization. In Advances in neural information processing systems, pp. 416–422, 2001.
  • Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709, 2020a.
  • Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, and Geoffrey Hinton. Big selfsupervised models are strong semi-supervised learners. arXiv preprint arXiv:2006.10029, 2020b.
  • Xinlei Chen, Haoqi Fan, Ross Girshick, and Kaiming He. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297, 2020c.
  • Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. Autoaugment: Learning augmentation policies from data. arXiv preprint arXiv:1805.09501, 2018.
  • Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 702–703, 2020.
  • Terrance DeVries and Graham W. Taylor. Dataset augmentation in feature space, 2017.
  • Jeff Donahue and Karen Simonyan. Large scale adversarial representation learning. In H. Wallach, H. Larochelle, A. Beygelzimer, F. dAlche-Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems 32, pp. 10542– 10552. Curran Associates, Inc., 2019. URL http://papers.nips.cc/paper/9240-large-scale-adversarial-representation-learning.pdf.
  • Alexey Dosovitskiy, Jost Tobias Springenberg, Martin Riedmiller, and Thomas Brox. Discriminative unsupervised feature learning with convolutional neural networks. In Advances in neural information processing systems, pp. 766–774, 2014.
  • WA Falcon. Pytorch lightning. GitHub. Note: https://github.com/PyTorchLightning/pytorchlightning, 3, 2019.
  • Raphael Gontijo-Lopes, Sylvia J Smullin, Ekin D Cubuk, and Ethan Dyer. Affinity and diversity: Quantifying mechanisms of data augmentation. arXiv preprint arXiv:2002.08973, 2020.
  • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680, 2014.
  • Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), volume 2, pp. 1735–1742. IEEE, 2006.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015.
  • Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738, 2020.
  • Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261, 2019.
  • Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, et al. The many faces of robustness: A critical analysis of out-of-distribution generalization. arXiv preprint arXiv:2006.16241, 2020.
  • Daniel Ho, Eric Liang, Xi Chen, Ion Stoica, and Pieter Abbeel. Population based augmentation: Efficient learning of augmentation policy schedules. In International Conference on Machine Learning, pp. 2731–2741. PMLR, 2019.
  • Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (eds.), Computer Vision – ECCV 2016, pp. 694–711, Cham, 2016. Springer International Publishing. ISBN 978-3319-46475-6.
  • Eugene Kharitonov, Morgane Riviere, Gabriel Synnaeve, Lior Wolf, Pierre-Emmanuel Mazare, Matthijs Douze, and Emmanuel Dupoux. Data augmenting contrastive learning of speech representations in the time domain. arXiv preprint arXiv:2007.00991, 2020.
  • Minseon Kim, Jihoon Tack, and Sung Ju Hwang. Adversarial self-supervised contrastive learning, 2020.
  • Gyorgy Kovacs, Laszlo Toth, Dirk Van Compernolle, and Sriram Ganapathy. Increasing the robustness of cnn acoustic models using autoregressive moving average spectrogram features and channel dropout. Pattern Recognition Letters, 100:44–50, 2017.
  • Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
  • Oscar D Lara and Miguel A Labrador. A survey on human activity recognition using wearable sensors. IEEE communications surveys & tutorials, 15(3):1192–1209, 2012.
  • Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Junaid Qadir, and Bjorn W Schuller. Deep representation learning in speech processing: Challenges, recent advances, and future trends. arXiv preprint arXiv:2001.00378, 2020.
  • Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • Sungbin Lim, Ildoo Kim, Taesup Kim, Chiheon Kim, and Sungwoong Kim. Fast autoaugment. In Advances in Neural Information Processing Systems, pp. 6665–6675, 2019.
  • Raphael Gontijo Lopes, Dong Yin, Ben Poole, Justin Gilmer, and Ekin D Cubuk. Improving robustness without sacrificing accuracy with patch gaussian augmentation. arXiv preprint arXiv:1906.02611, 2019.
  • Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, and Yoshua Bengio. Speech model pre-training for end-to-end spoken language understanding, 2019.
  • Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
  • Matthias Minderer, Olivier Bachem, Neil Houlsby, and Michael Tschannen. Automatic shortcut removal for self-supervised representation learning. arXiv preprint arXiv:2002.08822, 2020.
  • Ishan Misra and Laurens van der Maaten. Self-supervised learning of pretext-invariant representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6707–6717, 2020.
  • Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE transactions on pattern analysis and machine intelligence, 41(8):1979–1993, 2018.
  • Fernando Moya Rueda, Rene Grzeszick, Gernot A Fink, Sascha Feldhorst, and Michael Ten Hompel. Convolutional neural networks for human activity recognition using body-worn sensors. In Informatics, volume 5, pp. 26. Multidisciplinary Digital Publishing Institute, 2018.
  • Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. Voxceleb: a large-scale speaker identification dataset, 2017.
  • Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
  • Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2009.
  • Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: An ASR corpus based on public domain audio books. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210, 2015. doi: 10.1109/ICASSP.2015.7178964.
  • Daniel S Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D Cubuk, and Quoc V Le. Specaugment: A simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779, 2019.
  • Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, highperformance deep learning library. In Advances in neural information processing systems, pp. 8026–8037, 2019.
  • Luis Perez and Jason Wang. The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621, 2017.
  • Senthil Purushwalkam and Abhinav Gupta. Demystifying contrastive self-supervised learning: Invariances, augmentations and dataset biases. arXiv preprint arXiv:2007.13916, 2020.
  • Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344, 2018.
  • Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C Duchi, and Percy Liang. Adversarial training can hurt generalization. arXiv preprint arXiv:1906.06032, 2019.
  • Alexander J Ratner, Henry Ehrenberg, Zeshan Hussain, Jared Dunnmon, and Christopher Re. Learning to compose domain-specific transformations for data augmentation. In Advances in neural information processing systems, pp. 3236–3246, 2017.
  • Attila Reiss and Didier Stricker. Introducing a new benchmarked dataset for activity monitoring. In 2012 16th International Symposium on Wearable Computers, pp. 108–109. IEEE, 2012.
  • Mehdi SM Sajjadi, Giambattista Parascandolo, Arash Mehrjou, and Bernhard Scholkopf. Tempered adversarial networks. arXiv preprint arXiv:1802.04374, 2018.
  • Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
  • Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive multiview coding. arXiv preprint arXiv:1906.05849, 2019.
  • Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, and Phillip Isola. What makes for good views for contrastive learning. arXiv preprint arXiv:2005.10243, 2020.
  • Toan Tran, Trung Pham, Gustavo Carneiro, Lyle Palmer, and Ian Reid. A bayesian data augmentation approach for learning deep models, 2017.
  • Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Utku Evci, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, and Hugo Larochelle. Metadataset: A dataset of datasets for learning to learn from few examples, 2019.
  • Riccardo Volpi, Hongseok Namkoong, Ozan Sener, John Duchi, Vittorio Murino, and Silvio Savarese. Generalizing to unseen domains via adversarial data augmentation, 2018.
  • Pete Warden. Speech commands: A dataset for limited-vocabulary speech recognition, 2018.
  • Eric Wong and J. Zico Kolter. Learning perturbation sets for robust machine learning, 2020.
  • Mike Wu, Chengxu Zhuang, Milan Mosse, Daniel Yamins, and Noah Goodman. On mutual information in contrastive learning for visual representations. arXiv preprint arXiv:2005.13149, 2020.
  • Zhirong Wu, Yuanjun Xiong, Stella Yu, and Dahua Lin. Unsupervised feature learning via nonparametric instance-level discrimination. arXiv preprint arXiv:1805.01978, 2018.
  • Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, and Dawn Song. Generating adversarial examples with adversarial networks, 2018.
  • Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
  • Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE International Conference on Computer Vision, pp. 6023–6032, 2019.
  • Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.
  • Xinyu Zhang, Qiang Wang, Jian Zhang, and Zhao Zhong. Adversarial autoaugment. arXiv preprint arXiv:1912.11188, 2019.
  • Chengxu Zhuang, Alex Lin Zhai, and Daniel Yamins. Local aggregation for unsupervised learning of visual embeddings. In Proceedings of the IEEE International Conference on Computer Vision, pp. 6002–6012, 2019.