Systematic generalisation with group invariant predictions

ICLR, 2021

Invariance penalties across splits of a biased dataset can improve systematic generalisation.
Abstract

We consider situations where the presence of dominant simpler correlations with the target variable in a training set can cause an SGD-trained neural network to be less reliant on more persistently-correlating complex features. When the non-persistent, simpler correlations correspond to non-semantic background factors, a neural network tr…

Introduction
  • If a training set is biased such that an easier-to-learn feature correlates with the target variable throughout the training set, a modern neural network trained with SGD will use that factor to perform predictions, ignoring co-occurring harder-to-learn complex predictive features (Shah et al, 2020).
  • The authors consider the situation where such a simpler correlation is a dominant bias in the training set, while a minority group exists within the dataset in which the bias does not manifest.
  • In such cases, relying on more complex predictive features which more pervasively explain the data can be preferable to simpler ones that only explain most of it.
  • If some chairs are not red, and all chairs have backs and legs, one can infer that redness is less relevant.
Highlights
  • If a training set is biased such that an easier-to-learn feature correlates with the target variable throughout the training set, a modern neural network trained with SGD will use that factor to perform predictions, ignoring co-occurring harder-to-learn complex predictive features (Shah et al, 2020)
  • We consider the situation where such a simpler correlation is a dominant bias in the training set, while a minority group exists within the dataset in which the bias does not manifest
  • We investigate the role of encouraging robust predictive behaviour across such groups in terms of improved performance at tasks with systematic distributional shift
  • Our experiments investigate the potential usefulness of invariance penalties and related methods for improving performance under systematic distributional shift, in tasks such as systematic generalisation and semantic anomaly detection
  • We find that our proposed invariance penalty appears to hold the most promise for handling systematic distributional shifts, even though it does not always perform best across different validation schemes
Methods
  • The authors compare recent methods that aim at robust predictions across groups and that do not require changes to network capacity or additional adversaries to impose invariance penalties.
  • IRMv1, REx, GroupDRO: IRMv1 (Arjovsky et al, 2019) and REx (Krueger et al, 2020) are two methods that augment the standard ERM term with invariance penalties across data from different sources.
  • GroupDRO (Sagawa et al, 2020) is an algorithm for distributional robustness, which works by weighting groups of data as a function of their relative losses (both styles of objective are sketched below).
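As a rough illustration of how such group-wise objectives combine with the ERM term, the following sketch (not the authors' code; the function names and the beta and eta values are illustrative placeholders) shows a REx-style variance penalty over per-group risks and a GroupDRO-style exponential re-weighting of group losses in PyTorch:

    import torch
    import torch.nn.functional as F

    def group_losses(model, groups):
        # groups: list of (inputs, targets) pairs, one per inferred group
        return torch.stack([F.cross_entropy(model(x), y) for x, y in groups])

    def rex_objective(model, groups, beta=10.0):
        # REx-style objective: mean risk plus the variance of the per-group
        # risks, which penalises risks that differ across groups.
        losses = group_losses(model, groups)
        return losses.mean() + beta * losses.var()

    def group_dro_weights(losses, q, eta=0.01):
        # GroupDRO-style re-weighting: exponentially up-weight groups with a
        # higher current loss, then renormalise; the weighted loss is (q * losses).sum().
        q = q * torch.exp(eta * losses.detach())
        return q / q.sum()

IRMv1's penalty is omitted for brevity; it adds, for each group, the squared norm of the gradient of that group's risk with respect to a scalar dummy classifier fixed at 1.0.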
Results
  • The authors have used the partition predictor to infer the two groups.
  • The partition prediction accuracies for the three datasets at the end of one epoch of training the base models are reported in the paper.
  • The authors tested a more naïve approach by applying K-Means clustering to the losses, but found it to under-perform, since it cannot account for a consistent feature bias learned by the reference model, and can group memorised examples that are “unbiased” along with the biased ones (a sketch of this baseline follows below).
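For concreteness, here is a minimal sketch of that naive baseline (not the authors' implementation; the reference model and data loader are assumed to be given, and the helper name is hypothetical), clustering per-example losses of an ERM-trained reference model into two groups:

    import torch
    import torch.nn.functional as F
    from sklearn.cluster import KMeans

    @torch.no_grad()
    def infer_partition_by_loss(reference_model, loader, device="cpu"):
        # Collect per-example cross-entropy losses under the reference model.
        per_example = []
        for x, y in loader:
            logits = reference_model(x.to(device))
            per_example.append(F.cross_entropy(logits, y.to(device), reduction="none").cpu())
        losses = torch.cat(per_example).numpy().reshape(-1, 1)
        # Cluster into two groups: low-loss (bias-aligned) vs high-loss (putative minority).
        return KMeans(n_clusters=2, n_init=10).fit_predict(losses)

As the results note, such clustering conflates memorised but unbiased examples with biased ones, which is why the learned partition predictor is preferred.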
Conclusion
  • The authors' experiments investigate the potential usefulness of invariance penalties and related methods for improving performance under systematic distributional shift, in tasks such as systematic generalisation and semantic anomaly detection.

    While these exploratory experiments are conducted in carefully disambiguated synthetic setups, next steps would involve investigating the potential for extending the insights to real datasets used in the field.
  • A relevant line of inquiry would be the question of how to make trade-offs between in-distribution and unexpected situations.
Tables
  • Table1: For a coloured MNIST dataset with every digit correlated with a colour 80% of the time, we see poor performance at systematically varying tasks. Performance improves if the minority group combines colours from other biased digits - this provides corrective gradients that promote invariance to colour. Non-systematic shifts are when unseen colours are used, and anomaly detection is measured by decreased predictive confidence for an unseen digit
  • Table2: Generalisation results on COLOURED MNIST
  • Table3: Generalisation performance on COCO-ON-COLOURS
  • Table4: Generalisation performance on COCO-ON-PLACES
  • Table5: Hyper-parameters with different validation sets for COLOURED MNIST
  • Table6: Hyper-parameters with different validation sets for COCO-ON-COLOURS
  • Table7: Hyper-parameters with different validation sets for COCO-ON-PLACES
  • Table8: RGB codes used to bias the digits in the majority group
  • Table9: Background scenes for the in-distribution majority group, minority group, and the non-systematically shifted validation and test sets. (The mapping to categories only applies to the majority group in the training set.)
  • Table10: Picking hyper-parameters only using a validation set of non-systematic shifts for COLOURED MNIST
  • Table11: Picking hyper-parameters using both a validation set of non-systematic shifts and the in-distribution set for COLOURED MNIST
  • Table12: Picking hyper-parameters using only the in-distribution set for COLOURED MNIST
  • Table13: Picking hyper-parameters only using a validation set of non-systematic shifts for COCO-ON-COLOURS
  • Table14: Picking hyper-parameters using both a validation set of non-systematic shifts and the in-distribution set for COCO-ON-COLOURS
  • Table15: Picking hyper-parameters using only the in-distribution set for COCO-ON-COLOURS
  • Table16: Picking hyper-parameters only using a validation set of non-systematic shifts for COCO-ON-PLACES
  • Table17: Picking hyper-parameters using both a validation set of non-systematic shifts and the in-distribution set for COCO-ON-PLACES
  • Table18: Picking hyper-parameters using only the in-distribution set for COCO-ON-PLACES
Related Work
  • The dominant perspective towards the issue of unreliable behaviour in novel domains has been to treat the problem as one of domain generalisation (Blanchard et al, 2011). One hopes to recover stable features by encouraging invariance across data sampled from different domains, so that performance in test-time out-of-distribution (OoD) scenarios is less likely to be unstable. Approaches along such lines typically resemble a cross-domain distribution-matching penalty applied to the features being learned, augmenting the usual ERM term (Ganin et al, 2016; Sun & Saenko, 2016; Heinze-Deml & Meinshausen, 2017; Li et al, 2018; Li et al, 2018a;b), and are evaluated on datasets that consist of data in different modalities (Li et al, 2017; Peng et al, 2019; Venkateswara et al, 2017), collected through different means (Fang et al, 2013), or captured in different contexts (Beery et al, 2018).

    Works taking the perspective of distributionally robust optimisation (DRO) have generally considered uncertainty sets around training data (Ben-Tal et al, 2013; Duchi & Namkoong, 2018) to minimise worst-case losses, which can often have a regularising effect by effectively up-weighting harder examples. More relevant to our discussion, group DRO methods have considered uncertainty sets in terms of different groups of data, for example with different cross-group distributions of labels (Hu et al, 2018), or groups collected differently (Oren et al, 2019), similarly to domain generalisation datasets. More recently, methods promoting the learning of stable features across data from different environments, or sources, have been proposed by using gradient penalties (Arjovsky et al, 2019), matching losses (Krueger et al, 2020), and masking gradients with opposing signs (Parascandolo et al, 2020).

    The typical datasets in such existing works are not curated with testing performance under systematic distributional shift in mind, and most often do not characterise the specific shift in distribution. A commonly adopted synthetic dataset in recent work is the coloured MNIST variant used in Arjovsky et al (2019); because the flipped colours it uses for the minority group pose less of a problem for ERM training, the true digit labels had to be flipped at a sufficiently high frequency to incapacitate ERM performance by forcing reliance on colour. We believe setups such as ours can be better synthetic testbeds for developing ideas, where it is not necessary to alter ground-truth labels to expose a failure mode. In general, using better models of dataset bias implies a narrower disconnect with realistic settings, with higher chances of the conclusions carrying over.
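To make the distribution-matching idea concrete, the sketch below shows a simple (biased-estimate) RBF-kernel MMD penalty between feature batches from two groups, of the kind that such methods add to the ERM term; the kernel bandwidth, the use of a single fixed kernel, and the function names are illustrative simplifications, not the specific penalty of any cited work:

    import torch

    def rbf_kernel(a, b, sigma=1.0):
        # Gaussian kernel matrix between the rows of a and b.
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))

    def mmd_penalty(feats_a, feats_b, sigma=1.0):
        # Biased estimate of the squared MMD between the two feature batches;
        # it shrinks towards zero when the batches come from the same distribution.
        k_aa = rbf_kernel(feats_a, feats_a, sigma).mean()
        k_bb = rbf_kernel(feats_b, feats_b, sigma).mean()
        k_ab = rbf_kernel(feats_a, feats_b, sigma).mean()
        return k_aa + k_bb - 2.0 * k_ab

In practice such a penalty would be weighted and added to the average cross-entropy loss; the exact form (per-class, per-layer, or adversarial matching) varies across the cited methods.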
Dataset Construction
  • For the training set, Tr, MNIST digits are coloured with a set of digit-correlated “biasing” colours 80% of the time, and with ten random colours that are different from the biasing colours the remaining 20% of the time
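A rough sketch of this colouring procedure (assuming greyscale MNIST images scaled to [0, 1]; the actual digit-correlated RGB codes are those listed in Table 8, and the helper below is a hypothetical illustration, not the authors' data-generation code):

    import numpy as np

    def colour_digit(image, label, bias_colours, other_colours, p_bias=0.8, rng=None):
        # image: (28, 28) greyscale array in [0, 1]; returns a (28, 28, 3) RGB image.
        rng = rng or np.random.default_rng()
        if rng.random() < p_bias:
            colour = bias_colours[label]                 # digit-correlated "biasing" colour
        else:
            # minority group: one of ten colours disjoint from the biasing colours
            colour = other_colours[rng.integers(len(other_colours))]
        return image[..., None] * np.asarray(colour, dtype=float)  # tint the digit strokes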
Study Subjects and Analysis
synthetic datasets: 3
We perform an empirical study showing that group invariance methods across inferred partitionings of the training set can lead to significant improvements at such test-time situations. We suggest a new invariance penalty, showing with experiments on three synthetic datasets that it can perform better than alternatives. We find that even without assuming access to any systematic-shift validation sets, one can still find improvements over an ERM-trained reference model

synthetic datasets: 3
Evaluating performance in an unambiguous manner for the specific kinds of generalisation that we aim to study necessitates controlled test-beds. In order to model these tasks, we use 3 synthetic datasets of progressively higher complexity, approaching photo-realism. COLOURED MNIST: This is the simplest setting, where the background information exists as part of the object

datasets: 3
In all cases, we have used the partition predictor to infer the two groups. The partition prediction accuracies for the three datasets at the end of one epoch of training the base models are in the table below. We tested a more naïve approach by applying K-Means clustering to the losses, but found it to under-perform, since it cannot account for a consistent feature bias learned by our reference model, and can group memorised examples that are “unbiased” along with the biased ones

cases: 3
Here, we will simply show that picking hyper-parameters without assuming access to validation sets consisting of systematic distributional shift can still provide improvements over the baseline reference model. We consider three cases. (The methods compared are cIRMv1, cREx, cGroupDRO, and cMMD.)
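A minimal sketch of such a selection rule (a hypothetical helper, not the authors' protocol; it simply averages accuracy over whichever validation sets are assumed available, whether in-distribution, non-systematically shifted, or both, and picks the best-scoring hyper-parameter setting):

    def select_hyperparameters(runs, val_sets):
        # runs: {hyperparameter_setting: trained_model}
        # val_sets: list of (name, evaluate_fn) pairs, where evaluate_fn(model) -> accuracy
        def mean_accuracy(model):
            return sum(evaluate(model) for _, evaluate in val_sets) / len(val_sets)
        return max(runs, key=lambda setting: mean_accuracy(runs[setting]))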

References
  • Faruk Ahmed and Aaron Courville. Detecting semantic anomalies. In AAAI, 2020.
  • Michael Alcorn, Qi Li, Zhitao Gong, Chengfei Wang, Long Mai, Wei-Shinn Ku, and Anh Nguyen. Strike (with) a pose: Neural networks are easily fooled by strange poses of familiar objects. In CVPR, 2019.
  • Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization. CoRR, 2019.
  • Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. CoRR, 2016.
  • Dzmitry Bahdanau, Shikhar Murty, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, and Aaron Courville. Systematic generalization: What is required and can it be learned? In ICLR, 2019.
  • Sara Beery, Grant Van Horn, and Pietro Perona. Recognition in terra incognita. CoRR, 2018.
  • Aharon Ben-Tal, Dick den Hertog, Anja De Waegenaere, Bertrand Melenberg, and Gijs Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2), 2013.
  • Gilles Blanchard, Gyemin Lee, and Clayton Scott. Generalizing from several related classification tasks to a new unlabeled sample. In Advances in Neural Information Processing Systems 24, pp. 2178–2186, 2011.
  • Wieland Brendel and Matthias Bethge. Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet. In ICLR, 2019.
  • Mateusz Buda, Atsuto Maki, and Maciej A. Mazurowski. A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks, 106:249–259, 2018.
  • Fabio Maria Carlucci, Antonio D’Innocente, Silvia Bucci, Barbara Caputo, and Tatiana Tommasi. Domain generalization by solving jigsaw puzzles. In CVPR, 2019.
  • Elliot Creager, Jörn-Henrik Jacobsen, and Richard Zemel. Environment inference for invariant learning. ICML Workshop on Uncertainty and Robustness, 2020.
  • Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. In CVPR, pp. 9268–9277, 2019.
  • Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, and Aaron C. Courville. Modulating early visual processing by language. In NIPS, 2017.
  • John Duchi and Hongseok Namkoong. Learning models with uniform performance via distributionally robust optimization. arXiv preprint arXiv:1810.08750, 2018.
  • Chen Fang, Ye Xu, and Daniel N. Rockmore. Unbiased metric learning: On the utilization of multiple datasets and web images for softening bias. In ICCV, pp. 1657–1664, 2013.
  • Jerry A. Fodor and Zenon W. Pylyshyn. Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1):3–71, 1988.
  • Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(1):2096–2030, 2016.
  • Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In ICLR, 2019.
  • Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13, 2012.
  • Christina Heinze-Deml and Nicolai Meinshausen. Conditional variance penalties and domain shift robustness. CoRR, 2017.
  • Daniel Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In ICLR, 2017.
  • Weihua Hu, Gang Niu, Issei Sato, and Masashi Sugiyama. Does distributionally robust supervised learning give robust classifiers? In ICML, pp. 2029–2037, 2018.
  • Gary King and Langche Zeng. Logistic regression in rare events data. Political Analysis, 9(2):137–163, 2001.
  • Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
  • David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Remi Le Priol, and Aaron Courville. Out-of-distribution generalization via risk extrapolation (REx). CoRR, 2020.
  • Brenden M. Lake and Marco Baroni. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In ICML, 2018.
  • Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy M. Hospedales. Deeper, broader and artier domain generalization. In ICCV, pp. 5542–5550, 2017.
  • H. Li, S. J. Pan, S. Wang, and A. C. Kot. Domain generalization with adversarial feature learning. pp. 5400–5409, 2018.
  • Y. Li, X. Tian, M. Gong, Y. Liu, T. Liu, K. Zhang, and D. Tao. Deep domain generalization via conditional invariant adversarial networks. In ECCV, 2018a.
  • Ya Li, Mingming Gong, Xinmei Tian, Tongliang Liu, and Dacheng Tao. Domain generalization via conditional invariant representations. 2018b.
  • Yuanzhi Li, Colin Wei, and Tengyu Ma. Towards explaining the regularization effect of initial large learning rate in training neural networks. In Advances in Neural Information Processing Systems 32, pp. 11674–11685, 2019.
  • Tsung-Yi Lin, M. Maire, Serge J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. arXiv, abs/1405.0312, 2014.
  • Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. CoRR, 2018.
  • Yonatan Oren, Shiori Sagawa, Tatsunori Hashimoto, and Percy Liang. Distributionally robust language modeling. In EMNLP-IJCNLP, pp. 4218–4228, 2019.
  • Giambattista Parascandolo, Alexander Neitz, Antonio Orvieto, Luigi Gresele, and Bernhard Schölkopf. Learning explanations that are hard to vary. arXiv preprint arXiv:2009.00329, 2020.
  • Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, and Bo Wang. Moment matching for multi-source domain adaptation. In ICCV, pp. 1406–1415, 2019.
  • Shiori Sagawa, Pang Wei Koh, Tatsunori Hashimoto, and Percy Liang. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. In ICLR, 2020.
  • Harshay Shah, Kaustav Tamuly, Aditi Raghunathan, Prateek Jain, and Praneeth Netrapalli. The pitfalls of simplicity bias in neural networks. CoRR, 2020.
  • Baochen Sun and Kate Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. pp. 443–450, 2016.
  • Hemanth Venkateswara, Jose Eusebio, Shayok Chakraborty, and Sethuraman Panchanathan. Deep hashing network for unsupervised domain adaptation. In CVPR, pp. 5018–5027, 2017.
  • Yuichi Yoshida and Takeru Miyato. Spectral norm regularization for improving the generalizability of deep learning. CoRR, 2017.
  • Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In BMVC, 2016.
  • Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.