
No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems

NeurIPS 2020


Abstract

In real-world classification tasks, each class often comprises multiple finer-grained "subclasses." As the subclass labels are frequently unavailable, models trained using only the coarser-grained class labels often exhibit highly variable performance across different subclasses. This phenomenon, known as hidden stratification, has important consequences…

Introduction
  • In many real-world classification tasks, each labeled class consists of multiple semantically distinct subclasses that are unlabeled.
  • Recent empirical evidence [36] encouragingly suggests that feature representations of deep neural networks often carry information about unlabeled subclasses.
  • Motivated by this observation, the authors propose a method for addressing hidden stratification by both measuring and improving worst-case subclass performance in the setting where subclass labels are unavailable.
Highlights
  • In many real-world classification tasks, each labeled class consists of multiple semantically distinct subclasses that are unlabeled.
  • We show that leveraging recent pretrained image embeddings [27] for clustering can substantially further improve the robust performance of GEORGE, in some cases to match the performance of group DRO (GDRO) trained using the true subclass labels.
  • A major obstacle to applying GDRO methods in practice is that subgroup labels are often unavailable; in our work, we aim to address this issue in the classification setting.
  • We show that in this setting, unlike empirical risk minimization (ERM), GEORGE converges to the optimal robust risk at the same sample complexity rate as GDRO when it is able to recover the true latent features.
  • In Appendix C, we show that GEORGE outperforms other subclass-agnostic baselines, such as GDRO trained using the superclasses as groups.
  • Our clustering approach significantly improves performance over “vanilla” clustering; we hope that it may be of independent interest as well.
  • We propose GEORGE, a two-step approach for measuring and mitigating hidden stratification without requiring access to subclass labels (a minimal sketch of the clustering step follows this list).
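The clustering step referenced above can be illustrated with a short sketch. This is a minimal, hypothetical implementation of the idea (per-superclass clustering of an ERM model's features, with dimensionality reduction and the cluster count chosen by silhouette score), not the authors' released code; the `features` and `superclass_labels` arrays and all hyperparameters are assumptions.

```python
# Sketch of a GEORGE-style Step 1: cluster each superclass separately in the
# feature space of a trained ERM model.
import numpy as np
import umap  # pip install umap-learn
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_superclasses(features, superclass_labels, max_k=10, umap_dim=2, seed=0):
    """features: (N, D) penultimate-layer activations of a trained ERM model.
    superclass_labels: (N,) coarse class labels. Returns (N,) cluster ids."""
    cluster_ids = np.full(len(features), -1)
    offset = 0  # keep cluster ids globally unique across superclasses
    for c in np.unique(superclass_labels):
        idx = np.where(superclass_labels == c)[0]
        # Reduce dimensionality before clustering (UMAP is one option).
        reduced = umap.UMAP(n_components=umap_dim, random_state=seed).fit_transform(features[idx])
        best = (-1.0, None, 0)  # (silhouette score, assignment, k)
        for k in range(2, max_k + 1):
            assign = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(reduced)
            score = silhouette_score(reduced, assign)
            if score > best[0]:
                best = (score, assign, k)
        cluster_ids[idx] = best[1] + offset
        offset += best[2]
    return cluster_ids
```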
Methods
  • (Extracted table fragments only: robust, overall, and cluster-robust performance numbers for Waterbirds, U-MNIST, and other datasets, with and without subclass labels; see the table captions in the Tables section below.)
Results
  • Step 1 of GEORGE is to train an ERM model and cluster the data of each superclass in its feature space.
  • The authors analyze these clusters to better understand GEORGE’s behavior.
  • The authors show that GEORGE finds clusters that align well with poorly-performing human-labeled subclasses.
  • The authors show that the worst-case performance measured on the clusters returned by GEORGE is a good approximation of the true robust performance (this cluster-robust measurement is sketched below).
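The cluster-robust measurement described above is simply the worst-case accuracy over the recovered clusters rather than over the (unavailable) true subclasses. A minimal sketch, with hypothetical `y_true`, `y_pred`, and `group_ids` arrays:

```python
# Worst-case ("robust") accuracy over groups. With GEORGE's cluster ids this
# gives the cluster-robust estimate; with true subclass labels (when known),
# it gives the true robust accuracy.
import numpy as np

def worst_group_accuracy(y_true, y_pred, group_ids):
    accs = [np.mean(y_pred[group_ids == g] == y_true[group_ids == g])
            for g in np.unique(group_ids)]
    return float(min(accs))
```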
Conclusion
  • The authors propose GEORGE, a two-step approach for measuring and mitigating hidden stratification without requiring access to subclass labels.
  • GEORGE’s first step, clustering the features of an ERM model, identifies clusters that provide useful approximations of worst-case subclass performance.
  • GEORGE’s second step, using these cluster assignments as groups in GDRO, yields significant improvements in worst-case subclass performance (a simplified group-DRO update is sketched after this list).
  • The authors analyze GEORGE in the context of a simple generative model, and show that under suitable assumptions GEORGE achieves the same asymptotic sample complexity rates as if the authors had access to true subclass labels.
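As a rough illustration of the second step, the sketch below uses the recovered cluster assignments as groups in a simplified group-DRO update in the style of Sagawa et al. (exponentially upweighting the worst-performing groups). The `model`, `optimizer`, batching, and step size `eta_q` are assumptions for illustration; this is not the authors' exact training procedure.

```python
import torch
import torch.nn.functional as F

def gdro_step(model, optimizer, x, y, group_ids, q, eta_q=0.01):
    """One simplified group-DRO update; q is a (num_groups,) tensor of group
    weights (e.g., initialized as torch.ones(num_groups) / num_groups)."""
    losses = F.cross_entropy(model(x), y, reduction="none")
    # Average loss per group (GEORGE's clusters play the role of groups).
    group_losses = torch.stack([
        losses[group_ids == g].mean() if (group_ids == g).any()
        else torch.zeros((), device=losses.device)
        for g in range(len(q))
    ])
    # Exponentially upweight the currently worst-performing groups, then take
    # a weighted gradient step on the resulting robust loss.
    q = q * torch.exp(eta_q * group_losses.detach())
    q = q / q.sum()
    robust_loss = (q * group_losses).sum()
    optimizer.zero_grad()
    robust_loss.backward()
    optimizer.step()
    return q
```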
Objectives
  • The authors' goal is to classify examples from X into their correct superclass.
  • The authors' goal is to learn a model f ∈ F whose robust risk is close to optimal, i.e., whose excess robust risk Rrobust(f) − min_{f′ ∈ F} Rrobust(f′) is small (this objective is written out below).
  • The authors' goal is to minimize the maximum per-subclass loss by solving Eq. (4).
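Written out in standard worst-case notation (assumed here, not copied verbatim from the paper), the robust risk and the corresponding minimax objective referenced above are:

```latex
% Robust (worst-case) risk over subclasses c = 1, ..., C, with loss \ell:
\[
  R_{\mathrm{robust}}(f) \;=\; \max_{c \in \{1,\dots,C\}}
    \mathbb{E}_{(x,y)\sim P_c}\bigl[\ell(f(x), y)\bigr]
\]
% The goal is a model whose excess robust risk is small,
%   R_robust(f_hat) - min_{f in F} R_robust(f),
% and group DRO directly minimizes the worst-case per-group loss:
\[
  \min_{f \in \mathcal{F}} \; \max_{c \in \{1,\dots,C\}}
    \mathbb{E}_{(x,y)\sim P_c}\bigl[\ell(f(x), y)\bigr].
\]
```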
Tables
  • Table1: Robust and overall performance for ERM, GEORGE, and subclass-GDRO (i.e., GDRO with true subclass labels). Performance metric is accuracy for all datasets but ISIC, which uses AUROC. Bolded values are best between ERM and GEORGE, which do not require subclass labels. Sub-columns for ISIC represent two different definitions of the ISIC subclasses; see Section 6.3
  • Table2: Alignment of clusters with poorly-performing subclasses on the train set. We run Step 1 of GEORGE over multiple random seeds (i.e., train multiple ERM models and cluster their activations). In col. 4, we report the percentage of these trials with a cluster above the given precision and recall thresholds (cols. 5, 6) for identifying the subclass in col. 2. We report the proportion of examples from that subclass within its superclass in col. 3
  • Table3: Comparison of overall, cluster-robust, and robust performance. (Conventions as in Table 1.)
Funding
  • Acknowledgments and Disclosure of Funding: We thank Arjun Desai, Pang Wei Koh, Shiori Sagawa, Charles Kuang, Karan Goel, Avner May, Esther Rolf, and Sharon Li for helpful discussions and feedback. We gratefully acknowledge the support of DARPA under Nos. …
Datasets and Analysis
datasets: 4
This underscores the importance of recovering a “good” feature space; empirically, we show in Appendix C that the choice of model architecture can indeed dramatically impact the model feature space and thus the ability to recover subclasses. We empirically validate that GEORGE can mitigate hidden stratification across four datasets. In Section 6.2, we show that when subclass labels are unavailable, GEORGE improves robust performance.

datasets: 4
We analyze GEORGE in the context of a simple generative model, and show that under suitable assumptions GEORGE achieves the same asymptotic sample complexity rates as if we had access to true subclass labels. We empirically validate GEORGE on four datasets, and find evidence that it can reduce hidden stratification on real-world machine learning tasks. We note that if reweighting is not applied to the Waterbirds validation/test sets (see Appendix B.2.2 for explanation), the cluster-robust performance is significantly closer to the true robust performance (within 2 accuracy points, for both ERM and GEORGE); true robust performance for GEORGE also increases to 82.6%, while other methods are relatively unaffected by this reweighting.

References
  • Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019.
  • Yuki M. Asano, Christian Rupprecht, and Andrea Vedaldi. A critical analysis of self-supervision, or what we can learn from a single image. In International Conference on Learning Representations (ICLR), 2020.
  • Hassan Ashtiani, Shai Ben-David, Nick Harvey, Christopher Liaw, Abbas Mehrabian, and Yaniv Plan. Near-optimal sample complexity bounds for robust learning of gaussian mixtures via compression schemes. Advances in Neural Information Processing Systems (NeurIPS), 2018.
  • Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning. fairmlbook.org, 2019. http://www.fairmlbook.org.
  • Piotr Bojanowski, Armand Joulin, David Lopez-Paz, and Arthur Szlam. Optimizing the latent space of generative networks. In International Conference on Machine Learning (ICML), 2018.
  • Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV), pages 132–149, 2018.
  • Krzysztof Chalupka, Pietro Perona, and Frederick Eberhardt. Visual causal feature learning. In Uncertainty in Artificial Intelligence (UAI), 2015.
  • Beidi Chen, Weiyang Liu, Zhiding Yu, Jan Kautz, Anshumali Shrivastava, Anshumali Shrivastava, Animesh Garg, and Anima Anandkumar. Angular visual hardness. In International Conference on Machine Learning (ICML), 2020.
  • Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning (ICML), 2020.
  • Vincent Chen, Sen Wu, Alexander J Ratner, Jen Weng, and Christopher Ré. Slice-based learning: A programming model for residual learning in critical data slices. In Advances in Neural Information Processing Systems, pages 9392–9402, 2019.
  • Sasank Chilamkurthy, Rohit Ghosh, Swetha Tanamala, Mustafa Biviji, Norbert G Campeau, Vasantha Kumar Venugopal, Vidur Mahajan, Pooja Rao, and Prashant Warier. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study. Lancet, 392 (10162):2388–2396, December 2018.
  • Noel Codella, Veronica Rotemberg, Philipp Tschandl, M Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (ISIC). arXiv preprint arXiv:1902.03368, 2019.
  • Luc Devroye, Abbas Mehrabian, and Tommy Reddad. The total variation distance between high-dimensional Gaussians. arXiv preprint arXiv:1810.08693, 2018.
  • Jian Dong, Qiang Chen, Jiashi Feng, Kui Jia, Zhongyang Huang, and Shuicheng Yan. Looking inside category: subcategory-aware object recognition. IEEE Transactions on Circuits and Systems for Video Technology, 25(8):1322–1334, 2014.
  • John C Duchi, Tatsunori Hashimoto, and Hongseok Namkoong. Distributionally robust losses against mixture covariate shifts. Under review, 2019.
  • Jared A Dunnmon, Darvin Yi, Curtis P Langlotz, Christopher Ré, Daniel L Rubin, and Matthew P Lungren. Assessment of convolutional neural networks for automated classification of chest radiographs. Radiology, 290(2):537–544, February 2019.
  • Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Marc Proesmans, and Luc Van Gool. Learning to classify images without labels. arXiv preprint arXiv:2005.12320, 2020.
  • Varun Gulshan, Lily Peng, Marc Coram, Martin C Stumpe, Derek Wu, Arunachalam Narayanaswamy, Subhashini Venugopalan, Kasumi Widner, Tom Madams, Jorge Cuadros, Ramasamy Kim, Rajiv Raman, Philip C Nelson, Jessica L Mega, and Dale R Webster. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316(22):2402–2410, December 2016.
  • Kai Han, Andrea Vedaldi, and Andrew Zisserman. Learning to discover novel visual categories via deep transfer clustering. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 8401–8409, 2019.
  • Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In Advances in neural information processing systems, pages 3315–3323, 2016.
  • Minh Hoai and Andrew Zisserman. Discriminative sub-categorization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1666–1673, 2013.
  • Weihua Hu, Gang Niu, Issei Sato, and Masashi Sugiyama. Does distributionally robust supervised learning give robust classifiers? In International Conference on Machine Learning (ICML), 2018.
  • Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton. Adaptive mixtures of local experts. Neural computation, 3(1):79–87, 1991.
  • Xu Ji, João F Henriques, and Andrea Vedaldi. Invariant information clustering for unsupervised image classification and segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 9865–9874, 2019.
  • Michael Kearns, Aaron Roth, and Saeed Sharifi-Malvajerdi. Average individual fairness: Algorithms, generalization and experiments. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
  • Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, and Neil Houlsby. Big transfer (BiT): General visual representation learning. arXiv preprint arXiv:1912.11370, 2020.
  • Yann LeCun and Corinna Cortes. MNIST handwritten digit database, 2010.
  • Percy Liang and Tengyu Ma. CS229T course notes, 2019. URL http://web.stanford.edu/class/cs229t/.
  • Zachary Lipton, Julian McAuley, and Alexandra Chouldechova. Does mitigating ml’s impact disparity require treatment disparity? In Advances in Neural Information Processing Systems, pages 8125–8135, 2018.
  • Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015.
  • Ryan McConville, Raul Santos-Rodriguez, Robert J Piechocki, and Ian Craddock. N2D: (not too) deep clustering via clustering the local manifold of an autoencoded embedding. arXiv preprint arXiv:1908.05968, 2019.
  • Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.
  • Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2, pages 1003–1011. Association for Computational Linguistics, 2009.
  • Rafael Muller, Simon Kornblith, and Geoffrey Hinton. Subclass distillation. arXiv preprint arXiv:2002.03936, 2020.
  • Luke Oakden-Rayner, Jared Dunnmon, Gustavo Carneiro, and Christopher Ré. Hidden stratification causes clinically meaningful failures in machine learning for medical imaging. In Proceedings of the ACM Conference on Health, Inference, and Learning (CHIL), 2020.
  • Neoklis Polyzotis, Steven Whang, Tim Klas Kraska, and Yeounoh Chung. Slice finder: Automated data slicing for model validation. In Proceedings of the IEEE International Conference on Data Engineering (ICDE), 2019.
  • Alexander Ratner, Stephen H Bach, Henry Ehrenberg, Jason Fries, Sen Wu, and Christopher Ré. Snorkel: Rapid training data creation with weak supervision. The VLDB Journal, pages 1–22, 2019.
  • Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do CIFAR-10 classifiers generalize to CIFAR-10? arXiv preprint arXiv:1806.00451, 2018.
  • Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do ImageNet classifiers generalize to ImageNet? In International Conference on Machine Learning (ICML), 2019.
  • Laura Rieger, Chandan Singh, W. James Murdoch, and Bin Yu. Interpretations are useful: penalizing explanations to align neural networks with prior knowledge. arXiv preprint arXiv:1909.13584, 2019.
  • Peter J. Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53–65, 1987.
  • Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, and Percy Liang. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. In International Conference on Learning Representations (ICLR), 2020.
  • Bernhard Schölkopf. Causality for machine learning. arXiv preprint arXiv:1911.10500, 2019.
  • Alon Scope, Michael A. Marchetti, Ashfaq A. Marghoob, Stephen W. Dusza, Alan C. Geller, Jaya M. Satagopan, Martin A. Weinstock, Marianne Berwick, and Allan C. Halpern. The study of nevi in children: Principles learned and implications for melanoma diagnosis. Journal of the American Academy of Dermatology, 75(4):813 – 823, 2016. ISSN 0190-9622. doi: https://doi.org/10.1016/j.jaad.2016.03.027. URL http://www.sciencedirect.com/science/article/pii/S019096221630010X.
  • Ankita Shukla, Gullal Singh Cheema, and Saket Anand. Semi-supervised clustering with neural networks. arXiv preprint arXiv:1806.01547, 2018.
  • Aman Sinha, Hongseok Namkoong, Riccardo Volpi, and John Duchi. Certifying some distributional robustness with principled adversarial training. In International Conference on Learning Representations (ICLR), 2018.
  • Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In Advances in neural information processing systems, pages 4077–4087, 2017.
  • Matthew Staib and Stefanie Jegelka. Distributionally robust deep learning as a generalization of adversarial training. In NIPS Workshop on Machine Learning and Computer Security, 2017.
  • Haohan Wang, Songwei Ge, Zachary Lipton, and Eric P Xing. Learning robust global representations by penalizing local predictive power. In Advances in Neural Information Processing Systems, pages 10506–10518, 2019.
  • Pengtao Xie, Aarti Singh, and Eric P. Xing. Uncorrelation and evenness: a new diversitypromoting regularizer. In International Conference on Machine Learning (ICML), 2017.
  • Bangpeng Yao, Aditya Khosla, and Li Fei-Fei. Combining randomization and discrimination for fine-grained image categorization. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1577–1584. IEEE, 2011.
Authors
Nimit S Sohoni
Geoffrey Angus