Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias

CVPR, pp. 11067-11075, 2020.

DOI: 10.1109/CVPR42600.2020.01108
Other links: arxiv.org | dblp.uni-trier.de | academic.microsoft.com

Abstract:

Existing models often leverage co-occurrences between objects and their context to improve recognition accuracy. However, relying strongly on context puts a model's generalizability at risk, especially when typical co-occurrence patterns are absent. This work focuses on addressing such contextual biases to improve the robustness of the learnt feature representations. …

Introduction
  • Visual context serves as a valuable auxiliary cue for the human visual system for scene interpretation and object recognition [4].
  • Context becomes especially crucial for the visual system when the visual signal is ambiguous or incomplete.
  • Past research explicitly models context and shows benefits on standard visual tasks such as classification [31] and detection [13, 3].
  • Convolutional networks implicitly capture context by design.
  • As highlighted in [33, 32], despite the best efforts of their creators, most datasets exhibit some form of bias.
Highlights
  • Visual context serves as a valuable auxiliary cue for the human visual system for scene interpretation and object recognition [4]
  • With an aim to teach the network to “learn from the right thing,” we propose a method that minimizes the overlap between the class activation maps (CAMs) of co-occurring categories (Sec. 4.1). Building on the insights from this CAM-based method, we propose a second method that learns feature representations that decorrelate context from category (Sec. 4.2). We apply both methods to two tasks, object and attribute classification, across four datasets, and achieve significant boosts over strong baselines for the hard cases where a category occurs away from its typical context (Sec. 5)
  • Building on the observations from this CAM-based approach, we propose a second method that learns a feature space by encouraging context sharing when a biased category co-occurs with its context, while suppressing context when it occurs in isolation (Sec. 4.2)
  • Though class activation maps are typically used as a visualization technique, in this work we use them to reduce contextual bias, as we describe below (a minimal code sketch of this idea follows this list)
  • We demonstrated the problem of contextual bias in popular object and attribute datasets by showing that standard classifiers perform poorly when biased categories occur away from their typical context
  • We proposed two simple yet effective methods to decorrelate feature representations of a biased category from its context
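A minimal sketch of the CAM-overlap idea from Sec. 4.1, assuming the standard CAM setup [38] (a final conv layer followed by global pooling and a linear classifier). The function names and the overlap penalty (an elementwise product of normalized maps) are illustrative assumptions, not the authors' exact loss.

```python
import torch
import torch.nn.functional as F

def class_activation_map(feature_map, fc_weights, class_idx):
    # Standard CAM [38]: weight the last conv layer's feature maps by the
    # class's final-layer weights and sum over channels.
    # feature_map: (B, C, H, W); fc_weights: (num_classes, C)
    w = fc_weights[class_idx].view(1, -1, 1, 1)
    cam = F.relu((feature_map * w).sum(dim=1))              # (B, H, W)
    # Normalize each map to [0, 1] so the overlap term is scale-invariant.
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)

def cam_overlap_loss(feature_map, fc_weights, biased_idx, context_idx):
    # Penalize spatial overlap between the CAM of a biased category and the
    # CAM of its co-occurring context category.
    cam_b = class_activation_map(feature_map, fc_weights, biased_idx)
    cam_c = class_activation_map(feature_map, fc_weights, context_idx)
    return (cam_b * cam_c).mean()
```

In training, a penalty of this kind would be added to the usual multi-label classification loss for each biased pair (b, c) present in a batch.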
Methods
  • Analysing Wo and Ws: recall that in Sec. 4.2, ours-feature-split is formulated with the goal of prominently capturing biased category-specific features through Wo and context through Ws (a minimal sketch of this split head follows the Methods list).
  • From Table 3, the authors observe that both ours-CAM and ours-feature-split outperform standard by a large margin
  • This clearly demonstrates that both methods learn from the right category and overcome contextual bias.
  • The authors show the co-occurrence bias value for each class, computed according to Eq. 1 in the main paper
  • From these results, one may observe that when a category occurs out of its typical context, ours-feature-split performs better than the standard classifier, while maintaining performance when the category co-occurs with its context.
  • ours-CAM likewise performs better than standard when a category occurs away from its context, but struggles when categories co-occur
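One plausible reading of the feature-split head described above, as a minimal sketch: the pooled feature is split in half, Wo scores the category-specific half, Ws scores the context half, and the context half is suppressed for images where the biased category occurs without its context. Zeroing the context half and the 50/50 split are assumptions for illustration; the paper's Sec. 4.2 recipe may differ.

```python
import torch
import torch.nn as nn

class FeatureSplitHead(nn.Module):
    # Split classifier head: W_o for category-specific features, W_s for context.
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        assert feat_dim % 2 == 0
        self.W_o = nn.Linear(feat_dim // 2, num_classes)  # category half
        self.W_s = nn.Linear(feat_dim // 2, num_classes)  # context half

    def forward(self, feats, exclusive_mask):
        # feats: (B, feat_dim) pooled backbone features.
        # exclusive_mask: (B,) bool, True where the biased category occurs
        # without its usual context.
        f_o, f_s = feats.chunk(2, dim=1)
        # Suppress the context half for exclusive images so the score must
        # come from the category-specific weights W_o.
        f_s = torch.where(exclusive_mask[:, None], torch.zeros_like(f_s), f_s)
        return self.W_o(f_o) + self.W_s(f_s)
```

This construction lets context contribute when it is present (co-occurring images) while forcing the category half to carry the prediction when it is not, which is the decorrelation behavior the bullets above describe.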
Results
  • In Table 2, the authors report performance on COCO-Stuff for the 20 most biased categories.
  • The authors observe that the standard classifier performs much better on the co-occurring test split than on the exclusive one.
  • This clearly demonstrates the inherent contextual bias present in COCO-Stuff: the standard classifier struggles when biased categories occur outside their typical context (the bias ratio used to rank these categories is sketched below)
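For reference, the co-occurrence bias that ranks the "20 most biased categories" (Eq. 1 of the main paper) contrasts the classifier's confidence for a category b with and without its context c. Eq. 1 itself is not reproduced on this page; a plausible form of such a ratio, stated as an assumption, is:

```latex
% Hedged reconstruction (an assumption, not a verbatim copy of Eq. 1):
% average predicted probability of b when it co-occurs with c, divided by
% the average predicted probability of b when it occurs without c.
\[
\mathrm{bias}(b, c) =
\frac{\tfrac{1}{|I_{bc}|}\sum_{i \in I_{bc}} p_i(b)}
     {\tfrac{1}{|I_{b \setminus c}|}\sum_{i \in I_{b \setminus c}} p_i(b)}
\]
% I_{bc}: images containing both b and c; I_{b \setminus c}: images containing
% b but not c; p_i(b): predicted probability of b on image i. A ratio far
% above 1 marks a strongly biased pair.
```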
Conclusion
  • The authors demonstrated the problem of contextual bias in popular object and attribute datasets by showing that standard classifiers perform poorly when biased categories occur away from their typical context
  • To tackle this issue, they proposed two simple yet effective methods to decorrelate the feature representations of a biased category from its context.
  • Both methods perform better at recognizing biased classes occurring away from their co-occurring context while maintaining the overall performance.
  • Extending the proposed methods to tasks like object detection and video action recognition is a worthy future direction
Tables
  • Table 1: Properties of evaluation datasets. For COCO-Stuff, we use object training and validation data from the COCO-2014 split [22]
  • Table 2: Performance on COCO-Stuff for the 20 most biased categories. Both of our methods outperform all baselines except weighted loss and remove co-occur images on the exclusive test split, while successfully maintaining performance on the co-occurring test split
  • Table 3: Cross-dataset experiment in which models trained on COCO-Stuff are applied without fine-tuning to UnRel. ours-feature-split yields a large boost over standard, highlighting its generalizability to unseen data
  • Table 4: Attribute classification performance on DeepFashion and Animals with Attributes, computed on the 20 most biased attributes. ours-feature-split offers boosts over all approaches on the exclusive test split, without hurting performance on the co-occurring split
  • Table 5: Performance on COCO-Stuff for the 20 most biased categories. ours-CAM and ours-feature-split outperform split-biased by a significant margin on both exclusive and co-occurring images
  • Table 6: mAP on the non-biased object classes and on all object + stuff classes. Our approaches lose only negligible mAP compared to the standard classifier in these cases
  • Table 7: Cosine similarity between the classifier weights of the biased class pairs (b, c). Our approach reduces the similarity between them, indicating that the biased class b is less dependent on c for prediction (a minimal sketch of this measurement follows the list)
  • Table 8: Top-3 recall on DeepFashion for the 20 most biased attributes. ours-feature-split yields a significant boost over all approaches on the exclusive test split, without hurting performance on the co-occurring split. ours-CAM is not extensible to attributes and is therefore not reported here. The baseline methods above are described in the main paper
  • Table 9: Performance on Animals with Attributes for the 20 most biased attributes. Our proposed ours-feature-split outperforms the other methods. ours-CAM is not extensible to attributes and is therefore not reported here
  • Table 10: COCO-Stuff dataset. Per-class mAP and bias for the 20 most biased classes. ours-feature-split outperforms standard on the exclusive set while maintaining performance on the co-occurring cases
  • Table 11: DeepFashion dataset. Per-class top-3 recall and bias for the 20 most biased classes. ours-feature-split outperforms standard on the exclusive set while maintaining performance on the co-occurring cases
  • Table 12: Animals with Attributes dataset. Per-class mAP and bias for the 20 most biased classes. ours-feature-split outperforms standard on the exclusive set while maintaining performance on the co-occurring cases
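Table 7's measurement is straightforward to reproduce. A minimal sketch, assuming the final-layer classifier weights form a matrix W whose rows index classes (the names here are illustrative):

```python
import torch
import torch.nn.functional as F

def classifier_weight_similarity(W, b_idx, c_idx):
    # W: (num_classes, feat_dim) final linear-layer weights.
    # Cosine similarity between the weight rows of biased class b and its
    # context class c; a lower value suggests b's classifier depends less
    # on the context direction used by c (cf. Table 7).
    return F.cosine_similarity(W[b_idx], W[c_idx], dim=0).item()
```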
Related work
  • Addressing biases: Prior work [33, 19, 34, 32] has shown that existing datasets suffer from bias and are not perfectly representative of the real world; a model trained on such data will therefore have difficulty generalizing to non-biased cases. Attempts to reduce dataset bias include domain adaptation techniques [9] and data re-sampling [7, 21], e.g., so that minority-class instances are better represented. One limitation of data re-sampling is that it can involve shrinking the dataset, leading to sub-optimal models. Recent adversarial learning approaches [2, 20] try to remove bias from the learned feature representations while optimizing performance on the task at hand (e.g., removing gender bias while classifying age). However, these methods are not directly applicable to contextual bias, since context (the bias factor) can still be useful for recognition and thus cannot simply be removed. Others study various forms of bias in image captioning (e.g., gender bias) [16], image classification (e.g., ethnicity bias) [29], and object recognition (e.g., socio-economic bias) [11]. Overall, contextual bias in visual recognition remains relatively underexplored.
  • Co-occurring bias: Contextual bias is a well-studied problem in natural language processing [25, 30], but it is much less studied in the computer vision community. In vision, most efforts treat context as a useful cue [13, 3]. A few efforts have shown that a recognition model can fail to recognize an object without its co-occurring context, but they do not propose a solution [8, 26].
Funding
  • This work was supported in part by NSF CAREER IIS-1751206
References
  • [1] Amit Alfassy, Leonid Karlinsky, Amit Aides, Joseph Shtok, Sivan Harary, Rogerio Feris, Raja Giryes, and Alex M Bronstein. LaSO: Label-set operations networks for multi-label few-shot learning. In CVPR, 2019.
  • [2] Mohsan Alvi, Andrew Zisserman, and Christoffer Nellaker. Turning a blind eye: Explicit removal of biases and variation from deep neural network embeddings. In ECCV, 2018.
  • [3] Ehud Barnea and Ohad Ben-Shahar. Exploring the bounds of the utility of context for object detection. In CVPR, 2019.
  • [4] Irving Biederman, Robert J. Mezzanotte, and Jan C. Rabinowitz. Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 1982.
  • [5] Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In NIPS, 2016.
  • [6] Holger Caesar, Jasper Uijlings, and Vittorio Ferrari. COCO-Stuff: Thing and stuff classes in context. In CVPR, 2018.
  • [7] Nitesh Chawla, Kevin Bowyer, Lawrence Hall, and Philip Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. JAIR, 2002.
  • [8] Myung Jin Choi, Antonio Torralba, and Alan S Willsky. Context models and out-of-context objects. Pattern Recognition Letters, 2012.
  • [9] Gabriela Csurka. Domain adaptation for visual applications: A comprehensive survey. arXiv preprint arXiv:1702.05374, 2017.
  • [10] Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. In CVPR, 2019.
  • [11] Terrance de Vries, Ishan Misra, Changhan Wang, and Laurens van der Maaten. Does object recognition work for everyone? In CVPRW, 2019.
  • [12] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • [13] Santosh K Divvala, Derek Hoiem, James H Hays, Alexei A Efros, and Martial Hebert. An empirical study of context in object detection. In CVPR, 2009.
  • [14] Charles Elkan. The foundations of cost-sensitive learning. In IJCAI, 2001.
  • [15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [16] Lisa Anne Hendricks, Kaylee Burns, Kate Saenko, Trevor Darrell, and Anna Rohrbach. Women also snowboard: Overcoming bias in captioning models. In ECCV, 2018.
  • [17] Dinesh Jayaraman, Fei Sha, and Kristen Grauman. Decorrelating semantic visual attributes by resisting the urge to share. In CVPR, 2014.
  • [18] Leonid Karlinsky, Joseph Shtok, Sivan Harary, Eli Schwartz, Amit Aides, Rogerio Feris, Raja Giryes, and Alex M Bronstein. RepMet: Representative-based metric learning for classification and few-shot object detection. In CVPR, 2019.
  • [19] Aditya Khosla, Tinghui Zhou, Tomasz Malisiewicz, Alexei A Efros, and Antonio Torralba. Undoing the damage of dataset bias. In ECCV, 2012.
  • [20] Byungju Kim, Hyunwoo Kim, Kyungsu Kim, Sungjin Kim, and Junmo Kim. Learning not to learn: Training deep neural networks with biased data. In CVPR, 2019.
  • [21] Yi Li and Nuno Vasconcelos. REPAIR: Removing representation bias by dataset resampling. In CVPR, 2019.
  • [22] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
  • [23] Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, and Xiaoou Tang. DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In CVPR, 2016.
  • [24] Julia Peyre, Ivan Laptev, Cordelia Schmid, and Josef Sivic. Weakly-supervised learning of visual relations. In ICCV, 2017.
  • [25] Marta Recasens, Cristian Danescu-Niculescu-Mizil, and Dan Jurafsky. Linguistic models for analyzing and detecting biased language. In ACL, 2013.
  • [26] Amir Rosenfeld, Richard Zemel, and John K Tsotsos. The elephant in the room. arXiv preprint arXiv:1808.03305, 2018.
  • [27] Mohammad Amin Sadeghi and Ali Farhadi. Recognition using visual phrases. In CVPR, 2011.
  • [28] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In NIPS, 2017.
  • [29] Pierre Stock and Moustapha Cisse. ConvNets and ImageNet beyond accuracy: Understanding mistakes and uncovering biases. In ECCV, 2018.
  • [30] Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang. Mitigating gender bias in natural language processing: Literature review. arXiv preprint arXiv:1906.08976, 2019.
  • [31] Kevin Tang, Manohar Paluri, Li Fei-Fei, Rob Fergus, and Lubomir Bourdev. Improving image classification with location context. In CVPR, 2015.
  • [32] Tatiana Tommasi, Novi Patricia, Barbara Caputo, and Tinne Tuytelaars. A deeper look at dataset bias. In Domain Adaptation in Computer Vision Applications, 2017.
  • [33] Antonio Torralba and Alexei A Efros. Unbiased look at dataset bias. In CVPR, 2011.
  • [34] Emiel van Miltenburg. Stereotyping and bias in the Flickr30k dataset. arXiv preprint arXiv:1605.06083, 2016.
  • [35] Yang Wang and Minh Hoai. Pulling actions out of context: Explicit separation for effective combination. In CVPR, 2018.
  • [36] Yongqin Xian, Christoph H Lampert, Bernt Schiele, and Zeynep Akata. Zero-shot learning - a comprehensive evaluation of the good, the bad and the ugly. TPAMI, 2018.
  • [37] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. arXiv preprint arXiv:1707.09457, 2017.
  • [38] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In CVPR, 2016.
  • [39] Xizhou Zhu, Han Hu, Stephen Lin, and Jifeng Dai. Deformable ConvNets v2: More deformable, better results. In CVPR, 2019.