Interventional Few-Shot Learning

NeurIPS 2020.

Abstract:

We uncover an ever-overlooked deficiency in the prevailing Few-Shot Learning (FSL) methods: the pre-trained knowledge is indeed a confounder that limits the performance. This finding is rooted in our causal assumption: a Structural Causal Model (SCM) for the causalities among the pre-trained knowledge, sample features, and labels. …
Introduction
  • Few-Shot Learning (FSL) — the task of training a model using very few samples — is nothing short of a panacea for any scenario that requires fast model adaptation to new tasks [69], such as minimizing the need for expensive trials in reinforcement learning [31] and saving computation resource for light-weight neural networks [28, 26].
  • For more than a decade, the crux of FSL has been to imitate the human ability of transferring prior knowledge to new tasks [19]; not until the recent advances in pre-training techniques did the field reach a consensus on “what & how to transfer”: a powerful neural network Ω pre-trained on a large dataset D.
  • In the context of pre-trained knowledge, the authors denote the original FSL training set as the support set S and the test set as the query set Q, where the classes in (S, Q) are unseen in D.
  • The authors' goal is to develop a post-pre-training and pre-fine-tuning strategy that removes the deficiency introduced by pre-training
Highlights
  • Few-Shot Learning (FSL) — the task of training a model using very few samples — is nothing short of a panacea for any scenario that requires fast model adaptation to new tasks [69], such as minimizing the need for expensive trials in reinforcement learning [31] and saving computation resource for light-weight neural networks [28, 26]
  • We further improve the accuracy of various FSL methods, and explain the reason behind the improvements
  • Our work aims to deal with the pre-training confounder in FSL based on causal inference [48]
  • Conventional Acc. 1) From Table 1, we observe that Interventional Few-Shot Learning (IFSL) consistently improves fine-tuning and meta-learning in all settings, which suggests that IFSL is agnostic to methods, datasets, and backbones
  • We presented a novel causal framework, Interventional Few-Shot Learning (IFSL), to address an overlooked deficiency in recent FSL methods: pre-training is a confounder that hurts performance
  • We proposed a structural causal model of the causalities in the process of FSL and developed three practical implementations based on the backdoor adjustment
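The backdoor adjustment mentioned in the last highlight is the standard causal-inference identity P(Y | do(X)) = Σ_d P(Y | X, D = d) P(D = d): instead of conditioning on X alone, one stratifies over the confounder D and averages with its prior. Below is a minimal, hedged sketch of that estimator on synthetic binary data; the variables and probabilities are made up for illustration and are not the paper's actual feature-wise or class-wise adjustments.

```python
# Backdoor adjustment on toy data: P(Y | do(X)) = sum_d P(Y | X, D=d) P(D=d).
# Illustrative only -- D, X, Y are synthetic binary variables, not the paper's
# pre-trained knowledge, sample features, or labels.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

d = rng.binomial(1, 0.5, n)                       # confounder
x = rng.binomial(1, np.where(d == 1, 0.8, 0.2))   # treatment influenced by D
y = rng.binomial(1, 0.2 + 0.3 * x + 0.4 * d)      # outcome caused by X and D

# Naive (confounded) estimate: P(Y=1 | X=1)
naive = y[x == 1].mean()

# Backdoor-adjusted estimate: stratify over D and weight by P(D=d)
adjusted = sum(y[(x == 1) & (d == v)].mean() * (d == v).mean() for v in (0, 1))

print(f"naive    P(Y=1 | X=1)     = {naive:.3f}")     # biased upward, ~0.82
print(f"adjusted P(Y=1 | do(X=1)) = {adjusted:.3f}")  # ~0.70, the true effect
```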
Methods
  • The compared methods are Cosine k-NN, MAML [20], MTL [58], MN [66], and SIB [29] (a toy sketch of a cosine k-NN classifier on pre-trained features follows below).
  • Each method is evaluated with the three proposed adjustments: feature-wise (“ft”), class-wise (“cl”), and combined (“ft+cl”).
  • Backbones are ResNet-10 and WRN-28-10, each reported as a baseline and with IFSL applied.
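For concreteness, the simplest of the compared classifiers can be read as a nearest-neighbor rule under cosine similarity on features from a frozen pre-trained backbone. The sketch below assumes k = 1 and uses random vectors as stand-ins for real backbone features; it is not the authors' released code.

```python
# Minimal sketch of a cosine-similarity k-NN (k=1) classifier on features
# extracted by a frozen pre-trained backbone. The random vectors below are
# placeholders for real backbone outputs (illustrative only).
import numpy as np

def cosine_knn_predict(support_feats, support_labels, query_feats):
    """support_feats: (N*K, d), support_labels: (N*K,), query_feats: (Q, d)."""
    s = support_feats / np.linalg.norm(support_feats, axis=1, keepdims=True)
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    sims = q @ s.T                                 # cosine similarity to support
    return support_labels[sims.argmax(axis=1)]     # label of nearest sample (k=1)

# Toy 5-way 1-shot episode with 640-d placeholder features.
rng = np.random.default_rng(0)
support = rng.normal(size=(5, 640))                # one support sample per class
labels = np.arange(5)
query = np.repeat(support, 3, axis=0) + 0.5 * rng.normal(size=(15, 640))
preds = cosine_knn_predict(support, labels, query)
print((preds == np.repeat(labels, 3)).mean())      # accuracy on the toy episode
```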
Results
  • Results and Analysis

    Conventional Acc. 1) From Table 1, the authors observe that IFSL consistently improves fine-tuning and meta-learning in all settings, which suggests that IFSL is agnostic to methods, datasets, and backbones. 2) In particular, the improvements are typically larger on 1-shot than 5-shot.
  • Table 2 reports 5-way 1-/5-shot Acc (%) on miniImageNet and tieredImageNet against methods such as Baseline++ [13], IdeMe-Net† [14], Baseline [18], wDAE-GNN [21], and SIB† [29], where † denotes using the pre-trained backbone.
  • SIB+IFSL with a WRN-28-10 backbone reaches 73.51% (1-shot) / 83.21% (5-shot) on miniImageNet and 83.07% (1-shot) / 88.69% (5-shot) on tieredImageNet.
Conclusion
  • The authors presented a novel causal framework, Interventional Few-Shot Learning (IFSL), to address an overlooked deficiency in recent FSL methods: pre-training is a confounder that hurts performance.
  • To better illustrate the deficiency, the authors diagnosed classification accuracy comprehensively across query hardness and showed that IFSL improves all the baselines at every hardness level.
  • To upgrade IFSL, the authors will seek other observational intervention algorithms for better performance and devise counterfactual reasoning for more general few-shot settings, such as domain transfer
Summary
  • Introduction:

    Few-Shot Learning (FSL) — the task of training a model using very few samples — is nothing short of a panacea for any scenario that requires fast model adaptation to new tasks [69], such as minimizing the need for expensive trials in reinforcement learning [31] and saving computation resource for light-weight neural networks [28, 26].
  • For more than a decade, the crux of FSL has been to imitate the human ability of transferring prior knowledge to new tasks [19]; not until the recent advances in pre-training techniques did the field reach a consensus on “what & how to transfer”: a powerful neural network Ω pre-trained on a large dataset D.
  • In the context of pre-trained knowledge, the authors denote the original FSL training set as the support set S and the test set as the query set Q, where the classes in (S, Q) are unseen in D.
  • The authors' goal is to develop a post-pre-training and pre-fine-tuning strategy that removes the deficiency introduced by pre-training
  • Objectives:

    The authors' goal is to develop a post-pre-training and pre-fine-tuning strategy that removes the deficiency introduced by pre-training.
  • The authors first consider a simplified case of Figure 4(a), where each causal link represents a linear relationship, and aim to find the true causal effect of X → Y through linear regression (a numerical sketch follows below)
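The point of that simplified linear case is that an ordinary regression of Y on X alone absorbs the confounder's influence, whereas regressing Y on both X and the confounder recovers the true X → Y coefficient. A minimal numerical sketch under those linear assumptions (the coefficients 0.8, 0.5, and 1.2 below are arbitrary choices for illustration, not values from the paper):

```python
# Linear SCM with a confounder D: D -> X, D -> Y, and X -> Y with true effect 0.5.
# Regressing Y on X alone is biased; regressing Y on (X, D) recovers ~0.5.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
d = rng.normal(size=n)                        # confounder
x = 0.8 * d + rng.normal(size=n)              # X caused by D plus noise
y = 0.5 * x + 1.2 * d + rng.normal(size=n)    # true causal effect of X on Y is 0.5

# OLS slope of Y on X only (confounded).
biased = np.polyfit(x, y, 1)[0]

# OLS of Y on [X, D, 1] via least squares (adjusting for the confounder).
A = np.column_stack([x, d, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"Y ~ X only : slope = {biased:.3f}")   # biased, ~1.09
print(f"Y ~ X + D  : slope = {coef[0]:.3f}")  # close to the true 0.5
```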
  • Methods:

    The compared methods are Cosine k-NN, MAML [20], MTL [58], MN [66], and SIB [29].
  • Each method is evaluated with the three proposed adjustments: feature-wise (“ft”), class-wise (“cl”), and combined (“ft+cl”).
  • Backbones are ResNet-10 and WRN-28-10, each reported as a baseline and with IFSL applied.
  • Results:

    Results and Analysis

    Conventional Acc. 1) From Table 1, the authors observe that IFSL consistently improves fine-tuning and meta-learning in all settings, which suggests that IFSL is agnostic to methods, datasets, and backbones. 2) In particular, the improvements are typically larger on 1-shot than 5-shot.
  • Table 2 reports 5-way 1-/5-shot Acc (%) on miniImageNet and tieredImageNet against methods such as Baseline++ [13], IdeMe-Net† [14], Baseline [18], wDAE-GNN [21], and SIB† [29], where † denotes using the pre-trained backbone.
  • SIB+IFSL with a WRN-28-10 backbone reaches 73.51% (1-shot) / 83.21% (5-shot) on miniImageNet and 83.07% (1-shot) / 88.69% (5-shot) on tieredImageNet.
  • Conclusion:

    The authors presented a novel causal framework, Interventional Few-Shot Learning (IFSL), to address an overlooked deficiency in recent FSL methods: pre-training is a confounder that hurts performance.
  • To better illustrate the deficiency, the authors diagnosed classification accuracy comprehensively across query hardness and showed that IFSL improves all the baselines at every hardness level.
  • To upgrade IFSL, the authors will seek other observational intervention algorithms for better performance and devise counterfactual reasoning for more general few-shot settings, such as domain transfer
Tables
  • Table 1: Acc (%) averaged over 2000 5-way FSL tasks before and after applying IFSL. We obtained the results by using official code and our
  • Table 2: Comparison with state-of-the-art methods on 5-way 1-/5-shot Acc (%) on miniImageNet and tieredImageNet
  • Table 3: Results of cross-domain evaluation: miniImageNet → CUB. The full report is in Appendix 6
  • Table 4: Supplementary to Table 1. Acc (%) and 95% confidence intervals averaged over 2000 5-way FSL tasks before and after applying the three proposed implementations of adjustment. Specifically, “ft” refers to feature-wise adjustment, “cl” to class-wise adjustment, and “ft+cl” to combined adjustment
  • Table 5: Supplementary to Figure 6. CAM-Acc (%) on fine-tuning and meta-learning. We used combined adjustment for IFSL
  • Table 6: Supplementary to Table 3. Acc (%) and 95% confidence intervals averaged over 2000 5-way FSL tasks on cross-domain evaluation. Specifically, “ft” refers to feature-wise adjustment, “cl” to class-wise adjustment, and “ft+cl” to combined adjustment
Related work
  • Few-Shot Learning. FSL has a wide spectrum of methods, including fine-tuning [13, 18], optimizing model initialization [20, 42], generating model parameters [54, 36], learning a feature space for a better separation of sample categories [66, 77], feature transfer [58, 43], and transductive learning that additionally uses query set data [18, 29, 27]. Thanks to them, the classification accuracy has been drastically increased [29, 77, 73, 37]. However, accuracy as a single number cannot explain the paradoxical phenomenon in Figure 2. Our work offers an answer from a causal standpoint by showing that pre-training is a confounder. We not only further improve the accuracy of various FSL methods, but also explain the reason behind the improvements. In fact, the perspective offered by our work can benefit all the tasks that involve pre-training—any downstream task can be seen as FSL compared to the large-scale pre-training data.
Funding
  • We not only further improve the accuracy of various FSL methods, but also explain the reason behind the improvements
  • 5) According to Table 1 and Table 2, it is clear that our k-NN+IFSL outperforms IdeMe-Net [14] using the same pre-trained ResNet-10. This shows that data augmentation, a form of physical data intervention as used in IdeMe-Net [14], is inferior to our causal intervention in IFSL. 6) Overall, our IFSL achieves new state-of-the-art results on both datasets
Study subjects and analysis
samples: 1300
We conducted experiments on benchmark datasets from the FSL literature: 1) miniImageNet [66], containing 600 images per class over 100 classes; we followed the split proposed in [51]: 64/16/20 classes for train/val/test. 2) tieredImageNet [52], which is much larger than miniImageNet, with 608 classes and around 1,300 samples per class; these classes are grouped into 34 higher-level concepts and then partitioned into 20/6/8 disjoint sets for train/val/test to achieve a larger domain difference between training and testing

samples: 60
3) Caltech-UCSD Birds-200-2011 (CUB) [70] for cross-domain evaluation. It contains 200 classes with around 60 samples each. The models used for the CUB test were trained on miniImageNet
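The “5-way 1-/5-shot tasks” reported throughout are episodes sampled from these test splits: N classes are drawn at random, and for each class K support images and a handful of query images are sampled. A minimal episode-sampling sketch is below; the dictionary of per-class image ids is a placeholder, not the actual miniImageNet/tieredImageNet/CUB loaders.

```python
# Sketch of sampling one N-way K-shot episode (support set S and query set Q)
# from a dict mapping each test class to its list of image ids.
# The dummy data here is illustrative; real code would index the datasets above.
import random

def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=15):
    classes = random.sample(list(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        picks = random.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, label) for img in picks[:k_shot]]
        query += [(img, label) for img in picks[k_shot:]]
    return support, query

# Dummy "test split": 20 classes with 600 image ids each.
data = {f"class_{c}": [f"img_{c}_{i}" for i in range(600)] for c in range(20)}
S, Q = sample_episode(data, n_way=5, k_shot=1, n_query=15)
print(len(S), len(Q))  # 5 support and 75 query examples per episode
```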

Reference
  • Joshua D Angrist and Alan B Krueger. Instrumental variables and the search for identification: From supply and demand to natural experiments. Journal of Economic Perspectives, 2004
  • Antreas Antoniou, Amos Storkey, and Harrison Edwards. Data augmentation generative adversarial networks. In Proceedings of the International Conference on Learning Representations Workshops, 2018. 6
  • Sanjeev Arora, Nadav Cohen, Wei Hu, and Yuping Luo. Implicit regularization in deep matrix factorization. In Advances in Neural Information Processing Systems, 2019. 3
  • Hossein Azizpour, Ali Sharif Razavian, Josephine Sullivan, Atsuto Maki, and Stefan Carlsson. Factors of transferability for a generic convnet representation. IEEE transactions on Pattern Analysis and Machine Intelligence, 2015. 6
  • Pierre Baldi and Peter Sadowski. The dropout learning algorithm. Artificial intelligence, 2014. 16
  • Alexander Balke and Judea Pearl. Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 1997. 15
  • Yoshua Bengio. Deep learning of representations for unsupervised and transfer learning. In Proceedings of ICML Workshop on Unsupervised and Transfer Learning, 2012. 4
  • Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013. 3
  • Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Rosemary Ke, Sébastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, and Christopher Pal. A meta-transfer objective for learning to disentangle causal mechanisms. In International Conference on Learning Representations, 2020
  • Michel Besserve, Rémy Sun, and Bernhard Schölkopf. Counterfactuals uncover the modular structure of deep generative models. arXiv preprint arXiv:1812.03253, 2018. 3
  • Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM (JACM), 2011. 3
  • Krzysztof Chalupka, Pietro Perona, and Frederick Eberhardt. Visual causal feature learning. In Uncertainty in Artificial Intelligence, 2015. 6
  • Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, and Jia-Bin Huang. A closer look at few-shot classification. In International Conference on Learning Representations, 2019. 1, 6, 8, 18, 19
  • Zitian Chen, Yanwei Fu, Yu-Xiong Wang, Lin Ma, Wei Liu, and Martial Hebert. Image deformation meta-networks for one-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 6, 8
  • Pim de Haan, Dinesh Jayaraman, and Sergey Levine. Causal confusion in imitation learning. In Advances in Neural Information Processing Systems, 2019. 6
  • F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, and L.E. Meester. A Modern Introduction to Probability and Statistics: Understanding Why and How. Springer Texts in Statistics. Springer, 2005. 4
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019. 1, 5
  • Guneet S Dhillon, Pratik Chaudhari, Avinash Ravichandran, and Stefano Soatto. A baseline for few-shot image classification. In International Conference on Learning Representations, 2020. 1, 6, 7, 8
  • Li Fei-Fei, Rob Fergus, and Pietro Perona. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006. 1
  • Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, 2017. 2, 3, 6, 7, 8, 17, 18, 20, 21, 22, 23
  • Spyros Gidaris and Nikos Komodakis. Generating classification weights with gnn denoising autoencoders for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 2, 3, 8
  • Ross Girshick, Ilija Radosavovic, Georgia Gkioxari, Piotr Dollár, and Kaiming He. Detectron. https://github.com/facebookresearch/detectron, 2018. 5
  • Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. The MIT Press, 2016. 4
  • Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, 2017. 1
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 1, 5, 7, 17, 18
  • Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. In Advances in Neural Information Processing Systems Deep Learning Workshop, 2014. 1, 5
  • Ruibing Hou, Hong Chang, MA Bingpeng, Shiguang Shan, and Xilin Chen. Cross attention network for few-shot classification. In Advances in Neural Information Processing Systems, 2019. 6
  • Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017. 1
  • Shell Xu Hu, Pablo Moreno, Yang Xiao, Xi Shen, Guillaume Obozinski, Neil Lawrence, and Andreas Damianou. Empirical bayes transductive meta-learning with synthetic gradients. In International Conference on Learning Representations, 2020. 2, 6, 7, 8, 19, 20, 21, 22, 23
  • Minyoung Huh, Pulkit Agrawal, and Alexei A Efros. What makes imagenet good for transfer learning? arXiv preprint arXiv:1608.08614, 2016. 6
  • Muhammad Abdullah Jamal and Guo-Jun Qi. Task agnostic meta-learning for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 1
  • Qi Jiaxin, Niu Yulei, Huang Jianqiang, and Zhang Hanwang. Two causal principles for improving visual dialog. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020. 6
  • Simon Kornblith, Jonathon Shlens, and Quoc V Le. Do better imagenet models transfer better? In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2019. 6
  • Aoxue Li, Weiran Huang, Xu Lan, Jiashi Feng, Zhenguo Li, and Liwei Wang. Boosting few-shot learning with adaptive margin loss. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020. 8
  • Hongyang Li, David Eigen, Samuel Dodge, Matthew Zeiler, and Xiaogang Wang. Finding task-relevant features for few-shot learning by category traversal. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 8
  • Huaiyu Li, Weiming Dong, Xing Mei, Chongyang Ma, Feiyue Huang, and Bao-Gang Hu. Lgm-net: Learning to generate matching networks for few-shot learning. In International Conference on Machine Learning, 2019. 6
  • Yaoyao Liu, Bernt Schiele, and Qianru Sun. An ensemble of epoch-wise empirical bayes for few-shot learning. In European Conference on Computer Vision, 2020. 6
  • David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Scholkopf, and Léon Bottou. Discovering causal signals in images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 6
  • Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of Machine Learning Research, 2008. 3
  • Sara Magliacane, Thijs van Ommen, Tom Claassen, Stephan Bongers, Philip Versteeg, and Joris M Mooij. Domain adaptation by using causal inference to predict invariant conditional distributions. In Advances in Neural Information Processing Systems, 2018. 6
  • Gong Mingming, Zhang Kun, Liu Tongliang, Tao Dacheng, Clark Glymour, and Bernhard Schölkopf. Domain adaptation with conditional transferable components. In International Conference on Machine Learning, 2016. 6
  • Alex Nichol and John Schulman. Reptile: a scalable metalearning algorithm. arXiv preprint arXiv:1803.02999, 2018. 6
  • Boris Oreshkin, Pau Rodríguez López, and Alexandre Lacoste. Tadam: Task dependent adaptive metric for improved few-shot learning. In Advances in Neural Information Processing Systems, 2018. 6
  • Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 2009. 6
  • Giambattista Parascandolo, Niki Kilbertus, Mateo Rojas-Carulla, and Bernhard Schölkopf. Learning independent causal mechanisms. In International Conference on Machine Learning, 2018. 6
  • J. Pearl, M. Glymour, and N.P. Jewell. Causal Inference in Statistics: A Primer. Wiley, 2016. 2, 3, 4, 14
  • Judea Pearl. Causal diagrams for empirical research. Biometrika, 1995. 15
  • Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2nd edition, 2009. 4, 6, 14
  • Siyuan Qiao, Chenxi Liu, Wei Shen, and Alan L Yuille. Few-shot image recognition by predicting parameters from activations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 2
  • Tiago Ramalho, Thierry Sousbie, and Stefano Peluchetti. An empirical study of pretrained representations for few-shot classification. arXiv preprint arXiv:1910.01319, 2019. 6
  • Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. In International Conference on Learning Representations, 2017. 6
  • Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B. Tenenbaum, Hugo Larochelle, and Richard S. Zemel. Meta-learning for semi-supervised few-shot classification. In International Conference on Learning Representations, 2018. 2, 6
  • Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000. 3
  • Andrei A. Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, and Raia Hadsell. Meta-learning with latent embedding optimization. In International Conference on Learning Representations, 2019. 6, 7, 17, 19, 20, 21, 22, 23
  • Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-learning with memory-augmented neural networks. In International Conference on Machine Learning, 2016. 1
  • Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, 2017. 7, 8
  • Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, 2017. 16, 18
  • Qianru Sun, Yaoyao Liu, Tat-Seng Chua, and Bernt Schiele. Meta-transfer learning for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 6, 7, 18, 20, 21, 22, 23
  • Raphael Suter, Ðorde Miladinovic, Bernhard Schölkopf, and Stefan Bauer. Robustly disentangled causal mechanisms: Validating deep representations for interventional robustness. In International Conference on Machine Learning, 2019. 6
  • Kaihua Tang, Jianqiang Huang, and Hanwang Zhang. Long-tailed classification by keeping the good and removing the bad momentum causal effect. In Advances in Neural Information Processing Systems, 2020. 6
  • Kaihua Tang, Yulei Niu, Jianqiang Huang, Jiaxin Shi, and Hanwang Zhang. Unbiased scene graph generation from biased training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020. 6
  • Marc Tanti, Albert Gatt, and Kenneth P Camilleri. Transfer learning from language models to image caption generators: Better models may not transfer better. arXiv preprint arXiv:1901.01216, 2019. 6
  • Joshua B Tenenbaum, Vin De Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 2000. 3
  • Matthew Turk and Alex Pentland. Face recognition using eigenfaces. In Proceedings. 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1991. 3
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, 2017. 1, 5
  • Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, 2016. 2, 3, 6, 7, 19, 20, 21, 22, 23
  • Tan Wang, Jianqiang Huang, Hanwang Zhang, and Qianru Sun. Visual commonsense r-cnn. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 6
  • Yan Wang, Wei-Lun Chao, Kilian Q Weinberger, and Laurens van der Maaten. Simpleshot: Revisiting nearest-neighbor classification for few-shot learning. arXiv preprint arXiv:1911.04623, 2019. 18
  • Yaqing Wang and Quanming Yao. Few-shot learning: A survey. arXiv preprint arXiv:1904.05046, 2019. 1
  • P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-UCSD Birds 200. Technical report, California Institute of Technology, 2010. 2, 7
  • Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, 2015. 6
  • Xu Yang, Hanwang Zhang, and Jianfei Cai. Deconfounded image captioning: A causal retrospect. arXiv preprint arXiv:2003.03923, 2020. 6
  • Han-Jia Ye, Hexiang Hu, De-Chuan Zhan, and Fei Sha. Few-shot learning via embedding adaptation with set-to-set functions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020. 6, 8
  • Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, 2014. 6
  • Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In British Machine Vision Conference, 2016. 7, 17, 18
  • Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision, 2014. 3, 5
  • Chi Zhang, Yujun Cai, Guosheng Lin, and Chunhua Shen. Deepemd: Few-shot image classification with differentiable earth mover’s distance and structured classifiers. 2020. 6, 8
  • Dong Zhang, Hanwang Zhang, Jinhui Tang, Xian sheng Hua, and Qianru Sun. Causal intervention for weakly-supervised semantic segmentation. In Advances in Neural Information Processing Systems, 2020. 6
  • Hongguang Zhang, Jing Zhang, and Piotr Koniusz. Few-shot learning via saliency-guided hallucination of samples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 6
  • Ruixiang Zhang, Tong Che, Zoubin Ghahramani, Yoshua Bengio, and Yangqiu Song. Metagan: An adversarial approach to few-shot learning. In Advances in Neural Information Processing Systems, 2018. 6
  • Youshan Zhang and Brian D Davison. Impact of imagenet model selection on domain adaptation. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision Workshops, 2020. 6
  • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 3, 5