Information Maximization for Few-Shot Learning

NeurIPS 2020


Abstract

We introduce Transductive Information Maximization (TIM) for few-shot learning. Our method maximizes the mutual information between the query features and their label predictions for a given few-shot task, in conjunction with a supervision loss based on the support set. Furthermore, we propose a new alternating-direction solver for our mutual-information loss […]
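To make the objective concrete, the loss sketched in this abstract combines a cross-entropy on the support set with a mutual-information term on the query predictions; the weighting factor λ and the exact notation below are illustrative assumptions rather than the paper's precise formulation.

```latex
% Sketch of a transductive information-maximization objective (notation assumed):
%   CE(S; W)            cross-entropy on the labeled support set S
%   \hat{H}(Y_Q)        marginal entropy of the query-set predictions
%   \hat{H}(Y_Q | X_Q)  conditional entropy of the query-set predictions
\min_{W} \; \mathrm{CE}(S; W)
  \;-\; \lambda \Big[ \hat{H}(Y_Q) - \hat{H}(Y_Q \mid X_Q) \Big],
\qquad \hat{I}(X_Q; Y_Q) \;\approx\; \hat{H}(Y_Q) - \hat{H}(Y_Q \mid X_Q)
```

Maximizing the bracketed mutual-information estimate pushes query predictions to be confident individually (low conditional entropy) while remaining balanced across the task's classes (high marginal entropy).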

Introduction
  • Deep learning models have achieved unprecedented success, approaching human-level performances when trained on large-scale labeled data.
  • The generalization of such models might be seriously challenged when dealing with new classes, with only a few labeled instances per class.
  • Humans, in contrast, can learn new tasks rapidly from a handful of instances, by leveraging context and prior knowledge.
  • Model generalization is evaluated on few-shot tasks, composed of unlabeled samples from novel classes unseen during training, assuming only one or a few labeled samples are given per novel class.
Highlights
  • Deep learning models have achieved unprecedented success, approaching human-level performances when trained on large-scale labeled data
  • We propose Transductive Information Maximization (TIM) for few-shot learning
  • Following standard transductive few-shot settings, our comprehensive evaluations show that TIM outperforms state-of-the-art methods substantially across various datasets and networks, while using simple cross-entropy training on the base classes, without complex meta-learning schemes
  • We used feature extractors based on simple base-class training with the standard cross-entropy loss, without resorting to the complex meta-training schemes that are often used and advocated in the recent few-shot literature
  • While we do not claim that the very challenging few-shot problem is solved, we believe that our model-agnostic TIM inference should be used as a strong baseline for future few-shot learning research
  • One of our theoretical goals will be to connect TIM’s objective to the classifier’s empirical risk on the query set, showing that the former could be viewed as a surrogate for the latter
Methods
  • Domain-shift comparison (mini-ImageNet → CUB, 5-shot): MatchNet [45], MAML [9], ProtoNet [38], RelatNet [40], SimpleShot [46], GNN [42], Neg-Cosine [26], Baseline [5], LaplacianShot [51], TIM-ADM and TIM-GD, all with a ResNet-18 backbone except GNN (ResNet-10).
  • Larger-task comparison (10-way and 20-way, each in 1-shot and 5-shot): MatchNet [45], ProtoNet [38], RelatNet [40], SimpleShot [46], Baseline [5], Baseline++ [5], TIM-ADM and TIM-GD, all with a ResNet-18 backbone.
  • Notation used here: H(YQ) is the marginal entropy of the query predictions, H(YQ|XQ) the conditional entropy, and CE the cross-entropy.
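As an illustration of how these three quantities could be computed from a task's softmax predictions, the following is a minimal PyTorch-style sketch; the tensor shapes, function names, and the weighting factor lam are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def tim_loss_terms(support_logits, support_labels, query_logits):
    """Compute the three quantities named above for one few-shot task.
    Assumed shapes: support_logits [Ns, K], support_labels [Ns],
    query_logits [Nq, K], with K the number of classes."""
    # CE: standard cross-entropy on the labeled support set.
    ce = F.cross_entropy(support_logits, support_labels)

    query_probs = query_logits.softmax(dim=-1)                      # [Nq, K]
    # H(YQ|XQ): conditional entropy, averaged over the query samples.
    cond_ent = -(query_probs * torch.log(query_probs + 1e-12)).sum(dim=-1).mean()
    # H(YQ): entropy of the marginal (mean) query prediction.
    marginal = query_probs.mean(dim=0)                              # [K]
    marg_ent = -(marginal * torch.log(marginal + 1e-12)).sum()
    return ce, marg_ent, cond_ent

def tim_objective(ce, marg_ent, cond_ent, lam=0.1):
    # Minimizing this value maximizes the mutual-information surrogate
    # H(YQ) - H(YQ|XQ) on the query set while fitting the support set;
    # lam is an assumed weighting hyperparameter.
    return ce - lam * (marg_ent - cond_ent)
```

In a gradient-based variant (in the spirit of TIM-GD), this scalar would be minimized with respect to the classifier weights at inference time; TIM-ADM instead relies on the alternating-direction solver mentioned in the abstract, which is relevant to the inference run-times reported in Table 5.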
Results
  • Following standard transductive few-shot settings, the comprehensive experiments demonstrate that TIM outperforms state-of-the-art methods significantly across various datasets and networks, when used on top of a fixed feature extractor trained with simple cross-entropy on the base classes, without resorting to complex meta-learning schemes.
  • As will be observed from the experiments, this term brings substantial improvements in performance, while facilitating optimization, thereby reducing transductive runtimes by orders of magnitude.
Conclusion
  • The authors' TIM inference establishes new state-of-the-art results on the standard few-shot benchmarks, as well as in more challenging scenarios, with larger numbers of classes and domain shifts.
  • The authors used feature extractors based on simple base-class training with the standard cross-entropy loss, without resorting to the complex meta-training schemes that are often used and advocated in the recent few-shot literature.
  • The authors aim to give a more theoretical grounding for the proposed mutual-information objective, and to explore further generalizations of the objective, e.g., via embedding domain-knowledge priors.
  • One of the theoretical goals will be to connect TIM’s objective to the classifier’s empirical risk on the query set, showing that the former could be viewed as a surrogate for the latter
Tables
  • Table1: Comparison to the state-of-the-art methods on mini-ImageNet, tiered-ImageNet and CUB. The methods are sub-grouped into transductive and inductive methods, as well as by backbone architecture. Our results (gray-shaded) are averaged over 10,000 episodes. "-" signifies the result is unavailable
  • Table2: The results for the domain-shift setting mini-ImageNet → CUB. The results obtained by our models (gray-shaded) are averaged over 10,000 episodes
  • Table3: Results for increasing the number of classes on mini-ImageNet. The results obtained by our models (gray-shaded) are averaged over 10,000 episodes
  • Table4: Ablation study on the effect of each term in our loss in Eq (3), when only the classifier weights are fine-tuned, i.e., updating only W, and when the whole network is fine-tuned, i.e., updating {φ, W}. The results are reported for ResNet-18 as backbone. The same term indexing as in Eq (3)
  • Table5: Inference run-time per few-shot task for a 5-shot 5-way task on mini-ImageNet with a WRN28-10 backbone
Related work
  • Transductive inference: In a recent line of work, transductive inference has emerged as an appealing approach to tackling few-shot tasks [7, 14, 19, 28, 34, 32, 27, 51], showing performance improvements over inductive inference. In the transductive setting, the model classifies the unlabeled query examples of a single few-shot task at once, instead of one sample at a time as in inductive methods. These recent experimental observations in few-shot learning are consistent with established facts in classical transductive inference [44, 18, 6], which is well known to outperform inductive methods on small training sets. While [32] used information from unlabeled query samples via batch normalization, the authors of [28] were the first to explicitly model transductive inference in few-shot learning. Inspired by popular label-propagation concepts [6], they built a meta-learning framework that learns to propagate labels from labeled to unlabeled instances via a graph. The meta-learning transductive method in [14] used attention mechanisms to propagate labels to unlabeled query samples. More closely related to our work, the recent transductive inference of Dhillon et al. [7] minimizes the entropy of the network's softmax predictions at unlabeled query samples, reporting competitive few-shot performances while using standard cross-entropy training on the base classes. The competitive performance of [7] is in line with several recent inductive baselines [5, 46, 41], which reported that standard cross-entropy training on the base classes matches or exceeds the performance of more sophisticated meta-learning procedures. Also, the performance of [7] is in line with established results in the context of semi-supervised learning, where entropy minimization is widely used [11, 31, 2]. It is worth noting that the inference runtimes of transductive methods are typically much higher than those of their inductive counterparts. For instance, the authors of [7] fine-tune all the parameters of a deep network during inference, which is several orders of magnitude slower than inductive methods such as ProtoNet [38]. Also, being based on matrix inversion, the transductive inference in [28] has a complexity that is cubic in the number of query samples.
Funding
  • This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), via its Discovery Grant program.
  • This may help level the playing field with larger and better-funded entities.
Study subjects and analysis
few-shot learning datasets: 3
During training, all the images are resized to 84 × 84, and we used the same data-augmentation procedure as in [51], which includes random cropping, color jitter and random horizontal flipping. Datasets: We resort to three few-shot learning datasets to benchmark the proposed models. As standard few-shot benchmarks, we use the mini-ImageNet [45] dataset, with 100 classes split as in [35]; the Caltech-UCSD Birds 200 (CUB) [47] dataset, with 200 classes, split following [5]; and finally the larger tiered-ImageNet dataset, with 608 classes split as in [36].
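To make the augmentation pipeline above concrete, here is a minimal torchvision-style sketch; the crop parameters, jitter strengths, and the omission of normalization are illustrative assumptions, since only the three operations are named in the text.

```python
from torchvision import transforms

# Sketch of the training-time augmentation described above for 84 x 84
# inputs: random cropping, color jitter, and random horizontal flipping.
# The specific parameter values are assumptions, not the paper's settings.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(84),                 # random crop to 84 x 84
    transforms.ColorJitter(brightness=0.4,            # color jitter
                           contrast=0.4,
                           saturation=0.4),
    transforms.RandomHorizontalFlip(),                # random horizontal flip
    transforms.ToTensor(),
])
```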

Reference
  • H. B. Barlow. Unsupervised learning. In Neural Comput., 1989.
  • David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin A Raffel. Mixmatch: A holistic approach to semi-supervised learning. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
  • Malik Boudiaf, Jérôme Rony, Imtiaz Masud Ziko, Eric Granger, Marco Pedersoli, Pablo Piantanida, and Ismail Ben Ayed. A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses. In European Conference on Computer Vision (ECCV), 2020.
  • Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. In Foundations and Trends in Machine Learning. Now Publishers Inc., 2011.
  • Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, and Jia-Bin Huang. A closer look at few-shot classification. In International Conference on Learning Representations (ICLR), 2019.
  • Dengyong Zhou, Olivier Bousquet, Thomas N Lal, Jason Weston, and Bernhard Schölkopf. Learning with local and global consistency. In Advances in Neural Information Processing Systems (NeurIPS), 2004.
  • Guneet S Dhillon, Pratik Chaudhari, Avinash Ravichandran, and Stefano Soatto. A baseline for few-shot image classification. In International Conference on Learning Representations (ICLR), 2020.
  • Li Fei-Fei, Rob Fergus, and Pietro Perona. One-shot learning of object categories. In IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2006.
  • Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning (ICML), 2017.
  • Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, and Matthieu Cord. Boosting few-shot visual learning with self-supervision. In International Conference on Computer Vision (ICCV), 2019.
  • Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In Advances in neural information processing systems (NeurIPS), 2005.
  • Yiluan Guo and Ngai-Man Cheung. Attentive weights generation for few shot learning via information maximization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
  • R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, and Yoshua Bengio. Learning deep representations by mutual information estimation and maximization. In International Conference on Learning Representations (ICLR), 2019.
  • Ruibing Hou, Hong Chang, MA Bingpeng, Shiguang Shan, and Xilin Chen. Cross attention network for few-shot classification. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
  • Shell Xu Hu, Pablo G Moreno, Yang Xiao, Xi Shen, Guillaume Obozinski, Neil D Lawrence, and Andreas Damianou. Empirical bayes transductive meta-learning with synthetic gradients. In International Conference on Learning Representations (ICLR), 2020.
  • Weihua Hu, Takeru Miyato, Seiya Tokui, Eiichi Matsumoto, and Masashi Sugiyama. Learning discrete representations via information maximizing self-augmented training. In International Conference on Machine Learning (ICML), 2017.
  • Mohammed Jabi, Marco Pedersoli, Amar Mitiche, and Ismail Ben Ayed. Deep clustering: On the link between discriminative models and k-means. In IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020.
  • Thorsten Joachims. Transductive inference for text classification using support vector machines. In International Conference on Machine Learning (ICML), 1999.
  • Jongmin Kim, Taesup Kim, Sungwoong Kim, and Chang D Yoo. Edge-labeling graph neural network for few-shot learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2014.
  • Andreas Krause, Pietro Perona, and Ryan G Gomes. Discriminative clustering by regularized information maximization. In Advances in Neural Information Processing systems (NeurIPS), 2010.
  • Kwonjoon Lee, Subhransu Maji, Avinash Ravichandran, and Stefano Soatto. Meta-learning with differentiable convex optimization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • Xinzhe Li, Qianru Sun, Yaoyao Liu, Qin Zhou, Shibao Zheng, Tat-Seng Chua, and Bernt Schiele. Learning to self-train for semi-supervised few-shot classification. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
  • Jian Liang, Dapeng Hu, and Jiashi Feng. Do we really need to access the source data? source hypothesis transfer for unsupervised domain adaptation. In International Conference on Machine Learning (ICML), 2020.
  • R. Linsker. Self-organization in a perceptual network. In Computer, 1988.
  • Bin Liu, Yue Cao, Yutong Lin, Qi Li, Zheng Zhang, Mingsheng Long, and Han Hu. Negative margin matters: Understanding margin in few-shot classification. In European Conference on Computer Vision (ECCV), 2020.
  • Jinlu Liu, Liang Song, and Yongqiang Qin. Prototype rectification for few-shot learning. In European Conference on Computer Vision (ECCV), 2020.
  • Yanbin Liu, Juho Lee, Minseop Park, Saehoon Kim, Eunho Yang, Sung Ju Hwang, and Yi Yang. Learning to propagate labels: Transductive propagation network for few-shot learning. In International Conference on Learning Representations (ICLR), 2019.
  • Erik G Miller, Nicholas E Matsakis, and Paul A Viola. Learning from one example through shared densities on transforms. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2000.
  • N. Mishra, M. Rohaninejad, X. Chen, and P. Abbeel. A simple neural attentive meta-learner. In International Conference on Learning Representations (ICLR), 2018.
  • Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. In IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2018.
  • Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms. In arXiv preprint arXiv:1803.02999, 2018.
  • Boris Oreshkin, Pau Rodríguez López, and Alexandre Lacoste. Tadam: Task dependent adaptive metric for improved few-shot learning. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
  • Limeng Qiao, Yemin Shi, Jia Li, Yaowei Wang, Tiejun Huang, and Yonghong Tian. Transductive episodic-wise adaptive metric for few-shot learning. In International Conference on Computer Vision (ICCV), 2019.
  • Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. In International Conference on Learning Representations (ICLR), 2016.
  • Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B Tenenbaum, Hugo Larochelle, and Richard S Zemel. Meta-learning for semi-supervised few-shot classification. In International Conference on Learning Representations (ICLR), 2018.
  • Andrei A Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, and Raia Hadsell. Meta-learning with latent embedding optimization. In International Conference on Learning Representations (ICLR), 2019.
  • Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In Advances in neural information processing systems (NeurIPS), 2017.
  • Qianru Sun, Yaoyao Liu, Tat-Seng Chua, and Bernt Schiele. Meta-transfer learning for few-shot learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip HS Torr, and Timothy M Hospedales. Learning to compare: Relation network for few-shot learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • Yonglong Tian, Yue Wang, Dilip Krishnan, Joshua B. Tenenbaum, and Phillip Isola. Rethinking few-shot image classification: a good embedding is all you need? In European Conference on Computer Vision (ECCV), 2020.
  • Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, and Ming-Hsuan Yang. Cross-domain few-shot classification via learned feature-wise transformation. In International Conference on Learning Representations (ICLR), 2020.
  • Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. In ArXiv preprint arXiv:1807.03748, 2019.
  • Vladimir N Vapnik. An overview of statistical learning theory. In IEEE Transactions on Neural Networks (TNN), 1999.
  • Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems (NeurIPS), 2016.
  • Yan Wang, Wei-Lun Chao, Kilian Q Weinberger, and Laurens van der Maaten. Simpleshot: Revisiting nearest-neighbor classification for few-shot learning. In arXiv preprint arXiv:1911.04623, 2019.
  • P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology, 2010.
  • Davis Wertheimer and Bharath Hariharan. Few-shot learning with localization in realistic settings. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • Han-Jia Ye, Hexiang Hu, De-Chuan Zhan, and Fei Sha. Learning embedding adaptation for few-shot learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
  • Jian Zhang, Chenglong Zhao, Bingbing Ni, Minghao Xu, and Xiaokang Yang. Variational few-shot learning. In International Conference on Computer Vision (ICCV), 2019.
  • Imtiaz Masud Ziko, Jose Dolz, Eric Granger, and Ismail Ben Ayed. Laplacian regularized few-shot learning. In International Conference on Machine Learning (ICML), 2020.
Author
Malik Boudiaf
Jérôme Rony