
Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?

European Conference on Computer Vision (ECCV), pp. 266-282, 2020

Cited by: 68 | Views: 134

Abstract

The focus of recent meta-learning research has been on the development of learning algorithms that can quickly adapt to test-time tasks with limited data and low computational cost. Few-shot learning is widely used as one of the standard benchmarks in meta-learning. In this work, we show that a simple baseline: learning a supervised or self-supervised representation on the meta-training set, followed by training a linear classifier on top of this representation, outperforms state-of-the-art few-shot learning methods.

Introduction
  • Few-shot learning measures a model’s ability to quickly adapt to new environments and tasks.
  • This is a challenging problem because only limited data is available to adapt the model.
  • The performance of the learner is evaluated by the average test accuracy across many meta-testing tasks
  • Methods to tackle this problem can be cast into two main categories: optimization-based methods and metric-based methods.
  • Optimization-based methods focus on designing algorithms that can quickly adapt to each task, while metric-based methods focus on learning a good embedding and a metric under which query samples are classified by their distances to support samples.
Highlights
  • Few-shot learning measures a model’s ability to quickly adapt to new environments and tasks
  • We propose an extremely simple baseline that suggests that good learned representations are more powerful for few-shot classification tasks than the current crop of complicated meta-learning algorithms
  • Our simple baseline with ResNet-12 is already comparable with the state-of-the-art MetaOptNet [26] on miniImageNet, and outperforms all previous works by at least 3% on tieredImageNet
  • The network trained with distillation further improves over the simple baseline by 2-3%
  • We have proposed a simple baseline for few-shot image classification in the meta-learning context
  • Our distillation version achieves the new state-of-the-art on both datasets
  • Q: What is the intuition of this paper? A: We hope this paper will shed new light on few-shot classification.
Methods
  • The authors establish preliminaries about the meta-learning problem and related algorithms in §3.1, present the baseline in §3.2, and introduce how knowledge distillation helps few-shot learning in §3.3.
  • Training examples D_train = {(x_t, y_t)}_{t=1}^{T} and testing examples D_test = {(x_q, y_q)}_{q=1}^{Q} are sampled from the same distribution.
  • A base learner A, which is given by y* = f_θ(x*) (where * denotes t or q), is trained on D_train and used as a predictor on D_test.
  • Assuming the embedding model φ is kept fixed while training the base learner on each task, the base learner is obtained as θ = A(D_train; φ), as sketched below.
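
A minimal sketch of this base learner, assuming a frozen pretrained embedding function (here called embed, any backbone) and scikit-learn's LogisticRegression as the linear base learner, consistent with the paper's choice of logistic regression; the function and variable names are illustrative, not the authors' code:

    from sklearn.linear_model import LogisticRegression

    def solve_task(embed, x_support, y_support, x_query):
        """Base learner A: fit a linear classifier on frozen embeddings.

        embed: callable mapping a batch of images to feature vectors (phi stays fixed).
        x_support, y_support: the few-shot support set D_train of the task.
        x_query: query images (the task's D_test) to classify.
        """
        z_support = embed(x_support)             # embed the support set
        z_query = embed(x_query)                 # embed the query set
        clf = LogisticRegression(max_iter=1000)  # theta = A(D_train; phi)
        clf.fit(z_support, y_support)
        return clf.predict(z_query)              # y* = f_theta(x*)

Meta-testing then amounts to calling solve_task once per sampled task and averaging the query accuracy across tasks.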
Results
  • Results on ImageNet derivatives

    The miniImageNet dataset [54] is a standard benchmark used by recent few-shot learning works.
  • The tieredImageNet dataset [42] is another subset of ImageNet but has more classes (608 classes)
  • These classes are first grouped into 34 higher-level categories, which are further divided into 20 training categories (351 classes), 6 validation categories (97 classes), and 8 testing categories (160 classes).
  • The FC100 dataset [34] is derived from the CIFAR-100 dataset in a similar way to tieredImageNet.
  • This results in 60 classes for training, 20 classes for validation, and 20 classes for testing.
  • This verifies the hypothesis that a good embedding plays an important role in few-shot recognition
Conclusion
  • The authors have proposed a simple baseline for few-shot image classification in the meta-learning context.
  • This approach has been underappreciated in the literature thus far.
  • The authors show with numerous experiments that such a simple baseline outperforms the current state of the art on four widely used few-shot benchmarks.
  • Even when meta-training labels are unavailable, it may be possible to leverage state of the art self-supervised learning approaches to learn very good embeddings for meta-testing tasks.
  • As shown by the empirical experiments, a linear model can generalize well as long as a good representation of the data is given.
Tables
  • Table1: Comparison to prior work on miniImageNet and tieredImageNet. Average few-shot classification accuracies (%) with 95% confidence intervals on miniImageNet and tieredImageNet
  • Table2: Comparison to prior work on CIFAR-FS and FC100. Average few-shot classification accuracies (%) with 95% confidence intervals on CIFAR-FS and FC100. a-b-c-d denotes a 4-layer convolutional network with a, b, c, and d filters in each layer
  • Table3: Comparisons of embeddings from supervised pre-training and self-supervised pre-training (MoCo and CMC). ∗ The encoder of each view is 0.5× the width of a normal ResNet-50
  • Table4: Ablation study on four benchmarks with ResNet-12 as backbone network. “NN” and “LR” stand for nearest neighbour classifier and logistic regression. “L-2” means feature normalization after which feature embeddings are on the unit sphere. “Aug” indicates that each support image is augmented into 5 samples to train the classifier. “Distill” represents the use of knowledge distillation
  • Table5: Comparisons of different backbones on miniImageNet and tieredImageNet
  • Table6: Comparisons of different backbones on CIFAR-FS and FC100
Related work
  • Metric-based meta-learning. The core idea in metric-based meta-learning is related to nearest neighbor algorithms and kernel density estimation. Metric-based methods embed input data into fixed-dimensional vectors and use them to design proper kernel functions. The predicted label of a query is the weighted sum of labels over support samples. Metric-based meta-learning aims to learn a task-dependent metric. [22] used a Siamese network to encode image pairs and predict confidence scores for each pair. Matching Networks [54] employed two networks for query samples and support samples respectively and used an LSTM with read-attention to encode a full context embedding of support samples. Prototypical Networks [46] learned to encode query samples and support samples into a shared embedding space; the metric used to classify query samples is the distance to prototype representations of each class. Instead of using distances of embeddings, Relation Networks [48] leveraged a relation module to represent an appropriate metric. TADAM [34] proposed metric scaling and metric task conditioning to boost the performance of Prototypical Networks.
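
To make the metric-based idea concrete, here is a generic sketch of prototype-based classification in the spirit of Prototypical Networks [46]; it is not the code of any cited method, and assumes the features have already been produced by some embedding network:

    import numpy as np

    def prototype_predict(z_support, y_support, z_query):
        """Nearest-prototype classification on embedded features.

        z_support: (N_support, D) support embeddings, y_support: their labels.
        z_query:   (N_query, D) query embeddings.
        Returns a predicted label for each query sample.
        """
        classes = np.unique(y_support)
        # Prototype of each class = mean embedding of its support samples.
        prototypes = np.stack([z_support[y_support == c].mean(axis=0) for c in classes])
        # Squared Euclidean distance from every query to every prototype.
        dists = ((z_query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
        return classes[dists.argmin(axis=1)]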
Funding
  • Our combined method achieves an average of 3% improvement over the previous state-of-the-art
  • Our distillation version achieves the new state-of-the-art on both datasets
Study subjects and analysis
Samples per batch: 64
We use the SGD optimizer with a momentum of 0.9 and a weight decay of 5e-4. Each batch consists of 64 samples. The learning rate is initialized at 0.05 and decayed by a factor of 0.1 three times for all datasets, except for miniImageNet where we only decay twice, as the third decay has no effect

Learning-rate decays for all datasets: 3
Each batch consists of 64 samples. The learning rate is initialized at 0.05 and decayed by a factor of 0.1 three times for all datasets, except for miniImageNet where we only decay twice, as the third decay has no effect. We train 100 epochs for miniImageNet, 60 epochs for tieredImageNet, and 90 epochs for the remaining datasets
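
A minimal PyTorch sketch of the stated optimization setup. The learning rate, momentum, weight decay, and decay factor come from the excerpt above; the milestone epochs and the stand-in model are illustrative placeholders, not values from the paper:

    import torch
    import torch.nn as nn

    model = nn.Linear(512, 64)  # stand-in for the embedding backbone (e.g. ResNet-12)

    optimizer = torch.optim.SGD(model.parameters(),
                                lr=0.05,           # initial learning rate
                                momentum=0.9,
                                weight_decay=5e-4)
    # The learning rate is decayed by a factor of 0.1 three times; the milestone
    # epochs below are assumed for illustration only.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                     milestones=[60, 80, 90],
                                                     gamma=0.1)

    for epoch in range(100):   # e.g. 100 epochs for miniImageNet
        # ... one training epoch over batches of 64 samples ...
        scheduler.step()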

Augmented samples per support image: 5
We study the following components of our method (Table 4 reports them as NN, LR, L-2, Aug, and Distill): (a) we choose logistic regression as our base learner and compare it to a nearest-neighbour classifier with Euclidean distance; (b) we find that normalizing the feature vectors onto the unit sphere, i.e., L-2 normalization, can improve the classification of the downstream base classifier; (c) during meta-testing, we create 5 augmented samples from each support image to alleviate the data-insufficiency problem and use these augmented samples to train the linear classifier; (d) we distill the embedding network on the training set by following the sequential distillation [13] strategy. Table 4 shows the results of our ablation studies on miniImageNet, tieredImageNet, CIFAR-FS, and FC100
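
Component (d) follows Born-Again-style knowledge distillation [13]: a student copy of the embedding network is trained to match both the ground-truth labels and the temperature-softened predictions of the previous generation. A hedged PyTorch sketch of such a distillation loss; the temperature T and weighting alpha are illustrative defaults, not the paper's reported values:

    import torch
    import torch.nn.functional as F

    def distill_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        """Standard knowledge-distillation objective (Hinton et al.).

        Combines cross-entropy on the true labels with a KL term that pushes the
        student's softened predictions toward the teacher's.
        """
        ce = F.cross_entropy(student_logits, labels)
        kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                      F.softmax(teacher_logits / T, dim=1),
                      reduction="batchmean") * (T * T)
        return alpha * ce + (1.0 - alpha) * kl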

Reference
  • [1] Machine learning in Python. https://scikit-learn.org/stable/
  • [2] Kelsey Allen, Evan Shelhamer, Hanul Shin, and Joshua Tenenbaum. Infinite mixture prototypes for few-shot learning. In ICML, 2019.
  • [3] Luca Bertinetto, Joao F. Henriques, Philip H. S. Torr, and Andrea Vedaldi. Meta-learning with differentiable closed-form solvers. arXiv preprint arXiv:1805.08136, 2018.
  • [4] Cristian Bucilua, Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. In SIGKDD, 2006.
  • [5] Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Wang, and Jia-Bin Huang. A closer look at few-shot classification. In ICLR, 2019.
  • [6] Yinbo Chen, Xiaolong Wang, Zhuang Liu, Huijuan Xu, and Trevor Darrell. A new meta-baseline for few-shot learning. arXiv preprint arXiv:2003.04390, 2020.
  • [7] Kevin Clark, Minh-Thang Luong, Christopher D. Manning, and Quoc V. Le. BAM! Born-again multi-task networks for natural language understanding. In ACL, 2019.
  • [8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • [9] Guneet Singh Dhillon, Pratik Chaudhari, Avinash Ravichandran, and Stefano Soatto. A baseline for few-shot image classification. In ICLR, 2020.
  • [10] Simon Shaolei Du, Wei Hu, Sham M. Kakade, Jason D. Lee, and Qi Lei. Few-shot learning via learning the representation, provably. arXiv preprint arXiv:2002.09434, 2020.
  • [11] Nikita Dvornik, Cordelia Schmid, and Julien Mairal. Diversity with cooperation: Ensemble methods for few-shot classification. In ICCV, 2019.
  • [12] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, 2017.
  • [13] Tommaso Furlanello, Zachary Chase Lipton, Michael Tschannen, Laurent Itti, and Anima Anandkumar. Born-again neural networks. In ICML, 2018.
  • [14] Spyros Gidaris and Nikos Komodakis. Dynamic few-shot visual learning without forgetting. In CVPR, 2018.
  • [15] Fusheng Hao, Fengxiang He, Jun Cheng, Lei Wang, Jianzhong Cao, and Dacheng Tao. Collect and select: Semantic alignment metric learning for few-shot learning. In ICCV, 2019.
  • [16] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross B. Girshick. Momentum contrast for unsupervised visual representation learning. arXiv preprint arXiv:1911.05722, 2019.
  • [17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [18] Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, 2015.
  • [19] Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In CVPR, 2018.
  • [20] Shaoli Huang and Dacheng Tao. All you need is a good representation: A multi-level and classifier-centric representation for few-shot learning. arXiv preprint arXiv:1911.12476, 2019.
  • [21] Muhammad Abdullah Jamal and Guo-Jun Qi. Task agnostic meta-learning for few-shot learning. In CVPR, 2019.
  • [22] Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop, 2015.
  • [23] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
  • [24] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 2015.
  • [25] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. The Omniglot challenge: A 3-year progress report. Current Opinion in Behavioral Sciences, 2019.
  • [26] Kwonjoon Lee, Subhransu Maji, Avinash Ravichandran, and Stefano Soatto. Meta-learning with differentiable convex optimization. In CVPR, 2019.
  • [27] Aoxue Li, Tiange Luo, Tao Xiang, Weiran Huang, and Liwei Wang. Few-shot learning with global class representations. In ICCV, 2019.
  • [28] Hongyang Li, David Eigen, Samuel Dodge, Matthew Zeiler, and Xiaogang Wang. Finding task-relevant features for few-shot learning by category traversal. In CVPR, 2019.
  • [29] Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel. A simple neural attentive meta-learner. arXiv preprint arXiv:1707.03141, 2017.
  • [30] Hossein Mobahi, Mehrdad Farajtabar, and Peter L. Bartlett. Self-distillation amplifies regularization in Hilbert space. arXiv preprint arXiv:2002.05715, 2020.
  • [31] Hossein Mobahi, Mehrdad Farajtabar, and Peter L. Bartlett. Self-distillation amplifies regularization in Hilbert space. arXiv preprint arXiv:2002.05715, 2020.
  • [32] Tsendsuren Munkhdalai, Xingdi Yuan, Soroush Mehri, and Adam Trischler. Rapid adaptation with conditionally shifted neurons. arXiv preprint arXiv:1712.09926, 2017.
  • [33] Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999, 2018.
  • [34] Boris Oreshkin, Pau Rodríguez López, and Alexandre Lacoste. TADAM: Task dependent adaptive metric for improved few-shot learning. In NIPS, 2018.
  • [35] Zhimao Peng, Zechao Li, Junge Zhang, Yan Li, Guo-Jun Qi, and Jinhui Tang. Few-shot image recognition with knowledge transfer. In ICCV, 2019.
  • [36] Mary Phuong and Christoph Lampert. Towards understanding knowledge distillation. In ICML, 2019.
  • [37] Limeng Qiao, Yemin Shi, Jia Li, Yaowei Wang, Tiejun Huang, and Yonghong Tian. Transductive episodic-wise adaptive metric for few-shot learning. In ICCV, 2019.
  • [38] Siyuan Qiao, Chenxi Liu, Wei Shen, and Alan L. Yuille. Few-shot image recognition by predicting parameters from activations. In CVPR, 2018.
  • [39] Aniruddh Raghu, Maithra Raghu, Samy Bengio, and Oriol Vinyals. Rapid learning or feature reuse? Towards understanding the effectiveness of MAML. arXiv preprint arXiv:1909.09157, 2019.
  • [40] Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. In ICLR, 2017.
  • [41] Avinash Ravichandran, Rahul Bhotika, and Stefano Soatto. Few-shot learning with embedded class models and shot-free meta training. In ICCV, 2019.
  • [42] Mengye Ren, Sachin Ravi, Eleni Triantafillou, Jake Snell, Kevin Swersky, Josh B. Tenenbaum, Hugo Larochelle, and Richard S. Zemel. Meta-learning for semi-supervised few-shot classification. In ICLR, 2018.
  • [43] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. IJCV, 2015.
  • [44] Andrei A. Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, and Raia Hadsell. Meta-learning with latent embedding optimization. In ICLR, 2019.
  • [45] Tyler Scott, Karl Ridgeway, and Michael C. Mozer. Adapted deep embeddings: A synthesis of methods for k-shot inductive transfer learning. In NIPS, 2018.
  • [46] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In NIPS, 2017.
  • [47] Qianru Sun, Yaoyao Liu, Tat-Seng Chua, and Bernt Schiele. Meta-transfer learning for few-shot learning. In CVPR, 2019.
  • [48] Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip H. S. Torr, and Timothy M. Hospedales. Learning to compare: Relation network for few-shot learning. In CVPR, 2018.
  • [49] Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive multiview coding. arXiv preprint arXiv:1906.05849, 2019.
  • [50] Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive representation distillation. arXiv preprint arXiv:1910.10699, 2019.
  • [51] Antonio Torralba, Rob Fergus, and William T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. TPAMI, 2008.
  • [52] Eleni Triantafillou, Richard S. Zemel, and Raquel Urtasun. Few-shot learning through an information retrieval lens. In NIPS, 2017.
  • [53] Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, and Hugo Larochelle. Meta-Dataset: A dataset of datasets for learning to learn from few examples. arXiv preprint, 2019.
  • [54] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. Matching networks for one shot learning. In NIPS, 2016.
  • [55] Yuxiong Wang and Martial Hebert. Learning to learn: Model regression networks for easy small sample learning. In ECCV, 2016.
  • [56] Yu-Xiong Wang, Ross B. Girshick, Martial Hebert, and Bharath Hariharan. Low-shot learning from imaginary data. In CVPR, 2018.
  • [57] Yu-Xiong Wang and Martial Hebert. Learning from small sample sets by combining unsupervised meta-training with CNNs. In NIPS, 2016.
  • [58] Lilian Weng. Meta-learning: Learning to learn fast. lilianweng.github.io/lil-log, 2018.
  • [59] Ziyang Wu, Yuwei Li, Lihua Guo, and Kui Jia. PARN: Position-aware relation networks for few-shot learning. In ICCV, 2019.
  • [60] Han-Jia Ye, Hexiang Hu, De-Chuan Zhan, and Fei Sha. Learning embedding adaptation for few-shot learning. arXiv preprint arXiv:1812.03664, 2018.
  • [61] Junho Yim, Donggyu Joo, Jihoon Bae, and Junmo Kim. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In CVPR, 2017.
  • [62] Jian Zhang, Chenglong Zhao, Bingbing Ni, Minghao Xu, and Xiaokang Yang. Variational few-shot learning. In ICCV, 2019.