LEEP: A New Measure to Evaluate Transferability of Learned Representations

Cuong V. Nguyen
Cédric Archambeau
Matthias Seeger

International Conference on Machine Learning (ICML), 2020.


Abstract:

We introduce a new measure to evaluate the transferability of representations learned by classifiers. Our measure, the Log Expected Empirical Prediction (LEEP), is simple and easy to compute: when given a classifier trained on a source data set, it only requires running the target data set through this classifier once. We analyze the properties of LEEP theoretically and demonstrate its effectiveness empirically. Our analysis shows that LEEP can predict the performance and convergence speed of both transfer and meta-transfer learning methods, even for small or imbalanced data. Moreover, LEEP outperforms recently proposed transferability measures such as negative conditional entropy and H scores. Notably, when transferring from ImageNet to CIFAR100, LEEP can achieve up to 30% improvement compared with the best competing method in terms of the correlations with actual transfer accuracy.

Introduction
  • Transferability estimation (Eaton et al., 2008; Ammar et al., 2014; Sinapov et al., 2015) is the problem of quantitatively estimating how easy it is to transfer knowledge learned from one classification task to another.
  • Consider transfer learning between two classification tasks: a source task, represented by a pre-trained model, and a target task, represented by a labeled data set.
  • In a setting similar to transfer learning, Sun et al. (2019) and Requeima et al. (2019) adapted a model pre-trained on the source task to the target task.
Highlights
  • Transferability estimation (Eaton et al., 2008; Ammar et al., 2014; Sinapov et al., 2015) is the problem of quantitatively estimating how easy it is to transfer knowledge learned from one classification task to another
  • We show two theoretical properties of the measure: (1) Log Expected Empirical Prediction (LEEP) is upper bounded by the average log-likelihood of the optimal model obtained by re-training the head classifier while freezing the feature extractor; (2) LEEP is related to the negative conditional entropy measure proposed by Tran et al. (2019)
  • We study the transferability estimation problem, which aims to develop a measure that can tell us, without training on the target data set, how effectively these transfer learning algorithms can transfer knowledge learned in the source model θ to the target task, using the target data set D
  • We describe our proposed measure, LEEP, which requires no expensive training on the target task and offers an a priori estimate of how well a model will transfer to that task (a computational sketch follows this list)
  • We evaluate the ability of LEEP to predict the performance of transfer and meta-transfer learning algorithms, prior to applying these algorithms in practice
  • When compared with negative conditional entropy (NCE), LEEP scores have equal or better correlations with transfer performance in all but two cases of the fine-tuning method. Even in those two cases, LEEP scores are only slightly worse than NCE scores. These comparisons confirm that LEEP scores are better suited than NCE scores for our transfer settings, with up to 30% improvement in terms of correlation coefficients
  • We proposed LEEP, a novel transferability measure that can efficiently estimate the performance of transfer and meta-transfer learning algorithms before executing them
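
To make the measure concrete, the following is a minimal NumPy sketch of the LEEP computation as we read it from the paper: run the target data set once through the source classifier, form the empirical joint distribution between the target labels and the source model's predicted ("dummy") source labels, and average the log of each example's expected empirical prediction. The function name, array layout, and variable names are our own choices, not the authors' code.

```python
import numpy as np

def leep(probs: np.ndarray, labels: np.ndarray) -> float:
    """Minimal sketch of a LEEP-style score.

    probs:  (n, |Z|) array; row i is the source model's predicted
            distribution over the |Z| source labels for target example x_i.
    labels: (n,) array of integer target labels y_i in {0, ..., |Y|-1}.
    Assumes every source label receives nonzero total probability mass.
    """
    n = probs.shape[0]
    num_target_labels = int(labels.max()) + 1

    # Empirical joint distribution P(y, z) over target and source labels.
    joint = np.zeros((num_target_labels, probs.shape[1]))
    for y in range(num_target_labels):
        joint[y] = probs[labels == y].sum(axis=0) / n

    # Empirical conditional P(y | z) = P(y, z) / P(z).
    conditional = joint / joint.sum(axis=0, keepdims=True)

    # LEEP = (1/n) * sum_i log( sum_z P(y_i | z) * theta(x_i)_z ).
    expected_pred = (probs * conditional[labels]).sum(axis=1)
    return float(np.log(expected_pred).mean())
```

Consistent with the abstract, this needs only a single forward pass of the target data through the source classifier; everything afterwards is cheap arithmetic on the resulting (n, |Z|) matrix.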
Results
  • The authors show that LEEP scores effectively predict the accuracies of models transferred from source to target tasks.
  • The authors use the test examples of these selected classes as the target test set to compute the accuracy of the transferred model.
  • This method keeps the feature extractor layers of the source model fixed and trains a new head classifier from scratch on the target data set (sketched after this list).
  • Fig. 1 shows the LEEP scores and the corresponding test accuracies for transferred models on 200 target tasks.
  • To factor out noise when evaluating LEEP scores on small target data sets, the authors partition the scores’ range into five equal bins and average the test accuracies of the tasks in each bin (also sketched after this list).
  • These results demonstrate that LEEP scores can predict the accuracies of transferred models, even in small-data regimes.
  • These results again confirm that LEEP scores can predict the performance of transferred models, even for imbalanced target data sets.
  • The authors follow the original training procedure proposed for CNAPs by Requeima et al. (2019), where the source model is a ResNet18 pre-trained on ImageNet and the adaptation networks are trained on Meta-Dataset (Triantafillou et al., 2020).
  • Similar to Sec. 5.2 and 5.3, Fig. 3 provides the average test accuracies of tasks in five LEEP score transferability levels.
  • The authors use the same small data settings defined in Sec. 5.2 and train a reference model for each target task.
  • These results confirm the advantage of transfer learning in small data settings, especially between highly transferable tasks, which in turn, can be efficiently predicted using the LEEP scores.
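
The head re-training method mentioned above follows a standard recipe. The paper's own references suggest its experiments were run with MXNet/GluonCV; the PyTorch sketch below is our illustration of the idea, not the authors' code: freeze the pre-trained feature extractor, attach a fresh head for the target classes, and optimize only the head.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Hypothetical target task with 10 classes; the source model is an
# ImageNet-pre-trained ResNet18, as in the paper's CNAPs experiments.
model = models.resnet18(weights="IMAGENET1K_V1")
for param in model.parameters():
    param.requires_grad = False  # freeze all feature-extractor layers

# Replace the head with a new classifier trained from scratch;
# the new layer's parameters are trainable by default.
num_target_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

# Only the new head's parameters are passed to the optimizer.
optimizer = optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()
# ...standard training loop over the target data set goes here...
```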
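
The five-bin averaging used to de-noise the small-data plots can likewise be reproduced in a few lines of NumPy; `binned_accuracies` below is a hypothetical helper of ours, not part of the paper.

```python
import numpy as np

def binned_accuracies(scores, accuracies, num_bins=5):
    """Average test accuracy per equal-width bin of transferability scores."""
    scores = np.asarray(scores, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    edges = np.linspace(scores.min(), scores.max(), num_bins + 1)
    # Assign each score to a bin; clip so the maximum falls in the last bin.
    bins = np.clip(np.digitize(scores, edges) - 1, 0, num_bins - 1)
    # Empty bins produce NaN, which is easy to spot when plotting.
    return np.array([accuracies[bins == b].mean() for b in range(num_bins)])
```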
Conclusion
  • For all the experimental settings in Sec. 5.1, 5.2, 5.3, and 5.4 above, the authors compute the NCE score for each transfer from a source model to a target task using the method described in Sec. 3.
  • When compared with NCE, LEEP scores have equal or better correlations with transfer performance in all but two cases of the fine-tuning method (the correlation computation is sketched after this list).
  • LEEP scores can be used to efficiently select highly transferable pairs of source and target tasks, yielding high transfer accuracy and good convergence speeds.
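
The correlation analysis behind these comparisons is a standard Pearson test between transferability scores and transfer accuracies. The sketch below uses synthetic stand-in data, since the paper's raw per-task numbers are not part of this page.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Synthetic stand-ins: one LEEP score and one transfer accuracy per task.
leep_scores = rng.uniform(-3.0, -0.5, size=200)
test_accuracies = 0.95 + 0.10 * leep_scores + rng.normal(0.0, 0.03, size=200)

r, p = pearsonr(leep_scores, test_accuracies)
print(f"Pearson r = {r:.3f}, p = {p:.1e}")  # significant if p < 0.05
```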
Tables
  • Table 1: Comparison of Pearson correlation coefficients of LEEP, NCE (Tran et al., 2019), and H scores (Bao et al., 2019). Correlations are computed with respect to test accuracies (or F1 scores) of three (meta-)transfer learning algorithms in various experimental settings. Correlations marked with asterisks (*) are not statistically significant (p > 0.05); the rest are statistically significant with p < 0.001.
Related work
  • Our work is related to several research areas in machine learning and computer vision, including transfer learning (Weiss et al., 2016; Yang et al., 2020), meta-transfer learning (Wei et al., 2018; Sun et al., 2019; Requeima et al., 2019), task space modeling (Zamir et al., 2018; Achille et al., 2019), and domain adaptation (Sun et al., 2016; Azizzadenesheli et al., 2019). Below, we discuss previous work that is most closely related to ours.

    Transfer learning. Our paper addresses the problem of predicting the performance of transfer learning algorithms between two classification tasks, without actually executing these algorithms. This problem is also called transferability estimation between classification tasks (Bao et al., 2019; Tran et al., 2019). Early theoretical work in transfer and multi-task learning studied the relatedness between tasks and proposed several distances between tasks, including F-relatedness (Ben-David & Schuller, 2003), the A-distance (Kifer et al., 2004; Ben-David et al., 2007), and the discrepancy distance (Mansour et al., 2009). Although useful for theoretical analysis, these approaches are unsuited to measuring transferability in practice: they are hard to compute, and they are symmetric, whereas transferability measures should be non-symmetric, since transferring from one task to another (e.g., from a hard task to an easy one) differs from transferring in the reverse direction (e.g., from the easy task to the hard one).
References
  • Achille, A., Lam, M., Tewari, R., Ravichandran, A., Maji, S., Fowlkes, C. C., Soatto, S., and Perona, P. Task2Vec: Task embedding for meta-learning. In International Conference on Computer Vision, pp. 6430–6439, 2019.
  • Agrawal, P., Girshick, R., and Malik, J. Analyzing the performance of multilayer neural networks for object recognition. In European Conference on Computer Vision, pp. 329–344, 2014.
  • Al-Stouhi, S. and Reddy, C. K. Transfer learning for class imbalance problems with inadequate data. Knowledge and Information Systems, 48(1):201–228, 2016.
  • Ammar, H. B., Eaton, E., Taylor, M. E., Mocanu, D. C., Driessens, K., Weiss, G., and Tuyls, K. An automated measure of MDP similarity for transfer in reinforcement learning. In AAAI Conference on Artificial Intelligence Workshops, 2014.
  • Azizzadenesheli, K., Liu, A., Yang, F., and Anandkumar, A. Regularized learning for domain adaptation under label shifts. In International Conference on Learning Representations, 2019.
  • Bao, Y., Li, Y., Huang, S.-L., Zhang, L., Zheng, L., Zamir, A., and Guibas, L. An information-theoretic approach to transferability in task transfer learning. In IEEE International Conference on Image Processing, pp. 2309–2313, 2019.
  • Ben-David, S. and Schuller, R. Exploiting task relatedness for multiple task learning. In Conference on Learning Theory, 2003.
  • Ben-David, S., Blitzer, J., Crammer, K., and Pereira, F. Analysis of representations for domain adaptation. In Advances in Neural Information Processing Systems, pp. 137–144, 2007.
  • Bhattacharjee, B., Codella, N., Kender, J. R., Huo, S., Watson, P., Glass, M. R., Dube, P., Hill, M., and Belgodere, B. P2L: Predicting transfer learning for images and semantic relations. arXiv:1908.07630, 2019.
  • Bottou, L. Stochastic gradient learning in neural networks. Proceedings of Neuro-Nîmes, 91(8):12, 1991.
  • Chatfield, K., Simonyan, K., Vedaldi, A., and Zisserman, A. Return of the devil in the details: Delving deep into convolutional nets. In British Machine Vision Conference, 2014.
  • Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., and Zhang, Z. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv:1512.01274, 2015.
  • Dhillon, G. S., Chaudhari, P., Ravichandran, A., and Soatto, S. A baseline for few-shot image classification. In International Conference on Learning Representations, 2020.
  • Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., and Darrell, T. DeCAF: A deep convolutional activation feature for generic visual recognition. In International Conference on Machine Learning, pp. 647–655, 2014.
  • Eaton, E., Lane, T., et al. Modeling transfer relationships between learning tasks for improved inductive transfer. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp. 317–332, 2008.
  • Edwards, H. and Storkey, A. Towards a neural statistician. In International Conference on Learning Representations, 2017.
  • Gebelein, H. Das statistische Problem der Korrelation als Variations- und Eigenwertproblem und sein Zusammenhang mit der Ausgleichsrechnung. ZAMM - Zeitschrift für Angewandte Mathematik und Mechanik, 21(6):364–379, 1941.
  • Girshick, R., Donahue, J., Darrell, T., and Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587, 2014.
  • Guo, J., He, H., He, T., Lausen, L., Li, M., Lin, H., Shi, X., Wang, C., Xie, J., Zha, S., et al. GluonCV and GluonNLP: Deep learning in computer vision and natural language processing. arXiv:1907.04433, 2019.
  • He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
  • Hirschfeld, H. O. A connection between correlation and contingency. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 31, pp. 520–524, 1935.
  • Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861, 2017.
  • Hu, J., Shen, L., and Sun, G. Squeeze-and-excitation networks. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141, 2018.
  • Jomaa, H. S., Grabocka, J., and Schmidt-Thieme, L. Dataset2Vec: Learning dataset meta-features. arXiv:1905.11063, 2019.
  • Kifer, D., Ben-David, S., and Gehrke, J. Detecting change in data streams. In International Conference on Very Large Data Bases, volume 4, pp. 180–191, 2004.
  • Kingma, D. P. and Welling, M. Stochastic gradient VB and the variational auto-encoder. In International Conference on Learning Representations, volume 19, 2014.
  • Kornblith, S., Shlens, J., and Le, Q. V. Do better ImageNet models transfer better? In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2661–2671, 2019.
  • Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Master’s thesis, University of Toronto, 2009.
  • Mansour, Y., Mohri, M., and Rostamizadeh, A. Domain adaptation: Learning bounds and algorithms. In Annual Conference on Learning Theory, 2009.
  • Misra, I., Shrivastava, A., Gupta, A., and Hebert, M. Cross-stitch networks for multi-task learning. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 3994–4003, 2016.
  • Nguyen, C. V., Achille, A., Lam, M., Hassner, T., Mahadevan, V., and Soatto, S. Meta-analysis of continual learning. In Workshop on Meta-Learning @ NeurIPS, 2019.
  • Oquab, M., Bottou, L., Laptev, I., and Sivic, J. Learning and transferring mid-level image representations using convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1717–1724, 2014.
  • Perrone, V., Jenatton, R., Seeger, M. W., and Archambeau, C. Scalable hyperparameter transfer learning. In Advances in Neural Information Processing Systems, pp. 6845–6855, 2018.
  • Razavian, A. S., Azizpour, H., Sullivan, J., and Carlsson, S. CNN features off-the-shelf: An astounding baseline for recognition. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 806–813, 2014.
  • Redmon, J. and Farhadi, A. YOLOv3: An incremental improvement. arXiv:1804.02767, 2018.
  • Rényi, A. On measures of dependence. Acta Mathematica Hungarica, 10(3-4):441–451, 1959.
  • Requeima, J., Gordon, J., Bronskill, J., Nowozin, S., and Turner, R. E. Fast and flexible multi-task classification using conditional neural adaptive processes. In Advances in Neural Information Processing Systems, pp. 7957–7968, 2019.
  • Robbins, H. and Monro, S. A stochastic approximation method. The Annals of Mathematical Statistics, pp. 400–407, 1951.
  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Li, F.-F. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
  • Sinapov, J., Narvekar, S., Leonetti, M., and Stone, P. Learning inter-task transferability in the absence of target task samples. In International Conference on Autonomous Agents and Multiagent Systems, pp. 725–733, 2015.
  • Standley, T., Zamir, A. R., Chen, D., Guibas, L., Malik, J., and Savarese, S. Which tasks should be learned together in multi-task learning? arXiv:1905.07553, 2019.
  • Sun, B., Feng, J., and Saenko, K. Return of frustratingly easy domain adaptation. In AAAI Conference on Artificial Intelligence, 2016.
  • Sun, Q., Liu, Y., Chua, T.-S., and Schiele, B. Meta-transfer learning for few-shot learning. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 403–412, 2019.
  • Tran, A. T., Nguyen, C. V., and Hassner, T. Transferability and hardness of supervised classification tasks. In International Conference on Computer Vision, pp. 1395–1405, 2019.
  • Triantafillou, E., Zhu, T., Dumoulin, V., Lamblin, P., Xu, K., Goroshin, R., Gelada, C., Swersky, K., Manzagol, P.-A., and Larochelle, H. Meta-Dataset: A dataset of datasets for learning to learn from few examples. In International Conference on Learning Representations, 2020.
  • Wang, J., Chen, Y., Hao, S., Feng, W., and Shen, Z. Balanced distribution adaptation for transfer learning. In IEEE International Conference on Data Mining, pp. 1129–1134, 2017.
  • Wei, Y., Zhang, Y., Huang, J., and Yang, Q. Transfer learning via learning to transfer. In International Conference on Machine Learning, pp. 5085–5094, 2018.
  • Weiss, K., Khoshgoftaar, T. M., and Wang, D. A survey of transfer learning. Journal of Big Data, 3(1):9, 2016.
  • Whatmough, P. N., Zhou, C., Hansen, P., Venkataramanaiah, S. K., Seo, J.-s., and Mattina, M. FixyNN: Efficient hardware for mobile computer vision via transfer learning. In Conference on Systems and Machine Learning, 2019.
  • Xiao, H., Rasul, K., and Vollgraf, R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747, 2017.
  • Yang, Q., Zhang, Y., Dai, W., and Pan, S. J. Transfer Learning. Cambridge University Press, 2020.
  • Yosinski, J., Clune, J., Bengio, Y., and Lipson, H. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pp. 3320–3328, 2014.
  • Zamir, A. R., Sax, A., Shen, W., Guibas, L. J., Malik, J., and Savarese, S. Taskonomy: Disentangling task transfer learning. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 3712–3722, 2018.
  • Zeiler, M. D. and Fergus, R. Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pp. 818–833, 2014.
  • Zenke, F., Poole, B., and Ganguli, S. Continual learning through synaptic intelligence. In International Conference on Machine Learning, pp. 3987–3995, 2017.