Revisiting Training Strategies and Generalization Performance in Deep Metric Learning

ICML 2020, pp. 8242–8252

Cited by: 44

Abstract

Deep Metric Learning (DML) is arguably one of the most influential lines of research for learning visual similarities, with many approaches proposed every year. Although the field benefits from this rapid progress, the divergence in training protocols, architectures, and parameter choices makes an unbiased comparison difficult. To provide …

Introduction
  • Learning visual similarity is important for a wide range of vision tasks, such as image clustering (Bouchacourt et al., 2018), face detection (Schroff et al., 2015), or image retrieval (Wu et al., 2017).
  • One of the most widely adopted classes of algorithms for this task is Deep Metric Learning (DML), which leverages deep neural networks to learn a distance-preserving embedding (a minimal sketch of this setup follows this list).
  • Undisclosed technical details (such as data augmentations or training regularization) pose a challenge to the reproducibility of such methods, which is of great concern in the machine learning community in general (Bouthillier et al., 2019).
  • The authors study the generalization capabilities of DML models by analyzing the structure of their learned embedding spaces.
  • While the authors are not able to reliably link typically targeted concepts such as large inter-class margins (Liu et al., 2017; Deng et al., 2018) to generalization performance, they uncover a strong correlation between generalization and the compression of the learned embedding space.
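To make this setup concrete, here is a minimal sketch of a typical DML pipeline as described above: an ImageNet-pretrained backbone, a randomly initialized linear embedding layer, and L2 normalization (cf. Table 1 below), with Euclidean distances between embeddings serving as the learned similarity. The ResNet50 backbone and the 128-dimensional embedding are illustrative assumptions rather than specifics from this page.

```python
# Minimal sketch of a standard DML embedding network (assumptions: ResNet50
# backbone and a 128-dim embedding; this page does not fix these choices).
import torch
import torch.nn as nn
import torchvision.models as models

class EmbeddingNet(nn.Module):
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        backbone = models.resnet50(pretrained=True)  # ImageNet pretraining, as in Table 1
        backbone.fc = nn.Identity()                  # drop the classification head
        self.backbone = backbone
        self.embed = nn.Linear(2048, embed_dim)      # randomly initialized linear layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.embed(self.backbone(x))
        return nn.functional.normalize(z, dim=-1)    # unit-norm embeddings

# Semantically similar images should map to nearby points in embedding space:
net = EmbeddingNet()
images = torch.randn(4, 3, 224, 224)
emb = net(images)              # (4, 128), each row has unit L2 norm
dists = torch.cdist(emb, emb)  # pairwise Euclidean distances
```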
Highlights
  • Learning visual similarity is important for a wide range of vision tasks, such as image clustering (Bouchacourt et al., 2018), face detection (Schroff et al., 2015), or image retrieval (Wu et al., 2017)
  • In the following, we present batch mining strategies operating on both labels and the data itself: label samplers, sampling heuristics whose selection rules use label information only, and embedded samplers, which operate on the data embeddings themselves to create batches B with diverse data statistics (a minimal label-sampler sketch follows this list)
  • We provide a comprehensive study of important training components and objectives for Deep Metric Learning to contribute to improved comparability of recent and future approaches
  • We study generalization performance in Deep Metric Learning and uncover a strong correlation to the level of compression of the learned data representation
  • Our findings reveal that highly compressed representations disregard helpful features for capturing data characteristics that transfer to unknown test distributions
  • We propose a simple technique for ranking-based methods to regularize the compression of the learned embedding space, which results in boosted performance across all benchmark datasets
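As a concrete illustration of a label sampler, the sketch below builds batches from P classes with K samples each, a common label-based heuristic that guarantees in-batch positives for ranking losses. The specific rule and the P=8, K=4 defaults are assumptions for illustration; the page does not detail which sampling rules the study evaluates.

```python
# Sketch of a label sampler: build batches of P classes with K samples each.
# This "samples-per-class" rule is one common heuristic, not necessarily one
# of the exact samplers studied in the paper.
import random
from collections import defaultdict

def samples_per_class_batches(labels, p_classes=8, k_samples=4, seed=0):
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    # Keep only classes with enough samples; leftover classes are dropped.
    classes = [c for c, idxs in by_class.items() if len(idxs) >= k_samples]
    rng.shuffle(classes)
    for start in range(0, len(classes) - p_classes + 1, p_classes):
        batch = []
        for c in classes[start:start + p_classes]:
            batch.extend(rng.sample(by_class[c], k_samples))
        yield batch  # indices of a batch with guaranteed positives per class

# Usage: each batch contains 8 classes x 4 samples = 32 images, so every
# anchor has at least 3 in-batch positives for ranking-based losses.
labels = [i % 20 for i in range(400)]
first = next(samples_per_class_batches(labels))
assert len(first) == 32
```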
Results
  • The authors' analysis reveals that the batch sampling process affects DML training, with differences in mean performance of up to 1.5%.
Conclusion
  • The authors counteract the worrying trend of diverging training protocols in Deep Metric Learning (DML).
  • They provide a comprehensive study of important training components and objectives for DML to contribute to improved comparability of recent and future approaches
  • On this basis, the authors study generalization performance in DML and uncover a strong correlation to the level of compression of the learned data representation.
  • The authors' findings reveal that highly compressed representations disregard features that are helpful for capturing data characteristics that transfer to unknown test distributions
  • To this end, the authors propose a simple technique for ranking-based methods to regularize the compression of the learned embedding space, which results in boosted performance across all benchmark datasets (a hedged sketch of such a regularizer follows this list)
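The page does not spell out the mechanism of this regularization. One simple scheme consistent with the description, sketched below under that assumption, is to occasionally replace the negative in a ranking tuple with an additional positive, so the loss sometimes pushes same-class samples apart and class clusters are not compressed to points. The function name, the margin of 0.2, and the swap probability p_swap are illustrative choices, not values from the paper.

```python
# Hedged sketch: a triplet (ranking) loss where, with probability p_swap, the
# negative is replaced by a second positive. Occasionally "pushing apart"
# samples of the same class discourages the embedding from collapsing class
# clusters, i.e., it regularizes compression. Whether this matches the
# authors' exact regularizer is an assumption based only on this summary.
import torch
import torch.nn.functional as F

def regularized_triplet_loss(anchor, positive, extra_positive, negative,
                             margin=0.2, p_swap=0.3):
    # With probability p_swap, use the extra positive in place of the negative.
    if torch.rand(()) < p_swap:
        negative = extra_positive
    d_ap = F.pairwise_distance(anchor, positive)   # anchor-positive distances
    d_an = F.pairwise_distance(anchor, negative)   # anchor-negative distances
    return F.relu(d_ap - d_an + margin).mean()     # standard margin ranking loss

# Usage with unit-norm embeddings (batch of 32, 128-dim):
a, p, p2, n = (F.normalize(torch.randn(32, 128), dim=-1) for _ in range(4))
loss = regularized_triplet_loss(a, p, p2, n)
```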
Tables
  • Table 1: Recall performance of commonly used network architectures after ImageNet pretraining. The final linear layer is randomly initialized and normalized
  • Table 2: Comparison of Recall@1 and NMI performance for all objective functions evaluated in our study, averaged over 5 runs (a Recall@1 sketch follows this list). Each model is trained using the same training setting over 150 epochs for CUB and CARS, and 100 epochs for SOP. 'R-' denotes a model trained with the proposed regularization. Bold denotes the best results excluding regularization; bold blue marks the overall best results
  • Table 3: Comparison to state-of-the-art DML methods on SOP (Oh Song et al., 2016). Dim denotes the dimensionality of φθ
  • Table 4: Comparison of DML setups for CUB200-2011. We report all relevant performance metrics. Training is done over 150 epochs
  • Table 5: Comparison of DML setups for CARS196. We report all relevant performance metrics. Training is done over 150 epochs
  • Table 6: Comparison of DML setups for Stanford Online Products. We report all relevant performance metrics. Training is done over 100 epochs
  • Table 7: CUB200-2011: Comparison of batch-sampling methods for various loss functions and sampling methods
  • Table 8: CARS196: Comparison of batch-sampling methods for various loss functions and sampling methods
  • Table 9: SOP: Comparison of batch-sampling methods for various loss functions and sampling methods
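Since Tables 2 and 4–9 report Recall@1, below is a minimal sketch of the conventional Recall@1 computation used in DML retrieval evaluation: a query counts as correct if its nearest neighbor, excluding itself, shares its label. The evaluation code actually used in the paper is not shown on this page, so treat this as the standard convention rather than the authors' exact implementation.

```python
# Conventional Recall@1 for DML retrieval evaluation.
import torch

def recall_at_1(embeddings: torch.Tensor, labels: torch.Tensor) -> float:
    dists = torch.cdist(embeddings, embeddings)  # (N, N) pairwise distances
    dists.fill_diagonal_(float("inf"))           # exclude self-matches
    nn_idx = dists.argmin(dim=1)                 # index of nearest neighbor
    return (labels[nn_idx] == labels).float().mean().item()

# Usage: random embeddings over 10 classes give chance-level recall (~0.1).
emb = torch.nn.functional.normalize(torch.randn(500, 128), dim=-1)
labels = torch.randint(0, 10, (500,))
print(recall_at_1(emb, labels))
```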
References
  • Achille, A. and Soatto, S. Information dropout: Learning optimal representations through noisy computation, 2016.
  • Agarwal, P. K., Har-Peled, S., and Varadarajan, K. R. Geometric approximation via coresets. Combinatorial and Computational Geometry, 52:1–30, 2005.
  • Alemi, A. A., Fischer, I., Dillon, J. V., and Murphy, K. Deep variational information bottleneck, 2016.
  • Belghazi, M. I., Baratin, A., Rajeswar, S., Ozair, S., Bengio, Y., Courville, A., and Hjelm, R. D. MINE: Mutual information neural estimation, 2018.
  • Bellet, A. Supervised metric learning with generalization guarantees, 2013.
  • Bellet, A. and Habrard, A. Robustness and generalization for metric learning. Neurocomputing, 151:259–267, 2015. doi: 10.1016/j.neucom.2014.09.044.
  • Bouchacourt, D., Tomioka, R., and Nowozin, S. Multi-level variational autoencoder: Learning disentangled representations from grouped observations. In AAAI, 2018.
  • Bouthillier, X., Laurent, C., and Vincent, P. Unreproducible research is reproducible. In International Conference on Machine Learning, pp. 725–734, 2019.
  • Brock, A., Donahue, J., and Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. CoRR, abs/1809.11096, 2018. URL http://arxiv.org/abs/1809.11096.
  • Cakir, F., He, K., Xia, X., Kulis, B., and Sclaroff, S. Deep metric learning to rank. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • Chen, W., Chen, X., Zhang, J., and Huang, K. Beyond triplet loss: A deep quadruplet network for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
  • Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
  • Deng, J., Guo, J., Xue, N., and Zafeiriou, S. ArcFace: Additive angular margin loss for deep face recognition, 2018.
  • Ge, W. Deep metric learning with hierarchical triplet loss. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 269–285, 2018.
  • Goyal, A., Islam, R., Strouse, D., Ahmed, Z., Botvinick, M., Larochelle, H., Bengio, Y., and Levine, S. InfoBot: Transfer and exploration via the information bottleneck, 2019.
  • Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., and He, K. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
  • Hadsell, R., Chopra, S., and LeCun, Y. Dimensionality reduction by learning an invariant mapping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2006.
  • Harwood, B., Kumar, B., Carneiro, G., Reid, I., Drummond, T., et al. Smart mining for deep metric learning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2821–2829, 2017.
  • He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
  • Hermans, A., Beyer, L., and Leibe, B. In defense of the triplet loss for person re-identification, 2017.
  • Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium, 2017.
  • Hu, J., Lu, J., and Tan, Y. Discriminative deep metric learning for face verification in the wild. In IEEE Conference on Computer Vision and Pattern Recognition, 2014.
  • Huai, M., Xue, H., Miao, C., Yao, L., Su, L., Chen, C., and Zhang, A. Deep metric learning: The generalization analysis and an adaptive algorithm. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), pp. 2535–2541, 2019. doi: 10.24963/ijcai.2019/352.
  • Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, 2015.
  • Jacob, P., Picard, D., Histace, A., and Klein, E. Metric learning with HORDE: High-order regularizer for deep embeddings. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • Jegou, H., Douze, M., and Schmid, C. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117–128, 2011.
  • Jiang*, Y., Neyshabur*, B., Krishnan, D., Mobahi, H., and Bengio, S. Fantastic generalization measures and where to find them. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=SJgIPJBFvH.
  • Johnson, T. B. and Guestrin, C. Training deep models faster with robust, approximate importance sampling. In Advances in Neural Information Processing Systems 31, pp. 7265–7275, 2018.
  • Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., and Tang, P. T. P. On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836, 2016.
  • Kim, W., Goyal, B., Chawla, K., Lee, J., and Kwon, K. Attention-based ensemble for deep metric learning. In Proceedings of the European Conference on Computer Vision (ECCV), 2018.
  • Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization, 2015.
  • Krause, J., Stark, M., Deng, J., and Fei-Fei, L. 3D object representations for fine-grained categorization. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 554–561, 2013.
  • Krogh, A. and Hertz, J. A. A simple weight decay can improve generalization. In Advances in Neural Information Processing Systems, 1992.
  • Lin, X., Duan, Y., Dong, Q., Lu, J., and Zhou, J. Deep variational metric learning. In European Conference on Computer Vision (ECCV), 2018.
  • Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., and Song, L. SphereFace: Deep hypersphere embedding for face recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • Lloyd, S. P. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28:129–136, 1982.
  • Manning, C., Raghavan, P., and Schutze, H. Introduction to Information Retrieval. Natural Language Engineering, 16(1):100–103, 2010.
  • Milbich, T., Roth, K., Bharadhwaj, H., Sinha, S., Bengio, Y., Ommer, B., and Cohen, J. P. DiVA: Diverse visual feature aggregation for deep metric learning, 2020a.
  • Milbich, T., Roth, K., Brattoli, B., and Ommer, B. Sharing matters for generalization in deep metric learning, 2020b.
  • Mirzasoleiman, B., Bilmes, J., and Leskovec, J. Coresets for accelerating incremental gradient methods, 2020. URL https://openreview.net/forum?id=SygRikHtvS.
  • Misra, I. and van der Maaten, L. Self-supervised learning of pretext-invariant representations, 2019.
  • Movshovitz-Attias, Y., Toshev, A., Leung, T. K., Ioffe, S., and Singh, S. No fuss distance metric learning using proxies. In Proceedings of the IEEE International Conference on Computer Vision, pp. 360–368, 2017.
  • Musgrave, K., Belongie, S., and Lim, S.-N. A metric learning reality check, 2020.
  • Oh Song, H., Xiang, Y., Jegelka, S., and Savarese, S. Deep metric learning via lifted structured feature embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4004–4012, 2016.
  • Opitz, M., Waltner, G., Possegger, H., and Bischof, H. Deep metric learning with BIER: Boosting independent embeddings robustly. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
  • Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. Automatic differentiation in PyTorch. In NIPS Workshops, 2017.
  • Qian, Q., Shang, L., Sun, B., Hu, J., Li, H., and Jin, R. SoftTriple loss: Deep metric learning without triplet sampling, 2019.
  • Roth, K. and Brattoli, B. Deep-Metric-Learning-Baselines. https://github.com/Confusezius/Deep-Metric-Learning-Baselines, 2019.
  • Roth, K., Brattoli, B., and Ommer, B. MIC: Mining interclass characteristics for improved metric learning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 8000–8009, 2019.
  • Roth, K., Milbich, T., and Ommer, B. PADS: Policy-adapted sampling for visual similarity learning, 2020.
  • Rubner, Y., Tomasi, C., and Guibas, L. J. The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99–121, 2000. doi: 10.1023/A:1026543900054.
  • Sanakoyeu, A., Tschernezki, V., Buchler, U., and Ommer, B. Divide and conquer the embedding space for metric learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • Schroff, F., Kalenichenko, D., and Philbin, J. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823, 2015.
  • Shwartz-Ziv, R. and Tishby, N. Opening the black box of deep neural networks via information, 2017.
  • Sinha, S., Zhang, H., Goyal, A., Bengio, Y., Larochelle, H., and Odena, A. Small-GAN: Speeding up GAN training using core-sets. arXiv preprint arXiv:1910.13540, 2019.
  • Smith, S. L., Kindermans, P.-J., Ying, C., and Le, Q. V. Don't decay the learning rate, increase the batch size. arXiv preprint arXiv:1711.00489, 2017.
  • Sohn, K. Improved deep metric learning with multi-class n-pair loss objective. In Advances in Neural Information Processing Systems, pp. 1857–1865, 2016.
  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.
  • Tishby, N. and Zaslavsky, N. Deep learning and the information bottleneck principle, 2015.
  • Ustinova, E. and Lempitsky, V. Learning deep embeddings with histogram loss. In Advances in Neural Information Processing Systems, 2016.
  • Verma, V., Lamb, A., Beckham, C., Najafi, A., Mitliagkas, I., Courville, A., Lopez-Paz, D., and Bengio, Y. Manifold mixup: Better representations by interpolating hidden states, 2018.
  • Wah, C., Branson, S., Welinder, P., Perona, P., and Belongie, S. The Caltech-UCSD Birds-200-2011 dataset, 2011.
  • Wang, J., Zhou, F., Wen, S., Liu, X., and Lin, Y. Deep metric learning with angular loss. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2593–2601, 2017.
  • Wang, X., Han, X., Huang, W., Dong, D., and Scott, M. R. Multi-similarity loss with general pair weighting for deep metric learning, 2019a.
  • Wang, X., Hua, Y., Kodirov, E., Hu, G., Garnier, R., and Robertson, N. M. Ranked list loss for deep metric learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019b.
  • Wu, C.-Y., Manmatha, R., Smola, A. J., and Krahenbuhl, P. Sampling matters in deep embedding learning. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2840–2848, 2017.
  • Xuan, H., Souvenir, R., and Pless, R. Deep randomized ensembles for metric learning. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 723–734, 2018.
  • Yu, B., Liu, T., Gong, M., Ding, C., and Tao, D. Correcting the triplet selection bias for triplet loss. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 71–87, 2018.
  • Yuan, T., Deng, W., Tang, J., Tang, Y., and Chen, B. Signal-to-noise ratio: A robust distance metric for deep metric learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • Zhai, A. and Wu, H.-Y. Classification is a strong baseline for deep metric learning, 2018.
  • Zheng, W., Chen, Z., Lu, J., and Zhou, J. Hardness-aware deep metric learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
Author
Björn Ommer
Joseph Paul Cohen