Hyperbolic Visual Embedding Learning for Zero-Shot Recognition

CVPR, pp. 9270-9278, 2020.

DOI: https://doi.org/10.1109/CVPR42600.2020.00929

Abstract:

This paper proposes a Hyperbolic Visual Embedding Learning Network for zero-shot recognition. The network learns image embeddings in hyperbolic space, which is capable of preserving the hierarchical structure of semantic classes in low dimensions. Compared with existing zero-shot learning approaches, the network is more robust because th…
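For intuition, the geodesic distance in the Poincaré ball (a standard model of hyperbolic space; the paper's exact formulation may differ) can be sketched as follows. Distances blow up near the boundary of the ball, which is what lets a tree-like class hierarchy embed with low distortion in few dimensions.

```python
import numpy as np

def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball
    (curvature -1): d(x, y) = arccosh(1 + 2|x-y|^2 / ((1-|x|^2)(1-|y|^2)))."""
    sq_diff = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return np.arccosh(1.0 + 2.0 * sq_diff / (denom + eps))

origin = np.zeros(2)
near_boundary = np.array([0.99, 0.0])
# A point at Euclidean norm 0.99 is already ~5.29 hyperbolic units from the
# origin; the distance diverges as the point approaches the boundary.
print(poincare_distance(origin, near_boundary))
```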

Introduction
  • Real-world image recognition applications are usually faced with thousands of object classes.
  • Zero-Shot Learning (ZSL) [34, 35, 9], which aims to recognize novel categories that are unseen during the training phase, has become an important research problem that merits study.
  • Zero-shot learning is generally regarded as a difficult problem.
  • As reported in [17], for generalized large-scale zero-shot image recognition, the best performance attained on the ImageNet dataset is less than 10% in terms of top-5 accuracy, which is far from the requirements of real-world applications.
Highlights
  • Real-world image recognition applications are usually faced with thousands of object classes
  • Zero-Shot Learning (ZSL) [34, 35, 9], which aims to recognize novel categories that are unseen during the training phase, has become an important research problem that merits study
  • We propose the Hyperbolic Visual Embedding Learning Network that learns hierarchical-aware image embeddings in hyperbolic space for Zero-Shot Learning
  • As far as we know, this is the first attempt to introduce non-Euclidean space for the zero-shot learning problem. We conduct both empirical and analytic studies to demonstrate that introducing hyperbolic space into the Zero-Shot Learning problem results in a model that produces more robust predictions
  • We evaluate our model on these three datasets for both the Zero-Shot Learning (ZSL) setting and the Generalized Zero-Shot Learning (GZSL) setting
Results
  • As reported in [17], for generalized large-scale zero-shot image recognition, the best performance attained on the ImageNet dataset is less than 10% in terms of top-5 accuracy, which is far from the requirements of real-world applications.
Conclusion
  • The authors proposed the Hyperbolic Visual Embedding Learning Networks.
  • As far as the authors know, this is the first attempt to introduce non-Euclidean space to the ZSL problem.
  • The authors conducted both empirical and analytic studies to demonstrate that introducing hyperbolic space into the ZSL problem results in a more robust model.
  • The authors' framework outperforms existing baseline methods by a large margin
Tables
  • Table 1: Top-k accuracy of different models on the ImageNet dataset under hierarchical evaluation. The candidate classes are the categories in the “hops” test sets together with their parents. The baseline models are re-implemented by us. For all models, image features are extracted with ResNet-101
  • Table 2: Top-k accuracy of different methods in the ZSL setting
  • Table 3: Top-k accuracy of different models in the GZSL setting
  • Table 4: Effect of different hyperbolic label embeddings. Image features are extracted with ResNet-101. Testing is done on unseen categories
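The top-k accuracy reported throughout these tables can be sketched as follows (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(scores, axis=1)[:, -k:]       # indices of the k best classes per row
    hits = (top_k == labels[:, None]).any(axis=1)    # true label found among them?
    return hits.mean()

scores = np.array([[0.1, 0.5, 0.4],    # predicted class scores, one row per image
                   [0.7, 0.2, 0.1]])
labels = np.array([2, 1])              # ground-truth class indices
print(top_k_accuracy(scores, labels, k=1))  # 0.0 (neither top-1 guess is correct)
print(top_k_accuracy(scores, labels, k=2))  # 1.0 (both true labels are in the top 2)
```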
Related work
  • As image recognition systems have achieved near-human accuracy when training samples are ample [15], recent research focus has shifted to zero-shot image recognition [16, 38, 22, 8, 14, 12, 37], a challenging but more practical setting where recognition is performed on categories that were unseen during training.

    Early works on zero-shot learning mostly rely on semantic attributes, including both user-defined attributes [18, 19] and data-driven attributes [10, 23] that are automatically discovered from visual data. These attributes are then used as intermediate representations for knowledge transfer across classes, supporting zero-shot recognition of unseen classes. Recent works on zero-shot learning are mostly based on deep learning and can be grouped into two major paradigms. The first paradigm is based on semantic embeddings (implicit knowledge), which directly learn a mapping from visual space to semantic space [4, 5, 11, 9, 36, 29, 31], represented by semantic vectors such as word vectors. For example, Socher et al. [32] proposed to learn a linear mapping to align the image embeddings and the label embeddings learned from two different neural networks. Motivated by this work, Frome et al. [9] proposed the DeViSE model, which trains this mapping using a ConvNet and a transformation layer and showed that this paradigm can scale to predictions over tens of thousands of unseen image labels. Instead of training a ConvNet to match image features and category embeddings, Norouzi et al. [27] proposed to map image features into the semantic embedding space via a convex combination of seen-class embeddings, which requires no additional training.
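As an illustration of the convex-combination idea in [27] (ConSE), a prediction for an image can be sketched as below; the function name, the `top_t` parameter, and the renormalisation step are assumptions for the sketch, not the paper's exact recipe:

```python
import numpy as np

def convex_combination_embedding(seen_probs, seen_word_vectors, top_t=10):
    """Map an image into semantic space as the convex combination of the
    word vectors of its top-T most probable *seen* classes."""
    top_t = min(top_t, len(seen_probs))
    idx = np.argsort(seen_probs)[-top_t:]                # top-T seen classes
    weights = seen_probs[idx] / seen_probs[idx].sum()    # renormalise to sum to 1
    return weights @ seen_word_vectors[idx]              # point in word-vector space

# The unseen class whose word vector is nearest (e.g. by cosine similarity)
# to this point is then predicted -- no extra training is needed.
probs = np.array([0.1, 0.6, 0.3])     # classifier output over 3 seen classes
word_vecs = np.eye(3)                 # toy word vectors, one per seen class
print(convex_combination_embedding(probs, word_vecs, top_t=2))
```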
Funding
  • This research is part of NExT++ research, which is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its IRC@SG Funding Initiative
References
  • [1] Gary Becigneul and Octavian-Eugen Ganea. Riemannian adaptive optimization methods. In ICLR, 2019.
  • [2] Silvere Bonnabel. Stochastic gradient descent on Riemannian manifolds. IEEE Transactions on Automatic Control, 58(9):2217–2229, 2013.
  • [3] Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, and Fei Sha. Synthesized classifiers for zero-shot learning. In CVPR, 2016.
  • [4] Soravit Changpinyo, Wei-Lun Chao, and Fei Sha. Predicting visual exemplars of unseen classes for zero-shot learning. In ICCV, 2017.
  • [5] Long Chen, Hanwang Zhang, Jun Xiao, Wei Liu, and Shih-Fu Chang. Zero-shot visual recognition using semantics-preserving adversarial embedding network. In CVPR, 2018.
  • [6] Jia Deng, Nan Ding, Yangqing Jia, Andrea Frome, Kevin Murphy, Samy Bengio, Yuan Li, Hartmut Neven, and Hartwig Adam. Large-scale object classification using label relation graphs. In ECCV, 2014.
  • [7] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • [8] Zhengming Ding and Hongfu Liu. Marginalized latent semantic encoder for zero-shot learning. In CVPR, 2019.
  • [9] Andrea Frome, Gregory S. Corrado, Jonathon Shlens, Samy Bengio, Jeffrey Dean, Marc'Aurelio Ranzato, and Tomas Mikolov. DeViSE: A deep visual-semantic embedding model. In NIPS, 2013.
  • [10] Yanwei Fu, Timothy M. Hospedales, Tao Xiang, and Shaogang Gong. Attribute learning for understanding unstructured social activity. In ECCV, 2012.
  • [11] Yanwei Fu and Leonid Sigal. Semi-supervised vocabulary-informed learning. In CVPR, 2016.
  • [12] Yanwei Fu, Xiaomei Wang, Hanze Dong, Yu-Gang Jiang, Meng Wang, Xiangyang Xue, and Leonid Sigal. Vocabulary-informed zero-shot and open-set learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
  • [13] Octavian-Eugen Ganea, Gary Becigneul, and Thomas Hofmann. Hyperbolic entailment cones for learning hierarchical embeddings. In ICML, 2018.
  • [14] Tristan Hascoet, Yasuo Ariki, and Tetsuya Takiguchi. On zero-shot recognition of generic objects. In CVPR, 2019.
  • [15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [16] He Huang, Changhu Wang, Philip S. Yu, and Chang-Dong Wang. Generative dual adversarial network for generalized zero-shot learning. In CVPR, 2019.
  • [17] Michael Kampffmeyer, Yinbo Chen, Xiaodan Liang, Hao Wang, Yujia Zhang, and Eric P. Xing. Rethinking knowledge graph propagation for zero-shot learning. In CVPR, 2019.
  • [18] Christoph H. Lampert, Hannes Nickisch, and Stefan Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In CVPR, 2009.
  • [19] Christoph H. Lampert, Hannes Nickisch, and Stefan Harmeling. Attribute-based classification for zero-shot visual object categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(3):453–465, 2014.
  • [20] Omer Levy and Yoav Goldberg. Linguistic regularities in sparse and explicit word representations. In CoNLL, 2014.
  • [21] Omer Levy, Yoav Goldberg, and Ido Dagan. Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225, 2015.
  • [22] Jin Li, Xuguang Lan, Yang Liu, Le Wang, and Nanning Zheng. Compressing unknown images with product quantizer for efficient zero-shot classification. In CVPR, 2019.
  • [23] Jingen Liu, Benjamin Kuipers, and Silvio Savarese. Recognizing human actions by attributes. In CVPR, 2011.
  • [24] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
  • [25] George A. Miller. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41, 1995.
  • [26] Maximilian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. In NIPS, 2017.
  • [27] Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg Corrado, and Jeffrey Dean. Zero-shot learning by convex combination of semantic embeddings. In ICLR, 2014.
  • [28] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In EMNLP, 2014.
  • [29] Bernardino Romera-Paredes and Philip Torr. An embarrassingly simple approach to zero-shot learning. In ICML, 2015.
  • [30] Ruslan Salakhutdinov, Antonio Torralba, and Joshua B. Tenenbaum. Learning to share visual appearance for multiclass object detection. In CVPR, 2011.
  • [31] Richard Socher, Milind Ganjoo, Christopher D. Manning, and Andrew Y. Ng. Zero-shot learning through cross-modal transfer. In NIPS, 2013.
  • [32] Richard Socher, Milind Ganjoo, Christopher D. Manning, and Andrew Y. Ng. Zero-shot learning through cross-modal transfer. In NIPS, 2013.
  • [33] Alexandru Tifrea, Gary Becigneul, and Octavian-Eugen Ganea. Poincaré GloVe: Hyperbolic word embeddings. In ICLR, 2019.
  • [34] Xiaolong Wang, Yufei Ye, and Abhinav Gupta. Zero-shot recognition via semantic embeddings and knowledge graphs. In CVPR, 2018.
  • [35] Yongqin Xian, Bernt Schiele, and Zeynep Akata. Zero-shot learning - the good, the bad and the ugly. In CVPR, 2017.
  • [36] Ziming Zhang and Venkatesh Saligrama. Zero-shot learning via semantic similarity embedding. In ICCV, 2015.
  • [37] Bo Zhao, Xinwei Sun, Yanwei Fu, Yuan Yao, and Yizhou Wang. MSplit LBI: Realizing feature selection and dense estimation simultaneously in few-shot and zero-shot learning. arXiv preprint arXiv:1806.04360, 2018.
  • [38] Pengkai Zhu, Hanxiao Wang, and Venkatesh Saligrama. Generalized zero-shot recognition based on visually semantic embedding. In CVPR, 2019.