Dynamic Metric Learning - Towards a Scalable Metric Space To Accommodate Multiple Semantic Scales.

CVPR, pp. 5393-5402, 2021


Abstract

This paper introduces a new fundamental characteristic, i.e., the dynamic range, from real-world metric tools to deep visual recognition. In metrology, the dynamic range is a basic quality of a metric tool, indicating its flexibility to accommodate various scales. A larger dynamic range offers higher flexibility. In visual recognition, the...
Introduction
  • This paper considers deep metric learning for visual recognition and supplements it with an important concept from metrology, i.e., the dynamic range.
  • The dynamic range spans the largest and the smallest scale that a metric tool can provide.
  • It is a basic quality of a metric, indicating its flexibility to accommodate various scales.
  • The authors introduce the dynamic range to endow a single deep metric with flexibility across multiple semantic granularities.
  • It may reveal a new perspective for understanding the generalization ability of deep visual recognition.
Highlights
  • This paper considers deep metric learning for visual recognition and supplements it with an important concept from metrology, i.e., the dynamic range.
  • The dynamic range spans the largest and the smallest scale that a metric tool can provide. It is a basic quality of a metric, indicating its flexibility to accommodate various scales. We argue that such flexibility is important for deep metric learning, because different visual concepts correspond to different semantic scales.
  • We propose Dynamic Metric Learning by supplementing deep metric learning with a dynamic range.
  • Dynamic Metric Learning (DyML) sets up its evaluation protocol based on two popular protocols adopted in image retrieval, i.e., the Cumulated Matching Characteristics (CMC) [32] and the mean Average Precision (mAP) [36]. The CMC criterion indicates the probability that a true match exists in the top-K sorted list.
  • We introduce the concept of "dynamic range" from real-world metric tools to deep metrics for visual recognition.
  • We propose a new task named Dynamic Metric Learning, construct three datasets (DyML-Vehicle, DyML-Animal and DyML-Product), benchmark these datasets with popular metric learning methods, and design a novel method.
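The two retrieval criteria named above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's evaluation code; the function name `cmc_and_map` and the binary match-list input format are assumptions of this sketch:

```python
import numpy as np

def cmc_and_map(sorted_matches):
    """sorted_matches[i][k] == 1 iff the k-th ranked gallery item is a
    true match for query i (rankings already sorted by similarity)."""
    first_hits, aps = [], []
    for m in sorted_matches:
        m = np.asarray(m, dtype=float)
        hits = np.flatnonzero(m)
        if hits.size == 0:
            continue                      # skip queries with no true match
        first_hits.append(hits[0])        # 0-based rank of first true match
        # Average Precision: precision evaluated at each true-match position
        aps.append(float(np.mean(np.cumsum(m)[hits] / (hits + 1))))
    # CMC[k]: fraction of queries whose first match appears within top-(k+1)
    cmc = np.zeros(max(first_hits) + 1)
    for r in first_hits:
        cmc[r:] += 1
    return cmc / len(first_hits), float(np.mean(aps))
```

Here `cmc[0]` is the Rank-1 accuracy, and the second return value is the mAP over all queries that have at least one true match.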
Methods
  • The classification-based methods generally surpass the pair-based methods, indicating that the classification training manner usually achieves higher discriminative ability.
  • This is consistent with observations in many other metric learning tasks [30, 27, 34].
  • One potential reason is that Cosface and Circle Loss have more hyper-parameters (i.e., the scale and the margin).
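For reference, the scale and margin hyper-parameters mentioned above enter Cosface roughly as follows. This is a hedged NumPy sketch of the large-margin cosine loss; the helper names are mine and the authors' implementation may differ:

```python
import numpy as np

def cosface_logits(features, class_weights, labels, s=64.0, m=0.35):
    """Large-margin cosine loss logits: s is the scale, m the margin --
    the two hyper-parameters discussed above."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = f @ w.T                                # cosine similarities (B, C)
    cos[np.arange(len(labels)), labels] -= m     # margin on target class only
    return s * cos

def softmax_cross_entropy(logits, labels):
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

Training minimizes `softmax_cross_entropy(cosface_logits(...), labels)`; a larger margin `m` makes the target class harder to satisfy, which is one reason such losses need more tuning than pair-based ones.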
Results
  • To obtain an overall evaluation of the discriminative ability under all the semantic scales, DyML first evaluates the performance under each level and then averages the results across the three levels.
  • Using the level information of a query to fit the underlying scale is not allowed.
  • This prohibits learning several single-scaled metrics and manually choosing an appropriate one to fit each query.
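A minimal sketch of this averaging protocol, assuming per-level Rank-1/mAP scores have already been computed with one shared metric (the dictionary layout and function name are illustrative, not from the paper):

```python
import numpy as np

def dyml_overall(per_level_scores):
    """Average each criterion over the three semantic levels; the query's
    level information is never used to pick a metric."""
    levels = ('fine', 'middle', 'coarse')
    return {crit: float(np.mean([per_level_scores[lvl][crit] for lvl in levels]))
            for crit in ('rank1', 'mAP')}
```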
Conclusion
  • The authors introduce the concept of "dynamic range" from real-world metric tools to deep metrics for visual recognition.
  • It endows a single metric with scalability to accommodate multiple semantic scales.
  • The authors propose a new task named Dynamic Metric Learning, construct three datasets (DyML-Vehicle, DyML-Animal and DyML-Product), benchmark these datasets with popular metric learning methods, and design a novel method.
Tables
  • Table 1: Three datasets, i.e., DyML-Vehicle, DyML-Animal and DyML-Product, for Dynamic Metric Learning. We collect the raw images from publicly available datasets and supplement them with abundant multi-scale annotations. Each dataset has three hierarchical labels ranging from coarse to fine. Some levels contain several semantic scales. Under the middle and fine levels, there is no intersection between training and testing classes. The coarse level allows certain class intersections and yet insists on the open-set setting.
  • Table 2: Evaluation of six popular deep metric learning methods and the proposed Cross-Scale Learning (CSL) on DyML-Vehicle, DyML-Animal and DyML-Product. For CMC and mAP, we report the overall results averaged over three scales. The ASI is an overall evaluation protocol by its nature. Best results are in bold.
  • Table 3: Comparison between Cosface and the proposed CSL at three specified scales (besides the overall performance). We report...
  • Table 4: Comparison between different training manners for CSL.
Related Work
  • 2.1. Deep Metric Learning.

    Deep metric learning (DML) plays a crucial role in a variety of computer vision applications, e.g., face recognition [30, 22, 5, 27, 12, 3], person re-identification [29, 37, 25, 23, 24], vehicle re-identification [15, 7, 38, 39] and product recognition [17, 1, 6]. Generally, these tasks aim to retrieve the images most similar to a query image.

    During recent years, there has been remarkable progress [30, 28, 5, 27, 34, 19, 12, 3, 22] in deep metric learning. These methods are usually divided into two types, i.e., pair-based methods and classification-based methods. Pair-based methods (e.g., Triplet loss [20], N-pair loss [21], Multi-Simi loss [33]) optimize the similarities between sample pairs in the deeply-embedded feature space. In contrast, classification-based methods learn the embedding by training a classification model on the training set, e.g., Cosface [31], ArcFace [5], NormSoftmax [35] and Proxy NCA [16]. Moreover, a very recent work, i.e., Circle Loss [22], considers these two learning manners from a unified perspective. It provides a general loss function compatible with both pair-based and classification-based learning.
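As a concrete example of the pair-based family, the Triplet loss [20] can be sketched as follows. This is a minimal NumPy version over pre-sampled triplets, not the authors' training code:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on the distance gap: the positive must sit closer to the
    anchor than the negative by at least `margin`."""
    d_ap = np.linalg.norm(anchor - positive, axis=1)   # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative, axis=1)   # anchor-negative distance
    return float(np.maximum(d_ap - d_an + margin, 0.0).mean())
```

Once the negative is farther than the positive by the margin, the triplet contributes zero loss, so hard-example mining matters in practice.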
Funding
  • This research was supported by China's "scientific and technological innovation 2030 - major projects" (No. 2020AAA0104400).
Study Subjects and Analysis
datasets: 3
As a minor contribution, we propose Cross-Scale Learning (CSL) to alleviate such conflict. We show that CSL consistently improves the baseline on all the three datasets. The datasets and the code will be publicly available at https://github.com/SupetZYK/DynamicMetricLearning

observations: 3
For evaluation with CMC and mAP, we average the performance under all the scales and only report the overall performance. The results are reported in Table 2, from which we draw three observations.

First, DyML is very challenging. The overall performance is low under all three baselines.

datasets: 3
In other words, a metric for DyML should be discriminative under several semantic granularities across a wide range. To promote research on DyML, we construct three datasets based on vehicles, animals and products, respectively. All these datasets have three different semantic scales, i.e., fine, middle and coarse.

datasets: 3
In contrast to canonical metric learning for visual recognition, DyML desires discriminative ability across multiple semantic scales. • We construct three datasets for DyML, i.e., DyML-Vehicle, DyML-Animal and DyML-Product. All these datasets contain images under multiple semantic granularities for both training and testing.

datasets: 3
Overview. This paper provides three datasets for Dynamic Metric Learning research, i.e., DyML-Vehicle, DyML-Animal and DyML-Product. We collect all the source images from publicly available datasets and supplement them with some manual annotations to enrich the semantic scales

datasets: 3
In other words, under the coarse level, some testing classes exist in the training set, while other testing classes are novel. The quantitative descriptions of all three datasets are summarized in Table 1. DyML-Vehicle merges two vehicle re-ID datasets: PKU VehicleID [11] and VERI-Wild [14].

observations: 3
For a fair comparison, we use the same loss function, i.e., Cosface [31], under all settings. We draw three observations as follows: First, each single-scaled metric shows relatively high accuracy under its dedicated scale. For example, the "fine" metric (i.e., the metric learned with fine-level labels) achieves 54.7% Rank-1 accuracy under the fine-level testing.

datasets: 3
It endows a single metric with scalability to accommodate multiple semantic scales. Based on the dynamic range, we propose a new task named Dynamic Metric Learning, construct three datasets (DyML-Vehicle, DyML-Animal and DyML-Product), benchmark these datasets with popular metric learning methods, and design a novel method. This research was supported by China's "scientific and technological innovation 2030 - major projects" (No. 2020AAA0104400).

References
  • Yalong Bai, Yuxiang Chen, Wei Yu, Linfang Wang, and Wei Zhang. Products-10k: A large-scale product recognition dataset. arXiv preprint arXiv:2008.10545, 2020.
  • Thomas Berg, Jiongxin Liu, Seung Woo Lee, Michelle L Alexander, David W Jacobs, and Peter N Belhumeur. Birdsnap: Large-scale fine-grained visual categorization of birds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2011–2018, 2014.
  • Qiong Cao, Li Shen, Weidi Xie, Omkar M Parkhi, and Andrew Zisserman. Vggface2: A dataset for recognising faces across pose and age. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pages 67–74. IEEE, 2018.
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
  • Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
  • Eran Goldman, Roei Herzig, Aviv Eisenschtat, Jacob Goldberger, and Tal Hassner. Precise detection in densely packed scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5227–5236, 2019.
  • Bing He, Jia Li, Yifan Zhao, and Yonghong Tian. Part-regularized near-duplicate vehicle re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3997–4005, 2019.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Fei-Fei Li. Novel dataset for fine-grained image categorization: Stanford dogs. In Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC), volume 2, 2011.
  • Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3d object representations for fine-grained categorization. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 554–561, 2013.
  • Hongye Liu, Yonghong Tian, Yaowei Wang, Lu Pang, and Tiejun Huang. Deep relative distance learning: Tell the difference between similar vehicles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2167–2175, 2016.
  • Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 212–220, 2017.
  • Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, and Xiaoou Tang. Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1096–1104, 2016.
  • Y. Lou, Y. Bai, J. Liu, S. Wang, and L. Duan. Veri-wild: A large dataset and a new method for vehicle re-identification in the wild. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3230–3238, 2019.
  • Yihang Lou, Yan Bai, Jun Liu, Shiqi Wang, and Ling-Yu Duan. Embedding adversarial learning for vehicle re-identification. IEEE Transactions on Image Processing, 28(8):3794–3807, 2019.
  • Yair Movshovitz-Attias, Alexander Toshev, Thomas K Leung, Sergey Ioffe, and Saurabh Singh. No fuss distance metric learning using proxies. In Proceedings of the IEEE International Conference on Computer Vision, pages 360–368, 2017.
  • Jingtian Peng, Chang Xiao, Xun Wei, and Yifan Li. Rp2k: A large-scale retail product dataset for fine-grained image classification. arXiv preprint arXiv:2006.12634, 2020.
  • Qi Qian, Lei Shang, Baigui Sun, Juhua Hu, Hao Li, and Rong Jin. Softtriple loss: Deep metric learning without triplet sampling. In The IEEE International Conference on Computer Vision (ICCV), October 2019.
  • Rajeev Ranjan, Carlos D Castillo, and Rama Chellappa. L2-constrained softmax loss for discriminative face verification. arXiv preprint arXiv:1703.09507, 2017.
  • Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 815–823, 2015.
  • Kihyuk Sohn. Improved deep metric learning with multi-class n-pair loss objective. In NIPS, 2016.
  • Yifan Sun, Changmao Cheng, Yuhan Zhang, Chi Zhang, Liang Zheng, Zhongdao Wang, and Yichen Wei. Circle loss: A unified perspective of pair similarity optimization. arXiv preprint arXiv:2002.10857, 2020.
  • Yifan Sun, Qin Xu, Yali Li, Chi Zhang, Yikang Li, Shengjin Wang, and Jian Sun. Perceive where to focus: Learning visibility-aware part-level features for partial person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 393–402, 2019.
  • Y. Sun, L. Zheng, W. Deng, and S. Wang. Svdnet for pedestrian retrieval. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 3820–3828, 2017.
  • Yifan Sun, Liang Zheng, Yi Yang, Qi Tian, and Shengjin Wang. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In The European Conference on Computer Vision (ECCV), September 2018.
  • Florian Tramer, Nicholas Carlini, Wieland Brendel, and Aleksander Madry. On adaptive attacks to adversarial example defenses. arXiv preprint arXiv:2002.08347, 2020.
  • Feng Wang, Jian Cheng, Weiyang Liu, and Haijun Liu. Additive margin softmax for face verification. IEEE Signal Processing Letters, 25(7):926–930, 2018.
  • Feng Wang, Xiang Xiang, Jian Cheng, and Alan Loddon Yuille. Normface: L2 hypersphere embedding for face verification. In Proceedings of the 25th ACM International Conference on Multimedia, pages 1041–1049. ACM, 2017.
  • Guanshuo Wang, Yufeng Yuan, Xiong Chen, Jiwei Li, and Xi Zhou. Learning discriminative features with multiple granularities for person re-identification. In 2018 ACM Multimedia Conference (MM '18), 2018.
  • Hao Wang, Yitong Wang, Zheng Zhou, Xing Ji, Dihong Gong, Jingchao Zhou, Zhifeng Li, and Wei Liu. Cosface: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5265–5274, 2018.
  • Xiaogang Wang, Gianfranco Doretto, Thomas Sebastian, Jens Rittscher, and Peter H. Tu. Shape and appearance context modeling. In IEEE 11th International Conference on Computer Vision (ICCV 2007), Rio de Janeiro, Brazil, October 14–20, 2007.
  • Xun Wang, Xintong Han, Weilin Huang, Dengke Dong, and Matthew R Scott. Multi-similarity loss with general pair weighting for deep metric learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5022–5030, 2019.
  • Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. A discriminative feature learning approach for deep face recognition. In European Conference on Computer Vision, pages 499–515. Springer, 2016.
  • Andrew Zhai and Hao-Yu Wu. Classification is a strong baseline for deep metric learning. arXiv preprint arXiv:1811.12649, 2018.
  • Liang Zheng, Liyue Shen, Lu Tian, Shengjin Wang, Jingdong Wang, and Qi Tian. Scalable person re-identification: A benchmark. In Proceedings of the IEEE International Conference on Computer Vision, pages 1116–1124, 2015.
  • Zhedong Zheng, Xiaodong Yang, Zhiding Yu, Liang Zheng, Yi Yang, and Jan Kautz. Joint discriminative and generative learning for person re-identification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
  • Yi Zhou and Ling Shao. Vehicle re-identification by adversarial bi-directional lstm network. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 653–662. IEEE, 2018.
  • Jianqing Zhu, Huanqiang Zeng, Jingchang Huang, Shengcai Liao, Zhen Lei, Canhui Cai, and Lixin Zheng. Vehicle re-identification using quadruple directional deep learning features. IEEE Transactions on Intelligent Transportation Systems, 2019.