Representation learning in computer vision extracts features from raw data: feature extraction maps the raw data into a vector space that captures the underlying spatio-temporal information representing it. Representation learning algorithms in computer vision fall into two categories. Supervised learning: train a neural network on large amounts of labeled data; after training, use not the output of the classification fc layer but the output of the layer preceding it as the representation for downstream tasks. Self-supervised learning: train on large-scale unlabeled data by choosing a suitable pretext task whose supervision signal comes from the data itself, thereby learning a representation for downstream tasks.
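The supervised route above can be sketched in a few lines: a toy network whose penultimate activation, rather than the fc-layer logits, serves as the representation. All weights, shapes, and names here are invented for illustration; a real pipeline would use a trained backbone.

```python
import numpy as np

# Toy supervised "backbone": two hidden layers followed by a classification
# fc layer. After training, the penultimate activation (not the fc logits)
# is used as the representation for downstream tasks.
rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical trained weights (random here, only to fix the shapes).
W1 = rng.normal(size=(64, 128))   # input 64-dim -> hidden 128
W2 = rng.normal(size=(128, 32))   # hidden 128 -> penultimate 32
W_fc = rng.normal(size=(32, 10))  # penultimate 32 -> 10-class logits

def forward(x, return_representation=False):
    h1 = relu(x @ W1)
    h2 = relu(h1 @ W2)            # penultimate layer
    if return_representation:
        return h2                 # representation for downstream tasks
    return h2 @ W_fc              # classification logits

x = rng.normal(size=(4, 64))      # a batch of 4 flattened "images"
feats = forward(x, return_representation=True)
logits = forward(x)
print(feats.shape, logits.shape)  # (4, 32) (4, 10)
```

In practice the same idea is realized by truncating a pretrained CNN before its final classifier and freezing or fine-tuning the remaining layers.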
Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig
We present a simple method of leveraging large-scale noisy image-text data to scale up visual and vision-language representation learning
Cited by 1
Int. J. Comput. Vis., no. 2 (2021): 400-418
We present how to train a visual tracker using unlabeled videos in the wild, which is rarely investigated in visual tracking
Cited by 0
NeurIPS, (2020)
Our research advances vision-and-language representation learning by incorporating adversarial training in both pre-training and finetuning stages
Cited by 0
CVPR, pp.9726-9735, (2019)
Momentum Contrast is on par on Cityscapes instance segmentation and lags behind on VOC semantic segmentation; we show another comparable case on iNaturalist in the appendix
Cited by 727
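The core mechanism behind Momentum Contrast is a key encoder whose parameters are an exponential moving average of the query encoder's: theta_k <- m * theta_k + (1 - m) * theta_q. A minimal sketch with made-up flat parameter vectors (the momentum value m = 0.999 follows the paper; everything else is illustrative):

```python
import numpy as np

# Query-encoder parameters are updated by back-propagation; the key encoder
# only receives slow momentum updates, which keeps the key representations
# in the dictionary queue consistent over training.
def momentum_update(theta_k, theta_q, m=0.999):
    return m * theta_k + (1.0 - m) * theta_q

theta_q = np.ones(5)    # query-encoder parameters (updated by SGD)
theta_k = np.zeros(5)   # key-encoder parameters (momentum copy)

theta_k = momentum_update(theta_k, theta_q)
print(theta_k)          # each entry is approximately 0.001
```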
ICCV, pp.7463-7472, (2019)
Our experimental results demonstrate that we are able to learn high-level semantic representations, and we outperform the state-of-the-art for video captioning on the YouCook II dataset
Cited by 200
CVPR, (2019)
As part of our study, we drastically boost the performance of previously proposed techniques and outperform previously published state-of-the-art results by a large margin
Cited by 188
ICCV, pp.6390-6399, (2019)
We studied the effect of scaling two self-supervised approaches along three axes: data size, model capacity, and problem complexity
Cited by 112
CVPR, (2019)
We present a novel Auto-Encoding Transformation paradigm for unsupervised training of neural networks, in contrast to the conventional Auto-Encoding Data approach
Cited by 78
CVPR, pp.10434-10443, (2019)
We find multi-task training can lead to significant gains over independent task training
Cited by 55
CVPR, pp.10364-10374, (2019)
We have presented an unsupervised representation learning method that learns semantically meaningful features containing rotation-related and rotation-unrelated parts
Cited by 51
Giorgos Bouritsas, Sergiy Bokhnyak, Stylianos Ploumpis, Stefanos Zafeiriou, Michael M. Bronstein
ICCV, pp.7212-7221, (2019)
In this paper we introduced a representation learning and generative framework for fixed-topology 3D deformable shapes, using a mesh convolutional operator, spiral convolutions, that efficiently encodes the inductive bias of the fixed topology
Cited by 34
CVPR, (2019): 12456-12465
Our learning paradigm achieves the goal with the incorporation of Domain Diversification and Multi-domain-invariant Representation Learning
Cited by 27
CVPR, (2019): 4710-4719
We propose a novel AutoEncoder framework to explicitly disentangle pose and appearance features from RGB imagery; long short-term-memory-based integration of the pose features over time produces the gait feature
Cited by 12
arXiv, (2018)
In this paper we presented Contrastive Predictive Coding, a framework for extracting compact latent representations to encode predictions over future observations
Cited by 861
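Contrastive Predictive Coding trains with the InfoNCE objective: a context prediction should score higher (here by dot product) against the true future latent than against negatives. A minimal self-contained sketch with random toy vectors; all shapes and names are invented for illustration:

```python
import numpy as np

def info_nce(c, z_pos, z_negs):
    """InfoNCE loss: cross-entropy of picking the positive among negatives."""
    # Similarity of the context prediction to the positive and each negative.
    logits = np.concatenate([[c @ z_pos], z_negs @ c])
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                  # positive sits at index 0

rng = np.random.default_rng(0)
c = rng.normal(size=8)                 # context prediction for a future step
z_pos = c + 0.1 * rng.normal(size=8)   # true future latent, correlated with c
z_negs = rng.normal(size=(15, 8))      # 15 negatives from other steps/sequences
loss = info_nce(c, z_pos, z_negs)
print(float(loss))
```

Minimizing this loss maximizes a lower bound on the mutual information between the context and the future observations.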
Spyros Gidaris, Praveer Singh, Nikos Komodakis
ICLR, (2018)
In our work we propose to learn image features by training ConvNets to recognize the 2d rotation that was applied to their input image
Cited by 491
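The rotation pretext task needs only a cheap label-generation step: rotate each image by 0°, 90°, 180°, or 270° and train a classifier to predict which rotation was applied. A toy sketch on a 4x4 "image" (the helper name make_rotation_batch is invented):

```python
import numpy as np

def make_rotation_batch(img):
    """Produce the four rotated copies of img and their pretext labels."""
    rots = [np.rot90(img, k) for k in range(4)]  # k quarter-turns
    labels = np.arange(4)                        # 0 -> 0deg, 1 -> 90deg, ...
    return np.stack(rots), labels

img = np.arange(16).reshape(4, 4)
batch, labels = make_rotation_batch(img)
print(batch.shape, labels.tolist())  # (4, 4, 4) [0, 1, 2, 3]
```

A ConvNet trained on these (image, rotation) pairs must attend to object orientation and layout, which is why the learned features transfer to downstream recognition tasks.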
CVPR, pp.5589-5597, (2018)
We leverage two types of free geometric data: optical flow from synthesized images and disparity maps from real 3D movies. These cues effectively drive convolutional neural networks to extract, from conventional videos, generic knowledge that is useful for high-level semantic...
Cited by 69
CVPR Workshops, (2018): 1651-1660
We proposed a novel architecture combining a Variational Auto-Encoder and a Generative Adversarial Network to create an identity-invariant representation of a face image that permits synthesis of an expression-preserving and realistic version
Cited by 30
KDD '17, pp.135-144, (2017)
Extensive experiments demonstrate that the latent feature representations learned by metapath2vec and metapath2vec++ are able to improve various heterogeneous network mining tasks, such as similarity search, node classification, and clustering
Cited by 562
Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Christoph H. Lampert
CVPR, (2017)
We introduce iCaRL, a practical strategy for simultaneously learning classifiers and a feature representation in the class-incremental setting
Cited by 555
Luan Tran, Xi Yin, Xiaoming Liu
CVPR, pp.1283-1292, (2017)
This paper presents the Disentangled Representation learning-Generative Adversarial Network (DR-GAN) for pose-invariant face recognition and face synthesis
Cited by 411