Unsupervised Representation Learning by Sorting Sequences

2017 IEEE International Conference on Computer Vision (ICCV), 2017

Abstract
We present an unsupervised representation learning approach using videos without semantic labels. We leverage temporal coherence as a supervisory signal by formulating representation learning as a sequence sorting task. We take temporally shuffled frames (i.e., in non-chronological order) as inputs and train a convolutional neural network to sort the shuffled sequences. Similar to comparison-based sorting algorithms, we propose to extract features from all frame pairs and aggregate them to predict the correct order. As sorting a shuffled image sequence requires an understanding of the statistical temporal structure of images, training with such a proxy task allows us to learn rich and generalizable visual representations. We validate the effectiveness of the learned representation by using our method as pre-training for high-level recognition problems. The experimental results show that our method compares favorably against state-of-the-art methods on action recognition, image classification, and object detection tasks.
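The data side of the sorting task can be illustrated with a minimal sketch. It only shows how a shuffled clip, its order label, and the set of frame pairs might be constructed; it is not the authors' implementation. The function names (`make_sorting_example`, `frame_pairs`, `recover_order`), the choice to encode the order as an index into the list of all permutations, and the use of strings as stand-ins for frames are all assumptions made for illustration. The network that extracts and aggregates pairwise features is omitted.

```python
import itertools
import random


def make_sorting_example(frames, rng):
    """Shuffle a chronologically ordered clip.

    Returns the shuffled frames, an integer class label, and the list of
    all permutations that the label indexes into (an assumed encoding).
    """
    n = len(frames)
    perms = list(itertools.permutations(range(n)))
    label = rng.randrange(len(perms))
    order = perms[label]
    # Position `pos` of the shuffled clip holds original frame `order[pos]`.
    shuffled = [frames[i] for i in order]
    return shuffled, label, perms


def frame_pairs(n):
    """All unordered pairs of positions in a clip of n frames.

    A pairwise feature extractor would be applied to each of these pairs
    before the results are aggregated into one order prediction.
    """
    return list(itertools.combinations(range(n), 2))


def recover_order(shuffled, label, perms):
    """Undo the shuffle given the (predicted or true) permutation label."""
    order = perms[label]
    restored = [None] * len(shuffled)
    for pos, src in enumerate(order):
        restored[src] = shuffled[pos]
    return restored
```

Under these assumptions, a clip of four frames yields six frame pairs, and a model that predicts the correct label lets `recover_order` restore the chronological sequence.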
Keywords
unsupervised representation learning approach, semantic labels, supervisory signal, sequence sorting task, nonchronological order, convolutional neural network, sorting algorithms, image sequence, visual representation, action recognition, image classification, object detection