Learning Visual N-Grams from Web Data
2017 IEEE International Conference on Computer Vision (ICCV)
Abstract
Real-world image recognition systems need to recognize tens of thousands of classes that constitute a plethora of visual concepts. The traditional approach of annotating thousands of images per class for training is infeasible in such a scenario, prompting the use of webly supervised data. This paper explores the training of image-recognition systems on large numbers of images and associated user comments. In particular, we develop visual n-gram models that can predict arbitrary phrases that are relevant to the content of an image. Our visual n-gram models are feed-forward convolutional networks trained using new loss functions that are inspired by n-gram models commonly used in language modeling. We demonstrate the merits of our models in phrase prediction, phrase-based image retrieval, relating images and captions, and zero-shot transfer.
Keywords
feed-forward convolutional networks, language modeling, image retrieval, web data, image-recognition systems, manually labeled images, visual n-gram models, image captions, image annotation