c-RNN: A Fine-Grained Language Model for Image Captioning

Neural Processing Letters (2018)

Previous captioning methods based on the conventional deep Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) architecture follow machine translation systems in using word-level modelling. However, word-level modelling requires an optimal word segmentation algorithm to split each sentence into words, which is a very difficult task. In this paper, we built a character-level RNN (c-RNN) that models captions directly as characters, so that each descriptive sentence is composed as a flow of characters. The c-RNN performs the language task at a finer granularity and naturally avoids the word segmentation issue. Our c-RNN enables the language model to reason dynamically about word spelling as well as grammatical rules, which results in expressive and elaborate sentences. We optimized the parameters of the neural networks by maximizing the probability of correctly generated character-level sentences. Quantitative and qualitative experiments on the popular MSCOCO and Flickr30k datasets showed that our c-RNN describes images considerably faster and with satisfactory quality.
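The abstract does not specify the c-RNN's internals, so the following is only a minimal sketch of the general idea under stated assumptions: a caption is encoded character by character (spaces are ordinary symbols, so no word segmentation is needed), a plain Elman RNN (an assumption, not necessarily the paper's cell) predicts the next character, and the training objective is the negative log-likelihood of the correct character sequence. Feeding the image feature in as the initial hidden state is likewise an assumption. The names `build_char_vocab`, `encode`, `TinyCharRNN`, `nll` and `greedy_decode` are hypothetical, the weights are random and untrained, and gradient-based optimization is omitted.

```python
import math
import random

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def build_char_vocab(captions):
    """Map every character seen in the captions (plus start/end markers) to an index."""
    chars = sorted({ch for cap in captions for ch in cap})
    return {s: i for i, s in enumerate([START, END] + chars)}


def encode(caption, vocab):
    """Turn a caption into character indices framed by start/end markers.
    No word segmentation: a space is just another character."""
    return [vocab[START]] + [vocab[ch] for ch in caption] + [vocab[END]]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyCharRNN:
    """Hypothetical Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1}), p_t = softmax(W_o h_t).
    Assumption: the image feature (e.g. a CNN embedding) is the initial hidden state."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W_x = init(hidden_size, vocab_size)   # one-hot character -> hidden
        self.W_h = init(hidden_size, hidden_size)  # hidden -> hidden
        self.W_o = init(vocab_size, hidden_size)   # hidden -> character logits
        self.V = vocab_size

    def step(self, h, char_id):
        # With a one-hot input, W_x x_t is simply column char_id of W_x.
        new_h = [math.tanh(self.W_x[i][char_id] + sum(w * hj for w, hj in zip(self.W_h[i], h)))
                 for i in range(len(h))]
        logits = [sum(w * hj for w, hj in zip(row, new_h)) for row in self.W_o]
        return new_h, softmax(logits)

    def nll(self, image_feature, ids):
        """Teacher-forced loss: -sum_t log p(c_{t+1} | c_1..c_t, image).
        Minimizing it maximizes the probability of the correct character sequence."""
        h, loss = list(image_feature), 0.0
        for cur, nxt in zip(ids[:-1], ids[1:]):
            h, probs = self.step(h, cur)
            loss -= math.log(probs[nxt])
        return loss

    def greedy_decode(self, image_feature, vocab, max_len=50):
        """Emit the most probable character at each step until </s> or max_len steps."""
        itos = {i: s for s, i in vocab.items()}
        h, cur, out = list(image_feature), vocab[START], []
        for _ in range(max_len):
            h, probs = self.step(h, cur)
            cur = max(range(self.V), key=probs.__getitem__)
            if cur == vocab[END]:
                break
            out.append(itos[cur])
        return "".join(out)
```

For example, with the captions `["a dog runs", "a cat sits"]` the vocabulary has 14 entries (12 distinct characters plus the two markers), and `encode("a dog runs", vocab)` yields 12 indices: 10 characters framed by `<s>` and `</s>`. A character vocabulary stays this small regardless of corpus size, which is one intuition for why a character-level output layer can be cheap per step, although the sequences are longer than word sequences.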
Keywords
Image captioning, Character-level, Convolutional Neural Network, Recurrent Neural Network, Sequence learning