Place perception from the fusion of different image representation

Pattern Recognition (2021)

Abstract
• We propose a multi-task deep neural network that performs indoor place understanding and recognition jointly, imitating and learning the human process of place perception.
• From the perspective of multi-modal information transformation and complementation, we propose an image captioning model that automatically generates natural language descriptions of place images; these descriptions serve as an additional information source to assist decision-making in place recognition.
• We propose a multi-modal feature extraction and fusion architecture based on a mixed CNN-LSTM network that gathers both visual and linguistic features, corresponding to instance-level and concept-level information, respectively.
• We validate the effectiveness of using natural language descriptions for place perception through experiments on four public image datasets.
Keywords
Indoor place perception, CNN, LSTM, Convolutional auto-encoder, Natural language
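
The fusion of visual and linguistic features mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two feature streams are combined, so this example assumes the simplest option: concatenating the two vectors and passing them through a linear layer and a softmax. The function names, the feature dimensions, and the random untrained weights are all hypothetical; the two input vectors merely stand in for a CNN image embedding and an LSTM caption encoding.

```python
import math
import random


def linear(weights, bias, x):
    """Compute y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(visual_feat, text_feat, weights, bias):
    """Fuse by concatenation (an assumption), then classify."""
    fused = list(visual_feat) + list(text_feat)
    return softmax(linear(weights, bias, fused))


# Hypothetical sizes, chosen only for illustration.
VISUAL_DIM, TEXT_DIM, NUM_PLACES = 8, 4, 3
rng = random.Random(0)

# Stand-ins for a CNN image embedding and an LSTM caption encoding.
visual_feat = [rng.random() for _ in range(VISUAL_DIM)]
text_feat = [rng.random() for _ in range(TEXT_DIM)]

# Untrained random classifier weights over the fused vector.
weights = [[rng.uniform(-0.1, 0.1) for _ in range(VISUAL_DIM + TEXT_DIM)]
           for _ in range(NUM_PLACES)]
bias = [0.0] * NUM_PLACES

probs = fuse_and_classify(visual_feat, text_feat, weights, bias)
predicted = max(range(NUM_PLACES), key=probs.__getitem__)
print(predicted, probs)
```

Concatenation keeps the two modalities separate until the final layer, so each block of classifier weights can be inspected per modality. A trained system would likely replace the random weights with learned ones and might use a richer fusion scheme.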