Multimodal Document Image Classification

2019 International Conference on Document Analysis and Recognition (ICDAR)(2019)

引用 37|浏览68
暂无评分
摘要
State-of-the-art methods for document image classification rely on visual features extracted by deep convolutional neural networks (CNNs). These methods do not utilize rich semantic information present in the text of the document, which can be extracted using Optical Character Recognition (OCR). We first study the performance of state-of-the-art text classification approaches when applied to noisy text obtained from OCR. We then show that fusing this textual information with visual CNN methods produces state-of-the-art results on the RVL-CDIP classification dataset.
更多
查看译文
关键词
Classification,Document Image,Multimodal
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要