Multi-modal OCR System for the ICT Global Supply Chain

ICC 2023 - IEEE International Conference on Communications (2023)

Abstract
Optical Character Recognition (OCR) tools are widely used to extract text from images in many applications, including Information and Communications Technology (ICT) supply chains. Because of the characteristics of datasheets in global ICT supply chains, models trained on popular public datasets often suffer from domain adaptation problems. First, popular open-source text recognition datasets do not contain all the characters and symbols that appear in ICT documents, so models trained on these datasets cannot recognize those special characters and symbols. Second, these datasets contain no samples with multiple words or multiple lines, because they assume that a text detection model can extract text areas perfectly, an assumption that does not hold for ICT documents. Moreover, to the best of our knowledge, there is no open-source dataset specifically designed to evaluate OCR tools in the ICT domain. Therefore, in this study, we first build a benchmark dataset for the text recognition problem in the ICT domain, which includes the special characters and symbols found in ICT documents as well as samples with multiple lines and multiple words. We then propose a novel multi-modal sequence-to-sequence model that takes as input not only images but also the corresponding texts generated for them by a pre-trained model. We conducted extensive experiments to evaluate the proposed multi-modal method on the proposed dataset, and the empirical results show that it can recognize special characters and symbols as well as samples with multiple lines and multiple words, and that it consistently outperforms the benchmark models.
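The abstract does not say how the image and the pre-trained model's text are combined. As a minimal sketch, assuming simple early fusion, the Python below cuts a text-line image into vertical patches, projects each patch to a `d_model`-dimensional vector, embeds each character of the pre-trained OCR output, and concatenates both into one encoder input sequence tagged by modality. All names (`EarlyFusionInput`, `image_to_patches`, the charset, the dimensions) are hypothetical, the weights are random placeholders rather than trained parameters, and the sequence-to-sequence decoder is not shown.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]
IMAGE, TEXT = 0, 1  # hypothetical segment ids marking each position's modality


def image_to_patches(image: List[Vector], patch_w: int) -> List[Vector]:
    """Cut a grayscale line image (rows x cols) into vertical strips of width
    patch_w and flatten each strip row by row; leftover columns are dropped."""
    height, width = len(image), len(image[0])
    patches = []
    for start in range(0, width - patch_w + 1, patch_w):
        patches.append([image[r][c] for r in range(height)
                        for c in range(start, start + patch_w)])
    return patches


def random_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    # Untrained placeholder weights; a real model would learn these.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class EarlyFusionInput:
    """Hypothetical early-fusion front end: one encoder sequence built from an
    image and the text a pre-trained OCR model produced for that image."""

    def __init__(self, charset: str, patch_dim: int, d_model: int, seed: int = 0):
        rng = random.Random(seed)
        self.char_to_id: Dict[str, int] = {ch: i for i, ch in enumerate(charset)}
        self.unk_id = len(self.char_to_id)  # shared id for out-of-charset symbols
        self.patch_proj = random_matrix(d_model, patch_dim, rng)
        self.char_emb = random_matrix(len(charset) + 1, d_model, rng)
        self.seg_emb = random_matrix(2, d_model, rng)  # one row per modality

    def _embed_patch(self, patch: Vector) -> Vector:
        # Linear projection of a flattened patch into d_model dimensions.
        return [sum(w * x for w, x in zip(row, patch)) for row in self.patch_proj]

    def build(self, image: List[Vector], ocr_text: str,
              patch_w: int) -> Tuple[List[Vector], List[int]]:
        tokens: List[Vector] = []
        segments: List[int] = []
        # Image positions first ...
        for patch in image_to_patches(image, patch_w):
            tokens.append(self._embed_patch(patch))
            segments.append(IMAGE)
        # ... then one position per character of the pre-trained model's text.
        for ch in ocr_text:
            tokens.append(list(self.char_emb[self.char_to_id.get(ch, self.unk_id)]))
            segments.append(TEXT)
        # Add the modality embedding so the encoder can tell the two apart.
        fused = [[t + s for t, s in zip(tok, self.seg_emb[seg])]
                 for tok, seg in zip(tokens, segments)]
        return fused, segments


if __name__ == "__main__":
    image = [[0.0, 1.0] * 4 for _ in range(4)]  # 4 rows x 8 columns
    builder = EarlyFusionInput(charset="0123456789kΩ", patch_dim=4 * 2, d_model=16)
    seq, segs = builder.build(image, "10kΩ", patch_w=2)
    print(len(seq), segs)  # 8 [0, 0, 0, 0, 1, 1, 1, 1]
```

Characters outside the assumed charset (for example `±`) fall back to a shared unknown id. One motivation for keeping an image branch in such a design is that the pre-trained model's text alone cannot represent symbols missing from its vocabulary.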
Keywords
Optical Character Recognition, Document Understanding, Global ICT Supply Chain