A survey of document image word spotting techniques.

Pattern Recognition (2017)

Citations: 152 | Views: 148
Abstract
This work reviews word spotting methods for document indexing. The nature of the texts addressed by word spotting techniques is analyzed. The core steps that compose a word spotting system are thoroughly explored. Several boosting mechanisms that enhance the retrieved results are examined. Results achieved by the state of the art imply that there are still goals to be reached.

Vast collections of documents available in image format need to be indexed for information retrieval purposes. In this framework, word spotting is an alternative to optical character recognition (OCR), which is rather inefficient at recognizing text of degraded quality and unknown fonts, as usually appear in printed text, or writing style variations in handwritten documents. Over the past decade, there has been growing interest in addressing document indexing using word spotting, which is reflected in the continuously increasing number of approaches. However, very few comprehensive studies analyze the various aspects of a word spotting system. This work aims to review the recent approaches as well as fill the gaps in several topics with respect to the related works. The nature of the texts and the inherent challenges addressed by word spotting methods are thoroughly examined. After presenting the core steps that compose a word spotting system, we investigate the use of retrieval enhancement techniques based on relevance feedback, which improve the retrieved results. Finally, we present the datasets that are widely used for word spotting, describe the evaluation standards and measures applied for performance assessment, and discuss the results achieved by the state of the art.
Keywords
Word spotting, Retrieval, Document indexing, Features, Representation, Relevance feedback