Segmentation Of Highly Unstructured Handwritten Documents Using A Neural Network Technique

2016 23rd International Conference on Pattern Recognition (ICPR), 2016

Abstract
In recent years there has been growing interest in digitizing the vast number of books and documents that predate the widespread adoption of digital technologies. Many of these digitization initiatives deal with huge collections of handwritten documents, for which document image analysis techniques (page segmentation, keyword spotting, optical character recognition (OCR), etc.) are not yet as mature as for printed text. Thus, there is a pressing need to develop techniques to understand, archive, index and search these manuscripts. The traditional approach of manually transcribing handwritten collections and then applying standard text retrieval techniques can be very expensive for large collections. Moreover, unlike machine-printed texts, many of the manuscripts in these collections contain unstructured information: cluttered groups of text and graphics that do not necessarily follow a pre-specified format, which makes them quite challenging to process automatically. In this paper we therefore present a convolutional neural network (CNN) based implementation that segments pages of handwritten documents into their constituent sections. We showcase a multiscale sliding-window network that is trained to predict the sections of the pages in handwritten manuscripts. The output of the network is post-processed with a novel region growing technique to further improve the segmentation results. The implementation is applied to the Marianne Moore archival collection, a body of handwritten notes and memos by the renowned author Marianne Moore (1887-1972), one of the foremost modernist poets of the early twentieth century. We present our segmentation results both quantitatively and qualitatively.
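To make the multiscale sliding-window idea concrete, the following is a minimal sketch of how patches at several scales might be cropped around the same page location before being passed to a classifier. It is an illustration only, not the authors' implementation: the function names (`crop_window`, `multiscale_windows`, `sliding_centers`), the window sizes, the stride, and the edge-replication border handling are all assumptions, since the abstract does not specify them.

```python
def crop_window(page, row, col, size):
    """Crop a size x size window centered near (row, col).

    `page` is a list of rows of pixel values. Coordinates that fall
    outside the page are clamped, which replicates the border pixels
    (an assumed border policy, not one stated in the abstract).
    """
    height, width = len(page), len(page[0])
    half = size // 2
    window = []
    for r in range(row - half, row - half + size):
        rr = min(max(r, 0), height - 1)  # clamp row index to the page
        line = []
        for c in range(col - half, col - half + size):
            cc = min(max(c, 0), width - 1)  # clamp column index to the page
            line.append(page[rr][cc])
        window.append(line)
    return window


def multiscale_windows(page, row, col, sizes=(4, 8, 16)):
    """Return one crop per scale, all centered on the same location."""
    return [crop_window(page, row, col, size) for size in sizes]


def sliding_centers(height, width, stride):
    """List the (row, col) centers visited by a regular sliding grid."""
    return [(r, c)
            for r in range(0, height, stride)
            for c in range(0, width, stride)]


# Hypothetical 10 x 10 "page" whose pixel value encodes its position.
page = [[r * 10 + c for c in range(10)] for r in range(10)]
centers = sliding_centers(len(page), len(page[0]), stride=5)
windows = multiscale_windows(page, 5, 5, sizes=(2, 4))
```

In a pipeline of the kind the abstract describes, each set of multiscale crops would be fed to the trained network to predict a section label for that location; the classifier and the region growing post-processing step are omitted here because their details are not given.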
Keywords
handwritten document segmentation, neural network technique, document image analysis, convolutional neural network, CNN, multiscale sliding window based network, Marianne Moore archival collection