Automated Recognition and Extraction of Tabular Fields for the Indexing of Census Records

Document Recognition and Retrieval XX (2013)

Abstract
We describe a system for indexing census records in tabular documents, with the goal of recognizing the content of each cell, including both headers and handwritten entries. Each document is automatically rectified, registered, and scaled to a known template, after which lines and fields are detected and delimited as cells in a tabular form. Whole-word or whole-phrase recognition of noisy machine-printed text is performed using a glyph library, providing greatly increased efficiency and accuracy (approaching 100%) while avoiding the problems inherent in traditional OCR approaches. Constrained handwriting recognition results for a single author reach as high as 98% for the Gender field and 94.5% for the Birthplace field. Multi-author accuracy (currently 82%) can be improved through an increased training set. Active integration of user feedback in the system will accelerate the indexing of records while providing a tightly coupled learning mechanism for system improvement.
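To make the idea of whole-word recognition against a glyph library concrete, the following is a minimal sketch only, not the system's actual method. It assumes each table cell and each library entry is a small binary image of the same size, and it uses zero-mean normalized correlation as the similarity measure; the abstract does not say which measure is used. The names `normalized_correlation`, `recognize_word`, `GLYPHS`, the toy 3x5 patterns, and the `threshold` value are all hypothetical.

```python
from math import sqrt


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-size images (lists of rows)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    # A constant image (e.g. an empty cell) has zero variance; treat it as no match.
    return num / den if den else 0.0


def recognize_word(cell, glyph_library, threshold=0.6):
    """Return (label, score) for the best whole-word template, or (None, score)
    when no template reaches the (assumed) acceptance threshold."""
    best_label, best_score = None, -1.0
    for label, template in glyph_library.items():
        score = normalized_correlation(cell, template)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Hypothetical toy library: one binary template per whole word.
GLYPHS = {
    "Male": [[1, 0, 1, 0, 1],
             [1, 1, 1, 0, 1],
             [1, 0, 1, 0, 1]],
    "Female": [[1, 1, 1, 0, 0],
               [1, 1, 0, 0, 0],
               [1, 0, 0, 0, 0]],
}

# The "Male" template with one pixel flipped, standing in for print noise.
noisy_cell = [[1, 0, 1, 0, 1],
              [1, 1, 1, 0, 1],
              [1, 0, 1, 0, 0]]

# An empty cell should be rejected rather than forced onto a label.
blank_cell = [[0] * 5 for _ in range(3)]

label, score = recognize_word(noisy_cell, GLYPHS)
blank_label, _ = recognize_word(blank_cell, GLYPHS)
```

Matching the whole word as a single unit sidesteps character segmentation, which is one plausible reading of how such an approach avoids problems of traditional OCR on noisy print. A real library would hold many samples per word, and cells would need the rectification and scaling described above before comparison.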
Keywords
Image Rectification and Registration, Image Scaling, Line Detection, Field Detection, Field Content Recognition, Form Detection and Table Recognition, Text Recognition, Machine Print Recognition, OCR, Constrained Handwriting Recognition