Document Layout Analysis: A Maximum Homogeneous Region Approach
2018 1st International Conference on Multimedia Analysis and Pattern Recognition (MAPR), 2018
Abstract
This paper presents a method for document layout analysis based on whitespace analysis within maximum homogeneous regions. The method aims to balance processing time against performance and consists of two main stages: classification and segmentation. First, text and non-text elements are classified by whitespace analysis on maximum multi-layer horizontal homogeneous regions. Text regions are then extracted using mathematical morphology, and non-text elements are further classified into separators, tables, and images by a machine learning approach. The effectiveness of the proposed method is demonstrated by experiments on the UW-III (A1) dataset.
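The abstract does not spell out how a horizontal homogeneous region is detected, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a binary page given as rows of 0/1 values (1 = ink), splits it into alternating runs of blank and inked rows, and treats a region as homogeneous when the heights of the inked runs and of the interior whitespace gaps vary little. The function names `row_runs` and `looks_homogeneous` and the `max_variance` threshold are hypothetical choices made for illustration.

```python
from statistics import pvariance


def row_runs(page):
    """Split a binary page (list of rows, 1 = ink) into vertical runs.

    Returns a list of (is_ink, start_row, end_row_exclusive) tuples,
    alternating between inked runs and blank (whitespace) runs.
    """
    runs = []
    start = 0
    for y in range(1, len(page) + 1):
        prev_ink = any(page[y - 1])
        # Close the current run at the end of the page or when the
        # ink/blank state of the row changes.
        if y == len(page) or any(page[y]) != prev_ink:
            runs.append((prev_ink, start, y))
            start = y
    return runs


def _variance(values):
    """Population variance, defined as 0.0 for fewer than two values."""
    return pvariance(values) if len(values) > 1 else 0.0


def looks_homogeneous(runs, max_variance=1.0):
    """Assumed criterion: inked runs and interior gaps have similar heights.

    max_variance is an illustrative threshold, not a value from the paper.
    """
    ink_heights = [end - start for is_ink, start, end in runs if is_ink]
    # Ignore leading and trailing whitespace; only gaps between runs count.
    gap_heights = [
        end - start
        for i, (is_ink, start, end) in enumerate(runs)
        if not is_ink and 0 < i < len(runs) - 1
    ]
    return (_variance(ink_heights) <= max_variance
            and _variance(gap_heights) <= max_variance)


# Three "text lines" of height 2 separated by single blank rows.
text_page = [[1, 1], [1, 0], [0, 0], [1, 1], [0, 1], [0, 0], [1, 1], [1, 1]]
print(looks_homogeneous(row_runs(text_page)))
```

Under this assumed criterion, a region mixing short text lines with a tall image would fail the variance check and could be split further, which is one way to read the recursive, multi-layer analysis the abstract mentions.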
Keywords
machine learning approach, document layout analysis, Maximum homogeneous region approach, whitespace analysis, Maximum multilayer horizontal homogeneous regions, nontext elements, text regions extraction, mathematical morphology, UW-III (A1) datasets