A Novel Method of Text Line Segmentation for Historical Document Image of the Uchen Tibetan

Journal of Visual Communication and Image Representation(2019)

Abstract
Text line segmentation is a key step in Tibetan historical document recognition. This paper proposes a baseline-based method for text line segmentation of Uchen Tibetan historical documents and releases a new dataset for evaluating it. The method has two steps: baseline detection, followed by text line segmentation using the detected baselines. In baseline detection, the upper edges of all characters in the document are obtained with a horizontal gradient operator; an edge connectivity definition is then introduced, under which the set of upper edges is divided into disjoint subsets. Eligible subsets are selected, and the edges within them are joined in turn to form the baselines. In text line segmentation, the document image is cut at the baseline positions, and regions where adjacent lines still adhere are segmented again. Each connected region in the image is then assigned to its nearest baseline, and all connected regions belonging to the same baseline form a text line. Experiments on the proposed dataset show that the method is robust to document distortion, achieves high text line segmentation accuracy, and can handle adhesion between text lines.
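The final assignment step described in the abstract (each connected region goes to its nearest baseline) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, that each baseline is a polyline of (x, y) points sorted by x, that each connected region is summarized by its centroid, and that "nearest" means the smallest vertical distance at the centroid's column. The names `baseline_y_at` and `assign_regions` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def baseline_y_at(baseline, x):
    """Interpolate the y-coordinate of a baseline polyline at column x.

    Assumption: `baseline` is a list of (x, y) points sorted by x.
    Outside the polyline's x-range, the nearest endpoint's y is used.
    """
    xs = [p[0] for p in baseline]
    if x <= xs[0]:
        return baseline[0][1]
    if x >= xs[-1]:
        return baseline[-1][1]
    i = bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
    (x0, y0), (x1, y1) = baseline[i - 1], baseline[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def assign_regions(centroids, baselines):
    """Group region indices by the index of their nearest baseline.

    Assumption: distance is the vertical offset between a region's
    centroid and the baseline at the same column.
    """
    lines = defaultdict(list)
    for region_id, (cx, cy) in enumerate(centroids):
        nearest = min(
            range(len(baselines)),
            key=lambda k: abs(cy - baseline_y_at(baselines[k], cx)),
        )
        lines[nearest].append(region_id)
    return dict(lines)


# Two slightly skewed baselines and four region centroids (made-up values).
baselines = [[(0, 10), (100, 20)], [(0, 50), (100, 60)]]
centroids = [(10, 14), (90, 33), (90, 45), (50, 70)]
text_lines = assign_regions(centroids, baselines)
```

Because the distance is measured against the baseline's local height rather than a fixed row, a skewed or curved baseline still collects the regions that belong to it, which is the property the abstract attributes to baseline-based segmentation.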
Keywords
Tibetan historical document, Text line segmentation, Baseline, Upper edge, Connected region analysis, Dataset, Image processing