Bigram Label Regularization to Reduce Over-Segmentation on Inline Math Expression Detection

2019 International Conference on Document Analysis and Recognition (ICDAR), 2019

Abstract
An inline mathematical expression is a math expression (ME) that is blended into the plaintext sentences of a scientific paper. Detecting inline MEs is a non-trivial problem because scientific publications use font styles without restriction and the boundaries between MEs and plaintext are blurred. For instance, many inline MEs detected by existing algorithms are incorrectly split into multiple parts because a few characters are misidentified. In this paper, we propose a bigram regularization model to resolve this split problem in inline ME detection. The model incorporates constraints from neighboring labels when labeling ME vs. plaintext. Experimental results show that this technique significantly reduces the splits of inline MEs, with small gains in the false and miss rates. In comparison with a CRF model, our model achieves a higher F1 score and a lower miss rate.
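To make the idea of neighboring constraints concrete, here is a minimal sketch of pairwise label smoothing on a chain. It is not the paper's formulation: the per-token probabilities `p_math`, the constant `switch_penalty`, and the function name `smooth_labels` are all assumptions chosen for illustration. Each token gets a cost for being labeled plaintext (0) or ME (1), every label change between neighbors adds a fixed penalty, and Viterbi decoding finds the cheapest label sequence.

```python
import math


def smooth_labels(p_math, switch_penalty=2.0):
    """Label tokens as plaintext (0) or ME (1) with a pairwise switching cost.

    p_math: hypothetical per-token probabilities of being math, e.g. from
            an independent classifier (an assumption, not the paper's input).
    switch_penalty: hypothetical constant cost for adjacent labels that differ.
    """
    n = len(p_math)
    if n == 0:
        return []
    eps = 1e-9  # guards against log(0)
    # Unary costs: negative log-probability of each label for each token.
    unary = [(-math.log(max(1.0 - p, eps)), -math.log(max(p, eps)))
             for p in p_math]

    cost = list(unary[0])  # best cost of a path ending in label 0 / label 1
    back = []              # back[i-1][cur] = best previous label for token i
    for i in range(1, n):
        new_cost = [0.0, 0.0]
        ptr = [0, 0]
        for cur in (0, 1):
            # Transition cost is zero when labels agree, switch_penalty otherwise.
            candidates = [cost[prev] + (switch_penalty if prev != cur else 0.0)
                          for prev in (0, 1)]
            ptr[cur] = 0 if candidates[0] <= candidates[1] else 1
            new_cost[cur] = candidates[ptr[cur]] + unary[i][cur]
        back.append(ptr)
        cost = new_cost

    # Trace the cheapest path backwards from the last token.
    label = 0 if cost[0] <= cost[1] else 1
    labels = [label]
    for ptr in reversed(back):
        label = ptr[label]
        labels.append(label)
    labels.reverse()
    return labels


scores = [0.9, 0.9, 0.4, 0.9, 0.9]
print(smooth_labels(scores, switch_penalty=0.0))  # independent decisions
print(smooth_labels(scores, switch_penalty=2.0))  # neighbors discourage a split
```

With the penalty set to zero, the weakly scored middle token is labeled plaintext and the expression is split in two; with a positive penalty, the two label switches cost more than accepting the middle token as math, so the expression stays whole.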
Keywords
inline math detection,bigram regularization,pairwise potentials,Bayesian model,split detection