Combining Self-training and Minimal Annotations for Handwritten Word Recognition

ICFHR (2022)

Abstract
Handwritten Text Recognition (HTR) relies on deep learning to achieve high performance. Its success is substantially driven by large annotated training datasets that yield powerful recognition models. Performance suffers considerably when a model is applied to a document collection with a distinctive style that is not well represented in the training data. Applying a recognition model to a new data collection therefore requires a tremendous annotation effort, which is often out of scope, for example for historic collections. To overcome this limitation, we propose a training scheme that combines multiple data sources. Synthetically generated samples are used to train an initial model. Self-training then offers the possibility to exploit unlabeled samples. We further investigate how a small number of manually annotated samples can be integrated to achieve maximal performance with limited annotation effort. To this end, we add labeled samples at different stages of self-training and propose two criteria, confidence and diversity, for selecting the samples to annotate. In our experiments, we show that the proposed training scheme considerably closes the gap to fully supervised training on the designated training set with less than ten percent of the labeling demand.
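The selection step described in the abstract can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the authors' implementation: it assumes the recognizer returns a confidence score per unlabeled sample, keeps high-confidence predictions as pseudo-labels for the next self-training round, and queues the least confident samples (up to a fixed budget) for manual annotation. The names, threshold, and budget are assumptions for illustration only.

```python
def select_samples(predictions, conf_threshold=0.9, annotate_budget=2):
    """Split unlabeled predictions into pseudo-labels and an annotation queue.

    predictions: list of (sample_id, predicted_word, confidence) tuples.
    High-confidence predictions become pseudo-labels for self-training;
    the least confident samples, up to the budget, are selected for
    manual annotation, where a labeled example is most informative.
    """
    # Confidence criterion: trust predictions above the threshold.
    pseudo_labeled = [(sid, word) for sid, word, conf in predictions
                      if conf >= conf_threshold]
    # Annotation criterion: pick the lowest-confidence samples first.
    uncertain = sorted(predictions, key=lambda p: p[2])
    to_annotate = [sid for sid, _, conf in uncertain
                   if conf < conf_threshold][:annotate_budget]
    return pseudo_labeled, to_annotate

preds = [("img1", "hello", 0.97), ("img2", "wrld", 0.41),
         ("img3", "there", 0.93), ("img4", "qk", 0.55)]
pseudo, annotate = select_samples(preds)
print(pseudo)    # [('img1', 'hello'), ('img3', 'there')]
print(annotate)  # ['img2', 'img4']
```

The paper's diversity criterion would additionally require that the queued samples differ from each other, e.g. by comparing feature-space distances, which is omitted here for brevity.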
Keywords
minimal annotations, recognition, self-training