Making Punctuation Restoration Robust with Disfluency Detection

2022 IEEE 25th International Conference on Computer Supported Cooperative Work in Design (CSCWD)(2022)

Abstract
Transcripts generated by automatic speech recognition (ASR) systems usually have poor readability because they lack punctuation and contain a large amount of disfluency. Existing methods for automatic punctuation restoration have achieved substantial improvements by fine-tuning large pre-trained language models (LMs). However, these LMs are pre-trained and fine-tuned on large amounts of well-formatted written text, which creates a mismatch between training and application. In this paper, we modify the ELECTRA model with a disfluency generator and a multi-task discriminator for automatic punctuation restoration. The generator dynamically injects disfluency into the input training data, and the discriminator is trained both to identify the disfluent tokens in the generator's outputs and to predict punctuation marks. Experimental results show that our proposed method significantly outperforms the baselines on the English IWSLT dataset and on our newly collected Chinese dataset.
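The abstract combines two ideas: injecting disfluency into clean, punctuated training text, and training a discriminator jointly on disfluency detection and punctuation prediction. The Python sketch below illustrates both under simplifying assumptions and is not the authors' implementation. In the paper the generator is part of a modified ELECTRA model. Here, a hypothetical rule-based `inject_disfluency` function stands in for it by inserting fillers (from an assumed `FILLERS` list) and word repetitions. A hypothetical `multitask_loss` adds a punctuation cross-entropy to a binary disfluency cross-entropy, scaled by an assumed `weight` parameter. The label names `O` and `PERIOD` are illustrative as well.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical filler inventory; a rule-based stand-in, not the paper's learned generator.
FILLERS = ["uh", "um", "you know", "i mean"]


def inject_disfluency(
    tokens: List[str],
    puncts: List[str],
    rng: random.Random,
    p_filler: float = 0.1,
    p_repeat: float = 0.1,
) -> Tuple[List[str], List[str], List[int]]:
    """Insert fillers and word repetitions into a fluent, punctuated sentence.

    Returns the new tokens, a punctuation label per token (the mark that follows
    it, "O" for none), and a disfluency label per token (1 = injected, 0 = original).
    """
    out_tok: List[str] = []
    out_punct: List[str] = []
    out_dis: List[int] = []
    for tok, punct in zip(tokens, puncts):
        r = rng.random()
        if r < p_filler:
            # Filler words carry no punctuation and are labeled disfluent.
            for word in rng.choice(FILLERS).split():
                out_tok.append(word)
                out_punct.append("O")
                out_dis.append(1)
        elif r < p_filler + p_repeat:
            # A repeated copy (the reparandum) precedes the original token.
            out_tok.append(tok)
            out_punct.append("O")
            out_dis.append(1)
        # The original token always survives and keeps its gold punctuation label.
        out_tok.append(tok)
        out_punct.append(punct)
        out_dis.append(0)
    return out_tok, out_punct, out_dis


def multitask_loss(
    punct_probs: List[Dict[str, float]],
    punct_gold: List[str],
    disfl_probs: List[float],
    disfl_gold: List[int],
    weight: float = 1.0,
) -> float:
    """Mean punctuation cross-entropy plus weighted mean binary disfluency cross-entropy."""
    n = len(punct_gold)
    punct_ce = -sum(math.log(p[g]) for p, g in zip(punct_probs, punct_gold)) / n
    disfl_bce = -sum(
        math.log(q) if y == 1 else math.log(1.0 - q)
        for q, y in zip(disfl_probs, disfl_gold)
    ) / n
    return punct_ce + weight * disfl_bce


if __name__ == "__main__":
    rng = random.Random(0)
    toks, puncs, dis = inject_disfluency(
        ["i", "think", "so"], ["O", "O", "PERIOD"], rng, p_filler=0.3, p_repeat=0.3
    )
    for t, p, d in zip(toks, puncs, dis):
        print(f"{t:>6}  punct={p:<6}  disfluent={d}")
```

Injected tokens are labeled `O` for punctuation and 1 for disfluency. Removing every token with disfluency label 1 therefore recovers the original sentence and its gold punctuation. In a real model, `punct_probs` and `disfl_probs` would come from two classification heads over a shared encoder rather than being supplied by hand.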
Keywords
Punctuation restoration, ELECTRA, Disfluency detection, Multi-task learning