Punctuation Prediction in Bangla Text

ACM Transactions on Asian and Low-Resource Language Information Processing(2023)

引用 0|浏览7
暂无评分
摘要
Punctuation prediction is critical as it can enhance the readability of machine-transcribed speeches or texts significantly by adding appropriate punctuation. Furthermore, systems like Automatic Speech Recognizer (ASR) produce texts that are unpunctuated, making the readability difficult for humans and also hampers the performance of various natural language processing (NLP) tasks. Such NLP related tasks have been investigated thoroughly for English; however, very limited work is done for punctuation prediction in the Bangla language. In this study, we train a bidirectional recurrent neural network (BRNN) along with Attention model with a plausibly large Bangla dataset. Afterwards, we apply extensive postprocessing techniques for predicting punctuation more accurately with the employed model. Initially, we perform experimentation with a relatively imbalanced dataset, and our model shows promising results F1=56.9 for Period) in punctuation prediction. Later, we also investigate the model’s performance using a balanced Bangla dataset to achieve higher performance scores ( F1=62.2 for Question). Thus, the goal of this study is to propose an efficient approach that can predict punctuation in Bangla texts effectively. Our study also includes investigation on how our postprocessing techniques affect the prediction performance. Being an early attempt for the punctuation prediction in Bangla text, our work is expected to significantly contribute in the NLP field for the Bangla language, and will pave the way for future work with the Bangla language in this direction.
更多
查看译文
关键词
Neural networks,punctuation prediction,natural language processing,BRNN
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要