Towards POS Tagging Methods for Bengali Language: A Comparative Analysis

Fatima Jahara,Adrita Barua,MD. Asif Iqbal,Avishek Das,Omar Sharif,Mohammed Moshiul Hoque,Iqbal H. Sarker

Advances in Intelligent Systems and ComputingIntelligent Computing and Optimization（2021）

引用 3|浏览0

暂无评分

摘要

Part of Speech (POS) tagging is recognized as a significant research problem in the field of Natural Language Processing (NLP). It has considerable importance in several NLP technologies. However, developing an efficient POS tagger is a challenging task for resource-scarce languages like Bengali. This paper presents an empirical investigation of various POS tagging techniques concerning the Bengali language. An extensively annotated corpus of around 7390 sentences has been used for 16 POS tagging techniques, including eight stochastic based methods and eight transformation-based methods. The stochastic methods are uni-gram, bi-gram, tri-gram, unigram+bigram, unigram+bigram+trigram, Hidden Markov Model (HMM), Conditional Random Forest (CRF), Trigrams ‘n’ Tags (TnT) whereas the transformation methods are Brill with the combination of previously mentioned stochastic techniques. A comparative analysis of the tagging methods is performed using two tagsets (30-tag and 11-tag) with accuracy measures. Brill combined with CRF shows the highest accuracy of 91.83% (for 11 tagset) and 84.5% (for 30 tagset) among all the tagging techniques.

查看译文

关键词

Natural language processing, Part-of-speech tagging, POS tagset, Training, Evaluation

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要