Chrome Extension
WeChat Mini Program
Use on ChatGLM

Deep Learning-Based Similar Languages’ POS Tagging: Experiments on Bhojpuri, Maithili, and Magahi

Soft Computing Theories and Applications(2023)

Cited 0|Views4
No score
Abstract
Monolingual corpora and similar language resources are vastly available for a few languages. These resources stimulate the exploration and building of potential NLP tools for new languages or dialects. This paper deals with the part-of-speech (POS) tagging for the Indo-Aryan languages, i.e., Magahi, Maithili, and Bhojpuri, a dialect of Hindi. The POS model is trained by BiLSTM-CRF and explores the effectiveness of Word2Vec, GloVe as word and FastText, and BPE as subword-level embeddings, trained on the raw corpus of these languages. All these languages are dialects of Hindi; hence, multilingual embedding at the BPE level has been evaluated. Better results are obtained than with monolingual BPE embedding. However, the best results have been obtained from word embeddings, i.e., GloVe on Maithili and Magahi, with 81.23% and 82.24%, respectively.
More
Translated text
Key words
POS tagging,Low-resource language,Word embedding
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined