Transformer Based Sentiment Analysis on Code Mixed Data

Koyyalagunta Krishna Sampath, M. Supriya

Procedia Computer Science (2024)

Abstract
In India, a country known for its linguistic diversity, code-mixing is common practice and shapes how people communicate, from everyday conversation to social media. Its prevalence on social media platforms poses a substantial hurdle for machine translation and other language processing tasks, and the abundance of unstructured code-mixed text on these platforms marks it as an important research area within NLP. The blending of Hindi and English, known as Hinglish, and other code-mixed text such as Malayalam-English, Tamil-English, and Telugu-English are particularly prevalent among the younger generation on social media, and such text requires appropriate processing to be understood by both monolingual users and language processing models. Manually translating this kind of data is laborious because of limited vocabulary, potential misunderstandings of context, grammatical errors, biases, and various other issues. In addition, existing translation models tend to perform better on monolingual text than on code-mixed data, so models that can translate code-mixed data are desirable. This study attempts to convert code-mixed Hinglish, Malayalam-English, Tamil-English, and Telugu-English text in Romanised script into monolingual English, which can then be given as input to NLP applications such as Sentiment Analysis. This is achieved by fine-tuning pretrained models such as IndicLID for the Language Identification (LID) module and using an ensemble approach to transliteration and translation with IndicTrans and IndicXlit for code-mixed machine translation. The translated output is passed to a classification algorithm that performs Sentiment Analysis and predicts the sentiment. This approach to translating code-mixed text is observed to perform better than traditional machine translation for the Indian languages Hindi, Tamil, Telugu, and Malayalam.
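The stages named in the abstract (language identification, transliteration, translation, sentiment classification) can be pictured as a chain of pluggable components. The Python sketch below is only an illustration under stated assumptions: the class name `CodeMixedSentimentPipeline`, the callable signatures, and the dictionary-based toy stand-ins are all hypothetical and are not the authors' implementation or the IndicLID, IndicXlit, or IndicTrans APIs. It also chains the stages sequentially rather than reproducing the ensemble approach the abstract mentions.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, List, Tuple

TaggedToken = Tuple[str, str]  # (token, language code such as "en" or "hi")


@dataclass
class CodeMixedSentimentPipeline:
    """Hypothetical wiring of the four stages; each stage is an injected callable."""
    tag_tokens: Callable[[str], List[TaggedToken]]  # language identification
    transliterate: Callable[[str, str], str]        # Romanised span -> native script
    translate: Callable[[str, str], str]            # native-script span -> English
    classify: Callable[[str], str]                  # English text -> sentiment label

    def to_english(self, text: str) -> str:
        pieces = []
        # Group consecutive tokens that share a language tag into one span.
        for lang, group in groupby(self.tag_tokens(text), key=lambda t: t[1]):
            span = " ".join(token for token, _ in group)
            if lang != "en":
                # Non-English spans are transliterated first, then translated.
                span = self.translate(self.transliterate(span, lang), lang)
            pieces.append(span)
        return " ".join(pieces)

    def predict(self, text: str) -> str:
        return self.classify(self.to_english(text))


# Toy stand-ins (assumptions for illustration only, not real models).
HINDI_WORDS = {"bahut": "बहुत", "accha": "अच्छा", "tha": "था"}
HINDI_TO_ENGLISH = {"बहुत अच्छा था": "was very good"}


def toy_tagger(text: str) -> List[TaggedToken]:
    return [(w, "hi" if w.lower() in HINDI_WORDS else "en") for w in text.split()]


def toy_transliterate(span: str, lang: str) -> str:
    return " ".join(HINDI_WORDS.get(w.lower(), w) for w in span.split())


def toy_translate(span: str, lang: str) -> str:
    return HINDI_TO_ENGLISH.get(span, span)


def toy_classifier(text: str) -> str:
    return "positive" if "good" in text.lower() else "neutral"


pipeline = CodeMixedSentimentPipeline(
    toy_tagger, toy_transliterate, toy_translate, toy_classifier
)
```

With these toy components, `pipeline.to_english("movie bahut accha tha")` yields an English sentence that the classifier can label. In practice each callable would wrap a fine-tuned model, and the interfaces would need to match whatever those models actually expose.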
Keywords
Natural Language Processing, Code Mixing, Language Identification, Sentiment Analysis, Translation, Transliteration, Transformers