TamilEmo: Fine-grained Emotion Detection Dataset for Tamil

Charangan Vasantharajan, Ruba Priyadharshini, Prasanna Kumar Kumarasen, Rahul Ponnusamy, Sathiyaraj Thangasamy, Sean Benhur, Thenmozhi Durairaj, Kanchana Sivanraju, Anbukkarasi Sampath, Bharathi Raja Chakravarthi

arXiv (2023)


Abstract

Emotion analysis of text is regarded as both a challenging and an interesting task in Natural Language Processing. However, the lack of datasets for low-resource languages such as Tamil makes it difficult to conduct high-quality research in this area. We therefore introduce a large, manually annotated dataset of more than 42k Tamil YouTube comments, labeled with 31 emotions for emotion recognition. The goal of this dataset is to improve emotion detection in multiple downstream tasks in Tamil. We also created three groupings of the emotions (3-class, 7-class, and 31-class) and evaluated the models' performance on each grouping. We ran several baseline models, and MuRIL achieved the highest macro F1 score, 0.67, on the 3-class grouping. On the 7-class and 31-class groupings, the MuRIL and Random Forest models performed well, with macro F1 scores of 0.52 and 0.29 respectively.
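The macro F1 score reported above averages the per-class F1 values with equal weight, so rare emotion classes count as much as frequent ones. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `macro_f1` and the toy labels `pos`, `neg`, and `neu` are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only
y_true = ["pos", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neu"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally, a model that ignores rare classes is penalized more heavily than it would be under accuracy or micro-averaged F1, which is one plausible reason scores fall as the number of classes grows from 3 to 31.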


Keywords

Emotion Detection, Fine-grained Dataset, Low Resource, Tamil
