Abusive Content Detection in Arabic Tweets Using Multi-Task Learning and Transformer-Based Models

APPLIED SCIENCES-BASEL(2023)

引用 1|浏览3
暂无评分
摘要
Different social media platforms have become increasingly popular in the Arab world in recent years. The increasing use of social media, however, has also led to the emergence of a new challenge in the form of abusive content, including hate speech, offensive language, and abusive language. Existing research work focuses on automatic abusive content detection as a binary classification problem. In addition, the existing research work on the automatic detection task surrounding abusive Arabic content fails to tackle the dialect-specific phenomenon. Consequently, this has led to two important issues in the automatic abusive Arabic content detection task. In this study, we used a multi-aspect annotation schema to tackle the automatic abusive content detection problem in Arabic countries, based on the multi-class classification task and the dialectal Arabic (DA)-specific phenomenon. More precisely, the multi-aspect annotation schema includes five attributes: directness, hostility, target, group, and annotator. We specifically developed a framework to automatically detecting abusive content on Twitter using natural language processing (NLP) techniques. The developed framework used different models of machine learning (ML), deep learning (DL), and pretrained Arabic language models (LMs) using the multi-aspect annotation dataset. In addition, to investigate the impact of the other approaches, such as multi-task learning (MTL), we developed four MTL models built on top of a pretrained DA language model (called MARBERT) and trained on the multi-aspect annotation dataset. Our MTL models and pretrained Arabic LMs enhanced the performance compared to the existing DL model mentioned in the literature.
更多
查看译文
关键词
abusive content, dialectal Arabic (DA), NLP, DL, multitask learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要