Arabic Offensive Language on Twitter: Analysis and Experiments

Proceedings of the Sixth Arabic Natural Language Processing Workshop(2020)

引用 150|浏览361
暂无评分
摘要
Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization. In this paper, we focus on building a large Arabic offensive tweet dataset. We introduce a method for building a dataset that is not biased by topic, dialect, or target. We produce the largest Arabic dataset to date with special tags for vulgarity and hate speech. We thoroughly analyze the dataset to determine which topics, dialects, and gender are most associated with offensive tweets and how Arabic speakers useoffensive language. Lastly, we conduct many experiments to produce strong results (F1 =83.2) on the dataset using SOTA techniques.
更多
查看译文
关键词
twitter,language
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要