Multilingual Hate Speech Detection: Comparison of Transfer Learning Methods to Classify German, Italian, and Spanish Posts.

Jan Fillies, Michael Peter Hoffmann, Adrian Paschke

2023 IEEE International Conference on Big Data (BigData) (2023)

Abstract
As digital communication has grown, online hate speech has surged. Recent studies have concentrated on automated, supervised detection of hate speech, yet there is still limited understanding of what makes an effective strategy for identifying multilingual hate speech in social media posts. This study introduces an innovative experimental design for multilingual hate speech detection. Through a series of experiments, it compares different approaches to detecting multilingual hate speech automatically and builds a classifier for hate speech in German, Italian, and Spanish text-based social media content. The study creates monolingual, multilingual, and translated datasets specific to this language triplet. It then explores suitable models for multilingual hate speech detection, evaluating a total of seven transformer-based models, along with corresponding SVM models, on the constructed datasets. The findings indicate that all chosen transformer-based models outperform the baseline SVM models. The research highlights the superiority of a multilingual approach that uses XLM-RoBERTa as the classifier model over the other monolingual, multilingual, and translation-based approaches tested. Furthermore, the study demonstrates that translation-based methods combined with the DistilBERT model can serve as viable alternatives to the multilingual XLM-RoBERTa approach, particularly where computational resources are restricted and processing speed matters.
Keywords
Machine Learning, Multilingual, Hate Speech, NLP, Big Data
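
The abstract mentions three dataset variants (monolingual, multilingual, and translated) without describing how they are built. The following is only a minimal sketch of one plausible way to derive such views from a single labeled corpus, not the authors' actual pipeline. The `Post` type, the `build_datasets` function, the dataset key names, the English pivot language, and the `fake_translate` stub are all hypothetical and do not come from the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple


class Post(NamedTuple):
    text: str
    lang: str   # language code such as "de", "it", "es"
    label: int  # 1 = hate speech, 0 = not hate speech


def build_datasets(
    posts: List[Post],
    translate: Callable[[str, str, str], str],
    pivot: str = "en",  # assumption: the abstract does not name a target language
) -> Dict[str, List[Post]]:
    """Return monolingual, multilingual, and translated views of `posts`."""
    datasets: Dict[str, List[Post]] = defaultdict(list)
    for post in posts:
        # Monolingual view: one dataset per source language.
        datasets[f"mono_{post.lang}"].append(post)
        # Multilingual view: all languages pooled, text left unchanged.
        datasets["multi"].append(post)
        # Translated view: every post mapped into the pivot language.
        translated_text = translate(post.text, post.lang, pivot)
        datasets["translated"].append(Post(translated_text, pivot, post.label))
    return dict(datasets)


def fake_translate(text: str, src: str, tgt: str) -> str:
    """Hypothetical placeholder for a real machine-translation system."""
    return f"[{src}->{tgt}] {text}"


# Toy examples, invented for illustration only.
posts = [
    Post("Beispieltext", "de", 0),
    Post("testo di esempio", "it", 1),
    Post("texto de ejemplo", "es", 0),
]
datasets = build_datasets(posts, fake_translate)
print(sorted(datasets))
```

In a setup like this, each view could be passed to the same training routine, so that a multilingual model trained on the pooled data can be compared against a smaller single-language model trained on the translated data.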