Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges
CoRR(2024)
摘要
The growing prevalence and rapid evolution of offensive language in social
media amplify the complexities of detection, particularly highlighting the
challenges in identifying such content across diverse languages. This survey
presents a systematic and comprehensive exploration of Cross-Lingual Transfer
Learning (CLTL) techniques in offensive language detection in social media. Our
study stands as the first holistic overview to focus exclusively on the
cross-lingual scenario in this domain. We analyse 67 relevant papers and
categorise these studies across various dimensions, including the
characteristics of multilingual datasets used, the cross-lingual resources
employed, and the specific CLTL strategies implemented. According to "what to
transfer", we also summarise three main CLTL transfer approaches: instance,
feature, and parameter transfer. Additionally, we shed light on the current
challenges and future research opportunities in this field. Furthermore, we
have made our survey resources available online, including two comprehensive
tables that provide accessible references to the multilingual datasets and CLTL
methods used in the reviewed literature.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要