Low-Resource Counterspeech Generation for Indic Languages: The Case of Bengali and Hindi
CoRR(2024)
摘要
With the rise of online abuse, the NLP community has begun investigating the
use of neural architectures to generate counterspeech that can "counter" the
vicious tone of such abusive speech and dilute/ameliorate their rippling effect
over the social network. However, most of the efforts so far have been
primarily focused on English. To bridge the gap for low-resource languages such
as Bengali and Hindi, we create a benchmark dataset of 5,062 abusive
speech/counterspeech pairs, of which 2,460 pairs are in Bengali and 2,602 pairs
are in Hindi. We implement several baseline models considering various
interlingual transfer mechanisms with different configurations to generate
suitable counterspeech to set up an effective benchmark. We observe that the
monolingual setup yields the best performance. Further, using synthetic
transfer, language models can generate counterspeech to some extent;
specifically, we notice that transferability is better when languages belong to
the same language family.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要