HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models
CoRR (2024)
Abstract
The ubiquity of social media has created a need for reliable and
efficient detection of offensive content to limit its harmful effects. This need has led
to a proliferation of datasets and models for detecting offensive
content. While sophisticated models have attained strong performance on
individual datasets, these models often do not generalize because of differences
in how "offensive content" is conceptualized and the resulting
differences in how these datasets are labeled. In this paper, we introduce
HateCOT, a dataset of 52,000 samples drawn from diverse existing sources with
explanations generated by GPT-3.5-Turbo and curated by humans. We show that
pre-training models for the detection of offensive content on HateCOT
significantly boosts the performance of open-source language models on three benchmark datasets in
both zero-shot and few-shot settings, despite differences in domain and task. We
further find that HateCOT enables effective K-shot fine-tuning in
low-resource settings.