Dataset for identification of queerphobia

Shivum Banerjee, Hieu Pham Trung Nguyen

Journal of Student Research (2023)

Abstract
While social media platforms have implemented many algorithmic approaches to moderating hate speech, there is a lack of datasets on queerphobia, which has impeded efforts to automatically recognize and moderate queerphobic hate speech online. Queerphobic hate speech is speech intended to degrade, insult, or incite violence or prejudicial action against queer people, meaning people who belong to a sexual, gender, or romantic minority. This speech worsens mental and emotional outcomes for queer people and can contribute to anti-queer violence. The goal of this study is to create a dataset of queerphobic YouTube comments to further efforts to identify and moderate queerphobic hate speech. To construct this dataset, 10,000 comments were sourced from YouTube videos that represent queerness. Volunteers then manually annotated each comment in accordance with specific guidelines. Various natural language processing (NLP) models were used to extract features from the text, and several classifiers used these features to categorize comments as queerphobic or non-queerphobic. These models establish a baseline for performance on this data. In making this dataset, we hope to further research in the recognition of digital queerphobia and make social media platforms safer for queer people. The dataset can be found at https://github.com/ShivumB/dataset-for-identification-of-queerphobia.
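The abstract describes a two-stage baseline: features are extracted from each comment, and a classifier then labels the comment as queerphobic or non-queerphobic. The abstract does not name the specific models, so the following is only a minimal sketch of that general pattern, assuming bag-of-words features and a multinomial Naive Bayes classifier with Laplace smoothing. The class name `NaiveBayes`, the helper `tokenize`, and the toy comments (which use the placeholder token `INSULT` in place of real abusive language) are hypothetical and are not taken from the paper or its dataset.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude feature extractor)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is every token seen in any training comment.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # Log prior for the class.
            score = math.log(self.doc_counts[c] / total_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                # Laplace-smoothed log likelihood of the token given the class.
                count = self.word_counts[c][tok] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy data: 1 = flagged, 0 = not flagged.
train_texts = [
    "thank you for sharing this video",
    "this video made my day",
    "INSULT people should leave",
    "you are INSULT go away",
]
train_labels = [0, 0, 1, 1]

model = NaiveBayes().fit(train_texts, train_labels)
pred_benign = model.predict("thank you this made my day")
pred_flagged = model.predict("INSULT go away")
```

A real baseline on the 10,000 annotated comments would need a held-out test split and stronger features than whitespace tokens; this sketch only shows how feature extraction and classification fit together.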
Keywords
queerphobia, identification