Generating Counter Narratives against Online Hate Speech: Data and Strategies

ACL 2020, pp. 1177–1190.

Abstract:

Recently, research has started focusing on avoiding the undesired effects that come with content moderation, such as censorship and overblocking, when dealing with hatred online. The core idea is to directly intervene in the discussion with textual responses that are meant to counter the hate content and prevent it from further spreading. […]

Introduction
  • Owing to the upsurge in the use of social media platforms over the past decade, Hate Speech (HS) has become a pervasive issue, spreading quickly and widely.
  • The standard approaches to prevent online hate spreading include the suspension of user accounts or deletion of hate comments from the social media platforms (SMPs), paving the way for accusations of censorship and overblocking.
  • Still, the authors believe that they must overstep reactive identify-and-delete strategies and responsively intervene in the conversations (Bielefeldt et al., 2011; Jurgens et al., 2019).
  • In this line of action, some Non-Governmental Organizations (NGOs) train operators to intervene in online hateful conversations by writing counter-narratives (CNs).
  • A CN should follow guidelines similar to those of the ‘Get the Trolls Out’ project, in order to avoid escalating the hatred in the discussion.
Highlights
  • Owing to the upsurge in the use of social media platforms over the past decade, Hate Speech (HS) has become a pervasive issue, spreading quickly and widely.
  • We propose using the recent large-scale unsupervised language model GPT-2 (Radford et al., 2019), which is capable of generating coherent text and can be fine-tuned and/or conditioned on various NLG tasks.
  • To counter hatred online and avoid the undesired effects that come with content moderation, intervening in the discussion directly with textual responses is considered a viable solution.
  • Considering the aforementioned limitations, we presented a study on how to reduce data collection effort, using a mix of several strategies.
  • We show promising results obtained by replacing crowd-filtering with an automatic classifier.
  • We believe that the proposed framework can be useful for other NLG tasks such as paraphrase generation or text simplification.
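The fine-tuning/conditioning setup mentioned in the highlights can be sketched as a simple serialization of HS/CN pairs into single training strings for an autoregressive model such as GPT-2. The delimiter tokens and helper names below are illustrative assumptions, not the paper's exact scheme:

```python
# Sketch: serializing hate-speech/counter-narrative (HS/CN) pairs for
# fine-tuning an autoregressive LM. The <hate>/<counter> delimiters are
# hypothetical; only <|endoftext|> is GPT-2's actual end-of-text token.

HS_TOKEN = "<hate>"
CN_TOKEN = "<counter>"
EOS_TOKEN = "<|endoftext|>"

def to_training_example(hate_speech: str, counter_narrative: str) -> str:
    """Concatenate an HS/CN pair into one conditioning string.

    During fine-tuning the model sees the whole string; the CN segment is
    what it learns to produce after the CN_TOKEN delimiter.
    """
    return f"{HS_TOKEN} {hate_speech} {CN_TOKEN} {counter_narrative} {EOS_TOKEN}"

def to_prompt(hate_speech: str) -> str:
    """Inference-time prompt: the model completes the counter-narrative."""
    return f"{HS_TOKEN} {hate_speech} {CN_TOKEN}"

example = to_training_example(
    "Migrants are ruining this country.",
    "Migrants contribute to the economy and enrich our culture.",
)
```

At generation time the prompt ends right after the CN delimiter, so sampling continues with a candidate counter-narrative that can then be filtered and reviewed.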
Results
  • The authors aim to keep the balance between precision and recall, and thus opted for the F1 score for model selection.
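The F1-based model selection described above can be sketched as follows; the candidate configurations and their precision/recall values are hypothetical numbers for illustration:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical classifier configurations with (precision, recall).
candidates = {
    "config_a": (0.80, 0.60),  # high precision, lower recall
    "config_b": (0.70, 0.72),  # more balanced
}

# F1 rewards balance: config_b wins despite lower precision.
best = max(candidates, key=lambda name: f1_score(*candidates[name]))
```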
Conclusion
  • To counter hatred online and avoid the undesired effects that come with content moderation, intervening in the discussion directly with textual responses is considered a viable solution.
  • In this scenario, automation strategies, such as natural language generation, are necessary to help NGO operators in their countering effort.
  • The authors believe that the proposed framework can be useful for other NLG tasks such as paraphrase generation or text simplification.
Tables
  • Table1: Diversity analysis of the three datasets. Semantic diversity is reported in terms of CN type percentages, Lexical diversity in terms of Repetition Rate (RR - average over 5 shuffles)
  • Table2: Some examples of the categories relevant to our analysis: Hostile from the CRAWL dataset, Denouncing from CROWD, Fact (other) from NICHE
  • Table3: Comparison of different approaches proposed in the literature according to the main characteristics required for the dataset
  • Table4: Evaluation results of the best author configuration with different datasets. Novelty is computed w.r.t. the corresponding training set, RR on the produced test output
  • Table5: Percentage of filtered pairs according to various filtering conditions
  • Table6: F1, Precision and Recall results for the two main classifier configurations we tested
  • Table7: Results for CN collection under various configurations. RR for ‘no suggestion’ is computed on the NICHE dataset and the time needed is the one reported in (Chung et al., 2019). Time is expressed in seconds. Pairs_selec indicates the percentage of original author pairs that have been passed to the expert for reviewing, Pairs_final indicates the percentage of selected pairs that have been accepted or modified by the expert. Crowd_time is computed considering that annotators gave a score every 35 seconds, and we required two judgments per pair
  • Table8: Randomly sampled CNs generated from GPT-2 and TRF models trained on CROWD dataset
  • Table9: Randomly sampled CNs generated from GPT-2 and TRF models trained on NICHE dataset
  • Table10: Randomly sampled CNs, generated from GPT-2, filtered from various reviewer configurations
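Repetition Rate (RR), reported in Tables 1, 4 and 7 as a lexical diversity measure, is defined by Cettolo et al. (2014) as the rate of non-singleton n-grams, averaged (geometrically) over n-gram orders and computed over fixed-size sliding windows. The sketch below is a simplified, global approximation of that idea, not the exact windowed metric:

```python
from collections import Counter
from math import prod

def repetition_rate(tokens, max_n=4):
    """Simplified Repetition Rate: geometric mean, over n-gram orders
    1..max_n, of the fraction of n-gram occurrences whose n-gram appears
    more than once. Lower values indicate more lexically diverse text.
    (The original metric averages over sliding windows of fixed size;
    this sketch computes a single global value for illustration.)
    """
    rates = []
    for n in range(1, max_n + 1):
        ngrams = Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
        total = sum(ngrams.values())
        repeated = sum(c for c in ngrams.values() if c > 1)
        rates.append(repeated / total if total else 0.0)
    return prod(rates) ** (1 / max_n)
```

A fully repetitive token stream scores 1.0, while text with no repeated unigrams scores 0.0 under this simplified definition.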
Funding
  • This work was partly supported by the HATEMETER project within the EU Rights, Equality and Citizenship Programme 2014-2020.
References
  • Susan Benesch. 2014. Countering dangerous speech: New ideas for genocide prevention. Washington, DC: United States Holocaust Memorial Museum.
  • Susan Benesch, Derek Ruths, Kelly P. Dillon, Haji Mohammad Saleem, and Lucas Wright. 2016. Counterspeech on Twitter: A field study. Dangerous Speech Project. Available at: https://dangerousspeech.org/counterspeech-ontwitter-a-field-study/.
  • Nicola Bertoldi, Mauro Cettolo, and Marcello Federico. 2013. Cache-based online adaptation for machine translation enhanced computer assisted translation. In MT-Summit, pages 35–42.
  • Heiner Bielefeldt, Frank La Rue, and Githu Muigai. 2011. OHCHR expert workshops on the prohibition of incitement to national, racial or religious hatred. In Expert workshop on the Americas.
  • John Blatz, Erin Fitzgerald, George Foster, Simona Gandrabur, Cyril Goutte, Alex Kulesza, Alberto Sanchis, and Nicola Ueffing. 2004. Confidence estimation for machine translation. In Coling 2004: Proceedings of the 20th International Conference on Computational Linguistics, pages 315–321.
  • Pete Burnap and Matthew L. Williams. 2015. Cyber hate speech on Twitter: An application of machine classification and statistical modeling for policy and decision making. Policy & Internet, 7(2):223–242.
  • Pete Burnap and Matthew L. Williams. 2016. Us and them: Identifying cyber hate on Twitter across multiple protected characteristics. EPJ Data Science, 5(1):11.
  • Mauro Cettolo, Nicola Bertoldi, and Marcello Federico. 2014. The repetition rate of text as a predictor of the effectiveness of machine translation adaptation. In Proceedings of the 11th Biennial Conference of the Association for Machine Translation in the Americas (AMTA 2014), pages 166–179.
  • Yi-Ling Chung, Elizaveta Kuzmenko, Serra Sinem Tekiroglu, and Marco Guerini. 2019. CONAN - COunter NArratives through Nichesourcing: a multilingual dataset of responses to fight online hate speech. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2819–2829, Florence, Italy. Association for Computational Linguistics.
  • Fabio Del Vigna, Andrea Cimino, Felice Dell'Orletta, Marinella Petrocchi, and Maurizio Tesconi. 2017. Hate me, hate me not: Hate speech detection on Facebook.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
  • Nouha Dziri, Ehsan Kamalloo, Kory W. Mathewson, and Osmar Zaiane. 2018. Augmenting neural response generation with context-aware topical attention. arXiv preprint arXiv:1811.01063.
  • Paula Fortuna and Sérgio Nunes. 2018. A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR), 51(4):85.
  • Ona de Gibert, Naiara Perez, Aitor García-Pablos, and Montse Cuadros. 2018. Hate speech dataset from a white supremacy forum. EMNLP 2018, page 11.
  • Njagi Dennis Gitari, Zhang Zuping, Hanyurwimfura Damien, and Jun Long. 2015. A lexicon-based approach for hate speech detection. International Journal of Multimedia and Ubiquitous Engineering, 10(4):215–230.
  • Sergey Golovanov, Rauf Kurbanov, Sergey Nikolenko, Kyryl Truskovskyi, Alexander Tselousov, and Thomas Wolf. 2019. Large-scale transfer learning for natural language generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6053–6058.
  • Ari Holtzman, Jan Buys, Maxwell Forbes, and Yejin Choi. 2019. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
  • Homa Hosseinmardi, Sabrina Arredondo Mattson, Rahat Ibn Rafiq, Richard Han, Qin Lv, and Shivakant Mishra. 2015. Detection of cyberbullying incidents on the Instagram social network. arXiv preprint arXiv:1503.03909.
  • Xinyu Hua, Zhe Hu, and Lu Wang. 2019. Argument generation with retrieval, planning, and realization. arXiv preprint arXiv:1906.03717.
  • David Jurgens, Libby Hemphill, and Eshwar Chandrasekharan. 2019. A just and comprehensive strategy for using NLP to address online abuse. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3658–3666, Florence, Italy. Association for Computational Linguistics.
  • Filip Klubička and Raquel Fernández. 2018. Examining a hate speech corpus for hate speech detection and popularity prediction. In LREC.
  • Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, and Marcos Zampieri. 2018. Benchmarking aggression identification in social media. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018), pages 1–11.
  • Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 110–119, San Diego, California. Association for Computational Linguistics.
  • Ruli Manurung, Graeme Ritchie, and Henry Thompson. 2008. An implementation of a flexible author-reviewer model of generation using genetic algorithms. In Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation, pages 272–281, Cebu City, Philippines.
  • Binny Mathew, Navish Kumar, Pawan Goyal, Animesh Mukherjee, et al. 2018. Analyzing the hate and counter speech accounts on Twitter. arXiv preprint arXiv:1812.02712.
  • Binny Mathew, Punyajoy Saha, Hardik Tharad, Subham Rajgaria, Prajwal Singhania, Suman Kalyan Maity, Pawan Goyal, and Animesh Mukherjee. 2019. Thou shalt not hate: Countering online hate speech. In Proceedings of the International AAAI Conference on Web and Social Media, volume 13, pages 369–380.
  • Kevin Munger. 2017. Tweetment effects on the tweeted: Experimentally reducing racist harassment. Political Behavior, 39(3):629–649.
  • Jon Oberlander and Chris Brew. 2000. Stochastic text generation. Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 358(1769):1373–1387.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318.
  • Jing Qian, Anna Bethke, Yinyin Liu, Elizabeth Belding, and William Yang Wang. 2019. A benchmark dataset for learning to intervene in online hate speech. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4757–4766, Hong Kong, China. Association for Computational Linguistics.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8).
  • Brian Richards. 1987. Type/token ratios: What do they really tell us? Journal of Child Language, 14(2):201–209.
  • Björn Ross, Michael Rist, Guillermo Carbonell, Benjamin Cabrera, Nils Kurowsky, and Michael Wojatzki. 2017. Measuring the reliability of hate speech annotations: The case of the European refugee crisis. arXiv preprint arXiv:1701.08118.
  • Carla Schieb and Mike Preuss. 2016. Governing hate speech by means of counterspeech on Facebook. In 66th ICA Annual Conference, Fukuoka, Japan, pages 1–23.
  • Anna Schmidt and Michael Wiegand. 2017. A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pages 1–10.
  • Leandro Araújo Silva, Mainack Mondal, Denzil Correa, Fabrício Benevenuto, and Ingmar Weber. 2016. Analyzing the targets of hate in online social media. In ICWSM, pages 687–690.
  • Tanya Silverman, Christopher J. Stewart, Jonathan Birdwell, and Zahed Amanullah. 2016. The impact of counter-narratives. Institute for Strategic Dialogue, London. https://www.strategicdialogue.org/wp-content/uploads/2016/08/Impact-ofCounter-Narratives ONLINE.pdf.
  • Lucia Specia and Atefeh Farzindar. 2010. Estimating machine translation post-editing effort with HTER. In Proceedings of the Second Joint EM+/CNGL Workshop Bringing MT to the User: Research on Integrating MT in the Translation Industry (JEC 10), pages 33–41.
  • Rachele Sprugnoli, Stefano Menini, Sara Tonelli, Filippo Oncini, and Enrico Piras. 2018. Creating a WhatsApp dataset to study pre-teen cyberbullying. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 51–59.
  • Scott R. Stroud and William Cox. 2018. The varieties of feminist counterspeech in the misogynistic online world. In Mediating Misogyny, pages 293–310. Springer.
  • Marco Turchi, Matteo Negri, and Marcello Federico. 2013. Coping with the subjectivity of human judgements in MT quality estimation. In Proceedings of the Eighth Workshop on Statistical Machine Translation, pages 240–251.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
  • Ke Wang and Xiaojun Wan. 2018. SentiGAN: Generating sentimental texts via mixture adversarial networks. In IJCAI, pages 4446–4452.
  • William Warner and Julia Hirschberg. 2012. Detecting hate speech on the world wide web. In Proceedings of the Second Workshop on Language in Social Media, pages 19–26. Association for Computational Linguistics.
  • Zeerak Waseem. 2016. Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the First Workshop on NLP and Computational Social Science, pages 138–142.
  • Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop, pages 88–93.
  • Thomas Wolf, Victor Sanh, Julien Chaumond, and Clement Delangue. 2019. TransferTransfo: A transfer learning approach for neural network based conversational agents. arXiv preprint arXiv:1901.08149.
  • Guang Xiang, Bin Fan, Ling Wang, Jason Hong, and Carolyn Rose. 2012. Detecting offensive tweets via topical feature discovery over a large scale Twitter corpus. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pages 1980–1984. ACM.
  • Jingjing Xu, Xuancheng Ren, Junyang Lin, and Xu Sun. 2018. Diversity-promoting GAN: A cross-entropy based generative adversarial network for diversified text generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3940–3949, Brussels, Belgium. Association for Computational Linguistics.
  • Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2019. BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675.