HABERTOR: An Efficient and Effective Deep Hatespeech Detector

EMNLP 2020, pp. 7486–7502.


Abstract

We present our HABERTOR model for detecting hatespeech in large-scale user-generated content. Inspired by the recent success of the BERT model, we propose several modifications to BERT to enhance the performance on the downstream hatespeech classification task. HABERTOR inherits BERT’s architecture, but is different in four aspects: (i) …

Introduction
Highlights
  • The occurrence of hatespeech has been increasing (Barna, 2019)
  • Based on the above observation and analysis, we aim to investigate whether it is possible to achieve better hatespeech prediction performance than state-of-the-art machine learning classifiers, including classifiers based on the publicly available BERT model, while significantly reducing the number of parameters compared with the BERT model
  • Measurement: We evaluate models on several metrics: Area Under the Curve (AUC), Average Precision (AP), False Positive Rate (FPR), False Negative Rate (FNR), and F1 score
  • We also report FPR at 5% FNR (FPR@5%FNR): allowing 5% of hateful texts to be misclassified as normal ones, we report the FPR at that threshold
  • We presented the HABERTOR model for detecting hatespeech
  • HABERTOR understands the language of the hatespeech datasets better, is 4-5 times faster, uses less than 1/3 of the memory, and has a better performance in hatespeech classification
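The FPR@5%FNR metric described in the highlights can be computed directly from labels and classifier scores. A minimal sketch (assuming NumPy; the function name and interface are illustrative, not the authors' code): sort texts by descending hatefulness score, sweep the cutoff, and read off the FPR at the first cutoff that misses at most 5% of hateful texts.

```python
import numpy as np

def fpr_at_fnr(labels, scores, max_fnr=0.05):
    """False-positive rate at the score threshold where the false-negative
    rate first drops to max_fnr (here 5%, i.e. FPR@5%FNR).

    labels: 1 = hateful, 0 = normal; scores: higher = more hateful."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    order = np.argsort(-scores)        # sort texts by descending score
    labels = labels[order]
    pos = labels.sum()                 # number of hateful texts
    neg = len(labels) - pos            # number of normal texts
    tp = np.cumsum(labels)             # true positives if we cut here
    fp = np.cumsum(1 - labels)         # false positives if we cut here
    fnr = 1 - tp / pos                 # fraction of hateful texts missed
    fpr = fp / neg
    cut = np.argmax(fnr <= max_fnr)    # first cutoff missing <= max_fnr
    return fpr[cut]
```

Lower is better: a model that ranks every hateful text above every normal one reaches FPR@5%FNR of 0.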
Results
  • 5.2.1 Performance comparison

    Table 3 shows the performance of all models on the Yahoo dataset.
  • The authors see that Fermi performed worst among all models.
  • This is mainly because Fermi transfers the pre-trained embeddings from the USE model to an SVM classifier without further fine-tuning.
  • This limits Fermi’s ability to understand domain-specific contexts.
  • BERT-base performed the best among all baselines.
  • Distilled models performed worse than BERT-base because they are compressed versions of BERT-base, their teacher model
Conclusion
  • The authors presented the HABERTOR model for detecting hatespeech.
  • HABERTOR understands the language of the hatespeech datasets better, is 4-5 times faster, uses less than 1/3 of the memory, and has a better performance in hatespeech classification.
  • HABERTOR outperforms 15 state-of-the-art hatespeech classifiers and generalizes well to unseen hatespeech datasets, verifying its efficiency and its effectiveness
Summary
  • Introduction:

    The occurrence of hatespeech has been increasing (Barna, 2019). It has become easier than before to reach a large audience quickly via social media, causing an increase of the temptation for inappropriate behaviors such as hatespeech, and potential damage to social systems.
  • Researchers have developed human-crafted feature-based classifiers (Chatzakou et al., 2017; Davidson et al., 2017; Waseem and Hovy, 2016; MacAvaney et al., 2019) and proposed deep neural network architectures (Zampieri et al., 2019; Gambäck and Sikdar, 2017; Park and Fung, 2017; Badjatiya et al., 2017; Agrawal and Awekar, 2018).
  • However, these approaches may not explore all of the important features for hatespeech detection, ignore the language understanding captured by pre-trained models, or rely on uni-directional language models that read only left-to-right or right-to-left
  • Objectives:

    Based on the above observation and analysis, the authors aim to investigate whether it is possible to achieve better hatespeech prediction performance than state-of-the-art machine learning classifiers, including classifiers based on the publicly available BERT model, while significantly reducing the number of parameters compared with the BERT model.
Tables
  • Table1: Statistics of the three datasets
  • Table2: Parameters Comparison between HABERTORVAFOQF vs. other LMs. “–” indicates not available
  • Table3: Performance of all models that we train on Yahoo train data, test on Yahoo test data and report results on Yahoo News and Yahoo Finance separately. Best baseline is underlined, better results than best baseline are bold
  • Table4: Generalizability of HABERTOR and top baselines. Report AUC, AP, and F1 on each test set
  • Table5: Comparison of the traditional FGM with a fixed and scalar noise magnitude, compared to the FGM with our proposed fine-grained and adaptive noise magnitude. Better results are in bold
  • Table6: Application of our models on the sentiment classification task using Amazon Prime Pantry reviews
  • Table7: Ablation study of HABERTOR on Yahoo dataset (i.e. both Yahoo News + Finance, to save space). Default results are in bold. Better results compared to the default one are underlined
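Table 5's contrast can be illustrated with a minimal sketch (assuming NumPy; function names are illustrative). Traditional FGM (Miyato et al., 2017) perturbs the token embeddings along the loss gradient with one fixed scalar magnitude, while HABERTOR's variant assigns fine-grained, adaptive magnitudes; the per-token version below uses fixed illustrative magnitudes where the paper's are learnable.

```python
import numpy as np

def fgm_perturb(embeddings, grad, eps=1.0):
    """Traditional FGM (Miyato et al., 2017): one scalar magnitude eps
    shared by the whole input; grad is the loss gradient w.r.t. the
    token embeddings, shape (seq_len, dim)."""
    return embeddings + eps * grad / (np.linalg.norm(grad) + 1e-12)

def fgm_perturb_fine_grained(embeddings, grad, eps_per_token):
    """Sketch of a fine-grained variant: one magnitude per token
    position (shape (seq_len,)), fixed here for illustration, whereas
    HABERTOR learns these magnitudes during adversarial training."""
    norms = np.linalg.norm(grad, axis=1, keepdims=True)
    return embeddings + eps_per_token[:, None] * grad / (norms + 1e-12)
```

The design difference is only where the noise magnitude lives: a single hyperparameter in the traditional setting versus a vector of per-token values in the fine-grained one.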
Funding
  • This work was supported in part by NSF grant CNS-1755536
Study subjects and analysis
hateful tweets: 5,054
To further validate the generalizability of HABERTOR, we perform transfer-learning experiments on two other publicly available hatespeech datasets: Twitter (Waseem and Hovy, 2016) and Wikipedia (i.e., Wiki) (Wulczyn et al., 2017). The Twitter dataset consists of 16K annotated tweets, including 5,054 hateful tweets (i.e., 31%). The Wiki dataset has 115K labeled discussion comments from English Wikipedia talk pages, including 13,590 hateful comments (i.e., 12%)

datasets: 3
The Wiki dataset has 115K labeled discussion comments from English Wikipedia talk pages, including 13,590 hateful comments (i.e., 12%). The statistics of the three datasets are shown in Table 1. Train/Dev/Test split: We split each dataset into train/dev/test sets with a 70%/10%/20% ratio
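The 70%/10%/20% split described above can be sketched as follows (assuming NumPy; the function name is illustrative, not the authors' code):

```python
import numpy as np

def train_dev_test_split(n_examples, seed=0):
    """Shuffle example indices and split them 70%/10%/20% into
    train/dev/test sets, as described for the three datasets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_examples)
    n_train = int(0.7 * n_examples)
    n_dev = int(0.1 * n_examples)
    return (idx[:n_train],
            idx[n_train:n_train + n_dev],
            idx[n_train + n_dev:])
```

Fixing the seed keeps the split reproducible across runs, so dev-set model selection and test-set reporting use disjoint, stable subsets.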

References
  • Sweta Agrawal and Amit Awekar. 2018. Deep learning for detecting cyberbullying across multiple social media platforms. In ECIR, pages 141–153.
  • Aymé Arango, Jorge Pérez, and Bárbara Poblete. 2019. Hate speech detection is not as easy as you may think: A closer look at model validation. In SIGIR, pages 45–54.
  • Pinkesh Badjatiya, Shashank Gupta, Manish Gupta, and Vasudeva Varma. 2017. Deep learning for hate speech detection in tweets. In WWW Companion, pages 759–760.
  • Barna. 2019. U.S. adults believe hate speech has increased — mainly online. Research releases in culture & media. https://www.barna.com/research/hate-speech-increased/.
  • Valerio Basile, Cristina Bosco, Elisabetta Fersini, Debora Nozza, Viviana Patti, Francisco Manuel Rangel Pardo, Paolo Rosso, and Manuela Sanguinetti. 2019. SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter. In SemEval, pages 54–63.
  • Nicholas Carlini and David Wagner. 2017. Towards evaluating the robustness of neural networks. In SP, pages 39–57.
  • Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, et al. 2018. Universal sentence encoder for English. In EMNLP: System Demonstrations, pages 169–174.
  • Despoina Chatzakou, Nicolas Kourtellis, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, and Athena Vakali. 2017. Mean birds: Detecting aggression and bullying on Twitter. In WebSci, pages 13–22.
  • Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP.
  • Kevin Clark, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. 2019. What does BERT look at? An analysis of BERT's attention. arXiv preprint arXiv:1906.04341.
  • Alexis Conneau, Holger Schwenk, Loïc Barrault, and Yann LeCun. 2017. Very deep convolutional networks for text classification. In EACL, pages 1107–1116.
  • Maral Dadvar and Kai Eckert. 2018. Cyberbullying detection in social networks using deep learning based models; a reproducibility study. arXiv preprint arXiv:1812.08046.
  • Maral Dadvar, Dolf Trieschnigg, and Franciska de Jong. 2014. Experts and machines against bullies: A hybrid approach to detect cyberbullies. In CCAI, pages 275–281.
  • Zihang Dai, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. 2019. Transformer-XL: Attentive language models beyond a fixed-length context. In ACL.
  • Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. In ICWSM.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT.
  • Karthik Dinakar, Roi Reichart, and Henry Lieberman. 2011. Modeling the detection of textual cyberbullying. In ICWSM.
  • Mai ElSherief, Vivek Kulkarni, Dana Nguyen, William Yang Wang, and Elizabeth Belding. 2018a. Hate lingo: A target-based linguistic analysis of hate speech in social media. In ICWSM.
  • Mai ElSherief, Shirin Nilizadeh, Dana Nguyen, Giovanni Vigna, and Elizabeth Belding. 2018b. Peer to peer hate: Hate speech instigators and their targets. In ICWSM.
  • Björn Gambäck and Utpal Kumar Sikdar. 2017. Using convolutional neural networks to classify hate-speech. In ACL Workshop on Abusive Language Online, pages 85–90.
  • Tommi Gröndahl, Luca Pajola, Mika Juuti, Mauro Conti, and N. Asokan. 2018. All you need is "love": Evading hate speech detection. In AISec, pages 2–12.
  • Vijayasaradhi Indurthi, Bakhtiyar Syed, Manish Shrivastava, Nikhil Chakravartula, Manish Gupta, and Vasudeva Varma. 2019. FERMI at SemEval-2019 task 5: Using sentence embeddings to identify hate speech against immigrants and women in Twitter. In SemEval, pages 70–74.
  • Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, and Qun Liu. 2019. TinyBERT: Distilling BERT for natural language understanding. arXiv preprint arXiv:1909.10351.
  • Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, and Tomas Mikolov. 2016. FastText.zip: Compressing text classification models. arXiv preprint arXiv:1612.03651.
  • Yoon Kim. 2014. Convolutional neural networks for sentence classification. In EMNLP.
  • Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Taku Kudo and John Richardson. 2018. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In EMNLP: System Demonstrations.
  • Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2016. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.
  • Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Recurrent convolutional neural networks for text classification. In AAAI.
  • Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A lite BERT for self-supervised learning of language representations. In ICLR.
  • Zhouhan Lin, Minwei Feng, Cícero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. A structured self-attentive sentence embedding. In ICLR.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  • Sean MacAvaney, Hao-Ren Yao, Eugene Yang, Katina Russell, Nazli Goharian, and Ophir Frieder. 2019. Hate speech detection: Challenges and solutions. PLoS ONE, 14(8):e0221152.
  • Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. 2015. Image-based recommendations on styles and substitutes. In SIGIR, pages 43–52.
  • Takeru Miyato, Andrew M. Dai, and Ian Goodfellow. 2017. Adversarial training methods for semi-supervised text classification. In ICLR.
  • Guanyi Mou, Pengyi Ye, and Kyumin Lee. 2020. SWE2: Subword enriched and significant word emphasized framework for hate speech detection. In CIKM.
  • Alex Nikolov and Victor Radivchev. 2019. Nikolov-Radivchev at SemEval-2019 task 6: Offensive tweet classification with BERT and ensembles. In SemEval, pages 691–695.
  • Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. 2016. Abusive language detection in online user content. In WWW.
  • Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linares, Chiheb Trabelsi, Renato De Mori, and Yoshua Bengio. 2019. Quaternion recurrent neural networks. In ICLR.
  • Ji Ho Park and Pascale Fung. 2017. One-step and two-step classification for abusive language detection on Twitter. In ACL Workshop on Abusive Language Online.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In EMNLP.
  • Kelly Reynolds, April Kontostathis, and Lynne Edwards. 2011. Using machine learning to detect cyberbullying. In ICMLA, volume 2, pages 241–244.
  • Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
  • Yi Tay, Aston Zhang, Anh Tuan Luu, Jinfeng Rao, Shuai Zhang, Shuohang Wang, Jie Fu, and Siu Cheung Hui. 2019. Lightweight and efficient neural natural language processing with quaternion networks. In ACL, pages 1494–1503.
  • Cynthia Van Hee, Els Lefever, Ben Verhoeven, Julie Mennes, Bart Desmet, Guy De Pauw, Walter Daelemans, and Véronique Hoste. 2015. Automatic detection and prevention of cyberbullying. In HUSO, pages 13–18.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS, pages 5998–6008.
  • Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In NAACL Workshop, pages 88–93.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
  • Ellery Wulczyn, Nithum Thain, and Lucas Dixon. 2017. Ex machina: Personal attacks seen at scale. In WWW, pages 1391–1399.
  • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019. SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval). In SemEval.
  • Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, and Bill Dolan. 2020. DialoGPT: Large-scale generative pre-training for conversational response generation. In ACL.
  • Ziqi Zhang and Lei Luo. 2019. Hate speech detection: A solved problem? The challenging case of long tail on Twitter. Semantic Web, 10(5):925–945.
  • Ziqi Zhang, David Robinson, and Jonathan Tepper. 2018. Detecting hate speech on Twitter using a convolution-GRU based deep neural network. In ESWC, pages 745–760.
  • Appendix A.1 (Parameter Estimation): the intuition for pretraining HABERTOR with language-model tasks is that the [CLS] embedding vector summarizes information of all other tokens via the attention Transformer network (Vaswani et al., 2017).
  • Activation function on quaternions: similar to (Tay et al., 2019; Parcollet et al., 2019), a split activation function is used because of its stability and simplicity.
Authors
Yifan Hu
Changwei Hu
Kevin Yen
Fei Tan
Se Rim Park