FIND: Human in the Loop Debugging Deep Text Classifiers

EMNLP 2020 (2020)

TL;DR: We propose FIND – a framework which enables humans to debug deep learning text classifiers by disabling irrelevant hidden features.


Abstract

Since obtaining a perfect training dataset (i.e., a dataset which is considerably large, unbiased, and well-representative of unseen cases) is hardly possible, many real-world text classifiers are trained on the available, yet imperfect, datasets. These classifiers are thus likely to have undesirable properties. For instance, they may have...

Introduction
  • With sufficient and high-quality training data, deep learning models can perform incredibly well (Zhang et al, 2015; Wang et al, 2019).
  • The available datasets are small, full of regular but irrelevant words, and contain unintended biases (Wiegand et al, 2019; Gururangan et al, 2018).
  • These can lead to suboptimal models with undesirable properties.
  • The models may have biases against some sub-populations or may not work effectively in the wild as they overfit the imperfect training data
Highlights
  • Deep learning has become the dominant approach to address most Natural Language Processing (NLP) tasks, including text classification
  • The rank C feature in Figure 5 got a negative score because some participants believed that this word cloud was relevant to the positive class, but the model used this feature as evidence for the negative class (Positive:Negative = 0.209:0.385)
  • After investigating the word clouds of the convolutional neural networks (CNNs) features, we found that some of them detected patterns containing both gender-related terms and occupation-related terms such as “his surgical expertise” and “she supervises nursing students”
  • Most of the Mechanical Turk (MTurk) participants answered that these word clouds were relevant to the occupations, and the corresponding features were not disabled
  • Using the proposed framework on CNN text classifiers, we found that (i) word clouds generated by running layer-wise relevance propagation (LRP) on the training data accurately revealed the behaviors of CNN features, (ii) some of the learned features might be more useful to the task than others, and (iii) disabling the irrelevant or harmful features could improve the model's predictive performance and reduce unintended biases in the model (a minimal sketch of this feature-disabling step follows this list)
  • What is an effective way to understand each feature? We exemplified this with two word clouds representing each bidirectional LSTM (BiLSTM) feature in Appendix C, and we plan to experiment with advanced visualizations such as LSTMVis (Strobelt et al, 2018) in the future
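The feature-disabling step above can be pictured as applying a fixed binary mask to the final feature vector right before the output layer. Below is a minimal, illustrative sketch (not the authors' released code); the feature count, the disabled indices, and the use of tf.keras are assumptions made for the example.

```python
# Minimal sketch of feature disabling: multiply the final representation by a
# fixed binary mask before the output layer. Sizes and disabled indices are
# illustrative, not taken from the paper.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_FEATURES = 30          # dimensionality of the final representation
NUM_CLASSES = 2

# 1 keeps a feature, 0 disables it (features judged irrelevant or harmful by
# the human debuggers would be zeroed here).
mask = np.ones(NUM_FEATURES, dtype="float32")
mask[[3, 7, 12]] = 0.0     # hypothetical indices of features to disable

feat_in = layers.Input(shape=(NUM_FEATURES,), name="final_representation")
masked = layers.Lambda(lambda f: f * tf.constant(mask), name="disable")(feat_in)
probs = layers.Dense(NUM_CLASSES, activation="softmax")(masked)
debugged_head = Model(feat_in, probs)
```

Depending on the setup, the output layer may then be re-trained with the mask in place while the earlier layers stay fixed.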
Methods
  • All datasets and their splits used in the experiments are listed in Table 1.
  • The authors ran and improved three models, using different random seeds, independently of one another, and the reported results are the average of the three runs.
  • The authors used 1D CNNs with the same structures for all the tasks and datasets.
  • The authors used pre-trained 300-dim GloVe vectors (Pennington et al, 2014) as non-trainable weights in the embedding layers.
  • The authors used iNNvestigate (Alber et al, 2018) to run LRP on CNN features (a minimal sketch of this setup follows this list).
  • Each question was answered by ten workers and the answers were aggregated using majority votes or average scores depending on the question type
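A minimal sketch of this setup, assuming a tf.keras model: a 1D CNN over frozen GloVe embeddings, with LRP run on one hidden feature via iNNvestigate. The filter count, vocabulary size, and random stand-in for the GloVe matrix are illustrative, and the exact iNNvestigate calls (analyzer name, neuron selection argument) may vary across library versions.

```python
# Sketch: 1D CNN text classifier with non-trainable GloVe embeddings, plus LRP
# (via iNNvestigate) on a single hidden feature. All sizes are illustrative.
import numpy as np
import innvestigate
from tensorflow import keras
from tensorflow.keras import layers

VOCAB, EMB_DIM, MAX_LEN, NUM_CLASSES = 20000, 300, 150, 2
glove_matrix = np.random.rand(VOCAB, EMB_DIM).astype("float32")   # stand-in for real GloVe

# Embedding stage kept separate so that relevance is attributed to word positions.
tok_in = layers.Input(shape=(MAX_LEN,), dtype="int32")
emb_out = layers.Embedding(VOCAB, EMB_DIM, weights=[glove_matrix],
                           trainable=False)(tok_in)                # frozen GloVe
embedder = keras.Model(tok_in, emb_out)

# 1D CNN classifier over the embedded sequence.
emb_in = layers.Input(shape=(MAX_LEN, EMB_DIM))
conv = layers.Conv1D(filters=10, kernel_size=3, activation="relu")(emb_in)
feats = layers.GlobalMaxPooling1D(name="features")(conv)           # final representation
probs = layers.Dense(NUM_CLASSES, activation="softmax")(feats)
classifier = keras.Model(emb_in, probs)

# LRP on one hidden feature: analyze a sub-model that ends at the feature layer.
feature_model = keras.Model(emb_in, feats)
analyzer = innvestigate.create_analyzer("lrp.epsilon", feature_model,
                                        neuron_selection_mode="index")
x_emb = embedder.predict(np.random.randint(0, VOCAB, size=(4, MAX_LEN)))
rel = analyzer.analyze(x_emb, neuron_selection=0)   # relevance of each word to feature 0
word_relevance = rel.sum(axis=-1)                   # shape: (batch, MAX_LEN)
```

Aggregating such word-level relevance scores over the training set is what produces the word cloud describing each feature.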
Results
  • Results and Discussions

    Figure 4 shows the distribution of average feature scores from one of the three CNN instances for the Yelp dataset.
  • The authors asked one annotator to consider all the word clouds again and disable every feature for which the prominent n-gram patterns contained any gender-related terms, no matter whether the patterns also detected occupation-related terms (a toy sketch of this policy follows this list).
  • With this new disabling policy, 12 out of 30 features were disabled on average, and the model biases further decreased, as shown in Figure 7 (Debugged (One)).
  • Forcing the models to use only relevant features increased the macro F1 on the Religion dataset.
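The stricter disabling policy described above can be expressed as a simple filter over each feature's prominent n-gram patterns. The sketch below is a toy illustration; the gender-term list and the `feature_ngrams` structure are hypothetical.

```python
# Toy sketch of the stricter policy: disable any feature whose prominent
# n-gram patterns contain a gender-related term, regardless of whether the
# patterns also capture occupation words. The term list is illustrative.
GENDER_TERMS = {"he", "she", "his", "her", "him", "hers", "mr", "mrs", "ms"}

def features_to_disable(feature_ngrams):
    """feature_ngrams: dict mapping a feature index to its prominent n-grams."""
    disabled = []
    for idx, ngrams in feature_ngrams.items():
        tokens = {tok.lower().strip(".,") for ng in ngrams for tok in ng.split()}
        if tokens & GENDER_TERMS:
            disabled.append(idx)
    return sorted(disabled)

# Example patterns taken from the highlights above: features 4 and 9 are disabled.
print(features_to_disable({4: ["she supervises nursing students"],
                           9: ["his surgical expertise"],
                           17: ["board certified surgeon"]}))    # -> [4, 9]
```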
Conclusion
  • The authors proposed FIND, a framework which enables humans to debug deep text classifiers by disabling irrelevant or harmful features.
  • Using ReLU as the activation function in LSTM cells renders the features non-negative, so each feature can be summarized using one word cloud, which is more practical for debugging.
  • The authors believe that the work will inspire other researchers to foster advances in both topics towards the more tangible goal of model debugging
Tables
  • Table 1: Datasets used in the experiments
  • Table 2: Results (Average ± SD) of Experiment 1: Yelp, CNNs
  • Table 3: Results (Average ± SD) of Experiment 1: Amazon Products, CNNs
  • Table 4: Extra results (Average ± SD) of Experiment 1: Yelp, BiLSTMs
  • Table 5: Extra results (Average ± SD) of Experiment 1: Amazon Products, BiLSTMs
  • Table 6: Results (Average ± SD) of Experiment 2: Biosbias, CNNs
  • Table 7: Results (Average ± SD) of Experiment 2: Waseem & Wikitoxic, CNNs
  • Table 8: Results (Average ± SD) of Experiment 3: 20Newsgroups & Religion, CNNs
  • Table 9: Results (Average ± SD) of Experiment 3: Sentiment Analysis (Amazon Clothes), CNNs
Related Work
  • Analyzing deep NLP models – There has been substantial work in gaining a better understanding of complex, deep neural NLP models. By visualizing dense hidden vectors, Li et al (2016) found that some dimensions of the final representation learned by recurrent neural networks capture the effect of intensification and negation in the input text. Karpathy et al (2015) revealed the existence of interpretable cells in a character-level LSTM model for language modelling. For example, they found a cell acting as a line length counter and cells checking if the current letter is inside a parenthesis or a quote. Jacovi et al (2018) presented interesting findings about CNNs for text classification, including the fact that one convolutional filter may detect more than one n-gram pattern and may also suppress negative n-grams. Many recent papers studied several types of knowledge in BERT (Devlin et al, 2019), a deep transformer-based model for language understanding, and found that syntactic information is mostly captured in the middle BERT layers while the final BERT layers are the most task-specific (Rogers et al, 2020). Inspired by these findings, we make the assumption that each dimension of the final representation (i.e., the vector before the output layer) captures patterns or qualities in the input which are useful for classification. Therefore, understanding the roles of these dimensions (we refer to them as features) is a prerequisite for effective human-in-the-loop model debugging, and we exploit an explanation method to gain such an understanding.
  • Explaining predictions from text classifiers – Several methods have been devised to generate explanations supporting classifications in many forms, such as natural language texts (Liu et al, 2019), rules (Ribeiro et al, 2018), extracted rationales (Lei et al, 2016), and attribution scores (Lertvittayakumjorn and Toni, 2019). Some explanation methods, such as LIME (Ribeiro et al, 2016) and SHAP (Lundberg and Lee, 2017), are model-agnostic and do not require access to model parameters. Other methods access the model architectures and parameters to generate the explanations, such as DeepLIFT (Shrikumar et al, 2017) and LRP (layer-wise relevance propagation) (Bach et al, 2015; Arras et al, 2016). In this work, we use LRP to explain not the predictions but the learned features, so as to expose the model behavior to humans and enable informed model debugging.
Funding
  • The same metrics for the Religion dataset are 0.731 and 0.799. This shows that disabling irrelevant features mildly undermined the predictive performance on the in-distribution dataset, but clearly enhanced the performance on the out-of-distribution dataset (see Figure 8, left). This is especially evident for the Atheism class, for which the F1 score increased by around 15% absolute.
Study Subjects and Analysis
workers: 10
Finally, we used Amazon Mechanical Turk (MTurk) to collect crowdsourced responses for selecting features to disable. Each question was answered by ten workers and the answers were aggregated using majority votes or average scores depending on the question type (as explained next; a minimal aggregation sketch follows below).
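A minimal sketch of this aggregation step, assuming two question types: a categorical question resolved by majority vote over the ten workers and a numeric rating resolved by the mean. The type labels below are illustrative.

```python
# Aggregate ten MTurk answers per question: majority vote for categorical
# questions, average for numeric ratings.
from collections import Counter
from statistics import mean

def aggregate(answers, question_type):
    """answers: the ten worker responses collected for one question."""
    if question_type == "categorical":     # e.g., which class a word cloud is relevant to
        return Counter(answers).most_common(1)[0][0]
    if question_type == "rating":          # e.g., a numeric relevance score
        return mean(answers)
    raise ValueError(f"unknown question type: {question_type}")

print(aggregate(["positive"] * 6 + ["negative"] * 4, "categorical"))   # -> positive
print(aggregate([3, 4, 5, 4, 4, 3, 5, 4, 4, 4], "rating"))             # -> 4
```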

out-of-distribution datasets: 3
We used, as a training dataset, Amazon Clothes, with reviews of clothing, shoes, and jewelry products (He and McAuley, 2016), and as test sets three out-of-distribution datasets – Amazon Music (He and McAuley, 2016), Amazon Mixed (Zhang et al, 2015), and the Yelp dataset (which was used in Experiment 1). Amazon Music contains only reviews from the “Digital Music” product category, which was found to have an extreme distribution shift from the clothes category (Hendrycks et al, 2020). [Footnote 5: http://qwone.com/~jason/20Newsgroups/]

out-of-distribution datasets: 3
For instance, one of the disabled features was highly activated by the pattern “my .... year old” which often appeared in positive reviews such as “my 3 year old son loves this”. However, these correlated features are not very useful for the three out-of-distribution datasets (Music, Mixed, and Yelp). Disabling them made the model focus more on the right evidence and increased the average macro F1 for the three datasets, as shown in Figure 8 (right)

datasets: 3
However, these correlated features are not very useful for the three out-of-distribution datasets (Music, Mixed, and Yelp). Disabling them made the model focus more on the right evidence and increased the average macro F1 for the three datasets, as shown in Figure 8 (right). Nonetheless, the performance improvement here was not as apparent as in the previous task because, even without feature disabling, the majority of the features are relevant to the task and can lead the model to the correct predictions in most cases.

Cited Papers
  • Maximilian Alber, Sebastian Lapuschkin, Philipp Seegerer, Miriam Hägele, Kristof T. Schütt, Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller, Sven Dähne, and Pieter-Jan Kindermans. 2018. iNNvestigate neural networks! arXiv preprint arXiv:1808.04260.
  • Leila Arras, Franziska Horn, Grégoire Montavon, Klaus-Robert Müller, and Wojciech Samek. 2016. Explaining predictions of non-linear classifiers in NLP. In Proceedings of the 1st Workshop on Representation Learning for NLP, pages 1–7, Berlin, Germany. Association for Computational Linguistics.
  • Leila Arras, Grégoire Montavon, Klaus-Robert Müller, and Wojciech Samek. 2017. Explaining recurrent neural network predictions in sentiment analysis. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 159–168, Copenhagen, Denmark. Association for Computational Linguistics.
  • Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. 2015. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE, 10(7):1–46.
  • Yujia Bao, Shiyu Chang, Mo Yu, and Regina Barzilay. 2018. Deriving machine attention from human rationales. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1903–1913, Brussels, Belgium. Association for Computational Linguistics.
  • Pengyu Cheng, Martin Renqiang Min, Dinghan Shen, Christopher Malon, Yizhe Zhang, Yitong Li, and Lawrence Carin. 2020. Improving disentangled text representation learning with information-theoretic guidance. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7530–7541, Online. Association for Computational Linguistics.
  • Maria De-Arteaga, Alexey Romanov, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, and Adam Tauman Kalai. 2019. Bias in bios: A case study of semantic representation bias in a high-stakes setting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* ’19, page 120–128, New York, NY, USA. Association for Computing Machinery.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Lucas Dixon, John Li, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. 2018. Measuring and mitigating unintended bias in text classification. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 67–73.
  • Björn Gambäck and Utpal Kumar Sikdar. 2017. Using convolutional neural networks to classify hate-speech. In Proceedings of the First Workshop on Abusive Language Online, pages 85–90, Vancouver, BC, Canada. Association for Computational Linguistics.
  • Yvette Graham, Nitika Mathur, and Timothy Baldwin. 2014. Randomized significance tests in machine translation. In Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 266–274, Baltimore, Maryland, USA. Association for Computational Linguistics.
  • Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith. 2018. Annotation artifacts in natural language inference data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 107–112, New Orleans, Louisiana. Association for Computational Linguistics.
  • Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In proceedings of the 25th international conference on world wide web, pages 507–517.
  • Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, and Dawn Song. 2020. Pretrained transformers improve out-of-distribution robustness. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
  • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.
  • Benjamin Hoover, Hendrik Strobelt, and Sebastian Gehrmann. 2020. exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 187–196, Online. Association for Computational Linguistics.
  • Alon Jacovi, Oren Sar Shalom, and Yoav Goldberg. 2018. Understanding convolutional neural networks for text classification. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 56–65, Brussels, Belgium. Association for Computational Linguistics.
  • Ayush Jaiswal, Daniel Moyer, Greg Ver Steeg, Wael AbdAlmageed, and Premkumar Natarajan. 2019. Invariant representations through adversarial forgetting. arXiv preprint arXiv:1911.04060.
  • Jae-young Jo and Sung-Hyon Myaeng. 2020. Roles and utilization of attention heads in transformer-based neural language models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3404–3417, Online. Association for Computational Linguistics.
  • Rie Johnson and Tong Zhang. 2015. Effective use of word order for text categorization with convolutional neural networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 103–112, Denver, Colorado. Association for Computational Linguistics.
  • Andrej Karpathy, Justin Johnson, and Fei-Fei Li. 2015. Visualizing and understanding recurrent networks. CoRR, abs/1506.02078.
  • Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751, Doha, Qatar. Association for Computational Linguistics.
  • Todd Kulesza, Margaret Burnett, Weng-Keen Wong, and Simone Stumpf. 2015. Principles of explanatory debugging to personalize interactive machine learning. In Proceedings of the 20th international conference on intelligent user interfaces, pages 126–137.
  • Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing neural predictions. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 107–117, Austin, Texas. Association for Computational Linguistics.
  • Piyawat Lertvittayakumjorn and Francesca Toni. 2019. Human-grounded evaluations of explanation methods for text classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5195–5205, Hong Kong, China. Association for Computational Linguistics.
  • Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Jurafsky. 2016. Visualizing and understanding neural models in NLP. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 681–691, San Diego, California. Association for Computational Linguistics.
  • Frederick Liu and Besim Avci. 2019. Incorporating priors with feature attribution on text classification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6274–6283, Florence, Italy. Association for Computational Linguistics.
  • Hui Liu, Qingyu Yin, and William Yang Wang. 2019. Towards explainable NLP: A generative explanation framework for text classification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5570–5581, Florence, Italy. Association for Computational Linguistics.
  • Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Advances in neural information processing systems, pages 4765–4774.
  • Grégoire Montavon, Alexander Binder, Sebastian Lapuschkin, Wojciech Samek, and Klaus-Robert Müller. 2019. Layer-wise relevance propagation: an overview. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, pages 193–209. Springer.
  • Eric W Noreen. 1989. Computer-intensive methods for testing hypotheses. Wiley New York.
  • Ji Ho Park, Jamin Shin, and Pascale Fung. 2018. Reducing gender bias in abusive language detection. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2799–2804, Brussels, Belgium. Association for Computational Linguistics.
  • Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.
  • Joaquin Quiñonero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D. Lawrence. 2009. Dataset Shift in Machine Learning. The MIT Press.
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 1135–1144, New York, NY, USA. Association for Computing Machinery.
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Anchors: High-precision model-agnostic explanations. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Anna Rogers, Olga Kovaleva, and Anna Rumshisky. 2020. A primer in BERTology: What we know about how BERT works. CoRR, abs/2002.12327.
  • Burr Settles. 2011. Closing the loop: Fast, interactive semi-supervised annotation with queries on features and instances. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1467–1478, Edinburgh, Scotland, UK. Association for Computational Linguistics.
  • Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning, pages 3145–3153. JMLR.org.
  • Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, and Alexander M. Rush. 2018. LSTMVis: A tool for visual analysis of hidden state dynamics in recurrent neural networks. IEEE Transactions on Visualization and Computer Graphics, 24(1):667–676.
  • Simone Stumpf, Vidya Rajaram, Lida Li, Weng-Keen Wong, Margaret Burnett, Thomas Dietterich, Erin Sullivan, and Jonathan Herlocker. 2009. Interacting meaningfully with machine learning systems: Three experiments. International Journal of Human-Computer Studies, 67(8):639–662.
  • Stefano Teso and Kristian Kersting. 2019. Explanatory interactive machine learning. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 239–245.
  • Nithum Thain, Lucas Dixon, and Ellery Wulczyn. 2017. Wikipedia talk labels: Toxicity.
  • Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, and Ivan Titov. 2019. Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5797–5808, Florence, Italy. Association for Computational Linguistics.
  • Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2019. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In 7th International Conference on Learning Representations, ICLR 2019.
  • Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. 2018. Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 335–340.
  • Jingqing Zhang, Piyawat Lertvittayakumjorn, and Yike Guo. 2019. Integrating semantic knowledge to tackle zero-shot text classification. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1031–1040, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Advances in neural information processing systems, pages 649–657.
  • Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018. Gender bias in coreference resolution: Evaluation and debiasing methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 15–20, New Orleans, Louisiana. Association for Computational Linguistics.
  • Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop, pages 88–93, San Diego, California. Association for Computational Linguistics.
  • Michael Wiegand, Josef Ruppenhofer, and Thomas Kleinbauer. 2019. Detection of Abusive Language: the Problem of Biased Datasets. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 602–608, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Tongshuang Wu, Daniel S. Weld, and Jeffrey Heer. 2019. Local decision pitfalls in interactive machine learning: An investigation into feature selection in sentiment analysis. ACM Trans. Comput.-Hum. Interact., 26(4).
  • Omar Zaidan, Jason Eisner, and Christine Piatko. 2007. Using “annotator rationales” to improve machine learning for text categorization. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 260–267, Rochester, New York. Association for Computational Linguistics.
  • Layer-wise Relevance Propagation (LRP) is a technique for explaining predictions of neural networks in terms of importance scores of input features (Bach et al., 2015). Originally, it was devised to explain predictions of image classifiers by creating a heatmap on the input image highlighting pixels that are important for the classification. Then Arras et al. (2016) and Arras et al. (2017) extended LRP to work on CNNs and RNNs for text classification, respectively.
  • We used this propagation rule, the so-called LRP-ε rule, in the experiments of this paper. For more details about LRP propagation rules, please see Montavon et al. (2019). The rule is reproduced below for reference.
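Assuming the rule referred to here is the standard LRP-ε rule (in the formulation of Montavon et al., 2019), relevance is propagated from the neurons k of a layer's output to the neurons j of its input as

$$ R_j \;=\; \sum_k \frac{a_j\, w_{jk}}{\epsilon + \sum_{j'} a_{j'}\, w_{j'k}}\; R_k, $$

where $a_j$ are the activations, $w_{jk}$ the weights, $R_k$ the relevance arriving at output neuron $k$, and $\epsilon$ a small stabilizer that absorbs weak or contradictory contributions (bias terms, when present, are included in the denominator sum).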
Authors
Piyawat Lertvittayakumjorn
Lucia Specia
Francesca Toni