VisBERT: Hidden-State Visualizations for Transformers
WWW '20: The Web Conference 2020, Taipei, Taiwan, April 2020, pp. 207-211.
Abstract:
Explainability and interpretability are two important concepts, the absence of which can and should impede the application of well-performing neural networks to real-world problems. At the same time, they are difficult to incorporate into the large, black-box models that achieve state-of-the-art results in a multitude of NLP tasks. Bidire…
Introduction
- Understanding black-box models is an increasingly prominent area of research. While the performance of neural networks has been steadily improving in nearly every domain, the ability to understand how they work and how they reach their conclusions is improving only slowly.
- Instead of attention values, the authors follow the work in [11] and visualize the hidden states between BERT layers, and with them the token representations, as they are transformed through the network (a minimal extraction sketch follows this list).
- VisBERT, an interactive web tool for interpretable visualization of hidden states within BERT models fine-tuned on Question Answering.
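As a minimal sketch of this hidden-state extraction, assuming the HuggingFace transformers library; the checkpoint name, question, and context below are illustrative placeholders, not the authors' fine-tuned QA models:

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Placeholder checkpoint; in practice a BERT model fine-tuned for QA would be loaded.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint, output_hidden_states=True)

question = "Where did the Web Conference 2020 take place?"
context = "The Web Conference 2020 (WWW '20) was held in Taipei, Taiwan."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple holding the embedding output plus one tensor per
# encoder layer, each of shape (batch, sequence length, hidden size).
print(len(outputs.hidden_states), outputs.hidden_states[-1].shape)
```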
Highlights
- Understanding black-box models is an increasingly prominent area of research
- In order for large neural networks to be confidently deployed in safety-critical applications, features like transparency, interpretability and explainability are paramount
- One such class of black-box models are Transformer models, Bidirectional Encoder Representations from Transformers (BERT) in particular. These models have become the state-of-the-art for many different NLP tasks in recent months
- VisBERT, an interactive web tool for interpretable visualization of hidden states within BERT models fine-tuned on Question Answering
- For each task we provide a separate fine-tuned BERT model
- VisBERT establishes a novel method to analyze the behavior of BERT models, in particular regarding the Question Answering task
Results
- Visualizations of the inference process of unseen examples from three diverse Question Answering datasets, including three BERT models fine-tuned on these sets.
- The presented tool allows users to test the abilities and shortcomings of their own Question Answering models on arbitrary samples.
- Each encoder block consists of a multi-headed self-attention module, which transforms each token using the entire input context, followed by normalization and a feed-forward network that outputs the token representations used by the subsequent layer (see the schematic sketch after this list).
- The authors can observe the changing token relations that the model forms throughout the inference process.
- To that end, the authors use the hidden states after each Transformer encoder block, which contain one vector per token with a dimensionality of 768 (BERT-base) or 1024 (BERT-large).
- The authors further categorize the tokens based on their affiliation with the question, supporting facts, or predicted answer in order to facilitate interpretability.
- In addition to the included datasets, the tool can be extended to other Question Answering tasks.
- By using the layer slider on top of the graph, the user can step through all layers of the model and observe the changes within the token representations (a sketch of this per-layer projection also follows the list).
- This allows users to find out which QA model (fine-tuned on SQuAD, HotpotQA, or bAbI) fits a specific question type best and produces the right result.
- A user can add distracting facts to the context and check whether the model is still able to follow the same inference path.
- The authors' tool allows users to observe the resulting changes in the prediction and within the hidden states of the model.
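The encoder block described above can be sketched schematically as follows. This is a simplified PyTorch illustration of the standard Transformer encoder block, not the authors' implementation; the BERT-base values of 768 hidden dimensions and 12 attention heads are assumed:

```python
import torch
import torch.nn as nn

class EncoderBlockSketch(nn.Module):
    """Schematic BERT encoder block: self-attention -> add & norm -> feed-forward -> add & norm."""
    def __init__(self, hidden_size=768, num_heads=12, ff_size=3072):
        super().__init__()
        self.attention = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(hidden_size)
        self.feed_forward = nn.Sequential(
            nn.Linear(hidden_size, ff_size), nn.GELU(), nn.Linear(ff_size, hidden_size)
        )
        self.norm2 = nn.LayerNorm(hidden_size)

    def forward(self, tokens):
        # Self-attention transforms each token using the entire input context.
        attended, _ = self.attention(tokens, tokens, tokens)
        tokens = self.norm1(tokens + attended)
        # The feed-forward network yields the token representations passed to the next layer.
        return self.norm2(tokens + self.feed_forward(tokens))

hidden = torch.randn(1, 16, 768)           # (batch, sequence length, hidden size)
print(EncoderBlockSketch()(hidden).shape)  # torch.Size([1, 16, 768])
```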
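A sketch of the per-layer projection behind the layer slider. It assumes scikit-learn's PCA as the dimensionality-reduction step, a hypothetical token_categories list assigning each token to question, supporting facts, predicted answer, or other, and the hidden_states tuple from the extraction sketch earlier:

```python
import numpy as np
from sklearn.decomposition import PCA

def project_layer(hidden_states, layer, token_categories):
    """Project one layer's token vectors to 2D and group the points by category."""
    vectors = hidden_states[layer][0].numpy()            # (sequence length, hidden size)
    points = PCA(n_components=2).fit_transform(vectors)  # 2D coordinates per token
    grouped = {}
    for point, category in zip(points, token_categories):
        grouped.setdefault(category, []).append(point)
    return {cat: np.array(pts) for cat, pts in grouped.items()}

# Stepping `layer` from 0 to len(hidden_states) - 1 mimics the layer slider:
# one can watch answer tokens move relative to question and supporting-fact tokens.
```

PCA is used here only for brevity; any of the reduction techniques cited in the references (PCA, ICA, t-SNE, UMAP) could stand in for it.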
Conclusion
- VisBERT establishes a novel method to analyze the behavior of BERT models, in particular regarding the Question Answering task.
- The authors establish this behaviour on three diverse Question Answering datasets and make all three models, as well as the code to reproduce the visualization, available for users to run their own analyses on their own data.
- The authors' tool can be extended to other BERT models fine-tuned on different QA datasets, or even on other NLP tasks entirely, and to other Transformer-based models such as GPT-2 [8].
Funding
- Our work is funded by the European Union's Horizon 2020 research and innovation programme under grant agreement 732328 (FashionBrain), by the German Federal Ministry of Education and Research (BMBF) under grant agreement 01UG1735BX (NOHATE), and by the German Federal Ministry of Economic Affairs and Energy (BMWi) under grant agreements 01MD19013D (Smart-MD), 01MD19003E (PLASS) and 01MK2008D (Servicemeister).
Study subjects and analysis
public QA datasets: 3
Besides that, the task often requires multiple inference steps, especially in multi-hop scenarios, which allows us to gain further insights into BERT’s reasoning process. We use the three public QA datasets SQuAD [9], HotpotQA [16] and bAbI QA [15] to show the tool’s applicability to three diverse QA tasks, including multi-hop reasoning cases. Apart from that, the principle of VisBERT can easily be extended to other up- or downstream NLP tasks.
diverse Question Answering datasets: 3
• VisBERT, an interactive web tool for interpretable visualization of hidden states within BERT models fine-tuned on Question Answering. • Visualizations of the inference process of unseen examples from three diverse Question Answering datasets, including three BERT (base and large) models fine-tuned on these sets. • Identification of four stages of inference that can be observed in all analysed Question Answering tasks.
diverse Question Answering datasets: 3
Additionally, VisBERT reveals four phases in BERT’s transformations that are common to all of the datasets we examined and that mirror the traditional NLP pipeline, cf. [10]. We establish this behaviour on three diverse Question Answering datasets and make all three models available for users to make their own analyses on their own data, as well as the code to reproduce this visualization.
Reference
- Pierre Comon. 1994. Independent component analysis, A new concept? Signal Processing 36 (1994).
- Karl Pearson F.R.S. 1901. LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2 (1901).
- Sarthak Jain and Byron C. Wallace. 2019. Attention is not Explanation. In NAACL ’19.
- Robin Jia and Percy Liang. 2017. Adversarial Examples for Evaluating Reading Comprehension Systems. EMNLP ’17 (2017).
- Stuart P. Lloyd. 1982. Least squares quantization in PCM. IEEE Trans. Information Theory (1982).
- L. McInnes, J. Healy, and J. Melville. 2018. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. ArXiv e-prints (2018). arXiv:1802.03426
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In ICLR ’13 Workshop Track.
- Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners.
- Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In EMNLP ’16.
- Ian Tenney, Dipanjan Das, and Ellie Pavlick. 2019. BERT Rediscovers the Classical NLP Pipeline. In ACL ’19.
- Betty van Aken, Benjamin Winter, Alexander Löser, and Felix A. Gers. 2019. How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations. In CIKM '19.
- Laurens van der Maaten. 2009. Learning a Parametric Embedding by Preserving Local Structure. In AISTATS '09.
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In NIPS ’17.
- Jesse Vig. 2019. A Multiscale Visualization of Attention in the Transformer Model. ACL ’19 System Demonstrations (2019).
- Jason Weston, Antoine Bordes, Sumit Chopra, and Tomas Mikolov. 2016. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. In ICLR ’16.
- Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. 2018. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. In EMNLP ’18.
- Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. 2017. Adversarial Examples: Attacks and Defenses for Deep Learning. arXiv preprint arXiv:1712.07107 (2017).