Visualizing Attention in Transformer-Based Language Representation Models.

arXiv: Learning, 2019


Abstract

We present an open-source tool for visualizing multi-head self-attention in Transformer-based language representation models. The tool extends earlier work by visualizing attention at three levels of granularity: the attention-head level, the model level, and the neuron level. We describe how each of these views can help to interpret the model, and we demonstrate the tool on the BERT model and the OpenAI GPT-2 model. We also present three use cases for analyzing GPT-2: detecting model bias, identifying recurring patterns, and linking neurons to model behavior.

Introduction
  • In 2018, the BERT (Bidirectional Encoder Representations from Transformers) language representation model achieved state-of-the-art performance across NLP tasks ranging from sentiment analysis to question answering (Devlin et al., 2018).
  • Underlying BERT and GPT-2 is the Transformer model, which uses a multi-head self-attention architecture (Vaswani et al., 2017a); a minimal sketch of this attention computation follows the list below.
  • The authors introduce a tool for visualizing attention in Transformer-based language representation models, building on the work of Jones (2017).
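Since both BERT and GPT-2 are built from multi-head self-attention layers, the following minimal sketch shows the scaled dot-product computation that produces the attention weights the tool visualizes. It is a plain NumPy illustration under assumed shapes and names, not the paper's implementation.

    import numpy as np

    def self_attention_head(x, W_q, W_k, W_v):
        """One attention head: x is [seq_len, d_model]; W_q, W_k, W_v are [d_model, d_head]."""
        q = x @ W_q                                    # queries
        k = x @ W_k                                    # keys
        v = x @ W_v                                    # values
        scores = q @ k.T / np.sqrt(q.shape[-1])        # scaled dot-product logits, [seq_len, seq_len]
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True) # softmax over key positions
        return weights @ v, weights                    # head output and the attention map

    # For a decoder-only model such as GPT-2, a causal mask would set scores for
    # future positions to -inf before the softmax; a multi-head layer runs several
    # such heads in parallel and concatenates their outputs.

The weights matrix, with one row per query token and one column per key token, is the quantity the attention-head view renders as connections between tokens.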
Highlights
  • In 2018, the BERT (Bidirectional Encoder Representations from Transformers) language representation model achieved state-of-the-art performance across NLP tasks ranging from sentiment analysis to question answering (Devlin et al., 2018)
  • We introduce a tool for visualizing attention in Transformer-based language representation models, building on the work of Jones (2017)
  • We extend the existing tool in two ways: (1) we adapt it from the original encoder-decoder implementation to the decoder-only GPT-2 model and the encoder-only BERT model, and (2) we add two visualizations: the model view, which visualizes all of the layers and attention heads in a single interface, and the neuron view, which shows how individual neurons influence attention scores
  • We present an open-source tool for visualizing multi-head self-attention in Transformer-based language representation models (a sketch of how such attention weights can be extracted from pretrained models follows this list)
  • We presented a tool for visualizing attention in Transformer-based language representation models
  • We demonstrated the tool on the OpenAI GPT-2 and BERT models and presented three use cases for analyzing GPT-2
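As a concrete illustration of the data behind these views, the hypothetical sketch below pulls per-layer, per-head attention weights out of a pretrained GPT-2 model using the Hugging Face transformers library. This is an assumed stand-in for the paper's own implementation; model and variable names are illustrative.

    import torch
    from transformers import GPT2Tokenizer, GPT2Model

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2Model.from_pretrained("gpt2", output_attentions=True)
    model.eval()

    inputs = tokenizer("The doctor asked the nurse a question.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.attentions is a tuple with one tensor per layer, each of shape
    # [batch, num_heads, seq_len, seq_len]. A single layer/head slice feeds the
    # attention-head view; the full stack feeds the model view.
    attn = torch.stack(outputs.attentions).squeeze(1)   # [layers, heads, seq, seq]
    print(attn.shape)                                    # 12 layers x 12 heads for base GPT-2

The same pattern applies to BERT (BertModel), whose explicit sentence-pair inputs also allow the Sentence A → Sentence B filtering described in the Results section below.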
Results
  • The authors present an open-source tool for visualizing multi-head self-attention in Transformer-based language representation models.
  • The authors present three use cases for GPT-2 showing how the tool might provide insights on how to adjust or improve the model.
  • The attention-head view visualizes the attention patterns produced by one or more attention heads in a given transformer layer, as shown in Figure 1 (GPT-2) and Figure 2 (BERT).
  • For BERT, which uses an explicit sentence-pair model, users may specify a sentence-level attention filter; for example, in Figure 2, the user has selected the Sentence A → Sentence B filter, which only shows attention from tokens in Sentence A to tokens in Sentence B.
  • In the attention head shown in Figure 1, for example, each word attends exclusively to the previous word in the sequence.
  • To better understand the source of this bias, the authors can visualize the attention head that produces patterns resembling coreference resolution, shown in Figure 4, where the two input prompts differ only in the gender of the pronoun that begins the second sentence.
  • The model view (Figure 5) provides a bird's-eye view of attention across all of the model's layers and heads for a particular input.
  • The neuron view (Figure 6) visualizes the individual neurons in the query and key vectors and shows how they are used to compute attention.
  • The model view enables users to browse the attention heads across all layers in the model and see how attention patterns evolve throughout the model.
  • Use Case: Identifying Recurring Patterns. The model view in Figure 5 shows that many of the attention heads follow the same pattern: they focus all of the attention on the first token in the sequence (see the sketch after this list).
  • Use Case: Linking Neurons to Model Behavior. To see how the neuron view might provide actionable insights, consider the attention head in Figure 7.
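The two use cases above can be reproduced with straightforward tensor operations. The sketch below is a hypothetical illustration, using stand-in tensors and names not taken from the paper's code, of (1) flagging heads that concentrate their attention on the first token and (2) the per-neuron query-key products that the neuron view visualizes.

    import torch

    # Stand-in attention stack; in practice this would come from a model, as in
    # the extraction sketch above: [layers, heads, seq_len, seq_len].
    attn = torch.softmax(torch.randn(12, 12, 8, 8), dim=-1)

    # (1) Recurring pattern: average, over query positions, of the attention mass
    # each head assigns to the first token; heads above a high threshold match
    # the "attend to the first token" pattern seen in the model view.
    first_token_mass = attn[..., 0].mean(dim=-1)          # [layers, heads]
    flagged = (first_token_mass > 0.9).nonzero().tolist()
    print("layer/head pairs attending mostly to the first token:", flagged)

    # (2) Neuron view: the pre-softmax score between query position i and key
    # position j is the scaled sum of element-wise products q_i * k_j, so the
    # per-neuron products show which neurons push a given attention weight up or down.
    def neuron_contributions(q_i: torch.Tensor, k_j: torch.Tensor):
        d_head = q_i.shape[-1]
        products = q_i * k_j                    # one contribution per neuron, [d_head]
        logit = products.sum() / d_head ** 0.5  # the scaled dot-product attention logit
        return products, logit

    q_i, k_j = torch.randn(64), torch.randn(64)  # stand-in query/key vectors for one head
    products, logit = neuron_contributions(q_i, k_j)

Plotting these per-neuron products across key positions is, in spirit, what the neuron view does to link individual neurons to attention behavior.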
Conclusion
  • The authors presented a tool for visualizing attention in Transformer-based language representation models.
  • The authors demonstrated the tool on the OpenAI GPT-2 and BERT models and presented three use cases for analyzing GPT-2.
  • The authors would like to enable users to manipulate the model, either by modifying attention (Lee et al., 2017; Liu et al., 2018; Strobelt et al., 2018) or editing individual neurons (Bau et al., 2019).
Study subjects and analysis
use cases: 3
We describe how each of these views can help to interpret the model, and we demonstrate the tool on the BERT model and the OpenAI GPT-2 model. We also present three use cases for analyzing GPT-2: detecting model bias, identifying recurring patterns, and linking neurons to model behavior. A video demonstration of the tool is available at https://youtu.be/187JyiA4pyk. For future work, we would like to evaluate empirically how attention impacts model predictions across a range of tasks (Jain and Wallace, 2019).

References
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
  • Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2019. Identifying and controlling important neurons in neural machine translation. In ICLR.
  • Yonatan Belinkov and James Glass. 2019. Analysis methods in neural language processing: A survey. Transactions of the Association for Computational Linguistics (TACL), to appear.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional Transformers for language understanding. arXiv Computation and Language.
  • Sarthak Jain and Byron C. Wallace. 2019. Attention is not explanation. CoRR, abs/1902.10186.
  • Llion Jones. 2017. Tensor2Tensor transformer visualization. https://github.com/
  • Jaesong Lee, Joong-Hwi Shin, and Jun-Seok Kim. 2017. Interactive visualization and manipulation of attention-based neural machine translation. In EMNLP: System Demonstrations.
  • Shusen Liu, Tao Li, Zhimin Li, Vivek Srikumar, Valerio Pascucci, and Peer-Timo Bremer. 2018. Visual interrogation of attention-based models for natural language inference and machine comprehension. In EMNLP: System Demonstrations.
  • Kaiji Lu, Piotr Mardziel, Fangjing Wu, Preetam Amancharla, and Anupam Datta. 2018. Gender bias in neural natural language processing. CoRR, abs/1807.11714.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. Technical report.
  • Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomas Kocisky, and Phil Blunsom. 2016. Reasoning about entailment with neural attention. In ICLR.
  • Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. In EMNLP.
  • H. Strobelt, S. Gehrmann, M. Behrisch, A. Perer, H. Pfister, and A. M. Rush. 2018. Seq2Seq-Vis: A visual debugging tool for sequence-to-sequence models. arXiv e-prints.
  • Edward Tufte. 1990. Envisioning Information. Graphics Press, Cheshire, CT, USA.
  • Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, and Jakob Uszkoreit. 2018. Tensor2Tensor for neural machine translation. CoRR, abs/1803.07416.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017a. Attention is all you need. In Advances in Neural Information Processing Systems.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017b. Attention is all you need. Technical report.
  • Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018. Gender bias in coreference resolution: Evaluation and debiasing methods. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Author
Jesse Vig