LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention
EMNLP 2020, pp. 6442–6454
Entity representations are useful in natural language tasks involving entities. In this paper, we propose new pretrained contextualized representations of words and entities based on the bidirectional transformer. The proposed model treats words and entities in a given text as independent tokens, and outputs contextualized representations...
- Many natural language tasks involve entities, e.g., relation classification, entity typing, named entity recognition (NER), and question answering (QA).
- Key to solving such entity-related tasks is a model to learn the effective representations of entities.
- The word-based pretraining task of CWRs is not suitable for learning the representations of entities, because predicting a masked word given the other words in the entity, e.g., predicting “Rings” given “The Lord of the [MASK]”, is clearly easier than predicting the entire entity.
- Conventional entity representations assign each entity a fixed embedding vector that stores information regarding the entity in a knowledge base (KB) (Bordes et al., 2013; Trouillon et al., 2016; Yamada et al., 2016, 2017)
- By contrast, contextualized word representations (CWRs) based on the transformer (Vaswani et al., 2017), such as BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2020), provide effective general-purpose word representations trained with unsupervised pretraining tasks based on language modeling
- Model: Following Sohrab and Miwa (2018), we solve the task by enumerating all possible spans in each sentence as entity name candidates, and classifying them into the target entity types or the non-entity type, which indicates that the span is not an entity
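A minimal sketch of this exhaustive span-enumeration step; the function name and the `max_span_len` cutoff are illustrative choices, not values from the paper:

```python
def enumerate_spans(tokens, max_span_len=16):
    """Enumerate all candidate spans (start, end) up to a maximum length,
    as in exhaustive span-based NER (Sohrab and Miwa, 2018).
    Each span is a half-open token range [start, end)."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_span_len, len(tokens)) + 1):
            spans.append((start, end))
    return spans
```

Each enumerated span would then be fed to a classifier that assigns it an entity type or the non-entity label.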
- We evaluate the models using exact match (EM) and token-level F1 on the development and test sets
- We evaluate the performance of this model on the CoNLL-2003 dataset and the Stanford Question Answering Dataset (SQuAD), using the same model architectures as those for RoBERTa described in the corresponding sections
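The exact match and token-level F1 metrics mentioned above can be illustrated with a minimal sketch; note that the official SQuAD scorer additionally strips articles and punctuation, which is omitted here:

```python
from collections import Counter

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the whitespace/case-normalized prediction equals the gold answer."""
    return float(pred.strip().lower() == gold.strip().lower())

def token_f1(pred: str, gold: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```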
- The authors conduct extensive experiments using five entity-related tasks: entity typing, relation classification, NER, cloze-style QA, and extractive QA.
- The authors use similar model architectures for all tasks based on a simple linear classifier on top of the representations of words, entities, or both.
- The input entity sequence is built using [MASK] entities, special entities introduced for the task, or Wikipedia entities.
- The token embedding of a task-specific special entity is initialized using that of the [MASK] entity, and the query matrices of the entity-aware self-attention mechanism (Qw2e, Qe2w, and Qe2e) are initialized using the original query matrix Q.
- The mapping is automatically created using the entity hyperlinks in Wikipedia as described in detail in Appendix C
- The authors solve this task using the same model architecture as that of BERT and RoBERTa. In particular, the authors use two linear classifiers independently on top of the word representations to predict the span boundary of the answer, and train the model using cross-entropy loss.
- As shown in Table 6, this setting clearly degrades performance, by 1.4 F1 points on the CoNLL-2003 dataset and 0.6 EM points on the SQuAD dataset, demonstrating the effectiveness of the entity representations on these two tasks
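The span-boundary prediction described above (two independent linear classifiers over word representations, trained with cross-entropy) can be sketched with NumPy; the sequence length, hidden size, and random weights are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 8, 16                       # illustrative sizes

H = rng.standard_normal((seq_len, hidden))    # contextualized word representations
w_start = rng.standard_normal(hidden)         # linear classifier for the span start
w_end = rng.standard_normal(hidden)           # linear classifier for the span end

def log_softmax(x):
    x = x - x.max()                           # numerical stability
    return x - np.log(np.exp(x).sum())

start_logits = H @ w_start                    # one score per token position
end_logits = H @ w_end

# Training: cross-entropy against gold start/end positions, computed independently.
gold_start, gold_end = 2, 4
loss = -(log_softmax(start_logits)[gold_start] + log_softmax(end_logits)[gold_end])

# Inference: pick the highest-scoring start and end positions.
pred_start, pred_end = int(start_logits.argmax()), int(end_logits.argmax())
```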
- The authors propose LUKE, new pretrained contextualized representations of words and entities based on the transformer.
- LUKE outputs the contextualized representations of words and entities using an improved transformer architecture with a novel entity-aware self-attention mechanism.
- The experimental results demonstrate its effectiveness on various entity-related tasks.
- Future work involves applying LUKE to domain-specific tasks, such as those in biomedical and legal domains.
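The entity-aware self-attention mechanism summarized above selects one of four query matrices (the original Q plus Qw2e, Qe2w, and Qe2e) depending on whether the attending and attended tokens are words or entities. A minimal single-head NumPy sketch with illustrative sizes and random weights; multi-head attention, the value projection, and masking are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                # illustrative hidden size
n_words, n_entities = 5, 2
X = rng.standard_normal((n_words + n_entities, d))    # word tokens, then entity tokens
is_entity = [False] * n_words + [True] * n_entities

# One query matrix per (query type, key type) pair; a single shared key matrix.
Q = {(False, False): rng.standard_normal((d, d)),     # word   -> word   (original Q)
     (False, True):  rng.standard_normal((d, d)),     # word   -> entity (Q_w2e)
     (True, False):  rng.standard_normal((d, d)),     # entity -> word   (Q_e2w)
     (True, True):   rng.standard_normal((d, d))}     # entity -> entity (Q_e2e)
K = rng.standard_normal((d, d))

n = len(X)
scores = np.empty((n, n))
for i in range(n):
    for j in range(n):
        q = X[i] @ Q[(is_entity[i], is_entity[j])]    # query depends on token types
        scores[i, j] = q @ (X[j] @ K) / np.sqrt(d)

attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
```

The key point is only the table lookup on `(is_entity[i], is_entity[j])`: the four token-type combinations attend through different query projections, while keys (and, in the full model, values) are shared.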
- Reported metrics: Open Entity (Test F1), TACRED (Test F1), CoNLL-2003 (Test F1), ReCoRD (Dev EM, Dev F1), SQuAD (Dev EM, Dev F1)
- Table1: Results of entity typing on the Open Entity dataset
- Table2: Results of relation classification on the TACRED dataset
- Table3: Results of named entity recognition on the CoNLL-2003 dataset
- Table4: Results of cloze-style question answering on the ReCoRD dataset. All models except RoBERTa (ensemble) are based on a single model
- Table5: Results of extractive question answering on the SQuAD 1.1 dataset
- Table6: Ablation study of our entity representations
- Table7: Ablation study of our entity-aware self-attention mechanism
- Table8: Results of RoBERTa additionally trained using our Wikipedia corpus
- Table9: Hyper-parameters used to pretrain LUKE
- Table10: Hyper-parameters used for the extra pretraining of RoBERTa on our Wikipedia corpus
- Table11: Hyper-parameters and other details of our experiments
- Table12: Common hyper-parameters used in our experiments
- Static Entity Representations Conventional entity representations assign a fixed embedding to each entity in the KB. They include knowledge embeddings trained on knowledge graphs (Bordes et al., 2013; Yang et al., 2015; Trouillon et al., 2016), and embeddings trained using textual contexts or descriptions of entities retrieved from a KB (Yamada et al., 2016, 2017; Cao et al., 2017; Ganea and Hofmann, 2017). Similar to our pretraining task, NTEE (Yamada et al., 2017) and RELIC (Ling et al., 2020) use an approach that trains entity embeddings by predicting entities given their textual contexts obtained from a KB. The main drawbacks of this line of work, when representing entities in text, are that (1) they need to resolve entities in the text to corresponding KB entries to represent the entities, and (2) they cannot represent entities that do not exist in the KB.
Contextualized Word Representations Many recent studies have addressed entity-related tasks based on the contextualized representations of entities in text computed using the word representations of CWRs (Zhang et al., 2019; Baldini Soares et al., 2019; Peters et al., 2019; Joshi et al., 2020; Wang et al., 2019b, 2020). Representative examples of CWRs are ELMo (Peters et al., 2018) and BERT (Devlin et al., 2019), which are based on deep bidirectional long short-term memory (LSTM) and the transformer (Vaswani et al., 2017), respectively. BERT is trained using the masked language model (MLM), a pretraining task that masks random words in the text and trains the model to predict the masked words. Most recent CWRs, such as RoBERTa (Liu et al., 2020), XLNet (Yang et al., 2019), SpanBERT (Joshi et al., 2020), ALBERT (Lan et al., 2020), BART (Lewis et al., 2020), and T5 (Raffel et al., 2020), are based on the transformer and trained using a task equivalent or similar to the MLM. Similar to our proposed pretraining task that masks entities instead of words, several recent CWRs, e.g., SpanBERT, ALBERT, BART, and T5, have extended the MLM by randomly masking word spans instead of single words.
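The MLM setup described above can be illustrated with a small sketch; the 15% masking rate follows BERT, while subword tokenization and the 80/10/10 replacement scheme used in practice are simplified away:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Randomly replace tokens with a mask symbol; the model is then
    trained to predict the original tokens at the masked positions."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[i] = tok          # prediction target at this position
        else:
            masked.append(tok)
    return masked, targets
```

Span-masking variants such as SpanBERT apply the same idea to contiguous runs of tokens rather than independent positions, which is what makes the task closer to predicting whole entities.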
Study subjects and analysis
well-known datasets: 5
The proposed model achieves strong empirical performance on a wide range of entity-related tasks. In particular, it obtains state-of-the-art results on five well-known datasets: Open Entity (entity typing), TACRED (relation classification), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), and SQuAD 1.1 (extractive question answering). The proposed entity-aware self-attention mechanism considers the type of the tokens (words or entities) when computing attention scores. The source code and pretrained representations are available at https://github.com/studio-ousia/luke
- Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018. Contextual String Embeddings for Sequence Labeling. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1638–1649.
- Alexei Baevski, Sergey Edunov, Yinhan Liu, Luke Zettlemoyer, and Michael Auli. 2019. Cloze-driven Pretraining of Self-attention Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 5360–5369.
- Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, and Tom Kwiatkowski. 2019. Matching the Blanks: Distributional Similarity for Relation Learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2895–2905.
- Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems 26, pages 2787–2795.
- Yixin Cao, Lifu Huang, Heng Ji, Xu Chen, and Juanzi Li. 2017. Bridge Text and Knowledge by Learning Multi-Prototype Entity Mention Embedding. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1623–1633.
- Eunsol Choi, Omer Levy, Yejin Choi, and Luke Zettlemoyer. 2018. Ultra-Fine Entity Typing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 87–96.
- Christopher Clark and Matt Gardner. 2018. Simple and Effective Multi-Paragraph Reading Comprehension. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 845–855.
- Kevin Clark, Urvashi Khandelwal, Omer Levy, and Christopher D Manning. 2019. What Does BERT Look at? An Analysis of BERT’s Attention. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 276–286.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
- Octavian-Eugen Ganea and Thomas Hofmann. 2017. Deep Joint Entity Disambiguation with Local Neural Attention. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2619–2629.
- Dan Hendrycks and Kevin Gimpel. 2016. Gaussian Error Linear Units (GELUs). arXiv preprint arXiv:1606.08415v3.
- Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S Weld, Luke Zettlemoyer, and Omer Levy. 2020. SpanBERT: Improving Pre-training by Representing and Predicting Spans. Transactions of the Association for Computational Linguistics, 8:64–77.
- Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural Architectures for Named Entity Recognition. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 260–270.
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In International Conference on Learning Representations.
- Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer Normalization. arXiv preprint arXiv:1607.06450v1.
- Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising Sequence-to-Sequence Pretraining for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880.
- Xiepeng Li, Zhexi Zhang, Wei Zhu, Zheng Li, Yuan Ni, Peng Gao, Junchi Yan, and Guotong Xie. 2019. Pingan Smart Health and SJTU at COIN - Shared Task: utilizing Pre-trained Language Models and Common-sense Knowledge in Machine Reading Tasks. In Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing, pages 93–98.
- Jeffrey Ling, Nicholas FitzGerald, Zifei Shan, Livio Baldini Soares, Thibault Fevry, David Weiss, and Tom Kwiatkowski. 2020. Learning Cross-Context Entity Representations from Text. arXiv preprint arXiv:2001.03765v1.
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2020. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692v1.
- Simon Ostermann, Sheng Zhang, Michael Roth, and Peter Clark. 2019. Commonsense Inference in Natural Language Processing (COIN) - Shared Task Report. In Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing, pages 66–74.
- Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237.
- Matthew E. Peters, Mark Neumann, Robert Logan, Roy Schwartz, Vidur Joshi, Sameer Singh, and Noah A. Smith. 2019. Knowledge Enhanced Contextual Word Representations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 43–54.
- Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research, 21(140):1–67.
- Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392.
- Emily Reif, Ann Yuan, Martin Wattenberg, Fernanda B Viegas, Andy Coenen, Adam Pearce, and Been Kim. 2019. Visualizing and Measuring the Geometry of BERT. In Advances in Neural Information Processing Systems 32, pages 8594–8603.
- Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2017. Bidirectional Attention Flow for Machine Comprehension. In International Conference on Learning Representations.
- Mohammad Golam Sohrab and Makoto Miwa. 2018. Deep Exhaustive Model for Nested Named Entity Recognition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2843–2849.
- Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003.
- Theo Trouillon, Johannes Welbl, Sebastian Riedel, Eric Gaussier, and Guillaume Bouchard. 2016. Complex Embeddings for Simple Link Prediction. In Proceedings of The 33rd International Conference on Machine Learning, volume 48, pages 2071–2080.
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In Advances in Neural Information Processing Systems 30, pages 5998–6008.
- Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2019a. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. In Advances in Neural Information Processing Systems 32, pages 3266–3280.
- Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Cuihong Cao, Daxin Jiang, Ming Zhou, and others. 2020. K-Adapter: Infusing Knowledge into Pre-trained Models with Adapters. arXiv preprint arXiv:2002.01808v3.
- Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhiyuan Liu, Juanzi Li, and Jian Tang. 2019b. KEPLER: A Unified Model for Knowledge Embedding and Pretrained Language Representation. arXiv preprint arXiv:1911.06136v1.
- Wenhan Xiong, Jingfei Du, William Yang Wang, and Veselin Stoyanov. 2020. Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model. In International Conference on Learning Representations.
- Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, and Yoshiyasu Takefuji. 2016. Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pages 250–259.
- Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, and Yoshiyasu Takefuji. 2017. Learning Distributed Representations of Texts and Entities from Knowledge Base. Transactions of the Association for Computational Linguistics, 5:397–411.
- Bishan Yang, Scott Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In Proceedings of the International Conference on Learning Representations.
- Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. arXiv preprint arXiv:1906.08237v1.
- Sheng Zhang, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Kevin Duh, and Benjamin Van Durme. 2018a. ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension. arXiv preprint arXiv:1810.12885v1.
- Yuhao Zhang, Peng Qi, and Christopher D Manning. 2018b. Graph Convolution over Pruned Dependency Trees Improves Relation Extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2205–2215.
- Yuhao Zhang, Victor Zhong, Danqi Chen, Gabor Angeli, and Christopher D Manning. 2017. Position-aware Attention and Supervised Data Improve Slot Filling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 35–45.
- Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, and Qun Liu. 2019. ERNIE: Enhanced Language Representation with Informative Entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1441–1451.