Two are Better than One: Joint Entity and Relation Extraction with Table-Sequence Encoders
Conference on Empirical Methods in Natural Language Processing (2020): 1706-1721
- Named Entity Recognition (NER; Florian et al., 2006, 2010) and Relation Extraction (RE; Zhao and Grishman, 2005; Jiang and Zhai, 2007; Sun et al., 2011; Plank and Moschitti, 2013) are two fundamental tasks in Information Extraction (IE).
- Both tasks aim to extract structured information from unstructured texts.
- We show the advantages of using this table-guided attention: (1) we do not have to compute the score function g, since T^l is already produced by the table encoder; (2) T^l is contextualized along the row, column, and layer dimensions, which correspond to the queries, the keys, and the queries and keys of the previous layer, respectively
- We introduce the novel table-sequence encoders architecture for joint extraction of entities and their relations (a minimal sketch of the idea follows this list)
- It learns two separate encoders rather than one: a sequence encoder and a table encoder, with explicit interactions between the two encoders
- We introduce a new method to effectively employ useful information captured by the pre-trained language models for such a joint learning task where a table representation is involved
- We achieved state-of-the-art F1 scores for both NER and RE tasks across four standard datasets, confirming the effectiveness of our approach
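As an illustration of the two-encoder interaction and the table-guided attention described above, here is a minimal PyTorch sketch. It is not the authors' implementation: the table update uses a plain GRU cell as a stand-in for the paper's MD-RNN, the table scores are reduced to scalars by averaging over the hidden dimension, and all module and variable names are illustrative.

```python
import torch
import torch.nn as nn


class TableSequenceLayer(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        # table encoder: fuse the two token states that define each cell (i, j)
        self.cell_proj = nn.Linear(2 * hidden, hidden)
        self.cell_update = nn.GRUCell(hidden, hidden)  # stand-in for the paper's MD-RNN cell
        # sequence encoder: value/output projections for table-guided attention
        self.value = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, hidden)

    def forward(self, seq: torch.Tensor, table: torch.Tensor):
        # seq:   (n, hidden)    token representations S^{l-1}
        # table: (n, n, hidden) table representations T^{l-1}
        n, h = seq.shape
        # --- table encoder: cell (i, j) is updated from tokens i and j ---
        pair = torch.cat(
            [seq.unsqueeze(1).expand(n, n, h), seq.unsqueeze(0).expand(n, n, h)],
            dim=-1,
        )
        new_table = self.cell_update(
            self.cell_proj(pair).reshape(n * n, h), table.reshape(n * n, h)
        ).reshape(n, n, h)
        # --- sequence encoder with table-guided attention ---
        # the attention score of query i over key j is read off table cell (i, j);
        # here it is reduced to a scalar by averaging over the hidden dimension
        scores = new_table.mean(dim=-1)          # (n, n)
        weights = torch.softmax(scores, dim=-1)  # each row sums to 1
        new_seq = self.out(weights @ self.value(seq)) + seq
        return new_seq, new_table


# usage on a toy sentence of 4 tokens
layer = TableSequenceLayer(hidden=16)
s = torch.randn(4, 16)
t = torch.zeros(4, 4, 16)
s, t = layer(s, t)
```

The key point the sketch captures is that the sequence encoder's attention weights are read from the table representation rather than computed from separate query-key projections, while the table cells are in turn updated from pairs of sequence states.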
- 6.1 Data
The authors evaluate the model on four datasets, namely ACE04 (Doddington et al., 2004), ACE05 (Walker et al., 2006), CoNLL04 (Roth and Yih, 2004), and ADE (Gurulingappa et al., 2012).
- For NER, an entity prediction is correct if and only if its type and boundaries both match those of a gold entity.
- For RE, a relation prediction is considered correct if its relation type and the boundaries of the two entities match those in the gold data.
- The authors also report the strict relation F1, where a relation prediction is considered correct only if its relation type as well as the boundaries and types of the two entities all match those in the gold data (this matching is sketched in code below).
- The order of the two entities in a relation matters
- Following the established line of work, the authors use the F1 measure to evaluate the performance of NER and RE.
- The authors report F1 scores averaged over 5 runs for each model.
- Since the reported numbers are averages over 5 runs, the authors consider the model to be achieving new state-of-the-art results.
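To make the matching criteria above concrete, here is a small, self-contained sketch of the strict evaluation, with entities as (start, end, type) tuples and relations as ordered entity pairs plus a relation type. The helper function and the example spans are hypothetical, not taken from the datasets.

```python
# Strict matching: an entity counts only if type and boundaries match a gold
# entity; a relation counts only if its type plus both argument entities
# (boundaries and types, in order) match the gold data.
def f1(pred: set, gold: set) -> float:
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# entities as (start, end, type); relations as (head_entity, tail_entity, rel_type)
gold_ents = {(0, 2, "PER"), (5, 6, "ORG")}
pred_ents = {(0, 2, "PER"), (5, 7, "ORG")}                 # wrong boundary -> not counted
gold_rels = {((0, 2, "PER"), (5, 6, "ORG"), "WORK_FOR")}
pred_rels = {((0, 2, "PER"), (5, 7, "ORG"), "WORK_FOR")}   # entity mismatch -> wrong

print(f1(pred_ents, gold_ents))  # 0.5
print(f1(pred_rels, gold_rels))  # 0.0
```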
- The authors achieved state-of-the-art F1 scores for both NER and RE tasks across four standard datasets, confirming the effectiveness of the approach
- The authors introduce the novel table-sequence encoders architecture for joint extraction of entities and their relations
- It learns two separate encoders rather than one: a sequence encoder and a table encoder, with explicit interactions between the two encoders.
- The authors would like to investigate how the table representation may be applied to other tasks
- Another direction is to generalize the way in which the table and sequence interact to other types of representations
- Table 1: Main results; scores are micro-averaged or macro-averaged F1, as indicated
- Table 2: Using different pre-trained language models on ACE05. +x uses the contextualized word embeddings; +T uses the attention weights
- Table 3: Ablation of the two encoders on ACE05. Gold entity spans are given in RE (gold)
- Table 4: Performance on ACE05 with different numbers of layers. Pre-trained word embeddings and language models are not counted toward the number of parameters. The underlined values are from our default setting
- Table 5: The effect of the dimensions and directions of MD-RNNs. Experiments are conducted on ACE05. The underlined values are from our default setting
- Table 6: Dataset statistics
- Table 7: Hyperparameters used in our experiments
- Table 8: Comparisons with different methods to learn the table representation. For MD-RNN, D+, D−, and D indicate whether the hidden state flows forward, flows backward, or does not flow along dimension D (where D can be layer, row, or col). When multiple MD-RNNs are used, the indicators are separated by ";"
- Table 9: Comparisons of different table filling formulations. When not filling the entire table, L fills only the lower-triangular part, and U fills only the upper-triangular part
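For readers unfamiliar with MD-RNNs (Graves et al., 2007), the sketch below shows one forward sweep of a 2-D RNN over the table, corresponding roughly to a row+/col+ direction setting of the kind compared in Table 8; other settings reverse one or both loops or add a layer dimension. This is a simplified stand-in, not the paper's exact cell, and the names are illustrative.

```python
import torch
import torch.nn as nn


class Table2DRNN(nn.Module):
    """One forward (row+, col+) sweep of a 2-D RNN over an n x n table."""

    def __init__(self, inp: int, hidden: int):
        super().__init__()
        # a cell's state depends on its own input and on the states of the
        # cell above it (row - 1, col) and the cell to its left (row, col - 1)
        self.cell = nn.Linear(inp + 2 * hidden, hidden)
        self.hidden = hidden

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n, n, inp) per-cell inputs; returns (n, n, hidden) hidden states
        n = x.size(0)
        zero = x.new_zeros(self.hidden)
        rows, prev_row = [], [zero] * n
        for i in range(n):            # row+ direction
            row, left = [], zero
            for j in range(n):        # col+ direction
                h_ij = torch.tanh(self.cell(torch.cat([x[i, j], prev_row[j], left])))
                row.append(h_ij)
                left = h_ij
            rows.append(torch.stack(row))
            prev_row = row
        return torch.stack(rows)


# usage on a toy 3 x 3 table
rnn = Table2DRNN(inp=8, hidden=16)
print(rnn(torch.randn(3, 3, 8)).shape)  # torch.Size([3, 3, 16])
```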
- NER and RE can be tackled with separate models. By assuming gold entity mentions are given as inputs, RE can be regarded as a classification task. Such models include kernel methods (Zelenko et al., 2002), RNNs (Zhang and Wang, 2015), recursive neural networks (Socher et al., 2012), CNNs (Zeng et al., 2014), and Transformer models (Verga et al., 2018; Wang et al., 2019). Another line of work detects cross-sentence relations (Peng et al., 2017; Gupta et al., 2019) and even document-level relations (Yao et al., 2019; Nan et al., 2020). However, entities are usually not directly available in practice, so these approaches may require an additional entity recognizer to form a pipeline.
Joint learning has been shown effective since it can alleviate the error propagation issue and benefit from exploiting the interrelation between NER and RE. Many studies address the joint problem through a cascade approach, i.e., performing NER first followed by RE. Miwa and Bansal (2016) use bi-LSTM (Graves et al., 2013) and tree-LSTM (Tai et al., 2015) for the joint task. Bekoulis et al. (2018a,b) formulate it as a head selection problem. Nguyen and Verspoor (2019) apply biaffine attention (Dozat and Manning, 2017) for RE. Luan et al. (2019), Dixit and Al-Onaizan (2019), and Wadden et al. (2019) use span representations to predict relations.
- This research is supported by the Ministry of Education, Singapore, under its Academic Research Fund (AcRF) Tier 2 Programme (MOE AcRF Tier 2 Award No. MOE2017-T2-1156)
We evaluate our model on four datasets, namely ACE04 (Doddington et al., 2004), ACE05 (Walker et al., 2006), CoNLL04 (Roth and Yih, 2004), and ADE (Gurulingappa et al., 2012). More details can be found in Appendix B.
Following the established line of work, we use the F1 measure to evaluate the performance of NER and RE; Table 1 indicates whether each score is micro- or macro-averaged (the difference is sketched below).
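Since Table 1 mixes micro-averaged and macro-averaged F1, the following small example (with made-up per-class counts, purely illustrative) shows the difference: micro-F1 pools true positives, false positives, and false negatives over all classes, while macro-F1 averages the per-class F1 scores.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


# hypothetical per-class (tp, fp, fn) counts
counts = {"PER": (90, 10, 10), "ORG": (5, 5, 15)}

macro = sum(f1(*c) for c in counts.values()) / len(counts)   # average of per-class F1
micro = f1(*(sum(col) for col in zip(*counts.values())))     # F1 of pooled counts
print(round(macro, 3), round(micro, 3))
```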
- Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland Vollgraf. 2019. Flair: An easy-to-use framework for state-of-the-art NLP. In Proc. of NAACL-HLT.
- Lei Jimmy Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. arXiv preprint.
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proc. of ICLR.
- Giannis Bekoulis, Johannes Deleu, Thomas Demeester, and Chris Develder. 2018a. Adversarial training for multi-context joint entity and relation extraction. In Proc. of EMNLP.
- Giannis Bekoulis, Johannes Deleu, Thomas Demeester, and Chris Develder. 2018b. Joint entity recognition and relation extraction as a multi-head selection problem. Expert Systems with Applications.
- Yee Seng Chan and Dan Roth. 2011. Exploiting syntactico-semantic structures for relation extraction. In Proc. of NAACL-HLT.
- Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proc. of EMNLP.
- Kevin Clark, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. 2019. What does BERT look at? An analysis of BERT's attention. arXiv preprint.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proc. of NAACL-HLT.
- Kalpit Dixit and Yaser Al-Onaizan. 2019. Span-level model for relation extraction. In Proc. of ACL.
- George R. Doddington, Alexis Mitchell, Mark A. Przybocki, Lance A. Ramshaw, Stephanie M. Strassel, and Ralph M. Weischedel. 2004. The automatic content extraction (ACE) program - tasks, data, and evaluation. In Proc. of LREC.
- Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In Proc. of ICLR.
- Markus Eberts and Adrian Ulges. 2019. Span-based joint entity and relation extraction with transformer pre-training. arXiv preprint.
- Radu Florian, Hongyan Jing, Nanda Kambhatla, and Imed Zitouni. 2006. Factorizing complex models: A case study in mention detection. In Proc. of ACL.
- Radu Florian, John F. Pitrelli, Salim Roukos, and Imed Zitouni. 2010. Improving mention detection robustness to noisy input. In Proc. of EMNLP.
- Alex Graves, Santiago Fernandez, and Jurgen Schmidhuber. 2007. Multi-dimensional recurrent neural networks. In Proc. of ICANN.
- Alex Graves, Abdel-rahman Mohamed, and Geoffrey E. Hinton. 2013. Speech recognition with deep recurrent neural networks. In Proc. of ICASSP.
- Pankaj Gupta, Subburam Rajaram, Hinrich Schutze, and Thomas Runkler. 2019. Neural relation extraction within and across sentence boundaries. In Proc. of AAAI.
- Pankaj Gupta, Hinrich Schutze, and Bernt Andrassy. 2016. Table filling multi-task recurrent neural network for joint entity and relation extraction. In Proc. of COLING.
- Harsha Gurulingappa, Abdul Mateen Rajput, Angus Roberts, Juliane Fluck, Martin Hofmann-Apitius, and Luca Toldo. 2012. Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. Journal of biomedical informatics.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proc. of CVPR.
- Jing Jiang and ChengXiang Zhai. 2007. A systematic exploration of the feature space for relation extraction. In Proc. of HLT-NAACL.
- Arzoo Katiyar and Claire Cardie. 2017. Going out on a limb: Joint extraction of entity mentions and relations without dependency trees. In Proc. of ACL.
- Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In Proc. of HLT-NAACL.
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. In Proc. of ICLR.
- Fei Li, Meishan Zhang, Guohong Fu, and Donghong Ji. 2017. A neural joint model for entity and relation extraction from biomedical text. BMC bioinformatics.
- Fei Li, Yue Zhang, Meishan Zhang, and Donghong Ji. 2016. Joint models for extracting adverse drug events from biomedical text. In Proc. of IJCAI.
- Qi Li and Heng Ji. 2014. Incremental joint extraction of entity mentions and relations. In Proc. of ACL.
- Xiaoya Li, Fan Yin, Zijun Sun, Xiayu Li, Arianna Yuan, Duo Chai, Mingxin Zhou, and Jiwei Li. 2019. Entity-relation extraction as multi-turn question answering. In Proc. of ACL.
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint.
- Yi Luan, Dave Wadden, Luheng He, Amy Shah, Mari Ostendorf, and Hannaneh Hajishirzi. 2019. A general framework for information extraction using dynamic span graphs. In Proc. of NAACL-HLT.
- Makoto Miwa and Mohit Bansal. 2016. End-to-end relation extraction using lstms on sequences and tree structures. In Proc. of ACL.
- Makoto Miwa and Yutaka Sasaki. 2014. Modeling joint entity and relation extraction with table representation. In Proc. of EMNLP.
- Guoshun Nan, Zhijiang Guo, Ivan Sekulic, and Wei Lu. 2020. Reasoning with latent structure refinement for document-level relation extraction. In Proc. of ACL.
- Dat Quoc Nguyen and Karin Verspoor. 2019. End-to-end neural relation extraction using deep biaffine attention. In Proc. of ECIR.
- Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Wen-tau Yih. 2017. Cross-sentence n-ary relation extraction with graph lstms. Transactions of the Association for Computational Linguistics.
- Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: Global vectors for word representation. In Proc. of EMNLP.
- Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proc. of NAACL.
- Barbara Plank and Alessandro Moschitti. 2013. Embedding semantic similarity in tree kernels for domain adaptation of relation extraction. In Proc. of ACL.
- Lev-Arie Ratinov and Dan Roth. 2009. Design challenges and misconceptions in named entity recognition. In Proc. of CoNLL.
- Dan Roth and Wen-tau Yih. 2004. A linear programming formulation for global inference in natural language tasks. In Proc. of CoNLL.
- Erik F. Sang and Jorn Veenstra. 1999. Representing text chunks. In Proc. of EACL.
- Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In Proc. of EMNLP.
- Rupesh Kumar Srivastava, Klaus Greff, and Jurgen Schmidhuber. 2015. Highway networks. arXiv preprint.
- Ang Sun, Ralph Grishman, and Satoshi Sekine. 2011. Semi-supervised relation extraction with large-scale word clustering. In Proc. of NAACL-HLT.
- Changzhi Sun, Yuanbin Wu, Man Lan, Shiliang Sun, Wenting Wang, Kuang-Chih Lee, and Kewen Wu. 2018. Extracting entities and relations with joint minimum risk training. In Proc. of EMNLP.
- Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proc. of ACL.
- Tung Tran and Ramakanth Kavuluru. 2019. Neural metric learning for fast end-to-end relation extraction. arXiv preprint.
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proc. of NIPS.
- Patrick Verga, Emma Strubell, and Andrew McCallum. 2018. Simultaneously self-attending to all mentions for full-abstract biological relation extraction. In Proc. of NAACL-HLT.
- Jesse Vig. 2019. A multiscale visualization of attention in the transformer model. In Proc. of ACL.
- David Wadden, Ulme Wennberg, Yi Luan, and Hannaneh Hajishirzi. 2019. Entity, relation, and event extraction with contextualized span representations. In Proc. of EMNLP.
- Christopher Walker, Stephanie Strassel, Julie Medero, and Kazuaki Maeda. 2006. Ace 2005 multilingual training corpus. Linguistic Data Consortium.
- Haoyu Wang, Ming Tan, Mo Yu, Shiyu Chang, Dakuo Wang, Kun Xu, Xiaoxiao Guo, and Saloni Potdar. 2019. Extracting multiple-relations in one-pass with pre-trained transformers. In Proc. of ACL.
- Ralph Weischedel, Eduard Hovy, Mitchell Marcus, Martha Palmer, Robert Belvin, Sameer Pradhan, Lance Ramshaw, and Nianwen Xue. 2011. Ontonotes: A large training corpus for enhanced processing. Handbook of Natural Language Processing and Machine Translation.
- Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint.
- Yuan Yao, Deming Ye, Peng Li, Xu Han, Yankai Lin, Zhenghao Liu, Zhiyuan Liu, Lixin Huang, Jie Zhou, and Maosong Sun. 2019. Docred: A large-scale document-level relation extraction dataset. In Proc. of ACL.