
Two are Better than One: Joint Entity and Relation Extraction with Table-Sequence Encoders

Conference on Empirical Methods in Natural Language Processing (2020): 1706-1721


Abstract

Named entity recognition and relation extraction are two important fundamental problems. Joint learning algorithms have been proposed to solve both tasks simultaneously, and many of them cast the joint task as a table-filling problem. However, they typically focused on learning a single encoder (usually learning representation in the form of a table) ...

Introduction
  • Named Entity Recognition (NER; Florian et al., 2006, 2010) and Relation Extraction (RE; Zhao and Grishman, 2005; Jiang and Zhai, 2007; Sun et al., 2011; Plank and Moschitti, 2013) are two fundamental tasks in Information Extraction (IE).
  • Both tasks aim to extract structured information from unstructured texts; a toy example of the table-filling view of the joint task is sketched below.
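The abstract notes that many joint models cast NER and RE as a single table-filling problem. The following toy example (my own illustration, with ACE-style labels, not the paper's exact scheme) shows the n x n table for an n-word sentence, with entity tags on the diagonal and relation labels in the off-diagonal cells:

```python
# Toy illustration (mine, not the paper's code) of the table-filling view of
# joint NER + RE: for an n-word sentence, an n x n table holds entity tags on
# the diagonal and relation labels off the diagonal.
words = ["David", "visited", "Singapore"]
n = len(words)

# Start with the "no label" symbol everywhere.
table = [["O"] * n for _ in range(n)]

# Diagonal cells carry entity tags.
table[0][0] = "PER"   # "David"     -> person
table[2][2] = "LOC"   # "Singapore" -> location

# Cell (i, j) carries the relation from word i's entity to word j's entity;
# the order of the two entities matters, so (0, 2) and (2, 0) are distinct.
table[0][2] = "PHYS"  # an ACE-style physical/located relation, used as an example

for row in table:
    print(row)
# ['PER', 'O', 'PHYS']
# ['O', 'O', 'O']
# ['O', 'O', 'LOC']
```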
Highlights
  • Named Entity Recognition (NER; Florian et al., 2006, 2010) and Relation Extraction (RE; Zhao and Grishman, 2005; Jiang and Zhai, 2007; Sun et al., 2011; Plank and Moschitti, 2013) are two fundamental tasks in Information Extraction (IE)
  • We show the advantages of using this table-guided attention: (1) we do not have to compute the attention score function g, since T_l is already obtained from the table encoder; (2) T_l is contextualized along the row, column, and layer dimensions, which correspond to the queries, the keys, and the queries and keys in the previous layer, respectively (a minimal sketch follows this list)
  • We introduce the novel table-sequence encoders architecture for joint extraction of entities and their relations
  • It learns two separate encoders rather than one: a sequence encoder and a table encoder, with explicit interactions between the two encoders
  • We introduce a new method to effectively employ useful information captured by pre-trained language models for such a joint learning task where a table representation is involved
  • We achieved state-of-the-art F1 scores for both NER and RE tasks across four standard datasets, confirming the effectiveness of our approach
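A minimal sketch of the table-guided attention idea from the highlights, under my own assumptions (not the authors' exact formulation): the table representation T_l is taken to have one hidden vector per word pair and is projected to a scalar attention logit per pair, so no separate score function g is needed.

```python
# Minimal sketch of table-guided attention (illustrative, not the authors' code).
import torch
import torch.nn as nn


class TableGuidedAttention(nn.Module):
    """Uses the table representation T_l as the source of attention scores."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # Project each table cell to a scalar logit, replacing the usual
        # score function g(query, key).
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, seq: torch.Tensor, table: torch.Tensor) -> torch.Tensor:
        # seq:   (batch, seq_len, hidden)          -- sequence representation
        # table: (batch, seq_len, seq_len, hidden) -- table representation T_l
        logits = self.score(table).squeeze(-1)   # (batch, seq_len, seq_len)
        weights = torch.softmax(logits, dim=-1)  # attend over the "key" axis
        return torch.bmm(weights, seq)           # (batch, seq_len, hidden)


# Usage with toy dimensions:
attn = TableGuidedAttention(hidden_dim=64)
out = attn(torch.randn(2, 5, 64), torch.randn(2, 5, 5, 64))  # (2, 5, 64)
```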
Methods
  • 6.1 Data: The authors evaluate the model on four datasets, namely ACE04 (Doddington et al., 2004), ACE05 (Walker et al., 2006), CoNLL04 (Roth and Yih, 2004), and ADE (Gurulingappa et al., 2012).
  • For NER, an entity prediction is correct if and only if its type and boundaries both match those of a gold entity.
  • For RE, a relation prediction is considered correct if its relation type and the boundaries of the two entities match those in the gold data.
  • The authors also report the strict relation F1, where a relation prediction is considered correct only if its relation type as well as the boundaries and types of the two entities all match those in the gold data.
  • The order of the two entities in a relation matters (a small sketch of these matching criteria follows this list).
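A small sketch of these matching criteria, under my own assumptions about the data format (entities as (start, end, type) tuples; relations as (head_entity, tail_entity, relation_type) tuples, so head/tail order matters); this is not the authors' evaluation script.

```python
# Illustrative scoring sketch (not the authors' evaluation code).
def f1(num_pred: int, num_gold: int, num_correct: int) -> float:
    p = num_correct / num_pred if num_pred else 0.0
    r = num_correct / num_gold if num_gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def ner_f1(pred_entities, gold_entities):
    # An entity is correct iff its boundaries AND type match a gold entity.
    correct = len(set(pred_entities) & set(gold_entities))
    return f1(len(pred_entities), len(gold_entities), correct)

def strict_re_f1(pred_relations, gold_relations):
    # A relation is correct iff its type, and the boundaries AND types of both
    # entities, match the gold data; head/tail order is not interchangeable.
    correct = len(set(pred_relations) & set(gold_relations))
    return f1(len(pred_relations), len(gold_relations), correct)

gold_ents = [(0, 0, "PER"), (2, 2, "LOC")]
pred_ents = [(0, 0, "PER"), (2, 2, "ORG")]
print(ner_f1(pred_ents, gold_ents))  # 0.5: the second entity has the wrong type
```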
Results
  • Following the established line of work, the authors use the F1 measure to evaluate the performance of NER and RE.
  • The authors report F1 scores averaged over 5 runs for their models.
  • Because the reported numbers are averaged over 5 runs rather than taken from the best single run, the authors consider the model to be achieving new state-of-the-art results.
  • The authors achieved state-of-the-art F1 scores for both NER and RE tasks across four standard datasets, confirming the effectiveness of the approach.
Conclusion
  • The authors introduce the novel table-sequence encoders architecture for joint extraction of entities and their relations.
  • It learns two separate encoders rather than one: a sequence encoder and a table encoder, with explicit interactions between the two encoders.
  • The authors would like to investigate how the table representation may be applied to other tasks.
  • Another direction is to generalize the way in which the table and sequence interact to other types of representations.
Tables
  • Table 1: Main results; micro-averaged and macro-averaged F1 are marked separately (a small example of the difference follows this list)
  • Table 2: Using different pre-trained language models on ACE05. +x uses the contextualized word embeddings; +T uses the attention weights
  • Table 3: Ablation of the two encoders on ACE05. Gold entity spans are given in the RE (gold) setting
  • Table 4: Performance on ACE05 with different numbers of layers. Pre-trained word embeddings and language models are not counted toward the number of parameters. The underlined values are from our default setting
  • Table 5: The effect of the dimensions and directions of MD-RNNs. Experiments are conducted on ACE05. The underlined values are from our default setting
  • Table 6: Dataset statistics
  • Table 7: Hyperparameters used in our experiments
  • Table 8: Comparisons of different methods to learn the table representation. For MD-RNN, D+, D−, and D are indicators of whether the hidden state flows forward, flows backward, or does not flow along dimension D (D can be layer, row, or col). When using multiple MD-RNNs, we separate the indicators by ";"
  • Table 9: Comparisons of different table-filling formulations. When not filling the entire table, L fills only the lower-triangular part, and U fills only the upper-triangular part
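Table 1 marks micro- and macro-averaged F1 separately. A quick toy example of the difference (my own numbers, unrelated to the paper's results):

```python
# Toy example of micro- vs macro-averaged F1 (not the paper's evaluation script).
from collections import namedtuple

Counts = namedtuple("Counts", "tp fp fn")

def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

per_class = {"PER": Counts(90, 10, 5), "ORG": Counts(30, 20, 25)}

# Micro: pool counts over all classes, then compute F1 once.
micro = f1(sum(c.tp for c in per_class.values()),
           sum(c.fp for c in per_class.values()),
           sum(c.fn for c in per_class.values()))

# Macro: compute F1 per class, then average the per-class scores.
macro = sum(f1(*c) for c in per_class.values()) / len(per_class)

print(f"micro-F1={micro:.3f}, macro-F1={macro:.3f}")
```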
Funding
  • This research is supported by Ministry of Education, Singapore, under its Academic Research Fund (AcRF) Tier 2 Programme (MOE AcRF Tier 2 Award No: MOE2017-T2-1156)
Study subjects and analysis
datasets: 4
On several standard datasets, our model shows significant improvements over existing approaches.

We evaluate our model on four datasets, namely ACE04 (Doddington et al., 2004), ACE05 (Walker et al., 2006), CoNLL04 (Roth and Yih, 2004), and ADE (Gurulingappa et al., 2012). More details can be found in Appendix B.

Following the established line of work, we use the F1 measure to evaluate the performance of NER and RE

datasets: 4
• We effectively leverage the word-word interaction information carried in the attention weights from BERT, which further improves the performance (one way to expose these weights as word-pair features is sketched below). Our proposed method achieves state-of-the-art performance on four datasets, namely ACE04, ACE05, CoNLL04, and ADE. We also conduct further experiments to confirm the effectiveness of our proposed approach
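One way (my own sketch, using the HuggingFace transformers API rather than the authors' released code) to expose BERT's attention weights as per-word-pair features that a table encoder could consume:

```python
# Sketch: turning BERT attention weights into word-pair (table) features.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_attentions=True)

inputs = tokenizer("David visited Singapore", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, heads, seq_len, seq_len).
att = torch.stack(outputs.attentions, dim=1)       # (batch, layers, heads, S, S)
batch, layers, heads, S, _ = att.shape

# One feature vector per word pair: (batch, S, S, layers * heads).
pair_features = att.permute(0, 3, 4, 1, 2).reshape(batch, S, S, layers * heads)
print(pair_features.shape)  # 144 features per pair for a 12-layer, 12-head BERT
```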

cases: 4
We visualize them in Figure 4. Empirically, we found that the setting considering only cases (a) and (c) in Figure 4 achieves performance no worse than considering all four cases. Therefore, to reduce the amount of computation, we use this setting by default (a rough sketch of a single-direction recurrence over the table follows).
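A rough sketch (my own simplification, not the authors' exact MD-RNN cell) of one such case: a single-direction recurrence over the table in which each cell's hidden state depends on its own input and on the cells above and to its left. The other cases correspond to running the same recurrence from the other corners of the table.

```python
# Simplified single-direction 2D recurrence over the table (illustrative only).
import torch
import torch.nn as nn


class Simple2DRNN(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.cell = nn.Linear(input_dim + 2 * hidden_dim, hidden_dim)

    def forward(self, table: torch.Tensor) -> torch.Tensor:
        # table: (rows, cols, input_dim) -> returns (rows, cols, hidden_dim)
        rows, cols, _ = table.shape
        zero = torch.zeros(self.hidden_dim)
        h = [[None] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                left = h[i][j - 1] if j > 0 else zero   # cell to the left
                up = h[i - 1][j] if i > 0 else zero     # cell above
                h[i][j] = torch.tanh(
                    self.cell(torch.cat([table[i, j], left, up], dim=-1))
                )
        return torch.stack([torch.stack(row) for row in h])


rnn = Simple2DRNN(input_dim=16, hidden_dim=32)
out = rnn(torch.randn(4, 4, 16))  # (4, 4, 32)
```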

datasets: 4
6.3 Comparison with Other Models. Table 1 presents the comparison of our model with previous methods on the four datasets. Our NER performance improves by 1.2, 0.9, 1.2/0.6, and 0.4 absolute F1 points over the previous best results

standard datasets: 4
We also introduce a new method to effectively employ useful information captured by the pre-trained language models for such a joint learning task where a table representation is involved. We achieved state-of-the-art F1 scores for both NER and RE tasks across four standard datasets, confirming the effectiveness of our approach. In the future, we would like to investigate how the table representation may be applied to other tasks

References
  • Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland Vollgraf. 2019. Flair: An easy-to-use framework for state-of-the-art NLP. In Proc. of NAACL-HLT.
  • Lei Jimmy Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. arXiv preprint.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proc. of ICLR.
  • Giannis Bekoulis, Johannes Deleu, Thomas Demeester, and Chris Develder. 2018a. Adversarial training for multi-context joint entity and relation extraction. In Proc. of EMNLP.
  • Giannis Bekoulis, Johannes Deleu, Thomas Demeester, and Chris Develder. 2018b. Joint entity recognition and relation extraction as a multi-head selection problem. Expert Systems with Applications.
  • Yee Seng Chan and Dan Roth. 2011. Exploiting syntactico-semantic structures for relation extraction. In Proc. of NAACL-HLT.
  • Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proc. of EMNLP.
  • Kevin Clark, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. 2019. What does BERT look at? An analysis of BERT's attention. arXiv preprint.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proc. of NAACL-HLT.
  • Kalpit Dixit and Yaser Al-Onaizan. 2019. Span-level model for relation extraction. In Proc. of ACL.
  • George R. Doddington, Alexis Mitchell, Mark A. Przybocki, Lance A. Ramshaw, Stephanie M. Strassel, and Ralph M. Weischedel. 2004. The automatic content extraction (ACE) program - tasks, data, and evaluation. In Proc. of LREC.
  • Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In Proc. of ICLR.
  • Markus Eberts and Adrian Ulges. 2019. Span-based joint entity and relation extraction with transformer pre-training. arXiv preprint.
  • Radu Florian, Hongyan Jing, Nanda Kambhatla, and Imed Zitouni. 2006. Factorizing complex models: A case study in mention detection. In Proc. of ACL.
  • Radu Florian, John F. Pitrelli, Salim Roukos, and Imed Zitouni. 2010. Improving mention detection robustness to noisy input. In Proc. of EMNLP.
  • Alex Graves, Santiago Fernandez, and Jurgen Schmidhuber. 2007. Multi-dimensional recurrent neural networks. In Proc. of ICANN.
  • Alex Graves, Abdel-rahman Mohamed, and Geoffrey E. Hinton. 2013. Speech recognition with deep recurrent neural networks. In Proc. of ICASSP.
  • Pankaj Gupta, Subburam Rajaram, Hinrich Schutze, and Thomas Runkler. 2019. Neural relation extraction within and across sentence boundaries. In Proc. of AAAI.
  • Pankaj Gupta, Hinrich Schutze, and Bernt Andrassy. 2016. Table filling multi-task recurrent neural network for joint entity and relation extraction. In Proc. of COLING.
  • Harsha Gurulingappa, Abdul Mateen Rajput, Angus Roberts, Juliane Fluck, Martin Hofmann-Apitius, and Luca Toldo. 2012. Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. Journal of Biomedical Informatics.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proc. of CVPR.
  • Jing Jiang and ChengXiang Zhai. 2007. A systematic exploration of the feature space for relation extraction. In Proc. of HLT-NAACL.
  • Arzoo Katiyar and Claire Cardie. 2017. Going out on a limb: Joint extraction of entity mentions and relations without dependency trees. In Proc. of ACL.
  • Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In Proc. of HLT-NAACL.
  • Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. In Proc. of ICLR.
  • Fei Li, Meishan Zhang, Guohong Fu, and Donghong Ji. 2017. A neural joint model for entity and relation extraction from biomedical text. BMC Bioinformatics.
  • Fei Li, Yue Zhang, Meishan Zhang, and Donghong Ji. 2016. Joint models for extracting adverse drug events from biomedical text. In Proc. of IJCAI.
  • Qi Li and Heng Ji. 2014. Incremental joint extraction of entity mentions and relations. In Proc. of ACL.
  • Xiaoya Li, Fan Yin, Zijun Sun, Xiayu Li, Arianna Yuan, Duo Chai, Mingxin Zhou, and Jiwei Li. 2019. Entity-relation extraction as multi-turn question answering. In Proc. of ACL.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint.
  • Yi Luan, Dave Wadden, Luheng He, Amy Shah, Mari Ostendorf, and Hannaneh Hajishirzi. 2019. A general framework for information extraction using dynamic span graphs. In Proc. of NAACL-HLT.
  • Makoto Miwa and Mohit Bansal. 2016. End-to-end relation extraction using LSTMs on sequences and tree structures. In Proc. of ACL.
  • Makoto Miwa and Yutaka Sasaki. 2014. Modeling joint entity and relation extraction with table representation. In Proc. of EMNLP.
  • Guoshun Nan, Zhijiang Guo, Ivan Sekulic, and Wei Lu. 2020. Reasoning with latent structure refinement for document-level relation extraction. In Proc. of ACL.
  • Dat Quoc Nguyen and Karin Verspoor. 2019. End-to-end neural relation extraction using deep biaffine attention. In Proc. of ECIR.
  • Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Wen-tau Yih. 2017. Cross-sentence n-ary relation extraction with graph LSTMs. Transactions of the Association for Computational Linguistics.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proc. of EMNLP.
  • Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proc. of NAACL.
  • Barbara Plank and Alessandro Moschitti. 2013. Embedding semantic similarity in tree kernels for domain adaptation of relation extraction. In Proc. of ACL.
  • Lev-Arie Ratinov and Dan Roth. 2009. Design challenges and misconceptions in named entity recognition. In Proc. of CoNLL.
  • Dan Roth and Wen-tau Yih. 2004. A linear programming formulation for global inference in natural language tasks. In Proc. of CoNLL.
  • Erik F. Sang and Jorn Veenstra. 1999. Representing text chunks. In Proc. of EACL.
  • Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In Proc. of EMNLP.
  • Rupesh Kumar Srivastava, Klaus Greff, and Jurgen Schmidhuber. 2015. Highway networks. arXiv preprint.
  • Ang Sun, Ralph Grishman, and Satoshi Sekine. 2011. Semi-supervised relation extraction with large-scale word clustering. In Proc. of NAACL-HLT.
  • Changzhi Sun, Yuanbin Wu, Man Lan, Shiliang Sun, Wenting Wang, Kuang-Chih Lee, and Kewen Wu. 2018. Extracting entities and relations with joint minimum risk training. In Proc. of EMNLP.
  • Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proc. of ACL.
  • Tung Tran and Ramakanth Kavuluru. 2019. Neural metric learning for fast end-to-end relation extraction. arXiv preprint.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proc. of NIPS.
  • Patrick Verga, Emma Strubell, and Andrew McCallum. 2018. Simultaneously self-attending to all mentions for full-abstract biological relation extraction. In Proc. of NAACL-HLT.
  • Jesse Vig. 2019. A multiscale visualization of attention in the transformer model. In Proc. of ACL.
  • David Wadden, Ulme Wennberg, Yi Luan, and Hannaneh Hajishirzi. 2019. Entity, relation, and event extraction with contextualized span representations. In Proc. of EMNLP.
  • Christopher Walker, Stephanie Strassel, Julie Medero, and Kazuaki Maeda. 2006. ACE 2005 multilingual training corpus. Linguistic Data Consortium.
  • Haoyu Wang, Ming Tan, Mo Yu, Shiyu Chang, Dakuo Wang, Kun Xu, Xiaoxiao Guo, and Saloni Potdar. 2019. Extracting multiple-relations in one-pass with pre-trained transformers. In Proc. of ACL.
  • Ralph Weischedel, Eduard Hovy, Mitchell Marcus, Martha Palmer, Robert Belvin, Sameer Pradhan, Lance Ramshaw, and Nianwen Xue. 2011. OntoNotes: A large training corpus for enhanced processing. Handbook of Natural Language Processing and Machine Translation.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint.
  • Yuan Yao, Deming Ye, Peng Li, Xu Han, Yankai Lin, Zhenghao Liu, Zhiyuan Liu, Lixin Huang, Jie Zhou, and Maosong Sun. 2019. DocRED: A large-scale document-level relation extraction dataset. In Proc. of ACL.
Author
Jue WANG