Leveraging Document-Level Label Consistency for Named Entity Recognition

IJCAI 2020, pp. 3976-3982, 2020.

DOI: https://doi.org/10.24963/ijcai.2020/550

Abstract

Document-level label consistency is an effective indicator that different occurrences of a particular token sequence are very likely to have the same entity types. Previous work focused on better context representations and used the CRF for label decoding. However, CRF-based methods are inadequate for modeling document-level label consistency.

Code

  • https://github.com/jiacheng-ye/DocL-NER

Data

  • CoNLL2003, OntoNotes, and CHEMDNER (see Table1)
Introduction
  • The task of named entity recognition (NER) involves determining entity boundaries and recognizing the categories of named entities, which is a fundamental task in the field of natural language processing (NLP).
  • Because most neural NER models are implemented using bi-directional long short-term memory (BiLSTM) networks [Ma and Hovy, 2016; Lample et al., 2016; Peters et al., 2018], they have a limited ability to exploit non-local and non-sequential dependencies such as co-references and identical mentions [Qian et al., 2019].
  • Many existing methods have focused on better modeling local dependencies.
  • Example from the paper (truncated): "Rusty Greer 's two-run homer in the top of ..." — different occurrences of the same mention, such as "Rusty Greer", should receive the same entity label across a document.
Highlights
  • The task of named entity recognition (NER) involves determining entity boundaries and recognizing the categories of named entities, which is a fundamental task in the field of natural language processing (NLP)
  • In order to mitigate the side effects of incorrect draft labels, Bayesian neural networks [Gal and Ghahramani, 2016b] are used to indicate the draft labels with a high probability of being wrong, which can greatly assist in preventing the incorrect refinement of correct draft labels
  • The experimental results on three named entity recognition benchmarks demonstrated that the proposed method significantly outperformed the previous state-of-the-art methods
  • The main contributions of this paper can be summarized as follows: 1) novel two-stage label refinement networks are proposed, which can better model document-level label dependencies; 2) the proposed method can model label dependencies in parallel, which performs up to 5.48 times faster than state-of-the-art methods during the inference phase; 3) the use of Bayesian neural networks is proposed to estimate the uncertainty of the predictions and indicate potentially incorrect labels that should be refined; and 4) the experimental results across three named entity recognition datasets indicate that the proposed method significantly outperforms the state-of-the-art methods
  • We introduce a novel two-stage label refinement approach to handle document-level label consistency
  • In order to mitigate the side effects of incorrect draft labels, we use Bayesian neural networks to indicate the labels with a high probability of being wrong (a minimal uncertainty-estimation sketch follows this list)
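A minimal sketch of the uncertainty-estimation idea above, using Monte Carlo dropout [Gal and Ghahramani, 2016a] in PyTorch. The `model` here is a placeholder for any token-level classifier that maps token ids to per-label logits; the function is illustrative, not the paper's released code.

    import torch

    def mc_dropout_uncertainty(model, tokens, n_samples=20):
        """Monte Carlo dropout: keep dropout active at test time, run several
        stochastic forward passes, and measure how much the predicted label
        distributions disagree. High-entropy tokens are the draft labels most
        likely to be wrong, and hence the ones worth refining."""
        model.train()  # keep dropout layers stochastic at inference time
        with torch.no_grad():
            probs = torch.stack([
                torch.softmax(model(tokens), dim=-1)  # (seq_len, n_labels)
                for _ in range(n_samples)
            ])                                        # (n_samples, seq_len, n_labels)
        mean_probs = probs.mean(dim=0)
        # Predictive entropy per token; the clamp avoids log(0)
        entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
        return mean_probs.argmax(dim=-1), entropy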
Methods
  • BiLSTM-CRF. Ma and Hovy [2016] utilize a CRF layer on top of the BiLSTM to model the interaction between two successive labels [Lample et al., 2016] instead of making independent labeling decisions for each output (a minimal sketch follows this list).
  • GraphIE [Qian et al., 2019] utilizes a co-occurrence graph to incorporate document-level contextual information and a CRF to model sentence-level label dependency.
  • This method achieves strong performance in many information extraction tasks, including textual, social media, and visual information extraction
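The BiLSTM-CRF baseline above can be sketched in a few lines. This is a generic reconstruction, not the authors' code: it assumes the third-party pytorch-crf package (`pip install pytorch-crf`), and the embedding and hidden sizes are arbitrary placeholders.

    import torch.nn as nn
    from torchcrf import CRF  # third-party package: pytorch-crf

    class BiLSTMCRF(nn.Module):
        """A BiLSTM encoder emits per-token label scores; the CRF layer scores
        transitions between successive labels instead of making independent
        labeling decisions for each output."""
        def __init__(self, vocab_size, num_labels, emb_dim=100, hidden=256):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden // 2, bidirectional=True,
                                batch_first=True)
            self.proj = nn.Linear(hidden, num_labels)    # emission scores
            self.crf = CRF(num_labels, batch_first=True)

        def forward(self, tokens, labels=None, mask=None):
            emissions = self.proj(self.lstm(self.emb(tokens))[0])
            if labels is not None:  # training: negative log-likelihood
                return -self.crf(emissions, labels, mask=mask, reduction='mean')
            return self.crf.decode(emissions, mask=mask)  # Viterbi decoding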
Results
  • In the Results and Analysis section, the authors detail the performance of the proposed and baseline models.
  • The authors present a series of experiments demonstrating the effectiveness of the proposed model
Conclusion
  • The authors introduce a novel two-stage label refinement approach to handle document-level label consistency.
  • A key-value memory network is used to record the context representations and draft labels (a sketch of the memory read follows this list).
  • A multi-channel Transformer explicitly models document-level word and label dependencies based on the information derived from the memory network.
  • In order to mitigate the side effects of incorrect draft labels, the authors use Bayesian neural networks to indicate the labels with a high probability of being wrong.
  • The proposed method can model the relationship between labels in parallel for faster inference.
  • The experimental results on three named entity recognition benchmarks demonstrated that the proposed method significantly outperformed the previous state-of-the-art methods
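As a rough illustration of the memory read described above, the sketch below shows a scaled dot-product read over a key-value memory in the spirit of Miller et al. [2016]: keys store context representations from the whole document, values store embeddings of the corresponding draft labels, and a token's query retrieves label evidence from its other occurrences. All tensor names and shapes are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def memory_read(query, keys, values):
        """query:  (d,)    context representation of the current token
        keys:   (m, d)  stored context representations for the document
        values: (m, d)  embeddings of the corresponding draft labels
        Tokens whose stored contexts resemble the query contribute their
        draft-label information, aggregating document-level label evidence."""
        scores = keys @ query / keys.size(-1) ** 0.5  # scaled dot-product
        attn = F.softmax(scores, dim=-1)              # attention over memory slots
        return attn @ values                          # (d,) aggregated read-out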
Tables
  • Table1: Statistics of CoNLL2003, OntoNotes and CHEMDNER datasets
  • Table2: Results on CoNLL2003 test set. † refers to adopting external task-specific resources. ‡ refers to models trained on both training and development set. ∗ are results using official released codes
  • Table3: Results on CoNLL2003 test set by integrating language models. ∗ refers to results rerun using fastNLP
  • Table4: Speed and Co-Acc comparison on the CoNLL2003 dataset. Co-Acc refers to the accuracy of co-occurrence tokens (one plausible reading of this metric is sketched after this list)
  • Table5: Ablation study of DocL-NER
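The page defines Co-Acc only as "the accuracy of co-occurrence tokens". One plausible reading, sketched below under that assumption, restricts token-level accuracy to tokens that appear more than once within the same document, which is exactly where label consistency matters.

    from collections import Counter

    def co_acc(docs):
        """docs: list of documents; each document is a list of
        (token, gold_label, predicted_label) triples. Accuracy is computed
        only over tokens occurring more than once in their document."""
        correct = total = 0
        for doc in docs:
            counts = Counter(token for token, _, _ in doc)
            for token, gold, pred in doc:
                if counts[token] > 1:  # a co-occurring token
                    total += 1
                    correct += int(pred == gold)
        return correct / total if total else 0.0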
Related Work
  • 2.1 Neural Named Entity Recognition

In recent years, many neural network-based methods have achieved competitive performance without massive hand-crafted feature engineering, including LSTM-based methods because of their advantages in modeling sequence data [Lample et al., 2016; Ma and Hovy, 2016] and CNN-based methods because of their proficiency in parallel modeling [Strubell et al., 2017]. In order to model non-local and non-sequential dependencies, sentence-level [Zhang et al., 2018c; Liu et al., 2019] and document-level contextualized information [Qian et al., 2019; Luo et al., 2020] has been adopted to eliminate the limitations of RNNs resulting from their sequential nature. In contrast to their work, which only modeled the dependencies between words, the method reported here also models the sentence-level and document-level dependencies between labels.

    2.2 Label Dependency Modeling

Creating better models for label dependencies has always been a focus of sequence labeling tasks [Ye and Ling, 2018; Zhang et al., 2018b]. In particular, the CRF layer is integrated with neural encoders to capture label transition patterns [Ma and Hovy, 2016]. Many recent methods have introduced label embeddings to manage longer ranges of dependencies [Zhang et al., 2018b; Cui and Zhang, 2019]. However, these methods are still trapped in sentence-level label dependencies. Krishnan and Manning [2006] proposed a two-stage approach for document-level label consistency, but it required slower two-layer CRFs and hand-crafted features. In contrast, the method reported here can model both sentence-level label dependencies and document-level label consistency in parallel without hand-crafted features.
Funding
  • This work was partially funded by the China National Key R&D Program (Nos. 2018YFB1005104, 2018YFC0831105, 2017YFB1002104), the National Natural Science Foundation of China (Nos. 61751201, 61976056, 61532011), the Shanghai Municipal Science and Technology Major Project (No. 2018SHZDZX01), and the Science and Technology Commission of Shanghai Municipality Grant (Nos. 18DZ1201000, 16JC1420401, 17JC1420200)
References
  • [Chen et al., 2019] Hui Chen, Zijia Lin, Guiguang Ding, Jian-Guang Lou, Yusen Zhang, and Borje F. Karlsson. GRN: Gated relation network to enhance convolutional neural network for named entity recognition. In AAAI, 2019.
  • [Chiu and Nichols, 2016] Jason P.C. Chiu and Eric Nichols. Named entity recognition with bidirectional LSTM-CNNs. TACL, 4:357–370, 2016.
  • [Cui and Zhang, 2019] Leyang Cui and Yue Zhang. Hierarchically-refined label attention network for sequence labeling. In EMNLP-IJCNLP, 2019.
  • [Dai et al., 2019] Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc Le, and Ruslan Salakhutdinov. Transformer-XL: Attentive language models beyond a fixed-length context. In ACL, 2019.
  • [Devlin et al., 2019] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, pages 4171–4186, 2019.
  • [Gal and Ghahramani, 2016a] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In ICML, pages 1050–1059, 2016.
  • [Gal and Ghahramani, 2016b] Yarin Gal and Zoubin Ghahramani. A theoretically grounded application of dropout in recurrent neural networks. In NeurIPS, 2016.
  • [Hu et al., 2019] Anwen Hu, Zhicheng Dou, Jian-Yun Nie, and Ji-Rong Wen. Leveraging multi-token entities in document-level named entity recognition. In AAAI, 2019.
  • [Kendall and Gal, 2017] Alex Kendall and Yarin Gal. What uncertainties do we need in Bayesian deep learning for computer vision? In NeurIPS, pages 5574–5584, 2017.
  • [Krallinger et al., 2015] Martin Krallinger, Florian Leitner, Obdulia Rabal, Miguel Vazquez, Julen Oyarzabal, and Alfonso Valencia. CHEMDNER: The drugs and chemical names extraction challenge. Journal of Cheminformatics, 7(1):S1, 2015.
  • [Krishnan and Manning, 2006] Vijay Krishnan and Christopher D. Manning. An effective two-stage model for exploiting non-local dependencies in named entity recognition. In ACL, pages 1121–1128, 2006.
  • [Lample et al., 2016] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. Neural architectures for named entity recognition. In NAACL, pages 260–270, 2016.
  • [Liu et al., 2019] Yijin Liu, Fandong Meng, Jinchao Zhang, Jinan Xu, Yufeng Chen, and Jie Zhou. GCDT: A global context enhanced deep transition architecture for sequence labeling. arXiv preprint arXiv:1906.02437, 2019.
  • [Luo et al., 2020] Ying Luo, Fengshun Xiao, and Hai Zhao. Hierarchical contextualized representation for named entity recognition. In AAAI, 2020.
  • [Ma and Hovy, 2016] Xuezhe Ma and Eduard Hovy. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In ACL, pages 1064–1074, 2016.
  • [Mikolov et al., 2013] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In NeurIPS, 2013.
  • [Miller et al., 2016] Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. Key-value memory networks for directly reading documents. In EMNLP, pages 1400–1409, 2016.
  • [Pennington et al., 2014] Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In EMNLP, 2014.
  • [Peters et al., 2018] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In NAACL-HLT, pages 2227–2237, 2018.
  • [Qian et al., 2019] Yujie Qian, Enrico Santus, Zhijing Jin, Jiang Guo, and Regina Barzilay. GraphIE: A graph-based framework for information extraction. In NAACL-HLT, pages 751–761, 2019.
  • [Strubell et al., 2017] Emma Strubell, Patrick Verga, David Belanger, and Andrew McCallum. Fast and accurate entity recognition with iterated dilated convolutions. In EMNLP, 2017.
  • [Tjong Kim Sang and De Meulder, 2003] Erik F. Tjong Kim Sang and Fien De Meulder. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In NAACL-HLT, pages 142–147, 2003.
  • [Vaswani et al., 2017] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, pages 5998–6008, 2017.
  • [Weischedel et al., 2013] Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. OntoNotes release 5.0 LDC2013T19. Linguistic Data Consortium, Philadelphia, PA, 2013.
  • [Ye and Ling, 2018] Zhixiu Ye and Zhen-Hua Ling. Hybrid semi-Markov CRF for neural sequence labeling. In ACL, pages 235–240, 2018.
  • [Zhang et al., 2018a] Boliang Zhang, Spencer Whitehead, Lifu Huang, and Heng Ji. Global attention for name tagging. In CoNLL, 2018.
  • [Zhang et al., 2018b] Yuan Zhang, Hongshen Chen, Yihong Zhao, Qun Liu, and Dawei Yin. Learning tag dependencies for sequence tagging. In IJCAI, pages 4581–4587, 2018.
  • [Zhang et al., 2018c] Yue Zhang, Qi Liu, and Linfeng Song. Sentence-state LSTM for text representation. In ACL, 2018.