A Rigorous Study on Named Entity Recognition: Can Fine-tuning Pretrained Model Lead to the Promised Land?

EMNLP 2020, pp. 7291–7300 (2020)


Abstract

Fine-tuning pretrained model has achieved promising performance on standard NER benchmarks. Generally, these benchmarks are blessed with strong name regularity, high mention coverage and sufficient context diversity. Unfortunately, when scaling NER to open situations, these advantages may no longer exist. And therefore it raises a critical ...

Introduction
  • Named entity recognition (NER), or more generally name tagging, aims to identify text spans pertaining to specific entity types.
  • NER is a fundamental task of information extraction which enables many downstream NLP applications, such as relation extraction (GuoDong et al., 2005; Mintz et al., 2009), event extraction (Ji and Grishman, 2008; Li et al., 2013) and machine reading comprehension (Rajpurkar et al., 2016; Wang et al., 2016).
  • [Figure: regular NER vs. open NER compared along name regularity, mention coverage, and context patterns, with examples; regular NER covers entity types with strong name regularity, while open NER covers entity types with weak or no regularity.]
Highlights
  • Named entity recognition (NER), or more generally name tagging, aims to identify text spans pertaining to specific entity types
  • One critical difference between regular and open NER is whether names of the same entity type share inner compositional structure
  • If we can figure out how many training instances are sufficient for context patterns and name regularity, respectively, it will provide valuable insights for constructing open NER datasets and models more effectively and efficiently
  • Mention Reduction (MR) keeps only a part of the mentions in the original training set as seeds, and replaces every other mention in the training data with a mention randomly sampled from the seeds of the same type (a code sketch of this procedure follows this list)
  • This paper investigates whether current state-of-the-art models on regular NER can still work well on open NER
  • The model performs significantly better in the Mention Permutation (MP) setting than in the Name Permutation (NP) setting in all entity types
  • Our investigation leads to three valuable conclusions: decent name regularity is necessary to identify unseen mentions, high mention coverage can be hazardous to model generalization, and enormous amounts of data are redundant for capturing context patterns
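The paper's own implementation is not included in this summary; the following is a minimal Python sketch of the mention reduction (MR) idea exactly as the highlight above describes it, assuming a CoNLL-style corpus stored as (tokens, BIO tags) pairs. The function names and the seed_ratio parameter are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict

def extract_mentions(tokens, tags):
    """Collect (start, end, entity_type) spans from BIO tags (B-PER, I-PER, O)."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):           # sentinel closes a trailing span
        if tag == "O" or tag.startswith("B-"):
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans

def mention_reduction(dataset, seed_ratio=0.1, rng=random.Random(0)):
    """Keep a random subset of mentions per type as 'seeds', then replace every
    other mention in the data with a randomly sampled seed of the same type."""
    # 1) gather mention surface forms grouped by entity type
    by_type = defaultdict(list)
    for tokens, tags in dataset:
        for s, e, t in extract_mentions(tokens, tags):
            by_type[t].append(tokens[s:e])
    # 2) sample the seed mentions for each type
    seeds = {t: rng.sample(forms, max(1, int(len(forms) * seed_ratio)))
             for t, forms in by_type.items()}
    seed_keys = {t: {tuple(f) for f in forms} for t, forms in seeds.items()}
    # 3) rewrite every non-seed mention, keeping contexts and labels intact
    reduced = []
    for tokens, tags in dataset:
        new_tokens, new_tags, prev = [], [], 0
        for s, e, t in extract_mentions(tokens, tags):
            new_tokens += tokens[prev:s]
            new_tags += tags[prev:s]
            form = tokens[s:e] if tuple(tokens[s:e]) in seed_keys[t] else rng.choice(seeds[t])
            new_tokens += form
            new_tags += ["B-" + t] + ["I-" + t] * (len(form) - 1)
            prev = e
        new_tokens += tokens[prev:]
        new_tags += tags[prev:]
        reduced.append((new_tokens, new_tags))
    return reduced
```

Because only the seed mentions survive, the rewritten corpus keeps the original sentence contexts while shrinking the mention vocabulary, which is the knob the study uses to probe the effect of mention coverage.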
Methods
  • Experiments on Open NER

    4.1 Data Preparation

    To further verify the conclusions from the randomization test, the authors propose to conduct experiments on a real-world open NER dataset, which focuses on real-world entity types with weaker name regularity than previously used benchmarks.
  • Due to the partial labeling nature of Wikipedia, this dataset only keeps sentences containing at least one mention, so the performance on it may overestimate precision compared with real applications (see the filtering sketch after this list).
  • Although this differs from real scenarios, the authors believe it can still lead to reasonable conclusions
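As a concrete illustration of the data preparation step above, here is a small, hedged Python sketch that keeps only sentences containing at least one labeled mention, which is how the summary describes the Wikipedia-derived dataset being filtered. It assumes the same (tokens, BIO tags) representation as the sketch above; the function name is hypothetical.

```python
def keep_sentences_with_mentions(dataset):
    """Keep only sentences that contain at least one entity mention,
    i.e. at least one non-'O' BIO tag, mirroring the partial-labeling
    filter described for the Wikipedia-derived dataset."""
    return [(tokens, tags) for tokens, tags in dataset
            if any(tag != "O" for tag in tags)]
```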
Results
  • For the majority of entity types, performance drops by more than 40%.
  • The model performs significantly better in the MP setting than in the NP setting in all entity types.
  • There is no significant performance improvement on PER, ORG and GPE when the preserved sentences are more than 30% of the vanilla data.
  • The performance gap between the InDict and OutDict portions is still significant (more than 24% in precision and 18% in recall), which verifies that decent name regularity is necessary for NER models to generalize well (a sketch of the in-dictionary/out-of-dictionary evaluation follows this list)
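The summary does not give the exact matching rules behind the InDict/OutDict split, so the following is only a plausible Python sketch: a mention is treated as in-dictionary if its surface form was seen among the training mentions, and precision, recall, and F1 are then computed separately on each portion. The function name and tuple layout are assumptions for illustration.

```python
def indict_outdict_scores(train_mentions, gold, predicted):
    """Evaluate separately on mentions whose surface form was seen among the
    training mentions (InDict) or never seen (OutDict).

    train_mentions: set of mention surface strings observed in training.
    gold, predicted: sets of (sentence_id, start, end, entity_type, surface) tuples.
    """
    results = {}
    for portion in ("InDict", "OutDict"):
        want_seen = (portion == "InDict")
        gold_p = {m for m in gold if (m[-1] in train_mentions) == want_seen}
        pred_p = {m for m in predicted if (m[-1] in train_mentions) == want_seen}
        tp = len(gold_p & pred_p)
        precision = tp / len(pred_p) if pred_p else 0.0
        recall = tp / len(gold_p) if gold_p else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[portion] = {"precision": precision, "recall": recall, "f1": f1}
    return results
```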
Conclusion
  • If the authors can figure out how many training instances are sufficient for context patterns and name regularity, respectively, it will provide valuable insights for constructing open NER datasets and models more effectively and efficiently
  • To this end, the authors conduct context reduction (CR) and mention reduction (MR) on the vanilla training set using simple data augmentation strategies (a context-reduction sketch follows this list).
  • The above findings shed light on promising directions for open NER, including 1) exploiting name regularity more efficiently with easily obtainable resources such as gazetteers; 2) preventing overfitting to popular in-dictionary mentions with constraints or regularizers; and 3) reducing the need for training data by decoupling the acquisition of context knowledge and name knowledge
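Context reduction (CR) is not spelled out in this summary; consistent with the Results bullet about the fraction of "preserved sentences", here is a minimal sketch under the assumption that CR simply retains a random subset of training sentences. The function name and keep_ratio value are illustrative, not from the paper.

```python
import random

def context_reduction(dataset, keep_ratio=0.3, rng=random.Random(0)):
    """Retain only `keep_ratio` of the training sentences, shrinking the amount
    of context evidence while leaving each kept sentence untouched."""
    k = max(1, int(len(dataset) * keep_ratio))
    return rng.sample(dataset, k)
```

Combined with the mention_reduction sketch above, this would let context evidence and mention coverage be varied independently, which is the spirit of the CR/MR analysis.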
Tables
  • Table 1: Illustration of our four kinds of randomization test. The utterances in square brackets are entity mentions. Name: name regularity knowledge; Mention: high mention coverage; Context: sufficient training instances for context diversity. ✓: the knowledge is preserved in this setting; ✗: the knowledge is erased from the data in this setting; ↓: the knowledge decreases
  • Table 2: Micro-F1 scores of the BERT-CRF tagger on the original data, the name permutation setting, and the mention permutation setting, respectively. Erasing name regularity and mention coverage significantly undermines model performance, and performance under MP drops further than under NP, which demonstrates that high mention coverage makes mention detection much simpler. To further investigate whether high mention coverage influences the models' generalization ability, MP is also compared with NP on the out-of-dictionary portion; the results are shown in Table 4 (a minimal span-level micro-F1 sketch follows this list)
  • Table 3: Comparison between the baseline and name permutation on the in-dictionary and out-of-dictionary portions. The performance gap between InDict and OutDict is significantly enlarged when name regularity is erased
  • Table 4: Experiment results on the OutDict portion. Mention permutation performs significantly better than name permutation, which indicates that high mention coverage may undermine the generalization ability of models
  • Table 5: Comparison between the in-dictionary portion and the out-of-dictionary portion on the Wikipedia dataset. There is a significant gap between these two portions
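For reference, the micro-F1 reported in Tables 2-5 is presumably the standard span-level, exact-match metric; a minimal sketch that pools all gold and predicted spans across entity types before scoring is shown below (names are illustrative).

```python
def micro_f1(gold_spans, pred_spans):
    """Span-level micro-averaged F1: pool every (sentence_id, start, end, type)
    span across all entity types, then score exact matches."""
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```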
Funding
  • This research work is supported by the National Key R&D Program of China (2020AAA0105200) and the National Natural Science Foundation of China under Grants No. ...
References
  • Alan Akbik, Tanja Bergmann, and Roland Vollgraf. 2019. Pooled contextualized embeddings for named entity recognition. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 724–728.
  • Oliver Bender, Franz Josef Och, and Hermann Ney. 2003. Maximum entropy models for named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4, pages 148–151. Association for Computational Linguistics.
  • Yixin Cao, Zikun Hu, Tat-Seng Chua, Zhiyuan Liu, and Heng Ji. 2019. Low-resource name tagging learned with weakly labeled data. arXiv preprint arXiv:1908.09659.
  • Hai Leong Chieu and Hwee Tou Ng. 2002. Named entity recognition: a maximum entropy approach using global information. In Proceedings of the 19th International Conference on Computational Linguistics - Volume 1, pages 1–7. Association for Computational Linguistics.
  • Jason P. C. Chiu and Eric Nichols. 2016. Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics, 4:357–370.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Eugene Edgington and Patrick Onghena. 2007. Randomization Tests. Chapman and Hall/CRC.
  • Zhou GuoDong, Su Jian, Zhang Jie, and Zhang Min. 2005. Exploring various knowledge in relation extraction. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 427–434. Association for Computational Linguistics.
  • Heng Ji and Ralph Grishman. 2008. Refining event extraction through cross-document inference. In Proceedings of ACL-08: HLT, pages 254–262.
  • Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360.
  • Qi Li, Heng Ji, and Liang Huang. 2013. Joint event extraction via structured prediction with global features. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 73–82.
  • Xiaoya Li, Jingrong Feng, Yuxian Meng, Qinghong Han, Fei Wu, and Jiwei Li. 2019a. A unified MRC framework for named entity recognition. arXiv preprint arXiv:1910.11476.
  • Xiaoya Li, Fan Yin, Zijun Sun, Xiayu Li, Arianna Yuan, Duo Chai, Mingxin Zhou, and Jiwei Li. 2019b. Entity-relation extraction as multi-turn question answering. arXiv preprint arXiv:1905.05529.
  • Hongyu Lin, Yaojie Lu, Xianpei Han, and Le Sun. 2019a. Sequence-to-nuggets: Nested entity mention detection via anchor-region networks. arXiv preprint arXiv:1906.03783.
  • Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun, Bin Dong, and Shanshan Jiang. 2019b. Gazetteer-enhanced attentive neural networks for named entity recognition. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6233–6238.
  • Thomas Lin, Oren Etzioni, et al. 2012. Entity linking at web scale. In Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction, pages 84–88. Association for Computational Linguistics.
  • Ying Lin, Shengqi Yang, Veselin Stoyanov, and Heng Ji. 2018. A multi-lingual multi-task architecture for low-resource sequence labeling. In ACL.
  • Yaojie Lu, Hongyu Lin, Xianpei Han, and Le Sun. 2019. Distilling discrimination and generalization knowledge for event detection via delta-representation learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4366–4376.
  • Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2, pages 1003–1011. Association for Computational Linguistics.
  • Jian Ni, Georgiana Dinu, and Radu Florian. 2017. Weakly supervised cross-lingual named entity recognition via effective annotation and representation projection. In ACL.
  • Xiaoman Pan, Boliang Zhang, Jonathan May, Joel Nothman, Kevin Knight, and Heng Ji. 2017. Cross-lingual name tagging and linking for 282 languages. In ACL.
  • Nanyun Peng and Mark Dredze. 2016. Improving named entity recognition for Chinese social media with word segmentation representation learning. In ACL.
  • Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.
  • Cicero Nogueira dos Santos and Victor Guimaraes. 2015. Boosting named entity recognition with neural character embeddings. arXiv preprint arXiv:1505.05008.
  • Burr Settles. 2004. Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pages 104–107. Association for Computational Linguistics.
  • Oscar Tackstrom, Dipanjan Das, Slav Petrov, Ryan McDonald, and Joakim Nivre. 2013. Token and type constraints for cross-lingual part-of-speech tagging. TACL.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
  • Bailin Wang and Wei Lu. 2018. Neural segmental hypergraphs for overlapping mention recognition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 204–214. Association for Computational Linguistics.
  • Bailin Wang, Wei Lu, Yu Wang, and Hongxia Jin. 2018. A neural transition-based model for nested mention recognition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1011–1017. Association for Computational Linguistics.
  • Zhiguo Wang, Haitao Mi, Wael Hamza, and Radu Florian. 2016. Multi-perspective context matching for machine comprehension. arXiv preprint arXiv:1612.04211.
  • Congying Xia, Chenwei Zhang, Tao Yang, Yaliang Li, Nan Du, Xian Wu, Wei Fan, Fenglong Ma, and Philip Yu. 2019. Multi-grained named entity recognition. arXiv preprint arXiv:1906.08449.
  • Jiateng Xie, Zhilin Yang, Graham Neubig, Noah A. Smith, and Jaime Carbonell. 2018. Neural cross-lingual named entity recognition with minimal resources. In EMNLP.
  • Mengge Xue, W. Cai, Jinsong Su, Linfeng Song, Y. Ge, Y. Liu, and B. Wang. 2019. Neural collective entity linking based on recurrent random walk network learning. In IJCAI.
  • Vikas Yadav and Steven Bethard. 2019. A survey on recent advances in named entity recognition from deep learning models. arXiv preprint arXiv:1910.11470.
  • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237.
  • Zhilin Yang, Ruslan Salakhutdinov, and William W. Cohen. 2017. Transfer learning for sequence tagging with hierarchical recurrent networks. In ICLR.
  • Zenan Zhai, Dat Quoc Nguyen, Saber A. Akhondi, Camilo Thorne, Christian Druckenbrodt, Trevor Cohn, Michelle Gregory, and Karin Verspoor. 2019. Improving chemical named entity recognition in patents with contextualized word embeddings. arXiv preprint arXiv:1907.02679.
  • Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. 2016. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530.
  • GuoDong Zhou and Jian Su. 2002. Named entity recognition using an HMM-based chunk tagger. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 473–480. Association for Computational Linguistics.