Local Additivity Based Data Augmentation for Semi-supervised NER

EMNLP 2020, pp. 1241–1251 (2020)


Abstract

Named Entity Recognition (NER) is one of the first stages in deep language understanding, yet current NER models heavily rely on human-annotated data. In this work, to alleviate the dependence on labeled data, we propose a Local Additivity based Data Augmentation (LADA) method for semi-supervised NER, in which we create virtual samples by …

Introduction
  • Named Entity Recognition (NER), which aims to detect the semantic category of entities in unstructured text (Nadeau and Sekine, 2007), is an essential prerequisite for many NLP applications.
  • Different data augmentation approaches have been designed to alleviate the dependency on labeled data for many NLP tasks. They fall into two broad classes: (1) token-level adversarial attacks, such as word substitutions (Kobayashi, 2018; Wei and Zou, 2019) or adding noise (Lakshmi Narayan et al., 2019), and (2) sentence-level paraphrasing, such as back-translation (Xie et al., 2019) or submodular-optimized models (Kumar et al., 2019).
  • The former has already been used for NER but struggles to create diverse augmented samples with only a few word replacements.
  • Prior work such as Snippext (Miao et al., 2020), MixText (Chen et al., 2020b), and AdvAug (Cheng et al., 2020) generalized the idea of interpolation to the textual domain, proposing to interpolate in the output space (Miao et al., 2020), embedding space (Cheng et al., 2020), or general hidden space (Chen et al., 2020b) of textual data. Applied to NLP tasks such as text classification and machine translation, these techniques achieved significant improvements (a minimal interpolation sketch follows this list).
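
All of these approaches build on the mixup recipe (Zhang et al., 2018): take convex combinations of pairs of samples and of their labels. Below is a minimal sketch of that recipe, assuming a Beta(α, α) mixing prior; the function and variable names are illustrative, not the paper's code.

    # Minimal mixup sketch (Zhang et al., 2018): a virtual sample is a convex
    # combination of two real samples and of their (soft) labels.
    import numpy as np

    def mixup(h_a, h_b, y_a, y_b, alpha=0.4):
        """h_a, h_b: two representations of equal shape (inputs, embeddings,
        or hidden states); y_a, y_b: matching one-hot or soft label arrays."""
        lam = np.random.beta(alpha, alpha)      # mixing weight from Beta(alpha, alpha)
        h_mix = lam * h_a + (1.0 - lam) * h_b   # virtual sample
        y_mix = lam * y_a + (1.0 - lam) * y_b   # matching soft label
        return h_mix, y_mix
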
Highlights
  • Named Entity Recognition (NER), which aims to detect the semantic category of entities in unstructured text (Nadeau and Sekine, 2007), is an essential prerequisite for many NLP applications.
  • Despite being widely utilized in many NLP tasks like text classification, the latter often fails to preserve token-level labels in the paraphrased sentences, making it difficult to apply to NER.
  • We introduce a local additivity based data augmentation (LADA) approach with two variations, Intra-LADA and Inter-LADA, in which the interpolated sample is constrained to stay close to the original sentence x.
  • When unlabeled data were introduced, VSL-GG-Hier and MT + Noise performed slightly better than Flair and BERT with 5% labeled data on CoNLL, but the pre-trained models (Flair, BERT) still achieved higher F1 scores when more labeled data were available. Both variants of BERT + Semi-LADA significantly boosted F1 scores on CoNLL and GermEval compared to the baselines: Semi-LADA applies LADA to labeled data to avoid overfitting and combines back-translation based data augmentation on unlabeled data for consistency training (a consistency-loss sketch follows this list), making full use of both labeled and unlabeled data.
  • Note that our LADA is orthogonal to these two models.
  • This paper introduced a local additivity based data augmentation (LADA) method for Named Entity Recognition.
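
One plausible form of the consistency loss mentioned above (an assumption here, not a quote of the paper's objective) is a KL-divergence penalty between the model's per-token predictions on an unlabeled sentence and on its back-translation:

    # Hedged sketch of consistency training on unlabeled data. It assumes the
    # original sentence and its back-translation have been aligned to the same
    # token length; the KL form is an assumption, not the paper's exact loss.
    import torch
    import torch.nn.functional as F

    def consistency_loss(logits_orig, logits_aug):
        """logits_orig, logits_aug: (batch, n, num_tags) scores for an
        unlabeled sentence and its back-translated augmentation."""
        p = F.softmax(logits_orig, dim=-1).detach()       # target: prediction on original
        log_q = F.log_softmax(logits_aug, dim=-1)         # prediction on augmentation
        return F.kl_div(log_q, p, reduction="batchmean")  # penalize disagreement
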
Methods
  • Building on the interpolation-based data augmentation techniques above, the authors introduce Local Additivity based Data Augmentation (LADA) for sequence labeling (Section 3.1), where creating augmented samples is considerably more challenging.
  • For a given sentence with n tokens x = {x_1, ..., x_n}, denote the corresponding sequence of labels as y = {y_1, ..., y_n}.
  • The authors randomly sample a pair of sentences from the corpus, (x, y) and (x′, y′), and compute interpolations in the hidden space using an L-layer encoder F(·; θ) (see the sketch after this list).
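
Here is a hedged sketch of this hidden-space interpolation, assuming the L-layer encoder decomposes into per-layer modules, that both sentences are padded to the same length n, and that mixing happens at some layer m with λ drawn from a Beta prior. All names, and the choice to keep λ ≥ 0.5 so the mix stays close to x, are illustrative assumptions rather than the authors' code.

    # Sketch of hidden-space interpolation for sequence labeling: run two
    # sentences separately through the first m layers, mix token-wise, then
    # finish the forward pass on the mixed states.
    import torch

    def lada_forward(encoder_layers, h_x, h_xp, y, yp, m, alpha=8.0):
        """encoder_layers: list of L layer modules. h_x, h_xp: embedded token
        sequences (batch, n, d) for (x, y) and (x', y'), padded to equal n.
        y, yp: per-token soft labels (batch, n, num_tags). m: mixing layer."""
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        lam = max(lam, 1.0 - lam)                  # keep the mix close to x
        for layer in encoder_layers[:m]:           # layers 1..m run separately
            h_x, h_xp = layer(h_x), layer(h_xp)
        h = lam * h_x + (1.0 - lam) * h_xp         # token-wise linear addition
        for layer in encoder_layers[m:]:           # remaining layers see the mix
            h = layer(h)
        y_mix = lam * y + (1.0 - lam) * yp         # interpolate labels identically
        return h, y_mix
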
Results
  • The authors evaluated the baselines and their methods using F1 scores on the test set (a scoring sketch follows this list).

    Utilizing Limited Labeled Data: the authors varied the amount of labeled training data; the results are shown in Table 3.
  • When unlabeled data were introduced, VSL-GG-Hier and MT + Noise performed slightly better than Flair and BERT with 5% labeled data on CoNLL, but the pre-trained models (Flair, BERT) still achieved higher F1 scores when more labeled data were available.
  • Both variants of BERT + Semi-LADA significantly boosted F1 scores on CoNLL and GermEval compared to the baselines: Semi-LADA applies LADA to labeled data to avoid overfitting and combines back-translation based data augmentation on unlabeled data for consistency training, making full use of both labeled and unlabeled data.
  • Note that LADA is orthogonal to these two models.
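
For reference, NER F1 is conventionally computed at the entity level over BIO-tagged spans. A minimal sketch using the seqeval library, with toy tag sequences:

    # Entity-level F1 over BIO spans with seqeval; the tag sequences are toys.
    from seqeval.metrics import f1_score

    gold = [["B-PER", "I-PER", "O", "B-LOC"]]   # reference: PER span + LOC span
    pred = [["B-PER", "I-PER", "O", "B-ORG"]]   # system: PER span + ORG span
    print(f1_score(gold, pred))                 # 0.5: one of two entities matches
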
Conclusion
  • This paper introduced a local additivity based data augmentation (LADA) method for Named Entity Recognition.
Tables
  • Table 1: kNNs of an example sentence. Entities in the sentences are colored: green denotes locations, red persons, blue organizations, and yellow miscellaneous entities.
  • Table 2: Data statistics and our data split, following Benikova et al. (2014).
  • Table 3: F1 scores on CoNLL 2003 and GermEval 2014 when training with varying amounts of labeled training data (5%, 10%, and 30% of the original training set). 10,000 unlabeled sentences per dataset were randomly sampled from the original training set. All results are averaged over 5 runs. † denotes our methods.
  • Table 4: F1 scores on CoNLL 2003 and GermEval 2014 when training with all of the labeled training data. ‡ means incorporating our LADA data augmentation techniques into pre-trained models.
  • Table 5: F1 scores of BERT on the test set with different strategies for tagging sub-tokens, trained with 5% labeled data (a label-alignment sketch follows this list).
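
Table 5 concerns how sub-word tokens are tagged. One common strategy, sketched below as an assumption rather than the paper's reported best, gives each word's tag to its first sub-token and masks the remaining sub-tokens out of the loss:

    # Align word-level NER tags with sub-word tokens: the first sub-token of a
    # word keeps the word's tag; later sub-tokens (and special tokens) are
    # masked with ignore_index so they do not contribute to the loss.
    def align_labels(word_tags, word_ids, ignore_index=-100):
        """word_tags: tag id per word. word_ids: word index per sub-token,
        with None for special tokens (as a sub-word tokenizer reports)."""
        labels, prev = [], None
        for wid in word_ids:
            if wid is None:
                labels.append(ignore_index)     # [CLS]/[SEP]/padding
            elif wid != prev:
                labels.append(word_tags[wid])   # first sub-token of the word
            else:
                labels.append(ignore_index)     # continuation sub-token
            prev = wid
        return labels

    # Example: "Angela Merkel" -> ["[CLS]", "Angela", "Mer", "##kel", "[SEP]"]
    # align_labels([1, 2], [None, 0, 1, 1, None]) == [-100, 1, 2, -100, -100]
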
Related work
  • 5.1 Named Entity Recognition

    Conditional random fields (CRFs) (Lafferty et al., 2001b; Sutton et al., 2004) were widely used for NER until they were recently outperformed by neural networks. Hammerton (2003) and Collobert et al. (2011) were among the first studies to model sequence labeling with neural networks: Hammerton (2003) encoded the input sequence using a unidirectional LSTM (Hochreiter and Schmidhuber, 1997), while Collobert et al. (2011) instead used a CNN with character-level embeddings to encode sentences. Ma and Hovy (2016) and Lample et al. (2016b) proposed LSTM-CRFs, which combine neural networks with CRFs to leverage both the representation-learning capability of neural networks and the structured loss of CRFs. Rather than treating NER as a sequence modeling problem, Li et al. (2020) converted NER into a reading-comprehension task with an input sentence and a query sentence based on the entity types, and achieved competitive performance.

    5.2 Semi-supervised Learning for NER

    There has been extensive previous work (Altun et al., 2005; Søgaard, 2011; Mann and McCallum, 2010) that utilized semi-supervised learning for NER. For instance, Zhang et al. (2017) and Chen et al. (2018) applied variational autoencoders (VAEs) to semi-supervised sequence labeling: Zhang et al. (2017) proposed using discrete label sequences as latent variables, while Chen et al. (2018) used continuous latent variables in their models. Recently, contextual representations such as ELMo (Peters et al., 2018b) and BERT (Devlin et al., 2019), trained on large amounts of unlabeled data, have been applied to NER and achieved reasonable performance.
Funding
  • We acknowledge the support of NVIDIA Corporation with the donation of the GPU used for this research.
References
  • Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland Vollgraf. 2019. FLAIR: An easy-to-use framework for state-of-the-art NLP. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 54–59, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018. Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1638–1649, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
  • Yasemin Altun, David A. McAllester, and Mikhail Belkin. 2005. Margin semi-supervised learning for structured variables. In Advances in Neural Information Processing Systems 18 (NIPS 2005), pages 33–40, Vancouver, British Columbia, Canada.
  • Darina Benikova, Chris Biemann, Max Kisselew, and Sebastian Padó. 2014. GermEval 2014 Named Entity Recognition Shared Task: Companion Paper. In Proceedings of the KONVENS GermEval Workshop, pages 104–112, Hildesheim, Germany.
  • Sravan Bodapati, Hyokun Yun, and Yaser Al-Onaizan. 2019. Robustness to capitalization errors in named entity recognition. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 237–242, Hong Kong, China. Association for Computational Linguistics.
  • Jiaao Chen, Yuwei Wu, and Diyi Yang. 2020a. Semi-supervised models via data augmentation for classifying interactive affective responses.
  • Jiaao Chen, Zichao Yang, and Diyi Yang. 2020b. MixText: Linguistically-informed interpolation of hidden space for semi-supervised text classification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Seattle, Washington, USA. Association for Computational Linguistics.
  • Mingda Chen, Qingming Tang, Karen Livescu, and Kevin Gimpel. 2018. Variational sequential labelers for semi-supervised learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 215–226, Brussels, Belgium. Association for Computational Linguistics.
  • Yong Cheng, Lu Jiang, Wolfgang Macherey, and Jacob Eisenstein. 2020. AdvAug: Robust data augmentation for neural machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Seattle, Washington, USA. Association for Computational Linguistics.
  • Kevin Clark, Minh-Thang Luong, Christopher D. Manning, and Quoc Le. 2018. Semi-supervised sequence modeling with cross-view training. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
  • Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel P. Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • James Hammerton. 2003. Named entity recognition with long short-term memory. In Proceedings of the Seventh Conference on Natural Language Learning (CoNLL 2003), pages 172–175, Edmonton, Canada. ACL.
  • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.
  • Sosuke Kobayashi. 2018. Contextual augmentation: Data augmentation by words with paradigmatic relations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 452–457, New Orleans, Louisiana. Association for Computational Linguistics.
  • Ashutosh Kumar, Satwik Bhattamishra, Manik Bhandari, and Partha Talukdar. 2019. Submodular optimization-based diverse paraphrasing and its effectiveness in data augmentation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3609–3619, Minneapolis, Minnesota. Association for Computational Linguistics.
  • John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001a. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML '01), pages 282–289, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
  • John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001b. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), pages 282–289, Williamstown, MA, USA. Morgan Kaufmann.
  • Pooja Lakshmi Narayan, Ajay Nagesh, and Mihai Surdeanu. 2019. Exploration of noise strategies in semi-supervised named entity classification. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019), pages 186–191, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016a. Neural architectures for named entity recognition. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 260–270, San Diego, California. Association for Computational Linguistics.
  • Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016b. Neural architectures for named entity recognition. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 260–270, San Diego, California. Association for Computational Linguistics.
  • Xiaoya Li, Jingrong Feng, Yuxian Meng, Qinghong Han, Fei Wu, and Jiwei Li. 2020. A unified MRC framework for named entity recognition. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Seattle, Washington, USA. Association for Computational Linguistics.
  • Bill Yuchen Lin, Dong-Ho Lee, Ming Shen, Ryan Moreno, Xiao Huang, Prashant Shiralkar, and Xiang Ren. 2020. TriggerNER: Learning with entity triggers as explanations for named entity recognition.
  • Angli Liu, Jingfei Du, and Veselin Stoyanov. 2019. Knowledge-augmented language model and its application to unsupervised named-entity recognition. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1142–1150, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Xuezhe Ma and Eduard Hovy. 2016. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1064–1074, Berlin, Germany. Association for Computational Linguistics.
  • Gideon S. Mann and Andrew McCallum. 2010. Generalized expectation criteria for semi-supervised learning with weakly labeled data. Journal of Machine Learning Research, 11:955–984.
  • Zhengjie Miao, Yuliang Li, Xiaolan Wang, and Wang-Chiew Tan. 2020. Snippext: Semi-supervised opinion mining with augmented data. In Proceedings of The Web Conference 2020 (WWW '20), pages 617–628, New York, NY, USA. Association for Computing Machinery.
  • David Nadeau and Satoshi Sekine. 2007. A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1):3–26. John Benjamins Publishing Company.
  • Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018a. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana. Association for Computational Linguistics.
  • Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018b. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana. Association for Computational Linguistics.
  • Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.
  • Anders Søgaard. 2011. Semi-supervised condensed nearest neighbor for part-of-speech tagging. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 48–52, Portland, Oregon, USA. Association for Computational Linguistics.
  • Charles A. Sutton, Khashayar Rohanimanesh, and Andrew McCallum. 2004. Dynamic conditional random fields: Factorized probabilistic models for labeling and segmenting sequence data. In Proceedings of the Twenty-first International Conference on Machine Learning (ICML 2004), Banff, Alberta, Canada. ACM.
  • Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pages 142–147. Association for Computational Linguistics.
  • Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, Aaron Courville, David Lopez-Paz, and Yoshua Bengio. 2018. Manifold mixup: Better representations by interpolating hidden states. arXiv preprint arXiv:1806.05236.
  • Jason Wei and Kai Zou. 2019. EDA: Easy data augmentation techniques for boosting performance on text classification tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6382–6388, Hong Kong, China. Association for Computational Linguistics.
  • Qizhe Xie, Zihang Dai, Eduard H. Hovy, Minh-Thang Luong, and Quoc V. Le. 2019. Unsupervised data augmentation. CoRR, abs/1904.12848.
  • Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Seong Joon Oh, Youngjoon Yoo, and Junsuk Choe. 2019. CutMix: Regularization strategy to train strong classifiers with localizable features. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
  • Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. 2018. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations.
  • Xiao Zhang, Yong Jiang, Hao Peng, Kewei Tu, and Dan Goldwasser. 2017. Semi-supervised structured prediction with neural CRF autoencoder. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1701–1711, Copenhagen, Denmark. Association for Computational Linguistics.
  • GuoDong Zhou and Jian Su. 2002. Named entity recognition using an HMM-based chunk tagger. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 473–480, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.