Summary: We propose a neural model named TWASP for joint CWS and POS tagging following the character-based sequence labeling paradigm, where a two-way attention mechanism is used to incorporate both context features and their corresponding syntactic knowledge for each input character.

Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-way Attentions of Auto-analyzed Knowledge

ACL, pp. 8286–8296 (2020)

Cited by: 25

Abstract

Chinese word segmentation (CWS) and part-of-speech (POS) tagging are important fundamental tasks for Chinese language processing, where joint learning of them is an effective one-step solution for both tasks. Previous studies for joint CWS and POS tagging mainly follow the character-based tagging paradigm with introducing contextual inform…
Introduction
  • Chinese word segmentation (CWS) and part-of-speech (POS) tagging are two fundamental and crucial tasks in natural language processing (NLP) for Chinese.
  • The former aims to find word boundaries in a sentence, and the latter, on top of the segmentation results, assigns a POS tag to each word to indicate its syntactic property in the sentence.
  • While the subject and the object of the sentence are far away from the ambiguous part in …
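The character-based paradigm mentioned above can be made concrete with a rough illustration (not the paper's code): each character receives a joint label combining its position in a word with that word's POS tag. The B/I/S position scheme used here is an assumption; the exact tag inventory varies across studies.

```python
def to_char_tags(tagged_words):
    """Convert (word, POS) pairs into per-character joint CWS+POS labels."""
    tags = []
    for word, pos in tagged_words:
        if len(word) == 1:
            tags.append((word, f"S-{pos}"))      # single-character word
        else:
            tags.append((word[0], f"B-{pos}"))   # word-initial character
            for ch in word[1:]:
                tags.append((ch, f"I-{pos}"))    # word-internal character
    return tags

# "北京" (NR) is one two-character word; "我" (PN) and "爱" (VV) stand alone.
print(to_char_tags([("我", "PN"), ("爱", "VV"), ("北京", "NR")]))
# → [('我', 'S-PN'), ('爱', 'S-VV'), ('北', 'B-NR'), ('京', 'I-NR')]
```

A tagger that predicts such labels per character solves segmentation and POS tagging in one pass, since word boundaries and POS tags are both recoverable from the label sequence.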
Highlights
  • Chinese word segmentation (CWS) and part-of-speech (POS) tagging are two fundamental and crucial tasks in natural language processing (NLP) for Chinese
  • Many studies were proposed in the past decade for joint Chinese word segmentation and POS tagging (Jiang et al., 2008, 2009; Sun, 2011; Zeng et al., 2013; Zheng et al., 2013; Kurita et al., 2017; Shao et al., 2017; Zhang et al., 2018)
  • The ambiguity can be resolved with syntactic analysis; for instance, the dependency structure, if available, would prefer the first interpretation
  • We propose a neural model named TWASP with a two-way attention mechanism to improve joint Chinese word segmentation and POS tagging by learning from auto-analyzed syntactic knowledge, which is generated by existing natural language processing toolkits and provides necessary information for the task
  • We propose a neural approach with a two-way attention mechanism to incorporate auto-analyzed knowledge for joint Chinese word segmentation and POS tagging, following a character-based sequence labeling paradigm
  • Experimental results on five benchmark datasets illustrate the validity and effectiveness of our model, where the two-way attentions can be integrated with different encoders and provide consistent improvements over baseline taggers
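The two-way mechanism described above — attending separately over context features and over their corresponding knowledge instances, then combining the two results — can be sketched in plain Python. This is a hedged illustration, not the authors' implementation: the dot-product scoring, the aligned-list representation, and the final concatenation are all assumptions made for the sketch.

```python
import math

def softmax(scores):
    m = max(scores)                      # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def two_way_attention(h, feat_embs, know_embs):
    """h: hidden state of one character; feat_embs/know_embs: aligned lists
    of context-feature and knowledge-instance embeddings."""
    a_feat = softmax([dot(h, f) for f in feat_embs])   # attention way 1
    a_know = softmax([dot(h, k) for k in know_embs])   # attention way 2
    s = weighted_sum(a_feat, feat_embs)                # attended features
    k = weighted_sum(a_know, know_embs)                # attended knowledge
    return h + s + k                                   # concatenated vector

enhanced = two_way_attention([1.0, 0.0],
                             [[1.0, 0.0], [0.0, 1.0]],
                             [[0.5, 0.5], [1.0, 1.0]])
print(len(enhanced))  # 6: original state plus the two attended summaries
```

The key point the sketch captures is that features and knowledge are weighted in two separate ways, so a noisy knowledge instance can be down-weighted without suppressing its context feature.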
Conclusion
  • The authors propose a neural approach with a two-way attention mechanism to incorporate auto-analyzed knowledge for joint CWS and POS tagging, following a character-based sequence labeling paradigm.
  • The proposed attention module learns and weights context features and their corresponding knowledge instances in two separate ways, and uses the combined attentions from the two ways to enhance the joint tagging.
  • This work presents an elegant way to use auto-analyzed knowledge and enhance neural models with existing NLP tools.
  • The authors plan to apply the same methodology to other NLP tasks.
Tables
  • Table1: The statistics of all experimental datasets in terms of character, word and sentence numbers. For normal splits, OOV % is computed according to the training set; for each genre in CTB9, OOV % is computed with respect to the union of other seven genres
  • Table2: Numbers of context features (S) and their corresponding knowledge instances (K) for five benchmark datasets, based on the output of SCT and BNP. Note that the K for the UD dataset follows the CTB criteria, because SCT and BNP were trained on CTB
  • Table3: Experimental results (the F-scores for segmentation and joint tagging) of TWASP using different encoders with and without auto-analyzed knowledge on the five benchmark datasets. “Syn.” and “Dep.” refer to syntactic constituents and dependency relations, respectively. The results of SCT and BNP are also reported as references, where * marks that the segmentation and POS tagging criteria from the toolkits and the UD dataset are different
  • Table4: Comparison (in F-scores of word segmentation and joint tagging) of TWASP (with BERT or ZEN encoder) with previous studies. Cells with “-” indicate that the results are not reported or not applicable
  • Table5: Experimental results (the F-scores for word segmentation and joint tagging) from baselines and TWASP with different encoders on eight genres of CTB9. The incorporated knowledge is the POS labels from SCT
  • Table6: Performance comparison among different ways of knowledge integration, including normal attention (with respect to what knowledge type is used), the two-way attention, and key-value memory networks
  • Table7: Comparison of different knowledge ensemble results, presented as joint tagging F-scores from our BERT-based TWASP on CTB5, covering both averaging and concatenation of attentions from different knowledge types. As a reference, the best result on CTB5 for a BERT-based model without knowledge ensemble is 96.77%, achieved by BERT + POS (SCT) (see Table 3)
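The segmentation F-scores reported in these tables are conventionally computed by matching predicted word spans against gold word spans; the sketch below assumes that convention and is not the paper's evaluation script. For joint tagging F-scores, each span would additionally carry its POS label.

```python
def spans(words):
    """Map a segmentation to the set of (start, end) character spans."""
    result, start = set(), 0
    for w in words:
        result.add((start, start + len(w)))
        start += len(w)
    return result

def seg_f_score(gold_words, pred_words):
    """F1 over exactly matching word spans."""
    gold, pred = spans(gold_words), spans(pred_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)

# Merging "北京" and "大学" into one word matches no gold span exactly.
print(seg_f_score(["北京", "大学"], ["北京大学"]))  # → 0.0
# One of three predicted spans matches: precision 1/3, recall 1/2.
print(seg_f_score(["北京", "大学"], ["北京", "大", "学"]))
```

Because a word counts as correct only when both boundaries match, over- and under-segmentation are penalized symmetrically through precision and recall.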
Related work
  • There are basically two approaches to CWS and POS tagging: performing POS tagging right after word segmentation in a pipeline, or conducting the two tasks simultaneously, known as joint CWS and POS tagging. In the past two decades, many studies have shown that joint tagging outperforms the pipeline approach (Ng and Low, 2004; Jiang et al., 2008, 2009; Wang et al., 2011; Sun, 2011; Zeng et al., 2013). In recent years, neural methods started to play a dominant role for this task (Zheng et al., 2013; Kurita et al., 2017; Shao et al., 2017; Zhang et al., 2018), and some of them tried to incorporate extra knowledge. For example, Kurita et al. (2017) explored modeling n-grams to improve the task; Shao et al. (2017) extended the idea by incorporating pre-trained n-gram embeddings, as well as radical embeddings, into character representations. Zhang et al. (2018) tried to leverage the knowledge from character embeddings trained on a corpus automatically tagged by a baseline tagger. Compared to these previous studies, TWASP provides a simple, yet effective, neural model for joint tagging, without requiring a complicated mechanism for incorporating different features or pre-processing a corpus.
Funding
  • Xiang Ao was partially supported by the National Natural Science Foundation of China under Grant Nos. 61976204 and U1811461, the Natural Science Foundation of Chongqing under Grant No. cstc2019jcyj-msxmX0149, and the Project of Youth Innovation Promotion Association CAS.
References
  • Wenliang Chen, Yujie Zhang, and Hitoshi Isahara. 2006. An Empirical Study of Chinese Chunking. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 97–104, Sydney, Australia.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
  • Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, and Yonggang Wang. 2019. ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations. arXiv, abs/1911.00720.
  • Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, and Tomas Mikolov. 2018. Learning Word Vectors for 157 Languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan.
  • Honglei Guo, Huijia Zhu, Zhili Guo, Xiaoxun Zhang, Xian Wu, and Zhong Su. 2009. Domain Adaptation with Latent Semantic Association for Named Entity Recognition. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 281–289, Boulder, Colorado.
  • Binxuan Huang and Kathleen M. Carley. 2019. Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5472–5480.
  • Zhongqiang Huang, Mary Harper, and Wen Wang. 2007. Mandarin Part-of-Speech Tagging and Discriminative Reranking. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 1093–1102, Prague, Czech Republic.
  • Wenbin Jiang, Liang Huang, and Qun Liu. 2009. Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 522–530, Suntec, Singapore.
  • Wenbin Jiang, Liang Huang, Qun Liu, and Yajuan Lu. 2008. A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging. In Proceedings of ACL-08: HLT, pages 897–904, Columbus, Ohio.
  • Nikita Kitaev and Dan Klein. 2018. Constituency Parsing with a Self-Attentive Encoder. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2676–2686, Melbourne, Australia.
  • Canasai Kruengkrai, Kiyotaka Uchimoto, Jun’ichi Kazama, Yiou Wang, Kentaro Torisawa, and Hitoshi Isahara. 2009. An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 513–521, Suntec, Singapore.
  • Abhishek Kumar, Daisuke Kawahara, and Sadao Kurohashi. 2018. Knowledge-Enriched Two-Layered Attention Network for Sentiment Analysis. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 253–258, New Orleans, Louisiana.
  • Shuhei Kurita, Daisuke Kawahara, and Sadao Kurohashi. 2017. Neural Joint Model for Transition-based Chinese Syntactic Analysis. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1204–1214, Vancouver, Canada.
  • Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60, Baltimore, Maryland.
  • Katerina Margatina, Christos Baziotis, and Alexandros Potamianos. 2019. Attention-based Conditioning Methods for External Knowledge Integration. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3944–3951, Florence, Italy.
  • David McClosky, Eugene Charniak, and Mark Johnson. 2010. Automatic Domain Adaptation for Parsing. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 28–36, Los Angeles, California.
  • Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. 2016. Key-Value Memory Networks for Directly Reading Documents. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1400–1409.
  • Hwee Tou Ng and Jin Kiat Low. 2004. Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based? In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 277–284, Barcelona, Spain.
  • Joakim Nivre, Marie-Catherine De Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajic, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, et al. 2016. Universal Dependencies v1: A Multilingual Treebank Collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 1659–1666.
  • Xian Qian and Yang Liu. 2012. Joint Chinese Word Segmentation, POS Tagging and Parsing. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 501–511, Jeju Island, Korea.
  • Colin Raffel and Daniel P. W. Ellis. 2015. Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems. arXiv preprint arXiv:1512.08756.
  • Dominic Seyler, Tatiana Dembelova, Luciano Del Corro, Johannes Hoffart, and Gerhard Weikum. 2018. A Study of the Importance of External Knowledge in the Named Entity Recognition Task. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 241–246.
  • Yan Shao, Christian Hardmeier, Jörg Tiedemann, and Joakim Nivre. 2017. Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 173–183, Taipei, Taiwan.
  • Mo Shen, Hongxiao Liu, Daisuke Kawahara, and Sadao Kurohashi. 2014. Chinese Morphological Analysis with Character-level POS Tagging. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 253–258, Baltimore, Maryland.
  • Yan Song, Chia-Jung Lee, and Fei Xia. 2017. Learning Word Representations with Regularization from Prior Knowledge. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 143–152, Vancouver, Canada.
  • Yan Song, Shuming Shi, Jing Li, and Haisong Zhang. 2018. Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2, pages 175–180, New Orleans, Louisiana.
  • Yan Song and Fei Xia. 2013. A Common Case of Jekyll and Hyde: The Synergistic Effect of Using Divided Source Training Data for Feature Augmentation. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 623–631, Nagoya, Japan.
  • Weiwei Sun. 2011. A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics.
  • Yuanhe Tian, Yan Song, Fei Xia, Tong Zhang, and Yonggang Wang. 2020. Improving Chinese Word Segmentation with Wordhood Memory Networks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Seattle, Washington, USA.
  • Yiou Wang, Jun’ichi Kazama, Yoshimasa Tsuruoka, Wenliang Chen, Yujie Zhang, and Kentaro Torisawa. 2011. Improving Chinese Word Segmentation and POS Tagging with Semi-supervised Methods Using Large Auto-Analyzed Data. In Proceedings of 5th International Joint Conference on Natural Language Processing, pages 309–317, Chiang Mai, Thailand.
  • Naiwen Xue, Fei Xia, Fu-Dong Chiou, and Martha Palmer. 2005. The Penn Chinese TreeBank: Phrase Structure Annotation of a Large Corpus. Natural Language Engineering, 11(2):207–238.
  • Ikuya Yamada, Akari Asai, Jin Sakuma, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, and Yuji Matsumoto. 2020. Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia. arXiv preprint arXiv:1812.06280v3.
  • Xiaodong Zeng, Derek F. Wong, Lidia S. Chao, and Isabel Trancoso. 2013. Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 770–779, Sofia, Bulgaria.
  • Hongming Zhang, Jiaxin Bai, Yan Song, Kun Xu, Changlong Yu, Yangqiu Song, Wilfred Ng, and Dong Yu. 2019. Multiplex Word Embeddings for Selectional Preference Acquisition. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5247–5256, Hong Kong, China.
  • Meishan Zhang, Nan Yu, and Guohong Fu. 2018. A Simple and Effective Neural Model for Joint Word Segmentation and POS Tagging. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 26(9):1528–1538.
  • Xiaoqing Zheng, Hanyang Chen, and Tianyu Xu. 2013. Deep Learning for Chinese Word Segmentation and POS Tagging. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 647–657, Seattle, Washington, USA.