Improving Chinese Word Segmentation with Wordhood Memory Networks

ACL, pp. 8274–8285 (2020)


Abstract

Contextual features always play an important role in Chinese word segmentation (CWS). Wordhood information, being one of the contextual features, is proved to be useful in many conventional character-based segmenters. However, this feature receives less attention in recent neural models, and it is also challenging to design a framework that…
Introduction
  • Unlike most written languages in the world, the Chinese writing system does not use explicit delimiters to separate words in written text.
  • Some studies incorporated contextual n-grams (Pei et al., 2014; Zhou et al., 2017) or word attention (Higashiyama et al., 2019) into the sequence labeling process, but they are limited to either concatenating word and character embeddings or requiring a well-defined word lexicon
  • It has not been fully explored what would be the best way of representing contextual information such as wordhood features in neural CWS models.
  • Considering that there are various choices of wordhood measures, it is a challenge to design a framework that can incorporate different wordhood features, so that the entire CWS approach remains general while being effective in accommodating the input from any measure
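The wordhood measures referenced in this summary include accessor variety (Feng et al., 2004), pointwise mutual information, and description length gain (Kit and Wilks, 1999). As an illustration only (not the paper's code; the function names and the toy corpus are ours), two of these measures can be sketched over a plain character string:

```python
import math
from collections import Counter

def pmi(bigram, corpus):
    """Pointwise mutual information of a character bigram:
    PMI(xy) = log( p(xy) / (p(x) * p(y)) ).
    Higher values suggest the two characters tend to co-occur,
    i.e. behave like a word."""
    chars = Counter(corpus)
    pairs = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
    p_xy = pairs[bigram] / sum(pairs.values())
    p_x = chars[bigram[0]] / sum(chars.values())
    p_y = chars[bigram[1]] / sum(chars.values())
    return math.log(p_xy / (p_x * p_y))

def accessor_variety(ngram, corpus):
    """Accessor variety: the number of distinct characters seen
    immediately before (left AV) and after (right AV) the n-gram;
    the measure is the minimum of the two counts."""
    left, right = set(), set()
    start = corpus.find(ngram)
    while start != -1:
        if start > 0:
            left.add(corpus[start - 1])
        end = start + len(ngram)
        if end < len(corpus):
            right.add(corpus[end])
        start = corpus.find(ngram, start + 1)
    return min(len(left), len(right))
```

In WMSEG such scores are not used directly as features; they are encoded into the memory module's values, which keeps the framework agnostic to which measure produced them.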
Highlights
  • Unlike most written languages in the world, the Chinese writing system does not use explicit delimiters to separate words in written text
  • Since neural networks (e.g., LSTMs) are considered able to provide good modeling of contextual dependencies, less attention has been paid to explicitly leveraging wordhood information of n-grams in the context, as had previously been done in non-neural models
  • A case study is performed to visualize how the wordhood information used in WMSEG helps Chinese word segmentation
  • The wordhood memory shows its robustness with different lexicon sizes when WMSEG's performance is considered together with the lexicon statistics reported in Table 4. The results in this experiment confirm that wordhood information is a simple yet effective source of knowledge to help Chinese word segmentation without requiring external support such as a well-defined dictionary or manually crafted heuristics, and fully illustrate that the design of our model can effectively integrate this type of knowledge
  • We draw the histograms of the F-scores obtained from WMSEG with each measure (red, green, and blue bars for accessor variety, pointwise mutual information, and description length gain, respectively)
  • We propose WMSEG, a neural framework for Chinese word segmentation using wordhood memory networks, which maps n-grams and their wordhood information to keys and values in it and appropriately models the values according to the importance of keys in a specific context
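The key-value mechanism described above (following Miller et al., 2016) can be sketched as a single memory lookup for one character: keys are embeddings of the n-grams containing that character, values encode their wordhood information, and the character's hidden state attends over the keys to weight the values. This is a minimal NumPy illustration under our own simplifying assumptions (random embeddings, a single character, no learned projections), not the paper's implementation:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def wordhood_memory(h_i, key_embs, value_embs):
    """One key-value memory lookup for character i.

    h_i        : (d,)   hidden state of the character from the encoder
    key_embs   : (m, d) embeddings of the m n-grams containing the character
    value_embs : (m, d) embeddings carrying each n-gram's wordhood information
    """
    scores = key_embs @ h_i               # relevance of each key in this context
    weights = softmax(scores)             # importance of keys for this character
    memory = weights @ value_embs         # weighted sum of wordhood values
    return np.concatenate([h_i, memory])  # enriched representation for decoding

# Toy usage: 5 candidate n-grams, hidden size 4.
rng = np.random.default_rng(0)
h = rng.normal(size=4)
out = wordhood_memory(h, rng.normal(size=(5, 4)), rng.normal(size=(5, 4)))
```

The attention weights are what lets the same n-gram contribute differently in different sentences, which is the "importance of keys in a specific context" phrasing used throughout this summary.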
Results
  • Results and Analyses: the authors first report the results of WMSEG with different configurations on five benchmark datasets and compare them with existing models.
  • The authors illustrate the validity of the proposed memory module by comparing WMSEG in different configurations, i.e., with and without the memory in integrating with three encoders, i.e., Bi-LSTM, BERT, and ZEN, and two decoders, i.e., softmax and CRF.
  • Among the models with ZEN, the ones with the memory module further improve over their baselines, even though the context information carried by n-grams is already learned in pre-training ZEN
  • This indicates that wordhood information provides additional cues that can benefit CWS, and the proposed memory module is able to provide further task-specific guidance to an n-gram integrated encoder.
  • The results in this experiment confirm that wordhood information is a simple yet effective source of knowledge to help CWS without requiring external support such as a well-defined dictionary or manually crafted heuristics, and fully illustrate that the design of the model can effectively integrate this type of knowledge
Conclusion
  • The authors propose WMSEG, a neural framework for CWS using wordhood memory networks, which maps n-grams and their wordhood information to keys and values in it and appropriately models the values according to the importance of keys in a specific context.
  • To the best of the authors' knowledge, this is the first work using key-value memory networks and utilizing wordhood information for neural CWS models.
  • Further experiments and analyses demonstrate the robustness of WMSEG in the cross-domain scenario as well as when using different lexicons and wordhood measures
Tables
  • Table1: The rules for assigning different values to x_i according to its position in a key k_{i,j}
  • Table2: Statistics of the five benchmark datasets, in terms of the number of character and word tokens and types in each training and test set. Out-of-vocabulary (OOV) rate is the percentage of unseen word tokens in the test set
  • Table3: Statistics of CTB7 with respect to five different genres. The OOV rate for each genre is computed based on the vocabulary from all the other four genres
  • Table4: The size of lexicon N generated from different wordhood measures under our settings
  • Table5: The hyper-parameters for our models w.r.t. different encoders, i.e., Bi-LSTM, BERT and ZEN
  • Table6: Experimental results of WMSEG on SIGHAN2005 and CTB6 datasets with different configurations. “ENDN” stands for the text encoders (“BL” for Bi-LSTM and “BT” for BERT) and decoders (“SM” for softmax and “CRF” for CRF). The “WM” column indicates whether the wordhood memories are used (√) or not (×)
  • Table7: Performance (F-score) comparison between WMSEG (BT-CRF and ZEN-CRF with wordhood memory networks) and previous state-of-the-art models on the test set of five benchmark datasets
  • Table8: Experimental results on five genres of CTB7. Abbreviations follow the same notation in Table 6
  • Table9: Comparisons of performance gain on the WEB genre of CTB7 with respect to the baseline BERT-CRF model when the n-gram lexicon N for WMSEG is built upon different sources. √ and × refer to whether a corresponding data source is used or not, respectively
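Table 1's rule, assigning a value to character x_i according to where it sits inside a matched key n-gram k_{i,j}, can be sketched as a small position-tagging helper. The exact label set below (S/B/I/E, as in common segmentation tag schemes) is our assumption for illustration, not a reproduction of the table:

```python
def position_value(char_index, key_start, key_len):
    """Assign a positional value to character x_i inside a key n-gram
    that starts at key_start and spans key_len characters.
    Assumed label set: 'S' single-character key, 'B' begin,
    'E' end, 'I' inside."""
    offset = char_index - key_start
    assert 0 <= offset < key_len, "character must fall inside the key"
    if key_len == 1:
        return "S"
    if offset == 0:
        return "B"
    if offset == key_len - 1:
        return "E"
    return "I"
```

In the memory module, each such positional value would index a value embedding, so the same n-gram key contributes different wordhood evidence depending on where the current character sits in it.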
References
  • Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Pengfei Liu, and Xuanjing Huang. 2015. Long Short-Term Memory Neural Networks for Chinese Word Segmentation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1197–1206.
  • Xinchi Chen, Zhan Shi, Xipeng Qiu, and Xuanjing Huang. 2017. Adversarial Multi-Criteria Learning for Chinese Word Segmentation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1193–1203.
  • Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, and Antoine Bordes. 2017. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 670–680, Copenhagen, Denmark.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
  • Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, and Yonggang Wang. 2019. ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations. ArXiv, abs/1911.00720.
  • Thomas Emerson. 2005. The Second International Chinese Word Segmentation Bakeoff. In Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing, pages 123–133.
  • Haodi Feng, Kang Chen, Xiaotie Deng, and Weimin Zheng. 2004. Accessor Variety Criteria for Chinese Word Extraction. Computational Linguistics, 30(1):75–93.
  • Jingjing Gong, Xinchi Chen, Tao Gui, and Xipeng Qiu. 2019. Switch-LSTMs for Multi-Criteria Chinese Word Segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6457–6464.
  • Shohei Higashiyama, Masao Utiyama, Eiichiro Sumita, Masao Ideuchi, Yoshiaki Oida, Yohei Sakamoto, and Isaac Okada. 2019. Incorporating Word Attention into Character-Based Word Segmentation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2699–2709.
  • Chunyu Kit and Yorick Wilks. 1999. Unsupervised Learning of Word Boundary with Description Length Gain. In EACL 1999: CoNLL-99 Computational Natural Language Learning, pages 1–6.
  • Gina-Anne Levow. 2006. The Third International Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pages 108–117.
  • Zhongguo Li. 2011. Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1405–1414, Portland, Oregon, USA.
  • Zhongguo Li and Maosong Sun. 2009. Punctuation as Implicit Annotations for Chinese Word Segmentation. Computational Linguistics, 35(4):505–512.
  • Yijia Liu, Wanxiang Che, Jiang Guo, Bing Qin, and Ting Liu. 2016. Exploring Segment Representations for Neural Segmentation Models. arXiv preprint arXiv:1604.05499.
  • Ziqing Liu, Enwei Peng, Shixing Yan, Guozheng Li, and Tianyong Hao. 2018. T-Know: a Knowledge Graph-based Question Answering and Information Retrieval System for Traditional Chinese Medicine. In Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations, pages 15–19, Santa Fe, New Mexico.
  • Ji Ma, Kuzman Ganchev, and David Weiss. 2018. State-of-the-art Chinese Word Segmentation with Bi-LSTMs. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4902–4908.
  • Jianqiang Ma and Erhard Hinrichs. 2015. Accurate Linear-Time Chinese Word Segmentation via Embedding Matching. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1733–1743.
  • Mairgup Mansur, Wenzhe Pei, and Baobao Chang. 2013. Feature-based Neural Language Model and Chinese Word Segmentation. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 1271–1277, Nagoya, Japan.
  • Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. 2016. Key-Value Memory Networks for Directly Reading Documents. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1400–1409.
  • Wenzhe Pei, Tao Ge, and Baobao Chang. 2014. Max-Margin Tensor Neural Network for Chinese Word Segmentation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 293–303.
  • Fuchun Peng, Fangfang Feng, and Andrew McCallum. 2004. Chinese Segmentation and New Word Detection Using Conditional Random Fields. In Proceedings of the 20th International Conference on Computational Linguistics, page 562.
  • Xipeng Qiu, Hengzhi Pei, Hang Yan, and Xuanjing Huang. 2019. Multi-Criteria Chinese Word Segmentation with Transformer. arXiv preprint arXiv:1906.12035.
  • Yangyang Shi, Kaisheng Yao, Le Tian, and Daxin Jiang. 2016. Deep LSTM based Feature Mapping for Query Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1501–1511, San Diego, California.
  • Prajwol Shrestha. 2014. Incremental N-gram Approach for Language Identification in Code-Switched Text. In Proceedings of the First Workshop on Computational Approaches to Code Switching, pages 133–138, Doha, Qatar.
  • Damien Sileo, Tim Van De Cruys, Camille Pradel, and Philippe Muller. 2019. Mining Discourse Markers for Unsupervised Sentence Representation Learning. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3477–3486, Minneapolis, Minnesota.
  • Yan Song, Dongfeng Cai, Guiping Zhang, and Hai Zhao. 2009a. Approach to Chinese Word Segmentation Based on Character-word Joint Decoding. Journal of Software, 20(9):2236–2376.
  • Yan Song, Jiaqing Guo, and Dongfeng Cai. 2006. Chinese Word Segmentation Based on an Approach of Maximum Entropy Modeling. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pages 201–204, Sydney, Australia.
  • Yan Song, Chunyu Kit, and Xiao Chen. 2009b. Transliteration of Name Entity via Improved Statistical Translation on Character Sequences. In Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009), pages 57–60, Suntec, Singapore.
  • Yan Song, Prescott Klassen, Fei Xia, and Chunyu Kit. 2012. Entropy-based Training Data Selection for Domain Adaptation. In Proceedings of COLING 2012: Posters, pages 1191–1200, Mumbai, India.
  • Yan Song, Chia-Jung Lee, and Fei Xia. 2017. Learning Word Representations with Regularization from Prior Knowledge. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 143–152, Vancouver, Canada.
  • Yan Song, Shuming Shi, Jing Li, and Haisong Zhang. 2018. Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2, pages 175–180, New Orleans, Louisiana.
  • Yan Song and Fei Xia. 2012. Using a Goodness Measurement for Domain Adaptation: A Case Study on Chinese Word Segmentation. In LREC, pages 3853–3860.
  • Yan Song and Fei Xia. 2013. A Common Case of Jekyll and Hyde: The Synergistic Effect of Using Divided Source Training Data for Feature Augmentation. In Proceedings of the Sixth International Joint Conference on Natural Language Processing.
  • Maosong Sun, Dayang Shen, and Benjamin K. Tsou. 1998. Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data. In 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2, pages 1265–1271, Montreal, Quebec, Canada.
  • Weiwei Sun and Jia Xu. 2011. Enhancing Chinese Word Segmentation Using Unlabeled Data. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 970–979.
  • Yuanhe Tian, Yan Song, Xiang Ao, Fei Xia, Xiaojun Quan, Tong Zhang, and Yonggang Wang. 2020. Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-way Attentions of Auto-analyzed Knowledge. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Seattle, Washington, USA.
  • Huihsin Tseng, Pichuan Chang, Galen Andrew, Daniel Jurafsky, and Christopher Manning. 2005. A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005. In Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing, pages 168–171.
  • Chunqi Wang and Bo Xu. 2017. Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 163–172, Taipei, Taiwan.
  • Deyi Xiong, Min Zhang, and Haizhou Li. 2011. Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1288–1297, Portland, Oregon, USA.
  • Jingjing Xu and Xu Sun. 2016. Dependency-based Gated Recursive Neural Network for Chinese Word Segmentation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 567–572, Berlin, Germany.
  • Naiwen Xue, Fei Xia, Fu-Dong Chiou, and Martha Palmer. 2005. The Penn Chinese Treebank: Phrase Structure Annotation of a Large Corpus. Natural Language Engineering, 11(2):207–238.
  • Nianwen Xue and Libin Shen. 2003. Chinese Word Segmentation as LMR Tagging. In Proceedings of the Second SIGHAN Workshop on Chinese Language Processing - Volume 17, pages 176–179.
  • Yaqin Yang and Nianwen Xue. 2012. Chinese Comma Disambiguation for Discourse Analysis. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, pages 786–794.
  • Zhen Yang, Wei Chen, Feng Wang, and Bo Xu. 2018. Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1346–1355, New Orleans, Louisiana.
  • Jichuan Zeng, Jing Li, Yan Song, Cuiyun Gao, Michael R. Lyu, and Irwin King. 2018. Topic Memory Networks for Short Text Classification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3120–3131, Brussels, Belgium.
  • Longkai Zhang, Houfeng Wang, Xu Sun, and Mairgup Mansur. 2013. Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 311–321.
  • Meishan Zhang, Yue Zhang, and Guohong Fu. 2016. Transition-Based Neural Word Segmentation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 421–431.
  • Hai Zhao, Chang-Ning Huang, and Mu Li. 2006. An Improved Chinese Word Segmentation System with Conditional Random Field. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pages 162–165, Sydney, Australia.
  • Hai Zhao and Chunyu Kit. 2008. An Empirical Comparison of Goodness Measures for Unsupervised Chinese Word Segmentation with a Unified Framework. In Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I, pages 9–16.
  • Hao Zhou, Zhenting Yu, Yue Zhang, Shujian Huang, Xinyu Dai, and Jiajun Chen. 2017. Word-Context Character Embeddings for Chinese Word Segmentation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 760–766, Copenhagen, Denmark.