Component-Enhanced Chinese Character Embeddings

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015

Abstract

Distributed word representations are very useful for capturing semantic information and have been successfully applied in a variety of NLP tasks, especially on English. In this work, we innovatively develop two component-enhanced Chinese character embedding models and their bigram extensions. Distinguished from English word embeddings, ...

Introduction
  • Due to its advantages over the traditional one-hot representation, distributed word representation has demonstrated its benefit for semantic representation in various NLP tasks.
  • The semantic component 亻 of the Chinese character 他 carries a meaning connected with humans.
  • The components of most Chinese characters inherently carry a certain level of semantics, regardless of context.
  • Components are a more generic unit inside Chinese characters that provides semantics.
Highlights
  • We develop two component-enhanced character embedding models, namely charCBOW and charSkipGram (a simplified sketch of the idea follows this list).
  • We examine the quality of the two proposed Chinese character embedding models, as well as their corresponding extensions, on both an intrinsic word similarity evaluation and an extrinsic text classification evaluation.
  • We propose two component-enhanced Chinese character embedding models and their extensions to explore both the internal compositions and the external contexts of Chinese characters.
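
To make the component-enhancement idea concrete, here is a minimal sketch in Python/NumPy of a CBOW-style model in which each context character contributes its own vector plus the vectors of its components. It is a simplified illustration, not the paper's exact charCBOW architecture or training setup (the paper's model works on concatenated context and component embeddings rather than an average); the toy corpus, decomposition entries, and hyperparameters below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus and a hypothetical character -> components lookup.
# A real system uses a large corpus and a full decomposition dictionary.
corpus = list("你们他好你好他们")
components = {"他": ["亻", "也"], "你": ["亻", "尔"],
              "们": ["亻", "门"], "好": ["女", "子"]}

vocab = sorted(set(corpus) | {u for us in components.values() for u in us})
idx = {u: i for i, u in enumerate(vocab)}
V, D, lr = len(vocab), 16, 0.1

W_in = rng.normal(0.0, 0.1, (V, D))   # input embeddings: chars AND components
W_out = rng.normal(0.0, 0.1, (V, D))  # output (prediction) embeddings

for epoch in range(50):
    for t in range(1, len(corpus) - 1):
        target = corpus[t]
        context = [corpus[t - 1], corpus[t + 1]]
        # Component enhancement: each context character contributes its
        # own vector plus the vectors of its components.
        units = []
        for c in context:
            units.append(c)
            units.extend(components.get(c, []))
        rows = [idx[u] for u in units]
        # NOTE: the paper's charCBOW concatenates these embeddings;
        # averaging here is a simplification.
        h = W_in[rows].mean(axis=0)                  # context vector
        scores = W_out @ h
        p = np.exp(scores - scores.max())
        p /= p.sum()                                 # softmax over vocabulary
        grad = p.copy()
        grad[idx[target]] -= 1.0                     # cross-entropy gradient
        grad_h = W_out.T @ grad
        W_out -= lr * np.outer(grad, h)
        W_in[rows] -= lr * grad_h / len(rows)        # duplicates update once; fine for a sketch
```

After training, the intuition is that nearest neighbors of 他 under cosine similarity drift toward other 亻-bearing characters, since they share a component vector in their contexts.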
Results
  • This evidence inspires the authors to explore novel Chinese character embedding models.
  • Different from word embeddings, character embeddings relate Chinese characters that occur in similar contexts with their component information.
  • The meaning of the Chinese word 摇篮 (cradle) can be interpreted in terms of its composite characters 摇 (to shake) and 篮 (basket).
  • The authors' proposed Chinese character embeddings incorporate the finer-grained semantics from the components of characters and in turn enrich the representations inherently, in addition to utilizing the external contexts.
  • Distinguished from English, these composite components are unique and inherent features inside Chinese characters.
  • The semantic radicals (see http://en.wikipedia.org/wiki/Radical_(Chinese_characters)) enable readers to understand or infer the meanings of characters without any context.
  • The component-level features inherently carry additional information that benefits the semantic representation of characters.
  • The authors know that the characters 你, 他, 伙, 侣, and 们 all have meanings related to humans because of their shared component 亻, a variant of the Chinese character 人.
  • The authors extract all the components to build a component list for each Chinese character (see the loader sketch after this list).
  • The authors develop two component-enhanced character embedding models, namely charCBOW and charSkipGram.
  • The authors examine the quality of the two proposed Chinese character embedding models, as well as their corresponding extensions, on both an intrinsic word similarity evaluation and an extrinsic text classification evaluation.
  • In the word similarity evaluation, the authors compute the Spearman's rank correlation (Myers and Well, 1995) between the similarity scores based on the learned embedding models and the E-TC similarity scores computed by following Tian and Zhao (2010).
  • For the text classification evaluation, the authors average the composite single-character embeddings for each bi-gram (see the evaluation sketch after this list).
  • Table 1 presents the word similarity evaluation results of the eight embedding models mentioned above, where A–L denote the twelve categories in E-TC.
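
As a concrete illustration of the component-list construction mentioned above, the sketch below loads a character-to-components table. The TSV format and the loader itself are assumptions for illustration; the paper builds its lists from a dictionary resource whose exact format is not reproduced here.

```python
# Hypothetical loader for a character-decomposition table.
# Assumed format (illustrative only): char<TAB>comp1 comp2 ...
def load_component_lists(path):
    components = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            char, _, comps = line.rstrip("\n").partition("\t")
            if char and comps:
                components[char] = comps.split()
    return components

# e.g. the line "他\t亻 也" yields components["他"] == ["亻", "也"]
```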
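The two evaluations described above can be sketched as follows, assuming `emb` is a dict mapping each character to its learned NumPy vector. SciPy's spearmanr and a liblinear-backed logistic regression stand in for the Spearman correlation and the LIBLINEAR classifier mentioned in the text; the word pairs, gold similarity scores (e.g., E-TC), and labeled documents are supplied by the caller. This is a sketch of the evaluation protocol, not the authors' code.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LogisticRegression

def word_vec(word, emb):
    # A word vector as the average of its characters' embeddings.
    return np.mean([emb[c] for c in word], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def word_similarity_eval(pairs, gold, emb):
    # Spearman's rank correlation between model and gold similarity scores.
    model = [cosine(word_vec(w1, emb), word_vec(w2, emb)) for w1, w2 in pairs]
    return spearmanr(model, gold).correlation

def doc_vec(text, emb):
    # Average the composite single-character embeddings of a document.
    return np.mean([emb[c] for c in text if c in emb], axis=0)

def classification_eval(train_texts, train_y, test_texts, test_y, emb):
    clf = LogisticRegression(solver="liblinear")  # linear model, cf. LIBLINEAR
    clf.fit([doc_vec(t, emb) for t in train_texts], train_y)
    return clf.score([doc_vec(t, emb) for t in test_texts], test_y)
```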
Conclusion
  • The results provide evidence that the component information in Chinese characters is of significance.
  • The authors propose two component-enhanced Chinese character embedding models and their extensions to explore both the internal compositions and the external contexts of Chinese characters.
  • The authors plan to devise embedding models based jointly on component-character and character-word composition.
  • The two types of composition will serve in a coordinated fashion for the distributional representations.
Tables
  • Table 1: Word Similarity Results of Embedding Models
  • Table 2: Text Classification Results of Embedding Models
Funding
  • The work described in this paper was supported by the grants from the Research Grants Council of Hong Kong (PolyU 5202/12E and PolyU 152094/14E), the grants from the National Natural Science Foundation of China (61272291 and 61273278) and a PolyU internal grant (4-BCB5)
References
  • Mohit Bansal, Kevin Gimpel, and Karen Livescu. 2014. Tailoring continuous word representations for dependency parsing. In Proc. of ACL.
  • Yoshua Bengio, Réjean Ducharme, Pascal Vincent, et al. 2003. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137-1155.
  • Danqi Chen and Christopher D. Manning. 2014. A fast and accurate dependency parser using neural networks. In Proc. of EMNLP, pages 740-750.
  • Fei Cheng, Kevin Duh, and Yuji Matsumoto. 2014. Parsing Chinese synthetic words with a character-based dependency model. In Proc. of LREC.
  • Ronan Collobert, Jason Weston, Léon Bottou, et al. 2011. Natural language processing (almost) from scratch. JMLR, 12.
  • Zhendong Dong and Qiang Dong. 2006. HowNet and the Computation of Meaning. World Scientific Publishing Co. Pte. Ltd., Singapore.
  • Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, et al. 2008. LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research, 9:1871-1874.
  • Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, et al. 2001. Placing search in context: the concept revisited. In Proc. of WWW.
  • Connie Suk-Han Ho, Ting-Ting Ng, and Wing-Kin Ng. 2003. A "radical" approach to reading development in Chinese: The role of semantic radicals and phonetic radicals. Journal of Literacy Research, 35(3):849-878.
  • Eric H. Huang, Richard Socher, Christopher D. Manning, and Andrew Y. Ng. 2012. Improving word representations via global context and multiple word prototypes. In Proc. of ACL.
  • Lingpeng Kong, Nathan Schneider, Swabha Swayamdipta, et al. 2014. A dependency parser for tweets. In Proc. of EMNLP, pages 1001-1012, Doha, Qatar, October.
  • Rémi Lebret, Joël Legrand, and Ronan Collobert. 2013. Is deep learning really necessary for word embeddings? In Proc. of NIPS.
  • Omer Levy and Yoav Goldberg. 2014. Dependency-based word embeddings. In Proc. of ACL.
  • Omer Levy, Yoav Goldberg, and Ido Dagan. 2015. Improving distributional similarity with lessons learned from word embeddings. In TACL.
  • Wang Ling, Chris Dyer, Alan Black, and Isabel Trancoso. 2015. Two/too simple adaptations of word2vec for syntax problems. In Proc. of NAACL, Denver, CO.
  • Shujie Liu, Nan Yang, Mu Li, and Ming Zhou. 2014. A recursive recurrent neural network for statistical machine translation. In Proc. of ACL, pages 1491-1500.
  • Minh-Thang Luong, Richard Socher, and Christopher D. Manning. 2013. Better word representations with recursive neural networks for morphology. In Proc. of CoNLL.
  • Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111-3119.
  • Jerome L. Myers and Arnold D. Well. 1995. Research Design & Statistical Analysis. Routledge.
  • Siyu Qiu, Qing Cui, Jiang Bian, et al. 2014. Co-learning of word representations and morpheme representations. In Proc. of COLING.
  • Herbert Rubenstein and John B. Goodenough. 1965. Contextual correlates of synonymy. Commun. ACM, 8(10):627-633, October.
  • Yaming Sun, Lei Lin, Duyu Tang, et al. 2014. Radical-enhanced Chinese character embedding. CoRR, abs/1404.4714.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104-3112.
  • Duyu Tang, Furu Wei, Nan Yang, et al. 2014. Learning sentiment-specific word embedding for twitter sentiment classification. In Proc. of ACL.
  • Jiu-le Tian and Wei Zhao. 2010. Words similarity algorithm based on Tongyici Cilin in semantic web adaptive learning system.
  • Jason Weston, Antoine Bordes, Oksana Yakhnenko, and Nicolas Usunier. 2013. Connecting language and knowledge bases with embedding models for relation extraction. In Proc. of EMNLP.
  • Yi Yang and Jacob Eisenstein. 2015. Unsupervised multi-domain adaptation with feature embeddings. In Proc. of NAACL-HLT.
  • Mo Yu and Mark Dredze. 2014. Improving lexical embeddings with semantic knowledge. In Proc. of ACL.
  • Meishan Zhang, Yue Zhang, Wanxiang Che, et al. 2013. Chinese parsing exploiting characters. In Proc. of ACL.