Component-Enhanced Chinese Character Embeddings
Conference on Empirical Methods in Natural Language Processing, (2015)
- Owing to its advantages over the traditional one-hot representation, distributed word representation has demonstrated its benefit for semantic representation in various NLP tasks.
- The semantic component 亻 of the Chinese character 他 conveys a meaning related to humans.
- The components of most Chinese characters inherently carry a certain level of semantics, regardless of context.
- Components are a more generic semantic unit inside Chinese characters.
- Unlike word embeddings, character embeddings relate Chinese characters that occur in similar contexts through their component information.
- We develop two component-enhanced character embedding models, namely charCBOW and charSkipGram.
- We examine the quality of the two proposed Chinese character embedding models, as well as their extensions, on both intrinsic word similarity evaluation and extrinsic text classification evaluation.
- We propose two component-enhanced Chinese character embedding models and their extensions to explore both the internal compositions and the external contexts of Chinese characters.
- This evidence inspires the authors to explore novel Chinese character embedding models.
- The meaning of the Chinese word 摇篮 can be interpreted in terms of its composite characters 摇 and 篮.
- The authors' proposed Chinese character embeddings incorporate the finer-grained semantics from the components of characters and thus enrich the representations inherently, in addition to utilizing the external contexts.
- Unlike in English, these composite components are unique and inherent features inside Chinese characters.
- Knowledge of these components (see http://en.wikipedia.org/wiki/Radical_(Chinese_characters)) enables readers to understand or infer the meanings of characters without any context.
- The component-level features inherently carry additional information that benefits the semantic representation of characters.
- The authors know that the characters 你, 他, 伙, 侣, and 们 all have meanings related to humans because of their shared component 亻, a variant of the Chinese character 人.
- The authors extract all the components to build a component list for each Chinese character.
- In the word similarity evaluation, the authors compute the Spearman’s rank correlation (Myers and Well, 1995) between the similarity scores based on the learned embedding models and the E-TC similarity scores computed by following Tian and Zhao (2010).
- For the text classification evaluation, the authors average the composite single character embeddings for each bi-gram.
- Table 1 presents the word similarity evaluation results of the eight embedding models mentioned above, where A–L denote the twelve categories in E-TC.
- This provides evidence that the component information in Chinese characters is significant.
- The authors plan to devise embedding models that jointly build on component-character and character-word compositions.
- The two types of composition will serve in a coordinated fashion for the distributional representations.
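The component-based composition behind the charCBOW-style model can be sketched in code. Everything below is an illustrative assumption rather than the authors' implementation: the character-to-component table, the toy embedding dimension, and the function names are all hypothetical. The sketch shows the core idea of the key points above: look up each context character's components in a component list and concatenate the averaged character vectors with the averaged component vectors to form the model input.

```python
# Minimal sketch of a charCBOW-style input representation.
# The component table, vocabulary, and dimension are illustrative
# assumptions, not the paper's actual resources.
import random

DIM = 8  # toy embedding dimension

# Hypothetical character -> component list (in practice built from
# a radical/component table for each Chinese character).
components = {
    "他": ["亻", "也"],
    "你": ["亻", "尔"],
    "们": ["亻", "门"],
}

random.seed(0)
char_vec = {c: [random.uniform(-0.5, 0.5) for _ in range(DIM)]
            for c in components}
comp_vec = {k: [random.uniform(-0.5, 0.5) for _ in range(DIM)]
            for k in {c for cs in components.values() for c in cs}}

def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(DIM)]

def charcbow_input(context_chars):
    """Concatenate the averaged context-character vectors with the
    averaged vectors of their components (charCBOW-style input)."""
    char_part = average([char_vec[c] for c in context_chars])
    comp_part = average([comp_vec[k]
                         for c in context_chars
                         for k in components[c]])
    return char_part + comp_part  # concatenation -> 2 * DIM dims

x = charcbow_input(["你", "们"])
print(len(x))  # 2 * DIM = 16
```

In a full model this concatenated vector would feed a softmax over the target character, as in standard CBOW; the sketch only covers the input composition that the component information adds.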
- Table 1: Word Similarity Results of Embedding Models
- Table 2: Text Classification Results of Embedding Models
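The intrinsic evaluation behind Table 1 can be sketched as follows: rank the model's similarity scores for word pairs against gold similarity scores and compute Spearman's rank correlation (Myers and Well, 1995). The score lists below are toy values, not E-TC data; the pure-Python implementation is a generic sketch of the statistic, not the authors' evaluation code.

```python
# Sketch of the word-similarity evaluation: Spearman's rank
# correlation between model similarity scores and gold scores.
# The score lists at the bottom are toy values, not E-TC data.

def rank(values):
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

model_scores = [0.9, 0.7, 0.4, 0.1]   # e.g. cosine similarities from embeddings
gold_scores = [0.95, 0.6, 0.5, 0.2]   # gold similarity judgments
print(round(spearman(model_scores, gold_scores), 3))  # 1.0: same ordering
```

Because Spearman's rho depends only on the orderings, an embedding model scores well when it ranks word pairs in the same order as the gold scores, even if the absolute similarity values differ.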
- The work described in this paper was supported by the grants from the Research Grants Council of Hong Kong (PolyU 5202/12E and PolyU 152094/14E), the grants from the National Natural Science Foundation of China (61272291 and 61273278), and a PolyU internal grant (4-BCB5).
- Mohit Bansal, Kevin Gimpel, and Karen Livescu. 2014. Tailoring continuous word representations for dependency parsing. In Proc. of ACL.
- Yoshua Bengio, Réjean Ducharme, Pascal Vincent, et al. 2003. A neural probabilistic language model. The Journal of Machine Learning Research, 3: 1137-1155.
- Danqi Chen and Christopher D. Manning. 2014. A fast and accurate dependency parser using neural networks. In Proc. of EMNLP, pages 740–750.
- Fei Cheng, Kevin Duh, and Yuji Matsumoto. 2014. Parsing Chinese Synthetic Words with a Character-based Dependency Model. In Proc. of LREC.
- Ronan Collobert, Jason Weston, Leon Bottou, et al. 2011. Natural language processing (almost) from scratch. JMLR, 12.
- Zhendong Dong and Qiang Dong. 2006. HowNet and the Computation of Meaning. World Scientific Publishing Co. Pte. Ltd., Singapore.
- Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, et al. 2008. LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research, 9: 1871-1874.
- Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, et al. 2001. Placing search in context: the concept revisited. In Proc. of WWW.
- Connie Suk-Han Ho, Ting-Ting Ng, and Wing-Kin Ng. 2003. A “radical” approach to reading development in Chinese: The role of semantic radicals and phonetic radicals. Journal of Literacy Research, 35(3):849–878.
- Eric H. Huang, Richard Socher, Christopher D. Manning, and Andrew Y. Ng. 2012. Improving Word Representations via Global Context and Multiple Word Prototypes. In Proc. of ACL.
- Lingpeng Kong, Nathan Schneider, Swabha Swayamdipta, et al. 2014. A dependency parser for tweets. In Proc. of EMNLP, pages 1001–1012, Doha, Qatar, October.
- Remi Lebret, Joel Legrand, and Ronan Collobert. 2013. Is deep learning really necessary for word embeddings? In Proc. of NIPS.
- Omer Levy and Yoav Goldberg. 2014. Dependency-based word embeddings. In Proc. of ACL.
- Omer Levy, Yoav Goldberg, and Ido Dagan. 2015. Improving Distributional Similarity with Lessons Learned from Word Embeddings. TACL.
- Shujie Liu, Nan Yang, Mu Li, and Ming Zhou. 2014. A recursive recurrent neural network for statistical machine translation. In Proc. of ACL, pages 1491– 1500.
- Minh-Thang Luong, Richard Socher, and Christopher D. Manning. 2013. Better Word Representations with Recursive Neural Networks for Morphology. In Proc. of CoNLL.
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.
- Jerome L. Myers and Arnold D. Well. 1995. Research Design & Statistical Analysis. Routledge.
- Siyu Qiu, Qing Cui, Jiang Bian, et al. 2014. Co-learning of Word Representations and Morpheme Representations. In Proc. of COLING.
- Herbert Rubenstein and John B. Goodenough. 1965. Contextual correlates of synonymy. Commun. ACM, 8(10):627–633, October.
- Yaming Sun, Lei Lin, Duyu Tang, et al. 2014. Radical-Enhanced Chinese Character Embedding. CoRR, abs/1404.4714.
- Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112.
- Duyu Tang, Furu Wei, Nan Yang, et al. 2014. Learning sentiment-specific word embedding for twitter sentiment classification. In Proc. of ACL.
- Jiu-le Tian and Wei Zhao. 2010. Words Similarity Algorithm Based on Tongyici Cilin in Semantic Web Adaptive Learning System.
- Wang Ling, Chris Dyer, Alan Black, and Isabel Trancoso. 2015. Two/too simple adaptations of word2vec for syntax problems. In Proc. of NAACL, Denver, CO.
- Jason Weston, Antoine Bordes, Oksana Yakhnenko, and Nicolas Usunier. 2013. Connecting language and knowledge bases with embedding models for relation extraction. In Proc. of EMNLP.
- Yi Yang and Jacob Eisenstein. 2015. Unsupervised multi-domain adaptation with feature embeddings. In Proc. of NAACL-HLT.
- Mo Yu and Mark Dredze. 2014. Improving lexical embeddings with semantic knowledge. In Proc. of ACL.
- Meishan Zhang, Yue Zhang, Wanxiang Che, et al. 2013. Chinese parsing exploiting characters. In Proc. of ACL.