A Hybrid Classification Method Via Character Embedding In Chinese Short Text With Few Words

IEEE Access (2020)

Abstract
Recent decades have witnessed significant development of research in short text classification. However, most existing methods focus only on texts containing dozens of words, such as Twitter or microblog posts, and do not take short texts with few words, such as news headlines or invoice names, into consideration. Meanwhile, contemporary short text classification methods either expand the features of short text with an external corpus or learn the feature representation from all the texts, and thus do not take the differences between the words of a short text into full consideration. Notably, the classification of short text with few words is usually determined by a few specific keywords, in contrast to document classification or traditional short text classification. To address these problems, this paper proposes a hybrid classification method of Attention mechanism and Feature selection via Character embedding in Chinese short text with few words, called AFC. More specifically, first, character embeddings are computed to represent Chinese short texts with few words, which takes full advantage of short text information without an external corpus. Second, an attention-based LSTM is introduced to project the data into a weighted feature representation space, so that keywords important for classification receive finer-grained weights. Furthermore, the semantic similarity between content and class label information is calculated for feature selection, which reduces the possible negative influence of redundant information on classification. Experiments on real-world datasets demonstrate the effectiveness of our method compared with competing methods.
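The two weighting ideas named in the abstract, attention over per-character representations and feature selection by similarity to the class label, can be illustrated with a small sketch. This is not the AFC implementation: the abstract does not give the model's equations, so the dot-product attention scoring, the cosine-similarity threshold, the toy 2-dimensional embeddings, and all function names below are assumptions made for illustration. In the paper the attended states would come from an LSTM; here raw character embeddings stand in for them.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    nu = math.sqrt(dot(u, u))
    nv = math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)


def attention_pool(states, query):
    """Weight each per-character state by softmax(state . query), then sum.

    Assumption: simple dot-product scoring against a single query vector.
    """
    weights = softmax([dot(h, query) for h in states])
    dim = len(states[0])
    pooled = [sum(w * h[i] for w, h in zip(weights, states)) for i in range(dim)]
    return weights, pooled


def select_by_label_similarity(chars, embeddings, label_vec, threshold):
    """Keep characters whose embedding is close enough to the label vector.

    Assumption: cosine similarity with a fixed, hand-picked threshold.
    """
    return [c for c, e in zip(chars, embeddings) if cosine(e, label_vec) >= threshold]


# Hypothetical toy data: 2-d embeddings for three characters of a short text.
chars = ["电", "脑", "的"]
embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
label_vec = [1.0, 0.0]  # hypothetical embedding of a class label
query = [2.0, 0.0]      # hypothetical attention query vector

weights, pooled = attention_pool(embeddings, query)
kept = select_by_label_similarity(chars, embeddings, label_vec, threshold=0.5)
```

In this toy setup the first two characters align with the label direction and are kept, while the function character "的" is filtered out and also receives the smallest attention weight, which mirrors the abstract's point that a few keywords dominate the classification of very short texts.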
Keywords
Feature extraction, Natural language processing, Semantics, Learning systems, Task analysis, Internet, Licenses, Short text with few words, character embedding, attention mechanism, feature selection