String Vector Based AHC for Text Clustering

2017 19TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATIONS TECHNOLOGY (ICACT) - OPENING NEW ERA OF SMART SOCIETY(2017)

Abstract
In this research, we propose a string vector based version of the AHC (agglomerative hierarchical clustering) algorithm as an approach to text clustering. The traditional version requires texts to be encoded into numerical vectors, which leads to three main problems: huge dimensionality, sparse distribution, and poor transparency. To solve these problems, we encode texts into string vectors, define a similarity measure between them, and modify the AHC algorithm so that it takes string vectors as its input. We expect three benefits: better performance, a more compact representation, and better transparency. This research is thus intended to improve text clustering performance by solving these problems.
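The three steps named in the abstract (encoding texts into string vectors, defining a similarity measure, and running AHC on string vectors) can be illustrated with a minimal Python sketch. The abstract does not specify any of the details, so everything here is an assumption made for illustration: `encode` keeps the `d` most frequent words of a text, `word_sim` is a placeholder exact-match word similarity, `similarity` averages word similarities position by position, and `ahc` uses average linkage. None of these names or choices come from the paper.

```python
from collections import Counter
from itertools import combinations


def encode(text, d=3):
    """Encode a text as a string vector: its d most frequent words (assumed scheme)."""
    counts = Counter(text.lower().split())
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    # Pad with empty strings so every vector has exactly d elements.
    return tuple((ranked + [""] * d)[:d])


def word_sim(a, b):
    """Placeholder word similarity: 1 for identical non-empty words, else 0."""
    return 1.0 if a and a == b else 0.0


def similarity(u, v):
    """Assumed measure: average word similarity over corresponding positions."""
    return sum(word_sim(a, b) for a, b in zip(u, v)) / max(len(u), 1)


def cluster_similarity(a, b, sim):
    """Average-link similarity between two clusters of string vectors."""
    return sum(sim(u, v) for u in a for v in b) / (len(a) * len(b))


def ahc(vectors, k, sim=similarity):
    """Merge the most similar pair of clusters until only k clusters remain."""
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_similarity(clusters[p[0]], clusters[p[1]], sim),
        )
        merged = clusters.pop(j)  # j > i, so index i is still valid after the pop
        clusters[i].extend(merged)
    return clusters


vectors = [
    ("cat", "dog", "pet"),
    ("cat", "dog", "vet"),
    ("stock", "bond", "fund"),
    ("stock", "bond", "bank"),
]
result = ahc(vectors, k=2)
```

In this sketch each cluster stays readable as a list of words, which is one way to picture the transparency benefit the abstract mentions. A real system would replace `word_sim` with a semantic similarity between words.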
Keywords
Text Clustering, Semantic Similarity, String Vector, String Vector based AHC