A weighted common structure based clustering technique for XML documents

Journal of Systems and Software(2010)

Abstract
XML has become widely used both for representing semistructured data and as a standard for data exchange over the Web, owing to its applicability across many domains. XML documents therefore constitute an important data mining domain. In this paper, we propose a new method for clustering XML documents with a global criterion function that considers the weight of common structures. Our approach first extracts representative structures of frequent patterns from schemaless XML documents using a sequential pattern mining algorithm. We then cluster the XML documents by the weight of their common structures, without a pairwise similarity measure, treating each XML document as a transaction and the frequent structures extracted from the documents as the items of that transaction. We conducted experiments comparing our method with previous methods; the results show the effectiveness of our approach.
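The two stages described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract gives neither the criterion function nor the mining details, so everything below is an assumption. Each document is assumed to be already flattened into a list of tag paths, plain document-frequency counting stands in for sequential pattern mining, and a simple coverage score with a hypothetical `threshold` parameter stands in for the weighted global criterion. The function names are illustrative only.

```python
from collections import Counter


def frequent_structures(docs, min_support):
    """Return the tag paths that occur in at least `min_support` documents.

    Assumption: document-frequency counting replaces the sequential
    pattern mining step mentioned in the abstract.
    """
    counts = Counter(path for doc in docs for path in set(doc))
    return {path for path, n in counts.items() if n >= min_support}


def cluster_documents(docs, min_support=2, threshold=0.5):
    """Greedily assign each document (a transaction of frequent
    structures) to the cluster whose common structures cover it best.

    Assumption: the score below is a stand-in, not the paper's
    criterion function. Documents are compared with cluster-level
    counts only, never with each other pairwise.
    """
    frequent = frequent_structures(docs, min_support)
    transactions = [frozenset(doc) & frequent for doc in docs]

    clusters = []  # each cluster: {"members": [doc index], "counts": Counter}
    labels = []
    for idx, items in enumerate(transactions):
        best, best_score = None, 0.0
        for cid, cluster in enumerate(clusters):
            size = len(cluster["members"])
            # Average share of cluster members containing each item.
            score = (
                sum(cluster["counts"][i] for i in items) / (size * len(items))
                if items
                else 0.0
            )
            if score > best_score:
                best, best_score = cid, score
        if best is None or best_score < threshold:
            # No cluster shares enough structure: open a new one.
            clusters.append({"members": [], "counts": Counter()})
            best = len(clusters) - 1
        clusters[best]["members"].append(idx)
        clusters[best]["counts"].update(items)
        labels.append(best)
    return labels


if __name__ == "__main__":
    docs = [
        ["book/title", "book/author", "book/price"],
        ["book/title", "book/author"],
        ["cd/artist", "cd/track"],
        ["cd/artist", "cd/track", "cd/label"],
    ]
    print(cluster_documents(docs))  # [0, 0, 1, 1]
```

In this toy input, the paths `book/price` and `cd/label` each occur in only one document, so they are dropped before clustering, and the remaining common structures separate the two book documents from the two CD documents.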
Keywords
document clustering, weighted common structure, frequent structure, XML mining, data mining, important data mining domain, XML clustering, common structure, previous method, clustering technique, schemaless XML document, new method, frequent pattern, semistructured data, XML document, data exchange, sequential pattern mining