Expériences de classification d'une collection de documents XML de structure homogène (Experiments in Classifying a Collection of XML Documents with a Homogeneous Structure)
Clinical Orthopaedics and Related Research (2005)
Abstract
This paper presents experiments in clustering XML documents with a homogeneous
structure in order to validate an existing classification or, more generally,
an organisational structure. Our approach integrates techniques for extracting
knowledge from documents with unsupervised classification (clustering) of
documents. We focus on the selection of features used to represent documents
and on its impact on the resulting classification. We combine the selection of
structured features with fine-grained textual selection based on syntactic
characteristics. We illustrate and evaluate this approach on a collection of
Inria activity reports for the year 2003.
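The abstract mixes structured features (such as a report's keyword list) with textual features drawn from selected chapters. The sketch below shows one way to extract both from an XML report using Python's standard `xml.etree.ElementTree`. The element names (`project`, `keywords`, `term`, `section`) and the sample report are hypothetical, not Inria's actual report schema, and the length-and-stopword filter is only a crude placeholder for the syntactic selection the authors describe, not their method.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical report layout: <project>, <keywords>, <term> and <section>
# are illustrative assumptions, not Inria's real activity-report schema.
SAMPLE_REPORT = """
<project id="alpha">
  <keywords>
    <term>machine learning</term>
    <term>XML</term>
  </keywords>
  <section name="objectives">
    Clustering of structured documents using statistical learning methods.
  </section>
  <section name="results">
    We studied feature selection for document clustering.
  </section>
</project>
"""

# Tiny illustrative stopword list.
STOPWORDS = {"using", "with", "from", "that", "this", "have", "were"}


def structured_features(root: ET.Element) -> Counter:
    """Keywords taken verbatim from the (hypothetical) <term> elements."""
    feats = Counter()
    for term in root.iter("term"):
        if term.text:
            feats["kw:" + term.text.strip().lower()] += 1
    return feats


def textual_features(root: ET.Element, sections: set[str]) -> Counter:
    """Words from the chosen sections, kept if long enough and not stopwords.

    This filter is a placeholder for a syntactic selection (e.g. noun
    phrases); it is not the selection used in the paper.
    """
    feats = Counter()
    for sec in root.iter("section"):
        if sec.get("name") not in sections:
            continue
        text = "".join(sec.itertext()).lower()
        for word in re.findall(r"[a-z]+", text):
            if len(word) >= 4 and word not in STOPWORDS:
                feats["txt:" + word] += 1
    return feats


def report_features(xml_text: str, sections: set[str]) -> Counter:
    """Combine structured (keyword) and textual (chapter) features."""
    root = ET.fromstring(xml_text)
    return structured_features(root) + textual_features(root, sections)


if __name__ == "__main__":
    print(sorted(report_features(SAMPLE_REPORT, {"objectives"})))
```

Changing the `sections` argument (for example from `{"objectives"}` to `{"results"}`) is how this sketch would build representations from different chapters of the same report.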
The objective is to cluster projects into larger groups (Themes) based on the
keywords or on different chapters of these activity reports. We then compare
the clustering results obtained with different feature selections against the
official theme structure used by Inria.
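The abstract does not name the clustering algorithm or the measure used to compare clusters with Inria's themes. As a stand-in, this sketch clusters sparse keyword-count vectors with average-link agglomerative clustering over cosine similarity, then scores agreement with an official grouping using purity and the Rand index. The project names, features and theme labels are invented toy data, and the choice of algorithm and measures is an assumption, not the authors' setup.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerative(vectors: dict[str, Counter], k: int) -> list[set[str]]:
    """Average-link agglomerative clustering, merging until k clusters remain."""
    clusters = [{name} for name in vectors]
    sim = {frozenset((a, b)): cosine(vectors[a], vectors[b])
           for a, b in combinations(vectors, 2)}

    def link(c1: set[str], c2: set[str]) -> float:
        # Mean pairwise similarity between members of the two clusters.
        total = sum(sim[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # combinations yields i < j, so deleting index j keeps i valid.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


def purity(clusters: list[set[str]], themes: dict[str, str]) -> float:
    """Fraction of items whose cluster's majority theme matches their own."""
    total = sum(len(c) for c in clusters)
    return sum(Counter(themes[p] for p in c).most_common(1)[0][1]
               for c in clusters) / total


def rand_index(clusters: list[set[str]], themes: dict[str, str]) -> float:
    """Fraction of item pairs on which clustering and themes agree."""
    label = {p: i for i, c in enumerate(clusters) for p in c}
    agree = pairs = 0
    for a, b in combinations(label, 2):
        pairs += 1
        agree += (label[a] == label[b]) == (themes[a] == themes[b])
    return agree / pairs


# Invented toy data: projects described by keyword counts.
PROJECTS = {
    "p1": Counter({"kw:learning": 2, "kw:statistics": 1}),
    "p2": Counter({"kw:learning": 1, "kw:statistics": 2}),
    "p3": Counter({"kw:networks": 2, "kw:protocols": 1}),
    "p4": Counter({"kw:networks": 1, "kw:protocols": 2, "kw:learning": 1}),
}
OFFICIAL = {"p1": "ThemeA", "p2": "ThemeA", "p3": "ThemeB", "p4": "ThemeB"}

if __name__ == "__main__":
    found = agglomerative(PROJECTS, k=2)
    print([sorted(c) for c in found])  # [['p1', 'p2'], ['p3', 'p4']]
    print(purity(found, OFFICIAL), rand_index(found, OFFICIAL))  # 1.0 1.0
```

Repeating the comparison for feature sets built from different chapters, and comparing the scores, mirrors the kind of study the abstract describes. Purity alone tends to rise as the number of clusters grows, which is why a pair-counting measure such as the Rand index is shown alongside it.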
Keywords
information retrieval, feature selection