An Automated Management Tool for Unstructured Data

Web Intelligence (2003)

Abstract
The rapidly growing quantity of online data has created a need for automated, content-based categorization and search tools. The authors describe an open-source, Web-based archive management tool that uses latent semantic indexing (LSI), coupled with vector clustering techniques, to give users a fully searchable and automatically categorized interface to a data collection. The default English document parser included in the project uses part-of-speech tagging and recursive maximal noun phrase extraction to create a more effective term list for LSI than traditional stop-list techniques. The archive interface supports multiple user views of the data collection. Advanced search features are implemented through relevance feedback and do not require users to learn a query syntax.
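To make the relevance-feedback idea concrete, the following is a minimal sketch of a Rocchio-style query update, one common formulation of relevance feedback. The abstract does not say which formulation the authors use, so the function names `rocchio` and `cosine`, the weights `alpha`, `beta`, and `gamma`, and the two-dimensional toy vectors are all illustrative assumptions, not the tool's actual implementation.

```python
from math import sqrt


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move a query vector toward documents the user marked relevant.

    Hypothetical helper: vectors are plain lists of floats of equal length
    (for example, coordinates in a reduced LSI space). The default weights
    are conventional textbook values, not taken from the paper.
    """
    dims = len(query)

    def centroid(docs):
        # Mean vector of a document set; zero vector if the set is empty.
        if not docs:
            return [0.0] * dims
        return [sum(d[i] for d in docs) / len(docs) for i in range(dims)]

    rel_c = centroid(relevant)
    non_c = centroid(nonrelevant)
    return [
        alpha * query[i] + beta * rel_c[i] - gamma * non_c[i]
        for i in range(dims)
    ]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy example: the user marks one document as relevant, and the query
# shifts toward it without the user writing any query syntax.
query = [1.0, 0.0]
liked_doc = [0.0, 1.0]
updated = rocchio(query, relevant=[liked_doc], nonrelevant=[])
print(cosine(query, liked_doc), cosine(updated, liked_doc))
```

In this sketch the user only marks results as relevant or not, and the system re-ranks against the updated vector, which is the sense in which no query syntax has to be learned.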
Keywords
data collection, search tool, archive interface, default English document parser, web-based archive management, advanced search feature, unstructured data, content-based categorization, effective term list, traditional stop list technique, online data, automated management tool, latent semantic indexing, grammars, search engines, content management, internet, noun phrase