Clustering Retrieved Web Documents To Speed Up Web Searches

COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2015, PT I (2015)

Abstract

Current web search engines, such as Google, Bing, and Yahoo!, rank the set of documents S retrieved in response to a user query and display the URL of each document D in S with a title and a snippet, which serves as an abstract of D. Snippets are meant to help users quickly identify results of interest, if any exist, but in practice they are less useful than intended: they fail to (i) provide distinct information and (ii) capture the main contents of the corresponding documents. Moreover, when the information need expressed in a search query is ambiguous, it is very difficult, if not impossible, for a search engine to identify precisely the set of documents that satisfy the user's intended request without additional input. Furthermore, a document title is not always a good indicator of the document's content. We address these design problems with our proposed query-based clusterer and labeler, called QClus. QClus generates concise clusters of the documents retrieved in response to a user query, covering the various subject areas involved, so that users can find specific information of interest without browsing through the documents one by one. Experimental results show that QClus is effective and efficient in generating high-quality clusters of documents on specific topics with informative labels.

Keywords

Clustering, Cluster labels, User queries, Web documents
