Extracting semantic relations from query logs

KDD, pp. 76-85 (2007)


Abstract

In this paper we study a large query log of more than twenty million queries with the goal of extracting the semantic relations that are implicitly captured in the actions of users submitting queries and clicking answers. Previous query log analyses were mostly done with just the queries and not the actions that followed after them. We fi...

Introduction
  • One of the recurrent goals of mankind has been to recollect all human knowledge, as the wisdom of all the people is larger than any particular individual.
  • One natural starting point is to infer a graph from the queries.
  • One such graph is the bipartite graph of queries and URLs, where a query and a URL are connected if a user clicked on a URL that was returned as an answer to the query.
  • Another possibility, more frequent in previous research, is to define a similarity function between queries.
  • One drawback of defining a function is that it becomes harder to understand why two queries are similar, and to some degree the function introduces artifacts that add noise to data that is already noisy.
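To make the similarity-function alternative concrete, here is a minimal sketch in Python. It assumes cosine similarity over per-query click counts, which is only one possible choice; the summary above does not say which functions earlier work used. The names `click_vector`, `cosine_similarity`, and the toy log `toy_clicks` are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

def click_vector(clicks, query):
    """Count how often each URL was clicked for the given query."""
    return Counter(url for q, url in clicks if q == query)

def cosine_similarity(a, b):
    """Cosine similarity between two sparse click-count vectors."""
    dot = sum(a[url] * b[url] for url in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a query with no clicks is similar to nothing
    return dot / (norm_a * norm_b)

# Toy (query, clicked URL) pairs; invented data, not taken from the paper.
toy_clicks = [
    ("nyc", "nyc.gov"),
    ("nyc", "en.wikipedia.org/wiki/New_York_City"),
    ("new york city", "nyc.gov"),
    ("new york city", "en.wikipedia.org/wiki/New_York_City"),
    ("python", "python.org"),
]

sim_related = cosine_similarity(
    click_vector(toy_clicks, "nyc"),
    click_vector(toy_clicks, "new york city"),
)
sim_unrelated = cosine_similarity(
    click_vector(toy_clicks, "nyc"),
    click_vector(toy_clicks, "python"),
)
```

The function returns a single number per query pair, which illustrates the drawback noted above: the score alone does not reveal which clicks made the two queries similar.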
Highlights
  • One of the recurrent goals of mankind has been to recollect all human knowledge, as the wisdom of all the people is larger than any particular individual (e.g. the wisdom
  • The Web can be seen as the largest attempt to store all human knowledge, either explicitly (e.g. Wikipedia) or implicitly.
  • One natural starting point is to infer a graph from the queries. One such graph is the bipartite graph of queries and URLs, where a query and a URL are connected if a user clicked on a URL that was returned as an answer to the query.
  • In Section 4 we analyze a large graph generated from a large query log, and in Section 5 we evaluate our results.
  • We found that at least 40% were synonyms, 17% were site name-domain equivalences and more than 5% were webslang, which normally cannot be found in ODP.
  • We have used several query logs containing up to fifty million queries and the results are similar for all of them, so here we present data from only one log sample, from 2005, coming from the Yahoo! search engine.
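The query-URL bipartite graph described in these highlights can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the click log is assumed to be a list of `(query, url)` pairs, two queries are linked whenever they share at least one clicked URL, and the names `build_bipartite`, `shared_url_edges`, and `sample_clicks` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_bipartite(clicks):
    """Build both sides of the query-URL click graph from (query, url) pairs."""
    query_to_urls = defaultdict(set)
    url_to_queries = defaultdict(set)
    for query, url in clicks:
        query_to_urls[query].add(url)
        url_to_queries[url].add(query)
    return query_to_urls, url_to_queries

def shared_url_edges(url_to_queries):
    """Link every pair of queries that share a clicked URL.

    Returns a dict mapping a sorted (query, query) pair to the set of
    URLs both queries led to, so each edge carries its own evidence.
    """
    edges = {}
    for url, queries in url_to_queries.items():
        for q1, q2 in combinations(sorted(queries), 2):
            edges.setdefault((q1, q2), set()).add(url)
    return edges

# Toy click log; invented data, not taken from the paper.
sample_clicks = [
    ("nyc", "nyc.gov"),
    ("new york city", "nyc.gov"),
    ("new york city", "nycgo.com"),
    ("python", "python.org"),
]

query_to_urls, url_to_queries = build_bipartite(sample_clicks)
edges = shared_url_edges(url_to_queries)
```

Unlike a bare similarity score, each edge here records the URLs that justify it, so it is easy to inspect why two queries were related.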
Results
  • This implies that the webslang relations could account for more than 50% of all the extracted relations, and a high percentage of them can be relevant.
  • The authors found that at least 40% were synonyms, 17% were site name-domain equivalences and more than 5% were webslang, which cannot be found normally in ODP
Conclusion
  • The authors consider their results promising, given that the query log was small and covered only a short period of time.
  • This implies that the authors can neither follow patterns over time nor consider the number of different users involved in the clicks.
  • The authors underline that the similarity measure used in the experimental evaluation is quite strict: queries that are not related according to ODP are often equivalent.
Tables
  • Table 1: Some figures about the studied graphs
  • Table 2: Power laws
  • Table 3: Examples of equivalent queries
  • Table 4: Some path examples on similar queries
  • Table 5: Matches in ODP for both types of edges
  • Table 6: Precision against votes for all samples