The most important measure of a search engine is the quality of its search results

The anatomy of a large-scale hypertextual Web search engine

Computer Networks, no. 1-7 (1998): 107-117

Cited by: 18349 | Views: 591
EI

Abstract

In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 ...

Introduction
  • The Web creates new challenges for information retrieval.
  • The amount of information on the Web is growing rapidly, as well as the number of new users inexperienced in the art of Web research.
  • The full version is available on the Web and the conference CD-ROM.
  • E-mail: {sergey, page}@cs.stanford.edu
  • People are likely to surf the Web using its link graph, often starting with high quality human maintained indices such as Yahoo! or with search engines.
  • Automated search engines that rely on keyword matching usually return too many low quality matches.
  • Some advertisers attempt to gain people’s attention by taking measures meant to mislead automated search engines.
Highlights
  • The Web creates new challenges for information retrieval.
  • The most important measure of a search engine is the quality of its search results.
  • While a complete user evaluation is beyond the scope of this paper, our own experience with Google has shown it to produce better results than the major commercial search engines for most searches.
  • Google is a complete architecture for gathering Web pages, indexing them, and performing search queries over them.
  • We are planning to add simple features supported by commercial search engines like Boolean operators, negation, and stemming.
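The index-then-query flow and the Boolean features mentioned above can be illustrated with a toy inverted index. This is a minimal sketch under stated assumptions, not Google's actual design: the function names `build_index` and `search`, the whitespace tokenizer, and the three sample documents are all hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, all_ids, must=(), must_not=()):
    """Return ids containing every word in `must` and no word in `must_not`."""
    result = set(all_ids)
    for word in must:
        # Boolean AND: intersect with the posting set of each required word.
        result &= index.get(word.lower(), set())
    for word in must_not:
        # Negation: drop every document that contains an excluded word.
        result -= index.get(word.lower(), set())
    return sorted(result)


# Hypothetical sample documents, used only for illustration.
docs = {
    1: "stanford search engine research",
    2: "commercial search engine advertising",
    3: "stanford digital library project",
}
index = build_index(docs)
print(search(index, docs, must=["search", "engine"]))
print(search(index, docs, must=["stanford"], must_not=["search"]))
```

Set intersection and difference keep the Boolean logic short; a production index would store sorted posting lists on disk rather than in-memory sets.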
Methods
  • Improved search quality: the authors' main goal is to improve the quality of Web search engines.
  • In 1994, some people believed that a complete search index would make it possible to find anything.
  • Anyone who has used a search engine recently can readily testify that the completeness of the index is not the only factor in the quality of search results.
  • People are still only willing to look at the first few tens of results.
  • The authors want the notion of “relevant” to only include the very best documents.
Results
  • Results and performance: the most important measure of a search engine is the quality of its search results.
  • As an example which illustrates the use of PageRank, anchor text, and proximity, Fig. 2 shows Google’s results for a search on “bill Clinton”.
  • These results demonstrate some of Google’s features.
  • Most major commercial search engines do not return any results from whitehouse.gov, much less the right ones.
  • One of the results is not crawlable; it is returned as a result of anchor text.
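To make the role of PageRank in these results more concrete, here is a minimal sketch of link-based ranking by power iteration. It uses the common normalized variant in which ranks sum to 1. The damping factor of 0.85, the iteration count, the even spreading of rank from pages without outgoing links, and the three-page graph are assumptions made for illustration, not details taken from this summary.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a dict mapping each page to its outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: "a" and "b" both link to "c", and "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph the page with the most incoming links ends up ranked highest, and the page that nothing links to ends up lowest.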
Conclusion
  • Google is designed to be a scalable search engine. The primary goal is to provide high quality search results over a rapidly growing World Wide Web.
  • Some simple improvements to efficiency include query caching, smart disk allocation, and subindices.
  • Another area which requires much research is updates.
  • The authors must have smart algorithms to decide what old Web pages should be recrawled and what new ones should be crawled.
  • Work toward this goal has been done in [2].
  • The authors are planning to add simple features supported by commercial search engines like Boolean operators.
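Of the efficiency improvements listed above, query caching is the simplest to illustrate. The sketch below memoizes query results with Python's standard `functools.lru_cache`. The `run_query` stand-in, the `CALLS` counter, and the cache size are hypothetical and say nothing about how Google implemented caching.

```python
from functools import lru_cache

# Counts how often the expensive lookup actually runs (for illustration only).
CALLS = {"count": 0}


def run_query(query):
    """Hypothetical stand-in for an expensive index lookup."""
    CALLS["count"] += 1
    return tuple(sorted(query.lower().split()))


@lru_cache(maxsize=1024)
def cached_query(query):
    """Serve repeated queries from memory, evicting the least recently used."""
    return run_query(query)


cached_query("bill clinton")
cached_query("bill clinton")  # Served from the cache; run_query is not called again.
print(CALLS["count"], cached_query.cache_info().hits)
```

An LRU policy suits search traffic when a small set of popular queries recurs often, although cached results go stale once the index is updated.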
Tables
  • Table 1: Statistics
Related work
  • Search research on the Web has a short and concise history. The World Wide Web Worm (WWWW) [6] was one of the first Web search engines. It was subsequently followed by several academic search engines, many of which are now public companies. Compared to the growth of the Web and the importance of search engines, there are precious few documents about recent search engines [8]. According to Michael Mauldin (chief scientist, Lycos Inc.) [5], “the various services (including Lycos) closely guard the details of these databases”. However, there has been a fair amount of work on specific features of search engines. Especially well represented is work which can get results by post-processing the results of existing commercial search engines, or produce small scale “individualized” search engines. Finally, there has been a lot of research on information retrieval systems, especially on well controlled collections [11].
Funding
  • The research described here was conducted as part of the Stanford Integrated Digital Library Project, supported by the National Science Foundation under Cooperative Agreement IRI-9411306.
References
  • [1] Database Theory, Delphi, Greece, 1997.
  • [2] World Wide Web Conference (WWW 98), Brisbane, Australia, April 14-18, 1998; also Comput. Networks ISDN
  • [3] Algorithms, 1998.
  • [4] Web: hyper search engines, in: Proc. of the 6th International WWW Conference (WWW 97), Santa Clara, USA, April 7-11, 1997.
  • [5] Mauldin, M.L., Lycos design choices in an Internet search service, IEEE Expert Interview, http://www.computer.org/pubs/expert/1997/trends/x1008/mauldin.htm
  • [6] 1991. http://www.cs.colorado.edu/home/mcbryan/mypapers
  • [7] PageRank citation ranking: bringing order to the Web, Manuscript in Progress, http://google.stanford.edu/~backrub/pageranksub.ps
  • [8] Conference, Chicago, USA, October 17-20, 1994, http://info.webcrawler.com/bp/WWW94.html
  • [9] WWW Conference (WWW 97), Santa Clara, USA, April 7-11, 1997.
  • [10] D.K. Harman and E.M. Voorhees (Eds.), Proceedings of the Fifth Text REtrieval Conference (TREC-5), Gaithersburg, Maryland, November 20-22, 1996, Department of
  • [11] 1996; full text at http://trec.nist.gov/
  • [12] Nostrand Reinhold, New York, NY, 1994.
  • [13] Szilagyi, A. Duda, and D.K. Gifford, HyPursuit: a hierarchical network search engine that exploits content-link hypertext clustering, in: Proc. of the 7th ACM Conference on Hypertext, New York, 1996.
  • Sergey Brin received his B.S. degree in mathematics and computer science from the University of Maryland at College Park in 1993. Currently, he is a Ph.D. candidate in computer science at Stanford University, where he received his M.S. in 1995. He is a recipient of a National Science Foundation Graduate Fellowship. His research interests include search engines, information extraction from unstructured sources, and data mining of large text collections and scientific data.
  • Lawrence Page was born in East Lansing, Michigan, and received a B.S.E. in Computer Engineering at the University of Michigan Ann Arbor in 1995. He is currently a Ph.D. candidate in Computer Science at Stanford University. Some of his research interests include the link structure of the Web, human computer interaction, search engines, scalability of information access interfaces, and personal data mining.