A Teapot Graph and Its Hierarchical Structure of the Chinese Web

WWW, pp. 1133-1134 (2008)


Abstract

The shape of the Web in terms of its graphical structure has attracted wide interest. Two graphs, the Bow Tie and the Daisy, have stood out from previous research. In this work, we take a different approach, viewing the Web as a hierarchy of three levels, namely the page level, host level, and domain level. Such structures are analyzed and ...
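
The three-level hierarchy can be illustrated by collapsing a page-level link graph into host-level and domain-level graphs. This is a minimal sketch under stated assumptions, not the authors' pipeline: the helper names (`host_of`, `domain_of`, `collapse`), the crude `.cn` domain rule (a real implementation would consult the Public Suffix List), the example URLs, and the choice to drop intra-unit links and merge parallel links are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical set of second-level labels used under .cn (e.g. sina.com.cn).
CN_SECOND_LEVEL = {"com", "net", "org", "gov", "edu", "ac"}


def host_of(url):
    """Host level: the host name of a page URL (urlsplit lowercases it)."""
    return urlsplit(url).hostname or ""


def domain_of(host):
    """Domain level: a crude registered-domain guess (assumption).

    Keeps three labels for names like sina.com.cn, otherwise two.
    A real pipeline should consult the Public Suffix List instead.
    """
    labels = host.split(".")
    if len(labels) >= 3 and labels[-1] == "cn" and labels[-2] in CN_SECOND_LEVEL:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def collapse(edges, key):
    """Project page-level links onto coarser units.

    Each edge (u, v) becomes (key(u), key(v)); links inside one unit are
    dropped and parallel links are merged (both are modelling choices).
    """
    return {(key(u), key(v)) for u, v in edges if key(u) != key(v)}


page_edges = [
    ("http://news.sina.com.cn/a.html", "http://sports.sina.com.cn/b.html"),
    ("http://news.sina.com.cn/a.html", "http://news.sina.com.cn/c.html"),
    ("http://www.example.cn/x", "http://news.sina.com.cn/a.html"),
]
host_edges = collapse(page_edges, host_of)
domain_edges = collapse(page_edges, lambda url: domain_of(host_of(url)))
print(sorted(host_edges))
print(sorted(domain_edges))  # [('example.cn', 'sina.com.cn')]
```

The page-to-page link inside news.sina.com.cn disappears at the host level, and the cross-host link between two sina.com.cn hosts disappears at the domain level, which is why the component proportions need not carry over from one level to the next.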

Introduction
  • What does the World Wide Web look like as a graph? The question matters to information scientists working on network traffic, search engine optimization, and other areas of information technology, and it interests social scientists concerned with the diffusion, use, and impact of the technology.
  • The pioneering work by Broder et al. [1] suggests that the Web looks like a Bow Tie of four distinct components, each of roughly equal size: a strongly connected component (SCC, which accounts for 29% of the total web pages), an IN component (24%), an OUT component (24%), and a disconnected component (DISC and tendrils, 24%).
  • The global Web largely resembles the Bow Tie. By contrast, the three national Webs all show only two components: a dominant SCC (51-72%) and a visible OUT (28-46%).
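
The Bow Tie components can be computed from any directed link graph: find the largest SCC, then take the nodes that can reach it (IN), the nodes reachable from it (OUT), and everything else. The sketch below is an illustration of that standard decomposition, not the authors' code. It assumes an in-memory edge list and finds SCCs by intersecting forward and backward reachability, which is quadratic and only suitable for toy graphs; a web-scale crawl would need a linear-time SCC algorithm such as Tarjan's. Tendrils, tubes and disconnected nodes are lumped into one `OTHER` group, mirroring the DISC/tendrils grouping above. The function name `bow_tie` and the toy graph are hypothetical.

```python
from collections import defaultdict


def bow_tie(nodes, edges):
    """Split a directed graph into Bow Tie components around its largest SCC.

    Returns a dict of node sets keyed "SCC", "IN", "OUT" and "OTHER", where
    "OTHER" lumps tendrils, tubes and disconnected nodes together.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    def reach(start, adj):
        # Iterative depth-first search: every node reachable from `start` via adj.
        seen, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    # SCC of x = (nodes reachable from x) & (nodes that can reach x).
    # Quadratic overall; fine for a toy graph, not for a web crawl.
    assigned, largest = set(), set()
    for x in nodes:
        if x in assigned:
            continue
        comp = reach(x, succ) & reach(x, pred)
        assigned |= comp
        if len(comp) > len(largest):
            largest = comp

    if not largest:  # empty graph
        return {"SCC": set(), "IN": set(), "OUT": set(), "OTHER": set()}

    core = next(iter(largest))         # every SCC node has the same reach
    out = reach(core, succ) - largest  # reachable from the SCC
    inn = reach(core, pred) - largest  # can reach the SCC
    other = set(nodes) - largest - out - inn
    return {"SCC": largest, "IN": inn, "OUT": out, "OTHER": other}


# Toy graph: a->b->c->a is the core, e->d->core, core->f->g, tendril d->h.
nodes = list("abcdefgh")
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("e", "d"), ("d", "a"),
         ("c", "f"), ("f", "g"), ("d", "h")]
parts = bow_tie(nodes, edges)
shares = {k: 100 * len(v) / len(nodes) for k, v in parts.items()}
print(shares)  # {'SCC': 37.5, 'IN': 25.0, 'OUT': 25.0, 'OTHER': 12.5}
```

Percentages computed this way are what the Bow Tie, Daisy and Teapot shapes are compared on.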
Highlights
  • A major revision is the Daisy Graph proposed by Donato et al. [2], in which the IN and OUT components are described as a large number of “small and shallow petals” hanging from a disproportionately larger and denser SCC in the center.
  • The Teapot graph differs from an earlier Chinese web graph [3], most noticeably in the size of the SCC (44% vs. 80%), which might be attributed to differences in time, crawling strategy, and other factors.
  • A Teapot Graph is constructed as an alternative to the classic Bow Tie or Daisy Graphs.
  • The most unexpected finding is the absence of self-similarity between the page level and the host/domain levels.
  • We will examine the reasons behind the dramatic change in the relative proportions of IN and OUT, and identify content, technical, and geographic features of the web pages and sites appearing in different components of the structures.
Results
  • 3.1 Teapot Graph

    Of the 837M pages crawled, 43 billion links are found, amounting to 52 links per page: almost twice as many as in Italy (28 links/page) and Indochina (27 links/page), and more than 3 times as many as in the UK (16 links/page).
  • As shown in Table 1, the overall graph of the Chinese Web departs significantly from the Bow Tie shape: the SCC accounts for a much larger share (44%) and the OUT for a smaller share (15%) than their counterparts in the Bow Tie. On the other hand, the Chinese Web has not yet become a Daisy, because its SCC is still less than half of the graph and both the OUT and DISC/Tendrils components remain sizable.
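
The link-density comparison is simple arithmetic on the reported figures. This sketch only recomputes the ratios from the numbers quoted above; the variable names are illustrative. The rounded totals (837M pages, 43 billion links) give about 51.4 links per page, so the reported 52 presumably reflects rounding of the totals.

```python
# Reported figures for the Chinese Web crawl.
pages = 837e6          # pages crawled
links = 43e9           # links found
density = links / pages
print(f"{density:.1f} links/page from the rounded totals")  # 51.4

reported = 52          # links/page as reported in the text
others = {"Italy": 28, "Indochina": 27, "UK": 16}  # links/page
ratios = {name: reported / d for name, d in others.items()}
for name, r in ratios.items():
    print(f"China vs {name}: {r:.2f}x")
# prints:
# China vs Italy: 1.86x
# China vs Indochina: 1.93x
# China vs UK: 3.25x
```

The ratios of roughly 1.9 and 3.25 match the "almost twice" and "more than 3 times" comparisons in the text.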
Conclusion
  • The authors present a large-scale experiment on the graph properties of the Chinese Web. A Teapot Graph is constructed as an alternative to the classic Bow Tie or Daisy Graphs.
  • A three-layer structure is further considered for the national Web. The most unexpected finding is the absence of self-similarity between the page level and the host/domain levels.
  • The authors plan to examine the reasons behind the dramatic change in the relative proportions of IN and OUT, and to identify content, technical, and geographic features of the web pages and sites appearing in different components of the structures.
Tables
  • Table 1: Components of Chinese Web Graph
Funding
  • The work was funded in part by NSFC (60573166), HKSAR CERG (CityU 1456/06H) and City University of Hong Kong SRG (7001882).
Reference
  • Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A. & Wiener, J. (2000). Graph structure in the web. Computer Networks, 33(1-6), 309-320.
  • Donato, D., Leonardi, S., Millozzi, S., & Tsaparas, P. (2005). Mining the inner structure of the Web graph. Eighth International Workshop on the Web and Databases (WebDB 2005), June 16-17, 2005, Baltimore, Maryland.
  • Liu, G., Yu, H., Han, J., & Xue, G. (2005). China web graph measurements and evolution. In Y. Zhang et al. (Eds.): APWeb 2005, LNCS 3399, 668-679.
Author
Jonathan J. H. Zhu
Tao Meng