Studying Ranking-Incentivized Web Dynamics

SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 25–30, 2020, pp. 2093–2096.

DOI: https://doi.org/10.1145/3397271.3401300

Abstract:

The ranking incentives of many authors of Web pages play an important role in the Web dynamics. That is, authors who opt to have their pages highly ranked for queries of interest often respond to rankings for these queries by manipulating their pages; the goal is to improve the pages' future rankings. Various theoretical aspects of this d...

Introduction
  • The Web is a dynamic retrieval setting. An important part of this dynamics is due to the ranking incentives of Web pages’ authors.
  • The authors believe that the dataset they have developed, and its potential future developments, will help to better understand and further explore Web dynamics with respect to the inherent competitive retrieval setting, which is driven by rank-incentivized authors.
  • Legitimate (a.k.a. white hat) document modifications and rank-promotion strategies play an important role in driving the dynamics of the Web retrieval setting [9].
Highlights
  • The Web is a dynamic retrieval setting
  • We believe that the dataset we have developed, and its potential future developments, will help to better understand and further explore Web dynamics with respect to the inherent competitive retrieval setting, which is driven by rank-incentivized authors
  • Our first goal was to create a dataset that allows studying the temporal dynamics of documents that are highly ranked for queries
  • Although our dataset is composed of real Web documents, and we have no explicit evidence that their authors were trying to promote them for ClueWeb09 queries, the temporal changes of the documents are similar in several respects to those observed for short plaintext documents used in controlled ranking competitions with explicit rank-promotion incentives of their authors [14]
  • Analysis of the dataset revealed that aspects of temporal changes of documents are in accordance with those reported for small-scale controlled ranking competitions between students who authored and manipulated short plaintext documents [14]
  • We plan to further develop this dataset and use it to extend our study of the Web dynamics which revolves around rankings induced for queries
Results
  • The authors' first goal was to create a dataset that allows studying the temporal dynamics of documents that are highly ranked for queries.
  • Out of the 9986 different documents in the base set, 7425 (74.4%) have at least one past version in one of the twelve 2008 time intervals.
  • Figure 1 presents the average similarity between documents in the base set and their past versions.
  • The authors see an overall upward trend, which attests to increased use of the terms of the queries for which the documents were highly ranked.
  • The authors note that this trend was also observed in the ranking competitions of Raifer et al. [14], where students explicitly worked on rank-promoting their documents for given queries.
  • Although the authors' dataset is composed of real Web documents, and there is no explicit evidence that the documents' authors were trying to promote them for ClueWeb09 queries, the temporal changes of the documents are similar in several respects to those observed for short plaintext documents used in controlled ranking competitions with explicit rank-promotion incentives of their authors [14].
  • In Figure 5 the authors present the number of documents in the ClueWeb09 base set whose past versions are ranked at the highest rank for x intervals.
  • The authors have no explicit evidence that authors of ClueWeb09 pages were incentivized to rank-promote their documents for the ClueWeb09 queries.
  • The findings presented above regarding the characteristics of documents' changes attest that the manipulations of the documents over time did correspond to those observed in Raifer et al.'s competitions for rank-incentivized authors.
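
The per-interval analyses described above (average similarity between base-set documents and their past versions, and use of query terms over time) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: this summary does not state the similarity measure or tokenization used in the paper, so cosine similarity over raw term counts, the simple regex tokenizer, and all function and parameter names below are assumptions.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw term-count vectors (assumed similarity measure)."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query_term_fraction(text, query):
    """Fraction of the text's tokens that are terms of the query."""
    tokens = tokenize(text)
    query_terms = set(tokenize(query))
    return sum(t in query_terms for t in tokens) / len(tokens) if tokens else 0.0


def average_per_interval(base_docs, past_versions, score):
    """Average score(base_text, past_text) per time interval.

    base_docs:     {doc_id: text of the base-set document}
    past_versions: {doc_id: {interval: text of the past version}}
    Only documents that have a past version in an interval contribute to it.
    """
    totals, counts = {}, {}
    for doc_id, versions in past_versions.items():
        for interval, text in versions.items():
            totals[interval] = totals.get(interval, 0.0) + score(base_docs[doc_id], text)
            counts[interval] = counts.get(interval, 0) + 1
    return {interval: totals[interval] / counts[interval] for interval in sorted(totals)}
```

Passing `cosine_similarity` as `score` yields one average per interval, in the spirit of the curve described for Figure 1; a query-term trend could be obtained analogously by averaging `query_term_fraction` over the past versions in each interval.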
Conclusion
  • The authors described a dataset they created, which contains past versions of ClueWeb09 documents that are highly ranked for ClueWeb09 queries.
  • Analysis of the dataset revealed that aspects of temporal changes of documents are in accordance with those reported for small-scale controlled ranking competitions between students who authored and manipulated short plaintext documents [14].
  • The authors plan to further develop this dataset and use it to extend the study of the Web dynamics which revolves around rankings induced for queries.
Related work
  • Much of the focus of past work on adversarial IR was on spamming [4]. Search engine optimization need not necessarily be black-hat or spamming [9]. In fact, legitimate (a.k.a. white hat) document modifications and rank-promotion strategies play an important role in driving the dynamics of the Web retrieval setting [9]. One of our goals in developing the dataset is to study this white hat dynamics.

    There is a line of work on predicting and analyzing changes of Web pages; e.g., [12, 13]. However, these studies address general dynamics rather than the ranking-oriented dynamics which we address in this paper.
Funding
  • The work by Moshe Tennenholtz was supported by funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement 740435).
References
  • [1] Ablimit Aji, Yu Wang, Eugene Agichtein, and Evgeniy Gabrilovich. 2010. Using the past to score the present: extending term weighting models through revision history analysis. In Proc. of CIKM. 629–638.
  • [2] Ran Ben-Basat, Moshe Tennenholtz, and Oren Kurland. 2017. A Game Theoretic Analysis of the Adversarial Retrieval Setting. J. Artif. Intell. Res. 60 (2017), 1127–1164.
  • [3] Michael Bendersky, W. Bruce Croft, and Yanlei Diao. 2011. Quality-biased ranking of web documents. In Proc. of WSDM. 95–104.
  • [4] Carlos Castillo and Brian D. Davison. 2010. Adversarial Web Search. Foundations and Trends in Information Retrieval 4, 5 (2010), 377–486.
  • [5] Charles L. A. Clarke, Nick Craswell, and Ian Soboroff. 2009. Overview of the TREC 2009 Web Track. In Proc. of TREC, Ellen M. Voorhees and Lori P. Buckland (Eds.).
  • [6] Gordon V. Cormack, Mark D. Smucker, and Charles L. A. Clarke. 2010. Efficient and Effective Spam Filtering and Re-ranking for Large Web Datasets. arXiv:1004.5168 [cs.IR]
  • [7] Jonathan L. Elsas and Susan T. Dumais. 2010. Leveraging Temporal Dynamics of Document Content in Relevance Ranking. In Proc. of WSDM. 1–10.
  • [8] Gregory Goren, Oren Kurland, Moshe Tennenholtz, and Fiana Raiber. 2018. Ranking Robustness Under Adversarial Document Manipulations. In Proc. of SIGIR. 395–404.
  • [9] Zoltán Gyöngyi and Hector Garcia-Molina. 2005. Web Spam Taxonomy. In Proc. of AIRWeb 2005. 39–47.
  • [10] John D. Lafferty and Chengxiang Zhai. 2001. Document language models, query models, and risk minimization for information retrieval. In Proc. of SIGIR. 111–119.
  • [11] Sérgio Nunes, Cristina Ribeiro, and Gabriel David. 2011. Term Weighting Based on Document Revision History. JASIST 62, 12 (2011), 2471–2478.
  • [12] Kira Radinsky and Paul N. Bennett. 2013. Predicting content change on the web. In Proc. of WSDM. 415–424.
  • [13] Kira Radinsky, Fernando Diaz, Susan T. Dumais, Milad Shokouhi, Anlei Dong, and Yi Chang. 2013. Temporal web dynamics and its application to information retrieval. In Proc. of WSDM. 781–782.
  • [14] Nimrod Raifer, Fiana Raiber, Moshe Tennenholtz, and Oren Kurland. 2017. Information Retrieval Meets Game Theory: The Ranking Competition Between Documents’ Authors. In Proc. of SIGIR. 465–474.
  • [15] Stephen E. Robertson. 1977. The Probability Ranking Principle in IR. Journal of Documentation (1977), 294–304. Reprinted in K. Sparck Jones and P. Willett (eds), Readings in Information Retrieval, pp. 281–286, 1997.
  • [16] Moshe Tennenholtz and Oren Kurland. 2019. Rethinking search engines and recommendation systems: a game theoretic perspective. Commun. ACM 62, 12 (2019), 66–75.