Building Retrieval Systems for the ClueWeb22-B Corpus
CoRR (2024)
Abstract
The ClueWeb22 dataset, containing nearly 10 billion documents, was released in
2022 to support academic and industry research. The goal of this project was to
build retrieval baselines for the English section of the "super head" part
(category B) of this dataset. The research community can use these baselines
to compare against their own systems and to generate data for training and
evaluating new retrieval and ranking algorithms. The report covers the sparse
and dense first-stage retrieval systems and the neural rerankers that were
implemented for this dataset. These systems are available as a service on a
Carnegie Mellon University cluster.