Arabicweb16: A New Crawl For Today'S Arabic Web
SIGIR '16: The 39th International ACM SIGIR conference on research and development in Information Retrieval Pisa Italy July, 2016(2016)
摘要
Web crawls provide valuable snapshots of the Web which enable a wide variety of research, be it distributional analysis to characterize Web properties or use of language, content analysis in social science, or Information Retrieval (IR) research to develop and evaluate effective search algorithms. While many English-centric Web crawls exist, existing public Arabic Web crawls are quite limited, limiting research and development. To remedy this, we present ArabicWeb16, a new public Web crawl of roughly 150M Arabic Web pages with significant coverage of dialectal Arabic as well as Modern Standard Arabic. For IR researchers, we expect ArabicWeb16 to support various research areas: ad-hoc search, question answering, filtering, cross-dialect search, dialect detection, entity search, blog search, and spam detection.
更多查看译文
关键词
Multi-Dialect,Web Collection,Arabic Retrieval,Ad-hoc Search,Evaluation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络