Two Test Collections for Retrieval Using Named Entity Markup

CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 2020

Abstract
Studying the effects of semantic analysis on retrieval effectiveness can be difficult using standard test collections because both queries and documents typically lack semantic markup. This paper describes extensions to two test collections, CLEF 2003/2004 Russian and TDT-3 Chinese, to support study of the utility of named entity annotation. A new set of topic aspects that were expected to benefit from named entity markup was defined for topics in those test collections, with two queries for each aspect. One of these queries uses named entities as bag-of-words query terms or as semantic constraints on a free-text query term; the other is a bag-of-words baseline query without named entity markup. Exhaustive judgment of the documents annotated by CLEF or TDT as relevant to each corresponding topic was performed, resulting in relevance judgments for 133 Russian and 33 Chinese topic aspects that each have at least one relevant document. Named entity tags were automatically generated for the documents in both collections. Use of the test collections is illustrated with some preliminary experiments.
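The contrast between the two query types for each aspect can be illustrated with a toy scorer. This is a minimal sketch under stated assumptions, not the paper's query language or retrieval model: the `Token` representation, the hypothetical tag names `PER` and `LOC`, and the simple term-count scoring are all illustrative choices. The baseline counts query-term matches while ignoring markup; the constrained version counts a match only when the token also carries the requested entity tag.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    text: str
    # Hypothetical named entity tag (e.g. "PER", "LOC"); None if untagged.
    entity: Optional[str] = None


def bow_score(query_terms: List[str], doc: List[Token]) -> int:
    """Baseline: count query-term occurrences, ignoring entity markup."""
    terms = {t.lower() for t in query_terms}
    return sum(1 for tok in doc if tok.text.lower() in terms)


def constrained_score(query: List[Tuple[str, Optional[str]]], doc: List[Token]) -> int:
    """Count occurrences where the term matches and, if a tag is given,
    the token also carries that entity tag (an assumed semantic constraint)."""
    score = 0
    for tok in doc:
        for term, etype in query:
            if tok.text.lower() == term.lower() and (etype is None or tok.entity == etype):
                score += 1
    return score


def rank(scores: Dict[str, int]) -> List[str]:
    """Return ids of matching documents, highest score first (ties by id)."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, s in ordered if s > 0]


if __name__ == "__main__":
    # Hypothetical toy documents with automatically generated tags.
    docs = {
        "a": [Token("President"), Token("Washington", "PER")],
        "b": [Token("Washington", "LOC"), Token("summit")],
    }
    print(rank({d: bow_score(["Washington"], t) for d, t in docs.items()}))
    print(rank({d: constrained_score([("Washington", "PER")], t) for d, t in docs.items()}))
```

Run as a script, the baseline query retrieves both documents, while the query constrained to a person-tagged "Washington" retrieves only document `a`. This kind of ambiguity is the case the entity-annotated queries are meant to expose.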
Keywords
test collection, topic aspects, entity-based search