Indiscriminateness in Representation Spaces of Terms and Documents.

ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018)(2018)

Abstract
Examining the properties of the representation spaces used for documents or words in Information Retrieval (IR) (typically R^n with n large) yields valuable insights that can help the retrieval process. Recently, several authors have studied the true dimensionality of datasets, called intrinsic dimensionality, in specific parts of these spaces [14]. They have shown that this dimensionality is closely tied to the notion of indiscriminateness among the neighbors of a query point in the vector space. In this paper, we propose to revisit this notion in the specific case of IR. More precisely, we show how to estimate indiscriminateness from IR similarities so that it can be applied in the representation spaces used for documents and words [7, 18]. We show that indiscriminateness can be used to characterize difficult queries; moreover, we show that this notion, applied to word embeddings, can help choose the terms to use for query expansion.
Keywords
Intrinsic dimensionality, Indiscriminability RSV scores, Distributional thesauri, Query expansion