Improving Short Query Representation in LDA Based Information Retrieval Systems

Hybrid Artificial Intelligence Systems (HAIS), 2022

Abstract
Incorporating topic modeling techniques into Information Retrieval (IR) systems has been a promising area of research in recent years. Queries submitted to IR systems are typically concise and contain only the essential keywords. The resulting short queries reduce the accuracy of the Latent Dirichlet Allocation (LDA) algorithm and degrade the retrieval of relevant documents. This work presents a novel method, LDAW, that improves short query representation in IR systems by modifying the query representation based on the LDA model. LDAW is tested on three biomedical corpora (TREC Genomics 2004, TREC Genomics 2005, and OHSUMED) and one legal case corpus (FIRE 2017). Results show that the proposed method clearly outperforms the baseline methods (BM25 and unmodified LDA).
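The abstract does not describe how LDAW rewrites a query, so the sketch below does not reproduce it. It only illustrates why short queries are a problem for LDA, assuming the standard smoothed estimate used with collapsed Gibbs sampling: a query's topic mixture is estimated as theta_k = (n_k + alpha) / (N + K * alpha), where n_k is the number of query tokens assigned to topic k, N is the query length, K is the number of topics, and alpha is a symmetric Dirichlet prior. With only a few tokens, the prior term K * alpha dominates the denominator and the estimate stays close to uniform. The function name `query_topic_distribution` and the settings K = 10 and alpha = 0.5 are hypothetical choices for illustration.

```python
from collections import Counter


def query_topic_distribution(topic_assignments, num_topics, alpha):
    """Smoothed estimate of a query's topic mixture theta_q.

    topic_assignments: topic index assigned to each query token
    (for example, the final state of a Gibbs sampler).
    Returns theta_k = (n_k + alpha) / (N + K * alpha) for k = 0..K-1.
    """
    counts = Counter(topic_assignments)
    n_tokens = len(topic_assignments)
    denom = n_tokens + num_topics * alpha
    return [(counts.get(k, 0) + alpha) / denom for k in range(num_topics)]


K, ALPHA = 10, 0.5  # hypothetical model settings, not taken from the paper

# A two-keyword query whose tokens are both assigned to topic 3.
short_query = query_topic_distribution([3, 3], K, ALPHA)
# A 40-token text with the same topical focus.
long_text = query_topic_distribution([3] * 40, K, ALPHA)

print(round(short_query[3], 3))  # 0.357: the prior still spreads most of the mass
print(round(long_text[3], 3))    # 0.9: the observed tokens dominate the prior
```

Both inputs point at the same topic, yet the two-token query places only about 36% of its mass there, which is the kind of weak query representation the paper sets out to improve.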
Keywords
Information retrieval, Short queries, Latent Dirichlet allocation, Topic modeling, Text preprocessing