Allowing for the Grounded Use of Temporal Difference Learning in Large Ranking Models via Substate Updates

Research and Development in Information Retrieval (2021)

Abstract
We introduce a modification of an established reinforcement learning method to facilitate the widespread use of temporal difference learning for IR: interpolated substate temporal difference (ISSTD) learning. While reinforcement learning methods have shown success in document ranking, these contributions have relied on relatively antiquated policy gradient methods such as REINFORCE. These methods bring associated issues, such as high-variance gradient estimates and sample inefficiency, which present significant obstacles when training deep neural retrieval models. Within the reinforcement learning community, there is a substantial body of work on alternative training methods built around temporal difference updates, such as Q-learning, Actor-Critic, and SARSA, that resolve some of the issues seen in REINFORCE. However, temporal difference methods require the full state to be modeled internally within the ranking model, which is unrealistic for deep full-text retrieval or first-stage retrieval. We therefore propose ISSTD, which operates on the substate, i.e. individual documents in the case of matching models, and interpolates the temporal difference updates to the rest of the state. We provide theoretical guarantees on convergence, enabling the drop-in use of ISSTD in any algorithm that relies on temporal difference updates. Furthermore, empirical results demonstrate the robustness of this approach for deep neural models, outperforming the current policy gradient approach for training deep neural retrieval models.
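For illustration, the core idea can be sketched as a TD(0)-style update applied to a single document's features (the substate) and then interpolated to the remaining documents in the candidate list. The sketch below is hypothetical: the linear per-document value model, the function name substate_td_update, and the uniform interpolation weights are assumptions made for illustration and are not the paper's exact ISSTD update rule.

```python
import numpy as np

def substate_td_update(w, doc_features, reward, next_value, idx,
                       alpha=0.1, gamma=0.99, interp=0.5):
    """Apply a TD(0)-style update from the TD error of document `idx`,
    then spread a scaled-down version of that update to the other documents.
    (Illustrative sketch only; names and weighting are assumptions.)"""
    v = doc_features @ w                       # per-document value estimates
    td_error = reward + gamma * next_value - v[idx]

    # Standard TD(0) step on the substate (the selected document).
    w = w + alpha * td_error * doc_features[idx]

    # Interpolate the same TD error onto the remaining documents' features,
    # scaled by `interp` (uniform weights assumed here for simplicity).
    others = np.delete(np.arange(len(doc_features)), idx)
    w = w + alpha * interp * td_error * doc_features[others].mean(axis=0)
    return w

# Toy usage: 4 candidate documents with 8-dimensional features.
rng = np.random.default_rng(0)
w = np.zeros(8)
docs = rng.normal(size=(4, 8))
w = substate_td_update(w, docs, reward=1.0, next_value=0.0, idx=2)
```

Because the update only ever touches per-document representations, the full state never has to be modeled inside the ranking model, which is the property the abstract highlights for deep full-text and first-stage retrieval.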
Keywords
information retrieval, reinforcement learning, temporal difference, ranking