Online Target Q-learning with Reverse Experience Replay: Efficiently Finding the Optimal Policy for Linear MDPs

ICLR 2022

Cited 23 | Views 68
Key words
Q-learning, RL with Function Approximation, Experience Replay, Online Target Learning