Online Target Q-learning with Reverse Experience Replay: Efficiently Finding the Optimal Policy for Linear MDPs
ICLR 2022
Keywords
Q-Learning, RL with Function Approximation, Experience Replay, Online Target Learning