Learning Adversarial MDPs with Bandit Feedback and Unknown Transition

Tiancheng Jin
Haipeng Luo
Suvrit Sra

arXiv preprint arXiv:1912.01192, 2020.
