Reinforcement Learning When All Actions are Not Always Available

Yash Chandak
Georgios Theocharous
Blossom Metevier

National Conference on Artificial Intelligence, 2020.


Abstract:

The Markov decision process (MDP) formulation used to model many real-world sequential decision-making problems does not capture the setting where the set of available decisions (actions) at each time step is stochastic. Recently, the stochastic action set Markov decision process (SAS-MDP) formulation has been proposed, which captures t...
