Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints

Singh Shaun
Avadhanula Vashist
Other Links: arxiv.org

Abstract:

Recent advances in contextual bandit optimization and reinforcement learning have garnered interest in applying these methods to real-world sequential decision-making problems. Real-world applications frequently have constraints with respect to a currently deployed policy. Many of the existing constraint-aware algorithms consider proble...
