Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints
Abstract:
Recent advances in contextual bandit optimization and reinforcement learning have garnered interest in applying these methods to real-world sequential decision-making problems. Real-world applications frequently have constraints with respect to a currently deployed policy. Many of the existing constraint-aware algorithms consider proble...