Risk-Conditioned Reinforcement Learning: A Generalized Approach for Adapting to Varying Risk Measures

AAAI 2024 (2024)

Abstract
In application domains that require mission-critical decision making, such as finance and robotics, the optimal policy derived by reinforcement learning (RL) often hinges on a preference for risk management. Yet, the dynamic nature of risk measures poses considerable challenges to the generalization and adaptation of risk-sensitive policies in RL. In this paper, we propose a risk-conditioned RL model that enables rapid policy adaptation to varying risk measures via a unified risk representation, the Weighted Value-at-Risk (WV@R). To sample risk measures while avoiding undue optimism, we construct a risk proposal network that employs a conditional adversarial auto-encoder and a normalizing flow. The network establishes coherent representations of risk measures, preserving continuity with respect to the Wasserstein distance between them. The normalizing flow supports non-crossing quantile regression, which yields valid samples of risk measures, and it is also applied to the agent's critic to ensure that quantile estimates remain monotone. Through experiments on locomotion, finance, and self-driving scenarios, we show that our model adapts to a range of risk measures and achieves performance comparable to baseline models trained individually for each measure. Our model often outperforms the baselines, especially in cases where exploration is required during training but risk aversion is favored during evaluation.
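The abstract describes WV@R as a weighted representation over return quantiles and a critic constrained to non-crossing quantile estimates. The minimal sketch below is only an illustration of those two ideas, not the authors' implementation: the function names, shapes, and the softplus-plus-cumulative-sum monotonicity construction are assumptions, and the weight vectors stand in for risk measures a proposal network would sample.

```python
# Illustrative sketch (assumed, not the paper's code): score an action by applying a
# weight vector over quantile levels (a WV@R-style risk measure) to a critic's
# quantile estimates, after enforcing that the quantiles do not cross.
import numpy as np


def wvar(quantile_values: np.ndarray, weights: np.ndarray) -> float:
    """Weighted Value-at-Risk as a weighted average of return quantiles.

    quantile_values: non-decreasing quantile estimates F^{-1}(tau_1..tau_N).
    weights: non-negative weights over the quantile levels, summing to 1.
    """
    return float(np.dot(weights, quantile_values))


def enforce_non_crossing(raw_outputs: np.ndarray) -> np.ndarray:
    """Map unconstrained critic outputs to monotone (non-crossing) quantiles
    by accumulating positive increments (one common construction; the paper
    uses a normalizing flow for this purpose)."""
    increments = np.log1p(np.exp(raw_outputs[1:]))  # softplus -> positive gaps
    return np.concatenate([raw_outputs[:1], raw_outputs[0] + np.cumsum(increments)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.normal(size=32)              # hypothetical critic outputs for one action
    quantiles = enforce_non_crossing(raw)  # valid, non-crossing quantile estimates

    taus = (np.arange(32) + 0.5) / 32
    risk_neutral = np.full(32, 1.0 / 32)          # uniform weights ~ expected value
    risk_averse = np.where(taus < 0.25, 1.0, 0.0)
    risk_averse /= risk_averse.sum()              # lower-tail weights ~ CVaR-like

    print("risk-neutral score:", wvar(quantiles, risk_neutral))
    print("risk-averse  score:", wvar(quantiles, risk_averse))
```

Conditioning the policy and critic on such a weight vector is what would let a single agent switch between, say, the risk-neutral and risk-averse scores above without retraining.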
Keywords
ML: Reinforcement Learning, RU: Decision/Utility Theory