
Robust Constrained Reinforcement Learning

Computing Research Repository (CoRR) (2023)

Assistant Professor | Associate Professor | State University of New York

Abstract
Constrained reinforcement learning aims to maximize the expected reward subject to constraints on utilities/costs. However, the training environment may differ from the test environment, e.g., due to modeling error, adversarial attack, or non-stationarity, which can cause severe performance degradation and, more importantly, constraint violation. We propose a framework of robust constrained reinforcement learning under model uncertainty, where the MDP is not fixed but lies in some uncertainty set; the goal is to guarantee that constraints on utilities/costs are satisfied for all MDPs in the uncertainty set, and to maximize the worst-case reward performance over the uncertainty set. We design a robust primal-dual approach and theoretically develop guarantees on its convergence, complexity, and robust feasibility. We then investigate a concrete example of the δ-contamination uncertainty set, design an online and model-free algorithm, and theoretically characterize its sample complexity.
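The primal-dual idea described in the abstract can be sketched on a toy constrained problem. Everything below (the two-armed bandit, the softmax parametrization, the step sizes) is an illustrative assumption, not the paper's algorithm: the primal variable (policy) ascends the Lagrangian while the dual variable (multiplier) descends on the constraint slack.

```python
import numpy as np

# Toy constrained bandit: maximize expected reward subject to
# expected utility >= b. All numbers are illustrative assumptions.
rewards = np.array([1.0, 0.3])   # expected reward of each arm
utils = np.array([0.0, 1.0])     # expected utility of each arm
b = 0.5                          # constraint threshold

theta = np.zeros(2)              # softmax policy logits (primal variable)
lam = 0.0                        # Lagrange multiplier (dual variable)
eta_p, eta_d = 0.1, 0.1
avg_util = 0.0

for t in range(1, 5001):
    pi = np.exp(theta) / np.exp(theta).sum()
    # Lagrangian: V_r(pi) + lam * (V_u(pi) - b)
    per_arm = rewards + lam * utils
    grad_theta = pi * (per_arm - pi @ per_arm)      # softmax policy gradient
    theta += eta_p * grad_theta                     # primal ascent step
    lam = max(0.0, lam - eta_d * (pi @ utils - b))  # projected dual step
    avg_util += (pi @ utils - avg_util) / t         # running-average utility

print(round(avg_util, 2))
```

The iterates typically oscillate around the saddle point, so the time-averaged policy (tracked via `avg_util`) is the quantity that approaches constraint satisfaction; the paper's robust version additionally evaluates these value terms under the worst-case MDP in the uncertainty set.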
Key words
Reinforcement Learning, Robustness, Uncertainty Estimation
Chat Paper

Key points: This paper proposes a framework for robust constrained reinforcement learning under model uncertainty, aiming to guarantee that utility/cost constraints are satisfied under every model in the uncertainty set while maximizing the worst-case reward performance over that set. The authors also propose a robust primal-dual algorithm with theoretical guarantees on convergence, complexity, and robust feasibility.

Method: Robust constrained reinforcement learning over a model uncertainty set, carried out via the robust primal-dual algorithm.

Experiments: Taking the $\delta$-contamination uncertainty set as a concrete example, the authors design an online, model-free algorithm and theoretically characterize its sample complexity.
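For the $\delta$-contamination uncertainty set, the worst case over the set admits a simple closed form, which the hedged sketch below illustrates (the function name and all numbers are assumptions, not the paper's online algorithm): the adversary keeps the nominal transition kernel with weight $1-\delta$ and places the remaining $\delta$ mass on the worst successor state, so the robust backup only needs the nominal expectation plus the minimum of the value function.

```python
import numpy as np

# Hedged sketch: one robust Bellman backup for a single (s, a) pair under
# delta-contamination. For the set {(1-delta) p + delta q : q arbitrary},
# the worst-case expected next value is (1-delta) * p @ v + delta * min(v).
def robust_backup(p, v, r, gamma=0.9, delta=0.1):
    worst_exp = (1 - delta) * (p @ v) + delta * v.min()
    return r + gamma * worst_exp

p = np.array([0.5, 0.5])   # nominal next-state distribution
v = np.array([1.0, 0.0])   # current value estimates
# nominal backup would be 1 + 0.9 * 0.5 = 1.45; the robust backup is smaller
print(round(robust_backup(p, v, r=1.0), 3))  # → 1.405
```

Because the inner minimization collapses to `v.min()`, no model of the adversarial kernel is needed, which is what makes an online, model-free algorithm feasible for this particular uncertainty set.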