
Robust Constrained Reinforcement Learning

Computing Research Repository (CoRR) (2023)

Assistant Professor | Associate Professor | State University of New York

Abstract
Constrained reinforcement learning aims to maximize the expected reward subject to constraints on utilities/costs. However, the training environment may differ from the test environment, e.g., due to modeling error, adversarial attack, or non-stationarity, which can cause severe performance degradation and, more importantly, constraint violation. We propose a framework of robust constrained reinforcement learning under model uncertainty, where the MDP is not fixed but lies in some uncertainty set; the goal is to guarantee that constraints on utilities/costs are satisfied for all MDPs in the uncertainty set, and to maximize the worst-case reward performance over the uncertainty set. We design a robust primal-dual approach and theoretically develop guarantees on its convergence, complexity, and robust feasibility. We then investigate a concrete example of the δ-contamination uncertainty set, design an online and model-free algorithm, and theoretically characterize its sample complexity.
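The primal-dual idea described in the abstract can be sketched on a toy constrained problem. Everything below (the two-armed bandit, the softmax parametrization, the step sizes) is an illustrative assumption, not the paper's algorithm: the primal variable (policy) ascends the Lagrangian while the dual variable (multiplier) descends on the constraint slack.

```python
import numpy as np

# Toy constrained bandit: maximize expected reward subject to
# expected utility >= b. All numbers are illustrative assumptions.
rewards = np.array([1.0, 0.3])   # expected reward of each arm
utils = np.array([0.0, 1.0])     # expected utility of each arm
b = 0.5                          # constraint threshold

theta = np.zeros(2)              # softmax policy logits (primal variable)
lam = 0.0                        # Lagrange multiplier (dual variable)
eta_p, eta_d = 0.1, 0.1
avg_util = 0.0

for t in range(1, 5001):
    pi = np.exp(theta) / np.exp(theta).sum()
    # Lagrangian: V_r(pi) + lam * (V_u(pi) - b)
    per_arm = rewards + lam * utils
    grad_theta = pi * (per_arm - pi @ per_arm)      # softmax policy gradient
    theta += eta_p * grad_theta                     # primal ascent step
    lam = max(0.0, lam - eta_d * (pi @ utils - b))  # projected dual step
    avg_util += (pi @ utils - avg_util) / t         # running-average utility

print(round(avg_util, 2))
```

The iterates typically oscillate around the saddle point, so the time-averaged policy (tracked via `avg_util`) is the quantity that approaches constraint satisfaction; the paper's robust version additionally evaluates these value terms under the worst-case MDP in the uncertainty set.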
Key words
Reinforcement Learning, Robustness, Uncertainty Estimation
Chat Paper

Key points: This paper proposes a framework for robust constrained reinforcement learning under model uncertainty, aiming to guarantee that utility/cost constraints are satisfied under every model in the uncertainty set while maximizing the worst-case reward performance over that set. The authors also propose a robust primal-dual algorithm with theoretical guarantees on convergence, complexity, and robust feasibility.

Method: Robust constrained reinforcement learning over a model uncertainty set, carried out via the robust primal-dual algorithm.

Experiments: Taking the $\delta$-contamination uncertainty set as a concrete example, the authors design an online, model-free algorithm and theoretically characterize its sample complexity.
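For the $\delta$-contamination uncertainty set, the worst case over the set admits a simple closed form, which the hedged sketch below illustrates (the function name and all numbers are assumptions, not the paper's online algorithm): the adversary keeps the nominal transition kernel with weight $1-\delta$ and places the remaining $\delta$ mass on the worst successor state, so the robust backup only needs the nominal expectation plus the minimum of the value function.

```python
import numpy as np

# Hedged sketch: one robust Bellman backup for a single (s, a) pair under
# delta-contamination. For the set {(1-delta) p + delta q : q arbitrary},
# the worst-case expected next value is (1-delta) * p @ v + delta * min(v).
def robust_backup(p, v, r, gamma=0.9, delta=0.1):
    worst_exp = (1 - delta) * (p @ v) + delta * v.min()
    return r + gamma * worst_exp

p = np.array([0.5, 0.5])   # nominal next-state distribution
v = np.array([1.0, 0.0])   # current value estimates
# nominal backup would be 1 + 0.9 * 0.5 = 1.45; the robust backup is smaller
print(round(robust_backup(p, v, r=1.0), 3))  # → 1.405
```

Because the inner minimization collapses to `v.min()`, no model of the adversarial kernel is needed, which is what makes an online, model-free algorithm feasible for this particular uncertainty set.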