Multi-Fidelity Reinforcement Learning for Time-Optimal Quadrotor Re-planning
CoRR (2024)
Abstract
High-speed online trajectory planning for UAVs poses a significant challenge:
it requires precise modeling of complex dynamics while operating under tight
computational limits. This paper presents a multi-fidelity reinforcement
learning (MFRL) method that builds a realistic dynamics model while
simultaneously training a planning policy that can be deployed in real time.
The method co-trains a planning policy and a reward estimator; the latter
predicts the performance of the policy's output and is trained efficiently
through multi-fidelity Bayesian optimization. This optimization models the
correlation between fidelity levels, constructing a high-fidelity model on a
low-fidelity foundation, so the reward model can be learned accurately from
only a limited number of high-fidelity experiments. The framework is further
extended to incorporate real-world flight experiments into reinforcement
learning training, allowing the reward model to precisely reflect real-world
constraints and broadening the policy's applicability to real-world scenarios.
We present rigorous evaluations by training and testing the planning policy in
both simulated and real-world environments. The trained policy not only
generates faster and more reliable trajectories than the baseline snap
minimization method, but it also achieves trajectory updates in 2 ms on
average, whereas the baseline takes several minutes.
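To make the fidelity-correlation idea concrete, below is a minimal sketch of a
multi-fidelity reward estimator. It uses the standard autoregressive
(Kennedy-O'Hagan) assumption f_high(x) ≈ rho * f_low(x) + delta(x), where
delta(x) is a Gaussian process fit on a small set of high-fidelity residuals.
This is an illustrative assumption, not the paper's exact formulation; the
class name and method signatures are hypothetical.

```python
# Hedged sketch: multi-fidelity reward model under the autoregressive
# assumption f_high(x) ~= rho * f_low(x) + delta(x). All names hypothetical.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


class MultiFidelityRewardModel:
    def __init__(self):
        # GP over abundant, cheap low-fidelity rewards (e.g. simulator rollouts).
        self.gp_low = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
        # GP over the residual between fidelities, fit on scarce
        # high-fidelity data (e.g. real-world flight experiments).
        self.gp_delta = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
        self.rho = 1.0  # scale factor linking the two fidelity levels

    def fit(self, X_low, y_low, X_high, y_high):
        # 1) Fit the low-fidelity GP on many cheap samples.
        self.gp_low.fit(X_low, y_low)
        # 2) Estimate rho by least squares against the high-fidelity rewards.
        low_at_high = self.gp_low.predict(X_high)
        self.rho = float(np.dot(low_at_high, y_high)
                         / np.dot(low_at_high, low_at_high))
        # 3) Fit the residual GP on the few high-fidelity discrepancies.
        self.gp_delta.fit(X_high, y_high - self.rho * low_at_high)

    def predict(self, X):
        # High-fidelity reward estimate plus residual uncertainty.
        mu_low = self.gp_low.predict(X)
        mu_delta, std_delta = self.gp_delta.predict(X, return_std=True)
        return self.rho * mu_low + mu_delta, std_delta
```

In a framework like the one described above, such an estimator would score
candidate trajectories proposed by the planning policy, and its predictive
uncertainty (std_delta here) could guide which candidates are worth evaluating
at the next, more expensive fidelity level.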