Multi-USV Task Planning Method Based On Improved Deep Reinforcement Learning

Jing Zhang, Jia Ren, Yani Cui, Delong Fu, Jingyu Cong

IEEE Internet of Things Journal (2024)

Abstract
A safe and reliable task planning method is a prerequisite for the collaborative execution of ocean observation data collection tasks by multiple unmanned surface vessels (multi-USVs). Deep Reinforcement Learning (DRL) combines the powerful nonlinear function-fitting capability of deep neural networks with the decision-making and control abilities of reinforcement learning, providing a novel approach to the multi-USV task planning problem. However, when applied to multi-USV task planning, DRL faces challenges such as a vast exploration space, long training times, and an unstable training process. To this end, this paper proposes a multi-USV task planning method based on improved deep reinforcement learning. The proposed method draws on the idea of a value decomposition network, breaking the multi-USV task planning problem into two subproblems: task allocation and autonomous collision avoidance. Separate state spaces, action spaces, and reward functions are designed for the two subproblems. On this basis, a deep neural network maps the state space of each subproblem to the action space of each USV, and the strategy generated by the network is evaluated with the corresponding reward function, integrating task allocation and path planning into a unified task planning framework. The deep neural networks consist of Actor networks and Critic networks. During the Critic training phase, different methods are used to train the different Critic networks to improve the convergence speed of the algorithm. An improved temporal difference error method is applied to train the Critic network that evaluates autonomous collision avoidance strategies, thereby improving the autonomous collision avoidance ability of the USVs. At the same time, to improve the efficiency of task allocation, hierarchical and regional division mechanisms are introduced to construct sub-system task planning models, which further decompose the task planning problem. A combination of successor features and the improved temporal difference error method is applied to train another Critic network that evaluates the sub-systems' task allocation schemes and collaborative motion trajectories, enhancing the allocation efficiency of the sub-systems. Furthermore, transfer learning is employed to merge the sub-system task planning, using it as a constraint that directs the exploration and assessment of both the cluster task allocation schemes and the cluster collaborative motion trajectories. This enables rapid and accurate learning of task allocation within the multi-USV cluster. During the Actor training phase, the experience replay method and the target network technique are introduced to enhance the proximal policy optimization algorithm. This facilitates distributed joint training of the Actor networks, thereby improving the accuracy of the algorithm. Simulation results validate the effectiveness and superiority of the proposed method.
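As an illustration of the actor-critic building blocks the abstract refers to, the sketch below shows a standard one-step temporal difference critic loss with a target network and the clipped proximal policy optimization (PPO) actor loss. It is a minimal sketch only: the paper's improved TD error, successor-feature critic, regional division, and distributed training schemes are not detailed in the abstract, and all class names, network sizes, and function signatures here are assumptions for illustration.

```python
# Minimal actor-critic / PPO sketch (PyTorch). Illustrative only; the paper's
# specific "improved" critic and distributed PPO variants are not reproduced.
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Maps a USV observation to a categorical action distribution."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, act_dim))

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))


class Critic(nn.Module):
    """Maps an observation to a scalar state value."""
    def __init__(self, obs_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, 1))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)


def critic_td_loss(critic, target_critic, obs, rew, next_obs, done, gamma=0.99):
    # One-step TD target computed with a frozen target network, the
    # target-network technique mentioned in the abstract.
    with torch.no_grad():
        td_target = rew + gamma * (1.0 - done) * target_critic(next_obs)
    td_error = td_target - critic(obs)
    return td_error.pow(2).mean()


def ppo_actor_loss(actor, obs, act, old_logp, advantage, clip=0.2):
    # Standard clipped PPO surrogate; advantages would be derived from the
    # critic's TD errors (e.g., via GAE) in a full implementation.
    logp = actor(obs).log_prob(act)
    ratio = torch.exp(logp - old_logp)
    clipped = torch.clamp(ratio, 1.0 - clip, 1.0 + clip)
    return -torch.min(ratio * advantage, clipped * advantage).mean()
```

In a setup like the one described, each subproblem (task allocation, collision avoidance) would use its own observation encoding, reward function, and Critic, while minibatches drawn from an experience replay buffer feed the two losses above.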
Keywords
Multiple unmanned surface vessels (multi-USVs), task planning, proximal policy optimization, value decomposition network, successor features