An Efficient Hardware Implementation of the Double Q-Learning Algorithm

2023 3rd International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME) (2023)

Abstract
Double Q-Learning (DQL) is an off-policy reinforcement learning algorithm that performs better than Q-Learning in stochastic environments. This paper proposes an efficient FPGA-based hardware architecture for the Double Q-Learning algorithm. The main originality of the proposed design is that it allows the Z actions to be processed in parallel. The architecture also manages data exchange efficiently by means of two purpose-designed Q-Matrix LUTRAM memories with asynchronous reads and synchronous writes. Implemented on a Xilinx Zynq UltraScale+ MPSoC ZCU104 with an 8-bit Q-Matrix data width, the design achieves a maximum frequency of 194 MHz with a dynamic power consumption of 45 mW. Compared with a similar previous work based on the Q-Learning architecture, the proposed implementation offers a good trade-off between algorithm performance and required logic resources. For environments with 8 states and 256 states, the proposed DQL architecture requires only (227 LUT, 75 FF, 66 LUTRAM) and (552 LUT, 75 FF, 322 LUTRAM), respectively.
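For readers unfamiliar with the algorithm, the standard tabular Double Q-Learning update as commonly described in the literature can be sketched in Python as follows. This is a software illustration only, not the paper's hardware datapath: the function name, parameter names, default values, and floating-point tables are assumptions made for the example, and the 8-bit LUTRAM storage and parallel action processing described above are not modeled.

```python
import random


def double_q_update(q_a, q_b, state, action, reward, next_state,
                    alpha=0.5, gamma=0.5, update_a=None):
    """Apply one tabular Double Q-Learning step in place.

    q_a and q_b are lists of rows, indexed as table[state][action].
    update_a picks which table is updated; None means a fair coin flip.
    Returns True if q_a was updated, False if q_b was updated.
    """
    if update_a is None:
        update_a = random.random() < 0.5

    # The table being updated selects the greedy next action;
    # the other table supplies the value estimate for that action.
    select, evaluate = (q_a, q_b) if update_a else (q_b, q_a)

    row = select[next_state]
    best_action = max(range(len(row)), key=row.__getitem__)

    target = reward + gamma * evaluate[next_state][best_action]
    select[state][action] += alpha * (target - select[state][action])
    return update_a


# Hypothetical 2-state, 2-action example.
q_a = [[0.0, 0.0], [1.0, 2.0]]
q_b = [[0.0, 0.0], [4.0, 3.0]]
double_q_update(q_a, q_b, state=0, action=1, reward=1.0,
                next_state=1, update_a=True)
```

Decoupling action selection from action evaluation across the two tables is what reduces the overestimation bias of plain Q-Learning, and it is also why an implementation needs two Q-matrix memories rather than one.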
Keywords
FPGA, Double Q-Learning, reinforcement learning algorithm, hardware implementation