Spectral Normalisation For Deep Reinforcement Learning: An Optimisation Perspective

International Conference on Machine Learning (ICML), Vol. 139, 2021

Cited by 43 | Views 530
Abstract
Most of the recent deep reinforcement learning advances take an RL-centric perspective and focus on refinements of the training objective. We diverge from this view and show we can recover the performance of these developments not by changing the objective, but by regularising the value-function estimator. Constraining the Lipschitz constant of a single layer using spectral normalisation is sufficient to elevate the performance of a Categorical-DQN agent to that of a more elaborate RAINBOW agent on the challenging Atari domain. We conduct ablation studies to disentangle the various effects normalisation has on the learning dynamics and show that it is sufficient to modulate the parameter updates to recover most of the performance of spectral normalisation. These findings hint towards the need to also focus on the neural component and its learning dynamics to tackle the peculiarities of Deep Reinforcement Learning.
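To make the central idea concrete, the sketch below shows one way a single layer of a Categorical-DQN (C51) value head could be wrapped with spectral normalisation in PyTorch. This is a minimal, hypothetical illustration, not the authors' code: the choice of which layer to normalise, the layer sizes, and the C51Head name are assumptions made for the example; only torch.nn.utils.spectral_norm is a real library call.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Hypothetical value-network head in the spirit of Categorical-DQN (C51).
# Only ONE layer (here the penultimate linear layer) is spectrally normalised,
# which constrains that layer's Lipschitz constant by dividing its weight
# matrix by an estimate of its largest singular value (power iteration).
class C51Head(nn.Module):
    def __init__(self, in_features: int, num_actions: int, num_atoms: int = 51):
        super().__init__()
        self.num_actions = num_actions
        self.num_atoms = num_atoms
        # spectral_norm reparametrises the weight as W / sigma_max(W),
        # updating the singular-value estimate on every training forward pass.
        self.hidden = spectral_norm(nn.Linear(in_features, 512))
        self.logits = nn.Linear(512, num_actions * num_atoms)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.hidden(features))
        logits = self.logits(h).view(-1, self.num_actions, self.num_atoms)
        # Per-action probability distributions over the return atoms.
        return torch.softmax(logits, dim=-1)

# Illustrative usage: features would come from a convolutional torso.
head = C51Head(in_features=3136, num_actions=6)
probs = head(torch.randn(32, 3136))  # shape (32, 6, 51); atoms sum to 1 per action
```

The design choice highlighted by the abstract is that normalising just one layer, rather than the whole network, already changes the optimisation dynamics enough to close most of the gap to a RAINBOW-level agent.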
Keywords
deep reinforcement learning,spectral normalisation,reinforcement learning,optimisation