AKF-SR: Adaptive Kalman filtering-based successor representation

Neurocomputing (2022)

Abstract
To understand how animals find relations between similar tasks and adapt to changes in those tasks, it is necessary to know how the brain generalizes knowledge learned from a previous task to unseen tasks. Recent studies in neuroscience suggest that Successor Representation (SR)-based models adapt to changes in goal locations or in the reward function faster than model-free algorithms, at a lower computational cost than model-based algorithms. However, it is not known how such a representation might help animals to manage uncertainty in their decision making. Existing methods for SR learning based on standard temporal difference methods (e.g., deep neural network-based algorithms) do not capture uncertainty about the estimated SR. To address this issue, this paper presents a Kalman filter-based SR framework, referred to as Adaptive Kalman Filtering-based Successor Representation (AKF–SR). First, the Kalman temporal difference approach, which combines the Kalman filter with the temporal difference method, is used within the AKF–SR framework to cast SR learning as a filtering problem; this provides an uncertainty estimate of the SR and reduces memory requirements and sensitivity to model parameters compared with deep neural network-based algorithms. An adaptive Kalman filtering approach is then applied within the proposed AKF–SR framework to tune the measurement noise covariance and the measurement mapping function of the Kalman filter, the two parameters that most affect the filter's performance. Moreover, an active learning method is proposed that exploits the estimated uncertainty of the SR to form a behaviour policy that visits states with less certain values more often, improving the agent's overall performance in terms of rewards received while interacting with its environment. Experimental results on three reinforcement learning environments illustrate the efficacy of the proposed AKF–SR framework over state-of-the-art frameworks in terms of cumulative reward, reliability, time and computational cost, and speed of convergence after changes in the reward function.
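
To make the idea concrete, below is a minimal sketch, not the authors' implementation, of how SR learning can be cast as a Kalman filtering problem under linear function approximation psi(s) ≈ W^T phi(s) over (e.g. radial basis function) features, together with an uncertainty-seeking behaviour policy. The feature dimension `d`, the hyper-parameters, the environment interface, and the fixed measurement noise `r_var` are illustrative assumptions; AKF–SR instead tunes the measurement noise covariance and mapping adaptively via multiple model adaptive estimation, which is omitted here for brevity.

```python
# Minimal sketch of Kalman temporal difference (KTD) learning of the successor
# representation (SR) with an uncertainty-aware policy. Hypothetical, simplified
# version of the ideas in the abstract; not the paper's code.
import numpy as np

gamma, q_var, r_var, kappa = 0.95, 1e-4, 1.0, 1.0  # discount, process/measurement noise, exploration weight (assumed values)
d = 64                                              # dimension of the feature vector phi(s) (assumed)

W   = np.zeros((d, d))   # SR weights: psi(s) ~= W.T @ phi(s)
P   = np.eye(d)          # weight covariance, shared across SR components
w_r = np.zeros(d)        # reward weights: r(s) ~= phi(s) @ w_r

def ktd_sr_update(phi_s, phi_next, reward, alpha_r=0.1):
    """One Kalman-TD step on the SR weights and a delta-rule step on the reward weights."""
    global W, P, w_r
    P[:] = P + q_var * np.eye(d)            # prediction step: random-walk model for the weights
    h = phi_s - gamma * phi_next            # measurement mapping from the SR Bellman equation
    innovation = phi_s - W.T @ h            # observed feature minus its one-step SR prediction
    S = h @ P @ h + r_var                   # innovation variance (fixed R here; adaptive in AKF-SR)
    K = P @ h / S                           # Kalman gain
    W += np.outer(K, innovation)            # correction of the SR weights
    P[:] = P - np.outer(K, h @ P)           # covariance update
    w_r += alpha_r * phi_s * (reward - phi_s @ w_r)

def act(phi_next_per_action):
    """Uncertainty-seeking policy: value estimate plus a bonus on less certain values."""
    scores = []
    for phi_sa in phi_next_per_action:      # feature of the successor state of each candidate action
        value = (W.T @ phi_sa) @ w_r        # V(s') = psi(s')^T w_r
        sigma = np.sqrt(phi_sa @ P @ phi_sa)  # uncertainty propagated from the weight covariance
        scores.append(value + kappa * sigma)
    return int(np.argmax(scores))
```

In this sketch the SR Bellman equation psi(s) = phi(s) + gamma * psi(s') is rearranged so that phi(s) plays the role of the Kalman observation with measurement vector h = phi(s) - gamma * phi(s'), and the covariance P supplies both the Kalman gain and the exploration bonus used by the behaviour policy.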
Keywords
Reinforcement learning, Successor representation, Kalman filter, Kalman temporal difference, Multiple model adaptive estimation, Radial basis function