Self-generation of reward by logarithmic transformation of multiple sensor evaluations

Yuya Ono, Kentarou Kurashige, Afiqe Anuar Bin Muhammad Nor Hakim, Yuma Sakamoto

Artificial Life and Robotics (2023)

Abstract

Although the design of the reward function is important in reinforcement learning, it is difficult to design a system that can adapt to a variety of environments and tasks. Therefore, we propose a method that autonomously generates rewards from sensor values, enabling task- and environment-independent reward design. Under this approach, environmental hazards are recognized by evaluating sensor values. The evaluation used for learning is obtained by integrating all the sensor evaluations that indicate danger. Although prior studies have employed weighted averages to integrate sensor evaluations, a weighted average does not reflect the increased danger that arises when a larger number of sensor evaluations indicate danger. Instead, we propose integrating the sensor evaluations using a logarithmic transformation. Through a path learning experiment, the proposed method was evaluated by comparing its rewards to those obtained from manual reward setting and from prior approaches.
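
The contrast between the two integration schemes can be illustrated with a minimal sketch. The abstract does not give the paper's formula, so everything here is an assumption made for illustration: each sensor evaluation is taken to lie in (0, 1], with 1 meaning safe and smaller values meaning more danger, and the log-based integration is taken to be a plain sum of logarithms. The function names `weighted_average` and `log_integration` and the parameter `eps` are hypothetical.

```python
import math


def weighted_average(evals, weights=None):
    """Baseline: weighted average of the sensor evaluations that indicate danger."""
    if weights is None:
        weights = [1.0] * len(evals)
    return sum(w * e for w, e in zip(weights, evals)) / sum(weights)


def log_integration(evals, eps=1e-6):
    """Assumed log-based integration (not the paper's exact formula).

    Each evaluation is clamped to at least eps so the logarithm is defined.
    A safe evaluation (1.0) contributes 0; each dangerous one contributes a
    negative term, so more dangerous sensors give a lower total.
    """
    return sum(math.log(max(e, eps)) for e in evals)


one_sensor = [0.2]               # one sensor reports danger
three_sensors = [0.2, 0.2, 0.2]  # three sensors report the same danger level

# The average is (up to rounding) identical in both cases.
print(weighted_average(one_sensor), weighted_average(three_sensors))
# The summed logarithm decreases as more sensors report danger.
print(log_integration(one_sensor), log_integration(three_sensors))
```

Under these assumptions, the weighted average cannot distinguish one dangerous sensor from three, whereas the summed logarithm becomes more negative as the count grows, which is the property the abstract says is missing from the weighted-average approach.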

Keywords

Self-generation of reward, Reinforcement learning, Danger recognition
