Leveraging Domain Knowledge for Robust Deep Reinforcement Learning in Networking

Ying Zheng, Haoyu Chen, Qingyang Duan, Lixiang Lin, Yiyang Shao, Wei Wang, Xin Wang, Yuedong Xu

IEEE Conference on Computer Communications (IEEE INFOCOM 2021), 2021

Abstract

The past few years have witnessed a surge of interest in deep reinforcement learning (Deep RL) in computer networks. With its extraordinary feature-extraction ability, Deep RL has the potential to re-engineer fundamental resource allocation problems in networking without relying on pre-programmed models or assumptions about dynamic environments. However, such black-box systems suffer from poor robustness, showing high performance variance and poor tail performance. In this work, we propose a unified Teacher-Student learning framework that harnesses rich domain knowledge to improve robustness. The domain-specific algorithms, less performant but more trustworthy than Deep RL, play the role of teachers that provide advice at critical states; the student neural network is steered to maximize the expected reward as usual while also mimicking the teacher's advice. The Teacher-Student method comprises three modules: the confidence check module locates wrong and risky decisions, the reward shaping module designs a new updating function to incentivize the learning of the student network, and the prioritized experience replay module makes effective use of the advised actions. We further implement our Teacher-Student framework in existing systems for video streaming (Pensieve), load balancing (DeepLB), and TCP congestion control (Aurora). Experimental results show that the proposed approach reduces the performance standard deviation of DeepLB by 37%; improves the 90th, 95th, and 99th percentile tail performance of Pensieve by 7.6%, 8.8%, and 10.7%, respectively; and accelerates the rate of growth of Aurora by 2x at the initial stage while achieving more stable performance in dynamic environments.
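
The three modules named in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the rules the authors actually use, so everything here is an assumption made for illustration: the entropy threshold in `confidence_check`, the fixed disagreement penalty in `shaped_reward`, and the fixed sampling weight in `PrioritizedReplay` are hypothetical stand-ins, not the paper's method.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def confidence_check(student_probs, teacher_action, max_entropy=0.8):
    """Return (action_to_execute, advised).

    Hypothetical rule: treat a state as risky when the student's action
    distribution has high entropy, and defer to the teacher in that case.
    """
    student_action = max(range(len(student_probs)), key=student_probs.__getitem__)
    if entropy(student_probs) > max_entropy:
        return teacher_action, True
    return student_action, False


def shaped_reward(env_reward, student_action, teacher_action, advised, penalty=1.0):
    """Hypothetical shaping: on advised states, subtract a fixed penalty
    when the student's own choice disagrees with the teacher's advice."""
    if advised and student_action != teacher_action:
        return env_reward - penalty
    return env_reward


class PrioritizedReplay:
    """Tiny replay buffer that samples advised transitions more often.

    The fixed weight for advised transitions is an assumption; it only
    illustrates the idea of prioritizing the teacher's advice.
    """

    def __init__(self, advised_priority=3.0, seed=0):
        self.items = []
        self.weights = []
        self.advised_priority = advised_priority
        self.rng = random.Random(seed)

    def add(self, transition, advised):
        self.items.append(transition)
        self.weights.append(self.advised_priority if advised else 1.0)

    def sample(self, k):
        # Weighted sampling with replacement over stored transitions.
        return self.rng.choices(self.items, weights=self.weights, k=k)
```

In this sketch a confident student (for example, probabilities `[0.9, 0.05, 0.05]`) acts on its own, while a near-uniform distribution triggers the teacher's advice, which is then stored with a higher sampling weight.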

Keywords

high performance variance,poor tail performance,unified Teacher-Student learning framework,rich domain knowledge,domain-specific algorithms,Deep RL,student neural network,Teacher-Student method,module locates wrong decisions,student network,prioritized experience replay module,Teacher-Student framework,dynamic environments,robust deep reinforcement learning,computer networks,fundamental resource allocation problems,pre-programmed models,black-box systems