
On Joint Convergence of Traffic State and Weight Vector in Learning-Based Dynamic Routing with Value Function Approximation

CoRR (2024)

Abstract
Learning-based approaches are increasingly popular for traffic control problems. However, these approaches are typically applied as black boxes, with limited theoretical guarantees and interpretability. In this paper, we develop theory for dynamic routing over parallel servers, a representative traffic control task, using a semi-gradient on-policy control algorithm, a representative reinforcement learning method. We consider a linear value function approximation over an infinite state space; a Lyapunov function is also derived from the approximator. In particular, the structure of the approximator naturally admits idling policies, an interesting and useful advantage over existing dynamic routing schemes. We show that the convergence of the approximation weights is coupled with the convergence of the traffic state. Specifically, if the system is stabilizable, then (i) the weight vector converges to a bounded region, and (ii) the traffic state is bounded in the mean. We also show empirically that the proposed algorithm is computationally efficient with an insignificant optimality gap.
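The setting described in the abstract — semi-gradient on-policy control with linear value function approximation for routing arrivals to parallel servers — can be sketched as follows. This is an illustrative toy, not the paper's algorithm: the feature map (post-routing queue lengths plus a bias), the arrival and service rates, and the step sizes are all assumptions, and the idling option the paper highlights is omitted for brevity.

```python
import random

def features(queues, action):
    """Assumed linear features: post-routing queue lengths plus a bias term."""
    q = list(queues)
    q[action] += 1
    return [float(x) for x in q] + [1.0]

def q_value(w, queues, action):
    return sum(wi * fi for wi, fi in zip(w, features(queues, action)))

def choose(w, queues, n, eps, rng):
    """Epsilon-greedy routing; Q estimates cost-to-go, so pick the minimum."""
    if rng.random() < eps:
        return rng.randrange(n)
    return min(range(n), key=lambda a: q_value(w, queues, a))

def run(n=3, mu=(0.5, 0.4, 0.3), lam=0.9, steps=20000,
        alpha=0.005, gamma=0.99, eps=0.1, seed=0):
    """Semi-gradient SARSA over a discrete-time parallel-queue model."""
    rng = random.Random(seed)
    w = [0.0] * (n + 1)     # weight vector for the linear approximator
    queues = [0] * n        # traffic state: one queue per server
    a = rng.randrange(n)
    for _ in range(steps):
        phi = features(queues, a)
        q_sa = sum(wi * fi for wi, fi in zip(w, phi))
        # Transition: with prob. lam a job arrives and is routed to server a,
        # then each busy server i finishes its job with probability mu[i].
        if rng.random() < lam:
            queues[a] += 1
        for i in range(n):
            if queues[i] > 0 and rng.random() < mu[i]:
                queues[i] -= 1
        cost = sum(queues)  # per-step cost: total backlog
        a_next = choose(w, queues, n, eps, rng)
        # Semi-gradient TD update: the bootstrap target is not differentiated.
        delta = cost + gamma * q_value(w, queues, a_next) - q_sa
        w = [wi + alpha * delta * fi for wi, fi in zip(w, phi)]
        a = a_next
    return w, queues
```

The paper's coupled-convergence result concerns exactly this kind of loop: because the features grow with the queue lengths, boundedness of the weight vector and boundedness of the traffic state have to be established jointly rather than separately.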