On Joint Convergence of Traffic State and Weight Vector in Learning-Based Dynamic Routing with Value Function Approximation
Learning-based approaches are increasingly popular for traffic control
problems. However, these approaches are typically applied as black boxes with
limited theoretical guarantees and interpretability. In this paper, we consider
the theory of dynamic routing over parallel servers, a representative traffic
control task, using a semi-gradient on-policy control algorithm, a representative
reinforcement learning method. We consider a linear value function
approximation on an infinite state space; a Lyapunov function is also derived
from the approximator. In particular, the structure of the approximator
naturally makes idling policies possible, which is an interesting and useful
advantage over existing dynamic routing schemes. We show that the convergence
of the approximation weights is coupled with the convergence of the traffic
state. We show that if the system is stabilizable, then (i) the weight vector
converges to a bounded region, and (ii) the traffic state is bounded in the
mean. We also empirically show that the proposed algorithm is computationally
efficient with an insignificant optimality gap.
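
To make the setting concrete, the following is a minimal sketch of one semi-gradient on-policy (SARSA-style) update with a linear approximator for routing over parallel servers. It is not the algorithm as specified in the paper: the quadratic feature map `features`, the discounted-cost formulation with `gamma`, the epsilon-greedy rule, and all function and parameter names are assumptions made for illustration, and idling is not modelled.

```python
import random


def features(queues, action):
    """Hypothetical feature map: squared queue lengths after routing
    the arriving job to server `action` (an assumption, not the paper's)."""
    post = list(queues)
    post[action] += 1
    return [float(q * q) for q in post]


def q_value(w, queues, action):
    """Linear approximation of the cost-to-go: inner product w . phi(x, a)."""
    return sum(wi * fi for wi, fi in zip(w, features(queues, action)))


def choose_server(w, queues, eps, rng):
    """Epsilon-greedy routing: explore with probability eps, otherwise
    pick the server with the smallest approximated cost-to-go."""
    if rng.random() < eps:
        return rng.randrange(len(queues))
    return min(range(len(queues)), key=lambda a: q_value(w, queues, a))


def sarsa_update(w, queues, action, cost, next_queues, next_action,
                 alpha, gamma):
    """One semi-gradient step. The gradient of the linear approximator is
    the feature vector; the bootstrapped target is treated as a constant."""
    phi = features(queues, action)
    td_error = (cost
                + gamma * q_value(w, next_queues, next_action)
                - q_value(w, queues, action))
    return [wi + alpha * td_error * fi for wi, fi in zip(w, phi)]
```

In this sketch the weight vector is driven by the sequence of visited queue states, and the visited states depend in turn on the routing decisions induced by the weights, which illustrates why the two convergence questions are coupled.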