Minimax Least-Square Policy Iteration for Cost-Aware Defense of Traffic Routing against Unknown Threats
CoRR(2024)
Abstract
Dynamic routing is one of the representative control schemes in
transportation, production lines, and data transmission. In the modern context
of connectivity and autonomy, routing decisions are potentially vulnerable to
malicious attacks. In this paper, we consider the dynamic routing problem over
parallel traffic links in the face of such threats. An attacker is capable of
increasing or destabilizing traffic queues by strategically manipulating the
nominally optimal routing decisions. A defender is capable of securing the
correct routing decision. Attacking and defensive actions induce technological
costs. The defender has no prior information about the attacker's strategy. We
develop a least-square policy iteration algorithm for the defender to compute
a cost-aware and threat-adaptive defensive strategy. The policy evaluation step
computes a weight vector that minimizes the sampled temporal-difference error.
We derive a concrete theoretical upper bound on the evaluation error based on
the theory of value function approximation. The policy improvement step solves
a minimax problem and thus iteratively computes the Markov perfect equilibrium
of the security game. We also discuss the training error of the entire policy
iteration process.
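
To make the two steps concrete, the following is a minimal sketch and not the paper's implementation. It assumes a linear value approximation Q(s, a) = phi(s, a)^T w, a discount factor gamma, and samples given as (phi, cost, phi_next) triples, none of which are specified in the abstract. The evaluation step uses the standard LSTD-Q normal equations, which may differ from the paper's exact formulation. The improvement step is restricted to pure strategies for brevity; an equilibrium of the security game may in general require mixed strategies. The names solve_linear, lstdq_weights, and minimax_defense are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger ridge term")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def lstdq_weights(samples, gamma, ridge=0.0):
    """Policy evaluation (assumed LSTD-Q form).

    samples: list of (phi, cost, phi_next), where phi and phi_next are
    feature lists for the current and next state-action pairs.
    Solves A w = b with
        A = sum phi (phi - gamma * phi_next)^T + ridge * I
        b = sum phi * cost
    """
    k = len(samples[0][0])
    A = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for phi, cost, phi_next in samples:
        for i in range(k):
            b[i] += phi[i] * cost
            for j in range(k):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear(A, b)


def minimax_defense(q):
    """Policy improvement at one state, pure strategies only (assumption).

    q[d][a] is the estimated cost when the defender plays d and the
    attacker plays a. Returns (best defender action, worst-case cost).
    """
    worst = [max(row) for row in q]  # attacker maximizes the cost
    d = min(range(len(q)), key=lambda i: worst[i])  # defender minimizes
    return d, worst[d]


# Toy chain with one-hot features: state 0 moves to state 1 at cost 1,
# state 1 stays in place at cost 0.
toy = [([1.0, 0.0], 1.0, [0.0, 1.0]), ([0.0, 1.0], 0.0, [0.0, 1.0])]
w = lstdq_weights(toy, gamma=0.9)
action, value = minimax_defense([[3.0, 5.0], [4.0, 4.0]])
```

In a full policy iteration loop, these two functions would alternate: the weights from lstdq_weights define the Q estimates that fill the table passed to minimax_defense at each state, and the resulting strategy generates the next batch of samples.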