DyETC: Dynamic Electronic Toll Collection for Traffic Congestion Alleviation
AAAI, 2018.
Abstract:
To alleviate traffic congestion in urban areas, electronic toll collection (ETC) systems are deployed all over the world. Despite the merits, tolls are usually pre-determined and fixed from day to day, which fail to consider traffic dynamics and thus have limited regulation effect when traffic conditions are abnormal. In this paper, we pr...
Introduction
- Governments face a worsening problem of traffic congestion in urban areas. To alleviate road congestion, a number of approaches have been proposed, among which ETC has been reported to be effective in many countries and areas (e.g., Singapore (LTA 2017), Norway (AutoPASS 2017)).
- A few dynamic road pricing schemes (Joksimovic et al. 2005; Lu, Mahmassani, and Zhou 2008; Zhang, Mahmassani, and Lu 2013) have been proposed in the transportation research community that consider the variations of traffic demands over time.
- The authors propose a novel dynamic tolling scheme which optimizes traffic over the long run, with three major contributions.
Highlights
- Nowadays, governments face a worsening problem of traffic congestion in urban areas
- A few dynamic road pricing schemes (Joksimovic et al. 2005; Lu, Mahmassani, and Zhou 2008; Zhang, Mahmassani, and Lu 2013) have been proposed in the transportation research community that consider the variations of traffic demands over time
- These tolling schemes still assume that traffic demands are fixed and known a priori, and are thus essentially static
- To adapt policy gradient methods to a bounded action space, we propose a new form of policy function, derived from the Beta probability density function: f(x; λ, ξ) = x^(λ−1)(1 − x)^(ξ−1) / B(λ, ξ) (Eq. 13); see the sketch following this list
- To demonstrate that our DyETC framework can be adapted to other objectives, we evaluate the total travel time of different tolling schemes under the above settings (Figure 6, where the y-axis is the total travel time)
- We evaluate the performance of PG-β for its regulation effect on the morning rush-hour traffic of Singapore's Central Region
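Below is a minimal sketch, not the authors' implementation, of how a Beta-distribution policy keeps sampled tolls inside a bounded range, as in Eq. (13). The linear parameterization of the shape parameters, the softplus-plus-one transform, and the toll_max rescaling are illustrative assumptions; only the use of the Beta density f(x; λ, ξ) follows the paper.

```python
# Illustrative Beta-distribution policy over a bounded toll range [0, toll_max].
# Only the Beta density f(x; lambda, xi) follows the paper; the linear
# parameterization and softplus transform are assumptions made for this sketch.
import math
import numpy as np

class BetaTollPolicy:
    def __init__(self, state_dim, toll_max, seed=0):
        self.rng = np.random.default_rng(seed)
        self.toll_max = toll_max
        self.w_lam = np.zeros(state_dim)   # weights producing shape parameter lambda
        self.w_xi = np.zeros(state_dim)    # weights producing shape parameter xi

    def _shapes(self, state):
        # softplus(.) + 1 keeps both shape parameters above 1 (unimodal density).
        lam = math.log1p(math.exp(self.w_lam @ state)) + 1.0
        xi = math.log1p(math.exp(self.w_xi @ state)) + 1.0
        return lam, xi

    def sample_toll(self, state):
        lam, xi = self._shapes(state)
        x = self.rng.beta(lam, xi)         # x ~ Beta(lambda, xi), supported on (0, 1)
        return x * self.toll_max           # rescale to the admissible toll range

    def log_prob(self, state, toll):
        # log f(x; lambda, xi) = (lam-1)ln(x) + (xi-1)ln(1-x) - ln B(lam, xi)
        lam, xi = self._shapes(state)
        x = min(max(toll / self.toll_max, 1e-8), 1.0 - 1e-8)
        log_beta_fn = math.lgamma(lam) + math.lgamma(xi) - math.lgamma(lam + xi)
        return (lam - 1.0) * math.log(x) + (xi - 1.0) * math.log(1.0 - x) - log_beta_fn
```

Because the Beta density has bounded support, every sampled action is a feasible toll by construction, unlike a Gaussian policy that must be clipped.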
Results
- Evaluation on Synthetic Data
The authors first conduct experiments on synthetic data. For policy gradient methods, they first obtain the policy function through offline training and then use the trained policy for evaluation.
- The number of training episodes for PG-β is 500,000, and the learning rates for the value and policy functions are fine-tuned to 10^−8 and 10^−12, respectively (a schematic training loop is sketched after this list).
- The numbers along the roads denote the travel distances between adjacent zones, obtained from Google Maps. Since the OD demand is not published by the Singapore government, the authors use the populations of the different zones to estimate it.
- The authors obtain the population of each zone in the Central Region and use it to estimate the number of vehicles in each zone.
- All other parameters are estimated as in the preceding subsection.
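The offline training loop can be summarized with the schematic sketch below. Only the 500,000 episodes and the 10^−8 / 10^−12 learning rates come from the text; the simulator interface (env.reset, env.step), the linear value-function baseline, and the policy methods grad_log_prob/update are hypothetical placeholders, not the authors' code.

```python
# Schematic REINFORCE-with-baseline loop for a PG-beta-style method (illustrative).
# `env` is a hypothetical traffic simulator; `policy` is assumed to expose
# sample_toll(s), grad_log_prob(s, a), and update(delta) methods.
import numpy as np

def train_pg_beta(env, policy, state_dim, n_episodes=500_000,
                  lr_value=1e-8, lr_policy=1e-12, gamma=1.0):
    value_w = np.zeros(state_dim)                  # linear state-value baseline
    for _ in range(n_episodes):
        state, done, trajectory = env.reset(), False, []
        while not done:                            # roll out one finite-horizon episode
            toll = policy.sample_toll(state)
            next_state, reward, done = env.step(toll)
            trajectory.append((state, toll, reward))
            state = next_state

        ret = 0.0
        for s, a, r in reversed(trajectory):       # Monte-Carlo returns, backwards
            ret = r + gamma * ret
            advantage = ret - value_w @ s
            value_w += lr_value * advantage * s    # critic (value-function) update
            # Actor update: ascend along grad log pi(a|s), weighted by the advantage.
            policy.update(lr_policy * advantage * policy.grad_log_prob(s, a))
    return policy, value_w
```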
Conclusion
- The authors propose the DyETC scheme for optimal and dynamic road tolling in urban road networks.
- The authors propose a formal model of the DyETC problem, formulated as a discrete-time MDP (a structural sketch follows this list).
- The authors develop a novel solution algorithm, PG-β, to solve the formulated large-scale MDP.
- The results show that, on a real-world traffic network in Singapore, PG-β increases traffic volume by around 8% and reduces travel time by around 14.6% during rush hour.
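For concreteness, the sketch below lays out the assumed structural elements of that discrete-time MDP as they can be reconstructed from this summary; the exact state variables, toll bounds, and reward definition are assumptions, and the paper's formal model should be consulted for the precise formulation.

```python
# Assumed structural elements of the DyETC MDP (reconstructed from this summary,
# not taken from the paper's formal definitions).
from dataclasses import dataclass
import numpy as np

@dataclass
class DyETCState:
    t: int                   # current time step within the tolling horizon
    vehicles: np.ndarray     # assumed: number of vehicles travelling on each road

@dataclass
class DyETCAction:
    tolls: np.ndarray        # one toll per tollable (ERP) road, each in [0, toll_max]

def step_reward(arrived_this_step: float) -> float:
    # Assumed objective: vehicles reaching their destinations during the step
    # (total traffic volume); total travel time is an alternative objective
    # mentioned in the Highlights.
    return arrived_this_step
```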
Funding
- This research is supported by the National Research Foundation, Prime Minister's Office, Singapore under its IDM Futures Funding Initiative.
- LARG research is supported in part by NSF (IIS-1637736, IIS-1651089, IIS-1724157), Intel, Raytheon, and Lockheed Martin
References
- Akamatsu, T. 1996. Cyclic flows, Markov process and stochastic traffic assignment. Transportation Research Part B: Methodological 30(5):369–386.
- Anschel, O.; Baram, N.; and Shimkin, N. 2017. Averaged-DQN: Variance reduction and stabilization for deep reinforcement learning. In ICML, 176–185.
- AutoPASS. 2017. Find a toll station. http://www.autopass.no/en/autopass.
- Baillon, J.-B., and Cominetti, R. 2008. Markovian traffic equilibrium. Mathematical Programming 111(1-2):33–56.
- BPR. 1964. Traffic assignment manual. US Department of Commerce.
- Coulom, R. 2006. Efficient selectivity and backup operators in Monte-Carlo tree search. In ICCG, 72–83.
- Gan, J.; An, B.; and Miao, C. 2015. Optimizing efficiency of taxi systems: Scaling-up and handling arbitrary constraints. In AAMAS, 523–531.
- Gan, J.; An, B.; Wang, H.; Sun, X.; and Shi, Z. 2013. Optimal pricing for improving efficiency of taxi systems. In IJCAI, 2811–2818.
- Gu, S.; Lillicrap, T.; Sutskever, I.; and Levine, S. 2016. Continuous deep q-learning with model-based acceleration. In ICML, 2829–2838.
- Hansen, N. 2006. The CMA evolution strategy: A comparing review. Towards a New Evolutionary Computation 75–102.
- Hausknecht, M., and Stone, P. 2016. Deep reinforcement learning in parameterized action space. In ICLR.
- Huang, H.-J., and Li, Z.-C. 2007. A multiclass, multicriteria logit-based traffic equilibrium assignment model under ATIS. European Journal of Operational Research 176(3):1464–1477.
- Joksimovic, D.; Bliemer, M. C.; Bovy, P. H.; and Verwater-Lukszo, Z. 2005. Dynamic road pricing for optimizing network performance with heterogeneous users. In ICNSC, 407–412.
- Kocsis, L., and Szepesvari, C. 2006. Bandit based Monte-Carlo planning. In ECML, 282–293.
- Lillicrap, T. P.; Hunt, J. J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; and Wierstra, D. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
- Lo, H. K., and Szeto, W. Y. 2002. A methodology for sustainable traveler information services. Transportation Research Part B: Methodological 36(2):113–130.
- Lo, H. K.; Yip, C.; and Wan, K. 2003. Modeling transfer and non-linear fare structure in multi-modal network. Transportation Research Part B: Methodological 37(2):149–170.
- LTA, S. 2016. LTA to launch autonomous mobility-on-demand trials. https://www.lta.gov.sg/apps/news/page.aspx?c=2&id=73057d63-d07a-4229-87af-f957c7f89a27.
- LTA, S. 2017. Electronic road pricing (ERP). https://www.lta.gov.sg/content/ltaweb/en/roads-andmotoring/managing-traffic-and-congestion/electronicroad-pricing-erp.html.
- Lu, C.-C.; Mahmassani, H. S.; and Zhou, X. 2008. A bicriterion dynamic user equilibrium traffic assignment model and solution algorithm for evaluating dynamic road pricing strategies. Transportation Research Part C: Emerging Technologies 16(4):371–389.
- Maei, H. R.; Szepesvari, C.; Bhatnagar, S.; and Sutton, R. S. 2010. Toward off-policy learning control with function approximation. In ICML, 719–726.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; et al. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529–533.
- Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In ICML, 1928–1937.
- Moore, A. W., and Atkeson, C. G. 1993. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning 13(1):103–130.
- Nichols, B. D., and Dracopoulos, D. C. 2014. Application of Newton's method to action selection in continuous state- and action-space reinforcement learning. In ESANN.
- Government of Singapore. 2017. Department of Statistics, Singapore. http://www.singstat.gov.sg/.
- Olszewski, P. 2000. Comparison of the HCM and Singapore models of arterial capacity. In TRB Highway Capacity Committee Summer Meeting. Citeseer.
- Peters, J., and Schaal, S. 2008. Reinforcement learning of motor skills with policy gradients. Neural networks 21(4):682–697.
- Precup, D.; Sutton, R. S.; and Dasgupta, S. 2001. Off-policy temporal-difference learning with function approximation. In ICML, 417–424.
- Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; and Moritz, P. 2015. Trust region policy optimization. In ICML, 1889–1897.
- Sharon, G.; Hanna, J. P.; Rambha, T.; Levin, M. W.; Albert, M.; Boyles, S. D.; and Stone, P. 2017. Real-time adaptive tolling scheme for optimized social welfare in traffic networks. In AAMAS, 828–836.
- Sutton, R. S., and Barto, A. G. 2011. Reinforcement Learning: An Introduction.
- Sutton, R. S.; McAllester, D. A.; Singh, S. P.; Mansour, Y.; et al. 1999. Policy gradient methods for reinforcement learning with function approximation. In NIPS, volume 99, 1057–1063.
- Watkins, C. J. C. H. 1989. Learning from Delayed Rewards. Ph.D. Dissertation, University of Cambridge, England.
- Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning 8(3-4):229–256.
- Xiong, Y.; Gan, J.; An, B.; Miao, C.; and Bazzan, A. L. 2015. Optimal electric vehicle charging station placement. In IJCAI, 2662–2668.
- Xiong, Y.; Gan, J.; An, B.; Miao, C.; and Soh, Y. C. 2016. Optimal pricing for efficient electric vehicle charging station management. In AAMAS, 749–757.
- Zhang, K.; Mahmassani, H. S.; and Lu, C.-C. 2013. Dynamic pricing, heterogeneous users and perception error: Probit-based bi-criterion dynamic stochastic user equilibrium assignment. Transportation Research Part C: Emerging Technologies 27:189–204.