DyETC: Dynamic Electronic Toll Collection for Traffic Congestion Alleviation

AAAI, 2018.


Abstract:

To alleviate traffic congestion in urban areas, electronic toll collection (ETC) systems are deployed all over the world. Despite the merits, tolls are usually pre-determined and fixed from day to day, which fail to consider traffic dynamics and thus have limited regulation effect when traffic conditions are abnormal. In this paper, we pr...

Introduction
  • Governments face a worsening problem of traffic congestion in urban areas. To alleviate road congestion, a number of approaches have been proposed, among which ETC has been reported to be effective in many countries and areas (e.g., Singapore (LTA 2017), Norway (AutoPASS 2017)).
  • A few dynamic road pricing schemes (Joksimovic et al. 2005; Lu, Mahmassani, and Zhou 2008; Zhang, Mahmassani, and Lu 2013) have been proposed in the transportation research community that consider the variation of traffic demand over time.
  • The authors propose a novel dynamic tolling scheme that optimizes traffic over the long run, with three major contributions.
Highlights
  • Nowadays, governments face a worsening problem of traffic congestion in urban areas
  • A few dynamic road pricing schemes (Joksimovic et al. 2005; Lu, Mahmassani, and Zhou 2008; Zhang, Mahmassani, and Lu 2013) have been proposed in the transportation research community that consider the variation of traffic demand over time
  • However, these tolling schemes still assume that traffic demands are fixed and known a priori, and are therefore static in essence
  • To adapt policy gradient methods to the bounded action space, we propose a new form of policy function derived from the Beta probability distribution function: f(x; λ, ξ) = x^(λ−1) (1 − x)^(ξ−1) / B(λ, ξ) (13); a minimal code sketch of such a policy is given after this list
  • To demonstrate that our DyETC framework can be adapted to other objectives, we evaluate the total travel time of different tolling schemes under the above settings (Figure 6, where the y-axis is the total travel time)
  • We evaluate the performance of PG-β for its regulation effect on the morning rush-hour traffic of Singapore's Central Region
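
The following minimal Python sketch illustrates how a Beta-distribution policy of this form keeps every sampled toll inside a bounded range. The feature map, softplus link, parameter vectors, and the bound `toll_max` are illustrative assumptions for demonstration, not the authors' implementation.

```python
import numpy as np
from scipy.stats import beta as beta_dist

# Illustrative Beta-distribution policy for a bounded toll action.
# Feature map, softplus link, and toll_max are assumptions, not the paper's code.

rng = np.random.default_rng(0)

def softplus(z):
    return np.log1p(np.exp(z))

def sample_toll(state_features, theta_lambda, theta_xi, toll_max=6.0):
    """Sample a toll in (0, toll_max) from a scaled Beta(lambda, xi)."""
    lam = softplus(state_features @ theta_lambda) + 1e-6  # shape params must be > 0
    xi = softplus(state_features @ theta_xi) + 1e-6
    x = rng.beta(lam, xi)                                  # x in (0, 1)
    return toll_max * x, lam, xi

def log_prob(toll, lam, xi, toll_max=6.0):
    """Log-density of the scaled sample, as used in a policy-gradient update."""
    x = np.clip(toll / toll_max, 1e-6, 1 - 1e-6)
    return beta_dist.logpdf(x, lam, xi) - np.log(toll_max)  # change-of-variables term

# Example: one state with three hypothetical features (e.g. volumes on a road)
phi = np.array([0.4, 0.7, 0.2])
toll, lam, xi = sample_toll(phi, theta_lambda=np.ones(3), theta_xi=np.ones(3))
print(f"sampled toll {toll:.2f}, log-prob {log_prob(toll, lam, xi):.3f}")
```

Because the Beta density has support (0, 1), scaling its samples by the maximum toll enforces the bounded action space by construction, with no clipping of out-of-range actions.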
Results
  • Evaluation on Synthetic Data

    The authors first conduct experiments on synthetic data. For policy gradient methods, the authors first obtain the policy function with offline training, and use the trained policy to evaluate their performance.
  • The number of episodes for training PG-β is 500,000, and the learning rates for the value and policy functions are fine-tuned to 10^-8 and 10^-12, respectively. A schematic of such an offline training loop is sketched after this list.
  • The numbers along the roads denote the travel distances between adjacent zones, obtained from Google Maps. Since the OD demand is not revealed by the Singapore government, the authors use the population of different zones to estimate it.
  • The authors obtain the population of each zone in the Central Region and are able to estimate the number of vehicles in each zone.
  • All other parameters are estimated as in the previous subsection.
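
As a rough illustration of how such offline training might be organized, the sketch below runs a one-step actor-critic loop with linear function approximation and the learning rates quoted above. The toy environment, feature map, and Gaussian exploration policy are stand-ins for the paper's traffic model and Beta policy, not the authors' implementation.

```python
import numpy as np

# Schematic offline policy-gradient training with the hyperparameters quoted
# above (500,000 episodes, value lr 1e-8, policy lr 1e-12). The environment,
# features, and the Gaussian policy stand-in are illustrative assumptions.

rng = np.random.default_rng(0)

NUM_EPISODES = 500_000   # as reported; use far fewer for a quick test
ALPHA_VALUE = 1e-8       # value-function learning rate
ALPHA_POLICY = 1e-12     # policy learning rate
SIGMA = 0.5              # fixed exploration std of the stand-in Gaussian policy

class ToyTollEnv:
    """Tiny stand-in for the traffic simulator: 3-dim state, 20 decision steps."""
    def reset(self):
        self.t, self.state = 0, rng.uniform(0.0, 1.0, size=3)
        return self.state
    def step(self, toll):
        self.t += 1
        self.state = np.clip(self.state + rng.normal(0, 0.05, 3) - 0.01 * toll, 0, 1)
        return self.state, -self.state.sum(), self.t >= 20  # reward ~ -congestion

def train(env, dim=3, gamma=1.0, episodes=NUM_EPISODES):
    w = np.zeros(dim)      # value-function weights
    theta = np.zeros(dim)  # policy (mean-toll) weights
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            mean_toll = s @ theta
            toll = mean_toll + rng.normal(0, SIGMA)
            s2, r, done = env.step(toll)
            td = r + gamma * (s2 @ w) * (not done) - s @ w   # one-step TD error
            w = w + ALPHA_VALUE * td * s
            theta = theta + ALPHA_POLICY * td * (toll - mean_toll) / SIGMA**2 * s
            s = s2
    return w, theta

if __name__ == "__main__":
    train(ToyTollEnv(), episodes=100)  # small run as a smoke test
```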
Conclusion
  • The authors propose the DyETC scheme for optimal and dynamic road tolling in urban road networks.
  • The authors propose a formal model of the DyETC problem, which is formulated as a discrete-time MDP; a minimal skeleton of such an MDP is sketched after this list.
  • The authors develop a novel solution algorithm, PG-β, to solve the formulated large-scale MDP.
  • The results show that on a real-world traffic network in Singapore, PG-β increases the traffic volume by around 8% and reduces the travel time by around 14.6% during rush hour.
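
To make the discrete-time formulation concrete, the skeleton below names a plausible state (traffic volume per tolled road), a bounded toll action, and one transition step. The field names, the placeholder dynamics, and the throughput-style reward are assumptions for illustration only, not the authors' formal model.

```python
from dataclasses import dataclass
import numpy as np

# Skeleton of a discrete-time tolling MDP. Field names, the volume-based
# state, the placeholder dynamics, and the reward are illustrative assumptions.

@dataclass
class TollMDPState:
    t: int                 # decision-period index within the horizon
    volume: np.ndarray     # traffic volume per tolled road

class TollMDP:
    def __init__(self, n_roads=5, horizon=12, toll_max=6.0, seed=0):
        self.n_roads, self.horizon, self.toll_max = n_roads, horizon, toll_max
        self.rng = np.random.default_rng(seed)

    def initial_state(self):
        return TollMDPState(0, self.rng.uniform(20, 80, self.n_roads))

    def step(self, state, tolls):
        """Apply a bounded toll vector; return (next_state, reward, done)."""
        tolls = np.clip(tolls, 0.0, self.toll_max)        # bounded action space
        # Placeholder dynamics: higher tolls divert some demand off each road.
        inflow = self.rng.uniform(5, 15, self.n_roads) * np.exp(-0.1 * tolls)
        outflow = 0.2 * state.volume
        volume = np.maximum(state.volume + inflow - outflow, 0.0)
        reward = outflow.sum()                            # e.g. vehicles served
        next_state = TollMDPState(state.t + 1, volume)
        return next_state, reward, next_state.t >= self.horizon
```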
Funding
  • This research is supported by the National Research Foundation, Prime Minister's Office, Singapore, under its IDM Futures Funding Initiative.
  • LARG research is supported in part by NSF (IIS-1637736, IIS-1651089, IIS-1724157), Intel, Raytheon, and Lockheed Martin.
Reference
  • Akamatsu, T. 1996. Cyclic flows, Markov process and stochastic traffic assignment. Transportation Research Part B: Methodological 30(5):369–386.
  • Anschel, O.; Baram, N.; and Shimkin, N. 2017. Averaged-DQN: Variance reduction and stabilization for deep reinforcement learning. In ICML, 176–185.
  • AutoPASS. 2017. Find a toll station. http://www.autopass.no/en/autopass.
  • Baillon, J.-B., and Cominetti, R. 2008. Markovian traffic equilibrium. Mathematical Programming 111(1-2):33–56.
  • BPR. 1964. Traffic assignment manual. US Department of Commerce.
  • Coulom, R. 2006. Efficient selectivity and backup operators in Monte-Carlo tree search. In ICCG, 72–83.
  • Gan, J.; An, B.; and Miao, C. 2015. Optimizing efficiency of taxi systems: Scaling-up and handling arbitrary constraints. In AAMAS, 523–531.
  • Gan, J.; An, B.; Wang, H.; Sun, X.; and Shi, Z. 2013. Optimal pricing for improving efficiency of taxi systems. In IJCAI, 2811–2818.
  • Gu, S.; Lillicrap, T.; Sutskever, I.; and Levine, S. 2016. Continuous deep Q-learning with model-based acceleration. In ICML, 2829–2838.
  • Hansen, N. 2006. The CMA evolution strategy: A comparing review. Towards a New Evolutionary Computation, 75–102.
  • Hausknecht, M., and Stone, P. 2016. Deep reinforcement learning in parameterized action space. In ICLR.
  • Huang, H.-J., and Li, Z.-C. 2007. A multiclass, multicriteria logit-based traffic equilibrium assignment model under ATIS. European Journal of Operational Research 176(3):1464–1477.
  • Joksimovic, D.; Bliemer, M. C.; Bovy, P. H.; and Verwater-Lukszo, Z. 2005. Dynamic road pricing for optimizing network performance with heterogeneous users. In ICNSC, 407–412.
  • Kocsis, L., and Szepesvári, C. 2006. Bandit based Monte-Carlo planning. In ECML, 282–293.
  • Lillicrap, T. P.; Hunt, J. J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; and Wierstra, D. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
  • Lo, H. K., and Szeto, W. Y. 2002. A methodology for sustainable traveler information services. Transportation Research Part B: Methodological 36(2):113–130.
  • Lo, H. K.; Yip, C.; and Wan, K. 2003. Modeling transfer and non-linear fare structure in multi-modal network. Transportation Research Part B: Methodological 37(2):149–170.
  • LTA, S. 2016. LTA to launch autonomous mobility-on-demand trials. https://www.lta.gov.sg/apps/news/page.aspx?c=2&id=73057d63-d07a-4229-87af-f957c7f89a27.
  • LTA, S. 2017. Electronic road pricing (ERP). https://www.lta.gov.sg/content/ltaweb/en/roads-andmotoring/managing-traffic-and-congestion/electronicroad-pricing-erp.html.
  • Lu, C.-C.; Mahmassani, H. S.; and Zhou, X. 2008. A bi-criterion dynamic user equilibrium traffic assignment model and solution algorithm for evaluating dynamic road pricing strategies. Transportation Research Part C: Emerging Technologies 16(4):371–389.
  • Maei, H. R.; Szepesvári, C.; Bhatnagar, S.; and Sutton, R. S. 2010. Toward off-policy learning control with function approximation. In ICML, 719–726.
  • Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; et al. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529–533.
  • Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In ICML, 1928–1937.
  • Moore, A. W., and Atkeson, C. G. 1993. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning 13(1):103–130.
  • Nichols, B. D., and Dracopoulos, D. C. 2014. Application of Newton's method to action selection in continuous state- and action-space reinforcement learning. In ESANN.
  • Government of Singapore. 2017. Department of Statistics, Singapore. http://www.singstat.gov.sg/.
  • Olszewski, P. 2000. Comparison of the HCM and Singapore models of arterial capacity. In TRB Highway Capacity Committee Summer Meeting.
  • Peters, J., and Schaal, S. 2008. Reinforcement learning of motor skills with policy gradients. Neural Networks 21(4):682–697.
  • Precup, D.; Sutton, R. S.; and Dasgupta, S. 2001. Off-policy temporal-difference learning with function approximation. In ICML, 417–424.
  • Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; and Moritz, P. 2015. Trust region policy optimization. In ICML, 1889–1897.
  • Sharon, G.; Hanna, J. P.; Rambha, T.; Levin, M. W.; Albert, M.; Boyles, S. D.; and Stone, P. 2017. Real-time adaptive tolling scheme for optimized social welfare in traffic networks. In AAMAS, 828–836.
  • Sutton, R. S., and Barto, A. G. 2011. Reinforcement Learning: An Introduction.
  • Sutton, R. S.; McAllester, D. A.; Singh, S. P.; and Mansour, Y. 1999. Policy gradient methods for reinforcement learning with function approximation. In NIPS, 1057–1063.
  • Watkins, C. J. C. H. 1989. Learning from Delayed Rewards. Ph.D. Dissertation, University of Cambridge, England.
  • Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3-4):229–256.
  • Xiong, Y.; Gan, J.; An, B.; Miao, C.; and Bazzan, A. L. 2015. Optimal electric vehicle charging station placement. In IJCAI, 2662–2668.
  • Xiong, Y.; Gan, J.; An, B.; Miao, C.; and Soh, Y. C. 2016. Optimal pricing for efficient electric vehicle charging station management. In AAMAS, 749–757.
  • Zhang, K.; Mahmassani, H. S.; and Lu, C.-C. 2013. Dynamic pricing, heterogeneous users and perception error: Probit-based bi-criterion dynamic stochastic user equilibrium assignment. Transportation Research Part C: Emerging Technologies 27:189–204.