A Deep Value-network Based Approach for Multi-Driver Order Dispatching

KDD 2019, pp. 1780–1790


Abstract

Recent works on ride-sharing order dispatching have highlighted the importance of taking into account both the spatial and temporal dynamics in the dispatching process for improving the transportation system efficiency. At the same time, deep reinforcement learning has advanced to the point where it achieves superhuman performance in a number of …

Introduction
  • The advent of large-scale online ride-hailing services such as Uber and DiDi Chuxing has substantially transformed the transportation landscape, offering huge opportunities for improving transportation efficiency and leading to a surge of interest in research fields such as driving route planning, demand prediction, fleet management and order dispatching.
  • An optimal decision-making policy requires taking into account both the spatial extent and the temporal dynamics of the dispatching process, since dispatching decisions can have long-term effects on the distribution of available drivers across the city.
  • Previous work [8, 22] ignores global optimality in both the spatial and temporal dimensions, e.g., by assigning the nearest driver to a passenger in a locally greedy manner or by matching drivers and passengers on a first-come-first-serve basis.
  • Although it optimizes only over the current time step, [21] demonstrates that accounting for spatial optimality alone already achieves a higher success rate of global order matches
Highlights
  • In recent years, the advent of large-scale online ride-hailing services such as Uber and DiDi Chuxing has substantially transformed the transportation landscape, offering huge opportunities for improving transportation efficiency and leading to a surge of interest in research fields such as driving route planning, demand prediction, fleet management and order dispatching
  • We introduce Cerebellar Value Networks (CVNet), based on a type of memory-based neural network known as the CMAC (Cerebellar Model Arithmetic Computer) [1]; a minimal sketch of this idea follows this list
  • This paper proposes a deep reinforcement learning based solution for order dispatching
  • A novel Semi-Markov Decision Process (SMDP) formulation is proposed for the order dispatching problem to account for temporally extended dispatching actions
  • A new network structure, the Cerebellar Value Network (CVNet), and a novel Lipschitz regularization scheme built on that structure are proposed to ensure both the robustness and the stability of value iteration during policy evaluation
  • We show that transfer learning can further improve on the previous results and facilitate the scaling of CVNet across cities
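The CMAC family [1] encodes a continuous state with several overlapping coarse tilings and represents the value as a combination of learned weights over the active cells. The sketch below is a minimal, hypothetical tile-coded value function over a 2D (lat, lng) state; the real CVNet uses a hierarchical, sparse cerebellar embedding with Lipschitz regularization, and all class names, tiling sizes and parameters here are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Minimal CMAC-style tile coding over a 2D state (e.g., normalized lat/lng).
# Hypothetical sketch only: the real CVNet uses a hierarchical sparse
# cerebellar embedding; names and tiling sizes here are illustrative.
class TileCodedValue:
    def __init__(self, n_tilings=8, tiles_per_dim=16, low=(0.0, 0.0), high=(1.0, 1.0)):
        self.n_tilings = n_tilings
        self.tiles = tiles_per_dim
        self.low = np.asarray(low)
        self.scale = self.tiles / (np.asarray(high) - self.low)
        # One weight grid per tiling; a state activates one cell per tiling.
        self.w = np.zeros((n_tilings, tiles_per_dim + 1, tiles_per_dim + 1))

    def _active_cells(self, s):
        s = (np.asarray(s) - self.low) * self.scale  # map state into tile units
        for k in range(self.n_tilings):
            offset = k / self.n_tilings             # shift each tiling slightly
            i, j = np.floor(s + offset).astype(int)
            yield k, min(i, self.tiles), min(j, self.tiles)

    def value(self, s):
        # V(s) is the average of the active weights across tilings.
        return sum(self.w[k, i, j] for k, i, j in self._active_cells(s)) / self.n_tilings

    def update(self, s, target, lr=0.1):
        # Move the active weights toward a supplied regression target.
        err = target - self.value(s)
        for k, i, j in self._active_cells(s):
            self.w[k, i, j] += (lr / self.n_tilings) * err

v = TileCodedValue()
v.update((0.31, 0.62), target=5.0)
print(v.value((0.31, 0.62)))  # nearby states now share part of this value
```

Because neighboring states share tiles, updates generalize locally, which is the property that makes this memory-based representation attractive for smooth spatial value functions.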
Methods
  • 6.1 Characteristics of CVNet

    The authors design various experiments to illustrate the robustness and spatiotemporal effect of CVNet.
  • A small γ induces a short-sighted strategy, e.g., maximizing the earnings over a one-hour horizon, while a large γ encourages long-term behavior; a toy calculation follows this list
  • This has an effect on the shape of the temporal patterns: as the figure shows, for a small γ = 0.8 the value curve moves upward temporarily during the morning rush hour, while the curves with large γ approach zero in a more monotonic manner
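To make the effect of γ concrete, the toy calculation below (not from the paper; the fees and durations are made up) shows how the same sequence of trips is valued under a small versus a large discount factor when, as in the SMDP formulation, each transition spans a multi-step trip and is discounted by γ raised to the elapsed time.

```python
# Toy illustration with hypothetical numbers: in the SMDP view each trip
# spans `dt` time steps, so a trip starting after `elapsed` steps is
# discounted by gamma ** elapsed rather than by one per-transition factor.
def discounted_return(trips, gamma):
    """trips: ordered list of (fee, duration_in_steps) pairs."""
    total, elapsed = 0.0, 0
    for fee, dt in trips:
        total += (gamma ** elapsed) * fee
        elapsed += dt
    return total

trips = [(10.0, 3), (8.0, 2), (15.0, 5)]   # made-up fees and durations
print(discounted_return(trips, 0.8))   # small gamma: later trips barely count
print(discounted_return(trips, 0.99))  # large gamma: long-term earnings matter
```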
Results
  • Results on Simulations with Real Data

    The authors first use simulations with real data collected from DiDi's platform to validate CVNet and, more importantly, to serve as a level playing field for comparing various order dispatching policies.

    The authors give brief descriptions of the policies compared against in the experiments.
  • In the experiments, CVNet Basic is compared with the online production dispatching policy in three cities across China.
  • An increase in finish rate indicates that there are fewer trip cancellations after orders are answered
  • Together these show that CVNet improves both driver income and user experience on the platform.
  • It is possible to attain a greater improvement in total driver income (TDI) while maintaining a short pickup distance; a minimal sketch of value-based dispatching follows this list
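A common way to turn a learned state-value function into a dispatching policy, consistent with the value-based policies compared above, is to score every feasible driver-order pair and solve the batch assignment as a bipartite matching. The sketch below is a hedged illustration, not the production system: `V`, `reward`, and `duration` stand in for the learned value function and trip attributes, and the driver/order fields are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical sketch of value-based batched dispatching. Each driver-order
# pair is scored with a one-step advantage derived from a learned value
# function V; the batch is then assigned via bipartite matching.
def dispatch(drivers, orders, V, reward, duration, gamma=0.99):
    score = np.zeros((len(drivers), len(orders)))
    for i, d in enumerate(drivers):
        for j, o in enumerate(orders):
            dt = duration(d, o)  # trip length in time steps (assumed helper)
            # Advantage of sending driver d to order o versus staying put:
            # immediate fee plus the discounted value of the drop-off state,
            # minus the value of the driver's current state.
            score[i, j] = reward(d, o) + gamma ** dt * V(o.destination) - V(d.location)
    rows, cols = linear_sum_assignment(-score)  # maximize the total score
    return [(drivers[i], orders[j]) for i, j in zip(rows, cols)]
```

Scoring with an advantage rather than a raw value is what lets the dispatcher trade a slightly longer pickup distance for a much better drop-off state, which is one plausible reading of the TDI-versus-pickup-distance observation above.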
Conclusion
  • The Bellman equations (1) can be used as update rules in dynamic-programming-style planning methods for finding the value function; the SMDP form is restated after this list.
  • This paper has proposed a deep reinforcement learning based solution for order dispatching.
  • The method has been shown to achieve significant improvements in both total driver income and user-experience-related metrics in large-scale online A/B tests on DiDi's ride-dispatching platform.
  • Results from extensive simulations and online A/B testing show that CVNet outperforms all the other dispatching policies.
  • The authors show that transfer learning can further improve on these results and facilitate the scaling of CVNet across cities
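For reference, the Bellman equation referred to as equation (1), in its SMDP form with temporally extended actions, can be written roughly as follows. This is a reconstruction from the surrounding description (the paper's exact notation may differ): a dispatch action started in state s_t lasts Δt steps and collects reward R_t, so the future value is discounted by γ^Δt.

```latex
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\, R_t \;+\; \gamma^{\Delta t}\, V^{\pi}\!\left(s_{t+\Delta t}\right) \;\middle|\; s_t = s \,\right]
```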
Objectives
  • Given the above SMDP and the historical trajectories H, the goal is to estimate the value of the underlying policy; a minimal sketch follows.
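Under this objective, a minimal policy-evaluation loop over the logged trajectories might look like the sketch below. It is an assumption-laden illustration, not the paper's training code: each logged transition is assumed to be a (state, reward, next_state, dt) tuple, and `value_net` is any approximator exposing `value`/`update` methods, such as the tile-coded sketch earlier.

```python
# Hypothetical SMDP TD(0) policy evaluation over logged trajectories H.
# Each transition is assumed to be (state, reward, next_state, dt), with dt
# the duration of the dispatched trip in time steps.
def evaluate_policy(H, value_net, gamma=0.99, epochs=10):
    for _ in range(epochs):
        for trajectory in H:
            for state, reward, next_state, dt in trajectory:
                # Bootstrapped target: trip reward plus the value of the
                # state reached after the temporally extended action,
                # discounted by the elapsed duration.
                target = reward + gamma ** dt * value_net.value(next_state)
                value_net.update(state, target)
```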
Tables
  • Table 1: Statistics of the training data, consisting of one month of driver trajectories and contextual features collected from three Chinese cities. Features are stored in a <key, value> format, with the key encoding the feature name, time and location
  • Table 2: Results from the online A/B test
References
  • J. S. Albus. A theory of cerebellar function. Mathematical Biosciences, 10(1-2):25–61, 1971.
  • J. A. Boyan and A. W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems 7, pages 369–376. MIT Press, 1995.
  • S. J. Bradtke and M. O. Duff. Reinforcement learning methods for continuous-time Markov decision problems. In Advances in Neural Information Processing Systems (NIPS), 1995.
  • M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier. Parseval networks: Improving robustness to adversarial examples. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 854–863, 2017.
  • G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, 2015.
  • G. E. Hinton, J. L. McClelland, and D. E. Rumelhart. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, chapter Distributed Representations, pages 77–109. MIT Press, Cambridge, MA, USA, 1986.
  • G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
  • Z. Liao. Real-time taxi dispatching using global positioning systems. Communications of the ACM, 46(5):81–83, 2003.
  • V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
  • L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, and L. Damas. On predicting the taxi-passenger demand: A real-time approach. In Portuguese Conference on Artificial Intelligence, pages 54–65, 2013.
  • A. M. Oberman and J. Calder. Lipschitz regularized deep neural networks converge and generalize. arXiv preprint arXiv:1808.09540, 2018.
  • A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016.
  • R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems (NIPS), 1996.
  • R. S. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2):181–211, 1999.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.
  • H. van Hasselt, A. Guez, and D. Silver. Deep reinforcement learning with double Q-learning. In AAAI, pages 2094–2100, 2016.
  • Z. Wang, Z. Qin, X. Tang, J. Ye, and H. Zhu. Deep reinforcement learning with knowledge transfer for online rides order dispatching. In IEEE International Conference on Data Mining. IEEE, 2018.
  • T. Xin-min, W. Yu-ting, and H. Song-chen. Aircraft taxi route planning for A-SMGCS based on discrete event dynamic system modeling. In Second International Conference on Computer Modeling and Simulation (ICCMS '10), volume 1, pages 224–228. IEEE, 2010.
  • Z. Xu, Z. Li, Q. Guan, D. Zhang, Q. Li, J. Nan, C. Liu, W. Bian, and J. Ye. Large-scale order dispatch in on-demand ride-hailing platforms: A learning and planning approach. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 905–913. ACM, 2018.
  • R. Yee. Abstraction in control learning. Technical Report COINS 92-16, University of Massachusetts, 1992.
  • L. Zhang, T. Hu, Y. Min, G. Wu, J. Zhang, P. Feng, P. Gong, and J. Ye. A taxi order dispatch model based on combinatorial optimization. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2151–2159. ACM, 2017.
  • R. Zhang and M. Pavone. Control of robotic mobility-on-demand systems: A queueing-theoretical perspective. The International Journal of Robotics Research, 35(1-3):186–203, 2016.