Deep Reinforcement Learning with Knowledge Transfer for Online Rides Order Dispatching.

ICDM, pp. 617-626, 2018

Cited by 34 | Views 187
EI
Abstract

Ride dispatching is a central operation task on a ride-sharing platform to continuously match drivers to trip-requesting passengers. In this work, we model the ride dispatching problem as a Markov Decision Process and propose learning solutions based on deep Q-networks with action search to optimize the dispatching policy for drivers on r…
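The abstract frames dispatching as an MDP solved with a deep Q-network plus action search. As a rough illustration only (this is not the authors' implementation; names such as StateActionQNet and candidate_actions are assumptions), the sketch below shows the key idea: the Q-network scores (state, action) pairs, and the dispatch decision searches over the currently feasible trip orders rather than taking an argmax over a fixed discrete action set.

```python
# Illustrative sketch of Q-learning with "action search" (not the authors' code).
import numpy as np
import torch
import torch.nn as nn

class StateActionQNet(nn.Module):
    """Q(s, a) over concatenated state and action features (illustrative)."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def dispatch_by_action_search(qnet, driver_state, candidate_actions):
    """Score every feasible trip order and dispatch the driver to the best one.

    driver_state: 1-D feature vector (e.g. location, time of day).
    candidate_actions: list of 1-D feature vectors, one per feasible trip
        (destination, estimated fee, trip duration, ...).
    Returns the index of the highest-Q candidate.
    """
    state = torch.as_tensor(driver_state, dtype=torch.float32)
    actions = torch.as_tensor(np.stack(candidate_actions), dtype=torch.float32)
    states = state.expand(actions.shape[0], -1)   # repeat the state per candidate
    with torch.no_grad():
        q_values = qnet(states, actions)
    return int(torch.argmax(q_values).item())
```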

Introduction
  • As GPS-enabled applications are widely used in the ridesharing market nowadays, massive amounts of trip data can be collected, offering huge opportunities for more intelligent services and leading to a surge of interest in research fields such as demand prediction [1], [2], driving route planning [3], [4], and order dispatching [5], [6].
  • The work in [8] aims to improve the success rate of global order matching by formulating it as a combinatorial optimization problem [9].
  • A higher success rate delivers a better user experience, but it should not be the only metric to be optimized.
  • Another previous study [10] proposed a revenue optimization method for cruising taxi drivers.
  • Since passenger trips change drivers’ locations, global optimization remains limited if the training data contains only idle cruising logs.
Highlights
  • As GPS-enabled applications are widely used in the ridesharing market nowadays, massive amounts of trip data can be collected, offering huge opportunities for more intelligent services and leading to a surge of interest in research fields such as demand prediction [1], [2], driving route planning [3], [4], and order dispatching [5], [6].
  • We propose a novel transfer learning method for order dispatching that leverages knowledge transfer from a source city, demonstrating that reusing prior models can improve training performance in the target cities.
  • We demonstrate the learning and optimization capabilities of our deep reinforcement learning approach, and the learning-speed advantage of the proposed transfer learning method, through an extensive set of experiments in Section V using real trip data from the DiDi platform.
  • This paper has proposed an adapted deep Q-network-based optimization method for order revenue on the DiDi ride-dispatching platform.
  • To combat the diversity across cities, we have evaluated two existing transfer learning methods, finetuning and progressive networks, and propose an online-feature-based adaptation method, correlated-feature progressive transfer (CFPT); a minimal finetuning sketch follows this list.
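Of the two baseline transfer methods named above, finetuning is the simplest: the Q-network trained in the source city is reloaded and training simply continues on target-city data. The sketch below is a minimal, assumed PyTorch setup; the checkpoint path, qnet_factory, and the optional freezing step are illustrative, not the paper's exact procedure.

```python
# Minimal finetuning sketch (assumed setup, not the authors' code): reuse
# the Q-network weights learned in a source city as the starting point
# for training in a target city.
import torch

def build_finetune_model(qnet_factory, source_ckpt_path, freeze=()):
    """Instantiate a target-city Q-network, load source-city weights, and
    optionally freeze named sub-modules so only the rest is finetuned."""
    qnet = qnet_factory()                          # same architecture as the source model
    qnet.load_state_dict(torch.load(source_ckpt_path))
    for name in freeze:                            # e.g. freeze=("net.0",) to fix the first layer
        for p in qnet.get_submodule(name).parameters():
            p.requires_grad = False
    return qnet

# Hypothetical usage: continue ordinary DQN training on target-city data.
# target_qnet = build_finetune_model(lambda: StateActionQNet(s_dim, a_dim),
#                                    "source_city_qnet.pt")
# optimizer = torch.optim.Adam(
#     (p for p in target_qnet.parameters() if p.requires_grad), lr=1e-4)
```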
Methods
  • The authors discuss the experiment settings and results. The authors use historical ExpressCar trip data obtained from the DiDi dispatching platform as the training data.
  • The authors normalized all state vectors with their population mean and standard deviation.
  • The authors found that this pre-processing is necessary for stable training.
  • The authors set five testing points during training: 0%, 25%, 50%, 75%, and 100%.
  • At each checkpoint of training, the authors take a snapshot of the current network and evaluate it on the testing dataset for 5 trials of 100 episodes with random initial states (see the sketch after this list).
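The preprocessing and evaluation protocol described above can be expressed compactly. The sketch below assumes NumPy arrays of state vectors and a caller-supplied run_episode routine; it mirrors the stated protocol (z-score normalization with population statistics, here approximated by training-set statistics; checkpoints at 0%, 25%, 50%, 75%, and 100% of training; 5 trials of 100 episodes per checkpoint), but it is not the authors' code.

```python
# Sketch of the pre-processing and evaluation protocol (assumed helpers).
import numpy as np

def normalize_states(train_states, eval_states):
    """Z-score every state dimension with (approximate) population statistics,
    here estimated from the training set."""
    mean = train_states.mean(axis=0)
    std = train_states.std(axis=0) + 1e-8          # guard against constant features
    return (train_states - mean) / std, (eval_states - mean) / std

def checkpoint_steps(total_steps):
    """Five evaluation checkpoints: 0%, 25%, 50%, 75% and 100% of training."""
    return [int(round(f * total_steps)) for f in (0.0, 0.25, 0.5, 0.75, 1.0)]

def evaluate_snapshot(run_episode, qnet_snapshot, test_env, n_trials=5, n_episodes=100):
    """Evaluate a frozen snapshot for n_trials trials of n_episodes episodes
    each, starting from random initial states.
    run_episode(qnet, env) -> episode return (caller-supplied)."""
    returns = [run_episode(qnet_snapshot, test_env)
               for _ in range(n_trials) for _ in range(n_episodes)]
    return float(np.mean(returns)), float(np.std(returns))
```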
Conclusion
  • This paper has proposed an adapted DQN-based optimization method for order revenue on the DiDi ride-dispatching platform.
  • To combat the diversity across cities, the authors have evaluated two existing transfer learning methods, finetuning and progressive networks, and propose an online-feature-based adaptation method, CFPT.
  • By focusing on the correlated features across different domains, CFPT achieves the most effective transfer and outperforms the other methods; a minimal sketch of the structure follows this list.
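CFPT builds on the progressive-network idea [15], but routes only the correlated (city-invariant) part of the input through a frozen column trained on the source city, whose activations feed the trainable target column through lateral connections. The sketch below is a hedged illustration under assumed layer sizes and an assumed split of the state into correlated and city-specific features; the exact wiring in the paper may differ.

```python
# Hedged sketch of a CFPT-style two-column network (illustrative, not the paper's spec).
import torch
import torch.nn as nn

class CFPTNet(nn.Module):
    """A frozen column trained on the source city sees only the correlated
    (city-invariant) features and feeds the trainable target-city column
    through lateral connections, progressive-network style."""
    def __init__(self, corr_dim, rest_dim, hidden=128):
        super().__init__()
        # Source column: pre-trained on the source city, then frozen.
        self.src1 = nn.Linear(corr_dim, hidden)
        self.src2 = nn.Linear(hidden, hidden)
        # Target column: trained on target-city data, sees the full state.
        self.tgt1 = nn.Linear(corr_dim + rest_dim, hidden)
        self.tgt2 = nn.Linear(hidden, hidden)
        self.tgt3 = nn.Linear(hidden, hidden)
        # Lateral adapters: layer i of the target also receives layer i-1
        # of the frozen source column.
        self.lat2 = nn.Linear(hidden, hidden)
        self.lat3 = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, 1)           # scalar Q-value

    def freeze_source(self):
        for m in (self.src1, self.src2):
            for p in m.parameters():
                p.requires_grad = False

    def forward(self, corr_x, rest_x):
        s1 = torch.relu(self.src1(corr_x))
        s2 = torch.relu(self.src2(s1))
        t1 = torch.relu(self.tgt1(torch.cat([corr_x, rest_x], dim=-1)))
        t2 = torch.relu(self.tgt2(t1) + self.lat2(s1))
        t3 = torch.relu(self.tgt3(t2) + self.lat3(s2))
        return self.head(t3).squeeze(-1)
```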
Tables
  • Table 1: Basic characteristics of the four Chinese cities in the experiments
References
  • [1] L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, and L. Damas, “On predicting the taxi-passenger demand: A real-time approach,” in Portuguese Conference on Artificial Intelligence. Springer, 2013, pp. 54–65.
  • [2] L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, and L. Damas, “Predicting taxi–passenger demand using streaming data,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1393–1402, 2013.
  • [3] Q. Li, Z. Zeng, B. Yang, and T. Zhang, “Hierarchical route planning based on taxi GPS-trajectories,” in Geoinformatics, 2009 17th International Conference on. IEEE, 2009, pp. 1–5.
  • [4] T. Xin-min, W. Yu-ting, and H. Song-chen, “Aircraft taxi route planning for A-SMGCS based on discrete event dynamic system modeling,” in Computer Modeling and Simulation, 2010. ICCMS’10. Second International Conference on, vol. 1. IEEE, 2010, pp. 224–228.
  • [5] J. Lee, G.-L. Park, H. Kim, Y.-K. Yang, P. Kim, and S.-W. Kim, “A telematics service system based on the Linux cluster,” in International Conference on Computational Science. Springer, 2007, pp. 660–667.
  • [6] A. Glaschenko, A. Ivaschenko, G. Rzevski, and P. Skobelev, “Multi-agent real time scheduling system for taxi companies,” in 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2009), Budapest, Hungary, 2009, pp. 29–36.
  • [7] D.-H. Lee, H. Wang, R. Cheu, and S. Teo, “Taxi dispatch system based on current demands and real-time traffic conditions,” Transportation Research Record: Journal of the Transportation Research Board, no. 1882, pp. 193–200, 2004.
  • [8] L. Zhang, T. Hu, Y. Min, G. Wu, J. Zhang, P. Feng, P. Gong, and J. Ye, “A taxi order dispatch model based on combinatorial optimization,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 2151–2159.
  • [9] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Courier Corporation, 1998.
  • [10] T. Verma, P. Varakantham, S. Kraus, and H. C. Lau, “Augmenting decisions of taxi drivers through reinforcement learning for improving revenues,” in International Conference on Automated Planning and Scheduling, 2017, pp. 409–417.
  • [11] M. E. Taylor and P. Stone, “Transfer learning for reinforcement learning domains: A survey,” Journal of Machine Learning Research, vol. 10, no. Jul, pp. 1633–1685, 2009.
  • [12] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
  • [13] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska et al., “Overcoming catastrophic forgetting in neural networks,” Proceedings of the National Academy of Sciences, vol. 114, no. 13, pp. 3521–3526, 2017.
  • [14] Y. Teh, V. Bapst, W. M. Czarnecki, J. Quan, J. Kirkpatrick, R. Hadsell, N. Heess, and R. Pascanu, “Distral: Robust multitask reinforcement learning,” in Advances in Neural Information Processing Systems, 2017, pp. 4499–4509.
  • [15] A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell, “Progressive neural networks,” arXiv preprint arXiv:1606.04671, 2016.
  • [16] E. Parisotto, J. L. Ba, and R. Salakhutdinov, “Actor-mimic: Deep multitask and transfer reinforcement learning,” arXiv preprint arXiv:1511.06342, 2015.
  • [17] I. Higgins, A. Pal, A. A. Rusu, L. Matthey, C. P. Burgess, A. Pritzel, M. Botvinick, C. Blundell, and A. Lerchner, “DARLA: Improving zero-shot transfer in reinforcement learning,” arXiv preprint arXiv:1707.08475, 2017.
  • [18] A. Maurer, M. Pontil, and B. Romera-Paredes, “The benefit of multitask representation learning,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 2853–2884, 2016.
  • [19] Z. Luo, Y. Zou, J. Hoffman, and L. F. Fei-Fei, “Label efficient learning of transferable representations across domains and tasks,” in Advances in Neural Information Processing Systems, 2017, pp. 164–176.
  • [20] Z. Xu, Z. Li, Q. Guan, D. Zhang, W. Ke, Q. Li, J. Nan, C. Liu, W. Bian, and J. Ye, “Large-scale order dispatch in on-demand ridesharing platforms: a learning and planning approach,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2018.
  • [21] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, May 1992.
  • [22] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
  • [23] Z. Wang and M. E. Taylor, “Improving reinforcement learning with confidence-based demonstrations,” in Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), August 2017.
  • [24] H. van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in AAAI, 2016, pp. 2094–2100.
  • [25] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, Cambridge, 1998, vol. 1, no. 1.
  • [26] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.