
CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms.

Proceedings of the 28th ACM International Conference on Information and Knowledge Management (2019): 1983–1992

Cited by 16 | Views 277 | EI

Abstract

How to optimally dispatch orders to vehicles and how to trade off between immediate and future returns are fundamental questions for a typical ride-hailing platform. We model ride-hailing as a large-scale parallel ranking problem and study the joint decision-making task of order dispatching and fleet management in online ride-hailing platforms...
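
To make the "large-scale parallel ranking" framing concrete, here is a toy Python sketch that scores every (order, vehicle) pair and greedily matches the highest-ranked pairs. The distance-based score and all names here are hypothetical stand-ins, not the paper's learned ranking model.

```python
# Toy order dispatching as ranking: greedily match the best-scored
# (order, vehicle) pairs. The score function is a stand-in for a learned value.
from itertools import product

def dispatch(orders, vehicles, score):
    """Return a list of (order, vehicle) matches, highest score first."""
    ranked = sorted(product(orders, vehicles),
                    key=lambda pair: score(*pair), reverse=True)
    used_orders, used_vehicles, matches = set(), set(), []
    for order, vehicle in ranked:
        if order not in used_orders and vehicle not in used_vehicles:
            matches.append((order, vehicle))
            used_orders.add(order)
            used_vehicles.add(vehicle)
    return matches

# Hypothetical usage: rank by negative pickup distance (closer is better).
distance = {("o1", "v1"): 2.0, ("o1", "v2"): 5.0,
            ("o2", "v1"): 1.0, ("o2", "v2"): 3.0}
print(dispatch(["o1", "o2"], ["v1", "v2"], lambda o, v: -distance[(o, v)]))
# [('o2', 'v1'), ('o1', 'v2')]
```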

Introduction
  • Online ride-hailing platforms such as Uber and Didi Chuxing have substantially transformed our lives by sharing and reallocating transportation resources, greatly improving transportation efficiency.
  • Hierarchical Reinforcement Learning; Multi-agent Reinforcement Learning; Ride-Hailing; Order Dispatching; Fleet Management
  • CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms.
Highlights
  • Online ride-hailing platforms such as Uber and Didi Chuxing have substantially transformed our lives by sharing and reallocating transportation resources, greatly improving transportation efficiency
  • There are two major decision-making tasks for such ride-hailing platforms, namely (i) order dispatching, i.e., matching orders and vehicles in real time to directly deliver the service to users [24, 43, 45], and (ii) fleet management, i.e., repositioning vehicles to certain areas in advance to prepare for later order dispatching [15, 21, 26]
  • As illustrated in Figure 1, we regard vehicles and orders as different molecules and aim to build up system stability by reducing their numbers through dispatching and repositioning. To address this complex criterion, we provide two novel views: (i) interconnecting order dispatching and fleet management, and (ii) jointly considering intra-district and inter-district allocation. With such a practical motivation, we focus on modeling joint order dispatching and fleet management with a multi-scale decision-making system
  • Wei et al. [37] introduced a reinforcement learning method that takes the uncertainty of future requests into account and can make look-ahead decisions to help the operator improve the global level of service of a shared-vehicle system through fleet management
  • Since each agent can only reposition vehicles located in its managing grid, we propose to formulate the problem as a Partially Observable Markov Decision Process (POMDP) [27] in a hierarchical multi-agent reinforcement learning setting for both order dispatching and fleet management (see the sketch after this list)
  • We conduct extensive experiments to evaluate the effectiveness of our proposed method in a joint order dispatching and fleet management environment
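
To make the hierarchical POMDP formulation above more tangible, here is a minimal, hypothetical sketch of the agent structure: one worker agent per grid with only a local observation, grouped under district-level managers that emit abstract goals. All class and field names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the hierarchical multi-agent structure: grid-level
# workers with partial observations, district-level managers setting goals.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Worker:
    """One agent per grid; it observes only the grid it manages (POMDP)."""
    grid_id: int
    idle_vehicles: int = 0
    open_orders: int = 0

    def observe(self) -> Dict[str, int]:
        # Partial observability: no access to the global city state.
        return {"grid": self.grid_id,
                "idle_vehicles": self.idle_vehicles,
                "open_orders": self.open_orders}

@dataclass
class Manager:
    """District-level agent coordinating the workers in its district."""
    district_id: int
    workers: List[Worker] = field(default_factory=list)

    def set_goals(self) -> Dict[int, int]:
        # Toy goal: ask over-supplied grids to shed vehicles and
        # under-supplied grids to attract them. A real manager learns this.
        return {w.grid_id: w.open_orders - w.idle_vehicles
                for w in self.workers}

# Hypothetical usage: one district with two grids.
manager = Manager(district_id=0,
                  workers=[Worker(0, idle_vehicles=5, open_orders=2),
                           Worker(1, idle_vehicles=1, open_orders=4)])
print(manager.set_goals())  # {0: -3, 1: 3}
```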
Results
  • Much prior work has modeled order dispatching and fleet management as a sequential decision-making problem and solved it with reinforcement learning (RL) [15, 30, 36, 39].
  • Most previous work deals with either order dispatching or fleet management without considering the high correlation between these two tasks, especially for large-scale ride-hailing platforms in large cities, which leads to sub-optimal performance.
  • With such a practical motivation, the authors focus on modeling joint order dispatching and fleet management with a multi-scale decision-making system.
  • Wei et al. [37] introduced a reinforcement learning method that takes the uncertainty of future requests into account and can make look-ahead decisions to help the operator improve the global level of service of a shared-vehicle system through fleet management.
  • Different from all aforementioned methods, their approach is, to the best of their knowledge, the first to consider the joint modeling of order dispatching and fleet management, and the only current work introducing and studying the multi-scale ride-hailing task.
  • The authors formulate the problem of controlling large-scale homogeneous vehicles in online ride-hailing platforms, combining the order dispatching system with the fleet management system, with the goal of maximizing city-level accumulated driver income (ADI) and order response rate (ORR).
  • Since each agent can only reposition vehicles located in its managing grid, the authors propose to formulate the problem as a Partially Observable Markov Decision Process (POMDP) [27] in a hierarchical multi-agent reinforcement learning setting for both order dispatching and fleet management.
  • Note that the manager and worker share the same multi-head attention setting; "agent" in this subsection can refer to either of them.
  • The authors adopt and extend the grid-based simulator designed by Lin et al. [15] to support joint order dispatching and fleet management.
  • Vehicles are set online and offline alternately according to a distribution learned from a real-world dataset via maximum likelihood estimation (a sketch of this sampling scheme follows this list).
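
To illustrate the last point, here is a hedged sketch of how per-time-slot online-vehicle counts could be fit by maximum likelihood and resampled in a simulator. It assumes Poisson-distributed counts, whose MLE rate is simply the sample mean; the distribution family is an assumption for illustration, not taken from the paper.

```python
# Sketch: fit per-time-slot online-vehicle counts by maximum likelihood
# (Poisson assumption) and sample a simulated day from the fitted rates.
import numpy as np

rng = np.random.default_rng(0)

def fit_online_rates(history: np.ndarray) -> np.ndarray:
    """history: (days, time_slots) counts of vehicles coming online.
    Under a Poisson model, the per-slot MLE rate is the empirical mean."""
    return history.mean(axis=0)

def sample_day(rates: np.ndarray) -> np.ndarray:
    """Draw one simulated day of online-vehicle counts per time slot."""
    return rng.poisson(rates)

# Hypothetical usage: 30 days of history over 144 ten-minute slots.
history = rng.poisson(lam=20.0, size=(30, 144))
rates = fit_online_rates(history)
print(sample_day(rates)[:5])  # counts for the first five slots
```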
Conclusion
  • The authors conduct extensive experiments to evaluate the effectiveness of the proposed method in a joint order dispatching and fleet management environment.
  • CoRide-: to further evaluate the contribution of the hierarchical setting and agent communication, the authors use CoRide without the multi-head attention mechanism as one of the baselines.
  • As shown in Figure 3, the communication mechanism operates in a hierarchical way: attention among the managers communicates and learns to collaborate abstractly and globally, while peers in the worker layer operate and determine the key grid locally (a sketch of such attention-based communication follows this list).
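
As a rough illustration of attention-based communication among manager agents, the NumPy sketch below applies scaled dot-product multi-head attention over a stack of manager state embeddings, so each manager's new state mixes in information from its peers. The dimensions, random projections, and single shared layer are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch: one round of multi-head attention communication across agents.
import numpy as np

def multi_head_attention(states, num_heads, rng):
    """states: (n_agents, d_model) embeddings; returns the same shape."""
    n, d = states.shape
    assert d % num_heads == 0, "d_model must divide evenly across heads"
    d_head = d // num_heads
    # Random projections stand in for learned weight matrices.
    w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    q, k, v = states @ w_q, states @ w_k, states @ w_v
    out = np.empty_like(states)
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        logits = q[:, cols] @ k[:, cols].T / np.sqrt(d_head)  # (n, n)
        weights = np.exp(logits - logits.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)          # softmax over peers
        out[:, cols] = weights @ v[:, cols]                    # mix peer values
    return out

# Hypothetical usage: 6 district managers with 32-dim state embeddings.
rng = np.random.default_rng(0)
managers = rng.normal(size=(6, 32))
print(multi_head_attention(managers, num_heads=4, rng=rng).shape)  # (6, 32)
```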
Tables
  • Table 1: Performance comparison of competing methods in terms of ADI and ORR with respect to the performance of RAN. For a fair comparison, the random seeds that control the dynamics of the environment are set to be the same across all methods
  • Table 2: Performance comparison of competing methods in terms of AST and TNF with three different discounted rates (DR). The numbers in Trajectory denote the grid ID in Figure 7, and their colors denote the districts the grids are located in. O and W mean the vehicle is On-service and Waiting at the current grid, respectively. Also, we use underlined numbers to denote fleet management
Funding
  • The corresponding author Weinan Zhang thanks the support of the National Natural Science Foundation of China (Grant Nos. 61702327, 61772333, 61632017)
References
  • Sanjeevan Ahilan and Peter Dayan. 2019. Feudal multi-agent hierarchies for cooperative reinforcement learning. arXiv preprint arXiv:1901.08492 (2019).
  • Pierre-Luc Bacon, Jean Harb, and Doina Precup. 2017. The Option-Critic Architecture. In AAAI. 1726–1734.
  • Richard Bellman. 2013. Dynamic programming. Courier Corporation.
  • Christian Daniel, Gerhard Neumann, and Jan Peters. 2012. Hierarchical relative entropy policy search. In Artificial Intelligence and Statistics. 273–281.
  • Peter Dayan and Geoffrey E Hinton. 1993. Feudal reinforcement learning. In Advances in Neural Information Processing Systems.
  • Thomas G Dietterich. 2000. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research 13 (2000), 227–303.
  • Carlos Florensa, Yan Duan, and Pieter Abbeel. 2017. Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012 (2017).
  • Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, and John Schulman. 2017. Meta learning shared hierarchies. arXiv preprint arXiv:1710.09767 (2017).
  • Gianpaolo Ghiani, Francesca Guerriero, Gilbert Laporte, and Roberto Musmanno. 2003. Real-time vehicle routing: Solution concepts, algorithms and parallel computing strategies. European Journal of Operational Research 151, 1 (2003), 1–11.
  • Xiangyu Kong, Bo Xin, Fangchen Liu, and Yizhou Wang. 2017. Effective master-slave communication on a multiagent deep reinforcement learning system. In Hierarchical Reinforcement Learning Workshop at the 31st Conference on NIPS, Long Beach, CA, USA.
  • Tejas D Kulkarni, Karthik Narasimhan, Ardavan Saeedi, and Josh Tenenbaum. 2016. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In Advances in Neural Information Processing Systems. 3675–3683.
  • Andrew Levy, George Konidaris, Robert Platt, and Kate Saenko. 2018. Learning multi-level hierarchies with hindsight. (2018).
  • Minne Li, Yan Jiao, Yaodong Yang, Zhichen Gong, Jun Wang, Chenxi Wang, Guobin Wu, Jieping Ye, et al. 2019. Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning. arXiv (2019).
  • Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015).
  • Kaixiang Lin, Renyu Zhao, Zhe Xu, and Jiayu Zhou. 2018. Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning. arXiv preprint arXiv:1802.06444 (2018).
  • Dominique Lord, Simon P Washington, and John N Ivan. 2005. Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory. Accident Analysis & Prevention 37, 1 (2005), 35–46.
  • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013).
  • James Munkres. 1957. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics 5, 1 (1957), 32–38.
  • Ofir Nachum, Shane Gu, Honglak Lee, and Sergey Levine. 2018. Data-Efficient Hierarchical Reinforcement Learning. arXiv preprint arXiv:1805.08296 (2018).
  • Ofir Nachum, Shixiang Gu, Honglak Lee, and Sergey Levine. 2018. Near-Optimal Representation Learning for Hierarchical Reinforcement Learning. arXiv preprint arXiv:1810.01257 (2018).
  • Takuma Oda and Yulia Tachibana. 2018. Distributed Fleet Control with Maximum Entropy Deep Reinforcement Learning. (2018).
  • Doina Precup. 2000. Temporal abstraction in reinforcement learning. University of Massachusetts Amherst.
  • Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tom Van de Wiele, Volodymyr Mnih, Nicolas Heess, and Jost Tobias Springenberg. 2018. Learning by Playing: Solving Sparse Reward Tasks from Scratch. arXiv preprint arXiv:1802.10567 (2018).
  • Kiam Tian Seow, Nam Hai Dang, and Der-Horng Lee. 2010. A collaborative multiagent taxi-dispatch system. IEEE Transactions on Automation Science and Engineering 7, 3 (2010), 607–616.
  • Olivier Sigaud and Freek Stulp. 2018. Policy Search in Continuous Action Domains: an Overview. arXiv preprint arXiv:1803.04706 (2018).
  • Hugo P Simao, Jeff Day, Abraham P George, Ted Gifford, John Nienow, and Warren B Powell. 2009. An approximate dynamic programming algorithm for large-scale fleet management: A case application. Transportation Science 43, 2 (2009), 178–197.
  • Matthijs TJ Spaan. 2012. Partially observable Markov decision processes. In Reinforcement Learning. Springer, 387–414.
  • Martin Stolle and Doina Precup. 2002. Learning options in reinforcement learning. In International Symposium on Abstraction, Reformulation, and Approximation. Springer, 212–223.
  • Richard S Sutton, Doina Precup, and Satinder Singh. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112, 1-2 (1999), 181–211.
  • Xiaocheng Tang and Zhiwei Qin. 2018. A Deep Value-network Based Approach for Multi-Driver Order Dispatching. Technical Report (2018).
  • Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J Mankowitz, and Shie Mannor. 2017. A Deep Hierarchical Approach to Lifelong Learning in Minecraft. In AAAI, Vol. 3. 6.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NeurIPS.
  • Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv (2017).
  • Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu. 2017. Feudal networks for hierarchical reinforcement learning. arXiv preprint arXiv:1703.01161 (2017).
  • Zheng Wang, Kun Fu, and Jieping Ye. 2018. Learning to estimate the travel time. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 858–866.
  • Zhaodong Wang, Zhiwei Qin, Xiaocheng Tang, Jieping Ye, and Hongtu Zhu. 2018. Deep Reinforcement Learning with Knowledge Transfer for Online Rides Order Dispatching. In 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 617–626.
  • Chong Wei, Yinhu Wang, Xuedong Yan, and Chunfu Shao. 2018. Look-Ahead Insertion Policy for a Shared-Taxi System Based on Reinforcement Learning. IEEE Access 6 (2018), 5716–5726.
  • Hua Wei, Nan Xu, Huichu Zhang, Guanjie Zheng, Xinshi Zang, Chacha Chen, Weinan Zhang, Yanmin Zhu, Kai Xu, and Zhenhui Li. 2019. CoLight: Learning Network-level Cooperation for Traffic Signal Control. arXiv preprint arXiv:1905.05717 (2019).
  • Zhe Xu, Zhixin Li, Qingwen Guan, Dingshui Zhang, Qiang Li, Junxiao Nan, Chunyang Liu, Wei Bian, and Jieping Ye. 2018. Large-Scale Order Dispatch in On-Demand Ride-Hailing Platforms: A Learning and Planning Approach. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 905–913.
  • Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, and Jun Wang. 2018. Mean Field Multi-Agent Reinforcement Learning. In ICML.
  • Fisher Yu and Vladlen Koltun. 2015. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015).
  • Huichu Zhang, Siyuan Feng, Chang Liu, Yaoyao Ding, Yichen Zhu, Zihan Zhou, Weinan Zhang, Yong Yu, Haiming Jin, and Zhenhui Li. 2019. CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario. arXiv preprint arXiv:1905.05217 (2019).
  • Lingyu Zhang, Tao Hu, Yue Min, Guobin Wu, Junying Zhang, Pengcheng Feng, Pinghua Gong, and Jieping Ye. 2017. A taxi order dispatch model based on combinatorial optimization. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2151–2159.
  • Xiangyu Zhao, Liang Zhang, Zhuoye Ding, Dawei Yin, Yihong Zhao, and Jiliang Tang. 2017. Deep Reinforcement Learning for List-wise Recommendations. arXiv preprint arXiv:1801.00209 (2017).
  • Qingnan Zou, Guangtao Xue, Yuan Luo, Jiadi Yu, and Hongzi Zhu. 2013. A novel taxi dispatch system for smart city. In International Conference on Distributed, Ambient, and Pervasive Interactions. Springer, 326–335.