Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving

RSS 2020.

Keywords:
Markov decision process, left turn, Model Predictive Control, low-level policy, deep reinforcement learning

Abstract:

Autonomous driving has achieved significant progress in recent years, but autonomous cars are still unable to tackle high-risk situations where a potential accident is likely. In such near-accident scenarios, even a minor change in the vehicle's actions may result in drastically different consequences. To avoid unsafe actions in near-accident scenarios ...

Introduction
  • Recent advances in learning models of human driving behavior have played a pivotal role in the development of autonomous vehicles.
  • Phase transitions in autonomous driving occur when small changes in the critical states – the ones the authors see in near-accident scenarios – require dramatically different actions from the autonomous car to stay safe.
  • The speed of the blue car in Fig. 1 can determine the ego car’s policy: if it slows down, the ego car can proceed forward and make the left turn; a small increase in its speed would require the ego car to stop and yield.
  • When training a policy, the algorithms must be able to visit and handle all the critical states individually, which can be computationally inefficient
Highlights
  • Recent advances in learning models of human driving behavior have played a pivotal role in the development of autonomous vehicles
  • We propose a new algorithm Hierarchical Reinforcement and Imitation Learning (H-REIL), which is composed of a high-level policy learned with reinforcement learning that switches between different modes and low-level policies learned with imitation learning, each of which represents a different mode
  • We develop a Hierarchical Reinforcement and Imitation Learning (H-REIL) approach composed of a high-level policy learned with reinforcement learning, which switches optimally between different modes, and low-level policies learned with imitation learning, which represent driving in different modes (a minimal policy sketch follows this list)
  • We proposed a novel hierarchy with reinforcement learning and imitation learning to achieve safe and efficient driving in near-accident scenarios
  • By learning low-level policies using imitation learning from drivers with different characteristics, such as different aggressiveness levels, and training a high-level reinforcement learning policy that makes the decision of which low-level policy to use, our method Hierarchical Reinforcement and Imitation Learning achieves a good tradeoff between safety and efficiency
  • Although Hierarchical Reinforcement and Imitation Learning is generalizable to any finite number of modes, we only considered n = 2
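The hierarchy described in these highlights can be summarized with a minimal, illustrative sketch: a high-level policy trained with RL picks a driving mode at each decision point, and the imitation-learned policy for that mode produces the low-level control. This is not the authors' implementation; the class and helper names (HREILAgent, rollout, the gym-style env interface) are assumptions made only for illustration.

```python
class HREILAgent:
    """Illustrative H-REIL-style hierarchy (not the authors' code)."""

    def __init__(self, high_level_policy, low_level_policies):
        # high_level_policy: observation -> mode index in {0, ..., n-1} (learned with RL)
        # low_level_policies: list of n imitation-learned policies, one per driving mode
        self.high_level_policy = high_level_policy
        self.low_level_policies = low_level_policies

    def act(self, observation):
        mode = self.high_level_policy(observation)            # discrete mode switch
        action = self.low_level_policies[mode](observation)   # continuous control from IL
        return mode, action


def rollout(env, agent, max_steps=200):
    """Run one episode with an assumed gym-style environment; return the episode reward."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        _, action = agent.act(observation)
        observation, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

With n = 2 modes as considered in the paper, low_level_policies would hold, for example, one more conservative and one more aggressive imitation-learned driver, and the high-level policy only has to choose between them at a coarse time scale.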
Methods
  • A video giving an overview of the experiments, as well as the proposed framework, is at https://youtu.be/CY24zlC HdI.
  • The authors describe the experiment settings.
  • Environment: The authors consider the environment where the ego car navigates in the presence of an ado car.
  • The framework extends to cases with multiple environment cars.
  • In order to model near-accident scenarios, the authors let the ado car employ a policy that increases the possibility of collision with the ego car, as illustrated in the sketch below
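The ado car's accident-inducing behavior is only described qualitatively above; the snippet below is a hedged sketch of one way such a policy could be scripted, with the state attributes, trigger distance, and acceleration values all being illustrative assumptions rather than the authors' settings.

```python
import numpy as np


def ado_policy(ado_state, ego_state, trigger_distance=12.0, max_accel=3.0):
    """Illustrative adversarial ado-car controller for a near-accident scenario.

    Assumes each state exposes `position` as a numpy array and that the returned
    value is a longitudinal acceleration command. The ado car cruises until the
    ego car comes close to the conflict point, then accelerates into its path.
    """
    distance_to_ego = np.linalg.norm(ado_state.position - ego_state.position)
    if distance_to_ego < trigger_distance:
        return max_accel  # rush toward the ego car's path to provoke a near-accident
    return 0.0            # otherwise maintain the current speed
```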
Results
  • Simulations: The authors compare the average episode reward, collision rate, and completion time of different methods under all scenarios with both simulators.
  • The authors compute these metrics for each model and scenario, averaged over 100 test runs (a metric-aggregation sketch follows this list).
  • The authors visualize the relation between the velocity and the position of the ego car in its nominal direction in Fig. 4 for the Halting Car and the Wrong Direction scenarios in CARLO
  • The authors selected these two scenarios for visualization because the ego car does not change direction in them
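As a concrete reading of the evaluation protocol above, the sketch below shows how episode reward, collision rate, and completion time could be averaged over 100 test runs per model and scenario; the `run_episode` helper and its return values are assumptions, not the authors' code.

```python
import numpy as np


def evaluate(env, agent, run_episode, num_runs=100):
    """Aggregate the three reported metrics over `num_runs` test episodes.

    `run_episode(env, agent)` is assumed to return a tuple
    (episode_reward: float, collided: bool, completion_time: float).
    """
    rewards, collisions, times = [], [], []
    for _ in range(num_runs):
        reward, collided, completion_time = run_episode(env, agent)
        rewards.append(reward)
        collisions.append(collided)
        times.append(completion_time)
    return {
        "avg_episode_reward": float(np.mean(rewards)),
        "collision_rate": float(np.mean(collisions)),
        "avg_completion_time": float(np.mean(times)),
    }
```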
Conclusion
  • The authors proposed a novel hierarchy with reinforcement learning and imitation learning to achieve safe and efficient driving in near-accident scenarios.
  • By learning low-level policies using IL from drivers with different characteristics, such as different aggressiveness levels, and training a high-level RL policy that makes the decision of which low-level policy to use, the method H-REIL achieves a good tradeoff between safety and efficiency.
  • The authors hand-designed the near-accident scenarios in this work.
  • Generating them automatically as in [55] could enable broader evaluation in realistic scenarios
Related work
  • Rule-based Methods. Traditional autonomous driving techniques are mostly based on manually designed rules [25, 26, 27]. However, it is tedious, if not impossible, to enumerate all the driving rules and norms needed to handle every state, so rule-based methods often cause the vehicle to drive in an unnatural manner or to fail completely in unexpected edge cases.
  • Imitation Learning (IL). ALVINN was one of the first instances of IL applied to driving [1]. Following ALVINN, Muller et al. [28] solved off-road obstacle avoidance using behavior cloning. IL learns driving policies from datasets of off-policy state-action pairs, but the resulting policies suffer from generalization problems in new test domains due to distribution shift. Ross et al. [29] address this shortcoming by iteratively extending the base dataset with on-policy state-action pairs, while still training the base policy offline on the updated dataset (sketched below). Bansal et al. [17] augment expert demonstrations with perturbations and train the IL policy with an additional loss penalizing undesired behavior. Generative Adversarial Imitation Learning [30, 31] matches the state-action occupancy between trajectories of the learned policy and the expert demonstrations.
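For readers unfamiliar with the dataset-aggregation idea attributed to Ross et al. [29] above, the following is a minimal sketch of that loop; `expert_action`, `train_policy`, and `rollout_states` are assumed helper functions, not code from the cited work.

```python
def dagger(env, expert_action, train_policy, rollout_states, num_iters=5):
    """Sketch of DAgger-style imitation learning with iterative data aggregation.

    expert_action(state)         -> expert label for a visited state
    train_policy(dataset)        -> policy trained offline on (state, action) pairs
    rollout_states(env, policy)  -> states visited when running `policy` in `env`
                                    (an expert rollout when `policy` is None)
    """
    dataset = []      # aggregated (state, action) pairs across iterations
    policy = None
    for _ in range(num_iters):
        states = rollout_states(env, policy)                # collect on-policy states
        dataset += [(s, expert_action(s)) for s in states]  # expert relabels them
        policy = train_policy(dataset)                      # retrain on the aggregate
    return policy
```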
Funding
  • The authors thank Derek Phillips for help with the CARLA simulator, as well as Wentao Zhong and Jiaqiao Zhang for additional experiments with H-REIL, and acknowledge funding by FLI grant RFP2-000
References
  • [1] Dean A Pomerleau. ALVINN: An autonomous land vehicle in a neural network. In Advances in Neural Information Processing Systems, pages 305–313, 1989.
  • [2] Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.
  • [3] Alexander Amini, Guy Rosman, Sertac Karaman, and Daniela Rus. Variational end-to-end navigation and localization. In 2019 International Conference on Robotics and Automation (ICRA), pages 8958–8964. IEEE, 2019.
  • [4] Dorsa Sadigh, S. Shankar Sastry, Sanjit A. Seshia, and Anca D. Dragan. Planning for autonomous cars that leverage effects on human actions. In Proceedings of Robotics: Science and Systems (RSS), June 2016. doi: 10.15607/RSS.2016.XII.029.
  • [5] Dorsa Sadigh, S. Shankar Sastry, Sanjit A. Seshia, and Anca Dragan. Information gathering actions over human internal state. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 66–73. IEEE, October 2016. doi: 10.1109/IROS.2016.7759036.
  • [6] Dorsa Sadigh, Nick Landolfi, S. Shankar Sastry, Sanjit A. Seshia, and Anca D. Dragan. Planning for cars that coordinate with people: Leveraging effects on human actions for planning and active information gathering over human internal state. Autonomous Robots (AURO), 42(7):1405–1426, October 2018. ISSN 1573-7527. doi: 10.1007/s10514-018-9746-1.
  • [7] Namhoon Lee, Wongun Choi, Paul Vernaza, Christopher B Choy, Philip HS Torr, and Manmohan Chandraker. DESIRE: Distant future prediction in dynamic scenes with interacting agents. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 336–345, 2017.
  • [8] Erdem Biyik and Dorsa Sadigh. Batch active preference-based learning of reward functions. In Proceedings of the 2nd Conference on Robot Learning (CoRL), volume 87 of Proceedings of Machine Learning Research, pages 519–528. PMLR, October 2018.
  • [9] Felipe Codevilla, Eder Santana, Antonio M Lopez, and Adrien Gaidon. Exploring the limitations of behavior cloning for autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision, pages 9329–9338, 2019.
  • [10] Minae Kwon, Erdem Biyik, Aditi Talati, Karan Bhasin, Dylan P. Losey, and Dorsa Sadigh. When humans aren't optimal: Robots that collaborate with risk-aware humans. In ACM/IEEE International Conference on Human-Robot Interaction (HRI), March 2020. doi: 10.1145/3319502.3374832.
  • [11] Chandrayee Basu, Erdem Biyik, Zhixun He, Mukesh Singhal, and Dorsa Sadigh. Active learning of reward dynamics from hierarchical queries. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 2019. doi: 10.1109/IROS40897.2019.8968522.
  • [12] Markus Wulfmeier, Dushyant Rao, Dominic Zeng Wang, Peter Ondruska, and Ingmar Posner. Large-scale cost function learning for path planning using deep inverse reinforcement learning. The International Journal of Robotics Research, 36(10):1073–1087, 2017.
  • [13] Konstantinos Makantasis, Maria Kontorinaki, and Ioannis Nikolos. A deep reinforcement learning driving policy for autonomous road vehicles. arXiv preprint arXiv:1905.09046, 2019.
  • [14] Ahmad EL Sallab, Mohammed Abdou, Etienne Perot, and Senthil Yogamani. Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 2017(19):70–76, 2017.
  • [15] Felipe Codevilla, Matthias Müller, Antonio Lopez, Vladlen Koltun, and Alexey Dosovitskiy. End-to-end driving via conditional imitation learning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 1–9. IEEE, 2018.
  • [16] Jiakai Zhang and Kyunghyun Cho. Query-efficient imitation learning for end-to-end simulated driving. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  • [17] Mayank Bansal, Alex Krizhevsky, and Abhijit Ogale. ChauffeurNet: Learning to drive by imitating the best and synthesizing the worst. arXiv preprint arXiv:1812.03079, 2018.
  • [18] Chenyi Chen, Ari Seff, Alain Kornhauser, and Jianxiong Xiao. DeepDriving: Learning affordance for direct perception in autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision, pages 2722–2730, 2015.
  • [19] Mariusz Bojarski, Philip Yeres, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Lawrence Jackel, and Urs Muller. Explaining how a deep neural network trained with end-to-end learning steers a car. arXiv preprint arXiv:1704.07911, 2017.
  • [20] Alex Kuefler, Jeremy Morton, Tim Wheeler, and Mykel Kochenderfer. Imitating driver behavior with generative adversarial networks. In 2017 IEEE Intelligent Vehicles Symposium (IV), pages 204–211. IEEE, 2017.
  • [21] Xinlei Pan, Yurong You, Ziyan Wang, and Cewu Lu. Virtual to real reinforcement learning for autonomous driving. arXiv preprint arXiv:1704.03952, 2017.
  • [22] Xin Huang, Stephen G McGill, Brian C Williams, Luke Fletcher, and Guy Rosman. Uncertainty-aware driver trajectory prediction at urban intersections. In 2019 International Conference on Robotics and Automation (ICRA), pages 9718–9724. IEEE, 2019.
  • [23] Matthias Müller, Alexey Dosovitskiy, Bernard Ghanem, and Vladlen Koltun. Driving policy transfer via modularity and abstraction. arXiv preprint arXiv:1804.09364, 2018.
  • [24] Brian Paden, Michal Cap, Sze Zheng Yong, Dmitry Yershov, and Emilio Frazzoli. A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Transactions on Intelligent Vehicles, 1(1):33–55, 2016.
  • [25] Wilko Schwarting, Javier Alonso-Mora, and Daniela Rus. Planning and decision-making for autonomous vehicles. Annual Review of Control, Robotics, and Autonomous Systems, 2018.
  • [26] Chris Urmson, Joshua Anhalt, Drew Bagnell, Christopher Baker, Robert Bittner, MN Clark, John Dolan, Dave Duggins, Tugrul Galatali, Chris Geyer, et al. Autonomous driving in urban environments: Boss and the Urban Challenge. Journal of Field Robotics, 25(8):425–466, 2008.
  • [27] Michael Montemerlo, Jan Becker, Suhrid Bhat, Hendrik Dahlkamp, Dmitri Dolgov, Scott Ettinger, Dirk Haehnel, Tim Hilden, Gabe Hoffmann, Burkhard Huhnke, et al. Junior: The Stanford entry in the Urban Challenge. Journal of Field Robotics, 25(9):569–597, 2008.
  • [28] Urs Muller, Jan Ben, Eric Cosatto, Beat Flepp, and Yann LeCun. Off-road obstacle avoidance through end-to-end learning. In Advances in Neural Information Processing Systems, pages 739–746, 2006.
  • [29] Stephane Ross, Geoffrey J Gordon, and J Andrew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In AISTATS, 2011.
  • [30] Jonathan Ho and Stefano Ermon. Generative adversarial imitation learning. In Advances in Neural Information Processing Systems, pages 4565–4573, 2016.
  • [31] Jiaming Song, Hongyu Ren, Dorsa Sadigh, and Stefano Ermon. Multi-agent generative adversarial imitation learning. In Advances in Neural Information Processing Systems (NIPS), pages 7461–7472. Curran Associates, Inc., December 2018.
  • [32] Pieter Abbeel and Andrew Y Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning, page 1. ACM, 2004.
  • [33] Brian D Ziebart, Andrew L Maas, J Andrew Bagnell, and Anind K Dey. Maximum entropy inverse reinforcement learning. In AAAI, volume 8, pages 1433–1438, Chicago, IL, USA, 2008.
  • [34] Sergey Levine and Vladlen Koltun. Continuous inverse optimal control with locally optimal examples. arXiv preprint arXiv:1206.4617, 2012.
  • [35] Chelsea Finn, Sergey Levine, and Pieter Abbeel. Guided cost learning: Deep inverse optimal control via policy optimization. In International Conference on Machine Learning, pages 49–58, 2016.
  • [36] Jianyu Chen, Bodi Yuan, and Masayoshi Tomizuka. Model-free deep reinforcement learning for urban autonomous driving. arXiv preprint arXiv:1904.09503, 2019.
  • [37] Fenjiro Youssef and Benbrahim Houda. Deep reinforcement learning with external control: Self-driving car application. In Proceedings of the 4th International Conference on Smart City Applications, page 58. ACM, 2019.
  • [38] Shai Shalev-Shwartz, Shaked Shammah, and Amnon Shashua. Safe, multi-agent, reinforcement learning for autonomous driving. arXiv preprint arXiv:1610.03295, 2016.
  • [39] Tommy Tram, Ivo Batkovic, Mohammad Ali, and Jonas Sjoberg. Learning when to drive in intersections by combining reinforcement learning and model predictive control. In 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 3263–3268. IEEE, 2019.
  • [40] Abhishek Gupta, Vikash Kumar, Corey Lynch, Sergey Levine, and Karol Hausman. Relay policy learning: Solving long-horizon tasks via imitation and reinforcement learning. In Proceedings of the 3rd Conference on Robot Learning (CoRL), October 2019.
  • [41] Peter Dayan and Geoffrey E Hinton. Feudal reinforcement learning. In Advances in Neural Information Processing Systems, pages 271–278, 1993.
  • [42] Tejas D Kulkarni, Karthik Narasimhan, Ardavan Saeedi, and Josh Tenenbaum. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In Advances in Neural Information Processing Systems, pages 3675–3683, 2016.
  • [43] Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu. Feudal networks for hierarchical reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, pages 3540–3549. JMLR.org, 2017.
  • [44] Freek Stulp and Stefan Schaal. Hierarchical reinforcement learning with movement primitives. In 2011 11th IEEE-RAS International Conference on Humanoid Robots, pages 231–238. IEEE, 2011.
  • [45] Robin Strudel, Alexander Pashevich, Igor Kalevatykh, Ivan Laptev, Josef Sivic, and Cordelia Schmid. Combining learned skills and reinforcement learning for robotic manipulations. arXiv preprint arXiv:1908.00722, 2019.
  • [46] Bohan Wu, Jayesh K Gupta, and Mykel Kochenderfer. Model primitives for hierarchical lifelong reinforcement learning. Autonomous Agents and Multi-Agent Systems, 34(1):1–38, 2020.
  • [47] Hoang M Le, Nan Jiang, Alekh Agarwal, Miroslav Dudík, Yisong Yue, and Hal Daumé III. Hierarchical imitation and reinforcement learning. arXiv preprint arXiv:1803.00590, 2018.
  • [48] Ahmed H. Qureshi, Jacob J. Johnson, Yuzhe Qin, Taylor Henderson, Byron Boots, and Michael C. Yip. Composing task-agnostic policies with deep reinforcement learning. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=H1ezFREtwH.
  • [49] Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel. Overcoming exploration in reinforcement learning with demonstrations. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 6292–6299. IEEE, 2018.
  • [50] Gheorghe Comanici and Doina Precup. Optimal policy switching algorithms for reinforcement learning. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS '10), pages 709–714, Richland, SC, 2010. International Foundation for Autonomous Agents and Multiagent Systems. ISBN 9780982657119.
  • [51] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  • [52] Dorsa Sadigh, Anca D. Dragan, S. Shankar Sastry, and Sanjit A. Seshia. Active preference-based learning of reward functions. In Proceedings of Robotics: Science and Systems (RSS), July 2017. doi: 10.15607/RSS.2017.XIII.053.
  • [53] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An open urban driving simulator. In Proceedings of the 1st Conference on Robot Learning (CoRL), November 2017.
  • [54] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99, 2015.
  • [55] Matthew O'Kelly, Aman Sinha, Hongseok Namkoong, Russ Tedrake, and John C Duchi. Scalable end-to-end autonomous vehicle testing via rare-event simulation. In Advances in Neural Information Processing Systems, pages 9827–9838, 2018.