Argoverse: 3D Tracking and Forecasting with Rich Maps

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), pp. 8740–8749, 2019


Abstract

We present Argoverse, a dataset designed to support autonomous vehicle perception tasks including 3D tracking and motion forecasting. Argoverse includes sensor data collected by a fleet of autonomous vehicles in Pittsburgh and Miami as well as 3D tracking annotations, 300k extracted interesting vehicle trajectories, and rich semantic maps…

Introduction
  • Datasets and benchmarks for a variety of perception tasks in autonomous driving have been hugely influential to the computer vision community over the last few years.
  • Publicly available datasets for autonomous driving rarely include map data, even though detailed maps are critical to the development of real-world autonomous systems.
  • Since publicly available datasets do not contain such rich map attributes, how best to represent and utilize these features remains an open research question.
  • The authors examine the potential utility of these new map features on two tasks, 3D tracking and motion forecasting, and offer a significant amount of real-world, annotated data to enable new benchmarks for these problems.
Highlights
  • Datasets and benchmarks for a variety of perception tasks in autonomous driving have been hugely influential to the computer vision community over the last few years
  • Publicly available datasets for autonomous driving rarely include map data, even though detailed maps are critical to the development of real-world autonomous systems
  • We examine the potential utility of these new map features on two tasks, 3D tracking and motion forecasting, and we offer a significant amount of real-world, annotated data to enable new benchmarks for these problems
  • We focus on the Average Displacement Error (ADE) and Final Displacement Error (FDE) for a prediction horizon of 3 seconds to understand which baselines are least impacted by accumulating errors (a worked sketch of these metrics follows this list)
  • Argoverse is a large dataset for autonomous driving research
  • We examine baseline forecasting methods and see that map data significantly improves accuracy
  • We examine baseline methods for 3D tracking with map-derived context
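The ADE/FDE metrics named above are the standard trajectory-forecasting measures: ADE averages the Euclidean error between predicted and ground-truth positions over every timestep of the prediction horizon, while FDE takes only the final timestep, so it isolates accumulated drift. A minimal NumPy sketch follows; the 10 Hz sampling rate (so a 3-second horizon is 30 steps) and the array shapes are our assumptions, not stated in this excerpt.

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error for one trajectory.

    pred, gt: arrays of shape (T, 2) holding (x, y) waypoints over the
    prediction horizon (e.g. T = 30 for a 3 s horizon at an assumed 10 Hz).
    """
    # Per-timestep Euclidean displacement between prediction and ground truth.
    disp = np.linalg.norm(pred - gt, axis=1)
    ade = disp.mean()   # averaged over the whole horizon
    fde = disp[-1]      # displacement at the final predicted timestep
    return ade, fde

# Toy usage: a prediction that drifts 0.1 m laterally per step.
gt = np.stack([np.arange(30, dtype=float), np.zeros(30)], axis=1)
pred = gt + np.array([0.0, 0.1]) * np.arange(1, 31)[:, None]
print(ade_fde(pred, gt))  # ADE ≈ 1.55, FDE ≈ 3.0
```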
Results
  • The authors evaluate the effect of adding social context and spatial context to improve trajectory forecasting over horizons of 1 and 3 seconds into the future.
  • The authors observe that LSTM ED+map outperforms all the other baselines for a prediction horizon of 3 seconds (a minimal sketch of the underlying encoder-decoder pattern follows this list).
  • This demonstrates the importance of a vector map for distant-future prediction and for making multimodal predictions.
  • NN+map has a lower FDE than LSTM ED+social and LSTM ED at the longer prediction horizon (3 seconds).
  • This suggests that even a shallow model working on top of a vector map works better than a deep model with social features and no vector map.
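For context on the LSTM ED baseline these results compare against: it encodes the observed track with an LSTM and then decodes future positions autoregressively. The PyTorch sketch below illustrates only that encoder-decoder pattern; the hidden size, single layer, and seeding the decoder with the last observed position are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LSTMEncoderDecoder(nn.Module):
    """Minimal trajectory forecaster: encode T_obs observed (x, y) steps,
    then roll the decoder forward for T_pred future steps."""

    def __init__(self, hidden_size=64):
        super().__init__()
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.decoder = nn.LSTMCell(input_size=2, hidden_size=hidden_size)
        self.head = nn.Linear(hidden_size, 2)  # predicts the next (x, y)

    def forward(self, obs, t_pred):
        # obs: (batch, T_obs, 2) observed positions.
        _, (h, c) = self.encoder(obs)
        h, c = h.squeeze(0), c.squeeze(0)
        step = obs[:, -1, :]            # seed with the last observed position
        out = []
        for _ in range(t_pred):         # autoregressive decoding
            h, c = self.decoder(step, (h, c))
            step = self.head(h)
            out.append(step)
        return torch.stack(out, dim=1)  # (batch, T_pred, 2)

model = LSTMEncoderDecoder()
future = model(torch.randn(8, 20, 2), t_pred=30)  # 2 s observed -> 3 s predicted
print(future.shape)  # torch.Size([8, 30, 2])
```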
Conclusion
  • Argoverse is a large dataset for autonomous driving research.
  • Unique among such datasets, Argoverse contains rich map information such as lane centerlines, ground height, and driveable area (a map-lookup sketch follows this list).
  • The authors examine baseline methods for 3D tracking with map-derived context.
  • The authors examine baseline forecasting methods and see that map data significantly improves accuracy.
  • The authors will maintain a public leaderboard for 3D object tracking and motion forecasting.
  • The sensor data, map data, annotations, and code which make up Argoverse are available at Argoverse.org.
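Ground height and driveable area lend themselves to raster storage over the city frame, and a typical use of a ground-height map in a tracking pipeline is discarding LiDAR returns near the ground before clustering objects. The sketch below shows only that lookup pattern with a purely hypothetical raster; the real Argoverse rasters' resolution, origin, and access API are not taken from this excerpt.

```python
import numpy as np

# Hypothetical raster layers over the city frame (stand-ins, not real data).
ORIGIN = np.array([-100.0, -100.0])   # city coordinates of cell (row 0, col 0)
CELL = 1.0                            # meters per cell
ground_height = np.zeros((200, 200))          # stand-in height values
driveable = np.zeros((200, 200), dtype=bool)  # stand-in driveable-area mask

def query(xy):
    """Return (ground height, is driveable) at a city-frame (x, y) point."""
    col, row = np.floor((np.asarray(xy) - ORIGIN) / CELL).astype(int)
    return ground_height[row, col], driveable[row, col]

# Example use: drop LiDAR returns within 30 cm of the ground, a common
# preprocessing step before clustering points into tracked objects.
points = np.random.uniform(-50.0, 50.0, size=(1000, 3))   # (x, y, z)
above_ground = np.array([p[2] > query(p[:2])[0] + 0.3 for p in points])
non_ground_points = points[above_ground]
print(non_ground_points.shape)
```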
Tables
  • Table 1: Public self-driving datasets. We compare recent, publicly available self-driving datasets with 3D object annotations for tracking. Coverage area for nuScenes is based on its road and sidewalk raster map. Argoverse coverage area is based on our driveable area raster map.
  • Table 2: Tracking accuracy at different ranges. From top to bottom, accuracy for objects within 100 m, 50 m, and 30 m.
  • Table 3: Forecasting errors for different prediction horizons. Baseline definitions (a sketch of the centerline coordinate transform follows this list):
    - … hypothesized centerlines
    - LSTM ED: LSTM encoder-decoder model with input $(x_i^t, y_i^t)$ for $t \in \{1, \ldots, T_{obs}\}$ and output $(x_i^t, y_i^t)$ for $t \in \{T_{obs}+1, \ldots, T_{pred}\}$
    - LSTM ED+social: same as LSTM ED but with input $(x_i^t, y_i^t, s_i^t)$, where $s_i^t$ denotes social features
    - LSTM ED+map(oracle): same as LSTM ED but with input $(a_i^t, o_i^t, m_i^t)$ and output $(a_i^t, o_i^t)$, where $m_i^t$ denotes map features obtained from the oracle centerline; the coordinates $(a_i^t, o_i^t)$ are mapped back to $(x_i^t, y_i^t)$ for evaluation
    - LSTM ED+map: same as LSTM ED+map(oracle) but using the top-K hypothesized centerlines
    - LSTM ED+social+map(oracle): same as LSTM ED+map(oracle) but with input features $(a_i^t, o_i^t, s_i^t, m_i^t)$
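The centerline coordinates $(a_i^t, o_i^t)$ in Table 3 are curvilinear: $a$ is the along-track distance from the start of the centerline and $o$ the lateral offset from it. Below is a minimal sketch of that projection onto a centerline polyline; it is our own construction, not the authors' code, and the sign convention for $o$ is an assumption.

```python
import numpy as np

def to_curvilinear(point, centerline):
    """Project an (x, y) point onto a centerline polyline and return (a, o):
    along-track distance from the polyline start and signed lateral offset.
    A sketch of the transform named in Table 3, not the authors' code."""
    best = (np.inf, 0.0, 0.0)  # (squared distance, a, o) of closest segment
    arclen = 0.0               # along-track distance accumulated so far
    for p, q in zip(centerline[:-1], centerline[1:]):
        seg = q - p
        seg_len = np.linalg.norm(seg)
        # Clamped projection parameter of `point` onto segment p -> q.
        t = np.clip(np.dot(point - p, seg) / seg_len**2, 0.0, 1.0)
        foot = p + t * seg
        d2 = np.sum((point - foot) ** 2)
        if d2 < best[0]:
            # Assumed sign convention: positive offset to the left of travel.
            z = seg[0] * (point[1] - p[1]) - seg[1] * (point[0] - p[0])
            best = (d2, arclen + t * seg_len, np.sign(z) * np.sqrt(d2))
        arclen += seg_len
    return best[1], best[2]

line = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0]])
print(to_curvilinear(np.array([4.0, 1.0]), line))  # (4.0, 1.0): 4 m along, 1 m left
```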
Related work
  • Autonomous Driving Datasets with Map Information. Until recently, it was rare to find datasets that provide detailed map information associated with annotated data. Works such as TorontoCity [36] and ApolloScape [18] focus on map construction tasks but without 3D annotation for dynamic objects. The nuScenes dataset [5] contains maps in the form of binary, rasterized, top-down indicators of region of interest (where region of interest is the union of driveable area and sidewalk). This map information is provided for 1000 annotated vehicle log segments (or “scenes”) in Singapore and Boston. Like nuScenes, Argoverse includes maps of driveable area, but we also include ground height and a “vector map” of lane centerlines and their connectivity.
Funding
  • We thank our Argo AI colleagues – Ben Ballard, Brett Browning, Alex Bury, Dave Chekan, Kunal Desai, Patrick Gray, Larry Jackson, Etienne Jacques, Gang Pan, Kevin Player, Peter Rander, Bryan Salesky, Philip Tsai, Ian Volkwein, Ersin Yumer and many more – for their invaluable assistance in supporting Argoverse. Patsorn Sangkloy is supported by a Royal Thai Government Scholarship
  • James Hays receives research funding from Argo AI, which is developing products related to the research described in this paper
References
  • [1] Alexandre Alahi, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei, and Silvio Savarese. Social LSTM: Human trajectory prediction in crowded spaces. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • [2] Mykhaylo Andriluka, Stefan Roth, and Bernt Schiele. People-tracking-by-detection and people-detection-by-tracking. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
  • [3] Andrew Bacha, Cheryl Bauman, Ruel Faruque, Michael Fleming, Chris Terwelp, Charles Reinholtz, Dennis Hong, Al Wicks, Thomas Alberi, David Anderson, Stephen Cacciola, Patrick Currier, Aaron Dalton, Jesse Farmer, Jesse Hurdus, Shawn Kimmel, Peter King, Andrew Taylor, David Van Covern, and Mike Webster. Odin: Team VictorTango's entry in the DARPA Urban Challenge. J. Field Robot., 25(8):467–492, Aug. 2008.
  • [4] Keni Bernardin and Rainer Stiefelhagen. Evaluating multiple object tracking performance: The CLEAR MOT metrics. EURASIP J. Image and Video Processing, 2008.
  • [5] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. arXiv preprint arXiv:1903.11027, 2019.
  • [6] Sergio Casas, Wenjie Luo, and Raquel Urtasun. IntentNet: Learning to predict intention from raw sensor data. In Aude Billard, Anca Dragan, Jan Peters, and Jun Morimoto, editors, Proceedings of The 2nd Conference on Robot Learning, volume 87 of Proceedings of Machine Learning Research, pages 947–956. PMLR, 29–31 Oct 2018.
  • [7] Chenyi Chen, Ari Seff, Alain Kornhauser, and Jianxiong Xiao. DeepDriving: Learning affordance for direct perception in autonomous driving. In The IEEE International Conference on Computer Vision (ICCV), 2015.
  • [8] Nachiket Deo and Mohan M. Trivedi. Convolutional social pooling for vehicle trajectory prediction. arXiv preprint arXiv:1805.06771, 2018.
  • [9] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD), pages 226–231. AAAI Press, 1996.
  • [10] Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research, 32(11):1231–1237, 2013.
  • [11] Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, and Alexandre Alahi. Social GAN: Socially acceptable trajectories with generative adversarial networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [12] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2980–2988, Oct 2017.
  • [13] Simon Hecker, Dengxin Dai, and Luc Van Gool. End-to-end learning of driving models with surround-view cameras and route planners. In European Conference on Computer Vision (ECCV), 2018.
  • [14] David Held, Devin Guillory, Brice Rebsamen, Sebastian Thrun, and Silvio Savarese. A probabilistic framework for real-time 3D segmentation using spatial, temporal, and semantic cues. In Proceedings of Robotics: Science and Systems, 2016.
  • [15] David Held, Jesse Levinson, and Sebastian Thrun. Precision tracking with sparse 3D and dense color 2D data. In ICRA, 2013.
  • [16] David Held, Jesse Levinson, Sebastian Thrun, and Silvio Savarese. Combining 3D shape, color, and motion for robust anytime tracking. In Proceedings of Robotics: Science and Systems, Berkeley, USA, July 2014.
  • [17] M. Himmelsbach and H.-J. Wünsche. LiDAR-based 3D object perception. In Proceedings of 1st International Workshop on Cognition for Technical Systems, 2008.
  • [18] Xinyu Huang, Xinjing Cheng, Qichuan Geng, Binbin Cao, Dingfu Zhou, Peng Wang, Yuanqing Lin, and Ruigang Yang. The ApolloScape dataset for autonomous driving. arXiv:1803.06184, 2018.
  • [19] Namhoon Lee, Wongun Choi, Paul Vernaza, Christopher Bongsoo Choy, Philip H. S. Torr, and Manmohan Krishna Chandraker. DESIRE: Distant future prediction in dynamic scenes with interacting agents. CoRR, abs/1704.04394, 2017.
  • [20] John Leonard, Jonathan How, Seth Teller, Mitch Berger, Stefan Campbell, Gaston Fiore, Luke Fletcher, Emilio Frazzoli, Albert Huang, Sertac Karaman, Olivier Koch, Yoshiaki Kuwata, David Moore, Edwin Olson, Steve Peters, Justin Teo, Robert Truax, Matthew Walter, David Barrett, Alexander Epstein, Keoni Maheloni, Katy Moyer, Troy Jones, Ryan Buckley, Matthew Antone, Robert Galejs, Siddhartha Krishnamurthy, and Jonathan Williams. A perception-driven autonomous urban vehicle. J. Field Robot., 25(10):727–774, Oct. 2008.
  • [21] Jesse Levinson, Jake Askeland, Jan Becker, Jennifer Dolson, David Held, Sören Kammel, J. Zico Kolter, Dirk Langer, Oliver Pink, Vaughan R. Pratt, Michael Sokolsky, Ganymed Stanek, David Michael Stavens, Alex Teichman, Moritz Werling, and Sebastian Thrun. Towards fully autonomous driving: Systems and algorithms. In IEEE Intelligent Vehicles Symposium (IV), Baden-Baden, Germany, June 5–9, 2011, pages 163–168, 2011.
  • [22] Justin Liang and Raquel Urtasun. End-to-end deep structured models for drawing crosswalks. In The European Conference on Computer Vision (ECCV), September 2018.
  • [23] Xingyu Liu, Charles R. Qi, and Leonidas J. Guibas. FlowNet3D: Learning scene flow in 3D point clouds. arXiv preprint arXiv:1806.01411, 2019.
  • [24] Wenjie Luo, Bin Yang, and Raquel Urtasun. Fast and Furious: Real time end-to-end 3D detection, tracking and motion forecasting with a single convolutional net. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [25] Suraj M S, Hugo Grimmett, Lukas Platinsky, and Peter Ondruska. Visual vehicle tracking through noise and occlusions using crowd-sourced maps. In Intelligent Robots and Systems (IROS), 2018 IEEE International Conference on, pages 4531–4538. IEEE, 2018.
  • [26] Yuexin Ma, Xinge Zhu, Sibo Zhang, Ruigang Yang, Wenping Wang, and Dinesh Manocha. TrafficPredict: Trajectory prediction for heterogeneous traffic-agents. In Proceedings of the 33rd National Conference on Artificial Intelligence, AAAI'19. AAAI Press, 2019.
  • [27] Will Maddern, Geoffrey Pascoe, Chris Linegar, and Paul Newman. 1 year, 1000 km: The Oxford RobotCar dataset. The International Journal of Robotics Research, 36(1):3–15, 2017.
  • [28] A. Milan, L. Leal-Taixé, I. Reid, S. Roth, and K. Schindler. MOT16: A benchmark for multi-object tracking. arXiv:1603.00831 [cs], Mar. 2016.
  • [29] Michael Montemerlo, Jan Becker, Suhrid Bhat, Hendrik Dahlkamp, Dmitri Dolgov, Scott Ettinger, Dirk Haehnel, Tim Hilden, Gabe Hoffmann, Burkhard Huhnke, Doug Johnston, Stefan Klumpp, Dirk Langer, Anthony Levandowski, Jesse Levinson, Julien Marcil, David Orenstein, Johannes Paefgen, Isaac Penny, Anna Petrovskaya, Mike Pflueger, Ganymed Stanek, David Stavens, Antone Vogt, and Sebastian Thrun. Junior: The Stanford entry in the Urban Challenge. J. Field Robot., 25(9):569–597, Sept. 2008.
  • [30] Gaurav Pandey, James R. McBride, and Ryan M. Eustice. Ford Campus vision and lidar data set. Int. J. Rob. Res., 30(13):1543–1552, Nov. 2011.
  • [31] Abhishek Patil, Srikanth Malla, Haiming Gang, and Yi-Ting Chen. The H3D dataset for full-surround 3D multi-object detection and tracking in crowded urban scenes. In International Conference on Robotics and Automation, 2019.
  • [32] Luis Patino, Tom Cane, Alain Vallee, and James Ferryman. PETS 2016: Dataset and challenge. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1–8, 2016.
  • [33] Akshay Rangesh, Kevan Yuen, Ravi Kumar Satzoda, Rakesh Nattoji Rajaram, Pujitha Gunaratne, and Mohan M. Trivedi. A multimodal, full-surround vehicular testbed for naturalistic studies and benchmarking: Design, calibration and deployment. CoRR, abs/1709.07502, 2017.
  • [34] Xibin Song, Peng Wang, Dingfu Zhou, Rui Zhu, Chenye Guan, Yuchao Dai, Hao Su, Hongdong Li, and Ruigang Yang. ApolloCar3D: A large 3D car instance understanding benchmark for autonomous driving. CoRR, abs/1811.12222, 2018.
  • [35] Christopher Urmson, Joshua Anhalt, J. Andrew (Drew) Bagnell, Christopher R. Baker, Robert E. Bittner, John M. Dolan, David Duggins, David Ferguson, Tugrul Galatali, Hartmut Geyer, Michele Gittleman, Sam Harbaugh, Martial Hebert, Thomas Howard, Alonzo Kelly, David Kohanbash, Maxim Likhachev, Nick Miller, Kevin Peterson, Raj Rajkumar, Paul Rybski, Bryan Salesky, Sebastian Scherer, Young-Woo Seo, Reid Simmons, Sanjiv Singh, Jarrod M. Snider, Anthony (Tony) Stentz, William (Red) L. Whittaker, and Jason Ziglar. Tartan Racing: A multi-modal approach to the DARPA Urban Challenge. Technical report, Carnegie Mellon University, Pittsburgh, PA, April 2007.
  • [36] Shenlong Wang, Min Bai, Gellert Mattyus, Hang Chu, Wenjie Luo, Bin Yang, Justin Liang, Joel Cheverie, Sanja Fidler, and Raquel Urtasun. TorontoCity: Seeing the world with a million eyes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
  • [37] Weiyue Wang, Ronald Yu, Qiangui Huang, and Ulrich Neumann. SGPN: Similarity group proposal network for 3D point cloud instance segmentation. In CVPR, 2018.
  • [38] Bin Yang, Ming Liang, and Raquel Urtasun. HDNet: Exploiting HD maps for 3D object detection. In Aude Billard, Anca Dragan, Jan Peters, and Jun Morimoto, editors, Proceedings of The 2nd Conference on Robot Learning, volume 87 of Proceedings of Machine Learning Research, pages 146–155. PMLR, 29–31 Oct 2018.