It Is Not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction

European Conference on Computer Vision (ECCV), pp. 759-776, 2020.


Abstract:

Human trajectory forecasting with multiple socially interacting agents is of critical importance for autonomous navigation in human environments, e.g., for self-driving cars and social robots. In this work, we present Predicted Endpoint Conditioned Network (PECNet) for flexible human trajectory prediction. PECNet infers distant trajecto...

Introduction
  • Predicting the movement of dynamic objects is a central problem for autonomous agents, be it humans, social robots [1], or self-driving cars [2].
  • Humans have the will to exert causal forces to change their motion and constantly adjust their paths as they navigate around obstacles to achieve their goals [4].
  • This complicated planning process is partially internal, and makes predicting human trajectories from observations challenging.
  • A multitude of aspects should be taken into account beyond just past movement history, for instance latent predetermined goals, other moving agents in the scene, and social behavioral patterns.
Highlights
  • Stanford Drone Dataset: Table 2 shows the results of our proposed method against the previous baselines & state-of-the-art methods
  • Our proposed method achieves a superior performance compared to the previous state-of-the-art [38, 39] on both Average Displacement Error (ADE) & Final Displacement Error (FDE) metrics by a significant margin of 19.5% (ADE) & 26.8% (FDE)
  • In this work we presented Predicted Endpoint Conditioned Network (PECNet), a Pedestrian endpoint Conditioned trajectory prediction network
  • We showed that PECNet predicts rich and diverse multimodal socially compliant trajectories across a variety of scenes
  • We performed extensive ablations on several design choices such as endpoint conditioning position, number of samples, and choice of training signal to pinpoint the performance gains from PECNet
Methods
  • The authors aim to tackle the task of human trajectory prediction by reasoning about all the humans in the scene jointly while respecting social norms.
  • In the second step, after an endpoint has been estimated for each pedestrian, the authors jointly consider the past histories $\{T_p^k\}_{k=1}^{\alpha}$ of all the pedestrians $\{p_k\}_{k=1}^{\alpha}$ present in the scene and their estimated endpoints $\{G_k\}_{k=1}^{\alpha}$ to predict socially compliant future trajectories $T_f^k$. In the rest of this section, the authors describe the approach in detail: an endpoint estimation VAE samples the future endpoints $G$, and a trajectory prediction module uses the sampled endpoints $G_k$ to predict $T_f$.
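The two-step structure described above (sample an endpoint per pedestrian, then predict the path conditioned on it) can be illustrated with a minimal sketch. This is not PECNet's implementation: the endpoint VAE is replaced here by a hypothetical constant-velocity guess plus Gaussian noise, and the trajectory prediction module by straight-line interpolation that ignores social interactions. The names `sample_endpoints`, `predict_future`, `horizon`, and `noise_scale` are assumptions made for illustration only.

```python
import numpy as np

def sample_endpoints(past, k, horizon, rng, noise_scale=1.0):
    """Stand-in for the endpoint VAE (an assumption, not the paper's model).

    past: array of shape (num_agents, t_obs, 2) with observed positions.
    Returns k sampled endpoints per agent, shape (k, num_agents, 2).
    """
    velocity = past[:, -1] - past[:, -2]              # last observed step per agent
    mean_endpoint = past[:, -1] + horizon * velocity  # constant-velocity guess
    noise = rng.normal(scale=noise_scale, size=(k,) + mean_endpoint.shape)
    return mean_endpoint[None] + noise

def predict_future(past, endpoints, horizon):
    """Stand-in for the trajectory prediction module (an assumption).

    Interpolates linearly from each agent's last observed position to each
    sampled endpoint. Returns shape (k, num_agents, horizon, 2).
    """
    start = past[:, -1]                               # (num_agents, 2)
    steps = np.arange(1, horizon + 1) / horizon       # fractions in (0, 1]
    delta = endpoints[:, :, None, :] - start[None, :, None, :]
    return start[None, :, None, :] + steps[None, None, :, None] * delta
```

The point of the sketch is only the data flow: every sampled endpoint yields one complete future trajectory that ends at that endpoint, so diversity in the endpoints translates directly into multimodal predictions.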
Results
  • The authors compare and discuss the method’s performance against the above-mentioned baselines on the ADE & FDE metrics.
  • Even without the proposed social pooling module (Our-S), the authors achieve very good performance, underlining the importance of future endpoint conditioning in trajectory prediction.
  • Without social pooling (Our-S), the performance is still superior to the state-of-the-art by 34.6%, underlining the usefulness of conditioning on the endpoint in the method.
  • In Figure 6, the authors present animations of several socially compliant predictions.
  • Both visualizations together show that, along with producing state-of-the-art results, PECNet can perform rich multi-modal multi-agent forecasting.
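For readers unfamiliar with the two metrics used throughout the results, the following sketch shows how ADE and FDE are commonly computed for one pedestrian, together with a best-of-K variant for multi-modal predictions. The function names are hypothetical, and the convention of taking the minimum ADE and the minimum FDE independently over the K samples is an assumption; some evaluations select a single best sample instead.

```python
import numpy as np

def ade_fde(pred, gt):
    """ADE and FDE for a single predicted trajectory.

    pred, gt: arrays of shape (t_future, 2).
    ADE is the mean Euclidean error over all future steps,
    FDE is the Euclidean error at the final step.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)
    return dists.mean(), dists[-1]

def best_of_k(preds, gt):
    """Minimum ADE and minimum FDE over K sampled trajectories.

    preds: array of shape (k, t_future, 2); gt: array of shape (t_future, 2).
    """
    dists = np.linalg.norm(preds - gt[None], axis=-1)  # (k, t_future)
    return dists.mean(axis=1).min(), dists[:, -1].min()
```

The units of the result follow the units of the input coordinates, which is why the same metrics can be reported in pixels on one dataset and in metres on another.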
Conclusion
  • In this work the authors presented PECNet, a Pedestrian endpoint Conditioned trajectory prediction network.
  • The authors showed that PECNet predicts rich and diverse multimodal socially compliant trajectories across a variety of scenes.
  • The authors performed extensive ablations on several design choices such as endpoint conditioning position, number of samples, and choice of training signal to pinpoint the performance gains from PECNet. The authors introduced the “truncation trick” [42] for trajectory prediction, a simple method for boosting trajectory prediction accuracy in the few-shot regime.
  • The authors benchmarked PECNet across multiple datasets including Stanford Drone Dataset [5], ETH [6], and UCY [7], in all of which PECNet achieved state-of-the-art performance.
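The truncation trick mentioned above comes from [42], where latent samples are drawn from a truncated normal distribution so that low-probability latents are avoided. The sketch below shows that generic idea by resampling out-of-range values; the exact truncation rule and threshold PECNet uses are not given in this summary, so `threshold` and the function name are assumptions.

```python
import numpy as np

def truncated_normal(shape, threshold, rng):
    """Draw standard-normal latents and resample any entry whose
    magnitude exceeds `threshold` (hypothetical parameter)."""
    z = rng.standard_normal(shape)
    mask = np.abs(z) > threshold
    while mask.any():
        # Redraw only the out-of-range entries until all fall inside.
        z[mask] = rng.standard_normal(int(mask.sum()))
        mask = np.abs(z) > threshold
    return z
```

A smaller threshold concentrates samples near the mode of the latent prior, which tends to trade diversity for accuracy when only a few samples are drawn.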
Tables
  • Table 1: Network architecture details for
  • Table 2: Comparison of our method against several recently published multi-modal baselines and the previous state-of-the-art method (denoted by *) on the Stanford Drone Dataset [5]. ‘Our-S’ represents an ablation of our method without social pooling. We report results for both ADE & FDE in pixels for both K = 5 and K = 20. Lower is better.
  • Table 3: Quantitative results for various previously published methods and the state-of-the-art method (denoted by *) on commonly used trajectory prediction datasets. Both ADE and FDE are reported in metres in world coordinates. ‘Our-S’ represents an ablation of our method without social pooling.
Related work
  • There have been many previous studies [8] on how to forecast pedestrians’ trajectories and predict their behaviors. Several previous works propose to learn statistical behavioral patterns from the observed motion trajectories [9,10,11,12,13,14,15,16,17,18] for future trajectory prediction. Since then, many studies have developed models to account for agent interactions that may affect the trajectory, specifically through scene and/or social information. Recently, there has been a significant focus on multimodal trajectory prediction to capture different possible future trajectories given the past. There has also been some research on goal-directed path planning, which considers pedestrians’ goals while predicting a path.

    2.1 Context-Based Prediction

    Many previous studies have incorporated environment semantics, such as crosswalks, roads, or traffic lights, into their proposed trajectory prediction schemes. Kitani et al. [19] encoded agent-space interactions with a Markov Decision Process (MDP) to predict potential trajectories for an agent. Ballan et al. [20] leveraged a dynamic Bayesian network to construct motion dependencies and patterns from training data and transferred the trained knowledge to testing data. With the great success of deep neural networks, the Recurrent Neural Network (RNN) has become a popular modeling approach for sequence learning. Kim et al. [21] trained an RNN combining multiple Long Short-Term Memory (LSTM) units to predict the location of nearby cars. These approaches incorporate rich environment cues from the RGB image of the scene for pedestrians’ trajectory forecasting.
Funding
  • We show that PECNet improves state-of-the-art performance on the Stanford Drone trajectory prediction benchmark by ~19.5% and on the ETH/UCY benchmark by ~40.8%
  • PECNet is about 16x faster than Social GAN [29] at inference (under 18 milliseconds versus 296 milliseconds per forward pass) and quick enough for practical, real-time use in autonomous agents. It performs significantly better than other image-based methods, primarily because extracting a global context vector for the image with a pretrained network trained for other vision tasks, as proposed in previous works [31], is perhaps a sub-optimal method for merging image information into trajectory prediction.
Study subjects and analysis
unique pedestrians: 11000
The dataset consists of 20 scenes captured using a drone in top-down view around the university campus, containing several moving agents like humans and vehicles. It consists of over 11,000 unique pedestrians capturing over 185,000 interactions between agents and over 40,000 interactions between the agent and scene [5]. We use the standard train-test split as used in [29, 31, 39] and other previous works.

samples: 20
Hence, predicting the last observed way-point allows for lower prediction error than way-points in the middle! This, in a nutshell, confirms the motivation of this work. Effect of Number of samples (K): All the previous works use K = 20 samples (except DESIRE, which uses K = 5) to evaluate the multi-modal predictions for metrics ADE & FDE. Referring to Figure 4, we see the expected decreasing trend in ADE & FDE with time as K increases.
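One reason the decreasing trend with K is expected: under best-of-K evaluation the reported error is a minimum over the samples, and a minimum over a larger set can never be larger. The sketch below illustrates this with made-up per-sample errors; the uniform range is purely hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-sample ADE values for 20 sampled trajectories.
errors = rng.uniform(5.0, 30.0, size=20)

# best_so_far[i] is the best-of-K error when only the first i + 1 samples are used.
best_so_far = np.minimum.accumulate(errors)
```

Here `best_so_far` is non-increasing by construction, so any method evaluated this way improves or stays flat as K grows; what distinguishes methods is how quickly the curve drops.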

The previous state-of-the-art [39] achieves an ADE of 12.58 using K = 20 samples, which is matched by PECNet at half the number of samples.

However, in this work, except for this section, all the results are reported without the truncation trick to promote diversity in samples. Note that PECNet doesn’t use the RGB image of the scene directly, keeping it extremely lightweight: a forward pass at inference takes under 18 milliseconds (on an unoptimized implementation) compared to 296 milliseconds for Social GAN [29], with a batch size of 1 and K = 20 samples on an NVIDIA V100 GPU. Thus, PECNet is 16x faster and quick enough for practical, real-time use in autonomous agents.
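The 16x figure follows directly from the two timings quoted above. A quick check, treating the 18 millisecond figure as an upper bound on PECNet's forward pass (variable names are illustrative):

```python
pecnet_ms = 18.0       # upper bound quoted for a PECNet forward pass
social_gan_ms = 296.0  # quoted for Social GAN under the same setting

speedup = social_gan_ms / pecnet_ms      # lower bound on the speedup
passes_per_second = 1000.0 / pecnet_ms   # lower bound on throughput

print(f"speedup: {speedup:.1f}x")                   # speedup: 16.4x
print(f"passes per second: {passes_per_second:.1f}")  # passes per second: 55.6
```

Because 18 milliseconds is an upper bound, both numbers are conservative: the actual speedup is at least about 16.4x.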

Reference
  • Maren Bennewitz, Wolfram Burgard, and Sebastian Thrun. Learning motion patterns of persons for mobile service robots. In Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No. 02CH37292), volume 4, pages 3601–3606. IEEE, 2002.
  • S Thrun, W Burgard, and D Fox. Probabilistic robotics (intelligent robotics and autonomous agents series), ser. intelligent robotics and autonomous agents, 2005.
  • Chris L Baker, Rebecca Saxe, and Joshua B Tenenbaum. Action understanding as inverse planning. Cognition, 113(3):329–349, 2009.
  • Brian D Ziebart, Nathan Ratliff, Garratt Gallagher, Christoph Mertz, Kevin Peterson, J Andrew Bagnell, Martial Hebert, Anind K Dey, and Siddhartha Srinivasa. Planning-based prediction for pedestrians. In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3931–3936. IEEE, 2009.
  • Alexandre Robicquet, Amir Sadeghian, Alexandre Alahi, and Silvio Savarese. Learning social etiquette: Human trajectory understanding in crowded scenes. In European conference on computer vision, pages 549–565.
  • Stefano Pellegrini, Andreas Ess, and Luc Van Gool. Improving data association by joint modeling of pedestrian trajectories and groupings. In European conference on computer vision, pages 452–465.
  • Alon Lerner, Yiorgos Chrysanthou, and Dani Lischinski. Crowds by example. In Computer graphics forum, volume 26, pages 655–664. Wiley Online Library, 2007.
  • Andrey Rudenko, Luigi Palmieri, Michael Herman, Kris M. Kitani, Dariu M. Gavrila, and Kai O. Arras. Human Motion Trajectory Prediction: A Survey. arXiv e-prints, 2019.
  • Eckhard Kruse and Friedrich M Wahl. Camera-based observation of obstacle motions to derive statistical data for mobile robot motion planning. In Proceedings. 1998 IEEE International Conference on Robotics and Automation (Cat. No. 98CH36146), volume 1, pages 662–667. IEEE, 1998.
  • Lin Liao, Dieter Fox, Jeffrey Hightower, Henry Kautz, and Dirk Schulz. Voronoi tracking: Location estimation using sparse and noisy sensor data. In Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003)(Cat. No. 03CH37453), volume 1, pages 723–728. IEEE, 2003.
  • Maren Bennewitz, Wolfram Burgard, Grzegorz Cielniak, and Sebastian Thrun. Learning motion patterns of people for compliant robot motion. The International Journal of Robotics Research, 24(1):31–48, 2005.
  • Meng Keat Christopher Tay and Christian Laugier. Modelling smooth paths using gaussian processes. In Field and Service Robotics, pages 381–390.
  • Eugen Kafer, Christoph Hermes, Christian Wohler, Helge Ritter, and Franz Kummert. Recognition of situation classes at road intersections. In 2010 IEEE International Conference on Robotics and Automation, pages 3960–3965. IEEE, 2010.
  • Georges Aoude, Joshua Joseph, Nicholas Roy, and Jonathan How. Mobile agent trajectory prediction using bayesian nonparametric reachability trees. In Infotech@ Aerospace 2011, page 1512. 2011.
  • Christoph G Keller and Dariu M Gavrila. Will the pedestrian cross? a study on pedestrian path prediction. IEEE Transactions on Intelligent Transportation Systems, 15(2):494–506, 2013.
  • Michael Goldhammer, Konrad Doll, Ulrich Brunsmann, Andre Gensler, and Bernhard Sick. Pedestrian’s trajectory forecast in public traffic with artificial neural networks. In 2014 22nd International Conference on Pattern Recognition, pages 4110–4115. IEEE, 2014.
  • Shuang Xiao, Zhan Wang, and John Folkesson. Unsupervised robot learning to predict person motion. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 691–696. IEEE, 2015.
  • Tomasz Piotr Kucner, Martin Magnusson, Erik Schaffernicht, Victor Hernandez Bennetts, and Achim J Lilienthal. Enabling flow awareness for mobile robots in partially observable environments. IEEE Robotics and Automation Letters, 2(2):1093–1100, 2017.
  • Kris M Kitani, Brian D Ziebart, James Andrew Bagnell, and Martial Hebert. Activity forecasting. In European Conference on Computer Vision, pages 201–214.
  • Lamberto Ballan, Francesco Castaldo, Alexandre Alahi, Francesco Palmieri, and Silvio Savarese. Knowledge transfer for scene-specific motion prediction. In European Conference on Computer Vision, pages 697–713.
  • ByeoungDo Kim, Chang Mook Kang, Jaekyum Kim, Seung Hi Lee, Chung Choo Chung, and Jun Won Choi. Probabilistic vehicle trajectory prediction over occupancy grid map via recurrent neural network. In 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), pages 399–404. IEEE, 2017.
  • Dirk Helbing and Peter Molnar. Social force model for pedestrian dynamics. Physical review E, 51(5):4282, 1995.
  • Ramin Mehran, Alexis Oyama, and Mubarak Shah. Abnormal crowd behavior detection using social force model. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 935–942. IEEE, 2009.
  • Kota Yamaguchi, Alexander C Berg, Luis E Ortiz, and Tamara L Berg. Who are you with and where are you going? In CVPR 2011, pages 1345–1352. IEEE, 2011.
  • Alexandre Alahi, Vignesh Ramanathan, and Li Fei-Fei. Socially-aware large-scale crowd forecasting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2203–2210, 2014.
  • Sepp Hochreiter and Jurgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
  • Alexandre Alahi, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei, and Silvio Savarese. Social lstm: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 961–971, 2016.
  • Namhoon Lee, Wongun Choi, Paul Vernaza, Christopher B Choy, Philip HS Torr, and Manmohan Chandraker. Desire: Distant future prediction in dynamic scenes with interacting agents. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 336–345, 2017.
  • Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, and Alexandre Alahi. Social gan: Socially acceptable trajectories with generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2255–2264, 2018.
  • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
  • Amir Sadeghian, Vineet Kosaraju, Ali Sadeghian, Noriaki Hirose, Hamid Rezatofighi, and Silvio Savarese. Sophie: An attentive gan for predicting paths compliant to social and physical constraints. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1349–1358, 2019.
  • Haosheng Zou, Hang Su, Shihong Song, and Jun Zhu. Understanding human behaviors in crowds by imitating the decision-making process. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • Jonathan Ho and Stefano Ermon. Generative adversarial imitation learning. In Advances in neural information processing systems, pages 4565–4573, 2016.
  • Eike Rehder and Horst Kloeden. Goal-directed pedestrian prediction. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 50–58, 2015.
  • Eike Rehder, Florian Wirth, Martin Lauer, and Christoph Stiller. Pedestrian prediction by planning using deep neural networks. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 1–5. IEEE, 2018.
  • Nicholas Rhinehart, Rowan McAllister, Kris Kitani, and Sergey Levine. Precog: Prediction conditioned on goals in visual multi-agent settings. arXiv preprint arXiv:1905.01296, 2019.
  • Jiachen Li, Hengbo Ma, and Masayoshi Tomizuka. Conditional generative neural system for probabilistic trajectory prediction. arXiv preprint arXiv:1905.01631, 2019.
  • Apratim Bhattacharyya, Michael Hanselmann, Mario Fritz, Bernt Schiele, and Christoph-Nikolas Straehle. Conditional flow variational autoencoders for structured sequence prediction. arXiv preprint arXiv:1908.09008, 2019.
  • Nachiket Deo and Mohan M Trivedi. Trajectory forecasts in unknown environments conditioned on grid-based plans. arXiv preprint arXiv:2001.00735, 2020.
  • Amir Sadeghian, Vineet Kosaraju, Agrim Gupta, Silvio Savarese, and Alexandre Alahi. Trajnet: Towards a benchmark for human trajectory prediction. arXiv preprint, 2018.
  • Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7794–7803, 2018.
  • Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.