Modeling Human Motion with Quaternion-based Neural Networks

International Journal of Computer Vision, no. 4 (2020): 855-872

Cited by: 32 | Views: 148 | EI

Summary: We investigate both recurrent and convolutional architectures and evaluate on short-term prediction and long-term generation.

Abstract

Previous work on predicting or generating 3D human pose sequences regresses either joint rotations or joint positions. The former strategy is prone to error accumulation along the kinematic chain, as well as discontinuities when using Euler angles or exponential maps as parameterizations. The latter requires re-projection onto skeleton constraints to avoid bone stretching and invalid configurations. …

Highlights
  • Modeling human motion is useful for many applications, including human action recognition (Du et al, 2015), action detection (Gu et al, 2018), and action anticipation (Kitani et al, 2012a)
  • Deep learning-based approaches have been successful in other pattern recognition tasks (Krizhevsky et al, 2012; Hinton et al, 2012; Bahdanau et al, 2015), and they have been studied for the prediction of sequences of 3D-skeleton joint positions (i.e. 3D human pose), both for short-term (Fragkiadaki et al, 2015; Martinez et al, 2017) and long-term modeling (Holden et al, 2016, 2017)
  • We introduce a version of QuaterNet based on a convolutional neural network and compare it to the original recurrent neural network approach
  • A recent trend in sequence modeling consists of replacing Recurrent Neural Networks (RNN) with convolutional neural networks (CNN) for tasks that were typically tackled with the former
  • Quaternion-valued RNNs (Parcollet et al, 2018a) and CNNs (Zhu et al, 2018; Gaudet and Maida, 2018; Parcollet et al, 2018b) have been proposed, with promising results on tasks with long-range dependencies such as speech recognition; these architectures would also be interesting for human motion modeling
  • We propose QuaterNet, a neural network architecture based on quaternions for rotation parameterization, an aspect overlooked in previous work (a small forward-kinematics sketch follows this list)
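To make the quaternion parameterization and the "error accumulation along the kinematic chain" concrete, below is a minimal numpy sketch of forward kinematics over a quaternion-parameterized skeleton. It is our own illustration under assumed conventions (the joint order, parent indices, bone offsets, and the helper names qmul/qrot/forward_kinematics are all ours), not the released QuaterNet code: a rotation error at any joint rotates everything below it, which is why losses computed on positions obtained through forward kinematics are attractive.

    import numpy as np

    def qmul(q, r):
        # Hamilton product of quaternions stored as (w, x, y, z).
        w1, x1, y1, z1 = q
        w2, x2, y2, z2 = r
        return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                         w1*x2 + x1*w2 + y1*z2 - z1*y2,
                         w1*y2 - x1*z2 + y1*w2 + z1*x2,
                         w1*z2 + x1*y2 - y1*x2 + z1*w2])

    def qrot(q, v):
        # Rotate 3-D vector v by unit quaternion q: q * (0, v) * conj(q).
        qv = np.concatenate(([0.0], v))
        qc = q * np.array([1.0, -1.0, -1.0, -1.0])
        return qmul(qmul(q, qv), qc)[1:]

    def forward_kinematics(quats, offsets, parents):
        # quats: (J, 4) unit quaternions, one local rotation per joint.
        # offsets: (J, 3) bone vectors expressed in the parent's frame.
        # parents[j]: parent joint index (-1 for the root); parents precede children.
        # Returns (J, 3) joint positions. Any rotation error at a joint propagates
        # to all of its descendants, which is the error accumulation noted above.
        J = len(parents)
        pos = np.zeros((J, 3))
        world = np.zeros((J, 4))
        for j in range(J):
            if parents[j] == -1:
                world[j] = quats[j]
                pos[j] = offsets[j]
            else:
                world[j] = qmul(world[parents[j]], quats[j])
                pos[j] = pos[parents[j]] + qrot(world[parents[j]], offsets[j])
        return pos

    # Tiny 3-joint chain (root -> knee -> foot) with identity rotations.
    parents = [-1, 0, 1]
    offsets = np.array([[0.0, 0.0, 0.0], [0.0, -0.4, 0.0], [0.0, -0.4, 0.0]])
    quats = np.tile([1.0, 0.0, 0.0, 0.0], (3, 1))
    print(forward_kinematics(quats, offsets, parents))

Running the toy example prints the root, knee and foot positions of a straight chain; perturbing the root quaternion moves every joint below it.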
Tables
  • Table1: Results under the standard protocol (Fragkiadaki et al, 2015), with 4 samples per sequence. We show the mean angle error for short-term motion prediction on Human 3.6M for different actions: simple baselines (top), previous RNN results (middle), QuaterNet (bottom). Bold indicates the best result, underlined indicates the second best. abs. = model absolute rotations, vel. = model velocities, TF = teacher forcing (a sketch of the angle-error metric follows this list)
  • Table2: Results under our proposed protocol, with 128 samples per sequence compared to 4 samples as in Table 1. We show the error for all 15 actions, as well as the average across actions
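The tables above report a "mean angle error". The exact definition is given in the paper body; a commonly used form in this line of work, and the one we assume in this sketch, is the per-frame Euclidean distance between predicted and ground-truth Euler-angle vectors, averaged over frames (the array names are hypothetical):

    import numpy as np

    def mean_angle_error(pred_euler, gt_euler):
        # pred_euler, gt_euler: arrays of shape (frames, joints * 3),
        # Euler angles in radians for each predicted / ground-truth frame.
        diff = pred_euler - gt_euler
        # Euclidean distance in angle space per frame, averaged over frames.
        return float(np.mean(np.linalg.norm(diff, axis=-1)))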
Related work
  • The modeling of human motion relies on data from motion capture. This technology acquires sequences of 3-dimensional joint positions at high frame rates (120 Hz – 1 kHz) and enables a wide range of applications, such as performance animation in movies and video games, and motion generation. In that context, the task of generating human motion sequences has been addressed with different strategies, ranging from purely concatenation-based approaches (Arikan et al, 2003) and concatenate-and-blend methods (Treuille et al, 2007) to hidden Markov models (Tanco and Hilton, 2000), switching linear dynamic systems (Pavlovic et al, 2000), restricted Boltzmann machines (Taylor et al, 2006), Gaussian processes (Wang et al, 2008), and random forests (Lehrmann et al, 2014).

    Recently, Recurrent Neural Networks (RNN) have been applied to short-term (Fragkiadaki et al, 2015; Martinez et al, 2017) and long-term prediction (Zhou et al, 2018). Convolutional networks (Holden et al, 2016; Li et al, 2018a) and feed-forward networks (Holden et al, 2017) have been successfully applied to long-term generation of locomotion. Early work took great care in choosing a model expressing the inter-dependence between joints (Jain et al, 2016), while recent work favors universal approximators (Martinez et al, 2017; Butepage et al, 2017; Holden et al, 2016, 2017).

    Besides choosing the neural architecture, framing the pose prediction task is equally important. Specifically, the choice of input and output variables, their representation, and the loss function used for training are particularly impactful, as we show in our experiments. Equally important are the control variables conditioning motion generation. Long-term generation is a highly underspecified task with high uncertainty. In practice, animators for movies and games are interested in motion generators that can be conditioned on high-level controls such as trajectories and velocities (Holden et al, 2017), style (Li et al, 2018b) or action classes (Kiasari et al, 2018). Game development tools typically rely on classical move trees (Menache, 1999), which allow for a wide range of controls and excellent run-time efficiency. These advantages come at the cost of a high development effort to handle all possible action transitions. The development cost of move trees makes learning-based approaches an attractive area of research.
Funding
  • The database was created with funding from NSF EIA-0196217
Study subjects and analysis
samples: 4
This exact methodology is adopted by Liu et al (2016); Martinez et al (2017); Pavllo et al (2018b); Gui et al (2018), which makes the quantitative results across these papers comparable. However, using only four samples results in a very high variance of the test results, as we show next.

samples: 4
To quantify the issue, we compute the zero-velocity baseline (Martinez et al, 2017) for an increasing number of samples per sequence. Figure 7 shows that four samples per sequence are not enough, since the error can vary by 10% (0.395 – 0.435) between the 25th and 75th quantile for the average over all actions (Figure 7(b), average error after 80 ms).

samples: 128
This range can be reduced to 1.7% (0.413 – 0.420) with 128 samples, a number we believe to be a good compromise between variance and computational effort. Finally, we compare different approaches under the new protocol.
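The spread above can be reproduced, in spirit, by resampling evaluation windows. The sketch below is our own illustration, not the paper's evaluation code: per_window_error, the sequence ids and the toy numbers are hypothetical stand-ins for precomputed zero-velocity-baseline errors; it only shows why averaging 4 windows per sequence is much noisier than averaging 128.

    import numpy as np

    def quantile_spread(per_window_error, k, trials=1000, seed=0):
        # per_window_error: dict mapping sequence id -> 1-D array of angle errors,
        # one entry per candidate evaluation window in that test sequence.
        # Returns the 25th and 75th percentile of the reported mean error when
        # k windows are drawn per sequence, over `trials` random draws.
        rng = np.random.default_rng(seed)
        means = []
        for _ in range(trials):
            per_seq = [rng.choice(err, size=k, replace=False).mean()
                       for err in per_window_error.values()]
            means.append(np.mean(per_seq))
        return np.percentile(means, [25, 75])

    # Toy data standing in for precomputed zero-velocity-baseline errors
    # (15 actions, 500 candidate windows each).
    rng = np.random.default_rng(1)
    errors = {action: 0.4 + 0.1 * rng.standard_normal(500) for action in range(15)}
    print(quantile_spread(errors, k=4))    # wide 25th-75th spread
    print(quantile_spread(errors, k=128))  # much tighter spread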

pairs: 5
We selected only workers with "master" status. Each task compares 5 pairs of clips in which the two methods are presented in random order, and contains a control pair with an obvious flaw so that unreliable workers can be excluded.
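As a rough sketch of this setup (our own illustration, not the authors' crowdsourcing pipeline; the clip names, the "left"/"right" encoding and both function names are hypothetical), a task can be assembled by shuffling the on-screen order of each pair, and a worker is kept only if they prefer the good clip of the control pair:

    import random

    def make_task(pairs, control_pair, seed=0):
        # pairs: 5 (clip_a, clip_b) tuples comparing the two methods;
        # control_pair: (good_clip, broken_clip) with an obvious flaw.
        rng = random.Random(seed)
        items = []
        for a, b in pairs + [control_pair]:
            flip = rng.random() < 0.5          # randomize on-screen order
            items.append({"left": b if flip else a,
                          "right": a if flip else b,
                          "control_good": a if (a, b) == control_pair else None})
        rng.shuffle(items)
        return items

    def worker_is_reliable(items, choices):
        # choices[i] is "left" or "right", the clip the worker preferred.
        for item, choice in zip(items, choices):
            if item["control_good"] is not None:
                return item[choice] == item["control_good"]
        return False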

Recurrent architecture (figure caption). "QMul" stands for quaternion multiplication: if included, it forces the model to output velocities; if bypassed, the model emits absolute rotations. The center block (in yellow) is the recurrent backbone of the network.
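To make the recurrent layout and the QMul switch concrete, here is a minimal PyTorch-style sketch. It is our own simplification under assumed shapes and hyper-parameters (the class name, the two-layer GRU, the hidden size and the input layout are ours), not the released QuaterNet implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def qmul(q, r):
        # Batched Hamilton product; both tensors end with a (w, x, y, z) axis.
        w1, x1, y1, z1 = q.unbind(-1)
        w2, x2, y2, z2 = r.unbind(-1)
        return torch.stack((
            w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2), dim=-1)

    class RecurrentPoseModel(nn.Module):
        def __init__(self, n_joints, hidden=1000, model_velocities=True):
            super().__init__()
            self.n_joints = n_joints
            self.model_velocities = model_velocities
            self.rnn = nn.GRU(n_joints * 4, hidden, num_layers=2, batch_first=True)
            self.out = nn.Linear(hidden, n_joints * 4)

        def forward(self, prev_pose, hidden=None):
            # prev_pose: (batch, time, n_joints * 4) unit quaternions of input frames.
            h, hidden = self.rnn(prev_pose, hidden)
            q = self.out(h).view(h.shape[0], h.shape[1], self.n_joints, 4)
            q = F.normalize(q, dim=-1)                  # keep outputs on the unit sphere
            if self.model_velocities:
                prev = prev_pose.view(q.shape[0], q.shape[1], self.n_joints, 4)
                q = F.normalize(qmul(q, prev), dim=-1)  # QMul: compose delta with previous pose
            return q.flatten(-2), hidden

    # Toy forward pass: batch of 2 sequences, 10 frames, 32 joints.
    model = RecurrentPoseModel(n_joints=32)
    x = F.normalize(torch.randn(2, 10, 32, 4), dim=-1).flatten(-2)
    y, _ = model(x)
    print(y.shape)  # torch.Size([2, 10, 128])

The model_velocities flag mirrors the abs./vel. distinction in the tables above: with it enabled the network predicts a delta rotation per frame that is composed with the previous pose, and with it disabled it regresses absolute rotations directly.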

Reference
  • Akhter I, Black MJ (2015) Pose-conditioned joint angle limits for 3D human pose reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • Arikan O, Forsyth DA, O’Brien JF (2003) Motion synthesis from annotations. In: ACM Transactions on Graphics (SIGGRAPH)
  • Ba JL, Kiros JR, Hinton GE (2016) Layer normalization. arXiv preprint arXiv:1607.06450
  • Badler NI, Phillips CB, Webber BL (1993) Simulating humans: computer graphics animation and control. Oxford University Press
  • Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: International Conference on Learning Representations (ICLR)
  • Bengio S, Vinyals O, Jaitly N, Shazeer N (2015) Scheduled sampling for sequence prediction with recurrent neural networks. In: Advances in Neural Information Processing Systems (NIPS)
  • Bengio Y, Ducharme R, Vincent P, Jauvin C (2003) A neural probabilistic language model. Journal of Machine Learning Research
  • Butepage J, Black MJ, Kragic D, Kjellstrom H (2017) Deep representation learning for human motion prediction and classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
  • Butepage J, Kjellstrom H, Kragic D (2018) Anticipating many futures: Online human motion prediction and generation for human-robot interaction. In: IEEE International Conference on Robotics and Automation (ICRA), pp 1–9
  • Byravan A, Fox D (2017) SE3-nets: Learning rigid body motion using deep neural networks. In: IEEE International Conference on Robotics and Automation (ICRA)
  • Chao YW, Yang J, Price BL, Cohen S, Deng J (2017) Forecasting human dynamics from static images. In: Conference on Computer Vision and Pattern Recognition (CVPR)
  • Cho K, Van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Conference on Empirical Methods in Natural Language Processing (EMNLP)
  • Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. NIPS Deep Learning and Representation Learning Workshop
  • CMU (2003) CMU graphics lab motion capture database. http://mocap.cs.cmu.edu
  • Shoemake K (1985) Animating rotation with quaternion curves. Transactions on Computer Graphics (SIGGRAPH)
  • Stoer J, Bulirsch R (1993) Introduction to Numerical Analysis. Springer-Verlag
  • Tanco LM, Hilton A (2000) Realistic synthesis of novel human movements from a database of motion. In: Workshop on Human Motion (HUMO)
  • Taylor GW, Hinton GE, Roweis ST (2006) Modeling human motion using binary latent variables. In: Advances in Neural Information Processing Systems (NIPS)
  • Toyer S, Cherian A, Han T, Gould S (2017) Human pose forecasting via deep Markov models. In: International Conference on Digital Image Computing: Techniques and Applications (DICTA)
  • Treuille A, Lee Y, Popovic Z (2007) Near-optimal character animation with continuous control. ACM Transactions on Graphics (TOG) 26(3):7
  • Villegas R, Yang J, Zou Y, Sohn S, Lin X, Lee H (2017) Learning to generate long-term future via hierarchical prediction. In: International Conference on Machine Learning (ICML)
  • Villegas R, Yang J, Ceylan D, Lee H (2018) Neural kinematic networks for unsupervised motion retargetting. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp 8639–8648
  • Walker J, Doersch C, Gupta A, Hebert M (2016) An uncertain future: Forecasting from static images using variational autoencoders. In: European Conference on Computer Vision (ECCV)
  • Walker J, Marino K, Gupta A, Hebert M (2017) The pose knows: Video forecasting by generating pose futures. In: International Conference on Computer Vision (ICCV)
  • Wang JM, Fleet DJ, Hertzmann A (2008) Gaussian process dynamical models for human motion. Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
  • Wang Z, Chai J, Xia S (2018) Combining recurrent neural networks and adversarial training for human motion synthesis and control. arXiv preprint arXiv:1806.08666
  • Wiseman S, Rush AM (2016) Sequence-to-sequence learning as beam-search optimization. In: Conference on Empirical Methods in Natural Language Processing (EMNLP)
  • Xia S, Wang C, Chai J, Hodgins J (2015) Realtime style transfer for unlabeled heterogeneous human motion. In: ACM Transactions on Graphics (SIGGRAPH)
  • Zhou F, De la Torre F, Hodgins JK (2013) Hierarchical aligned cluster analysis for temporal clustering of human motion. Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
  • Zhou X, Sun X, Zhang W, Liang S, Wei Y (2016a) Deep kinematic pose regression. In: European Conference on Computer Vision (ECCV) Workshops
  • Zhou X, Wan Q, Zhang W, Xue X, Wei Y (2016b) Model-based deep hand pose estimation. In: IJCAI
  • Zhou Y, Li Z, Xiao S, He C, Li H (2018) Autoconditioned LSTM network for extended complex human motion synthesis. In: International Conference on Learning Representations (ICLR)
  • Zhu X, Xu Y, Xu H, Chen C (2018) Quaternion convolutional neural networks. In: European Conference on Computer Vision (ECCV)