MobiPose: real-time multi-person pose estimation on mobile devices

Jinrui Zhang
Xiaohui Xu
Fucheng Jia

SenSys '20: The 18th ACM Conference on Embedded Networked Sensor Systems, Virtual Event, Japan, November 2020, pp. 136-149.


Abstract

Human pose estimation is a key technique for many vision-based mobile applications. Yet existing multi-person pose-estimation methods fail to achieve a satisfactory user experience on commodity mobile devices such as smartphones, due to their long model-inference latency. In this paper, we propose MobiPose, a system designed to enable real-time multi-person pose estimation on commodity mobile devices.

Introduction
  • Pose Estimation (PE) aims to localize the key joints of human bodies in images or videos, such as knees and wrists [29].
  • With the recent advances in deep learning, state-of-the-art methods of PE usually employ the Convolutional Neural Network (CNN) model to estimate human poses in images or videos [7, 9, 40, 41].
  • Those deep-learning-based PE methods, however, fail to achieve satisfactory performance on resource-limited mobile devices like smartphones.
  • Benefiting from the advances in deep learning, state-of-the-art methods employ CNN models to conduct both human-detection and PE tasks.
Highlights
  • Pose Estimation (PE) aims to localize the key joints of human bodies in images or videos, such as knees and wrists [29]
  • The results show that MobiPose can reduce the end-to-end system latency by 60% on average compared with the baseline system, with a latency reduction of 10%, 20%, 30% by the motion vectors (MVs)-based tracker, the optimized PE model and the parallel execution of CPU and GPU, respectively
  • We plan to improve the performance of our system from the following aspects: 1) use the H.265 codec to obtain finer-grained MVs, helping the MV-based tracker accurately track key points across frames; 2) conduct more evaluations to investigate the impact of system parameters, such as frame size and camera resolution, on the MV-based tracker; 3) optimize the tail latency caused by the Convolutional Neural Network (CNN)-based detector; 4) use bottom-up solutions for multi-person PE to address the occlusion issue; 5) design a load-aware scheduler which can dynamically assign tasks to the CPU and GPU according to their instantaneous workload
  • We have presented MobiPose, a real-time multi-person PE system running on mobile devices
  • MobiPose estimates the human poses in live videos captured by cameras of mobile devices
  • For a given set of computation tasks, i.e., 100 frames with 3 persons per frame, Figure 11(e) shows that MobiPose achieves a 62.5% and 37.9% reduction on average in total energy consumption compared to the baseline running on the mobile CPU and GPU, respectively
  • To alleviate the substantial computing overhead involved in multiperson PE, MobiPose employs three novel techniques to improve the efficiency, i.e., the motion-vector-based method for fast location of the human proposals across frames, a mobile-friendly PE model with low latency and sufficient accuracy, and an efficient parallel PE engine
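The motion-vector-based proposal tracking described above can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the `propagate_box` helper and its macroblock size parameter are our own assumptions, and it simply shifts a person's bounding box by the median codec motion vector found inside it.

```python
import numpy as np

def propagate_box(box, mv_field, block=16):
    """Shift a bounding box by the median motion vector inside it.

    box: (x0, y0, x1, y1) in pixels; mv_field: (H/block, W/block, 2)
    per-macroblock (dx, dy) motion vectors decoded from the video codec.
    """
    x0, y0, x1, y1 = box
    # Macroblock rows/cols covered by the box (ceiling for the far edge).
    r0, r1 = y0 // block, -(-y1 // block)
    c0, c1 = x0 // block, -(-x1 // block)
    region = mv_field[r0:r1, c0:c1].reshape(-1, 2)
    # Median is more robust than the mean to background noise in the box.
    dx, dy = np.median(region, axis=0)
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)
```

The real tracker additionally filters background noise and periodically re-calibrates with the PE model's output, as the paper notes.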
Results
  • The authors first evaluate the key components of MobiPose, in terms of accuracy and latency of the MV-based tracker and the optimized PE model, and the speedup of the parallel execution using both GPU and CPU.
  • The results show that MobiPose can reduce the end-to-end system latency by 60% on average compared with the baseline system, with a latency reduction of 10%, 20%, 30% by the MV-based tracker, the optimized PE model and the parallel execution of CPU and GPU, respectively.
  • For PE accuracy, the authors use percentage of correct keypoints (PCKh) as the metric.
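The PCKh metric mentioned above is standard: a predicted joint counts as correct if it lies within a fraction alpha (0.5 for PCKh@0.5) of the person's head-segment length from the ground truth. A minimal sketch, with a function name and array layout of our own choosing:

```python
import numpy as np

def pckh(pred, gt, head_sizes, alpha=0.5):
    """Fraction of predicted keypoints within alpha * head-segment
    length of the ground truth.

    pred, gt: (N, K, 2) keypoint coordinates for N persons, K joints;
    head_sizes: (N,) head-segment lengths per person.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)          # (N, K) pixel errors
    thresh = alpha * np.asarray(head_sizes)[:, None]   # (N, 1) per-person cutoff
    return float((dist <= thresh).mean())
```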
Conclusion
  • Generality of MobiPose: MobiPose is a software optimization solution that can be deployed to various mobile and IoT devices beyond the three devices used in this paper.
  • It is difficult to generalize an implementation for one specialized processing unit to another
  • To avoid this issue, the authors implement MobiPose on the GPU and CPU, which are widely embedded in modern SoCs; the ideas in MobiPose, i.e., dispatching model-inference tasks onto heterogeneous cores to accelerate parallel model inference, can be readily applied to specific processing units. In this paper, the authors have presented MobiPose, a real-time multi-person PE system running on mobile devices.
  • Comprehensive experiments demonstrate that MobiPose effectively improves the inference performance by 4.5× and 2.8× and saves 62.5% and 37.9% energy consumption per frame on the mobile CPU and GPU, respectively, while achieving a higher accuracy compared to the baseline
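The dispatching idea, i.e., letting CPU and GPU backends drain a shared queue of per-person crops so the faster processor naturally picks up more work, can be sketched as follows. This is not the authors' engine; the two `run_on_*` callables are stand-ins for real model-inference backends.

```python
import queue
import threading

def parallel_infer(crops, run_on_cpu, run_on_gpu):
    """Run one inference per crop, with a CPU and a GPU worker
    draining a shared work queue in parallel."""
    tasks, results = queue.Queue(), {}
    for i, crop in enumerate(crops):
        tasks.put((i, crop))

    def worker(run):
        # Pull crops until the queue is empty; the faster backend
        # simply ends up processing more of them.
        while True:
            try:
                i, crop = tasks.get_nowait()
            except queue.Empty:
                return
            results[i] = run(crop)

    threads = [threading.Thread(target=worker, args=(r,))
               for r in (run_on_cpu, run_on_gpu)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results[i] for i in range(len(crops))]
```

A load-aware scheduler, as item 5 of the authors' future work suggests, would weight this dispatch by each processor's instantaneous workload rather than relying on queue draining alone.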
Tables
  • Table1: Latency and accuracy of single-person pose estimation on Snapdragon 845
  • Table2: The performance of the PE model (FP32) with different input resolutions on Snapdragon 845
  • Table3: Hardware configurations of the three mobile devices used in experiments
  • Table4: Accuracy of our PE Model and PoseNet
  • Table5: Performance improvement of parallel execution of our PE model using both GPU and CPU
  • Table6: Accuracy comparison for multi-person PE
  • Table7: Speedup of MobiPose compared to baseline
Related Work
  • Object Tracking in Videos: Several solutions have been proposed to track objects across consecutive frames [8, 27, 42, 48]. Ujiie et al. [42] use motion vectors to interpolate the bounding boxes of objects between key frames. Wu et al. [48] exploit a smaller CNN network to analyze information-sparse motion vectors to compensate for the results extracted from key frames, improving the accuracy of action recognition. From the perspective of system design, both Chen et al. [8] and Liu et al. [27] exploit motion vectors to track cars and road signs through frames, based on a software solution and the Nvidia codec, respectively. These works are designed for undeformed objects without changes in shape or pose, while the MV-based tracker in MobiPose is specially designed for human bodies with frequent changes of pose. By carefully filtering the noise in the background and calibrating with the output of PE models, MobiPose achieves reliable fine-grained tracking of human bodies.
Funding
  • This research was supported in part by the National Key R&D Program of China (2019YFA0706403), National Natural Science Foundation of China (62072472, 61702561, 61702562, U19A2067), 111 Project (B18059) and the Natural Science Foundation of Hunan Province (2020JJ5774)
  • This work was also partially supported by the Microsoft Research Asia “Star Track” Program
Study Subjects and Analysis
persons: 3
We have implemented the MobiPose system on off-the-shelf commercial smartphones and conducted comprehensive experiments to evaluate the effectiveness of the proposed techniques. Evaluation results show that MobiPose achieves over 20 frames per second pose estimation with 3 persons per frame, and significantly outperforms the state-of-the-art baseline, with a speedup of up to 4.5× and 2.8× in latency on the CPU and GPU, respectively, and an improvement of 5.1% in pose-estimation model accuracy. Furthermore, MobiPose achieves up to 62.5% and 37.9% energy-per-frame saving on average in comparison with the baseline on the mobile CPU and GPU, respectively

persons: 3
Furthermore, in the case of multi-person PE, the latency of PoseNet will increase linearly with the number of persons, and cannot meet the requirement of real-time PE. For example, PoseNet takes more than 180 milliseconds to process an image with three persons, without counting the time of human detection which is a necessary step in top-down multi-person PE, as shown in Figure 1. One feasible solution is to offload the model-inference task of PE onto a remote cloud or nearby edge server with more powerful computation capability

persons: 3
We have implemented the MobiPose system on commodity smartphones and conducted comprehensive experiments. Evaluation results show that MobiPose achieves real-time PE of over 20 frames per second (FPS), i.e., a latency of 47 ms, with 3 persons per frame. MobiPose significantly outperforms prior state-of-the-art methods and the baseline, in terms of both latency and accuracy

persons: 3
Furthermore, even with PoseNet, it is still very challenging to achieve real-time multi-person PE on the smartphone. For example, given an image with only 3 persons, even with the fastest human detection and running on the GPU, the total latency will be 99.6 ms (18.6 ms of human detection plus three PE inferences of 27 ms each), resulting in a frame rate of only 10 FPS. Running on the CPU, the total latency will be 215 ms, i.e., only 4.6 FPS
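The arithmetic above follows from a simple serial top-down latency model, sketched here with the 18.6 ms detection and 27 ms per-person PE figures quoted from the text (the helper itself is ours, not MobiPose):

```python
def multi_person_latency(detect_ms, pe_ms, persons):
    """Serial top-down pipeline: one human-detection pass plus one
    pose-estimation pass per person; returns (total_ms, fps)."""
    total_ms = detect_ms + persons * pe_ms
    return total_ms, 1000.0 / total_ms

# PoseNet on GPU with 3 persons, per the figures quoted above:
total, fps = multi_person_latency(18.6, 27.0, 3)  # ~99.6 ms, ~10 FPS
```

Because latency grows linearly in the number of persons, the frame rate collapses as soon as a scene gets crowded, which motivates the parallel CPU+GPU execution in MobiPose.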

persons: 3
In real applications, there are other overheads such as data copying and re-shaping. Indeed, our measurement shows that PoseNet takes 145 ms to conduct PE of 3 persons, even when running on the GPU. Such a long latency causes severe UI lagging, leading to poor user experience

labeled persons: 1500
Each video contains 41 frames. In total, the selected videos contain around 1,500 labeled persons.

persons: 4
The GPU-only scheduler also processes PE models in serial. We measure the latency in three settings, each with 2, 3, and 4 persons for parallel PE execution, respectively. In each setting we conduct 100 rounds of test and report the averaged results in Table 5

persons: 3
Latency In Figure 11, we compare the performance of MobiPose and baseline on different mobile devices. As shown in Figure 11(a), MobiPose can achieve over 20 FPS with 3 persons per frame, i.e., the end-to-end latency of 47 milliseconds, on Vivo IQOO, while the baseline can only get 8.3 FPS. On other mobile devices, MobiPose also significantly reduces the latency compared to the baseline, i.e., by 62.6% on Xiaomi 8 and 47% on HiKey970 on average, as shown in Figure 11(b) and Figure 11(c), respectively

persons: 4
[Table residue] Parallel-execution speedups: 1.23× to 1.33×, 1.04× to 1.34×, and 1.32× to 1.41×; the starred footnote reads: number of persons per frame, where 2\3\4 means the total PE latency for 2, 3, and 4 persons, respectively. Per-joint accuracy columns: head, shoulder, elbow, wrist, hip, knee, ankle.

persons: 4
Figure 11(f) shows that MobiPose trades higher system memory for lower latency, since it runs several model-inference instances on both CPU and GPU simultaneously. However, even in the case of 4 persons per frame, the total memory usage of MobiPose is less than 300MB, which is tolerable as mainstream smartphones have multi-gigabyte memory. Figure 12 shows some example results of MobiPose obtained on the Snapdragon 855 SoC from live videos in the real world

persons: 3
For a given set of computation tasks, i.e., 100 frames with 3 persons per frame, Figure 11(e) shows that MobiPose achieves a 62.5% and 37.9% reduction on average in total energy consumption compared to the baseline running on the mobile CPU and GPU, respectively.

References
  • [1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https://www.tensorflow.org/. Software available from tensorflow.org.
  • [2] Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, and Bo Chen. 2019. Searching for MobileNetV3. In Proceedings of the IEEE International Conference on Computer Vision. 1314–1324.
  • [3] Mykhaylo Andriluka, Leonid Pishchulin, Peter Gehler, and Bernt Schiele. 2014. 2D Human Pose Estimation: New Benchmark and State of the Art Analysis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [4] AI Benchmark. 2020. http://ai-benchmark.com/.
  • [5] Erik Bochinski, Tobias Senst, and Thomas Sikora. 2018. Extending IOU based multi-object tracking by visual information. In 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 1–6.
  • [6] Alexander Branover, Denis Foley, and Maurice Steinman. 2012. AMD Fusion APU: Llano. IEEE Micro 32, 2 (2012), 28–37.
  • [7] Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh. 2017. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7291–7299.
  • [8] Tiffany Yu-Han Chen, Lenin Ravindranath, Shuo Deng, Paramvir Bahl, and Hari Balakrishnan. 2015. Glimpse: Continuous, real-time object recognition on mobile devices. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems. ACM, 155–168.
  • [9] Xiao Chu, Wei Yang, Wanli Ouyang, Cheng Ma, Alan L Yuille, and Xiaogang Wang. 2017. Multi-context attention for human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1831–1840.
  • [10] Suman Deb, Alpana Sharan, Shivangi Chaturvedi, Ankit Arun, and Aayush Gupta. 2018. Interactive Dance Lessons through Human Body Pose Estimation and Skeletal Topographies Matching. International Journal of Computational Intelligence & IoT 2, 4 (2018).
  • [11] Shen Yongzeng, Li Xiaofeng, and Wu Donglin. 2012. Research and Application of OpenMAX IL Framework Based on Android. Computer Applications and Software 8 (2012).
  • [12] Yong Du, Wei Wang, and Liang Wang. 2015. Hierarchical recurrent neural network for skeleton based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1110–1118.
  • [13] Ahmed Elhayek, Onorina Kovalenko, Pramod Murthy, Jameel Malik, and Didier Stricker. 2018. Fully automatic multi-person human motion capture for VR applications. In International Conference on Virtual Reality and Augmented Reality. Springer, 28–47.
  • [14] Google. 2019. https://blog.tensorflow.org/2019/08/track-human-poses-in-realtime-on-android-tensorflow-lite.html.
  • [15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
  • [16] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017).
  • [17] Jie Hu, Li Shen, and Gang Sun. 2018. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7132–7141.
  • [18] Loc N Huynh, Youngki Lee, and Rajesh Krishna Balan. 2017. DeepMon: Mobile GPU-based deep learning framework for continuous vision applications. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services. ACM, 82–95.
  • [19] Andrey Ignatov, Radu Timofte, William Chou, Ke Wang, Max Wu, Tim Hartley, and Luc Van Gool. 2018. AI benchmark: Running deep neural networks on Android smartphones. In Proceedings of the European Conference on Computer Vision (ECCV).
  • [20] ildoonet Kim. https://github.com/ildoonet/tf-pose-estimation.
  • [21] Alejandro Jaimes and Nicu Sebe. 2007. Multimodal human–computer interaction: A survey. Computer Vision and Image Understanding 108, 1-2 (2007), 116–134.
  • [22] Norman P Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. 2017. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture. 1–12.
  • [23] Youngsok Kim, Joonsung Kim, Dongju Chae, Daehyun Kim, and Jangwoo Kim. 2019. μLayer: Low Latency On-Device Inference Using Cooperative Single-Layer Acceleration and Processor-Friendly Quantization. In Proceedings of the Fourteenth EuroSys Conference 2019. 1–15.
  • [24] Kotlin. 2019. https://developer.android.com/kotlin.
  • [25] Royson Lee, Stylianos I Venieris, Lukasz Dudziak, Sourav Bhattacharya, and Nicholas D Lane. 2019. MobiSR: Efficient On-Device Super-Resolution through Heterogeneous Mobile Processors. In The 25th Annual International Conference on Mobile Computing and Networking. 1–16.
  • [26] Xiaohua Lei, Xiuhua Jiang, and Caihong Wang. 2013. Design and implementation of a real-time video stream analysis system based on FFMPEG. In 2013 Fourth World Congress on Software Engineering. IEEE, 212–216.
  • [27] Luyang Liu, Hongyu Li, and Marco Gruteser. 2019. Edge assisted real-time object detection for mobile augmented reality. In MobiCom. ACM.
  • [28] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. 2016. SSD: Single shot multibox detector. In European Conference on Computer Vision. Springer, 21–37.
  • [29] Zhao Liu, Jianke Zhu, Jiajun Bu, and Chun Chen. 2015. A survey of human pose estimation: the body parts parsing based methods. Journal of Visual Communication and Image Representation 32 (2015), 10–19.
  • [30] Eric Marchand, Hideaki Uchiyama, and Fabien Spindler. 2015. Pose estimation for augmented reality: a hands-on survey. IEEE Transactions on Visualization and Computer Graphics 22, 12 (2015), 2633–2651.
  • [31] Arvind Narayanan, Saurabh Verma, Eman Ramadan, Pariya Babaie, and Zhi-Li Zhang. 2018. DeepCache: A deep learning based framework for content caching. In Proceedings of the 2018 Workshop on Network Meets AI & ML. ACM, 48–53.
  • [32] Alejandro Newell, Kaiyu Yang, and Jia Deng. 2016. Stacked hourglass networks for human pose estimation. In European Conference on Computer Vision. Springer, 483–499.
  • [33] Guanghan Ning, Ping Liu, Xiaochuan Fan, and Chi Zhang. 2018. A top-down approach to articulated human pose estimation and tracking. In Proceedings of the European Conference on Computer Vision (ECCV).
  • [34] Edson Luiz Padoin, Laércio Lima Pilla, Márcio Castro, Francieli Z Boito, Philippe Olivier Alexandre Navaux, and Jean-François Méhaut. 2014. Performance/energy trade-off in scientific computing: the case of ARM big.LITTLE and Intel Sandy Bridge. IET Computers & Digital Techniques 9, 1 (2014), 27–35.
  • [35] Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, Peter Gehler, and Bernt Schiele. 2016. DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [36] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 234–241.
  • [37] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4510–4520.
  • [38] Monsoon Solutions. 2016. Power monitor. Updated: Jan (2016).
  • [39] Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang. 2019. Deep high-resolution representation learning for human pose estimation. arXiv preprint arXiv:1902.09212 (2019).
  • [40] Wei Tang, Pei Yu, and Ying Wu. 2018. Deeply learned compositional models for human pose estimation. In Proceedings of the European Conference on Computer Vision (ECCV). 190–206.
  • [41] Jonathan Tompson, Ross Goroshin, Arjun Jain, Yann LeCun, and Christoph Bregler. 2015. Efficient object localization using convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 648–656.
  • [42] Takayuki Ujiie, Masayuki Hiromoto, and Takashi Sato. 2018. Interpolation-based object detection using motion vectors for embedded real-time tracking systems. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 616–624.
  • [43] Olivier Valery, Pangfeng Liu, and Jan-Jan Wu. 2017. CPU/GPU collaboration techniques for transfer learning on mobile devices. In 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS). IEEE, 477–484.
  • [44] Olivier Valery, Pangfeng Liu, and Jan-Jan Wu. 2019. A collaborative CPU-GPU approach for deep learning on mobile devices. Concurrency and Computation: Practice and Experience 31, 17 (2019), e5225.
  • [45] Ji Wang, Bokai Cao, Philip Yu, Lichao Sun, Weidong Bao, and Xiaomin Zhu. 2018. Deep learning towards mobile applications. In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS). IEEE, 1385–1393.
  • [46] Greg Welch, Gary Bishop, et al. 1995. An introduction to the Kalman filter. (1995).
  • [47] Tom Williams, Nhan Tran, Josh Rands, and Neil T Dantam. 2018.
  • [48] Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R Manmatha, Alexander J Smola, and Philipp Krähenbühl. 2018. Compressed video action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6026–6035.
  • [49] Bin Xiao, Haiping Wu, and Yichen Wei. 2018. Simple baselines for human pose estimation and tracking. In Proceedings of the European Conference on Computer Vision (ECCV). 466–481.
  • [50] Bruce Xiaohan Nie, Caiming Xiong, and Song-Chun Zhu. 2015. Joint action recognition and pose estimation from video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1293–1301.
  • [51] Mengwei Xu, Jiawei Liu, Yuanqiang Liu, Felix Xiaozhu Lin, Yunxin Liu, and Xuanzhe Liu. 2019. A first look at deep learning apps on smartphones. In The World Wide Web Conference. 2125–2136.
  • [52] Mengwei Xu, Mengze Zhu, Yunxin Liu, Felix Xiaozhu Lin, and Xuanzhe Liu. 2018. DeepCache: Principled cache for mobile deep vision. In Proceedings of the 24th Annual International Conference on Mobile Computing and Networking. 129–144.
  • [53] Sijie Yan, Yuanjun Xiong, and Dahua Lin. 2018. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • [54] Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, and Xiaogang Wang. 2017. Learning feature pyramids for human pose estimation. In Proceedings of the IEEE
  • [55] Takanori Yokoyama, Toshiki Iwasaki, and Toshinori Watanabe. 2009. Motion vector based moving object detection and tracking in the MPEG compressed domain. In 2009 Seventh International Workshop on Content-Based Multimedia Indexing. IEEE, 201–206.
  • [56] Xiaoping Yun and Eric R Bachmann. 2006. Design, implementation, and experimental results of a quaternion-based Kalman filter for human body motion tracking. IEEE Transactions on Robotics 22, 6 (2006), 1216–1227.
  • [57] Zihua Zeng. 2019. https://github.com/edvardHua/PoseEstimationForMobile.
  • [58] Yuhao Zhu, Anand Samajdar, Matthew Mattina, and Paul Whatmough. 2018. Euphrates: Algorithm-SoC co-design for low-power mobile continuous vision. arXiv preprint arXiv:1803.11232 (2018).