Action Recognition: identifying human actions and behaviors in video, that is, understanding what happens in a video.
CVPR, (2019)
We propose the actional-structural graph convolution networks for skeleton-based action recognition
Cited by 200 · Views 156
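The structural part of the skeleton-graph convolution above can be illustrated with a minimal sketch: one GCN layer over a toy 3-joint chain. This is not the paper's actual actional/structural architecture; the adjacency, features, and (identity) weight matrix are illustrative values.

```python
def graph_conv(adj, feats, weight):
    """One layer: H' = D^-1 (A + I) H W (self-loops plus row normalization)."""
    n, f_in, f_out = len(adj), len(feats[0]), len(weight[0])
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    out = []
    for i in range(n):
        deg = sum(a_hat[i])
        # average features over the joint's neighbors (including itself)
        agg = [sum(a_hat[i][j] * feats[j][f] for j in range(n)) / deg
               for f in range(f_in)]
        # linear projection with the learnable weight matrix
        out.append([sum(agg[f] * weight[f][k] for f in range(f_in))
                    for k in range(f_out)])
    return out

# Toy skeleton: 3 joints in a chain (hip-knee-ankle), 2-D coordinates as features.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
feats = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for readability
print(graph_conv(adj, feats, weight))
```

Each joint's output mixes its own coordinates with those of its skeletal neighbors, which is what lets the network reason about limb configurations rather than isolated joints.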
CVPR, (2019)
We introduce the Action Transformer model for recognizing and localizing human actions in video clips. We repurpose a Transformer-style architecture to aggregate features from the spatiotemporal context around the person whose actions we are trying to classify. We show that by us...
Cited by 163 · Views 202
CVPR, (2019): 2566-2576
We introduce a self-supervised method for learning visual correspondence from unlabeled video. The main idea is to use cycle-consistency in time as free supervisory signal for learning visual representations from scratch. At training time, our model learns a feature map represent...
Cited by 143 · Views 157
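The cycle-consistency signal described above can be sketched in a few lines: match a query feature forward into the next frame by nearest neighbor, match back, and check that the round trip returns to the starting patch. The real method learns the feature space end-to-end; the squared-L2 matching and the toy features here are assumptions for illustration.

```python
def nearest(query, candidates):
    """Index of the candidate feature closest to the query (squared L2)."""
    dists = [sum((q - c) ** 2 for q, c in zip(query, cand)) for cand in candidates]
    return dists.index(min(dists))

def cycle_consistent(start_idx, frame_a, frame_b):
    """Track frame_a[start_idx] -> frame_b -> frame_a; True if we return home."""
    fwd = nearest(frame_a[start_idx], frame_b)
    back = nearest(frame_b[fwd], frame_a)
    return back == start_idx

frame_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # patch features at time t
frame_b = [[0.9, 0.1], [0.1, 0.9], [0.0, 0.1]]  # patch features at time t+1
print(cycle_consistent(1, frame_a, frame_b))    # a[1] -> b[0] -> a[1]
```

Failing to return home yields a loss signal, so no manual labels are needed: time itself supervises the representation.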
CVPR, (2019): 284-293
To understand the world, we humans constantly need to relate the present to the past, and put events in context. In this paper, we enable existing video models to do the same. We propose a long-term feature bank---supportive information extracted over the entire span of a video--...
Cited by 137 · Views 509
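The feature-bank idea above amounts to augmenting the current clip's feature with a read over features stored for the whole video. The sketch below uses dot-product attention as the bank operator, which is one of several operators the paper considers; the feature values are toy assumptions.

```python
import math

def attend_over_bank(query, bank):
    """Softmax(dot(query, bank_i)) weighted sum over long-term bank entries."""
    scores = [sum(q * b for q, b in zip(query, entry)) for entry in bank]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [sum((exps[j] / z) * bank[j][d] for j in range(len(bank)))
            for d in range(len(query))]

clip = [1.0, 0.0]                 # short-term feature of the current clip
bank = [[1.0, 0.0], [0.0, 1.0]]   # long-term features from the whole video
context = attend_over_bank(clip, bank)
augmented = clip + context        # concatenate short- and long-term information
print(augmented)
```

The bank entry most similar to the current clip dominates the context vector, which is how distant but related moments of the video inform the current prediction.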
IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 8 (2019)
Skeleton-based human action recognition has recently attracted increasing attention thanks to the accessibility and the popularity of 3D skeleton data. One of the key challenges in skeleton-based action recognition lies in the large view variations when capturing data. In order t...
Cited by 97 · Views 85
ICCV, (2019): 5552-5561
Group convolution has been shown to offer great computational savings in various 2D convolutional architectures for image classification. It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what fact...
Cited by 67 · Views 143
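The computational saving from group convolution is easy to see from a parameter count: each output channel only connects to `c_in / groups` input channels, so the weight tensor shrinks by the group factor. The channel sizes below are illustrative, not those of the paper's networks.

```python
def conv3d_params(c_in, c_out, t, k, groups=1):
    """Weight count of a 3D conv with a t x k x k kernel and `groups` groups."""
    assert c_in % groups == 0, "input channels must divide evenly into groups"
    return c_out * (c_in // groups) * t * k * k

dense = conv3d_params(256, 256, 3, 3)             # ordinary 3D convolution
grouped = conv3d_params(256, 256, 3, 3, groups=8) # 8-way group convolution
print(dense, grouped, dense // grouped)           # grouped is 8x smaller
```

The same factor applies to multiply-adds, which is why grouping is attractive for expensive 3D video networks.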
CVPR, (2019): 1227-1236
We propose an attention enhanced graph convolutional LSTM network for skeletonbased action recognition, which is the first attempt of graph convolutional LSTM for this task
Cited by 63 · Views 111
CVPR, (2019): 9945-9953
In this paper, we propose a convolutional layer inspired by optical flow algorithms to learn motion representations. Our representation flow layer is a fully-differentiable layer designed to optimally capture the 'flow' of any representation channel within a convolutional neu...
Cited by 34 · Views 45
CVPR, (2019): 254-263
This paper focuses on the temporal aspect for recognizing human activities in videos; an important visual cue that has long been undervalued. We revisit the conventional definition of activity and restrict it to Complex Action: a set of one-actions with a weak temporal pattern ...
Cited by 28 · Views 84
ICCV, (2019): 852-861
Video recognition models have progressed significantly over the past few years, evolving from shallow classifiers trained on hand-crafted features to deep spatiotemporal networks. However, labeled video data required to train such models has not been able to keep up with the ever...
Cited by 18 · Views 144
CVPR, (2019): 7872-7881
Spatio-temporal feature learning is of central importance for action recognition in videos. Existing deep neural network models either learn spatial and temporal features independently (C2D) or jointly with unconstrained parameters (C3D). In this paper, we propose a novel neural ...
Cited by 12 · Views 35
CVPR, (2019): 4273-4281
Correspondences between frames encode rich information about dynamic content in videos. However, it is challenging to effectively capture and learn those due to their irregular structure and complex dynamics. In this paper, we propose a novel neural network that learns video re...
Cited by 11 · Views 57
CVPR, (2019)
Can performance on the task of action quality assessment (AQA) be improved by exploiting a description of the action and its quality? Current AQA and skills assessment approaches propose to learn features that serve only one task - estimating the final score. In this paper, we pr...
Cited by 10 · Views 31
ICCV, pp.6231-6241, (2019)
While many action recognition datasets consist of collections of brief, trimmed videos each containing a relevant action, videos in the real-world (e.g., on YouTube) exhibit very different properties: they are often several minutes long, where brief relevant clips are often int...
Cited by 9 · Views 124
CVPR, (2019): 2457-2466
True video understanding requires making sense of non-Lambertian scenes where the color of light arriving at the camera sensor encodes information about not just the last object it collided with, but about multiple mediums -- colored windows, dirty mirrors, smoke or rain. Layered...
Cited by 4 · Views 104
Jinwoo Choi, Chen Gao, Jia-Bin Huang, Joseph C. E. Messou
NeurIPS, pp.851-863, (2019)
Figure 1: Quiz time! Can you guess what action the (blocked) person is doing in the four videos? Even though we cannot see a human actor, we can easily predict the action by considering where the scene is. Training a CNN model from these examples may lead to a strong bias toward ...
Cited by 1 · Views 36
CVPR, (2018)
We show the significance of non-local modeling for the tasks of video classification, object detection and segmentation, and pose estimation
Cited by 2515 · Views 384
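The non-local operation above can be sketched in pure Python as self-attention over a 1-D sequence of feature vectors: every position is updated with a softmax-weighted sum over all positions, so dependencies are not limited to a local convolutional neighborhood. The embedding projections of the full block are omitted here for brevity, and the feature values are toy assumptions.

```python
import math

def non_local(x):
    """y_i = sum_j softmax_j(x_i . x_j) * x_j, attending over all positions j."""
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        scores = [sum(x[i][k] * x[j][k] for k in range(d)) for j in range(n)]
        m = max(scores)                    # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([sum((exps[j] / z) * x[j][k] for j in range(n))
                    for k in range(d)])
    return out

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # positions 0 and 2 are similar
y = non_local(x)
print(y[0])  # position 0 is pulled toward its distant look-alike at position 2
```

In the paper the same operation runs over all space-time positions of a video feature map, which is what captures long-range dependencies that stacked local convolutions model poorly.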
Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri
CVPR, (2018)
In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have remained solid performers in action recogn...
Cited by 798 · Views 308
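One of the factorizations studied above replaces a full t x d x d 3D convolution with a 1 x d x d spatial convolution into M intermediate channels followed by a t x 1 x 1 temporal convolution, choosing M so the parameter counts roughly match, as the paper does. The channel sizes below are illustrative, not those of an actual network.

```python
def full_3d_params(c_in, c_out, t, d):
    """Weight count of one t x d x d 3D convolution."""
    return c_in * c_out * t * d * d

def factored_params(c_in, c_out, t, d):
    """Weight count of the (2+1)D factorization with parameter-matched M."""
    # intermediate channels chosen so the factorized pair stays comparable
    # in size to the full 3D convolution
    m = (t * d * d * c_in * c_out) // (d * d * c_in + t * c_out)
    spatial = c_in * m * d * d   # 1 x d x d spatial convolution
    temporal = m * c_out * t     # t x 1 x 1 temporal convolution
    return m, spatial + temporal

m, p = factored_params(64, 64, 3, 3)
print(m, p, full_3d_params(64, 64, 3, 3))
```

With the same parameter budget, the factorized form inserts an extra nonlinearity between the spatial and temporal convolutions and is easier to optimize, which is the paper's motivation for it.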
IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6 (2018): 1510-1517
Typical human actions last several seconds and exhibit characteristic spatio-temporal structure. Recent methods attempt to capture this structure and learn action representations with convolutional neural networks. Such representations, however, are typically learned at the level...
Cited by 674 · Views 178