Joint Learning in the Spatio-Temporal and Frequency Domains for Skeleton-Based Action Recognition
IEEE Transactions on Multimedia (2020)
Abstract
Benefiting from its succinctness and robustness, skeleton-based action recognition has recently attracted much attention. Most existing methods utilize local networks (e.g., recurrent, convolutional, and graph convolutional networks) to extract spatio-temporal dynamics hierarchically. As a consequence, the local and non-local dependencies, which carry more detail and more semantics respectively, are captured asynchronously at different layers. Moreover, existing methods are limited to the spatio-temporal domain and ignore information in the frequency domain. To better extract detailed and semantic information synchronously from multiple domains, we propose a residual frequency attention (rFA) block to focus on discriminative patterns in the frequency domain, and a synchronous local and non-local (SLnL) block to simultaneously capture the details and semantics in the spatio-temporal domain. In addition, to optimize the whole learning process of the multi-branch network, we place it under a pseudo multi-task learning paradigm. During training, 1) a soft-margin focal loss (SMFL) is proposed to optimize the separate intra-branch learning process; it automatically conducts data selection and encourages intrinsic margins in classifiers; and 2) a mutual learning policy is proposed to further facilitate the collaborative inter-branch learning process. Finally, our approach achieves state-of-the-art performance on several large-scale datasets for skeleton-based action recognition.
Keywords
Action recognition, frequency attention, synchronous local and non-local learning, soft-margin focal loss, multi-task learning
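
The abstract states that the soft-margin focal loss "can automatically conduct data selection" but does not give its formula. As a rough illustration of that data-selection behavior only, the sketch below implements the standard focal loss, FL = -(1 - p_t)^gamma * log(p_t), which is the commonly used baseline and not the paper's SMFL; the soft-margin term is omitted because the abstract does not define it. The function names, the example logits, and the default gamma = 2.0 are assumptions made for this illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Plain cross-entropy for one sample: -log(p_t)."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Standard focal loss for one sample (NOT the paper's SMFL).

    The factor (1 - p_t) ** gamma shrinks the loss of samples that are
    already classified with high confidence, so training concentrates
    on harder samples. gamma = 0 recovers plain cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical 3-class scores: one confidently correct, one misclassified.
easy_logits, hard_logits, target = [5.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0

for name, logits in (("easy", easy_logits), ("hard", hard_logits)):
    ce = cross_entropy(logits, target)
    fl = focal_loss(logits, target)
    print(f"{name}: CE={ce:.4f}  FL={fl:.6f}  FL/CE={fl / ce:.6f}")
```

In this sketch the easy sample keeps only a tiny fraction of its cross-entropy loss while the hard sample keeps most of it, which is the sense in which a focal-style loss selects data implicitly.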