RubiksNet: Learnable 3D Shift for Efficient Video Action Recognition
Proceedings of the European Conference on Computer Vision (ECCV), 2020.
Video action recognition is a complex task dependent on modeling spatial and temporal context. Standard approaches rely on 2D or 3D convolutions to process such context, resulting in expensive operations with millions of parameters. Recent efficient architectures leverage a channel-wise shift-based primitive as a replacement for temporal ...
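To make the channel-wise shift primitive mentioned in the abstract concrete, here is a minimal sketch of a fixed, non-learnable temporal shift. It is an illustration under stated assumptions, not the learnable 3D shift named in the title: the function name `temporal_shift`, the `fold` parameter, the zero padding at clip boundaries, and the omission of spatial dimensions are all hypothetical choices made for brevity.

```python
def temporal_shift(frames, fold):
    """Shift two channel groups along time, zero-padding at the clip boundaries.

    frames: list of T frames, each a list of C per-channel values
            (spatial dimensions are omitted; this is an assumption for brevity).
    fold:   number of channels in each shifted group (hypothetical parameter).
    """
    num_frames = len(frames)
    # Copy the input so channels outside the two groups pass through unchanged.
    out = [list(frame) for frame in frames]
    for t in range(num_frames):
        num_channels = len(frames[t])
        for c in range(min(fold, num_channels)):
            # Group 1: read from the previous frame (features move forward in time).
            out[t][c] = frames[t - 1][c] if t > 0 else 0
        for c in range(fold, min(2 * fold, num_channels)):
            # Group 2: read from the next frame (features move backward in time).
            out[t][c] = frames[t + 1][c] if t < num_frames - 1 else 0
    return out


clip = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # 3 frames, 3 channels each
shifted = temporal_shift(clip, fold=1)
print(shifted)  # [[0, 5, 3], [1, 8, 6], [4, 0, 9]]
```

The shift itself involves no multiplications and no parameters: it only moves values between neighboring frames, so temporal context is mixed at essentially zero arithmetic cost.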