Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos
MM '20: The 28th ACM International Conference on Multimedia, Seattle, WA, USA, October 2020, pp. 1283-1291.
Temporal grounding of natural language in untrimmed videos is a fundamental yet challenging multimedia task that facilitates cross-media visual content retrieval. We focus on the weakly supervised setting of this task, which has access only to coarse video-level language description annotations without temporal boundaries, and which is more consistent...