Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos
CVPR, pp. 5948-5957, 2018.
EI
Abstract:
Grounding textual phrases in visual content with standalone image-sentence pairs is a challenging task. When we consider grounding in instructional videos, this problem becomes profoundly more complex: the latent temporal structure of instructional videos breaks independence assumptions and necessitates contextual understanding for resolv...