Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos
CVPR, pp. 5948-5957, 2018.
Grounding textual phrases in visual content with standalone image-sentence pairs is a challenging task. When we consider grounding in instructional videos, this problem becomes profoundly more complex: the latent temporal structure of instructional videos breaks independence assumptions and necessitates contextual understanding for resolv...