Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos

European Conference on Computer Vision (ECCV), pp. 333-351, 2020.

Abstract:

Automatically generating sentences to describe events and temporally localizing sentences in a video are two important tasks that bridge language and videos. Recent techniques leverage the multimodal nature of videos by using off-the-shelf features to represent videos, but interactions between modalities are rarely explored. Inspired by...