Multi-modal Transformer for Video Retrieval

European Conference on Computer Vision (ECCV), pp. 214-229, 2020.

DOI: https://doi.org/10.1007/978-3-030-58548-8_13

Abstract:

The task of retrieving video content relevant to natural language queries plays a critical role in effectively handling internet-scale datasets. Most of the existing methods for this caption-to-video retrieval problem do not fully exploit cross-modal cues present in video. Furthermore, they aggregate per-frame visual features with limited...
