Reducing 0s bias in video moment retrieval with a circular competence-based captioner

Information Processing & Management(2023)

引用 0|浏览33
The current study addresses the problem of retrieving a specific moment from an untrimmed video by a sentence query. Existing methods have achieved high performance by designing various structures to match visual-text relations. Yet, these methods tend to return an interval starting from 0s, which we named “0s bias”. In this paper, we propose a Circular Co-Teaching (CCT) mechanism using a captioner to improve an existing retrieval model (localizer) from two aspects: biased annotations and easy samples. Correspondingly, CCT contains two processes: (1) Pseudo Query Generation (captioner to localizer), aiming at transferring the knowledge from generated queries to the localizer to balance annotations; (2) Competence-based Curriculum Learning (localizer to captioner), training the captioner in an easy-to-hard fashion guided by localization results, making pairs of the false-positive moment and pseudo query become easy samples for the localizer. Extensive experiments show that our CCT can alleviate “0s bias” with even 4% improvement for existing approaches on average in two public datasets (ActivityNet-Captions, and Charades-STA), in terms of R@1,IoU=0.7. Notably, our method also outperforms baselines in an out-of-distribution scenario. We also quantitatively validate CCT’s ability to cope with “0s bias” by a proposed metric, DM. Our study not only theoretically contributes to detecting “0s bias”, but also provides a highly effective tool for video moment retrieval by alleviating such bias.
Video moment retrieval,Competence-based captioner,Circular Co-Teaching,0s bias
AI 理解论文