Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions
arXiv: Computer Vision and Pattern Recognition, 2019.
This paper presents a new task, the grounding of spatio-temporal identifying descriptions in videos. Previous work suggests potential bias in existing datasets and emphasizes the need for a new data creation schema to better model linguistic structure. We introduce a new data collection scheme based on grammatical constraints for surface ...