Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos
ECCV, pp. 569-586, 2018.
A major challenge in computer vision is scaling activity understanding to the long tail of complex activities without having to collect large quantities of data for new actions. The task of video retrieval using natural language descriptions seeks to address this through rich, unconstrained supervision about complex activities. However...