Combining Multiple Cues for Visual Madlibs Question Answering
International Journal of Computer Vision, pp. 1-23, 2018.
This paper presents an approach for answering fill-in-the-blank multiple choice questions from the Visual Madlibs dataset. Instead of generic and commonly used representations trained on the ImageNet classification task, our approach employs a combination of networks trained for specialized tasks such as scene recognition, person activity...More
Full Text (Upload PDF)
PPT (Upload PPT)