Conditional Image-Text Embedding Networks
European Conference on Computer Vision (ECCV), 2018.
This paper presents an approach for grounding phrases in images that jointly learns multiple text-conditioned embeddings in a single end-to-end model. To differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predef...
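The idea of a concept weight branch that softly assigns a phrase to one of several embeddings can be sketched as follows. This is only an illustrative toy under stated assumptions, not the paper's implementation: the names `conditional_score`, `embed_vecs`, and `concept_W`, the element-wise product used to fuse region and phrase features, and the linear scoring of each embedding are all hypothetical choices made for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def conditional_score(region_feat, phrase_feat, embed_vecs, concept_W):
    """Score a (region, phrase) pair with K text-conditioned embeddings.

    embed_vecs: K weight vectors, one per embedding (hypothetical linear scorers).
    concept_W:  K rows mapping the phrase feature to K concept logits
                (stands in for the concept weight branch; an assumption).
    """
    # Concept weight branch: soft assignment of the phrase to K embeddings.
    weights = softmax([dot(row, phrase_feat) for row in concept_W])
    # Assumed fusion of image region and phrase: element-wise product.
    joint = [r * p for r, p in zip(region_feat, phrase_feat)]
    # Each embedding produces its own compatibility score.
    scores = [dot(w_k, joint) for w_k in embed_vecs]
    # Final score: per-embedding scores mixed by the concept weights.
    return dot(weights, scores), weights


# Toy usage with K = 2 embeddings and 2-dimensional features.
score, weights = conditional_score(
    region_feat=[1.0, 0.0],
    phrase_feat=[1.0, 1.0],
    embed_vecs=[[2.0, 0.0], [4.0, 0.0]],
    concept_W=[[0.0, 0.0], [0.0, 0.0]],  # zero logits give uniform weights
)
print(score, weights)
```

Because the weights come from the phrase alone, the assignment of phrases to embeddings would be learned jointly with the embeddings in an end-to-end setup, instead of being fixed ahead of time.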