Unified Visual-Semantic Embeddings: Bridging Vision and Language with Structured Meaning Representations
CVPR, pp. 6609-6618, 2019.
We propose the Unified Visual-Semantic Embeddings (Unified VSE) for learning a joint space of visual representation and textual semantics. The model unifies the embeddings of concepts at different levels: objects, attributes, relations, and full scenes. We view the sentential semantics as a combination of different semantic components suc...