Practical Cross-modal Manifold Alignment for Grounded Language
Abstract:
We propose a cross-modality manifold alignment procedure that leverages triplet loss to jointly learn consistent, multi-modal embeddings of language-based concepts of real-world items. Our approach learns these embeddings by sampling triples of anchor, positive, and negative data points from RGB-depth images and their natural language d...
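
To illustrate the triplet loss named in the abstract, here is a minimal sketch of the standard margin-based form, max(0, d(a, p) - d(a, n) + margin). It is an assumption-laden illustration, not the authors' implementation: the Euclidean distance, the margin value of 1.0, and the function names `euclidean` and `triplet_loss` are all hypothetical choices made here. In a cross-modal setting, the anchor would presumably be an embedding from one modality and the positive and negative would be embeddings from the other.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is; otherwise it grows linearly with the
    violation. The margin of 1.0 is an illustrative default, not a value
    taken from the paper.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings, for illustration only.
anchor = [0.0, 0.0]
near = [0.0, 1.0]   # distance 1 from the anchor
far = [3.0, 4.0]    # distance 5 from the anchor

# Well-separated triple: the positive is close, the negative is far.
print(triplet_loss(anchor, near, far))
# Violating triple: the positive is far, the negative is close.
print(triplet_loss(anchor, far, near))
```

Minimizing this quantity over many sampled triples pulls matching pairs together and pushes mismatched pairs apart in the shared embedding space, which is the general mechanism the abstract refers to.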