A Sensorimotor Perspective on Contrastive Multiview Visual Representation Learning

IEEE Transactions on Cognitive and Developmental Systems (2022)

The contrastive multiview visual representation learning (CMVRL) framework has recently gained a lot of traction in the unsupervised representation learning literature. Combining a simple data augmentation strategy and a contrastive learning objective, it has been able to generate representations that compare favorably to their supervised counterparts on common downstream visual tasks. The theoretical understanding of this empirical success is currently an active area of research. In this article, we propose a sensorimotor perspective on the various components of the framework. We show how it can be interpreted as building representations that geometrically embed the stable semantic content that a situated agent experiences on short spatiotemporal scales when actively exploring its environment. We also discuss the relevance of the approach in light of contemporary active, dynamical, and hierarchical theories of perception. Finally, we extrapolate this sensorimotor perspective to outline promising future research directions that could push the state of the art further and help better understand how an autonomous agent could develop useful visual representations in an unsupervised fashion.
Artificial perception, contrastive multiview learning, representation learning, sensorimotor, unsupervised learning