MAAL: Multimodality-Aware Autoencoder-Based Affordance Learning for 3D Articulated Objects

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023

Abstract
Inferring affordance for 3D articulated objects is a challenging and practical problem, and a prerequisite for deploying robots in real-world scenarios. The exploration can be summarized as figuring out where to act and how to act. Correspondingly, the task mainly requires producing actionability scores, action proposals, and success-likelihood scores from the given 3D object information and robotic information. Existing works usually process multi-modal inputs directly with early fusion and apply critic networks to produce scores, which leads to insufficient multi-modal learning and inefficient iterative training over multiple stages. This paper proposes Multimodality-Aware Autoencoder-based affordance Learning (MAAL), a novel approach to the 3D object affordance problem. It is an efficient pipeline, trained in one go, and requires only a few positive samples in the training data. More importantly, MAAL contains a MultiModal Energized Encoder (MME) for better multi-modal learning. It comprehensively models all multi-modal inputs from 3D objects and robotic actions. By jointly considering information from multiple modalities, the encoder further learns the interactions between robots and objects, giving MAAL a stronger multi-modal learning ability for understanding object affordance. Experimental results and visualizations on the large-scale PartNet-Mobility dataset show the effectiveness of MAAL in learning from multi-modal data and solving the 3D articulated object affordance problem.
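The abstract describes the task setup at a high level: given an object point cloud and a robotic action, jointly encode both modalities and predict per-point actionability and success-likelihood scores. The sketch below is a minimal, hypothetical PyTorch illustration of such a joint (late-fusion) scoring model. The module names, feature dimensions, encoders, and fusion scheme are assumptions for illustration only and are not the paper's actual MME architecture.

```python
# Illustrative sketch only: the abstract does not specify MME's internals,
# so every design choice below (encoders, dimensions, fusion) is assumed.
import torch
import torch.nn as nn


class MultiModalAffordanceSketch(nn.Module):
    """Hypothetical joint encoder over object geometry and a robot action."""

    def __init__(self, point_feat_dim=128, action_dim=6, hidden_dim=256):
        super().__init__()
        # Per-point geometry features would normally come from a point-cloud
        # backbone (e.g. PointNet++); stubbed here as a per-point linear map.
        self.point_encoder = nn.Linear(3, point_feat_dim)
        # Robot action proposal, e.g. a 6-DoF gripper pose, encoded by an MLP.
        self.action_encoder = nn.Sequential(
            nn.Linear(action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, point_feat_dim),
        )
        # Joint fusion of both modalities, in contrast to the early-fusion
        # baselines the abstract criticizes.
        self.fusion = nn.Sequential(
            nn.Linear(2 * point_feat_dim, hidden_dim), nn.ReLU(),
        )
        # Heads for two of the required outputs.
        self.actionability_head = nn.Linear(hidden_dim, 1)  # where to act
        self.success_head = nn.Linear(hidden_dim, 1)        # success likelihood

    def forward(self, points, action):
        # points: (B, N, 3) object point cloud; action: (B, action_dim) proposal.
        point_feat = self.point_encoder(points)                  # (B, N, F)
        action_feat = self.action_encoder(action).unsqueeze(1)   # (B, 1, F)
        action_feat = action_feat.expand(-1, point_feat.size(1), -1)
        fused = self.fusion(torch.cat([point_feat, action_feat], dim=-1))
        actionability = torch.sigmoid(self.actionability_head(fused)).squeeze(-1)
        success = torch.sigmoid(self.success_head(fused)).squeeze(-1)
        return actionability, success  # both (B, N), scored per point


if __name__ == "__main__":
    model = MultiModalAffordanceSketch()
    pts = torch.randn(2, 1024, 3)   # two objects, 1024 points each
    act = torch.randn(2, 6)         # e.g. gripper position + approach direction
    a, s = model(pts, act)
    print(a.shape, s.shape)         # torch.Size([2, 1024]) torch.Size([2, 1024])
```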