Compositional scene modeling with global object-centric representations

Machine Learning (2024)

Abstract
The appearance of the same object may vary across scene images due to occlusions between objects. Humans can quickly identify the same object even when occlusions exist, by completing the occluded parts based on the complete canonical image stored in memory. Achieving this ability remains challenging for existing models, especially in the unsupervised learning setting. Inspired by this human ability, we propose a novel object-centric representation learning method that identifies the same, possibly occluded, object across different scenes by learning global object-centric representations of complete canonical objects without supervision. The representation of each object is divided into an extrinsic part, which characterizes scene-dependent information (i.e., position and size), and an intrinsic part, which characterizes globally invariant information (i.e., appearance and shape). The former is inferred with an improved IC-SBP module. The latter is extracted by combining rectangular and arbitrary-shaped attention, and is used to infer the identity representation via a proposed patch-matching strategy against a set of learnable global object-centric representations of complete canonical objects. In the experiments, three 2D scene datasets are used to verify the proposed method's ability to recognize the identity of the same object across different scenes. A complex 3D scene dataset and a real-world dataset are used to evaluate scene decomposition performance. Our experimental results demonstrate that the proposed method outperforms the comparison methods in terms of same-object recognition and scene decomposition.
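The patch-matching identity inference described above can be sketched as follows. This is a minimal illustration, not the paper's actual formulation: the cosine-similarity scoring, the array shapes, the function names, and the averaging over patches are all assumptions made for the sketch; the learnable global bank here is just a fixed array standing in for learned canonical representations.

```python
import numpy as np

def infer_identity(intrinsic_patches, global_bank):
    """Match an object's intrinsic patches against a bank of canonical
    object representations and return the best-matching identity.

    intrinsic_patches: (P, D) array, P patches of one object's intrinsic part.
    global_bank: (K, P, D) array, learnable representations of K canonical
                 objects (hypothetical layout; the paper's layout may differ).
    """
    def unit(x):
        # L2-normalize along the feature axis for cosine similarity.
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

    # Per-patch cosine similarity against every canonical object: (K, P).
    sims = np.einsum('pd,kpd->kp', unit(intrinsic_patches), unit(global_bank))
    # Aggregate patch scores per identity (mean is an illustrative choice).
    scores = sims.mean(axis=1)
    return int(np.argmax(scores)), scores
```

Because the intrinsic part is designed to be invariant to scene-dependent position and size, the same object occluded differently in two scenes would, under this sketch, still match the same entry of the canonical bank.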
Keywords
Object-centric representation, Patch-matching, Compositional scene representation, Generative model, Unsupervised learning