Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors

ICCV, pp. 1273-1280 (2013)


Abstract

In this paper we propose an approach to jointly estimate the layout of rooms as well as the clutter present in the scene using RGB-D data. Towards this goal, we propose an effective model that is able to exploit both depth and appearance features, which are complementary. Furthermore, our approach is efficient as we exploit the inherent d...

Introduction
  • Finding the 3D structures composing the world is key for developing autonomous systems that can navigate the environment, and importantly, recognize and interact with it.
  • While finding such structures from monocular imagery is extremely difficult, depth sensors can be employed to reduce the inherent ambiguities of still images.
  • The superiority of RGB-D sensors over more traditional imagery has been demonstrated for the tasks of semantic segmentation [25, 26, 8], inferring support relations [26], 3D detection [14] and estimating physical properties of images [2].
Highlights
  • Finding the 3D structures composing the world is key for developing autonomous systems that can navigate the environment, and importantly, recognize and interact with it
  • While finding such structures from monocular imagery is extremely difficult, depth sensors can be employed to reduce the inherent ambiguities of still images
  • We make use of both appearance and depth features, which, as we show in our experimental evaluation, are complementary, and frame a joint optimization problem that exploits the dependencies between these two tasks
  • Wang et al. [29] reason jointly about the layout and the clutter present in the scene. They use an iterated conditional modes (ICM) algorithm to tractably deal with the complex potentials resulting from the interaction of the clutter and the layout
  • We have proposed an approach to jointly estimate the layout of rooms as well as the clutter present in the scene using RGB-D data
  • We demonstrate the effectiveness of our approach using the challenging NYU v2 dataset [26] and show that by employing depth we boost performance of the layout estimation task by 6% while clutter estimation improves by 13%
  • We demonstrated that clutter can be further employed to segment several furniture classes
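The iterated conditional modes (ICM) algorithm mentioned in the highlights can be illustrated with a minimal sketch. This is the generic textbook form of ICM on a small pairwise model, not the model of Wang et al. [29] or of this paper: the Potts smoothness term, the function names `icm` and `energy`, and the toy chain are all assumptions made for illustration. Each node is greedily set to the label that minimizes its local cost given its neighbors, which is why the procedure can stop in a local optimum.

```python
def energy(labels, unary, weight, neighbors):
    """Total energy: unary costs plus a Potts penalty per disagreeing edge."""
    total = sum(unary[i][labels[i]] for i in range(len(labels)))
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            if i < j and labels[i] != labels[j]:  # count each edge once
                total += weight
    return total


def icm(unary, weight, neighbors, max_sweeps=10):
    """Greedy coordinate descent over labels (hypothetical helper).

    unary[i][k] is the cost of giving node i label k;
    neighbors[i] lists the nodes adjacent to node i.
    """
    num_labels = len(unary[0])
    # Start from the labeling that is best under the unary costs alone.
    labels = [min(range(num_labels), key=lambda k: unary[i][k])
              for i in range(len(unary))]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(unary)):
            def local_cost(k):
                disagreements = sum(1 for j in neighbors[i] if labels[j] != k)
                return unary[i][k] + weight * disagreements
            best = min(range(num_labels), key=local_cost)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # no node wants to move: a (local) optimum
            break
    return labels


if __name__ == "__main__":
    # Toy chain of five nodes with two labels; node 2 weakly prefers label 1.
    unary = [[0.0, 1.0], [0.0, 1.0], [0.6, 0.4], [0.0, 1.0], [0.0, 1.0]]
    neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
    result = icm(unary, 0.5, neighbors)
    print(result, energy(result, unary, 0.5, neighbors))
```

On this toy chain the smoothness term overrides the weak unary preference of the middle node. On harder energies the result depends on the initialization, which is the weakness the paper attributes to ICM.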
Results
  • The authors demonstrate the effectiveness of the approach on the challenging NYU v2 dataset [26] and show that employing depth reduces the layout estimation error by 6% and the clutter estimation error by 13%.
  • Wang et al. [29] propose to use an iterated conditional modes (ICM) algorithm to tractably deal with the complex potentials resulting from the interaction of the clutter and the layout.
  • Importance of depth features: As shown in Table 2, by employing depth, the approach improves accuracy by 10% in labeling and by 6% in layout estimation
Conclusion
  • The authors have proposed an approach to jointly estimate the layout of rooms as well as the clutter present in the scene using RGB-D data.
  • Towards this goal, the authors derived an efficient algorithm to perform inference within a joint model and demonstrated its effectiveness on the NYU v2 dataset, showing impressive error reductions over the state-of-the-art of 6% for the layout task and 13% in estimating clutter.
  • The authors plan to further extend the approach to be able to exploit video as well as to incorporate objects in the form of 3D cuboids
Tables
  • Table 1
  • Table 2: Comparison to the state-of-the-art with different features
  • Table 3: Super-pixel estimation: unsupervised segmentation results as a function of the importance of the appearance and depth terms. A good compromise is equal weighting for both appearance and depth. The original SLIC corresponds to λd = 0. Its performance is clearly inferior to using both sources of information
  • Table 4: Intersection over union (IOU) computed as in the PASCAL segmentation challenge. The labeling task consists of 6 classes: the five walls and clutter. Note that by using depth the average IOU measure improves by more than 17%, a very significant result
  • Table 5: IOU for the semantic classes as a function of C
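The intersection over union (IOU) measure reported in the tables can be sketched as follows. This is a minimal illustration of the standard per-class definition, true positives divided by the sum of true positives, false positives and false negatives. The function name `per_class_iou`, the flat list representation of label maps, and the three-class toy data are assumptions for illustration; the paper's labeling task uses 6 classes.

```python
def per_class_iou(pred, gt, num_classes):
    """Per-class IOU for two equally long sequences of integer labels.

    Returns a list with one entry per class; a class that appears in
    neither prediction nor ground truth gets None.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred, gt):
        if p == g:
            inter[p] += 1  # true positive for class p
            union[p] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[g] += 1  # false negative for the ground-truth class
    return [inter[c] / union[c] if union[c] else None
            for c in range(num_classes)]


if __name__ == "__main__":
    gt = [0, 0, 1, 1, 2, 2]    # toy ground-truth labels, one per pixel
    pred = [0, 1, 1, 1, 2, 0]  # toy predicted labels
    ious = per_class_iou(pred, gt, 3)
    valid = [v for v in ious if v is not None]
    print(ious, sum(valid) / len(valid))
```

Averaging the per-class values gives the kind of mean IOU figure that a summary table would report.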
Related work
  • Early approaches to semantic scene understanding in the outdoor setting focused on producing qualitative 3D parses [19, 13, 6], ground plane estimates [30] or parsing facades [27, 17]. More recently, accurate estimates of the road topology at intersections [4] and of the 3D vehicles present in the scene [5] have been obtained from stereo and monocular video, respectively. Depth sensors in the form of high-end laser scanners have become a standard in the context of autonomous driving (e.g., the Google car).

    Indoor scene understanding approaches have taken advantage of the Manhattan world properties of rooms and frame the layout estimation task as the prediction of a 3D cuboid aligned with the three dominant orientations [10, 11, 18, 22, 23, 16, 29]. Assuming vanishing points to be given, Hedau et al. [10] and Wang et al. [29] showed that the problem has only four degrees of freedom. Inference, however, remains difficult because the potentials involved, which count features in each of the faces defined by the layout, are a priori high-order. As a consequence, only a few candidates were utilized, resulting in suboptimal solutions. A few years later, Schwing et al. [22] showed that these a priori high-order potentials are decomposable into sums of pairwise potentials by extending the concept of integral images to accumulators oriented with the dominant orientations. As a consequence, denser parameterizations became possible, resulting in much better performance. In [23], a branch and bound approach was developed to retrieve a global optimum of the layout problem. More general layouts than 3D cuboids were predicted in [3]. Among other applications, room layouts have been used in [7] to predict affordances and in [12, 20] to estimate the free space.
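The integral image concept that Schwing et al. [22] build on can be sketched in its standard axis-aligned form. This is only the classical construction, shown to illustrate why counting features inside a region becomes a constant-time lookup. The accumulators oriented with the dominant orientations described above are not implemented here, and the names `integral_image` and `box_sum` are hypothetical.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros.

    ii[y][x] holds the sum of img over rows 0..y-1 and columns 0..x-1.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of the image over rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


if __name__ == "__main__":
    features = [[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]]  # toy per-pixel feature counts
    ii = integral_image(features)
    print(box_sum(ii, 0, 0, 3, 3))  # whole image
    print(box_sum(ii, 1, 1, 3, 3))  # bottom-right 2x2 block
```

After a single pass to build the table, any rectangular sum costs four lookups regardless of the rectangle's size, which is what makes evaluating many layout candidates affordable.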
Funding
  • We demonstrate the effectiveness of our approach on the challenging NYU v2 dataset [26] and show that employing depth reduces the layout estimation error by 6% and the clutter estimation error by 13%
  • Wang et al. [29] propose to use an iterated conditional modes (ICM) algorithm to tractably deal with the complex potentials resulting from the interaction of the clutter and the layout. However, this algorithm gets easily trapped in local optima. As a result, their layout estimation results are more than 5% lower than the state-of-the-art
  • As we take advantage of the inherent decomposition of the potentials, our approach is efficient and results in impressive performance improving 6% over the state-of-the-art in the layout task and 13% in estimating clutter
  • As shown in Table 2, in the layout task we outperform [22] by more than 5%
  • For the labeling task, the results are even better since usage of a depth cue improves GC by 13%
  • Importance of depth features: As shown in Table 2, by employing depth, our approach improves accuracy by 10% in labeling and by 6% in layout estimation
References
  • R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk. SLIC Superpixels Compared to State-of-the-Art Superpixel Methods. In PAMI, 2012.
  • J. T. Barron and J. Malik. Intrinsic Scene Properties from a Single RGB-D Image. In Proc. CVPR, 2013.
  • A. Flint, D. Murray, and I. Reid. Manhattan Scene Understanding Using Monocular, Stereo, and 3D Features. In Proc. ICCV, 2011.
  • A. Geiger, M. Lauer, and R. Urtasun. A Generative Model for 3D Urban Scene Understanding from Movable Platforms. In Proc. CVPR, 2011.
  • A. Geiger, C. Wojek, and R. Urtasun. Joint 3D Estimation of Objects and Scene Layout. In Proc. NIPS, 2011.
  • A. Gupta, A. Efros, and M. Hebert. Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics. In Proc. ECCV, 2010.
  • A. Gupta, S. Satkin, A. A. Efros, and M. Hebert. From 3D Scene Geometry to Human Workspace. In Proc. CVPR, 2011.
  • S. Gupta, P. Arbelaez, and J. Malik. Perceptual Organization and Recognition of Indoor Scenes from RGBD Images. In Proc. CVPR, 2013.
  • T. Hazan and A. Shashua. Norm-Product Belief Propagation: Primal-Dual Message-Passing for LP-Relaxation and Approximate-Inference. Trans. on Information Theory, 2010.
  • V. Hedau, D. Hoiem, and D. Forsyth. Recovering the Spatial Layout of Cluttered Rooms. In Proc. ICCV, 2009.
  • V. Hedau, D. Hoiem, and D. Forsyth. Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry. In Proc. ECCV, 2010.
  • V. Hedau, D. Hoiem, and D. Forsyth. Recovering Free Space of Indoor Scenes from a Single Image. In Proc. CVPR, 2012.
  • D. Hoiem, A. A. Efros, and M. Hebert. Automatic Photo Pop-up. In Siggraph, 2005.
  • H. Jiang and J. Xiao. A Linear Approach to Matching Cuboids in RGBD Images. In Proc. CVPR, 2013.
  • D. C. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces. In Proc. NIPS, 2010.
  • D. C. Lee, M. Hebert, and T. Kanade. Geometric Reasoning for Single Image Structure Recovery. In Proc. CVPR, 2009.
  • A. Martinovic, M. Mathias, J. Weissenberg, and L. van Gool. A Three-Layered Approach to Facade Parsing. In Proc. ECCV, 2012.
  • L. Pero, J. Bowdish, D. Fried, B. Kermgard, E. Hartley, and K. Barnard. Bayesian Geometric Modeling of Indoor Scenes. In Proc. CVPR, 2012.
  • A. Saxena, M. Sun, and A. Y. Ng. Make3D: Learning 3D Scene Structure from a Single Still Image. In PAMI, 2008.
  • A. G. Schwing, S. Fidler, M. Pollefeys, and R. Urtasun. Box In the Box: Joint 3D Layout and Object Reasoning from Single Images. In Proc. ICCV, 2013.
  • A. G. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun. Distributed Message Passing for Large Scale Graphical Models. In Proc. CVPR, 2011.
  • A. G. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun. Efficient Structured Prediction for 3D Indoor Scene Understanding. In Proc. CVPR, 2012.
  • A. G. Schwing and R. Urtasun. Efficient Exact Inference for 3D Indoor Scene Understanding. In Proc. ECCV, 2012.
  • J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, A. Kipman, and A. Blake. Efficient Human Pose Estimation from Single Depth Images. In PAMI, 2012.
  • N. Silberman and R. Fergus. Indoor Scene Segmentation using a Structured Light Sensor. In Workshop on 3D Representation and Recognition, 2011.
  • N. Silberman, P. Kohli, D. Hoiem, and R. Fergus. Indoor Segmentation and Support Inference from RGBD Images. In Proc. ECCV, 2012.
  • O. Teboul, I. Kokinos, L. Simon, P. Koutsourakis, and N. Paragios. Shape Grammar Parsing via Reinforcement Learning. In Proc. CVPR, 2011.
  • I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces. In Proc. ICML, 2004.
  • H. Wang, S. Gould, and D. Koller. Discriminative Learning with Latent Variables for Cluttered Indoor Scene Understanding. In Proc. ECCV, 2010.
  • C. Wojek, S. Roth, K. Schindler, and B. Schiele. Monocular 3D Scene Modeling and Inference: Understanding Multi-Object Traffic Scenes. In Proc. ECCV, 2010.
  • X. Ren, L. Bo, and D. Fox. RGB-(D) Scene Labeling: Features and Algorithms. In Proc. CVPR, 2012.
  • K. Yamaguchi, D. McAllester, and R. Urtasun. Robust Monocular Epipolar Flow Estimation. In Proc. CVPR, 2013.