Background Matting: The World is Your Green Screen

CVPR, pp. 2288-2297, 2020.

Abstract:

We propose a method for creating a matte -- the per-pixel foreground color and alpha -- of a person by taking photos or videos in an everyday setting with a handheld camera. Most existing matting methods require a green screen background or a manually created trimap to produce a good matte. Automatic, trimap-free methods are appearing, ...

Introduction
  • Imagine being able to create a matte (the per-pixel foreground color and alpha) of a person by taking photos or videos in an everyday setting with just a handheld smartphone; a sketch of how such a matte is used for compositing follows this list.
  • Taking one extra photo in the moment requires a small amount of foresight, but the effort is tiny compared to creating a trimap after the fact
  • This advantage is even greater for video input.
  • Most existing matting methods require a green screen background or a manually created trimap to produce a good matte.
  • In the trimap-free approach, the authors ask the user to take an additional photo of the background without the subject at the time of capture
  • This step requires a small amount of foresight but is far less time-consuming than creating a trimap.
  • The authors demonstrate results on a wide variety of photos and videos and show significant improvement over the state of the art
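  • A minimal sketch of how such a matte is used for compositing once predicted (numpy; predict_matte is a hypothetical stand-in for the paper's network, not its actual API):

    import numpy as np

    def composite_over(alpha, fg, new_bg):
        # Standard compositing: I = alpha * F + (1 - alpha) * B.
        # alpha:  H x W x 1 matte in [0, 1]
        # fg:     H x W x 3 predicted foreground color
        # new_bg: H x W x 3 replacement background
        return alpha * fg + (1.0 - alpha) * new_bg

    # Hypothetical usage; predict_matte would take the photo with the subject
    # plus the clean background shot and return the matte.
    # alpha, fg = predict_matte(image_with_subject, background_only_photo)
    # new_image = composite_over(alpha, fg, some_other_background)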
Highlights
  • Imagine being able to create a matte (the per-pixel foreground color and alpha) of a person by taking photos or videos in an everyday setting with just a handheld smartphone
  • We propose taking an additional photo of the background just before or after the subject is in frame, and using this photo to perform background matting
  • Person matting without big camera movement in front of a static background is, we argue, a very useful and not uncommon scenario, and we deliver state-of-the-art results under these circumstances
  • We compare results across 220 synthetic composites from the Adobe Dataset [36] (11 held-out mattes of human subjects composited over 20 random backgrounds) in Table 1; a sketch of the error metrics follows this list
  • The fixed-camera setup consisted of an inexpensive selfie stick tripod
  • A key challenge is the absence of real ground truth data for the background matting problem
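  • Table 1 reports alpha matte error as SAD and MSE; a minimal sketch of how such errors are typically computed (the exact scaling behind the reported numbers is an assumption here, not taken from the source):

    import numpy as np

    def alpha_matte_errors(pred_alpha, gt_alpha):
        # Sum of absolute differences (SAD) and mean squared error (MSE)
        # between predicted and ground-truth alpha mattes, both in [0, 1].
        # Published numbers are often rescaled (e.g. SAD in units of 1k pixels);
        # that convention is assumed rather than specified here.
        diff = pred_alpha.astype(np.float64) - gt_alpha.astype(np.float64)
        return np.abs(diff).sum(), np.square(diff).mean()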
Results
  • Results on the Synthetic-Composite Adobe Dataset:

     The authors train GAdobe on 26.9k exemplars (269 objects composited over 100 random backgrounds), plus perturbed versions of the backgrounds as input to the network; see the data-generation sketch after this list.
  • The authors omitted LFM from this comparison, as its released model was trained on additional data beyond the Adobe dataset's training set.
  • That said, LFM produces SAD and MSE errors of 2.00 and 1.08e−2, respectively, while the authors' method achieves 1.72 and 0.97e−2.
  • The authors update the weights of the discriminator D after 5 successive updates of GReal.
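  • A minimal sketch of how such synthetic composites and perturbed backgrounds might be generated (the particular perturbations, a small shift and a brightness change, are illustrative assumptions, not the paper's exact augmentation):

    import numpy as np

    def make_training_example(fg, alpha, bg, rng):
        # fg, bg: H x W x 3 float images in [0, 1]; alpha: H x W x 1 matte.
        # Composite the foreground over a random background (so ground truth is
        # known), and perturb the background to mimic the slightly misaligned,
        # re-exposed background photo the network sees at test time.
        composite = alpha * fg + (1.0 - alpha) * bg

        dx, dy = rng.integers(-5, 6, size=2)              # assumed small shift
        perturbed_bg = np.roll(bg, shift=(dy, dx), axis=(0, 1))
        perturbed_bg = np.clip(perturbed_bg * rng.uniform(0.9, 1.1), 0.0, 1.0)

        return composite, alpha, perturbed_bg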
Conclusion
  • The authors have proposed a background matting technique that enables casual capture of high quality foreground+alpha mattes in natural settings.
  • The authors' method requires the photographer to take one shot with the subject in frame and one without, moving the camera as little as possible between shots.
  • This approach avoids using a green screen or painstakingly constructing a detailed trimap as typically needed for high matting quality.
  • The authors have developed a deep learning framework trained on synthetic-composite data and adapted to real data using an adversarial network; a minimal sketch of the alternating generator/discriminator update follows this list.
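  • The schedule noted in the results (the discriminator D stepped once per 5 successive updates of GReal) is a standard alternating GAN loop; a minimal, self-contained sketch with toy networks, an LSGAN-style loss, and random placeholder data, none of which is the paper's actual training code:

    import torch
    from torch import nn

    G_real = nn.Conv2d(3, 4, kernel_size=1)   # toy generator: RGB -> (alpha, F)
    D = nn.Conv2d(4, 1, kernel_size=1)        # toy discriminator
    opt_g = torch.optim.Adam(G_real.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

    D_UPDATE_EVERY = 5   # update D once per 5 successive updates of G_real

    for step in range(20):
        x = torch.rand(2, 3, 32, 32)           # placeholder input batch
        fake = G_real(x)

        # Generator step: push D to score the generated output as real.
        opt_g.zero_grad()
        g_loss = ((D(fake) - 1.0) ** 2).mean()
        g_loss.backward()
        opt_g.step()

        # Discriminator step, only after every D_UPDATE_EVERY generator updates.
        if (step + 1) % D_UPDATE_EVERY == 0:
            real = torch.rand(2, 4, 32, 32)    # placeholder "real" examples
            opt_d.zero_grad()
            d_loss = ((D(real) - 1.0) ** 2).mean() + (D(fake.detach()) ** 2).mean()
            d_loss.backward()
            opt_d.step()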
Summary
  • Objectives:

    The goal of the work is to fully eliminate the need for manually created trimaps.
Tables
  • Table1: Alpha matte error on Adobe Dataset (lower is better)
  • Table2: User study on 10 real world videos (fixed camera)
  • Table3: User study on 10 real world videos (handheld)
  • Table4: User Study: Ours-Real vs Ours-Adobe
Related work
  • Matting is a standard technique used in photo editing and visual effects. In an uncontrolled setting, this is known as the “natural image matting” problem; pulling the matte requires solving for seven unknowns per pixel (F, B, α) and is typically solved with the aid of a trimap. In a studio, the subject is photographed in front of a uniformly lit, constant-colored background (e.g., a green screen); reasonable results are attainable if the subject avoids wearing colors that are similar to the background. We take a middle ground in our work: we casually shoot the subject in a natural (non-studio) setting, but include an image of the background without the subject to make the matting problem more tractable. In this section, we discuss related work on natural image matting, captured without unusual hardware.
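
    For reference, the count of unknowns comes from the standard compositing model: each pixel's observed color I mixes an unknown foreground F and an unknown background B by an unknown alpha, giving three equations (one per color channel) in seven unknowns:

        I_c = \alpha\, F_c + (1 - \alpha)\, B_c, \qquad c \in \{R, G, B\}

    i.e., F_R, F_G, F_B, B_R, B_G, B_B, and α per pixel; a trimap, a green screen, or a known background photo supplies the extra constraints that make this system tractable.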

    Traditional approaches. Traditional (non-learning based) matting approaches generally require a trimap as input. They can be roughly categorized into sampling-based techniques and propagation-based techniques. Sampling-based methods [11, 9, 14, 29, 33, 34, 2] use sampling to build the color statistics of the known foreground and background, and then solve for the matte in the ‘unknown’ region. Propagation-based approaches [6, 18, 20, 21, 31, 13, 15] aim to propagate the alpha matte from the foreground and the background region into the ‘unknown’ region to solve the matting equation. Wang and Cohen [35] present a nice survey of many different matting techniques.
Funding
  • This work was supported by NSF/Intel Visual and Experimental Computing Award #1538618 and the UW Reality Lab
Reference
  • Yagiz Aksoy, Tae-Hyun Oh, Sylvain Paris, Marc Pollefeys, and Wojciech Matusik. Semantic soft segmentation. ACM Transactions on Graphics (TOG), 37(4):72, 2018. 3
  • Yagiz Aksoy, Tunc Ozan Aydin, and Marc Pollefeys. Designing effective inter-pixel information flow for natural image matting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 29–37, 2017. 2
  • Shaofan Cai, Xiaoshuai Zhang, Haoqiang Fan, Haibin Huang, Jiangyu Liu, Jiaming Liu, Jiaying Liu, Jue Wang, and Jian Sun. Disentangled image matting. International Conference on Computer Vision (ICCV), 2019. 2
  • Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, 2018. 4, 6, 12
  • Quan Chen, Tiezheng Ge, Yanyu Xu, Zhiqiang Zhang, Xinxin Yang, and Kun Gai. Semantic human matting. In 2018 ACM Multimedia Conference on Multimedia Conference, pages 618–626. ACM, 2018. 3
  • Qifeng Chen, Dingzeyu Li, and Chi-Keung Tang. Knn matting. IEEE transactions on pattern analysis and machine intelligence, 35(9):2175–2188, 2013. 2
  • Donghyeon Cho, Yu-Wing Tai, and Inso Kweon. Natural image matting using deep convolutional neural networks. In European Conference on Computer Vision, pages 626–643. Springer, 2016. 2
  • Yung-Yu Chuang, Aseem Agarwala, Brian Curless, David H Salesin, and Richard Szeliski. Video matting of complex scenes. In ACM Transactions on Graphics (ToG), volume 21, pages 243–248. ACM, 2002. 3, 5
  • Yung-Yu Chuang, Brian Curless, David H Salesin, and Richard Szeliski. A bayesian approach to digital matting. In CVPR (2), pages 264–271, 2001. 2, 3, 5
  • Ahmed Elgammal, David Harwood, and Larry Davis. Nonparametric model for background subtraction. In European conference on computer vision, pages 751–767. Springer, 2000. 3
  • Eduardo SL Gastal and Manuel M Oliveira. Shared sampling for real-time alpha matting. In Computer Graphics Forum, volume 29, pages 575–584. Wiley Online Library, 2010. 2
  • Minglun Gong and Yee-Hong Yang. Near-real-time image matting with known background. In 2009 Canadian Conference on Computer and Robot Vision, pages 81–87. IEEE, 2009. 3, 5
  • Leo Grady, Thomas Schiwietz, Shmuel Aharon, and Rudiger Westermann. Random walks for interactive alpha-matting. In Proceedings of VIIP, volume 2005, pages 423–429, 2005. 2
  • Kaiming He, Christoph Rhemann, Carsten Rother, Xiaoou Tang, and Jian Sun. A global sampling method for alpha matting. In CVPR 2011, pages 2049–2056. IEEE, 2011. 2
  • Kaiming He, Jian Sun, and Xiaoou Tang. Fast matting using large kernel matting laplacian matrices. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 2165–2172. IEEE, 2010. 2
  • Qiqi Hou and Feng Liu. Context-aware image matting for simultaneous foreground and alpha estimation. International Conference on Computer Vision (ICCV), 2019. 2, 5
  • Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1125–1134, 2017.
  • Philip Lee and Ying Wu. Nonlocal matting. In CVPR 2011, pages 2193–2200. IEEE, 2011. 2
  • Sun-Young Lee, Jong-Chul Yoon, and In-Kwon Lee. Temporally coherent video matting. Graphical Models, 72(3):25–33, 2010. 3
  • Anat Levin, Dani Lischinski, and Yair Weiss. A closed-form solution to natural image matting. IEEE transactions on pattern analysis and machine intelligence, 30(2):228–242, 2007. 2
  • Anat Levin, Alex Rav-Acha, and Dani Lischinski. Spectral matting. IEEE transactions on pattern analysis and machine intelligence, 30(10):1699–1712, 2008. 2
  • Hao Lu, Yutong Dai, Chunhua Shen, and Songcen Xu. Indices matter: Learning to index for deep image matting. International Conference on Computer Vision (ICCV), 2019. 2, 5
  • Sebastian Lutz, Konstantinos Amplianitis, and Aljosa Smolic. Alphagan: Generative adversarial networks for natural image matting. arXiv preprint arXiv:1807.10088, 2018. 2
  • Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2794–2802, 2017. 5
  • Massimo Piccardi. Background subtraction techniques: a review. In 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No. 04CH37583), volume 4, pages 3099–3104. IEEE, 2004. 3
  • Richard J Qian and M Ibrahim Sezan. Video background replacement without a blue screen. In Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348), volume 4, pages 143–146. IEEE, 1999. 3
  • Soumyadip Sengupta, Angjoo Kanazawa, Carlos D. Castillo, and David W. Jacobs. SfSNet: Learning shape, reflectance and illuminance of faces in the wild. In Computer Vision and Pattern Recognition (CVPR), 2018. 5
  • Ehsan Shahrian, Brian Price, Scott Cohen, and Deepu Rajan. Temporally coherent and spatially accurate video matting. In Computer Graphics Forum, volume 33, pages 381–390. Wiley Online Library, 2014. 3
  • Ehsan Shahrian, Deepu Rajan, Brian Price, and Scott Cohen. Improving image matting using comprehensive sampling sets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 636–643, 2013. 2
  • Xiaoyong Shen, Xin Tao, Hongyun Gao, Chao Zhou, and Jiaya Jia. Deep automatic portrait matting. In European Conference on Computer Vision, pages 92–107. Springer, 2016. 2, 3
  • Jian Sun, Jiaya Jia, Chi-Keung Tang, and Heung-Yeung Shum. Poisson matting. In ACM Transactions on Graphics (ToG), volume 23, pages 315–321. ACM, 2004. 2, 3, 5
  • Jingwei Tang, Yagiz Aksoy, Cengiz Oztireli, Markus Gross, and Tunc Ozan Aydin. Learning-based sampling for natural image matting. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019. 2
  • Jue Wang and Michael F Cohen. An iterative optimization approach for unified image segmentation and matting. In Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1, volume 2, pages 936–943. Citeseer, 2005. 2
  • Jue Wang and Michael F Cohen. Optimized color sampling for robust matting. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2007. 2
  • Jue Wang, Michael F Cohen, et al. Image and video matting: a survey. Foundations and Trends® in Computer Graphics and Vision, 3(2):97–175, 2008. 2
  • Ning Xu, Brian Price, Scott Cohen, and Thomas Huang. Deep image matting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2970–2979, 2017. 2, 3, 4, 5, 6, 14
  • Yunke Zhang, Lixue Gong, Lubin Fan, Peiran Ren, Qixing Huang, Hujun Bao, and Weiwei Xu. A late fusion CNN for digital matting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7469–7478, 2019. 3, 4, 5
  • Bingke Zhu, Yingying Chen, Jinqiao Wang, Si Liu, Bo Zhang, and Ming Tang. Fast deep matting for portrait animation on mobile phone. In Proceedings of the 25th ACM international conference on Multimedia, pages 297–305. ACM, 2017. 3
  • Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 2223–2232, 2017. 4, 7, 11