
Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal

Computer Vision and Pattern Recognition (CVPR), pp. 1788–1797, 2018


Abstract

Understanding shadows from a single image consists of two types of tasks in previous studies: shadow detection and shadow removal. In this paper, we present a multi-task perspective, which is not embraced by any existing work, to jointly learn both detection and removal in an end-to-end fashion that aims at enjoying the mutually…

Introduction
  • Both shadow detection and shadow removal reveal their respective advantages for scene understanding.
  • In the history of shadow detection, a series of data-driven statistical learning approaches [15, 26, 50, 59, 22, 49] have been proposed
  • Their main objective is to find the shadow regions, in the form of an image mask that separates shadow and non-shadow areas
Highlights
  • Both shadow detection and shadow removal reveal their respective advantages for scene understanding
  • Since no existing approaches have explored the joint learning aspect of these two tasks, in this work, we propose a STacked Conditional Generative Adversarial Network (ST-CGAN) framework and aim to tackle shadow detection and shadow removal problems simultaneously in an end-to-end fashion
  • We propose STacked Conditional Generative Adversarial Networks (ST-CGAN), a novel stacked architecture
  • We have proposed STacked Conditional Generative Adversarial Networks (ST-CGAN) to jointly learn shadow detection and shadow removal
Methods
  • Compared Methods and Metrics

    For the detection part, the authors compare ST-CGAN with the state-of-the-art StackedCNN [52], cGAN [36] and scGAN [36].
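The detection comparisons in Tables 4 and 5 use the Balanced Error Rate (BER); as a concrete reading of that metric, here is a minimal sketch of its common definition (the function name and toy masks are illustrative, not the authors' code):

```python
import numpy as np

def balanced_error_rate(pred, gt):
    """BER between binary shadow masks, in percent; smaller is better.

    Commonly defined as 100 * (1 - 0.5 * (TP/Np + TN/Nn)), i.e. one minus
    the average of shadow recall and non-shadow recall.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()      # shadow pixels correctly found
    tn = np.logical_and(~pred, ~gt).sum()    # non-shadow pixels correctly kept
    n_pos = gt.sum()                         # total shadow pixels
    n_neg = (~gt).sum()                      # total non-shadow pixels
    return 100.0 * (1.0 - 0.5 * (tp / n_pos + tn / n_neg))

gt   = np.array([[1, 1, 0, 0]])
pred = np.array([[1, 0, 0, 0]])   # one shadow pixel missed
# shadow recall 0.5, non-shadow recall 1.0 -> BER = 100*(1-0.75) = 25.0
print(balanced_error_rate(pred, gt))
```

Because both classes are weighted equally, BER is not dominated by the (usually much larger) non-shadow area, which is why it is preferred over plain pixel accuracy in shadow detection benchmarks.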
Conclusion
  • The authors' framework has at least four unique advantages: 1) it is the first end-to-end approach that tackles shadow detection and shadow removal simultaneously; 2) the authors design a novel stacked mode, which densely connects all the tasks for the purpose of multi-task learning, proves its effectiveness, and suggests future extensions to other types of multiple tasks; 3) the stacked adversarial components are able to preserve the global scene characteristics hierarchically, which leads to a fine-grained and natural recovery of shadow-free images; 4) ST-CGAN consistently improves the overall performance on both the detection and removal of shadows.
  • As an additional contribution, the authors publicly release the first large-scale dataset which contains shadow, shadow mask and shadow-free image triplets
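The stacked design described above can be illustrated at the level of tensor shapes: the first generator maps the RGB image to a shadow mask, the second consumes the image concatenated with that mask, and each discriminator sees a progressively larger stack. A shape-only sketch with stub functions (not the paper's actual networks; the 3- and 4-channel generator inputs follow Table 2, and the discriminator stacks are an assumption based on the conclusion's description):

```python
import numpy as np

H, W = 256, 256

def g1(image):
    """Stub for the detection generator: RGB image -> 1-channel shadow mask."""
    return np.zeros((H, W, 1))

def g2(image, mask):
    """Stub for the removal generator: RGB image + mask -> shadow-free RGB."""
    x = np.concatenate([image, mask], axis=-1)   # 4 input channels, per Table 2
    assert x.shape[-1] == 4
    return np.zeros((H, W, 3))

image = np.zeros((H, W, 3))
mask = g1(image)                    # G1's output feeds G2 ...
shadow_free = g2(image, mask)

# ... and each adversarial component conditions on everything produced so far:
d1_input = np.concatenate([image, mask], axis=-1)               # 4 channels
d2_input = np.concatenate([image, mask, shadow_free], axis=-1)  # 7 channels
print(d1_input.shape[-1], d2_input.shape[-1])
```

The point of the stacking, as opposed to parallel multi-branch outputs, is that each later task consumes the earlier task's prediction, so errors and context propagate through the whole pipeline during joint training.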
Tables
  • Table1: Comparisons with other popular shadow-related datasets. Ours is unique in content and type, whilst being of the same order of magnitude in size as the largest datasets
  • Table2: The architecture for generator G1/G2 of ST-CGAN. Cvi means a classic convolutional layer whilst CvTi stands for a transposed convolutional layer that upsamples a feature map. Cv4 (×3) indicates that the block of Cv4 is replicated an additional two times, three in total. “#C in” and “#C out” denote the number of input and output channels respectively. “before” shows the immediate layer before a block and “after” gives the directly subsequent one. “link” explains the specific connections in the U-Net architecture [44], in which → gives the direction of connectivity, i.e., Cv0 → CvT11 bridges the output of Cv0 concatenated to the input of CvT11. LReLU is short for Leaky ReLU activation [31] and BN is an abbreviation of Batch Normalization [17]
  • Table3: The architectures for discriminator D1/D2 of ST-CGAN. Annotations are kept the same with Table 2
  • Table4: Detection with quantitative results using BER, smaller is better. For our proposed architecture, we use image triplets of ISTD training set. These models are tested on three datasets. The best and second best results are marked in red and blue colors, respectively
  • Table5: Detection with quantitative results using BER, smaller is better. For our proposed architecture, we use image pairs of SBU training set together with their roughly generated shadow-free images by Guo et al [12] to form image triplets for training. The best and second best results are marked in red and blue colors, respectively
  • Table6: Removal with quantitative results using RMSE, smaller is better. The original difference between the shadow and shadow-free images is reported in the third column. We perform multi-task training on ISTD and compare it with three state-of-the-art methods. The best and second best results are marked in red and blue colors, respectively
  • Table7: Component analysis of ST-CGAN on ISTD by using RMSE for removal and BER for detection, smaller is better. The metrics related to shadow and non-shadow part are also provided. The best and second best results are marked in red and blue colors, respectively
  • Table8: Comparisons between stacked learning (ours) and multibranch learning with removal and detection results on ISTD dataset
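The “link” column described in the Table 2 caption refers to U-Net skip connections such as Cv0 → CvT11, where an encoder feature map is concatenated channel-wise onto a decoder layer's input. A minimal sketch with illustrative shapes (the spatial size and channel counts here are hypothetical, not the actual layer sizes):

```python
import numpy as np

# Hypothetical encoder output (Cv0) and upsampled decoder feature at the
# same spatial resolution, about to enter CvT11.
cv0_out = np.zeros((128, 128, 64))
decoder_in = np.zeros((128, 128, 64))

# Cv0 -> CvT11: channel-wise concatenation, so CvT11 sees both the coarse
# decoder path and the fine-grained encoder features. Channels double.
cvt11_in = np.concatenate([decoder_in, cv0_out], axis=-1)
print(cvt11_in.shape)
```

These skips let the decoder recover fine spatial detail (e.g. shadow boundaries) that pure downsample/upsample paths would blur away.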
Related Work
  • Shadow Detection. To improve the robustness of shadow detection on consumer photographs and web-quality images, a series of data-driven approaches [15, 26, 59] have been adopted and proved effective. Khan et al [22] first introduced deep Convolutional Neural Networks (CNNs) [45] to automatically learn features for shadow regions/boundaries, significantly outperforming the previous state of the art. A multi-kernel model for shadow-region classification was proposed by Vicente et al [49]; it is efficiently optimized based on least-squares SVM leave-one-out estimates. More recent work by Vicente et al [50] used a stacked CNN with separate steps: first generating an image-level shadow prior, then training a patch-based CNN that produces shadow masks for local patches. Nguyen et al [36] presented the first application of adversarial training to shadow detection and developed a novel conditional GAN architecture with a tunable sensitivity parameter.
  • Shadow Removal. Early works are motivated by physical models of illumination and color. For instance, Finlayson et al [5, 7] provide illumination-invariant solutions that work well only on high-quality images. Many existing approaches to shadow removal consist of two steps in general. For the removal part of these two-stage solutions, the shadow is erased either in the gradient domain [6, 35, 2] or in the image-intensity domain [1, 11, 12, 8, 23]. In contrast, a few works [47, 55, 42] recover the shadow-free image by intrinsic image decomposition, precluding the need for shadow prediction, in an end-to-end manner. However, these methods suffer from altering the colors of the non-shadow regions. Qu et al [43] further propose a multi-context architecture, consisting of three levels of embedding networks (global localization, appearance modeling and semantic modeling), to explore shadow removal in an end-to-end and fully automatic framework.
  • CGAN and Stacked GAN. CGANs have achieved impressive results in various image-to-image translation problems, such as image super-resolution [27], image inpainting [41], style transfer [28] and domain adaptation/transfer [18, 60, 30]. The key to CGANs is the introduction of an adversarial loss with an informative conditioning variable, which forces the generated images to be of high quality and indistinguishable from real images. Besides, recent research has proposed GAN variants that mainly explore stacked schemes. Zhang et al [57] first put forward StackGAN to progressively produce photo-realistic image synthesis at considerably high resolution. Huang et al [16] design a top-down stack of GANs, each learned to generate lower-level representations conditioned on higher-level representations, for the purpose of generating higher-quality images. Our proposed stacked form is distinct in essence from all of the relevant versions above.
  • Multi-task Learning. The learning hypothesis is biased to prefer a shared embedding learned across multiple tasks. The widely adopted multi-task architecture is a shared component with multi-branch outputs, one per individual task. For example, Mask R-CNN [13] and MultiNet [48] use three parallel branches for object classification, bounding-box regression and semantic segmentation. Misra et al [34] propose the “cross-stitch” unit to learn shared representations from multiple supervisory tasks. In Multi-task Network Cascades [4], all tasks share convolutional features, while each later task also depends on the output of a preceding one.
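For reference, the conditional adversarial loss that the CGAN work above builds on is, in the standard formulation of Mirza and Osindero [33] and Isola et al [18] (quoted from the general literature, not from this paper):

```latex
\mathcal{L}_{\text{cGAN}}(G, D) =
  \mathbb{E}_{x,y}\left[\log D(x, y)\right] +
  \mathbb{E}_{x}\left[\log\bigl(1 - D(x, G(x))\bigr)\right]
```

Here x is the conditioning input (e.g. the shadow image) and y the real target; G minimizes this objective while D maximizes it, so the discriminator judges the *pair* (input, output) rather than the output alone.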
Funding
  • This work was supported by the National Science Fund of China under Grant Nos
References
  • [1] E. Arbel and H. Hel-Or. Shadow removal using intensity surfaces and texture anchor points. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 33(6):1202–1216, 2011.
  • [2] J. T. Barron and J. Malik. Shape, illumination, and reflectance from shading. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 37(8):1670–1687, 2015.
  • [3] R. Cucchiara, C. Grana, M. Piccardi, A. Prati, and S. Sirotti. Improving shadow suppression in moving object detection with HSV color information. In IEEE Intelligent Transportation Systems (ITSC), pages 334–339, 2001.
  • [4] J. Dai, K. He, and J. Sun. Instance-aware semantic segmentation via multi-task network cascades. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
  • [5] G. D. Finlayson, M. S. Drew, and C. Lu. Entropy minimization for shadow removal. International Journal of Computer Vision (IJCV), 85(1):35–57, 2009.
  • [6] G. D. Finlayson, S. D. Hordley, and M. S. Drew. Removing shadows from images. In European Conference on Computer Vision (ECCV), pages 823–836. Springer, 2002.
  • [7] G. D. Finlayson, S. D. Hordley, C. Lu, and M. S. Drew. On the removal of shadows from images. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 28(1):59–68, 2006.
  • [8] H. Gong and D. Cosker. Interactive shadow removal and ground truth for variable scene categories. In British Machine Vision Conference (BMVC). University of Bath, 2014.
  • [9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), pages 2672–2680, 2014.
  • [10] M. Gryka, M. Terry, and G. J. Brostow. Learning to remove soft shadows. ACM Transactions on Graphics (TOG), 34(5):153, 2015.
  • [11] R. Guo, Q. Dai, and D. Hoiem. Single-image shadow detection and removal using paired regions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2033–2040, 2011.
  • [12] R. Guo, Q. Dai, and D. Hoiem. Paired regions for shadow detection and removal. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 35(12):2956–2967, 2013.
  • [13] K. He, G. Gkioxari, P. Dollar, and R. Girshick. Mask R-CNN. arXiv preprint arXiv:1703.06870, 2017.
  • [14] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993, 2016.
  • [15] X. Huang, G. Hua, J. Tumblin, and L. Williams. What characterizes a shadow boundary under the sun and sky? In IEEE International Conference on Computer Vision (ICCV), pages 898–905, 2011.
  • [16] X. Huang, Y. Li, O. Poursaeed, J. Hopcroft, and S. Belongie. Stacked generative adversarial networks. arXiv preprint arXiv:1612.04357, 2016.
  • [17] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), pages 448–456, 2015.
  • [18] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004, 2016.
  • [19] I. N. Junejo and H. Foroosh. Estimating geo-temporal location of stationary cameras using shadow trajectories. In European Conference on Computer Vision (ECCV), pages 318–331. Springer, 2008.
  • [20] K. Karsch, V. Hedau, D. Forsyth, and D. Hoiem. Rendering synthetic objects into legacy photographs. ACM Transactions on Graphics (TOG), 30(6):157, 2011.
  • [22] S. H. Khan, M. Bennamoun, F. Sohel, and R. Togneri. Automatic feature learning for robust shadow detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1939–1946, 2014.
  • [23] S. H. Khan, M. Bennamoun, F. Sohel, and R. Togneri. Automatic shadow detection and removal from a single image. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 38(3):431–446, 2016.
  • [24] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [25] J.-F. Lalonde, A. A. Efros, and S. G. Narasimhan. Estimating natural illumination from a single outdoor image. In IEEE International Conference on Computer Vision (ICCV), pages 183–190, 2009.
  • [26] J.-F. Lalonde, A. A. Efros, and S. G. Narasimhan. Detecting ground shadows in outdoor consumer photographs. In European Conference on Computer Vision (ECCV), pages 322–335. Springer, 2010.
  • [27] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802, 2016.
  • [28] C. Li and M. Wand. Precomputed real-time texture synthesis with Markovian generative adversarial networks. In European Conference on Computer Vision (ECCV), pages 702–716. Springer, 2016.
  • [29] F. Liu and M. Gleicher. Texture-consistent shadow removal. In European Conference on Computer Vision (ECCV), pages 437–450.
  • [30] M.-Y. Liu, T. Breuel, and J. Kautz. Unsupervised image-to-image translation networks. arXiv preprint arXiv:1703.00848, 2017.
  • [31] A. L. Maas, A. Y. Hannun, and A. Y. Ng. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML, volume 30, 2013.
  • [32] I. Mikic, P. C. Cosman, G. T. Kogut, and M. M. Trivedi. Moving shadow and object detection in traffic scenes. In International Conference on Pattern Recognition (ICPR), volume 1, pages 321–324. IEEE, 2000.
  • [33] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
  • [34] I. Misra, A. Shrivastava, A. Gupta, and M. Hebert. Cross-stitch networks for multi-task learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
  • [35] A. Mohan, J. Tumblin, and P. Choudhury. Editing soft shadows in a digital photograph. IEEE Computer Graphics and Applications, 27(2):23–31, 2007.
  • [36] V. Nguyen, T. F. Yago Vicente, M. Zhao, M. Hoai, and D. Samaras. Shadow detection with conditional generative adversarial networks. In IEEE International Conference on Computer Vision (ICCV), pages 4510–4518, 2017.
  • [37] T. Okabe, I. Sato, and Y. Sato. Attached shadow coding: Estimating surface normals from shadows under unknown reflectance and lighting conditions. In IEEE International Conference on Computer Vision (ICCV), pages 1693–1700, 2009.
  • [38] A. Panagopoulos, D. Samaras, and N. Paragios. Robust shadow and illumination estimation using a mixture model. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 651–658, 2009.
  • [39] A. Panagopoulos, C. Wang, D. Samaras, and N. Paragios. Illumination estimation and cast shadow detection through a higher-order graphical model. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 673–680, 2011.
  • [40] A. Panagopoulos, C. Wang, D. Samaras, and N. Paragios. Simultaneous cast shadows, illumination and geometry inference using hypergraphs. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 35(2):437–449, 2013.
  • [41] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2536–2544, 2016.
  • [42] L. Qu, J. Tian, Z. Han, and Y. Tang. Pixel-wise orthogonal decomposition for color illumination invariant and shadow-free image. Optics Express, 23(3):2220–2239, 2015.
  • [43] L. Qu, J. Tian, S. He, Y. Tang, and R. W. Lau. DeshadowNet: A multi-context embedding deep network for shadow removal. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [44] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 234–241. Springer, 2015.
  • [45] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [46] Y. Tai, J. Yang, X. Liu, and C. Xu. MemNet: A persistent memory network for image restoration. In IEEE International Conference on Computer Vision (ICCV), 2017.
  • [47] M. F. Tappen, W. T. Freeman, and E. H. Adelson. Recovering intrinsic images from a single image. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 27(9):1459–1472, 2005.
  • [48] M. Teichmann, M. Weber, M. Zoellner, R. Cipolla, and R. Urtasun. MultiNet: Real-time joint semantic reasoning for autonomous driving. arXiv preprint arXiv:1612.07695, 2016.
  • [49] Y. Vicente, F. Tomas, M. Hoai, and D. Samaras. Leave-one-out kernel optimization for shadow detection. In IEEE International Conference on Computer Vision (ICCV), pages 3388–3396, 2015.
  • [50] Y. Vicente, F. Tomas, M. Hoai, and D. Samaras. Noisy label recovery for shadow detection in unfamiliar domains. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3783–3792, 2016.
  • [51] Y. Vicente, F. Tomas, M. Hoai, and D. Samaras. Leave-one-out kernel optimization for shadow detection and removal. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), PP(99):1–1, 2017.
  • [52] Y. Vicente, F. Tomas, L. Hou, C.-P. Yu, M. Hoai, and D. Samaras. Large-scale training of shadow detectors with noisily-annotated shadow examples. In European Conference on Computer Vision (ECCV), pages 816–832. Springer, 2016.
  • [53] W. Wang, X. Li, J. Yang, and T. Lu. Mixed link networks. arXiv preprint arXiv:1802.01808, 2018.
  • [54] T.-P. Wu, C.-K. Tang, M. S. Brown, and H.-Y. Shum. Natural shadow matting. ACM Transactions on Graphics (TOG), 26(2):8, 2007.
  • [55] Q. Yang, K.-H. Tan, and N. Ahuja. Shadow removal using bilateral filtering. IEEE Transactions on Image Processing (TIP), 21(10):4361–4368, 2012.
  • [56] H. Zhang and V. M. Patel. Density-aware single image de-raining using a multi-stream dense network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • [57] H. Zhang, T. Xu, H. Li, S. Zhang, X. Huang, X. Wang, and D. Metaxas. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. arXiv preprint arXiv:1612.03242, 2016.
  • [58] L. Zhang, Q. Zhang, and C. Xiao. Shadow remover: Image shadow removal based on illumination recovering optimization. IEEE Transactions on Image Processing (TIP), 24(11):4623–4636, 2015.
  • [59] J. Zhu, K. G. Samuel, S. Z. Masood, and M. F. Tappen. Learning to recognize shadows in monochromatic natural images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 223–230, 2010.
  • [60] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593, 2017.