Quality Guided Sketch-to-Photo Image Synthesis

CVPR Workshops, pp. 3575-3584, 2020.


Abstract:

Facial sketches drawn by artists are widely used for visual identification applications, mostly by law enforcement agencies, but the quality of these sketches depends on the artist's ability to clearly replicate all the key facial features that could aid in capturing the true identity of a subject. Recent works have attempted to …

Introduction
  • Facial sketches drawn by forensic artists, who aim to replicate faces described verbally, are widely used by law enforcement agencies; these sketches are meant to capture the true features of the individual of interest.
  • Sketches can be viewed as images containing minimal pixel information bounded by edges, which can be translated into photo-realistic images with significant features and rich pixel content [5].
  • Edge information from such sketches may carry key structural information that aids in producing a high-quality visual rendition, which is crucial when judging whether an image is valuable or not.
  • Obtaining rich pixel content with perceptual quality and discriminative information remains a daunting task, especially for models that must go from bare edge strokes to photo-realistic images
Highlights
  • Facial sketches drawn by forensic artists, who aim to replicate faces described verbally, are widely used by law enforcement agencies; these sketches are meant to capture the true features of the individual of interest
  • Sketches can be viewed as images containing minimal pixel information bounded by edges, which can be translated into photo-realistic images with significant features and rich pixel content [5]
  • We develop a single hybrid discriminator that both distinguishes real from synthesized photos and predicts the set of desired attributes (see the sketch after this list)
  • The gallery comprises the WVU Multi-Modal, CUHK, FERET, IIIT-D and CelebA-HQ datasets
  • We proposed a novel sketch-to-image translation model using a hybrid discriminator and a multi-stage generator
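The excerpt above does not include the paper's architecture details, so the following is a minimal, hypothetical PyTorch sketch of what a two-headed hybrid discriminator could look like: a shared convolutional trunk feeding one adversarial (real/fake) head and one attribute-prediction head. The layer sizes, the attribute count `n_attrs`, and the 64x64 input resolution are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical hybrid discriminator: shared trunk, two heads.
# All sizes are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class HybridDiscriminator(nn.Module):
    def __init__(self, n_attrs: int = 5):
        super().__init__()
        # Shared trunk: downsample a 3x64x64 image to a 256x8x8 feature map.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.flatten = nn.Flatten()
        # Head 1: a single real/fake logit (adversarial task).
        self.adv_head = nn.Linear(256 * 8 * 8, 1)
        # Head 2: one logit per facial attribute (auxiliary task).
        self.attr_head = nn.Linear(256 * 8 * 8, n_attrs)

    def forward(self, x):
        h = self.flatten(self.trunk(x))
        return self.adv_head(h), self.attr_head(h)

# Usage: both heads would be trained jointly, e.g. with BCEWithLogitsLoss.
d = HybridDiscriminator(n_attrs=5)
real_fake_logit, attr_logits = d(torch.randn(2, 3, 64, 64))
```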
Results
  • To evaluate the performance of the approach, the authors compared synthesized image probes against a gallery of mugshots.
  • The gallery comprises the WVU Multi-Modal, CUHK, FERET, IIIT-D and CelebA-HQ datasets
  • The purpose of this experiment is to assess the verification performance of the proposed method with a relatively large number of subject candidates; Figures 6, 7, 8 and 9 show CMC curves for the CelebA, CUHK and IIIT-D datasets (a sketch of how a CMC curve is computed follows this list).
  • To evaluate the quality of the synthesized images, the authors used the DeepFace face verifier [34], pre-trained on a VGG-based network [33].
  • A protocol similar to that of [19] was used
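For readers unfamiliar with CMC evaluation, here is a small illustrative sketch of how a cumulative match characteristic curve can be computed from probe-gallery similarity scores. The random embeddings stand in for face-verifier features (e.g., DeepFace); this is not the paper's exact evaluation code.

```python
# Illustrative CMC computation: each synthesized probe is compared against
# every gallery mugshot, and rank-k accuracy counts how often the true
# identity appears among the top-k matches. Random vectors are stand-ins
# for real face-verifier embeddings.
import numpy as np

def cmc_curve(probe_feats, gallery_feats, probe_ids, gallery_ids, max_rank=20):
    # Cosine similarity between every probe and every gallery image.
    p = probe_feats / np.linalg.norm(probe_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sim = p @ g.T                                  # (n_probes, n_gallery)
    # Sort gallery identities by descending similarity for each probe.
    order = np.argsort(-sim, axis=1)
    ranked_ids = gallery_ids[order]
    # hits[i, k] is True if probe i's true identity sits at rank k.
    hits = ranked_ids[:, :max_rank] == probe_ids[:, None]
    # Cumulative: probe matched at rank <= k, averaged over probes.
    return hits.cumsum(axis=1).clip(max=1).mean(axis=0)

rng = np.random.default_rng(0)
gallery_ids = np.arange(100)
probe_ids = rng.choice(gallery_ids, size=30)
cmc = cmc_curve(rng.normal(size=(30, 128)), rng.normal(size=(100, 128)),
                probe_ids, gallery_ids)
print("rank-1 accuracy:", cmc[0])
```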
Conclusion
  • The authors proposed a novel sketch-to-image translation model using a hybrid discriminator and a multi-stage generator.
  • The authors' model shows that the perceptual appeal of sketches can be achieved by reconfiguring the generator and discriminator processes.
  • Breaking the functionality of the network down into smaller subsets helped to improve the training process and led to better results in shorter training periods (see the sketch after this list).
  • The authors' verification results suggest that sketch-to-image translation would find many applications in industry
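As a rough illustration of the multi-stage idea mentioned above, the following is a hedged PyTorch sketch in which generation is broken into a coarse sketch-to-image stage followed by a refinement stage. The tiny sub-networks are placeholders for illustration only, not the paper's generator.

```python
# Hypothetical multi-stage generator: a coarse stage maps the sketch to a
# first image, and a second stage refines it conditioned on the sketch.
# Both sub-networks are toy placeholders, not the paper's architecture.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

class MultiStageGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # Stage 1: map a 1-channel sketch to a coarse 3-channel image.
        self.coarse = nn.Sequential(conv_block(1, 32), conv_block(32, 32),
                                    nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())
        # Stage 2: refine the coarse image, conditioned on the sketch.
        self.refine = nn.Sequential(conv_block(4, 32), conv_block(32, 32),
                                    nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

    def forward(self, sketch):
        coarse = self.coarse(sketch)
        # Residual refinement keeps the coarse result as a starting point.
        residual = self.refine(torch.cat([coarse, sketch], dim=1))
        return coarse, (coarse + residual).clamp(-1.0, 1.0)

g = MultiStageGenerator()
coarse_img, fine_img = g(torch.randn(1, 1, 64, 64))
```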
Tables
  • Table1: A quantitative comparison of the GAN-metric performance for Pix2Pix, cCycle-GAN, C-GAN, HFs2P and ours. Our proposed approach shows an improvement overall
  • Table2: A quantitative comparison of the GAN-metric performance for BP-GAN, CA-GAN, SCA-GAN, C-GAN and ours. Our proposed approach shows an improvement overall
  • Table3: A description of the ablation study conducted on the sketch-photo-synthesizer network. The key components of the framework were altered in turn to identify their respective impact on the GAN metrics (i.e., FID, SSIM and IS); an example of computing SSIM follows this list
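As an example of one of the reported metrics, the snippet below computes SSIM between an image pair using scikit-image. FID and IS require a pretrained Inception network and a large sample of images, so only SSIM is shown here; the arrays are placeholders for real/synthesized photo pairs.

```python
# Computing SSIM (one of the table metrics) with scikit-image.
# The random arrays below are stand-ins for an actual image pair.
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
real = rng.random((256, 256, 3))                   # placeholder real photo
fake = (real + 0.05 * rng.normal(size=real.shape)).clip(0.0, 1.0)

# channel_axis tells skimage the images are RGB; data_range is the span
# of valid pixel values (1.0 for float images in [0, 1]).
score = structural_similarity(real, fake, channel_axis=2, data_range=1.0)
print(f"SSIM: {score:.3f}")  # closer to 1.0 means more similar
```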
Related work
  • Generative Adversarial Networks. Adversarial networks [11] have shown great promise in image generation. The main idea behind GANs is an adaptive loss function that improves simultaneously with the generator model: the loss is formalized by a trainable discriminator that aims to distinguish between real and generated samples, and during training the generator learns to produce ever more realistic samples in order to fool the discriminator (a minimal sketch of this loop is given below). Recent works showcase the inclusion of conditional constraints [42, 39] while training GANs on images, text, videos and 3D objects, as well as combinations of these. [Figure: (a) generator, (b) discriminator.] Despite the proven capacity of GANs for learning data distributions, they suffer from two major issues, namely unstable training and mode collapse.
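As a concrete illustration of this adversarial loop (not tied to this paper's model), here is a minimal PyTorch sketch on toy 1-D data: the discriminator D is trained to separate real from generated samples while the generator G is trained to fool it.

```python
# Minimal GAN training loop on toy 1-D data, for illustration only.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 2.0      # "real" data: N(2, 0.5)
    fake = G(torch.randn(64, 8))               # generated samples

    # Discriminator step: push real toward 1, fake toward 0.
    # fake is detached so only D's parameters receive gradients here.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make D label fakes as real
    # (the standard non-saturating generator loss).
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()
```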
References
  • [1] Multimodal dataset biometric dataset collection, BIOMDATA. https://biic.wvu.edu/data-sets/multimodal-dataset. Accessed: 2020-03-02.
  • [2] Himanshu S. Bhatt, Samarth Bharadwaj, Richa Singh, and Mayank Vatsa. On matching sketches with digital face images. In 2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), pages 1–7, 2010.
  • [3] Yang Cao, Changhu Wang, Liqing Zhang, and Lei Zhang. Edgel index for large-scale sketch-based image search. pages 761–768, July 2011.
  • [4] Wentao Chao, Liang Chang, Xuguang Wang, Jian Cheng, Xiaoming Deng, and Fuqing Duan. High-fidelity face sketch-to-photo synthesis using generative adversarial network. In 2019 IEEE International Conference on Image Processing (ICIP), pages 4699–4703, 2019.
  • [5] Wengling Chen. SketchyGAN: Towards diverse and realistic sketch to image synthesis. pages 9416–9425, June 2018.
  • [6] A. Dabouei, H. Kazemi, S. M. Iranmanesh, J. Dawson, and N. M. Nasrabadi. Fingerprint distortion rectification using deep convolutional neural networks. In 2018 International Conference on Biometrics (ICB), pages 1–8, February 2018.
  • [7] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, September 2014.
  • [8] Mathias Eitz, James Hays, and Marc Alexa. How do humans sketch objects? ACM Transactions on Graphics, 31(4):44:1–44:10, July 2012.
  • [9] Yuke Fang, Weihong Deng, Junping Du, and Jiani Hu. Identity-aware CycleGAN for face photo-sketch synthesis and recognition. Pattern Recognition, 102:107249, 2020.
  • [10] Leon Gatys, Alexander Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. pages 2414–2423, June 2016.
  • [11] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.
  • [12] Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), 2:1735–1742, 2006.
  • [13] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pages 6629–6640. Curran Associates Inc., 2017.
  • [14] Mingming Hu and Jingtao Guo. Facial attribute-controlled sketch-to-image translation with generative adversarial networks. EURASIP Journal on Image and Video Processing, 2020:1–13, 2020.
  • [15] Xun Huang, Ming-Yu Liu, Serge Belongie, and Jan Kautz. Multimodal unsupervised image-to-image translation. arXiv preprint arXiv:1804.04732, 2018.
  • [16] P. Isola, J. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5967–5976, July 2017.
  • [17] Justin Johnson, Alexandre Alahi, and Fei-Fei Li. Perceptual losses for real-time style transfer and super-resolution. CoRR, abs/1603.08155, 2016.
  • [18] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. arXiv, abs/1710.10196, 2017.
  • [19] H. Kazemi, M. Iranmanesh, A. Dabouei, S. Soleymani, and N. M. Nasrabadi. Facial attributes guided deep sketch-to-photo synthesis. In 2018 IEEE Winter Applications of Computer Vision Workshops (WACVW), pages 1–8, March 2018.
  • [20] Diederik P. Kingma, Tim Salimans, and Max Welling. Improved variational inference with inverse autoregressive flow. arXiv, abs/1606.04934, 2017.
  • [21] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
  • [22] B. Klare, Z. Li, and A. K. Jain. Matching forensic sketches to mug shot photos. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(3):639–646, 2011.
  • [23] Pierre-Yves Laffont, Zhile Ren, Xiaofeng Tao, Chao Qian, and James Hays. Transient attributes for high-level understanding and editing of outdoor scenes. ACM Transactions on Graphics, 33:149:1–149:11, July 2014.
  • [24] Cheng-Han Lee, Ziwei Liu, Lingyun Wu, and Ping Luo. MaskGAN: Towards diverse and interactive facial image manipulation. arXiv preprint arXiv:1907.11922, 2019.
  • [25] Wei Li, Rui Zhao, and Xiaogang Wang. Human re-identification with transferred metric learning. In ACCV, 2012.
  • [26] Jianxin Lin, Yingce Xia, Tao Qin, Zhibo Chen, and Tie-Yan Liu. Conditional image-to-image translation. pages 5524–5532, June 2018.
  • [27] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of the International Conference on Computer Vision (ICCV), December 2015.
  • [28] Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759, 2016.
  • [29] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. GauGAN: Semantic image synthesis with spatially adaptive normalization. pages 1–1, July 2019.
  • [30] P. Jonathon Phillips, Harry Wechsler, Jeffrey Huang, and Patrick J. Rauss. The FERET database and evaluation procedure for face-recognition algorithms. Image and Vision Computing, 16:295–306, 1998.
  • [31] Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, and James Hays. Scribbler: Controlling deep image synthesis with sketch and color. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • [32] Yujun Shen, Bolei Zhou, Ping Luo, and Xiaoou Tang. FaceFeat-GAN: A two-stage approach for identity-preserving face synthesis. CoRR, abs/1812.01288, 2018.
  • [33] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
  • [34] Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, and Lior Wolf. DeepFace: Closing the gap to human-level performance in face verification. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 1701–1708, 2014.
  • [35] V. Talreja, S. Soleymani, M. C. Valenti, and N. M. Nasrabadi. Learning to authenticate with deep multibiometric hashing and neural network decoding. In ICC 2019 - 2019 IEEE International Conference on Communications (ICC), pages 1–7, May 2019.
  • [36] Veeru Talreja, Fariborz Taherkhani, Matthew C. Valenti, and Nasser M. Nasrabadi. Using deep cross modal hashing and error correcting codes for improving the efficiency of attribute guided facial image retrieval. In 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 564–568, 2018.
  • [37] Veeru Talreja, Matthew C. Valenti, and Nasser M. Nasrabadi. Multibiometric secure system based on deep learning. In 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 298–302, 2017.
  • [38] Joshua B. Tenenbaum and William T. Freeman. Separating style and content with bilinear models. Neural Computation, 12:1247–1283, 2000.
  • [39] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pages 3630–3638, 2016.
  • [40] Nannan Wang, Wenjin Zha, Jie Li, and Xinbo Gao. Back projection: An effective postprocessing method for GAN-based face sketch synthesis. Pattern Recognition Letters, 107:59–65, 2017.
  • [41] Xiaolong Wang and Abhinav Gupta. Generative image modeling using style and structure adversarial networks. volume 9908, pages 318–335, October 2016.
  • [42] Xinchen Yan, Jimei Yang, Kihyuk Sohn, and Honglak Lee. Attribute2Image: Conditional image generation from visual attributes. In European Conference on Computer Vision, pages 776–791, 2016.
  • [43] Sheng You, Ning You, and Minxue Pan. PI-REC: Progressive image reconstruction network with edge and color domain. arXiv preprint arXiv:1903.10146, 2019.
  • [44] Jun Yu, Xingxin Xu, Fei Gao, Shengjie Shi, Meng Wang, Dacheng Tao, and Qingming Huang. Towards realistic face photo-sketch synthesis via composition-aided GANs. 2017.
  • [45] Han Zhang, Tao Xu, and Hongsheng Li. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. pages 5908–5916, October 2017.
  • [46] Richard Zhang, Phillip Isola, and Alexei Efros. Colorful image colorization. volume 9907, pages 649–666, October 2016.
  • [47] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, 2017.