Wavelet Flow: Fast Training of High Resolution Normalizing Flows

NeurIPS 2020

Abstract

Normalizing flows are a class of probabilistic generative models which allow for both fast density computation and efficient sampling and are effective at modelling complex distributions like images. A drawback among current methods is their significant training cost, sometimes requiring months of GPU training time to achieve state-of-the-art results. …

Code: https://yorkucvil.github.io/Wavelet-Flow

Introduction
  • The authors introduce Wavelet Flow, a multi-scale, conditional normalizing flow architecture based on wavelets.
  • Wavelet Flows are not only fast at sampling and computing probability density, but are also efficient to train even with high-resolution data (a schematic of the multi-scale factorization follows this list).
  • Wavelet Flow is applicable to any suitably structured data domain including audio, images, videos, and 3D scans.
  • The authors introduce the first normalizing flow model trained on high-resolution images, i.e., 1024 × 1024.
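To make the multi-scale construction concrete, here is a schematic of the factorization (our notation, not necessarily the paper's exact symbols): a wavelet transform splits the image at each level i into a half-resolution image I_i and detail coefficients D_i, and each set of details is modelled by a normalizing flow conditioned on the coarser image. Up to the constant Jacobian factor of the wavelet transform, the density factors as

    p(I_n) = p(I_0) \prod_{i=0}^{n-1} p(D_i \mid I_i)

Sampling therefore proceeds coarse to fine: draw the base image I_0, then at each level sample D_i conditioned on I_i and invert the wavelet transform to obtain I_{i+1}; density evaluation sums the log-densities of all levels. As we understand the design, this per-level factorization is also what allows each conditional flow to be trained separately, which is where the training-cost savings come from.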
Highlights
  • We introduce Wavelet Flow, a multi-scale, conditional normalizing flow architecture based on wavelets
  • The Wavelet Flow models are competitive with other methods, outperforming Glow on ImageNet while somewhat underperforming it on Large-scale Scene Understanding (LSUN), but with significantly faster training
  • We showed how the multi-scale structure of our Wavelet Flow model could be used to extract consistent distributions of low resolution signals, as well as to perform super resolution
  • This paper focused on natural images, but Wavelet Flow is directly applicable to other domains, e.g., video, medical imaging, and audio
Methods
  • The specifics of Wavelet Flow are described: wavelets are introduced first (Sec. 2.1), and normalizing flows are then briefly reviewed (Sec. 2.2).
  • While a Fourier basis is global in nature, wavelets are constructed to be localized, meaning that the value of a wavelet coefficient reflects the structure of the signal in a local region.
  • This is beneficial because modern deep learning architectures in general, including normalizing flows, are well suited to spatially structured signal representations due to the widespread use of convolutional operations.
  • The authors briefly introduce wavelets; see [35] for a thorough introduction (a minimal Haar decomposition example follows this list).
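As a concrete illustration of the wavelet building block, below is a minimal single-level 2D Haar decomposition and its inverse in NumPy. This is a generic sketch, not the authors' implementation; the function names and the orthonormal 1/2 scaling are our own choices.

    import numpy as np

    def haar_decompose(img):
        """Single-level 2D Haar transform of an (H, W) array with even H and W.

        Returns a half-resolution low-pass image and three detail sub-bands.
        """
        a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
        b = img[0::2, 1::2]  # top-right
        c = img[1::2, 0::2]  # bottom-left
        d = img[1::2, 1::2]  # bottom-right
        low = (a + b + c + d) / 2.0   # local averages (coarse image)
        d_v = (a + b - c - d) / 2.0   # top-minus-bottom differences
        d_h = (a - b + c - d) / 2.0   # left-minus-right differences
        d_d = (a - b - c + d) / 2.0   # diagonal differences
        return low, (d_v, d_h, d_d)

    def haar_reconstruct(low, details):
        """Exact inverse of haar_decompose."""
        d_v, d_h, d_d = details
        a = (low + d_v + d_h + d_d) / 2.0
        b = (low + d_v - d_h - d_d) / 2.0
        c = (low - d_v + d_h - d_d) / 2.0
        d = (low - d_v - d_h + d_d) / 2.0
        out = np.empty((2 * low.shape[0], 2 * low.shape[1]), dtype=low.dtype)
        out[0::2, 0::2] = a
        out[0::2, 1::2] = b
        out[1::2, 0::2] = c
        out[1::2, 1::2] = d
        return out

Because the transform is invertible (and, with this scaling, orthonormal), it can be applied recursively to the low-pass image to build the multi-scale representation on which the conditional flows operate.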
Results
  • Quantitative results in bits per dimension (BPD) are shown in Table 1 (a conversion helper is sketched after this list).
  • The Wavelet Flow models are competitive with other methods, outperforming Glow on ImageNet while somewhat underperforming it on LSUN, but with significantly faster training.
  • Training times for each dataset and scale vary, with the smallest scales taking a few hours and the largest scales typically requiring five or six days.
  • The evaluation datasets include ImageNet [40], LSUN [48] (bedroom, tower, and church), CelebA-HQ [22], and FFHQ [23].
  • RealNVP and Glow are not evaluated on CelebA-HQ 1024 × 1024 or FFHQ 1024 × 1024 as training is impractical at those resolutions.
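For reference, bits per dimension is the likelihood metric reported in Table 1. A small helper for the conversion, assuming the model returns a per-image negative log-likelihood in nats, might look like:

    import math

    def bits_per_dimension(nll_nats, num_dims):
        """Convert a per-image negative log-likelihood in nats to bits per dimension.

        num_dims is the total dimensionality of one image, e.g. 3 * H * W for RGB.
        """
        return nll_nats / (num_dims * math.log(2.0))

    # Example: an NLL of 170,000 nats on a 3 x 256 x 256 image is
    # 170000 / (196608 * ln 2) ≈ 1.25 bits per dimension.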
Conclusion
  • The authors introduced Wavelet Flow, a multi-scale, conditional normalizing flow architecture based on wavelets.
  • With images, there remains work to explore the use of Wavelet Flow for problems such as image restoration and super resolution (a hypothetical sketch follows this list).
  • While the authors explored some architecture choices for the conditional flows, they expect that improvements in performance will be found with other architectures, e.g., [17, 33, 13, 20].
  • Adapting dequantization for use in a Wavelet Flow may rectify this and is likely to improve performance, as it has with other normalizing flow models [17, 19].
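As a purely hypothetical illustration of the super-resolution use mentioned above: because the model factorizes over scales, one can condition on an observed low-resolution image and sample only the missing detail levels. The per-level flow objects and their sample method below are our own stand-ins, not the authors' API.

    def super_resolve(low_res, level_flows, haar_reconstruct, num_levels):
        """Sketch of super resolution with a Wavelet-Flow-style model.

        low_res:     observed coarse image at the base level.
        level_flows: list of conditional flows; level_flows[i] is assumed to
                     model the detail coefficients at level i given the image
                     at level i (hypothetical interface).
        """
        img = low_res
        for i in range(num_levels):
            # Sample plausible detail coefficients conditioned on the current
            # (coarser) image, then invert the wavelet transform to double
            # the resolution.
            details = level_flows[i].sample(condition=img)
            img = haar_reconstruct(img, details)
        return img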
Tables
  • Table 1: Quantitative performance in bits per dimension. RealNVP and Glow are not evaluated on CelebA-HQ 1024 × 1024 or FFHQ 1024 × 1024 as training is impractical at those resolutions
  • Table 2: Hyper-parameters used with Wavelet Flow per level, and across the evaluation datasets
  • Table 3: Comparison of the total number of parameters used in Wavelet Flow, Glow, and RealNVP
  • Table 4: Wavelet Flow training times in GPU hours on the evaluation datasets
  • Table 5: Training speed measured in seconds per image, averaged over 100 iterations. Values for Glow were obtained by running their provided code on our hardware, a single NVIDIA TITAN X (Pascal) GPU, without using any distributed computation frameworks. Note that the Wavelet Flow model for CelebA-HQ 256 × 256 is contained within the larger model for 1024 × 1024 images
  • Table 6: Quantitative results in bits per dimension on 5-bit CelebA-HQ 256 × 256
  • Table 7: Fréchet Inception Distance [41] scores on LSUN 64 × 64 between Glow (affine) and Wavelet Flow
Funding
  • This work was started as part of J.J.Y.’s internship at Borealis AI and was supported by the Mitacs Accelerate Program, funded in part by the Canada First Research Excellence Fund (CFREF) for the Vision: Science to Applications (VISTA) program (M.A.B.) and the NSERC Discovery Grant program (M.A.B., K.G.D.).
References
  • C. H. Anderson. A filter-subtract-decimate hierarchical pyramid signal analyzing and synthesizing technique, 1987. US Patent 4,718,104, Washington, DC.
  • Lynton Ardizzone, Carsten Lüth, Jakob Kruse, Carsten Rother, and Ullrich Köthe. Guided image generation with conditional invertible neural networks. arXiv preprint arXiv:1907.02392, 2019.
  • Ronald Newbold Bracewell. The Fourier transform and its applications. McGraw-Hill New York, 1986. ISBN 9780070070134.
  • Joan Bruna and Stéphane Mallat. Invariant scattering convolution networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 35(8):1872–1886, 2013.
  • P. Burt and E. Adelson. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, 31(4):532–540, 1983.
  • Peter J. Burt. Fast filter transform for image processing. Computer Graphics and Image Processing, 16(1):20 – 51, 1981.
  • James L. Crowley and Alice C. Parker. A representation for shape based on peaks and ridges in the difference of low-pass transform. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 6(2):156–170, 1984.
  • Emily L. Denton, Soumith Chintala, Arthur Szlam, and Rob Fergus. Deep generative image models using a Laplacian pyramid of adversarial networks. In Neural Information Processing Systems (NeurIPS), pages 1486–1494, 2015.
  • Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear independent components estimation. In Proceedings of the International Conference on Learning Representations (ICLR) Workshop, 2015.
  • Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using RealNVP. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
  • Garoe Dorta, Sara Vicente, Lourdes Agapito, Neill D.F. Campbell, Simon Prince, and Ivor Simpson. Laplacian pyramid of conditional variational autoencoders. In Proceedings of the European Conference on Visual Media Production (CVMP), pages 1–9, 2017.
  • Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. Generative adversarial nets. In Neural Information Processing Systems (NeurIPS), 2014.
  • Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. FFJORD: Free-form continuous dynamics for scalable reversible generative models. In Proceedings of the International Conference on Learning Representations (ICLR), 2019.
  • Ishaan Gulrajani, Kundan Kumar, Faruk Ahmed, Adrien Ali Taiga, Francesco Visin, David Vazquez, and Aaron Courville. PixelVAE: A latent variable model for natural images. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
  • Alfred Haar. Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen, 69(3): 331–371, 1910.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • Jonathan Ho, Xi Chen, Aravind Srinivas, Yan Duan, and Pieter Abbeel. Flow++: Improving flow-based generative models with variational dequantization and architecture design. In Proceedings of the International Conference on Machine Learning (ICML), 2019.
  • Matthew D Hoffman and Andrew Gelman. The no-u-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1):1593–1623, 2014.
  • Emiel Hoogeboom, Taco S. Cohen, and Jakub M. Tomczak. Learning discrete distributions by dequantization. arXiv preprint, arXiv:2001.11235, 2020.
  • Priyank Jaini, Kira A. Selby, and Yaoliang Yu. Sum-of-squares polynomial flow. In Proceedings of the International Conference on Machine Learning (ICML), 2019.
  • Animesh Karnewar, Oliver Wang, and Raghu Sesha Iyengar. MSG-GAN: Multi-scale gradient GAN for stable image synthesis. CoRR, abs/1903.06048, 2019.
  • Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
  • Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4401–4410, 2019.
  • Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • Diederik P. Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In Neural Information Processing Systems (NeurIPS), pages 10215–10224, 2018.
  • Diederik P. Kingma and Max Welling. An introduction to variational autoencoders. arXiv preprint, arXiv:1906.02691, 2019.
  • Ivan Kobyzev, Simon Prince, and Marcus Brubaker. Normalizing flows: An introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2020.
  • Jan J. Koenderink. The structure of images. Biological Cybernetics, 50(5):363–370, 1984.
  • Junpeng Lao, Christopher Suter, Ian Langmore, Cyril Chimisov, Ashish Saxena, Pavel Sountsov, Dave Moore, Rif A. Saurous, Matthew D. Hoffman, and Joshua V. Dillon. tfp.mcmc: Modern Markov chain Monte Carlo tools built for modern hardware. arXiv preprint, arXiv:2002.01184, 2020.
  • Tony Lindeberg. Scale-space for discrete signals. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 12(3):234–254, 1990.
  • Tony Lindeberg. Scale-space theory: A basic tool for analyzing structures at different scales. Journal of Applied Statistics, 21(1-2):225–270, 1994.
  • Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), 2015.
  • Xuezhe Ma, Xiang Kong, Shanghang Zhang, and Eduard Hovy. MaCow: Masked convolutional generative flow. In Neural Information Processing Systems (NeurIPS), pages 5891–5900, 2019.
  • Stéphane Mallat. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 11(7):674–693, 1989.
  • Stephane Mallat. A wavelet tour of signal processing: The Sparse Way. Elsevier, 3rd edition, 2009. ISBN 9780123743701.
  • David Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Henry Holt and Co., Inc., USA, 1982. ISBN 0716715678.
  • George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. Normalizing flows for probabilistic modeling and inference. arXiv preprint, arXiv:1912.02762, 2019.
  • Ali Razavi, Aäron van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images with VQ-VAE-2. In Neural Information Processing Systems (NeurIPS), pages 14837–14847, 2019.
  • Scott E. Reed, Aäron van den Oord, Nal Kalchbrenner, Sergio Gomez Colmenarejo, Ziyu Wang, Yutian Chen, Dan Belov, and Nando de Freitas. Parallel multiscale autoregressive density estimation. In Proceedings of the International Conference on Machine Learning (ICML), pages 2912–2921, 2017.
  • Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
  • Tim Salimans, Ian J. Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Neural Information Processing Systems (NeurIPS), 2016.
  • Tim Salimans, Andrej Karpathy, Xi Chen, and Diederik P. Kingma. PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications. In Proceedings of the International Conference on Learning Representations (ICLR), 2017.
  • Tamar Rott Shaham, Tali Dekel, and Tomer Michaeli. SinGAN: Learning a generative model from a single natural image. In Proceedings of the International Conference on Computer Vision (ICCV), pages 4569–4579, 2019.
  • Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. In Proceedings of the International Conference on Machine Learning (ICML), 2016.
  • Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 8798–8807, 2018.
  • Christina Winkler, Daniel Worrall, Emiel Hoogeboom, and Max Welling. Learning likelihoods with conditional normalizing flows. arXiv preprint arXiv:1912.00042, 2019.
  • Andrew P. Witkin. Scale-space filtering. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1019–1022, 1983.
  • Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
  • Han Zhang, Tao Xu, and Hongsheng Li. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the International Conference on Computer Vision (ICCV), pages 5908–5916, 2017.
  • Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N. Metaxas. StackGAN++: Realistic image synthesis with stacked generative adversarial networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 41(8):1947–1962, 2019.
  • Note on the ImageNet dataset [40]: two downsampled versions at resolutions of 32 × 32 and 64 × 64 are used. The training set consists of 1.28 million images and the validation set contains 50,000 images; the validation set is used as the test set, as is also done in [10, 25].