# Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation

NeurIPS 2020

Abstract

Optimal Transport (OT) distances such as Wasserstein have been used in several areas such as GANs and domain adaptation. OT, however, is very sensitive to outliers (samples with large noise) in the data since, in its objective function, every sample, including outliers, is weighted similarly due to the marginal constraints. To remedy this...

Introduction

- Estimating distances between probability distributions lies at the heart of several problems in machine learning and statistics.
- A class of distance measures that has gained immense popularity in several machine learning applications is Optimal Transport (OT) [29].
- Optimal transport enjoys several nice properties including structure preservation, existence in smooth and non-smooth settings, being well defined for discrete and continuous distributions [29], etc.
- Given two probability distributions P_X, P_Y ∈ Prob(X) and a continuous cost function c : X × X → ℝ, optimal transport finds the minimum cost of transporting the density P_X to P_Y [29]; the paper states this as optimization (1).
- The dual form of the optimization (1) is often used
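
The summary refers to optimization (1) and its dual without reproducing them. As a hedged reminder, the textbook Kantorovich problem and its Kantorovich-Rubinstein dual are sketched below; this assumes that (1) is the standard primal form, and the symbols Π (the set of joint distributions with marginals P_X and P_Y) and Lip_1 (functions that are 1-Lipschitz with respect to c) are notation introduced here, not taken from the summary.

```latex
% Primal (Kantorovich) problem, assumed to correspond to optimization (1)
W(P_X, P_Y) = \min_{\pi \in \Pi(P_X, P_Y)}
    \int_{\mathcal{X} \times \mathcal{X}} c(x, y)\, \mathrm{d}\pi(x, y)

% Kantorovich-Rubinstein dual, valid when the cost c is a metric
W(P_X, P_Y) = \sup_{D \in \mathrm{Lip}_1}
    \mathbb{E}_{x \sim P_X}[D(x)] - \mathbb{E}_{y \sim P_Y}[D(y)]
```

The marginal constraints encoded in Π are what force every sample, outliers included, to be transported in full.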

Highlights

- Estimating distances between probability distributions lies at the heart of several problems in machine learning and statistics
- The applications of previous formulations of robust Optimal Transport (OT) are limited in practical deep learning problems such as GANs and domain adaptation due to the instability of their optimization solvers
- We derive a computationally efficient dual form of the robust OT objective that is suited for deep learning applications
- The use of optimal transport (OT) distances such as the Wasserstein distance has become increasingly popular in machine learning, with several applications in generative modeling, image-to-image translation, inpainting, domain adaptation, etc
- Using OT for large-scale machine learning problems can be problematic since noise in large datasets is inevitable
- Our robust OT formulation leads to improved accuracy compared to the standard adversarial adaptation methods
- Building on theoretical formulations of unbalanced OT which suffer from computational instability in deep learning applications, we have developed an efficient learning method that is provably robust against outliers and is amenable to complex deep learning applications such as deep generative modeling and domain adaptation
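
To illustrate why relaxing the marginal constraints can help, the sketch below computes a one-dimensional Wasserstein-1 distance between weighted point sets and compares uniform weights with weights that give an outlier zero mass. This is a toy illustration only: the function name `weighted_w1`, the data, and the hand-picked weights are assumptions made here, whereas the paper obtains its reweighting from a constrained optimization that this summary does not spell out.

```python
def weighted_w1(xs, wx, ys, wy):
    """W1 between two weighted 1D point sets, using the CDF identity
    W1 = integral of |F_x(t) - F_y(t)| dt. Weights are normalized to sum to 1."""
    sx, sy = float(sum(wx)), float(sum(wy))
    # Each event is (position, mass added to F_x, mass added to F_y).
    events = sorted(
        [(x, w / sx, 0.0) for x, w in zip(xs, wx)]
        + [(y, 0.0, w / sy) for y, w in zip(ys, wy)]
    )
    total, fx, fy = 0.0, 0.0, 0.0
    for (pos, dx, dy), (next_pos, _, _) in zip(events, events[1:]):
        fx += dx
        fy += dy
        # Both CDFs are constant on [pos, next_pos).
        total += abs(fx - fy) * (next_pos - pos)
    return total


source = [0.0, 1.0, 2.0, 100.0]  # the last point plays the role of an outlier
target = [0.0, 1.0, 2.0]

# Standard OT: every source sample carries the same mass.
uniform = weighted_w1(source, [1, 1, 1, 1], target, [1, 1, 1])
# Hand-picked robust weights: the outlier is given zero mass (an assumption).
reweighted = weighted_w1(source, [1, 1, 1, 0], target, [1, 1, 1])

print(f"uniform weights:    {uniform:.2f}")
print(f"outlier suppressed: {reweighted:.2f}")
```

With uniform weights the single outlier dominates the distance; once its mass is removed, the two point sets coincide and the distance vanishes.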

Methods

- Domain adaptation accuracy is compared across source-only, adversarial, and robust adversarial models (table values lost in extraction; only the figure 63.9 survives).
- The authors artificially add outlier samples to the CIFAR-10 dataset such that they occupy a γ fraction of the samples.
- While Wasserstein GAN fits outliers in addition to the CIFAR-10 samples, robust Wasserstein GAN effectively ignores outliers and generates samples only from the CIFAR-10 dataset.
- Since Wasserstein GAN generates outlier samples in addition to the CIFAR-10 samples, the FID scores get worse as the outlier fraction increases.
- Robust Wasserstein GAN, on the other hand, obtains good FID even for a large fraction of outliers.
- This trend is consistent for both outlier distributions, MNIST and uniform noise.
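
The contamination protocol above can be mimicked on a one-dimensional toy problem. The sketch below replaces a γ fraction of clean Gaussian samples with a fixed outlier value and measures the standard Wasserstein-1 distance to a clean reference sample. The Gaussian data, the outlier value of 50, and the helper names are assumptions for illustration; the paper's experiments use CIFAR-10 images with MNIST or uniform-noise outliers and report FID, not this distance.

```python
import random


def wasserstein_1d(xs, ys):
    """W1 between two equal-size 1D empirical distributions.
    In one dimension the optimal plan matches sorted samples."""
    if len(xs) != len(ys):
        raise ValueError("equal sample sizes required")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def contaminate(clean, gamma, outlier_value, rng):
    """Replace a gamma fraction of the clean samples with a fixed outlier value."""
    n_out = int(round(gamma * len(clean)))
    kept = rng.sample(clean, len(clean) - n_out)
    return kept + [outlier_value] * n_out


rng = random.Random(0)
n = 1000
reference = [rng.gauss(0.0, 1.0) for _ in range(n)]
clean = [rng.gauss(0.0, 1.0) for _ in range(n)]

distances = {}
for gamma in (0.0, 0.05, 0.1, 0.2):
    data = contaminate(clean, gamma, outlier_value=50.0, rng=rng)
    distances[gamma] = wasserstein_1d(data, reference)
    print(f"gamma={gamma:.2f}  W1={distances[gamma]:.3f}")
```

The distance grows with γ because the marginal constraints force the outlier mass to be transported, which mirrors the degradation reported for the standard Wasserstein GAN as the outlier fraction increases.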

Results

- The authors' robust OT formulation leads to improved accuracy compared to the standard adversarial adaptation methods.

Conclusion

- The authors study robust optimal transport, which is insensitive to outliers in the data.
- Building on theoretical formulations of unbalanced OT which suffer from computational instability in deep learning applications, the authors have developed an efficient learning method that is provably robust against outliers and is amenable to complex deep learning applications such as deep generative modeling and domain adaptation
- These attributes ensure broader impacts of this work in both theoretical and applied machine learning communities and can act as a bridge between the two.
- To the best of the authors' knowledge, this work does not lead to any negative outcomes in either ethical or societal aspects.

Objectives

- The authors' objective is to handle outliers in deep learning applications involving OT.

- Table 1: Quantitative evaluation of robust WGAN on clean datasets. In each cell, the top row corresponds to the Inception score and the bottom row corresponds to the FID score
- Table 2: Cross-domain recognition accuracy on VISDA-17 dataset using ResNet-18 model averaged over 3 runs
- Table 3: Adaptation accuracy on VISDA-17 using ResNet-50 model averaged over 3 runs
- Table 4: Adaptation accuracy on VISDA-17 using ResNet-101 model averaged over 3 runs
- Table 5: Sensitivity analysis of ρ
- Table 6: Analyzing mode drop: Training robust GAN on imbalanced CelebA. Each column denotes an experiment where GAN models are trained on an input dataset having the respective fraction of males as given in row 1. Rows 2 and 3 denote the fraction of males in the generated dataset obtained by training Vanilla and Robust GAN respectively. We observe that images of males are generated even when the fraction of males in the input dataset is as low as 2%
- Table 7: Architectures and hyper-parameters: ResNet model
- Table 8: Architectures and hyper-parameters: DCGAN model
- Table 9: Architectures of weight network
- Table 10: Hyper-parameters for domain adaptation experiments

Funding

- This project was supported in part by NSF CAREER AWARD 1942230, a Simons Fellowship on Deep Learning Foundations, and a MURI program from the Army Research Office under the grant W911NF17-1-0304

Reference

- Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, Australia, 7-9 August, 2017.
- Yogesh Balaji, Rama Chellappa, and Soheil Feizi. Normalized wasserstein for mixture distributions with applications in adversarial learning and domain adaptation. In The IEEE International Conference on Computer Vision (ICCV), October 2019.
- Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyré. Iterative bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 2015.
- Nicolas Bonneel, Gabriel Peyré, and Marco Cuturi. Wasserstein Barycentric Coordinates: Histogram Regression Using Optimal Transport. ACM Transactions on Graphics (SIGGRAPH 2016), 35(4), 2016.
- Fabio Maria Carlucci, Lorenzo Porzi, Barbara Caputo, Elisa Ricci, and Samuel Rota Bulo. Autodial: Automatic domain alignment layers. In International Conference on Computer Vision (ICCV), 2017.
- Lenaic Chizat, Gabriel Peyré, Bernhard Schmitzer, and François-Xavier Vialard. Unbalanced optimal transport: Dynamic and kantorovich formulation. arXiv preprint arXiv:1508.05216, 2015.
- Lenaic Chizat, Gabriel Peyré, Bernhard Schmitzer, and François-Xavier Vialard. Scaling algorithms for unbalanced transport problems. arXiv preprint arXiv:1607.05816, 2016.
- Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 2292–2300. Curran Associates, Inc., 2013.
- Charlie Frogner, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya, and Tomaso A Poggio. Learning with a wasserstein loss. In Advances in Neural Information Processing Systems, pages 2053–2061, 2015.
- Yaroslav Ganin and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1180–1189, Lille, France, 07–09 Jul 2015. PMLR.
- Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of wasserstein gans. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5767–5777. Curran Associates, Inc., 2017.
- Stanislav Kondratyev, Léonard Monsaingeon, Dmitry Vorotnikov, et al. A new optimal transport distance on the space of finite radon measures. Advances in Differential Equations, 21(11/12):1117–1164, 2016.
- Matthias Liero, Alexander Mielke, and Giuseppe Savaré. Optimal entropy-transport problems and a new hellinger–kantorovich distance between positive measures. Inventiones mathematicae, 211(3):969–1117, Mar 2018.
- Mingsheng Long, Yue Cao, Jianmin Wang, and Michael I. Jordan. Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on Machine Learning, pages 97–105, 2015.
- Mingsheng Long, Zhangjie Cao, Jianmin Wang, and Michael I Jordan. Conditional adversarial domain adaptation. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 1640–1650. Curran Associates, Inc., 2018.
- Mingsheng Long, Jianmin Wang, and Michael I. Jordan. Unsupervised domain adaptation with residual transfer networks. CoRR, abs/1602.04433, 2016.
- Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I. Jordan. Deep transfer learning with joint adaptation networks. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 2208–2217. PMLR, 2017.
- Takeru Miyato and Masanori Koyama. cGANs with projection discriminator. In International Conference on Learning Representations, 2018.
- Hongseok Namkoong and John C Duchi. Stochastic gradient methods for distributionally robust optimization with f-divergences. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 2208–2216. Curran Associates, Inc., 2016.
- Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, and Bo Wang. Moment matching for multi-source domain adaptation. arXiv preprint arXiv:1812.01754, 2018.
- Xingchao Peng, Ben Usman, Neela Kaushik, Judy Hoffman, Dequan Wang, and Kate Saenko. Visda: The visual domain adaptation challenge. CoRR, abs/1710.06924, 2017.
- Pedro Oliveira Pinheiro. Unsupervised domain adaptation with similarity learning. CoRR, abs/1711.08995, 2017.
- Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, and Tatsuya Harada. Maximum classifier discrepancy for unsupervised domain adaptation. arXiv preprint arXiv:1712.02560, 2017.
- Maziar Sanjabi, Jimmy Ba, Meisam Razaviyayn, and Jason D Lee. On the convergence and robustness of training gans with regularized optimal transport. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 7091–7101. Curran Associates, Inc., 2018.
- Swami Sankaranarayanan, Yogesh Balaji, Carlos D. Castillo, and Rama Chellappa. Generate to adapt: Aligning domains using generative adversarial networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
- Jian Shen, Yanru Qu, Weinan Zhang, and Yong Yu. Wasserstein distance guided representation learning for domain adaptation. In AAAI, pages 4058–4065. AAAI Press, 2018.
- Justin Solomon, Fernando de Goes, Gabriel Peyré, Marco Cuturi, Adrian Butscher, Andy Nguyen, Tao Du, and Leonidas J. Guibas. Convolutional wasserstein distances: efficient optimal transportation on geometric domains. ACM Trans. Graph., 34(4):66:1–66:11, 2015.
- Justin Solomon, Raif M. Rustamov, Leonidas J. Guibas, and Adrian Butscher. Wasserstein propagation for semi-supervised learning. In ICML, volume 32 of JMLR Workshop and Conference Proceedings, pages 306–314. JMLR.org, 2014.
- Cédric Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.
- Karren D. Yang and Caroline Uhler. Scalable unbalanced optimal transport using generative adversarial networks. In International Conference on Learning Representations, 2019.
- Samples generated by our dual and the unbalanced OT dual are shown in Figure 6. We observe that the model trained using the unbalanced OT dual produces a loss curve that is flat and does not learn a proper solution. This is also evident from Figure 6: models trained using our dual generate CIFAR-like samples, while the one trained with the unbalanced OT dual produces noisy images.
