Domain Generalization With Adversarial Feature Learning
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), pp.5400-5409, (2018)
In this paper, we tackle the problem of domain generalization: how to learn a generalized feature representation for an "unseen" target domain by taking advantage of multiple seen source-domain data. We present a novel framework based on adversarial autoencoders to learn a generalized latent feature representation across domains for domain generalization.
- In some computer vision applications, it is often the case that there are only some unlabeled training data in the domain of interest (a.k.a. the target domain), while there are plenty of labeled training data in some related domain(s) (a.k.a. the source domain(s)).
- Many studies have been conducted to leverage the unlabeled data of the target domain, given in advance, for adapting the model learned with the source-domain labeled data to the target domain.
- These are referred to as domain adaptation methods [28, 13, 9].
- A key research issue is how to learn a representation with good generalization for the unseen target domain from some related source domains.
- We propose a novel framework for domain generalization, which aims to learn a universal representation across domains by minimizing the differences between the seen source domains and by matching the distribution of the learned representations to a prior distribution.
- We evaluate the performance on other vision problems, such as object recognition on Caltech, PASCAL VOC2007, LabelMe, and SUN09, as well as action recognition from different viewing angles on IXMAS.
- We propose a novel framework for domain generalization, denoted the Maximum Mean Discrepancy adversarial autoencoder (MMD-AAE).
- The main idea is to learn a feature representation by jointly optimizing a multi-domain autoencoder regularized by the Maximum Mean Discrepancy (MMD) distance, a discriminator, and a classifier in an adversarial training manner.
- Extensive experimental results on handwritten digit recognition, object recognition, and action recognition demonstrate that the proposed MMD-AAE is able to learn domain-invariant features, which lead to state-of-the-art performance for domain generalization.
- The authors compare the proposed MMD-AAE with the following baseline methods for domain generalization in terms of classification accuracy.
- To make a fair comparison, the authors set the dimension of the hidden layer to 500 for handwritten digit recognition, and to 2,000 for object and action recognition.
- As reviewed in Section 2, there exist many deep learning based domain adaptation methods that use either MMD minimization [21, 22] or adversarial training to align the source-domain distribution(s) to the target-domain distribution.
- The authors' proposed method instead aims to align the distributions of the seen source domains by jointly imposing the multi-domain MMD distance and an adversarial loss that matches the learned representation to a prior distribution.
- Table1: Performance on handwritten digit recognition. The best performance is highlighted in boldface
- Table2: Performance on object recognition
- Table3: Performance on action recognition
- Table4: Impact of different components on performance
- Table5: Performance with different prior distributions
- Table6: Performance with different parameters
- As mentioned at the beginning of the previous section, both domain adaptation and domain generalization aim to learn a precise classifier for the target domain by leveraging labeled data from the source domain(s). The difference between them is that for domain adaptation, some unlabeled data and even a few labeled data from the target domain are utilized to capture properties of the target domain for model adaptation [27, 9, 33, 4, 13, 22, 35, 21]. While numerous approaches have been proposed for domain adaptation, less attention has been paid to domain generalization. Some representative works have been reviewed in the previous section.
Our work is also related to the Generative Adversarial Network (GAN), which has been explored for generative tasks. In a GAN, there are two types of networks: a generative model G that aims to capture the distribution of the training data for data generation, and a discriminative model D that aims to distinguish between the instances drawn from G and the original data sampled from the training dataset. The generative model G and the discriminative model D are jointly trained in a competitive fashion: 1) train D to distinguish the true instances from the fake instances generated by G; 2) train G to fool D with its generated instances. Recently, many GAN-style algorithms have been proposed. For example, Li et al. proposed a generative model where MMD is employed to match the hidden representations generated from training data and random noise. Makhzani et al. proposed the adversarial autoencoder (AAE) to train the encoder and the decoder using an adversarial learning strategy.
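The alternating updates above can be sketched with a deliberately tiny one-dimensional GAN: a logistic discriminator and an affine generator, trained with manually derived gradients of the standard non-saturating GAN losses. The target distribution N(3, 1), learning rate, and step counts are illustrative assumptions for this toy example, not the setup used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

# Discriminator D(x) = sigmoid(w*x + c); generator G(z) = a*z + b, z ~ N(0,1).
w, c = 1.0, 0.0          # discriminator parameters
a, b = 1.0, 0.0          # generator parameters
lr, batch = 0.05, 64

for step in range(2000):
    real = rng.normal(3.0, 1.0, batch)   # "training data" drawn from N(3, 1)
    fake = a * rng.normal(0.0, 1.0, batch) + b

    # 1) Update D to separate real from fake:
    #    minimize -[log D(real) + log(1 - D(fake))].
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    gw = ((d_real - 1.0) * real + d_fake * fake).mean()
    gc = ((d_real - 1.0) + d_fake).mean()
    w, c = w - lr * gw, c - lr * gc

    # 2) Update G to fool D (non-saturating loss -log D(G(z))).
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b
    d_fake = sigmoid(w * fake + c)
    ga = ((d_fake - 1.0) * w * z).mean()
    gb = ((d_fake - 1.0) * w).mean()
    a, b = a - lr * ga, b - lr * gb

# After training, generated samples should have drifted toward the real mean of 3.
gen_mean = (a * rng.normal(0.0, 1.0, 10000) + b).mean()
```

AAE (and MMD-AAE) reuses exactly this two-player game, but the "generator" is the encoder's latent code and the "real" samples are draws from a chosen prior distribution.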
- The ROSE Lab is supported by the National Research Foundation, Singapore, and the Infocomm Media Development Authority, Singapore
- Pan is supported by the NTU Singapore Nanyang Assistant Professorship (NAP) grant M4081532.020 and Singapore MOE AcRF Tier-1 grant 2016-T1-001-159
- Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
- M. J. Choi, J. J. Lim, A. Torralba, and A. S. Willsky. Exploiting hierarchical context on a large database of object categories. In CVPR, 2010.
- J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. Decaf: A deep convolutional activation feature for generic visual recognition. In ICML, 2014.
- L. Duan, I. W. Tsang, D. Xu, and T. Chua. Domain adaptation from multiple sources via auxiliary classifiers. In ICML, pages 289–296, 2009.
- V. Dumoulin, I. Belghazi, B. Poole, A. Lamb, M. Arjovsky, O. Mastropietro, and A. Courville. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.
- M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.
- C. Fang, Y. Xu, and D. N. Rockmore. Unbiased metric learning: On the utilization of multiple datasets and web images for softening bias. In ICCV, pages 1657–1664, 2013.
- L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 106(1):59–70, 2007.
- B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars. Unsupervised visual domain adaptation using subspace alignment. In ICCV, 2013.
- Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(59):1–35, 2016.
- M. Ghifary, W. Bastiaan Kleijn, M. Zhang, and D. Balduzzi. Domain generalization for object recognition with multi-task autoencoders. In ICCV, 2015.
- M. Ghifary, W. B. Kleijn, M. Zhang, D. Balduzzi, and W. Li. Deep reconstruction-classification networks for unsupervised domain adaptation. In ECCV, 2016.
- B. Gong, Y. Shi, F. Sha, and K. Grauman. Geodesic flow kernel for unsupervised domain adaptation. In CVPR, 2012.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.
- A. Gretton, K. M. Borgwardt, M. Rasch, B. Scholkopf, and A. J. Smola. A kernel method for the two-sample-problem. In NIPS, 2006.
- A. Gretton, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil, K. Fukumizu, and B. K. Sriperumbudur. Optimal kernel choice for large-scale two-sample tests. In NIPS, 2012.
- D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Y. LeCun. The mnist database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
- D. Li, Y. Yang, Y.-Z. Song, and T. M. Hospedales. Deeper, broader and artier domain generalization. In ICCV, 2017.
- Y. Li, K. Swersky, and R. Zemel. Generative moment matching networks. In ICML, 2015.
- M. Long, Y. Cao, J. Wang, and M. Jordan. Learning transferable features with deep adaptation networks. In ICML, 2015.
- M. Long, H. Zhu, J. Wang, and M. I. Jordan. Unsupervised domain adaptation with residual transfer networks. In NIPS, 2016.
- A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey. Adversarial autoencoders. ICLR Workshop, 2016.
- X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley. Least squares generative adversarial networks. arXiv preprint arXiv:1611.04076, 2016.
- S. Motiian, M. Piccirilli, D. A. Adjeroh, and G. Doretto. Unified deep supervised domain adaptation and generalization. In ICCV, 2017.
- K. Muandet, D. Balduzzi, and B. Scholkopf. Domain generalization via invariant feature representation. In ICML, 2013.
- S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2):199–210, 2011.
- S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.
- B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman. LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1):157–173, 2008.
- K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting visual category models to new domains. In ECCV, 2010.
- A. J. Smola, A. Gretton, L. Song, and B. Scholkopf. A Hilbert space embedding for distributions. In ALT, pages 13–31, 2007.
- B. K. Sriperumbudur, K. Fukumizu, A. Gretton, G. R. G. Lanckriet, and B. Scholkopf. Kernel choice and classifiability for RKHS embeddings of probability distributions. In NIPS, pages 1750–1758, 2009.
- B. Sun, J. Feng, and K. Saenko. Return of frustratingly easy domain adaptation. In AAAI, 2016.
- E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. In CVPR, 2017.
- E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474, 2014.
- R. Wan, B. Shi, L.-Y. Duan, A. H. Tan, W. Gao, and A. C. Kot. Region-aware reflection removal with unified content and gradient priors. IEEE Transactions on Image Processing.
- H. Wang, A. Klaser, C. Schmid, and C.-L. Liu. Dense trajectories and motion boundary descriptors for action recognition. International Journal of Computer Vision, 103(1):60–79, 2013.
- D. Weinland, R. Ronfard, and E. Boyer. Free viewpoint action recognition using motion history volumes. Computer Vision and Image Understanding, 104(2):249–257, 2006.
- Z. Xu, W. Li, L. Niu, and D. Xu. Exploiting low-rank structure from latent domains for domain generalization. In ECCV. 2014.
- P. Yang and W. Gao. Multi-view discriminant transfer learning. In IJCAI, 2013.