Fast Easy Unsupervised Domain Adaptation with Marginalized Structured Dropout.

ACL, pp. 538–544 (2014)


Abstract

Unsupervised domain adaptation often relies on transforming the instance representation. However, most such approaches are designed for bag-of-words models, and ignore the structured features present in many problems in NLP. We propose a new technique called marginalized structured dropout, which exploits feature structure to obtain a remarkably simple and efficient feature projection. …

Introduction
  • Unsupervised domain adaptation is a fundamental problem for natural language processing, as the authors hope to apply their systems to datasets unlike those for which they have annotations.
  • This is relevant as labeled datasets become stale in comparison with rapidly evolving social media writing styles (Eisenstein, 2013), and as there is increasing interest in natural language processing for historical texts (Piotrowski, 2012).
  • While the marginalized denoising autoencoder is considerably faster than the original denoising autoencoder, it requires solving a system of equations that can grow very large, as realistic NLP tasks can involve 10^5 or more features
Highlights
  • Unsupervised domain adaptation is a fundamental problem for natural language processing, as we hope to apply our systems to datasets unlike those for which we have annotations
  • In this paper we investigate noising functions that are explicitly designed for structured feature spaces, which are common in NLP
  • We show how it is possible to marginalize over both types of noise, and find that the solution for structured dropout is substantially simpler and more efficient than the marginalized denoising autoencoder (mDA) approach of Chen et al. (2012), which does not consider feature structure (see the sketch after this list)
  • Our work focuses on unsupervised domain adaptation, where no labeled data is available in the target domain
  • Denoising autoencoders provide an intuitive solution for domain adaptation: transform the features into a representation that is resistant to the noise that may characterize the domain adaptation process
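To make the comparison concrete, here is a minimal NumPy sketch of the mDA closed form for plain, unstructured dropout noise, following Chen et al. (2012); the function name, the ridge term, and the dropout probability are illustrative choices of ours, not taken from the paper. The paper's point is that for structured dropout the expensive matrix solve below disappears entirely.

```python
import numpy as np

def mda_dropout(X, p=0.5, reg=1e-5):
    """Closed-form mDA reconstruction for independent feature dropout
    with probability p (Chen et al., 2012).

    X : (d, n) matrix of base features, one column per instance.
    Returns the denoised nonlinear representation, shape (d, n).
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])      # append a constant bias feature
    S = Xb @ Xb.T                             # scatter matrix, (d+1, d+1)
    q = np.full(d + 1, 1.0 - p)               # survival probability per feature
    q[-1] = 1.0                               # the bias feature is never dropped
    Q = S * np.outer(q, q)                    # E[x_tilde x_tilde^T], off-diagonal terms
    np.fill_diagonal(Q, np.diag(S) * q)       # diagonal uses a single survival factor
    P = S[:-1, :] * q[np.newaxis, :]          # E[x x_tilde^T], shape (d, d+1)
    Q += reg * np.eye(d + 1)                  # small ridge term for numerical stability
    W = np.linalg.solve(Q.T, P.T).T           # W = P Q^{-1}: the expensive (d+1)x(d+1) solve
    return np.tanh(W @ Xb)                    # nonlinear denoised representation
```

With 10^5 or more features, the linear solve above is exactly the bottleneck that the paper's structured-dropout marginalization avoids.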
Methods
  • The baseline is a CRF tagger trained on the source domain and tested on the target domain with only the base features.
  • The authors also include a PCA baseline that projects the entire dataset onto a low-dimensional subspace (see the sketch after this list).
  • The authors compare against Structural Correspondence Learning (SCL; Blitzer et al., 2006), another feature learning algorithm.
  • The authors use the entire dataset to compute the feature projections; experiments that used only the test and training data for the projections gave very similar results.
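As a rough illustration of the projection setup described above (not the paper's exact pipeline, which uses CRFsuite and its own feature templates), the sketch below fits a PCA-style projection on the pooled source and target tokens and appends it to the sparse base features before tagging; the function name and component count are illustrative assumptions.

```python
from scipy.sparse import csr_matrix, hstack, vstack
from sklearn.decomposition import TruncatedSVD

def augment_with_projection(X_source, X_target, n_components=25):
    """Fit a low-dimensional projection on the pooled source+target tokens
    and append it to the original sparse base features, so the downstream
    tagger sees both the raw and the projected views.

    X_source, X_target : scipy.sparse matrices, one row per token.
    """
    X_all = vstack([X_source, X_target]).tocsr()
    svd = TruncatedSVD(n_components=n_components)   # PCA-like projection that works on sparse data
    svd.fit(X_all)                                  # projection estimated on the entire dataset
    aug_src = hstack([X_source, csr_matrix(svd.transform(X_source))]).tocsr()
    aug_tgt = hstack([X_target, csr_matrix(svd.transform(X_target))]).tocsr()
    return aug_src, aug_tgt
```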
Results
  • Table 2 presents results for different domain adaptation tasks.
  • The authors also compute the transfer ratio, defined as adaptation accuracy divided by baseline accuracy (written out below).
  • mDA outperforms SCL and PCA; the latter shows little improvement over the base features.
  • The various noising approaches for mDA give very similar results.
  • Structured dropout is orders of magnitude faster than the alternatives, as shown in Table 3.
  • The scrambling noise is most time-consuming, with cost dominated by a matrix multiplication
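Written out, the transfer ratio used above (this restatement is ours) is

```latex
\text{transfer ratio} = \frac{\text{adaptation accuracy}}{\text{baseline accuracy}}
```

so a value above 1 means the adapted representation outperforms the unadapted baseline on the target domain; for a purely hypothetical example, a baseline accuracy of 0.80 and an adapted accuracy of 0.88 give a transfer ratio of 1.10.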
Conclusion
  • Denoising autoencoders provide an intuitive solution for domain adaptation: transform the features into a representation that is resistant to the noise that may characterize the domain adaptation process.
  • The authors take another step towards simplicity by showing that structured dropout can make marginalization even easier, obtaining dramatic speedups without sacrificing accuracy
Tables
  • Table1: Statistics of the Tycho Brahe Corpus
  • Table2: Accuracy results for adaptation from labeled data in 1800-1849, and in 1750-1849
  • Table3: Time, in seconds, to compute the feature transformation
Related work
  • Domain adaptation. Most previous work on domain adaptation focused on the supervised setting, in which some labeled data is available in the target domain (Jiang and Zhai, 2007; Daumé III, 2007; Finkel and Manning, 2009). Our work focuses on unsupervised domain adaptation, where no labeled data is available in the target domain. Several representation learning methods have been proposed to solve this problem. In structural correspondence learning (SCL), the induced representation is based on the task of predicting the presence of pivot features. Autoencoders apply a similar idea, but use the denoised instances as the latent representation (Vincent et al., 2008; Glorot et al., 2011b; Chen et al., 2012). Within the context of denoising autoencoders, we have focused …
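For readers unfamiliar with SCL, the following is a rough, heavily simplified sketch of the pivot-prediction idea mentioned above; the pivot selection, classifier, loss, and dimensionality are all illustrative assumptions, and this is not the configuration used in the paper's experiments.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def scl_projection(X, pivot_idx, n_components=25):
    """Sketch of Structural Correspondence Learning (Blitzer et al., 2006):
    predict each pivot feature from the remaining features on unlabeled data
    from both domains, then take a low-rank basis of the predictor weights
    as a shared cross-domain projection.

    X         : (n_tokens, n_features) sparse binary feature matrix
    pivot_idx : indices of frequent features assumed to behave alike in both domains
    """
    non_pivot = np.setdiff1d(np.arange(X.shape[1]), pivot_idx)
    X_np = X[:, non_pivot]
    weights = []
    for j in pivot_idx:
        y = np.asarray(X[:, j].todense()).ravel() > 0      # does the pivot fire on this token?
        clf = SGDClassifier(loss="modified_huber", alpha=1e-4)
        clf.fit(X_np, y.astype(int))
        weights.append(clf.coef_.ravel())
    W = np.vstack(weights)                                  # (n_pivots, n_non_pivot)
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    theta = Vt[:n_components]                               # shared projection directions
    return X_np @ theta.T                                   # low-dimensional representation to append
```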
Funding
  • This research was supported by National Science Foundation award 1349837
References
  • John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, EMNLP '06, pages 120–128, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • John Blitzer, Mark Dredze, and Fernando Pereira. 2007. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Association for Computational Linguistics, Prague, Czech Republic.
  • John Blitzer. 2008. Domain Adaptation of Natural Language Processing Systems. Ph.D. thesis, University of Pennsylvania.
  • Minmin Chen, Zhixiang Xu, Kilian Weinberger, and Fei Sha. 2012. Marginalized denoising autoencoders for domain adaptation. In John Langford and Joelle Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 767–774. ACM, New York, NY, USA, July.
  • Hal Daumé III. 2007. Frustratingly easy domain adaptation. In ACL, volume 1785, page 1787.
  • Paramveer S. Dhillon, Dean P. Foster, and Lyle H. Ungar. 2011. Multi-view learning of word embeddings via CCA. In NIPS, volume 24, pages 199–207.
  • Cícero Nogueira dos Santos, Ruy L. Milidiú, and Raúl P. Rentería. 2008. Portuguese part-of-speech tagging using entropy guided transformation learning. In Proceedings of the 8th International Conference on Computational Processing of the Portuguese Language, PROPOR '08, pages 143–152, Berlin, Heidelberg. Springer-Verlag.
  • Jacob Eisenstein. 2013. What to do about bad language on the internet. In Proceedings of NAACL, Atlanta, GA.
  • Jenny Rose Finkel and Christopher D. Manning. 2009. Hierarchical Bayesian domain adaptation. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 602–610. Association for Computational Linguistics.
  • Charlotte Galves and Pablo Faria. 2010. Tycho Brahe Parsed Corpus of Historical Portuguese. http://www.tycho.iel.unicamp.br/tycho/corpus/en/index.html.
  • Xavier Glorot, Antoine Bordes, and Yoshua Bengio. 2011a. Deep sparse rectifier networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, JMLR W&CP volume 15, pages 315–323.
  • Xavier Glorot, Antoine Bordes, and Yoshua Bengio. 2011b. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 513–520.
  • Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
  • Fei Huang and Alexander Yates. 2009. Distributional representations for handling sparsity in supervised sequence-labeling. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, pages 495–503. Association for Computational Linguistics.
  • Fei Huang and Alexander Yates. 2012. Biased representation learning for domain adaptation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1313–1323. Association for Computational Linguistics.
  • Jing Jiang and ChengXiang Zhai. 2007. Instance weighting for domain adaptation in NLP. In ACL, volume 2007, page 22.
  • Fábio N. Kepler and Marcelo Finger. 2006. Comparing two Markov methods for part-of-speech tagging of Portuguese. In Advances in Artificial Intelligence: IBERAMIA-SBIA 2006, pages 482–491. Springer.
  • Taesun Moon and Jason Baldridge. 2007. Part-of-speech tagging for Middle English through alignment and projection of parallel diachronic texts. In EMNLP-CoNLL, pages 390–399.
  • Naoaki Okazaki. 2007. CRFsuite: a fast implementation of conditional random fields (CRFs).
  • Sinno Jialin Pan and Qiang Yang. 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359.
  • Marco Pennacchiotti and Fabio Massimo Zanzotto. 2008. Natural language processing across time: An empirical investigation on Italian. In Advances in Natural Language Processing, pages 371–382. Springer.
  • Michael Piotrowski. 2012. Natural language processing for historical texts. Synthesis Lectures on Human Language Technologies, 5(2):1–157.
  • Adwait Ratnaparkhi. 1996. A maximum entropy model for part-of-speech tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, April 16.
  • Noah A. Smith. 2011. Linguistic structure prediction. Synthesis Lectures on Human Language Technologies, 4(2):1–274.
  • Anders Søgaard. 2013. Semi-supervised learning and domain adaptation in natural language processing. Synthesis Lectures on Human Language Technologies, 6(2):1–103.
  • Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, pages 173–180. Association for Computational Linguistics.
  • Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. 2008. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103. ACM.
  • Sida I. Wang, Mengqiu Wang, Stefan Wager, Percy Liang, and Christopher D. Manning. 2013. Feature noising for log-linear structured prediction. In Empirical Methods in Natural Language Processing (EMNLP).
  • Min Xiao and Yuhong Guo. 2013. Domain adaptation for sequence labeling tasks with a probabilistic language adaptation model. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning (ICML-13), volume 28, pages 293–301. JMLR Workshop and Conference Proceedings.