
Task Refinement Learning for Improved Accuracy and Stability of Unsupervised Domain Adaptation

ACL (1), pp.5895-5906, (2019)

Abstract

Pivot Based Language Modeling (PBLM) (Ziser and Reichart, 2018a), combining LSTMs with pivot-based methods, has yielded significant progress in unsupervised domain adaptation. However, this approach is still challenged by the large pivot detection problem that should be solved…

Introduction
  • Domain adaptation (DA, (Daume III, 2007; Ben-David et al, 2010)) is a fundamental challenge in NLP, as many language processing algorithms require costly labeled data that can be found in only a handful of domains.
  • Earlier DReL approaches (Blitzer et al, 2006, 2007) were based on a linear mapping of the original feature space to a new one, modeling the connections between pivot features – features that are frequent in the source and the target domains and are highly correlated with the task label in the source domain – and the complementary set of non-pivot features.
  • The authors believe this is the most realistic setup if one wishes to extend the reach of NLP to a large number of domains.
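The pivot definition above (features frequent in both the source and target domains, and correlated with the source-domain task label) can be sketched as a small selection routine. This is an illustrative sketch only, not the exact procedure of Blitzer et al. or ZR18; the function name, the unigram tokenization, and the frequency thresholds are assumptions:

```python
from collections import Counter
from math import log

def select_pivots(src_texts, src_labels, tgt_texts, min_count=10, n_pivots=100):
    """Rank pivot candidates (frequent in BOTH domains) by mutual
    information with the binary source-domain label."""
    # Document frequency of each word, per domain
    src_df = Counter(w for t in src_texts for w in set(t.split()))
    tgt_df = Counter(w for t in tgt_texts for w in set(t.split()))
    # Pivot candidates must be frequent in source AND target
    candidates = [w for w in src_df
                  if src_df[w] >= min_count and tgt_df[w] >= min_count]
    n = len(src_texts)
    p_pos = sum(1 for y in src_labels if y == 1) / n

    def mi(w):
        # Mutual information between word presence and the source label
        score = 0.0
        p_w = src_df[w] / n
        for y_val, p_y in ((1, p_pos), (0, 1.0 - p_pos)):
            joint = sum(1 for t, y in zip(src_texts, src_labels)
                        if y == y_val and w in set(t.split())) / n
            if joint > 0 and p_w > 0 and p_y > 0:
                score += joint * log(joint / (p_w * p_y))
        return score

    return sorted(candidates, key=mi, reverse=True)[:n_pivots]
```

Words that appear in both domains but carry no sentiment signal (e.g. function words) receive near-zero mutual information and fall to the bottom of the ranking.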
Highlights
  • Domain adaptation (DA, (Daume III, 2007; Ben-David et al, 2010)) is a fundamental challenge in NLP, as many language processing algorithms require costly labeled data that can be found in only a handful of domains.
  • Task Refinement Learning (TRL)-Pivot Based Language Modeling (PBLM)-CNN is more robust than plain PBLM-CNN, consistently achieving higher maximum, minimum and average results, as well as a lower standard deviation, across the 30 configurations we considered for each model.
  • On average across the test sets, all TRL-PBLM methods improve over the original PBLM (NoTRL) with the best performing method, RF2, improving by as much as 2.1% on average (80.9 vs. 78.8)
  • We proposed Task Refinement Learning algorithms for domain adaptation with representation learning
  • Our TRL algorithms are tailored to the PBLM representation learning model of ZR18 and aim to provide more effective training for this model
  • The resulting PBLM-CNN model improves both the accuracy and the stability of the original PBLM-CNN model where PBLM is trained without TRL
Methods
  • The authors implemented the setup of ZR18, including datasets, baselines, and hyperparameter details.

    Task and Domains Following ZR18, and a large body of DA work, the authors experiment with the task of binary cross-domain sentiment classification with the product review domains of Blitzer et al (2007) – Books (B), DVDs (D), Electronic items (E) and Kitchen appliances (K).
  • The authors consider the airline review domain that was presented by ZR18, who demonstrated that adaptation from the Blitzer product domains to this domain, and vice versa, is more challenging than adaptation between the Blitzer product domains.
  • For each of the domains the authors consider 2000 labeled reviews, 1000 positive and 1000 negative, and unlabeled reviews: 6000 (B), 34741 (D), 13153 (E), 16785 (K) and 39396 (A).
  • The authors include each of the domains considered in ZR18 at least once.
Results
  • Overall Performance The authors' first result is presented in Table 1.
  • On average across the test sets, all TRL-PBLM methods improve over the original PBLM (NoTRL) with the best performing method, RF2, improving by as much as 2.1% on average (80.9 vs 78.8).
  • In all 6 setups one of the TRLPBLM methods performs best.
  • In two setups RF2 improves over NoTRL by more than 3.5%: 80.2 vs 75 (E-D) and 86.1 vs 82.5 (B-K).
  • In the remaining two setups, a TRL method improves by less than 0.5%.
  • The 80.9% averaged accuracy of RF2 compares favorably with the 74.4% of AE-SCL-SR, the strongest baseline from ZR18
Conclusion
  • The authors proposed Task Refinement Learning algorithms for domain adaptation with representation learning.
  • The authors' TRL algorithms are tailored to the PBLM representation learning model of ZR18 and aim to provide more effective training for this model.
  • The resulting PBLM-CNN model improves both the accuracy and the stability of the original PBLM-CNN model where PBLM is trained without TRL.
  • In future work the authors would like to develop more sophisticated TRL algorithms, for both in-domain and domain adaptation NLP setups.
  • The authors would like to establish the theoretical grounding for the improved stability achieved by TRL, and to explore this effect beyond domain adaptation.
Tables
  • Table1: Sentiment accuracy when hyper-parameters are tuned with development data
  • Table2: Statistics of the test set accuracy distribution achieved by the PBLM-CNN sentiment classifier, when adapted between domains with RF2, BasicTRL, and NoTRL (the first two are TRL-based methods). The statistics are computed across 30 model configurations
  • Table3: Ablation analysis. B-TRL is BasicTRL
  • Table4: Top 10 nearest neighbors (ranked from the closest neighbor downward) of the pivot "highly recommended" according to three models: NoTRL (plain PBLM), BasicTRL and RF2. TRL training results in all members of the neighbor list of a pivot being of the same sentiment class as the pivot itself
Funding
  • This research has been funded by an ISF personal grant on "Domain Adaptation in NLP: Combining Deep Learning with Domain and Task Knowledge".
Study Subjects and Analysis
domain pairs: 6
However, while in CL the prediction task is fixed but the trained algorithm is exposed to increasingly more complex training examples in subsequent stages, in TRL the algorithm is trained to solve increasingly more complex tasks in subsequent stages, while the training data is kept fixed across the stages. We implemented the experimental setup of ZR18 for sentiment classification, considering all their 5 domains for a total of 6 domain pairs (§ 4). Our TRL-PBLM-CNN model is identical to the state-of-the-art PBLM-CNN of ZR18, except that PBLM is trained with one of our TRL methods.
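The contrast drawn here between curriculum learning and TRL can be sketched as a training loop in which the data stays fixed and only the stage's task (its labeling function) changes. The function names and the `label_fn` API below are hypothetical, not ZR18's implementation:

```python
def task_refinement_train(model, data, stage_label_fns, train_fn):
    """Sketch of Task Refinement Learning: the training data is FIXED
    across stages (unlike curriculum learning, where the task is fixed
    and the examples grow harder); only the prediction task is refined.
    Weights learned in stage t initialize stage t+1."""
    for label_fn in stage_label_fns:
        # Re-label the same data for the current, more refined task
        labeled = [(x, label_fn(x)) for x in data]
        train_fn(model, labeled)  # continues from the previous stage's weights
    return model
```

For PBLM-style training, an early stage might only distinguish pivot from non-pivot tokens, while a later, refined stage predicts the identity of the pivot itself.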

domain pairs: 20
Domain adaptation is a long-standing NLP challenge (Roark and Bacchiani, 2003; Chelba and Acero, 2004). Since TRL-PBLM requires multiple PBLM training stages, it was computationally demanding to experiment with all the 20 domain pairs of ZR18. See § 4 for more details.

cases: 6
Moreover, even the min values of RF2 consistently outperform the NoDA model (where a classifier is trained on the source domain and applied to the target domain without domain adaptation; bottom line of Table 1), and the min values of BasicTRL outperform NoDA in 5 of 6 setups (average difference of 3.9% for RF2 and 3.5% for BasicTRL). In contrast, the min value of NoTRL is outperformed by NoDA in 5 of 6 cases (with an averaged gap of 2.8%).

Model Selection Stability: Additional comparison between Table 2 and Table 1 further reveals that model selection by development data has a more negative impact on NoTRL, compared to RF2 and BasicTRL.

cases: 4
Moreover, the averaged difference between the best test set model and the one selected by the development data for NoTRL is 1.3%, and in one setup (E-D) the difference is as high as 4.3%. For RF2, in contrast, there are four cases where the best performing test set model is selected by the development data (E-D, K-B, A-B and K-A), and the averaged gap between the selected model and the best test set model is only 0.1%. For BasicTRL the corresponding numbers are two setups and an averaged difference of 0.6%
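The model-selection comparison described above can be made concrete with a small helper that measures the gap between the configuration selected by development data and the best configuration on the test set. A minimal sketch; the function name is an assumption:

```python
def selection_gap(dev_accs, test_accs):
    """Difference between the best test-set accuracy and the test accuracy
    of the configuration chosen by development data. Inputs are parallel
    lists over model configurations; a small gap means model selection by
    dev data is stable (as reported for RF2 and BasicTRL vs. NoTRL)."""
    selected = max(range(len(dev_accs)), key=lambda i: dev_accs[i])
    return max(test_accs) - test_accs[selected]
```

A gap of 0 means the dev data picked the configuration that is also best on the test set, as reported for four of the six RF2 setups.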

References
  • Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning, 79(1-2):151–175.
  • Yoshua Bengio, Jerome Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. In Proc. of ICML.
  • John Blitzer, Mark Dredze, and Fernando Pereira. 2007. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proc. of ACL.
  • John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In Proc. of EMNLP.
  • Danushka Bollegala, Takanori Maehara, and Ken-ichi Kawarabayashi. 2015. Unsupervised cross-domain word representation learning. In Proc. of ACL.
  • Danushka Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka. 2011. Relation adaptation: learning to extract novel relations with minimum supervision. In Proc. of IJCAI.
  • Ciprian Chelba and Alex Acero. 2004. Adaptation of maximum entropy capitalizer: Little data can help a lot. In Proc. of EMNLP.
  • Minmin Chen, Yixin Chen, and Kilian Q Weinberger. 2011. Automatic feature decomposition for single view co-training. In Proc. of ICML.
  • Minmin Chen, Zhixiang Xu, Kilian Weinberger, and Fei Sha. 2012. Marginalized denoising autoencoders for domain adaptation. In Proc. of ICML.
  • Stephane Clinchant, Gabriela Csurka, and Boris Chidlovskii. 2016. A domain adaptation regularization for denoising autoencoders. In Proc. of ACL (short papers).
  • Hal Daume III. 2007. Frustratingly easy domain adaptation. In Proc. of ACL.
  • Hal Daume III and Daniel Marcu. 2006. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101–126.
  • Jeffrey L Elman. 1993. Learning and development in neural networks: The importance of starting small. Cognition, 48(1):71–99.
  • Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, Francois Laviolette, Mario Marchand, and Victor Lempitsky. 2016. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(59):1–35.
  • Xavier Glorot, Antoine Bordes, and Yoshua Bengio. 2011. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proc. of ICML.
  • Chen Gong, Dacheng Tao, Stephen J Maybank, Wei Liu, Guoliang Kang, and Jie Yang. 2016. Multi-modal curriculum learning for semi-supervised image classification. IEEE Transactions on Image Processing, 25(7):3249–3260.
  • Stephan Gouws, GJ Van Rooyen, MIH Medialab, and Yoshua Bengio. 2012. Learning structural correspondences across different linguistic domains with synchronous neural language models. In Proc. of the xLite Workshop on Cross-Lingual Technologies, NIPS.
  • Sepp Hochreiter and Jurgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.
  • Jiayuan Huang, Arthur Gretton, Karsten M Borgwardt, Bernhard Scholkopf, and Alex J Smola. 2007. Correcting sample selection bias by unlabeled data. In Proc. of NIPS.
  • Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. 2014. An efficient approach for assessing hyperparameter importance. In Proc. of ICML.
  • Jing Jiang and ChengXiang Zhai. 2007. Instance weighting for domain adaptation in NLP. In Proc. of ACL.
  • Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proc. of ICLR.
  • Diederik P Kingma and Max Welling. 2014. Auto-encoding variational bayes. In Proc. of ICLR.
  • Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. 2016. The variational fair autoencoder.
  • Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. 2009. Domain adaptation with multiple sources. In Proc. of NIPS.
  • David McClosky, Eugene Charniak, and Mark Johnson. 2010. Automatic domain adaptation for parsing. In Proc. of NAACL.
  • Quang Nguyen. 2015. The airline review dataset. https://github.com/quankiquanki/skytrax-reviews-dataset. Scraped from www.airlinequality.com.
  • Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard L Lewis, and Satinder Singh. 2015. Action-conditional video prediction using deep networks in Atari games. In Proc. of NIPS.
  • Sinno Jialin Pan, Xiaochuan Ni, Jian-Tao Sun, Qiang Yang, and Zheng Chen. 2010. Cross-domain sentiment classification via spectral feature alignment. In Proc. of WWW.
  • Anastasia Pentina, Viktoriia Sharmanska, and Christoph H Lampert. 2015. Curriculum learning of multiple tasks. In Proc. of CVPR.
  • Nils Reimers and Iryna Gurevych. 2017. Reporting score distributions makes a difference: Performance study of LSTM-networks for sequence tagging. In Proc. of EMNLP.
  • Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. In Proc. of ICML.
  • Brian Roark and Michiel Bacchiani. 2003. Supervised and unsupervised PCFG adaptation to novel domains. In Proc. of HLT-NAACL.
  • Alexander M Rush, Roi Reichart, Michael Collins, and Amir Globerson. 2012. Improved parsing and POS tagging using inter-sentence consistency constraints. In Proc. of EMNLP-CoNLL.
  • Mrinmaya Sachan and Eric Xing. 2016. Easy questions first? A case study on curriculum learning for question answering. In Proc. of ACL.
  • Tobias Schnabel and Hinrich Schutze. 2014. FLORS: Fast and simple domain adaptation for part-of-speech tagging. Transactions of the Association for Computational Linguistics, 2:15–26.
  • Yangyang Shi, Martha Larson, and Catholijn M Jonker. 2015. Recurrent neural network language model adaptation with curriculum learning. Computer Speech & Language, 33(1):136–154.
  • Valentin I Spitkovsky, Hiyan Alshawi, and Daniel Jurafsky. 2010. From baby steps to leapfrog: How less is more in unsupervised dependency parsing. In Proc. of NAACL-HLT.
  • Ivan Titov. 2011. Domain adaptation by constraining inter-domain variability of latent feature representation. In Proc. of ACL.
  • Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In Proc. of ACL.
  • Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. 2008. Extracting and composing robust features with denoising autoencoders. In Proc. of ICML.
  • John Wieting, Mohit Bansal, Kevin Gimpel, and Karen Livescu. 2016. Charagram: Embedding words and sentences via character n-grams. In Proc. of EMNLP.
  • Yi Yang and Jacob Eisenstein. 2014. Fast easy unsupervised domain adaptation with marginalized structured dropout. In Proc. of ACL (short papers).
  • Jianfei Yu and Jing Jiang. 2016. Learning sentence embeddings with auxiliary tasks for cross-domain sentiment classification. In Proc. of EMNLP.
  • Yang Zhang, Philip David, and Boqing Gong. 2017. Curriculum domain adaptation for semantic segmentation of urban scenes. In Proc. of ICCV.
  • Yftah Ziser and Roi Reichart. 2017. Neural structural correspondence learning for domain adaptation. In Proc. of CoNLL.
  • Yftah Ziser and Roi Reichart. 2018a. Pivot based language modeling for improved neural domain adaptation. In Proc. of NAACL-HLT.
  • Yftah Ziser and Roi Reichart. 2018b. Deep pivot-based modeling for cross-language cross-domain transfer with minimal guidance. In Proc. of EMNLP.
  • Will Y Zou, Richard Socher, Daniel Cer, and Christopher D Manning. 2013. Bilingual word embeddings for phrase-based machine translation. In Proc. of EMNLP.
  • Blitzer et al. (2007) product review data: http://www.cs.jhu.edu/
  • The airline review data is from Nguyen (2015).
  • Code for the PBLM and PBLM-CNN models (Ziser and Reichart, 2018a): https://github.com/yftah89/PBLM-Domain-Adaptation.
  • Code for the AE-SCL and AE-SCL-SR models of ZR17 (Ziser and Reichart, 2017): https://github.com/yftah89/Neural-SCLDomain-Adaptation.
  • Code for the SCL-MI method of Blitzer et al. (2007): see footnote 6 (the URL does not fit into the line width).
  • Code for MSDA (Chen et al., 2012): http://www.cse.wustl.edu/~mchen.
  • Code for the domain adversarial network used as part of the MSDA-DAN baseline (Ganin et al., 2016): https://github.com/GRAAL-Research/domain_adversarial_neural_network.
  • As noted in the experimental setup, for all previous-work models (except for the PBLM models of Ziser and Reichart, 2018a), we follow the experimental setup of Ziser and Reichart (2017), including their hyperparameter estimation protocol. The hyperparameters of the PBLM models are identical to those of Ziser and Reichart (2018a).
  • Note that Ziser and Reichart (2018a) also considered word embedding sizes of 32 and 64. In our preliminary experiments these hyperparameters provided very poor performance for the plain PBLM model, so we excluded them from our full set of experiments.
Authors
Yftah Ziser
Roi Reichart