Deep Pivot-Based Modeling for Cross-language Cross-domain Transfer with Minimal Guidance

EMNLP 2018, pp. 238–249


Abstract

While cross-domain and cross-language transfer have long been prominent topics in NLP research, their combination has hardly been explored. In this work we consider this problem, and propose a framework that builds on pivot-based learning, structure-aware Deep Neural Networks (particularly LSTMs and CNNs) and bilingual word embeddings, with…

Introduction
  • The field of Natural Language Processing (NLP) has made impressive progress in the last two decades, and text processing applications now perform at a quality that was beyond imagination only a few years ago.
  • To address this problem, substantial efforts have been put into the development of cross-domain (CD; Daume III, 2007; Ben-David et al., 2010) and cross-language (CL) transfer methods.
  • For both areas, while a variety of methods has been developed for many tasks throughout the years (§ 2), with the prominence of deep neural networks (DNNs) the focus of modern methods is shifting towards learning data representations that can serve as a bridge across domains and languages.
  • For CL the picture is similar: multilingual representations are prominent in the transfer of NLP algorithms from one language to another (e.g., Upadhyay et al., 2016).
Highlights
  • The field of Natural Language Processing (NLP) has made impressive progress in the last two decades, and text processing applications now perform at a quality that was beyond imagination only a few years ago.
  • While a variety of methods has been developed for many tasks throughout the years (§ 2), with the prominence of deep neural networks (DNNs) the focus of modern methods is shifting towards learning data representations that can serve as a bridge across domains and languages.
  • PBLM+bilingual word embeddings (BEs)+Lazy, the same model trained in the lazy setup, in which no target-language unlabeled data is available for training, is the second-best model in 9 of 12 product-product setups and the best-performing model in 4 of 6 airline-product setups and on average across these setups.
  • We addressed the problem of cross-language cross-domain (CLCD) transfer in sentiment analysis and proposed methods based on pivot-based learning, structure-aware DNNs and BEs.
  • We considered full and lazy training, and designed a lazy model that, for a given target domain, can be trained with unlabeled data from the source language only and applied to any target language without re-training.
  • Our models outperform previous models across 18 CLCD setups, even when ours are trained in the lazy setup and previous models are trained in the full setup.
Methods
  • Task and data: As in the most related previous work (Prettenhofer and Stein, 2010, 2011; Fernandez et al., 2016), the authors experiment with the Webis-CLS-10 dataset (Prettenhofer and Stein, 2010), consisting of Amazon product reviews written in 4 languages (English, German, French and Japanese) and 3 product domains: Books (B), DVDs (D) and Music (M).
  • As in the aforementioned related works, the authors consider English as the source language, as it is likely to have labeled documents from the largest number of domains.
  • Following ZR18, the authors consider a more challenging setup where the English source domain consists of user airline (A) reviews (Nguyen, 2015).
  • The authors use the dataset of ZR18, consisting of 1,000 positive and 1,000 negative reviews in the labeled set and 39,396 reviews in the unlabeled set; the resulting CLCD setups are enumerated in the sketch below.
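A minimal illustrative sketch (not the authors' code) that enumerates the CLCD setups implied by the data description above, assuming English sources, German and French targets, the three product domains plus the airline source domain, and that source and target domains always differ:

```python
# Enumerate the CLCD setups: (source language, source domain) -> (target language, target domain).
PRODUCT_DOMAINS = ["books", "dvd", "music"]
TARGET_LANGUAGES = ["de", "fr"]  # German and French; Japanese is left for future work


def clcd_setups():
    setups = []
    # 12 product-to-product setups: 3 source domains x 2 different target domains x 2 languages
    for src_dom in PRODUCT_DOMAINS:
        for tgt_lang in TARGET_LANGUAGES:
            for tgt_dom in PRODUCT_DOMAINS:
                if tgt_dom != src_dom:
                    setups.append((("en", src_dom), (tgt_lang, tgt_dom)))
    # 6 airline-to-product setups: the English airline domain x 3 product domains x 2 languages
    for tgt_lang in TARGET_LANGUAGES:
        for tgt_dom in PRODUCT_DOMAINS:
            setups.append((("en", "airline"), (tgt_lang, tgt_dom)))
    return setups


if __name__ == "__main__":
    print(len(clcd_setups()))  # 18 CLCD setups in total
```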
Results
  • The authors' results (Table 1) support the integration of structure-aware DNNs, translated pivots and BEs, as advocated in this paper.
  • PBLM+BE+Lazy, the same model trained in the lazy setup, in which no target-language unlabeled data is available for training, is the second-best model in 9 of 12 product-product setups and the best-performing model in 4 of 6 airline-product setups and on average across these setups.
  • To better understand this last, surprising result in the airline-product setups, the authors consider the pivot selection process (§ 6): (a) sort the source features by their mutual information with the source-domain sentiment label; and (b) iterate over the pivots and exclude the ones whose translation frequency is not high enough in the target domain (a minimal sketch of this procedure follows this list).
  • In the lazy setup the corresponding numbers are 148 for product-to-product domain pairs and 173 for airline-to-product domain pairs.
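A minimal sketch of this two-step pivot selection, assuming documents are given as token lists, features are unigrams, and a translate function maps a source-language feature to its target-language counterpart; the helper names, the number of pivots and the frequency threshold are illustrative assumptions, not the authors' implementation:

```python
from collections import Counter

from sklearn.metrics import mutual_info_score


def select_pivots(src_docs, src_labels, tgt_unlabeled_docs, translate,
                  num_pivots=500, min_tgt_freq=10):
    """Rank candidate pivots by mutual information (MI) with the sentiment label,
    then drop those whose translation is rare in the target-domain unlabeled data."""
    vocab = sorted({tok for doc in src_docs for tok in doc})

    # (a) sort source features by MI with the source-domain sentiment label
    def mi(feature):
        presence = [int(feature in doc) for doc in src_docs]
        return mutual_info_score(presence, src_labels)

    ranked = sorted(vocab, key=mi, reverse=True)

    # (b) exclude candidates whose translation is not frequent enough in the target domain
    tgt_counts = Counter(tok for doc in tgt_unlabeled_docs for tok in doc)
    pivots = [f for f in ranked if tgt_counts[translate(f)] >= min_tgt_freq]
    return pivots[:num_pivots]
```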
Conclusion
  • The authors addressed the problem of CLCD transfer in sentiment analysis and proposed methods based on pivot-based learning, structure-aware DNNs and BEs. They considered full and lazy training, and designed a lazy model that, for a given target domain, can be trained with unlabeled data from the source language only and applied to any target language without re-training.
  • In future work the authors wish to improve the results for large domain gaps and for more dissimilar languages, in the important lazy setup.
  • As the airline-product results indicate, increasing the domain gap harms the results, and the authors expect the same with more diverse language pairs.
Tables
  • Table 1: Sentiment accuracy. Top: CLCD transfer in the product domains. Middle: CLCD transfer from the English airline domain to the French and German product domains. Bottom: within-language learning for the target languages. "All" refers to the average over the setups. We shorten some abbreviations: P+BE stands for PBLM+BE, Lazy for PBLM+BE+Lazy, A-S-SR for AE-SCL-SR, A-SCL for AE-SCL, C-SCL for CL-SCL, CNN for BE+CNN, IL for Linear-IL and ILID for Linear-ILID.
Funding
  • PBLM is better on average for all four CLCD setups, which emphasizes the importance of structure-awareness. Excluding both BEs and structure-awareness (AE) yields further degradation in most cases and on average. Yet, this degradation is minor (0.5%–1.7% in the averages of the different setups), suggesting that the way AE-SCL-SR employs BEs, which is useful for CD transfer (ZR17), is less effective for CLCD.
Study subjects and analysis
Unlabeled documents: 50,000
Task and data: As in our most related previous work (Prettenhofer and Stein, 2010, 2011; Fernandez et al., 2016), we experiment with the Webis-CLS-10 dataset (Prettenhofer and Stein, 2010), consisting of Amazon product reviews written in 4 languages (English, German, French and Japanese) and 3 product domains: Books (B), DVDs (D) and Music (M). Due to our extensive experimental setup we leave Japanese for future work.

For each (language, domain) pair the dataset includes 2,000 train and 2,000 test documents, labeled as positive or negative, and between 9,358 and 50,000 unlabeled documents. As in the aforementioned related works, we consider English as the source language, as it is likely to have labeled documents from the largest number of domains.

Following ZR18, we also consider a more challenging setup where the English source domain consists of user airline (A) reviews (Nguyen, 2015).

Papers compared: 49
Multilingual word embeddings: Multilingual word embedding learning is an active field of research; for example, Ruder et al. (2017) compare 49 papers that have addressed the problem since 2011. Such embeddings are important as they provide a means of bridging the lexical gap between languages, which supports CL transfer.
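As a rough illustration of how such a shared space can be obtained, the sketch below aligns two monolingual embedding matrices with an orthogonal map learned from a seed dictionary (the SVD/Procrustes solution, in the spirit of Smith et al., 2017); the function names and the dictionary format are assumptions made for illustration:

```python
import numpy as np


def learn_orthogonal_map(src_vecs, tgt_vecs):
    """src_vecs, tgt_vecs: (n_pairs, dim) embeddings of translation pairs from a seed
    dictionary. Returns the orthogonal W minimizing ||src_vecs @ W - tgt_vecs||_F."""
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt


def align(embedding_matrix, w):
    """Project all source-language embeddings into the target (e.g. English) space."""
    return embedding_matrix @ w
```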

References
  • Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, and Noah Smith. 2016. Many languages, one parser. Transactions of the Association for Computational Linguistics 4.
  • Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning 79(1-2):151–175.
  • John Blitzer, Mark Dredze, and Fernando Pereira. 2007. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proc. of ACL.
  • John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In Proc. of EMNLP.
  • Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the ACL (TACL) 5:135–146.
  • Danushka Bollegala, Takanori Maehara, and Ken-ichi Kawarabayashi. 2015. Unsupervised cross-domain word representation learning. In Proc. of ACL.
  • Danushka Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka. 2011a. Relation adaptation: learning to extract novel relations with minimum supervision. In Proc. of IJCAI.
  • Danushka Bollegala, David Weir, and John Carroll. 2011b. Using multiple sources to construct a sentiment sensitive thesaurus for cross-domain sentiment classification. In Proc. of ACL.
  • Minmin Chen, Yixin Chen, and Kilian Q Weinberger. 2011. Automatic feature decomposition for single view co-training. In Proc. of ICML.
  • Minmin Chen, Zhixiang Xu, Kilian Weinberger, and Fei Sha. 2012. Marginalized denoising autoencoders for domain adaptation. In Proc. of ICML.
  • Hal Daume III. 2007. Frustratingly easy domain adaptation. In Proc. of ACL.
  • Alejandro Moreo Fernandez, Andrea Esuli, and Fabrizio Sebastiani. 2016. Distributional correspondence indexing for cross-lingual and cross-domain sentiment classification. Journal of Artificial Intelligence Research 55(1):131–163.
  • Xavier Glorot, Antoine Bordes, and Yoshua Bengio. 2011. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proc. of ICML, pages 513–520.
  • Stephan Gouws, GJ Van Rooyen, MIH Medialab, and Yoshua Bengio. 2012. Learning structural correspondences across different linguistic domains with synchronous neural language models. In Proc. of the xLite Workshop on Cross-Lingual Technologies, NIPS.
  • Jiang Guo, Wanxiang Che, David Yarowsky, Haifeng Wang, and Ting Liu. 2015. Cross-lingual dependency parsing based on distributed representations. In Proc. of ACL-IJCNLP.
  • Jiayuan Huang, Arthur Gretton, Karsten M Borgwardt, Bernhard Schölkopf, and Alex J Smola. 2007. Correcting sample selection bias by unlabeled data. In Proc. of NIPS.
  • Jing Jiang and ChengXiang Zhai. 2007. Instance weighting for domain adaptation in NLP. In Proc. of ACL.
  • Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proc. of ICLR.
  • Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. 2009. Domain adaptation with multiple sources. In Proc. of NIPS.
  • David McClosky, Eugene Charniak, and Mark Johnson. 2010. Automatic domain adaptation for parsing. In Proc. of NAACL.
  • Quang Nguyen. 2015. The airline review dataset. https://github.com/quankiquanki/skytrax-reviews-dataset. Scraped from www.airlinequality.com.
  • Sinno Jialin Pan, Xiaochuan Ni, Jian-Tao Sun, Qiang Yang, and Zheng Chen. 2010. Cross-domain sentiment classification via spectral feature alignment. In Proc. of WWW, pages 751–760.
  • Peter Prettenhofer and Benno Stein. 2010. Cross-language text classification using structural correspondence learning. In Proc. of ACL.
  • Peter Prettenhofer and Benno Stein. 2011. Cross-lingual adaptation using structural correspondence learning. ACM Transactions on Intelligent Systems and Technology (TIST) 3(1):13.
  • Roi Reichart and Ari Rappoport. 2007. Self-training for enhancement and domain adaptation of statistical parsers trained on small datasets. In Proc. of ACL.
  • Sebastian Ruder, Ivan Vulić, and Anders Søgaard. 2017. A survey of cross-lingual word embedding models. arXiv preprint arXiv:1706.04902.
  • Alexander M Rush, Roi Reichart, Michael Collins, and Amir Globerson. 2012. Improved parsing and POS tagging using inter-sentence consistency constraints. In Proc. of EMNLP-CoNLL.
  • Tobias Schnabel and Hinrich Schütze. 2013. Towards robust cross-domain domain adaptation for part-of-speech tagging. In Proc. of IJCNLP.
  • Lei Shi, Rada Mihalcea, and Mingjun Tian. 2010. Cross language text classification by model translation and semi-supervised learning. In Proc. of EMNLP.
  • Samuel L Smith, David HP Turban, Steven Hamblin, and Nils Y Hammerla. 2017. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In Proc. of ICLR.
  • Oscar Täckström, Dipanjan Das, Slav Petrov, Ryan McDonald, and Joakim Nivre. 2013. Token and type constraints for cross-lingual part-of-speech tagging. Transactions of the Association for Computational Linguistics 1:1–12.
  • Shyam Upadhyay, Manaal Faruqui, Chris Dyer, and Dan Roth. 2016. Cross-lingual models of word embeddings: An empirical comparison. In Proc. of ACL.
  • Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. 2008. Extracting and composing robust features with denoising autoencoders. In Proc. of ICML.
  • Xiaojun Wan. 2009. Co-training for cross-lingual sentiment classification. In Proc. of ACL-IJCNLP.
  • Wei Yang, Wei Lu, and Vincent Zheng. 2017. A simple regularization-based algorithm for learning cross-domain word embeddings. In Proc. of EMNLP.
  • Jianfei Yu and Jing Jiang. 2016. Learning sentence embeddings with auxiliary tasks for cross-domain sentiment classification. In Proc. of EMNLP.
  • Xinjie Zhou, Xiaojun Wan, and Jianguo Xiao. 2016. Attention-based LSTM network for cross-lingual sentiment classification. In Proc. of EMNLP.
  • Yftah Ziser and Roi Reichart. 2017. Neural structural correspondence learning for domain adaptation. In Proc. of CoNLL.
  • Yftah Ziser and Roi Reichart. 2018. Pivot based language modeling for improved neural domain adaptation. In Proc. of NAACL-HLT.
  • The dimension of our bilingual embeddings is 300, following Smith et al. (2017). For all CNN models we use 256 filters of size 3 × |embedding| and perform max pooling over each of the 256 resulting vectors to generate a single 1 × 256 vector that is fed into the classification layer (a code sketch of this architecture follows this list).
  • All the algorithms in the paper that involve a CNN or an LSTM are trained with the ADAM algorithm (Kingma and Ba, 2015). For this algorithm we follow ZR18 and use the parameters described in the original ADAM article (see the sketch after this list).
  • The Webis-CLS-10 dataset (Prettenhofer and Stein, 2010): http://www.uni-weimar.de/en/media/chairs/webis/research/corpora/corpus-webis-cls-10/
  • Bilingual word embeddings (Smith et al., 2017): https://github.com/…multilingual. We applied their method to the monolingual fastText embeddings (Bojanowski et al., 2017); the embeddings of 78 languages were aligned with the English embeddings.
  • The bilingual embeddings are based on the fastText Facebook embeddings (Bojanowski et al., 2017): https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
  • We reimplemented the CL-SCL (Prettenhofer and Stein, 2011) and the DCI (Fernandez et al., 2016) models.
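Putting the two implementation notes above together, here is a minimal PyTorch sketch of a CNN sentiment classifier over 300-dimensional bilingual embeddings with 256 filters of width 3, max pooling over time and a linear classification layer, trained with ADAM using the hyperparameters of the original article (lr = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8). Class and variable names are illustrative rather than the authors' code, and keeping the embeddings frozen is an assumption:

```python
import torch
import torch.nn as nn


class CNNSentimentClassifier(nn.Module):
    def __init__(self, pretrained_embeddings, num_classes=2, num_filters=256, window=3):
        super().__init__()
        # bilingual word embeddings (vocab_size x 300); kept fixed here for illustration
        self.embedding = nn.Embedding.from_pretrained(pretrained_embeddings, freeze=True)
        emb_dim = pretrained_embeddings.size(1)
        # 256 filters, each spanning `window` tokens over the full embedding dimension
        self.conv = nn.Conv1d(emb_dim, num_filters, kernel_size=window)
        self.classifier = nn.Linear(num_filters, num_classes)

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)          # (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)                  # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))           # (batch, 256, seq_len - window + 1)
        x = x.max(dim=2).values                # max pooling over time -> (batch, 256)
        return self.classifier(x)              # class logits


# ADAM with the hyperparameters of the original article (Kingma and Ba, 2015)
embeddings = torch.randn(10000, 300)           # placeholder for the real bilingual embeddings
model = CNNSentimentClassifier(embeddings)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-8)
```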
Authors
Yftah Ziser
Roi Reichart