Learn to Cross-lingual Transfer with Meta Graph Learning Across Heterogeneous Languages

EMNLP 2020, pp. 2290–2301

Abstract

The recent emergence of multilingual pre-training language models (mPLM) has enabled breakthroughs on various downstream cross-lingual transfer (CLT) tasks. However, mPLM-based methods usually involve two problems: (1) simply fine-tuning may not adapt general-purpose multilingual representations to be task-aware on low-resource languages; (2) …

Introduction
  • The diversity of human languages is a critical challenge for natural language processing.
  • To alleviate the cost in annotating data for each task in each language, cross-lingual transfer (CLT) (Yarowsky et al, 2001), aiming to leverage knowledge from source languages that are sufficiently labeled to improve the learning in a target language with little supervision, has become a promising direction.
  • CLT techniques have evolved from cross-lingual representation learning (Conneau et al, 2018a) to powerful mPLM (Devlin et al, 2019; Lample and Conneau, 2019), whose versatile multilingual representations have made them a mainstream approach for various downstream CLT tasks.
  • On the other hand, existing adaptation approaches for mPLM behave as a black box without explicitly identifying intrinsic language relations
Highlights
  • The diversity of human languages is a critical challenge for natural language processing
  • We propose meta graph learning (MGL), a meta learning framework to learn how to cross-lingual transfer for multilingual pretraining language model (mPLM)
  • We propose a meta graph learning (MGL) method to further guide the versatile multilingual representations to be task-aware for downstream cross-lingual transfer (CLT) tasks
  • For the imbalanced industrial dataset with more noise, the MGL method consistently achieves the best results for all pairs, significantly exceeding the best baseline Reptile by 1.23% F1 score on average
  • For the MGL w/o L2CLT ablation, we treat each language as one task, as Reptile does, and instead sample the support set and query set from the same language for each task; the sampling difference is sketched after this list
  • We propose a novel MGL method for task-aware CLT adaptation of mPLM by leveraging historical CLT experiences
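The cross-lingual episode sampling contrasted in the ablation bullet above can be made concrete with a short sketch. This is only an illustration under assumed data structures (a dict mapping language codes to lists of labeled examples); the function names and constants are hypothetical and not taken from the authors' implementation.

    import random

    def sample_episode(data_by_lang, source_langs, target_lang, k_support=16, k_query=16):
        # Support comes from the labeled source languages, query from a distinct
        # target language, so every episode rehearses a cross-lingual transfer.
        support = []
        for lang in source_langs:
            support += random.sample(data_by_lang[lang], k_support // len(source_langs))
        query = random.sample(data_by_lang[target_lang], k_query)
        return support, query

    def sample_episode_same_language(data_by_lang, lang, k=16):
        # Ablation-style episode (MGL w/o L2CLT): support and query share one language.
        pool = random.sample(data_by_lang[lang], 2 * k)
        return pool[:k], pool[k:]

    # e.g. meta-train on the {EN, DE} -> FR pair, then meta-test on {EN, FR, DE} -> JA:
    # support, query = sample_episode(data_by_lang, ["en", "de"], "fr")

Drawing support and query from different languages is what turns each episode into a rehearsal of cross-lingual transfer; the ablation removes exactly that contrast.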
Methods
  • The authors' framework is a language-agnostic, task-aware model for CLT: it uses the mPLM as the base encoder to compute language-agnostic representations.
  • Meta approaches: there are some efforts on gradient-based meta learning with BERT for low-resource NLU tasks (Dou et al, 2019), including the second-order optimization-based MAML (Finn et al, 2017) and its first-order variants FOMAML and Reptile (Nichol et al, 2018)
  • They view each dataset as one task, which may not be able to handle the language heterogeneity.
  • The authors compare with Reptile, which is much faster when deployed with the heavy mPLM and has been shown to achieve the best results among these approaches (Dou et al, 2019); a minimal Reptile outer loop is sketched after this list
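Because Reptile is the baseline the authors adopt for comparison, a minimal first-order Reptile outer step is sketched below. It follows the generic recipe of Nichol et al (2018) rather than anything specific to this paper: a tiny linear layer stands in for the heavy mPLM encoder plus classifier head, and the task data are random toy tensors.

    import copy
    import torch
    import torch.nn as nn

    def reptile_step(model, task_batches, inner_lr=1e-3, meta_lr=0.1, inner_steps=3):
        # Adapt a copy of the model on one task with plain SGD (inner loop).
        loss_fn = nn.CrossEntropyLoss()
        adapted = copy.deepcopy(model)
        opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            for x, y in task_batches:
                opt.zero_grad()
                loss_fn(adapted(x), y).backward()
                opt.step()
        # First-order Reptile update: nudge the meta-parameters toward the adapted ones.
        with torch.no_grad():
            for p, q in zip(model.parameters(), adapted.parameters()):
                p.add_(meta_lr * (q - p))

    # Toy usage: a linear probe stands in for the mPLM encoder plus classifier head.
    model = nn.Linear(16, 2)
    task = [(torch.randn(8, 16), torch.randint(0, 2, (8,)))]
    reptile_step(model, task)

The outer update simply interpolates the meta-parameters toward the task-adapted parameters, which is why Reptile avoids the second-order gradients that make MAML expensive on a large mPLM.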
Results
  • The MGL achieves the best results on most transfer pairs, significantly outperforming the best baseline Reptile by 1.84% accuracy on average.
  • Though MANMoE attempts to fully identify both invariant and specific language features, it can only achieve competitive results with translation-based methods.
  • This proves that language correspondences play a critical role in minimizing language gaps.
  • For the hard {EN, FR, DE}→JA pair, MGL learns the transfer skill from the comparatively distant source CLT pair, Germanic languages {EN, DE} to the Romance language FR, during Meta-train, and leverages this skill to rapidly adapt the meta graphs for transferring from {EN, FR, DE} to JA during Meta-test; an illustrative graph construction is sketched after this list
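The summary does not spell out how the meta graphs are built, so the following should not be read as the authors' exact formulation. It only illustrates one generic construction suggested by the reference list: class prototypes as nodes (Snell et al, 2017) refined by a single normalized graph convolution (Kipf and Welling, 2016). All tensor names, the learned weight, and the toy data are assumptions.

    import torch
    import torch.nn.functional as F

    def refine_prototypes(support_emb, support_y, num_classes, weight):
        # Nodes: class prototypes, i.e. the mean support embedding per class.
        protos = torch.stack([support_emb[support_y == c].mean(0) for c in range(num_classes)])
        # Edges: non-negative cosine similarities (the diagonal acts as self-loops).
        protos_n = F.normalize(protos, dim=-1)
        adj = (protos_n @ protos_n.t()).clamp(min=0)
        # Symmetric normalization D^-1/2 A D^-1/2, then one propagation step.
        d_inv_sqrt = adj.sum(-1).pow(-0.5)
        norm_adj = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)
        return torch.relu(norm_adj @ protos @ weight)

    def classify(query_emb, protos):
        # Assign each query to its nearest refined prototype by cosine similarity.
        scores = F.normalize(query_emb, dim=-1) @ F.normalize(protos, dim=-1).t()
        return scores.argmax(-1)

    # Toy usage with random vectors standing in for mPLM sentence embeddings.
    dim, n_cls = 16, 2
    weight = torch.randn(dim, dim) * 0.1
    support_emb = torch.randn(10, dim)
    support_y = torch.tensor([0, 1] * 5)
    protos = refine_prototypes(support_emb, support_y, n_cls, weight)
    print(classify(torch.randn(4, dim), protos))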
Conclusion
  • The authors propose a novel MGL method for task-aware CLT adaptation of mPLM by leveraging historical CLT experiences.
  • Extensive evaluations on both the public benchmark and a large-scale industrial dataset quantitatively and qualitatively demonstrate the effectiveness of the MGL.
  • The proposed MGL method can potentially be applied to more cross-lingual natural language understanding (XLU) tasks (Conneau et al, 2018b; Wang et al, 2019; Lewis et al, 2019; Karthikeyan et al, 2020), and be generalized to learning-to-learn settings for domain adaptation (Blitzer et al, 2007), representation learning (Shen et al, 2018), and multi-task learning (Shen et al, 2019)
Tables
  • Table 1: Experimental results (%) on the multilingual Amazon review dataset. ∆ refers to the improvements. † means that MGL significantly outperforms the best baseline Reptile under a paired-sample t-test with p-value < 0.01 (the test is illustrated after this list)
  • Table 2: Experimental results (%) on the multilingual search relevance dataset. ∆ refers to the improvements. † means that MGL significantly outperforms the best baseline Reptile under a paired-sample t-test with p-value < 0.01
  • Table 3: Ablation results (%): averaged accuracy for each target language on the Amazon review dataset
  • Table 4: Settings of hyper-parameters
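For reference, the significance test named in the captions above is a standard paired-sample t-test over matched per-pair scores. The snippet only illustrates the procedure; the score lists are made up, and SciPy is assumed to be available.

    from scipy import stats

    # Hypothetical per-transfer-pair scores (%); only the procedure mirrors the captions.
    mgl_scores = [84.1, 82.7, 79.5, 80.2, 77.9, 81.3]
    reptile_scores = [82.0, 81.1, 77.8, 78.6, 76.4, 79.9]
    t_stat, p_value = stats.ttest_rel(mgl_scores, reptile_scores)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}, significant at 0.01: {p_value < 0.01}")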
Performance
  • Our model exceeds the translation-based methods MT-BOW, CL-SCL, and CL-RL by 5.36%, 4.81% and 5.25% accuracy on average, respectively
  • Our method achieves 2.40%, 2.96%, and 2.10% average accuracy gains over mBERT with common adaptation approaches, i.e., Mix, Multi-task, and Fine-tune, respectively
  • MGL exceeds MGL w/o Meta by 2.84% accuracy on average
  • MGL outperforms MGL w/o L2CLT by 1.46% accuracy on average, with the largest gains on the distant language JA
Reference
  • Aman Ahuja, Nikhil Rao, Sumeet Katariya, Karthik Subbian, and Chandan K Reddy. 2020. Languageagnostic representation learning for product search on e-commerce platforms. In WSDM, pages 7–15.
  • Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando De Freitas. 2016. Learning to learn by gradient descent by gradient descent. In NIPS, pages 3981–3989.
  • Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In ACL, pages 789–798.
  • Yujia Bao, Menghua Wu, Shiyu Chang, and Regina Barzilay. 2020. Few-shot text classification with distributional signatures. In ICLR.
  • John Blitzer, Mark Dredze, and Fernando Pereira. 2007. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL, pages 440–447.
  • Xilun Chen and Claire Cardie. 2018. Unsupervised multilingual word embeddings. arXiv preprint arXiv:1808.08933.
  • Xilun Chen, Ahmed Hassan, Hany Hassan, Wei Wang, and Claire Cardie. 2019. Multi-source cross-lingual model transfer: Learning what to share. In ACL, pages 3098–3112.
  • Xilun Chen, Yu Sun, Ben Athiwaratkun, Claire Cardie, and Kilian Weinberger. 2018. Adversarial deep averaging networks for cross-lingual sentiment classification. TACL, 6:557–570.
  • Muthu Chidambaram, Yinfei Yang, Daniel Cer, Steve Yuan, Yunhsuan Sung, Brian Strope, and Ray Kurzweil. 2019. Learning cross-lingual sentence representations via a multi-task dual-encoder model. In RepL4NLP workshop, ACL, pages 250–259.
  • Fan R. K. Chung. 1997. Spectral graph theory, volume 92. American Mathematical Society.
  • Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, and Herve Jegou. 2018a. Word translation without parallel data. In ICLR.
  • Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel R. Bowman, Holger Schwenk, and Veselin Stoyanov. 2018b. Xnli: Evaluating cross-lingual sentence representations. In EMNLP.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, pages 4171–4186.
  • Zi-Yi Dou, Keyi Yu, and Antonios Anastasopoulos. 2019. Investigating meta-learning algorithms for low-resource natural language understanding tasks. arXiv preprint arXiv:1908.10423.
  • Julian Eisenschlos, Sebastian Ruder, Piotr Czapla, Marcin Kardas, Sylvain Gugger, and Jeremy Howard. 2019. MultiFiT: Efficient multi-lingual language model fine-tuning. In EMNLP-IJCNLP, pages 5702–5707.
  • Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, pages 1126–1135.
  • Tianyu Gao, Xu Han, Zhiyuan Liu, and Maosong Sun. 2019. Hybrid attention-based prototypical networks for noisy few-shot relation classification. In AAAI, volume 33, pages 6407–6414.
  • Victor Garcia and Joan Bruna. 2018. Few-shot learning with graph neural networks. In ICLR.
  • Ruiying Geng, Binhua Li, Yongbin Li, Xiaodan Zhu, Ping Jian, and Jian Sun. 2019. Induction networks for few-shot text classification. In EMNLP-IJCNLP, pages 3895–3904.
  • Jiatao Gu, Yong Wang, Yun Chen, Kyunghyun Cho, and Victor OK Li. 2018. Meta-learning for lowresource neural machine translation. arXiv preprint arXiv:1808.08437.
  • Haoyang Huang, Yaobo Liang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, and Ming Zhou. 2019. Unicoder: A universal language encoder by pre-training with multiple cross-lingual tasks. In EMNLP-IJCNLP, pages 2485–2494.
  • K Karthikeyan, Zihan Wang, Stephen Mayhew, and Dan Roth. 2020. Cross-lingual ability of multilingual bert: An empirical study. In ICLR.
  • Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
  • Guillaume Lample and Alexis Conneau. 2019. Cross-lingual language model pretraining. arXiv preprint arXiv:1901.07291.
  • Patrick Lewis, Barlas Oguz, Ruty Rinott, Sebastian Riedel, and Holger Schwenk. 2019. Mlqa: Evaluating cross-lingual extractive question answering. arXiv preprint arXiv:1910.07475.
  • Qi Liu, Yue Zhang, and Jiangming Liu. 2018. Learning domain representation for multi-domain sentiment classification. In NAACL-HLT, pages 541–550.
  • Xiaodong Liu, Pengcheng He, Weizhu Chen, and Jianfeng Gao. 2019a. Multi-task deep neural networks for natural language understanding. In ACL, pages 4487–4496.
  • Yanbin Liu, Juho Lee, Minseop Park, Saehoon Kim, Eunho Yang, Sungju Hwang, and Yi Yang. 2019b. Learning to propagate labels: Transductive propagation network for few-shot learning. In ICLR.
  • Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. 2013. Rectifier nonlinearities improve neural network acoustic models. In ICML, page 3.
  • Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. 2016. Matching networks for one shot learning. In NIPS, pages 3630–3638.
  • Ryan McDonald, Slav Petrov, and Keith Hall. 2011. Multi-source transfer of delexicalized dependency parsers. In EMNLP, pages 62–72.
  • Tomas Mikolov, Quoc V Le, and Ilya Sutskever. 2013. Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168.
  • Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel. 2017. A simple neural attentive metalearner. arXiv preprint arXiv:1707.03141.
  • Alex Nichol, Joshua Achiam, and John Schulman. 2018. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999.
  • Boris Oreshkin, Pau Rodríguez López, and Alexandre Lacoste. 2018. Tadam: Task dependent adaptive metric for improved few-shot learning. In NIPS, pages 721–731.
  • Peter Prettenhofer and Benno Stein. 2010. Cross-language text classification using structural correspondence learning. In ACL, pages 1118–1127.
  • Sachin Ravi and Hugo Larochelle. 2017. Optimization as a model for few-shot learning. In ICLR.
  • Tao Shen, Xiubo Geng, Tao Qin, Daya Guo, Duyu Tang, Nan Duan, Guodong Long, and Daxin Jiang. 2019. Multi-task learning for conversational question answering over a large-scale knowledge base. In EMNLP-IJCNLP, pages 2442–2451.
  • Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Shirui Pan, and Chengqi Zhang. 2018. Disan: Directional self-attention network for rnn/cnn-free language understanding. In AAAI, pages 5446–5455.
  • Jake Snell, Kevin Swersky, and Richard Zemel. 2017. Prototypical networks for few-shot learning. In NIPS, pages 4077–4087.
  • Shengli Sun, Qingfeng Sun, Kevin Zhou, and Tengchao Lv. 2019. Hierarchical attention prototypical networks for few-shot text classification. In EMNLP-IJCNLP, pages 476–485.
  • Xiaojun Wan. 2009. Co-training for cross-lingual sentiment classification. In ACL-IJCNLP, pages 235–243.
  • Yuxuan Wang, Wanxiang Che, Jiang Guo, Yijia Liu, and Ting Liu. 2019. Cross-lingual bert transformation for zero-shot dependency parsing. arXiv preprint arXiv:1909.06775.
  • Qianhui Wu, Zijia Lin, Guoxin Wang, Hui Chen, Borje F Karlsson, Biqing Huang, and Chin-Yew Lin. 2019. Enhanced meta-learning for cross-lingual named entity recognition with minimal resources. arXiv preprint arXiv:1911.06161.
  • Min Xiao and Yuhong Guo. 2013. Semi-supervised representation learning for cross-lingual text classification. In EMNLP, pages 1465–1475.
  • Kui Xu and Xiaojun Wan. 2017. Towards a universal sentiment classifier in multiple languages. In EMNLP, pages 511–520.
  • Ruochen Xu and Yiming Yang. 2017. Cross-lingual distillation for text classification. In ACL, pages 1415–1425.
  • Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan, Chris Tar, Yun-Hsuan Sung, et al. 2019. Multilingual universal sentence encoder for semantic retrieval. arXiv preprint arXiv:1907.04307.
  • Huaxiu Yao, Ying Wei, Junzhou Huang, and Zhenhui Li. 2019. Hierarchically structured meta-learning. In ICML.
  • Huaxiu Yao, Xian Wu, Zhiqiang Tao, Yaliang Li, Bolin Ding, Ruirui Li, and Zhenhui Li. 2020. Automated relational meta-learning. In ICLR.
  • David Yarowsky, Grace Ngai, and Richard Wicentowski. 2001. Inducing multilingual text analysis tools via robust projection across aligned corpora. In Proceedings of the First International Conference on Human Language Technology Research.
  • Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip HS Torr, and Timothy M Hospedales. 2018. Learning to compare: Relation network for few-shot learning. In CVPR, pages 1199–1208.
  • Shyam Upadhyay, Manaal Faruqui, Gokhan Tur, Dilek Hakkani-Tur, and Larry Heck. 2018. (Almost) zero-shot cross-lingual spoken language understanding. In ICASSP, pages 6034–6038.
  • Wei Ying, Yu Zhang, Junzhou Huang, and Qiang Yang. 2018. Transfer learning via learning to transfer. In ICML, pages 5085–5094.
  • Xi Sheryl Zhang, Fengyi Tang, Hiroko H Dodge, Jiayu Zhou, and Fei Wang. 2019. Metapred: Metalearning for clinical risk prediction with limited patient electronic health records. In SIGKDD, pages 2487–2495.
  • Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903.
  • Dengyong Zhou, Olivier Bousquet, Thomas N Lal, Jason Weston, and Bernhard Scholkopf. 2004. Learning with local and global consistency. In NIPS, pages 321–328.
  • Xinjie Zhou, Xiaojun Wan, and Jianguo Xiao. 2016a. Attention-based lstm network for cross-lingual sentiment classification. In EMNLP, pages 247–256.
  • Xinjie Zhou, Xiaojun Wan, and Jianguo Xiao. 2016b. Cross-lingual sentiment classification with bilingual document representation learning. In ACL, pages 1403–1412.
  • Will Y Zou, Richard Socher, Daniel Cer, and Christopher D Manning. 2013. Bilingual word embeddings for phrase-based machine translation. In EMNLP, pages 1393–1398.
Authors
Zheng Li
Mukul Kumar
William Headden
Bing Yin