
LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through Non-Linear Mapping in Latent Space

EMNLP 2020, pp. 2712–2723 (2020)


Abstract

Most of the successful and predominant methods for Bilingual Lexicon Induction (BLI) are mapping-based, where a linear mapping function is learned with the assumption that the word embedding spaces of different languages exhibit similar geometric structures (i.e., approximately isomorphic). However, several recent studies have criticized this simplified assumption, showing that it does not hold even for closely related languages. […]

Introduction
  • A plethora of methods have been proposed to learn cross-lingual word embeddings from monolingual word embeddings.
  • Mikolov et al. (2013a), in their pioneering work, learn a linear mapping function to transform the source embedding space to the target language by minimizing the squared Euclidean distance between the translation pairs of a seed dictionary (a toy sketch of this idea follows this list).
  • They assume that the similarity of geometric arrangements in the embedding spaces is the key reason for their method's success, as they found linear mappings to be superior to non-linear mappings with multi-layer neural networks.
  • Søgaard et al. (2018) empirically show that even closely related languages are far from being isomorphic. Nakashole and Flauger (2018) argue that the mapping between embedding spaces of different languages can be approximately linear only in small local regions, but must be non-linear globally. Patra et al. (2019) recently show that etymologically distant language pairs cannot be aligned properly using orthogonal transformations.
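As a toy illustration of the linear-mapping idea above (a sketch, not the authors' code), the following fits a map W by least squares over a seed dictionary; the random matrices are stand-ins for real aligned embeddings, and the 300-dimensional size simply mirrors the fastText embeddings used later.

    import numpy as np

    # Hypothetical aligned embeddings for a seed dictionary of 1,000 word pairs.
    rng = np.random.default_rng(0)
    dim, n_pairs = 300, 1000
    X = rng.normal(size=(n_pairs, dim))   # source-language embeddings (stand-in data)
    Y = rng.normal(size=(n_pairs, dim))   # target-language embeddings (stand-in data)

    # Mikolov et al. (2013a)-style linear mapping: minimize ||X W - Y||^2 in closed form.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)

    # A source word is translated by mapping its vector and taking the nearest target vector.
    mapped = X[0] @ W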
Highlights
  • In recent years, a plethora of methods have been proposed to learn cross-lingual word embeddings from monolingual word embeddings.
  • We have presented a novel semi-supervised framework LNMAP to learn the cross-lingual mapping between two monolingual word embeddings
  • Apart from exploiting weak supervision from a small (1K) seed dictionary, our LNMAP leverages the information from monolingual word embeddings
  • In contrast to the existing methods that directly map word embeddings using the isomorphic assumption, our framework is independent of any such strong prior assumptions
  • Extensive experiments with fifteen different language pairs comprising high- and low-resource languages show the efficacy of non-linear transformations especially for low-resource and distant languages
Methods
  • The authors compare the proposed LNMAP with several existing methods comprising supervised, semi-supervised, and unsupervised models.
  • (b) Artetxe et al. (2018a) propose a multi-step framework that generalizes previous studies.
  • Their framework consists of several steps: whitening, orthogonal mapping, re-weighting, de-whitening, and dimensionality reduction.
  • (c) Conneau et al. (2018) compare their unsupervised model with a supervised baseline that learns an orthogonal mapping between the embedding spaces by iterative Procrustes refinement.
  • They propose CSLS (Cross-domain Similarity Local Scaling) for nearest-neighbour search; a sketch of the Procrustes mapping and CSLS retrieval follows this list.
  • (d) Joulin et al. (2018) show that minimizing a convex relaxation of the CSLS loss significantly improves the quality of bilingual word vector alignment. Their method achieves state-of-the-art results for many languages (Patra et al., 2019).
  • (e) Jawanpuria et al. (2019) propose a geometric approach where they decouple CLWE learning into two steps: (i) learning rotations for language-specific embeddings to align them to a common space, and (ii) learning a similarity metric in the common space to model similarities between the embeddings of the two languages.
  • (f) Patra et al. (2019) propose a semi-supervised technique that relaxes the isomorphic assumption while leveraging both seed dictionary pairs and a larger set of unaligned word embeddings.
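To make the baseline recipe above concrete, here is a minimal sketch (under toy assumptions, not any released implementation) of the two ingredients mentioned in the list: a closed-form orthogonal Procrustes solution for the mapping, and CSLS scoring for nearest-neighbour retrieval. The function names and k=10 are illustrative choices.

    import numpy as np

    def procrustes(X, Y):
        # Orthogonal W minimizing ||X W - Y||_F, solved in closed form via SVD of X^T Y.
        U, _, Vt = np.linalg.svd(X.T @ Y)
        return U @ Vt

    def csls(mapped_src, tgt, k=10):
        # Cross-domain Similarity Local Scaling: penalize "hub" words on both sides.
        a = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
        b = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
        sims = a @ b.T                                                      # cosine similarities
        r_src = np.sort(sims, axis=1)[:, -k:].mean(axis=1, keepdims=True)   # mean sim to k-NN targets
        r_tgt = np.sort(sims, axis=0)[-k:, :].mean(axis=0, keepdims=True)   # mean sim to k-NN sources
        return 2 * sims - r_src - r_tgt

    # Usage sketch: W = procrustes(X_seed, Y_seed); preds = csls(X_all @ W, Y_all).argmax(axis=1)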
Results
  • Results and Analysis

    The authors present the results on low-resource and resource-rich languages from the MUSE dataset in Tables 1 and 2, respectively, and the results on the VecMap dataset in Table 3.
  • The performance of supervised models on low-resource languages was not satisfactory, especially with a small seed dictionary.
  • Table 2 shows the results for 5 resource-rich language pairs (10 translation tasks) from the MUSE dataset.
  • The authors show the results on the VecMap dataset in Table 3, where there are 3 resource-rich language pairs, and one low-resource pair (En-Fi) with a total of 8 translation tasks.
  • The comparative results between the model variants in Tables 1–3 reveal that LNMAP works better for low-resource languages, whereas LNMAP (Lin. Mapper), the variant with a linear mapper, works better for resource-rich languages.
  • This can be explained by the geometric similarity between the embedding spaces of the two languages
Conclusion
  • The authors have presented a novel semi-supervised framework LNMAP to learn the cross-lingual mapping between two monolingual word embeddings.
  • Apart from exploiting weak supervision from a small (1K) seed dictionary, LNMAP leverages the information from monolingual word embeddings.
  • LNMAP first learns to transform the embeddings into a latent space and uses a non-linear transformation to learn the mapping.
  • Comparison with existing supervised, semi-supervised, and unsupervised baselines shows that LNMAP learns a better mapping.
  • With an in-depth ablation study, the authors show that the different components of LNMAP work collaboratively.
Objectives
  • The authors' objective is to learn a transformation function M such that, for any v_{x_i} ∈ V^x, M(x_i) corresponds to its translation y_j, where v_{y_j} ∈ V^y.
  • Through experiments, analysis, and an ablation study, the authors assess the contribution of back-translation, reconstruction, non-linearity in the mapper, and non-linearity in the autoencoder (an illustrative sketch of such an architecture follows this list).
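The following is a minimal PyTorch-style sketch of the kind of latent-space, non-linear architecture and training terms described in the Conclusion and Objectives: per-language autoencoders, non-linear mappers between the latent spaces, and reconstruction plus back-translation losses. Layer sizes, activations, and the equal loss weights are illustrative assumptions, not the authors' exact configuration.

    import torch
    import torch.nn as nn

    class AutoEncoder(nn.Module):
        """Non-linear autoencoder that maps word embeddings into a latent code space."""
        def __init__(self, dim=300, latent=350):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(dim, latent), nn.Tanh())  # non-linearity: an assumption
            self.dec = nn.Linear(latent, dim)

        def forward(self, x):
            z = self.enc(x)
            return z, self.dec(z)

    class Mapper(nn.Module):
        """Non-linear mapping from one latent space to the other."""
        def __init__(self, latent=350, hidden=400):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                     nn.Linear(hidden, latent))

        def forward(self, z):
            return self.net(z)

    def training_loss(ae_x, ae_y, map_xy, map_yx, x, y, mse=nn.MSELoss()):
        # x, y: a batch of seed-dictionary translation pairs (pre-trained embeddings).
        zx, x_rec = ae_x(x)
        zy, y_rec = ae_y(y)
        rec  = mse(x_rec, x) + mse(y_rec, y)                               # autoencoder reconstruction
        sup  = mse(map_xy(zx), zy) + mse(map_yx(zy), zx)                   # supervised mapping on seed pairs
        back = mse(map_yx(map_xy(zx)), zx) + mse(map_xy(map_yx(zy)), zy)   # back-translation consistency
        return rec + sup + back                                            # equal weights: an assumption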
Tables
  • Table 1: Translation accuracy (P@1) on low-resource languages from the MUSE dataset using fastText embeddings
  • Table 2: Word translation accuracy (P@1) on resource-rich languages from the MUSE dataset using fastText embeddings
  • Table 3: Word translation accuracy (P@1) on the VecMap dataset using CBOW embeddings
  • Table 4: Ablation study of LNMAP with “1K Unique” dictionary. ‘⊖’ indicates the component is removed from the full model, and ‘⊕’ indicates the component is added by replacing the corresponding component
Funding
  • To the best of our knowledge, we are the first to showcase such robust and improved performance with non-linear methods.
  • We notice that our model achieves the highest accuracy in all the tasks for “1K Unique”, 4 tasks for “5K Unique”, and 3 for “5K All”.
Study subjects and analysis
language pairs: 210
A number of recent studies have questioned the robustness of existing unsupervised CLWE methods (Ruder et al., 2019). Vulić et al. (2019) show that even the most robust unsupervised method (Artetxe et al., 2018b) fails for a large number of language pairs; it gives zero (or near-zero) BLI performance for 87 out of 210 language pairs. With a seed dictionary of only 500–1000 word pairs, their supervised method outperforms unsupervised methods by a wide margin in most language pairs.

word pairs: 1000
Vulić et al. (2019) show that even the most robust unsupervised method (Artetxe et al., 2018b) fails for a large number of language pairs; it gives zero (or near-zero) BLI performance for 87 out of 210 language pairs. With a seed dictionary of only 500–1000 word pairs, their supervised method outperforms unsupervised methods by a wide margin in most language pairs. Other recent work has also suggested using semi-supervised methods (Patra et al., 2019; Ormazabal et al., 2019).

language pairs: 110
To demonstrate the effectiveness of our method, we evaluate our models against baselines on two popularly used datasets: MUSE (Conneau et al., 2018) and VecMap (Dinu et al., 2015). The MUSE dataset consists of fastText monolingual embeddings of 300 dimensions (Bojanowski et al., 2017) trained on the Wikipedia monolingual corpus, and gold dictionaries for 110 language pairs. To show the generality of different methods, we consider 15 different language pairs with 15 × 2 = 30 different translation tasks encompassing resource-rich and low-resource languages from different language families.
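For concreteness, here is a small sketch of how inputs like these are typically read: fastText text-format vectors (one header line, then a word followed by 300 values) and a MUSE-style dictionary file with one source–target pair per line. The file names in the final comment are hypothetical placeholders, not paths from the paper.

    import numpy as np

    def load_vectors(path, max_words=200_000):
        # fastText .vec text format: a "count dim" header line, then "word v1 ... v300" per line.
        words, vecs = [], []
        with open(path, encoding="utf-8") as f:
            next(f)                                  # skip the header line
            for i, line in enumerate(f):
                if i >= max_words:
                    break
                tok, *vals = line.rstrip().split(" ")
                words.append(tok)
                vecs.append(np.asarray(vals, dtype=np.float32))
        return words, np.stack(vecs)

    def load_dictionary(path):
        # MUSE-style gold dictionary: one "source_word target_word" pair per line.
        with open(path, encoding="utf-8") as f:
            return [tuple(line.split()) for line in f if line.strip()]

    # Hypothetical file names, for illustration only:
    # words_en, X = load_vectors("wiki.en.vec"); pairs = load_dictionary("en-es.0-5000.txt")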

source-target pairs: 5000
We present the results in precision@1, which measures how often one of the correct translations of a source word is predicted as the top choice. For each of the cases, we show results on seed dictionaries of three different sizes, including 1-to-1 and 1-to-many mappings; “1K Unique” and “5K Unique” contain 1-to-1 mappings of 1000 and 5000 source–target pairs respectively, while “5K All” contains 1-to-many mappings of all 5000 source and target words, that is, for each source word there can be multiple target words. Through experiments and analysis, our goal is to assess questions about the contribution of the model's components.
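A tiny sketch of precision@1 as described above: a prediction counts as correct if the top-ranked translation is any of the (possibly multiple) gold translations of the source word. The example words are made up for illustration.

    def precision_at_1(predictions, gold):
        # predictions: {source_word: top-1 predicted translation}
        # gold: {source_word: set of acceptable translations} (handles 1-to-many entries)
        hits = sum(pred in gold.get(src, set()) for src, pred in predictions.items())
        return hits / len(predictions)

    # Made-up example: "banco" has two acceptable translations, so either one counts.
    gold = {"gato": {"cat"}, "banco": {"bank", "bench"}}
    preds = {"gato": "cat", "banco": "bench"}
    print(precision_at_1(preds, gold))   # -> 1.0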

resource-rich language pairs: 5
Table 2 shows the results for 5 resource-rich language pairs (10 translation tasks) from the MUSE dataset. We notice that our model achieves the highest accuracy in all the tasks for “1K Unique”, 4 tasks for “5K Unique”, and 3 for “5K All”.

resource-rich language pairs: 3
We notice that our model achieves the highest accuracy in all the tasks for “1K Unique”, 4 tasks for “5K Unique”, and 3 for “5K All”. We show the results on the VecMap dataset in Table 3, where there are 3 resource-rich language pairs and one low-resource pair (En-Fi), with a total of 8 translation tasks. Overall, we have similar observations as in MUSE – our model outperforms other models on 7 tasks for “1K Unique”, 4 tasks for “5K Unique”, and 4 for “5K All”.

language pairs: 4
Specifically, our goal is to assess the contribution of back-translation, reconstruction, non-linearity in the mapper, and non-linearity in the autoencoder. We present the ablation results in Table 4 on 8 translation tasks from 4 language pairs, consisting of 2 resource-rich and 2 low-resource languages. We use the MUSE dataset for this purpose.

References
  • David Alvarez-Melis and Tommi Jaakkola. 2018. Gromov-Wasserstein alignment of word embedding spaces. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1881–1890. Association for Computational Linguistics.
  • Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2016. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2289–2294, Austin, Texas. Association for Computational Linguistics.
  • Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2017. Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 451–462, Vancouver, Canada. Association for Computational Linguistics.
  • Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018a. Generalizing and improving bilingual word embedding mappings with a multi-step framework of linear transformations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pages 5012–5019.
  • Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018b. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In ACL.
  • Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Kyunghyun Cho. 2018c. Unsupervised neural machine translation. In Proceedings of the Sixth International Conference on Learning Representations.
  • Antonio Valerio Miceli Barone. 2016. Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders. In Proceedings of the 1st Workshop on Representation Learning for NLP, pages 121–126. Association for Computational Linguistics.
  • Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.
  • Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018. Word translation without parallel data. In International Conference on Learning Representations (ICLR).
  • Georgiana Dinu, Angeliki Lazaridou, and Marco Baroni. 2015. Improving zero-shot learning by mitigating the hubness problem. In ICLR, Workshop track.
  • Yerai Doval, Jose Camacho-Collados, Luis Espinosa Anke, and Steven Schockaert. 2019. On the robustness of unsupervised and semi-supervised cross-lingual word embedding learning. ArXiv, abs/1908.07742.
  • Geert Heyman, Ivan Vulić, and Marie-Francine Moens. 2017. Bilingual lexicon induction by learning to combine word-level and character-level representations. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 1085–1095, Valencia, Spain. Association for Computational Linguistics.
  • Yedid Hoshen and Lior Wolf. 2018. Non-adversarial unsupervised word translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 469–478. Association for Computational Linguistics.
  • Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, and Bamdev Mishra. 2019. Learning multilingual word embeddings in latent metric space: a geometric approach. Transactions of the Association for Computational Linguistics (TACL), 7:107–120.
  • Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Hervé Jégou, and Edouard Grave. 2018. Loss in translation: Learning bilingual word mapping with a retrieval criterion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2979–2984, Brussels, Belgium. Association for Computational Linguistics.
  • Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, and Marc’Aurelio Ranzato. 2018. Phrase-based & neural unsupervised machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 5039–5049, Brussels, Belgium. Association for Computational Linguistics.
  • Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013a. Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111–3119. Curran Associates, Inc.
  • Tasnim Mohiuddin and Shafiq Joty. 2019. Revisiting adversarial autoencoder for unsupervised word translation with cycle consistency and improved training. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3857–3867, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Ndapa Nakashole and Raphael Flauger. 2018. Characterizing departures from linearity in word translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 221–227, Melbourne, Australia. Association for Computational Linguistics.
  • Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, and Eneko Agirre. 2019. Analyzing the limitations of cross-lingual word embedding mappings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990–4995, Florence, Italy. Association for Computational Linguistics.
  • Barun Patra, Joel Ruben Antony Moniz, Sarthak Garg, Matthew R. Gormley, and Graham Neubig. 2019. Bilingual lexicon induction with semi-supervision in non-isometric embedding spaces. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 184–193, Florence, Italy. Association for Computational Linguistics.
  • Sebastian Ruder, Anders Søgaard, and Ivan Vulić. 2019. Unsupervised cross-lingual representation learning. In Proceedings of ACL 2019, Tutorial Abstracts, pages 31–38.
  • Samuel L. Smith, David H. P. Turban, Steven Hamblin, and Nils Y. Hammerla. 2017. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In International Conference on Learning Representations (ICLR).
  • Anders Søgaard, Sebastian Ruder, and Ivan Vulić. 2018. On the limitations of unsupervised bilingual dictionary induction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 778–788. Association for Computational Linguistics.
  • Anders Søgaard, Ivan Vulić, Sebastian Ruder, and Manaal Faruqui. 2019. Cross-Lingual Word Embeddings. Synthesis Lectures on Human Language Technologies. Morgan & Claypool.
  • Ivan Vulić, Goran Glavaš, Roi Reichart, and Anna Korhonen. 2019. Do we really need fully unsupervised cross-lingual embeddings? In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4406–4417, Hong Kong, China. Association for Computational Linguistics.
  • Ivan Vulić and Marie-Francine Moens. 2015. Monolingual and cross-lingual information retrieval models based on (bilingual) word embeddings. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’15, pages 363–372, New York, NY, USA. ACM.
  • Ruochen Xu, Yiming Yang, Naoki Otani, and Yuexin Wu. 2018. Unsupervised cross-lingual transfer of word embedding spaces. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2465–2474. Association for Computational Linguistics.
  • Meng Zhang, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. Adversarial training for unsupervised bilingual lexicon induction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1959–1970. Association for Computational Linguistics.
Author
Tasnim Mohiuddin
M Saiful Bari