
Density Matching for Bilingual Word Embedding.

North American Chapter of the Association for Computational Linguistics (NAACL), 2019, pages 1588–1598

Abstract

Recent approaches to cross-lingual word embedding have generally been based on linear transformations between the sets of embedding vectors in the two languages. In this paper, we propose an approach that instead expresses the two monolingual embedding spaces as probability densities defined by a Gaussian mixture model, and matches the two densities using normalizing flows.

Introduction
  • Cross-lingual word embeddings represent words in different languages in a single vector space, capturing the syntactic and semantic similarity of words across languages in a way conducive to use in computational models (Upadhyay et al., 2016; Ruder et al., 2017).
  • So-called “offline” approaches learn a bilingual mapping function or multilingual projections from pre-trained monolingual word embeddings or feature vectors (Haghighi et al., 2008; Mikolov et al., 2013; Faruqui and Dyer, 2014).
  • As mentioned in the introduction and detailed later, the model is based on matching two probability density functions, one representing the source embedding space and one representing the target embedding space.
  • To learn in this framework, the authors use the concept of normalizing flows (Rezende and Mohamed, 2015); the underlying density formulation is sketched below.
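  • As a rough illustration (not necessarily the authors' exact parameterization), each monolingual embedding space can be modeled as a Gaussian mixture with one component per word vector, and an invertible mapping relates the two densities via the change-of-variables identity used by normalizing flows:

      p(y) = \sum_{i=1}^{N} \pi_i \, \mathcal{N}(y; \mu_i, \sigma^2 I)

      p_x(x) = p_y\big(f(x)\big) \, \Big| \det \frac{\partial f(x)}{\partial x} \Big|

    where f is the invertible source-to-target mapping; the component means \mu_i, mixture weights \pi_i, and shared isotropic variance \sigma^2 are assumptions made only for this sketch.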
Highlights
  • Cross-lingual word embeddings represent words in different languages in a single vector space, capturing the syntactic and semantic similarity of words across languages in a way conducive to use in computational models (Upadhyay et al., 2016; Ruder et al., 2017).
  • We propose a method for density matching for bilingual word embedding (DeMa-BWE).
  • One standard use case for bilingual embeddings is bilingual lexicon induction, where the embeddings are used to select the most likely translation of a word in the other language.
  • We evaluate our approach extensively on the bilingual lexicon induction (BLI) task, which measures word translation accuracy against a gold-standard dictionary (a minimal version of this metric is sketched after this list).
  • We propose a density matching based unsupervised method for learning bilingual word embedding mappings.
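  • For reference, a minimal Precision@1 computation for BLI could look like the following Python helper (illustrative only; the names and data layout are assumptions, not the paper's evaluation script):

      def precision_at_1(predictions, gold):
          # predictions: dict mapping each source word to the model's top-ranked translation
          # gold: dict mapping each source word to the set of acceptable gold translations
          hits = sum(1 for word, pred in predictions.items() if pred in gold.get(word, set()))
          return hits / len(predictions)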
Methods
  • The authors present the notation used in the method, describe the prior they define for the monolingual embedding space, and detail the density matching method.

    Given two sets of independently trained monolingual embeddings, the problem of bilingual embedding mapping is to learn a mapping function that aligns the two sets in a shared space.
  • Let x ∈ ℝ^d and y ∈ ℝ^d denote vectors in the source and target language embedding spaces, respectively.
  • It is necessary to have a retrieval metric that selects the word or words most likely to be translations given these embeddings.
  • When performing this retrieval, it has been noted that high-dimensional embedding spaces tend to suffer from the “hubness” problem (Radovanovic et al., 2010), where some vectors are nearest neighbors of many other points, which is detrimental to reliably retrieving translations in the bilingual space.
  • Following Conneau et al. (2017), the neighborhood size k is set to 10; a sketch of the retrieval step follows.
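  • The retrieval metric in question appears to be CSLS (cross-domain similarity local scaling) from Conneau et al. (2017); below is a minimal Python sketch, assuming length-normalized embedding matrices, rather than the authors' actual code:

      import numpy as np

      def csls_retrieval(X_mapped, Y, k=10):
          # X_mapped: (m, d) source embeddings already mapped into the target space, length-normalized
          # Y:        (n, d) target embeddings, length-normalized
          sims = X_mapped @ Y.T                               # cosine similarities, shape (m, n)
          r_src = np.sort(sims, axis=1)[:, -k:].mean(axis=1)  # mean similarity of each source word to its k nearest targets
          r_tgt = np.sort(sims, axis=0)[-k:, :].mean(axis=0)  # mean similarity of each target word to its k nearest sources
          csls = 2 * sims - r_src[:, None] - r_tgt[None, :]   # CSLS(x, y) = 2*cos(x, y) - r_src(x) - r_tgt(y)
          return csls.argmax(axis=1)                          # best translation index for each source word

    Subtracting the neighborhood terms penalizes "hub" target words that are close to many source vectors, which mitigates the hubness issue noted above.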
Results
  • Main Results on BLI

     In Table 1, the authors compare the performance of DeMa-BWE extensively with the best-performing unsupervised and supervised methods on the commonly benchmarked language pairs.

     The authors' unsupervised baselines are: (1) MUSE (U+R) (Conneau et al., 2017), a GAN-based unsupervised method with refinement; (2) SL-unsup (Artetxe et al., 2018b), a strong and robust unsupervised self-learning method.
  • The authors run the published SL-unsup code with identical words as the initial dictionary for a fair comparison with their approach; this variant is denoted SL-unsup-ID.
  • The supervised baselines in Table 1 are Procrustes (R), MSF-ISF, MSF, CSLS-Sp, and GeoMM; the language pairs evaluated are en-es, es-en, en-de, de-en, en-fr, fr-en, en-ru, ru-en, en-zh, zh-en, en-ja, and ja-en.
  • The remaining unsupervised baselines are: (3) Sinkhorn (Xu et al., 2018), which minimizes the Sinkhorn distance between the source and target word vectors; and (4) an iterative matching method from Hoshen and Wolf (2018).
Conclusion
  • The authors propose a density matching based unsupervised method for learning bilingual word embedding mappings.
  • DeMa-BWE performs well in the task of bilingual lexicon induction.
  • In future work, the authors will integrate Gaussian embeddings (Vilnis and McCallum, 2015) with the approach.
Tables
  • Table 1: Precision@1 for the MUSE BLI task compared with previous work. All the baseline results employ CSLS as the retrieval metric except for Sinkhorn∗, which uses cosine similarity. R represents refinement. Bold and italic indicate the best unsupervised and overall numbers respectively. ('en' is English, 'es' is Spanish, 'de' is German, 'fr' is French, 'ru' is Russian, 'zh' is Chinese, 'ja' is Japanese.)
  • Table 2: BLI Precision@1 for morphologically complex languages. id+Procrustes (R)∗ is the result reported in (Søgaard et al., 2018). 5k+Procrustes (R) uses the training dictionary with 5k unique query words. ('et' is Estonian, 'fi' is Finnish, 'el' is Greek, 'hu' is Hungarian, 'pl' is Polish, 'tr' is Turkish.)
  • Table 3: Pearson rank correlation (×100) on the cross-lingual word similarity task. Bold indicates the best unsupervised numbers.
  • Table 4: Ablation study on different components of DeMa-BWE.
Funding
  • This work is sponsored by the Defense Advanced Research Projects Agency Information Innovation Office (I2O), Program: Low Resource Languages for Emergent Incidents (LORELEI), issued by DARPA/I2O under Contract No. HR0011-15-C-0114.
References
  • Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2016. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2289–2294.
  • Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2017. Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 451–462.
  • Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. 2016. Density estimation using Real NVP. arXiv preprint arXiv:1605.08803.
  • Manaal Faruqui and Chris Dyer. 2014. Improving vector space word representations using multilingual correlation. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 462–471, Gothenburg, Sweden. Association for Computational Linguistics.
  • Stephan Gouws, Yoshua Bengio, and Greg Corrado. 2015. BilBOWA: Fast bilingual distributed representations without word alignments. In International Conference on Machine Learning, pages 748–756.
  • Edouard Grave, Armand Joulin, and Quentin Berthet. 2018. Unsupervised alignment of embeddings with Wasserstein Procrustes. arXiv preprint arXiv:1805.11222.
  • Jiatao Gu, Hany Hassan, Jacob Devlin, and Victor O.K. Li. 2018. Universal neural machine translation for extremely low resource languages. In Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), New Orleans, USA.
  • Jiang Guo, Wanxiang Che, David Yarowsky, Haifeng Wang, and Ting Liu. 2015. Cross-lingual dependency parsing based on distributed representations. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1234–1244.
  • Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick, and Dan Klein. 2008. Learning bilingual lexicons from monolingual corpora. In Proceedings of ACL-08: HLT, pages 771–779, Columbus, Ohio. Association for Computational Linguistics.
  • Junxian He, Graham Neubig, and Taylor Berg-Kirkpatrick. 2018. Unsupervised learning of syntactic structure with invertible neural projections. In Proceedings of EMNLP.
  • Karl Moritz Hermann and Phil Blunsom. 2014. Multilingual models for compositional distributed semantics. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 58–68.
  • Yedid Hoshen and Lior Wolf. 2018. Non-adversarial unsupervised word translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 469–478.
  • Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, and Bamdev Mishra. 2018. Learning multilingual word embeddings in latent metric space: A geometric approach. arXiv preprint arXiv:1808.08773.
  • Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Herve Jegou, and Edouard Grave. 2018. Loss in translation: Learning bilingual word mapping with a retrieval criterion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2979–2984.
  • Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. Improved variational inference with inverse autoregressive flow. In Advances in Neural Information Processing Systems, pages 4743–4751.
  • Durk P. Kingma and Prafulla Dhariwal. 2018. Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pages 10235–10244.
  • Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. In Proceedings of COLING 2012, pages 1459–1474.
  • Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013. Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168.
  • George Papamakarios, Iain Murray, and Theo Pavlakou. 2017. Masked autoregressive flow for density estimation. In Advances in Neural Information Processing Systems, pages 2338–2347.
  • Milos Radovanovic, Alexandros Nanopoulos, and Mirjana Ivanovic. 2010. Hubs in space: Popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research, 11:2487–2531.
  • Danilo Jimenez Rezende and Shakir Mohamed. 2015. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, pages 1530–1538. JMLR.org.
  • Sebastian Ruder, Ivan Vulic, and Anders Søgaard. 2017. A survey of cross-lingual embedding models. CoRR, abs/1706.04902.
  • Yutaro Shigeto, Ikumi Suzuki, Kazuo Hara, Masashi Shimbo, and Yuji Matsumoto. 2015. Ridge regression, hubness, and zero-shot learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 135–151. Springer.
  • Samuel L. Smith, David H.P. Turban, Steven Hamblin, and Nils Y. Hammerla. 2017. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. arXiv preprint arXiv:1702.03859.
  • Anders Søgaard, Sebastian Ruder, and Ivan Vulic. 2018. On the limitations of unsupervised bilingual dictionary induction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 778–788. Association for Computational Linguistics.
  • Shyam Upadhyay, Manaal Faruqui, Chris Dyer, and Dan Roth. 2016. Cross-lingual models of word embeddings: An empirical comparison. arXiv preprint arXiv:1604.00425.
  • Luke Vilnis and Andrew McCallum. 2015. Word representations via Gaussian embedding. In International Conference on Learning Representations.
  • Laura Wendlandt, Jonathan K. Kummerfeld, and Rada Mihalcea. 2018. Factors influencing the surprising instability of word embeddings. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2092–2102. Association for Computational Linguistics.
  • Chao Xing, Dong Wang, Chao Liu, and Yiye Lin. 2015. Normalized word embedding and orthogonal transform for bilingual word translation. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1006–1011.
  • Ruochen Xu, Yiming Yang, Naoki Otani, and Yuexin Wu. 2018. Unsupervised cross-lingual transfer of word embedding spaces. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium.
  • Meng Zhang, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. Earth mover’s distance minimization for unsupervised bilingual lexicon induction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1934–1945.
  • Zhisong Zhang, Wasi Uddin Ahmad, Xuezhe Ma, Eduard Hovy, Kai-Wei Chang, and Nanyun Peng. 2018. Near or far, wide range zero-shot cross-lingual dependency parsing. arXiv preprint arXiv:1811.00570.
  • Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight. 2016. Transfer learning for low-resource neural machine translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP).