Semi-Supervised Bilingual Lexicon Induction with Two-way Interaction

EMNLP 2020, pp. 2973–2984.


Abstract

Semi-supervision is a promising paradigm for Bilingual Lexicon Induction (BLI) with limited annotations. However, previous semi-supervised methods do not fully utilize the knowledge hidden in annotated and non-annotated data, which hinders further improvement of their performance. In this paper, we propose a new semi-supervised BLI framework...

Introduction
  • Bilingual Lexicon Induction (BLI) has attracted broad research interest. BLI methods learn cross-lingual word embeddings from separately trained monolingual embeddings.
  • Patra et al. (2019) combined the unsupervised BLI loss that captures the structural similarity in word embeddings (Lample et al., 2018a) with the supervised loss (Joulin et al., 2018).
  • However, this loss combination still performed poorly because supervised optimization is unreliable under limited annotations; see the Experiments section for details.
Highlights
  • Bilingual Lexicon Induction (BLI) has attracted broad research interest
  • We propose two strategies for a semi-supervised BLI framework based on Prior Optimal Transport (POT) and Bidirectional Lexicon Update (BLU), named Cyclic Semi-Supervision (CSS) and Parallel Semi-Supervision (PSS)
  • We introduce a two-way interaction between the supervised signal and the unsupervised alignment via the proposed POT and BLU message-passing mechanisms
  • Ablation study shows that the two-way interaction by POT and BLU is the key to significant improvement
  • The results show that CSS and PSS achieve SOTA results over two popular datasets
  • As CSS and PSS are compatible with any supervised BLI and any Optimal Transport (OT)-based unsupervised BLI approach, they can also be applied to latent-space optimization
Methods
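OT-based BLI work in this line (Zhang et al., 2017; Alvarez-Melis and Jaakkola, 2018) typically solves an entropy-regularized transport problem between source and target word embeddings with Sinkhorn iterations (Cuturi, 2013). The sketch below shows only that generic building block; the Prior Optimal Transport proposed here additionally injects the annotated lexicon as a prior, whose exact formulation is not given in this summary, so treat the code as illustrative background rather than the authors' method.

```python
import numpy as np

def sinkhorn(cost, reg=0.05, n_iters=200):
    """Entropy-regularized OT via Sinkhorn iterations (Cuturi, 2013).

    cost: (n, m) pairwise cost matrix between source and target words.
    Returns a transport plan P with (approximately) uniform marginals.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform source marginal
    b = np.full(m, 1.0 / m)          # uniform target marginal
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # scale columns to match the target marginal
        u = a / (K @ v)              # scale rows to match the source marginal
    return u[:, None] * K * v[None, :]

# Toy usage: align two tiny random "embedding" sets with a cosine cost.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4)); Y = rng.normal(size=(6, 4))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Y /= np.linalg.norm(Y, axis=1, keepdims=True)
P = sinkhorn(1.0 - X @ Y.T)
print(P.shape, P.sum())              # (5, 6), total mass ~1
```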
Results
  • With the “100 unique” annotated lexicon, CSS outperforms all other semi-supervised methods on every task.
  • The accuracy of Patra et al. (2019) is less than 3% on all tasks because the limited annotated lexicon is insufficient for effective learning, while Artetxe et al. (2017) avoided this problem by lexicon bootstrapping.
  • Both CSS and PSS maintain strong performance with an insufficient annotated lexicon thanks to the proposed message-passing mechanisms, achieving 2.8% and 0.9% improvements over Artetxe et al. (2017), respectively.
Conclusion
  • The authors introduce a two-way interaction between the supervised signal and the unsupervised alignment via the proposed POT and BLU message-passing mechanisms.
  • Based on these message-passing mechanisms, the authors design two semi-supervised BLI strategies that integrate supervised and unsupervised approaches, CSS and PSS, constructed on cyclic and parallel strategies respectively (a rough sketch of the cyclic idea follows below).
  • As CSS and PSS are compatible with any supervised BLI and any OT-based unsupervised BLI approach, they can also be applied to latent-space optimization.
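As a reading aid only: this summary does not spell out CSS, but one plausible shape of a cyclic loop combining a supervised mapping step with a bidirectional lexicon update is Artetxe et al. (2017)-style bootstrapping, fitting an orthogonal map on the current lexicon and then regrowing the lexicon from mutual nearest neighbours in both translation directions. The sketch omits POT entirely and every name in it is illustrative, not the authors' implementation.

```python
import numpy as np

def procrustes(X, Y):
    """Closed-form orthogonal map W minimizing ||XW - Y||_F (Artetxe et al., 2016 style)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def mutual_nn_pairs(S, T):
    """Bidirectional (mutual) nearest-neighbour pairs between mapped source S and target T."""
    sim = S @ T.T
    s2t = sim.argmax(axis=1)                 # best target for each source word
    t2s = sim.argmax(axis=0)                 # best source for each target word
    return [(i, j) for i, j in enumerate(s2t) if t2s[j] == i]

def cyclic_bli(X, Y, seed_pairs, rounds=5):
    """A rough cyclic loop over length-normalized embeddings X, Y and a seed lexicon."""
    lexicon = list(seed_pairs)
    for _ in range(rounds):
        src = np.array([i for i, _ in lexicon]); tgt = np.array([j for _, j in lexicon])
        W = procrustes(X[src], Y[tgt])       # supervised step on the current lexicon
        lexicon = mutual_nn_pairs(X @ W, Y)  # bidirectional update of the lexicon
    return W, lexicon
```

In the actual framework the unsupervised step is OT-based and the supervised loss is RCSLS, so this sketch only conveys the cyclic alternation and the bidirectional lexicon update, not the concrete losses.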
Tables
  • Table 1: Word translation accuracy (@1) of CSS and PSS on the MUSE dataset with RCSLS as their supervised loss. ('EN': English, 'ES': Spanish, 'FR': French, 'DE': German, 'RU': Russian, 'IT': Italian. Underline: the highest accuracy within the group. In bold: the best among all methods.) Detailed experimental results of CSS and PSS on the MUSE dataset, the VecMap dataset, and distant language pairs: each experiment is repeated four times per language pair and best, avg, and st of the four runs are reported (best: the highest @1 accuracy; avg: the average accuracy reported in the main body of the paper; st: the standard deviation).
  • Table 2: Word translation accuracy (@1) of CSS and PSS on the VecMap dataset with RCSLS as their supervised loss. ('EN': English, 'ES': Spanish, 'DE': German, 'IT': Italian. Underline: the highest accuracy within the group. In bold: the best among all methods. In bold and marked by †: the second-highest among all methods.) Detailed experimental results of the ablation study: each experiment is repeated four times per language pair and best, avg, and st of the four runs are reported (best: the highest @1 accuracy; avg: the average accuracy reported in the main body of the paper; st: the standard deviation).
  • Table 3: Ablation study with the “5K all” and “1K unique” annotated lexicons (removing the specified component from CSS or PSS; &: removing both components).
  • Table 4: Word translation accuracy (@1) of CSS and PSS on the distant language pairs with RCSLS as their supervised loss. ('EN': English, 'TA': Tamil, 'JA': Japanese, 'MS': Malay, 'FI': Finnish. In bold: the best among all methods.)
Related work
  • This paper is mainly related to the following three lines of work.
  • Supervised methods. Mikolov et al. (2013) pointed out that BLI is feasible by learning a linear transformation based on the Euclidean distance. Artetxe et al. (2016) applied normalization to word embeddings and imposed an orthogonal constraint on the linear transformation, which leads to a closed-form solution. Joulin et al. (2018) replaced the Euclidean distance with the RCSLS distance to relieve the hubness phenomenon (a rough CSLS sketch follows after this list) and achieved SOTA results for many languages. Jawanpuria et al. (2019) optimized a Mahalanobis metric along with the transformation to refine the similarity between word embeddings.
  • Unsupervised methods. Artetxe et al. (2018a) proposed an unsupervised method that generates an initial lexicon by exploiting similarity in the cross-lingual space and applies robust self-learning to improve it iteratively. Lample et al. (2018a) presented the first work on unsupervised BLI, which learned a linear transformation by adversarial training and improved it with a refinement procedure. Mohiuddin and Joty (2019) revisited the adversarial autoencoder for unsupervised word translation and proposed two novel extensions to it. Moreover, OT-based …
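For readers unfamiliar with the retrieval criterion mentioned above: CSLS (Lample et al., 2018a) discounts the similarity of "hub" words that are close to everything, and RCSLS (Joulin et al., 2018) relaxes it into a trainable loss. The following is a minimal NumPy sketch of the CSLS score only, not the RCSLS training objective and not the paper's code.

```python
import numpy as np

def csls(sim, k=10):
    """CSLS scores from a cosine-similarity matrix sim (source x target), Lample et al. (2018a).

    CSLS(x, y) = 2*cos(x, y) - r_T(x) - r_S(y), where r_T(x) is the mean similarity of x to its
    k nearest target words and r_S(y) the mean similarity of y to its k nearest source words;
    subtracting these penalizes hub words that are close to everything.
    """
    topk_rows = -np.sort(-sim, axis=1)[:, :k]      # k best targets per source word
    topk_cols = -np.sort(-sim, axis=0)[:k, :]      # k best sources per target word
    r_src = topk_rows.mean(axis=1, keepdims=True)  # r_T(x), shape (n, 1)
    r_tgt = topk_cols.mean(axis=0, keepdims=True)  # r_S(y), shape (1, m)
    return 2 * sim - r_src - r_tgt
```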
Study subjects and analysis
source-target pairs: 5000
Evaluation Setting. Similar to Mohiuddin et al. (2020), we compare CSS and PSS against baselines on three annotated lexicons of different sizes, covering one-to-one and one-to-many mappings: “100 unique” and “5K unique” contain one-to-one mappings of 100 and 5000 source-target pairs respectively, while “5K all” contains one-to-many mappings of all 5000 source and target words, that is, for each source word there may be multiple target words. Moreover, we present the experimental results of five fully unsupervised baselines and three supervised ones. (Footnotes: *https://github.com/BestActionNow/SemiSupBLI, †https://github.com/facebookresearch/MUSE, ‡https://github.com/artetxem/vecmap)
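Since “5K all” is one-to-many, accuracy@1 counts a prediction as correct if the retrieved target is any of the gold translations of the source word. Below is a minimal sketch of that evaluation; it uses plain cosine nearest-neighbour retrieval for brevity (MUSE-style evaluation typically retrieves with CSLS instead), and the names `XW`, `Y`, and `gold` are illustrative, not taken from the authors' code.

```python
import numpy as np

def accuracy_at_1(XW, Y, gold):
    """P@1: fraction of test source words whose nearest target is among their gold targets.

    XW:   mapped source embeddings, rows L2-normalized.
    Y:    target embeddings, rows L2-normalized.
    gold: dict mapping a source row index to a set of acceptable target row indices
          (several members per source word for one-to-many lexicons such as "5K all").
    """
    predictions = (XW @ Y.T).argmax(axis=1)   # nearest-neighbour retrieval by cosine similarity
    hits = sum(predictions[s] in targets for s, targets in gold.items())
    return hits / len(gold)
```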

language pairs: 5
Our framework finished in 30 minutes, while the running time for Mohiuddin and Joty (2019) was 3 hours.
5.2 Results on MUSE Dataset
In Table 1, we show the word translation results for five language pairs from the MUSE dataset, including 10 BLI tasks considering bidirectional translation. With “100 unique” annotated lexicon, CSS outperforms all other semi-supervised methods on every task.

language pairs: 3
Taking all methods into consideration, including supervised, semi-supervised and unsupervised ones, CSS and PSS achieve the highest accuracy on 8 of 10 tasks and the best results on average.
5.3 Results on VecMap Dataset
In Table 2, we show the word translation accuracy for three language pairs, including 6 translation tasks, on the harder VecMap dataset (Dinu and Baroni, 2015). Notably, a couple of unsupervised approaches (Lample et al., 2018a; Mohiuddin and Joty, 2019; Grave et al., 2019; Alaux et al., 2019) are evaluated to have zero accuracy on some of the language pairs.

language pairs: 6
Taking all unsupervised, semi-supervised and supervised methods into account, CSS and PSS achieve SOTA accuracy on average. Notably, PSS gets the highest or the second-highest scores (except for the unstable unsupervised baseline of Alaux et al., 2019) on 5 of 6 language pairs. The results for the “100 unique” annotated lexicon support our finding on the MUSE dataset that CSS learns better at low supervision levels.

language pairs: 4
The experimental setting is the same as in the main experiments. The ablation results on four language pairs (two from the MUSE dataset and two from the VecMap dataset) are presented in Table 3. Effectiveness of POT and BLU: regardless of the annotated lexicon size, removing POT, BLU, or both of them from CSS reduces accuracy by 2.4%, 0.9%, and 13.0% on average, respectively.

distant language pairs with 5000 lexicon: 5
5.5 Results on distant language pairs
In this section, we report the translation accuracy of our method on five distant language pairs with a 5000-pair lexicon. We choose three methods as baselines, among them the semi-supervised SOTA method of Patra et al. (2019).

References
  • Jean Alaux, Edouard Grave, Marco Cuturi, and Armand Joulin. 2019. Unsupervised hyperalignment for multilingual word embeddings. In ICLR.
  • David Alvarez-Melis and Tommi S. Jaakkola. 2018. Gromov-Wasserstein alignment of word embedding spaces. In EMNLP, pages 1881–1890.
  • David Alvarez-Melis, Tommi S. Jaakkola, and Stefanie Jegelka. 2018. Structured optimal transport. In AISTATS, pages 1771–1780.
  • Martín Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein generative adversarial networks. In ICML, pages 214–223.
  • Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2016. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In EMNLP, pages 2289–2294.
  • Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2017. Learning bilingual word embeddings with (almost) no bilingual data. In ACL, pages 451–462.
  • Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018a. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In ACL, pages 789–798.
  • Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Kyunghyun Cho. 2018b. Unsupervised neural machine translation. In ICLR.
  • Xuefeng Bai, Hailong Cao, Kehai Chen, and Tiejun Zhao. 2019. A bilingual adversarial autoencoder for unsupervised bilingual lexicon induction. TASLP, 27(10):1639–1648.
  • Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. TACL, 5:135–146.
  • Marco Cuturi. 2013. Sinkhorn distances: Lightspeed computation of optimal transport. In NIPS, pages 2292–2300.
  • Georgiana Dinu and Marco Baroni. 2015. Improving zero-shot learning by mitigating the hubness problem. In ICLR.
  • Zi-Yi Dou, Zhi-Hao Zhou, and Shujian Huang. 2018. Unsupervised bilingual lexicon induction via latent variable models. In EMNLP, pages 621–626.
  • David M. Gaddy, Yuan Zhang, Regina Barzilay, and Tommi S. Jaakkola. 2016. Ten pairs to tag – multilingual POS tagging via coarse mapping between embeddings. In NAACL.
  • Edouard Grave, Armand Joulin, and Quentin Berthet. 2019. Unsupervised alignment of embeddings with Wasserstein Procrustes. In AISTATS, pages 1880–1890.
  • Jiaji Huang, Qiang Qiu, and Kenneth Church. 2019. Hubless nearest neighbor search for bilingual lexicon induction. In ACL, pages 4072–4080.
  • Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, and Bamdev Mishra. 2019. Learning multilingual word embeddings in latent metric space: A geometric approach. TACL, 7:107–120.
  • Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Hervé Jégou, and Edouard Grave. 2018. Loss in translation: Learning bilingual word mapping with a retrieval criterion. In EMNLP, pages 2979–2984.
  • Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. In COLING, pages 1459–1474.
  • Guillaume Lample, Alexis Conneau, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018a. Word translation without parallel data. In ICLR.
  • Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, and Marc'Aurelio Ranzato. 2018b. Phrase-based & neural unsupervised machine translation. In EMNLP, pages 5039–5049.
  • Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013. Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.
  • Tasnim Mohiuddin, M. Saiful Bari, and Shafiq R. Joty. 2020. LNMap: Departures from isomorphic assumption in bilingual lexicon induction through non-linear mapping in latent space. CoRR, abs/2004.13889.
  • Tasnim Mohiuddin and Shafiq R. Joty. 2019. Revisiting adversarial autoencoder for unsupervised word translation with cycle consistency and improved training. In NAACL, pages 3857–3867.
  • Barun Patra, Joel Ruben Antony Moniz, Sarthak Garg, Matthew R. Gormley, and Graham Neubig. 2019. Bilingual lexicon induction with semi-supervision in non-isometric embedding spaces. In ACL, pages 184–193.
  • Gabriel Peyré, Marco Cuturi, et al. 2019. Computational optimal transport. FTML, 11(5-6):355–607.
  • Sebastian Ruder, Ivan Vulić, and Anders Søgaard. 2019. A survey of cross-lingual word embedding models. JAIR, 65:569–631.
  • Ivan Vulić, Goran Glavaš, Roi Reichart, and Anna Korhonen. 2019. Do we really need fully unsupervised cross-lingual embeddings? In EMNLP, pages 4406–4417.
  • Min Xiao and Yuhong Guo. 2014. Distributed word representation learning for cross-lingual dependency parsing. In CoNLL, pages 119–129.
  • Ruochen Xu, Yiming Yang, Naoki Otani, and Yuexin Wu. 2018. Unsupervised cross-lingual transfer of word embedding spaces. In EMNLP, pages 2465–2474.
  • Meng Zhang, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. Earth mover's distance minimization for unsupervised bilingual lexicon induction. In EMNLP, pages 1934–1945.
  • Xu Zhao, Zihao Wang, Yong Zhang, and Hao Wu. 2020. A relaxed matching procedure for unsupervised BLI. In ACL, pages 3036–3041.
  • Chunting Zhou, Xuezhe Ma, Di Wang, and Graham Neubig. 2019. Density matching for bilingual word embedding. In NAACL, pages 1588–1598.
Authors
Xu Zhao
Zihao Wang