## AI In-Depth Reading

An AI-extracted summary of this paper.

# The Secret is in the Spectra: Predicting Cross lingual Task Performance with Spectral Similarity Measures

EMNLP 2020, pp. 2377–2390 (2020)

Abstract

Performance in cross-lingual NLP tasks is impacted by the (dis)similarity of languages at hand: e.g., previous work has suggested there is a connection between the expected success of bilingual lexicon induction (BLI) and the assumption of (approximate) isomorphism between monolingual embedding spaces. In this work we present a large-scal…


Introduction

- The effectiveness of joint multilingual modeling and cross-lingual transfer in cross-lingual NLP is critically impacted by the actual languages in consideration (Bender, 2011; Ponti et al, 2019).
- Selecting suitable source languages is a prerequisite for successful cross-lingual transfer of dependency parsers or POS taggers (Naseem et al, 2012; Ponti et al, 2018; de Lhoneux et al, 2018)
- In another example, with all other factors kept similar, the quality of machine translation depends heavily on the properties and language proximity of the actual language pair (Kudugunta et al, 2019).
- The authors derive measures for the isomorphism between two embedding spaces based on these statistics

Highlights

- The effectiveness of joint multilingual modeling and cross-lingual transfer in cross-lingual NLP is critically impacted by the actual languages in consideration (Bender, 2011; Ponti et al, 2019)
- We further show that our findings generalize beyond bilingual lexicon induction (BLI), to cross-lingual transfer in dependency parsing and POS tagging, and we demonstrate strong correlations with machine translation (MT) performance
- The only exception is the MT task, where our measures fall short of typological distance (TYP); we note that they still hold a strong advantage over the baseline Gromov-Hausdorff distance (GH) and Isospectrality (IS) measures, which do not seem to capture any useful language-similarity properties needed for the MT task
- This work introduces two spectral-based measures, Singular Value Gap (SVG) and the effective-condition-number-based harmonic mean (ECOND-HM), that excel in predicting performance on a variety of cross-lingual tasks
- Both measures leverage information from singular values in different ways: ECOND-HM uses the ratio between two singular values, and is grounded in linear algebra and numerical analysis (Blum, 2014; Roy and Vetterli, 2007), while SVG directly utilizes the full range of singular values
- While the spectral methods are computed solely on word vectors from Wikipedia, the results in the downstream tasks are computed with different sets of embeddings, or the embeddings are learnt during training
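The two spectral statistics described above can be sketched directly from singular values. This is a hedged sketch following the descriptions summarized here, not the paper's reference implementation: the function names and the floor-based effective-rank index are illustrative assumptions.

```python
import numpy as np

def singular_values(X):
    # Singular values of a (vocab_size x dim) embedding matrix, in
    # descending order (numpy returns them sorted).
    return np.linalg.svd(X, compute_uv=False)

def svg(X1, X2):
    # Singular Value Gap: squared gap between the sorted log singular
    # values of the two spaces (assumes equal embedding dimensionality).
    s1, s2 = singular_values(X1), singular_values(X2)
    return float(np.sum((np.log(s1) - np.log(s2)) ** 2))

def effective_rank(s):
    # Entropy-based effective rank (Roy and Vetterli, 2007):
    # exp of the Shannon entropy of the normalized singular values.
    p = s / s.sum()
    return int(np.floor(np.exp(-np.sum(p * np.log(p)))))

def econd(X):
    # Effective condition number: ratio of the largest singular value to
    # the one at the effective-rank position (discards the noisy tail).
    s = singular_values(X)
    return float(s[0] / s[effective_rank(s) - 1])

def econd_hm(X1, X2):
    # ECOND-HM: harmonic mean of the two effective condition numbers.
    c1, c2 = econd(X1), econd(X2)
    return 2.0 * c1 * c2 / (c1 + c2)
```

Replacing the smallest singular value with the one at the effective-rank position is what distinguishes ECOND-HM from a plain condition-number measure: the discarded tail is exactly where noise concentrates.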

Methods

**BLI Methods in Comparison**

The scores in each BLI setup were computed by several state-of-the-art BLI methods based on cross-lingual word embeddings, briefly described here. 1) SUP is the standard supervised method (Artetxe et al., 2016; Smith et al., 2017) that learns a mapping between two embedding spaces from a training dictionary by solving the orthogonal Procrustes problem (Schonemann, 1966). 2) SUP+ is another standard supervised method that applies a variety of pre-processing and post-processing steps before and after learning the mapping matrix (Artetxe et al., 2018). 3) UNSUP is a fully unsupervised method that relies on the "similarity of monolingual similarities" heuristic to extract a seed dictionary from monolingual data.

- The authors' analyses are conducted in three BLI setups (PanLex, MUSE, GTrans) and examine three types of state-of-the-art mapping-based methods, both supervised and unsupervised (SUP, SUP+, UNSUP).
- These span 556 language pairs, and cover both related and distant languages.
- The authors note that identical findings emerge from running the correlation analyses based on Precision@1 scores in lieu of MRR
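The SUP mapping step admits a closed-form solution via SVD. A minimal sketch is below, assuming `X_src` and `Y_tgt` are row-aligned matrices built from the training dictionary (these names are illustrative):

```python
import numpy as np

def procrustes_mapping(X_src, Y_tgt):
    # Closed-form solution of the orthogonal Procrustes problem
    # (Schonemann, 1966): with U S V^T = SVD(X^T Y), the orthogonal map
    # W = U V^T minimizes ||X W - Y|| in Frobenius norm.
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt
```

Constraining W to be orthogonal is what preserves monolingual distances during the mapping, which is also why (approximate) isomorphism between the spaces matters for BLI success.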

Results

- The results are summarized in Tables 1 and 2.
- Spectral-based isomorphism measures are strongly correlated with performance across all tasks and settings.
- They show the strongest individual correlations with task performance among all isomorphism measures and linguistic distances alike.
- A general finding across all tasks is that the spectral measures are the most robust isomorphism measures: they substantially outperform the widely used baselines GH and IS

Conclusion

**Further Discussion and Conclusion**

This work introduces two spectral-based measures, SVG and ECOND-HM, that excel in predicting performance on a variety of cross-lingual tasks.

- Another line of work, on the other hand, aims to extract the true embedding dimensionality directly from the embedding space.
- Another recent study (Yin and Shen, 2018) employed perturbation analysis to study the robustness of embedding spaces to noise in monolingual settings, and established that it is related to effective dimensionality of the embedding space.
- All these inspired them to replace the standard matrix rank with effective rank when computing the condition number, and to introduce the statistic of effective condition number in §2.1

Summary

## Objectives:

The authors' aim is to quantify the difference between two embedding spaces by comparing statistics of their singular values.

- Table 1: Correlations with BLI performance in three BLI setups, see §4.1. The best distance measure for each setup and BLI method is bolded. r is the score from the stepwise regression model, see §4.3. Superscripts indicate the distance measures that are statistically significant and included in the stepwise regression model (e.g., .91^{1,3,6-8} means: SVG, ECOND-HM and all the linguistic distances have a combined contribution equivalent to 0.91 Pearson). *See the scatter plot in Appendix C
- Table 2: Correlations with performance in three other cross-lingual tasks: machine translation (MT), dependency parsing (DEP), and POS tagging. Results for the best distance measure are highlighted in bold. r is computed using the stepwise regression model (see §4.3)
- Table 3: Correlation scores in source-language (Source) and target-language (Target) selection analyses. The best distance measure per column is provided in bold. The percentage of cases in which a measure topped the others is shown in superscript (see details in Appendix B). r refers to the unified correlation coefficient from the multiple regression model (see details in Appendix B)
- Table 4: Summary of all the languages included in our analyses. The numbers in each cell indicate the number of different language pairs where each language was included, per task and dataset. IE refers to the Indo-European language group

Related Work

**Related Work and Baselines**

We now provide an overview of prior research that focused on two relevant themes: 1) measuring approximate isomorphism between two embedding spaces, and 2) more generally, quantifying the (dis)similarity between languages, going beyond isomorphism measures. The discussed approaches will also be used as the main baselines later in §5.

Measuring Approximate Isomorphism. We focus on two standard isomorphism measures from prior work which are most similar to our work, and use them as our main baselines. The first measure, termed Isospectrality (IS) (Søgaard et al, 2018), is based on spectral analysis as well, but of the Laplacian eigenvalues of the nearest neighborhood graphs that originate from the initial embedding spaces X1 and X2 (for further technical details see Appendix A). Søgaard et al (2018) argue that these eigenvalues are compact representations of the graph Laplacian, and that their comparison reveals the degree of (approximate) isomorphism. Although similar in spirit to our approach, constructing nearest neighborhood graphs (and then analyzing their eigenvalues) removes useful information on the interaction between all vectors from the initial space, which our spectral method retains.
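As a simplified illustration of the IS baseline: the neighbourhood size, the unnormalised Laplacian, and a fixed number of compared eigenvalues are assumptions here (Søgaard et al. (2018) select the latter via an energy criterion).

```python
import numpy as np

def laplacian_eigenvalues(X, n_neighbors=10):
    # Build a nearest-neighbour graph over length-normalised vectors,
    # then return the eigenvalues of its (unnormalised) graph Laplacian.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)          # exclude self-neighbours
    A = np.zeros_like(sim)
    idx = np.argsort(-sim, axis=1)[:, :n_neighbors]
    for i, js in enumerate(idx):
        A[i, js] = 1.0
    A = np.maximum(A, A.T)                  # symmetrise the graph
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian L = D - A
    return np.linalg.eigvalsh(L)            # ascending eigenvalues

def isospectrality(X1, X2, n_neighbors=10, k=25):
    # Simplified IS: squared gap between the k smallest Laplacian
    # eigenvalues of the two neighbourhood graphs.
    e1 = laplacian_eigenvalues(X1, n_neighbors)
    e2 = laplacian_eigenvalues(X2, n_neighbors)
    return float(np.sum((e1[:k] - e2[:k]) ** 2))
```

The contrast with the spectral measures above is visible in the code: IS first discards most pairwise-similarity information by thresholding to a k-nearest-neighbour graph, whereas SVG/ECOND-HM operate on singular values of the full embedding matrix.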

Funding

- The work of IV and AK is supported by the ERC Consolidator Grant LEXICAL: Lexical Acquisition Across Languages (no. 648909) awarded to AK
- HD is supported by the Blavatnik Postdoctoral Fellowship Programme

Study Data and Analysis

language pairs: 556

For more technical details on the fully unsupervised model, we refer the reader to prior work (Ruder et al., 2019a; Vulic et al., 2019).

In sum, our analyses are conducted in three BLI setups (PanLex, MUSE, GTrans) and examine three types of state-of-the-art mapping-based methods, both supervised and unsupervised (SUP, SUP+, UNSUP). Altogether, these span 556 language pairs, and cover both related and distant languages. Following prior work (Glavas et al., 2019), our BLI evaluation measure is Mean Reciprocal Rank (MRR)
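MRR itself is straightforward to compute; a minimal sketch (the dictionary-based interface is an illustrative assumption):

```python
def mean_reciprocal_rank(ranked_candidates, gold):
    # ranked_candidates maps each source word to a ranked list of
    # translation candidates; gold maps it to the correct translation.
    # MRR averages 1/rank of the gold item (0 when it is absent).
    rr = []
    for word, cands in ranked_candidates.items():
        if gold[word] in cands:
            rr.append(1.0 / (cands.index(gold[word]) + 1))
        else:
            rr.append(0.0)
    return sum(rr) / len(rr)
```

Unlike Precision@1, MRR also rewards near-misses (gold translation ranked second or third), which is why the authors check that both metrics yield identical correlation findings.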

language pairs (IS): 8

While both IS and GH were reported to have strong correlations with BLI performance in prior work, they had not been evaluated in large-scale experiments before. In fact, the correlations were computed on a very small number of language pairs (IS: 8 pairs; GH: 10 pairs). Further, neither measure scales well computationally

language pairs: 100

The conducted empirical analyses can be divided into two major parts. First, we run large-scale BLI analyses across several hundred language pairs from dozens of languages, comparing the correlation of spectral-based isomorphism measures (§2.2) and all baselines (§3) with the performance of a wide spectrum of state-of-the-art BLI methods

language pairs: 210

BLI Setups and Scores. Vulic et al. (2019) ran BLI experiments on 210 language pairs, spanning 15 diverse languages. Their training and test dictionaries (5k and 2k translation pairs) are derived from PanLex (Baldwin et al., 2010; Kamholz et al., 2014)

additional language pairs: 210

Their training and test dictionaries (5k and 2k translation pairs) are derived from PanLex (Baldwin et al., 2010; Kamholz et al., 2014). We complement the original 210 pairs with an additional 210 language pairs of 15 closely related (European) languages, using dictionaries extracted from PanLex following the procedure of Vulic et al. (2019). With the additional language set, the aim is to probe whether isomorphism measures can also capture more subtle and smaller language differences.

language pairs: 108

We also analyze the BLI results of 108 language pairs from MUSE (Conneau et al., 2018). This dataset systematically covers English, with 88 translation pairs that involve English as either the source or target language

translation pairs: 88

We also analyze the BLI results of 108 language pairs from MUSE (Conneau et al., 2018). This dataset systematically covers English, with 88 translation pairs that involve English as either the source or target language. Finally, we analyze the available BLI results of Glavas et al. (2019) (referred to as GTrans), which are based on dictionaries obtained from Google Translate and include 28 language pairs spanning 8 different languages

language pairs: 28

This dataset systematically covers English, with 88 translation pairs that involve English as either the source or target language. Finally, we analyze the available BLI results of Glavas et al. (2019) (referred to as GTrans), which are based on dictionaries obtained from Google Translate and include 28 language pairs spanning 8 different languages. For the full list of language pairs involved in previous BLI studies, we refer the reader to prior work (Conneau et al., 2018; Glavas et al., 2019; Vulic et al., 2019)


language pairs: 210

The initial set of Vulic et al. (2019) comprises Bulgarian, Catalan, Esperanto, Estonian, Basque, Finnish, Hebrew, Hungarian, Indonesian, Georgian, Korean, Lithuanian, Norwegian, Thai, and Turkish. The additional 210 language pairs are composed only of Germanic, Romance, and Slavic languages. For a full list of the languages see Table 4 in the appendix

pairs: 930

We base our analysis on the cross-lingual zero-shot parser transfer results of Lin et al. (2019): the standard biaffine dependency parser (Dozat and Manning, 2017; Dozat et al., 2017) is trained on the training portions of Universal Dependencies (UD) treebanks from 31 languages (Nivre et al., 2018), and is then used to parse the test treebank of each language, now used as the target language. We report correlations between the language distance measures and the Labeled Attachment Scores (LAS) for all combinations of the 31 languages, resulting in 930 pairs.

**POS Tagging**

language pairs: 840

These scores span 26 low-resource target languages and 60 source languages, and measure the utility of each source language for each of the 26 target languages in POS tagging. We use a sample of 840 language pairs for the correlation analysis, as 16 low-resource target languages and 49 source languages have readily available pretrained fastText vectors. We report all results for each BLI method, dictionary, and language pair in the supplementary material (also available at https://tinyurl.com/skn5cf7)

BLI datasets: 3

The only exception is the MT task, where our measures fall short of TYP (see Table 2), although we note that they still hold a strong advantage over the baseline GH and IS isomorphism measures, which do not seem to capture any useful language-similarity properties needed for the MT task. ECOND-HM systematically outperforms COND-HM on 2 of 3 BLI datasets and 2 of 3 downstream tasks, validating our assumption that discarding the smallest singular values reduces noise. Additionally, SVG shows greater stability across tasks and datasets than ECOND-HM

language pairs: 420

The results demonstrate this across all tasks and settings (see the bottom rows of the tables). For instance, when combining spectral measures with the linguistic distances, the regression model reaches outstanding correlation scores of up to r = .91 on PanLex BLI (Table 1); with 420 language pairs, PanLex is the most comprehensive BLI dataset in our study. In addition, GH and IS are not chosen as significant regressors in the stepwise regression model, which indicates that they capture less information than our spectral methods
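The stepwise regression model can be approximated with a greedy forward-selection sketch. This is a simplification: an R²-gain stopping rule stands in for the significance test used in classical stepwise regression (Hocking, 1976), and the predictor names are illustrative.

```python
import numpy as np

def forward_stepwise(X, y, names, threshold=1e-3):
    # Greedy forward selection: repeatedly add the predictor column that
    # most improves R^2 of an OLS fit (with intercept), stopping once the
    # gain falls below `threshold`.
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        gains = []
        for j in remaining:
            cols = selected + [j]
            A = np.column_stack([np.ones(len(y)), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ beta
            gains.append((1.0 - resid.var() / y.var(), j))
        r2, j = max(gains)
        if r2 - best_r2 < threshold:
            break
        selected.append(j)
        remaining.remove(j)
        best_r2 = r2
    return [names[j] for j in selected], best_r2
```

Run on a matrix whose columns are the candidate distance measures and a target vector of task scores, the returned name list plays the role of the "significant regressors" reported in the superscripts of Tables 1 and 2.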

References

- Zeljko Agic. 2017. Cross-lingual parser selection for low-resource languages. In Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017), pages 1–10.
- Sanjeev Arora, Nadav Cohen, Wei Hu, and Yuping Luo. 2019. Implicit regularization in deep matrix factorization. In Proceedings of NeurIPS, pages 7411– 7422.
- Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2016. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In Proceedings of EMNLP, pages 2289–2294.
- Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In Proceedings of ACL, pages 789–798.
- Mikel Artetxe, Sebastian Ruder, and Dani Yogatama. 2020. On the cross-lingual transferability of monolingual representations. In Proceedings of ACL.
- Timothy Baldwin, Jonathan Pool, and Susan Colowick. 2010. PanLex and LEXTRACT: Translating all words of all languages of the world. In Proceedings of COLING (Demo Papers), pages 37–40.
- Antonio Valerio Miceli Barone. 2016. Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders. In Proceedings of the 1st Workshop on Representation Learning for NLP, pages 121–126.
- Emily M. Bender. 2011. On achieving and evaluating language-independence in NLP. Linguistic Issues in Language Technology, 6(3):1–26.
- Malavika Bhaskaranand and Jerry D Gibson. 2010. Spectral entropy-based quantization matrices for H264/AVC video coding. In 2010 Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers, pages 421–425.
- Johannes Bjerva, Robert Ostling, Maria Han Veiga, Jorg Tiedemann, and Isabelle Augenstein. 2019. What do language representations really represent? Computational Linguistics, 45(2):381–389.
- Lenore Blum. 2014. Alan Turing and the other theory of computation (expanded), volume 42 of Lecture Notes in Logic, pages 48–69. Cambridge University Press.
- Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.
- Rachel Carrington, Karthik Bharath, and Simon Preston. 2019. Invariance and identifiability issues for word embeddings. In Proceedings of NeurIPS, pages 15114–15123.
- Frederic Chazal, David Cohen-Steiner, Leonidas J Guibas, Facundo Memoli, and Steve Y Oudot. 2009. Gromov-Hausdorff stable signatures for shapes using persistence. In Computer Graphics Forum, volume 28, pages 1393–1403.
- Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, and Herve Jegou. 2018. Word translation without parallel data. In Proceedings of ICLR.
- Ryan Cotterell and Georg Heigold. 2017. Cross-lingual character-level neural morphological tagging. In Proceedings of EMNLP, pages 748–759.
- Yerai Doval, Jose Camacho-Collados, Luis EspinosaAnke, and Steven Schockaert. 2019. On the robustness of unsupervised and semi-supervised cross-lingual word embedding learning. CoRR, abs/1908.07742.
- Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In Proceedings of ICLR.
- Timothy Dozat, Peng Qi, and Christopher D. Manning. 2017. Stanford’s graph-based neural dependency parser at the CoNLL 2017 shared task. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 20–30.
- Norman R. Draper and Harry Smith. 1998. Applied Regression Analysis, 3rd Edition. John Wiley & Sons.
- Matthew S. Dryer and Martin Haspelmath, editors. 2013. WALS Online. Max Planck Institute for Evolutionary Anthropology, Leipzig.
- Julian Eisenschlos, Sebastian Ruder, Piotr Czapla, Marcin Kadras, Sylvain Gugger, and Jeremy Howard. 2019. MultiFiT: Efficient multi-lingual language model fine-tuning. In Proceedings of EMNLP, pages 5702–5707.
- John Rupert Firth. 1957. A synopsis of linguistic theory, 1930-1955. Studies in Linguistic Analysis.
- William Ford. 2015. Chapter 15 - the singular value decomposition. In William Ford, editor, Numerical Linear Algebra with Applications, pages 299 – 320. Academic Press, Boston.
- Daniela Gerz, Ivan Vulic, Edoardo Maria Ponti, Jason Naradowsky, Roi Reichart, and Anna Korhonen. 2018. Language modeling for morphologically rich languages: Character-aware modeling for wordlevel prediction. Transactions of the Association for Computational Linguistics, 6:451–465.
- Goran Glavas, Robert Litschko, Sebastian Ruder, and Ivan Vulic. 2019. How to (properly) evaluate cross-lingual word embeddings: On strong baselines, comparative analyses, and some misconceptions. In Proceedings of ACL, pages 710–721.
- Zellig S. Harris. 1954. Distributional structure. Word, 10(23):146–162.
- N.J. Higham, M.R. Dennis, P. Glendinning, P.A. Martin, F. Santosa, and J. Tanner. 2015. The Princeton Companion to Applied Mathematics. Princeton University Press.
- Ronald R. Hocking. 1976. The analysis and selection of variables in linear regression. Biometrics, 32(1):1–49.
- Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Herve Jegou, and Edouard Grave. 2018. Loss in translation: Learning bilingual word mapping with a retrieval criterion. In Proceedings of EMNLP, pages 2979–2984.
- David Kamholz, Jonathan Pool, and Susan M. Colowick. 2014. PanLex: Building a resource for panlingual lexical translation. In Proceedings of LREC, pages 3145–3150.
- Sneha Kudugunta, Ankur Bapna, Isaac Caswell, and Orhan Firat. 2019. Investigating multilingual NMT representations at scale. In Proceedings of EMNLP-IJCNLP, pages 1565–1575.
- Olwijn Leeuwenburgh and Rob Arts. 2014. Distance parameterization for efficient seismic history matching with the ensemble kalman filter. Computational Geosciences, 18(3-4):535–548.
- Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, and Anders Søgaard. 2018. Parameter sharing between dependency parsers for related languages. In Proceedings of EMNLP, pages 4992–4997.
- Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, and Graham Neubig. 2019. Choosing transfer languages for cross-lingual learning. In Proceedings of ACL, pages 3125–3135.
- Patrick Littell, David R. Mortensen, Ke Lin, Katherine Kairis, Carlisle Turner, and Lori Levin. 2017. URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors. In Proceedings of EACL, pages 8–14.
- Tomas Mikolov, Quoc V Le, and Ilya Sutskever. 2013. Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168.
- Tahira Naseem, Regina Barzilay, and Amir Globerson. 2012. Selective sharing for multilingual dependency parsing. In Proceedings of ACL, pages 629–637.
- Joakim Nivre, Mitchell Abrams, Zeljko Agic, Lars Ahrenberg, Lene Antonsen, Katya Aplonova, Maria Jesus Aranzabe, et al. 2018. Universal Dependencies 2.3.
- Helen O’Horan, Yevgeni Berzak, Ivan Vulic, Roi Reichart, and Anna Korhonen. 2016. Survey on the use of typological information in natural language processing. In Proceedings of COLING, pages 1297– 1308.
- Barun Patra, Joel Ruben Antony Moniz, Sarthak Garg, Matthew R. Gormley, and Graham Neubig. 2019. Bilingual lexicon induction with semi-supervision in non-isometric embedding spaces. In Proceedings of ACL, pages 184–193.
- Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. How multilingual is multilingual BERT? In Proceedings of ACL, pages 4996–5001.
- Edoardo Maria Ponti, Helen O’Horan, Yevgeni Berzak, Ivan Vulic, Roi Reichart, Thierry Poibeau, Ekaterina Shutova, and Anna Korhonen. 2019. Modeling language variation and universals: A survey on typological linguistics for natural language processing. Computational Linguistics, 45(3):559–601.
- Edoardo Maria Ponti, Roi Reichart, Anna Korhonen, and Ivan Vulic. 2018. Isomorphic transfer of syntactic structures in cross-lingual NLP. In Proceedings of ACL, pages 1531–1542.
- Vikas Raunak, Vivek Gupta, and Florian Metze. 2019. Effective dimensionality reduction for word embeddings. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), pages 235–243.
- Olivier Roy and Martin Vetterli. 2007. The effective rank: A measure of effective dimensionality. In Proceedings of the 15th European Signal Processing Conference, pages 606–610.
- Sebastian Ruder, Anders Søgaard, and Ivan Vulic. 2019a. Unsupervised cross-lingual representation learning. In Proceedings of ACL: Tutorial Abstracts, pages 31–38.
- Sebastian Ruder, Ivan Vulic, and Anders Søgaard. 2019b. A survey of cross-lingual word embedding models. Journal of Artificial Intelligence Research, 65:569–631.
- Peter H. Schonemann. 1966. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31(1):1–10.
- Samuel L. Smith, David H.P. Turban, Steven Hamblin, and Nils Y. Hammerla. 2017. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In Proceedings of ICLR.
- Anders Søgaard, Sebastian Ruder, and Ivan Vulic. 2018. On the limitations of unsupervised bilingual dictionary induction. In Proceedings of ACL, pages 778–788.
- Vladimir Tourbabin and Boaz Rafaely. 2015. Direction of arrival estimation using microphone array processing for moving humanoid robots. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(11):2046–2058.
- Ivan Vulic, Goran Glavas, Roi Reichart, and Anna Korhonen. 2019. Do we really need fully unsupervised cross-lingual embeddings? In Proceedings of EMNLP, pages 4398–4409.
- Ivan Vulic, Sebastian Ruder, and Anders Søgaard. 2020. Are all good word vector spaces isomorphic? In Proceedings of EMNLP.
- Yu Wang. 2019. Single training dimension selection for word embedding with PCA. In Proceedings of EMNLP-IJCNLP, pages 3588–3593.
- Søren Wichmann, Andre Muller, Viveka Velupillai, Cecil H Brown, Eric W Holman, Pamela Brown, Sebastian Sauppe, Oleg Belyaev, Matthias Urban, Zarina Molochieva, et al. 2018. The ASJP database (version 18).
- Zi Yin and Yuanyuan Shen. 2018. On the dimensionality of word embedding. In Proceedings of NeurIPS, pages 887–898.
- Meng Zhang, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. Earth mover’s distance minimization for unsupervised bilingual lexicon induction. In Proceedings of EMNLP, pages 1934–1945.
- Isospectrality (IS): After length-normalizing the vectors, Søgaard et al. (2018) compute the nearest-neighbor graphs using a subset of the top N most frequent words in each space, and then calculate the Laplacian matrices LP1 and LP2 of each graph.
- Computing GH directly is computationally intractable in practice, but it can be tractably approximated by computing the Bottleneck distance between the metric spaces (Chazal et al., 2009).
- We also observe interesting patterns in the selection analyses for the POS tagging task in Table 3: While the results in the target-language selection analysis largely follow the main-text results, the same does not hold for source-language selection (Table 3, POS Target and Source columns). We speculate that this is in fact an artefact of the experimental design of Lin et al. (2019). Their set of target languages deliberately comprises only truly low-resource languages, and such languages are expected to have lower-quality embedding spaces. Transferring to such languages is bound to fail with most source languages regardless of the actual source-target language similarity. The difficulty of this setting is reflected in the actual scores: average accuracy scores for the best source-target combination is 0.55 in the source-language selection analysis, and 0.92 for target-language selection.
