Earth Mover's Distance Minimization for Unsupervised Bilingual Lexicon Induction.

EMNLP 2017, pp. 1924–1935.

Cited: 119 | Views: 224

Abstract

Cross-lingual natural language processing hinges on the premise that there exists invariance across languages. At the word level, researchers have identified such invariance in the word embedding semantic spaces of different languages. However, in order to connect the separate spaces, cross-lingual supervision encoded in parallel data is ...

Introduction
  • Despite tremendous variation and diversity, languages are believed to share something in common.
  • As computational models of word semantics, monolingual word embeddings exhibit isomorphism across languages (Mikolov et al., 2013a).
  • This finding opens up the possibility of using a simple transformation, e.g. a linear map, to connect separately trained word embeddings cross-lingually.
  • As the authors aim to eliminate the need for cross-lingual supervision from word translation pairs, the measure cannot be defined at the word level as in previous work (Mikolov et al., 2013a).
  • Rather, it should quantify the difference between the entire distributions of embeddings.
  • The authors can then try to find the transformation that minimizes the earth mover's distance, as sketched below.
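The sketch below is a minimal Python illustration of this objective under simplifying assumptions (uniform weights, equal-sized embedding sets, Euclidean cost), not the authors' implementation. Under these assumptions, the earth mover's distance reduces to a minimum-cost matching, which scipy solves exactly; the task is then to search for the transformation W that minimizes it.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def emd_uniform(source, target):
    """EMD between two (n, d) embedding sets with uniform weights 1/n."""
    cost = cdist(source, target)              # pairwise Euclidean costs
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one transport
    return cost[rows, cols].mean()

rng = np.random.default_rng(0)
src = rng.normal(size=(100, 50))  # toy "source language" embeddings
tgt = rng.normal(size=(100, 50))  # toy "target language" embeddings

W = np.eye(50)                    # a candidate linear transformation
print(emd_uniform(src @ W, tgt))  # the objective to minimize over W
```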
Highlights
  • Despite tremendous variation and diversity, languages are believed to share something in common.
  • We develop two approaches to our earth mover's distance minimization idea: Wasserstein GAN (WGAN) and EMD minimization under orthogonal transformation (EMDOT).
  • EMDOT considerably improves the performance, which indicates that EMDOT refines the transformation found by WGAN.
  • As our system minimizes the earth mover's distance between embeddings of two languages, we show here that the final earth mover's distance can indicate the degree of difference between languages, serving as a proxy for language distance.
  • We introduce earth mover's distance minimization to tackle this task by exploiting its distribution-level matching to sidestep the requirement for word-level cross-lingual supervision.
  • The earth mover's distance provides a natural measure that may prove helpful for quantifying language difference.
Methods
  • The authors first investigate the learning behavior of the WGAN approach, then present experiments on the bilingual lexicon induction task, followed by a showcase of the earth mover's distance as a language distance measure.
  • The critic objective (6) provides an estimate of the Wasserstein distance up to a multiplicative constant. A smaller Wasserstein distance means the transformed source embedding space and the target embedding space align better, which should in turn result in a better bilingual lexicon.
  • This is validated in Figure 3 by the correlation between the Wasserstein estimate and accuracy.
  • The Wasserstein estimate can therefore serve as an indicator of bilingual lexicon induction performance, and the authors save the model with the lowest value during training as the final model (sketched below).
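The PyTorch sketch below illustrates this WGAN-style training with model selection by the critic's estimate. It is a hedged approximation of the procedure described above: the critic architecture, clipping threshold, learning rates, and the toy `src`/`tgt` data are assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

d = 50
src = torch.randn(5000, d)  # toy source-language embeddings
tgt = torch.randn(5000, d)  # toy target-language embeddings

G = nn.Linear(d, d, bias=False)  # the transformation applied to source embeddings
critic = nn.Sequential(nn.Linear(d, 500), nn.ReLU(), nn.Linear(500, 1))
opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

best_estimate, best_W = float("inf"), None
for step in range(2000):
    for _ in range(5):  # several critic updates per transformation update
        xs = src[torch.randint(len(src), (64,))]
        xt = tgt[torch.randint(len(tgt), (64,))]
        loss_c = critic(G(xs).detach()).mean() - critic(xt).mean()
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        for p in critic.parameters():   # weight clipping keeps the critic
            p.data.clamp_(-0.01, 0.01)  # (approximately) Lipschitz
    xs = src[torch.randint(len(src), (64,))]
    loss_g = -critic(G(xs)).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Critic's Wasserstein estimate (up to a constant) on the last minibatch;
    # in practice one would smooth it over batches before comparing.
    estimate = -loss_c.item()
    if estimate < best_estimate:  # keep the model with the lowest estimate
        best_estimate, best_W = estimate, G.weight.detach().clone()
```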
Results
  • Table 1 shows the F1 scores on the five language pairs.
  • WGAN successfully finds a transformation that produces reasonable word translations.
  • EMDOT considerably improves the performance, which indicates that EMDOT refines the transformation found by WGAN.
  • The quality of the embeddings has an important effect on the performance, which may explain the lower scores on Turkish-English, as this low-resource setting may lack sufficient data to produce reliable embeddings.
  • Higher noise levels in the preprocessing and ground truth for this language pair may also contribute.
Conclusion
  • Starting from the idea of earth mover's distance minimization, the authors have developed two approaches towards the goal.
  • The EMDOT approach is attractive for several reasons: it is consistent across training and testing, compatible with the orthogonal constraint, mathematically sound, guaranteed to converge, almost hyperparameter-free, and fast.
  • However, it suffers from a serious limitation: the alternating minimization procedure only converges to local minima, which often turn out to be rather poor in practice (see the sketch below).
  • Future work should evaluate the earth mover's distance between more languages to assess its quality as a language distance.
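For illustration, the alternating scheme can be sketched as follows. This is a simplified reading of the EMDOT idea (uniform weights, equal-sized vocabularies), not the authors' code: it alternates between solving the optimal transport under the current transformation and solving the orthogonal Procrustes problem (Schönemann, 1966) for the current matching. As noted above, it only reaches a local minimum, which is why a good initialization such as the WGAN solution matters.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def emdot(src, tgt, W0, iters=20):
    """Alternating EMD minimization under an orthogonal transformation."""
    W = W0
    for _ in range(iters):
        # Step 1: fix W, solve the transport (a matching under uniform weights).
        rows, cols = linear_sum_assignment(cdist(src @ W, tgt))
        # Step 2: fix the matching, solve orthogonal Procrustes in closed form:
        # argmin_W ||src[rows] @ W - tgt[cols]||_F subject to W orthogonal.
        u, _, vt = np.linalg.svd(src[rows].T @ tgt[cols])
        W = u @ vt
    return W
```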
Tables
  • Table 1: F1 scores for bilingual lexicon induction on Chinese-English, Spanish-English, Italian-English, Japanese-Chinese, and Turkish-English. The supervised methods TM and IA require seeds to train, and are listed for reference. Our EMDOT approach is initialized with the transformation found by WGAN, and consistently improves on it, reaching competitive performance with supervised methods.
  • Table 2: The earth mover's distance (EMD), typology dissimilarity, and geographical distance for Chinese-English, Spanish-English, Italian-English, Japanese-Chinese, and Turkish-English. The EMD shows correlation with both factors of linguistic difference.
  • Table 3: Statistics of the non-parallel corpora for training monolingual word embeddings. Language codes: zh = Chinese, en = English, es = Spanish, it = Italian, ja = Japanese, tr = Turkish.
Funding
  • This work is supported by the National Natural Science Foundation of China (No. 61522204), the 973 Program (2014CB340501), and the National Natural Science Foundation of China (No. 61331013).
  • This research is also supported by the Singapore National Research Foundation under its International Research Centre@Singapore Funding Initiative and administered by the IDM Programme.
References
  • Mihai Albu. 2006. Quantitative analyses of typological data. Ph.D. thesis, Univ. Leipzig.
  • Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, and Noah A. Smith. 2016. Massively Multilingual Word Embeddings. arXiv:1602.01925 [cs].
  • Martin Arjovsky and Léon Bottou. 2017. Towards Principled Methods For Training Generative Adversarial Networks. In ICLR.
  • Martin Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein GAN. arXiv:1701.07875 [cs, stat].
  • Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2016. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In EMNLP.
  • Ehsaneddin Asgari and Mohammad R. K. Mofrad. 2016. Comparing Fifty Natural Languages and Twelve Genetic Languages Using Word Embedding Language Divergence (WELD) as a Quantitative Measure of Language Distance. In Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP.
  • Antonio Valerio Miceli Barone. 2016. Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders. In Proceedings of the 1st Workshop on Representation Learning for NLP.
  • Hailong Cao, Tiejun Zhao, Shu Zhang, and Yao Meng. 2016. A Distribution-based Model to Learn Bilingual Word Embeddings. In COLING.
  • Sarath Chandar A P, Stanislas Lauly, Hugo Larochelle, Mitesh Khapra, Balaraman Ravindran, Vikas C Raykar, and Amrita Saha. 2014. An Autoencoder Approach to Learning Bilingual Word Representations. In NIPS.
  • Scott Cohen and Leonidas Guibas. 1999. The Earth Mover's Distance Under Transformation Sets. In ICCV.
  • Jocelyn Coulmance, Jean-Marc Marty, Guillaume Wenzek, and Amine Benhalloum. 2015. Trans-gram, Fast Cross-lingual Word-embeddings. In EMNLP.
  • Marco Cuturi. 2013. Sinkhorn Distances: Lightspeed Computation of Optimal Transport. In NIPS.
  • Marco Cuturi and Arnaud Doucet. 2014. Fast Computation of Wasserstein Barycenters. In ICML.
  • Michael Cysouw. 2013a. Disentangling geography from genealogy. Space in language and linguistics: Geographical, interactional, and cognitive perspectives.
  • Michael Cysouw. 2013b. Predicting language learning difficulty. Approaches to measuring linguistic differences.
  • Hu Ding and Jinhui Xu. 2017. FPTAS for Minimizing the Earth Mover's Distance Under Rigid Transformations and Related Problems. Algorithmica.
  • Georgiana Dinu, Angeliki Lazaridou, and Marco Baroni. 2015. Improving Zero-Shot Learning by Mitigating the Hubness Problem. In ICLR Workshop.
  • Qing Dou and Kevin Knight. 2012. Large scale decipherment for out-of-domain machine translation. In EMNLP-CoNLL.
  • Qing Dou and Kevin Knight. 2013. Dependency-Based Decipherment for Resource-Limited Machine Translation. In EMNLP.
  • Qing Dou, Ashish Vaswani, Kevin Knight, and Chris Dyer. 2015. Unifying Bayesian Inference and Vector Space Models for Improved Decipherment. In ACL-IJCNLP.
  • Matthew S. Dryer and Martin Haspelmath, editors. 2013. WALS Online. Max Planck Institute for Evolutionary Anthropology, Leipzig.
  • Long Duong, Hiroshi Kanayama, Tengfei Ma, Steven Bird, and Trevor Cohn. 2016. Learning Crosslingual Word Embeddings without Bilingual Corpora. In EMNLP.
  • Steffen Eger, Armin Hoenen, and Alexander Mehler. 2016. Language classification from bilingual word embedding graphs. In COLING.
  • Manaal Faruqui and Chris Dyer. 2014. Improving Vector Space Word Representations Using Multilingual Correlation. In EACL.
  • Charlie Frogner, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya, and Tomaso A Poggio. 2015. Learning with a Wasserstein Loss. In NIPS.
  • Eric Gaussier, J.M. Renders, I. Matveeva, C. Goutte, and H. Dejean. 2004. A Geometric View on Bilingual Lexicon Extraction from Comparable Corpora. In ACL.
  • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In NIPS.
  • Stephan Gouws, Yoshua Bengio, and Greg Corrado. 2015. BilBOWA: Fast Bilingual Distributed Representations without Word Alignments. In ICML.
  • Stephan Gouws and Anders Søgaard. 2015. Simple task-specific bilingual word embeddings. In NAACL-HLT.
  • Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. 2017. Improved Training of Wasserstein GANs. arXiv:1704.00028 [cs, stat].
  • Harald Hammarström and Loretta O'Connor. 2013. Dependency-sensitive typological distance. Approaches to measuring linguistic differences.
  • Karl Moritz Hermann and Phil Blunsom. 2014. Multilingual Distributed Representations without Word Alignment. In ICLR.
  • Kurt Hornik. 1991. Approximation capabilities of multilayer feedforward networks. Neural Networks.
  • Gao Huang, Chuan Guo, Matt J Kusner, Yu Sun, Fei Sha, and Kilian Q Weinberger. 2016. Supervised Word Mover's Distance. In NIPS.
  • Patrick Juola. 1998. Cross-Entropy and Linguistic Typology. In CoNLL.
  • Tomas Kocisky, Karl Moritz Hermann, and Phil Blunsom. 2014. Learning Bilingual Word Representations by Marginalizing Alignments. In ACL.
  • Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. 2015. From Word Embeddings To Document Distances. In ICML.
  • Angeliki Lazaridou, Georgiana Dinu, and Marco Baroni. 2015. Hubness and Pollution: Delving into Cross-Space Mapping for Zero-Shot Learning. In ACL-IJCNLP.
  • Ang Lu, Weiran Wang, Mohit Bansal, Kevin Gimpel, and Karen Livescu. 2015. Deep Multilingual Correlation for Improved Word Embeddings. In NAACL-HLT.
  • Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Bilingual Word Representations with Monolingual Quality in Mind. In Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing.
  • Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. 2015. Adversarial Autoencoders. arXiv:1511.05644 [cs].
  • Thomas Mayer and Michael Cysouw. 2012. Language comparison through sparse multilingual word alignment. In Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH.
  • Luke Metz, Ben Poole, David Pfau, and Jascha Sohl-Dickstein. 2016. Unrolled Generative Adversarial Networks. arXiv:1611.02163 [cs, stat].
  • Tomas Mikolov, Quoc V Le, and Ilya Sutskever. 2013a. Exploiting Similarities among Languages for Machine Translation. arXiv:1309.4168 [cs].
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013b. Distributed Representations of Words and Phrases and their Compositionality. In NIPS.
  • Grégoire Montavon, Klaus-Robert Müller, and Marco Cuturi. 2016. Wasserstein Training of Restricted Boltzmann Machines. In NIPS.
  • Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. 2016. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization. arXiv:1606.00709 [cs, stat].
  • Franz Josef Och and Hermann Ney. 2003. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics.
  • Takamasa Oshikiri, Kazuki Fukui, and Hidetoshi Shimodaira. 2016. Cross-Lingual Word Representations via Spectral Graph Embeddings. In ACL.
  • Ben Poole, Alexander A. Alemi, Jascha Sohl-Dickstein, and Anelia Angelova. 2016. Improved generator objectives for GANs. arXiv:1612.02780 [cs, stat].
  • Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv:1511.06434 [cs].
  • Reinhard Rapp. 1999. Automatic Identification of Word Translations from Unrelated English and German Corpora. In ACL.
  • Y. Rubner, C. Tomasi, and L.J. Guibas. 1998. A Metric for Distributions with Applications to Image Databases. In ICCV.
  • Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved Techniques for Training GANs. In NIPS.
  • Peter H. Schönemann. 1966. A generalized solution of the orthogonal Procrustes problem. Psychometrika.
  • Tianze Shi, Zhiyuan Liu, Yang Liu, and Maosong Sun. 2015. Learning Cross-lingual Word Embeddings via Matrix Co-factorization. In ACL-IJCNLP.
  • Samuel Smith, David Turban, Steven Hamblin, and Nils Hammerla. 2017. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In ICLR.
  • Stephanie Strassel and Jennifer Tracey. 2016. LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages. In LREC.
  • Oscar Täckström, Dipanjan Das, Slav Petrov, Ryan McDonald, and Joakim Nivre. 2013. Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging. TACL.
  • Cédric Villani. 2009. Optimal Transport: Old and New. Springer.
  • Ivan Vulić and Anna Korhonen. 2016. On the Role of Seed Lexicons in Learning Bilingual Word Embeddings. In ACL.
  • Ivan Vulić and Marie-Francine Moens. 2013. Cross-Lingual Semantic Similarity of Words as the Similarity of Their Semantic Word Responses. In NAACL-HLT.
  • Ivan Vulić and Marie-Francine Moens. 2015. Bilingual Word Embeddings from Non-Parallel Document-Aligned Data Applied to Bilingual Lexicon Induction. In ACL-IJCNLP.
  • Michael Wick, Pallika Kanani, and Adam Pocock. 2016. Minimally-Constrained Multilingual Embeddings via Artificial Code-Switching. In AAAI.
  • Chao Xing, Dong Wang, Chao Liu, and Yiye Lin. 2015. Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation. In NAACL-HLT.
  • Hyejin Youn, Logan Sutton, Eric Smith, Cristopher Moore, Jon F. Wilkins, Ian Maddieson, William Croft, and Tanmoy Bhattacharya. 2016. On the universal structure of human lexical semantics. Proceedings of the National Academy of Sciences.
  • Meng Zhang, Yang Liu, Huanbo Luan, Yiqun Liu, and Maosong Sun. 2016a. Inducing Bilingual Lexica From Non-Parallel Data With Earth Mover's Distance Regularization. In COLING.
  • Meng Zhang, Yang Liu, Huanbo Luan, Maosong Sun, Tatsuya Izuha, and Jie Hao. 2016b. Building Earth Mover's Distance on Bilingual Word Embeddings for Machine Translation. In AAAI.
  • Meng Zhang, Haoruo Peng, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. Bilingual Lexicon Induction From Non-Parallel Data With Minimal Supervision. In AAAI.
  • Yuan Zhang, David Gaddy, Regina Barzilay, and Tommi Jaakkola. 2016c. Ten Pairs to Tag – Multilingual POS Tagging via Coarse Mapping between Embeddings. In NAACL-HLT.
  • Will Y. Zou, Richard Socher, Daniel Cer, and Christopher D. Manning. 2013. Bilingual Word Embeddings for Phrase-Based Machine Translation. In EMNLP.