Probing Pretrained Language Models for Lexical Semantics

EMNLP 2020, pp. 7222-7240

Abstract

The success of large pretrained language models (LMs) such as BERT and RoBERTa has sparked interest in probing their representations, in order to unveil what types of knowledge they implicitly capture. While prior research focused on morphosyntactic, semantic, and world knowledge, it remains unclear to which extent LMs also derive lexical...

Introduction
  • Language models (LMs) based on deep Transformer networks (Vaswani et al, 2017), pretrained on unprecedentedly large amounts of text, offer unmatched performance in virtually every NLP task (Qiu et al, 2020).
  • While preliminary findings from Ethayarajh (2019) and Vulić et al. (2020) suggest that there is a wealth of lexical knowledge available within the parameters of BERT and other LMs, a systematic empirical study across different languages is currently lacking.
Highlights
  • Introduction and Motivation: Language models (LMs) based on deep Transformer networks (Vaswani et al, 2017), pretrained on unprecedentedly large amounts of text, offer unmatched performance in virtually every NLP task (Qiu et al, 2020).
  • Our study aims at providing answers to the following key questions: Q1) Do lexical extraction strategies generalise across different languages and tasks, or do they rather require language- and task-specific adjustments?; Q2) Is lexical information concentrated in a small number of parameters and layers, or scattered throughout the encoder?; Q3) Are “BERT-based” static word embeddings competitive with traditional word embeddings such as fastText?; Q4) Do monolingual LMs independently trained in multiple languages learn structurally similar representations for words denoting similar concepts?
  • Per-layer centered kernel alignment (CKA) similarities are provided in Figure 7 and Figure 5, and we show results of representations extracted from individual layers for selected evaluation setups and languages in Table 2
  • Despite structural similarities identified by linear CKA, the scores from Table 2 demonstrate that structurally similar layers might encode different amounts of lexical information: e.g., compare performance drops between L5 and L8 in all evaluation tasks
  • The results in Table 2 further suggest that more type-level lexical information is available in lower layers, as all peak scores in the table are achieved with representations extracted from layers L1–L5 (a short per-layer extraction sketch follows this list)
  • We found that type-level word embeddings (WEs) extracted from pretrained LMs can surpass static WEs like fastText (Bojanowski et al, 2017)
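The per-layer (L=n) and layer-averaged AVG(L≤n) configurations referenced above and in the tables reduce to simple operations over a word's stack of per-layer vectors. Below is a minimal NumPy sketch, assuming such a stack has already been extracted; the array shapes and the helper name are illustrative, not the authors' code.

```python
import numpy as np

def avg_up_to_layer(layer_vectors: np.ndarray, n: int) -> np.ndarray:
    """Average one word's representations over layers L0..Ln (inclusive).

    layer_vectors: shape (num_layers + 1, dim); row 0 is the embedding layer
    (L0) and row 12 the top Transformer layer for a BERT Base encoder.
    """
    return layer_vectors[: n + 1].mean(axis=0)

# Illustrative 13 x 768 stack of per-layer vectors for a single word.
stack = np.random.randn(13, 768)
vec_avg_l5 = avg_up_to_layer(stack, 5)  # AVG(L<=5) configuration
vec_l5 = stack[5]                       # single-layer L=5 configuration
```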
Methods
  • Pretrained LMs and Languages.
  • The authors' selection of test languages is guided by the following constraints: a) availability of comparable pretrained monolingual LMs; b) availability of evaluation data; and c) typological diversity of the sample, along the lines of recent initiatives in multilingual NLP (Gerz et al, 2018; Hu et al, 2020; Ponti et al, 2020, inter alia).
  • The authors use monolingual uncased BERT Base models for all languages, retrieved from the HuggingFace repository (Wolf et al, 2019); a minimal extraction sketch follows this list.
  • The authors experiment with multilingual BERT (Devlin et al, 2019) as the underlying LM, aiming to measure the performance difference between language-specific and massively multilingual LMs in the lexical probing tasks
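To make the extraction concrete, here is a minimal sketch of building a type-level vector for one word by averaging its subword embeddings from a chosen layer over several sentential contexts (an AOC-style configuration) with a HuggingFace model. The model name, example sentences, and helper function are illustrative assumptions, not the authors' exact pipeline.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative model choice; any of the monolingual uncased BERT Base models
# listed in Table 3 could be substituted here.
MODEL_NAME = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True).eval()

def aoc_vector(word: str, contexts: list[str], layer: int = 12) -> torch.Tensor:
    """Average the target word's subword vectors from one layer over all contexts."""
    collected = []
    for sentence in contexts:
        words = sentence.lower().split()
        enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).hidden_states[layer][0]  # (num_subwords, dim)
        word_ids = enc.word_ids()  # maps each subword position to its word index
        for w_idx, w in enumerate(words):
            if w == word.lower():
                positions = [t for t, wid in enumerate(word_ids) if wid == w_idx]
                collected.append(hidden[positions].mean(dim=0))
    # Assumes the word occurs in at least one of the provided contexts.
    return torch.stack(collected).mean(dim=0)

vector = aoc_vector("bank", ["The bank raised interest rates .",
                             "She sat on the river bank ."])
```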
Results
  • A summary of the results is shown in Figure 2 for LSIM, in Figure 3a for BLI, in Figure 3b for CLIR, in Figure 4a and Figure 4b for RELP, and in Figure 4c for WA.
  • These results offer multiple axes of comparison, and the ensuing discussion focuses on the central questions Q1-Q3 posed in §1.
  • How Important is Context? Another observation that holds across all configurations concerns the usefulness of providing contexts drawn from external corpora, and corroborates findings from prior work (Liu et al, 2019b): ISO configurations cannot match configurations that average subword embeddings from multiple contexts (AOC-10 and AOC-…)
Conclusion
  • Per-layer CKA similarities are provided in Figure 7 and Figure 5, and the authors show results of representations extracted from individual layers for selected evaluation setups and languages in Table 2 (a linear CKA sketch follows this list).
  • Much lower scores in type-level semantic tasks for higher layers empirically validate a recent hypothesis of Ethayarajh (2019) “that contextualised word representations are more context-specific in higher layers.” We note that none of the results with L=n configurations from Table 1 can match best performing AVG(L≤n) configurations with layer-wise averaging.
  • The authors found that type-level WEs extracted from pretrained LMs can surpass static WEs like fastText (Bojanowski et al, 2017)
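The linear CKA measure referenced above (Kornblith et al., 2019) compares two representation matrices whose rows are aligned (e.g., vectors for translation pairs from two monolingual encoders). A minimal sketch follows; the column-centering and the Frobenius-norm formula follow the published definition, while the function name and random inputs are illustrative.

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two representation matrices of shape (n_words, dim).

    Rows must be aligned (same words or translation pairs); columns are
    mean-centered before the comparison.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    numerator = np.linalg.norm(y.T @ x, ord="fro") ** 2
    denominator = np.linalg.norm(x.T @ x, ord="fro") * np.linalg.norm(y.T @ y, ord="fro")
    return float(numerator / denominator)

# Illustrative call: compare layer-wise representations of 1,000 aligned words.
similarity = linear_cka(np.random.randn(1000, 768), np.random.randn(1000, 768))
```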
Tables
  • Table1: Configuration components of word-level embedding extraction, resulting in 24 possible configurations
  • Table2: Task performance of word representations extracted from different Transformer layers for a selection of tasks, languages, and language pairs. Configuration: MONO.AOC-100.NOSPEC. Highest scores per row are in bold
  • Table3: URLs of the models used in our study. The first part of the table refers to the models used in the main experiments throughout the paper, while the second part refers to the models used in side experiments
  • Table4: Links to the external corpora used in the study. We randomly sample 1M sentences of maximum sequence length 512 from the corresponding corpora
  • Table5: Links to evaluation data and models
  • Table6: Results in the BLI task across different language pairs and word vector extraction configurations. MRR scores reported. For clarity of presentation, a subset of results is presented in this table, while the rest (and the averages) are presented in Table 7. AVG(L≤n) means that we average representations over all Transformer layers up to the nth layer (included), where L = 0 refers to the embedding layer, L = 1 to the bottom layer, and L = 12 to the final (top) layer. Different configurations are described in §2 and Table 1. Additional diagnostic experiments with top-to-bottom layerwise averaging configs (REVERSE) are run for a subset of languages: {EN, DE, FI }
  • Table7: Results in the bilingual lexicon induction (BLI) task across different language pairs and word vector extraction configurations: Part II. MAP scores reported. For clarity of presentation, a subset of results is presented in this table, while the rest (also used to calculate the averages) is provided in Table 6 in the previous page. AVG(L≤n) means that we average representations over all Transformer layers up to the nth layer (included), where L = 0 refers to the embedding layer, L = 1 to the bottom layer, and L = 12 to the final (top) layer. Different configurations are described in §2 and Table 1
  • Table8: Results in the CLIR task across different language pairs and word vector extraction configurations. MAP scores reported; AVG(L≤n) means that we average representations over all Transformer layers up to the nth layer (included), where L = 0 refers to the embedding layer, L = 1 to the bottom layer, and L = 12 to the final (top) layer
  • Table9: Results in the relation prediction task (RELP) across different word vector extraction configurations. Micro-averaged F1 scores reported, obtained as averages over 5 experimental runs for each configuration; standard deviation is also reported. AVG(L≤n) means that we average representations over all Transformer layers up to the nth layer (included), where L = 0 refers to the embedding layer, L = 1 to the bottom layer, and L = 12 to the final (top) layer. Different configurations are described in §2 and Table 1. RANDOM.XAVIER are 768-dim vectors for the same vocabularies, randomly initialised via Xavier initialisation (Glorot and Bengio, 2010)
Funding
  • This work is supported by the ERC Consolidator Grant LEXICAL: Lexical Acquisition Across Languages (no. 648909) awarded to Anna Korhonen
  • The work of Goran Glavaš and Robert Litschko is supported by the Baden-Württemberg Stiftung (AGREE grant of the Eliteprogramm)
Study Subjects and Analysis
word pairs: 1,888
The evaluation metric is Spearman's rank correlation between the average of human-elicited semantic similarity scores for word pairs and the cosine similarity between the respective type-level word vectors. We rely on the recent comprehensive multilingual LSIM benchmark Multi-SimLex (Vulić et al., 2020), which covers 1,888 pairs in 13 languages.
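A minimal sketch of the LSIM protocol just described: cosine similarities between type-level vectors for each word pair, correlated with the human ratings via Spearman's rank correlation. The data structures (a dict of vectors, a list of pairs) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import spearmanr

def lsim_score(pairs, gold_scores, emb):
    """Spearman correlation between human similarity ratings and cosine
    similarities of type-level word vectors.

    pairs: list of (word1, word2); gold_scores: human ratings aligned with
    pairs; emb: dict mapping a word to its vector (np.ndarray).
    """
    cosines = []
    for w1, w2 in pairs:
        v1, v2 = emb[w1], emb[w2]
        cosines.append(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    rho, _pvalue = spearmanr(cosines, gold_scores)
    return rho
```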

language pairs: 10
We adopt the standard BLI evaluation setup from Glavaš et al. (2019): 5K training word pairs are used to learn the mapping, and another 2K pairs serve as test data. We report standard Mean Reciprocal Rank (MRR) scores for 10 language pairs spanning EN, DE, RU, FI, TR.
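To make this setup concrete, below is a hedged sketch that learns an orthogonal (Procrustes-style) map from the seed translation pairs and scores MRR over the test pairs under cosine similarity. Procrustes is one common choice for supervised BLI mapping; the exact variant used by the authors, and all array names here, are assumptions.

```python
import numpy as np

def procrustes_map(x_train: np.ndarray, y_train: np.ndarray) -> np.ndarray:
    """Orthogonal map W minimising ||x_train @ W - y_train||_F, learned from
    aligned rows of source/target vectors (e.g., the 5K seed translation pairs)."""
    u, _, vt = np.linalg.svd(x_train.T @ y_train)
    return u @ vt

def mean_reciprocal_rank(x_test: np.ndarray, gold_idx: np.ndarray,
                         y_vocab: np.ndarray, w: np.ndarray) -> float:
    """MRR of the gold translation under cosine similarity in the mapped space."""
    proj = x_test @ w
    proj /= np.linalg.norm(proj, axis=1, keepdims=True)
    tgt = y_vocab / np.linalg.norm(y_vocab, axis=1, keepdims=True)
    sims = proj @ tgt.T                                   # (n_test, |target vocab|)
    gold_sims = sims[np.arange(len(gold_idx)), gold_idx]  # similarity of gold pairs
    ranks = (sims > gold_sims[:, None]).sum(axis=1) + 1   # 1 + count of better candidates
    return float(np.mean(1.0 / ranks))
```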

language pairs: 6
Task 4: Cross-Lingual Information Retrieval (CLIR). The retrieval model embeds queries and documents as IDF-weighted sums of their corresponding WEs from the CLWE space, and uses cosine similarity as the ranking function. We report Mean Average Precision (MAP) scores for 6 language pairs covering EN, DE, RU, FI.
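The CLIR ranking function described above lends itself to a short sketch: each query and document becomes an IDF-weighted sum of its word vectors in the shared cross-lingual space, documents are ranked by cosine similarity, and MAP is the mean of per-query average precision. Function names and the handling of out-of-vocabulary words are illustrative assumptions.

```python
import numpy as np

def idf_weighted_embedding(tokens, emb, idf):
    """Embed a query or document as the IDF-weighted sum of its word vectors
    in the shared cross-lingual word embedding (CLWE) space.

    tokens: list of words; emb: dict word -> vector; idf: dict word -> IDF weight.
    Words without an embedding are skipped (an assumption of this sketch).
    """
    vecs = [idf.get(t, 1.0) * emb[t] for t in tokens if t in emb]
    return np.sum(vecs, axis=0)

def average_precision(ranked_doc_ids, relevant_ids):
    """Average precision of one ranked list; MAP is its mean over all queries."""
    relevant = set(relevant_ids)
    hits, score = 0, 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            score += hits / rank
    return score / max(len(relevant), 1)
```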

References
  • Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In Proceedings of ACL, pages 789–798.
  • Mikel Artetxe, Sebastian Ruder, and Dani Yogatama. 2020. On the cross-lingual transferability of monolingual representations. In Proceedings of ACL, pages 4623–4637.
  • Yonatan Belinkov and James R. Glass. 2019. Analysis methods in neural language processing: A survey. Transactions of the Association of Computational Linguistics, 7:49–72.
  • Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Jauvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155.
  • Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the ACL, 5:135–146.
  • Ondrej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shujian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, and Marco Turchi. 2017. Findings of the 2017 Conference on Machine Translation (WMT17). In Proceedings of WMT, pages 169–214.
  • Danushka Bollegala and Cong Bao. 2018. Learning word meta-embeddings by autoencoding. In Proceedings of COLING, pages 1650–1661.
  • Inigo Casanueva, Tadas Temcinas, Daniela Gerz, Matthew Henderson, and Ivan Vulic. 2020. Efficient intent detection with dual sentence encoders. In Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI, pages 38–45.
  • Ting-Yun Chang and Yun-Nung Chen. 2019. What does this word mean? Explaining contextualized embeddings with natural language definition. In Proceedings of EMNLP-IJCNLP, pages 6064–6070.
  • Ethan A. Chi, John Hewitt, and Christopher D. Manning. 2020. Finding universal grammatical relations in multilingual BERT. In Proceedings of ACL, pages 5564–5577.
  • Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzman, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Unsupervised cross-lingual representation learning at scale. In Proceedings of ACL, pages 8440–8451.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pages 4171–4186.
  • Aleksandr Drozd, Anna Gladkova, and Satoshi Matsuoka. 2016. Word embeddings, analogies, and machine learning: Beyond king - man + woman = queen. In Proceedings of COLING, pages 3519– 3530.
  • Daniel Edmiston. 2020. A systematic analysis of morphological content in BERT models for multiple languages. CoRR, abs/2004.03032.
  • Julian Eisenschlos, Sebastian Ruder, Piotr Czapla, Marcin Kardas, Sylvain Gugger, and Jeremy Howard. 2019. MultiFiT: Efficient multi-lingual language model fine-tuning. In Proceedings of EMNLP-IJCNLP, pages 5701–5706.
  • Kawin Ethayarajh. 2019. How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. In Proceedings of EMNLP-IJCNLP, pages 55–65.
  • Christiane Fellbaum. 1998. WordNet. MIT Press.
  • Daniela Gerz, Ivan Vulić, Edoardo Maria Ponti, Roi Reichart, and Anna Korhonen. 2018. On the relation between linguistic typology and (limitations of) multilingual language modeling. In Proceedings of EMNLP, pages 316–327.
  • Goran Glavaš and Ivan Vulić. 2018. Discriminating between lexico-semantic relations with the specialization tensor model. In Proceedings of NAACL-HLT, pages 181–187.
  • Goran Glavas, Robert Litschko, Sebastian Ruder, and Ivan Vulic. 2019. How to (properly) evaluate crosslingual word embeddings: On strong baselines, comparative analyses, and some misconceptions. In Proceedings of ACL, pages 710–721.
  • Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of AISTATS, pages 249– 256.
  • Stephan Gouws, Yoshua Bengio, and Greg Corrado. 2015. BilBOWA: Fast bilingual distributed representations without word alignments. In Proceedings of ICML, pages 748–756.
  • John Hewitt and Christopher D. Manning. 2019. A structural probe for finding syntax in word representations. In Proceedings of NAACL-HLT, pages 4129–4138.
  • Felix Hill, Roi Reichart, and Anna Korhonen. 2015. SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics, 41(4):665–695.
  • Valentin Hofmann, Janet B. Pierrehumbert, and Hinrich Schutze. 2020. Generating derivational morphology with BERT. CoRR, abs/2005.00672.
  • Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, and Melvin Johnson. 2020. XTREME: A massively multilingual multitask benchmark for evaluating cross-lingual generalization. In Proceedings of ICML.
  • Ganesh Jawahar, Benoît Sagot, and Djamé Seddah. 2019. What does BERT learn about the structure of language? In Proceedings of ACL, pages 3651–3657.
  • Douwe Kiela, Changhan Wang, and Kyunghyun Cho. 2018. Dynamic meta-embeddings for improved sentence representations. In Proceedings of EMNLP, pages 1466–1477.
  • Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Proceedings of the 10th Machine Translation Summit (MT SUMMIT), pages 79–86.
  • Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey E. Hinton. 2019. Similarity of neural network representations revisited. In Proceedings of ICML, pages 3519–3529.
  • Artur Kulmizev, Vinit Ravishankar, Mostafa Abdou, and Joakim Nivre. 2020. Do neural language models show preferences for syntactic formalisms? In Proceedings of ACL, pages 4077–4091.
  • Robert Litschko, Goran Glavas, Simone Paolo Ponzetto, and Ivan Vulic. 2018. Unsupervised crosslingual information retrieval using monolingual data only. In Proceedings of SIGIR, pages 1253–1256.
  • Robert Litschko, Goran Glavas, Ivan Vulic, and Laura Dietz. 2019. Evaluating resource-lean cross-lingual embedding models in unsupervised retrieval. In Proceedings of SIGIR, pages 1109–1112.
  • Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, and Noah A. Smith. 2019a. Linguistic knowledge and transferability of contextual representations. In Proceedings of NAACL-HLT, pages 1073–1094.
  • Qianchu Liu, Diana McCarthy, Ivan Vulic, and Anna Korhonen. 2019b. Investigating cross-lingual alignment methods for contextualized embeddings with token-level evaluation. In Proceedings of CoNLL, pages 33–43.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019c. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.
  • Timothee Mickus, Denis Paperno, Mathieu Constant, and Kees van Deemter. 2020. What do you mean, BERT? Assessing BERT as a distributional semantics model. Proceedings of the Society for Computation in Linguistics, 3(34).
  • Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013a. Exploiting similarities among languages for machine translation. arXiv preprint, CoRR, abs/1309.4168.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Proceedings of NeurIPS, pages 3111– 3119.
  • Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global vectors for word representation. In Proceedings of EMNLP, pages 1532– 1543.
  • Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of NAACL-HLT, pages 2227–2237.
  • Jonas Pfeiffer, Ivan Vulic, Iryna Gurevych, and Sebastian Ruder. 2020. MAD-X: An adapter-based framework for multi-task cross-lingual transfer. In Proceedings of EMNLP.
  • Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina Williams, and Ryan Cotterell. 2020. Information-theoretic probing for linguistic structure. In Proceedings of ACL, pages 4609–4622.
  • Edoardo Maria Ponti, Goran Glavas, Olga Majewska, Qianchu Liu, Ivan Vulic, and Anna Korhonen. 2020. XCOPA: A multilingual dataset for causal commonsense reasoning. In Proceedings of EMNLP.
  • Yifan Qiao, Chenyan Xiong, Zheng-Hao Liu, and Zhiyuan Liu. 2019. Understanding the behaviors of BERT in ranking. CoRR, abs/1904.07531.
  • Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, and Xuanjing Huang. 2020. Pre-trained models for natural language processing: A survey. CoRR, abs/2003.08271.
  • Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. CoRR, abs/1910.10683.
  • Emily Reif, Ann Yuan, Martin Wattenberg, Fernanda B. Viegas, Andy Coenen, Adam Pearce, and Been Kim. 2019. Visualizing and measuring the geometry of BERT. In Proceedings of NeurIPS, pages 8594– 8603.
  • Anna Rogers, Olga Kovaleva, and Anna Rumshisky. 2020. A primer in BERTology: what we know about how BERT works. Transactions of the ACL.
  • Sebastian Ruder, Ivan Vulic, and Anders Søgaard. 2019. A survey of cross-lingual embedding models. Journal of Artificial Intelligence Research, 65:569– 631.
  • Jasdeep Singh, Bryan McCann, Richard Socher, and Caiming Xiong. 2019. BERT is not an interlingua and the bias of tokenization. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), pages 47–55.
  • Samuel L. Smith, David H.P. Turban, Steven Hamblin, and Nils Y. Hammerla. 2017. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In Proceedings of ICLR (Conference Track).
  • Anders Søgaard, Sebastian Ruder, and Ivan Vulic. 2018. On the limitations of unsupervised bilingual dictionary induction. In Proceedings of ACL, pages 778–788.
  • Ian Tenney, Dipanjan Das, and Ellie Pavlick. 2019. BERT rediscovers the classical NLP pipeline. In Proceedings of ACL, pages 4593–4601.
  • Jorg Tiedemann. 2009. News from OPUS - A collection of multilingual parallel corpora with tools and interfaces. In Proceedings of RANLP, pages 237– 248.
  • Henry Tsai, Jason Riesa, Melvin Johnson, Naveen Arivazhagan, Xin Li, and Amelia Archer. 2019. Small and practical BERT models for sequence labeling. In Proceedings EMNLP-IJCNLP, pages 3632–3636.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of NeurIPS, pages 6000– 6010.
  • Elena Voita and Ivan Titov. 2020. Informationtheoretic probing with minimum description length. In Proceedings of EMNLP.
  • Ivan Vulić, Simon Baker, Edoardo Maria Ponti, Ulla Petti, Ira Leviant, Kelly Wing, Olga Majewska, Eden Bar, Matt Malone, Thierry Poibeau, Roi Reichart, and Anna Korhonen. 2020. Multi-SimLex: A large-scale evaluation of multilingual and cross-lingual lexical semantic similarity. Computational Linguistics.
  • Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, and Ming Zhou. 2020. K-Adapter: Infusing knowledge into pre-trained models with adapters. CoRR, abs/2002.01808.
  • Yuxuan Wang, Wanxiang Che, Jiang Guo, Yijia Liu, and Ting Liu. 2019. Cross-lingual BERT transformation for zero-shot dependency parsing. In Proceedings of EMNLP-IJCNLP, pages 5721–5727.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace’s Transformers: State-of-the-art natural language processing. ArXiv, abs/1910.03771.
  • Shijie Wu, Alexis Conneau, Haoran Li, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Emerging crosslingual structure in pretrained language models. In Proceedings of ACL, pages 6022–6034.
  • Shijie Wu and Mark Dredze. 2019. Beto, bentz, becas: The surprising cross-lingual effectiveness of BERT. In Proceedings of EMNLP, pages 833–844.
  • Wenpeng Yin and Hinrich Schutze. 2016. Learning word meta-embeddings. In Proceedings of ACL, pages 1351–1360.
  • Michal Ziemski, Marcin Junczys-Dowmunt, and Bruno Pouliquen. 2016. The United Nations Parallel Corpus v1.0. In Proceedings of LREC.
  • In Table 9, we provide full relation prediction (RELP) results for EN and DE. All scores are micro-averaged F1 scores over 5 runs of the relation predictor (Glavaš and Vulić, 2018). We also report standard deviation for each configuration.
  • Finally, in Figures 8-10, we also provide heatmaps denoting bilingual layer correspondence, computed via linear CKA similarity (Kornblith et al., 2019), for several EN–Lt language pairs (see §4.1), which are not provided in the main paper.
  • Model URLs (cf. Table 3):
    https://huggingface.co/bert-base-uncased
    https://huggingface.co/bert-base-german-dbmdz-uncased
    https://huggingface.co/DeepPavlov/rubert-base-cased
    https://huggingface.co/TurkuNLP/bert-base-finnish-uncased-v1
    https://huggingface.co/bert-base-chinese
    https://huggingface.co/dbmdz/bert-base-turkish-uncased
    https://huggingface.co/bert-base-multilingual-uncased
    https://huggingface.co/dbmdz/bert-base-italian-uncased
    https://huggingface.co/dbmdz/bert-base-italian-xxl-uncased