
Data and Representation for Turkish Natural Language Inference

EMNLP 2020, pp.8253-8267, (2020)


Abstract

Large annotated datasets in NLP are overwhelmingly in English. This is an obstacle to progress in other languages. Unfortunately, obtaining new annotated resources for each task in each language would be prohibitively expensive. At the same time, commercial machine translation systems are now robust. Can we leverage these systems to trans…

Introduction
  • Many tasks in natural language processing have been transformed by the introduction of very large annotated datasets.
  • Outside of parsing and MT, these datasets tend to be in English.
  • A natural response to these gaps in the dataset coverage might be to launch new annotation efforts for multiple languages.
  • This would likely be prohibitively expensive.
  • Based on the costs of SNLI (Bowman et al., 2015) and MultiNLI (Williams et al., 2018a), the authors estimate that each large NLI dataset would cost upwards of US $50,000 if created completely from scratch
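The cost estimate above is simple arithmetic, and a back-of-the-envelope sketch makes it concrete. All figures below are illustrative assumptions, not the authors' accounting: the pair count is roughly the scale of SNLI/MultiNLI, and the per-pair cost is an assumed crowdworking rate.

```python
# Back-of-the-envelope cost model for building an NLI dataset from scratch.
# Both numbers are hypothetical assumptions chosen only to match the order of
# magnitude quoted in the text, not figures from the paper.
PAIRS = 550_000        # roughly the scale of a large NLI corpus such as SNLI
COST_PER_PAIR = 0.10   # assumed crowdworker cost (USD) to write one hypothesis

writing_cost = PAIRS * COST_PER_PAIR
print(f"estimated writing cost: ${writing_cost:,.0f}")  # ~$55,000
```

Validation annotations, quality control, and platform fees would push the real total higher, which is consistent with "upwards of US $50,000".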
Highlights
  • Many tasks in natural language processing have been transformed by the introduction of very large annotated datasets
  • We observe that every model performed better on the dev and test folds of SNLI-TR (the Turkish Stanford Natural Language Inference corpus) than on the dev folds of MultiNLI-TR, an expected outcome given the greater complexity of MultiNLI compared to SNLI
  • Though English and Turkish have very different grammars and stress-test automatic approaches, our team of experts judged the translations to be of very high quality and to preserve the original natural language inference (NLI) labels consistently
  • These results suggest that machine translation (MT) can help address the paucity of datasets for Turkish NLI
  • We used NLI-TR to analyze the effects of in-language pretraining
  • We showed that models trained on MultiNLI-TR perform well on the expert-translated test set from XNLI
Methods
  • 4.1 Case Study I

    Comparing BERT models on Turkish NLI Datasets

    The arrival of pre-trained model-sharing hubs (e.g., TensorFlow Hub, PyTorch Hub, and Hugging Face Hub) has democratized access to Transformer-based models (Vaswani et al., 2017), which are mostly in English.
  • Combined with the abundance of labeled English datasets for fine-tuning, this has increased the performance gap between English and resource-constrained languages.
  • The authors compare three BERT models trained on different corpora by fine-tuning them on NLI-TR.
  • BERT-En is the original BERT-base model released by Devlin et al. (2019), which used an English-only training corpus.
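The comparison in Case Study I boils down to scoring each fine-tuned model by accuracy on the same NLI-TR evaluation fold. A minimal sketch of that harness, with hypothetical predictions standing in for the output of the real fine-tuned models:

```python
# Toy evaluation harness in the style of Case Study I: several models are
# scored by label accuracy on one shared NLI evaluation fold. The gold labels
# and per-model predictions below are invented for illustration.
def accuracy(preds, golds):
    """Fraction of examples where the predicted NLI label matches gold."""
    assert len(preds) == len(golds)
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

gold = ["entailment", "neutral", "contradiction", "entailment"]
model_preds = {
    "BERT-En":    ["entailment", "contradiction", "contradiction", "neutral"],
    "BERT-Multi": ["entailment", "neutral", "entailment", "entailment"],
    "BERTurk":    ["entailment", "neutral", "contradiction", "entailment"],
}

scores = {name: accuracy(p, gold) for name, p in model_preds.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

In the paper, BERTurk's advantage comes from in-language pretraining; in this toy data it is simply wired in by construction to mirror the Table 4 ranking.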
Results
  • The authors train a BERT model from scratch using each approach for pretraining, and use NLI-TR for fine-tuning.
  • This leads to the striking result that morphology adds information where training data is sparse, but its importance shrinks as the dataset grows larger.
  • Figure 1 suggests that morphological parsing is beneficial where the training set is small, but its importance largely disappears for large training sets.
  • This is reflected in the final results in Table 5.
  • Machine-translated MultiNLI-TR and human-translated XNLI display similar characteristics across evaluations, which lends further credence to the claim that MT can provide a viable path to robust Turkish NLI.
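The morphology result is easier to see with a concrete word. The sketch below contrasts a linguistically motivated morpheme split of a Turkish word with a greedy longest-match subword segmentation over a toy vocabulary; the morpheme analysis is standard Turkish grammar, while the subword vocabulary is an assumption (WordPiece-style tokenizers learn theirs from corpus frequencies).

```python
# A frequency-driven subword segmentation can merge across morpheme
# boundaries, discarding grammatical information that an explicit
# morphological parse preserves.
word = "evlerimizden"  # "from our houses": ev + -ler (plural) + -imiz (our) + -den (from)

morphemes = ["ev", "ler", "imiz", "den"]  # linguistically meaningful units

def greedy_subwords(word, vocab):
    """Greedy longest-match segmentation, WordPiece-style."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # fall back to a single character
            i += 1
    return pieces

# Hypothetical learned vocabulary: "evler" is frequent enough to be one piece.
toy_vocab = {"evler", "ev", "imiz", "den", "ler", "e", "v", "l", "r", "i", "m", "z", "d", "n"}
print(greedy_subwords(word, toy_vocab))  # the plural suffix -ler is fused into "evler"
```

Here the subword tokenizer emits `["evler", "imiz", "den"]`, hiding the ev/-ler boundary that the morphological parse exposes; plausibly this extra signal matters most when training data is too sparse for the model to recover it statistically.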
Conclusion
  • The authors created and released the first large Turkish NLI dataset, NLI-TR, by machine translating SNLI and MultiNLI.
  • Though English and Turkish have very different grammars and stress-test automatic approaches, the team of experts judged the translations to be of very high quality and to preserve the original NLI labels consistently.
  • These results suggest that MT can help address the paucity of datasets for Turkish NLI.
  • The authors showed that models trained on MultiNLI-TR perform well on the expert-translated test set from XNLI
Tables
  • Table1: Sample translations from SNLI into NLI-TR. Each premise is associated with a hypothesis from each of the three NLI categories
  • Table2: Comparative statistics for the English and Turkish NLI datasets. The Turkish translations have larger vocabularies and lower token counts due to the highly agglutinating morphology of Turkish as compared to English
  • Table3: Translation quality and label consistency of the translations in SNLI-TR and MultiNLI-TR based on expert judgements. For the quality ratings (1–5), we report mean and standard deviation (in parentheses). For label consistency, we report the percentage of labels in SNLI-TR and MultiNLI-TR judged consistent with the original label, at both the annotation and sentence level
  • Table4: Accuracy results for the publicly available cased BERT models on NLI-TR. BERTurk performed the best in all three evaluations, highlighting the value of language-specific resources for NLI
  • Table5: Accuracy results for different morphology approaches on NLI-TR. To facilitate running many experiments, these results are for pretraining on just one-tenth of the Turkish corpus used by BERTurk and fine-tuning on NLI-TR for just one epoch
  • Table6: Accuracy results comparing NLI-TR with another machine translated dataset. NLI-TR performed better, but the gap is modest, suggesting that both datasets have value for Turkish NLI
  • Table7: Sample translations from SNLI and MultiNLI into NLI-TR. Each premise is associated with a hypothesis from each of the three NLI categories
  • Table8: Accuracy of the cased models in Table 4 trained on SNLI and MultiNLI. We used the same fine-tuning and evaluation procedures. BERT-En ranked first and BERT-Multi second, once more emphasizing the importance of in-language training, as in Section 4.1
  • Table9: Accuracy results of the models in Table 6 for machine-translated XNLI. The outcomes agree with the ones in Section 4.3, suggesting that machine-translated sentences can be used to evaluate Turkish NLI models. Here we note that XNLI-Dev-TR, XNLI-Test-TR, and MultiNLI-TR were translated with the same MT service, whereas MultiNLI-TRXNLI used a different one. Though this might result in a positive bias for MultiNLI-TR models, we report the accuracy of MultiNLI-TRXNLI models as well for the sake of completeness
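Table 2's vocabulary/token asymmetry follows directly from agglutination: meaning that English spreads over prepositions and pronouns is packed into Turkish suffixes, so Turkish text has fewer, longer, more varied word forms. A toy illustration (the sentence pair is an assumed example, not drawn from the dataset):

```python
# Agglutination in miniature: whitespace tokenization of an English sentence
# and a roughly equivalent Turkish one. The Turkish side folds "in", "our",
# and "they were" into suffixes, yielding far fewer tokens per sentence
# (and, across a corpus, far more distinct word forms).
english = "they were in our houses and in our gardens".split()
turkish = "evlerimizde ve bahçelerimizdeydiler".split()  # assumed translation

print("English tokens:", len(english))  # function words are separate tokens
print("Turkish tokens:", len(turkish))  # the same content sits in suffixes
```

Scaled up over a corpus, this is exactly the pattern Table 2 reports: larger Turkish vocabularies and lower Turkish token counts.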
Related work
  • Early in the development of textual entailment tasks, Mehdad et al. (2010) argued for multilingual versions of them. This led to subsequent explorations of a variety of techniques, including crowdsourcing translations (Negri and Mehdad, 2010; Negri et al., 2011), relying on parallel corpora to support reasoning across languages (Mehdad et al., 2011), and automatically translating datasets using MT systems (Mehdad et al., 2010; Real et al., 2018; Rodrigues et al., 2020). This research informed SemEval tasks in 2012 (Negri et al., 2012) and 2013 (Negri et al., 2013), followed by the ASSIN 1 (Fonseca et al., 2016) and ASSIN 2 (Real et al., 2020) shared tasks exploring the viability of multilingual NLI.

    From the perspective of present-day NLI models, these datasets are very small, but they could be used productively as challenge problems.

    More recently, Conneau et al (2018) reinvigorated work on multilingual NLI with their XNLI dataset. XNLI provides expert-translated evaluation sets from English into 14 other languages, including Turkish. Though they are valuable resources to push NLI research beyond English, test sets alone are insufficient for in-language training on target languages, which is likely to lower the performance of the resulting systems.
Funding
  • This research was supported by the AWS Cloud Credits for Research Program (formerly AWS Research Grants)
Study subjects and analysis

Sample SNLI item (a premise paired with one hypothesis from each of the three NLI categories):
  • Premise (English): Three men are sitting near an orange building with blue trim.
  • Entailment: Three males are seated near an orange building with blue trim.
  • Contradiction: Three women are standing near a yellow building with red trim.
  • Neutral: Three males are seated near an orange house with blue trim and a blue roof.
  • Premise (Turkish): Üç adam mavi süslemeli turuncu bir binanın yanında oturuyor.

Still, we would like to better understand why inconsistencies do arise. To this end, we inspected all 49 label-inconsistent pairs in our annotations. We find that low translation quality is the leading source of such errors, which further emphasizes how essential it is to work with high-quality translations.
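Table 3 reports consistency at two granularities: annotation level (each expert judgement counted individually) and sentence level (a pair counts as consistent only if its judgements agree in the majority). A small sketch of how the two levels differ, using hypothetical expert judgements:

```python
# Two aggregation levels for label-consistency judgements, as in Table 3.
# Each pair gets judgements from several experts; the per-pair votes below
# are invented for illustration (True = judged consistent with the original label).
judgements = {
    "p1": [True, True, True],
    "p2": [True, False, True],
    "p3": [False, False, True],
}

# Annotation level: pool every individual judgement.
flat = [j for votes in judgements.values() for j in votes]
annotation_level = sum(flat) / len(flat)

# Sentence level: a pair counts only if a majority of its judges agree.
majority_ok = [sum(v) > len(v) / 2 for v in judgements.values()]
sentence_level = sum(majority_ok) / len(majority_ok)

print(f"annotation-level: {annotation_level:.2f}, sentence-level: {sentence_level:.2f}")
```

The two rates can diverge: a pair with a 2-of-3 split passes at the sentence level while still lowering the annotation-level figure, which is why the paper reports both.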

References
  • Ahmet Afsın Akın and Mehmet Dundar Akın. 2007. Zemberek, an open source NLP framework for Turkic languages. Structure, 10:1–5. https://github.com/ahmetaa/zemberek-nlp.
  • Cyril Allauzen, Michael Riley, Johan Schalkwyk, Wojciech Skut, and Mehryar Mohri. 2007. OpenFst: A general and efficient weighted finite-state transducer library. In Implementation and Application of Automata, pages 11–23, Berlin, Heidelberg. Springer Berlin Heidelberg.
  • Emily Alsentzer, John Murphy, William Boag, Wei-Hung Weng, Di Jindi, Tristan Naumann, and Matthew McDermott. 2019. Publicly available clinical BERT embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, pages 72–78, Minneapolis, Minnesota, USA. Association for Computational Linguistics.
  • Richard Bellman. 1952. On the theory of dynamic programming. Proceedings of the National Academy of Sciences, 38(8):716–719.
  • Ondrej Bojar, Christian Buck, Christian Federmann, Barry Haddow, Philipp Koehn, Johannes Leveling, Christof Monz, Pavel Pecina, Matt Post, Herve Saint-Amand, Radu Soricut, Lucia Specia, and Ales Tamchyna. 2014. Findings of the 2014 workshop on statistical machine translation. In Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 12–58, Baltimore, Maryland, USA. Association for Computational Linguistics.
  • Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642, Lisbon, Portugal. Association for Computational Linguistics.
  • Osman Buyuk. 2020. Context-dependent sequence-to-sequence Turkish spelling correction. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 19(4):1–16.
  • Burcu Can. 2017. Unsupervised learning of allomorphs in Turkish. Turkish Journal of Electrical Engineering & Computer Sciences, 25(4):3253–3260.
  • Domenic V. Cicchetti. 1994. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6(4):284–290.
  • Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel Bowman, Holger Schwenk, and Veselin Stoyanov. 2018. XNLI: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2475–2485, Brussels, Belgium. Association for Computational Linguistics.
  • Rahim Dehkharghani, Yucel Saygin, Berrin Yanikoglu, and Kemal Oflazer. 2016. SentiTurkNet: a Turkish polarity lexicon for sentiment analysis. Language Resources and Evaluation, 50(3):667–685.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Erick Rocha Fonseca, Leandro Borges dos Santos, Marcelo Criscuolo, and Sandra Maria Aluısio. 2016. Visão geral da avaliação de similaridade semântica e inferência textual. Linguamática, 8(2):3–13.
  • Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The paraphrase database. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 758–764, Atlanta, Georgia. Association for Computational Linguistics.
  • Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.
  • Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2015. On using monolingual corpora in neural machine translation.
  • Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith. 2018. Annotation artifacts in natural language inference data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 107–112, New Orleans, Louisiana. Association for Computational Linguistics.
  • Kevin A. Hallgren. 2012. Computing inter-rater reliability for observational data: An overview and tutorial. Tutorials in Quantitative Methods for Psychology, 8(1):23.
  • Felix Hieber, Tobias Domhan, Michael Denkowski, and David Vilar. 2020. Sockeye 2: A toolkit for neural machine translation. In European Association for Machine Translation.
  • Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, and Matt Post. 2017. Sockeye: A toolkit for neural machine translation. CoRR, abs/1712.05690.
  • Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, and Matt Post. 2018. The Sockeye neural machine translation toolkit at AMTA 2018. In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Papers), pages 200–207, Boston, MA. Association for Machine Translation in the Americas.
  • Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viegas, Martin Wattenberg, Greg Corrado, et al. 2017. Google's multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics, 5:339–351.
  • Klaus Krippendorff. 1970. Estimating the reliability, systematic error and random error of interval data. Educational and Psychological Measurement, 30(1):61–70.
  • Taku Kudo. 2018. Subword regularization: Improving neural network translation models with multiple subword candidates. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 66–75, Melbourne, Australia. Association for Computational Linguistics.
  • Taku Kudo and John Richardson. 2018. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. arXiv preprint arXiv:1808.06226.
  • Birol Kuyumcu, Cuneyt Aksakallı, and Selman Delil. 2019. An automated new approach in fast text classification (FastText): A case study for Turkish text classification without pre-processing. In Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval, NLPIR 2019, pages 1–4, New York, NY, USA. Association for Computing Machinery.
  • Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2020. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240.
  • Kenneth O. McGraw and Seok P. Wong. 1996. Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1):30.
  • Yashar Mehdad, Matteo Negri, and Marcello Federico. 2010. Towards cross-lingual textual entailment. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 321–324, Los Angeles, California. Association for Computational Linguistics.
  • Yashar Mehdad, Matteo Negri, and Marcello Federico. 2011. Using bilingual parallel corpora for cross-lingual textual entailment. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1336–1345, Portland, Oregon, USA. Association for Computational Linguistics.
  • Rob Munro. 2012. Processing Short Message Communications in Low-Resource Languages. Ph.D. thesis, Stanford University, Stanford, CA.
  • Matteo Negri, Luisa Bentivogli, Yashar Mehdad, Danilo Giampiccolo, and Alessandro Marchetti. 2011. Divide and conquer: Crowdsourcing the creation of cross-lingual textual entailment corpora. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 670–679, Edinburgh, Scotland, UK. Association for Computational Linguistics.
  • Matteo Negri, Alessandro Marchetti, Yashar Mehdad, Luisa Bentivogli, and Danilo Giampiccolo. 2012. SemEval-2012 task 8: Cross-lingual textual entailment for content synchronization. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 399–407, Montreal, Canada. Association for Computational Linguistics.
  • Matteo Negri, Alessandro Marchetti, Yashar Mehdad, Luisa Bentivogli, and Danilo Giampiccolo. 2013. SemEval-2013 task 8: Cross-lingual textual entailment for content synchronization. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pages 25–33, Atlanta, Georgia, USA. Association for Computational Linguistics.
  • Matteo Negri and Yashar Mehdad. 2010. Creating a bilingual entailment corpus through translations with Mechanical Turk: $100 for a 10-day rush. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pages 212–216, Los Angeles. Association for Computational Linguistics.
  • Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajic, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A multilingual treebank collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 1659–1666, Portoroz, Slovenia. European Language Resources Association (ELRA).
  • Kemal Oflazer. 1994. Two-level description of Turkish morphology. Literary and Linguistic Computing, 9(2):137–148.
  • Zeynep Özer, İlyas Özer, and Oğuz Fındık. 2018. Diacritic restoration of Turkish tweets with word2vec. Engineering Science and Technology, an International Journal, 21(6):1120–1127.
  • Adnan Ozturel, Tolga Kayadelen, and Isın Demirsahin. 2019. A syntactically expressive morphological analyzer for Turkish. In Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing, pages 65–75.
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas. Association for Computational Linguistics.
  • Livy Real, Erick Fonseca, and Hugo Goncalo Oliveira. 2020. The ASSIN 2 shared task: a quick overview. In International Conference on Computational Processing of the Portuguese Language, pages 406–412. Springer.
  • Livy Real, Ana Rodrigues, Andressa Vieira e Silva, Beatriz Albiero, Bruna Thalenberg, Bruno Guide, Cindy Silva, Guilherme de Oliveira Lima, Igor C. S. Camara, Milos Stanojevic, et al. 2018. SICK-BR: a Portuguese corpus for inference. In International Conference on Computational Processing of the Portuguese Language, pages 303–312. Springer.
  • Ruan Chaves Rodrigues, Jessica Rodrigues da Silva, Pedro Vitor Quinta de Castro, Nadia Felix Felipe da Silva, and Anderson da Silva Soares. 2020. Multilingual transformer ensembles for Portuguese natural language tasks. In Proceedings of the ASSIN 2 Shared Task: Evaluating Semantic Textual Similarity and Textual Entailment in Portuguese, CEUR Workshop Proceedings, pages 27–38. CEUR-WS.org.
  • Hasim Sak, Tunga Gungor, and Murat Saraclar. 2009. A stochastic finite-state morphological parser for Turkish. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 273–276, Suntec, Singapore. Association for Computational Linguistics.
  • Hasim Sak, Tunga Gungor, and Murat Saraclar. 2011. Resources for Turkish morphological processing. Language Resources and Evaluation, 45(2):249–261.
  • Stefan Schweter. 2020. BERTurk - BERT models for Turkish.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.
  • Laura Tomasello. 2019. Neural Machine Translation and Artificial Intelligence: What Is Left for the Human Translator? Ph.D. thesis, University of Padua.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.
  • Adina Williams, Nikita Nangia, and Samuel Bowman. 2018a. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122, New Orleans, Louisiana. Association for Computational Linguistics.
  • Adina Williams, Nikita Nangia, and Samuel Bowman. 2018b. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122. Association for Computational Linguistics.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. ArXiv, abs/1910.03771.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
  • Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. 2014. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2:67–78.
Authors
Emrah Budur
Rıza Özçelik
Tunga Gungor