Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages

EMNLP 2020, pp. 2580–2591


Abstract

Multilingual transformer models like mBERT and XLM-RoBERTa have obtained great improvements for many NLP tasks on a variety of languages. However, recent works also showed that results from high-resource languages could not be easily transferred to realistic, low-resource scenarios. In this work, we study trends in performance for different …

Introduction
  • Deep learning techniques, including contextualized word embeddings based on transformers and pretrained on language modelling, have resulted in considerable improvements for many NLP tasks
  • They often require large amounts of labeled training data, and there is growing evidence that transferring approaches from high to low-resource settings is not straightforward.
  • Kann et al. (2020) recently inspected POS classifiers trained on weak supervision
  • They found that, in contrast to simulated low-resource scenarios derived from high-resource languages, this remains a difficult problem in truly low-resource settings.
  • These findings highlight the importance of aiming for realistic experiments when studying low-resource scenarios
Highlights
  • Deep learning techniques, including contextualized word embeddings based on transformers and pretrained on language modelling, have resulted in considerable improvements for many NLP tasks
  • We analyse multilingual transformer models, namely mBERT (Devlin et al., 2019; Devlin, 2019) and XLM-RoBERTa (Conneau et al., 2019). We evaluate both sequence and token classification tasks in the form of news title topic classification and named entity recognition (NER)
  • We collected three new datasets that are made publicly available alongside the code and additional material. We show both challenges and opportunities when working with multilingual transformer models, evaluating trends for different levels of resource scarcity
  • We evaluate on three African languages, namely Hausa, isiXhosa and Yoruba
  • We obtain F1-scores of 49% and 55% on the Hausa and Yoruba test sets, respectively, when applying the distant supervision directly to the topic classification test sets
  • We evaluated transfer learning and distant supervision on multilingual transformer models, studying realistic low-resource settings for African languages
Results
  • The authors obtain F1-scores of 49% and 55% on the Hausa and Yoruba test sets, respectively, when applying the distant supervision directly to the topic classification test sets.
Conclusion
  • The authors evaluated transfer learning and distant supervision on multilingual transformer models, studying realistic low-resource settings for African languages.
  • The authors hope that the new datasets and the reflections on assumptions in low-resource settings help to foster future research in this area
Tables
  • Table 1: Datasets Summary. *Created for this work
Funding
  • Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 232722074 – SFB 1102, the EU-funded Horizon 2020 projects ROXANNE under grant number 833635 and COMPRISE under grant agreement No 3081705
References
  • Idris Abdulmumin and Bashir Shehu Galadanci. 2019. hauWE: Hausa words embedding for natural language processing. In 2019 2nd International Conference of the IEEE Nigeria Computer Chapter (NigeriaComputConf).
  • David Ifeoluwa Adelani, Michael A. Hedderich, Dawei Zhu, Esther van den Berg, and Dietrich Klakow. 2020. Distant supervision and noisy label learning for low resource named entity recognition: A study on hausa and yoruba.
  • Jesujoba Alabi, Kwabena Amponsah-Kaakyire, David Adelani, and Cristina España-Bonet. 2020. Massive vs. curated word embeddings for low-resourced languages: The case of Yorùbá and Twi. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 2747–2755, Marseille, France. European Language Resources Association.
  • Junfan Chen, Richong Zhang, Yongyi Mao, Hongyu Guo, and Jie Xu. 2019. Uncover the ground-truth relations in distant supervision: A neural expectation-maximization framework. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 326–336, Hong Kong, China. Association for Computational Linguistics.
  • Artem Chernodub, Oleksiy Oliynyk, Philipp Heidenreich, Alexander Bondarenko, Matthias Hagen, Chris Biemann, and Alexander Panchenko. 2019. TARGER: Neural argument mining at your fingertips. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy.
  • Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pages 1724–1734. ACL.
  • Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzman, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Unsupervised cross-lingual representation learning at scale.
  • Jacob Devlin. 2019. mBERT README file.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • David M. Eberhard, Gary F. Simons, and Charles D. Fennig (eds.). 2019. Ethnologue: Languages of the world. twenty-second edition.
  • Roald Eiselen. 2016. Government domain named entity recognition for South African languages. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3344–3348, Portoroz, Slovenia. European Language Resources Association (ELRA).
  • Roald Eiselen and Martin J. Puttkammer. 2014. Developing text resources for ten South African languages. In Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC 2014, Reykjavik, Iceland, May 26-31, 2014, pages 3698–3703. European Language Resources Association (ELRA).
  • Meng Fang and Trevor Cohn. 2016. Learning when to trust distant supervision: An application to low-resource POS tagging using cross-lingual projection. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pages 178–186, Berlin, Germany. Association for Computational Linguistics.
  • Michael A. Hedderich and Dietrich Klakow. 2018. Training a neural network in a low-resource setting on automatically annotated noisy data. In Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP, DeepLo@ACL 2018, Melbourne, Australia, July 19, 2018, pages 12–18. Association for Computational Linguistics.
  • Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, and Melvin Johnson. 2020. Xtreme: A massively multilingual multi-task benchmark for evaluating cross-lingual generalization.
  • Katharina Kann, Kyunghyun Cho, and Samuel R. Bowman. 2019. Towards realistic practices in low-resource natural language processing: The development set. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3342–3349, Hong Kong, China. Association for Computational Linguistics.
  • Katharina Kann, Ophélie Lacroix, and Anders Søgaard. 2020. Weakly supervised POS taggers perform poorly on truly low-resource languages.
  • Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Recurrent convolutional neural networks for text classification. In Twenty-ninth AAAI conference on artificial intelligence.
  • Lukas Lange, Michael A. Hedderich, and Dietrich Klakow. 2019. Feature-dependent confusion matrices for low-resource NER labeling with noisy labels. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3554–3559, Hong Kong, China. Association for Computational Linguistics.
  • Anne Lauscher, Vinit Ravishankar, Ivan Vulic, and Goran Glavas. 2020. From zero to hero: On the limitations of zero-shot cross-lingual transfer with multilingual transformers. ArXiv, abs/2005.00633.
  • Melinda Loubser and Martin J. Puttkammer. 2020a. Viability of neural networks for core technologies for resource-scarce languages. Information, 11:41.
  • Melinda Loubser and Martin J. Puttkammer. 2020b. Viability of neural networks for core technologies for resource-scarce languages. Information, 11:41.
  • Bingfeng Luo, Yansong Feng, Zheng Wang, Zhanxing Zhu, Songfang Huang, Rui Yan, and Dongyan Zhao. 2017. Learning with noise: Enhance distantly supervised relation extraction with dynamic transition matrix. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 430–439, Vancouver, Canada. Association for Computational Linguistics.
  • Xianbin Lv, Dongxian Wu, and Shu-Tao Xia. 2020. Matrix smoothing: A regularization for DNN with transition matrix under noisy labels. CoRR, abs/2003.11904.
  • Xuezhe Ma and Eduard Hovy. 2016. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1064–1074, Berlin, Germany. Association for Computational Linguistics.
  • Stephen Mayhew, Snigdha Chaturvedi, Chen-Tse Tsai, and Dan Roth. 2019. Named entity recognition with partially annotated training data. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 645–655, Hong Kong, China. Association for Computational Linguistics.
  • Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajic, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, and Daniel Zeman. 2020. Universal dependencies v2: An evergrowing multilingual treebank collection.
  • Xiaoman Pan, Boliang Zhang, Jonathan May, Joel Nothman, Kevin Knight, and Heng Ji. 2017. Cross-lingual name tagging and linking for 282 languages. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1946–1958, Vancouver, Canada. Association for Computational Linguistics.
  • Debjit Paul, Mittul Singh, Michael A. Hedderich, and Dietrich Klakow. 2019. Handling noisy labels for robustly learning from self-training data for low-resource sequence labeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 29–34, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. How multilingual is multilingual BERT? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4996– 5001, Florence, Italy. Association for Computational Linguistics.
  • Alexander Ratner, Stephen H. Bach, Henry R. Ehrenberg, Jason A. Fries, Sen Wu, and Christopher Re. 2020. Snorkel: rapid training data creation with weak supervision. VLDB J., 29(2):709–730.
  • Shruti Rijhwani, Shuyan Zhou, Graham Neubig, and Jaime Carbonell. 2020. Soft gazetteers for low-resource named entity recognition.
  • Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv, abs/1910.01108.
  • Stephanie Strassel and Jennifer Tracey. 2016. LORELEI language packs: Data, tools, and resources for technology development in low resource languages. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), pages 3273–3280, Portoroz, Slovenia. European Language Resources Association (ELRA).
  • Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pages 142–147.
  • Jennifer Tracey, Stephanie Strassel, Ann Bies, Zhiyi Song, Michael Arrigo, Kira Griffitt, Dana Delgado, Dave Graff, Seth Kulick, Justin Mott, and Neil Kuster. 2019. Corpus building for low resource languages in the DARPA LORELEI program. In Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages, pages 48–55, Dublin, Ireland. European Association for Machine Translation.
  • Hao Wang, Bing Liu, Chaozhuo Li, Yan Yang, and Tianrui Li. 2019. Learning with noisy labels for sentence-level sentiment classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6286–6292, Hong Kong, China. Association for Computational Linguistics.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, and Jamie Brew. 2019. Huggingface’s transformers: State-of-the-art natural language processing.
  • Shijie Wu and Mark Dredze. 2019. Beto, bentz, becas: The surprising cross-lingual effectiveness of BERT. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 833–844, Hong Kong, China. Association for Computational Linguistics.
  • Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 649–657. Curran Associates, Inc.
  • In this work, we consider three languages: Hausa, isiXhosa and Yoruba. These languages are from two language families, Niger-Congo and Afro-Asiatic, according to Ethnologue (Eberhard et al., 2019), where the Niger-Congo family accounts for over 20% of the world's languages.
  • The Hausa language is native to the northern part of Nigeria and the southern part of the Republic of Niger with more than 45 million native speakers (Eberhard et al., 2019). It is the second most spoken language in Africa after Swahili. Hausa is a tonal language, but this is not marked in written text. The language is written in a modified Latin alphabet.
  • Yoruba, on the other hand, is native to southwestern Nigeria and the Republic of Benin. It has over 35 million native speakers (Eberhard et al., 2019) and is the third most spoken language in Africa. Yoruba is a tonal language with three tones: low, middle and high. These tones are represented by the grave (“`”), optional macron (“¯”) and acute (“´”) accents, respectively. The tones are represented in written texts along with a modified Latin alphabet.
  • Lastly, we consider isiXhosa, a Bantu language that is native to South Africa and also recognized as one of the official languages in South Africa and Zimbabwe. It is spoken by over 8 million native speakers (Eberhard et al., 2019). isiXhosa is a tonal language, but the tones are not marked in written text. The text is written with the Latin alphabet.
  • Kann et al. (2020) used the availability of data in the Universal Dependencies project (Nivre et al., 2020) as an indicator for a low-resource language. The languages we study fit this indicator, with less than 10k (Yoruba) or no data (Hausa, isiXhosa) available at the time of writing.
  • The WikiAnn corpus (Pan et al., 2017) provides NER datasets for 282 languages available on Wikipedia. These are, however, only silver-standard annotations, and less than 4k and 1k tokens are provided for Hausa and isiXhosa, respectively. The LORELEI project announced the release of NER datasets for several African languages via LDC (Strassel and Tracey, 2016; Tracey et al., 2019) but had not yet done so for Hausa and Yoruba at the time of writing.
  • Eiselen and Puttkammer (2014) and Eiselen (2016) created NLP datasets for South African languages. We use the latter's NER dataset for isiXhosa. For the Yoruba NER dataset (Alabi et al., 2020), we use the authors' split into training, dev and test set of the cased version of their data. For the isiXhosa dataset, we use an 80%/10%/10% split following the instructions in Loubser and Puttkammer (2020b). The split is based on token count, splitting only after the end of a sentence (information obtained through personal communication with the authors). For the fine-tuning of the zero- and few-shot models, the standard CoNLL03 NER (Tjong Kim Sang and De Meulder, 2003) and AG News (Zhang et al., 2015) datasets are used with their existing splits.
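The token-count-based split described above can be reproduced along the following lines. This is a minimal illustrative sketch, assuming the corpus is already available as a list of tokenized sentences; the 80%/10%/10% ratios and the rule of cutting only at sentence boundaries follow the description above, while the function and variable names are our own.

```python
from typing import List, Tuple

def split_by_token_count(
    sentences: List[List[str]],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
) -> Tuple[List[List[str]], List[List[str]], List[List[str]]]:
    """Split a corpus into train/dev/test portions whose sizes are measured
    in tokens, cutting only at sentence boundaries (as described above)."""
    total_tokens = sum(len(s) for s in sentences)
    train_limit = ratios[0] * total_tokens
    dev_limit = (ratios[0] + ratios[1]) * total_tokens

    train, dev, test = [], [], []
    seen = 0
    for sentence in sentences:
        if seen < train_limit:
            train.append(sentence)
        elif seen < dev_limit:
            dev.append(sentence)
        else:
            test.append(sentence)
        seen += len(sentence)
    return train, dev, test
```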
  • For the Hausa NER annotation, we collected 250 articles from VOA Hausa (https://www.voahausa.com), 50 articles each from the five pre-defined categories of the news website. The categories are Najeriya (Nigeria), Afirka (Africa), Amurka (USA), Sauran Duniya (the rest of the world) and Kiwon Lafiya (Health). We removed articles with fewer than 50 tokens, which results in 188 news articles (over 37K tokens). We asked two volunteers who are native Hausa speakers to annotate the corpus separately. Each volunteer was supervised by someone with experience in NER annotation. Following the named entity annotation for Yoruba by Alabi et al. (2020), we annotated PER, ORG, LOC and DATE (dates and times) for Hausa. The annotation was based on the MUC-6 Named Entity Task Definition guide (https://cs.nyu.edu/faculty/grishman/NEtask20.book_1.html). Comparing the annotations of the volunteers, we observe a conflict for 1302 tokens (out of 4838 tokens), excluding the non-entity words (i.e. words with 'O' labels). One of the annotators was better at annotating DATE, while the other was better at annotating ORG, especially for multi-word entity expressions. We resolved all the conflicts after discussion with one of the volunteers. The split of the annotated Yoruba and Hausa NER data is 70%/10%/20% for training, validation and test sentences.
  • For the RNN models, we make use of word features obtained from Word2Vec embeddings for the Hausa language and FastText embeddings for the Yoruba and isiXhosa languages. We utilize the better-quality embeddings recently released by Abdulmumin and Galadanci (2019) and Alabi et al. (2020) for Hausa and Yoruba, respectively, instead of the pre-trained embeddings by Facebook that were trained on a smaller and lower-quality dataset from Wikipedia. For isiXhosa, we did not find any existing word embeddings; therefore, we trained FastText embeddings on data collected from the I'solezwe news website (https://www.isolezwelesixhosa.co.za/) and the OPUS parallel translation website (http://opus.nlpl.eu/). The corpus size for isiXhosa is 1.4M sentences (around 15M tokens). We trained the FastText embeddings for isiXhosa using Gensim (https://radimrehurek.com/gensim/) with the following hyper-parameters: embedding size of 300, context window size of 5, minimum word count of 3, 10 negative samples and 10 iterations.
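Embedding training of this kind takes only a few lines of Gensim. A minimal sketch, assuming a Gensim 4.x installation and a plain-text corpus with one tokenized sentence per line; the file names are placeholders and the hyper-parameters are the ones listed above.

```python
from gensim.models import FastText
from gensim.models.word2vec import LineSentence

# One sentence per line, tokens separated by whitespace (placeholder path).
corpus = LineSentence("isixhosa_sentences.txt")

# Hyper-parameters as reported above: 300-d vectors, window 5,
# minimum word count 3, 10 negative samples, 10 iterations.
model = FastText(
    sentences=corpus,
    vector_size=300,   # called "size" in Gensim 3.x
    window=5,
    min_count=3,
    negative=10,
    epochs=10,         # called "iter" in Gensim 3.x
)
model.wv.save("isixhosa_fasttext.kv")
```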
  • Rules allow us to apply the knowledge of domain experts without the manual effort of labeling each instance. We asked native speakers with knowledge of NLP to write DATE rules for Hausa and Yoruba. In both languages, date expressions are preceded by date keywords, like “ranar” / “ọjọ́” (day), “watan” / “oṣu” (month), and “shekarar” / “ọdún” (year) in Hausa / Yoruba. For example, “18th of December, 2019” translates to “ranar 18 ga watan Disamba, shekarar 2019” in Hausa and “ọjọ́ 18 oṣu Ọpẹ̀, ọdún 2019” in Yoruba. The annotation rules are based on three criteria: (1) a token is a date keyword or follows a date keyword in a sequence, (2) a token is a digit, and (3) other heuristics to capture relative dates and date periods connected by conjunctions, e.g. “between July 2019 and March 2020”. Applying these rules results in a precision of 49.30%/51.35%, a recall of 60.61%/79.17% and an F1-score of 54.42%/62.30% on the Hausa/Yoruba test sets, respectively.
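A minimal sketch of how such keyword rules can be turned into distant-supervision labels. The Hausa keywords follow the description above and the BIO2 label scheme matches the one used in the experiments, but the function itself is an illustrative simplification (it omits the heuristics of criterion (3)), not the authors' implementation.

```python
# Hausa date keywords from the description above; a Yoruba version would
# use the corresponding day/month/year keywords.
DATE_KEYWORDS = {"ranar", "watan", "shekarar"}

def label_dates(tokens):
    """Assign BIO2 DATE labels with simple rules: a token is part of a date
    if it is a date keyword, directly follows one, or is a digit inside a span."""
    labels = ["O"] * len(tokens)
    inside = False
    for i, tok in enumerate(tokens):
        low = tok.lower()
        is_date = (
            low in DATE_KEYWORDS
            or (i > 0 and tokens[i - 1].lower() in DATE_KEYWORDS)
            or (inside and low.isdigit())
        )
        if is_date:
            labels[i] = "I-DATE" if inside else "B-DATE"
            inside = True
        else:
            inside = False
    return labels

# ['B-DATE', 'I-DATE', 'O', 'B-DATE', 'I-DATE', 'O', 'B-DATE', 'I-DATE']
print(label_dates("ranar 18 ga watan Disamba , shekarar 2019".split()))
```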
  • All experiments were repeated ten times with varying random seeds but with the same data (subsets). We report the mean F1 test score and the standard error (σ/√10). For NER, the score was computed following the standard CoNLL approach (Tjong Kim Sang and De Meulder, 2003) using the seqeval implementation. Labels are in the BIO2 scheme.
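The entity-level scoring can be reproduced with the seqeval package mentioned above; a minimal sketch with toy BIO2-labelled sequences (the label sequences below are made-up examples, not data from the paper).

```python
from seqeval.metrics import classification_report, f1_score

# Toy gold and predicted label sequences in the BIO2 scheme.
y_true = [["B-PER", "I-PER", "O", "B-LOC"], ["B-DATE", "I-DATE", "O"]]
y_pred = [["B-PER", "I-PER", "O", "O"],     ["B-DATE", "I-DATE", "O"]]

print(f1_score(y_true, y_pred))            # entity-level F1, CoNLL-style
print(classification_report(y_true, y_pred))
```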
  • As multilingual transformer models, mBERT and XLM-RoBERTa are used, both in the implementation by Wolf et al. (2019). The specific model IDs are bert-base-multilingual-cased and xlm-roberta-base. For the DistilBERT experiment it is distilbert-base-multilingual-cased. As is standard, the last layer (language model head) is replaced with a classification layer (either for sequence or token classification). Models were trained with the Adam optimizer and a learning rate of 5e-5. Gradient clipping with a value of 1 is applied. The batch size is 32 for NER and 128 for topic classification. For distant supervision and XLM-RoBERTa on the Hausa topic classification data with 100 or more labeled sentences, we observed convergence issues where the trained model would just predict the majority classes. We therefore excluded, for this task, runs where the class-specific F1 score on the development set was 0.0 for two or more classes. These experiments were then repeated with a different seed.
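This setup corresponds to standard classification fine-tuning with the Hugging Face Transformers library (Wolf et al., 2019). The following is a minimal sketch for the sequence (topic) classification case with the optimizer settings reported above; the example text, label, number of labels and number of steps are placeholders, and the loop is reduced to a single toy batch (for NER, AutoModelForTokenClassification would be used instead).

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "bert-base-multilingual-cased"  # or "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The language-model head is replaced by a freshly initialized classification layer.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=5)

optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)  # Adam with lr 5e-5 as above

texts = ["Example news title"]  # placeholder; real batches hold 128 titles
labels = torch.tensor([0])      # placeholder topic id

model.train()
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loss = model(**batch, labels=labels).loss
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clipping value 1
optimizer.step()
optimizer.zero_grad()
```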
  • For the GRU and LSTM-CNN-CRF models, we use the implementation by Chernodub et al. (2019) with modifications to support FastText embeddings and the seqeval evaluation library. Both model architectures are bidirectional. Dropout of 0.5 is applied. The batch size is 10, and SGD with a learning rate of 0.01, a decay of 0.05 and a momentum of 0.9 is used. Gradients are clipped at a value of 5. The RNN dimension is 300. For the CNN, the character embedding dimension is 25, with 30 filters and a window size of 3.
  • For the topic classification task, we experiment with the RCNN model proposed by Lai et al. (2015). The hidden size in the Bi-LSTM is 100 for each direction. The linear layer after the Bi-LSTM reduces the dimension to 64. The model is trained for 50 epochs.
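A minimal PyTorch sketch of an RCNN-style classifier with the dimensions given above (Bi-LSTM with hidden size 100 per direction, linear layer down to 64, max-pooling over time). It is a simplified illustration in the spirit of Lai et al. (2015), not the authors' implementation; the embedding dimension, vocabulary size and number of classes are assumptions.

```python
import torch
import torch.nn as nn

class RCNNClassifier(nn.Module):
    """Simplified RCNN: Bi-LSTM context concatenated with the word embedding,
    a small linear layer, max-pooling over the sequence, then a class layer."""

    def __init__(self, vocab_size: int, num_classes: int,
                 emb_dim: int = 300, hidden: int = 100, latent: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.latent = nn.Linear(2 * hidden + emb_dim, latent)  # reduce to 64 dims
        self.out = nn.Linear(latent, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        e = self.emb(token_ids)                    # (batch, seq, 300)
        ctx, _ = self.bilstm(e)                    # (batch, seq, 200)
        h = torch.tanh(self.latent(torch.cat([ctx, e], dim=-1)))
        pooled, _ = h.max(dim=1)                   # max-pooling over the sequence
        return self.out(pooled)                    # class logits

model = RCNNClassifier(vocab_size=20000, num_classes=5)
logits = model(torch.randint(0, 20000, (8, 40)))   # toy batch: 8 titles, 40 tokens
```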
Authors
Michael A. Hedderich
David Adelani
Jesujoba Alabi
Udia Markus