Ìtàkúròso: Exploiting Cross-Lingual Transferability for Natural Language Generation of Dialogues in Low-Resource, African Languages

Tosin Adewumi, Mofetoluwa Adeyemi, Aremu Anuoluwapo, Bukola Peters, Happy Buzaaba, Oyerinde Samuel, Amina Mardiyyah Rufai, Benjamin Ajibade, Tajudeen Gwadabe, Mory Moussou Koulibaly Traore, Tunde Ajayi, Shamsuddeen Muhammad, Ahmed Baruwa, Paul Owoicho, Tolulope Ogunremi, Phylis Ngigi, Orevaoghene Ahia, Ruqayya Nasir, Foteini Liwicki, Marcus Liwicki

arXiv (2022)

Abstract

We investigate the possibility of cross-lingual transfer from a state-of-the-art (SoTA) deep monolingual model (DialoGPT) to 6 African languages and compare it with 2 baselines (BlenderBot 90M, another SoTA model, and a simple Seq2Seq model). The languages are Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda and Yorùbá. Dialogue generation is known to be a challenging task for many reasons, and it is more challenging still for African languages, which are low-resource in terms of data. We therefore translate a small portion of the English multi-domain MultiWOZ dataset into each target language. Besides intrinsic evaluation (i.e. perplexity), we conduct human evaluation of single-turn conversations using majority votes and measure inter-annotator agreement (IAA). The results support the hypothesis that deep monolingual models learn some abstractions that generalise across languages. We observe human-like conversations in 5 of the 6 languages. The hypothesis holds to different degrees in different languages, however, which is expected. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1%, of which 34.4% are unanimous. The main contributions of this paper include the representation of underrepresented African languages (through the provision of high-quality dialogue data) and a demonstration of the cross-lingual transferability hypothesis for dialogue systems. We also provide the datasets and host the model checkpoints and demos on the HuggingFace hub for public access.
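To make the human-evaluation step concrete, the following is a minimal sketch of how a majority-vote human-likeness score and its unanimous share might be computed. It is not the authors' evaluation code: the function names, the label strings, the three-annotator setup and the example ratings are all hypothetical. It also assumes the unanimous share is measured against all rated conversations, which the abstract does not state explicitly.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by a strict majority of annotators, or None if there is none."""
    counts = Counter(votes)
    label, top = counts.most_common(1)[0]
    return label if top > len(votes) / 2 else None


def human_likeness(ratings, positive="human-like"):
    """Return (majority share, unanimous share) of conversations rated `positive`.

    Both shares are fractions of all rated conversations (an assumption).
    """
    n = len(ratings)
    majority = sum(1 for votes in ratings if majority_label(votes) == positive)
    unanimous = sum(1 for votes in ratings if all(v == positive for v in votes))
    return majority / n, unanimous / n


# Hypothetical ratings: one inner list per single-turn conversation,
# one vote per annotator. These are illustrative values, not data from the paper.
ratings = [
    ["human-like", "human-like", "human-like"],
    ["human-like", "human-like", "not"],
    ["not", "not", "human-like"],
    ["human-like", "not", "human-like"],
]

score, unanimous = human_likeness(ratings)
print(f"human-likeness: {score:.1%}, unanimous: {unanimous:.1%}")
```

With an odd number of annotators and two labels a strict majority always exists; the `None` case only matters for even-sized panels or more than two labels.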
