Resources And End-To-End Neural Network Models For Arabic Image Captioning

Obeida ElJundi, Mohamad Dhaybi, Kotaiba Mokadam, Hazem Hajj, Daniel Asmar

Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Vol 5: VISAPP (2020)

Citations: 8 | Views: 37
Abstract
Image Captioning (IC) is the process of automatically augmenting an image with semantically laden descriptive text. While English IC has made remarkable strides forward in the past decade, very little work exists on IC for other languages. One possible solution to this problem is to bootstrap off existing English IC systems for image understanding, and then translate the outcome into the required language. Unfortunately, as this paper will show, translated IC falls short due to error accumulation across the two tasks: IC and translation. In this paper, we address the problem of image captioning in Arabic. We propose an end-to-end model that directly transcribes images into Arabic text. Due to the lack of Arabic resources, we develop an annotated dataset for Arabic image captioning (AIC). We also develop a base model for AIC that relies on text translation from English image captions. The two models are evaluated on the new dataset, and the results show the superiority of our end-to-end model.
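To make the end-to-end approach concrete, the sketch below shows a generic encoder-decoder captioning model of the kind the abstract describes: an image encoder producing a feature vector that conditions a recurrent decoder over an Arabic vocabulary. The abstract does not specify the paper's actual architecture, so the ResNet-50 encoder, LSTM decoder, and all sizes here are illustrative assumptions, not the authors' implementation.

```python
# Minimal encoder-decoder image-captioning sketch in PyTorch.
# The CNN backbone, LSTM decoder, and hyperparameters are assumptions for illustration;
# they are not taken from the paper.
import torch
import torch.nn as nn
import torchvision.models as models


class CaptionEncoder(nn.Module):
    """Encodes an image into a fixed-size embedding with a pretrained CNN."""
    def __init__(self, embed_size: int):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop final FC layer
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                      # keep the CNN frozen in this sketch
            feats = self.backbone(images).flatten(1)
        return self.fc(feats)


class CaptionDecoder(nn.Module):
    """Generates a token sequence (e.g., Arabic words) from the image embedding."""
    def __init__(self, embed_size: int, hidden_size: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, image_embed: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # Prepend the image embedding as the first step of the decoder input sequence.
        tokens = self.embed(captions)
        inputs = torch.cat([image_embed.unsqueeze(1), tokens], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                    # logits over the caption vocabulary


if __name__ == "__main__":
    # Toy forward pass: 2 images, captions of length 10, vocabulary of 5000 tokens.
    encoder = CaptionEncoder(embed_size=256)
    decoder = CaptionDecoder(embed_size=256, hidden_size=512, vocab_size=5000)
    images = torch.randn(2, 3, 224, 224)
    captions = torch.randint(0, 5000, (2, 10))
    logits = decoder(encoder(images), captions)
    print(logits.shape)  # torch.Size([2, 11, 5000])
```

Such a model would be trained with a cross-entropy loss on the caption tokens; the translation-based baseline in the abstract instead keeps an English captioner fixed and machine-translates its output, which is where the compounded errors arise.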
Keywords
Deep Learning, Computer Vision, Natural Language Processing, Image Captioning, Arabic