
Automatic Linking of Short Arabic Texts to Wikipedia Articles

Fatoom Fayad, Iyad AlAgha

Journal of Software (2016)

Abstract
Given the enormous amount of unstructured text available on the Web, there is a growing need to make such text easier to discover and access. One proposed solution is to annotate text with information extracted from background knowledge. Wikipedia, the free encyclopedia, has recently been exploited as background knowledge for annotating text with complementary information. Given any piece of text, the main challenge is to determine the most relevant information in Wikipedia with the least effort and time. While Wikipedia-based annotation has mainly targeted the English and Latin versions of Wikipedia, little effort has been devoted to annotating Arabic text using the Arabic version of Wikipedia. In addition, annotating short text poses further challenges, because the statistical and machine learning techniques commonly used with long text cannot be applied to it. This work proposes an approach for automatically linking short Arabic texts to Wikipedia articles. It reports on several challenges associated with the design and implementation of the linking approach, including processing Wikipedia's enormous content, mapping texts to Wikipedia articles, disambiguating articles, and achieving time efficiency. The proposed approach was tested on a dataset of 100 short texts gathered from online Arabic articles. The annotations generated by the approach were compared with those produced by two human subjects. The approach achieved 71.79% accuracy, 74.70% average precision, and 82.63% average recall. A thorough analysis and discussion of the evaluation results is also presented, identifying the approach's limitations and strengths and offering recommendations for future improvement.
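The abstract reports average precision and average recall against two human annotators but does not define how these were computed. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: each text's annotations are treated as a set of Wikipedia article titles, precision and recall are computed per text and then averaged over all texts (a macro average, not the ranking metric also called "average precision"), and the gold standard is assumed, hypothetically, to be the union of the two annotators' sets. Accuracy is omitted because the abstract gives no definition for it. The function names and the toy article titles are illustrative only.

```python
from typing import Dict, Set, Tuple

# Hypothetical structure: text id -> set of linked Wikipedia article titles.
Annotations = Dict[str, Set[str]]


def merge_annotators(*annotators: Annotations) -> Annotations:
    """Assumption: gold links for a text = union of all annotators' links."""
    merged: Annotations = {}
    for ann in annotators:
        for text_id, articles in ann.items():
            merged.setdefault(text_id, set()).update(articles)
    return merged


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall for a single text.

    Convention (assumed): an empty predicted or gold set scores 0.0.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def average_precision_recall(predictions: Annotations, gold: Annotations) -> Tuple[float, float]:
    """Mean of per-text precision and recall over every text in the gold set."""
    precisions, recalls = [], []
    for text_id, gold_articles in gold.items():
        p, r = precision_recall(predictions.get(text_id, set()), gold_articles)
        precisions.append(p)
        recalls.append(r)
    n = len(gold)
    return sum(precisions) / n, sum(recalls) / n


# Toy data (invented for illustration, not from the paper's dataset).
predictions = {"t1": {"Jerusalem", "Gaza"}, "t2": {"Cairo"}}
annotator_a = {"t1": {"Jerusalem"}, "t2": {"Cairo", "Nile"}}
annotator_b = {"t1": {"Gaza Strip"}, "t2": {"Cairo"}}

gold = merge_annotators(annotator_a, annotator_b)
avg_p, avg_r = average_precision_recall(predictions, gold)
print(f"average precision = {avg_p:.2%}, average recall = {avg_r:.2%}")
```

Macro-averaging gives each short text equal weight regardless of how many links it contains; a micro average (pooling all links before dividing) would instead weight texts with many links more heavily, and the two can differ noticeably on short texts.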
Keywords
Semantic Similarity, Part-of-Speech Tagging, Machine Translation, Text Classification, Wikipedia