Enhancing the Capabilities of Solr Information Retrieval System: Arabic Language
2020 3rd International Conference on Computer Applications & Information Security (ICCAIS) (2020)
Abstract
Arabic is one of the most complex languages in Natural Language Processing (NLP). Solr is an Information Retrieval System (IRS) that is widely known for its accurate results and high performance in English. However, the Arabic stemmer currently used by Solr, Light-10, has some deficiencies. In this work, we evaluated two light stemmers (Assem, Tashaphyne) and two root stemmers (Khoja, ISRI) and selected the two that performed best in our experiments, in addition to the Light-10 stemmer. The two best-performing stemmers are Assem and Khoja. We therefore used these two stemmers and Light-10 to evaluate the search retrieval accuracy of Solr in Arabic, then evaluated them again with synonyms. The evaluation is based on two metrics: Precision and Normalized Discounted Cumulative Gain (NDCG). The Assem stemmer has the highest accuracy at 86%, followed by Light-10 at 83% and Khoja at 81%. Finally, the Assem stemmer was adopted as the stemmer for the Almufed search engine, which we developed in this work on top of Solr to search more than 6000 Arabic books from the Alshamela Library.
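The two metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not state which variant of each metric was used, so the following assumes the common definitions: precision over the top k results, and NDCG with a log2 position discount and linear gain. The function names and the graded relevance list are hypothetical, and the ideal ranking is computed from the retrieved list only, which is a simplification.

```python
import math

def precision_at_k(relevances, k):
    """Fraction of the top-k results judged relevant (grade > 0)."""
    top = relevances[:k]
    return sum(1 for r in top if r > 0) / k

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: each grade is divided by log2(rank + 1)."""
    # enumerate is 0-based, so rank = i + 1 and the discount is log2(i + 2).
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the same grades sorted best-first.

    Assumption: the ideal ranking is built from the retrieved grades only,
    not from every relevant document in the collection.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded judgments for the top four results of one query.
judged = [3, 2, 0, 1]
print(precision_at_k(judged, 4))
print(ndcg_at_k(judged, 4))
```

Averaging these per-query values over a query set gives one score per stemmer configuration, which is one plausible way to compare stemmers as the abstract describes.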
Key words
Solr, Information Retrieval System, Arabic Stemmers, Arabic Morphological Analyzer, Arabic Synonyms