Effects of term weighting approach with and without stop words removing on Arabic text classification
2023 9th International Conference on Optimization and Applications (ICOA) (2024)
Abstract
Text classification assigns documents to predefined categories. Before
classification, text documents must be preprocessed and represented in a form
suited to the data mining algorithms used. To this end, a number of term
weighting schemes have been proposed in the literature to improve the
performance of text classification algorithms. This study compares the effects
of the binary and term frequency (TF) feature weighting schemes on text
classification, both with and without stop word removal. To assess the effect
of these weighting schemes on classification results in terms of accuracy,
recall, precision, and F-measure, we used an Arabic data set of 322 documents
divided into six main topics (agriculture, economy, health, politics, science,
and sport), each of which contains 50 documents, with the exception of the
health category, which contains 61 documents. The results show that, with stop
word removal, the TF weighting scheme outperforms the binary scheme on all
metrics, whereas without stop word removal the binary scheme outperforms TF on
accuracy, recall, and F-measure. For precision, however, the two schemes
produce very similar results. The data also show that, for a given term
weighting scheme, stop word removal improves classification accuracy.
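As a rough illustration of the two weighting schemes compared above, the following minimal Python sketch builds binary and term frequency vectors for a document, with and without stop word removal. It is not the authors' implementation: the stop word list, the whitespace tokenizer, the sample sentence, and the function names (`tokenize`, `build_vocabulary`, `weight`) are all assumptions made for the example, and TF is taken here to be the raw term count, which may differ from the normalization used in the study.

```python
from collections import Counter

# Hypothetical, very small Arabic stop word list for illustration only;
# it is NOT the list used in the study.
STOP_WORDS = {"في", "من", "على", "إلى", "عن"}


def tokenize(text, remove_stop_words=False):
    """Split on whitespace and optionally drop stop words."""
    tokens = text.split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def build_vocabulary(token_lists):
    """Return the sorted list of distinct terms across all documents."""
    return sorted({t for tokens in token_lists for t in tokens})


def weight(tokens, vocabulary, scheme="tf"):
    """Represent one document as a vector over the vocabulary.

    scheme="binary": 1 if the term occurs in the document, else 0.
    scheme="tf":     raw number of occurrences (assumed definition of TF).
    """
    counts = Counter(tokens)
    if scheme == "binary":
        return [1 if counts[term] > 0 else 0 for term in vocabulary]
    if scheme == "tf":
        return [counts[term] for term in vocabulary]
    raise ValueError(f"unknown weighting scheme: {scheme}")


# Made-up sample sentence in which one term occurs twice.
sample = "فاز الفريق في المباراة فاز"

for remove in (False, True):
    tokens = tokenize(sample, remove_stop_words=remove)
    vocabulary = build_vocabulary([tokens])
    print("stop words removed:", remove)
    print("  binary:", weight(tokens, vocabulary, "binary"))
    print("  tf:    ", weight(tokens, vocabulary, "tf"))
```

The two schemes differ only for terms that occur more than once in a document, and removing stop words shrinks the vocabulary, which is why the four combinations can lead to different classifier inputs.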
Keywords
component, formatting, style, styling, insert