Voting-Based Multiple Classification Approach for Turkish News Texts

2019 Innovations in Intelligent Systems and Applications Conference (ASYU) (2019)

Abstract
Numerous sources on the internet now produce news on a daily basis. This growing body of content makes it difficult for users to find the information and news they are looking for, so classifying it is important for fast and efficient search and access. In this study, the Kemik dataset of Turkish news content, prepared by the Natural Language Processing Group at Yıldız Technical University, is used. A hierarchical approach based on a voting structure is adopted, built on machine learning methods. First, the Tf-Idf method is applied to word 1-3-grams and character 2-6-grams, producing a 2000-dimensional feature vector. FastText is then used to obtain 300-dimensional feature vectors, and the two feature vectors are combined into 2300-dimensional feature vectors. To determine which of these vectors gives the best classification accuracy, Support Vector Machines are applied to each, and Tf-Idf, which gives the most robust accuracy, is selected as the main feature extraction method. Next, Support Vector Machines, K-Nearest Neighbors, Random Forest, Logistic Regression, and XGBoost are used to classify the news texts. The labels predicted by all classifiers are voted on for each sample, and the label with the highest share of votes is taken as the final prediction. The study aims to make the right information quicker to reach by classifying news topics. Finally, the feature vector size is reduced using Principal Component Analysis, which improves processing speed without reducing performance. In both approaches, the performance achieved by voting is higher than the individual performance of each classifier.
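The voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `majority_vote`, the example labels, and the tie-breaking rule (prefer the label from the earliest-listed classifier among the tied ones) are all assumptions made for illustration, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` is a list with one entry per classifier; each entry is
    a list of predicted labels, one per sample. Hypothetical helper, not
    taken from the paper.
    """
    if not predictions:
        raise ValueError("at least one classifier's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same number of samples")

    final_labels = []
    # zip(*predictions) yields, for each sample, the labels from every classifier.
    for sample_labels in zip(*predictions):
        counts = Counter(sample_labels)
        top_count = max(counts.values())
        # Assumed tie-break: take the first label (in classifier order)
        # that reaches the highest vote count.
        winner = next(lab for lab in sample_labels if counts[lab] == top_count)
        final_labels.append(winner)
    return final_labels


# Made-up predictions from five classifiers for three news items.
svm = ["sports", "economy", "politics"]
knn = ["sports", "politics", "politics"]
rf = ["economy", "economy", "health"]
lr = ["sports", "economy", "health"]
xgb = ["sports", "politics", "health"]

voted = majority_vote([svm, knn, rf, lr, xgb])
print(voted)
```

With an odd number of classifiers, ties can only occur when three or more distinct labels split the vote, so the order in which classifiers are listed matters only in those cases.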
Keywords
natural language processing, text classification, support vector machines, majority voting, dimension reduction