Improved Naive Bayes with Optimal Correlation Factor for Text Classification

SN Applied Sciences (2019)

Abstract
The Naive Bayes (NB) estimator is widely used in text classification problems. However, it does not perform well with small training datasets. Most previous literature focuses on either creating and modifying features or combining NB with clustering to improve its performance. We tackle the problem directly by constructing a new estimator, called Naive Bayes with correlation factor. We introduce a correlation factor to the NB estimator that incorporates the overall correlation among the different classes. This effectively exploits the idea of bootstrapping: each data point is reused for all classes, even though it belongs to only one. Moreover, we obtain a formula for the optimal correlation factor by balancing the bias and variance of the estimator. Experimental results on real-world data show that our estimator achieves better accuracy than traditional Naive Bayes while maintaining the simplicity of NB.
Keywords
Naive Bayes, Correlation factor, Text classification, Insufficient training set