Baselines and Bigrams: Simple, Good Sentiment and Topic Classification

ACL, 2012.

Abstract:

Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, features used, and task/dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these results, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.

Introduction
  • Naive Bayes (NB) and Support Vector Machine (SVM) models are often used as baselines for other methods in text categorization and sentiment analysis research.
  • The authors show that the better variants often outperform recently published state-of-the-art methods on many datasets.
  • By combining generative and discriminative classifiers, the authors present a simple model variant where an SVM is built over NB log-count ratios as feature values, and show that it is a strong and robust performer over all the presented tasks.
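
As a rough illustration of that construction, here is a minimal sketch in Python (the helper name, variable names, and the use of NumPy/scikit-learn are our assumptions, not the authors' released code):

```python
# Minimal NBSVM sketch: a linear SVM trained on NB log-count-ratio-scaled features.
# Dense NumPy arrays are assumed for simplicity; names are illustrative only.
import numpy as np
from sklearn.svm import LinearSVC

def nb_log_count_ratio(X_bin, y, alpha=1.0):
    """r = log((p / ||p||_1) / (q / ||q||_1)) with add-alpha smoothing,
    computed from a binarized document-term matrix X_bin and 0/1 labels y."""
    p = alpha + X_bin[y == 1].sum(axis=0)  # smoothed positive-class feature counts
    q = alpha + X_bin[y == 0].sum(axis=0)  # smoothed negative-class feature counts
    return np.log((p / p.sum()) / (q / q.sum()))

# Usage (X_bin: n_docs x |V| binarized features, y: 0/1 labels):
# r = nb_log_count_ratio(X_bin, y)
# clf = LinearSVC(C=1.0).fit(X_bin * r, y)  # each row scaled elementwise by r
```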
Highlights
  • Naive Bayes (NB) and Support Vector Machine (SVM) models are often used as baselines for other methods in text categorization and sentiment analysis research
  • We show that the usefulness of bigram features in bag of features sentiment classification has been underappreciated, perhaps because their usefulness is more of a mixed bag for topical text classification tasks
  • Contrary to claims in the literature, we show that bag of features models are still strong performers on snippet sentiment classification tasks, with Naive Bayes models generally outperforming the sophisticated, structure-sensitive models explored in recent work
  • By combining generative and discriminative classifiers, we present a simple model variant where an SVM is built over Naive Bayes log-count ratios as feature values, and show that it is a strong and robust performer over all the presented tasks
  • While Ng and Jordan (2002) showed that Naive Bayes is better than SVM/logistic regression (LR) with few training cases, we show that Multinomial Naive Bayes is better with short documents
  • In contrast to their result that an SVM usually beats Naive Bayes when it has more than 30–50 training cases, we show that Multinomial Naive Bayes is still better on snippets even with relatively large training sets (9k cases)
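
For reference, the MNB baseline in these comparisons amounts to little more than the following (a sketch assuming scikit-learn, with Laplace smoothing α = 1 as in the paper):

```python
# Hedged sketch of the MNB baseline on binarized snippet features.
from sklearn.naive_bayes import MultinomialNB

mnb = MultinomialNB(alpha=1.0)           # add-one (Laplace) smoothing
# mnb.fit(X_bin_train, y_train)
# accuracy = mnb.score(X_bin_test, y_test)
```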
Results
  • While this does very well for long documents, the authors find that an interpolation between MNB and SVM performs excellently for all documents, and they report results using this model: w′ = (1 − β)w̄ + βw, where w̄ = ‖w‖₁/|V| is the mean magnitude of w and β ∈ [0, 1] is the interpolation parameter (see the sketch after this list).
  • All results reported use α = 1, C = 1, β = 0.25 for NBSVM, and C = 0.1 for SVM.
  • For comparison with other published results, the authors use either 10-fold cross-validation or a train/test split, depending on what is standard for the dataset.
  • The authors find that several NB/SVM variants do better than these state-of-the-art methods, even compared to methods that use lexicons, reversal rules, or unsupervised pretraining.
  • The authors' SVM-uni results are consistent with BoF-noDic and BoF-w/Rev used in (Nakagawa et al., 2010) and BoWSVM in (Pang and Lee, 2004).
  • With the sole exception of MPQA, MNB performed better than SVM in all cases.
  • MNB is much better on sentiment snippet tasks, and usually better than other published results.
  • While (Ng and Jordan, 2002) showed that NB is better than SVM/logistic regression (LR) with few training cases, the authors show that MNB is better with short documents.
  • In contrast to their result that an SVM usually beats NB when it has more than 30–50 training cases, the authors show that MNB is still better on snippets even with relatively large training sets (9k cases).
  • In contrast to MNB's excellent performance on the snippet datasets, the many poor assumptions of MNB pointed out in (Rennie et al., 2003) become more crippling for these longer documents.
  • SVM is much stronger than MNB for the 2 full-length sentiment analysis tasks, but still worse than some other published results.
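
The MNB/SVM interpolation used in the results above can be sketched directly from the formula (variable names are our own; w is the trained SVM weight vector and β = 0.25 as reported):

```python
import numpy as np

def interpolate(w, beta=0.25):
    """Return w' = (1 - beta) * w_bar + beta * w, where
    w_bar = ||w||_1 / |V| is the mean magnitude of w."""
    w_bar = np.abs(w).sum() / w.size  # mean magnitude of the weight vector
    return (1.0 - beta) * w_bar + beta * w
```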
Conclusion
  • In both tables 2 and 3, adding bigrams always improved the performance, and often gives better results than previously published. This presumably reflects that in sentiment classification there are much bigger gains from bigrams, because they can capture modified verbs and nouns.
  • NBSVM performs well on snippets and longer documents, for sentiment, topic and subjectivity classification, and is often better than previously published results.
  • For MNB and NBSVM, using the binarized features f̂ is slightly better than using the raw count features f.
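
Both choices (adding word bigrams and binarizing counts) are straightforward to reproduce; here is a minimal sketch using scikit-learn's CountVectorizer (our tooling assumption, not the authors'):

```python
# Binarized unigram + bigram bag-of-features (hedged sketch).
from sklearn.feature_extraction.text import CountVectorizer

# ngram_range=(1, 2) adds word bigrams; binary=True yields f_hat = 1 iff count > 0.
vectorizer = CountVectorizer(ngram_range=(1, 2), binary=True)
# X_bin = vectorizer.fit_transform(train_docs)
```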
Tables
  • Table1: Dataset statistics. (N+, N−): number of positive and negative examples. l: average number of words per example. CV: number of cross-validation splits, or N for train/test split. |V|: the vocabulary size. ∆: upper-bounds of the differences required to be statistically significant at the p < 0.05 level
  • Table2: Results for snippet datasets. Tree-CRF: (Nakagawa et al., 2010). RAE: Recursive Autoencoders (Socher et al., 2011). RAE-pretrain: train on Wikipedia (Collobert and Weston, 2008). “Voting” and “Rule”: use a sentiment lexicon and hard-coded reversal rules. “w/Rev”: “the polarities of phrases which have odd numbers of reversal phrases in their ancestors”. The top 3 methods are in bold and the best is also underlined
  • Table3: Results for long reviews (RT-2k and IMDB). The snippet dataset Subj. is also included for comparison. Results in rows 7–11 are from (Maas et al., 2011). BoW: linear SVM on bag of words features. bnc: binary, no idf, cosine normalization. ∆t: smoothed delta idf. Full: the full model. Unlab’d: additional unlabeled data. BoWSVM: bag of words SVM used in (Pang and Lee, 2004). Valence Shifter: (Kennedy and Inkpen, 2006). tf.∆idf: (Martineau and Finin, 2009). Appraisal Taxonomy: (Whitelaw et al., 2005). WRRBM: Word Representation Restricted Boltzmann Machine (Dahl et al., 2012)
  • Table4: On 3 20-newsgroup subtasks, we compare to DiscLDA (Lacoste-Julien et al., 2008) and ActiveSVM (Schohn and Cohn, 2000)
Reference
  • R. Collobert and J. Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of ICML.
  • George E. Dahl, Ryan P. Adams, and Hugo Larochelle. 2012. Training restricted Boltzmann machines on word observations. arXiv:1202.5695v1 [cs.LG].
  • Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, June.
  • Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of ACM SIGKDD, pages 168–177.
  • Alistair Kennedy and Diana Inkpen. 2006. Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, 22.
  • Simon Lacoste-Julien, Fei Sha, and Michael I. Jordan. 2008. DiscLDA: Discriminative learning for dimensionality reduction and classification. In Proceedings of NIPS, pages 897–904.
  • Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of ACL.
  • Justin Martineau and Tim Finin. 2009. Delta TFIDF: An improved feature space for sentiment analysis. In Proceedings of ICWSM.
  • Andrew McCallum and Kamal Nigam. 1998. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, pages 41–48.
  • Vangelis Metsis, Ion Androutsopoulos, and Georgios Paliouras. 2006. Spam filtering with naive Bayes: which naive Bayes? In Proceedings of CEAS.
  • Karo Moilanen and Stephen Pulman. 2007. Sentiment composition. In Proceedings of RANLP, pages 378–382.
  • Tetsuji Nakagawa, Kentaro Inui, and Sadao Kurohashi. 2010. Dependency tree-based sentiment classification using CRFs with hidden variables. In Proceedings of ACL:HLT.
  • Andrew Y. Ng and Michael I. Jordan. 2002. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Proceedings of NIPS, volume 2, pages 841–848.
  • Bo Pang and Lillian Lee. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of ACL.
  • Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of ACL.
  • Jason D. Rennie, Lawrence Shih, Jaime Teevan, and David R. Karger. 2003. Tackling the poor assumptions of naive Bayes text classifiers. In Proceedings of ICML, pages 616–623.
  • Greg Schohn and David Cohn. 2000. Less is more: Active learning with support vector machines. In Proceedings of ICML, pages 839–846.
  • Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of EMNLP.
  • Casey Whitelaw, Navendu Garg, and Shlomo Argamon. 2005. Using appraisal taxonomies for sentiment analysis. In Proceedings of CIKM-05.
  • Janyce Wiebe, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2-3):165–210.