The results produced via machine learning techniques are quite good in comparison to the human-generated baselines discussed in Section 4.
Thumbs up?: sentiment classification using machine learning techniques
Empirical Methods in Natural Language Processing (2002): 79-86
We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive...
- Very large amounts of information are available in on-line documents. As part of the effort to better organize this information for users, researchers have been actively investigating the problem of automatic text categorization.
The bulk of such work has focused on topical categorization, attempting to sort documents according to their subject matter (e.g., sports vs. politics).
- Recent years have seen rapid growth in on-line discussion groups and review sites where a crucial characteristic of the posted articles is their sentiment, or overall opinion towards the subject matter (for example, whether a product review is positive or negative).
- Labeling these articles with their sentiment would provide succinct summaries to readers; these labels are part of the appeal and value-add of such sites as www.rottentomatoes.com, which both labels movie reviews that do not contain explicit rating indicators and normalizes the different rating schemes that individual reviewers use.
- There are potential applications to message filtering; for example, one might be able to use sentiment information to recognize and discard “flames” (Spertus, 1997).
- Sentiment classification would be helpful in business intelligence applications (e.g., MindfulEye’s Lexant system) and recommender systems (e.g., Terveen et al. (1997), Tatemura (2000)), where user input and feedback could be quickly summarized; in general, free-form survey responses given in natural language format could be processed using sentiment categorization.
- We examine the effectiveness of applying machine learning techniques to the sentiment classification problem.
- The results produced via machine learning techniques are quite good in comparison to the human-generated baselines discussed in Section 4.
- However, the superiority of presence information over frequency information in our setting contradicts previous observations made in topic-classification work (McCallum and Nigam, 1998).
- Initial unigram results: The classification accuracies resulting from using only unigrams as features are shown in line (1) of Figure 3.
- In topic-based classification, all three classifiers have been reported to use bag-of-unigram features to achieve accuracies of 90% and above for particular categories (Joachims, 1998; Nigam et al., 1999), and such results are for settings with more than two classes.
- This provides suggestive evidence that sentiment categorization is more difficult than topic classification, which corresponds to the intuitions of the text categorization expert mentioned above. Nonetheless, the authors still wanted to investigate ways to improve the sentiment categorization results; these experiments are reported below.
- What accounts for these two differences — difficulty and types of information proving useful — between topic and sentiment classification, and how might the authors improve the latter? To answer these questions, the authors examined the data further. (All examples below are drawn from the full 2053-document corpus.)
- This paper is based upon work supported in part by the National Science Foundation under ITR/IM grant IIS-0081334.
- Shlomo Argamon-Engelson, Moshe Koppel, and Galit Avneri. 1998. Style-based text categorization: What newspaper am I reading? In Proc. of the AAAI Workshop on Text Categorization, pages 1–4.
- Adam L. Berger, Stephen A. Della Pietra, and Vincent J. Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–71.
- Douglas Biber. 1988. Variation across Speech and Writing. Cambridge University Press.
- Stanley Chen and Ronald Rosenfeld. 2000. A survey of smoothing techniques for ME models. IEEE Trans. Speech and Audio Processing, 8(1):37–50.
- Sanjiv Das and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proc. of the 8th Asia Pacific Finance Association Annual Conference (APFA 2001).
- Stephen Della Pietra, Vincent Della Pietra, and John Lafferty. 1997. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380–393.
- Pedro Domingos and Michael J. Pazzani. 1997. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29(2-3):103–130.
- Aidan Finn, Nicholas Kushmerick, and Barry Smyth. 2002. Genre classification and domain transfer for information filtering. In Proc. of the European Colloquium on Information Retrieval Research, pages 353–362, Glasgow.
- Vasileios Hatzivassiloglou and Kathleen McKeown. 1997. Predicting the semantic orientation of adjectives. In Proc. of the 35th ACL/8th EACL, pages 174–181.
- Vasileios Hatzivassiloglou and Janyce Wiebe. 2000. Effects of adjective orientation and gradability on sentence subjectivity. In Proc. of COLING.
- Marti Hearst. 1992. Direction-based text interpretation as an information access refinement. In Paul Jacobs, editor, Text-Based Intelligent Systems. Lawrence Erlbaum Associates.
- Alison Huettner and Pero Subasic. 2000. Fuzzy typing for document management. In ACL 2000 Companion Volume: Tutorial Abstracts and Demonstration Notes, pages 26–27.
- Thorsten Joachims. 1998. Text categorization with support vector machines: Learning with many relevant features. In Proc. of the European Conference on Machine Learning (ECML), pages 137–142.
- Thorsten Joachims. 1999. Making large-scale SVM learning practical. In Bernhard Schölkopf and Alexander Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 44–56. MIT Press.
- Jussi Karlgren and Douglass Cutting. 1994. Recognizing text genres with simple metrics using discriminant analysis. In Proc. of COLING.
- Brett Kessler, Geoffrey Nunberg, and Hinrich Schütze. 1997. Automatic detection of text genre. In Proc. of the 35th ACL/8th EACL, pages 32–38.
- David D. Lewis. 1998. Naive (Bayes) at forty: The independence assumption in information retrieval. In Proc. of the European Conference on Machine Learning (ECML), pages 4–15. Invited talk.
- Andrew McCallum and Kamal Nigam. 1998. A comparison of event models for Naive Bayes text classification. In Proc. of the AAAI-98 Workshop on Learning for Text Categorization, pages 41–48.
- Frederick Mosteller and David L. Wallace. 1984. Applied Bayesian and Classical Inference: The Case of the Federalist Papers. Springer-Verlag.
- Kamal Nigam, John Lafferty, and Andrew McCallum. 1999. Using maximum entropy for text classification. In Proc. of the IJCAI-99 Workshop on Machine Learning for Information Filtering, pages 61–67.
- Ted Pedersen. 2001. A decision tree of bigrams is an accurate predictor of word sense. In Proc. of the Second NAACL, pages 79–86.
- Warren Sack. 1994. On the computation of point of view. In Proc. of the Twelfth AAAI, page 1488. Student abstract.
- Ellen Spertus. 1997. Smokey: Automatic recognition of hostile messages. In Proc. of Innovative Applications of Artificial Intelligence (IAAI), pages 1058–1065.
- Junichi Tatemura. 2000. Virtual reviewers for collaborative exploration of movie reviews. In Proc. of the 5th International Conference on Intelligent User Interfaces, pages 272–275.
- Loren Terveen, Will Hill, Brian Amento, David McDonald, and Josh Creter. 1997. PHOAKS: A system for sharing recommendations. Communications of the ACM, 40(3):59–62.
- Laura Mayfield Tomokiyo and Rosie Jones. 2001. You’re not from round here, are you? Naive Bayes detection of non-native utterance text. In Proc. of the Second NAACL, pages 239–246.
- Richard M. Tong. 2001. An operational system for detecting and tracking opinions in on-line discussion. Workshop note, SIGIR 2001 Workshop on Operational Text Classification.
- Peter D. Turney and Michael L. Littman. 2002. Unsupervised learning of semantic orientation from a hundred-billion-word corpus. Technical Report EGB-1094, National Research Council Canada.
- Peter Turney. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proc. of the ACL.
- Janyce M. Wiebe, Theresa Wilson, and Matthew Bell. 2001. Identifying collocations for recognizing opinions. In Proc. of the ACL/EACL Workshop on Collocation.
- Yorick Wilks and Mark Stevenson. 1998. The grammar of sense: Using part-of-speech tags as a first step in semantic disambiguation. Journal of Natural Language Engineering, 4(2):135–144.
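
The summary above contrasts feature presence with feature frequency for unigram features. The following is a minimal sketch of that distinction, paired with a small Naive Bayes classifier using add-one smoothing. It is not the authors' implementation: the function names, the whitespace tokenizer, and the four toy training sentences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def unigram_features(text, presence=True):
    """Map a document to unigram features.

    presence=True  -> every word that occurs gets the value 1
    presence=False -> every word gets its occurrence count
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    counts = Counter(text.lower().split())
    if presence:
        return {word: 1 for word in counts}
    return dict(counts)


def train_naive_bayes(docs, labels, presence=True):
    """Collect per-class document counts and per-class feature totals."""
    class_docs = Counter(labels)
    word_totals = {label: Counter() for label in class_docs}
    for text, label in zip(docs, labels):
        word_totals[label].update(unigram_features(text, presence))
    vocab = {w for totals in word_totals.values() for w in totals}
    return class_docs, word_totals, vocab


def classify(model, text, presence=True):
    """Return the class with the highest smoothed log-probability."""
    class_docs, word_totals, vocab = model
    n_docs = sum(class_docs.values())
    feats = unigram_features(text, presence)
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total = sum(word_totals[label].values())
        score = math.log(class_docs[label] / n_docs)  # class prior
        for word, value in feats.items():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing over the training vocabulary.
            prob = (word_totals[label][word] + 1) / (total + len(vocab))
            score += value * math.log(prob)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for this sketch (not from the paper's corpus).
train_docs = [
    "a brilliant and moving film",
    "brilliant acting and a great plot",
    "a dull and boring film",
    "boring plot and bad acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = train_naive_bayes(train_docs, train_labels, presence=True)
print(unigram_features("good good film", presence=True))   # {'good': 1, 'film': 1}
print(unigram_features("good good film", presence=False))  # {'good': 2, 'film': 1}
print(classify(model, "a brilliant film"))                 # pos
print(classify(model, "boring and dull"))                  # neg
```

Switching `presence` to `False` in both `train_naive_bayes` and `classify` gives the frequency-based variant, so the two feature schemes can be compared on the same data.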