OpinionMiner: a novel machine learning system for web opinion mining and extraction

KDD, pp.1195-1204, (2009)


Abstract

Merchants selling products on the Web often ask their customers to share their opinions and hands-on experiences on products they have purchased. Unfortunately, reading through all customer reviews is difficult, especially for popular items, where the number of reviews can reach hundreds or even thousands. This makes it difficult for a poten...

Introduction
  • As e-commerce becomes more popular, it has become common practice for online merchants to ask their customers to share their opinions and hands-on experiences on products they have purchased.
  • Such information is highly valuable to manufacturers, online advertisers and potential customers.
  • A novel lexicalized HMM-based approach is proposed and an opinion mining and extraction system, OpinionMiner, has been developed.
  • The authors' objective in this system is to answer the following questions for a given product: 1) how to automatically extract potential product entities and opinion entities from the reviews; 2) how to identify opinion sentences which describe each extracted product entity; and 3) how to determine opinion orientation given each recognized product entity.
  • The experimental results demonstrate the effectiveness of the proposed approach in web opinion mining and extraction from online product reviews.
Highlights
  • As e-commerce becomes more popular, it has become common practice for online merchants to ask their customers to share their opinions and hands-on experiences on products they have purchased.
  • A novel lexicalized HMM-based approach is proposed and an opinion mining and extraction system, OpinionMiner, has been developed. The system aims to answer the following questions for a given product: 1) how to automatically extract potential product entities and opinion entities from the reviews; 2) how to identify opinion sentences which describe each extracted product entity; and 3) how to determine opinion orientation given each recognized product entity. Unlike previous approaches, which mostly relied on rule-based techniques [3, 4] or statistical information [10, 13], the authors propose a new framework that naturally integrates multiple linguistic features into automatic learning.
  • The proposed machine learning framework performs significantly better than the rule-based baseline system in terms of entity extraction, opinion sentence recognition and opinion polarity classification.
  • A novel and robust machine learning system is designed for opinion mining and extraction.
  • The model provides solutions for several problems that have not been addressed by previous approaches.
  • The system can predict new potential product and opinion entities based on the patterns it has learned, which is extremely useful in text and web mining due to the complexity and flexibility of natural language.
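
The highlights describe an HMM-based tagger that labels review tokens as product or opinion entities. As a rough illustration of the decoding step only, the sketch below runs Viterbi decoding on a plain first-order HMM. It is not the authors' lexicalized model: the tag names (`BG`, `FEATURE`, `OPINION`), the function name `viterbi` and every probability are made-up assumptions for this example.

```python
import math

def viterbi(tokens, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most likely tag sequence for tokens under a first-order HMM.

    Words a tag has never emitted get the small probability `unk`
    (an assumption of this sketch, not a value from the paper).
    """
    # best[i][t] = (log-probability of the best path ending in tag t at i, previous tag)
    best = [{}]
    for t in tags:
        emission = emit_p[t].get(tokens[0], unk)
        best[0][t] = (math.log(start_p[t]) + math.log(emission), None)

    for i in range(1, len(tokens)):
        best.append({})
        for t in tags:
            emission = math.log(emit_p[t].get(tokens[i], unk))
            score, prev = max(
                (best[i - 1][p][0] + math.log(trans_p[p][t]) + emission, p)
                for p in tags
            )
            best[i][t] = (score, prev)

    # Backtrack from the best final tag to recover the full path.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Hypothetical toy model: BG = background word, FEATURE = product entity,
# OPINION = opinion entity. All numbers are invented for illustration.
tags = ["BG", "FEATURE", "OPINION"]
start_p = {"BG": 0.8, "FEATURE": 0.1, "OPINION": 0.1}
trans_p = {
    "BG": {"BG": 0.5, "FEATURE": 0.3, "OPINION": 0.2},
    "FEATURE": {"BG": 0.7, "FEATURE": 0.2, "OPINION": 0.1},
    "OPINION": {"BG": 0.7, "FEATURE": 0.2, "OPINION": 0.1},
}
emit_p = {
    "BG": {"the": 0.5, "is": 0.4},
    "FEATURE": {"zoom": 0.6, "battery": 0.4},
    "OPINION": {"great": 0.6, "poor": 0.4},
}

tokens = ["the", "zoom", "is", "great"]
tagged = list(zip(tokens, viterbi(tokens, tags, start_p, trans_p, emit_p)))
print(tagged)
```

A lexicalized HMM would condition these probabilities on neighbouring words as well as tags; the dynamic-programming structure of the decoder stays the same.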
Results
  • Evaluation results and discussion: the detailed evaluation results are presented in Table 6 and Table 7.
  • The authors observed that the approach effectively identified highly specific product entities and opinion expressions, and that it self-learned new vocabulary based on the patterns it had seen in the training data.
  • Another observation is that, in addition to effectively extracting frequent entities, the system excels at identifying important but infrequently mentioned entities, which were under-analyzed or ignored by previously proposed methods.
  • For example, “automatic white balance”, “custom white balance” and “preset white balance” represent different user preferences, and a recommender system should be able to distinguish among these to answer the user’s specific queries.
Conclusion
  • A novel and robust machine learning system is designed for opinion mining and extraction.
  • (1) People like to tell a long story about their experiences.
  • Some people like to describe how bad or good their former cameras were.
  • This affects system performance on some camera reviews in the experiments.
  • The authors are looking into this issue further.
Tables
  • Table 1: Definitions of entity types and examples
  • Table 2: Basic tag set and its corresponding entities
  • Table 3: Pattern tag set and its corresponding pattern
  • Table 4: The transformation table
  • Table 5: Baseline rules for extracting product entities and opinion-bearing words
  • Table 6: Experimental results on entity extraction (R: Recall; P: Precision; F: F-score; VE: Vocabulary Expansion; BS: Bootstrapping)
  • Table 7: Experimental results on opinion sentence identification and opinion orientation classification
  • Table 8: Examples of self-learned vocabularies

    auto red eye correction = 1
    auto stabilizer* = 2
    auto white balance* = 3
    automatic = 2
    automatic setting = 5
    automatic fill-flash* = 1
    automatic focus* = 1
    automatic white balance = 1
    automatic zoom* = 1
    automatic functions* = 1
    automatic point-and-shoot mode* = 1
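
Table 6 reports recall (R), precision (P) and F-score (F) for entity extraction. The sketch below shows how these metrics are commonly computed over sets of extracted entities. It assumes the balanced F-score (harmonic mean of P and R), which this page does not state explicitly; the function name and the example entities are hypothetical and are not data from the paper.

```python
def precision_recall_f(predicted, gold):
    """Compute precision, recall and balanced F-score over sets of extracted items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical example: entities annotated by hand vs. entities a system extracted.
gold_entities = {"zoom", "battery life", "white balance", "lens"}
predicted_entities = {"zoom", "battery life", "flash"}

p, r, f = precision_recall_f(predicted_entities, gold_entities)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

Here two of the three extracted entities are correct and two of the four gold entities are found, so precision is 2/3 and recall is 1/2.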
Related Work
  • Opinion analysis has been studied by many researchers in recent years. Two main research directions have been explored: document-level opinion mining and feature-level opinion mining. At the document level, Turney [3] determined a document’s polarity by calculating the average semantic orientation (SO) of extracted phrases. SO was computed using pointwise mutual information (PMI), estimated from web search hit counts, to measure the dependence between extracted phrases and the reference words “excellent” and “poor”. One year later, Turney and Littman [4] expanded this work by using cosine distance in latent semantic analysis (LSA) as the distance measure. Dave, Lawrence and Pennock [5] classified reviews on Amazon by calculating scores from normalized term frequencies of unigrams, bigrams and trigrams with different smoothing techniques. Das and Chen [8] studied document-level sentiment polarity classification on financial documents. Pang, Lee and Vaithyanathan [6] used several machine learning approaches to classify movie reviews, and in [7] they studied a further machine learning approach based on subjectivity detection and minimum cuts in graphs for sentiment classification of movie reviews. The authors’ work differs from these studies: those studies aim to determine the sentiment of whole documents, whereas the authors’ system performs extraction and classification on entities. Another difference is that those studies did not focus on the features being commented on.
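
The PMI-based semantic orientation described above can be sketched as follows. SO is the difference PMI(phrase, “excellent”) − PMI(phrase, “poor”); when both terms are estimated from hit counts, the count of the phrase itself cancels, leaving a single log ratio. The function names, the smoothing constant and all hit counts below are assumptions for illustration, not values from the paper.

```python
import math

def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor") from hit counts.

    hits_near_*: hits for the phrase co-occurring with the reference word.
    hits_excellent / hits_poor: hits for each reference word alone.
    smoothing: small constant that avoids log(0); its value is an assumption here.
    """
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)

def review_polarity(so_values):
    """Classify a review by the average SO of its extracted phrases."""
    average = sum(so_values) / len(so_values)
    return "positive" if average > 0 else "negative"

# Hypothetical hit counts for two phrases taken from one review.
so_good = semantic_orientation(400, 100, 1000, 1000, smoothing=0)
so_bad = semantic_orientation(50, 100, 1000, 1000, smoothing=0)
print(so_good, so_bad, review_polarity([so_good, so_bad]))
```

A phrase that co-occurs more often with “excellent” than with “poor” (relative to how common each reference word is) gets a positive score, and the review label follows the sign of the average.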
References
  • Lee, S. Z., Tsujii, J., and Rim, H. C. 2000. Lexicalized Hidden Markov Models for Part-of-Speech Tagging. In Proceedings of the 18th International Conference on Computational Linguistics (COLING’00), 481-487.
  • Fu, G. and Luke, K. K. 2005. Chinese Named Entity Recognition using Lexicalized HMMs. ACM SIGKDD Explorations Newsletter 7, 1 (2005), 19-25.
  • Turney, P. D. 2002. Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL’02), 417-424.
  • Turney, P. D. and Littman, M. L. 2003. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems 21, 4 (2003), 315-346.
  • Dave, K., Lawrence, S., and Pennock, D. M. 2003. Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews. In Proceedings of the 12th International Conference on World Wide Web (WWW’03), 519-528.
  • Pang, B., Lee, L., and Vaithyanathan, S. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP’02), 79-86.
  • Pang, B. and Lee, L. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL’04), 271-278.
  • Das, S. and Chen, M. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the 8th Asia Pacific Finance Association Annual Conference (APFA’01).
  • Hu, M. and Liu, B. 2004. Mining and Summarizing Customer Reviews. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’04), 168-177.
  • Zhuang, L., Jing, F., and Zhu, X. 2006. Movie Review Mining and Summarization. In Proceedings of the International Conference on Information and Knowledge Management (CIKM’06), 43-50.
  • Popescu, A. and Etzioni, O. 2005. Extracting Product Features and Opinions from Reviews. In Proceedings of the 2005 Conference on Empirical Methods in Natural Language Processing (EMNLP’05), 339-346.
  • Ding, X., Liu, B., and Yu, P. S. 2008. A Holistic Lexicon-based Approach to Opinion Mining. In Proceedings of the International Conference on Web Search and Web Data Mining (WSDM’08), 231-239.