Active learning for cross-domain sentiment classification

IJCAI, pp. 2127-2133, 2013.

We propose a novel active learning approach for cross-domain sentiment classification by leveraging Query By Committee-based sample selection and combination-based classifier classification

Abstract:

In the literature, various approaches have been proposed to address the domain adaptation problem in sentiment classification (also called cross-domain sentiment classification). However, adaptation performance normally suffers badly when the data distributions in the source and target domains differ significantly. In this paper, we su...

Introduction
  • Sentiment classification is a task of determining the sentimental orientation of a given textual document towards a given topic [Pang et al, 2002; Turney, 2002].
  • To overcome this problem, several studies have been proposed to address the domain adaptation problem in sentiment classification by using some labeled data from the source domain and a large amount of unlabeled data from the target domain [Blitzer et al, 2007; He et al, 2011; Bollegala et al, 2011].
  • Active learning in cross-domain sentiment classification faces unique challenges compared with active learning in traditional in-domain sentiment classification.
Highlights
  • Sentiment classification is a task of determining the sentimental orientation of a given textual document towards a given topic [Pang et al, 2002; Turney, 2002]
  • This task has been extensively studied in multiple research communities, such as natural language processing (NLP), data mining and machine learning [Pang and Lee, 2008]
  • To better incorporate the knowledge in the unlabeled data for both the source and target classifiers, we adopt a graph-based ranking approach named label propagation to propagate the labels from the labeled data to the unlabeled data
  • The nodes consist of two parts: documents and all words extracted from the documents
  • We address domain adaptation in sentiment classification when the source and target domains differ significantly
  • We propose a novel active learning approach for cross-domain sentiment classification by leveraging Query By Committee-based sample selection and combination-based classifier classification
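The Query By Committee-based sample selection can be sketched as follows, using the steps (a) to (e) of the procedure listed under Results. This is a minimal sketch under stated assumptions, not the authors' implementation: the classifier interface (a function returning P(positive) for a document), the `train` and `oracle` callables, the 0.5 decision threshold, and the particular uncertainty measure are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a classifier maps a document to P(positive).
Classifier = Callable[[str], float]
Labeled = List[Tuple[str, int]]


def uncertainty(p_pos: float) -> float:
    """1.0 when the classifier is undecided (p = 0.5), 0.0 when it is certain."""
    return 1.0 - abs(2.0 * p_pos - 1.0)


def top_n_uncertain(pool: Sequence[str], f_s: Classifier, n: int) -> List[str]:
    """Rank documents by the source classifier's uncertainty and keep the top n."""
    return sorted(pool, key=lambda d: uncertainty(f_s(d)), reverse=True)[:n]


def qbc_select(pool: Sequence[str], f_s: Classifier, f_t: Classifier, n: int) -> List[str]:
    """Keep documents on which the committee (f_s, f_t) disagrees,
    then take the n most uncertain of them according to f_s."""
    disagreed = [d for d in pool if (f_s(d) >= 0.5) != (f_t(d) >= 0.5)]
    return top_n_uncertain(disagreed, f_s, n)


def active_learning(labeled_source: Labeled,
                    unlabeled_target: Sequence[str],
                    train: Callable[[Labeled], Classifier],
                    oracle: Callable[[str], int],
                    n: int, k: int) -> Labeled:
    """`train` and `oracle` are assumed stand-ins for the learner and the labeler."""
    labeled_target: Labeled = []                   # (a) start with an empty set
    pool = list(unlabeled_target)
    f_s = train(labeled_source)                    # (b) source classifier

    def annotate(batch: List[str]) -> None:
        # Move the selected documents from the pool to the labeled set.
        for d in batch:
            labeled_target.append((d, oracle(d)))
            pool.remove(d)

    annotate(top_n_uncertain(pool, f_s, n))        # (c), (d)
    for _ in range(k):                             # (e)
        if not pool or not labeled_target:
            break
        f_t = train(labeled_target)                # (e1) target classifier
        annotate(qbc_select(pool, f_s, f_t, n))    # (e2), (e3), (e4)
    return labeled_target
```

Filtering by disagreement before ranking by uncertainty means a round can return fewer than n documents when the two classifiers agree on most of the pool.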
Results
  • Input: labeled source-domain data $L_S$; unlabeled target-domain data $U_T$

    Output: automatically labeled target-domain data $L_T$

    Procedure:

    (a) Initialize $L_T = \emptyset$

    (b) Train the source classifier $f_S$ with $L_S$

    (c) Use $f_S$ to select the top-N uncertainty samples as $\Delta L_T$

    (d) $L_T \leftarrow L_T \cup \Delta L_T$, and $U_T \leftarrow U_T \setminus \Delta L_T$

    (e) Repeat k times:

        (e1) Train the target classifier $f_T$ with $L_T$

        (e2) Use both $f_S$ and $f_T$ to select label-disagreed samples from $U_T$

        (e3) Use $f_S$ to select the top-N uncertainty samples from the label-disagreed samples as $\Delta L_T$

        (e4) $L_T \leftarrow L_T \cup \Delta L_T$, and $U_T \leftarrow U_T \setminus \Delta L_T$

    4.3 LP-based Classification Algorithm

    To better incorporate the knowledge in the unlabeled data for both the source and target classifiers, the authors adopt a graph-based ranking approach named label propagation (LP) to propagate the labels from the labeled data to the unlabeled data.
  • The input of the LP algorithm is a graph describing the relationship between each pair of samples in the labeled and unlabeled data.
  • The document-word bipartite graph is adopted due to its excellent performance in sentiment classification [Sindhwani and Melville, 2008].
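Label propagation over a document-word bipartite graph can be illustrated with a small sketch. It is a simplified variant written for illustration, not the authors' exact algorithm: the term-frequency edge weights, the weighted-average update, the 0.5 starting score for unlabeled documents, and the fixed iteration count are all assumptions. Scores flow from documents to words and back to documents, and labeled documents are clamped to their known labels after every round.

```python
from collections import defaultdict
from typing import Dict, List


def label_propagation(docs: List[List[str]],
                      labels: Dict[int, int],
                      iters: int = 20) -> List[float]:
    """Return a positive-class score in [0, 1] for every document.

    docs   -- tokenized documents (labeled and unlabeled together)
    labels -- document index -> 0/1 for the labeled documents only
    """
    # Build the bipartite graph: an edge links a document to each word it
    # contains, weighted by term frequency (an assumed weighting).
    doc_words: List[Dict[str, int]] = []
    word_docs: Dict[str, Dict[int, int]] = defaultdict(dict)
    for i, tokens in enumerate(docs):
        counts: Dict[str, int] = defaultdict(int)
        for w in tokens:
            counts[w] += 1
        doc_words.append(dict(counts))
        for w, c in counts.items():
            word_docs[w][i] = c

    # Labeled documents start at their label, unlabeled ones at 0.5.
    score = [float(labels[i]) if i in labels else 0.5 for i in range(len(docs))]

    for _ in range(iters):
        # Documents -> words: each word takes the weighted mean of its documents.
        word_score = {
            w: sum(c * score[i] for i, c in nbrs.items()) / sum(nbrs.values())
            for w, nbrs in word_docs.items()
        }
        # Words -> documents: each document takes the weighted mean of its words.
        new_score = []
        for i, counts in enumerate(doc_words):
            if i in labels:
                new_score.append(float(labels[i]))  # clamp labeled documents
            elif counts:
                total = sum(counts.values())
                new_score.append(sum(c * word_score[w] for w, c in counts.items()) / total)
            else:
                new_score.append(0.5)               # empty document: no evidence
        score = new_score
    return score
```

An unlabeled document that shares a word such as "great" with a positively labeled document is pulled above 0.5 even though the two documents have no direct link, which is the effect the word nodes are meant to provide.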
Conclusion
  • The authors address domain adaptation in sentiment classification when the source and target domains differ significantly.
  • The authors propose a novel active learning approach for cross-domain sentiment classification by leveraging QBC-based sample selection and combination-based classifier classification.
  • The authors will explore more effective algorithms to improve the performance of the source and target classifiers.
  • The authors would like to adapt the active learning approach to other cross-domain tasks in natural language processing.
Tables
  • Table 1: Symbol definition
  • Table 2: Performance comparison between SCL and our LP-based domain adaptation
  • Table 3: Performance comparison between the Personal/Impersonal approach and our LP-based semi-supervised classification in the target domain
Related work
  • This section gives an overview of the related domain adaptation work from both sentiment classification and active learning perspectives.

    2.1 Domain Adaptation in Sentiment Classification

    Early studies on sentiment classification mainly focus on the single-domain setting [Pang et al, 2002; Turney, 2002]. For detailed discussion on this setting, please refer to [Pang and Lee, 2008].

    As for cross-domain sentiment classification, [Aue and Gamon, 2005] pioneer the studies. Although they fail to propose an effective solution, they highlight the importance and difficulty of cross-domain sentiment classification.

    Subsequently, [Blitzer et al, 2007] successfully develop a domain adaptation approach for sentiment classification, named SCL, whose main idea is to bridge the knowledge between the source and target domains using some pivotal features.

    More recently, [He et al, 2011] employ a topic model, called joint sentiment-topic model (JST), and [Bollegala et al, 2011] create a sentiment sensitive thesaurus, to perform cross-domain sentiment classification. Results from these studies demonstrate comparable performance to SCL.
Funding
  • The research work described in this paper has been partially supported by three NSFC grants, No.61003155 and No.60873150, one National High-tech Research and Development Program of China, No.2012AA011102, the Open Projects Program of National Laboratory of Pattern Recognition, and one project supported by Zhejiang Provincial Natural Science Foundation of China, No.Y13F020030.
Reference
  • [Aue and Gamon, 2005] Aue A. and M. Gamon. 2005. Customizing Sentiment Classifiers to New Domains: A Case Study. Technical report, Microsoft Research.
  • [Blitzer et al., 2007] Blitzer J., M. Dredze and F. Pereira. 2007. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In Proceedings of ACL-07, pp.440-447.
  • [Bollegala et al., 2011] Bollegala D., D. Weir, and J. Carroll. 2011. Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification. In Proceedings of ACL-11, pp.132-141.
  • [Chan and Ng, 2007] Chan Y. and H. Ng. 2007. Domain Adaptation with Active Learning for Word Sense Disambiguation. In Proceedings of ACL-07, pp.49-56.
  • [Daumé III, 2007] Daumé III H. 2007. Frustratingly Easy Domain Adaptation. In Proceedings of ACL-07, pp.256-263.
  • [He et al., 2011] He Y., C. Lin and H. Alani. 2011. Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification. In Proceedings of ACL-11, pp.123-131.
  • [Jiang and Zhai, 2007] Jiang J. and C. Zhai. 2007. Instance Weighting for Domain Adaptation in NLP. In Proceedings of ACL-07, pp.264-271.
  • [Li et al., 2010] Li S., C. Huang, G. Zhou and S. Lee. 2010. Employing Personal/Impersonal Views in Supervised and Semi-supervised Sentiment Classification. In Proceedings of ACL-10, pp.414-423.
  • [Pan and Yang, 2010] Pan S. and Q. Yang. 2010. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, vol.22(10), pp.1345-1359.
  • [Pang and Lee, 2008] Pang B. and L. Lee. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, vol.2(1-2), pp.1-135.
  • [Pang et al., 2002] Pang B., L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. In Proceedings of EMNLP-02, pp.79-86.
  • [Rai et al., 2010] Rai P., A. Saha, H. Daumé III and S. Venkatasubramanian. 2010. Domain Adaptation Meets Active Learning. In Proceedings of NAACL-10 Workshop on Active Learning for Natural Language Processing, pp.27-32.
  • [Shen et al., 2004] Shen D., J. Zhang, J. Su, G. Zhou and C. Tan. 2004. Multi-Criteria-based Active Learning for Named Entity Recognition. In Proceedings of ACL-04, pp.589-596.
  • [Shi et al., 2008] Shi X., W. Fan and J. Ren. 2008. Actively Transfer Domain Knowledge. In Proceedings of ECML/PKDD-08, pp.342-357.
  • [Sindhwani and Melville, 2008] Sindhwani V. and P. Melville. 2008. Document-Word Co-Regularization for Semi-supervised Sentiment Analysis. In Proceedings of ICDM-08, pp.1025-1030.
  • [Turney, 2002] Turney P. 2002. Thumbs up or Thumbs down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proceedings of ACL-02, pp.417-424.
  • [Wang and Yao, 2011] Wang S. and X. Yao. Relationships Between Diversity of Classification Ensembles and Single-Class Performance Measures. IEEE Transactions on Knowledge and Data Engineering, DOI: 10.1109/TKDE.2011.207.
  • [Zhu et al., 2008] Zhu J., H. Wang, T. Yao and B. Tsou. 2008. Active Learning with Sampling by Uncertainty and Density for Word Sense Disambiguation and Text Classification. In Proceedings of COLING-08, pp.1137-1144.
  • [Zhu and Ghahramani, 2002] Zhu X. and Z. Ghahramani. 2002. Learning from Labeled and Unlabeled Data with Label Propagation. CMU CALD Technical Report. CMU-CALD-02-107.