Learning Structured Representation for Text Classification via Reinforcement Learning

AAAI, 2018.


Abstract:

Representation learning is a fundamental problem in natural language processing. This paper studies how to learn a structured representation for text classification. Unlike most existing representation models that either use no structure or rely on pre-specified structures, we propose a reinforcement learning (RL) method to learn sentence representations by discovering task-relevant structures automatically.

Introduction
Highlights
  • Representation learning is a fundamental problem in AI, and is important for natural language processing (NLP) (Bengio, Courville, and Vincent 2013; Le and Mikolov 2014)
  • We propose a reinforcement learning (RL) method to build structured sentence representations by identifying task-relevant structures without explicit structure annotations
  • We propose two structured representation models: information distilled LSTM (ID-LSTM) and hierarchical structured LSTM (HS-LSTM)
  • The model consists of three components: Policy Network (PNet), structured representation models, and Classification Network (CNet)
  • This paper has presented a reinforcement learning method which learns sentence representation by discovering task-relevant structures
Methods
  • Overview

    The goal of this paper is to learn structured representation for text classification by discovering important, task-relevant structures.
  • PNet adopts a stochastic policy and samples an action at each state.
  • The structured representation models translate the actions into a structured representation.
  • CNet performs classification based on the structured representation and provides the reward for PNet. Since the reward can be computed only once the final representation is available, the process can be naturally addressed by the policy gradient method (Sutton et al. 2000); a minimal sketch of this update is given below
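The sketch below illustrates this policy-gradient (REINFORCE) update in PyTorch. It is a minimal reconstruction of the training procedure described above, not the authors' code: it assumes a per-word binary policy (as in ID-LSTM), stands in a simple average of the retained word embeddings for the structured representation model, and uses the log-likelihood of the gold label from the classifier as the delayed reward. All module and variable names (policy, classifier, train_step, etc.) are illustrative.

```python
import torch
import torch.nn as nn
from torch.distributions import Bernoulli

embed_dim, num_classes = 50, 2

policy = nn.Sequential(nn.Linear(embed_dim, 1), nn.Sigmoid())   # stand-in PNet: P(Retain | word)
classifier = nn.Linear(embed_dim, num_classes)                   # stand-in CNet on the sentence vector
optimizer = torch.optim.Adam(
    list(policy.parameters()) + list(classifier.parameters()), lr=1e-3)

def train_step(word_embs: torch.Tensor, label: torch.Tensor) -> None:
    """One REINFORCE step. word_embs: (seq_len, embed_dim); label: scalar class index."""
    probs = policy(word_embs).squeeze(-1)          # (seq_len,) retain probabilities
    dist = Bernoulli(probs)
    actions = dist.sample()                        # 1 = Retain, 0 = Delete
    # Stand-in for the structured representation model: average the retained words.
    mask = actions.unsqueeze(-1)
    sent_vec = (word_embs * mask).sum(0) / mask.sum().clamp(min=1.0)
    log_p_y = torch.log_softmax(classifier(sent_vec), dim=-1)[label]
    reward = log_p_y.detach()                      # delayed reward: log P(y|X) from the classifier
    # REINFORCE: minimize -reward * sum_t log pi(a_t | s_t) to maximize expected reward.
    policy_loss = -reward * dist.log_prob(actions).sum()
    class_loss = -log_p_y                          # supervised loss for the classifier
    (policy_loss + class_loss).backward()
    optimizer.step()
    optimizer.zero_grad()

# Toy usage with random embeddings for a 7-word sentence.
train_step(torch.randn(7, embed_dim), torch.tensor(1))
```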
Results
  • Classification results as listed in Table 2 show that the models perform competitively across different datasets and different tasks.
  • Compared with pre-specified parsing structures, the automatically discovered structures appear to be better suited to classification.
  • These results demonstrate the effectiveness of learning structured representations by discovering task-relevant structures.
  • Table 3 lists example sentences together with the structures distilled by ID-LSTM and discovered by HS-LSTM.
  • Cho continues her exploration of the outer limits of raunch with considerable brio.
  • Offers an interesting look at the rapidly changing face of Beijing.
Conclusion
  • This paper has presented a reinforcement learning method which learns sentence representation by discovering task-relevant structures.
  • In the framework of RL, the authors adopted two representation models: ID-LSTM, which distills task-relevant words to form a purified sentence representation, and HS-LSTM, which discovers phrase structures to form a hierarchical sentence representation.
  • Extensive experiments show that the method has state-of-the-art performance and is able to discover interesting task-relevant structures without explicit structure annotations.
  • In future work, the authors will apply the method to other types of sequences, since the idea of structure discovery can be generalized to other tasks and domains.
Summary
  • Representation learning is a fundamental problem in AI, and is important for natural language processing (NLP) (Bengio, Courville, and Vincent 2013; Le and Mikolov 2014).
  • In our RL method, we design two structured representation models: Information Distilled LSTM (ID-LSTM) which selects important, task-relevant words to build sentence representation, and Hierarchical Structured LSTM (HS-LSTM) which discovers phrase structures and builds sentence representation with a two-level LSTM.
  • We propose a reinforcement learning method which discovers task-relevant structures to build structured sentence representations for text classification problems.
  • The goal of this paper is to learn structured representation for text classification by discovering important, task-relevant structures.
  • The model consists of three components: Policy Network (PNet), structured representation models, and Classification Network (CNet).
  • Once all the actions are decided, the representation models will obtain a structured representation of the sentence, and it will be used by CNet to compute P (y|X).
  • Reward: Once all the actions are sampled by the policy network, the structured representation of a sentence is determined by our representation models, and the representation will be passed to CNet to obtain P(y|X), where y is the class label.
  • ID-LSTM translates the actions obtained from PNet into a structured representation of a sentence (see the ID-LSTM sketch after this summary).
  • HS-LSTM translates the actions into a hierarchical structured representation of the sentence (see the HS-LSTM sketch after this summary).
  • If the action at position t−1 is End, the word at position t is the start of a phrase and the word-level LSTM starts from a zero-initialized state.
  • The classification network produces a probability distribution over class labels based on the structured representation obtained from ID-LSTM or HS-LSTM.
  • Taking sentiment classification as an example, we observed that the retained words by ID-LSTM are mostly sentiment words and negation words, indicating that the model can distill important, task-relevant words.
  • The most and least deleted words by ID-LSTM in the SST dataset are listed in Table 5, ordered by deletion percentage (Deleted/Count).
  • The qualitative and quantitative results demonstrate that ID-LSTM is able to remove irrelevant words and distill task-relevant ones in a sentence.
  • Quantitative analysis: First of all, we compared HS-LSTM with other structured models to investigate whether classification tasks can benefit from the discovered structure.
  • The results in Table 8 show that HS-LSTM outperforms other structured models, indicating that the discovered structures may be more task-relevant and advantageous than those given by a parser.
  • Our HS-LSTM is able to discover task-relevant structures and build better structured sentence representations.
  • In the framework of RL, we adopted two representation models: ID-LSTM, which distills task-relevant words to form a purified sentence representation, and HS-LSTM, which discovers phrase structures to form a hierarchical sentence representation.
  • We will apply the method to other types of sequences, since the idea of structure discovery can be generalized to other tasks and domains.
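As referenced in the summary above, here is a minimal sketch of how ID-LSTM could translate the sampled Retain/Delete actions into a sentence representation: a deleted word is skipped and the LSTM state is carried forward unchanged, while a retained word triggers a normal LSTM transition. This is an illustrative reconstruction under assumed shapes and names (id_lstm_encode, etc.), not the authors' implementation.

```python
import torch
import torch.nn as nn

def id_lstm_encode(word_embs: torch.Tensor, actions, cell: nn.LSTMCell) -> torch.Tensor:
    """word_embs: (seq_len, embed_dim); actions[t] is 1 (Retain) or 0 (Delete)."""
    h = torch.zeros(1, cell.hidden_size)
    c = torch.zeros(1, cell.hidden_size)
    for t, a in enumerate(actions):
        if a == 1:                                  # Retain: normal LSTM transition on word t
            h, c = cell(word_embs[t:t + 1], (h, c))
        # Delete: skip the word and carry (h, c) forward unchanged.
    return h.squeeze(0)                             # final hidden state as the sentence vector

cell = nn.LSTMCell(input_size=50, hidden_size=64)
sent_vec = id_lstm_encode(torch.randn(5, 50), [1, 0, 1, 1, 0], cell)
print(sent_vec.shape)  # torch.Size([64])
```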
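Similarly, a minimal sketch of HS-LSTM's two-level composition, assuming per-word Inside/End actions: the word-level LSTM restarts after an End action (the start of a new phrase), and the phrase-level LSTM consumes one vector per completed phrase. The exact composition, including how the final sentence vector is formed, is an assumption for illustration rather than the authors' code.

```python
import torch
import torch.nn as nn

def hs_lstm_encode(word_embs: torch.Tensor, actions,
                   word_cell: nn.LSTMCell, phrase_cell: nn.LSTMCell) -> torch.Tensor:
    """word_embs: (seq_len, embed_dim); actions[t] is 'Inside' or 'End'."""
    hw = cw = torch.zeros(1, word_cell.hidden_size)    # word-level state (within a phrase)
    hp = cp = torch.zeros(1, phrase_cell.hidden_size)  # phrase-level state (across phrases)
    prev_action = 'End'                                # the sentence start behaves like a phrase start
    for t, a in enumerate(actions):
        if prev_action == 'End':                       # start of a new phrase: reset the word-level LSTM
            hw = cw = torch.zeros(1, word_cell.hidden_size)
        hw, cw = word_cell(word_embs[t:t + 1], (hw, cw))
        if a == 'End':                                 # phrase boundary: feed the phrase vector upward
            hp, cp = phrase_cell(hw, (hp, cp))
        prev_action = a
    return hp.squeeze(0)                               # phrase-level state as the sentence vector

word_cell = nn.LSTMCell(input_size=50, hidden_size=64)
phrase_cell = nn.LSTMCell(input_size=64, hidden_size=64)
vec = hs_lstm_encode(torch.randn(6, 50),
                     ['Inside', 'End', 'Inside', 'Inside', 'End', 'End'],
                     word_cell, phrase_cell)
print(vec.shape)  # torch.Size([64])
```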
Tables
  • Table 1: The behavior of HS-LSTM according to the actions at positions t−1 and t
  • Table 2: Classification accuracy on different datasets. Results marked with * are re-printed from (Tai, Socher, and Manning 2015), (Kim 2014), and (Huang, Qian, and Zhu 2017). The rest are obtained by our own implementation
  • Table 3: Examples of the structures distilled and discovered by ID-LSTM and HS-LSTM
  • Table 4: The original average length and the distilled average length by ID-LSTM in the test set of each dataset
  • Table 5: The most/least deleted words in the test set of SST
  • Table 6: The comparison of the predefined structures and those discovered by HS-LSTM
  • Table 7: Phrase examples discovered by HS-LSTM
  • Table 8: Classification accuracy from structured models. The result marked with * is re-printed from (Yogatama et al. 2017)
  • Table 9: Statistics of structures discovered by HS-LSTM in the test set of each dataset
Funding
  • This work was partly supported by the National Science Foundation of China under grant No. 61272227/61332007.
Reference
  • Bengio, Y.; Courville, A.; and Vincent, P. 2013. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(8):1798–1828.
  • Bowman, S. R.; Angeli, G.; Potts, C.; and Manning, C. D. 2015. A large annotated corpus for learning natural language inference. In EMNLP, 632–642.
  • Chung, J.; Ahn, S.; and Bengio, Y. 2017. Hierarchical multiscale recurrent neural networks. In ICLR.
  • Chung, J.; Gulcehre, C.; Cho, K.; and Bengio, Y. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. In NIPS 2014 Workshop.
  • Ghosh, S.; Vinyals, O.; Strope, B.; Roy, S.; Dean, T.; and Heck, L. 2016. Contextual LSTM (CLSTM) models for large scale NLP tasks. In SIGKDD Workshop.
  • Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.
  • Huang, M.; Qian, Q.; and Zhu, X. 2017. Encoding syntactic knowledge in neural networks for sentiment classification. ACM Transactions on Information Systems (TOIS) 35(3):26.
  • Iyyer, M.; Manjunatha, V.; Boyd-Graber, J.; and Daume III, H. 2015. Deep unordered composition rivals syntactic methods for text classification. In ACL, 1681–1691.
  • Joulin, A.; Grave, E.; Bojanowski, P.; and Mikolov, T. 2017. Bag of tricks for efficient text classification. In EACL, 427–431.
  • Kalchbrenner, N.; Grefenstette, E.; and Blunsom, P. 2014. A convolutional neural network for modelling sentences. In ACL, 655–665.
  • Kim, Y. 2014. Convolutional neural networks for sentence classification. In EMNLP, 1746–1751.
  • Kingma, D., and Ba, J. 2015. Adam: A method for stochastic optimization. In ICLR.
  • Klein, D., and Manning, C. D. 2003. Accurate unlexicalized parsing. In ACL, 423–430.
  • Le, Q., and Mikolov, T. 2014. Distributed representations of sentences and documents. In ICML, 1188–1196.
  • Lei, T.; Barzilay, R.; and Jaakkola, T. 2015. Molding CNNs for text: Non-linear, non-consecutive convolutions. In EMNLP, 1565–1575.
  • Lin, Z.; Feng, M.; Santos, C. N. d.; Yu, M.; Xiang, B.; Zhou, B.; and Bengio, Y. 2017. A structured self-attentive sentence embedding. In ICLR.
  • Liu, B.; Huang, M.; Sun, J.; and Zhu, X. 2015. Incorporating domain and sentiment supervision in representation learning for domain adaptation. In IJCAI, 1277–1283.
  • Pang, B., and Lee, L. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In ACL, 271.
  • Pang, B., and Lee, L. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In ACL, 115–124.
  • Pennington, J.; Socher, R.; and Manning, C. D. 2014. GloVe: Global vectors for word representation. In EMNLP, 1532–1543.
  • Qian, Q.; Tian, B.; Huang, M.; Liu, Y.; Zhu, X.; and Zhu, X. 2015. Learning tag embeddings and tag-specific composition functions in recursive neural network. In ACL, 1365–1374.
  • Qian, Q.; Huang, M.; Lei, J.; and Zhu, X. 2017. Linguistically regularized LSTM for sentiment classification. In ACL, 1679–1689.
  • Socher, R.; Pennington, J.; Huang, E. H.; Ng, A. Y.; and Manning, C. D. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. In EMNLP, 151–161.
  • Socher, R.; Perelygin, A.; Wu, J. Y.; Chuang, J.; Manning, C. D.; Ng, A. Y.; Potts, C.; et al. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP, 1631–1642.
  • Sutton, R. S.; McAllester, D. A.; Singh, S. P.; and Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In NIPS, 1057–1063.
  • Tai, K. S.; Socher, R.; and Manning, C. D. 2015. Improved semantic representations from tree-structured long short-term memory networks. In ACL, 1556–1566.
  • Tang, D.; Qin, B.; and Liu, T. 2015. Document modeling with gated recurrent neural network for sentiment classification. In EMNLP, 1422–1432.
  • Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3-4):229–256.
  • Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A. J.; and Hovy, E. H. 2016. Hierarchical attention networks for document classification. In NAACL-HLT, 1480–1489.
  • Yogatama, D.; Blunsom, P.; Dyer, C.; Grefenstette, E.; and Ling, W. 2017. Learning to compose words into sentences with reinforcement learning. In ICLR.
  • Zhang, X.; Zhao, J.; and LeCun, Y. 2015. Character-level convolutional networks for text classification. In NIPS, 649–657.
  • Zhou, X.; Wan, X.; and Xiao, J. 2016. Attention-based LSTM network for cross-lingual sentiment classification. In EMNLP, 247–256.
  • Zhu, X.; Guo, H.; Mohammad, S.; and Kiritchenko, S. 2014. An empirical study on the effect of negation words on sentiment. In ACL, 304–313.
  • Zhu, X.; Sobihani, P.; and Guo, H. 2015. Long short-term memory over recursive structures. In ICML, 1604–1612.