End-to-End Reinforcement Learning for Automatic Taxonomy Induction

ACL, pp. 2462–2472, 2018.

Keywords:
reinforcement learning, term pair, maximum spanning tree, automatic taxonomy induction, hypernymy detection

Abstract:

We present a novel end-to-end reinforcement learning approach to automatic taxonomy induction from a set of terms. While prior methods treat the problem as a two-phase task (i.e., detecting hypernymy pairs followed by organizing these pairs into a tree-structured hierarchy), we argue that such two-phase methods may suffer from error propagation and cannot effectively optimize metrics on the holistic taxonomy structure. In our approach, the representations of term pairs are learned using multiple sources of information and used to determine which term to select and where to place it on the taxonomy via a policy network. All components are trained in an end-to-end manner with cumulative rewards, measured by a holistic tree metric over the training taxonomies. Experiments on two public datasets of different domains show that our approach outperforms prior state-of-the-art taxonomy induction methods by up to 19.6% on ancestor F1.

Introduction
Highlights
  • Many tasks in natural language understanding (e.g., information extraction (Demeester et al., 2016), question answering (Yang et al., 2017), and textual entailment (Sammons, 2012)) rely on lexical resources in the form of term taxonomies
  • The hypernymy pairs extracted in the first subtask form a noisy hypernym graph, which is transformed into a tree-structured taxonomy in the hypernymy organization subtask using various graph pruning methods, including maximum spanning tree (MST) (Bansal et al., 2014; Zhang et al., 2016), minimum-cost flow (MCF) (Gupta et al., 2017), and other pruning heuristics (Kozareva and Hovy, 2010; Velardi et al., 2013; Faralli et al., 2015; Panchenko et al., 2016)
  • We present a reinforcement learning (RL) approach to taxonomy induction
  • After we introduce the term pair representations and define the states, actions, and rewards, the problem becomes how to choose an action from the action space, i.e., which term pair (x1, x2) should be selected given the current state. To solve this, we parameterize each action a by a policy network π(a | s; W_RL) (see the sketch after this list)
  • This paper presents a novel end-to-end reinforcement learning approach for automatic taxonomy induction
  • Unlike previous two-phase methods that treat term pairs independently, our approach learns the representations of term pairs by optimizing a holistic tree metric over the training taxonomies
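The selection step described above can be pictured as a small policy network that scores every candidate term pair and samples from the resulting distribution. Below is a minimal PyTorch sketch, not the authors' implementation; the class and dimension names (TaxonomyPolicy, pair_dim, hidden_dim) are illustrative, and the real term-pair representations are built from multiple sources of information rather than random vectors:

```python
import torch
import torch.nn as nn

class TaxonomyPolicy(nn.Module):
    """Scores every candidate action (a term pair to attach) and
    normalizes the scores into a distribution with a softmax."""

    def __init__(self, pair_dim: int, hidden_dim: int = 128):
        super().__init__()
        # Two-layer scorer over a term-pair representation; in the paper
        # this representation combines several feature sources.
        self.scorer = nn.Sequential(
            nn.Linear(pair_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, pair_reprs: torch.Tensor) -> torch.Tensor:
        # pair_reprs: (num_candidates, pair_dim), one row per remaining
        # candidate pair (x1, x2) given the current state s.
        scores = self.scorer(pair_reprs).squeeze(-1)
        return torch.softmax(scores, dim=-1)  # pi(a | s; W_RL)

policy = TaxonomyPolicy(pair_dim=64)
probs = policy(torch.randn(10, 64))           # 10 candidate pairs
action = torch.multinomial(probs, 1).item()   # sample a pair to attach
```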
Methods
  • The authors design two experiments to demonstrate the effectiveness of the proposed RL approach for taxonomy induction.
  • The authors detail the two experiments, validating that (1) the proposed approach can effectively reduce error propagation; and (2) it yields better taxonomies by optimizing metrics on the holistic taxonomy structure.
  • The authors show that the joint learning approach is superior to two-phase methods
  • Towards this goal, the authors compare with TAXI (Panchenko et al., 2016), a typical two-phase approach; two-phase HypeNET, implemented as pairwise hypernymy detection followed by hypernymy organization using MST; and Bansal et al. (2014).
  • The dataset contains 761 non-overlapping taxonomies in total and is partitioned 70/15/15% (533/114/114) into training, validation, and test sets, respectively
Results
  • HypeNET+MST extends HypeNET by first constructing a hypernym graph using HypeNET's outputs as edge weights and then finding the MST (Chu, 1965) of this graph (see the sketch after this list).
  • The authors can see that TAXI has the lowest ancestor F1 (F1_a), while HypeNET performs the worst in edge F1 (F1_e).
  • Both TAXI's and HypeNET's F1_a and F1_e are lower than 30.
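For reference, the MST step of such a two-phase baseline can be reproduced with the Chu-Liu/Edmonds algorithm as implemented in networkx. This is a minimal sketch over a toy hypernym graph with made-up confidence weights, not the baseline's actual code:

```python
import networkx as nx

# Toy hypernym graph: edges point hypernym -> hyponym and carry a
# detector confidence as the weight (values here are invented).
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("filter", "strainer", 0.9),
    ("filter", "air_filter", 0.8),
    ("strainer", "sieve", 0.7),
    ("strainer", "colander", 0.6),
    ("sieve", "colander", 0.3),   # weaker competing parent for colander
])

# Chu-Liu/Edmonds: pick one parent per node so the result is a tree
# (an arborescence) of maximum total weight.
tree = nx.maximum_spanning_arborescence(G, attr="weight")
print(sorted(tree.edges()))
# [('filter', 'air_filter'), ('filter', 'strainer'),
#  ('strainer', 'colander'), ('strainer', 'sieve')]
```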
Conclusion
  • This paper presents a novel end-to-end reinforcement learning approach for automatic taxonomy induction.
  • Unlike previous two-phase methods that treat term pairs independently, the approach learns the representations of term pairs by optimizing a holistic tree metric over the training taxonomies.
  • [Figure: an example induced taxonomy over filter-related terms such as filter, strainer, sieve, colander, air_filter, and coffee_filter.]
  • Studying how to effectively encode the induction history will be an interesting direction
Tables
  • Table 1: Results of the end-to-end taxonomy induction experiment. Our approach significantly outperforms two-phase methods (Panchenko et al., 2016; Shwartz et al., 2016; Bansal et al., 2014). Bansal et al. (2014) and TaxoRL (NR) + FG are listed separately because they use extra resources
  • Table 2: Results of the hypernymy organization experiment. Our approach outperforms Panchenko et al. (2016) and Gupta et al. (2017) when the same hypernym graph is used as input. The precision of partial induction is high in both metrics. The precision of full induction is relatively lower, but its recall is much higher
  • Table 3: Ablation study on the WordNet dataset (Bansal et al., 2014). P_e and R_e are omitted because they are the same as F1_e for each model. We can see that our approach benefits from multiple sources of information, which are complementary to each other
Related work
  • 6.1 Hypernymy Detection

    Finding high-quality hypernyms is of great importance since it serves as the first step of taxonomy induction. Prior approaches to hypernymy detection fall into two main categories: pattern-based and distributional methods. Pattern-based methods consider lexico-syntactic patterns between the joint occurrences of term pairs. They generally achieve high precision but suffer from low recall. Typical methods that leverage patterns for hypernym extraction include (Hearst, 1992; Snow et al., 2005; Kozareva and Hovy, 2010; Panchenko et al., 2016; Nakashole et al., 2012). Distributional methods leverage the contexts of each term separately, so the co-occurrence of term pairs is unnecessary. Some distributional methods are unsupervised: measures such as symmetric similarity (Lin et al., 1998) and measures based on the distributional inclusion hypothesis (Weeds et al., 2004; Chang et al., 2017) have been proposed. Supervised methods, on the other hand, usually perform better than unsupervised ones for hypernymy detection; recent work in this direction includes (Fu et al., 2014; Rimell, 2014; Yu et al., 2015; Tuan et al., 2016; Shwartz et al., 2016).
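To make the pattern-based family concrete, here is a toy sketch that extracts candidate hypernymy pairs with two classic Hearst-style patterns. Real systems use many more patterns plus syntactic preprocessing, so treat this purely as an illustration:

```python
import re

# Two classic Hearst (1992) patterns. High precision, low recall:
# the two terms must co-occur in one of these exact constructions.
SUCH_AS = re.compile(r"(\w[\w ]*?) such as (\w[\w ]*)")      # X such as Y
AND_OTHER = re.compile(r"(\w[\w ]*?),? and other (\w[\w ]*)")  # Y and other X

def extract_hypernym_pairs(sentence):
    """Return (hyponym, hypernym) candidates found in one sentence."""
    pairs = []
    for m in SUCH_AS.finditer(sentence):
        pairs.append((m.group(2).strip(), m.group(1).strip()))
    for m in AND_OTHER.finditer(sentence):
        pairs.append((m.group(1).strip(), m.group(2).strip()))
    return pairs

print(extract_hypernym_pairs("kitchen tools such as colanders"))
# [('colanders', 'kitchen tools')]
print(extract_hypernym_pairs("sieves and other strainers"))
# [('sieves', 'strainers')]
```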
Funding
  • Research was sponsored in part by the U.S. Army Research Lab under Cooperative Agreement No. W911NF-09-2-0053 (NSCTA), DARPA under Agreement No. W911NF-17-C-0099, National Science Foundation grants IIS-16-18481, IIS-17-04532, and IIS-17-41317, and grant 1U54GM114838 awarded by NIGMS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov)
Study subjects and analysis
public datasets: 2
All components are trained in an end-to-end manner with cumulative rewards, measured by a holistic tree metric over the training taxonomies. Experiments on two public datasets of different domains show that our approach outperforms prior state-of-the-art taxonomy induction methods by up to 19.6% on ancestor F1. We design two experiments to demonstrate the effectiveness of our proposed RL approach for taxonomy induction
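Ancestor F1 is computed over the transitive closures of the predicted and gold taxonomies, i.e., over all ancestor-descendant pairs rather than only direct edges. A minimal sketch of the metric follows (function and variable names are ours; the paper's exact evaluation script may differ in details such as root handling):

```python
import networkx as nx

def ancestor_f1(pred_edges, gold_edges):
    """Ancestor F1: precision/recall over all ancestor-descendant
    pairs of the predicted vs. gold taxonomy."""
    def closure(edges):
        g = nx.DiGraph(edges)
        return {(a, d) for a in g for d in nx.descendants(g, a)}
    pred, gold = closure(pred_edges), closure(gold_edges)
    p = len(pred & gold) / len(pred) if pred else 0.0
    r = len(pred & gold) / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

gold = [("filter", "strainer"), ("strainer", "sieve")]
pred = [("filter", "strainer"), ("filter", "sieve")]
print(round(ancestor_f1(pred, gold), 3))  # 0.8: misplaced 'sieve' costs recall
```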

public datasets: 2
In our approach, the representations of term pairs are learned using multiple sources of information and used to determine which term to select and where to place it on the taxonomy via a policy network. Experiments on two public datasets of different domains show that our approach outperforms prior state-of-the-art taxonomy induction methods by up to 19.6% on ancestor F1. Many tasks in natural language understanding (e.g., information extraction (Demeester et al., 2016), question answering (Yang et al., 2017), and textual entailment (Sammons, 2012)) rely on lexical resources in the form of term taxonomies (cf. rightmost column in Fig. 1)

public datasets: 2
(2) We design a policy network to incorporate semantic information of term pairs and use cumulative rewards to measure the quality of constructed taxonomies holistically. (3) Experiments on two public datasets from different domains demonstrate the superior performance of our approach compared with state-of-the-art methods. We also show that our method can effectively reduce error propagation and capture global taxonomy structure

dependency paths per term pair: 200
We use pre-trained GloVe word vectors (Pennington et al., 2014) with dimensionality 50 as word embeddings. We limit the maximum number of dependency paths between each term pair to 200, because term pairs containing general terms may have too many dependency paths. We run with different random seeds and hyperparameters and use the validation set to pick the best model
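A small sketch of that 200-path cap is below. Since the snippet does not say whether paths are truncated or subsampled, the random downsampling here is an assumption, and the function name is ours:

```python
import random

MAX_PATHS = 200  # cap on dependency paths per term pair, as in the paper

def cap_dependency_paths(paths, k=MAX_PATHS, seed=0):
    """Keep at most k dependency paths for one term pair. This bounds
    the encoding cost for very frequent, general terms."""
    if len(paths) <= k:
        return list(paths)
    return random.Random(seed).sample(paths, k)
```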

public datasets: 2
The error propagation between two phases is thus effectively reduced and the global taxonomy structure is better captured. Experiments on two public datasets from different domains show that our approach outperforms state-of-the-art methods significantly. In the future, we will explore more strategies towards term pair selection (e.g., allow the RL agent to remove terms from the taxonomy) and reward function design
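Given that Williams (1992) appears in the references, training plausibly follows a REINFORCE-style policy gradient over each taxonomy-building episode, with the cumulative reward derived from the holistic tree metric. A hedged PyTorch sketch; the discount factor and the mean baseline are illustrative assumptions, not the paper's reported settings:

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE (Williams, 1992) over one episode: weight each
    action's log-probability by the cumulative discounted reward
    collected from that step onward."""
    returns, g = [], 0.0
    for r in reversed(rewards):       # suffix sums of discounted reward
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    baseline = returns.mean()         # simple variance-reduction baseline
    return -(torch.stack(log_probs) * (returns - baseline)).sum()

# Usage: collect log_probs from the policy while building one taxonomy,
# compute rewards from the tree metric, then call loss.backward() to
# update the policy parameters W_RL.
```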

Reference
  • Tuan Luu Anh, Jung-jae Kim, and See Kiong Ng. 2014. Taxonomy construction using syntactic contextual evidence. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 810–819.
  • Mohit Bansal, David Burkett, Gerard De Melo, and Dan Klein. 2014. Structured learning for taxonomy induction with belief propagation. In ACL (1), pages 1041–1051.
  • Marco Baroni and Alessandro Lenci. 2011. How we BLESSed distributional semantic evaluation. In Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, pages 1–10. Association for Computational Linguistics.
  • Georgeta Bordea, Els Lefever, and Paul Buitelaar. 2016. SemEval-2016 Task 13: Taxonomy extraction evaluation (TExEval-2). In SemEval-2016, pages 1081–1091. Association for Computational Linguistics.
  • Thorsten Brants and Alex Franz. 2006. Web 1T 5-gram corpus version 1.1. Google Inc.
  • Jose Camacho-Collados. 2017. Why we have switched from building full-fledged taxonomies to simply detecting hypernymy relations. arXiv preprint arXiv:1703.04178.
  • Haw-Shiuan Chang, ZiYun Wang, Luke Vilnis, and Andrew McCallum. 2017. Unsupervised hypernym detection by distributional inclusion vector embedding. arXiv preprint arXiv:1710.00880.
  • Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, and Tony Robinson. 2013. One billion word benchmark for measuring progress in statistical language modeling. arXiv preprint arXiv:1312.3005.
  • Yoeng-Jin Chu. 1965. On the shortest arborescence of a directed graph. Scientia Sinica, 14:1396–1400.
  • Thomas Demeester, Tim Rocktäschel, and Sebastian Riedel. 2016. Lifted rule injection for relation embeddings. In EMNLP.
  • Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge Vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 601–610. ACM.
  • Stefano Faralli, Giovanni Stilo, and Paola Velardi. 2015. Large scale homophily analysis in Twitter using a Twixonomy. In IJCAI, pages 2334–2340.
  • Tiziano Flati, Daniele Vannella, Tommaso Pasini, and Roberto Navigli. 2014. Two is bigger (and better) than one: the Wikipedia Bitaxonomy project. In ACL (1), pages 945–955.
  • Ruiji Fu, Jiang Guo, Bing Qin, Wanxiang Che, Haifeng Wang, and Ting Liu. 2014. Learning semantic hierarchies via word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1199–1209.
  • Amit Gupta, Remi Lebret, Hamza Harkous, and Karl Aberer. 2017. Taxonomy induction using hypernym subsequences. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 1329–1338. ACM.
  • Lushan Han, Abhay L. Kashyap, Tim Finin, James Mayfield, and Jonathan Weese. 2013. UMBC EBIQUITY-CORE: Semantic textual similarity systems. In *SEM @ NAACL-HLT, pages 44–52.
  • Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th Conference on Computational Linguistics - Volume 2, pages 539–545. Association for Computational Linguistics.
  • Nan Jiang, Alex Kulesza, Satinder Singh, and Richard Lewis. 2015. The dependence of effective planning horizon on model accuracy. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pages 1181–1189.
  • David Jurgens and Mohammad Taher Pilehvar. 2015. Reserating the awesometastic: An automatic extension of the WordNet taxonomy for novel terms. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1459–1465.
  • Zornitsa Kozareva and Eduard Hovy. 2010. A semi-supervised method to learn and construct taxonomies using the web. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1110–1118. Association for Computational Linguistics.
  • Douglas B. Lenat. 1995. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11):33–38.
  • Dekang Lin et al. 1998. An information-theoretic definition of similarity. In ICML, volume 98, pages 296–304.
  • Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. 2016. Neural relation extraction with selective attention over instances. In ACL (1).
  • Xueqing Liu, Yangqiu Song, Shixia Liu, and Haixun Wang. 2012. Automatic taxonomy construction from keywords. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1433–1441. ACM.
  • George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41.
  • Ndapandula Nakashole, Gerhard Weikum, and Fabian Suchanek. 2012. PATTY: A taxonomy of relational patterns with semantic types. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1135–1145. Association for Computational Linguistics.
  • Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, et al. 2017. DyNet: The dynamic neural network toolkit. arXiv preprint arXiv:1701.03980.
  • Alexander Panchenko, Stefano Faralli, Eugen Ruppert, Steffen Remus, Hubert Naets, Cedrick Fairon, Simone Paolo Ponzetto, and Chris Biemann. 2016. TAXI at SemEval-2016 Task 13: A taxonomy induction method based on lexico-syntactic patterns, substrings and focused crawling. In Proceedings of the 10th International Workshop on Semantic Evaluation, San Diego, CA, USA. Association for Computational Linguistics.
  • Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
  • Simone Paolo Ponzetto and Michael Strube. 2008. WikiTaxonomy: A large scale knowledge resource. In ECAI, volume 178, pages 751–752.
  • Laura Rimell. 2014. Distributional lexical entailment by topic coherence. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 511–519.
  • Mark Sammons. 2012. Recognizing textual entailment.
  • Wei Shen, Jianyong Wang, Ping Luo, and Min Wang. 2012. A graph-based approach for ontology population with named entities. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pages 345–354. ACM.
  • Vered Shwartz, Yoav Goldberg, and Ido Dagan. 2016. Improving hypernymy detection with an integrated path-based and distributional method. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2389–2398.
  • Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. 2005. Learning syntactic patterns for automatic hypernym discovery. In Advances in Neural Information Processing Systems, pages 1297–1304.
  • Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. YAGO: A core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, pages 697–706. ACM.
  • Luu A. Tuan, Yi Tay, Siu C. Hui, and See K. Ng. 2016. Learning term embeddings for taxonomic relation identification using dynamic weighting neural network. In Proceedings of the EMNLP Conference, pages 403–413.
  • Paola Velardi, Stefano Faralli, and Roberto Navigli. 2013. OntoLearn Reloaded: A graph-based algorithm for taxonomy induction. Computational Linguistics, 39(3):665–707.
  • Julie Weeds, David Weir, and Diana McCarthy. 2004. Characterising measures of lexical distributional similarity. In Proceedings of the 20th International Conference on Computational Linguistics, page 1015. Association for Computational Linguistics.
  • Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256.
  • Ichiro Yamada, Kentaro Torisawa, Jun'ichi Kazama, Kow Kuroda, Masaki Murata, Stijn De Saeger, Francis Bond, and Asuka Sumida. 2009. Hypernym discovery based on distributional similarity and hierarchical structures. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2, pages 929–937. Association for Computational Linguistics.
  • Hui Yang and Jamie Callan. 2009. A metric-based framework for automatic taxonomy induction. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, pages 271–279. Association for Computational Linguistics.
  • Shuo Yang, Lei Zou, Zhongyuan Wang, Jun Yan, and Ji-Rong Wen. 2017. Efficiently answering technical questions - a knowledge graph approach. In AAAI.
  • Zheng Yu, Haixun Wang, Xuemin Lin, and Min Wang. 2015. Learning term embeddings for hypernymy identification. In IJCAI, pages 1390–1397.
  • Hao Zhang, Zhiting Hu, Yuntian Deng, Mrinmaya Sachan, Zhicheng Yan, and Eric Xing. 2016. Learning concept taxonomies from multi-modal data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1791–1801.
  • Yuhao Zhang, Victor Zhong, Danqi Chen, Gabor Angeli, and Christopher D. Manning. 2017. Position-aware attention and supervised data improve slot filling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 35–45.