In view of the widespread use of these applications, we propose a methodology to construct appropriate domain-specific datasets and metrics to assess the accuracy of relatedness and similarity estimations.
Top Rank Focused Adaptive Vote Collection for the Evaluation of Domain Specific Semantic Models
EMNLP 2020, pp. 3081–3093 (2020)
The growth of domain-specific applications of semantic models, boosted by the recent achievements of unsupervised embedding learning algorithms, demands domain-specific evaluation datasets. In many cases, content-based recommenders being a prime example, these models are required to rank words or texts according to their semantic relatedness…
- In recent years, we have been witnessing a growth of Natural Language Processing (NLP) applications in a wide range of specific domains, such as recruiting (INDA; Qin et al, 2018), law (Sugathadasa et al, 2017), oil and gas (Nooralahzadeh et al, 2018), social media analysis (ALRashdi and O’Keefe, 2019), online education (Dessì et al, 2019), and the biomedical domain (Patel et al, 2020).
- Semantic similarity and relatedness are related but distinct notions in linguistics: the first is associated with concepts that share taxonomic properties and is maximized by synonyms, whereas semantically related concepts can share any kind of semantic relation, including antonymy (Cai et al, 2010; Harispe et al, 2015). These notions underlie the downstream tasks of countless NLP applications, including information retrieval (Akmal et al, 2014; Chen et al, 2017; Gurevych et al, 2007; Hliaoutakis et al, 2006; Ji et al, 2017; Lopez-Gazpio et al, 2017; Srihari et al, 2000; Uddin et al, 2013), content-based recommendation (De Gemmis et al, 2008, 2015; Lops et al, 2011), semantic matching (Giunchiglia et al, 2004; Li and Xu, 2014; Wan et al, 2016), ontology learning and knowledge management (Aouicha et al, 2016a; Georgiev and Georgiev, 2018; Jiang et al, 2014; Sanchez and Moreno, 2008), and word sense disambiguation (Aouicha et al, 2016b; Patwardhan et al, 2003).
- We provided a protocol for constructing a dataset – based on adaptive pairwise comparisons and tailored to the available resources – which can be used to test or validate any relatedness-based domain-specific semantic model and which is optimized for accuracy in top-rank evaluation.
- As the purpose of the adaptive approach is to focus votes on top-rank items, a reasonable requirement is that no more than 10% of the items survive up to the last ballot; since a fraction α of the items survives each of the nb − 1 elimination rounds, this yields the upper bound α ≤ (0.1)^(1/(nb−1)).
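The bound follows directly from requiring α^(nb−1) ≤ 0.1. A minimal sketch (the helper `alpha_upper_bound` is an illustrative name, not from the paper):

```python
def alpha_upper_bound(nb: int, target: float = 0.1) -> float:
    """Largest per-ballot survival fraction that still leaves at most
    `target` (e.g. 10%) of the items alive at the last of nb ballots."""
    return target ** (1.0 / (nb - 1))

for nb in (3, 5, 10):
    a = alpha_upper_bound(nb)
    # After nb - 1 elimination rounds, a fraction a ** (nb - 1) survives.
    print(f"nb={nb}: alpha <= {a:.3f}, surviving share = {a ** (nb - 1):.3f}")
```

Note that more ballots permit a gentler per-round cut: with nb = 10 a survival fraction of about 0.77 per ballot still leaves only 10% of items at the end.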
- We defined a stochastic transitivity model to simulate semantic-driven pairwise comparisons; it allows tuning the parameters of the data collection approach, and it confirmed a significant increase in the performance metrics ρw and τw for the proposed adaptive approach compared with the uniform approach.
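A toy version of such a simulation can be sketched as follows; the logistic choice rule and the `beta` parameter are stand-ins for the paper's actual stochastic transitivity model, which is not reproduced here:

```python
import math
import random

def simulate_ranking(scores, n_comp, beta=5.0, seed=0):
    """Simulate n_comp pairwise votes and rank items by empirical win rate.
    The vote rule is a logistic (Bradley-Terry-style) choice: the
    probability that i beats j grows with scores[i] - scores[j]."""
    rng = random.Random(seed)
    n = len(scores)
    wins, games = [0] * n, [0] * n
    for _ in range(n_comp):
        i, j = rng.sample(range(n), 2)          # draw a pair of distinct items
        p_i = 1.0 / (1.0 + math.exp(-beta * (scores[i] - scores[j])))
        wins[i if rng.random() < p_i else j] += 1
        games[i] += 1
        games[j] += 1
    rates = [w / g if g else 0.0 for w, g in zip(wins, games)]
    return sorted(range(n), key=lambda k: -rates[k])

true_scores = [0.9, 0.1, 0.5, 0.7, 0.3]
print(simulate_ranking(true_scores, 5000))  # recovers the score order: [0, 3, 2, 4, 1]
```

With enough votes, the win-rate ranking recovers the underlying score order; comparing such recovered rankings against the theoretical one is what the accuracy metrics below measure.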
- The authors estimated the accuracy of a data collection approach by comparing, via the metrics defined in Section 3, the ranking it produces with the underlying theoretical ranks.
- The results of the simulations are presented in Table 2, which contains, as measures of the accuracy of the proposed approaches, the ρw and τw coefficients defined in Equation 8 and discussed in Section 3; in order to check the overall rank accuracy, the authors report the standard Spearman’s ρ and Kendall’s τ coefficients.
- The adaptive approach, compared with the uniform approach, yields a substantial increase in both ρw and τw for every underlying similarity distribution considered, with no relevant change in the overall rank accuracy measured by ρ and τ.
- The results suggest that the proposed stochastic model is robust to changes in the underlying similarity distribution.
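The weighted coefficients are what make the top of the ranking count. The paper's exact ρw and τw are defined in Equation 8 (not reproduced here); the sketch below uses a simpler, hypothetical top-weighting of Spearman's formula only to show why a weighted measure separates rankings that plain ρ considers equally good:

```python
def spearman_rho(rank_a, rank_b):
    """Standard Spearman's rho between two rank vectors
    (rank_a[i] is the rank of item i, with 0 the top rank)."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

def top_weighted_distance(rank_a, rank_b):
    """Illustrative top-weighted disagreement score (NOT the paper's
    Equation 8): each squared rank difference is scaled by
    1 / (1 + min rank), so mistakes near the top cost more."""
    return sum((a - b) ** 2 / (1 + min(a, b)) for a, b in zip(rank_a, rank_b))

true = [0, 1, 2, 3, 4, 5]
swap_top = [1, 0, 2, 3, 4, 5]     # the two best items exchanged
swap_bottom = [0, 1, 2, 3, 5, 4]  # the two worst items exchanged

# Plain Spearman's rho cannot tell the two errors apart...
print(spearman_rho(true, swap_top), spearman_rho(true, swap_bottom))
# ...while the top-weighted score penalises the top swap five times more.
print(top_weighted_distance(true, swap_top), top_weighted_distance(true, swap_bottom))
```

This is exactly the failure mode that motivates reporting ρw and τw alongside the unweighted ρ and τ in Table 2.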
- Additional future investigations may include a deeper analysis of the mathematical and statistical properties of the weighted coefficients ρw and τw, as well as a rigorous derivation of the optimal values for the parameters of the data collection approach.
- Table 1: Most commonly used symbols
- Table 2: Mean ± standard deviation (unbiased estimation over 50 simulations of relatedness-driven comparisons, as described in the text) for the ρw and τw metrics defined in Equation 8, and for Spearman’s ρ and Kendall’s τ coefficients
- A token can be considered rare within a particular domain if its frequency in a corpus of domain-specific texts is, e.g., lower than 10% of the average token frequency in the corpus.
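The footnote's threshold is straightforward to operationalise. A minimal sketch, using an invented mini-corpus (the token list and the helper name are illustrative, not from the paper):

```python
from collections import Counter

def rare_tokens(corpus_tokens, ratio=0.1):
    """Tokens whose corpus frequency is below `ratio` times the
    average token frequency (the footnote's 10% example threshold)."""
    freq = Counter(corpus_tokens)
    avg = sum(freq.values()) / len(freq)  # average count per distinct token
    return {t for t, c in freq.items() if c < ratio * avg}

# Hypothetical domain corpus: average count is 25, so the cutoff is 2.5.
tokens = ["churn"] * 40 + ["funnel"] * 40 + ["upsell"] * 19 + ["derisking"]
print(rare_tokens(tokens))  # only "derisking" falls below the threshold
```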
Study subjects and analysis
For this reason, we rely on the standard pairwise comparison: we generate Ncomp pairs of items (as described in Sections 2.4 and 2.5), each one presented to one voter, who is requested to identify the item formed by the most similar tokens. In more detail, 990 pairs of distinct tokens (associated with 45 tokens) were considered within the semantic area Sales & Marketing.
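As a sanity check, the 990 pairs mentioned above are exactly the number of unordered pairs of 45 distinct tokens, C(45, 2):

```python
from itertools import combinations
from math import comb

tokens = [f"token{i}" for i in range(45)]   # placeholder names for the 45 tokens
pairs = list(combinations(tokens, 2))       # every unordered pair of distinct tokens
print(len(pairs), comb(45, 2))              # both equal 45 * 44 / 2 = 990
```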
- Suriati Akmal, Li-Hsing Shih, and Rafael Batres. 2014. Ontology-based similarity for product information retrieval. Computers in Industry, 65(1):91–107.
- Reem ALRashdi and Simon O’Keefe. 2019. Deep learning and word embeddings for tweet classification for crisis response. arXiv preprint arXiv:1903.11024.
- Ammar Ammar and Devavrat Shah. 2011. Ranking: Compare, don’t score. In 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 776–783. IEEE.
- Mohamed Ben Aouicha, Mohamed Ali Hadj Taieb, and Malek Ezzeddine. 2016a. Derivation of “is a” taxonomy from wikipedia category graph. Engineering Applications of Artificial Intelligence, 50:265–286.
- Mohamed Ben Aouicha, Mohamed Ali Hadj Taieb, and Hania Ibn Marai. 2016b. Wsd-tic: word sense disambiguation using taxonomic information content. In International Conference on Computational Collective Intelligence, pages 131–142. Springer.
- Jeremy Auguste, Arnaud Rey, and Benoit Favre. 2017. Evaluation of word embeddings against cognitive processes: primed reaction times in lexical decision and naming tasks. In Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP, pages 21–26.
- Amir Bakarov. 2018. A survey of word embeddings evaluation methods. arXiv preprint arXiv:1801.09536.
- Rajendra Banjade, Nabin Maharjan, Nobal B Niraula, Vasile Rus, and Dipesh Gautam. 2015. Lemon and tea are not similar: Measuring word-to-word similarity by combining different methods. In International Conference on Intelligent Text Processing and Computational Linguistics, pages 335–346. Springer.
- Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Jauvin. 2003. A neural probabilistic language model. Journal of machine learning research, 3(Feb):1137–1155.
- Roi Blanco, Harry Halpin, Daniel M Herzig, Peter Mika, Jeffrey Pound, Henry S Thompson, and Thanh Tran. 2013. Repeatable and reliable semantic search evaluation. Journal of web semantics, 21:14–29.
- David C Blest. 2000. Theory & methods: Rank correlation—an alternative measure. Australian & New Zealand Journal of Statistics, 42(1):101–111.
- Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.
- JC de Borda. 1784. Mémoire sur les élections au scrutin. Histoire de l’Académie Royale des Sciences pour 1781 (Paris, 1784).
- Elia Bruni, Nam-Khanh Tran, and Marco Baroni. 2014. Multimodal distributional semantics. Journal of artificial intelligence research, 49:1–47.
- Songmei Cai, Zhao Lu, and Junzhong Gu. 2010. An effective measure of semantic similarity. In Advances in Wireless Networks and Information Systems, pages 9–17. Springer.
- Manuela Cattelan. 2012. Models for paired comparison data: A review with emphasis on dependent data. Statistical Science, pages 412–433.
- Fuzan Chen, Chenghua Lu, Harris Wu, and Minqiang Li. 2017. A semantic similarity measure integrating multiple conceptual relationships for web service discovery. Expert Systems with Applications, 67:19–31.
- Joaquim Pinto da Costa and Carlos Soares. 2005. A weighted rank measure of correlation. Australian & New Zealand Journal of Statistics, 47(4):515–529.
- Elise AV Crompvoets, Anton A Beguin, and Klaas Sijtsma. 2019. Adaptive pairwise comparison for educational measurement. Journal of Educational and Behavioral Statistics, page 1076998619890589.
- Livia Dancelli, Marica Manisera, and Marika Vezzoli. 2013. On two classes of weighted rank correlation measures deriving from the Spearman’s ρ. In Statistical Models for Data Analysis, pages 107–114. Springer.
- Marco De Gemmis, Pasquale Lops, Cataldo Musto, Fedelucio Narducci, and Giovanni Semeraro. 2015. Semantics-aware content-based recommender systems. In Recommender Systems Handbook, pages 119–159. Springer.
- Marco De Gemmis, Pasquale Lops, Giovanni Semeraro, and Pierpaolo Basile. 2008. Integrating tags in a semantic content-based recommender. In Proceedings of the 2008 ACM conference on Recommender systems, pages 163–170.
- Danilo Dessì, Mauro Dragoni, Gianni Fenu, Mirko Marras, and Diego Reforgiato Recupero. 2019. Evaluating neural word embeddings created from online course reviews for sentiment analysis. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, pages 2124–2127.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- Georgiana Dinu, Angeliki Lazaridou, and Marco Baroni. 2014. Improving zero-shot learning by mitigating the hubness problem. arXiv preprint arXiv:1412.6568.
- Daniel M Ennis. 2016. Thurstonian models: Categorical decision making in the presence of noise. Institute for Perception.
- Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, and Chris Dyer. 2016. Problems with evaluation of word embeddings using word similarity tasks. arXiv preprint arXiv:1605.02276.
- Roman Feldbauer, Maximilian Leodolter, Claudia Plant, and Arthur Flexer. 2018. Fast approximate hubness reduction for large high-dimensional data. In 2018 IEEE International Conference on Big Knowledge (ICBK), pages 358–367. IEEE.
- Damien Francois, Vincent Wertz, and Michel Verleysen. 2007. The concentration of fractional distances. IEEE Transactions on Knowledge and Data Engineering, 19(7):873–886.
- Johannes Fürnkranz and Eyke Hüllermeier. 2010. Preference learning and ranking by pairwise comparison. In Preference learning, pages 65–82. Springer.
- Georgi V Georgiev and Danko D Georgiev. 2018. Enhancing user creativity: Semantic measures for idea generation. Knowledge-Based Systems, 151:1–15.
- Fausto Giunchiglia, Pavel Shvaiko, and Mikalai Yatskevich. 2004. S-match: an algorithm and an implementation of semantic matching. In European semantic web symposium, pages 61–75. Springer.
- Anna Gladkova and Aleksandr Drozd. 2016. Intrinsic evaluations of word embeddings: What can we do better? In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, pages 36–42.
- Iryna Gurevych, Christof Müller, and Torsten Zesch. 2007. What to be? Electronic career guidance based on semantic relatedness. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 1032–1039.
- Harry Halpin, Daniel M Herzig, Peter Mika, Roi Blanco, Jeffrey Pound, Henry Thompson, and Duc Thanh Tran. 2010. Evaluating ad-hoc object retrieval. In IWEST@ISWC.
- Sebastien Harispe, Sylvie Ranwez, and Stefan Janaqi. 2015. Semantic similarity from natural language and ontology analysis. Morgan & Claypool Publishers.
- Reinhard Heckel, Nihar B Shah, Kannan Ramchandran, Martin J Wainwright, et al. 2019. Active ranking from pairwise comparisons and when parametric assumptions do not help. The Annals of Statistics, 47(6):3099–3126.
- Reinhard Heckel, Max Simchowitz, Kannan Ramchandran, and Martin J Wainwright. 2018. Approximate ranking from pairwise comparisons. arXiv preprint arXiv:1801.01253.
- Angelos Hliaoutakis, Giannis Varelas, Epimenidis Voutsakis, Euripides GM Petrakis, and Evangelos Milios. 2006. Information retrieval by semantic similarity. International journal on semantic Web and information systems (IJSWIS), 2(3):55–73.
- Ronald L Iman and WJ Conover. 1987. A measure of top–down correlation. Technometrics, 29(3):351–357.
- Kevin G Jamieson and Robert Nowak. 2011. Active ranking using pairwise comparisons. In Advances in Neural Information Processing Systems, pages 2240–2248.
- Xiaonan Ji, Alan Ritter, and Po-Yin Yen. 2017. Using ontology-based semantic similarity to facilitate the article screening process for systematic reviews. Journal of biomedical informatics, 69:33–42.
- Yong Jiang, Xinmin Wang, and Hai-Tao Zheng. 2014. A semantic similarity measure based on information distance for ontology alignment. Information Sciences, 278:76–87.
- Maurice G Kendall. 1938. A new measure of rank correlation. Biometrika, 30(1/2):81–93.
- Maurice George Kendall. 1948. Rank correlation methods.
- Svetlana Kiritchenko and Saif M Mohammad. 2017. Best-worst scaling more reliable than rating scales: A case study on sentiment intensity annotation. arXiv preprint arXiv:1712.01765.
- William H Kruskal. 1958. Ordinal measures of association. Journal of the American Statistical Association, 53(284):814–861.
- Juan J Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, and Eneko Agirre. 2019. A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art. Engineering Applications of Artificial Intelligence, 85:645–665.
- Hang Li and Jun Xu. 2014. Semantic matching in search. Foundations and Trends in Information retrieval, 7(5):343–469.
- Inigo Lopez-Gazpio, Montse Maritxalar, Aitor Gonzalez-Agirre, German Rigau, Larraitz Uria, and Eneko Agirre. 2017. Interpretable semantic textual similarity: Finding and explaining differences between sentences. Knowledge-Based Systems, 119:186–199.
- Pasquale Lops, Marco De Gemmis, and Giovanni Semeraro. 2011. Content-based recommender systems: State of the art and trends. In Recommender systems handbook, pages 73–105. Springer.
- Jordan J Louviere and George G Woodworth. 1991. Best-worst scaling: A model for the largest difference judgments. University of Alberta: Working Paper.
- Tahani A Maturi and Ezz H Abdelfattah. 2008. A new weighted rank correlation. Journal of mathematics and statistics, 4(4):226–230.
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
- Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Armand Joulin. 2017. Advances in pre-training distributed word representations. arXiv preprint arXiv:1712.09405.
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119.
- Dunja Mladenic. 1999. Text-learning and related intelligent agents: a survey. IEEE intelligent systems and their applications, 14(4):44–54.
- Sahand Negahban, Sewoong Oh, and Devavrat Shah. 2017. Rank centrality: Ranking from pairwise comparisons. Operations Research, 65(1):266–287.
- Farhad Nooralahzadeh, Lilja Øvrelid, and Jan Tore Lønning. 2018. Evaluation of domain-specific word embeddings using knowledge resources. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).
- Dohyung Park, Joe Neeman, Jin Zhang, Sujay Sanghavi, and Inderjit Dhillon. 2015. Preference completion: Large-scale collaborative ranking from pairwise comparisons. In International Conference on Machine Learning, pages 1907–1916.
- Rashmi Patel, Jessica Irving, Matthew Taylor, Hitesh Shetty, Megan Pritchard, Robert Stewart, Paolo Fusar-Poli, and Philip McGuire. 2020. T109. traversing the transdiagnostic gap between depression, mania and psychosis with natural language processing. Schizophrenia Bulletin, 46(Supplement 1):S272–S273.
- Siddharth Patwardhan, Satanjeev Banerjee, and Ted Pedersen. 2003. Using measures of semantic relatedness for word sense disambiguation. In International Conference on Intelligent Text Processing and Computational Linguistics, pages 241–257. Springer.
- Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543.
- Chuan Qin, Hengshu Zhu, Tong Xu, Chen Zhu, Liang Jiang, Enhong Chen, and Hui Xiong. 2018. Enhancing person-job fit for talent recruitment: An ability-aware neural network approach. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 25–34.
- Milos Radovanovic, Alexandros Nanopoulos, and Mirjana Ivanovic. 2010a. Hubs in space: Popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research, 11(Sep):2487–2531.
- Milos Radovanovic, Alexandros Nanopoulos, and Mirjana Ivanovic. 2010b. On the existence of obstinate results in vector space models. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, pages 186–193.
- Anna Rogers, Shashwath Hosur Ananthakrishna, and Anna Rumshisky. 2018. What’s in your embedding, and how it predicts task performance. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2690–2703.
- David Sanchez and Antonio Moreno. 2008. Learning non-taxonomic relationships from web documents for domain ontology construction. Data & Knowledge Engineering, 64(3):600–623.
- Tobias Schnabel, Igor Labutov, David Mimno, and Thorsten Joachims. 2015. Evaluation methods for unsupervised word embeddings. In Proceedings of the 2015 conference on empirical methods in natural language processing, pages 298–307.
- Grace S Shieh. 1998. A weighted Kendall’s tau statistic. Statistics & probability letters, 39(1):17–24.
- Charles Spearman. 1961. The proof and measurement of association between two things.
- Rohini K Srihari, Zhongfei Zhang, and Aibing Rao. 2000. Intelligent indexing and semantic retrieval of multimodal documents. Information Retrieval, 2(2–3):245–275.
- Keet Sugathadasa, Buddhi Ayesha, Nisansa de Silva, Amal Shehan Perera, Vindula Jayawardana, Dimuthu Lakmal, and Madhavi Perera. 2017. Synergistic union of word2vec and lexicon for domain specific semantic similarity. In 2017 IEEE International Conference on Industrial and Information Systems (ICIIS), pages 1–6. IEEE.
- Mohamed Ali Hadj Taieb, Torsten Zesch, and Mohamed Ben Aouicha. 2019. A survey of semantic relatedness evaluation datasets and procedures. Artificial Intelligence Review, pages 1–42.
- Louis L Thurstone. 1927. A law of comparative judgment. Psychological review, 34(4):273.
- Mohammed Nazim Uddin, Trong Hai Duong, Ngoc Thanh Nguyen, Xin-Min Qi, and Geun Sik Jo. 2013. Semantic similarity measures for enhancing information retrieval in folksonomies. Expert Systems with Applications, 40(5):1645–1653.
- Sebastiano Vigna. 2015. A weighted correlation index for rankings with ties. In Proceedings of the 24th international conference on World Wide Web, pages 1166–1176.
- Shengxian Wan, Yanyan Lan, Jiafeng Guo, Jun Xu, Liang Pang, and Xueqi Cheng. 2016. A deep architecture for semantic matching with multiple positional sentence representations. In Thirtieth AAAI Conference on Artificial Intelligence.
- Bin Wang, Angela Wang, Fenxiao Chen, Yuncheng Wang, and C-C Jay Kuo. 2019. Evaluating word embedding models: methods and experimental results. APSIPA Transactions on Signal and Information Processing, 8.
- Fabian Wauthier, Michael Jordan, and Nebojsa Jojic. 2013. Efficient ranking from pairwise comparisons. In International Conference on Machine Learning, pages 109–117.
- William Webber, Alistair Moffat, and Justin Zobel. 2010. A similarity measure for indefinite rankings. ACM Transactions on Information Systems (TOIS), 28(4):1–38.