In view of the widespread use of these applications, we propose a methodology to construct appropriate domain-specific datasets and metrics to assess the accuracy of relatedness and similarity estimations.

Top Rank Focused Adaptive Vote Collection for the Evaluation of Domain Specific Semantic Models

EMNLP 2020, pp.3081-3093, (2020)


Abstract

The growth of domain-specific applications of semantic models, boosted by the recent achievements of unsupervised embedding learning algorithms, demands domain-specific evaluation datasets. In many cases, content-based recommenders being a prime example, these models are required to rank words or texts according to their semantic relatedn...

Results
  • The authors estimated the accuracy of a data collection approach by comparing, via the metrics defined in Section 3, the ranking that it produces with the underlying theoretical ranks.
  • The results of the simulations are presented in Table 2, which contains, as measures of the accuracy of the proposed approaches, the ρw and τw coefficients defined in Equation 8 and discussed in Section 3; in order to check the overall rank accuracy, the authors report the standard Spearman’s ρ and Kendall’s τ coefficients.
  • Compared with the uniform approach, the adaptive approach yields a relevant increase in both ρw and τw for every underlying similarity distribution considered, with no relevant change in the overall rank precision measured by ρ and τ.
  • The results suggest that the proposed stochastic model is robust to changes in the underlying similarity distribution.
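
As a point of reference for the overall-rank coefficients mentioned above, here is a minimal sketch of the standard (unweighted) Spearman's ρ and Kendall's τ for rankings without ties. The weighted coefficients ρw and τw are defined in Equation 8 of the paper, which is not reproduced on this page, so they are deliberately not implemented here. The function names and the toy rankings are illustrative assumptions.

```python
from itertools import combinations

def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings of the same items (no ties assumed)."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(rank_a, rank_b):
    """Kendall's tau: (concordant - discordant) / total pairs (no ties assumed)."""
    n = len(rank_a)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        s += (prod > 0) - (prod < 0)  # +1 if concordant, -1 if discordant
    return 2 * s / (n * (n - 1))

# Hypothetical data: the true ranking and two estimates with one adjacent swap each.
true_rank = [1, 2, 3, 4, 5]
swap_top = [2, 1, 3, 4, 5]      # error among the top-ranked items
swap_bottom = [1, 2, 3, 5, 4]   # error among the bottom-ranked items

print(spearman_rho(true_rank, swap_top), kendall_tau(true_rank, swap_top))
print(spearman_rho(true_rank, swap_bottom), kendall_tau(true_rank, swap_bottom))
```

Both estimates receive identical unweighted scores even though one error sits at the top of the ranking and the other at the bottom. This is the kind of insensitivity that a top-rank-focused weighting is meant to address.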
Conclusion
  • The authors provided a protocol for constructing a dataset, based on adaptive pairwise comparisons and tailored to the available resources, which can be used to test or validate any relatedness-based domain-specific semantic model and which is optimized to be accurate in top-rank evaluation.
  • The authors defined a stochastic transitivity model to simulate semantic-driven pairwise comparisons, which allows tuning the parameters of the data collection approach and which confirmed a significant increase in the performance metrics ρw and τw of the proposed adaptive approach compared with the uniform approach.
  • Additional future investigations may include a deeper analysis of the mathematical and statistical properties of the weighted coefficients ρw and τw, as well as a rigorous derivation of the optimal values for the parameters of the data collection approach.
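
To make the idea of adaptive vote collection more concrete, the following is a toy sketch and not the authors' protocol or their stochastic transitivity model. It assumes a Bradley-Terry-style logistic choice rule (swapped in here purely for illustration), hypothetical latent scores, and a simple rule that keeps a fraction `alpha` of the items with the most wins after each ballot, so that later votes concentrate on top-ranked items.

```python
import math
import random

def adaptive_votes(scores, n_ballots, votes_per_ballot, alpha, seed=0):
    """Toy adaptive collection: returns (surviving item indices, win counts)."""
    rng = random.Random(seed)
    survivors = list(range(len(scores)))
    wins = [0] * len(scores)
    for ballot in range(n_ballots):
        for _ in range(votes_per_ballot):
            a, b = rng.sample(survivors, 2)  # two distinct surviving items
            # Assumed logistic choice rule: higher latent score wins more often.
            p_a = 1.0 / (1.0 + math.exp(scores[b] - scores[a]))
            winner = a if rng.random() < p_a else b
            wins[winner] += 1
        if ballot < n_ballots - 1:
            # Keep a fraction alpha of survivors (at least two) for the next ballot.
            keep = max(2, math.ceil(alpha * len(survivors)))
            survivors = sorted(survivors, key=lambda i: wins[i], reverse=True)[:keep]
    return survivors, wins

# Hypothetical setup: 20 items with latent scores 0.0, 0.2, ..., 3.8.
latent = [0.2 * i for i in range(20)]
survivors, wins = adaptive_votes(latent, n_ballots=3, votes_per_ballot=200, alpha=0.5)
print(len(survivors), sum(wins))
```

With these settings the survivor pool shrinks from 20 to 10 to 5 items, while the total number of votes stays fixed at 600, so each surviving item receives more votes in later ballots.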
Tables
  • Table 1: Most commonly used symbols
  • Table 2: Mean ± standard deviation (unbiased estimation over 50 simulations of relatedness-driven comparisons, as described in text) for ρw and τw metrics, defined in Equation 8, and for Spearman’s ρ and Kendall’s τ coefficients
Funding
  • A token can be considered rare within a particular domain if its frequency in a corpus of domain-specific texts is, e.g., lower than 10% of the average token frequency in the corpus.
  • As the purpose of the adaptive approach is to focus votes on top rank items, a reasonable request is to have no more than 10% of items surviving up to the last ballot, which gives an upper bound α ≤ (0.1)^(1/(n_b − 1)).
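
The bound on α can be checked numerically. The sketch below assumes that a fraction α of items survives each of the n_b − 1 eliminations, so the fraction reaching the last ballot is α^(n_b − 1); the function name and the example values of n_b are illustrative.

```python
def alpha_upper_bound(n_ballots, final_fraction=0.1):
    """Largest per-ballot survival fraction alpha such that
    alpha ** (n_ballots - 1) <= final_fraction."""
    if n_ballots < 2:
        raise ValueError("at least two ballots are needed for an elimination step")
    return final_fraction ** (1.0 / (n_ballots - 1))

for n_b in (2, 3, 5):
    bound = alpha_upper_bound(n_b)
    # Raising the bound back to the power n_b - 1 recovers the 10% target.
    print(n_b, round(bound, 4), round(bound ** (n_b - 1), 4))
```

The more ballots are available, the closer α can be to 1, since the same 10% target is reached through a larger number of milder eliminations.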
Study subjects and analysis
pairs: 990
For this reason, we rely on the standard pairwise comparison: we generate Ncomp pairs of items (as described in Sections 2.4 and 2.5), each one to be presented to one voter, who is requested to identify the item formed by the most similar tokens. In more detail, 990 pairs of distinct tokens (associated with 45 tokens) have been considered within the semantic area Sales & Marketing.
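
The figure of 990 items is consistent with taking all unordered pairs of 45 distinct tokens, since 45 × 44 / 2 = 990. The sketch below verifies this and draws comparisons between items uniformly at random. The token names are placeholders, and the uniform sampling rule is an assumption made for illustration, because the selection procedures of Sections 2.4 and 2.5 are not reproduced on this page.

```python
import random
from itertools import combinations

# Placeholder token names; the actual Sales & Marketing tokens are not listed here.
tokens = [f"token_{i}" for i in range(45)]

# Each item is an unordered pair of distinct tokens.
items = list(combinations(tokens, 2))
print(len(items))  # 45 * 44 / 2 = 990

def sample_comparisons(items, n_comp, seed=0):
    """Draw n_comp pairs of distinct items uniformly at random (assumed rule)."""
    rng = random.Random(seed)
    return [tuple(rng.sample(items, 2)) for _ in range(n_comp)]

# Each sampled comparison would be shown to one voter, who picks the item
# whose two tokens are more similar.
comparisons = sample_comparisons(items, n_comp=100)
```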

Author
Pierangelo Lombardo
Alessio Boiardi
Luca Colombo
Angelo Schiavone
Nicolò Tamagnone