Study of Similarity Measures as Features in Classification for Answer Sentence Selection Task in Hindi Question Answering: Language-Specific v/s Other Measures.

Devika Verma, Ramprasad Joshi, Shubhamkar Joshi, Onkar Susladkar

Pacific Asia Conference on Language, Information and Computation (PACLIC) (2021)


Abstract

Answer sentence selection is an important sub-task in Question Answering (QA) that identifies the correct answer sentence in a passage. This task reduces naturally to the semantic text similarity problem between a question and an answer candidate. In this work, we investigate the significance of various similarity measures for the answer sentence selection task in Hindi, an Indo-Aryan language. Karaka relations are the core of the dependency annotation scheme used for Hindi and are crucial to the syntactico-semantic analysis of a sentence. We investigate Karaka relations as a similarity measure and compare them to other, previously known measures. To this end, we develop a test-bench over a benchmark Hindi and English multilingual QA corpus, building two tool-chains and designing empirical experiments across combinations of similarity measures, sentence embedding schemes, and supervised machine learning models for classification. Combining Karaka relations with different similarity measures yields a significant performance improvement on the sentence selection task, suggesting that they could serve as a semantic similarity measure. Moreover, our results give us confidence that refining Karaka relation extraction to optimal quality will reduce the need for large pre-trained language models.
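
The idea of using similarity measures as classification features can be illustrated with a minimal sketch. It is not the authors' tool-chain: the two measures (Jaccard overlap and bag-of-words cosine), the whitespace tokenizer, and all function names are assumptions chosen for illustration. Karaka-relation and embedding-based measures, which would need a dependency parser and a sentence encoder, are omitted. Each question and candidate pair is mapped to a vector of similarity scores that a supervised classifier could then consume.

```python
import math
from collections import Counter


def jaccard(a_tokens, b_tokens):
    """Jaccard overlap between the two token sets."""
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a_tokens, b_tokens):
    """Cosine similarity between bag-of-words count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def similarity_features(question, candidate):
    """One feature vector per (question, candidate) pair.

    Hypothetical feature set: [jaccard, cosine]. Whitespace tokenization
    is a simplifying assumption, not a Hindi-aware tokenizer.
    """
    q, c = question.lower().split(), candidate.lower().split()
    return [jaccard(q, c), cosine(q, c)]


def feature_matrix(question, candidates):
    """Stack feature vectors for every candidate sentence in a passage."""
    return [similarity_features(question, s) for s in candidates]
```

Rows of the resulting matrix, paired with binary labels marking the correct answer sentence, would form the training data for whichever supervised classifier is being compared.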
