
Study of Similarity Measures as Features in Classification for Answer Sentence Selection Task in Hindi Question Answering: Language-Specific v/s Other Measures.

Pacific Asia Conference on Language, Information and Computation (PACLIC) (2021)

Answer sentence selection is an important sub-task of Question Answering (QA) that identifies the correct answer sentence in a passage. The task reduces naturally to a semantic text similarity problem between the question and each answer candidate. In this work, we investigate the significance of various similarity measures for answer sentence selection in Hindi, an Indo-Aryan language. Karaka relations are the core of the dependency annotation scheme used for Hindi and are crucial to the syntactico-semantic analysis of a sentence. We investigate Karaka relations as a similarity measure and compare them with other, previously known measures. To this end, we develop a test bench over a benchmark Hindi and English multilingual QA corpus, building two tool-chains and designing empirical experiments across combinations of similarity measures, sentence embedding schemes, and supervised machine learning classifiers. Combining Karaka relations with different similarity measures yields a significant performance improvement on the sentence selection task, suggesting that they can potentially serve as a semantic similarity measure. Moreover, our results give us confidence that refining Karaka relation extraction to optimal quality will reduce the need for large pre-trained language models.
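As a rough illustration of the general idea of using similarity measures as classification features, the following minimal sketch computes two generic lexical measures (Jaccard overlap and bag-of-words cosine) between a question and a candidate sentence. The function names, the whitespace tokenizer, and the choice of measures are assumptions made for illustration only; the Karaka-based measures, sentence embeddings, and classifiers studied in the paper are not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization,
    # not a Hindi-aware tokenizer.
    return text.lower().split()


def jaccard(a_tokens, b_tokens):
    # Ratio of shared vocabulary to combined vocabulary.
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(a_tokens, b_tokens):
    # Cosine similarity between raw term-count vectors.
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def similarity_features(question, candidate):
    # One feature vector per (question, candidate) pair;
    # hypothetical helper, not taken from the paper.
    q, c = tokenize(question), tokenize(candidate)
    return [jaccard(q, c), cosine(q, c)]
```

Feature vectors of this kind, one per question and candidate pair with a binary label for whether the candidate is the answer sentence, could then be passed to any supervised classifier; further measures would simply add entries to the vector.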