Incorporating noun compounds in distributional-based semantic representation approaches for measuring semantic relatedness.

Abdulgabbar Saif, Nazlia Omar, Ummi Zakiah Zainodin

IJRIS (2019)

Abstract

Identifying noun compounds in natural language documents is important for handling their linguistic features, including their semantic, syntactic, and pragmatic properties. In this study, we introduce a knowledge-based method for incorporating noun compounds into distributional-based semantic representation approaches. Wikipedia is exploited as a knowledge resource for extracting noun compounds based on its structural features, and Wikipedia's categories are then used to classify the extracted noun compounds as either linguistic terms or named entities. Next, a look-up list technique is employed to identify these noun compounds when extracting the semantics of terms with a corpus-based approach to semantic representation. To obtain the semantic representation, we use five well-known distributional-based approaches: latent semantic analysis (LSA), hyperspace analogue to language (HAL), correlated occurrence analogue to lexical semantics (COALS), bound encoding of the aggregate language environment (BEAGLE), and explicit semantic analysis (ESA). The proposed method was evaluated by measuring semantic relatedness on five benchmark datasets used in previous studies. The experimental results demonstrate that incorporating noun compounds into distributional-based semantic representations helps to improve the semantic evidence for the relationships among words.
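The look-up list step can be illustrated with a minimal sketch. It assumes the noun-compound list has already been extracted; here a small hand-written set stands in for the Wikipedia-derived list. The greedy longest-match strategy, the `max_len=4` span limit, and the function names `merge_compounds`, `cooccurrence_vectors`, and `cosine` are all hypothetical choices for illustration, not the authors' implementation. The co-occurrence step is a simplified sliding-window count in the spirit of HAL-style models, not a faithful reproduction of LSA, HAL, COALS, BEAGLE, or ESA.

```python
from collections import Counter, defaultdict
from math import sqrt


def merge_compounds(tokens, lookup, max_len=4):
    """Greedy longest-match: replace any token span found in `lookup` with one joined token.

    `lookup` is a hypothetical set of lowercase, space-separated noun compounds;
    `max_len` (an assumed limit) caps how many tokens a compound may span.
    """
    out = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first so "new york stock exchange"
        # wins over the shorter "new york".
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            cand = " ".join(tokens[i:i + n])
            if cand in lookup:
                match = (cand, n)
                break
        if match:
            out.append(match[0].replace(" ", "_"))  # treat the compound as one term
            i += match[1]
        else:
            out.append(tokens[i])
            i += 1
    return out


def cooccurrence_vectors(sentences, window=2):
    """Count neighbours within +/- `window` tokens (a simplified HAL-like stand-in)."""
    vecs = defaultdict(Counter)
    for toks in sentences:
        for i, w in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][toks[j]] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    lookup = {"new york", "stock exchange", "new york stock exchange"}  # hypothetical list
    tokens = "shares fell on the new york stock exchange today".split()
    merged = merge_compounds(tokens, lookup)
    print(merged)
    # ['shares', 'fell', 'on', 'the', 'new_york_stock_exchange', 'today']
```

Merging compounds before counting means the multiword term receives its own context vector instead of having its evidence split across "new", "york", "stock", and "exchange". That separate vector is the effect the abstract attributes to incorporating noun compounds.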