Exploring language relations through syntactic distances and geographic proximity
CoRR (2024)
Abstract
Languages are grouped into families that share common linguistic traits.
While this approach has been successful in understanding genetic relations
between diverse languages, more analyses are needed to accurately quantify
their relatedness, especially in less studied linguistic levels such as syntax.
Here, we explore linguistic distances using sequences of part-of-speech (POS)
tags extracted from the Universal Dependencies dataset. Within an
information-theoretic framework, we show that employing POS trigrams maximizes
the possibility of capturing syntactic variations while being at the same time
compatible with the amount of available data. Linguistic connections are then
established by assessing pairwise distances based on the POS distributions.
Intriguingly, our analysis reveals definite clusters that correspond to
well-known language families and groups, with exceptions explained by distinct
morphological typologies. Furthermore, we obtain a significant correlation
between language similarity and geographic distance, which underscores the
influence of spatial proximity on language kinships.
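
As a rough illustration of the pipeline the abstract describes (POS trigram distributions per language, then pairwise distances between them), a minimal sketch might look like the following. The abstract does not name the distance measure, so the Jensen-Shannon distance used here is an assumption chosen only because it is a common information-theoretic choice; the function names and the toy tag sequences are hypothetical and do not come from the paper or from Universal Dependencies data.

```python
from collections import Counter
from math import log2, sqrt


def trigram_distribution(sentences):
    """Relative frequencies of POS trigrams over a list of tag sequences."""
    counts = Counter()
    for tags in sentences:
        # Slide a window of three consecutive tags within each sentence.
        for i in range(len(tags) - 2):
            counts[tuple(tags[i:i + 3])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {trigram: n / total for trigram, n in counts.items()}


def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logarithm), bounded in [0, 1].

    Assumption: the paper's actual distance measure may differ.
    """
    divergence = 0.0
    for key in set(p) | set(q):
        pi, qi = p.get(key, 0.0), q.get(key, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution
        if pi > 0:
            divergence += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * log2(qi / mi)
    return sqrt(max(divergence, 0.0))


# Hypothetical toy corpora: each sentence is a list of POS tags.
lang_a = [["DET", "NOUN", "VERB", "DET", "NOUN"], ["PRON", "VERB", "DET", "NOUN"]]
lang_b = [["NOUN", "DET", "VERB", "NOUN", "DET"]]

dist_a = trigram_distribution(lang_a)
dist_b = trigram_distribution(lang_b)
distance_ab = js_distance(dist_a, dist_b)
```

Computing `js_distance` for every pair of languages would give a distance matrix that could then be clustered or compared against geographic distances; those later steps are not sketched here.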