A data-driven approach to studying changing vocabularies in historical newspaper collections

Simon Hengchen, Ruben Ros, Jani Marjanen, Mikko Tolonen

DIGITAL SCHOLARSHIP IN THE HUMANITIES (2021)

Abstract

Nation and nationhood are among the most frequently studied concepts in the field of intellectual history. At the same time, the word 'nation' and its historical usage are very vague. The aim in this article was to develop a data-driven method using dependency parsing and neural word embeddings to clarify some of the vagueness in the evolution of this concept. To this end, we propose the following two-step method. First, using linguistic processing, we create a large set of words pertaining to the topic of nation. Second, we train diachronic word embeddings and use them to quantify the strength of the semantic similarity between these words and thereby create meaningful clusters, which are then aligned diachronically. To illustrate the robustness of the study across languages, time spans, as well as large datasets, we apply it to the entirety of five historical newspaper archives in Dutch, Swedish, Finnish, and English. To our knowledge, thus far there have been no large-scale comparative studies of this kind that purport to grasp long-term developments in as many as four different languages in a data-driven way. A particular strength of the method we describe in this article is that, by design, it is not limited to the study of nationhood, but rather expands beyond it to other research questions and is reusable in different contexts.
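
The second step of the method can be illustrated with a minimal sketch. The abstract does not specify the clustering or alignment procedure, so everything below is an assumption made for illustration: the toy two-dimensional vectors stand in for trained diachronic embeddings, the similarity threshold of 0.8 is arbitrary, clusters are formed as connected components over cosine similarity, and clusters from two periods are aligned by shared vocabulary (Jaccard overlap). The helper names `cluster_words` and `align_clusters` are hypothetical and do not come from the article.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors, threshold=0.8):
    """Group words into connected components: two words are linked when
    their cosine similarity meets the (assumed) threshold."""
    parent = {w: w for w in vectors}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in vectors:
        groups.setdefault(find(w), set()).add(w)
    # Sort clusters by their alphabetised members for a stable order.
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def jaccard(a, b):
    """Share of vocabulary two clusters have in common."""
    return len(a & b) / len(a | b)


def align_clusters(earlier, later):
    """Pair each earlier cluster with the later cluster that shares
    the most vocabulary with it."""
    return {c: max(later, key=lambda d: jaccard(c, d)) for c in earlier}


# Toy vectors (assumed, not taken from any corpus): one space per period.
period_1 = {
    "nation": [1.0, 0.1],
    "people": [0.9, 0.2],
    "king": [0.1, 1.0],
    "crown": [0.2, 0.9],
}
period_2 = {
    "nation": [1.0, 0.0],
    "people": [0.95, 0.1],
    "state": [0.9, 0.2],
    "king": [0.0, 1.0],
    "crown": [0.1, 0.95],
}

clusters_1 = cluster_words(period_1)
clusters_2 = cluster_words(period_2)
alignment = align_clusters(clusters_1, clusters_2)

for old, new in alignment.items():
    print(sorted(old), "->", sorted(new))
```

Similarities are only computed within a single period here, so the sketch avoids comparing vectors across separately trained embedding spaces; matching clusters by shared words is one simple way to follow a cluster over time, and other alignment strategies are possible.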

Keywords

historical newspaper collections, vocabularies, data-driven
