A Platform for Creating Multimodal and Multilingual Spoken Corpora for Turkic Languages: Insights from the Spoken Turkish Corpus

(2012)

Abstract

Based on insights gained from the corpus design and corpus management work involved in compiling the Spoken Turkish Corpus (STC), this paper addresses the possibility of developing sustainable, comparable, multimodal spoken corpora to facilitate comparative studies of Turkic languages. It does so using the capacities of a digital platform that incorporates the EXMARaLDA software suite and a web-based corpus management system (STC-CMS). Together these provide an interoperable system that can be customized for creating both spoken and written corpora. Section 2 highlights the significance of multimodal corpus resources for comparative research and technology development, and describes their implementation in STC, focusing especially on its metadata parameters and on the flexibility of its transcription tools in representing cross-linguistic variation. Section 3 addresses the issue of developing a common infrastructure for corpus compilation that can facilitate data transfer between resources. The paper concludes with a brief discussion of the challenge of creating comparable spoken corpora for the Turkic languages with regard to their orthographic systems.
