
A studyforrest extension, an annotation of spoken language in the German dubbed movie "Forrest Gump" and its audio-description.

F1000Research (2021)

Abstract
Here we present an annotation of speech in the audio-visual movie "Forrest Gump" and its audio-description for a visually impaired audience, as an addition to a large public functional brain imaging dataset (studyforrest.org). The annotation provides the exact timing of each of the more than 2,500 spoken sentences, 16,000 words (including 202 non-speech vocalizations), and 66,000 phonemes, together with the corresponding speaker. Additionally, for every word, we provide lemmatization, a simple part-of-speech tagging (15 grammatical categories), a detailed part-of-speech tagging (43 grammatical categories), syntactic dependencies, and a semantic analysis based on word embeddings that represents each word in a 300-dimensional semantic space. To validate the dataset's quality, we build a model of hemodynamic brain activity based on information drawn from the annotation. Results suggest that the annotation's content and quality enable independent researchers to create models of brain activity correlating with a variety of linguistic aspects under conditions of near-real-life complexity.
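
As a rough illustration of how timing information like this could feed a model of hemodynamic activity, the sketch below bins word onsets into scan volumes and convolves the counts with a double-gamma response function. It is not the authors' validation model: the function names, the 2 s repetition time, the 32 s kernel length, and the response parameters (shapes 6 and 16, undershoot ratio 1/6) are all assumptions chosen for the example, and the onsets are made-up values rather than entries from the annotation.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for non-positive t."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(tr, duration=32.0):
    """Double-gamma response sampled every `tr` seconds, normalized to sum to 1.

    The shape parameters and undershoot ratio are assumed, commonly used values.
    """
    n = int(duration / tr)
    hrf = [gamma_pdf(i * tr, 6.0) - gamma_pdf(i * tr, 16.0) / 6.0 for i in range(n)]
    total = sum(hrf)
    return [h / total for h in hrf]


def word_rate_regressor(onsets, n_volumes, tr):
    """Count word onsets per volume, then convolve the counts with the HRF."""
    counts = [0.0] * n_volumes
    for onset in onsets:
        idx = int(onset // tr)
        if 0 <= idx < n_volumes:
            counts[idx] += 1.0

    hrf = double_gamma_hrf(tr)
    regressor = [0.0] * n_volumes
    for i, count in enumerate(counts):
        if count == 0.0:
            continue
        for j, h in enumerate(hrf):
            if i + j < n_volumes:
                regressor[i + j] += count * h
    return regressor


# Hypothetical word onsets in seconds (not taken from the annotation).
example_onsets = [0.5, 1.2, 4.1]
example_regressor = word_rate_regressor(example_onsets, n_volumes=20, tr=2.0)
```

The same pattern would apply to other annotated properties (for example, one regressor per speaker or per part-of-speech category), with each resulting time series serving as a column in a regression against the measured signal.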