Do self-supervised speech and language models extract similar representations as the human brain?
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract
Speech and language models trained through self-supervised learning (SSL)
demonstrate strong alignment with brain activity during speech and language
perception. However, given their distinct training modalities, it remains
unclear whether they correlate with the same neural aspects. We directly
address this question by evaluating the brain prediction performance of two
representative SSL models, Wav2Vec2.0 and GPT-2, designed for speech and
language tasks. Our findings reveal that both models accurately predict speech
responses in the auditory cortex, with a significant correlation between their
brain predictions. Notably, shared speech contextual information between
Wav2Vec2.0 and GPT-2 accounts for the majority of explained variance in brain
activity, surpassing static semantic and lower-level acoustic-phonetic
information. These results underscore the convergence of speech contextual
representations in SSL models and their alignment with the neural network
underlying speech perception, offering valuable insights into both SSL models
and the neural basis of speech and language processing.
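The "brain prediction performance" described above is commonly measured with a linear encoding model: model-layer features are regressed onto recorded neural responses (here, ECoG), and performance is the held-out correlation between predicted and actual activity. The sketch below illustrates this pipeline on synthetic data; the dimensions, ridge penalty, and simulated responses are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): T stimulus time points,
# D model-feature dimensions, E simulated electrodes.
T, D, E = 500, 64, 8
X = rng.standard_normal((T, D))               # e.g. SSL-model layer activations
W_true = rng.standard_normal((D, E))          # unknown ground-truth mapping
Y = X @ W_true + rng.standard_normal((T, E))  # noisy simulated neural responses

# Split into train/test and fit ridge regression in closed form:
# W_hat = (X'X + alpha*I)^-1 X'Y
X_tr, X_te, Y_tr, Y_te = X[:400], X[400:], Y[:400], Y[400:]
alpha = 1.0
W_hat = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(D), X_tr.T @ Y_tr)
Y_pred = X_te @ W_hat

# Encoding score: Pearson r between predicted and held-out responses,
# computed per electrode.
r = np.array([np.corrcoef(Y_te[:, e], Y_pred[:, e])[0, 1] for e in range(E)])
print(r.mean())
```

Comparing such scores across models (e.g. Wav2Vec2.0 vs. GPT-2 features) and partitioning the explained variance into shared and unique components is the kind of analysis the abstract summarizes.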
Keywords
self-supervised model, speech perception, auditory cortex, brain encoding, electrocorticography