Interpretable multi-timescale models for predicting fMRI responses to continuous natural speech

NeurIPS 2020


Abstract

Natural language contains information at multiple timescales. To understand how the human brain represents this information, one approach is to build encoding models that predict fMRI responses to natural language using representations extracted from neural network language models (LMs). However, these LM-derived representations do not explicitly separate information at different timescales […]

Introduction
  • Natural language contains information at multiple timescales, ranging from phonemes to narratives [1].
  • Early processing stages represent acoustic and word-level information at sub-second scales, while at later stages information is combined over many seconds to derive meaning [3, 4]
  • These representations have been studied by presenting language stimuli scrambled at different temporal scales [2]
  • Short timescale areas respond to all these stimuli, while long timescale areas respond weakly to language scrambled at the word or sentence scale
  • This method has been effective at revealing where these representations are in the brain, but is unable to reveal how these representations are computed or what information is contained in each representation beyond its timescale
Highlights
  • Natural language contains information at multiple timescales, ranging from phonemes to narratives [1]
  • The encoding model transform g(S) can be effectively modeled by long short-term memory (LSTM) hidden-state representations [8]. We directly extend this approach by building interpretable, multi-timescale LSTM language models (LMs) as the basis for encoding models, facilitating a principled analysis of the cortical temporal hierarchy for language
  • Previous work that uses scrambling experiments can only make coarse distinctions in the voxel timescale T_v, based on how a voxel responds to stimuli scrambled at different temporal scales [2]
  • To create the encoding model features, representations are extracted from a modified LSTM LM with an explicitly fixed timescale for each hidden unit (see the sketch after this list)
  • We show that the new method assigns timescales more accurately than previous methods that relied on varying the length of the context provided to the LSTM LM
  • Our work creates a framework for developing interpretable, LM-based encoding models that can be used to formulate and test new hypotheses about timescale representations in the brain
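
Following the paper's refs [10, 11], a unit's timescale can be pinned through its forget gate: a unit with forget gate f = sigmoid(b_f) decays with characteristic timescale T ≈ -1/log(f), so fixing b_f = log(T - 1) gives f = (T - 1)/T ≈ exp(-1/T). Below is a minimal PyTorch sketch of this idea; the class name, the example timescale spacing, and the gradient-hook freezing are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class MultiTimescaleLSTM(nn.Module):
    """LSTM whose input/forget-gate biases are fixed so that each hidden
    unit integrates information over an assigned timescale T (in tokens)."""

    def __init__(self, input_size: int, timescales: list):
        super().__init__()
        hidden_size = len(timescales)
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        T = torch.as_tensor(timescales, dtype=torch.float32)  # require T > 1
        b_f = torch.log(T - 1.0)  # sigmoid(b_f) = (T-1)/T ≈ exp(-1/T)
        with torch.no_grad():
            # PyTorch packs LSTM biases by gate in the order (i, f, g, o).
            self.lstm.bias_ih_l0[hidden_size:2 * hidden_size] = b_f
            self.lstm.bias_hh_l0[hidden_size:2 * hidden_size] = 0.0
            self.lstm.bias_ih_l0[:hidden_size] = -b_f  # chrono-style input gate
            self.lstm.bias_hh_l0[:hidden_size] = 0.0
        # Zero the gradients on the fixed bias entries so training never moves them.
        self.lstm.bias_ih_l0.register_hook(self._freeze)
        self.lstm.bias_hh_l0.register_hook(self._freeze)

    def _freeze(self, grad: torch.Tensor) -> torch.Tensor:
        grad = grad.clone()
        grad[:2 * self.hidden_size] = 0.0  # input- and forget-gate biases
        return grad

    def forward(self, x, state=None):
        # Hidden states (outputs) serve as the encoding-model features.
        return self.lstm(x, state)

# Example: half the units at a short timescale, half log-spaced up to 64 words.
timescales = [2.0] * 64 + torch.logspace(1, 6, steps=64, base=2.0).tolist()
layer = MultiTimescaleLSTM(input_size=256, timescales=timescales)
```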
Results
  • The multi-timescale (MT) encoding model uses a stateful, multi-timescale LSTM followed by radial basis function (RBF) interpolation to model g(S).
  • Multi-timescale language encoding models represent stimuli in a high-dimensional, densely sampled temporal feature space.
  • This facilitates a principled and fine-grained estimation of the voxel timescale T_v, as described in Section 5 (a toy reduction is sketched after this list).
  • In the precuneus (Pr), the authors observe a medium-to-long timescale gradient along a ventral-to-dorsal axis
  • These patterns are broadly in agreement with prior findings [2], but offer additional fine-grained detail
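
As a toy reduction of such an estimate (an illustrative assumption, not necessarily the exact Section 5 procedure), a voxel's timescale can be read off the fitted encoding weights: each LSTM unit has a known assigned timescale, so T_v can be taken as the geometric mean of unit timescales weighted by each unit's share of the voxel's absolute weight mass.

```python
import numpy as np

def estimate_voxel_timescales(W: np.ndarray, unit_timescales: np.ndarray) -> np.ndarray:
    """Toy voxel-timescale estimate from encoding weights.

    W               : (n_units, n_voxels) regression weights mapping
                      multi-timescale LSTM features to fMRI responses.
    unit_timescales : (n_units,) timescale assigned to each LSTM unit.
    Returns T_v     : (n_voxels,) per-voxel timescale estimates.
    """
    mass = np.abs(W)
    mass = mass / (mass.sum(axis=0, keepdims=True) + 1e-12)  # weight share per unit
    log_T = np.log(unit_timescales)[:, None]                 # (n_units, 1)
    return np.exp((mass * log_T).sum(axis=0))                # weighted geometric mean
```

A voxel that loads mostly on long-timescale units thus receives a long T_v, which is what the flatmaps described below visualize.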
Conclusion
  • This work presents a multi-timescale encoding model for predicting fMRI responses to natural speech.
  • To create the encoding model features, representations are extracted from a modified LSTM LM with explicitly fixed timescales for each hidden unit.
  • The authors' new interpolation method improved timescale estimates across a variety of brain regions
  • This can be generalized for use in any encoding model study investigating timescale representations in the brain, and is not specific to natural language or LSTMs. The authors' work illustrates that improving the interpretability of neural networks can yield more interpretable formal models of the brain
Objectives
  • The authors' goal is to obtain B_r, the downsampled feature response, where r ∈ {1, 2, …, n_TR} is the corresponding volume index in the fMRI acquisition, as sketched below.
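
A minimal sketch of how B_r can be produced from word-rate LSTM features via RBF interpolation, as named in the Results; the Gaussian kernel form, the width sigma, and the row normalization are illustrative assumptions (the paper's methods define the exact kernel).

```python
import numpy as np

def rbf_downsample(word_feats: np.ndarray, word_times: np.ndarray,
                   tr_times: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Downsample word-rate features to fMRI volume times with a Gaussian RBF.

    word_feats : (n_words, n_units) hidden states, one row per stimulus word
    word_times : (n_words,) time (s) at which each word was spoken
    tr_times   : (n_TR,)   acquisition time (s) of each fMRI volume
    Returns B  : (n_TR, n_units), the downsampled feature response B_r.
    """
    # (n_TR, n_words) kernel: how much each word contributes to each volume.
    d = tr_times[:, None] - word_times[None, :]
    K = np.exp(-0.5 * (d / sigma) ** 2)
    K = K / (K.sum(axis=1, keepdims=True) + 1e-12)  # normalize contributions
    return K @ word_feats
```

In encoding pipelines of this kind, B is then typically duplicated at several temporal delays to absorb the hemodynamic response before fitting the regression.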
Study subjects and analysis
human subjects: 6
2.1 Natural speech fMRI experiment. To build encoding models for language, we used data from an fMRI experiment in which 6 human subjects (3 female) listened to spoken narrative stories from The Moth Radio Hour (an English-language podcast) [13]. These rich, complex stimuli are highly representative of the language that humans encounter daily.

subjects: 6
Figure captions:
  • Estimated timescale T_v across cortex for significant voxels (p < 0.05, FDR-corrected) in the MT model. Voxels with longer estimated timescales are shown in blue and those with shorter timescales in red on the flatmap; non-significantly predicted voxels are gray. The right inset shows a schematic of timescale estimates from previous work [2] based on stimulus scrambling: our approach corroborates the patterns highlighted in the inset, but provides a more detailed account of timescales across cortex. The bottom inset shows a histogram of voxel correlations and the significance threshold. AC: auditory cortex; Pr: precuneus; PFC: prefrontal cortex. Similar maps for other subjects are shown in Supplementary Section 1.
  • Comparison of estimated timescales T_v across two temporal downsampling schemes, both using stateful, multi-timescale LSTMs (layer 2) to model g(S): one downsamples representations by interpolating with an RBF kernel, while the δ-sum model sums across δ-functions. Histograms show the distributions of T_v for significant voxels in each ROI across all 6 subjects; the black vertical line marks the mean T_v across cortex for each scheme. The flatmap and inflated 3D brain are for a single subject. In AC, δ-sum inaccurately assigns more medium-to-long timescales; in the precuneus and PFC, it overrepresents short timescales. This highlights the drawbacks of the δ-sum approach for creating encoding features that operate at different timescales; in contrast, the RBF-interpolated model estimates timescales appropriately in all brain regions. Colormap as in the first caption.
  • Estimating timescale by manipulating the context length (CL) is a less interpretable method. A stateless LSTM was used to create encoding models for CLs of 0, 2, 4, 8, 16, 32, and 64, and the timescale of each voxel was estimated with a CL preference index [8]. Only voxels significant in all CL models are shown. In AC, some voxels have long CL preferences, but further analysis reveals that long-CL representations still retain short-timescale information. Similar maps for other subjects are shown in Supplementary Section 1.
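The captions threshold voxels at p < 0.05 with FDR correction [16]. Below is a minimal sketch of the standard Benjamini-Hochberg step-up procedure that such a threshold implies; how the per-voxel p-values are obtained (e.g., by permutation) is outside this snippet.

```python
import numpy as np

def fdr_mask(pvals: np.ndarray, q: float = 0.05) -> np.ndarray:
    """Benjamini-Hochberg FDR control [16]: boolean mask of voxels whose
    p-values survive at false-discovery rate q."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)                   # sort p-values ascending
    thresh = q * np.arange(1, m + 1) / m    # BH step-up line q*k/m
    passed = p[order] <= thresh
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True                  # reject all up to the largest pass
    return mask
```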

References
  • [1] Henry W. Lin and Max Tegmark. Critical behavior in physics and probabilistic formal languages. Entropy, 19(7):299, 2017.
  • [2] Yulia Lerner, Christopher J. Honey, Lauren J. Silbert, and Uri Hasson. Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. Journal of Neuroscience, 31(8):2906–2915, 2011.
  • [3] Christopher J. Honey, Thomas Thesen, Tobias H. Donner, Lauren J. Silbert, Chad E. Carlson, Orrin Devinsky, Werner K. Doyle, Nava Rubin, David J. Heeger, and Uri Hasson. Slow cortical dynamics and the accumulation of information over long timescales. Neuron, 76(2):423–434, 2012.
  • [4] Wendy A. de Heer, Alexander G. Huth, Thomas L. Griffiths, Jack L. Gallant, and Frédéric E. Theunissen. The hierarchical cortical organization of human speech processing. Journal of Neuroscience, 37(27):6539–6557, 2017.
  • [5] Tom M. Mitchell, Svetlana V. Shinkareva, Andrew Carlson, Kai-Min Chang, Vicente L. Malave, Robert A. Mason, and Marcel Adam Just. Predicting human brain activity associated with the meanings of nouns. Science, 320(5880):1191–1195, 2008.
  • [6] Leila Wehbe, Brian Murphy, Partha Talukdar, Alona Fyshe, Aaditya Ramdas, and Tom Mitchell. Simultaneously uncovering the patterns of brain regions involved in different story reading subprocesses. PLOS ONE, 9(11):1–19, 2014.
  • [7] Alexander G. Huth, Wendy A. de Heer, Thomas L. Griffiths, Frédéric E. Theunissen, and Jack L. Gallant. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532(7600):453–458, 2016.
  • [8] Shailee Jain and Alexander Huth. Incorporating context into language encoding models for fMRI. In Advances in Neural Information Processing Systems 31, pages 6628–6637. Curran Associates, Inc., 2018.
  • [9] Mariya Toneva and Leila Wehbe. Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain). In Advances in Neural Information Processing Systems 32, pages 14954–14964. Curran Associates, Inc., 2019.
  • [10] Corentin Tallec and Yann Ollivier. Can recurrent neural networks warp time? In Proceedings of the 6th International Conference on Learning Representations, 2018.
  • [11] Shivangi Mahto, Vy A. Vo, Javier S. Turek, and Alexander G. Huth. Multi-timescale representation learning in LSTM language models. Preprint, 2020.
  • [12] Shinji Nishimoto, An T. Vu, Thomas Naselaris, Yuval Benjamini, Bin Yu, and Jack L. Gallant. Reconstructing visual experiences from brain activity evoked by natural movies. Current Biology, 21(19):1641–1646, 2011.
  • [13] The Moth Radio Hour. https://themoth.org, 2020.
  • [14] Jiahong Yuan and Mark Liberman. Speaker identification on the SCOTUS corpus. Journal of the Acoustical Society of America, 123(5):3878, 2008.
  • [15] Michael C.-K. Wu, Stephen V. David, and Jack L. Gallant. Complete functional characterization of sensory neurons by system identification. Annual Review of Neuroscience, 29(1):477–505, 2006.
  • [16] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995.
  • [17] Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In Proceedings of NAACL-HLT 2018 (Volume 1: Long Papers), pages 2227–2237, 2018.
  • [18] Matthew Peters, Waleed Ammar, Chandra Bhagavatula, and Russell Power. Semi-supervised sequence tagging with bidirectional language models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1756–1765, 2017.
  • [19] Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. Learned in translation: contextualized word vectors. In Advances in Neural Information Processing Systems 30, pages 6294–6305. Curran Associates, Inc., 2017.
  • [20] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. Technical report, OpenAI, 2018.
  • [21] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. Technical report, OpenAI, 2019.
  • [22] Stephen Merity, Nitish Shirish Keskar, and Richard Socher. Regularizing and optimizing LSTM language models. In Proceedings of the 6th International Conference on Learning Representations, 2018.
  • [23] Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models. In Proceedings of the 5th International Conference on Learning Representations, 2017.
  • [24] Hsiang-Yun Sherry Chien and Christopher J. Honey. Constructing and forgetting temporal context in the human cerebral cortex. Neuron, 106(4):675–686, 2020.
  • [25] Greg J. Stephens, Christopher J. Honey, and Uri Hasson. A place for time: the spatiotemporal structure of neural dynamics during natural audition. Journal of Neurophysiology, 110(9):2019–2026, 2013.
  • [26] Julia M. Huntenburg, Pierre-Louis Bazin, and Daniel S. Margulies. Large-scale gradients in human cortical organization. Trends in Cognitive Sciences, 22(1):21–31, 2018.
  • [27] Biyu J. He, John M. Zempel, Abraham Z. Snyder, and Marcus E. Raichle. The temporal structures and functional significance of scale-free brain activity. Neuron, 66(3):353–369, 2010.
  • [28] John D. Murray, Alberto Bernacchia, David J. Freedman, Ranulfo Romo, Jonathan D. Wallis, Xinying Cai, Camillo Padoa-Schioppa, Tatiana Pasternak, Hyojung Seo, Daeyeol Lee, et al. A hierarchy of intrinsic timescales across primate cortex. Nature Neuroscience, 17(12):1661–1663, 2014.
  • [29] Rishidev Chaudhuri, Kenneth Knoblauch, Marie-Alice Gariel, Henry Kennedy, and Xiao-Jing Wang. A large-scale circuit mechanism for hierarchical dynamical processing in the primate cortex. Neuron, 88(2):419–431, 2015.
  • [30] Mehran Spitmaan, Hyojung Seo, Daeyeol Lee, and Alireza Soltani. Multiple timescales of neural dynamics and integration of task-relevant signals across cortex. Proceedings of the National Academy of Sciences, 117(36):22522–22531, 2020.
Authors
Shailee Jain
Vy Vo
Shivangi Mahto
Amanda LeBel