Decision Letter: Learning Cortical Representations Through Perturbed and Adversarial Dreaming

crossref (2022)


Abstract

Humans and other animals learn to extract general concepts from sensory experience without extensive teaching. This ability is thought to be facilitated by offline states like sleep where previous experiences are systemically replayed. However, the characteristic creative nature of dreams suggests that learning semantic representations may go beyond merely replaying previous experiences. We support this hypothesis by implementing a cortical architecture inspired by generative adversarial networks (GANs). Learning in our model is organized across three different global brain states mimicking wakefulness, non-rapid eye movement (NREM), and REM sleep, optimizing different, but complementary, objective functions. We train the model on standard datasets of natural images and evaluate the quality of the learned representations. Our results suggest that generating new, virtual sensory inputs via adversarial dreaming during REM sleep is essential for extracting semantic concepts, while replaying episodic memories via perturbed dreaming during NREM sleep improves the robustness of latent representations. The model provides a new computational perspective on sleep states, memory replay, and dreams, and suggests a cortical implementation of GANs.

Editor's evaluation

This paper presents a generative adversarial network-inspired model of how learning during wakefulness, non-rapid eye movement (NREM), and REM sleep work together to facilitate the emergence of object category representations. The model is impressive in its ability to shape representations based on internally generated activity that does not directly recapitulate prior experience, and has properties that correspond to replay and dreams in NREM and REM sleep.
The model makes predictions that can be tested in sleep experiments in humans and animals.

https://doi.org/10.7554/eLife.76384.sa0

Introduction

After just a single night of bad sleep, we are acutely aware of the importance of sleep for orderly body and brain function. In fact, it has become clear that sleep serves multiple crucial physiological functions (Siegel, 2009; Xie et al., 2013), and growing evidence highlights its impact on cognitive processes (Walker, 2009). Yet, a lot remains unknown about the precise contribution of sleep, and in particular dreams, to normal brain function.

One remarkable cognitive ability of humans and other animals lies in the extraction of general concepts and statistical regularities from sensory experience without extensive teaching (Bergelson and Swingley, 2012). Such regularities in the sensorium are reflected on the neuronal level in invariant object-specific representations in high-level areas of the visual cortex (Grill-Spector et al., 2001; Hung et al., 2005; DiCarlo et al., 2012) on which downstream areas can operate. These so-called semantic representations are progressively constructed and enriched over an organism’s lifetime (Tenenbaum et al., 2011; Yee et al., 2013), and their emergence is hypothesized to be facilitated by offline states such as sleep (Dudai et al., 2015).

Previously, several cortical models have been proposed to explain how offline states could contribute to the emergence of high-level, semantic representations. Stochastic hierarchical models that learn to maximize the likelihood of observed data under a generative model such as the Helmholtz machine (Dayan et al., 1995) and the closely related Wake–Sleep algorithm (Hinton et al., 1995; Bornschein and Bengio, 2015) have demonstrated the potential of combining online and offline states to learn semantic representations.
However, these models do not leverage offline states to improve their generative model but are explicitly trained to reproduce sensory inputs during wakefulness. In contrast, most dreams during REM sleep exhibit realistic imagery beyond past sensory experience (Fosse et al., 2003; Nir and Tononi, 2010; Wamsley, 2014), suggesting learning principles that go beyond mere reconstructions. In parallel, cognitive models inspired by psychological studies of sleep proposed a ‘trace transformation theory’ where semantic knowledge is actively extracted in the cortex from replayed hippocampal episodic memories (Nadel and Moscovitch, 1997; Winocur et al., 2010; Lewis and Durrant, 2011). However, these models lack a mechanistic implementation compatible with cortical structures and only consider the replay of waking activity during sleep. Recently, implicit generative models that do not explicitly try to reconstruct observed sensory inputs, and in particular generative adversarial networks (GANs; Goodfellow et al., 2014), have been successfully applied in machine learning to generate new but realistic data from random patterns. This ability has been shown to be accompanied by the learning of disentangled and semantically meaningful representations (Radford et al., 2015; Donahue et al., 2016; Liu et al., 2021). They thus may provide computational principles for learning cortical semantic representations during offline states by generating previously unobserved sensory content as reported from dream experiences. Most dreams experienced during rapid eye movement (REM) sleep only incorporate fragments of previous waking experience, often intermingled with past memories (Schwartz, 2003). Surprisingly, such random combinations of memory fragments often result in visual experiences that are perceived as highly structured and realistic by the dreamer. 
The striking similarity between the inner world of dreams and the external world of wakefulness suggests that the brain actively creates novel experiences by rearranging stored episodic patterns in a meaningful manner (Nir and Tononi, 2010). A few hypothetical functions were attributed to this phenomenon, such as enhancing creative problem solving by building novel associations between unrelated memory elements (Cai et al., 2009; Llewellyn, 2016; Lewis et al., 2018), forming internal prospective codes oriented toward future waking experiences (Llewellyn, 2015), or refining a generative model by minimizing its complexity and improving generalization (Hobson et al., 2014; Hoel, 2021). However, these theories do not consider the role of dreams for a more basic function, such as the formation of semantic cortical representations. Here, we propose that dreams, and in particular their creative combination of episodic memories, play an essential role in forming semantic representations over the course of development. The formation of representations that abstract away redundant information from sensory input and that can thus be easily used by downstream areas is an important basis for memory semantization. To support this hypothesis, we introduce a new, functional model of cortical representation learning. The central ingredient of our model is a creative generative process via feedback from higher to lower cortical areas that mimics dreaming during REM sleep. This generative process is trained to produce a more realistic virtual sensory experience in an adversarial fashion by trying to fool an internal mechanism distinguishing low-level activities between wakefulness and REM sleep. Intuitively, generating new but realistic sensory experiences, instead of merely reconstructing previous observations, requires the brain to understand the composition of its sensorium. 
In line with transformation theories, this suggests that cortical representations should carry semantic, decontextualized gist information.

We implement this model in a cortical architecture with hierarchically organized forward and backward pathways, loosely inspired by GANs. The connectivity of the model is adapted by gradient-based synaptic plasticity, optimizing different, but complementary objective functions depending on the brain’s global state. During wakefulness, the model learns to recognize that low-level activity is externally driven, stores high-level representations in the hippocampus, and tries to predict low-level from high-level activity (Figure 1a). During NREM sleep, the model learns to reconstruct replayed high-level activity patterns from generated low-level activity, perturbed by virtual occlusions, referred to as perturbed dreaming (Figure 1b). During REM sleep, the model learns to generate realistic low-level activity patterns from random combinations of several hippocampal memories and spontaneous cortical activity, while simultaneously learning to distinguish these virtual experiences from externally driven waking experiences, referred to as adversarial dreaming (Figure 1c). Together with wakefulness, the two sleep states, NREM and REM, jointly implement our model of perturbed and adversarial dreaming (PAD).

Figure 1. Cortical representation learning through perturbed and adversarial dreaming (PAD). (a) During wakefulness (Wake), cortical feedforward pathways learn to recognize that low-level activity is externally driven and feedback pathways learn to reconstruct it from high-level neuronal representations. These high-level representations are stored in the hippocampus. (b) During non-rapid eye movement sleep (NREM), feedforward pathways learn to reconstruct high-level activity patterns replayed from the hippocampus affected by low-level perturbations, referred to as perturbed dreaming.
(c) During rapid eye movement sleep (REM), feedforward and feedback pathways operate in an adversarial fashion, referred to as adversarial dreaming. Feedback pathways generate virtual low-level activity from combinations of multiple hippocampal memories and spontaneous cortical activity. While feedforward pathways learn to recognize low-level activity patterns as internally generated, feedback pathways learn to fool feedforward pathways.

Over the course of learning, constrained by its architecture and the prior distribution of latent activities, our cortical model trained on natural images develops rich latent representations along with the capacity to generate plausible early sensory activities. We demonstrate that adversarial dreaming during REM sleep is essential for learning representations organized according to object semantics, which are improved and robustified by perturbed dreaming during NREM sleep. Together, our results demonstrate a potential role of dreams and suggest complementary functions of REM and NREM sleep in cortical representation learning.

Results

Complementary objectives for wakefulness, NREM, and REM sleep

We consider an abstract model of the visual ventral pathway consisting of multiple, hierarchically organized cortical areas, with a feedforward pathway, or encoder, transforming neuronal activities from lower to higher areas (Figure 2, E). These high-level activities are compressed representations of low-level activities and are called latent representations, here denoted by z. In addition to this feedforward pathway, we similarly model a feedback pathway, or generator, projecting from higher to lower areas (Figure 2, G). These two pathways are supported by a simple hippocampal module that can store and replay latent representations. Three different global brain states are considered: wakefulness (Wake), non-REM sleep (NREM), and REM sleep (REM).
We focus on the functional role of these phases while abstracting away dynamic features such as bursts, spindles, or slow waves (Léger et al., 2018), in line with previous approaches based on goal-driven modeling that successfully predict physiological features along the ventral stream (Yamins et al., 2014; Zhuang et al., 2021).

Figure 2. Different objectives during wakefulness, non-rapid eye movement (NREM), and rapid eye movement (REM) sleep govern the organization of feedforward and feedback pathways in perturbed and adversarial dreaming (PAD). The variable x corresponds to a 32 × 32 image, and z is a 256-dimensional vector representing the latent layer (higher sensory cortex). Encoder (E, green) and generator (G, blue) networks project bottom-up and top-down signals between lower and higher sensory areas. An oblique arrow (↗) indicates that learning occurs in a given pathway. (a) During Wake, low-level activities x are reconstructed. At the same time, E learns to classify low-level activity as external (red target ‘external!’) with its output discriminator d. The obtained latent representations z are stored in the hippocampus. (b) During NREM, the activity z stored during wakefulness is replayed from the hippocampal memory and regenerates visual input from the previous day perturbed by occlusions, modeled by squares of various sizes applied along the generated low-level activity with a certain probability (see Materials and methods). In this phase, E adapts to reproduce the replayed latent activity. (c) During REM, convex combinations of multiple random hippocampal memories (z and z_old) and spontaneous cortical activity (ϵ), here with specific prefactors, generate a virtual activity in lower areas. While the encoder learns to classify this activity as internal (red target ‘internal!’), the generator adversarially learns to generate visual inputs that would be classified as external.
The red minus on G indicates the inverted plasticity implementing this adversarial training.

In our model, the three brain states only differ in their objective function and the presence or absence of external input. Synaptic plasticity performs stochastic gradient descent on state-specific objective functions via error backpropagation (LeCun et al., 2015). We assume that efficient credit assignment is realized in the cortex and focus on the functional consequences of our specific architecture. For potential implementations of biophysically plausible backpropagation in cortical circuits, we refer to previous work (e.g., Whittington and Bogacz, 2019; Lillicrap et al., 2020).

During Wake (Figure 2a), sensory inputs evoke activities x in the lower sensory cortex that are transformed via the feedforward pathway E into latent representations z in the higher sensory cortex. The hippocampal module stores these latent representations, mimicking the formation of episodic memories. Simultaneously, the feedback pathway G generates low-level activities x′ from these representations. Synaptic plasticity adapts the encoding and generative pathways (E and G) to minimize the mismatch between externally driven and internally generated activities (Figure 2a). Thus, the network learns to reproduce low-level activity from abstract high-level representations. Simultaneously, E also acts as a ‘discriminator’ with output d that is trained to become active, reflecting that the low-level activity was driven by an external stimulus. The discriminator learning during Wake is essential to drive adversarial learning during REM. Note that computationally the classification of low-level cortical activities into ‘externally driven’ and ‘internally generated’ is not different from classification into, for example, different object categories, even though conceptually they serve different purposes.
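
The two Wake objectives described above can be illustrated with a short sketch. It is not the authors' implementation: it assumes a mean-squared reconstruction error and a binary cross-entropy on the discriminator output d, neither of which is specified in this passage, and the function name `wake_losses` is hypothetical.

```python
import math


def wake_losses(x, x_rec, d_out, eps=1e-7):
    """Return (reconstruction_loss, discriminator_loss) for one Wake sample.

    x      : low-level activity evoked by the stimulus (flat list of floats)
    x_rec  : activity regenerated by the feedback pathway, i.e. G(E(x))
    d_out  : discriminator output in (0, 1); 1 is read as 'externally driven'

    Both loss forms are assumptions made for illustration only.
    """
    # Assumed form: mean squared mismatch between evoked and regenerated activity.
    rec = sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)
    # Assumed form: binary cross-entropy with target 1 ('external!').
    d = min(max(d_out, eps), 1.0 - eps)  # clip to avoid log(0)
    disc = -math.log(d)
    return rec, disc


rec_loss, disc_loss = wake_losses([0.0, 2.0], [0.0, 0.0], 0.5)
```

In such a sketch, both terms would be minimized during Wake: the first by adapting E and G, the second by adapting E alone.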
The dual use of E reflects a view of cortical information processing in which several network functions are preferentially shared among a single network mimicking the ventral visual stream (DiCarlo et al., 2012). This approach has been previously successfully employed in machine learning models (Huang et al., 2018; Brock et al., 2017; Ulyanov et al., 2017; Munjal et al., 2020; Bang et al., 2020). For the subsequent sleep phases, the system is disconnected from the external environment, and activity in the lower sensory cortex is driven by top-down signals originating from higher areas, as previously suggested (Nir and Tononi, 2010; Aru et al., 2020). During NREM (Figure 2b), latent representations z are recalled from the hippocampal module, corresponding to the replay of episodic memories. These representations generate low-level activities that are perturbed by suppressing early sensory neurons, modeling the observed differences between replayed and waking activities (Ji and Wilson, 2007). The encoder reconstructs latent representations from these activity patterns, and synaptic plasticity adjusts the feedforward pathway to make the latent representation of the perturbed generated activity similar to the original episodic memory. This process defines perturbed dreaming. During REM (Figure 2c), sleep is characterized by creative dreams generating realistic virtual sensory experiences out of the combination of episodic memories (Fosse et al., 2003; Lewis et al., 2018). In PAD, multiple random episodic memories from the hippocampal module are linearly combined and projected to the cortex. Reflecting the decreased coupling (Wierzynski et al., 2009; Lewis et al., 2018) between hippocampus and cortex during REM sleep, these mixed representations are diluted with spontaneous cortical activity, here abstracted as Gaussian noise with zero mean and unit variance. 
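
The inputs to the two sleep phases described so far can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `occlude` and `mix_latents` are hypothetical, the occlusion is a single square set to zero, and the mixing weights are free parameters because the prefactors used in the model are not given in this passage.

```python
import random


def occlude(image, size, top, left):
    """Return a copy of a 2D activity map with one size x size square silenced.

    Stands in for the NREM perturbation; how squares are sampled in the
    actual model is not specified here.
    """
    out = [row[:] for row in image]  # copy so the replayed activity is untouched
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[r]))):
            out[r][c] = 0.0
    return out


def mix_latents(z, z_old, w_z, w_old, w_noise, rng):
    """Convex combination of two stored memories and unit Gaussian noise (REM).

    The weights are hypothetical parameters; they must be non-negative
    and sum to 1 so that the combination is convex.
    """
    if min(w_z, w_old, w_noise) < 0 or abs(w_z + w_old + w_noise - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return [w_z * a + w_old * b + w_noise * rng.gauss(0.0, 1.0)
            for a, b in zip(z, z_old)]


rng = random.Random(0)
nrem_input = occlude([[1.0] * 4 for _ in range(4)], size=2, top=1, left=1)
# Placeholder weights, chosen only to satisfy the convexity constraint.
rem_latent = mix_latents([1.0, 0.0], [0.0, 1.0], 0.4, 0.4, 0.2, rng)
```

During NREM the occluded activity would be passed through E and compared with the replayed latent; during REM the mixed latent would be passed through G to produce the virtual low-level activity.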
From this new high-level cortical representation, activity in the lower sensory cortex is generated and finally passed through the feedforward pathway. Synaptic plasticity adjusts feedforward connections E to silence the activity of the discriminator output as it should learn to distinguish it from externally evoked sensory activity. Simultaneously, feedback connections are adjusted adversarially to generate activity patterns that appear externally driven and thereby trick the discriminator into believing that the low-level activity was externally driven. This is achieved by inverting the sign of the errors that determine synaptic weight changes in the generative network. This process defines adversarial dreaming. The functional differences between our proposed NREM and REM sleep phases are motivated by experimental data describing a reactivation of hippocampal memories during NREM sleep and the occurrence of creative dreams during REM sleep. In particular, hippocampal replay has been reported during NREM sleep within sharp-wave-ripples (O’Neill et al., 2010), also observed in the visual cortex (Ji and Wilson, 2007), which resembles activity from wakefulness. Our REM sleep phase is built upon cognitive theories of REM dreams (Llewellyn, 2015; Lewis et al., 2018) postulating that they emerge from random combinations between episodic memory elements, sometimes remote from each other, which appear realistic for the dreamer. This random coactivation could be caused by theta oscillations in the hippocampus during REM sleep (Buzsáki, 2002). The addition of cortical noise is motivated by experimental work showing reduced correlations between hippocampal and cortical activity during REM sleep (Wierzynski et al., 2009), and the occurrence of ponto-geniculo-occipital (PGO) waves (Nelson et al., 1983) in the visual cortex often associated with the generation of novel visual imagery in dreams (Hobson et al., 2000; Hobson et al., 2014). 
Furthermore, the cortical contribution in REM dreaming is supported by experimental evidence that dreaming still occurs with hippocampal damage, while reported to be less episodic-like in nature (Spanò et al., 2020).

Within our suggested framework, ‘dreams’ arise as early sensory activity that is internally generated via feedback pathways during offline states, and subsequently processed by feedforward pathways. In particular, this implies that besides REM dreams, NREM dreams exist. However, in contrast to REM dreams, which are significantly different from waking experiences (Fosse et al., 2003), our model implies that NREM dreams are more similar to waking experiences since they are driven by single episodic memories, in contrast to REM dreams that are generated from a mixture of episodic memories. Furthermore, the implementation of adversarial dreaming requires an internal representation of whether early sensory activity is externally or internally generated, that is, a distinction whether a sensory experience is real or imagined.

Dreams become more realistic over the course of learning

Dreams in our model arise from both NREM (perturbed dreaming) and REM (adversarial dreaming) phases. In both cases, they are characterized by activity in early sensory areas generated via feedback pathways. To illustrate learning in PAD, we consider these low-level activities during NREM and during REM for a model with little learning experience (‘early training’) and a model that has experienced many wake–sleep cycles (‘late training’; Figure 3). A single wake–sleep cycle consists of Wake, NREM, and REM phases. As an example, we train our model on a dataset of natural images (CIFAR-10; Krizhevsky and Hinton, 2009) and a dataset of images of house numbers (SVHN; Netzer et al., 2011).
Initially, internally generated low-level activities during sleep do not share significant similarities with sensory-evoked activities from Wake (Figure 3a); for example, no obvious object shapes are represented (Figure 3b). After plasticity has organized network connectivity over many wake–sleep cycles (50 training epochs), low-level internally generated activity patterns resemble sensory-evoked activity (Figure 3c). NREM-generated activities reflect the sensory content of the episodic memory (sensory input from the previous day). REM-generated activities are different from the sensory activities corresponding to the original episodic memories underlying them as they recombine features of sensory activities from the two previous days, but still exhibit a realistic structure. This increase in similarity between externally driven and internally generated low-level activity patterns is also reflected in a decreasing Fréchet inception distance (Figure 3d), a metric used to quantify the realism of generated images (Heusel et al., 2018). The increase in dream realism, here mostly driven by a combination of reconstruction learning (Wake) and adversarial learning (Wake and REM), correlates with the development of dreams in children, which are initially plain and fail to represent objects or people, but become more realistic and structured over time (Foulkes, 1999; Nir and Tononi, 2010).

Figure 3. Both non-rapid eye movement (NREM) and rapid eye movement (REM) dreams become more realistic over the course of learning. (a) Examples of sensory inputs observed during wakefulness. Their corresponding latent representations are stored in the hippocampus. (b, c) Single episodic memories (latent representations of stimuli) during NREM from the previous day and combinations of episodic memories from the two previous days during REM are recalled from hippocampus and generate early sensory activity via feedback pathways.
This activity is shown for early (epoch 1) and late (epoch 50) training stages of the model. (d) Discrepancy between externally driven and internally generated early sensory activity as measured by the Fréchet inception distance (FID) (Heusel et al., 2018) during NREM and REM for networks trained on CIFAR-10 (top) and SVHN (bottom). Lower distance reflects higher similarity between sensory-evoked and generated activity. Error bars indicate ±1 SEM over four different initial conditions.

The PAD training paradigm hence leads to internally generated low-level activity patterns that become more difficult to discern from externally driven activities, whether they originate from single episodic memories during NREM or from noisy random combinations thereof during REM. We will next demonstrate that the same learning process leads to the emergence of robust semantic representations.

Adversarial dreaming during REM facilitates the emergence of semantic representations

Semantic knowledge is fundamental for animals to learn quickly, adapt to new environments and communicate, and is hypothesized to be held by so-called semantic representations in the cortex (DiCarlo et al., 2012). Examples of such semantic representations are neurons in higher visual areas that contain linearly separable information about object category, invariant to other factors of variation, such as background, orientation or pose (Grill-Spector et al., 2001; Hung et al., 2005; Majaj et al., 2015). Here, we demonstrate that PAD, due to the specific combination of plasticity mechanisms during Wake, NREM, and REM, develops such semantic representations in higher visual areas.

As in the previous section, we train our model on the CIFAR-10 and SVHN datasets. To quantify the quality of inferred latent representations, we measure how easily downstream neurons can read out object identity from these.
For a simple linear readout, its classification accuracy reflects the linear separability of different contents represented in a given dataset. Technically, we train a linear classifier that distinguishes object categories based on their latent representations z after different numbers of wake–sleep cycles (‘epochs’, Figure 4a) and report its accuracy on data not used during training of the model and classifier (‘test data’). While training the classifier, the connectivity of the network (E and G) is fixed.

Figure 4. Adversarial dreaming during rapid eye movement (REM) improves the linear separability of the latent representation. (a) A linear classifier is trained on the latent representations z inferred from an external input x to predict its associated label (here, the category ‘car’). (b) Training phases and pathological conditions: full model (perturbed and adversarial dreaming [PAD], black), no REM phase (pink) and PAD with a REM phase using a single episodic memory only (‘w/o memory mix’, purple). (c, d) Classification accuracy obtained on test datasets (c: CIFAR-10; d: SVHN) after training the linear classifier to convergence on the latent space z for each epoch of the E-G-network learning. Full model (PAD): black line; without REM: pink line; with REM, but without memory mix: purple line. Solid lines represent mean, and shaded areas indicate ±1 SEM over four different initial conditions.

The latent representation (z) emerging from the trained network (Figure 4b, PAD) shows increasing linear separability reaching around 59% test accuracy on CIFAR-10 (Figure 4c, black line; for details, see Appendix 1—table 1) and 79% on SVHN (Figure 4d, black line), comparable to less biologically plausible machine learning models (Berthelot et al., 2018). These results show the ability of PAD to discover semantic concepts across wake–sleep cycles in an unsupervised fashion.
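
The linear readout used for this evaluation can be illustrated with a small sketch. It is an assumption-laden stand-in rather than the procedure used in the study: the classifier is a softmax regression trained by plain stochastic gradient descent, the latents are a four-point toy set instead of encoder outputs, and all function names are hypothetical. The key property it shares with the description above is that only the readout weights are trained while the latents stay fixed.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, b, z):
    return [sum(w_i * z_i for w_i, z_i in zip(W[k], z)) + b[k]
            for k in range(len(W))]


def train_linear_readout(latents, labels, n_classes, lr=0.1, epochs=100):
    """Fit softmax regression on fixed latents; the encoder is never updated."""
    dim = len(latents[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for z, y in zip(latents, labels):
            p = softmax(class_scores(W, b, z))
            for k in range(n_classes):
                # Gradient of the cross-entropy with respect to score k.
                g = p[k] - (1.0 if k == y else 0.0)
                for i in range(dim):
                    W[k][i] -= lr * g * z[i]
                b[k] -= lr * g
    return W, b


def predict(W, b, z):
    scores = class_scores(W, b, z)
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(W, b, latents, labels):
    hits = sum(predict(W, b, z) == y for z, y in zip(latents, labels))
    return hits / len(labels)


# Toy stand-in for latent representations z of two object categories.
toy_latents = [[2.0, 0.1], [1.8, -0.2], [-2.0, 0.3], [-1.9, -0.1]]
toy_labels = [0, 0, 1, 1]
W, b = train_linear_readout(toy_latents, toy_labels, n_classes=2)
print(accuracy(W, b, toy_latents, toy_labels))
```

In the evaluation described above, accuracy is reported on held-out test data rather than on the data used to fit the readout, so a faithful use of this sketch would call `accuracy` on a separate set of latents.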
Within our computational framework, we can easily consider sleep pathologies by directly interfering with the sleep phases. To highlight the importance of REM in learning semantic representations, we consider a reduced model in which the REM phase with adversarial dreaming is suppressed and only perturbed dreaming during NREM remains (Figure 4b, pink cross). Without REM sleep, linear separability increases much more slowly and even after a large number of epochs remains significantly below that of PAD (see also Appendix 1—figure 3c and d). This suggests that adversarial dreaming during REM, here modeled by an adversarial game between feedforward and feedback pathways, is essential for the emergence of easily readable, semantic representations in the cortex. From a computational point of view, this result is in line with previous work showing that learning to generate virtual inputs via adversarial learning (GAN variants) forms better representations than simply learning to reproduce external inputs (Radford et al., 2015; Donahue et al., 2016; Berthelot et al., 2018).

Finally, we consider a different pathology in which REM is not driven by randomly combined episodic memories and noise, but by single episodic memories without noise, as during NREM (Figure 4b, purple cross). As when REM is removed, linear separability increases much more slowly across epochs, leading to worse performance of the readout (Figure 4c and d, purple lines). For the SVHN dataset, the performance does not reach the level of PAD even after many wake–sleep cycles (see also Appendix 1—figure 3d). This suggests that combining different, possibly unrelated episodic memories, together with spontaneous cortical activity, as reported during REM dreaming (Fosse et al., 2003), leads to significantly faster representation learning.
Our results suggest that generating virtual sensory inputs during REM dreaming, via a high-level combination of hippocampal memories and spontaneous cortical activity and subsequent adversarial learning, allows animals to extract semantic concepts from their sensorium. Our model provides hypotheses about the effects of REM deprivation, complementing pharmacological and optogenetic studies reporting impairments in the learning of complex rules and spatial object recognition (Boyce et al., 2016). For example, our model predicts that object identity would be less easily decodable from recordings of neuronal activity in the inferior-temporal (IT) cortex in animal models with chronically impaired REM sleep.

Perturbed dreaming during NREM improves robustness of semantic representations

Generalizing beyond previously experienced stimuli is essential for an animal’s survival. This generalization is required due to natural perturbations of sensory inputs, for example, partial occlusions, noise, or varying viewing angles. These alter the stimulation pattern, but in general should not change its latent representation subsequently used to make decisions. Here, we model such sensory perturbations by silencing patches o

Keywords

Learning,Cortical Responses,Sensory Processing,Neuroanatomical Correlates,Working Memory
