Dynamic Organization of Hierarchical Memories

In the brain, external objects are categorized in a hierarchical way. Although it is widely accepted that objects are represented as static attractors in neural state space, this view does not take account interaction between intrinsic neural dynamics and external input, which is essential to understand how neural system responds to inputs. Indeed, structured spontaneous neural activity without external inputs is known to exist, and its relationship with evoked activities is discussed. Then, how categorical representation is embedded into the spontaneous and evoked activities has to be uncovered. To address this question, we studied bifurcation process with increasing input after hierarchically clustered associative memories are learned. We found a “dynamic categorization”; neural activity without input wanders globally over the state space including all memories. Then with the increase of input strength, diffuse representation of higher category exhibits transitions to focused ones specific to each object. The hierarchy of memories is embedded in the transition probability from one memory to another during the spontaneous dynamics. With increased input strength, neural activity wanders over a narrower state space including a smaller set of memories, showing more specific category or memory corresponding to the applied input. Moreover, such coarse-to-fine transitions are also observed temporally during transient process under constant input, which agrees with experimental findings in the temporal cortex. These results suggest the hierarchy emerging through interaction with an external input underlies hierarchy during transient process, as well as in the spontaneous activity.


Introduction
Categorization of objects in the environment according to their similarity is one of the fundamental functions of the human brain. It is typical to conceptualize categorization as involving a hierarchical structure: an object (e.g., A Doberman) belongs to a higher category (Dog), which, in turn, belongs to a much higher category (Animal). Unveiling how such hierarchical structure is represented in neural activity is essential to understanding the process of our memory and recognition. In spite of several studies that are related to the formation of categories with neural associations called η(ξ): each neuron (e.g., i-th neuron) in the network receives one input element (η i ) and is required to generate ξ i . Note that an input pattern η I and the corresponding target output pattern ξ i are uncorrelated. Dynamics for neural activity x i for neuron i are driven by interaction among neurons through synaptic connections, as well as an input pattern η i . We take a rate-coding model so that x i is continuous. Thus, an input pattern does not perfectly determine the network state, but rather interplay between an input pattern and internal dynamics is critical to generate the adequate output pattern.

Fig 1. Schematic depiction of the dynamical systems view of categorization through bifurcation as a function of input intensity.
A) Hierarchical structure of memorized input/ output target (I/O) associations. In the presence of an input, a network is trained for its activity pattern to represent the associated target. These associations are divided into categories A and B: associations Z A 1 =x A 1 and Z A 2 =x A 2 belong to the same category A, and their patterns are correlated, while associations A 1 and B 1 belong to different categories and their patterns are not correlated. See "Materials and Methods" for details. B) Representation of hierarchically structured associations with the change in input intensity. Under an input pattern Z A 1 or Z B 1 of a sufficient strength, neural activity is localized as an attractor at each of the associated target patterns (bottom panel). The neural activity represents each target pattern and thus this target is recalled successfully. With decreasing input intensity, the above attractor is no longer stable, and the neural activity diffuses to wander among the target patterns belonging to the same category (middle panel). The neural activities under the different input patterns in the same category are similar patterns, so that only 2 schematic figures for category A and B are plotted in the middle panel. The neural activity represents each of the categories A or B. With further decreasing input intensity, the neural activity outspreads over all the targets as spontaneous activity in the absence of inputs (top panel).
Synaptic matrix between neurons is changed so that the neural dynamics under an input generates an output that matches with the target pattern corresponding to the applied input (See "Materials and Methods" for the explicit neural and learning dynamics). Input patterns are randomly chosen but are correlated with each other in hierarchical manner, while target patterns are uncorrelated with input patterns, but are again correlated among the given hierarchical cluster. The learning process works so that for each input A i (B i , C i ,. . .) for the category A (B,C,,,), the corresponding target is generated as an output. Now consider associations of A 1 and A 2 belong to the identical category A, and associations B 1 and B 2 belong to the category B (see subsection "Memory structure" below for details). After learning, the neural activity-upon presentation of a sufficiently strong input A 1 -is localized around the target pattern A 1 . The neural activity represents the target itself. With a weakerintensity input A 1 , the neural activity changes in time, and approaches intermittently the targets A 1 and A 2 , which belong to the category A. Hence, the evoked neural activity upon intermediate input A 1 represents the category A including the target A 1 . With a further weakened input-or with no input-the neural activity is not focused on the corresponding category. The neural activity wanders across the target patterns (e.g., A 1 , A 2 , B 1 , B 2 ). In this manner, depending on the input strength, the neural activity can code the target pattern, the category, and all the category's items. We demonstrate numerically that the simple learning rule we have proposed produces this dynamic representation of the neural activity.

Memory structure
We adopted a neural network of N (= 100) rate-coding neurons, whose synaptic matrix J ij changes according to a learning rule (described in the subsection "Materials and Methods") with which I/O associations are memorized in the network. Each of input and output patterns has as many elements as neurons (here, N = 100). In contrast to our previous study [25], we applied correlated patterns for inputs and targets forming the hierarchical structure, in which M associations are included in each of K categories, and a total of M × K associations is memorized. M input patterns as well as M target patterns are correlated within a category, while patterns across different categories are not correlated, and any pair of an input and a target pattern is not correlated. We established 6 categories (A-F), and each category contains 6 associations, giving a total of 36 I/O associations. Here, indices 0-5 associations belong to category A, and indices 6-11 belong to category B, and so forth.

Recall of memory with strong input
We analyzed the neural dynamics after learning process in the following part. In order to represent neural activity dynamics, which has many variables, with a few relevant variables, we use the overlaps of the neural activity with input m m I ¼ i =N (see the subsection "Materials and Methods"). In the following section, we mainly use m m T , denoting it as simply m μ . We first demonstrate that our learning process works well so that a given target is recalled upon the associated input with sufficient input strength γ (γ = Γ = 16, where Γ denotes the input strength used in the learning process. See "Materials and Methods"). As a typical example, the dynamics of evoked activities under inputs 3 and 6 are shown. By applying one of the learned input patterns, the overlap with the associated target surges to almost unity (Fig  2A(iii) and 2A(iv)) and an attractor that matches the target pattern emerges (Fig 2B(iii) and 2B(iv)). Thus, these targets are recalled successfully, consistent with our earlier study [25].
To examine the generality of these recall behaviors, we measured the overlap with the target μ under the associated input μ (μ = 0,Á Á Á,35) in 10 networks (Fig 2C), each of which learns a different set of I/O associations. Almost all of overlaps (around 80%) take higher values over 0.9, The neural dynamics (i) without input, (ii) under the input 6 with strength γ = 6, (iii) with γ = 16, and (iv) under the input 3 with γ = 16, indicated by colored bars above figures in white, right blue, blue and green, respectively. A) The overlaps with the targets 6, 7, and 3 are plotted as blue, red, and green lines, respectively. Targets 6 and 7 are in same category B, while target 3 is in category A. B) The probability density distribution of the neural activity and a sample trajectory in principal component (PC) space. Dotted circles represent position of the targets. C) The distribution of the overlaps with the targets. We calculated the overlap with a target under the associated input with γ = 16 for 360 associations (36 associations in 10 networks). D) The temporal average overlap profiles for the spontaneous activity (gray) and for evoked activities with input strength γ = 6 (light blue) and 16 (dark blue). For calculating temporal average, we used overlaps over 400 unit time after 100 unit time transient. Error bars are standard deviation of neural activity over 400 unit time. In the following analysis, temporal average is calculated in this way unless otherwise mentioned. E,F) Change in overlap with increasing input is shown. The overlap with all of targets under input 6 against different strength is computed in E. As samples, the overlaps with targets 3,6, and 7 are plotted in F. G) Bifurcation diagram of the overlap with target 6 through increasing the input strength of input 6 (bottom) and the largest Lyapunov exponent (Top). We plot the overlaps at every 5 unit times over 250 unit times after transient time for each input strength. meaning that almost all the targets were recalled successfully. Thus, the learning process works well and I/O associations are embedded successfully.

Emergence of hierarchy with increasing input strength: From diffuse to focused representation
In the previous subsection, we showed that under a sufficiently strong input, a different attractor corresponding to the target pattern emerges depending on the applied input. We investigated how neural activity changed with increasing input strength in the following subsections. By using neural activity under input 6 as an example (Fig 2), we demonstrated that hierarchy of representation emerges: from representation of a category to that of a specific memory. Fig 2A and 2B, show the neural activity dynamics during no input (i.e., spontaneous activity) and those under input pattern 6, with γ = 6 and with γ = 16. The spontaneous activity shows chaotic behavior (Fig 2A(i)) with a positive maximal Lyapunov exponent ( Fig 2G) and shapes an attractor around the origin in the PC space ( Fig 2B(i)). The neural activity approaches intermittently to the targets, which is analyzed later. Under the input with γ = 6, flow structure is changed (Fig 2B(ii)). A new attractor appears in the vicinity of both the target patterns 6 and 7. In fact, the overlaps of the evoked activity with targets 6 and 7 are elevated, while the overlap with target 3 remains low (Fig 2A(ii)). Here, targets 6 and 7 belong to the same category (called B), while 3 belongs to a different category (called A).
To more clearly show the relation between the evoked neural activity and targets depending on input strength, we computed a set of overlaps m μ as a function of μ = 0,1,. . .,35 (= all memorized targets), which we refer to as the overlap profile (see "Materials and Methods"). Here, we calculated the temporal average overlap profile ( Fig 2D). The profile line for the spontaneous activity is flat over all of targets. A network, thus, cannot distinguish targets each other, by overlap values. Then, the profile for γ = 6 shows that the overlaps with targets that belong to the same category as target 6 take distinctly higher values, as compared with other overlaps. The overlap profile is nearly flat against targets in category B, with the overlap values, 0.44 ±0.19, 0.35±0.19, 0.23±0.15, 0.22±0.17, 0.30±0.17, and 0.24±0.17 for target 6-11, respectively. Here, SD shows not the statistical error but the amplitude of temporal fluctuation around the mean value. Mean values plus one-SDs are comparable to the given correlation of targets (1, 0.56, 0.38, 0.3, 0.3, and 0.38, respectively), indicating that overlaps sometimes take values larger than the given correlation. Thus, the neural activity is sometime highly correlated with some of the targets in the category B beyond the given correlation of targets. Thus a network generates a neural activity which has a much higher similarity with targets in the category B than the expected by the given correlation structure. The produced output is interpreted not as representing only target 6, but representing the category B. Under a much stronger input value, the overlaps with targets 6 and 7 are separated (Fig 2A(iii)). The overlap profile shows that the neural activity matches well with target 6 and distinguishes it from other targets ( Fig 2D).
To show the change in the neural activity by gradually increasing the input strength γ, we computed change of the overlaps with all of the targets (0, 1, . . ., 35) with increasing input strength ( Fig 2E). For the intensity near zero, none of the overlap takes selectively a high value. As the intensity increases, the overlaps belonging to category B take higher values than those of the rest of overlaps in other categories. With increasing the strength further, the overlap with only target 6 is clearly higher than other overlaps. With increasing input strength, therefore, representation by the neural activity progresses from all targets to a category and then to a specific target. Finally, we confirmed generality of this change with increasing input strength in S1 File and S1 Fig.

Hierarchical clusters of neural activity under different inputs
We, next, investigate how activities evoked by different inputs are organized with increasing input strength. Especially we will show that different clusters of these neural activities are generated in a hierarchical way depending on the input strength. For this purpose, we first demonstrate that neural activities for inputs 3,5,6 and 7 exhibit such hierarchical clustering, as an example and then analyze the entire behavior for all input patterns. Fig 3A shows temporal average activity patterns under input patterns 3, 5, 6, and 7 with increasing input intensity, represented in the PC space. Relationship between these associations are shown in a tree diagram in Fig 3B. For lower intensity of input, the neural activity patterns under inputs 6 and 7 (belonging to the same category B) are located closely with each other, while they are separated distinctly from those under inputs 3 and 5, which belong to a different category (i.e., A). These similar patterns are much more similar than expected by the given correlation structure as shown below. With increasing input strength, the activity patterns in the same category are separated. For γ = 16, the overlap under input 6 is distinct not only from those under inputs 3 and 5 but also from that under input 7. To quantitatively compare closeness, similarity between the neural activities under different inputs, μ and ν were computed by S mv ¼ patterns evoked by inputs 6 and 7 are more clustered than a given target cluster and hard to be distinguished, while for a higher strength, the neural patterns evoked by these inputs are more easily distinguished. In other words, the network responds similarly to different inputs in the same category when input is at an intermediate strength, while the network responds in a manner that allows the categories to be easily distinguished, even for inputs in the same category, when the input is stronger.
We also compared the neural activity under all the inputs memorized (μ = 0,1,Á Á Á,35) to confirm the generality of the above behavior. Fig 3C diagrams similarity S μν between the neural activity patterns evoked by inputs μ and ν, for input strengths 4, 6, and 16. Higher similarity S μν indicates that a network generates similar neural patterns even under different inputs μ and ν. In these diagrams, a cluster formation that is dependent on the intensity of input is clearly visible. For γ = 16, each small cluster in the diagram corresponds to one of 6 categories A-F. This cluster structure exactly reflects that of the embedded targets (S2 Fig). As input strength is decreased to γ = 6, the degree of similarity in each category increases. Average similarity between neural patterns evoked by inputs in the same category is 0.71, while that between targets is~0.49, which is about the value of imposed correlation. The average similarity between 0.49 and 1 indicates that similarity of neural patterns are determined not by solely correlation of the targets themselves, but is generated by the interplay between internal dynamics and an applied input. Decreasing the input strength to γ = 4 results in much higher similarity Sμν for inputs μ and ν in the same category (averaged similarity = 0.78), particularly for inputs A, B, and F. Interestingly, the similarity for neural activities not only between pairs in the same category, but also even between some pairs in different categories (A and F, B and C, and D and E) is selectively increased. This structure is not included in the given correlation structure of the target at all (S2 Fig). These clusters of categories are spontaneously generated with internal neural dynamics, through learning process. These clusters of categories can be considered as a higher level than the category level. Hence, a higher level in hierarchy is generated spontaneously as a result of memorizing simply correlated patterns. Finally, we quantitatively characterized the similarity diagram by using the hierarchical cluster analysis (see"Materials and Methods"). We calculated the diagram from the above similarity matrix S μν in Fig 3D and counted the number of clusters by setting a certain threshold for similarity distance (= 0.3 for all input strength). In Fig 3E, the number of clusters is plotted as a function of the input strength. For a weaker input strength, the number of clusters is around 10, implying that the neural activities under 36 different inputs are categorized into 10 groups. By increasing input strength, the number increases, until it reaches 31, nearly the number of learned associations, showing that each of the neural activities under different inputs is distinguishable from all others. To confirm if such behavior of the number of clusters is general, we computed the number of clusters for other ten networks and plotted in Fig 3E. The numbers of clusters in all networks are around 10 for a weaker input and with increasing input strength, the number increases more than 30.
In summary, neural activity patterns evoked by different inputs form a hierarchical structure. With increases in input strength, large clusters-within which neural activities evoked by different inputs are much more similar than the expected by correlation of targets-are successively separated into smaller clusters. Hence A network in our model represents a hierarchical category structure that depends on input strength. Note that this hierarchy is generally observed for another correlation parameter C as long as C is sufficiently larger (larger than around 0. 16). See details in Supplemental information.

Hierarchical structure embedded in spontaneous activity
Spontaneous neural activity in the absence of inputs shows chaotic oscillation (Fig 2A(i) and 2B(i)). As already discussed [16,19,20], spontaneous activity observed in neural system is not just noisy but also shows a structured spatio-temporal pattern in which specific patterns similar to stimulus-evoked patterns appear transiently [17,18]. It suggests that information about the stimulus-evoked patterns is potentially embedded in the spontaneous activity. In the present study, target patterns are embedded as evoked patterns in the presence of an input with sufficient strength. We then asked how a hierarchical category of these targets is embedded in the spontaneous neural activity, by comparing overlaps of spontaneous activity with target patterns.
We first examined whether spontaneous activity contains information of the target or input patterns. Here, (normalized) overlap of the neural activity with a given pattern provides a measure of distance between the generated neural activity and the pattern: If the overlap is closer to unity, the neural activity is closer to the pattern. In Fig 4A, the overlap of the spontaneous activity with a given target (target 1) intermittently takes a large value, while overlap neither with a non-memorized, randomly chosen pattern (green) nor with an input pattern (red) shows such intermittent increase and fluctuates around the value 0. This indicates that the spontaneous activity approaches the memorized target patterns more closely than other patterns. and 7 (red) with increase in input intensity are depicted. Bold points on lines show the neural activity patterns under the different input intensities from γ = 0 to γ = 16 by every 1 point. B) The hierarchy of embedded mappings is also plotted for reference, where the targets are marked in the same colors in the PC space. C) The similarity matrix S μν between the evoked patterns under inputs μ and ν is plotted for γ = 4,6, and 16. Similarity is defined in the main text (see also detailed definition in"Materials and Methods"), whose (μ,ν) element indicates the similarity between the neural activities evoked by different inputs μ and ν. D) The cluster dendrogram of the similarity matrix in C. Dotted line shows threshold value (= 0.3) for counting the number of clusters. E) The number of the clusters as a function of the input strength. For each input strength, the number of clusters is obtained from the cluster dendrogram as in D. Bold line corresponds the number of the clusters for the network analyzed in above, and other lines show those for other networks. To analyze closely relationship between the spontaneous activity and the target patterns, we again used an example, where the targets 6 and 7 belong to the same category B, the target 1 belongs to category A, and 22 belongs to category D. The overlaps of identical spontaneous activity with target patterns 6 and 7 (Fig 4B(i)) and those with target patterns 1, 6, and 22 ( Fig  4B(ii)) are presented. The overlaps with target patterns 6 and 7 oscillate with high correlation with each other, although these show different peaks with slightly different timing. In contrast, the overlaps with target patterns 1, 6, and 22 do not show a correlated change. Approaches to these targets exhibit much larger different timing. The spontaneous activity approaches intermittently the different targets belonging to the different categories ( Fig 4C) and we present the overlap profiles at these approaching times in Fig 4D. At t = t 1 , t 2 , and t 3 , the overlaps with the targets in the categories A, B, and D take high values in a target-selective fashion, respectively.
Finally, we investigated how the spontaneous activity temporally approaches the targets within the same category and across different categories quantitatively. We calculated the Dynamic Organization of Hierarchical Memories transition probability P μν (Fig 5A) and mean transition time T μν from the ν-th to the μ-th target (Fig 5C) (See "Materials and Methods"). Spontaneous activity exhibited more frequent transitions often from one target to another (e.g., from target 2 to 5) than others that are less frequent (e.g., from target 2 to 35). To compare transition within a category and between categories, we calculated transition probability P' ab and mean transition time matrix T' ab among categories, as are plotted in Fig 5B and 5D, respectively. Spontaneous activity is highly likely to move within each of categories with shorter transition time, while it rarely jumps across categories Temporal structure of spontaneous activity. A) Transition probability P μν from the ν-th target to the μ-th target. We cannot compute the probability of self-visiting P μμ , and set at 0, because we did not distinguish continuous stay of the neural state around a target from coming in-outin the identical target. B) Transition probability P ab from the category b to the category a. C) Transition time T μν from the ν-th to the μ-th target which is averaged with P μν . White tiles indicate that there is no transition and we cannot calculate the transition time. D) Transition time Tab from the target b to a. all of values are calculated from the spontaneous activity (0<t<10000). See "Materials and Methods" for the detailed. and takes longer transition time. This means that the hierarchy of the targets is embedded in the temporal structure in the spontaneous activity.
Hierarchical structure in transient dynamics: From diffuse to focused representation in time So far, we have shown how neural representations change with increases in strength of an input in accordance with a hierarchical structure. We found that these transitions also occur temporally during the recall process under an input, when its input strength is constant. During recall, neural activity initially represents a category of several target patterns, and then converges to an attractor that represents a specific target.
In Fig 6, we present recall dynamics until the neural activity converges on an attractor under input pattern 11 for γ = 5, which is slightly above the bifurcation point: under this point (γ = 4) an attractor in which the activity wanders among targets of a category is formed and, above this point γ = 5, a more localized pattern emerges as an attractor. The overlap of this neural activity with target 11 is plotted in Fig 6A. Up to t = 150, the neural activity fluctuates and then converges to a fixed-point attractor. In the transient fluctuation, the neural activity approaches not only target 11 but also other targets in the same category (Fig 6B). After this approach to the category, the neural activity converges on target 11 selectively. The neural activity pattern during the transient process reflects the attractor under a weaker input, where the neural activity is less localized and represents a coarser category. In fact, neural activity under input 11 after convergence to the attractor for γ = 4 (gray line in Fig 6) is highly overlapped with the transient neural activity when γ = 5 (magenta line in Fig 6). For many different initial conditions of trajectories, the neural activities show the same transition in time with different timing. Although, here, we demonstrate just one example, this temporal change is generally observed for different inputs (S3 Fig). In this way, the neural activity first approaches a state representing the category, before reaching the final attractor matching the single target, showing that the neural activity dynamics are organized hierarchically in time from a coarse to a specific representation. with γ = 5. The dynamics is colored before reaching the attractor in magenta and after convergence to the attractor in red. Neural activity after convergence to an attractor with a less intense strength (γ = 4) is plotted in gray, for reference. A) Time series of the overlap with the target pattern 11. B) Overlap profiles before and after convergence in magenta and red are plotted. Here, the former overlap profile is measured by averaging overlap after 50 unit-time transient up to the convergence point. The latter profile is measured after convergence. We also plot the profile for the neural activity with a less intense strength in gray as reference. C) The orbit of the neural activity dynamics in A is plotted, by projecting it into the two-dimensional space. The horizontal and vertical axes represent the overlaps with targets 8 and 11, respectively. doi:10.1371/journal.pone.0162640.g006

Discussion
It has recently been revealed that the interplay between spontaneous neural dynamics and external stimuli is significant in response behavior of neural activity [17][18][19][20][21][22]. We have proposed the novel viewpoint, "bifurcation memories through input strength" to understand this interplay [25,26]. By introducing a model that embodies this view, we focus on how hierarchical memories could be represented. We have shown that the neural dynamics generates their hierarchical structure from a coarse to a fine level with the increase in the input strength. Spontaneous activity in the absence of inputs shows successive approaches to and departures from each of the memories in the neural state space, representing the highest (coarsest) category to which all memories belong. By applying a learned input pattern with an intermediate strength, the evoked neural activity is localized near several targets, to represent the lower (finer) category to which a few associated memories belong. With further increases in the input strength, neural activity is localized only around a single target pattern associated with the applied input.
With increasing input strength, a network represents different level in hierarchy, (all targets -> a category -> a target). This hierarchy is included in a given correlation structure in I/O associations in the present model. However, by focusing carefully on the neural activities evoked by different inputs for smaller input strength shown in Fig 3, we found that the neural activities under different inputs in different categories, A and F, or B and C, exhibit higher similarity than correlation of these targets. This indicates that a network identifies these activities as a same pattern and a new level in hierarchy is generated through interplay between internal neural dynamics and inputs. Detailed study for the mechanism of spontaneous generation of new level is left as a future work.
Interestingly, we also found that the transition from coarse to fine categories is valid not only against the input strength but also in the temporal course of memory recall, which is consistent with previous experimental studies [2,4,27]. These studies have investigated how hierarchical objects are represented and processed in neural systems, especially in the visual system. In a visual system, many studies have revealed that the neural activity patterns showing coarse category emerge first and then fine category-related neural patterns arise in the process of identifying an object [2,27].
In our model, neural activity, in response to an input with a sufficient strength, first takes a broader distribution ranging over all the target patterns within the same category including the target to be recalled, before it converges on the target pattern. The broader distribution in the transient time represents the coarse category. Further, we have found that the temporally transient recall from a coarser to finer category reflects the bifurcation structure from the broader to localized attractor through the input strength. This result implies that the bifurcation structure in the intrinsic neural dynamics underlies the temporal processing from coarse to fine category observed in the experimental studies. Our proposition that relates the hierarchy to time and to input strength can be tested experimentally by observing neural activity dependence upon the input strength. This may be achieved by measuring the activity in the Inferior Temporal area [2] or olfactory bulb [28], as a function of input strength e.g., visual contrast or concentration of odors.
There are extensive studies on the conventional associative attractor memory with hierarchical structure, concerning dependence of the stability of memories upon their number [5,8,9,11,14] and also fluctuation on the weight matrix in terms of statistical physics [29]. Further, some studies [12,13] have proposed Hopfield-type models that embed hierarchical memories and exhibit the transient recall of mixed memories corresponding to coarse category. These, however, adopt a constant flow structure on neural state space with initial conditions determined by inputs. Hence relationship between neural activities with and without inputs cannot be discussed. Although some studies investigated this relationship [30][31][32], representation of hierarchical memories are not addressed. Rather, in the present study, we have revealed that a hierarchical structure of embedded memories in a network is reflected in the spontaneous activity, in bifurcation process with increasing the input strength and in the transient process during recall of a memorized pattern. These different representations of the structure of embedded memories have not been investigated in the previous studies. In addition, in our view, there is a single (or a few) attractor(s) for each of input strength and neural trajectories on neural state space from most of initial states converge to the required attractor, which is in contrast to the previous studies where trajectories from only neighborhood of the required attractor converge due to multiple attractors.
Finally, we discuss on the biological plausibility briefly. The learning rule we employ does not follow the Hebbian unsupervised fashion often used in standard recurrent neural-network models of the cerebral cortex [33], nevertheless it satisfies the minimum requirement for a biological learning rule in contrast to standard supervised learning fashion in recurrent networks [34,35]: According to one study [33], information available to a synapse is only local information for pre-and postsynaptic cells it directly connects and not any global information on cells it does not directly connect. In fact, our learning rule needs information only on the neural activity of the pre-and postsynaptic cells (x j and x i , respectively) and the target signal to the postsynaptic cell ξ i . A possible situation might be considered to be a cortical circuit that receives bottom-up input from a lower sensory area or the thalamus and also top-down input from a higher area. This circuit is assumed to learn associations between these inputs and, after learning, reproduces the neural pattern matching with the top-down input pattern in response to the bottom-up input. Here, the top down input to an i-th neuron represents ξ i . This example is only one possible way to implement our model in a biological neural network, and future studies are needed to establish the model's validity.

Patterns to be memorized
We let a network of N-neurons learn I/O associations, in which inputs called η and targets called ξ are binary patterns. In contrast to our previous study [25], we applied correlated patterns, which have a hierarchical structure, as follows: We generated K categories and M × K associations. We set 2K random patterns, η A and ξ A , (A = 0,1,Á Á Á,K − 1), as typical patterns of categories, where each element of a pattern takes a binary value generated with the probability p(x) = (1/2)δ(x − 1)+(1/2)δ(x + 1), and any pair of patterns is not correlated. Second, 2M members for each category are generated from the typical patterns, by flipping each element of the typical pattern with p flp . Thus, targets or inputs in the same category are correlated with C = (1 − 2p flp ) 2 on average and otherwise any pairs (including pairs of an input and its associated target) are not. We set N = 100, K = 6, M = 6, and p flp = 0.15 so that C = 0.49. Within each category, inputs and targets are generated to satisfy ½ for μ6 ¼v, while otherwise, correlation between any patterns is chosen to be zero. Here, [Á Á Á] represents the average over an ensemble of randomly generated patterns. Hence, 36 associations distributed among 6 categories are learned. Six categories j (j = 0,1, ÁÁÁ,5) is termed as A, B, C, ÁÁÁ, F in the order of j, and the associations μ = 6j + l (l = 0,1, ÁÁ,5) belong to the same category.

Neural dynamics
We consider a system composed of N (= 100) continuous rate-coding neurons whose activity x i (i = 1,2,ÁÁÁ,N) lies between −1 and 1, basically identical to the network in our earlier study [25] and evolves according to where J ij denotes a connection from the j-th to i-th neuron, γη μ represents an input patternμwith strength γ, andμis the index of learned I/O associations. Parameterβ is set at 4, respectively. We adopt the following learning procedure to embed the I/O associations.

Learning procedure
The neural activity during learning process evolves in the presence of γη with a constant γ = Γ, while the connection J ij is changed so that the neural activity pattern matches the target ξ. Note that Γ is used during learning process and γ is used in analysis of recall process after learning is completed. While Γ set at 16, we change γ between 0 and Γ to study the dependence of input strength on recall process. This evolution of the synaptic connection J ij is given by where α > 0 is a learning parameter representing the rate of change in synaptic connections (relative to that of the neural activity). The above synaptic dynamics are determined by correlations between the activities of the pre-and postsynaptic neurons. This learning rule takes a similar form as the perceptron learning rule [36] where the synaptic connection is changed by correlations between activities of elements in the input and output layers. We set parameters at α = 0.01 and Γ = 16, since we have already confirmed that the network can learn a large number of memories with these parameter values (see [25]). After an I/O association is learned according to (eq 2), the next association is learned. We train a network to memorize MK associations by applying this process iteratively MK × 100 times in a random order. During the learning process, both neural and synaptic dynamics run concurrently. The initial states are set as follows: the neural activities x i take random values between −1 and 1 with a uniform probability, J ij takes a random value from a binary ensemble of ±1 with equal probability. Fully connected networks without self-connections are used. For each run of different learning processes, different sets of mappings are learned so that the generated networks are different.

Overlap analysis of neural dynamics
We define the following quantities to characterize the neural dynamics. The dynamics of the network are represented as a trail in N-dimensional neural state space; N is the number of neurons and is set at 100. N-dimensional trail is difficult to investigate directly, so we project these high-dimensional dynamics to a lower-dimensional state space by using the overlap of the neural dynamics with the inputs and targets: the overlaps with the input μ and the target μ are defined as m m I ¼ respectively. These values increase as the distance between the neural activity pattern and an input (or target) pattern is decreased, and they take unity when the distance vanishes. We use the overlap with a few of targets to understand how close the neural dynamics is to the concerned targets, denoted as m μ (expressed, for simplicity, without the index T).
In contrast to single overlaps, we also use "overlap profile" to analyze global information. This is a function of overlap m μ against μ = 0,1,Á Á Á,(KM − 1) and represents how far the neural dynamics are to each of the targets. Let us first considering uncorrelated patterns. In this case, when the neural activity pattern is similar to one of targets, only the overlap with this target takes near-unity and others are close to zero, forming an overlap profile with a single peak. In contrast, when the neural dynamics do not approach any of the targets or approach all the targets equally, the overlaps with all the targets take near zero or a comparably high value, respectively, forming a flat overlap profile. In this way, the shape of the overlap profile indicates how much a given neural activity is localized to one (or a few) target pattern(s). This is the case for hierarchical patterns. Quantitative analysis is in Supporting information.

Cluster analysis
In order to analyze how the neural activity is organized in response to different inputs of different input strength, we applied hierarchical cluster analysis against KM samples of neural activity under KM different input patterns for the same strength; here, KM = 36. In hierarchical cluster analysis, we chose two of the closest elements (a sample or a cluster of samples) and merged them into a larger cluster as a new element and this process was applied iteratively until the number of elements reached 1. To do this, we needed to define distance between elements (i.e., between a sample and a sample, a sample and a cluster, and a cluster and a cluster). In our analysis, samples were represented by neural activity patterns, and we defined the distance D μν between the neural activity patterns under input μ and ν by defining the similarity S μν as S mn ¼ X i x i;m x i;n =jx m; x n j ð3Þ We denoted the neural activity under input ν as x ν and x represents the temporal average of x. We adopted the group average method as the criterion for forming the new larger cluster, using SciPy (see http://www.scipy.org/).
To study the effect of different input strengths, we examined the hierarchical analysis to plot dendrograms for γ = 4,6, and 16 in Fig 3D. We set the threshold at 0.3 under which the number of clusters is counted. The cluster number is computed for different input strengths and plotted for ten networks (Fig 3E). The value of the threshold is arbitrary, but the qualitative behavior of the cluster number against the input strength does not change.

Transition probability in spontaneous activity
To analyze temporal structure of the spontaneous activity, we calculated which of the targets and when the spontaneous activity approaches. We first defined "approach to a target" as the neural state showing the overlap with this target larger than 0.5. (If more than one targets give overlaps of more than 0.5 simultaneously, a target with the highest overlap is defined as the approached one). By this definition, we computed set of time {t i } (0< t 0 <t 1 <. . .) at which the spontaneous activity approaches a target most closely. A approached target at t i is denoted as α i (i = 0,1,,,35). We calculated sequences of the approach time {t i } and of approached target {α i } from spontaneous activity (0<t<10000). From these sequences, we calculated the transition probability P μν (S μ P μv = 1) and mean transition time T μν from the ν-th to the μ-th target. Further, we computed transition probability P' ab and mean transition time T' ab from category b to a according to