Multiscale representations of community structures in attractor neural networks

Our cognition relies on the ability of the brain to segment hierarchically structured events on multiple scales. Recent evidence suggests that the brain performs this event segmentation based on the structure of state-transition graphs behind sequential experiences. However, the underlying circuit mechanisms are poorly understood. In this paper, we propose an extended attractor network model for graph-based hierarchical computation, which we call the Laplacian associative memory. This model generates multiscale representations for communities (clusters) of associative links between memory items, and the scale is regulated by the heterogeneous modulation of inhibitory circuits. We analytically and numerically show that these representations correspond to graph Laplacian eigenvectors, a popular method for graph segmentation and dimensionality reduction. Finally, we demonstrate that our model exhibits chunked sequential activity patterns resembling hippocampal theta sequences. Our model connects graph theory and attractor dynamics to provide a biologically plausible mechanism for abstraction in the brain.
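For concreteness, the following is a minimal, self-contained sketch (not the paper's code; the toy graph and all variable names are our own) of how graph Laplacian eigenvectors segment a graph into communities:

```python
# Minimal sketch: community segmentation with graph Laplacian (GL) eigenvectors.
# Toy graph (our own example): two 4-node cliques joined by a single edge.
import numpy as np

A = np.zeros((8, 8))
A[:4, :4] = 1
A[4:, 4:] = 1
np.fill_diagonal(A, 0)           # no self-loops
A[3, 4] = A[4, 3] = 1            # bridging edge between the two communities

L = np.diag(A.sum(axis=1)) - A   # unnormalized graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
fiedler = eigvecs[:, 1]                # eigenvector of the 2nd-smallest eigenvalue
print(np.sign(fiedler))                # its sign pattern splits the two communities
```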

between neurons. This interpretation, however, is valid only in the above value range, so models with α < −1 are not biologically meaningful, as the reviewer suspected.
Nevertheless, we performed simulations for α < −1 to check the mathematical properties of the model over the whole range of α. These simulations validated the results of our mathematical analysis, which indicate that meaningful memory recall occurs only in the range α > −1, in agreement with the biologically meaningful range of the parameter.
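For illustration, here is a rough sketch of how α enters such simulations (a hypothetical simplification under our reading of the model, not the exact equations in the Methods; all names are illustrative):

```python
# Schematic LAM-style network (hypothetical simplification, not the paper's code):
# autoassociation plus alpha-weighted heteroassociation along graph edges,
# with a global inhibition term gamma.
import numpy as np

def lam_weights(patterns, adjacency, alpha, gamma):
    """patterns: (P, N) array of +/-1 patterns; adjacency: (P, P) item graph."""
    P, N = patterns.shape
    W = patterns.T @ patterns / N                         # autoassociation
    W += alpha * (patterns.T @ adjacency @ patterns) / N  # heteroassociation
    return W - gamma / N                                  # global inhibition

def recall(W, cue, steps=100):
    x = cue.copy()
    for _ in range(steps):
        x = np.where(W @ x >= 0, 1, -1)   # synchronous parallel update
    return x
```

Sweeping alpha across −1 in a sketch like this is how one can numerically probe where meaningful recall breaks down, which is what the simulations described above did.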

If I understand it correctly, this novelty index is a way to set the adjacency matrix of the graph and as such set H? Could you elaborate on s(mu,nu) used in the methods in equation 30?
Reply -We are not sure whether we accurately understood the point raised by the reviewer. The index s(μ, ν) merely represents the correlation between the two attractor patterns triggered by cue patterns μ and ν when we use the LAM, or the cosine similarity between the two low-dimensional representation vectors when we use the Laplacian eigenmaps (namely, the embedding of nodes using GL eigenvectors with low eigenvalues). To avoid confusion, we have revised the explanation of the novelty index in the Methods section (L805-L819). We hope the revised description aids the understanding of the method.
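As a sketch of this reading (variable names are ours, not the paper's):

```python
# Sketch of the index s(mu, nu) as described above: cosine similarity between
# two representation vectors (attractor states in LAM, or node embeddings in
# Laplacian eigenmaps). Illustrative only.
import numpy as np

def cosine_similarity(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def eigenmap_embedding(L, dim):
    """Embed nodes with the `dim` GL eigenvectors of lowest nonzero eigenvalue."""
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1:dim + 1]   # row mu = embedding of node mu

# Under the eigenmap reading: s(mu, nu) = cosine_similarity(Y[mu], Y[nu]),
# where Y = eigenmap_embedding(L, dim).
```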

-------------------------------------------------------------------------------------------------
Reviewer #2: Summary: In ``Multiscale representations of community structures in attractor neural networks,'' the authors study the brain's ability to represent hierarchical structure through the use of Laplacian associative memory (LAM).

Reply -We thank the reviewer for the valuable comments on our manuscript.

Reply -We performed additional simulations in the revised manuscript, in which we systematically generated hierarchical community structures of random graphs by using a stochastic block model. We simulated our model on random graphs with various numbers of nodes, communities, and hierarchies, and with different choices of random seeds (to generate different realizations of the graph structure). We summarized these results at L315-L345 in the main text and also in Figs. 4, S7, and S8. We observed the same tendency across all the random graphs simulated in this study. We hope these new results alleviate the reviewer's concern about the generalizability of our model.
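As an illustration of this procedure (the parameters and library choice are ours, not those of the paper), such hierarchical random graphs can be generated as follows:

```python
# Illustrative two-level hierarchical community structure via a stochastic
# block model (example parameters, not those used in the paper).
import networkx as nx
import numpy as np

sizes = [10, 10, 10, 10]              # four bottom-level communities
p_in, p_mid, p_out = 0.8, 0.3, 0.05   # within / sibling / distant edge probs
# Communities (0,1) and (2,3) form two higher-level clusters.
probs = [[p_in,  p_mid, p_out, p_out],
         [p_mid, p_in,  p_out, p_out],
         [p_out, p_out, p_in,  p_mid],
         [p_out, p_out, p_mid, p_in]]
G = nx.stochastic_block_model(sizes, probs, seed=0)  # vary seed for realizations
A = nx.to_numpy_array(G)              # adjacency matrix fed to the model
```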
Reply -In Hopfield-type associative memory models, a monotonic decrease of the energy function is rigorously guaranteed if the synaptic matrix is symmetric and the state of each neuron undergoes an asynchronous sequential update (Hopfield, 1982). To accelerate the simulations, we used a synchronous parallel update in this study. Practically, however, it is known that the parallel update of neural activities does not significantly change the behavior of the model. Indeed, in the mean-field approximation, the synchronous and asynchronous update schemes yield the same fixed-point equations.
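The two update schemes, in sketch form (generic Hopfield dynamics, not the paper's code):

```python
# Asynchronous sequential update (energy provably non-increasing for
# symmetric W) versus synchronous parallel update (faster in practice).
import numpy as np

def update_async(W, x, rng):
    x = x.copy()
    for i in rng.permutation(len(x)):   # one neuron at a time, random order
        x[i] = 1 if W[i] @ x >= 0 else -1
    return x

def update_sync(W, x):
    return np.where(W @ x >= 0, 1, -1)  # all neurons at once
```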
-Rigor of notation and derivations: Several times in the text, I noticed the usage of equal signs when approximation signs should have been used. This is seen in Figure 3D, where the pattern overlap is an approximate sum of the 2nd, 3rd, and 4th GL eigenvectors, but an equality is used. Another example is the statement in

Reply -As we show in the additional simulations, N must be large enough to obtain stable performance. The required number depends on the number of embedded patterns, as in conventional Hopfield models. For the image segmentation task, we embedded a larger number of patterns (corresponding to the number of pixels) than in the other simulations, so we had to increase the number of neurons. We could have used N = 30000 for all simulations, but this would unreasonably increase the computational time, especially when we simulate the model with various parameter settings.
The pitch of discretization η could also have been set to 0.01 in all simulations, but we used 0.1 for the asymmetric LAM to accelerate the simulations. This larger value could have increased noise and impaired the stability of attractors. However, this particular value did not cause any problems in the present simulations of sequential retrieval dynamics.
In contrast with N and η, it was hard to determine a single suitable value of γ common to all cases. In our model, the additional global inhibition term helps to generate sparse representations for large values of α, but too strong an inhibition obviously impairs the model's behavior. The optimal value of γ depends on the task and on the choices of the other parameters, but we have not completely understood this dependence. Therefore, we manually tuned these values in this work. The analysis of optimal parameter settings in LAM is an interesting open question that we wish to address in future studies. We discussed this point in the Discussion section (L561-L566).

Reply -We thank the reviewer for the encouraging comments.

correctly, this is exactly what Schapiro's RNN does). If the distinction is that for LAM the order of presentation of the memory patterns is irrelevant, then that should be highlighted, as opposed to the one-dimensional chain special case.
Reply -As the reviewer pointed out, our model is potentially applicable to non-sequential presentations of memory items. The model is applicable if the heteroassociative and inhibitory connections are created by any learning scheme that yields the symmetric Hebbian connection matrix hypothesized in this study (a schematic sketch follows this reply). Though we did not specify a particular learning mechanism, we assumed that neural circuits in the brain learn the statistical structure of episodes through sequential experiences, as supported by the existence of such a strategy in the human brain (and probably also in the brains of monkeys and rats). As the reviewer suspected, our model may not be novel in that respect. However, the main finding of this study resides in the recall process of hierarchical memory structures rather than in the learning process. In particular, our model offers flexible modulation of the scale of representations during the recall of community structures. Importantly, the representation scale need not be specified during the learning process. To our knowledge, no previous model provides this flexible recall of hierarchical community structures at multiple representational scales. Therefore, we believe that our model differs essentially from the previous models, even though those models were built in the same spirit.
We agree with the interpretation of "one-dimensional chain" in the sense that all sequential experiences are essentially one-dimensional chains of inputs. However, this does not necessarily imply that the structure behind sequential experiences is also one-dimensional. For example, in Schapiro's experiment, sequences of sensory inputs were generated by a random walk on a complex graph. Each sequential experience was a one-dimensional chain of items, but the underlying graph structure was not one-dimensional.
In contrast, the previous associative memory network model by Griniasty et al. (1993) was structurally one-dimensional because it modeled an experiment in which a fixed sequence of sensory inputs was repeated (Miyashita, 1989). That experiment is equivalent to the sensory experience of going back and forth on the same one-dimensional track, whereas Schapiro's experiment is essentially different. We extended the associative memory network model from experiences on a one-dimensional chain to experiences on more general graph structures, such as those in Schapiro's experiment. To clarify this point, we have revised the related descriptions in the Results section (L136-L139).
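The sketch referred to above: a symmetric Hebbian heteroassociation matrix accumulated from a random walk on a graph, in the style of Schapiro's experiment (illustrative code, not a claim about the biological learning rule):

```python
# Build a symmetric Hebbian association matrix H from sequential experience
# generated by a random walk on a graph with adjacency matrix A.
import numpy as np

def hebbian_from_walk(A, steps, rng):
    P = A.shape[0]
    H = np.zeros((P, P))
    state = rng.integers(P)
    for _ in range(steps):
        nxt = rng.choice(np.flatnonzero(A[state]))  # random-walk transition
        H[state, nxt] += 1
        H[nxt, state] += 1    # symmetric: presentation order is irrelevant
        state = nxt
    return H / H.max()        # recovers the adjacency structure up to scale

# Example on a complete 4-node graph:
H = hebbian_from_walk(A=np.ones((4, 4)) - np.eye(4), steps=10000,
                      rng=np.random.default_rng(0))
```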
2. The interpretation of the sign of alpha throughout the paper seems a bit confusing without considering alpha_max or gamma - e.g., from Eq. 3, if alpha_max is very large, alpha = -1 means only local inhibition, but alpha = 0 means mostly local inhibition. Also, in equation 2, the effect of alpha clearly depends on gamma.
Basically, it's unclear to me if the setpoint of zero carries any special significance (this is also evident in figure 2).
Reply -We agree with the point raised by the reviewer. We regarded zero as a threshold because one of the major novel points of our work is the consideration of the negative region of alpha. However, both theoretically and empirically, α = 0 is not a phase-transition point of the model.

4. Some results in the paper, in particular figures 4, 5, and 6, are largely qualitative.
It would be useful if the authors computed relevant summary statistics and performed the corresponding statistical tests.
Reply -In previous studies of subgoal discovery, the definition of subgoals itself depended on the graph Laplacian (e.g., Simsek et al., 2005; Machado et al., 2017). There is no ground truth in such a case, and therefore these studies did not attempt to evaluate the accuracy of subgoal discovery. To evaluate the goodness of subgoals, we should investigate, for example, the performance of reinforcement learning based on the subgoals identified by LAM. This is an ongoing project in our laboratory, and we will report the results in the future.
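In sketch form, the flavor of those GL-based definitions (our paraphrase, not the cited algorithms): bottleneck-like subgoal candidates sit on edges where the Fiedler vector changes sign.

```python
# Sketch: edges crossing the spectral partition, whose endpoints are natural
# bottleneck/subgoal candidates (illustrative, not Simsek's or Machado's code).
import numpy as np

def bottleneck_edges(A):
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
    fiedler = np.linalg.eigh(L)[1][:, 1]    # 2nd-smallest eigenvector
    edges = np.argwhere(np.triu(A) > 0)
    return [(i, j) for i, j in edges if fiedler[i] * fiedler[j] < 0]
```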
We showed the simulation results on theta sequences for comparison with the experimental data (Fig. 7 in Gupta et al., 2012), but that experimental study did not provide summary statistics or statistical tests. Our simulation settings were also simplified compared with the experimental settings. Therefore, we only showed qualitative matches in the present study.

Relatedly, it would be useful if the authors put neural data side-by-side with the simulation results in Fig 6.

Reply -Following the suggestion by the reviewer, we have put the experimental data (Fig. 7 in Gupta et al., 2012) side-by-side with our simulation results (Fig. 8L in the revised manuscript). We feel that this modification has made the biological relevance of the model clearer. We thank the reviewer for the advice.

in the bottleneck and the overrepresentation model, the additional nodes do not have their own simulated patterns but are rather set artificially (line 591). I understand why this would be convenient for computational tractability, but it seems a bit misleading, since the emergent similarity of the attractor states corresponding to those patterns is crucial for the result.