The authors have declared that no competing interests exist.

Conceived and designed the experiments: OS. Performed the experiments: OS DY. Analyzed the data: OS DY. Contributed reagents/materials/analysis tools: OS DY. Wrote the paper: OS DY.

Recurrent connections play an important role in cortical function, yet their exact contribution to the network computation remains unknown. The principles guiding the long-term evolution of these connections are poorly understood as well. Therefore, gaining insight into their computational role and into the mechanism shaping their pattern would be of great importance. To that end, we studied the learning dynamics and emergent recurrent connectivity in a sensory network model based on a first-principle information theoretic approach. As a test case, we applied this framework to a model of a hypercolumn in the visual cortex and found that the evolved connections between orientation columns have a "Mexican hat" profile, consistent with empirical data and previous modeling work. Furthermore, we found that optimal information representation is achieved when the network operates near a critical point in its dynamics. Neuronal networks working near such a phase transition are most sensitive to their inputs and are thus optimal in terms of information representation. Nevertheless, a mild change in the pattern of interactions may cause such networks to undergo a transition into a different regime of behavior in which the network activity is dominated by its internal recurrent dynamics and does not reflect the objective input. We discuss several mechanisms by which the pattern of interactions can be driven into this supercritical regime and relate them to various neurological and neuropsychiatric phenomena.

The recurrent interactions among cortical neurons shape the representation of incoming information, but the principles governing their evolution remain unclear. We investigate the computational role of recurrent connections in the context of sensory processing. Specifically, we study a neuronal network model in which the recurrent connections evolve to optimize the information representation of the network. Interestingly, these networks tend to operate near a "critical" point in their dynamics, namely close to a phase of "hallucinations", in which non-trivial spontaneous patterns of activity evolve even without structured input. We provide insights into this behavior by applying the framework to a network of orientation-selective neurons, modeling a processing unit in the primary visual cortex. Various scenarios, such as attenuation of the external inputs or increased plasticity, can lead such networks to cross the border into the supercritical phase, which may manifest as neurological and neuropsychiatric phenomena.

The anatomical abundance of lateral interactions [

An additional aspect of recurrently connected networks (relative to networks connected by feedforward links only) involves their dynamic properties. Networks with recurrent connections have been shown to form associative-memory related attractor states[

A central question regarding recurrent interactions, which has not yet been properly addressed, is how they evolve to facilitate the network’s computational capacity and what principles govern this evolution. Their optimal pattern within the network also remains unknown. In this work, we address these issues using a first-principle information theoretic approach, namely using the principle of maximum information preservation (also known as ‘infomax’ [

Tanaka et al [

The present work can be seen as a further extension of these earlier efforts, studying how the gradual development of a network’s recurrent interactions may optimize the representation of input stimuli. Unsupervised learning is applied to train networks to maximize the mutual information between the input layer and an overcomplete, recurrently connected output layer. The evolving pattern of recurrent interactions is investigated in a model of a hypercolumn in primary visual cortex, considered the basic functional unit of V1, which receives input from both eyes and contains a full representation of all possible orientations. Various combinations of input stimuli and network connectivity are examined, with the aim of studying their relationship with different network measures. Methods to evaluate the optimal pattern of recurrent interactions in a neural network model and its dependence on the statistics of the external inputs were extended from Shriki et al. [

The general scheme and many methods applied in this study can be viewed as a direct evolution of the earlier work reported in [

The basic network model consists of two layers of neurons,

(A) The general network architecture characterized by an overcomplete representation with _{i} (_{i} (

During the presentation of each input sample, the input components _{i} are fixed. The dynamics of the output neurons are given by
^{−x})). We assume that the activities of the output neurons reach equilibrium after some time and define the output as the steady-state pattern of activity. For the cases we studied, numerical simulations of the network dynamics indeed converged to a steady state, consistent with this assumption. The steady-state responses are given by

To evaluate the neuronal representation of the external inputs we used the mutual information between the input and output of the network [

The adaptive parameters of the algorithm are the sets of feedforward and recurrent interactions, _{ij} and _{ij}. The learning rules for these parameters are derived from this objective function using the gradient descent method, as shown in [^{−1}−^{−1} and satisfies χ = ^{T}^{−1}^{T}

We defined several measures to characterize the behavior of the network and gain further insight into its dynamics. As described in the Results section, after the learning process converges, the networks tend to operate near a critical point. Thus, it is helpful to define metrics that may behave differently when the networks approach that critical point. One such measure is the time it takes the recurrent network dynamics to reach steady-state—the

To gain insight in the present context, we note that near a steady state, the linearized dynamics (in vector notation) are given by

To estimate the convergence time, we defined a criterion for stability of the neuronal activities and measured the time it takes the network to satisfy this criterion. This stability criterion means that for each neuron in the network, the difference in its activity between the current time step and the previous time step is smaller than a predefined small number.
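The stability criterion above can be illustrated with a minimal sketch. It assumes a standard firing-rate model, tau * ds/dt = -s + g(h + K s), with a logistic g. This form is an assumption, since the dynamics equation did not survive in this text. The names `convergence_time` and `ring_coupling`, the Euler step `dt`, the tolerance `tol`, and the cosine-shaped coupling with its 1/N normalization are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math

def logistic(x):
    """Logistic squashing function g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def convergence_time(h_ext, K, tau=1.0, dt=0.1, tol=1e-6, max_steps=100_000):
    """Euler-integrate tau * ds/dt = -s + g(h_ext + K s), starting from s = 0.

    Returns (elapsed_time, s) at the first step where every neuron's activity
    changes by less than `tol` relative to the previous time step.
    """
    n = len(h_ext)
    s = [0.0] * n
    for step in range(1, max_steps + 1):
        drive = [h_ext[i] + sum(K[i][j] * s[j] for j in range(n)) for i in range(n)]
        s_new = [s[i] + (dt / tau) * (-s[i] + logistic(drive[i])) for i in range(n)]
        if max(abs(a - b) for a, b in zip(s_new, s)) < tol:
            return step * dt, s_new
        s = s_new
    raise RuntimeError("activity did not settle within max_steps")

def ring_coupling(n, k1):
    """Hypothetical cosine-shaped recurrent profile on a ring of n neurons."""
    angles = [2.0 * math.pi * i / n for i in range(n)]
    return [[k1 / n * math.cos(a - b) for b in angles] for a in angles]

# Weak tuned input h_i = eps * cos(theta_i) on a ring of 16 neurons.
n = 16
h = [0.01 * math.cos(2.0 * math.pi * i / n) for i in range(n)]
times = {k1: convergence_time(h, ring_coupling(n, k1))[0] for k1 in (0.0, 6.0, 7.5)}
```

Under these assumptions, the cosine mode of the linearized dynamics relaxes at rate 1 - g'·k1/2, and g' is close to 1/4 for weak input, so the mode loses stability near k1 = 8. The measured convergence time in `times` should therefore grow as k1 approaches that value from below.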

When the network becomes supercritical, it converges onto attractor states, which reflect the underlying connectivity. In the context of orientation tuning, which we study here, a natural measure to quantify this behavior is the population vector [

Similar to previous papers concerning training of networks over natural scenes [

The feedforward filters were set to be Gabor filters with the same center in the visual field and the same spatial frequency. The size of each Gabor filter was 25-by-25 pixels. The full feedforward matrix was a product of two matrices: a 380-by-625 matrix containing a Gabor filter in each row, which was multiplied from the right by a 625-by-100 matrix representing the reconstruction after the dimensionality reduction.
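The construction of such a filter matrix can be sketched as follows. Only the 25-by-25 patch size, the shared center and the shared spatial frequency come from the text. The wavelength, envelope width `sigma`, phase, the even spacing of orientations over 180° and the placeholder reconstruction matrix are assumptions chosen for illustration, and the function names are hypothetical. The patch size is assumed to be odd.

```python
import math

def gabor_filter(theta, size=25, wavelength=8.0, sigma=4.0, phase=0.0):
    """Return a size-by-size Gabor patch (odd size), flattened row by row."""
    half = size // 2
    c, s = math.cos(theta), math.sin(theta)
    patch = []
    for y in range(-half, half + 1):
        for x in range(-half, half + 1):
            xr = x * c + y * s        # coordinate along the sinusoidal carrier
            yr = -x * s + y * c       # coordinate orthogonal to the carrier
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma ** 2))
            patch.append(envelope * math.cos(2.0 * math.pi * xr / wavelength + phase))
    return patch

def gabor_bank(n_filters, **kwargs):
    """One filter per row, with orientations evenly spaced over 180 degrees."""
    return [gabor_filter(math.pi * k / n_filters, **kwargs) for k in range(n_filters)]

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

# Small stand-in for the (380 x 625) * (625 x 100) product described above:
# 4 filters, and a placeholder 625-by-3 "reconstruction" matrix.
bank = gabor_bank(4)
placeholder_reconstruction = [[1.0 if i % 3 == j else 0.0 for j in range(3)]
                              for i in range(625)]
feedforward = matmul(bank, placeholder_reconstruction)
```

In the setting described in the text, the bank would have 380 rows, and the right-hand matrix would come from the dimensionality-reduction step rather than from a placeholder.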

Close to the critical point, accurate simulation of the network dynamics requires a long time due to the phenomenon of

When the evolving networks approached a critical point, the objective function tended to be very sensitive to changes in the pattern of interactions. In some cases, the objective function could even increase rather than decrease, implying that the learning rate was not small enough. To overcome this problem, we calculated the expected value of the objective function before actually updating the interactions. When an upcoming increase was identified, the learning rate was halved and the check was repeated.
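The safeguard described above resembles a simple step-halving scheme for gradient descent. The sketch below illustrates the idea on a generic objective. The function name, the stopping threshold `min_lr` and the choice to keep the reduced learning rate for later steps are assumptions, not details given in the text.

```python
def descend_with_halving(objective, gradient, params, lr=1.0, steps=100, min_lr=1e-12):
    """Gradient descent that evaluates each trial update before applying it.

    If the trial update would increase the objective, the learning rate is
    halved and the trial is repeated. Returns (params, value, lr).
    """
    value = objective(params)
    for _ in range(steps):
        grad = gradient(params)
        while lr > min_lr:
            trial = [p - lr * g for p, g in zip(params, grad)]
            trial_value = objective(trial)
            if trial_value <= value:
                break                 # acceptable step found
            lr *= 0.5                 # upcoming increase detected: halve and retry
        else:
            break                     # learning rate exhausted; stop learning
        params, value = trial, trial_value
    return params, value, lr

# Toy usage: minimize f(x) = x^2 starting from x = 3 with a deliberately
# oversized initial learning rate.
final_params, final_value, final_lr = descend_with_halving(
    lambda p: p[0] ** 2, lambda p: [2.0 * p[0]], [3.0], lr=1.5
)
```

In this toy run the first trial step overshoots and is rejected, so the learning rate is halved once before any update is applied. After that, every accepted step lowers the objective.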

To establish the credibility of our model, we first identified conditions under which analytical and numerical results could be compared. This was achieved via a toy model of a visual hypercolumn, which is amenable to analytical solution in the limit of very low contrast. An important insight from this toy model is that in the low-contrast limit, optimal information representation is obtained at a critical point of the network dynamics. These results are then verified using numerical simulations of this simple model. Using a similar simulation approach, we next show that critical behavior also arises in a more complex setting, when natural images are used as inputs in the training phase.

The architecture of the network model is presented in _{0}, representing the orientation of a visual stimulus and amplitude (its distance from the origin), _{1},_{2}) = r(cos_{0},cos_{0}). For clarity, we consider periodicity of 360° rather than 180°, which is the relevant symmetry when considering orientations. The angles _{0} are distributed uniformly between 0 and 2

The network represents this two-dimensional input by _{ij}, _{i1},_{i2}) = r(cos_{i},cos_{i}) where _{i} = 2π_{i} and the network has a ring architecture (

The sensitivity matrix, _{i} given in

To investigate analytically the optimal pattern of recurrent interactions when the typical input contrast is low, namely when 〈_{ij} between the _{0} = _{0} = 1/4. Since the number of output neurons, _{i1} can be written as
_{i2}. We define the Fourier series of _{1}
_{n} = 0, where _{1} is the first cosine harmonic of the interaction profile, ^{T}_{1} approaches the critical value _{0} the objective function diverges to −∞. This means that the optimal pattern of recurrent interactions has the form
_{0}, the network settles into a homogeneous state with _{i} = _{0}, the network dynamics evolve into an inhomogeneous solution with a typical "hill" shape [

In the limit of 〈

Results of the toy model following gradient-descent learning that minimizes the objective function at a mean contrast of 〈

While running the numerical simulations, we noticed that the basic shape of the interaction profile appeared already at early stages of the training. During the rest of the learning process, the main factor that changed was the scale of the profile until it reached an optimal value. In this sense, although there were

When the mean input contrast during learning is neither too low nor too high, the recurrent interactions are less crucial for network performance.

Results of the toy model following gradient-descent learning that minimizes the objective function at a mean contrast of 〈

We next investigated a more complex network model of a visual hypercolumn (

The figure is organized similarly to

To characterize the network behavior after training it with natural images we examined its response to simple oriented stimuli.

We studied the long-term evolution of recurrent interactions in a model of a sensory neural network and their dependence on the input statistics. We found that under very general conditions, optimal information representation is achieved when the network operates near a critical point in its dynamics.

The study focused on a simplified model of a visual hypercolumn, a local processing unit in the visual cortex. The feedforward interactions from the input layer to the output layer were manually set such that each neuron in the output layer had a certain preferred orientation. The recurrent interactions among these neurons evolved according to learning rules that maximize the mutual information between the external input to the network and the network's steady-state output. When the inputs to the network during learning were natural images, the evolved profile of interactions had a Mexican-hat shape. The idea that neurons with similar preferred orientations should effectively excite each other and that neurons with distant preferred orientations should effectively inhibit each other has been suggested in the past based on empirical findings, e.g. [

A learning algorithm for information maximization in recurrent neural networks was also derived in [

As a model of the primary visual cortex, the present model is clearly oversimplified in many respects. For example, the gradient-based learning rules employed here are likely to be very different from the plasticity mechanisms in the biological system, but the assumption is that they reflect the long-term evolution of the relevant neural system and converge to a similar functional behavior. Despite its simplicity, the model provides a concrete setting for examining the role of recurrent interactions in the context of sensory processing. This leads to general insights that go beyond the context of early visual processing, as we discuss below.

The dynamics of recurrent networks, like the one studied here, can allow the network to hold persistent activity even when the external drive is weak or absent. The network is then said to display attractor dynamics. In the context of memory systems, attractors are used to model associative memory [

This tendency to operate near a critical point can be explained intuitively. The task of the network is to maximize the mutual information between input and output, which amounts to maximizing its sensitivity to changes in the external inputs. The network uses the recurrent interactions to amplify the external inputs, but excessive amplification may generate hallucinations. Thus, the learning process should settle at an optimal point, which reflects a compromise between these two factors. An interesting insight comes from comparing the network to physical systems that may experience phase transitions in their behavior. A universal property of these systems is that their sensitivity to external influences, or in physical terminology their

There are several important distinctions to be made when comparing previous research [

Second, here the critical point relates to the transition from normal amplification of external inputs to an attractor regime. In the supercritical regime, the network may present inhomogeneous activity patterns but it is not necessarily driven to saturation. In other words, the supercritical regime does not necessarily correspond to an explosive growth of the activity or to epileptic seizures. In the subcritical regime, the representation is faithful to the input and cannot generate hallucinations, but the activity does not necessarily die out. This should be compared with models based on branching processes, in which the supercritical regime generally refers to runaway activity and the subcritical regime refers to premature termination of activity. In the present model, the network may have a branching parameter of 1 in both the subcritical and supercritical regimes. In this sense, the type of criticality presented by this model can be thought of as a subspace within the space of all networks with branching parameter equal to 1. Furthermore, in contrast to [

The issues raised above call for future experimental and theoretical work aimed at elucidating the effect of input statistics on the approach to criticality and at characterizing the type of criticality that emerges. In particular, future modeling work should consider learning algorithms that optimize information representation in spiking and conductance-based neural networks, which have richer dynamics. An interesting approach for taking spike times into account is proposed in [

An interesting universal phenomenon that occurs when networks approach the critical point is a change in the effective integration times. As demonstrated here, close to the critical point the time it takes the network to settle after the presentation of an input is considerably longer. This phenomenon is termed

Clearly, because the neurons in our network are characterized by their firing rates, the network dynamics are not rich enough to display spatiotemporal patterns of activity like neuronal avalanches, synchronized firing or chaotic behavior. Nevertheless, the rate models can often be translated to more realistic conductance-based neuronal networks, which display similar dynamics [

In real-life biological settings, the pattern of recurrent interactions in a network can be driven into the supercritical 'pattern formation' regime as a result of several possible mechanisms. One possibility is via direct application of certain drugs that increase the effective synaptic efficacy. Bressloff et al. [

Another plausible scenario for approaching criticality is through a high degree of plasticity. In numerical simulations of the learning algorithm, an important parameter is the learning rate that controls the step size of the learning dynamics and can be biophysically interpreted as the degree of plasticity [

A third route to criticality is through attenuation of the external inputs. When the external inputs to the network are very weak the recurrent interactions at the output layer compensate by further approaching the critical point. This process increases the effective gain of the network but may lead to instabilities in the network dynamics and to false percepts. For instance, such a mechanism may play a role in the generation of hallucinations as a result of sensory deprivation. An interesting example in this context is

It is also interesting to discuss how a network that became supercritical can return to the normal subcritical regime. In principle, the gradient descent learning algorithm should drive the network to the optimal point even when it is supercritical. However, the learning is based on certain continuity assumptions regarding the mapping of input patterns to output patterns, which may be violated in the supercritical attractor regime. In particular, we assume that there is an invertible continuous mapping between input and output with a well-defined Jacobian matrix. Topologically, the output space may become disconnected, with different islands corresponding to different attractor states, making the mapping non-invertible and discontinuous. Under these conditions, the learning algorithm may not be able to optimize information representation and bring the network back to subcritical dynamics. A similar phenomenon might happen in real brains, preventing the intrinsic learning rules from returning the network to normal healthy dynamics.

Our findings suggest that optimal information representation in recurrent networks is often obtained when the network operates near criticality. This is consistent with a growing body of theoretical and experimental literature on near-criticality in the brain [