Phenotypes to remember: Evolutionary developmental memory capacity and robustness

There is increased awareness of the possibility of developmental memories resulting from evolutionary learning. Genetic regulatory and neural networks can be modelled by analogous formalism raising the important question of productive analogies in principles, processes and performance. We investigate the formation and persistence of various developmental memories of past phenotypes asking how the number of remembered past phenotypes scales with network size, to what extent memories stored form by Hebbian-like rules, and how robust these developmental “devo-engrams” are against networks perturbations (graceful degradation). The analogy between neural and genetic regulatory networks is not superficial in that it allows knowledge transfer between fields that used to be developed separately from each other. Known examples of spectacular phenotypic radiations could partly be accounted for in such terms.

Introduction Alan Turing, the father of machine learning, also formulated one of the most important mathematical models in developmental biology: the reaction-diffusion model for pattern generation [1]. This is striking because, although both gene regulatory networks [2][3][4][5] and associative neural networks [6][7][8] have extensive literature, only recently a conceptual analogy between evolutionary developmental processes and artificial neural network-based learning models has been articulated. [9][10][11][12][13][14]. Since development is the process whereby the phenotype is specified by the evolving genotype, late-evolved morphologies or functional capacities retain aspects of earlier stages ("memory") that were likely shaped by natural selection. These earlier stages might become reactivated if they are again useful in a different or a changing environment [15]. In this formulation evolutionary changes provide no novel structures that are nonhomologous to an ancestral or existing one [16,17], but allow for recursion. For instance, mimetic color patterns of an extinct morph of the butterfly Heliconius cydno, presumably as a result of human disturbance, can be reconstructed from wild-caught butterflies [18], meaning that the morph could recur in nature if the former conditions reappear. Also surprising is the repeatability of evolution among closely related lineages [19,20]. An iconic textbook example is the extraordinary morphological convergence associated with adaptation to distinct ecological niches in cichlid fishes [21], with a large taxonomic diversity in the African Great Lakes Tanganyika (the oldest radiation, around 9-12 Myr ago with about 250 species), Malawi (less than 0.8 Myr ago and over 700 species) and Victoria (about 700 species evolved within the past 15,000 years) [22].
The idea that developmental processes can retain a memory of past selected phenotypes [13], together with the exceptional ability of genomes to find adaptive solutions that quickly converge upon remarkably similar states ("attractors" [23]) in closely related lineages, clearly suggests a non-linear genotype-phenotype mapping capable of producing multiple distinct phenotypes [13,24]. Non-linearity is also the hallmark of reaction-diffusion (Turing) and signaling systems involved in patterning processes [25], and developmental evolutionary biology (evo-devo) views the genotype-phenotype mapping as highly non-linear [26,27]. Furthermore, it might not be farfetched to think of some sort of developmental memory in the cichlid adaptive radiation. The explosive diversification in Lake Victoria was predated by an ancient admixture between two distantly related riverine lineages, one from the Upper Congo and one from the Upper Nile drainage [28]. Many phenotypic traits known to contribute to the adaptation of different ecological niches in the Lake Victoria radiation are also divergent between the riverine species [29,30]. Thus, when referring to the anatomical and morphological variation of Haplochromine cichlids, which are at the origin of the Lake Victoria radiation [28], Greenwood writes [30, p. 266]: "It is amongst the species of these various lacustrine flocks that one encounters the great range of anatomical, dental and morphological differentiation usually associated with the genus. The fluviatile species appear to be less diversified, but even here there is more diversity than is realized at first." If the high diversity in the Haplochromine cichlids of Lake Victoria is, to some extent, the result of re-evolved (similar) phenotypes in the ancestral fluviatile lineages, then the enduring question of why such an explosive diversification happened within a short time interval might have a simpler solution than previously thought. We aim here to sketch what the solution could be.
The genomic program for development operates primarily by the regulatory inputs and functional outputs of control genes that constitute network-like architectures [31], which are mathematically equivalent to artificial neural networks [10,11]. Although the insights of Vohradsky [10,11] and Watson et al. [13] shed light on an important analogy between neural and genetic regulatory networks, the conclusion of the theory of autoassociative networks cannot yet be readily extended to developmental systems. This is because of the different state space representations, as well as the nature of the task to be solved. Models of autoassociative networks tend to work with positive/negative state variables (inherited from ferromagnetic systems, but see [6]). In contrast to this, in ontogenetic systems the relevant space is that of nonnegative real numbers, corresponding to concentrations of different molecules, see e.g. [32]. Due to the nonlinear activation function features of models working with the above mentioned, alternative state representations can markedly differ. Another important consideration is that autoassociative networks (as their name indicates) solve the problem of the recovery of a particular state (attractor property). During ontogeny we require something more: not only should the adult stage be stable, but the system should reach this state from a particular embryo state (referred to as heteroassociative property). Moreover the system has to find transitions from different embryo states to corresponding, different adults states. In short, we require networks that solve problems of auto/heteroassociativity in one.
In the present investigation we extend the theory of gene regulatory networks to involve both auto-and heteroassociativity. We derive a heuristic formula for regulatory weights to obtain a functional system with the desired properties and we follow how Darwinian dynamics shapes the regulatory networks to acquire these properties. We compare the resulting regulatory matrices and analyze their robustness against different kinds of perturbation. Evolution of developmental pathways is interpreted within the context of ontogenetic dynamics.

Developmental model
Our model is a formal description of ontogenetic development operating primarily by the regulatory inputs and functional outputs of control genes. Consider an organism with N genes. Its developmental state at time t, expressed by its gene product composition (e.g., proteins), can be represented by the vector p(t) = (p 1 ,p 2 ,. . .,p N ) T with each element being the quantity of the product of a gene. These quantities are assumed to change due to protein decay and gene expression processes. Following [13], the ontogenetic dynamics of the developmental state can be described by the difference equation where τ denotes the decay rate, τ denotes the maximal gene expression rate, f(.) is the activation function, and the matrix M stands for the regulatory network. An m ij entry of the matrix gives the regulatory effect of the product of gene j on the expression level of gene i; positive and negative elements imply activation and inhibition, respectively. The cumulative regulatory effects on any single gene i, i.e. the ith element of the product Mp, determine the gene expressions via a sigmoid activation function modelled here as f(x) = (1+tanh (ωx))/2, where ω is the slope parameter.
From an ontogenetic viewpoint, the role of the gene regulatory network is to guide the individual along a developmental pathway from an initial embryonic state p(0) = e to a specific adult state p(T)!a. In real systems, an ensemble of different developmental pathways is desired, each responsible for achieving some environment-specific adult state from a particular embryonic state. We used T = 150 iterations to reach the steady state, when the output does not change in time.

Evolutionary model
In the evolutionary model we considered a population of K individuals, with each member of the population represented by its regulatory matrix. All the interaction matrix elements were zero initially, representing an undeveloped regulation. Every individual shared the same environment, but the environment can change in time. We assumed Q = 3 number of different selective environments, each defining an embryonic state e (q) and a corresponding adapted adult state a (q) . The selective environments alternated randomly; if the average fitness of the population approached the optimum (w > 0:95 for at least 20 consecutive generations), or after 10000 generations, a new environment was chosen at random. In each generation the individuals underwent mutation, development and selection steps as follows.
Mutation: The mutation of the regulation network was implemented by adding a normally distributed random value, with zero mean and μ W variance, to a randomly selected matrix element. Matrix elements were clipped into the range [−1,1].
Development: The equilibrium, adult state of each member of the population was obtained by iterating Eq (1).
Selection: The fitness of individual k was expressed by a similarity index derived from the Euclidean distance between the actual adult state p(T) and the environment-specific optimal adult phenotype a (q) as w k ¼ 1 À ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi X N n¼1 p n ðTÞ À a ðqÞ Then the regulatory matrix of a randomly selected individual was replaced by that of the individual with the highest fitness (elitist selection). Although a stochastic Moran process [33] would be a more realistic selection scheme, for computational reasons simulations were performed using a relatively small population size (K = 100) that would result in too much genetic drift. Embryonic and (optimally adapted) adult vectors: The number of genes was N = 100 with a low average expression level of σ = 0.1, where 40% of the expressed genes were common, 20% were partially common (appear in two pairs only), and 40% were unique in all the embryonic and all the adult vectors. Specifically, the expression sites of the employed state vectors were e 1  where underlines and overlines denote the common and partially common elements, respectively. E.g. gene 13 is expressed in all embryo states (common element), gene 12 is expressed in two adult states (partially common element). The initial state was always a perturbed embryonic state. The perturbation was performed, similar to the mutations, by adding a normally distributed random value, with zero mean and μ e variance, to a randomly selected element of the environment-specific embryo vector. Vector elements were clipped into the range [0,τ/δ].

Perturbation analysis
To investigate the robustness of the resulting gene regulatory networks we evaluated their performance against three different kind of perturbations. The embryo states were perturbed by flipping the vector elements from low to high, or vice versa, with the given probability. The interaction matrices were perturbed by either adding random values to all matrix elements, drawn from a normal distribution with the given standard deviation, or by nullifying a proportion of the elements. Note that, in the evolutionary algorithm we perturbed only single elements of the embryonic states. In contrast, in the analytical matrix construction we perturbed all elements of the embryonic vectors to incorporate the accumulating effects of many consecutive perturbations on the interaction matrix.

Results
To perform the developmental task, the network must guarantee that (i) each adult state is a stable equilibrium point of the dynamics (stability condition), and (ii) each embryonic state is within the basin of attraction of its corresponding adult state (attraction condition); these two conditions correspond to the auto-and heteroassociative properties in a neural network [34]. Note that this is a more difficult task than a simple pattern recovery problem, which is known to be achievable by a neural network with the standard Hebbian learning rule that fulfils only the stability condition [7]. Not only must all the adult states have a basin of attraction, but these basins must include the corresponding embryonic states.
We found that the task-optimized structure of the regulatory network can be inferred from the embryo-adult state vector pairs in the form of an interaction matrix M (Fig 1). Consider the simplest case with one embryo-adult pair (i.e. one developmental pathway). Depending on whether a gene is expressed in the adult state or not, all the other expressed gene products, in either the embryonic or the adult state, must enhance or block its expression, respectively. This would provide, on the one hand, stability for the adult state and, on the other hand, attraction from the embryonic state. Note, however, that if a gene is expressed in neither the embryonic nor the adult state, then its regulatory effect is irrelevant, therefore the corresponding matrix elements are undetermined. In summary, an m ij element of the regulatory matrix M should be positive or negative, depending on whether the ith gene is expressed in the adult state or not, except when the jth gene is expressed in neither the embryonic nor the adult state. The above line of thought can be generalized for arbitrary Q number of embryo-adult state pairs. Denoting the zero-one normalized embryonic and adult state vectors by e and a, such a matrix can be obtained by averaging two dyadic products for all developmental pathways as ð2â ðqÞ À 1Þ �â ðqÞ þ ð2â ðqÞ À 1Þ �ê ðqÞ ; ð3Þ where Q stands for the number of embryo-adult state pairs and q denotes the different pairs. The first and second dyadic products are responsible for the stability and attraction conditions, respectively. Note the similarity of our treatment and Kurikava and Kaneko 2012 and 2013 approach [35,36]. Within each dyadic product the right term determines whether an entry is relevant from the viewpoint of the state vector, whereas the left dyadic term determines its sign. The resulting matrix contains positive values, negative values and zeros for activator, inhibitory and undetermined elements, respectively. Notice that the developmental pathways can be in conflict with each other as to whether a gene should be up-or downregulated by another gene. It is instructive to compare this formula with the standard Hebbian learning rule H = a�a for a i 2{−1,+1}. Its modification for a i 2{0,+1} vectors that preserves that stability condition is H = (2a−1)�a, which is identical to the first term in Eq (3), c.f. Table 1, and see [6].
(This is the standard procedure in ANNs with unsigned states as it converts the training patterns to signed values even though the state vector is unsigned. Natural selection will not operate like this-we are just showing how to hand-construct a solution that works.) We investigated the parameter dependence of the analytic model. As for the regulatory matrix we used a slightly modified version of Eq (3). Treves [8] claims that the interaction terms should be modified by the average expression σ, i.e. the proportion of expressed genes. This is because if a larger proportion of genes is expressed, then proportionally smaller interaction strengths are needed for the same regulatory effect on any single gene. Incorporating this  The performance of a regulatory network constructed by the above rule changes with the number of developmental pathways and gene expression levels (Fig 2). With increasing number of embryo-adult pairs, the accumulating conflicts between them inevitably corrupt the regulatory ability of the network; some adult states will be unreachable from their embryonic states. Nevertheless, the network is able to tolerate a fair number of conflicts, related to its structural stability. Since conflicts can occur only between non-orthogonal state vectors, the performance of the network also depends on the amount of overlap in the expression patterns of states belonging to different pairs. This highlights the importance of the proportion of expressed genes; i.e., the sparseness of the state vectors. If these vectors are very sparse, then they are likely to be orthogonal, therefore the number of learnable embryo-adult pairs does not reduce. The number of learnable pairs is a decreasing function of sparseness, because the numerous nonorthogonal state vectors and the largely different gene expressions in the adult states lead to several conflicts among them. Regarding the effect of system size on functionality, the results are in line with the expectations; the higher the number of genes, the higher is the number of "error free" developmental pathways (Fig 3). We have also analyzed the "memory capacity", i.e. the number of learnable developmental pathways as a function of the system size (N). There are several studies on the memory capacity of Hopfield networks [e.g. 37,38], but these results are not applicable to our hetero/autoassociative system, and the capacity definitions themselves are valid only in the N!1 limit, while our systems are relatively small (N�500). Therefore, we define the memory capacity of a given system as the average number of developmental pathways which can be reconstructed at r = 0.95 performance (denoted by Q 0.95 , c.f. Fig 2C).   [39]. Notice, however, that these values should be compared with caution on one hand due to the different interpretation of memory capacity, and on the other hand owing to the "double task" nature (heteroand autoassociative tasks) of our system.
A key question is whether a functional network is attainable by Darwinian selection via a series of mutation-selection steps. In our evolutionary model we used a more realistic Darwinian dynamics than the solitary stochastic hill climbing [13]. From the viewpoint of the theory of artificial neural networks this process can be regarded as a Darwinian dynamics-driven learning process. The evolutionary algorithm yields interaction matrices that contain positive and negative values where the heuristic formulation predicts them (Fig 5). While the individual interaction matrices vary, their average is in line with the heuristically derived matrix. The values are arranged into a characteristic structure; positive and negative entries form horizontal stripes, intermitted with vertical stripes of near-zero values (c.f. Fig 1). Those genes have the

PLOS COMPUTATIONAL BIOLOGY
Phenotypes to remember: Evolutionary developmental memory capacity and robustness largest effect on the developmental process, which are expressed in any embryonic or adult states (c.f. marked columns in Fig 5). Depending on whether the affected gene is expressed in any of the adult states, they have a strong positive or negative effect (c.f. marked rows in Fig 5). The rest of the genes drift freely in individual realizations due to a lack of selective pressure. Consequently, the average values in these positions are approximately zero (c.f. grey columns in left panel of Fig 5). The corresponding values in the analytic treatment (undefined elements) are zero by definition. The only major difference from the heuristic matrix is that the main diagonal elements of the evolutionary matrix are mainly negative, which means that the expression of every gene is under negative feedback by its own inhibitory product. A possible explanation is that without a strong negative feedback a gene could be easily overexpressed due to the perturbations of the interaction elements. This is more probable if the sparseness of the expression vectors is low, as it was in our case. This picture is likely to change with hierarchical developmental regulation, the evolution of which takes longer time and should be investigated in the future.
A detailed view of the evolutionary process is shown in Fig 6. During the early generations, where the gene regulation is undeveloped, it takes many generations (i.e., mutation-selection steps) to approach the environment-specific optimum. In addition, selection for one environment can have adverse effects on performance in another environment if the basin of attraction of the actually selected adult state engulfs the neighborhood of the embryonic states of other adult states. But those interactions which are not beneficial in any of the environments are eliminated. Deleterious mutations may arise any time also in a well-functioning system, but selection eliminates them over the timescale of a few environmental changes.
A developmental process must be sufficiently robust against stochastic perturbations of both the embryonic state and the gene interaction matrix [5,12]. It requires that the neighborhood (according to a given metric) of the embryonic states must also be in the basin of their

PLOS COMPUTATIONAL BIOLOGY
Phenotypes to remember: Evolutionary developmental memory capacity and robustness corresponding adult states. Therefore, some inputs of variation should produce little or no phenotypic variation at all, a phenomenon that has received a lot of attention under the labels of canalization, robustness or buffering [31,32,[40][41][42]. The recovery performance of the network changes with increasing amount of perturbations (Fig 7). The system is very robust against perturbations regarding the embryonic state, it is moderately robust against additive perturbations and has limited robustness against eliminated interactions regarding the interaction network. The sharp difference between the robustness of evolutionary and analytic models in the latter case is the consequence of the peculiarities of the evolutionary process. In the evolutionary matrix the regulatory weights are less evenly distributed as compared to the analytic one, due to the stochastic nature of the evolutionary process. It makes the system more sensitive to eliminating perturbations. Resilience understandably decreases with the number of developmental pathways in all cases, but conforming to "graceful degradation" in artificial neural networks; i.e., performance first decreases mildly and drops fast only beyond a critical strength of perturbation [7]. To sum up, variation is apportioned into discontinuous (basins of attraction) and continuous (small perturbations around the target) phenotypes (Fig 6B). Evodevo mainly focuses on the first kind of variation whereas standard evolutionary genetics focuses on the second [26,43].

Discussion
Treating gene regulatory networks as formally analogous to artificial neural networks [10,11] allows translating the well-known dynamics of the latter [44] to model genomic programs for development. There is widespread natural variation in morphogenic pathways [45], and the developmental memory of past selected phenotypes [13] is akin to the memory capacity of neural networks. This developmental memory allows populations to re-evolve phenotypes much faster than it would be possible if they had to evolve de novo. Previous speculation on the effect of the heat-shock protein Hsp90 as a capacitor for releasing hidden morphogenetic variation that could allow fast morphological radiations [45] has been criticized on the grounds that the function of Hsp90 is to prevent morphological aberrations. Furthermore,

PLOS COMPUTATIONAL BIOLOGY
Phenotypes to remember: Evolutionary developmental memory capacity and robustness some sense of purposive evolution, fully incompatible with the lack of foresight of natural selection, lays behind this sort of interpretations [46].
These criticisms do not apply here because in our developmental model past selected states can recur in the population if they appear useful again in a different environment or body context. As any theoretical model, ours obviously has inherent limitations and highly simplifies the representation of biological systems. However, to the extent that it captures sufficient conditions to generate the phenomenon of morphological radiations, more complex explanations are not required. Thus, the assumption that structural novelties (or "key innovations") are associated with adaptive radiations into new ecological niches (e.g. [47, p. 159]) might be

PLOS COMPUTATIONAL BIOLOGY
Phenotypes to remember: Evolutionary developmental memory capacity and robustness unwarranted. There is a noteworthy implication in the foregoing consideration for the understanding of atavism. Crocodilian teeth can grow in mutant birds, which suggests the reactivation of the associated developmental machinery [48], that required the resurrection [49] of a key aspect of regulation. The same neurons participate in the storage of different engrams in neural networks. The same holds for the storage of devo-engrams in genetic regulatory networks. Resurrection leading to atavism requires only limited reactivation of a few connections in a network that is maintained by the current selective forces. An exciting question is how evo-devo learning can generalize from the "training set" (previously selected target phenotypes) to novel ones [13,50]. The prediction described by these authors is that generalization potential works within a set that can be characterized by the same formal grammar.
While the theory of neural networks can (and does) infer the same conclusions based on different representations, in the case of modelling real biological situations the adequacy of the representation can be crucial (the same holds for neuronal networks). Our results show that a linear change to the representation has profound impact on the essential features of the system. While in the customary (neural) {-1,+1} representation there are no neutral elements in the interaction matrix, the biologically adequate {0,+1} representation of genetic regulatory networks allows for the free choice of interaction elements being opposite to "0". This feature turns out to increase the robustness of the system against the disturbance of interaction coefficients only if the system is very sparse, which guarantees the commonly zero elements in the embryo and adult states. Another feature of our representation is the large number of different interaction matrices entailing the same developmental process, thus evolution "from scratch" does not face so many constraints. In other words, starting with a single ancestor an extraordinarily rapid morphological diversification could be attained, which is the hallmark of adaptive radiations.