φ-evo: A program to evolve phenotypic models of biological networks

Molecular networks are at the core of most cellular decisions, but are often difficult to comprehend. Reverse engineering of network architecture from their functions has proved fruitful to classify and predict the structure and function of molecular networks, suggesting new experimental tests and biological predictions. We present φ-evo, an open-source program to evolve in silico phenotypic networks performing a given biological function. We include implementations for evolution of biochemical adaptation, adaptive sorting for immune recognition, metazoan development (somitogenesis, hox patterning), as well as Pareto evolution. We detail the program architecture based on C, Python 3, and a Jupyter interface for project configuration and network analysis. We illustrate the predictive power of φ-evo by first recovering the asymmetrical structure of the lac operon regulation from an objective function with symmetrical constraints. Second, we use the problem of hox-like embryonic patterning to show how a single effective fitness can emerge from multi-objective (Pareto) evolution. φ-evo provides an efficient approach and user-friendly interface for the phenotypic prediction of networks and the numerical study of evolution itself.

• minimal_project defines empty function for testing, and is a good starting point for user-defined functions. Notice the Notebook ProjectCreator also allows to define other evolutionary parameters such as initial topologies These project directories contain all necessary files to perform evolutionary runs similar to the ones presented in the main text and in previous studies. They can either be cloned using git or downloaded directly using the φ-evo download tools.
To use the download tools, the φ-evo library must be correctly installed. The following python script will download everything required for the evolution of a given project (here the adaptation one) on a local computer: in a python shell (e.g. Ipython). An evolution is run with the run_evolution script. The project path is specified using the -m option. The following command # s h e l l p y tho n 3 r u n _ e v o l u t i o n . py −m e x a m p l e _ a d a p t a t i o n launches an evolutionary simulation with parameters stored in the directory project example_adaptation .
Instead of running an existing project one can also create a new project de novo as described in the documentation. To simplify the process of starting a new project, the ProjectCreator notebook can be used: # s h e l l j u p y t e r notebook P r o j e c t C r e a t o r . i p y n b A short tutorial video explaining how to create a biased version of the somitogenesis project with the ProjectCreator notebook is availabe on Youtube.
Once the simulation is finished, the results can be easily analyzed with the help of a jupyter notebook, e.g.: # s h e l l j u p y t e r notebook −−NotebookApp . i o p u b _ d a t a _ r a t e _ l i m i t =10000000000 AnalyzeRun . i p y n b A more comprehensive guide on how to analyze the results is available here. A short tutorial video explaining how to run the code and the analysis on a simple example is available on Youtube.

Immune "add-on"
We include implementation of in silico evolution of an immune recognition system similar to [4] as an independent package, where new interactions are defined as external add-ons. This illustrates how new interactions and new problems can be encapuslated for specific problems. A specific Notebook is also included for visualization. Check README in directory example_immune for more details. We show one result of the evolution of such network on Figure  (Left) We display the network topology found by φ-evo combining a proofreading cascade (nodes 8 → 9 → 10) with the regulation of one kinase. An adaptive sorting motif similar to what was observed in [4] has evolved, where the complex (node 8) formed by the ligand (node 0) receptor (node 1) interaction phosphorylates the kinase (node 4), which is implicated downstream of the proofreading cascade. (Right) The behaviour displayed shows the concentration of all species as a function of the number of ligands presented for two types of ligands (binding times τ = 3s and τ = 10s): the output concentration (node 10) displays characteristic flat lines at different levels for different binding times, allowing for discrimination.

Examples of runs
Because some simulations can take time and ressources to run, we also provide the several full evolutionary runs we used to generate the figures of the manuscript: The results can also be visualized using the AnalyzeRun.ipynb notebook or by following the instructions in the documentation.

Hox pareto
The complete simulation for the Hox Pareto taking a lot of disk space, only a portion of the original results is accessible through φ-evo . You can manually download the complete simulation here.

Supporting information : Pruning on somitogenesis and adaptation evolution
We simplify the end results of the simulations with an extra step of evolutionary pruning. Starting from the best network obtained from a simulation (somitogenesis or adaptation) the simulation is restarted allowing only edge and node removal and possibly parameter mutations. The purpose of the procedure is to get rid of the unnecessary complexity of the network obtained from the original simulation. The pruning did not reduce the structure of final network of the adaptation evolution.
On the somitogenesis example, the pruning removed an inhibition from O 3 on O 1 . O 1 needs an activation from S 2 in order to be present. A inhibition of O 3 on S 2 is enough to lead to a reduction of O 1 production. Figure S 2 shows the network before undergoing a pruning evolution.

Choice of the γ parameter
As said in the main text, evolution requires a smooth fitness landscape. It thus depends on the choice of the parameters β and γ, quantifying respectively the benefit and the cost of the protein production.
In a binary framework, there are four possible inputs configuration that we can write in the form of 00, 01, 10 and 11 (0/1 corresponding respectively to absence/presence of the Inupt). For each of these four input configurations, taken in this order, we can also express the Output configuration for a given logical gate. For instance, the AND logical gate can be encoded with 0001, meaning that out of the 4 input configurations, the network should respond only to the last one (11).
If we want to evolve such a logical gate, we need to adjust β and γ so that the corresponding fitness is minimal. Fixing β = 1, and assuming that all input configurations have equal weight (rescaled to 1), we thus expect the following logical gates to have fitness equal to Logical Gate truth table fitness False 0000 0 We see that the best (lowest) possible fitness is obtained for the AND gate as expected by design. But if we want a smooth (downhill) landscape leading to this AND gate, we should ensure that intermediate steps leading to it are favorable, e.g. that gates such as Input1 or Input2 alone should be lower than the non responding gate that is permanently off. This imposes that γ < 1. The fitness gap between the best and the second best solution being γ, we also choose this parameter as large as possible to have a significant selective pressure. Since it needs to be at the same time significantly lower than 1, 3 4 represents a good compromise that we take.

Dynamics of evolution
We present dynamics of the evolution for one successful run on Figure S3. Given the discussion on γ above, we would expect to see two plateaus of fitness for the development of the two branches of the AND regulation (from 0 to γ − 1 to γ), however we always see only one sudden jump from 0 to the final fitness. When considering the network structure close to the drop in fitness, we actually see that the two parts of the regulatory mechanism appear close to each other in evolutionary time after a long period of unsuccessful trials. Starting at generation 380, it is only a matter of 20 generations to evolve the optimal topology as can be seen on Figure S 3. On other runs, the path can be slightly more involved but usually takes less than 50 generations. More generally, the evolution of fitness for different simulations typically exhibits a series of sudden burst corresponding to real innovations in terms of topology of the network that are followed by a characteristic relaxation phase where the parameters of the new topology are finely tuned to minimize the fitness.

Supporting information -Pareto front for evolution of Hox-like domains
The following calculation generalizes the one originally published in the Supplement of [3] to explain the shape of the Pareto front discovered here by φ-evo .  A). It is interesting to see that the two branches of the regulation appear within a small number of generations, thus starting the sudden fitness decrease which is followed by a slow parameter fitting corresponding to the relaxation shape.
The selection should drive the evolution toward networks in which the anteriorposterior axis should present as much diversity as possible in term of gene expression. However one also wants every cell to be specialized and thus to express one or at least only a few genes.
2 nd f itness : Based on the local entropy: Given the probability definitions: With c i,x , p i , p i|x and L being respectively the concentration of protein i in cell x, the global probability of protein i, the local probability of protein i in the cell x and L the number of cells in the embryo. The 1/L factor assigns the same weight to every cell. We consider the simple scenario where only the relative concentrations of the genes already expressed in a region is changed by a mutation. We also assume genes are only expressed in a unique region R of size K cells. Let evaluate a global fitness as in [3]: In a the region R where the protein i is expressed, let us rewrite p i|x = a i . For the cells outside of R, p i|x = 0. It also leads to p i = Ka i .
The contribution of R to the global fitness is And because the genes are present in only one region, the normalization condition ∑ i∈R a i = 1 applies: A change in the relative concentrations of the genes present in the region R affects the weights a i but not the size K. The normalization being still verified, the total fitness F remain unchanged.
As a consequence, a mutation that only changes the relative concentrations of the genes in a given region moves the network on a straight line of slope −1 in the (−F 1 ,F 2 ) space. This explains why most mutations localize on this Pareto front.

Quick introduction on the structure of the networks
Lastly in the following we very briefly review the main types defined in Python to encode the networks. More details can be found in the documentation of φ-evo .

Species
: classes_eds2.py Parameters: List of species types types List of possible tags. Species may have the following tags (the term in parenthesis indicate the existence of a second parameter attached to the tag and its type): Species A generic tag, mainly used for internal operation Degradable (float degradation rate), indicate a passive degradation of the species with a rate given as parameter. Note also that only Degradable species may be subjected to an active degradation (see 5.3.4) by an enzyme.
TF (bool activity), indicate that the species is a possible transcription factor, the activity indicate if it is an activator (1) or a repressor (0).
Kinase mark the species as a kinase, a species susceptible of phosphorylating others.
Phosphatase mark the species as a phosphatase, a species susceptible of dephosphorylating others.
Output (int nput), the species is a possible output, the index indicate the order in which the different outputs are chosen, if only one is needed to compute the fitness, the output tagged as 0 will be used.
Input (int nput), the species is an input, the index indicate the order in which the different inputs are used by the fitness.
Complexable The species is susceptible to bind to others to form Complex.

Complex
The species is the product of a PPI.
Phospho (int nphospho), indicates a phosphorylated species the parameters denotes the number of phosphate groups attached.
Phosphorylable Only species with phosphorylated tags can be phosphorylated.
Diffusible (float diffusion), the species diffuse between different cells, the diffusion rate is given as parameter.
Species is the most basic type, encoding chemical species in the cell (most often proteins).

TModule
: classes_eds2.py Parameters: Production rate r rate, basal activity b basal As shown in figure 4, TModules are the main elements in the architecture of the genes. They indicate the production rate which can be modulate by activator and the basal production that cannot. They are 'Species' like in that sense that they can only be linked to interactions (usually CorePromoter downstream and TFHill upstream).
A bare TModule (wihtout activation nor repression) produce its species at a rate given by: where the lower script (t − d S ) indicate that the whole expression is based on the delayed value where the delay d is a parameter of the core promoter (see below).

Interactions
Interactions are described in separate python module, however they are added manually to the core of the programm in the Networks/interaction.py.  Figure S 4: Inside representation of the gene regulation mechanism by our algorithm, the corresponding equations are given by 5. Note that repressor are multiplicative but only the highest activator is taken into account.

Core Promoter
: CorePromoter.py Parameters: delay d delay Core Promoter are the main element responsible for the addition of a new species in the network. However, as species also need TModule and other technical points, they are only added through the new_gene function which create directly the species, the CorePromoter and the TModule (see figure 4). The only parameter of Core Promoter are an integer called delay which determine the time (in number of integration steps) that the new species take to be produced. Thus it is only after a certain time that an activated protein will appear in the system.

Transcriptional Regulation
: TFHill.py Parameters: Activation/repression flag activity -Concentration threshold h threshold -Hill coefficient n hill Regulation (in the form of activation or repression) is another important function of living system to process information and adapt to their environment. In our algorithm, species which are tag as transcription factor (TF) are susceptible to control the production of other species by interacting with their TModule as depicted in figure 4. We use Hill's equation to model these regulations.
When regulated by other species, the production rate of a species S is given by the following expression: where r S and b S are the rate and basal parameter from the associated TModule (see 5.2), A i denotes the activators, R i the repressors and the lower script (t − d S ) indicate that the whole expression is based on the delayed value where the delay d is a parameter of the core promoter.
So in plain text, the strongest of the activator multiply the production rate and gives the un-repressed production if it is higher than the basal rate, otherwise, this rate is maintained to the basal level. Then each repressor add a (lower than 1) multiplicative term given by its hill function. Note that when several activators are simultanously present, only the strongest of them is taken into account.

Protein-Protein Interaction
: PPI.py Parameters: Association rate k on (association) -Dissociation rate k off (dissociation) Binding among protein are certainly among the most common interactions present in cells. Our protein-protein interaction (usually abbreviated PPI) take two existing species and create a new one corresponding to their complex. (The complex species is created when the PPI is added to the network and automatically deleted when the interaction is removed.). r a = k on S 1 S 2 (6) r d = k off C Given the reactant S 1 and S 2 the complex C is produced at a rate r a and dissociates back to its reactants at the rate r d . Note that the complex can then have reaction, degradation, etc. of its own independently of its reactant species.

Degradation
: degradation.py Parameters: Degradation rate r (rate) Each species is suceptible to degrade naturally with a fixed degradation rate δ which is under evolutionary control. However, species can also act as enzyme and promote actively the degradation of other ones. This interaction called degradation has for single parameter the degradation rate r and linked to an enzyme E and a degraded species S. The degradation rate of S is therefore given by (δ + rE) · S.

Phosphorylation
: Phosphorylation.py Parameters: phosphorylation rate k p (rate) -threshold h (threshold) -hill coefficient n (hill) -dephosphorylation rate k d (dephosphorylation) A given kinase K may phosphorylate different species and thus be involved in several phophorylations. The rate at which the species S is phosphorylated into species S p , is given by: where the dots stands for the other phosphorylations in which the kinase may be involved. The dephosphorylation is supposed to be constant at rate k d S p . Note that there is no phosphatase in this model.