A logic-based dynamic modeling approach to explicate the evolution of the central dogma of molecular biology

It is nearly half a century past the age of the introduction of the Central Dogma (CD) of molecular biology. This biological axiom has been developed and currently appears to be all the more complex. In this study, we modified CD by adding further species to the CD information flow and mathematically expressed CD within a dynamic framework by using Boolean network based on its present-day and 1965 editions. We show that the enhancement of the Dogma not only now entails a higher level of complexity, but it also shows a higher level of robustness, thus far more consistent with the nature of biological systems. Using this mathematical modeling approach, we put forward a logic-based expression of our conceptual view of molecular biology. Finally, we show that such biological concepts can be converted into dynamic mathematical models using a logic-based approach and thus may be useful as a framework for improving static conceptual models in biology.


Introduction
In 1965, the pioneering work of Jacob and Monod showed that DNA is transcribed to RNA and further translated into protein, and that the rate of transcription is controlled by a feedback loop in which protein regulates the activity of the transcriptional complex [1]. This vignette was also meticulously illustrated in the three kingdoms of life by Francis Crick [2] where he formulated an information transfer law in biological systems, namely the Central Dogma (CD) of molecular biology [3].
Analogous to the "gene" definition, which has been updated continuously in the past century and a half [4], this fundamental biological law has also been modified [3,5,6] since the flow of biological information is understood to be much more complex than previously thought. It has been discovered that DNA itself is not static and genes may be permanently silenced based on the needs governed by cellular circumstances [7]. Proteins may also transfer information in the alternate direction by silencing genes through epigenetic mechanisms. In addition, post-translational chemical modifications may switch a protein to active and inactive states, or may result in proteasomal degradation, entailing ubiquitin tagging followed by entering cellular recycling bins, i.e. proteasomes [3]. Moreover, genes are transcribed in the form of distinct splice variants, where exons are omitted or alternative combinations of exons are utilized, leading to a diverse set of gene products with functional differences. The latest species discovered in the field is microRNA (miRNA) which has led to further elucidation of the regulatory process of transcription in which information flow is blocked either by degrading the transcribed mRNA or by facilitating gene silencing at the DNA level [8][9][10][11] (Fig 1A).
The objective of this study was to compare these two different abstractions of information flow reported in 1965 and the present-day revised edition using a dynamic modeling approach. For this, we formulated the following questions. What is to be expected when this conceptual wiring diagram changes in a cell over time? Further, which of the known dynamic behaviors is detectable and which ones are of predictive value? On the basis of the availability of biological information and the nature of the problem at hand, different types of dynamic modeling such as ordinary differential equation (ODE), Wilson-Cowan and fuzzy logic models are used to model a biological question [12][13][14]. Given the fact that the models presented in this study demonstrated a qualitative problem of certain general concepts in molecular biology, we thus decided to apply discrete logical modeling with two states (i.e. Boolean modeling) to fathom potential dynamic behaviors. Even though Boolean modeling is the simplest species among dynamic modeling methods [15], Robeva and Murrugarra have previously illustrated that it can successfully exhibit complex dynamic behaviors such as bistability [16]. This capability mainly originates from the strength of the step function as a good approximation of the sigmoidal kinetic function of molecular bindings (e.g. enzyme-substrate interaction) and that thresholds exist in most biological processes [17][18][19]. In line with the above-mentioned questions, we aimed to ascertain the potential of logic-based modeling to map out the perceptual design of scientific notions.

Materials and methods
The following is the description of the CD models, used in this study, in which, all possible behaviors of single genes and encoded gene products are simulated without considering the combinatorial behaviors of multiple genes for simplicity.

Boolean dynamic models
The initial representative model of CD containing four components, namely DNA, mRNA, Protein and Activator was reconstructed according to the original studies on CD in molecular biology [1,2] (Fig 1B and 1D). Given the activatory or inhibitory effect of Protein on mRNA transcription, two possible versions of this model were constructed (Table 1). In this model, the long half-life components (i.e. DNA, Protein and Activator) are distinguished by a self-loop in Boolean rules (note that self-loops are not shown in Fig 1B and 1C). It is crucial to emphasize that the names used in the model do not fully comply with the meanings directly inferred in molecular biology. For instance, as described in the second column of Table 1, DNA refers to any gene which can be transcribed to coding RNA. Such definitions are according to the concepts put forth by Francis Crick [2] when splicing, post-transcriptional and translational modifications were still nebulous. Moreover, in this context, the label "Protein" denotes the active and functional protein and not simply any peptide sequence translated by ribosomes.
In the current edition, CD is more complex given the addition of new components and functional relationships (see Fig 1C and 1E and Table 2) where the main difference pertains to the concept of "turnover" of the dynamic molecules included in the present-day model of CD. The DegRNA and DegProtein components are thus used to represent the degradation of mRNA and proteins (i.e. DegProtein, DegRNA and Protein) respectively. Yet another revision is the inclusion of miRNA molecules in the model as a representative of all RNA species which negatively affect protein production by regulating mRNA half-life. Regulatory effects of miR-NAs on target mRNAs are not exclusively at the post-transcriptional level (miRNA-mRNA) and also show their effects in the form of miRNA-DNA interactions [9][10][11]20]. The definition of DNA and Activator has also changed after the emergence of epigenetics and discovery of the post-translational modification phenomenon. The "DNA" label is, therefore, used in this model to explain the non-silenced genes recognized by RNA polymerase and the "Activator" is a molecule that connects to "Protein" in a feedback loop to delineate the role of the latter in the "Activator" turnover. Specifically, the exogenous and endogenous activators are usually affected by proteins such as enzymes, transporters and channels. The rules of this model are derived from the vignette presented in [1-3, 7-11, 20, 21]. Based on the three functional relationships (Protein-mRNA, Protein-miRNA and miRNA-DNA) with each able to take opposite signs (i.e. activation and inhibition) generates eight different versions of present-day CD ( Table 2). For more details about Boolean models reconstruction, see S1 File.

Model simulation and identification of attractors
To identify the synchronous and asynchronous attractors, we used the R package "Bool-Net" [22] which conducts an exhaustive search in identifying attractors. The difference between these attractors is based on the updating procedure of components in each time step of the simulation. Therefore, to check the dynamic behavior of components, which are updated either at the same time or at different points of time, we undertook both synchronous and asynchronous updating simulations [15,19,23]. The synchronous analysis was separately undertaken on all the ten proposed versions whereas the asynchronous analysis was just done on the eight present-day model versions. In the synchronous simulation, the state-transition graphs were analyzed and the attractors, proportionate with the size of the basin of attraction, were identified. For the asynchronous simulation, the complex attractors were depicted in which betweenness and closeness centralities of each state (node) were illustrated by node size and color. Network visualization and centrality measure computation were implemented using Gephi (v.0.8.2) [24]. The probability of reaching each state was also calculated using Markov chain simulations. Therefore, probabilistic Boolean networks (PBN) was used to specify more than one transition function per variable/component with specified probability values. It should be noted that the probabilities of all functions for each component sum up to one. The state transition graph was then reconstructed by choosing one function per component using probability values [22,23]. In addition to all the eight versions of the present-day CD model, this analysis was also undertaken on the probabilistic version of the combined Boolean network (V1_V8) as shown in Table 3. Fig 1B derived from [1, 2]. The symbols "&", "|" and "!" represent "AND", "OR" and "NOT" operators. Finally, the plausibility of the reconstructed models was assessed using two different measures of robustness to noise and mismeasurements. The normalized Hamming distance, which is the fraction of different Boolean bits between the states and the corresponding perturbed copies was obtained from 100 randomly generated copies [25]. The Gini index, an index of homogeneity in the in-degree property of the nodes, was calculated in the state-transition graph [26]. Both measures were statistically assessed based on the z-test and randomized version of the networks to demonstrate statistical significance of difference (S2 File and S3 File). Fig 1C derived from [1-3, 7-11, 20, 21]. The symbols "&", "|" and "!" represent "AND", "OR" and "NOT" operators.

Logical rule
Node

Synchronous modeling
We found by simulation that, with the synchronous update, the old conceptual model possesses 11 simple attractors (or fixed points) out of the 16 possible initial conditions (  The symbols "&", "|" and "!" represent "AND", "OR" and "NOT" operators. Our results indicate that by including steady states in the present-day model of CD, obtained by the synchronous updating method, this model becomes more complex, hence more consistent with the current understanding of molecular biology. Overall, 12% (15 out of 128) of the possible initial states fell into the attractor set (Table 5). Compared with the 1965 model, which showed almost six-fold attractor percentage, the present-day model has a far more dynamic nature and thus a more realistic model. However, similar to the 1965 model, the null fixed point (state A) and the Activator ON state (state B) were observed in all eight versions with similar basins of attraction. There were five other fixed points that clustered the versions in the present-day model into three classes. The C and D states were observed in the Activatory (V1) and Gene silencing (V4) models which seems that without the basal expression of Protein, these models have similar behaviors with a final outcome of system collapse. State E clustered all versions of the model which contained inhibition of mRNA expression (i.e. mRNA expression inhibition (V2), Gene silencing & mRNA expression inhibition (V5), mRNA & miRNA expression inhibition (V7) and Inhibitory (V8) versions. Although the turnover of RNA species was considered in this model, the intactness of mRNA in the model is in line with amino acid paucity which is not included in the model. In addition, even if the translation had occurred and amino acids were in excess, the proteins were nonetheless not activated due to the absence of the Activator. Therefore, it is not surprising for state E to have appropriated a small fraction of all states (12.5%; 16 out of 128) in all the four mentioned versions. States F and G were analogous to state E but were restricted to the versions of the model which included inhibition of miRNA expression only (i.e. miRNA expression inhibition (V3) and gene silencing and miRNA expression inhibition (V6)).

Nodes Logical rule Probabilities
Unlike the previous models, there were two limit cycles playing the attractor role in this simulation. State H is the common cycle between mRNA expression inhibition (V2) and Gene silencing & mRNA expression inhibition (V5) attracting more than 50% of all initial states. Similarly, state I is the common cycle between those having inhibition of RNA species expression namely mRNA & miRNA expression inhibition (V7) and Inhibitory (V8). A common trait in all limit cycles was the constant presence of DNA and Activator. It is, therefore, expected that steady states are related to the long half-life components of the biological systems.

Asynchronous modeling
Using the asynchronous Boolean dynamic approach, we analyzed the present-day model of CD to explore its attractors. Tantamount to the fixed points found in the synchronous updating model, there were seven simple attractors (states A to G) with profiles identical to previous attractors (Table 6). However, the complex (loose) attractor was found to be different from the limit cycle attractors of the synchronous approach and appeared to be more convoluted. This large complex attractor comprised 32 nodes and 84 edges (state H) (see Table 6 and Fig 2) where nodes played various roles in the graph. Given that the node size is proportional to betweenness centrality, larger nodes (i.e. states) are those with more presence in transitions from one state to another. With respect to closeness centrality (represented by node color), darker nodes are those with a higher closeness centrality measure and thus more accessible or close to other nodes. The node 1111111, which is the full ON state of all components, had the highest closeness in this loose attractor, however, not the highest betweenness. The highest betweenness value was observed for the state in which the degradation components were not active (i.e. 1001111). Furthermore, simulation data suggest that the inhibition of mRNA expression alone or in combination with miRNA expression inhibition and gene silencing is the main contributor to oscillatory behavior of the molecular machinery.  1011111,1101111,1001111,1111011,1011011,  1101011,1001011,1111101,1011101,1101101,1001101,  1111001,1011001,1101001,1001001,1111110,1011110,  1101110,1001110,1111010,1011010,1101010,1001010,  1111100,1011100,1101100,1001100,1111000 To estimate the probability of reaching an attractor following a large number of iterations, we performed a Markov chain simulation on the present-day CD model. In Table 7, the states reached at the end of the simulation (attractors) are listed for all versions of the model. In addition, the combined version (V1_V8) was added to the analysis to systematically study the global dynamic network by incorporating all versions, an approach which is robust in tackling uncertainty. In all versions, it was equally probable to reach the null fixed point and fixed

Robustness of the reconstructed model
To assess the plausibility of the CD models, we tested model robustness to noise and mismeasurements in all versions of the 1965 and present-day models of CD. As shown in Table 8, the present-day model was more robust than the previous model. In specific, the Hamming distance was significantly smaller in all eight versions of this model, meaning that by applying noise to the network states in these models, attractors remained largely unperturbed. The measure of indegree homogeneity (Gini index) was assessed in the state transition graph of all versions in both models and was further compared with the randomized graph to assess statistical significance. Generally, as expected in biological networks, the present-day model is enriched with many low in-degree nodes along with a few high in-degree nodes in the state transition graph, indicating that the updated model is more consistent with the reality of biological processes.  Fig 1C in all versions. These probabilities were estimated based on 1000 iterations. The last column represents the absorption probabilities of the probabilistic Boolean network of the present-day Central Dogma (See Table 2

Conclusion
Biology, in this contemporary era, is an edifice made up of two primary axioms as building blocks, Darwin's theory of evolution and the Central Dogma of information flow in molecular biology [3]. Although the overall structure of the latter (i.e. information flow from DNA-> RNA-> Protein) has remained intact over the past half-century, the complexity of this cascade has surprisingly increased. Discovering diverse species of RNA with distinguishing roles, as well as dynamic features of DNA, mRNA and proteins, from cell to cell and over time, contribute to this complexity [4,27,28]. The Boolean network modeling, as a dynamic framework, was essential to the development of the CD models and providing further insight into gene regulation. We show that in addition to predicting reliable steady states, the present-day model of CD is more robust against and insensitive to noise than the 1965 model. Using this mathematical modeling approach, we put forward a logic-based expression of our conceptual view of molecular biology [29], thus, not only validating our abstract understanding of molecular biology, but it also improves our predictive power of biological processes.