Evolution of the Division of Labor between Genes and Enzymes in the RNA World

The RNA world is a very likely interim stage of the evolution after the first replicators and before the advent of the genetic code and translated proteins. Ribozymes are known to be able to catalyze many reaction types, including cofactor-aided metabolic transformations. In a metabolically complex RNA world, early division of labor between genes and enzymes could have evolved, where the ribozymes would have been transcribed from the genes more often than the other way round, benefiting the encapsulating cells through this dosage effect. Here we show, by computer simulations of protocells harboring unlinked RNA replicators, that the origin of replicational asymmetry producing more ribozymes from a gene template than gene strands from a ribozyme template is feasible and robust. Enzymatic activities of the two modeled ribozymes are in trade-off with their replication rates, and the relative replication rates compared to those of complementary strands are evolvable traits of the ribozymes. The degree of trade-off is shown to have the strongest effect in favor of the division of labor. Although some asymmetry between gene and enzymatic strands could have evolved even in earlier, surface-bound systems, the shown mechanism in protocells seems inevitable and under strong positive selection. This could have preadapted the genetic system for transcription after the subsequent origin of chromosomes and DNA.


Introduction
The RNA world is ''almost a logical necessity'', for example by the fact that aminoacyl-tRNA synthetases are not among the most ancient proteins [1]. Despite eminent attempts [2,3] we still lack a generalized RNA replicase that would be able to unzip and copy general, long RNA templates, similar to the contemporary activity of, say, the Qb replicase [4], made of protein. A way out could be the assembly, out of replicable shorter pieces, of a replicase and an associated ligase [5], encouraged by the recent finding of a collectively autocatalytic ligase-based RNA network [6]. Twenty years ago the possibility of an early evolution of a division of labor between gene (z) and enzymatic ({) RNA strands was raised: ''The fate of both the plus (z) and minus ({) strands is important for the following discussion. If both strands are to be replicated, both of them must be recognized by the replicase: the 39 and 59 ends of the same strand must therefore be complementary (it is assumed that replication goes in the 59R39 direction as today). Interestingly, violation of such a complete symmetry opens up the possibility for a very early origin of ''transcription'' in the form of replication bias. If the plus strand is the gene, and the minus strand is the ribozyme, naturally it pays to make more enzymes than genes. If the tag of the minus ribozyme acts as a weaker target (owing to some point mutations, for example) for the replicase, this shift in ''emphasis'' is guaranteed'' ( [7], p. 448). The authors noted that there is such asymmetry in contemporary RNA viruses [8].
Besides their target affinity, the complementary strands of RNA molecules also have to be different regarding enzymatic activities. It is not inconceivable that complementary strands of RNAs can act as enzymes: Sergei Rodin has convincingly argued that this could have been the case for at least some tRNA [9] and aminoacyl-tRNA synthetase [10] species. It is thus biologically plausible to assume a system where both RNA strands would be weakly enzymatic, but in general this would imply different functions (unless the two strands are palindromic or the contexts in which the strands must act are highly comparable, as in the Rodin case). To conclude, a truly symmetric initial condition in enzymatic activities cannot be very common. Having said that, it is probable that one strand would lose the weak enzymatic function, whereas the complementary strand would be optimized for its enzymatic activity.
As it is likely that some surface-bound metabolic complexity preceded the advent of protocells (e.g. Ref. [11]), earliest ribozymes may also have acted on surfaces [12,13], including evolving replicases [14]. It is in the context of such a surface-bound replicase population that the evolution of strand asymmetry has been dynamically investigated by the technique of cellular automata [15]: the authors have shown that strand asymmetry evolves (assuming a strand-displacement replication mechanism), but depending on diffusion and decay rates in a complex manner; sometimes genes rather than enzymes dominated the population. No model in the context of metabolically active ribozymes [13,16] is known.

Results/Discussion
Here we address the problem of RNA strand asymmetry in the context of metabolically active ribozymes encapsulated in reproducing protocells, relying on the stochastic corrector model [12,17] for the basic dynamics. There are two different ribozymes (T~2) that are assumed to be essential for protocell growth and reproduction ( Figure 1). In contrast to previous treatments plus (z) and minus ({) strands are explicitly considered. For simplicity we assume that only minus strands are enzymatically active. All templates grow stochastically within each protocell, and protocells also grow and divide stochastically. There is selection at two levels: faster replicating templates within protocells have an advantage, but protocells with a balanced and adequately abundant ribozyme composition are favored [17]. Although we assume their existence, we do not explicitly model replicase molecules, except that a limited number of templates can be replicated at the same time. Their effect is assumed to allow for copying of plus strand from minus strands and vice versa, including neat strand separation (which is still an unsolved problem in the origin of life studies [18]). It is assumed that minus strands being copied cannot perform enzymatic function at the same time, due to the opening of the catalytic sites. The two ribozymes are assumed to contribute to the production of the nucleotide monomers of the RNAs. One of the ribozymes (type 1) transforms a source material R available in the environment to intermediate L 1 , which in turn is transformed by the other ribozyme (type 2) to the monomer L 2 . The monomer L 2 is then consumed to build up the four different kinds of strands present in the vesicle. Concrete examples of similar ribozymes that could have helped sustain the RNA world have been successfully selected in vitro [19], including nucleoside synthesis, phosphorylation of nucleosides, activation of nucleotides, and processive RNA primer extension. The rates of these reactions are determined by the catalytic activities of the ribozymes. The enzymatic activities of the ribozymes are in trade-off with their replication rates (e.g., active ribozymes are more difficult to unfold due to a denser structure and substrate binding), and the relative replication rates compared to those of complementary strands are evolvable traits of the ribozymes. Both higher and lower relative replication rates of the minus strands are allowed to evolve. The traits can change at each replication due to mutations. When the within-vesicle concentration of RNAs reaches a critical level the vesicle splits into two and its content is divided randomly, without replacement, between the two resultant daughter vesicles. See Methods for details and Table 1 for parameters and their values used throughout this study.
Average copy number of plus strands can be reduced through evolution even to 1 or 2 gene strands per protocell in cases when trade-off is strong between replication and enzymatic rates. The survival of plus strands in such cases is ensured by the fact that they can be copied from the ribozymes. Figure 2 shows an example of such a successful division of labor between enzymes and genes. In Figure 3 we demonstrate that evolutionary trajectories converge to the same equilibrium ratio of division of labor from different initial states, even when the evolution of replication rates and enzymatic rates is not bound, but only limited by the trade-off function assumed, and when the replication affinity of the plus strand is also allowed to evolve ( Figure 3). Less pronounced division of labor is observed for weaker trade-off between replication rate and metabolic efficiency ( Figure 4A), for higher numbers of molecules per protocell ( Figure 4B), and for higher food concentration and kinetic rate constants ( Figure 4C). By far the strongest effect is that of the trade-off, which is understandable, since it is a trait that affects every ribozyme individually. The mild decrease with protocell size is due to the fact that if there are many RNA molecules in total, there are likely to be many enzymes present anyhow, thus the force of selection should decline with protocell size. Similarly, higher food concentrations and higher kinetic rate constants reduce the force of selection for very high enzymatic efficiency. We note that some division of labor evolves even with negligible trade-off: this we attribute to the metabolic cost of the templates. In short, for the same total template copy number, protocells harboring more enzymes than genes are better off than those with reversed proportions, since the former carry a smaller load of ''useless'' templates (redundant genes). This effect becomes more pronounced with low food concentrations and kinetic rate constants, as in these cases the selective advantage of protocells with more enzymes increases ( Figure 4C). Of course, assortment load (i.e., the drop in average fitness due to the random loss of any essential gene after stochastic assortment of templates in the two daughter protocells), and the fact that high enzymatic efficiency can already be reached without evolving high rate of strand asymmetry, prevents the system from evolving stronger asymmetry without strong trade-off.
High degradation rates can narrow the potential for the evolution of pronounced division of labor. Extreme trade-off between replication and metabolic activity selects for only few gene strands per protocell, hence a higher degradation rate easily eliminates them, and the few new genes synthesized from the ribozymes as templates may well suffer a similar fate: in the end the ribozymes cannot increase in number either, so all in all higher degradation rates lead to weaker admissible trade-off and result in weaker strand asymmetry ( Figure 5). Larger protocells could, however, survive at higher degradation rates, potentially allowing for strand differentiation at strong trade-offs (the lower right part of Figure 5, where populations do not survive at the parameter values employed).

Author Summary
The RNA world refers to the stage of early evolution when RNA macromolecules were responsible both for storing hereditary information and performing enzymatic activities. Conflict arises between these two functions, however, as enzymatic activities of the ribozymes are in tradeoff with their replication rates. Here we address this problem by investigating the evolutionary emergence of a primordial transcription-like system in model protocells inhabited by unlinked replicators. Our numerical analysis demonstrates that division of labor between genes and enzymes could have emerged, given that there was a moderate to strong tradeoff between the enzymatic and template efficiency of one strand of the ribozymes. This division of labor results in a strong asymmetry in the numbers of the enzymatic and genetic strands of the macromolecules, in favor of the former. We offer insight into the emergence of the first transcription-like system, which is today characteristic of all known life forms.
Division of labor between genes and enzymes can only be partial in our model, as expected in an RNA-based system, since a complete replication cycle requires that both strands act as templates to some extent. Division of labor implies that entities required to perform two different tasks end up with one doing (mostly) one of the tasks while the other do the other task. In our case this means that one strand acts mostly as an enzyme, while the other acts mostly as an information carrier. As only one of the strands in our model has enzymatic activity, that strand can be called an enzyme. Both of the strands need to act as templates, Vesicles are composed of two types of macromolecules (type 1 as red, and type 2 as blue), and with two strand types (plus (z) strands with light, and minus ({) strands with dark shading). The minus ({) strands (molecules colored dark red) serve both as enzymes (enzymatic activity indicated with asterisk) for producing monomers (molecule colored green) from source material, and as templates for producing plus (z) strands (molecules colored orange). The monomers are used as the building blocks (green arrow) for the productions of replicators (replication complexes are indicated in curly brackets). The plus strand only serves as template for producing minus strands. For molecule type 2, the metabolic and replication processes are similar to those of molecule type 1 described above, except that the minus ({) strand catalyzes a different chemical reaction. doi:10.1371/journal.pcbi.1003936.g001 Table 1. Parameters of the model. otherwise information is lost, but as one of the strands mainly acts as template and does not have enzymatic activity, we can call this a gene.
We would like to note here, that division of labor does not require sharp, and full specialization in different tasks: intermediate degree of specialization with the interchangeability of taskperforming entities suffices too [20]. The important point is that the gain in performance due to specialization in one task exceeds the cost of loss of performance due to specialization in another task [21]. Let us give two unrelated examples from biology to illustrate our point. In eusocial insects, such as bees, division of labor often implies high degree of specialization in different tasks, as for example the queen and the workers are quite different in morphology and in behavior. But among the workers there are behavioral, but no morphological, castes, groups that tend to perform different task (cleaning, foraging, tending the young, etc.) [22]. Moreover, in primitively eusocial wasps, like the Ropalidia marginata, even the queen cannot be distinguished from others except for the presence of well-developed ovaries [4]. The other example comes from clonal plants (such as strawberry), where the members are connected physiologically. In these plants, one ''plant member'' (called the ramet) may specialize in the uptake of belowground nutrients, and thus develop an extensive root system, while the other specializes in the capture of light and develops bigger leaves [5]. Only the relative investments change into shoot or root, but ramets still have both functioning root systems and leaves. In biology, it is thus common to observe intermediate levels of division of labor and functional specialization within the boundaries set by physiological and developmental constraints of an organism.
Chemical difference between the enzymes and the templates is not a requirement for division of labor between genetic and enzymatic functions. The present neat chemical distinction between genes (DNA) and enzymes (proteins) is a rather late invention. Comparative analysis of the genes involved in DNA replication [1,2] and the age of protein domain fold required for dNTP synthesis [3] suggest that the emergence of DNA genome was a late phenomenon which could have happened after the LUCA, thus was most likely a successor to the RNA world. As the authors of a somewhat related theoretical work note: ''DNA releases RNA from the trade-off between template and catalyst that is inevitable in the RNA world and thereby enhances the system's resistance against parasitic templates'' [23] (p. 2). It is exactly this trade-off that drives the evolution of the division of labor in our protocellular system. (We note in passing that the analysis in Ref. [23] is not enough by itself to explain the advantage of DNA, since DNA molecules can also be selected to act as enzymes [24]).
We investigated the evolution of division labor between enzymatic and genetic strands based on the implicit assumption that minus and plus strands can have very different secondary structures. This indeed proves to be the case: on a sample of 10 million sequences, the distances between the secondary structures of minus and plus strands are slightly higher than those between pairs of randomly generated sequences ( Figure 6A). Furthermore, there is asymmetry in the complexity of secondary structures ( Figure 6A, C, D); and the difference between the free energies of folding can reach levels up to 20 kcal/mol ( Figure 6B). Thus, there is a fraction of complementary, folded strand pairs for which one member is more readily opened by a replicase than the other, due to the looser structure of the former ( Figure 6C). Here we have only considered the minimum free energy (MFE) structures of the RNAs. It is known that there are suboptimal structures that could be quite close energetically to the MFE structure [25], and thus provide additional ways in which the two strands can be different (albeit evolution can lead to well-defined structures with little ambiguity in their energetically close sub-optimal structures [26]). Co-folding of the RNA with smaller RNAs can further increase the structural diversity of RNAs [27], again possibly promoting functional diversification of the strands. Our conservative estimate of structural difference is sufficient for strand separation, and incorporation of further mechanisms can further foster the effect demonstrated above.
The origin of basic genetic operations, including replication and transcription, belongs to the key questions of the origin of life. While there has been considerable progress with template copying [2,3], unzipping remains an open problem [18] (but see [28]). In this paper we have shown that once evolution had reached the stage of reproducing compartments with unlinked ribozymes inside, division of labor between enzymatic and gene strands 2T), and of equal replication rates (r T,z={,J~0 :5) (J denotes the mutation class with trait r T,z={,J~0 :5). The trade-off in this case is assumed to be strong between the replication affinity and the catalytic activity. Hence the trait r T,{,i of the minus strand (B) gradually evolves towards lower replication rates (r T,{,i ?0) in order to achieve higher metabolic activity (m T,{,i ?1). During trait evolution the ratio of minus (dark shadings) and plus (light shadings) strands changes, and the minuses significantly increase in numbers. At stable equilibrium, for the very extreme cases, only 4-8% of the macromolecules, on average 2 or 3 per vesicle, are plus strands. Other parameters: V~1000, N~100, s~10, Z~10, n~1000, r T,z,i~0 :5, m~0:03, l~3, d~10 {5 , k~0:2,R R~10, K 1~1 0 and K 2~1 . doi:10.1371/journal.pcbi.1003936.g002 Division of Labor between Genes and Enzymes in the RNA World PLOS Computational Biology | www.ploscompbiol.org readily followed provided there was moderate to strong tradeoff between the enzymatic and template efficiency of ribozymes. This is to be expected due to the tightly folded structure of ribozymes (for example the Qb replicase replicates the X-motif ribozyme [29] very slowly compared to other, less complex secondary structures; A. Griffiths, personal communication). Furthermore, analysis of the minimum free energy structures of real ribozymes and aptamers indicates that there is a tendency of them being more thermodynamically stable than random sequences (81.9% are more stable than half of the random sequences; 59.6% are more stable than 75% of the random sequences; and 27,5% are more stable than 95% of the random sequences). This transcription-like process could have been augmented by the evolution of tags recognized by the replicase as envisaged by Szathmáry and Maynard Smith [7], although we have not included this component in the present model. We conclude that division of :01) converge to the same equilibrium. Solid and dotted lines depict molecule types 1 and 2, respectively. Filled circles represent the initial data points, while light shaded circles and rectangles represent the evolutionary endpoints for traits of molecules 1 and 2, respectively. For the above results we employed a continuous-trait model, in which traits were allowed to change continuously between 0 and 1, and mutant traits were drawn from a normal distribution with the resident trait as a mean and with variance s. Other parameters: V~1000, N~100, s~10, Z~10, r T,z,i~0 :5, m~0:025, s~0:025, d~10 {5 , k~0:5,R R~10, K 1~1 0 and K 2~1 . doi:10.1371/journal.pcbi.1003936.g003 labor between genes and enzymes was under strong positive selection in the RNA world.

Characteristics of the RNA molecules
An RNA molecule M i of length s (s~10) is characterized by its type T (T~2), its role of being a ribozyme ({) or an informational strand (z), and a combined trait r representing the replication affinity and the polymerization rate of the given molecule. We assume r T,z, :~0 :5 for the (z) strands (except for We assume that the ribozymes cannot perform any metabolic function during the replication process, as the molecule is in an unfolded state and cannot form the pocket responsible for enzymatic activity. Thus there is a trade-off between the processes of replication and catalytic activity which is characterized by the following one-parameter function and for additional investigations (see Fig. 3) we also allow where k characterizes the strength of this trade-off (kƒ1). Mutation can occur with probability m at each replication of the molecules. We allow r T ,{, : to change in a discrete manner: where n is the number of mutation classes and c is randomly drawn from Poisson-distribution with parameter l. There is an equal probability of having mutants with higher or with lower traits compared to the original trait. We opted for discrete traits as it facilitates faster convergence to evolutionary equilibrium, as our additional studies indicated similar result can be attained employing continuous traits (see Fig. 3). In the latter case traits are allowed to change on a continuous scale, and mutant traits are  :8), the asymmetry between the minus and plus strands is strong, however as the strength of trade-off decreases (1{k?0), since in these cases molecules can achieve high metabolic activity without trading off their replication affinities, the asymmetry becomes less pronounced.
(B) As the number of the initial number of molecules (N) per vesicle is increased (N~25, 50, 100, 250, 500, 1000, 2500) the rate of asymmetry gradually decreases (1{k~0:8). (C) The effect of kinetic parameters for strong trade-off (blue lines: 1{k~0:8) and for weak trade-off (green lines: 1{k~0). Here we increased the inflow rate of source material from the environment into the vesicle (R R) (light blue and green lines: K 1~1 and K 2~1 ; middle dark blue and green lines: K 1~1 0 2 and K 2~1 0 2 ; dark blue and green lines: K 1~1 0 4 and K 2~1 0 4 ). For low inflow rate and kinetic constants, high metabolic activities of minus strands evolve, which results in high rate of asymmetries between the two strands. However lowering the inflow rate or the kinetic rate of reactions beyond a threshold results in the extinction of replicators (notice the absence of equilibrium ratio of asymmetry, for example R~1, K 1~1 and K 2~1 , i.e. left hand side of the light blue curve   The full replication cycle is completed after two steps of copying: M T,z,i ? sL2 M T,{,j ? sL2 M T,z,h . We assume a limited number of replicase enzymes in a vesicle, hence we limit the number of simultaneous replication processes to Z. We apply the Gillespie algorithm [30] to follow the reactions within the vesicle. We introduce the quantity a v (t)dt that characterizes the probability of reaction v[ C 1,i ,C 2,i ,P T,z,1,i ::: f P T,z,s,i ,P T,{,1,i :::P T,{,s,i ,D T,z,i ,D T,{,i g (i~1,:::,n, T~1,2) in the time interval (t,tzdt). a v (t) is the product of two factors: the chemical constant for the given reaction type v and the number of possible reactions within a given vesicle. For the reaction C 1,i We note that the input material R has a fixed concentrationR R.
Similarly, for reaction types e.g. P T,z,2,i and P Degradation is a monomolecular reaction; its probability is proportional to the present amount of the given molecules. The chemical constants d for degradation is common for all types of macromolecules M. The degradation and dissociation of replication complexes is neglected in our model.
We define the sum of all a v 's as The time t after t at which the next reaction will take place is drawn from an exponential probability density function of rate a 0 : At time tzt, we choose reaction v as the next reaction with probability a v =a 0 in the vesicle. We then update the number of different molecules according to reaction scheme v and the process is reiterated.

Population dynamics of protocells
The population is composed of V number of protocells, with the initial number of N replicating molecules, andL L 1 number of intermediate andL L 2 number of building block molecules. The number of RNAs can increase up to 2N, at which point the vesicle splits randomly assorting all the replicator molecules into two daughter vesicles. During splitting, small molecules L 1 and L 2 , as well as the initiated replication complexes are also randomly allocated to the daughter vesicles. One daughter vesicle is replacing the parent, while the other replaces another random vesicle in the population (i.e. it is a Moran process [31]).

Structural similarity of complementary strands
We have assumed that complementary strands can be quite dissimilar in structure, so that one of them can fold to be a ribozyme while the other has a structure that can be more readily processed by the replicase enzyme. We check if complementary strands can be dissimilar enough to potentially achieve such a state. We have determined the minimum free energy structure of 10 7 random RNA sequences of length 50. We have also determined the minimum free energy (MFE) structure of 10 7 random complementary pairs of RNAs of length 50. Each individual sequence's structure is compared to the structure of the next sequence to obtain the full tree edit distance [32] between the two structures. Similarly, the distance between each complementary pair of sequences is also determined. All computations are done with the Vienna RNA Package 2.0.7 [33].

Thermodynamic stability of ribozymes and aptamers
We analyzed the set of 305 ribozyme and aptamer sequences mainly from the Aptamer Database [34] and from the review of Chen and co-workers [35] (the full list is reported in Supplementary Table S1 of [36]). For each sequence the MFE was determined. Then we generated 100,000 random sequences of the same length and recorded their MFE. Then we counted the number of random sequences having lower MFE (i.e. being more stable) than the ribozyme/aptamer.