Abstract
Bacteriophage ϕX174 has been widely used as a model organism to study fundamental processes in molecular biology. However, several aspects of ϕX174 gene regulation are not fully resolved. Here we construct a computational model for ϕX174 and use the model to study gene regulation during the phage infection cycle. We estimate the relative strengths of transcription regulatory elements (promoters and terminators) by fitting the model to transcriptomics data. We show that the specific arrangement of a promoter followed immediately by a terminator, which occurs naturally in the ϕX174 genome, poses a parameter identifiability problem for the model, since the activity of one element can be partially compensated for by the other. We also simulate ϕX174 gene expression with two additional, putative transcription regulatory elements that have been proposed in prior studies. We find that the activities of these putative elements are estimated to be weak, and that variation in ϕX174 transcript abundances can be adequately explained without them. Overall, our work demonstrates that ϕX174 gene regulation is well described by the canonical set of promoters and terminators widely used in the literature.
Citation: Hill AM, Ingle TA, Wilke CO (2024) A computational model for bacteriophage ϕX174 gene expression. PLoS ONE 19(10): e0313039. https://doi.org/10.1371/journal.pone.0313039
Editor: Sayed Haidar Abbas Raza, South China Agricultural University, CHINA
Received: September 9, 2024; Accepted: October 16, 2024; Published: October 31, 2024
Copyright: © 2024 Hill et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Our simulation and analysis scripts are available at: https://github.com/alexismhill3/phix174-simulation.
Funding: This work was supported by a National Institutes of Health grant R01 GM088344 and by the Jane and Roland Blumberg Centennial Professorship in Molecular Evolution and the Dwight W. and Blanche Faye Reeder Centennial Fellowship in Systematic and Evolutionary Biology at The University of Texas at Austin. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
ϕX174 is a single-stranded circular bacteriophage from the family Microviridae. It has a small, 5386 nucleotide genome comprising 11 genes, which are organized into two distinct functional clusters within the genome [1, 2]. The structural genes (J, F, G, and H) form one contiguous genomic region, and encode proteins that enclose and protect the phage genome. The remaining seven genes encode proteins that facilitate viral propagation, for example by disrupting host cell replication or catalyzing cell lysis. ϕX174 gene regulation is relatively straightforward and depends largely on the joint activities of its promoters and terminators [3, 4]. Notably, the ϕX174 genome is also extremely compact, in that half of its genes overlap with at least one other coding region. Due to its tractable size and unusual genomic architecture, ϕX174 has served as an important model organism across the fields of structural and synthetic biology, and has been the subject of numerous efforts to re-organize, re-code, or modularize its genome [5–7].
Characterization of ϕX174 transcription regulation has been ongoing since even before it was first sequenced nearly 50 years ago [2], with subsequent refinements being made as more sophisticated molecular technologies become available. First, the locations of ϕX174 promoters and terminators were mapped for in vitro [8, 9] and in vivo [10] transcription regulation using a combination of selective oligonucleotide initiation and DNA hybridization. Follow-up studies verified the initial mapping using sequence analysis [11]. For the promoters, in vitro kinetic activities were also established [4]. However, these assays can be difficult to set up in such a way that they accurately capture in vivo activity. For example, measurements for ϕX174 promoter activities can vary by several orders of magnitude, depending on how much original sequence context is retained around the cloned promoters [4]. Furthermore, the ϕX174 regulatory map itself may be incomplete. For example, Logel and Jaschke [12] proposed that two additional regulatory elements (one promoter and one terminator) may play a role in ϕX174 gene expression regulation, based on novel high-resolution RNA-seq measurements of phage transcription. Although ϕX174 transcription has been well studied, additional work may be needed before transcription regulation for ϕX174 can be fully resolved.
Here, we investigate ϕX174 gene expression regulation using mechanistic, computational simulations. We infer kinetic values for ϕX174 promoters and terminators by fitting simulations to transcription data [12, 13], and we assess the effect of two putative regulatory elements [12] on global ϕX174 transcription patterns. We also use simulation to explore the consequences of ϕX174 genome decompression [5]. Overall, we find that rate constants for regulatory elements can be reliably identified and are broadly consistent with prior kinetic measurements. These observations suggest that the current understanding of ϕX174 transcription regulation is reasonably complete. However, we also observe that parameter estimation is somewhat ambiguous in the case of a promoter immediately followed by a terminator, an arrangement that occurs once in the ϕX174 genome. In this case, the terminator partially compensates for the promoter, and the model cannot distinguish between a strong promoter followed by a strong terminator or a weak promoter followed by a weak terminator. Our approach is general and could be applied to study gene regulation in other organisms where transcription control is less well understood.
Results
A model for ϕX174 gene expression
To simulate ϕX174 gene expression regulation, we built a model for bacteriophage ϕX174 using a customizable stochastic simulation framework that was previously used to model bacteriophage T7 infection dynamics [14, 15]. The ϕX174 infection cycle begins with injection of the single-stranded phage DNA into the host cell [16], followed by synthesis of the complementary DNA strand, which serves as the mRNA template. Our simulations begin with transcription and assume that the fully double-stranded phage DNA is already synthesized. We modeled the phage genome as a linear molecule with a start site 63 nucleotides upstream of gene A (Fig 1A). To simulate circular genome transcription, polymerases that reach the end of the genome without terminating automatically begin another round of transcription starting from the beginning of the genome (Fig 1B). We modeled background host-cell expression by simulating transcription of a single ORF of average length for E. coli. During the simulation, E. coli gene expression competes with the phage for cellular resources (polymerases, ribosomes). Finally, we defined initial resource availability and transcription/translation kinetics using parameters derived from the prior bacteriophage T7 gene expression model [15] (see also Materials and methods).
A: The ϕX174 genome is linearized starting from gene A. Reading frames for genes A*, B, K and E overlap with at least one other gene. Canonical promoters (solid black symbols) and the putative promoter and terminator (gray with dashed borders) are shown directly above the gene diagram. B: ϕX174 gene expression is simulated for a single E. coli cell. Transcription and translation of the phage genome are modeled mechanistically, with explicit tracking of ribosomes and polymerases as they traverse DNA and RNA polymers. E. coli gene expression is modeled as a set of reactions that consume/produce resources utilized by the mechanistic simulations.
Fitting the model to qPCR data
Since exact values for promoter and terminator strengths were not known a priori, we estimated these values by fitting the simulations to ϕX174 transcription data. For the fitting procedure, promoters and terminators were initialized arbitrarily and then fit using a simple iterative optimization procedure that adjusts one randomly selected element at a time. During model optimization, new parameter values were retained only if they decreased the error between the steady-state simulation output and the target transcription pattern. Specifically, we calculated the root mean squared error (RMSE) between final simulated transcript abundances and measurements of ϕX174 transcription taken several minutes into the infection cycle.
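As a minimal sketch of the error measure and acceptance rule described above (not the authors' scripts; the function names `rmse` and `keep_proposal` are illustrative assumptions), the comparison between simulated and target transcript counts could be written as follows.

```python
import math


def rmse(simulated, target):
    """Root mean squared error between simulated and target transcript counts.

    Both arguments are equal-length sequences with one value per measured
    gene or genomic location.
    """
    if len(simulated) != len(target):
        raise ValueError("simulated and target must have the same length")
    squared_errors = [(s - t) ** 2 for s, t in zip(simulated, target)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def keep_proposal(new_error, current_error):
    """Retain a new parameter value only if it lowers the error."""
    return new_error < current_error


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    target_counts = [100, 120, 300]
    simulated_counts = [90, 130, 310]
    print(rmse(simulated_counts, target_counts))
```

Because only strict improvements are kept, the recorded error can never increase from one generation to the next, which is consistent with the monotonic decline described for Fig 2B.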
As a proof of concept, we first fit our model to low-resolution qPCR data of ϕX174 [13]. The data set consists of transcript abundances mapped to six locations on the ϕX174 genome. We seeded 20 independent simulations and fit each to these measurements (Fig 2A). After 4000 generations, all 20 simulations converged with RMSEs < 5.5 (Fig 2B, S1 Fig) and output that qualitatively matched qPCR measurements. We manually inspected the simulated transcription patterns and RMSE values and defined 2.75 as the cutoff RMSE score for further analysis; 14 out of the 20 fitted models met this criterion.
A: Time course for one representative model, out of the 20 that were fit to data. Target transcript abundances were measured using qPCR [13] and are shown normalized to gene A. The time course shows final simulated transcript abundances for the initial simulation (generation 0), a simulation from the middle of model optimization (generation 415), and the final optimized simulation that minimizes the RMSE. Output from the final simulation is qualitatively similar to the target data. B: The RMSE declines with increasing generations as fitted promoter and terminator strengths produce more accurate transcription patterns.
Next, we compared parameter estimates from the fitted models to empirical measurements of ϕX174 promoter and terminator activities. Fig 3A and 3B show estimates for the three canonical ϕX174 promoters and four terminators, respectively, from fitted models with RMSEs below the cutoff value. ϕX174 terminator efficiencies have been measured to be about 40–60% for terminators TJ, TF, and TG, and 90% for terminator TH [17]. Using mean parameter values from our fitted models, we estimated an efficiency of 0.93 for terminator TH, and efficiencies of 0.34, 0.51, and 0.78 for terminators TJ, TF, and TG, respectively. Thus, we concluded that our terminator estimates were reasonably consistent with experimental values.
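To give a feel for what these efficiencies imply, the following sketch computes how much transcriptional current would survive each terminator in turn. It assumes, as a simplification that is not part of the fitted model, that every polymerase meets the four terminators in the order listed, that no new transcripts start in between, and that termination at each site is independent with probability equal to the mean fitted efficiency.

```python
from math import prod

# Mean fitted termination efficiencies quoted in the text (qPCR fit).
EFFICIENCIES = {"TJ": 0.34, "TF": 0.51, "TG": 0.78, "TH": 0.93}


def read_through_fraction(efficiencies):
    """Fraction of polymerases that pass every terminator without stopping."""
    return prod(1.0 - e for e in efficiencies)


if __name__ == "__main__":
    remaining = 1.0
    for name, efficiency in EFFICIENCIES.items():
        remaining *= 1.0 - efficiency
        print(f"after {name}: {remaining:.4f} of the current remains")
```

Under these assumptions, well under 1% of the current that reaches TJ would continue past TH, which illustrates why TH dominates the behavior of any element placed close to it.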
Using empirical measurements as a baseline for promoter comparison is more challenging, since in vivo kinetic characterizations have found different relationships for pA, pB, and pD [4, 18]. For example, Sorensen et al. [4] measured an 18-fold difference between pA and pB (with pA > pB) when 90 bp of sequence context was included around cloned ϕX174 promoters. (There was no detectable transcription signal from cloned pD sequences.) In contrast, activities for all three major promoters were approximately the same (S1 Table) for constructs that contained a smaller amount of the original flanking sequence context. In our fitted model, the promoter strengths necessary to achieve target transcription patterns were intermediate between these results; the largest difference between promoters (pB and pD) was about 10-fold, with pD ≫ pA > pB.
Promoter strength distributions (A) and terminator strength distributions (B) from models with a final RMSE ≤ 3. The qPCR data used to fit all models was originally reported by Zhao et al. [13]. (C) Relationship between estimates for promoter pA and terminator TH; each point represents a pA–TH pair from one fit model.
For most of the fitted parameters, variation between simulations is low, suggesting that a single set of parameters minimizes model error. The exception is promoter A, for which there is a relatively wide range of values that provide equally good fits to the data (Fig 3A). In the ϕX174 genome, promoter A is situated less than 100 bp upstream of the strongest ϕX174 terminator, TH. As a consequence, the majority of transcriptional current originating from pA is cut off by termination at the TH site. Given the proximity of the two elements, we reasoned that their activities were likely coupled in the simulations. In other words, TH compensates for pA (or vice versa), such that a range of TH–pA values is capable of producing the same gene expression pattern. There is a strong positive relationship between estimates for pA and TH (Fig 3C). Specifically, pA appears to be extremely sensitive to small deviations in the efficiency of TH.
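The coupling can be illustrated with a deliberately simplified steady-state calculation, which is an assumption of this sketch rather than the Pinetree model: if current initiated at pA at rate k must immediately pass TH with efficiency e, the current that survives is k(1 − e), so any (k, e) pair with the same value of k(1 − e) produces the same downstream expression. The function names below are illustrative.

```python
def downstream_flux(promoter_rate, termination_efficiency):
    """Current from the promoter that reads through the adjacent terminator."""
    return promoter_rate * (1.0 - termination_efficiency)


def promoter_rate_for_target(target_flux, termination_efficiency):
    """Promoter rate needed to keep the read-through current at target_flux."""
    if not 0.0 <= termination_efficiency < 1.0:
        raise ValueError("termination efficiency must be in [0, 1)")
    return target_flux / (1.0 - termination_efficiency)


if __name__ == "__main__":
    # Relative promoter strength required to hold the downstream current at 1.
    for efficiency in (0.80, 0.90, 0.95, 0.99):
        required = promoter_rate_for_target(1.0, efficiency)
        print(f"TH efficiency {efficiency:.2f} -> relative pA strength {required:.1f}")
```

In this toy picture the required promoter strength grows as 1/(1 − e), so it changes steeply as the terminator efficiency approaches 1. This is one way to read both the positive pA–TH relationship and the sensitivity of pA to small changes in TH.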
If ϕX174 gene expression is optimal (in the sense that it maximizes phage fitness) then a compensatory relationship between pA and TH could be beneficial, since it would provide some additional flexibility for maintaining optimal expression. For example, a mutation weakening the affinity of pA for E. coli polymerases could be rescued by a subsequent decrease in TH stability, in addition to reversion of the original mutation. Alternatively, it could be that ϕX174 biology requires the specific ordering of pA followed by TH, and that coupling between the two elements is simply a consequence of this requirement. When we simulated ϕX174 gene expression with the order of pA and TH reversed (effectively de-coupling their activities, since TH stops most transcriptional current) the promoter A strength needed to maintain the same level of expression (keeping all other parameters the same) was very low, about 16 times weaker than the fitted wild-type value (S2 Fig).
Fitting the model to an updated ϕX174 transcriptome
Since we were able to fit the model to coarse-grained qPCR data, we next fit the simulations to higher-resolution RNA-seq data. Logel and Jaschke [12] recently re-measured ϕX174 transcription using RNA-seq, and found several locations in the transcriptome where sharp changes in read abundance did not correspond to any known ϕX174 regulatory elements. To explain these discrepancies, they proposed one putative promoter and one putative terminator (called btss49 and RUT-3, respectively). The putative elements were computationally verified using promoter/terminator prediction tools but were not tested experimentally.
We were interested to see if simulations could provide evidence for or against the putative ϕX174 regulatory elements. To do so, we re-fit simulations of the canonical model (without putative regulatory elements) and the expanded regulatory model (with btss49 and RUT-3) to the updated RNA-seq measurements. After 4000 generations, all replicate simulations converged with similar mean RMSE values (S3 Fig). Fig 4 shows the parameter distributions from all fitted simulations. The mean estimated binding strength of btss49 is 1.7 × 10⁶ M⁻¹s⁻¹, which is lower than the strengths of the three canonical promoters (Fig 4A) but similar in magnitude to promoter B. The mean estimated termination efficiency for RUT-3 is 30%, similar to estimates for TF and TG (Fig 4B).
Simulations were fit to RNA-seq data from [12]; promoter estimates (A) and terminator estimates (B) are shown for each model type (canonical model in brown, expanded/putative model in green). Here, each point represents a parameter from the best-fit simulation (the simulation with the lowest RMSE) for each of the 20 replicate simulations that was fit for each model type.
Interestingly, including two additional parameters (the putative regulatory elements) did not improve model fit (S3 Fig). In fact, the mean RMSE for the canonical model was lower than the mean RMSE for the expanded model. Also, parameter estimates for the other canonical regulatory elements were essentially unchanged in the fitted expanded model. These results suggest that the impact of btss49 and RUT-3 on ϕX174 transcription as a whole is minor. We also estimated a very low, almost negligible strength for btss49, suggesting that this may not be a true promoter, at least in the sense that it is probably not under strong selection to initiate transcription. For the other element, a putative Rho-dependent terminator, we estimated a similar efficiency as the other weak canonical ϕX174 terminators.
For the canonical model (without putative regulatory elements), we note that the fit to RNA-seq data (Fig 4) is similar but not exactly identical to the fit to qPCR data (Fig 3). Most importantly, promoter pD is much stronger in the model fit to qPCR data than in the model fit to RNA-seq data. Also, terminators TJ and TG are stronger in the fit to qPCR data. In fact, terminator TJ is estimated to have near zero termination strength when fit to RNA-seq data. When comparing the measured gene expression levels (Fig 2 for qPCR vs. S4 Fig for RNA-seq), we see that expression differences are generally smaller for RNA-seq data than for qPCR data (for example, genes D, E, and J are expressed nearly 3× higher than genes B, K, C according to qPCR data but only about 2× higher according to RNA-seq data). These smaller differences in measured gene expression levels naturally translate to weaker promoters and weaker terminators in the fitted model.
Simulating decompressed ϕX174
In addition to promoter and terminator arrangements, another important aspect of ϕX174 genome organization is the presence of multiple overlapping genes. In a prior study, the ϕX174 genome was refactored to separate all primary coding sequences, in order to better understand the regulatory consequences of gene overlaps [5]. In this process (referred to as genome decompression) all partially or completely overlapping coding regions were copied and placed side-by-side. To prevent expression of the original gene copies (which would otherwise still be viable), initiation sites were disrupted by a series of point mutations made within each start codon. The decompressed ϕX174 strain was viable but showed reduced fitness compared to wild-type ϕX174, although the exact causes of the fitness reduction were unclear. Notably, protein A* abundances were significantly up-regulated, while protein B and C abundances were down-regulated [6] (Fig 5, blue bars). It is unclear whether gene order differences (which have been shown to affect gene expression and fitness), or unintended changes to regulatory sequences, or both, were responsible for altered ϕX174 gene expression.
Measured protein fold-changes are in blue, and simulated fold-changes are in orange. The experimental data was collected by Wright et al. using targeted proteomics [6]. Only measurements for genes that were significantly over- or under-expressed relative to wild-type are shown.
We investigated some possible causes of disrupted ϕX174 protein production using a model for decompressed ϕX174 gene expression. We began by assuming that all promoter, terminator, and ribosome binding sites were unchanged during the decompression process (excluding deliberately knocked-out ribosome binding sites). We defined promoter and terminator strengths using parameters from the wild-type model that was fit to qPCR data. By keeping gene regulatory elements consistent across the wild-type and decompressed architectures, we intended to isolate the effects of gene rearrangements. Fig 5 shows fold-changes in simulated decompressed protein abundances for the three genes with significant measured abundance differences. In the simulations, the impact of gene rearrangements on protein synthesis is very small (Fig 5, orange bars), suggesting that gene re-ordering/decompression alone is unlikely to account for the altered protein expression.
We reasoned that the decompression process may have also disrupted promoter or terminator sequences. We were interested in whether changing these parameters in our model could recapitulate the decompressed ϕX174 gene expression pattern. We fit parameters for promoters and terminators for the decompressed ϕX174 simulation to measured fold-changes in protein abundances (S5 Fig). When compared to mean parameter estimates from the fitted wild-type model, pD, TG, and TH are much weaker in the decompressed model, while estimates for the other elements are mostly unchanged (S5A and S5B Fig). It is unlikely that TG and TH activities were directly affected by the decompression process, since they are outside of the region that was refactored. Although it is possible that decompression had an indirect effect on G and H expression, due to, for example, complex interplay between transcription and translation de-regulation, our fit of transcription elements alone could not have captured such an effect. In addition, the estimate for TG did not converge fully, unlike in the wild-type model. Also, it is impossible for the model to achieve a simultaneous increase in protein A/A* expression and decrease in B/C expression by only adjusting promoter and terminator strengths (S5C Fig). Overall, we were unable to achieve a good fit to the decompressed phenotype by only allowing promoters and terminators to vary. Taken together, these results point to a cause other than gene rearrangements or accidental disruption of promoter/terminator sequences for the altered protein expression phenotype observed for decompressed ϕX174.
Discussion
We have developed a mechanistic computational simulation for bacteriophage ϕX174 gene expression, and we have used this model to estimate the relative activities of ϕX174 promoters and terminators by systematically fitting these parameters to transcription data. Our estimates are broadly consistent with empirical measurements for promoter and terminator strengths, but with some notable discrepancies, in particular for the promoters. We have used the same parameter estimation strategy to assess two putative regulatory elements. The alternative/putative regulatory model did not meaningfully improve the ability of the simulation to explain variations in ϕX174 transcript abundances, and estimates for canonical elements were essentially unchanged. We conclude from these findings that the evidence in favor of these putative elements is weak. Finally, we have used our model to simulate expression of a decompressed version of the ϕX174 genome in which overlapping genes have been placed side-by-side. We observe that our model does not recapitulate the observed expression changes from decompression, even after re-fitting promoter and terminator strengths. Our results suggest that the process of engineering the decompressed genome affected aspects of gene regulation other than transcription. Overall, our work helps to clarify aspects of transcription regulation in the important model organism ϕX174.
We fit the same computational model of ϕX174 to two different sources of bacteriophage expression data, a set of fairly coarse-grained qPCR measurements [13] and a much more fine-grained set of RNA-seq measurements [12]. Overall, the estimated promoter and terminator strengths were comparable, though we note that several regulatory elements were estimated to be weaker when the model was fit to RNA-seq data than when it was fit to qPCR data. These weaker estimates for regulatory elements are consistent with the input gene expression data sets, which show consistently smaller differences in gene expression levels for the RNA-seq data than for the qPCR data. Unfortunately, we have no way of assessing which estimates are more realistic. RNA-seq data can suffer from various biases [19, 20], though in direct comparisons to qPCR measurements the observed differences tend to be small [21]. Importantly, here the two data sets were also obtained by different groups using different protocols and materials, so that the observed differences in gene expression levels may also, at least in part, have been caused by these factors.
We also considered two different regulatory models of ϕX174, the canonical model widely used in the literature and an expanded model containing an additional promoter and an additional terminator [12]. We could fit the expanded model only to the RNA-seq data, as the qPCR data lacked sufficient gene-level resolution to be informative for the expanded model. (Not every gene was assayed for the qPCR data, and instead the assumption was made that gene expression levels could only change at the precise genomic locations where canonical promoters and terminators are present.) In a direct comparison of the two models fit to the same RNA-seq data, we found only weak evidence in favor of the expanded model. The estimated strengths of the canonical regulatory elements were largely unchanged in the expanded model, and the putative new promoter and terminator were weak. Moreover, the RMSE was not lower for the expanded model. We emphasize, however, that our approach cannot be used to rule out the existence of these elements. They may well exist and be weak. What our approach does show, however, is that these elements are not critically important to produce a model that can describe the observed gene expression levels.
The ϕX174 genome contains several overlapping genes. Such gene arrangements are commonly observed in viruses and bacteriophages [22], though the evolutionary pressures that lead to overlapping genes are not well understood. When bacteriophage genomes are engineered such that all gene overlaps are removed, a process generally referred to as “decompression,” the resulting phages tend to display reduced fitness or other growth defects [6, 23, 24]. A priori, such effects may be caused by packaging defects if the enlarged genome no longer fits into the phage particle, by disrupted gene regulation (as the expression level of a given gene often results from the aggregate activity of multiple, staggered promoters), or by the accidental disruption of unknown ORFs or regulatory elements during the engineering efforts. For ϕX174, it was observed that decompression led to significantly altered gene expression patterns [6]. Thus, we asked here whether we could recapitulate these findings in our simulations. However, we found that when we held activities for promoters, terminators, and ribosome binding sites fixed (changing only the positions of ϕX174 genes), simulated gene expression levels were largely unchanged between the regular and the decompressed ϕX174 genome. We were also unable to recapitulate gene expression changes by adjusting promoter and terminator strengths in the model. Specifically, it was impossible to simulate an increase in protein A* expression while also decreasing protein B and C expression, as was observed experimentally for the decompressed strain. Our results indicate that the causes of altered expression for the decompressed ϕX174 are likely more complex, for example, a result of a combination of changes to transcription and translation initiation.
Even though ϕX174 was one of the first organisms whose genome was fully sequenced and it remains today a widely studied bacteriophage, it is not commonly used for mathematical or computational modeling studies, unlike bacteriophages Qβ [25–28], λ [29, 30], or T7 [15, 31–34] (see also Ref. [35] and references therein). Therefore, we cannot offer much in terms of comparison of our results to prior ϕX174 modeling work. We emphasize, however, that our approach here was not to develop a one-off, purpose-built model that can describe only one particular organism or biological system, as commonly seen in the field of biological simulation. Instead, we leveraged the Pinetree simulator [14], which, although originally developed for T7, is a general-purpose prokaryotic gene expression simulator that can be adapted relatively easily to any system of interest. All that is required to adapt Pinetree to a new organism is a detailed list of where in the genome individual genes start and end, and where promoters, terminators, and other regulatory elements are located. Consequently, any bacteriophage with a well annotated genome can be simulated in Pinetree with little effort. We note that the Pinetree approach was inspired by TABASCO [33], which similarly aimed to be a general-purpose simulator but which to our knowledge was never used for any application other than simulating bacteriophage T7.
Materials and methods
To construct models for ϕX174, we used a customizable gene expression simulation framework called Pinetree [14], which uses an implementation of the Gillespie Stochastic Simulation Algorithm [36]. The Pinetree simulations take as input a target organism’s genomic information, including the positions of genes, promoters, terminators, and ribosome binding sites. The user defines the number of cellular resources available at the start of a simulation, as well as any additional reactions involved in target gene expression or cell regulation. Pinetree was previously used to model bacteriophage T7’s infection cycle [15]. Since T7 and ϕX174 infect the same host (E. coli), we began by downloading simulation materials (Python scripts and control files) from the T7 project (https://github.com/benjaminjack/phage_simulation) and used these as a starting point for the ϕX174 model.
Defining the ϕX174 genome
We downloaded a reference genome sequence for ϕX174 (NC_001422.1) from the National Center for Biotechnology Information (NCBI). We used a Python script to extract gene start and stop locations, and combined these with literature values for promoter and terminator locations [3, 4, 12] (Table 1). Since Pinetree requires linear genome sequences whereas the ϕX174 genome is circular, we linearized the genome in the following manner: First, we arbitrarily defined a location 63 nucleotides upstream of the gene A start site as location zero and cut the circular genome at this location. This cut location was chosen such that it did not overlap with any known genes or regulatory elements. The cut genome was then used as a linear genome in the Pinetree simulation. However, we also modified the Pinetree code such that any polymerases reaching the end of the linearized genome would not fall off but instead proceed to the beginning of the genome and continue transcription there. Furthermore, because Pinetree does not support certain genomic architectures, such as transcription elements that overlap with ribosome binding sites, we made minor adjustments to several genomic elements to accommodate these constraints. In doing so, we endeavored to keep the sizes and relative positions of all elements as close to original as possible. The revised element locations after linearization and removal of conflicts are provided in Table 1.
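The coordinate bookkeeping behind this linearization can be sketched as follows. This is an illustration only, not the modified Pinetree code: the function names are assumptions, positions are treated as 0-based offsets from the cut so that the cut site is "location zero", and the gene A start coordinate used in the example is an illustrative value that should be read from the annotation of the reference sequence.

```python
GENOME_LENGTH = 5386  # ϕX174 genome length in nucleotides, as stated in the text


def to_linear(reference_pos, cut_pos, genome_length=GENOME_LENGTH):
    """Map a position on the circular reference genome to an offset from the cut.

    The cut site itself maps to 0; positions that lie before the cut in the
    reference numbering wrap around to the end of the linearized genome.
    """
    return (reference_pos - cut_pos) % genome_length


def advance(linear_pos, step, genome_length=GENOME_LENGTH):
    """Move a polymerase forward, wrapping from the last position back to 0."""
    return (linear_pos + step) % genome_length


if __name__ == "__main__":
    gene_a_start = 3981          # illustrative value; check the annotation
    cut = gene_a_start - 63      # cut 63 nucleotides upstream of gene A
    print(to_linear(gene_a_start, cut))      # gene A begins 63 nt after the cut
    print(to_linear(100, cut))               # a position before the cut wraps around
    print(advance(GENOME_LENGTH - 1, 1))     # polymerase at the end returns to 0
```

The modulo operation does both jobs: it re-bases every annotated element relative to the cut, and it models a polymerase that reaches the end of the linear molecule and continues from the beginning.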
Here, “Reference” refers to the location of the elements in the reference genome sequence that was downloaded from NCBI. Coordinates are provided for simulations of the wild-type and decompressed strains.
Defining the host-cell environment
We simulated ϕX174 gene-expression dynamics for a single viral particle infecting a simplified version of an E. coli cell. The simulated host-cell environment contains resources needed by the phage (ribosomes and RNA polymerases), the majority of which are bound to host nucleic acids at the start of the simulation (Table 2). As the simulation progresses, polymerases and ribosomes dissociate from host DNA and RNA and become available to ϕX174. We modeled host-cell gene expression by simulating transcription and translation of a single, arbitrary E. coli gene of average length. The rate constants for E. coli gene expression processes were defined using values from the prior T7 model [15] (Table 3). We held the total number of polymerases and ribosomes fixed, such that ϕX174 and E. coli needed to compete over a finite pool of resources for the duration of the simulation. With the parameters chosen, the steady-state number of polymerases and ribosomes occupied by ϕX174 was a small fraction of the total, less than 10%. We note that this fraction could be adjusted by tuning the rate constants for E. coli gene expression, which would increase (or decrease) phage gene expression rates uniformly across all genes; however, for simplicity we decided to keep the same host-cell parameters as the T7 model.
Model optimization
We used a simple iterative optimization procedure to fit models of ϕX174 gene expression to transcription data. We initialized simulations with random values for the three canonical ϕX174 promoters and four terminators (or, alternatively for the non-canonical model, the seven elements plus one putative promoter and one putative terminator). The initial terminator values, which correspond to the efficiency of termination, were sampled from the interval (0, 1]. Promoter values were sampled from a normal distribution with mean 12 and standard deviation 3, and then scaled by a factor of 10⁶ to convert to units of M⁻¹s⁻¹ (for mesoscopic binding rates).
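The initialization described above might look like the following sketch. It is illustrative only: the function name is hypothetical, the terminator labels other than TH are assumed, and the text does not specify how the interval (0, 1] was sampled, so a uniform draw is assumed here.

```python
import random

# Element names are illustrative labels; only pA, pB, pD, and TH appear
# explicitly in the text, the remaining terminator names are assumptions.
PROMOTERS = ["pA", "pB", "pD"]
TERMINATORS = ["TJ", "TF", "TG", "TH"]


def initial_parameters(rng):
    """Draw random starting values for promoters and terminators."""
    # Promoter strengths: Normal(mean=12, sd=3), scaled by 1e6 (M^-1 s^-1).
    promoters = {name: rng.gauss(12, 3) * 1e6 for name in PROMOTERS}
    # Terminator efficiencies in (0, 1]. rng.random() returns values in
    # [0, 1), so 1 - rng.random() lies in (0, 1]. Uniform sampling is an
    # assumption, not stated in the text.
    terminators = {name: 1.0 - rng.random() for name in TERMINATORS}
    return promoters, terminators


promoters, terminators = initial_parameters(random.Random(42))
print(promoters)
print(terminators)
```

Passing an explicit `random.Random` instance keeps each independently fit model reproducible from its seed.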
During each iteration of model training, a single promoter or terminator was randomly selected and adjusted. Promoter strengths were adjusted by multiplying by 2ⁿ, where n is a random value drawn from a normal distribution with a mean of 0 and standard deviation of 0.1. Terminator strengths were adjusted by adding a random value drawn from a normal distribution with mean of 0 and standard deviation of 0.05. After adjusting the chosen parameter, we ran five simulations and used the averages from these runs to calculate the error between simulation output and training data.
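A single training iteration can be sketched as below. Several details are assumptions rather than statements from the text: the acceptance rule (keep a change only if the error decreases), the clamping of terminator efficiencies to [0, 1], and the `error_fn` callback, which stands in for running five simulations and scoring their average against the training data.

```python
import random


def propose(params, promoter_names, rng):
    """Return a copy of `params` with one randomly chosen element adjusted."""
    new = dict(params)
    name = rng.choice(sorted(new))
    if name in promoter_names:
        # Multiplicative update: multiply by 2**n with n ~ Normal(0, 0.1).
        new[name] *= 2 ** rng.gauss(0, 0.1)
    else:
        # Additive update with Normal(0, 0.05). Clamping to [0, 1] is an
        # assumption made here to keep the efficiency a valid fraction.
        new[name] = min(1.0, max(0.0, new[name] + rng.gauss(0, 0.05)))
    return new


def fit(params, promoter_names, error_fn, n_iter, rng):
    """Greedy search: accept a proposal only if it lowers the error (assumed rule)."""
    best_error = error_fn(params)
    for _ in range(n_iter):
        candidate = propose(params, promoter_names, rng)
        error = error_fn(candidate)
        if error < best_error:
            params, best_error = candidate, error
    return params, best_error


# Toy usage with a stand-in error function instead of real simulations.
target = {"pA": 12e6, "TH": 0.8}
toy_error = lambda p: sum(((p[k] - target[k]) / target[k]) ** 2 for k in target)
print(fit({"pA": 10e6, "TH": 0.5}, {"pA"}, toy_error, 500, random.Random(1)))
```

The multiplicative promoter update keeps promoter strengths positive and makes step sizes proportional to the current value, whereas the additive terminator update suits a parameter bounded on a fixed interval.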
Experimentally measured transcript abundances are usually reported as relative quantities. However, we found that using absolute quantities worked better for model fitting. To prepare experimental data for model optimization, we converted relative ϕX174 transcript abundances from the literature to an absolute, discrete quantity for each gene. We set the target total simulated transcript quantity to 1500; that is, the transcript counts for all 11 ϕX174 genes should sum to around 1500 at the endpoint of the simulation. With these parameters, simulations took about 30 seconds each to run. We then scaled the amount for each individual gene according to the reported relative abundances, generating a final data set of 11 values. This was done separately for the qPCR and RNA-seq data. Finally, we computed the mean squared error (MSE) between the simulated and target transcript counts as
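\[
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( x_i^{\mathrm{sim}} - x_i^{\mathrm{target}} \right)^2 \tag{1}
\]
where N is 11, the number of ϕX174 genes, and x_i^sim and x_i^target denote the simulated and target transcript counts for gene i.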
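The conversion from relative to absolute abundances and the error calculation can be illustrated with a short sketch. The function names and the toy abundances are hypothetical, and rounding to the nearest integer is an assumed way of obtaining discrete counts.

```python
def to_absolute_counts(relative, total=1500):
    """Scale relative abundances so that the counts sum to roughly `total`.

    Rounding to whole transcripts is an assumption; because of rounding the
    counts may not sum to exactly `total`.
    """
    scale = total / sum(relative.values())
    return {gene: round(value * scale) for gene, value in relative.items()}


def mean_squared_error(simulated, target):
    """Mean squared difference between simulated and target counts per gene."""
    genes = list(target)
    return sum((simulated[g] - target[g]) ** 2 for g in genes) / len(genes)


# Toy example with made-up relative abundances for three genes.
relative = {"geneX": 1.0, "geneY": 3.0, "geneZ": 6.0}
target = to_absolute_counts(relative, total=1500)
simulated = {"geneX": 160, "geneY": 440, "geneZ": 900}
print(target)
print(mean_squared_error(simulated, target))
```

In practice the `simulated` values would be the per-gene averages over the replicate simulations run at each iteration.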
To fit the decompressed model, we used exactly the same procedure as described above, except the MSE was calculated for protein abundances instead of transcript abundances. For the optimization data, we started with simulated protein abundances from the wild-type model (specifically, the model that was fit to qPCR data) and scaled these according to the protein fold-changes reported in [6] to generate the final optimization data set.
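Constructing the protein-level target for the decompressed model amounts to a per-gene rescaling, sketched below. The function name and the numbers are purely illustrative, and treating genes without a reported fold-change as unchanged is an assumption.

```python
def scale_by_fold_change(wild_type, fold_changes):
    """Multiply wild-type abundances by per-gene fold-changes.

    Genes missing from `fold_changes` are left unchanged (assumed default).
    """
    return {
        gene: abundance * fold_changes.get(gene, 1.0)
        for gene, abundance in wild_type.items()
    }


# Made-up values for illustration only.
wild_type = {"A": 100.0, "B": 400.0, "D": 800.0}
fold_changes = {"A": 0.5, "D": 1.25}
print(scale_by_fold_change(wild_type, fold_changes))
```

The resulting dictionary plays the same role as the transcript target in the wild-type fits, with the error computed over protein rather than transcript abundances.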
Supporting information
S1 Table. Empirical measurements of ϕX174 promoter and terminator strengths.
For promoters pA and pB, the first/top value is the activity measured from PCR-generated fragments, and the second/bottom value is for promoters cloned into reporter plasmids. For promoter pD, activity could be measured for the PCR-generated fragments only. Measured promoter values are reported in [4], and measured terminator values are reported in [17].
https://doi.org/10.1371/journal.pone.0313039.s001
(XLSX)
S1 Fig. RMSE of models trained on qPCR data.
Each point corresponds to the simulation that had the best (lowest) RMSE over the 4000 training generations. The dashed line marks the manually defined cutoff score used in downstream analysis.
https://doi.org/10.1371/journal.pone.0313039.s002
(PNG)
S2 Fig. Simulations of ϕX174 gene expression with the order of promoter A (pA) and terminator H (TH) reversed.
Left panel: simulation of wild-type ϕX174 (pA before TH) with parameter values obtained from fitting simulations to qPCR data. Middle panel: simulation with pA/TH order reversed (TH before pA) and the same pA binding strength as the fitted simulation (pA = 8.44). Transcript abundances for genes A/A* are about 10 times greater than wild-type, and abundances for genes B/K/C are about 2 times greater than wild-type. Right panel: simulation with pA/TH order reversed (TH before pA) and a pA binding strength of 0.5. The value of 0.5 was obtained by manually adjusting promoter A to revert transcript abundances for genes A/A*/B/K/C back to their wild-type ratios. For all panels, simulated transcript abundances are steady-state quantities averaged over five replicate simulations.
https://doi.org/10.1371/journal.pone.0313039.s003
(PNG)
S3 Fig. Distribution of RMSEs for models trained on RNA-seq data.
Each point corresponds to one simulation that had the best (lowest) RMSE over the 4000 training generations. − btss49, RUT-3: Simulations with the canonical ϕX174 regulatory model. + btss49, RUT-3: Simulations with an alternative regulatory model proposed by Logel and Jaschke [12], with one additional promoter and one additional terminator. The mean RMSE for simulations with the alternative model is significantly higher than the mean for simulations with the canonical model only (p < 0.05, Student’s t-test).
https://doi.org/10.1371/journal.pone.0313039.s004
(PNG)
S4 Fig. Simulated transcript abundances from ϕX174 models fit to RNA-seq data.
Left panel: Target ϕX174 transcription data, measured using RNA-seq. Transcript abundances shown are per million reads, normalized to gene A. The transcription data was collected by Logel and Jaschke [12]. Middle panel: Output from the best fit simulation (simulation with the lowest RMSE out of 20 independently fit models) with putative promoter btss49 and putative terminator RUT-3. Right panel: Output from the best fit simulation of the canonical regulatory model.
https://doi.org/10.1371/journal.pone.0313039.s005
(PNG)
S5 Fig. Fitting models of decompressed ϕX174 protein expression.
The target protein abundance was prepared by adjusting simulated wild-type protein abundances to match fold-change differences as reported by Wright et al. [6] (see also Materials and methods). Promoter distributions (A) and terminator distributions (B) from 20 independently fit models. In both A and B, the light blue diamonds are the mean parameter estimates from fitting the wild-type model to qPCR data. C: Final simulated protein abundances from the decompressed model. Simulated wild-type protein abundances (left panel) and the target, adjusted protein abundances (middle panel) are shown for reference.
https://doi.org/10.1371/journal.pone.0313039.s006
(PNG)
References
- 1. Benbow RM, Hutchison CA III, Fabricant JD, Sinsheimer RL. Genetic map of bacteriophage ϕX174. Journal of Virology. 1971;7:549–558. pmid:16789129
- 2. Sanger F, Air GM, Barrell BG, Brown NL, Coulson AR, Fiddes JC, et al. Nucleotide sequence of bacteriophage ϕX174 DNA. Nature. 1977;265:687–695. pmid:870828
- 3. Hayashi MN, Yaghmai R, McConnell M, Hayashi M. mRNA stabilizing signals encoded in the genome of the bacteriophage ϕx174. Molecular and General Genetics MGG. 1989;216:364–371. pmid:2526289
- 4. Sorensen SE, Barrett JM, Wong AKC, Spencer JH. Identification of the in vivo promoters of bacteriophages S13 and ϕX174 and measurement of their relative activities. Biochemistry and Cell Biology. 1998;76:625–636. pmid:10099783
- 5. Jaschke PR, Lieberman EK, Rodriguez J, Sierra A, Endy D. A fully decompressed synthetic bacteriophage ϕX174 genome assembled and archived in yeast. Virology. 2012;434:278–284. pmid:23079106
- 6. Wright BW, Ruan J, Molloy MP, Jaschke PR. Genome modularization reveals overlapped gene topology is necessary for efficient viral reproduction. ACS Synthetic Biology. 2020;9:3079–3090. pmid:33044064
- 7. Van Leuven JT, Ederer MM, Burleigh K, Scott L, Hughes RA, Codrea V, et al. ΦX174 attenuation by whole-genome codon deoptimization. Genome biology and evolution. 2021;13:evaa214. pmid:33045052
- 8. Axelrod N. Transcription of bacteriophage ϕX174 in vitro: Selective initiation with oligonucleotides. Journal of Molecular Biology. 1976;108:753–770. pmid:1018323
- 9. Axelrod N. Transcription of bacteriophage ϕX174 in vitro: analysis with restriction enzymes. Journal of Molecular Biology. 1976;108:771–779. pmid:1018324
- 10. Hayashi M, Fujimura FK, Hayashi M. Mapping of in vivo messenger RNAs for bacteriophage phiX-174. Proceedings of the National Academy of Sciences. 1976;73:3519–3523. pmid:1068463
- 11. Brendel V. Mapping of transcription terminators of bacteriophages phi X174 and G4 by sequence analysis. Journal of virology. 1985;53:340–342. pmid:3155555
- 12. Logel DY, Jaschke PR. A high-resolution map of bacteriophage ΦX174 transcription. Virology. 2020;547:47–56. pmid:32560904
- 13. Zhao L, Stancik AD, Brown CJ. Differential transcription of bacteriophage ϕX174 genes at 37°C and 42°C. PLoS One. 2012;7:e35909. pmid:22540010
- 14. Jack BR, Wilke CO. Pinetree: A step-wise gene expression simulator with codon-specific translation rates. Bioinformatics. 2019;35:4176–4178. pmid:30923831
- 15. Jack BR, Boutz DR, Paff ML, Smith BL, Wilke CO. Transcript degradation and codon usage regulate gene expression in a lytic phage. Virus Evolution. 2019;5:vez055. pmid:31908847
- 16. Fane BA, Brentlinger KL, Burch AD, Hafenstein S, Moore E, Novak CR, et al. ϕX174 et. al., the Microviridae. In: Calendar R, Abedon ST, editors. The Bacteriophages (2 ed.). Oxford: Oxford University Press; 2006. p. 129–145.
- 17. Hayashi MN, Hayashi M, Imai M. Bacteriophage phi X174-specific mRNA synthesis in cells deficient in termination factor rho activity. Journal of Virology. 1981;38:198–207. pmid:6454004
- 18. Ringuette MJ, Spencer JH. Mapping the initiation sites of in vitro transcripts of bacteriophage S13. Biochimica et Biophysica Acta (BBA)-Gene Structure and Expression. 1994;1218(3):331–338. pmid:8049259
- 19. Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, et al. A survey of best practices for RNA-seq data analysis. Genome biology. 2016;17:1–19.
- 20. Lahens NF, Kavakli IH, Zhang R, Hayer K, Black MB, Dueck H, et al. IVT-seq reveals extreme bias in RNA sequencing. Genome biology. 2014;15:1–15. pmid:24981968
- 21. Coenye T. Do results obtained with RNA-sequencing require independent verification? Biofilm. 2021;3:100043. pmid:33665610
- 22. Schlub TE, Holmes EC. Properties and abundance of overlapping genes in viruses. Virus Evol. 2020;6:veaa009. pmid:32071766
- 23. Chan LY, Kosuri S, Endy D. Refactoring bacteriophage T7. Mol Syst Biol. 2005;1:2005.0018. pmid:16729053
- 24. Springman R, Molineux IJ, Duong C, Bull RJ, Bull JJ. Evolutionary stability of a refactored phage genome. ACS Synth Biol. 2012;1:425–430. pmid:23519680
- 25. Biebricher CK, Eigen M, Luce R. Kinetic analysis of template-instructed and de novo RNA synthesis by Q beta replicase. J Mol Biol. 1981;148:391–410. pmid:6273581
- 26. Biebricher CK, Eigen M, Gardiner WC Jr. Kinetics of RNA replication. Biochemistry. 1983;22:2544–2559. pmid:6860647
- 27. Eigen M, Biebricher C, Gebinoga M, Gardiner W. The hypercycle. Coupling of RNA and protein biosynthesis in the infection cycle of an RNA bacteriophage. Biochemistry. 1991;30:11005–11018. pmid:1932025
- 28. Kim H, Yin J. Energy-efficient growth of phage Qβ in Escherichia coli. Biotechnology and bioengineering. 2004;88(2):148–156. pmid:15449299
- 29. Meyers S, Friedland P. Knowledge-based simulation of genetic regulation in bacteriophage lambda. Nucleic Acids Res. 1984;12:1–9. pmid:6229714
- 30. Oppenheim AB, Kobiler O, Stavans J, Court DL, Adhya S. Switches in Bacteriophage Lambda Development. Annual Rev Genet. 2005;39:409–429. pmid:16285866
- 31. Endy D, Kong D, Yin J. Intracellular kinetics of a growing virus: A genetically structured simulation for bacteriophage T7. Biotechnol Bioeng. 1997;55:375–389. pmid:18636496
- 32. Endy D, You L, Yin J, Molineux IJ. Computation, prediction, and experimental tests of fitness for bacteriophage T7 mutants with permuted genomes. Proc Natl Acad Sci USA. 2000;97:5375–5380. pmid:10792041
- 33. Kosuri S, Kelly JR, Endy D. TABASCO: A single molecule, base-pair resolved gene expression simulator. BMC Bioinform. 2007;8:480.
- 34. Birch EW, Ruggero NA, Covert MW. Determining Host Metabolic Limitations on Viral Replication via Integrated Modeling and Experimental Perturbation. PLOS Comp Biol. 2012;8:e1002746. pmid:23093930
- 35. Yin J, Redovich J. Kinetic modeling of virus growth in cells. Microbiology and Molecular Biology Reviews. 2018;82:e00066–17. pmid:29592895
- 36. Gillespie DT. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry. 1977;81:2340–2361.