Genotype to Phenotype Mapping and the Fitness Landscape of the E. coli lac Promoter

doi:10.1371/journal.pone.0061570

Table 1.

Mutation encoding scheme (dummy variables).
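
The table itself is not reproduced here, so the following is only a minimal sketch of one common dummy-variable scheme, assuming three binary indicators per site (one for each possible mutation away from the wild-type base). The function name `encode_dummy`, the alphabetical ordering of the alternatives, and the 4-site wild type are hypothetical choices for illustration, not taken from the paper.

```python
BASES = "ACGT"

def encode_dummy(sequence, wild_type):
    """Return 3 binary indicators per site, one per non-wild-type base.

    Assumed scheme: the wild-type base at a site maps to (0, 0, 0); each of
    the three possible mutations switches on exactly one indicator.
    """
    if len(sequence) != len(wild_type):
        raise ValueError("sequence and wild type must have equal length")
    features = []
    for base, wt_base in zip(sequence, wild_type):
        # The three possible mutations at this site, in alphabetical order.
        alternatives = [b for b in BASES if b != wt_base]
        features.extend(1 if base == alt else 0 for alt in alternatives)
    return features

wild_type = "ACGT"   # hypothetical 4-site wild type
mutant = "AGGT"      # single mutation C -> G at the second site
print(encode_dummy(mutant, wild_type))
```

With this encoding the wild type is the all-zero vector, so every linear coefficient can be read as the change in phenotype caused by a single mutation.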

Figure 1.

Stem plot of the linear coefficients.

The three circles on each stem represent the changes in phenotype for each of the three possible mutations at that site. CRP and RNAP are each known to bind at two sites (magenta and cyan areas). Red circles correspond to the mutations needed to reach the consensus sequences.

Figure 2.

Histogram of phenotype values of uniformly random sequences for the inferred epistatic model.

Random sequences have very low inferred phenotype values because of the specificity of binding sites. The peak of the distribution indicates what phenotype values evolve under neutral conditions. The wild-type value (green line) is much higher than the neutral value, indicating selective pressure.
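
As a rough illustration of how such a histogram could be produced, the sketch below draws uniformly random sequences and evaluates a model with additive and pairwise terms on each. Everything numeric here is a toy assumption: the 3-site wild type, the intercept, and the coefficients are hypothetical placeholders and not the inferred values, and the helper names are invented for this example.

```python
import random
from collections import Counter

BASES = "ACGT"

def encode_dummy(sequence, wild_type):
    """Assumed encoding: 3 indicators per site, one per non-wild-type base."""
    features = []
    for base, wt_base in zip(sequence, wild_type):
        alternatives = [b for b in BASES if b != wt_base]
        features.extend(1 if base == alt else 0 for alt in alternatives)
    return features

def phenotype(x, intercept, additive, pairwise):
    """Intercept plus additive terms plus pairwise (epistatic) terms."""
    value = intercept + sum(a * xi for a, xi in zip(additive, x))
    value += sum(c * x[i] * x[j] for (i, j), c in pairwise.items())
    return value

def random_phenotypes(n, wild_type, intercept, additive, pairwise, seed=0):
    """Evaluate the model on n sequences drawn uniformly at random."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        seq = "".join(rng.choice(BASES) for _ in wild_type)
        x = encode_dummy(seq, wild_type)
        values.append(phenotype(x, intercept, additive, pairwise))
    return values

# Hypothetical toy model: every mutation lowers the phenotype by 0.5,
# and one pair of mutations partially compensates.
wild_type = "ACG"
intercept = 4.0
additive = [-0.5] * 9
pairwise = {(0, 3): 0.2}

values = random_phenotypes(1000, wild_type, intercept, additive, pairwise)
histogram = Counter(round(v, 1) for v in values)  # bin width 0.1
for bin_value in sorted(histogram):
    print(bin_value, histogram[bin_value])
```

In this toy setting most random sequences carry several mutations, so the bulk of the histogram sits well below the wild-type value of 4.0, which mirrors the qualitative picture described in the caption.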

Figure 3.

a) Matrix of the sums of the absolute values of the pair interaction coefficients for each pair of sites (3 mutations per site gives 9 interactions per pair of sites) for the chosen statistical model. The clusters near the diagonal are interactions within the RNAP and CRP binding sites, and the off-diagonal clusters are interactions between the binding sites. b) Red: site-specific sum of absolute values of additive coefficients, divided by 3 (the number of possible mutations). Black: site-specific sum of absolute values of epistatic coefficients, divided by 9 (the number of possible mutation pairs). Epistatic and additive effects are strongly correlated, with a correlation coefficient of 0.90.
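
The aggregation described in this caption can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes coefficients are indexed by dummy variable, with variable `k` belonging to site `k // 3`, and it interprets the black curve in panel b as the row sum of the site matrix divided by 9. The function names and the toy coefficients are hypothetical.

```python
def site_interaction_matrix(pairwise, n_sites, muts_per_site=3):
    """Sum |coefficient| over all mutation pairs belonging to each site pair."""
    matrix = [[0.0] * n_sites for _ in range(n_sites)]
    for (i, j), coef in pairwise.items():
        s, t = i // muts_per_site, j // muts_per_site
        matrix[s][t] += abs(coef)
        if s != t:
            matrix[t][s] += abs(coef)  # keep the matrix symmetric
    return matrix

def site_profiles(additive, matrix, muts_per_site=3):
    """Per-site averages of |additive| and |epistatic| coefficients."""
    n_sites = len(matrix)
    additive_mean = [
        sum(abs(a) for a in additive[s * muts_per_site:(s + 1) * muts_per_site])
        / muts_per_site
        for s in range(n_sites)
    ]
    # Assumed reading of panel b: row sum of the site matrix, divided by 9.
    epistatic_mean = [sum(row) / muts_per_site ** 2 for row in matrix]
    return additive_mean, epistatic_mean

# Hypothetical toy coefficients for a 3-site model (9 dummy variables).
toy_additive = [0.3, -0.3, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
toy_pairwise = {(0, 3): 0.2, (1, 4): -0.3, (2, 8): 0.5}

toy_matrix = site_interaction_matrix(toy_pairwise, n_sites=3)
toy_add_mean, toy_epi_mean = site_profiles(toy_additive, toy_matrix)
print(toy_matrix)
print(toy_add_mean, toy_epi_mean)
```

Taking absolute values before summing means that positive and negative interactions do not cancel, so the matrix shows where interactions are concentrated rather than their net sign.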

Table 2.

The interaction coefficients for are clustered around the subunits of the system: CRP, RNAP, and their constituent binding sites (defined by white rectangles in figure 3a).

Figure 4.

Coefficients of the non-epistatic model without glucose (normal levels of cAMP; blue) and with glucose (no cAMP; red).

CRP is activated by cAMP and does not bind without it.

Figure 5.

2D histogram of expression in the two environments, no cAMP (glucose) and cAMP (no glucose), for random sequences (orange) and for sequences from the experiment (blue), which are closer to the wild type (plus sign).

The wild type is nearly on the optimal front, in that very few sequences have both higher expression with cAMP and lower expression without cAMP (above and to the left of the plus sign). The phenotype values range from 1 to 5 in these experiments. The dissimilarity between measured expression and the expression predicted for random sequences along the vertical, but not the horizontal, axis likely signals the presence of poorly understood biophysical mechanisms that are employed differentially in the two environments.
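
The "nearly on the optimal front" statement amounts to counting sequences that beat the wild type in both environments at once. A minimal sketch of that count is given below, assuming each sequence is summarized as a pair (expression without cAMP, expression with cAMP); the function name and the data points are hypothetical and serve only to show the comparison.

```python
def count_dominating(points, wild_type_point):
    """Count points with higher expression with cAMP AND lower without cAMP.

    Each point is (expression_without_camp, expression_with_camp), i.e.
    (horizontal, vertical) in the orientation described in the caption.
    """
    wt_without, wt_with = wild_type_point
    return sum(
        1
        for without, with_camp in points
        if with_camp > wt_with and without < wt_without
    )

# Hypothetical phenotype pairs on the 1-to-5 scale mentioned in the caption.
toy_wild_type = (1.5, 4.0)
toy_points = [
    (1.2, 4.3),  # above and to the left of the wild type: dominates it
    (1.8, 4.5),  # higher with cAMP, but also higher without cAMP
    (1.1, 3.0),  # lower without cAMP, but also lower with cAMP
    (2.5, 2.0),  # worse in both environments
]

print(count_dominating(toy_points, toy_wild_type))
```

A small count relative to the number of sequences is what places the wild type close to the front; a count of zero would put it exactly on it.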

Figure 6.

Generalizing the fitted function by replacing the output values with a non-linear function improves the least squares fit.

Constrained non-linear optimization found the optimal for the linear model with . The non-linearity is due to the first few bins being dominated by background fluorescence and not gene expression.

Figure 7.

The LASSO solution of the quadratic model was computed for 100 values of .

Blue is the value, and red is the 10-fold cross-validated . The green curve is the variance of for randomly generated sequences. The variance is too large even for values of that are larger than the optimal value predicted by the maximum of the curve. We choose the model with (dashed line) for further analysis. This model has non-zero coefficients, most of which are epistatic.

Figure 8.

Sensitivity of the epistatic coefficients to the choice of the regularization parameter .

As in Fig. 3, we show the matrices of the sums of the absolute values of the pair interaction coefficients for each pair of sites . a) Coefficients for the model with maximum (). b) Coefficients for the full model: . Notice the same general structure of the coefficients for varying , including in Fig. 3. This indicates stability under changes of the parameter.
