Mapping DNA sequence to transcription factor binding energy in vivo
Fig 3
Energy matrix predictions compared to binding energies derived from fold-change data.
A: Fold-change data were obtained by flow cytometry for each mutant operator by measuring its fluorescence at multiple LacI copy numbers and normalizing by the fluorescence in the absence of repressor (R = 0). The solid lines in each plot are fold-change curves fitted to each data set to obtain a binding energy measurement. Each plot shows data and fits for two operator mutants, one weak and one strong, for 1 bp (left), 2 bp (middle), and 3 bp (right) mutants. The fitted energy value is shown for each mutant, with the superscripts and subscripts giving the 95% confidence interval of the fit. All remaining data are shown in S2–S4 Figs. Approximately 30 operator mutants were measured in total. Low-expression measurements are less accurate than high-expression measurements because of autofluorescence and the flow cytometer’s limited ability to measure weak signals, which reduces the accuracy of fold-change values for strongly repressed strains.
B: The measured binding energy values ΔεR (y axis) are plotted against binding energy values predicted from an energy matrix derived from the O1 operator (x axis). The horizontal error bars represent the standard deviation of predictions made from three matrix replicates obtained by splitting the Sort-Seq data into three groups. MCMC was used to obtain a scaling factor for each matrix to convert it into kBT units. The vertical error bars represent the 95% confidence interval of the fitted ΔεR values (where not visible, these error bars are smaller than the marker). Although the quality of the binding energy predictions appears to degrade as the number of mutations relative to O1 increases, the O1 energy matrix still approximately predicts the measured values.
C: Binding energies for each mutant were predicted using both the O1 and O2 energy matrices and compared against measured binding energy values. The prediction error, defined as the magnitude of the difference in kBT between a predicted binding energy and the corresponding measured binding energy, is plotted against the number of mutations relative to the reference sequence whose energy matrix was used to make the prediction. Individual data points are shown in purple, with box plots overlaid to show the median error and its variability. For sequences with four or fewer mutations, the median prediction error is consistently lower than 1.5 kBT. The dashed horizontal line marks the error that corresponds to an approximately 10-fold difference in fold-change.
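The quantities in panels A to C can be illustrated with a minimal Python sketch. It is not the authors' analysis code and rests on assumptions the caption does not state:

- the standard simple-repression expression, fold-change = 1 / (1 + (R / N_NS) exp(-ΔεR / kBT));
- N_NS = 4.6 × 10^6 nonspecific binding sites;
- an additive energy matrix whose reference-sequence entries are zero.

The function names, the 4 bp toy matrix, and the reference energy are hypothetical placeholders. Under the assumed model, fold-change in the strong-repression limit is proportional to exp(ΔεR / kBT), so a 10-fold difference in fold-change corresponds to an energy error of ln 10 ≈ 2.3 kBT.

```python
import math

# Assumption: number of nonspecific binding sites (approximate E. coli
# genome length in bp). This value is not given in the caption.
N_NS = 4.6e6


def fold_change(R, delta_eps):
    """Assumed simple-repression fold-change for R repressors per cell
    and a binding energy delta_eps expressed in kBT units."""
    return 1.0 / (1.0 + (R / N_NS) * math.exp(-delta_eps))


def count_mutations(seq, ref):
    """Number of positions at which seq differs from the reference."""
    if len(seq) != len(ref):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq, ref))


def predict_energy(seq, matrix, ref_energy, scale=1.0):
    """Additive energy-matrix prediction in kBT.

    matrix holds one dict per position mapping base -> energy
    contribution, with the reference base set to 0. scale converts
    the matrix from arbitrary units into kBT.
    """
    if len(seq) != len(matrix):
        raise ValueError("sequence and matrix lengths differ")
    shift = sum(row[base] for row, base in zip(matrix, seq))
    return ref_energy + scale * shift


def prediction_error(predicted, measured):
    """Magnitude of the difference between prediction and measurement."""
    return abs(predicted - measured)


# Energy error that gives a 10-fold difference in fold-change in the
# strong-repression limit of the assumed model.
TEN_FOLD_ERROR = math.log(10)

# Hypothetical toy data, for illustration only (not values from the paper).
REF = "AATT"
REF_ENERGY = -15.0  # placeholder reference binding energy in kBT
TOY_MATRIX = [
    {"A": 0.0, "C": 1.5, "G": 2.0, "T": 1.0},
    {"A": 0.0, "C": 0.5, "G": 2.0, "T": 1.0},
    {"A": 1.0, "C": 0.5, "G": 2.0, "T": 0.0},
    {"A": 1.0, "C": 1.5, "G": 0.5, "T": 0.0},
]

if __name__ == "__main__":
    mutant = "AGTT"
    predicted = predict_energy(mutant, TOY_MATRIX, REF_ENERGY)
    measured = -12.0  # placeholder "measured" value
    error = prediction_error(predicted, measured)
    print("mutations:", count_mutations(mutant, REF))
    print("predicted energy (kBT):", predicted)
    print("error below 10-fold line:", error < TEN_FOLD_ERROR)
```

In an actual analysis, the binding energy for each mutant would come from fitting `fold_change` to measurements at several repressor copy numbers, and `scale` would come from the MCMC inference described in panel B.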