A theoretical analysis of single molecule protein sequencing via weak binding spectra

doi:10.1371/journal.pone.0212868

Fig 1.

Identifying amino acids from kinetic measurements.

A Example affinity matrix for a set of NAABs. The affinities of each of the 17 NAABs are shown for all 19 amino acids excluding cysteine, which is used to anchor the peptides to the surface. Reproduced from [1]. B In the proposed measurement scheme, the target (green disk) is attached to a glass slide and is observed using TIRF microscopy. NAAB binders (brown clefts) bearing fluorophores (red dots) are excited by a TIRF beam (purple) and generate fluorescent photon emissions (red waves). C When a fluorophore is bound, there is an increase in fluorescence in the spot containing the target. Photobleaching of the fluorophore is indistinguishable from an unbinding event, so it is important to use a dye that is robust against photobleaching. The plot shows an illustrative stochastic kinetics simulation incorporating Poisson shot noise of photon emission. A relatively strong binder is shown solely for purposes of illustration. In practice, the method relies on many measurements performed on weak binders. D The plot shows the result of a proposed kinetic measurement on an N-terminal amino acid using only two NAABs. The affinities of each N-terminal amino acid (black Xs, excluding cysteine) for the methionine-targeting and tryptophan-targeting NAABs are shown as a scatterplot, with the affinity for the Met-targeting NAAB on the x axis and the affinity for the Trp-targeting NAAB on the y axis. Upon measuring the affinities of these NAABs for an unknown target undergoing sequencing, the target can be identified as the amino acid whose expected affinity vector lies closest to the measured affinity vector in the two-dimensional Euclidean space (higher-dimensional in a full experiment). Each colored region marks the set of measured multi-NAAB affinity vectors that would be assigned to a given amino acid. As an example, a pair of measurements yielding the white star in D would identify the target as glycine.
E The affinities of the glutamine-targeting and lysine-targeting NAABs are shown for each of the amino acids. Some amino acids that are practically indistinguishable using the Met and Trp NAABs are easily distinguished using the Gln and Lys NAABs. As an example, if the same target amino acid described in D were measured with only the Gln and Lys NAABs, yielding the white star, we would identify the target as proline. However, combining these measurements with those for the white star in D with the Met and Trp NAABs, we see that the true identity of the target is serine. Thus, the higher-dimensional measurement of the amino acid using many different NAABs allows disambiguation of the amino acid identity.
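
The nearest-neighbor assignment described in panels D and E can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `nearest_amino_acid` and all reference affinities below are hypothetical placeholders (arbitrary units), not values from the affinity matrix in [1].

```python
import math

def nearest_amino_acid(measured, reference):
    """Return the amino acid whose reference affinity vector is closest
    (in Euclidean distance) to the measured affinity vector."""
    return min(reference, key=lambda name: math.dist(measured, reference[name]))

# Hypothetical reference affinities for four NAABs (arbitrary units).
# Gly and Ser are nearly identical in the first two coordinates and
# well separated in the last two.
reference_4d = {
    "Gly": (1.0, 1.2, 4.0, 0.5),
    "Ser": (1.1, 1.3, 0.5, 3.0),
    "Trp": (0.2, 5.0, 1.0, 1.0),
}
# Two-NAAB view: keep only the first two coordinates of each vector.
reference_2d = {name: vec[:2] for name, vec in reference_4d.items()}

# A hypothetical noisy measurement of a serine target.
measured_4d = (1.04, 1.24, 0.7, 2.8)

call_2d = nearest_amino_acid(measured_4d[:2], reference_2d)  # two NAABs only
call_4d = nearest_amino_acid(measured_4d, reference_4d)      # all four NAABs
print(call_2d, call_4d)
```

With only the first two coordinates, the small measurement noise is enough to pull the call toward the wrong neighbor; adding the remaining coordinates separates the two candidates, which is the disambiguation effect the caption describes.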

Fig 2.

Two types of affinity measurements using TIRF microscopy.

A A measurement performed using the proposed scheme yields a fluorescence intensity trace where periods of high intensity correspond to the target being bound and periods of low intensity correspond to the target being free. The affinity of a binder against the target may then be determined in two ways, either via occupancy measurements or via luminosity measurements. B An occupancy measurement is performed “along the time axis,” by calculating k_on from the average time between binding events, and k_off from the average length of binding events. C On the other hand, a luminosity measurement is performed “along the brightness axis,” by calculating k_D directly from the average luminosity of the target over the whole observation period. D We validated our simulation by applying occupancy measurements to determine k_on and k_off from simulated data. The parameters used here were identical to those used in the production of Fig 2a in [23]. See text for symbol definitions.
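
The two estimators in panels B and C can be illustrated with an idealized two-state binding model. The sketch below is an assumption-laden toy, not the simulation used in the paper: it ignores shot noise, photobleaching, and finite frame rate, assumes a known binder concentration `conc`, and uses hypothetical function names and rate constants chosen only for illustration.

```python
import random

def simulate_dwells(k_on, k_off, conc, t_total, rng):
    """Simulate alternating free and bound dwell times (seconds) for a
    two-state model: binding rate k_on * conc, unbinding rate k_off."""
    free, bound = [], []
    t = 0.0
    while t < t_total:
        t_free = rng.expovariate(k_on * conc)  # wait until a binder arrives
        t_bound = rng.expovariate(k_off)       # duration of the bound event
        free.append(t_free)
        bound.append(t_bound)
        t += t_free + t_bound
    return free, bound

def occupancy_estimate(free, bound, conc):
    """'Along the time axis': k_on from the mean time between binding
    events, k_off from the mean length of binding events."""
    k_on = 1.0 / (conc * (sum(free) / len(free)))
    k_off = 1.0 / (sum(bound) / len(bound))
    return k_on, k_off, k_off / k_on

def luminosity_estimate(free, bound, conc):
    """'Along the brightness axis': k_D from the time-averaged bound
    fraction f, using f = conc / (conc + k_D)."""
    f = sum(bound) / (sum(free) + sum(bound))
    return conc * (1.0 - f) / f

# Hypothetical parameters: k_on in M^-1 s^-1, k_off in s^-1, conc in M.
K_ON, K_OFF, CONC = 1e6, 2.0, 1e-6
rng = random.Random(0)
free, bound = simulate_dwells(K_ON, K_OFF, CONC, 5000.0, rng)
k_on_est, k_off_est, kd_occ = occupancy_estimate(free, bound, CONC)
kd_lum = luminosity_estimate(free, bound, CONC)
print(k_on_est, k_off_est, kd_occ, kd_lum)
```

In this noise-free toy the two estimates of k_D reduce to the same ratio of total free time to total bound time, so they agree to rounding error. The methods differ in practice because an occupancy measurement must resolve individual binding events, whereas a luminosity measurement needs only the average brightness together with a calibration of the bound and background intensity levels.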

Fig 3.

Two types of affinity measurements using TIRF microscopy.

A The accuracies of occupancy measurements of k_D are shown as a function of k_D and k_on for the simulation described in the text, with T_exp = 100 s. These measurements achieve high accuracy for k_on ≥ 10⁴ M⁻¹ s⁻¹ and k_off ≪ 100 s⁻¹. For values of k_off on the order of 100 s⁻¹ (upper right-hand corner), the accuracy deteriorates significantly. B The accuracies of luminosity measurements of k_D are shown as a function of k_D and k_on. These measurements achieve high accuracy for k_on ≥ 10⁵ M⁻¹ s⁻¹ and k_D ≥ 100 nM. The heat map shown gives the fractional errors as a function of k_D and k_on for the simulation described in the text, with T_exp = 100 s. In contrast to occupancy measurements, the accuracy of luminosity measurements does not deteriorate for very high values of k_off. C For luminosity measurements only, the mean fractional error in the measured value of k_D is plotted as a function of the observation time for five different values of k_D. The line y = 1/x is plotted as a guide to the eye. For k_D = 10 nM and k_D = 100 nM, the effects of photobleaching are evident at longer runtimes. D Also for luminosity measurements only, the measured value of k_D is plotted as a function of the actual value of k_D for eight different values of the runtime. The performance of the algorithm improves dramatically for τ_obs > 25 s. The line y = x is plotted as a guide to the eye. Error bars in C and D denote the standard error over 100 trials.

Fig 4.

Identification of amino acids is robust against systematic error.

The fraction of amino acids incorrectly identified is plotted as a function of τ_obs for four different values of the systematic calibration error σ_C and four different values of the systematic kinetic error σ_K (as described in the text). A In the absence of systematic error, measurements with τ_obs = 50 s result in correct amino acid identification more than 98% of the time. For 25% error in k_D, the accuracy drops to 97.5%, and if 5% calibration error is added, it drops further to 92%. More than 5% systematic error in the calibration leads to very significant numbers of mistakes in amino acid identification. B With τ_obs = 100 s, an accuracy of 97.5% was obtained for 25% error in k_D and 5% error in the calibration. Axes for B, C, and D are the same as in A. C Increasing τ_obs beyond 100 s at the same binder concentration leads to diminishing improvements in the accuracy. D The sensitivity to calibration error could be substantially reduced by decreasing the concentration of free binders to 100 nM. However, this decreased concentration necessitates a longer runtime. E For τ_obs = 100 s, plots are shown for each value of σ_C and σ_K, depicting the probability that a given target amino acid (on the horizontal axis) was assigned a particular identity (on the vertical axis). Off-diagonal elements correspond to errors.

Fig 5.

Overall error rates for 100 random affinity matrices.

A histogram of the overall error rate, calculated as the number of incorrect residue calls divided by the total number of residue calls over 10000 trials, is plotted for 100 random affinity matrices.

Fig 6.

Accuracies for amino acid calling obtained for 100 random affinity matrices in simulations.

100 random affinity matrices were generated by randomly shuffling the entries of the NAAB affinity matrix. For each resulting matrix, we simulated 10000 amino acid calls, with 5% calibration error and 0.25% kinetic error. The resulting accuracy matrices are presented here. The scale and axes for each matrix are identical to those in Fig 4E.
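
The shuffling step and the overall error rate plotted in Fig 5 can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `shuffle_entries` and `overall_error_rate` are hypothetical, the small matrix and the residue calls are placeholder data, and the simulation that produces the calls is not reproduced here.

```python
import random

def shuffle_entries(matrix, rng):
    """Return a new matrix of the same shape whose entries are a random
    permutation of all entries of the input matrix."""
    flat = [value for row in matrix for value in row]
    rng.shuffle(flat)
    n_cols = len(matrix[0])
    return [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]

def overall_error_rate(true_residues, called_residues):
    """Number of incorrect residue calls divided by the total number of calls."""
    wrong = sum(t != c for t, c in zip(true_residues, called_residues))
    return wrong / len(true_residues)

# Placeholder 2 x 3 "affinity matrix" (arbitrary values, not from [1]).
example_matrix = [[1, 2, 3], [4, 5, 6]]
shuffled = shuffle_entries(example_matrix, random.Random(1))

# Placeholder calls: one of four residues is called incorrectly.
rate = overall_error_rate(list("GSAW"), list("GSPW"))
print(shuffled, rate)
```

Shuffling all entries preserves the overall distribution of affinity values while destroying the specific pairing of binders and amino acids, so repeating the calling simulation on many shuffled matrices indicates how strongly the accuracy depends on that pairing.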
