A Theoretical Justification for Single Molecule Peptide Sequencing

doi:10.1371/journal.pcbi.1004080

A Theoretical Justification for Single Molecule Peptide Sequencing

Fig 2

Simulations of ideal experimental conditions suggest relatively simple labeling schemes are sufficient to identify most proteins in the human proteome.

Each curve summarizes the fraction of human proteins uniquely identified by at least one peptide as a function of the number of sequential experimental cycles (each cycle is one Edman degradation reaction paired with TIRF observation). We consider peptides generated by different proteases (e.g., Glu denotes cleavage C-terminal to glutamic acid residues by GluC; Met denotes cleavage after methionine residues by cyanogen bromide) and different labeling schemes (e.g., Lys + Tyr indicates that Lys and Tyr are selectively labeled with two distinguishable fluorophores; Asp/Glu indicates that both residues are labeled with the same fluorophore). Peptides are immobilized as indicated: Cys denotes anchoring through cysteines (so only cysteine-containing peptides are sequenced), and C-term denotes anchoring through the C-terminal amino acid. Increasing the number of distinct label types raises identification to as much as 80% within only 20 experimental cycles, even when only Cys-containing peptides are sequenced; near-total proteome coverage is theoretically achievable when cyanogen bromide-generated peptides are anchored by their C-termini and labeled with a combination of four different fluorophores. Cycle numbers denote upper bounds, since no fluorosequence is allowed to proceed past the anchoring residue (cysteine or C-terminus). Note also that peptide length distributions depend on the enzyme used for cleavage, with median lengths of 26 amino acids for cyanogen bromide, 8 for GluC, and 10 for trypsin digests.
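To make concrete what each curve computes, here is a minimal Python sketch of an error-free version of the calculation. It rests on assumptions of ours and is not necessarily the authors' pipeline. The function names `digest`, `fluorosequence` and `fraction_identified` and the `CLEAVE_AFTER` table are hypothetical. Trypsin's proline rule and missed cleavages are ignored. Edman reading is assumed to stop before the anchoring residue, taken to be the first cysteine for Cys anchoring. Unlabeled positions after the last labeled residue are treated as unobservable. A protein counts as identified when at least one of its fluorosequences is produced by no other protein.

```python
from collections import defaultdict

# Hypothetical cleavage rules: each protease cuts C-terminal to these residues.
# Simplification (assumption): trypsin's proline rule and missed cleavages are ignored.
CLEAVE_AFTER = {"GluC": "E", "CNBr": "M", "trypsin": "KR"}


def digest(protein, cleave_after):
    """Split a protein sequence C-terminal to every residue in cleave_after."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in cleave_after:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def fluorosequence(peptide, labels, anchor, cycles):
    """Ideal (error-free) fluorosequence of one peptide, or None if unobservable.

    labels maps residues to fluorophore symbols; residues sharing a symbol are
    indistinguishable, e.g. {"D": "a", "E": "a"} models the Asp/Glu scheme.
    anchor is "Cys" or "C-term"; reading stops before the anchoring residue
    (assumed here to be the first cysteine, or the C-terminal residue).
    """
    if anchor == "Cys":
        if "C" not in peptide:
            return None  # no cysteine: the peptide cannot be immobilized
        readable = peptide[:peptide.index("C")]
    else:
        readable = peptide[:-1]  # the anchored C-terminal residue is never removed
    # One residue is removed per Edman cycle; '.' marks an unlabeled position.
    seq = "".join(labels.get(aa, ".") for aa in readable[:cycles])
    # Positions after the last fluorescence drop leave no signal, so trim them.
    return seq.rstrip(".") or None


def fraction_identified(proteome, protease, labels, anchor, cycles):
    """Fraction of proteins with at least one fluorosequence unique to them."""
    owners = defaultdict(set)  # fluorosequence -> proteins producing it
    per_protein = {}
    for name, seq in proteome.items():
        fseqs = set()
        for pep in digest(seq, CLEAVE_AFTER[protease]):
            f = fluorosequence(pep, labels, anchor, cycles)
            if f is not None:
                fseqs.add(f)
        per_protein[name] = fseqs
        for f in fseqs:
            owners[f].add(name)
    identified = sum(
        any(len(owners[f]) == 1 for f in fseqs)
        for fseqs in per_protein.values()
    )
    return identified / len(proteome)


if __name__ == "__main__":
    # Toy proteome, not real data: P4's only fluorosequence ".k" also comes from P1.
    toy = {"P1": "AKEYCKE", "P2": "GKYECAE", "P3": "KKKCE", "P4": "WKE"}
    lys_tyr = {"K": "k", "Y": "y"}  # Lys + Tyr with two distinguishable dyes
    print(fraction_identified(toy, "GluC", lys_tyr, "C-term", 20))  # 0.75
    print(fraction_identified(toy, "GluC", lys_tyr, "Cys", 20))  # 0.5
```

On the invented toy proteome, switching from C-term to Cys anchoring lowers coverage from 0.75 to 0.5, because peptides without a cysteine are discarded and P2's only cysteine peptide has no readable residues before its anchor.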

doi: https://doi.org/10.1371/journal.pcbi.1004080.g002