Pervasive, conserved secondary structure in highly charged protein regions

Fig 4

Helical regions can be predicted from composition.

(a) Uversky plot of all regions used to train the logistic regression (LR) model. The marginals of the distribution are shown on the plot border. (b) The coefficients of the LR model, which predicts whether a region is helical or disordered on the basis of amino acid composition. The model was trained on purely helical and purely disordered regions (as predicted by AlphaFold) selected from the S. cerevisiae proteome. Amino acids with a positive coefficient are correlated with helices; those with a negative coefficient are correlated with disordered regions. (c) The helix propensity from [46] plotted against the LR model coefficients. (d) Accuracy of the LR model (top), the Uversky dividing line (middle, from [2]), and flDPnn [47] (bottom) on purely helical and disordered regions (held-out data from the training set, n = 3360; left), randomly drawn regions that are predicted by AlphaFold to be majority (but not completely) helical or disordered (n = 3405; center), and the highly charged regions (n = 681; right). (e) Summarized accuracy for all categories in d. (f) Summarized accuracy of the LR model with only a subset of coefficients. (g) (Left) Accuracy of the LR model prediction of regions from other organisms. (Right) Timetree showing the evolutionary divergence of the organisms.
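The two composition-based predictors compared in the caption can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes that the Uversky coordinates are the mean Kyte-Doolittle hydropathy rescaled to [0, 1] and the absolute mean net charge (with histidine treated as neutral), that the dividing line is the commonly quoted ⟨R⟩ = 2.785⟨H⟩ − 1.151, and that the LR model takes per-residue fractions as features. The function names are hypothetical, and the coefficients passed to `lr_helix_probability` are placeholders supplied by the caller, not the fitted values shown in panel b.

```python
import math

# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD)


def composition(seq):
    """Fraction of each of the 20 standard amino acids in the region."""
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def uversky_coordinates(seq):
    """Return (mean hydropathy in [0, 1], absolute mean net charge).

    Assumption: K and R count as +1, D and E as -1, H as neutral.
    """
    mean_h = sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)
    net = sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")
    return mean_h, abs(net) / len(seq)


def uversky_is_disordered(seq):
    """True if the region lies above the line <R> = 2.785 <H> - 1.151."""
    mean_h, mean_r = uversky_coordinates(seq)
    return mean_r > 2.785 * mean_h - 1.151


def lr_helix_probability(seq, coefficients, intercept=0.0):
    """Logistic score from composition; positive coefficients favor helix.

    `coefficients` maps amino acid letters to weights (placeholders here);
    amino acids missing from the mapping contribute nothing.
    """
    comp = composition(seq)
    z = intercept + sum(coefficients.get(aa, 0.0) * comp[aa] for aa in AMINO_ACIDS)
    return 1.0 / (1.0 + math.exp(-z))
```

Under these assumptions, a region such as `"EEEE"` falls on the disordered side of the dividing line because of its high net charge and low hydropathy. That is the kind of highly charged region for which panel d reports a separate accuracy, which an LR model with per-residue coefficients can score differently from a two-feature boundary.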

doi: https://doi.org/10.1371/journal.pcbi.1011565.g004