Zero-shot segmentation using embeddings from a protein language model identifies functional regions in the human proteome

doi:10.1371/journal.pcbi.1012929

Fig 5

Segment Embedding Evaluation and Visualization of IDRs.

(A) 1-nn precision for DisProt functional annotations, reported for ZPS (blue) and 3-mers (red). (B) 1-nn precision for ProtGPS localization annotations, reported for ZPS (blue) and 3-mers (red). Error bars represent binomial confidence intervals; the intervals and the precision values are listed in S1 Table. (C) The normalized confusion matrix for the 1-nn assessment of the 20 most common annotations that overlap with MobiDB IDRs. Precision is shown along the diagonal, and n is the number of ZPS segments. (D) UMAP of the segments used in the 1-nn assessment, labelled with the annotations that overlap MobiDB IDRs.

doi: https://doi.org/10.1371/journal.pcbi.1012929.g005
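
The 1-nn precision and binomial error bars described in the caption can be illustrated with a minimal sketch. It is not the authors' code: the leave-one-out protocol, the Euclidean distance, and the Wilson form of the binomial interval are all assumptions made here for illustration, and the function names and toy embeddings are hypothetical.

```python
import math
import numpy as np


def loo_1nn_predict(embeddings, labels):
    """Leave-one-out 1-nn: each segment takes the label of its nearest other segment.

    Euclidean distance is an assumption; the caption does not name the metric.
    """
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(dist, np.inf)  # a segment may not match itself
    return y[np.argmin(dist, axis=1)]


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes in n trials (one kind of binomial CI)."""
    if n == 0:
        return (0.0, 1.0)
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def precision_by_label(y_true, y_pred):
    """Per-label precision: correct predictions of a label / all predictions of it."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    out = {}
    for label in np.unique(y_pred):
        predicted = y_pred == label
        n = int(predicted.sum())
        k = int((y_true[predicted] == label).sum())
        out[str(label)] = (k / n, n, wilson_interval(k, n))
    return out


# Toy embeddings: two well-separated clusters with hypothetical labels "a" and "b".
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
y = np.array(["a", "a", "a", "b", "b", "b"])

pred = loo_1nn_predict(X, y)
stats = precision_by_label(y, pred)
for label, (precision, n, (lo, hi)) in stats.items():
    print(f"{label}: precision={precision:.2f} n={n} CI=({lo:.2f}, {hi:.2f})")
```

With only three segments per label the interval stays wide even at perfect precision, which is why error bars matter for annotations with small n.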