Zero-shot segmentation using embeddings from a protein language model identifies functional regions in the human proteome
Fig 5
Segment Embedding Evaluation and Visualization of IDRs.
(A) 1-nn precision for DisProt functional annotations, reported for ZPS (blue) and 3-mers (red). (B) 1-nn precision for ProtGPS localization annotations, reported for ZPS (blue) and 3-mers (red). Error bars represent binomial confidence intervals; the intervals and the precision values are listed in S1 Table. (C) The normalized confusion matrix for the 1-nn assessment of the 20 most common annotations that overlap with MobiDB IDRs. Precision is shown along the diagonal, and n is the number of ZPS segments. (D) UMAP of the segments used in the 1-nn assessment, labelled with the annotations that overlap MobiDB IDRs.
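
The 1-nn precision and binomial confidence intervals reported in panels A and B can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine distance, the leave-one-out scheme, the Wilson form of the binomial interval, the function names, and the toy embeddings and labels are all assumptions made for illustration, since the caption does not specify them.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def one_nn_predictions(embeddings, labels):
    """Leave-one-out 1-nn: each segment takes the label of its nearest other segment."""
    preds = []
    for i, query in enumerate(embeddings):
        nearest = min(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: cosine_distance(query, embeddings[j]),
        )
        preds.append(labels[nearest])
    return preds


def per_label_counts(labels, preds):
    """Return {label: (correct predictions, total predictions)} for each predicted label."""
    predicted = Counter(preds)
    correct = Counter(p for t, p in zip(labels, preds) if t == p)
    return {lab: (correct[lab], predicted[lab]) for lab in predicted}


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (one possible binomial CI)."""
    if n == 0:
        return (0.0, 1.0)
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half, centre + half)


# Hypothetical 2-D segment embeddings and annotations, for illustration only.
embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (0.8, 0.3)]
labels = ["binding", "binding", "linker", "linker", "linker"]

preds = one_nn_predictions(embeddings, labels)
counts = per_label_counts(labels, preds)
for lab, (k, n) in sorted(counts.items()):
    lo, hi = wilson_interval(k, n)
    print(f"{lab}: precision={k / n:.2f} (95% CI {lo:.2f} to {hi:.2f}, n={n})")
```

Normalizing the per-label counts of true versus predicted annotations in the same way would give a confusion matrix of the kind shown in panel C, with precision on the diagonal if each entry is divided by the total predicted for that label.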