
Zero-shot segmentation using embeddings from a protein language model identifies functional regions in the human proteome

Fig 2

Segment Embeddings and Segment Colours of RNA-Binding Proteins.

(A) Segment embedding boundaries and colours compared to annotations from the literature (see Methods) for the FET family (FUS, EWSR1 (UniProt ID: Q01844), and TAF-15 (UniProt ID: Q92804)), HnRNPA1 (UniProt ID: P09651), and TDP-43 (UniProt ID: Q13148). (B) Segment embeddings shown in panel A were ordered by hierarchical agglomerative clustering and visualized as a heatmap. Corresponding colours of the segment embeddings are shown to the right of the heatmap and labelled with annotations from the literature. (C) AlphaFoldDB structure of TDP-43 showing segment colours. Abbreviations for folded domains labelled here include RNA Recognition Motifs (RRM), N-terminal domain (NTD), and Zinc Fingers (ZF). Abbreviations for IDR sub-regions labelled here include canonical Nuclear Localization Signal (cNLS), Low Complexity (LC) regions, PY-motif Nuclear Localization Signals (PY-NLS), and RGG (arginine-glycine-glycine motif) regions. G-rich and SYGQ-rich regions describe compositional biases that define sub-regions of IDRs. (D) Arginine embeddings from the FET family, TDP-43, and HnRNPA1 proteins were ordered by hierarchical agglomerative clustering and shown in a heatmap. To the right of the heatmap, from left to right, we show: the numbered clusters; whether each arginine is annotated as a methylation site on UniProt (black for methylated); whether each arginine lies within a ProRule domain annotation on UniProt (black for domain); the protein the arginine originated from (FUS in green, EWSR1 in red, TAF-15 in blue, TDP-43 in yellow, and HnRNPA1 in purple); the colour of the ZPS segment that the arginine originated from; and the names of the domains and regions that the arginines in each cluster originated from.
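
The ordering step described for panels B and D can be illustrated with a minimal sketch. It is not the authors' code: the linkage method (average), the distance metric (cosine), the helper name `order_by_clustering`, and the synthetic placeholder embeddings are all assumptions made for illustration, since the caption does not specify them.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list


def order_by_clustering(embeddings, method="average", metric="cosine"):
    """Return row indices in dendrogram leaf order.

    `method` and `metric` are illustrative assumptions; the caption
    does not state which settings were used for the figure.
    """
    # Build the agglomerative clustering tree over the rows.
    tree = linkage(embeddings, method=method, metric=metric)
    # Leaf order places rows that merge early next to each other.
    return leaves_list(tree)


# Placeholder data only: two synthetic groups of four 16-dimensional
# vectors standing in for segment (or arginine) embeddings.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=1.0, scale=0.1, size=(4, 16))
group_b = rng.normal(loc=-1.0, scale=0.1, size=(4, 16))

# Interleave the groups so the input order is uninformative:
# even rows come from group_a, odd rows from group_b.
embeddings = np.vstack([group_a, group_b])[[0, 4, 1, 5, 2, 6, 3, 7]]

order = order_by_clustering(embeddings)
ordered_embeddings = embeddings[order]  # rows as they would appear in a heatmap
```

Rows of `ordered_embeddings` can then be passed to any heatmap routine, with per-row annotations (cluster number, protein of origin, segment colour) reindexed by the same `order` so that the side columns stay aligned with the heatmap.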


doi: https://doi.org/10.1371/journal.pcbi.1012929.g002