Global Analysis of Proline-Rich Tandem Repeat Proteins Reveals Broad Phylogenetic Diversity in Plant Secretomes

Cluster landscape of Pro-rich TR motifs from plant secretome sequences.

Each node represents a TR cluster, and node labels denote the original cluster identifier (see Tables S9, S10). Edge thickness represents the fraction of ensemble re-sampling runs in which each pair of TR clusters was co-clustered (see Materials and Methods; see also the “pairwise affinity” defined in Figure 4 of [49]). Thin dotted edges indicate a co-clustering fraction below 10%. Large labeled nodes denote clusters containing secreted TRPs found in at least ten species and twenty protein sequences (Table S10), while intermediate-sized labeled nodes satisfy only one of these two criteria. Smaller unlabeled nodes meet neither criterion but are shown because their motif content is similar to that of larger neighboring nodes. Major TRP classes from Tables S6, S7 and S8 are indicated around the corresponding TR motif super-classes (circled in gray). Node color represents the retention rate of the TR taxonomy (Tables S3, S4, S5), defined as the proportion of all protein sequences in each cluster that are captured by the TR class definitions (for class definitions, see Table S11; for quantitative retention rates, see Table S10). For a high-resolution version of this network showing all individual TR consensus sequences, see Figure S1; for the same high-resolution network with TR super-classes marked, see Figure S2. For details of network construction, see Materials and Methods. The network was rendered using Cytoscape 2.6.0 [50].
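The edge and node encodings described in this legend can be illustrated with a minimal Python sketch. It is not the procedure from Materials and Methods: it assumes, hypothetically, that each re-sampling run is available as a mapping from TR cluster identifier to the group that cluster was assigned to in that run, and that a pair is scored only over runs in which both clusters were sampled. The function names and the example identifiers (`c1`, `c2`, `c3`) are illustrative; only the 10% edge threshold and the ten-species and twenty-sequence node criteria come from the legend.

```python
from collections import defaultdict
from itertools import combinations


def co_clustering_fractions(runs):
    """Fraction of runs in which each pair of TR clusters shares a group.

    `runs` is a list of dicts (cluster id -> group label), one per
    re-sampling run. This input format is an assumption of the sketch.
    """
    together = defaultdict(int)  # runs where the pair was grouped together
    sampled = defaultdict(int)   # runs where both members were present
    for assignment in runs:
        for a, b in combinations(sorted(assignment), 2):
            sampled[(a, b)] += 1
            if assignment[a] == assignment[b]:
                together[(a, b)] += 1
    return {pair: together[pair] / n for pair, n in sampled.items()}


def edge_style(fraction, threshold=0.10):
    """Thin dotted edges mark co-clustering fractions below 10%."""
    return "thin dotted" if fraction < threshold else "solid"


def node_size(n_species, n_sequences, min_species=10, min_sequences=20):
    """Large: both criteria met; intermediate: exactly one; small: neither.

    Reading "twenty protein sequences" as "at least twenty" is an assumption.
    """
    met = (n_species >= min_species) + (n_sequences >= min_sequences)
    return {2: "large", 1: "intermediate", 0: "small"}[met]


# Hypothetical example: three re-sampling runs over three TR clusters.
runs = [
    {"c1": 0, "c2": 0, "c3": 1},
    {"c1": 0, "c2": 1, "c3": 1},
    {"c1": 0, "c2": 0},  # c3 was not sampled in this run
]
fractions = co_clustering_fractions(runs)
```

In this toy input, `c1` and `c2` are grouped together in two of the three runs, so their edge would be drawn solid, whereas `c1` and `c3` never share a group and would be joined by a thin dotted edge.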

