Figure 1.
Conservation Study of the SH3 Domains of S. cerevisiae in Ten Other Yeast Genomes
CD, conserved domain (the SH3-containing protein has an ortholog in this genome and the ortholog SH3 domain is possibly conserved, i.e., it has fewer than three conservative changes and no nonconservative changes in the binding positions); DD, divergent domain (the SH3-containing protein has an ortholog in this genome, but the domain is not on the same branch of the phylogenetic tree); NO, no ortholog (no ortholog was found for the SH3-containing protein in this genome); PD, possibly divergent (the SH3-containing protein has an ortholog in this genome, but the ortholog SH3 domain has at least one nonconservative change or more than two conservative changes in the binding positions).
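The four categories above amount to a small decision rule, sketched below in Python. The function name and its inputs are hypothetical: the sketch assumes the ortholog search, the tree placement, and the counts of changes in the binding positions have already been computed, and it assumes that the DD test takes precedence over the PD test, which the legend does not state.

```python
def classify_domain(has_ortholog, same_branch, conservative, nonconservative):
    """Return 'NO', 'DD', 'PD', or 'CD' following the legend's definitions.

    conservative / nonconservative are counts of changes in the binding
    positions of the ortholog SH3 domain (assumed to be precomputed).
    """
    if not has_ortholog:
        return "NO"  # no ortholog found in this genome
    if not same_branch:
        return "DD"  # ortholog exists, domain on a different tree branch
    if nonconservative >= 1 or conservative > 2:
        return "PD"  # possibly divergent
    return "CD"      # fewer than three conservative, no nonconservative changes


print(classify_domain(True, True, 2, 0))
```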
Table 1.
SH3 Consensus Sequence Information
Figure 2.
Size of Probing Window When Looking for Conservation of the Consensus Sequence in Orthologs of the Putative Target Protein
We defined the conservation score as the number of species in which the consensus sequence is conserved. Using this score, we calculated the accuracy and coverage, with the gold (A) and platinum (B) positive sets, for consensus sequences conserved in different numbers of species and for different sizes of the probing region.
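As a minimal sketch of the conservation score, the Python below counts the orthologs in which a motif matches inside a probing window of alignment positions. Several details are assumptions made for illustration only: the window is centered on the motif position in the S. cerevisiae sequence, gaps are removed before matching, and the pattern `P..P` is a placeholder rather than one of the consensus sequences of Table 1.

```python
import re


def conserved_in_window(aligned_seq, center, window, pattern):
    """True if the pattern matches inside the probing window.

    The window spans `window` alignment positions centered on `center`
    (an assumption); gap characters are removed before matching.
    """
    half = window // 2
    start = max(0, center - half)
    region = aligned_seq[start:center + half + 1].replace("-", "")
    return re.search(pattern, region) is not None


def conservation_score(orthologs, center, window, pattern):
    """Number of species whose aligned ortholog keeps the motif."""
    return sum(
        conserved_in_window(seq, center, window, pattern)
        for seq in orthologs.values()
    )


# Toy aligned orthologs (hypothetical sequences, not real data).
toy_orthologs = {
    "species_1": "AAAPLLPAAA",
    "species_2": "AAAP-LLPAA",
    "species_3": "AAAAAAAAAA",
}
print(conservation_score(toy_orthologs, center=4, window=6, pattern="P..P"))
```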
Figure 3.
Combining Conservation and Secondary Structure Prediction
We calculated, with the gold (A) and platinum (B) positive sets, the accuracy and coverage for target prediction when including or excluding secondary structure information. We used a probing region of 210 alignment positions in this analysis.
Figure 4.
Optimal Divergence Time to Search for Conservation of Target Motif of SH3 Domains
We designated seven groups of species with an increasing average divergence time from S. cerevisiae and calculated for each group the highest accuracy obtained for restricted windows of coverage. We used the gold positive and the negative set to calculate the accuracy and coverage (see Materials and Methods). The seven groups of species are as follows: (1) S. bayanus, S. paradoxus, S. mikatae, and C. glabrata (average divergence of 112.5 My from S. cerevisiae); (2) S. paradoxus, S. mikatae, C. glabrata, and K. lactis (average divergence of 200 My from S. cerevisiae); (3) S. mikatae, C. glabrata, K. lactis, and C. albicans (average divergence of 387.5 My from S. cerevisiae); (4) C. glabrata, K. lactis, C. albicans, and D. hansenii (average divergence of 575 My from S. cerevisiae); (5) K. lactis, C. albicans, D. hansenii, and Y. lipolytica (average divergence of 725 My from S. cerevisiae); (6) C. albicans, D. hansenii, Y. lipolytica, and N. crassa (average divergence of 875 My from S. cerevisiae); and (7) D. hansenii, Y. lipolytica, N. crassa, and Sch. pombe (average divergence of 950 My from S. cerevisiae). The individual values for the divergence time from S. cerevisiae were taken from the literature [32,42,43]. Although we tried to create groups that would not combine genomes of species with very different separation dates from S. cerevisiae, it should be noted that, because of the small number of available genomes, the groups are not homogeneous. Also, the divergence times of the individual species were not always obtained with the same method. Therefore, this range of values should be viewed critically.
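The seven groups listed above appear to correspond to a sliding window of four species over the ten genomes, ordered as the species first appear in the groups. The Python sketch below reproduces that grouping. The species order is read off the legend; the function names are hypothetical, and `average_divergence` expects the per-species divergence times (in My) to be supplied from the literature, since the individual values are not given here.

```python
# Order in which the species enter the groups in the legend.
SPECIES_BY_DIVERGENCE = [
    "S. bayanus", "S. paradoxus", "S. mikatae", "C. glabrata", "K. lactis",
    "C. albicans", "D. hansenii", "Y. lipolytica", "N. crassa", "Sch. pombe",
]


def sliding_groups(species, size=4):
    """Consecutive groups of `size` species, shifted by one each time."""
    return [species[i:i + size] for i in range(len(species) - size + 1)]


def average_divergence(group, divergence_my):
    """Mean divergence time of a group; `divergence_my` maps species to My."""
    return sum(divergence_my[s] for s in group) / len(group)


groups = sliding_groups(SPECIES_BY_DIVERGENCE)
for number, group in enumerate(groups, start=1):
    print(number, ", ".join(group))
```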
Figure 5.
Most Informative Genomes in the Search for Conservation of Target Motif of SH3 Domains
We created all possible combinations of two or more genomes from our set of ten genomes. For each combination we calculated the highest accuracy obtained for 11 windows of coverage from 15% to 70% at intervals of 5%. We then calculated the average frequency, over all coverage windows, of each individual species in three sets: all combinations of genomes, the combinations scoring within the highest 20% of accuracy values, and the combinations scoring within the lowest 20% of accuracy values. We then used a t-test to determine, for each species, whether the average frequencies within the highest and lowest combinations were significantly different from the frequency in all possible combinations. *, p < 0.05; **, p < 0.001.
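To give a sense of the size of this search, the Python sketch below enumerates every combination of two or more genomes out of ten and computes how often each species occurs in a set of combinations. The helper names are hypothetical, and "frequency" is assumed here to mean the fraction of combinations that contain the species; the accuracy ranking and the t-test are not reproduced.

```python
from itertools import combinations

GENOMES = [
    "S. bayanus", "S. paradoxus", "S. mikatae", "C. glabrata", "K. lactis",
    "C. albicans", "D. hansenii", "Y. lipolytica", "N. crassa", "Sch. pombe",
]


def all_combinations(genomes, min_size=2):
    """Every combination of at least `min_size` genomes."""
    return [
        combo
        for k in range(min_size, len(genomes) + 1)
        for combo in combinations(genomes, k)
    ]


def species_frequency(combos, genomes):
    """Fraction of the given combinations that contain each species."""
    return {g: sum(g in c for c in combos) / len(combos) for g in genomes}


combos = all_combinations(GENOMES)
print(len(combos))  # 2**10 - 1 - 10 = 1013 combinations
```

The same `species_frequency` function could be applied to the top-scoring and bottom-scoring subsets of `combos` once an accuracy value has been attached to each combination.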
Figure 6.
Predictions of S. cerevisiae SH3 Interactions
We considered that a potential target consensus sequence, found by pattern matching in an S. cerevisiae protein, would be biologically relevant if it was within an unstructured region of the S. cerevisiae protein and also conserved in four of the seven comparison genomes used (C. glabrata, K. lactis, C. albicans, D. hansenii, Y. lipolytica, N. crassa, and Sch. pombe). Red lines indicate the interactions for which we found some experimental evidence in protein interaction databases [59–61]; thin black lines indicate interactions between proteins that are labeled as localizing to different compartments; thick black lines indicate interactions for which we found no evidence. There were two S. cerevisiae SH3 domains for which we could not predict any interaction because of the stringency applied. A complete list of the interactions with function, localization, and binding positions is given in Table S4.
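The selection rule described above can be sketched as a simple filter in Python. The function name and inputs are hypothetical, and the sketch assumes that "four of the seven" means at least four; the pattern matching, the structure prediction, and the per-genome conservation calls are taken as given.

```python
COMPARISON_GENOMES = [
    "C. glabrata", "K. lactis", "C. albicans", "D. hansenii",
    "Y. lipolytica", "N. crassa", "Sch. pombe",
]


def is_predicted_target(in_unstructured_region, conserved_in, min_genomes=4):
    """Apply the two criteria: unstructured region and conservation.

    `conserved_in` is the set of comparison genomes in which the motif is
    conserved; `min_genomes=4` assumes "at least four of the seven".
    """
    n_conserved = sum(1 for g in COMPARISON_GENOMES if g in conserved_in)
    return bool(in_unstructured_region) and n_conserved >= min_genomes


print(is_predicted_target(
    True, {"C. glabrata", "K. lactis", "C. albicans", "N. crassa"}
))
```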