
The (In)dependence of Alternative Splicing and Gene Duplication

Figure 3

Global and Local Sequence Identity in AS and GD Substitutions

AS data were obtained by querying the SwissProt [44] database (version 40) with the keywords VARSPLIC and HUMAN. GD data were obtained by clustering the SwissProt [44] data with CD-HIT [47] at 40% or 80% sequence identity (seq.id.) (GD40 and GD80, respectively). Figure 3A–3C focus on AS+/GD+ cases, i.e., sequences with both AS and GD; Figure 3D compares the AS−/GD+ and AS+/GD− cases.
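
CD-HIT [47] clusters sequences greedily: sequences are processed from longest to shortest, and each one either joins an existing cluster whose representative it matches at or above the identity cutoff or becomes a new representative. The Python sketch below illustrates only that greedy scheme under stated assumptions. The `approx_identity` helper, which counts `difflib` matching blocks instead of performing a real alignment, and both function names are hypothetical stand-ins; they do not reproduce CD-HIT's word-filter heuristics or its exact identity definition.

```python
from difflib import SequenceMatcher


def approx_identity(a: str, b: str) -> float:
    """Crude stand-in for alignment-based identity (an assumption, not CD-HIT's
    measure): residues matched by difflib divided by the shorter length."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Greedy incremental clustering in the spirit of CD-HIT: longest sequences
    first; each joins the first representative matched at >= threshold,
    otherwise it founds a new cluster."""
    clusters: dict[str, list[str]] = {}  # representative id -> member ids
    for seq_id in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for rep_id in clusters:
            if approx_identity(seqs[seq_id], seqs[rep_id]) >= threshold:
                clusters[rep_id].append(seq_id)
                break
        else:
            # no representative was similar enough: start a new cluster
            clusters[seq_id] = [seq_id]
    return clusters


# hypothetical toy sequences
seqs = {
    "P1": "MKTAYIAKQRQISFVKSHFSRQ",
    "P2": "MKTAYIAKQRQISFVKSHFSRA",
    "P3": "GGGGWWWWPPPPLLLL",
}
print(greedy_cluster(seqs, 0.8))  # {'P1': ['P1', 'P2'], 'P3': ['P3']}
```

Lowering the cutoff (0.4 instead of 0.8) lets more distant sequences join a representative, which is the sense in which GD40 families are broader than GD80 families.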

(A) Global seq.id. The seq.id. within GD families depends on the cutoff used for clustering: GD40 (dark red) or GD80 (light violet). The global seq.id. between alternative splice isoforms (light green) is very high (>90% seq.id.), reflecting the fact that AS alters only part of the sequence and leaves the rest of the isoforms identical.
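
The global seq.id. in panel (A) can be computed from a pairwise alignment. The minimal Python sketch below assumes the alignment is given as two equal-length gapped strings ('-' marks a gap) and divides the number of identical aligned residues by the length of the shorter ungapped sequence. The caption does not state which denominator was used; dividing by the alignment length is an equally common convention. The function name is hypothetical.

```python
def global_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over a pairwise alignment of two equal-length gapped
    strings. Assumed denominator: length of the shorter ungapped sequence."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned strings must have equal length")
    # count columns where both sequences carry the same residue (not a gap)
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    shorter = min(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    return 100.0 * identical / shorter


print(global_identity("ACDEFGHIKL", "ACDEWGHIKL"))  # 90.0
print(global_identity("ACDEFGHIKL", "ACD-FGHIKL"))  # 100.0
```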

(B) Local seq.id. between alternative splice isoforms (dark green) is measured between the substituted stretches, which usually arise from mutually exclusive exons. The local seq.id. between gene duplicates is obtained with a moving window (GD80: light violet; GD40: dark red), reporting the seq.id. observed at every possible window position.
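
The moving-window measure for gene duplicates can be sketched as follows. This is an illustration under assumptions, not the authors' code: the window spans a fixed number of residues of the first sequence, identity is counted only over alignment columns where both sequences have a residue, and a window in which the second sequence is entirely gapped is scored as 0%. Panel (D) states a 100-aa window; here the size is a parameter, and the function name is hypothetical.

```python
def window_identities(aln_a: str, aln_b: str, window: int) -> list[float]:
    """Local percent identity at every window position along a pairwise
    alignment (equal-length gapped strings, '-' = gap). Returns one value
    per window position over the residues of sequence A."""
    # alignment column index of each residue of sequence A
    cols_a = [i for i, ch in enumerate(aln_a) if ch != "-"]
    results = []
    for start in range(len(cols_a) - window + 1):
        first, last = cols_a[start], cols_a[start + window - 1]
        aligned = identical = 0
        for x, y in zip(aln_a[first:last + 1], aln_b[first:last + 1]):
            if x != "-" and y != "-":  # both sequences have a residue here
                aligned += 1
                if x == y:
                    identical += 1
        # assumption: a window with no aligned columns scores 0%
        results.append(100.0 * identical / aligned if aligned else 0.0)
    return results


print(window_identities("ACDEFGHIKL", "ACDEWGHIKL", 5))
# [80.0, 80.0, 80.0, 80.0, 80.0, 100.0]
```

Reporting every window position, rather than only the minimum, yields the full distribution of local seq.id. values that the panel plots for GD40 and GD80.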

(C) Local seq.id. in AS and GD at equivalent positions. The graph compares the local seq.id. between alternative splice variants of a gene with the local seq.id. between that gene and one of its duplicates. The AS local seq.id. was computed between the substituted sequence stretches. For GD, we mapped the sequence positions of the AS event onto the GD alignment and computed the seq.id. between the duplicates, considering only the aligned positions within that region. The comparison is shown for AS versus GD40 (red) and AS versus GD80 (blue).
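
The region-restricted GD comparison amounts to translating the residue numbers of the AS event into alignment columns and scoring only the aligned positions inside that span. In the hypothetical sketch below, sequence A is assumed to be the gene carrying the AS event, and `start`/`end` are assumed to be 1-based, inclusive residue numbers on that sequence.

```python
def region_identity(aln_a: str, aln_b: str, start: int, end: int) -> float:
    """Percent identity between duplicates A and B restricted to residues
    start..end (1-based, inclusive) of sequence A. Only columns where both
    sequences have a residue are counted."""
    # map each residue number of A (1-based) to its alignment column
    residue_to_col = {}
    residue = 0
    for col, ch in enumerate(aln_a):
        if ch != "-":
            residue += 1
            residue_to_col[residue] = col
    first, last = residue_to_col[start], residue_to_col[end]
    # keep only aligned (gap-free) column pairs inside the mapped region
    pairs = [(x, y) for x, y in zip(aln_a[first:last + 1], aln_b[first:last + 1])
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no aligned positions in the region")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# hypothetical alignment; residues 4-8 of A stand in for an AS-substituted stretch
print(region_identity("MKT--AYIAKQR", "MKTGGAYLAKER", 4, 8))  # 80.0
```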

The diagonal divides the plot into two halves: the upper half corresponds to the region where GD seq.id. is higher than AS seq.id.; the lower half corresponds to the opposite. For both types of gene families (GD40 and GD80), most substitutions show higher seq.id. between gene duplicates than between alternative splice variants, and this bias is significant (GD80: 111 of 142, χ² test p-value < 1.9 × 10⁻¹¹; GD40: 492 of 786, χ² test p-value < 6.5 × 10⁻¹⁵). This result confirms the overall distributions examined in Figure 3B: changes introduced by AS are stronger and more localized than the differences between gene duplicates.
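
The caption does not state the exact form of the χ² test. One plausible reading, assumed here, is a goodness-of-fit test with 1 degree of freedom comparing the counts above and below the diagonal with an even 50/50 split. Under that assumption the GD80 counts (111 of 142) give χ² ≈ 45.07 and p ≈ 1.9 × 10⁻¹¹, consistent with the reported bound. The function names are hypothetical.

```python
import math


def count_above_diagonal(points: list[tuple[float, float]]) -> int:
    """Count (AS seq.id., GD seq.id.) points with GD seq.id. strictly above AS."""
    return sum(1 for as_id, gd_id in points if gd_id > as_id)


def diagonal_bias_test(n_above: int, n_total: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test (1 df) of the above/below-diagonal
    split against an even 50/50 expectation (assumed null hypothesis).
    Returns (chi2, p_value)."""
    expected = n_total / 2
    n_below = n_total - n_above
    chi2 = ((n_above - expected) ** 2 + (n_below - expected) ** 2) / expected
    # For 1 df, the chi-square survival function equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = diagonal_bias_test(111, 142)  # GD80 counts from panel (C)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")  # chi2 = 45.07, p = 1.9e-11
```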

(D) Local seq.id. in AS−/GD+ and AS+/GD− substitutions. To compute local seq.id. in AS−/GD+ families, we first align two GD, then slide a 100-aa window along the sequence of one protein and compute the seq.id. at every window position. The results of all possible comparisons are plotted for GD40 (dark red) and GD80 (light violet) families. For genes with AS but no duplicates (AS+/GD−; dark green), local seq.id. was computed between the two substituted stretches resulting from each AS event. As for AS+/GD+ families (Figure 3B), we find that local seq.id. is generally substantially lower for AS events (AS+/GD−) than for GD (AS−/GD+ families). The overlap between the AS and GD40 distributions is greater than that between the AS and GD80 distributions, which may partly reflect differences in the structural constraints acting on the proteins in each set.
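
The caption compares how much the AS and GD distributions overlap but does not say how overlap was quantified. Histogram intersection is one simple measure, assumed here and not necessarily the authors' choice: bin both samples of local seq.id. values, normalize each histogram, and sum the smaller frequency in every bin. The bin width, function name, and input values below are hypothetical.

```python
import math


def histogram_overlap(a: list[float], b: list[float], bin_width: float = 10.0) -> float:
    """Histogram intersection of two samples of percent identities (0-100):
    the sum over bins of the smaller normalized frequency. Returns 1.0 for
    identical histograms and 0.0 when no bin is shared."""
    n_bins = math.ceil(100.0 / bin_width)

    def normalized_hist(values: list[float]) -> list[float]:
        counts = [0] * n_bins
        for v in values:
            # values of exactly 100% fall into the last bin
            counts[min(int(v // bin_width), n_bins - 1)] += 1
        return [c / len(values) for c in counts]

    return sum(min(x, y) for x, y in zip(normalized_hist(a), normalized_hist(b)))


as_ids = [15.0, 85.0]  # hypothetical AS local seq.id. values
gd_ids = [15.0, 15.0]  # hypothetical GD local seq.id. values
print(histogram_overlap(as_ids, gd_ids))  # 0.5
```

Under this measure, a larger value for AS versus GD40 than for AS versus GD80 would correspond to the greater overlap described above.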

doi: https://doi.org/10.1371/journal.pcbi.0030033.g003