The (In)dependence of Alternative Splicing and Gene Duplication
Figure 2
The Relationship between AS and GD at the Genomic Level
(A) The diagram shows the uneven distribution of AS amongst GD families of different sizes in the human genome. Information on AS was taken from the AltSplice database [43]. GD families were obtained by clustering all sequences sharing more than 40%, 60%, 80%, or 90% sequence identity (seq.id.), respectively, using CD-HIT [47]. The dashed line marks the expected fraction of genes with AS, assuming that all known genes with splice variants are distributed without bias across the whole genome. In accordance with previous results [12,13], large GD families contain fewer genes with AS than expected at random.
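The comparison in panel A can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `as_fraction_by_family_size`, the input layout (one `(family_size, has_as)` pair per gene, with singletons given size 1), and the toy data are hypothetical and are not taken from AltSplice or CD-HIT output. The expected fraction is taken to be the genome-wide fraction of genes with AS, which is one reading of the dashed line.

```python
from collections import defaultdict


def as_fraction_by_family_size(genes):
    """Return (expected, observed) fractions of genes with AS.

    genes: iterable of (family_size, has_as) pairs, one pair per gene.
    expected: fraction of all genes with AS (the unbiased baseline).
    observed: dict mapping family size -> fraction of genes with AS.
    """
    genes = list(genes)
    expected = sum(1 for _, has_as in genes if has_as) / len(genes)

    counts = defaultdict(lambda: [0, 0])  # size -> [genes with AS, all genes]
    for size, has_as in genes:
        counts[size][0] += int(has_as)
        counts[size][1] += 1

    observed = {size: n_as / n for size, (n_as, n) in sorted(counts.items())}
    return expected, observed


# Toy data, made up for illustration: (family size, gene has AS?)
toy = [(1, True), (1, True), (1, False), (1, True),
       (2, True), (2, False),
       (5, False), (5, False), (5, True), (5, False)]

expected, observed = as_fraction_by_family_size(toy)
for size, fraction in observed.items():
    print(size, fraction, "below expected" if fraction < expected else "at or above expected")
```

In this toy input the largest family falls below the baseline, which mirrors the direction of the trend described in the caption but says nothing about its size in the real data.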
(B) The cartoons illustrate that alternative splice isoforms and gene duplicates may be expressed in the same number and/or types of tissues. Here, we compared the extent of coexpression amongst alternative splice variants (AS coexpression) and gene duplicates (GD coexpression).
(C) Coexpression levels amongst gene duplicates (GD coexpression) are estimated as the average pairwise Pearson correlation (PC) between the expression patterns of all genes within a GD family. GD coexpression amongst duplicates of >40% seq.id. (white diamonds) is closer to the overall AS coexpression (red line, indicating the value displayed in Figure 2D) than is GD coexpression amongst duplicates of >80% seq.id. In other words, coexpression of alternative splice variants is similar to coexpression amongst gene duplicates of >40% seq.id.
As this dataset [17] is too small for GD families of >80% seq.id. (GD80) to be split into further subsets, we examined GD coexpression in an additional dataset [53] (black diamonds). For both 40% and 80% seq.id., expression variation amongst gene duplicates with alternative splice variants (AS+) is slightly higher than variation amongst gene duplicates without alternative splice variants (AS−). p-Values are based on t-test calculations. Data on alternative splice variants were taken from the AltSplice database [43]. Further details and results are provided in Table S4 and Figures S10A and S10B.
(D) Coexpression levels amongst alternative splice variants (AS coexpression) are estimated as the average pairwise PC between the expression patterns of all exon junctions of a gene. High PC indicates little variation (high coexpression), and vice versa. The figure shows average AS coexpression across all genes in the dataset [17], and across subsets of these genes: members of GD families (GD+) and singletons (GD−), as defined by >40% and >80% seq.id., respectively. The overall AS coexpression is marked as a red diamond and is shown as a red line in Figure 2C. Gene duplicates of high seq.id. (>80%) have slightly lower AS coexpression than singletons (p-value < 0.001). p-Values are based on t-test calculations. Further details are provided in Table S4 and Figures S10A and S10B.
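The coexpression estimate used in panels C and D, an average of pairwise correlations, can be sketched as below. This is a minimal illustration that assumes the correlation is the standard Pearson coefficient; the function names `pearson` and `mean_pairwise_pc` and the toy profiles are hypothetical. Each profile stands for one expression pattern across tissues: one gene of a GD family for GD coexpression, or one exon junction of a gene for AS coexpression.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_pc(profiles):
    """Average correlation over all unordered pairs of profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Toy expression profiles across three tissues, made up for illustration.
family = [
    [1.0, 2.0, 3.0],  # rises across tissues
    [2.0, 4.0, 6.0],  # same shape as the first profile
    [3.0, 2.0, 1.0],  # opposite shape
]
coexpression = mean_pairwise_pc(family)
print(round(coexpression, 3))
```

A value near 1 corresponds to little variation between profiles (high coexpression), and lower values to more divergent expression. The sketch does not handle constant profiles, for which the correlation is undefined, or groups with fewer than two profiles.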