Fig 1.
A model of alternative splicing.
(A) Splicing factors U1 and U2AF search for the 5’ GU and 3’ AG splice sites by 3D and 1D Brownian motion. Multiple candidate splice sites compete for binding of U1 and U2AF. This binding is ATP-independent and reversible. (B) The binding of U1 and U2AF to the splice sites becomes stable only after the ATP-dependent binding of U2 snRNP. The identification of each intron is equivalent to a minimization process in which U1 and U2AF dynamically search for their global or local minimum-energy sites on the pre-mRNA segment presented for alternative splicing (AS). (C) The scaled expression level of transcript isoforms follows a type III extreme value distribution, that is, a Weibull distribution. The approximate values of parameters a (0.44) and b (0.6) were estimated by curve fitting. The black curve represents the distribution of scaled expression levels from experimental data. The red curve represents the Weibull distribution produced by curve fitting.
Fig 2.
The frequency distribution of transcript isoforms.
(A) Schematic diagram of alternative splicing and the calculation of transcript isoform frequencies. Colored regions represent exons. Gray regions represent introns and intergenic sequences. For simplicity, the expression values of isoforms are taken as integers. (B) Boxplots of the transcript isoform frequency f(k, M) for fixed k and increasing M. k is the rank of a transcript isoform. M is the number of transcript isoforms of a gene. Boxplots represent the frequency distribution calculated from our RNA-seq data by Cufflinks based on merged gene datasets. The blue curve represents median values calculated from the approximation formula (4). The red curve represents median values from simulation of the Weibull distribution W(0.39). (C) The Euclidean distance between experimental data and data simulated from the Weibull distribution, computed over all mf(k,M) in Fig 2B, as a function of the parameter a. The distance reaches its minimum when a = 0.39.
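The simulated medians in panel B can be illustrated with a minimal Python sketch. It assumes that W(a) denotes a Weibull distribution with shape parameter a (the scale cancels once expression values are normalized to frequencies), that the M isoforms of a gene draw their expression values independently, and that frequencies are ranked in descending order. The function names, the number of simulated genes, and the seed are illustrative choices, not taken from the paper.

```python
import random
import statistics


def simulate_isoform_frequencies(M, a, rng):
    """Return descending isoform frequencies for one simulated gene."""
    # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
    # Assumption: the caption's parameter a plays the role of the shape.
    values = [rng.weibullvariate(1.0, a) for _ in range(M)]
    total = sum(values)
    # Normalize expression values to frequencies and rank them (k = 1 is largest).
    return sorted((v / total for v in values), reverse=True)


def median_rank_frequency(k, M, a, n_genes=2000, seed=0):
    """Median frequency of the k-th ranked isoform over n_genes simulated genes."""
    rng = random.Random(seed)
    return statistics.median(
        simulate_isoform_frequencies(M, a, rng)[k - 1] for _ in range(n_genes)
    )


if __name__ == "__main__":
    # Median frequency of the dominant isoform (k = 1) for M = 2..10.
    for M in range(2, 11):
        print(M, round(median_rank_frequency(1, M, a=0.39), 3))
```

Repeating the call over a grid of a values and comparing the resulting medians with the observed ones would give a distance curve of the kind shown in panel C, under the same assumptions.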
Fig 3.
The frequency distribution of all transcript isoforms from experimental data and simulated data for M = 2:10.
M is the number of transcript isoforms for a gene. Black curves represent experimental data; red curves represent simulated data from W(0.39). KLd is the Kullback-Leibler divergence between the two distributions.
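As a rough illustration of how a KLd value of this kind could be computed, the sketch below bins two sets of frequencies on [0, 1] and evaluates the Kullback-Leibler divergence between the resulting histograms. The bin count, the smoothing constant `eps`, the direction of the divergence (first argument relative to second), and the function names are all assumptions made for this example; the caption does not specify them.

```python
import math


def histogram(frequencies, n_bins=20):
    """Bin frequencies in [0, 1] into equal-width bins; return bin probabilities."""
    if not frequencies:
        raise ValueError("frequencies must not be empty")
    counts = [0] * n_bins
    for f in frequencies:
        # A frequency of exactly 1.0 is placed in the last bin.
        index = min(int(f * n_bins), n_bins - 1)
        counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    # Add a small constant to every bin and renormalize, so empty bins
    # do not produce log(0) or division by zero (an illustrative choice).
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    p_total, q_total = sum(p_s), sum(q_s)
    return sum(
        (x / p_total) * math.log((x / p_total) / (y / q_total))
        for x, y in zip(p_s, q_s)
    )
```

A call such as `kl_divergence(histogram(observed), histogram(simulated))`, with `observed` and `simulated` being hypothetical lists of isoform frequencies for a given M, returns 0 for identical histograms and grows as they diverge.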
Fig 4.
The frequency distribution of the kth dominant transcript isoform.
(A) k = 1. (B) k = 2. k is the rank of a transcript isoform. M is the number of transcript isoforms for a gene. Black curves represent the frequency distribution of the experimental RNA-seq data. Red curves represent the frequency distribution of the simulated data from the Weibull distribution W(0.39). KLd is the Kullback-Leibler divergence between the two distributions.
Fig 5.
Transcript isoform expression pattern of two genes in different conditions.
(A) BRD4. (B) SRSF7. Among the 11 transcript isoforms of BRD4 and the 12 transcript isoforms of SRSF7, ENST00000371835 and ENST00000409276, respectively, are the most dominant isoforms in all four activated conditions, whereas ENST00000263377 and ENST00000477635, respectively, are the most dominant isoforms in all four resting conditions. This result indicates that the major transcript isoform can be regulated by a single external signal.
Fig 6.
The number of isoforms expressed versus those annotated.
The boxplots show the observed results from our RNA-seq data. The red curve is the expected median calculated from our Weibull model.