Fig 1.
The left panel shows the workflow of our modeling methodology: we identified annotated 9-base donor sequences in each analyzed genome and estimated the 1-site and 2-site marginal probabilities fi(si) and fij(si, sj). The bottom panel displays the mathematical expressions defining the maximum entropy model as a function of the fitting parameters hi(si) and Jij(si, sj). The upper-right panel presents the energetics of human donor sequences (γ = 0.025), showing the distribution of data-driven energy values for the 502197 5’ss sequences observed in the human genome. Additional rows show the distributions for U12 junctions, decoy-A, decoy-B, and random sequences (sample of 10000), denoted by green, red, orange, and gray ticks, respectively. Black points represent mean values, and black lines represent σ energy intervals. The inset shows a circos representation of the coupling interactions, in which an outer ring of 36 boxes (9 sites × 4 nucleotides) represents single-site relative frequencies. Warm colors mark the three exonic sites and cold colors the intronic sites. The area of each box is proportional to the observed nucleotide-site probability fi(si). Positive and negative couplings are drawn as green and red curves, respectively, connecting different site-base combinations.
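As an illustration of the first workflow step, the following minimal sketch estimates the 1-site and 2-site marginals fi(si) and fij(si, sj) as plain empirical frequencies over a set of 9-base sequences. The function name `marginals`, the toy sequences, and the absence of any pseudocount or regularization are assumptions made for this example; it is not the authors' pipeline.

```python
import numpy as np

ALPHABET = "ACGT"
INDEX = {base: k for k, base in enumerate(ALPHABET)}

def marginals(seqs):
    """Empirical 1-site and 2-site frequencies (hypothetical helper).

    Returns fi with shape (L, 4) and fij with shape (L, 4, L, 4),
    where L is the sequence length (9 for the donor sites here).
    """
    # Encode each sequence as integer indices, shape (N, L).
    encoded = np.array([[INDEX[base] for base in s] for s in seqs])
    # One-hot encode, shape (N, L, 4).
    onehot = np.eye(len(ALPHABET))[encoded]
    # fi(si): fraction of sequences with base si at site i.
    fi = onehot.mean(axis=0)
    # fij(si, sj): fraction with base si at site i AND base sj at site j.
    fij = np.einsum("nia,njb->iajb", onehot, onehot) / len(seqs)
    return fi, fij

# Toy input (made-up sequences, not real annotated donor sites).
toy = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TGGGTAAGT"]
fi, fij = marginals(toy)
```

With 9 sites and 4 nucleotides, `fi` holds 36 values, one per box of the outer ring in the circos inset.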
Fig 2.
Circos diagrams illustrating coupling patterns for the complete (genomics) and GT-restricted (constrained genomics) sets of annotated human donor sequences are shown in the left and center panels. Interactions inferred from junctions transcriptionally expressed in normal human samples according to RJunBase (transcriptomics) are shown in the right panel.
Fig 3.
Left, Pij-induced dendrogram and phylogenetic tree laid out as a tanglegram. The x scales correspond to Euclidean distances between vectorized Pij values and to the timetree’s divergence time estimates, respectively. Green, red and blue highlight plant, animal and fungal species, respectively. Right, presence/absence matrix for the five modelled interaction parameters identified as statistically significant (see text). Squares and circles denote negative and positive interactions, respectively.
Table 1.
Phylogenetic signals associated with coupling parameters.
Each row corresponds to a model parameter found to be statistically significant in discriminating among animals, plants and/or fungi by a Maddison-Slatkin test. For each coupling parameter (first column), the second, third and fourth columns give the number of plants, animals and fungi in which the interaction was detected. The observed number of Sankoff-inferred evolutionary steps is reported in the fifth column. The minimum, median and maximum of this quantity over bootstrapped samples are reported in the sixth column (comma-separated values). Bonferroni-corrected p-values are given in the last column.
Fig 4.
Pairwise interaction patterns.
Circos diagrams of the coupling parameters (γ = 0.025 model) identified for plant, animal and fungal donor sequences.