Figure 1.
Learning Methods: Discriminative versus Generative
Schematic comparison of discriminative (A) and generative (B) learning methods. In the discriminative case, all model parameters were estimated simultaneously to predict a segmentation as similar as possible to the annotation. In contrast, for generative HMMs, signal features and state features were assumed to be independent and were trained separately.
Table 1.
Dataset Statistics
Table 2.
Accuracy Results for BGHM953
Table 3.
Accuracy Results for TIGR251
Table 4.
Accuracy Results for ENCODE294
Table 5.
Significance Testing
Figure 2.
F-Score as a Function of Intron Length
Results are shown for all test sets combined (A) and for each individual test set (B–D). The boxed number directly above each marker gives the total number of introns in that marker's length range. For example, there were 1,475 introns with lengths between 1,000 and 2,000 base pairs for all sets combined (A).
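
The per-length scores plotted in this figure could be computed along the following lines. This is only a minimal sketch under stated assumptions: the bin edges are hypothetical (the caption gives only the 1,000 to 2,000 base pair bin), the F-score is assumed to be the harmonic mean of sensitivity and specificity, specificity is taken to mean the fraction of predicted introns that are correct, and an intron counts as correct only when both boundaries match the annotation exactly. The names `BIN_EDGES`, `length_bin`, and `f_score_by_length` are illustrative and do not come from the source.

```python
from collections import defaultdict

# Hypothetical bin edges in base pairs; the figure's actual bins are not given in the caption.
BIN_EDGES = [0, 1000, 2000, 4000, 8000]


def length_bin(length):
    """Return the (low, high) bin containing length; high is None for the open-ended last bin."""
    for low, high in zip(BIN_EDGES, BIN_EDGES[1:]):
        if low <= length < high:
            return (low, high)
    return (BIN_EDGES[-1], None)


def f_score_by_length(annotated, predicted):
    """Map each length bin to (number of annotated introns, F-score).

    annotated, predicted: sets of (start, end) intron coordinates, end exclusive.
    Assumption: a predicted intron is correct only if it matches an annotated one exactly.
    """
    n_annot = defaultdict(int)
    n_pred = defaultdict(int)
    n_correct = defaultdict(int)
    for start, end in annotated:
        n_annot[length_bin(end - start)] += 1
    for start, end in predicted:
        b = length_bin(end - start)
        n_pred[b] += 1
        if (start, end) in annotated:
            n_correct[b] += 1

    scores = {}
    for b, total in n_annot.items():
        sens = n_correct[b] / total
        spec = n_correct[b] / n_pred[b] if n_pred[b] else 0.0
        f = 2 * sens * spec / (sens + spec) if (sens + spec) else 0.0
        scores[b] = (total, f)
    return scores
```

The first element of each returned pair plays the role of the boxed count above a marker, and the second is the value that would be plotted for that bin.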
Figure 3.
F-Score versus Intron Length for the ENCODE Test Set
Results in subfigures (A) and (B) correspond to the subset of alternatively spliced genes and its complementary subset, respectively.
Figure 4.
CRAIG's relative improvements in prediction specificity (orange bar) and sensitivity (blue bar) by signal type. In each case, the second-best program was used for the comparison: Genezilla for starts, Augustus for stops, and GenScan++ for splice sites.
Figure 5.
Finite-State Model for Eukaryotic Genes
Variable-length genomic regions are represented by states, and biological signals are represented by transitions between states. Short and long introns are denoted by IS and IL, respectively.
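
The idea that states stand for regions and transitions stand for signals can be illustrated with a small lookup table. This is a simplified sketch, not the model drawn in the figure: the state names and the reduced transition set below are assumptions chosen for illustration (for instance, a single `exon` state is used and reading frame is ignored), and `TRANSITIONS` and `signals_for` are hypothetical names.

```python
# Hypothetical, simplified state set; the figure's full model is not reproduced here.
# Keys are (from_state, to_state); values are the biological signal on that transition.
TRANSITIONS = {
    ("intergenic", "exon"): "start",
    ("exon", "intron_short"): "donor",      # IS in the figure's notation
    ("exon", "intron_long"): "donor",       # IL in the figure's notation
    ("intron_short", "exon"): "acceptor",
    ("intron_long", "exon"): "acceptor",
    ("exon", "intergenic"): "stop",
}


def signals_for(path):
    """Return the signals crossed by a sequence of state labels.

    Raises ValueError if the path uses a transition that is not in TRANSITIONS.
    """
    signals = []
    for prev, nxt in zip(path, path[1:]):
        if (prev, nxt) not in TRANSITIONS:
            raise ValueError(f"illegal transition: {prev} -> {nxt}")
        signals.append(TRANSITIONS[(prev, nxt)])
    return signals
```

Under these assumptions, a two-exon gene with one short intron corresponds to the path `intergenic, exon, intron_short, exon, intergenic`, which crosses a start, a donor, an acceptor, and a stop signal in that order.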
Table 6.
State Features for Each Segment Label
Table 7.
Transition Features per Signal Type