Table 1.

Comparison of ToPS with other Markov model toolkits.

Figure 1.

A diagram of examples of ToPS usage.

Square boxes represent data files, and rounded boxes represent programs or manual processes. Each model may be described manually by editing a text file (1), or the train program can be used to estimate the parameters and automatically generate such a file from a training set (2). The files that contain the model parameters (in our example, model1.txt, model2.txt and model3.txt) are used by the programs evaluate (3), simulate (4), bayes_classifier (5) and viterbi_decoding (6). The evaluate program calculates the likelihood of a set of input sequences given a model, the simulate program samples new sequences, the viterbi_decoding program decodes input sequences using the Viterbi algorithm, and the bayes_classifier program classifies input sequences given a set of probabilistic models.
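The evaluation and classification steps described in this caption can be illustrated with a minimal Python sketch. It does not use ToPS or its file formats. The two i.i.d. nucleotide models, their probabilities, and the function names `log_likelihood` and `classify` are hypothetical and chosen only for illustration. The sketch scores a sequence under each model and then applies Bayes' rule to pick the most probable one.

```python
import math

# Hypothetical i.i.d. models (not ToPS model files): each maps a nucleotide
# to its emission probability.
MODELS = {
    "at_rich": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "gc_rich": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def log_likelihood(sequence, model):
    """Log-probability of a sequence under an i.i.d. model (the evaluation step)."""
    return sum(math.log(model[symbol]) for symbol in sequence)


def classify(sequence, models, priors=None):
    """Return (best_label, posteriors) by applying Bayes' rule over the models."""
    if priors is None:
        # Assumption: uniform prior over models when none is supplied.
        priors = {name: 1.0 / len(models) for name in models}
    log_joint = {
        name: log_likelihood(sequence, model) + math.log(priors[name])
        for name, model in models.items()
    }
    # Normalize in log space (log-sum-exp) to avoid underflow on long sequences.
    top = max(log_joint.values())
    log_evidence = top + math.log(
        sum(math.exp(value - top) for value in log_joint.values())
    )
    posteriors = {
        name: math.exp(value - log_evidence) for name, value in log_joint.items()
    }
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


if __name__ == "__main__":
    label, posteriors = classify("GCGCGGCC", MODELS)
    print(label, posteriors)
```

Working in log space keeps the computation stable for long sequences, where raw probabilities would underflow.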

Figure 2.

The implemented GHMM for the CpG island detector.

In this GHMM, we used IMMs as emission sub-models and tested different values for the exit probability of the NONCPG state to generate the sensitivity analysis. The mean length of the CPG state emission was estimated using the training data.
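As a general property of Markov models (not a description of the ToPS implementation), a state that is left with a fixed probability at each step has a geometrically distributed duration whose mean is the reciprocal of that exit probability. The sketch below, with hypothetical function names, shows how varying the exit probability changes the expected time spent in such a state, assuming the NONCPG state behaves this way.

```python
import random


def expected_duration(exit_probability):
    """Mean number of steps spent in a state left with this probability per step."""
    if not 0.0 < exit_probability <= 1.0:
        raise ValueError("exit_probability must be in (0, 1]")
    return 1.0 / exit_probability


def sample_duration(exit_probability, rng):
    """Draw one geometric duration: count steps until the state is exited."""
    steps = 1
    # rng.random() is uniform on [0, 1); the state is exited when it falls
    # below exit_probability, otherwise the chain stays for one more step.
    while rng.random() >= exit_probability:
        steps += 1
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.1, 0.01, 0.001):  # illustrative values, not those of the study
        draws = [sample_duration(p, rng) for _ in range(5000)]
        print(p, expected_duration(p), sum(draws) / len(draws))
```

Under this assumption, a smaller exit probability means longer expected stretches in the state.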

Figure 3.

Sensitivity associated with the combined length of the predicted CGIs.

In this experiment, the points on the curve correspond to different values for the exit probability of the NONCPG state of the GHMM. For comparison, the results with the CGI list from the UCSC Genome Browser and with the CGI list obtained using an HMM [2] are shown as a blue square and a green triangle, respectively.

Table 2.

Comparison between CGI lists.

Figure 4.

GHMM architecture for eukaryotic protein-coding gene prediction.

The architecture contains states representing initial exons that end at a given phase, internal exons that begin at one phase and end at another, terminal exons that begin at a given phase, introns at a given phase, and intergenic regions. It also contains states representing the start codon signal, the stop codon signal, the acceptor splice site signal at a given phase, and the donor splice site signal at a given phase. To model the reverse strand, we used the states that begin with the prefix ‘r-’. Squares with a self-transition represent states with a geometric duration distribution. Squares without a self-transition represent states with a non-geometric duration distribution. Ellipses represent states with fixed-length durations.

Table 3.

States of the GHMM for the gene prediction problem.

Table 4.

Accuracy of the gene predictions.
