Modeling the Evolution of Regulatory Elements by Simultaneous Detection and Alignment with Phylogenetic Pair HMMs

doi:10.1371/journal.pcbi.1001037

Figure 1.

State-transition diagram for a PPHMM implementing a reversible and affine background indel model for a phylogeny branch of length t.

Ovals denote emitting states; arrows denote transitions. Start and stop are special non-emitting states. This model can be implemented in 33 lines of SEAL code. The parameter s = 1-(1-b_∞)(1-b(t)) gives the probability of leaving the background model, where b(t) is the gain-of-function probability and b_∞ = lim_{t→∞} b(t). Parameters α and β influence indel rates.
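
As a minimal sketch of the formula for s in this caption, the following Python function evaluates s from given values of b(t) and b_∞. The function name and the example values are hypothetical and chosen only for illustration; the functional form of b(t) is not given here, so both probabilities are passed in directly.

```python
def leave_background_prob(b_t: float, b_inf: float) -> float:
    """Return s = 1 - (1 - b_inf) * (1 - b_t).

    b_t   : gain-of-function probability b(t) on a branch of length t
    b_inf : limit of b(t) as t goes to infinity
    Both arguments are assumed to be probabilities in [0, 1].
    """
    for name, p in (("b_t", b_t), ("b_inf", b_inf)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    return 1.0 - (1.0 - b_inf) * (1.0 - b_t)


# Illustrative values only (not taken from the paper).
s = leave_background_prob(b_t=0.01, b_inf=0.05)
print(f"s = {s:.4f}")
```

When b(t) = 0 the expression reduces to s = b_∞, and s reaches 1 only if one of the two probabilities is 1.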

Figure 2.

PPHMMs for loss (top) and gain (bottom) of function in a binding site.

Ovals are emitting states. The top half of an emitting state denotes the functional class in the parent, while the bottom half denotes the functional class in the child. Dash denotes a gap. bg denotes the background functional class. W_i denotes the functional class corresponding to the i-th column in a positional weight matrix (PWM). Transition probabilities are derived from the background indel model. Emission probabilities are derived via a substitution mixture model.

Figure 3.

An example CRM evolution model that can be implemented in our framework.

Parallelograms denote groups of states in the PPHMM; small parallelograms denote states implementing a binding site profile (positional weight matrix). b(t): gain probability. q(t): loss probability. p(t): retention probability. b_∞: limit of b(t) as t→∞. t: branch length. s: 1-(1-b_∞)(1-b(t)). ε: 0.00001. Plus and minus denote strand. See Materials and Methods for additional details.

Figure 4.

Site-level prediction accuracy as a function of number of species in EVOS simulation runs (the simulator and predictor modeled the same number of species).

Table 1.

Alignment accuracy for PSPE simulation runs, averaged across runs (CRMs).

Table 2.

Binding-site prediction accuracy for PSPE simulation runs.

Table 3.

Site-prediction accuracy on 17 Drosophila developmental enhancers.

Figure 5.

Histogram of number of extant Drosophilids predicted to share a given site, for known sites (top pane) and novel predicted sites (bottom pane), over a 10-way phylogeny.

Figure 6.

ROC-like curve for MAFIA (blue) applied to ten species, rMonkey (red) applied to six species, and the “gold standard” (gold).

Sensitivity is plotted on the y-axis, false-positive rate along the x-axis. Each point corresponds to a different stringency threshold in the processed ChIP-seq data.
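
One way the points of such a curve could be computed is sketched below in Python. It assumes, purely for illustration, that each candidate position carries a single ChIP-seq score and that positions scoring at or above the stringency threshold are treated as bound; this is an assumption and not necessarily the procedure used for the figure. The function name, argument names, and toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def roc_like_points(
    predicted: Set[int],
    chip_score: Dict[int, float],
    thresholds: List[float],
) -> List[Tuple[float, float, float]]:
    """Return one (threshold, false_positive_rate, sensitivity) per threshold.

    predicted  : positions called as binding sites by the predictor
    chip_score : hypothetical ChIP-seq score for every candidate position
    thresholds : stringency cutoffs; a score >= cutoff counts as bound
    """
    points = []
    for thr in thresholds:
        positives = {pos for pos, score in chip_score.items() if score >= thr}
        negatives = set(chip_score) - positives
        true_pos = len(predicted & positives)
        false_pos = len(predicted & negatives)
        # Guard against empty reference sets at extreme thresholds.
        sensitivity = true_pos / len(positives) if positives else 0.0
        fpr = false_pos / len(negatives) if negatives else 0.0
        points.append((thr, fpr, sensitivity))
    return points


# Toy data, for illustration only.
scores = {1: 0.9, 2: 0.8, 3: 0.4, 4: 0.1}
calls = {1, 3}
for thr, fpr, sens in roc_like_points(calls, scores, [0.5, 0.3]):
    print(f"threshold={thr}: FPR={fpr:.2f}, sensitivity={sens:.2f}")
```

In this sketch the predictions stay fixed while the reference set changes with the threshold, which is why the result is only "ROC-like" rather than a conventional ROC curve.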

Figure 7.

An example D. melanogaster developmental enhancer.

At top are F-Seq scores from ChIP-seq data for six transcription factors (kr = kruppel, kni = knirps, hb = hunchback, gt = giant, cad = caudal, bcd = bicoid); curves were scaled to maximize visual impact for the figure. Predictions and known sites are shown below, with colors denoting factor identity as per the F-Seq curves (factor tailless was not assayed in the ChIP-seq experiments and is shown in white). Plus and minus tracks correspond respectively to sense and antisense strands of the dm3 assembly for chromosome 3R. The FlyReg track depicts known binding sites according to the “gold standard” (see text). The EMMA track was produced using the –e option for that program.

Figure 8.

Example MAFIA alignments from the CRM shown in Figure 7.

Nucleotides predicted to participate in binding are shown in bold. Weight matrices for factors are shown as sequence logos above (sense strand) or below (antisense strand) the alignment.
