
Figure 1.

Hierarchical orthologous groups and their relationship to the orthology graph and the underlying gene and species trees.

In this example, the hierarchical groups for the taxonomic range are drawn in orange. By definition, these groups correspond to the sets of leaves attached to the speciation nodes of the gene tree coloured in orange.
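One way to read this definition is as a traversal of a labelled gene tree. The sketch below is a minimal illustration, assuming that every internal node carries an event label and that each speciation node is annotated with the taxon at which the split occurred. The `Node` class, its `taxon` field and the gene names are hypothetical and do not come from the article.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    event: str = "leaf"          # "leaf", "speciation" or "duplication"
    taxon: str = ""              # assumed annotation on speciation nodes
    children: list = field(default_factory=list)


def leaves(node):
    """Return the set of leaf (gene) names below a node."""
    if not node.children:
        return {node.name}
    out = set()
    for child in node.children:
        out |= leaves(child)
    return out


def hogs_at(root, taxon):
    """Collect the leaf sets under speciation nodes labelled with `taxon`."""
    groups, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.event == "speciation" and node.taxon == taxon:
            groups.append(leaves(node))   # one group per matching node
        else:
            stack.extend(node.children)   # keep searching deeper
    return groups


# Hypothetical family: a duplication in the mammalian stem, after the fly split.
copy1 = Node(event="speciation", taxon="Mammalia",
             children=[Node("h1"), Node("m1")])
copy2 = Node(event="speciation", taxon="Mammalia",
             children=[Node("h2"), Node("m2")])
dup = Node(event="duplication", children=[copy1, copy2])
root = Node(event="speciation", taxon="Bilateria",
            children=[Node("f1"), dup])

mammal_groups = hogs_at(root, "Mammalia")     # two groups, one per copy
bilateria_groups = hogs_at(root, "Bilateria") # a single group with all genes
```

In this toy tree the duplication yields two groups at the mammalian level and a single group at the deeper level, which is the nesting that makes the groups hierarchical.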


Table 1.

Algorithm 1 GroupRoots.


Figure 2.

Illustration of Lemma 1: the taxonomic range induces a set of speciation nodes (left) and the associated hierarchical orthologous groups (centre).

Likewise, it induces an orthology subgraph with a set of connected components (right). Lemma 1 establishes the one-to-one correspondence between the hierarchical orthologous groups and the connected components, which we prove by viewing it as the composition of two one-to-one correspondences that pass through the speciation nodes.
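The right-hand side of this correspondence can be sketched in a few lines: restrict the orthology graph to genes from species in the taxonomic range, then take connected components. This is an illustrative sketch only; the function name, the `species_of` mapping and the toy genes are assumptions made for the example, not the article's Algorithm 2.

```python
from collections import defaultdict


def components_in_range(edges, species_of, taxon_range):
    """Connected components of the orthology graph induced by a set of species."""
    keep = {g for g, sp in species_of.items() if sp in taxon_range}
    adj = defaultdict(set)
    for u, v in edges:
        if u in keep and v in keep:       # drop edges leaving the range
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for gene in sorted(keep):
        if gene in seen:
            continue
        seen.add(gene)
        stack, comp = [gene], set()
        while stack:                      # depth-first search from `gene`
            x = stack.pop()
            comp.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps


# Hypothetical genes: f1 (fly) is orthologous to both mammalian copies.
species_of = {"h1": "human", "h2": "human",
              "m1": "mouse", "m2": "mouse", "f1": "fly"}
edges = [("h1", "m1"), ("h2", "m2"),
         ("h1", "f1"), ("h2", "f1"), ("m1", "f1"), ("m2", "f1")]

mammals = components_in_range(edges, species_of, {"human", "mouse"})
everything = components_in_range(edges, species_of, {"human", "mouse", "fly"})
```

With the fly gene excluded, the two mammalian copies fall into separate components; with it included, everything joins into a single component.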


Table 2.

Algorithm 2 GETHOGs.


Figure 3.

Example of an orthology graph.

An example orthology graph from the OMA database in which two false-positive predictions merge two well-defined orthologous groups. At the level of vertebrates, the NOX family forms 4 different orthologous groups. Because of two spurious predictions, the NOX1 and NOX2 clusters become weakly connected. The minimum cut algorithm will split them, as there are only two edges to cut.
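To make the minimum-cut idea concrete, the sketch below finds a global minimum cut in a toy graph made of two dense clusters joined by two spurious edges. It uses the Stoer-Wagner algorithm as a stand-in; the caption does not say which minimum-cut routine the article's DivideGraph relies on, and the vertex labels are hypothetical rather than real NOX genes. The input is assumed to contain no self-loops.

```python
def stoer_wagner_min_cut(edges):
    """Return (cut_weight, side) for a global minimum cut of an undirected graph.

    `edges` is an iterable of (u, v) pairs, each with weight 1.
    `side` is the set of original vertices on one side of the cut.
    """
    w = {}                                   # weighted adjacency map
    for u, v in edges:
        w.setdefault(u, {})
        w.setdefault(v, {})
        w[u][v] = w[u].get(v, 0) + 1
        w[v][u] = w[v].get(u, 0) + 1
    members = {v: {v} for v in w}            # original vertices per super-vertex
    best_weight, best_side = float("inf"), set()
    while len(w) > 1:
        # One phase: grow a set, always adding the most tightly attached vertex.
        start = next(iter(w))
        added = [start]
        attach = {v: w[start].get(v, 0) for v in w if v != start}
        while attach:
            nxt = max(attach, key=attach.get)
            cut_of_phase = attach.pop(nxt)
            added.append(nxt)
            for nb, wt in w[nxt].items():
                if nb in attach:
                    attach[nb] += wt
        s, t = added[-2], added[-1]
        if cut_of_phase < best_weight:
            best_weight, best_side = cut_of_phase, set(members[t])
        # Merge the last vertex t into s before the next phase.
        members[s] |= members.pop(t)
        for nb, wt in w.pop(t).items():
            del w[nb][t]
            if nb != s:
                w[s][nb] = w[s].get(nb, 0) + wt
                w[nb][s] = w[nb].get(s, 0) + wt
    return best_weight, best_side


# Two fully connected clusters of four genes, linked by two spurious edges.
cluster_a = ["a1", "a2", "a3", "a4"]
cluster_b = ["b1", "b2", "b3", "b4"]
edges = [(u, v) for grp in (cluster_a, cluster_b)
         for i, u in enumerate(grp) for v in grp[i + 1:]]
edges += [("a1", "b1"), ("a2", "b2")]

weight, side = stoer_wagner_min_cut(edges)   # weight is 2; side is one cluster
```

Every other way of splitting this graph removes at least three edges, so the cut of weight 2 that separates the two clusters is the unique minimum.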


Table 3.

Algorithm 3 DivideGraph.


Table 4.

Algorithm 4 FractionReachableInTwoSteps.


Figure 4.

Validation on simulated data: precision-recall plots of COCO-CL, LOFT and the algorithm introduced here (GETHOGs) on two datasets of 30 simulated genomes (200 genes each).

The two datasets show rates averaged over 4 independent runs of genome simulations with fixed parameters. The two datasets differ essentially in their gene duplication rates (see Method section for details). As a point of reference, we also show the performance of pairwise orthologs inferred in OMA (OMA Pairwise). The colour gradient corresponds to various parameter values for GETHOGs and bootstrap values for COCO-CL.
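For readers unfamiliar with how a point on such a plot is obtained, the sketch below computes precision and recall for a set of predicted orthologous pairs against a known set of true pairs. It is a generic illustration with made-up gene names, not the article's evaluation code; pairs are treated as unordered.

```python
def precision_recall(predicted, truth):
    """Precision and recall of predicted gene pairs against the true pairs."""
    pred = {frozenset(p) for p in predicted}   # unordered pairs
    true = {frozenset(p) for p in truth}
    tp = len(pred & true)                      # correctly predicted pairs
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    return precision, recall


# Hypothetical example: 3 predictions, 2 of them among the 4 true pairs.
truth = [("h1", "m1"), ("h2", "m2"), ("h1", "f1"), ("h2", "f1")]
predicted = [("m1", "h1"), ("h2", "m2"), ("h1", "m2")]

precision, recall = precision_recall(predicted, truth)
```

Here precision is 2/3 and recall is 1/2. Sweeping a method's parameter and recomputing these two values traces out a curve of the kind shown in the figure.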


Figure 5.

Validation on empirical data: precision-recall plot of our newly proposed GETHOGs, COCO-CL, LOFT, EggNOG and OrthoDB on orthologous and paralogous gene relationships for the 3 gene families (3,783 relationships in total) analysed in Boeckmann et al. [9].

Predictions for GETHOGs and COCO-CL are computed using the default parameters (respectively and bootstrap). The points for EggNOG and OrthoDB are from the original analysis (Reference [9], Table 2).
