Figure 1.
Using the bidirectional best hit (BBH) strategy to identify functionally identical genes.
(a) Three grey vertical bars represent three different species. Circles on each bar represent genes belonging to that species. The color of a circle indicates the gene's biological function; circles of the same color have the same function. Black bidirectional arrows represent BBHs: a solid arrow denotes a true positive, i.e., it links two genes with the same function, and a dashed arrow denotes a false positive, i.e., it links two genes with different functions. Grey curved bidirectional arrows represent gene duplications. The genes are arranged in three tiers. The top tier is a group of four red circles representing four genes with identical functions. A recent gene duplication event in species A has created two paralogs (the two red circles on the left bar) with the same biological function. The middle tier contains three orange circles, which should all have been connected by true positive BBHs. However, if the function represented by the orange circles is related to the function represented by the red circles in the top tier, the orange gene from species B and a red gene from species A can be detected as a BBH pair. This is an example of a false positive, shown as a dashed arrow. The bottom tier contains four genes. The two green genes from species A and B form a true positive BBH pair. In species C, a duplication event followed by subfunctionalization has divided the original green function between two genes, shown in blue and yellow. The green gene from species A is linked by a BBH to the yellow gene in species C, but their functions are not identical. Similarly, the green gene from species B is linked to the blue gene in species C. In this tier, subfunctionalization therefore results in two false positive BBH linkages. (b) A network showing the topology of a plausible ortholog group. Nodes are genes and edges are BBH linkages.
The genes in this ortholog group have three different functions (indicated by the three colors), so the group must be further partitioned.
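The BBH criterion in panel (a) and the network view in panel (b) can be sketched as follows. This is a minimal illustration, not the procedure of any particular ortholog resource: it assumes precomputed pairwise similarity scores (for example, BLAST bit scores) stored in hypothetical dictionaries keyed by `(query, subject)`, and the gene names and scores are invented to mimic the bottom tier of panel (a). BBH edges are then merged into candidate ortholog groups by taking connected components, which is one simple grouping rule among several.

```python
from collections import defaultdict


def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict {(query, subject): similarity score}. Ties keep the first hit seen.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bbh_pairs(a_vs_b, b_vs_a):
    """Return (a, b) pairs that are each other's best hit in both search directions."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


def ortholog_groups(edges):
    """Group genes into connected components of the BBH network (panel b)."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        groups.append(component)
    return groups


# Hypothetical scores mimicking the bottom tier of panel (a).
a_vs_b = {("A_green", "B_green"): 95}
b_vs_a = {("B_green", "A_green"): 95}
a_vs_c = {("A_green", "C_yellow"): 90, ("A_green", "C_blue"): 85}
c_vs_a = {("C_yellow", "A_green"): 90, ("C_blue", "A_green"): 85}
b_vs_c = {("B_green", "C_blue"): 88, ("B_green", "C_yellow"): 80}
c_vs_b = {("C_blue", "B_green"): 88, ("C_yellow", "B_green"): 80}

edges = bbh_pairs(a_vs_b, b_vs_a) | bbh_pairs(a_vs_c, c_vs_a) | bbh_pairs(b_vs_c, c_vs_b)
groups = ortholog_groups(edges)
print(sorted(edges))
print([sorted(g) for g in groups])
```

With these toy scores, all three BBH edges are accepted, so the four genes collapse into a single group that mixes the green, blue, and yellow functions. This is the situation that calls for further partitioning of the group.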
Figure 2.
Phylogeny-based ortholog group construction.
(a) The upper left panel shows a tree of the phylogenetic relationships among six species, A–F. Below the species tree is a gene tree of ten genes taken from the six species. The right panel shows the tree after reconciliation, the process of comparing the gene tree with the species tree to infer and date evolutionary events such as duplications and deletions (gene losses). In the reconciled tree, the thick dashed lines represent the same species tree as in the upper left panel, and the solid lines represent the reconciled gene tree. Three duplication events are dated. Duplication D1 occurs after the speciation event separating A and B; D2 occurs before the speciation of C and D; and D3 occurs before the split between the CD and EF lineages. Under current tree-based algorithms, the functional partition points would be placed at D2 and D3. (b) Gene duplication close to the leaf nodes does not necessarily result in functional divergence. The schematic shows the evolutionary history of the same gene, with the only difference that the tree includes five closely related B species instead of one, and D1 occurs before the speciation of these five B species. D1 is so recent that it is hard to predict whether subfunctionalization or neofunctionalization will occur. It might instead produce “in-paralogs”, in which the duplicated genes in all five B species have the same function. D2 and D3, in contrast, are ancient duplications. If the paralogs arising from D2 and D3 are retained in most descendant species, they are more likely to have diverged in biological function.
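Reconciliation, as used in panel (a), is commonly done by LCA mapping: each gene-tree node is mapped to the lowest common ancestor, in the species tree, of the species found below it, and a node that maps to the same species-tree node as one of its children is labeled a duplication. The sketch below is a minimal version of that standard technique, not the algorithm of any specific tool. It assumes rooted binary trees written as nested tuples, a hypothetical gene-naming convention `species_index`, and two invented example trees (they are not the trees drawn in the figure); gene losses are not counted.

```python
def clades(tree):
    """List the leaf set of every subtree in post-order; the last entry is the whole tree.

    A tree is a leaf label (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    left_clades, right_clades = clades(left), clades(right)
    return left_clades + right_clades + [left_clades[-1] | right_clades[-1]]


def species_of(gene):
    """Hypothetical naming convention: 'B_2' is a gene from species 'B'."""
    return gene.split("_")[0]


def reconcile(gene_tree, species_clades, events):
    """LCA-map each gene-tree node onto the species tree and label it.

    Returns (mapped species clade, gene leaves) for gene_tree and appends
    (sorted leaves, 'duplication' or 'speciation') to events in post-order.
    """
    if isinstance(gene_tree, str):
        return frozenset([species_of(gene_tree)]), [gene_tree]
    left, right = gene_tree
    m_left, leaves_left = reconcile(left, species_clades, events)
    m_right, leaves_right = reconcile(right, species_clades, events)
    # Smallest species-tree clade containing every species below this node (the LCA).
    mapped = min((c for c in species_clades if (m_left | m_right) <= c), key=len)
    # A node that maps to the same species-tree node as a child is a duplication.
    event = "duplication" if mapped in (m_left, m_right) else "speciation"
    leaves = leaves_left + leaves_right
    events.append((tuple(sorted(leaves)), event))
    return mapped, leaves


# Hypothetical example trees.
species_tree = (("A", "B"), (("C", "D"), ("E", "F")))
gene_tree = (("A_1", ("B_1", "B_2")), (("C_1", "D_1"), ("C_2", "D_2")))

events = []
reconcile(gene_tree, clades(species_tree), events)
for leaves, event in events:
    print(event, leaves)
```

In this toy example, the duplication that produced `B_1` and `B_2` is specific to the B lineage, like D1, so the two genes would be in-paralogs; the duplication mapped to the ancestor of C and D predates their speciation, like D2.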
Figure 3.
Cross-comparison of human-fly ortholog pairs from three different ortholog resources: Inparanoid, OrthoMCL, and TreeFam.
Because these resources are updated asynchronously, the gene sets they use differ slightly. To compare them, we mapped their gene IDs to the most recent human and fly gene IDs in Ensembl 53, using BioMart (http://www.ensembl.org/biomart). After ID mapping, we obtained 10,834 human-fly ortholog pairs from Inparanoid, 12,784 from OrthoMCL, and 6,824 from TreeFam. The Venn diagram shows the intersections of the three pair sets. Only 1,955 ortholog pairs are present in all three resources, accounting for 18% of the Inparanoid human-fly pairs (15% and 28% of the OrthoMCL and TreeFam pairs, respectively). Details of this and other ortholog comparisons can be found at http://wiki.gersteinlab.org/pubinfo/Ortholog_Resources.
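The Venn regions and the shared-pair percentages above can be reproduced with plain set operations. The sketch below assumes each resource's ortholog pairs have already been mapped to a single ID release and stored as a set of `(human_id, fly_id)` tuples; the toy IDs are hypothetical, and only the totals and the 1,955 shared pairs come from this figure.

```python
from itertools import combinations


def venn_regions(pair_sets):
    """Count pairs in each exclusive region of a Venn diagram over named sets."""
    names = list(pair_sets)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Pairs in every set of this combination...
            inside = set.intersection(*(pair_sets[n] for n in combo))
            # ...and in none of the remaining sets.
            outside = set().union(*(pair_sets[n] for n in names if n not in combo))
            regions[combo] = len(inside - outside)
    return regions


def shared_percentage(shared, total):
    """Percentage of one resource's pairs that are shared by all resources."""
    return 100.0 * shared / total


# Toy (human_id, fly_id) pairs with hypothetical IDs, assumed already mapped to one release.
toy = {
    "Inparanoid": {("h1", "f1"), ("h2", "f2"), ("h3", "f3")},
    "OrthoMCL": {("h1", "f1"), ("h2", "f2"), ("h4", "f4")},
    "TreeFam": {("h1", "f1"), ("h5", "f5")},
}
print(venn_regions(toy))

# Counts reported in this figure: total pairs per resource and pairs shared by all three.
reported_totals = {"Inparanoid": 10834, "OrthoMCL": 12784, "TreeFam": 6824}
for name, total in reported_totals.items():
    print(f"{name}: {shared_percentage(1955, total):.1f}%")
```

To one decimal place, the shared fractions are 18.0%, 15.3%, and 28.6% of the Inparanoid, OrthoMCL, and TreeFam pairs, respectively.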