Phylogenetic Reconstruction of Orthology, Paralogy, and Conserved Synteny for Dog and Human

doi:10.1371/journal.pcbi.0020133

Figure 2

Overview of the PhyOP Orthology Prediction Process

(A) Creation of transcript-based phylogenies. An all-versus-all BLASTP search is run for all proteins from two species (step 1) with an upper E-value threshold of 10⁻⁵ and an alignment length threshold of 50 residues. Protein pairs are linked together in initial clusters (step 2) if the alignment covers >60% of the residues of both sequences. Any remaining proteins are linked to the initial clusters if they align to >50% of the residues of either sequence (step 3). d_S values are calculated from the pairwise alignments (step 4), and unsaturated transcript pairs (d_S < 5.0) are grouped first by single linkage and then hierarchically clustered using UPGMA (step 5). Phylogenies are created from cluster branches corresponding to d_S < 2.5 by applying a modified version of the Fitch-Margoliash criterion (step 6).
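Steps 1 to 3 can be illustrated with a minimal Python sketch. It is not the PhyOP implementation. The function name `cluster_hits`, the hit tuple layout, and the example identifiers are hypothetical. The sketch also assumes that the E-value threshold is inclusive, that the alignment length threshold is a minimum, and that coverage is the alignment length divided by the sequence length.

```python
from collections import defaultdict

# Thresholds from the caption; inclusive/exclusive handling is an assumption.
MAX_EVALUE = 1e-5
MIN_ALN_LEN = 50
BOTH_COVERAGE = 0.60
EITHER_COVERAGE = 0.50


def find(parent, x):
    """Return the cluster representative of x (union-find with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_hits(hits, lengths):
    """Cluster proteins from (query, subject, evalue, aln_len) BLASTP hits.

    lengths maps each protein identifier to its length in residues.
    """
    # Step 1: keep non-self hits that pass the E-value and length thresholds.
    passing = [h for h in hits
               if h[0] != h[1] and h[2] <= MAX_EVALUE and h[3] >= MIN_ALN_LEN]
    parent = {}
    # Step 2: link pairs whose alignment covers >60% of both sequences.
    for q, s, _, aln in passing:
        if aln / lengths[q] > BOTH_COVERAGE and aln / lengths[s] > BOTH_COVERAGE:
            parent[find(parent, q)] = find(parent, s)
    clustered = set(parent)  # proteins placed in an initial cluster
    # Step 3: attach a leftover protein if the alignment covers >50% of either.
    for q, s, _, aln in passing:
        if (q in clustered) == (s in clustered):
            continue  # both already placed, or neither touches an initial cluster
        if aln / lengths[q] > EITHER_COVERAGE or aln / lengths[s] > EITHER_COVERAGE:
            parent[find(parent, q)] = find(parent, s)
    groups = defaultdict(set)
    for protein in parent:
        groups[find(parent, protein)].add(protein)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data: two hits pass step 1, two fail it.
example_lengths = {"dogA": 100, "humA": 100, "dogB": 200, "humC": 300}
example_hits = [
    ("dogA", "humA", 1e-30, 90),   # covers 90% of both: initial cluster
    ("dogB", "humA", 1e-10, 55),   # covers 55% of humA only: attached in step 3
    ("dogB", "humC", 1e-3, 150),   # E value too high
    ("humC", "dogA", 1e-8, 40),    # alignment too short
]
print(cluster_hits(example_hits, example_lengths))
```

Union-find is used here because linking pairs transitively is a single-linkage operation. The same structure would also serve for the single-linkage grouping of transcript pairs with d_S < 5.0 in step 5.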

(B) Prediction of orthology from transcript phylogenies. Transcripts outside clades of orthologous transcripts are discarded (step 7), and merged genes within orthologous clades are separated (step 8). Transcript clades are separated into three groups: unambiguous clades (step 9), containing genes with no other remaining splice variant; consistent sets of clades (step 10), with identical gene complements; and inconsistent clades (step 11), in which different sets of orthologous transcripts suggest different gene orthology relationships. The inconsistencies are resolved by separating merged genes and choosing the transcripts with the lowest d_S to their orthologous transcripts (step 12). Candidate pseudogenes are then discarded to give the final set of orthologous and paralogous genes (step 13).
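The transcript choice in step 12 can be sketched as a lowest-d_S selection per gene. This is a simplified illustration only. The function name `pick_representative_transcripts` and the tuple layout are hypothetical, the separation of merged genes is not modelled, and the handling of ties (the first pair seen is kept) is an assumption, since the caption does not describe it.

```python
def pick_representative_transcripts(pairs):
    """Keep, for each gene, the transcript pair with the lowest d_S.

    pairs: iterable of (gene, transcript, orthologous_transcript, d_s) tuples.
    Returns {gene: (transcript, orthologous_transcript, d_s)}.
    """
    best = {}
    for gene, transcript, ortholog, d_s in pairs:
        # Strict comparison: on a tie the earlier pair is kept (an assumption).
        if gene not in best or d_s < best[gene][2]:
            best[gene] = (transcript, ortholog, d_s)
    return best


# Hypothetical example: geneX has two splice variants with conflicting evidence.
example_pairs = [
    ("geneX", "tx1", "hs1", 0.8),
    ("geneX", "tx2", "hs1", 0.3),
    ("geneY", "ty1", "hs2", 1.2),
]
print(pick_representative_transcripts(example_pairs))
```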

doi: https://doi.org/10.1371/journal.pcbi.0020133.g002