Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes

Pairwise discovery power using different informant species.

(A) ROC curves for KA/KS computed between D. melanogaster and each of five informant species. Informants spanning a wide range of evolutionary distances performed comparably. The exception was D. erecta, the species most closely related to D. melanogaster, which clearly underperformed the others. (B) MAE and AAC error statistics for each pairwise comparative metric applied to the same five informants. D. ananassae (blue) is the preferred informant overall, but not in every case. For TBLASTX, performance is also shown using mosquito (Anopheles gambiae) and honeybee (Apis mellifera) as informants; both performed worse than the Drosophila species. No pairwise comparison outperformed the best single-sequence metric (Z curve).

