Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes
(A) ROC curves for the dN/dS test using subsets of Drosophila species corresponding to increasingly broad phylogenetic clades from D. melanogaster (see Figure 1). Discriminatory power increased steadily as more informants were used, leading to strictly better sensitivity and specificity. (B) The effect of additional species was most pronounced for short exons. The x-axis shows the mean length within each quantile of the sequence length distribution; the y-axis shows the sensitivity of the dN/dS test within that quantile at a fixed specificity of 99%. (C) MAE and AAC error statistics for each multi-species comparative metric using the same subsets of informants. Also shown for comparison are the best pairwise analysis and the best single-sequence metric, both of which are outperformed by multi-species methods given sufficient informants.
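The statistics summarized in panels (B) and (C) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that MAE denotes the minimum average of the false positive and false negative rates along the ROC curve, that AAC denotes the area above the ROC curve, and that each metric has been oriented so that a higher score means "more exon-like" (for dN/dS this would mean negating the ratio). The function names and the toy scores are hypothetical.

```python
def roc_points(pos_scores, neg_scores):
    """Return ROC points (fpr, tpr), sweeping the threshold from high to low.

    Assumption: higher score = more likely a true exon.
    """
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def area_above_curve(points):
    """Assumed AAC: 1 minus the trapezoidal area under the ROC curve."""
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2.0
    return 1.0 - auc


def min_average_error(points):
    """Assumed MAE: smallest mean of false positive and false negative rates."""
    return min((fpr + (1.0 - tpr)) / 2.0 for fpr, tpr in points)


def sensitivity_at_specificity(points, specificity=0.99):
    """Best sensitivity among ROC points meeting the specificity floor."""
    max_fpr = 1.0 - specificity
    return max(tpr for fpr, tpr in points if fpr <= max_fpr + 1e-12)


# Toy scores (hypothetical, not data from the study).
exon_scores = [0.9, 0.8, 0.7, 0.4]
control_scores = [0.6, 0.3, 0.2, 0.1]

points = roc_points(exon_scores, control_scores)
aac = area_above_curve(points)
mae = min_average_error(points)
sens99 = sensitivity_at_specificity(points, specificity=0.99)
```

To reproduce a plot like panel (B) under these assumptions, one would bin the exons by length quantile and call `sensitivity_at_specificity` on the ROC points computed within each bin.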