Table 1.
Classification of the 6 gamma-proteobacteria and the artificial genomes used in this study according to their distance from the artificial E. coli genome.
Table 2.
The sixteen horizontal transfer detection methods analyzed in this paper.
Figure 1.
ROC-like curves of the 16 methods.
Each dot on a curve gives the type I error (100-sensitivity) and type II error (100-specificity) for one value of r (see Materials and Methods). The best methods are those with the fewest errors, i.e. those closest to the origin.
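As an illustration of how one dot of such a curve could be obtained, the following minimal sketch converts hypothetical detection counts into the two error values, using this caption's convention (type I error = 100-sensitivity, type II error = 100-specificity). The function names, the counts and the use of Euclidean distance to rank points are assumptions made for the example, not the procedure of the paper.

```python
import math

def error_point(tp, fn, tn, fp):
    """Return (type I error, type II error) in percent, per the caption's convention."""
    sensitivity = 100.0 * tp / (tp + fn)  # share of true HT genes that are detected
    specificity = 100.0 * tn / (tn + fp)  # share of native genes that are not flagged
    return 100.0 - sensitivity, 100.0 - specificity

def distance_to_origin(point):
    """Euclidean distance to (0, 0); an assumed way to quantify 'closest to the origin'."""
    return math.hypot(point[0], point[1])

# Hypothetical counts for two values of r (illustrative only, not data from the study).
points = {
    "r_a": error_point(tp=80, fn=20, tn=900, fp=100),
    "r_b": error_point(tp=60, fn=40, tn=990, fp=10),
}
best = min(points, key=lambda name: distance_to_origin(points[name]))
```

With these made-up counts, the point for `r_a` lies closer to the origin than the point for `r_b`, so it would be preferred under the assumed distance criterion.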
Table 3.
Mean performance of all 16 methods with “standard” model genomes.
Figure 2.
Mean errors of 7 methods according to (A) origin, (B) overall quantity, (C) size and (D) recipient genome.
The mean error is the mean of the type I (100-sensitivity) and type II (100-specificity) errors. It is shown for the 7 efficient HT detection methods, listed by criterion (codon usage: CU.KL; dinucleotide frequencies: dint5; GC content: GCtotal and GC1-GC3; tetranucleotide frequencies: oli.chi2, oli.KL and signature), according to four parameters. A: the origin. The single donor genomes of the HTs are ordered by their distance to the host genome (E. coli) in terms of tetranucleotide frequencies, with the closest on the left and the farthest on the right. B: the overall quantity of HTs, as a percentage of the genome. C: the size of the HTs. Small, Medium, Large and Very Large mean 1 to 5 genes, 5 to 10 genes, 10 to 20 genes and 20 to 30 genes, respectively. D: the host genome, i.e. the genome receiving the HTs.
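The mean error plotted in this figure can be written as a one-line computation. The sketch below assumes sensitivity and specificity are expressed in percent and follows the convention of the Figure 1 caption (type I error = 100-sensitivity, type II error = 100-specificity); the function name and the input values are hypothetical.

```python
def mean_error(sensitivity, specificity):
    """Mean of the type I and type II errors, all values in percent."""
    type_i = 100.0 - sensitivity   # error on true HTs (caption's convention)
    type_ii = 100.0 - specificity  # error on native genes (caption's convention)
    return (type_i + type_ii) / 2.0

# Illustrative values only, not results from the study.
example = mean_error(sensitivity=80.0, specificity=90.0)
```

Under this definition, a method with 80% sensitivity and 90% specificity has a mean error of 15%.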
Table 4.
Sensitivity, specificity and mean performance of the methods with HTs originating from real gamma-proteobacteria.
Table 5.
Mean performance of combinations of 2 methods over the “standard” model genomes and over the “real” E. coli genomes.