Enhanced Methods for Local Ancestry Assignment in Sequenced Admixed Individuals

Figure 2

Local ancestry inference accuracy in three simulated populations.

“Array data” denotes that a method was run only on the variants present on the Illumina 1M genotyping array. “Full genome” denotes that a method was run using all variants. RFMix requires phased haplotype input, which was inferred using Beagle; all other methods received unphased genotype data as input. Correlation values are the squared correlation between true and inferred ancestry across individuals, averaged across SNPs. LAMP-LD and MULTIMIX were optimized for genotyping array data, which may explain the steep drop in accuracy when they are run on full sequencing data. MULTIMIX is not plotted for full sequencing data because it performed very poorly, possibly because its parameters are poorly suited to sequencing data. Haploid and diploid errors are reported in Table 2.
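The correlation metric described in the legend can be illustrated with a minimal sketch. It is not the authors' code. It assumes that ancestry at each SNP is coded as the number of copies (0, 1, or 2) of one ancestral population, that rows are individuals and columns are SNPs, and that SNPs where either the true or the inferred ancestry is constant across individuals (so the correlation is undefined) are skipped. The function names `squared_pearson` and `mean_squared_correlation` are hypothetical.

```python
from math import fsum


def squared_pearson(x, y):
    """Squared Pearson correlation of two equal-length sequences.

    Returns None when either sequence has zero variance, because the
    correlation is undefined in that case.
    """
    n = len(x)
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return (sxy * sxy) / (sxx * syy)


def mean_squared_correlation(true_anc, inferred_anc):
    """Average, over SNPs, of the squared correlation across individuals.

    Both arguments are lists of rows (one row per individual, one column
    per SNP) holding ancestry dosages. Assumption: SNPs with an undefined
    correlation are excluded from the average.
    """
    if len(true_anc) != len(inferred_anc):
        raise ValueError("inputs must have the same number of individuals")
    if any(len(t) != len(p) for t, p in zip(true_anc, inferred_anc)):
        raise ValueError("inputs must have the same number of SNPs")

    r2_values = []
    # zip(*rows) transposes the matrix so that we iterate SNP by SNP.
    for true_col, inferred_col in zip(zip(*true_anc), zip(*inferred_anc)):
        r2 = squared_pearson(true_col, inferred_col)
        if r2 is not None:
            r2_values.append(r2)

    if not r2_values:
        return float("nan")
    return fsum(r2_values) / len(r2_values)


# Toy data: 3 individuals, 2 SNPs (values are made up for illustration).
true_dosage = [[0, 0], [1, 1], [2, 2]]
inferred_dosage = [[0, 0], [1, 1], [2, 0]]
score = mean_squared_correlation(true_dosage, inferred_dosage)
```

In this toy case the first SNP is inferred perfectly and the second is uncorrelated with the truth, so the score lies halfway between 0 and 1. For admixture among more than two populations, one plausible extension is to compute the score separately for each ancestry, but the legend does not specify how this was handled.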
