Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure

doi:10.1371/journal.pgen.0010070

Figure 2

Inferred Population Structure Based on 1,048 Individuals and 993 Markers, Assuming Correlations among Allele Frequencies across Clusters

Each individual is represented by a thin line partitioned into K colored segments that represent the individual's estimated membership fractions in K clusters. Each plot, produced with DISTRUCT [23], is based on the highest-likelihood run of ten runs: the two runs that were used in further analysis, and the eight runs described under “Cluster Analysis using STRUCTURE.” As in [3], four of ten runs with K = 3 separated a cluster corresponding to East Asia instead of one corresponding to Europe, the Middle East, and Central/South Asia. Two of ten runs with K = 5 separated Surui instead of Oceania. The highest-likelihood run of the ten runs with K = 6, shown in the figure, had a different pattern from the other nine runs (not shown). These other runs, instead of subdividing native Americans into two clusters, subdivided a cluster roughly similar to the Kalash cluster seen in [3], except with a less pronounced separation of the Kalash population. The clusteredness scores for the plots shown with K = 2, 3, 4, 5, and 6 are 0.50, 0.76, 0.84, 0.86, and 0.87, respectively.
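The clusteredness scores quoted in the caption summarize how strongly individuals are assigned to single clusters rather than spread across several. The caption itself does not define the score, so the sketch below rests on an assumption: that clusteredness is the average, over individuals, of the scaled distance between an individual's membership vector and the uniform vector (1/K, ..., 1/K), so that 0 means equal membership in every cluster and 1 means full membership in one cluster. The function name `clusteredness` and the input layout (one list of K membership fractions per individual) are hypothetical choices made for illustration, not part of STRUCTURE or DISTRUCT.

```python
import math


def clusteredness(memberships):
    """Average per-individual clusteredness, under the assumed definition.

    memberships: list of rows, one per individual; each row holds K
    membership fractions that sum to 1 (as in STRUCTURE-style output).
    Assumed formula (not stated in the caption):
        G = (1/I) * sum_i sqrt( K/(K-1) * sum_k (q_ik - 1/K)^2 )
    """
    if not memberships:
        raise ValueError("need at least one individual")
    k = len(memberships[0])
    if k < 2:
        raise ValueError("clusteredness is only defined for K >= 2")

    total = 0.0
    for row in memberships:
        if len(row) != k:
            raise ValueError("all individuals must have K membership fractions")
        # Squared distance from the uniform membership vector (1/K, ..., 1/K).
        squared_dist = sum((q - 1.0 / k) ** 2 for q in row)
        # The factor K/(K-1) rescales so that full membership in one cluster gives 1.
        total += math.sqrt(k / (k - 1) * squared_dist)
    return total / len(memberships)


# Toy data (hypothetical, not from the study): one fully assigned
# individual and one evenly admixed individual, with K = 3.
toy = [[1.0, 0.0, 0.0], [1 / 3, 1 / 3, 1 / 3]]
score = clusteredness(toy)
```

Under this definition, a score near 0.87 would indicate that most individuals have the bulk of their membership in a single cluster, while a score near 0.50 would indicate substantially more mixed membership.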

doi: https://doi.org/10.1371/journal.pgen.0010070.g002