Figure 1.

Average Correlation Coefficient between True and Predicted Membership of an Individual to a Particular Population or Continental Region, Using PCA and k-Means Clustering on All Available SNPs for a Given Geographic Region, and Sets of Ten to 200 PCA-Correlated, High-I_n or Random SNPs (Random Selection Was Repeated 30 Times)

The reported correlation coefficient is averaged over all populations in the respective geographic region or over the broad continental clusters.
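
The caption does not spell out the evaluation procedure, so the following is only a minimal sketch of one way such a number could be obtained: project genotypes onto the top principal components, cluster with k-means, and correlate each population's true membership indicator with the best-matching cluster indicator. The toy data, the function names, the farthest-point initialisation, the number of components, and the best-match rule are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def simulate_genotypes(rng, n_per_pop=20, n_snps=200, n_informative=50):
    """Toy data (assumption): two populations, 0/1/2 genotype counts."""
    freqs_a = np.full(n_snps, 0.5)
    freqs_b = np.full(n_snps, 0.5)
    freqs_a[:n_informative] = 0.1   # informative SNPs differ in frequency
    freqs_b[:n_informative] = 0.9
    pop_a = rng.binomial(2, freqs_a, size=(n_per_pop, n_snps))
    pop_b = rng.binomial(2, freqs_b, size=(n_per_pop, n_snps))
    genotypes = np.vstack([pop_a, pop_b]).astype(float)
    labels = np.repeat([0, 1], n_per_pop)
    return genotypes, labels

def pca_project(genotypes, n_components):
    """Project individuals (rows) onto the top principal components."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def kmeans(points, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-point start (assumption)."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def membership_correlation(true_labels, predicted_labels, population):
    """Best Pearson correlation between one population's true indicator
    and any predicted cluster indicator (matching rule is an assumption)."""
    truth = (true_labels == population).astype(float)
    best = 0.0
    for cluster in np.unique(predicted_labels):
        pred = (predicted_labels == cluster).astype(float)
        if truth.std() == 0 or pred.std() == 0:
            continue  # correlation undefined for a constant indicator
        best = max(best, np.corrcoef(truth, pred)[0, 1])
    return best

rng = np.random.default_rng(0)
genotypes, true_labels = simulate_genotypes(rng)
scores = pca_project(genotypes, n_components=1)
predicted = kmeans(scores, k=2)
average = np.mean([membership_correlation(true_labels, predicted, p) for p in (0, 1)])
print("average correlation over populations:", average)
```

Averaging the per-population values, as in the last lines, mirrors the averaging the caption describes; repeating the whole procedure on a subset of SNP columns would give the corresponding value for a selected panel.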

Figure 2.

Selecting PCA-Correlated SNPs for Intercontinental Clustering

(A) Raster plot of 255 subjects from four different continental regions with respect to 9,419 SNPs (red/green denotes homozygotic individuals and black denotes heterozygotic individuals).

(B) The scores p_j for each SNP. A red star indicates SNPs corresponding to one of the top 30 scores.

(C) Raster plot of the 255 subjects with respect to the top 30 PCA-correlated SNPs. Notice the patterns formed in the four continental blocks.

(D) Plot of the 255 subjects in the “optimal” 2-D space using the top 30 PCA-correlated SNPs.

(E) Raster plot of the 255 subjects with respect to the top 30 I_n SNPs. Notice that the blocks corresponding to Asia and Europe are slightly more entangled when compared to (C).
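
The caption does not define the per-SNP scores shown in panel (B). As a minimal sketch, assuming a score equal to the sum of squared loadings of each SNP on the top k right singular vectors of the centered genotype matrix (one common way to rank columns by their contribution to the leading principal components), the selection of top-scoring SNPs could look like this. The function names `toy_genotypes`, `pca_scores`, and `top_snps`, the toy data, and the scoring formula itself are hypothetical and are not taken from this page.

```python
import numpy as np

def toy_genotypes(rng, n_per_pop=20, n_snps=200, n_informative=50):
    """Toy data (assumption): two populations, 0/1/2 genotype counts.
    Only the first n_informative SNPs differ in allele frequency."""
    freqs_a = np.full(n_snps, 0.5)
    freqs_b = np.full(n_snps, 0.5)
    freqs_a[:n_informative] = 0.1
    freqs_b[:n_informative] = 0.9
    pop_a = rng.binomial(2, freqs_a, size=(n_per_pop, n_snps))
    pop_b = rng.binomial(2, freqs_b, size=(n_per_pop, n_snps))
    return np.vstack([pop_a, pop_b]).astype(float)

def pca_scores(genotypes, k):
    """Assumed score per SNP: sum of squared entries of that SNP's
    column in the top-k right singular vectors."""
    centered = genotypes - genotypes.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (vt[:k] ** 2).sum(axis=0)

def top_snps(scores, n_top):
    """Indices of the n_top highest-scoring SNPs, best first."""
    return np.argsort(scores)[::-1][:n_top]

rng = np.random.default_rng(1)
genotypes = toy_genotypes(rng)
scores = pca_scores(genotypes, k=1)
selected = top_snps(scores, n_top=30)
print("selected SNP indices:", sorted(selected.tolist()))
```

Under this assumed definition the scores sum to k, because the top k right singular vectors are orthonormal, so each score can be read as the share of the leading components attributable to one SNP.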

Figure 3.

Cross-Validation of Structure Informative SNPs Selected for Intercontinental Clustering

(A, B) Our worldwide sample was split into a 50% training set and a 50% test set. Average correlation coefficient between true and predicted membership of an individual to a continental region, using sets of (A) ten to 200 PCA-correlated SNPs or (B) ten to 200 high-I_n SNPs that were selected on the training set and then applied to the test set (results are averaged over 50 training/test set splits).

(C) Application of the SNP panels selected for intercontinental clustering in our worldwide sample to the HapMap populations (average correlation coefficient between true and predicted membership of an individual to one of three continents is shown).

Table 1.

Overlap between the Top 200 PCA-Correlated SNPs and the Top 200 I_n SNPs

Table 2.

Number of Pairs among the Top 200 PCA-Correlated (PCA-c.) and I_n SNPs That Are in High LD. A Total of 20,100 Pairs Were Tested in Each Region

Figure 4.

Analysis of 1.7 Million SNPs Typed on the HapMap Han Chinese and Japanese Populations (Available from the HapMap Database)

(A) Projection of all 90 Han Chinese and Japanese individuals on the top two principal components using PCA on all available SNPs.

(B) k-Means clustering on panel (A).

(C) Average correlation coefficient between true and predicted membership of an individual to the Japanese or Han Chinese populations, using PCA and k-means clustering on all available SNPs and sets of 50 to 1,000 PCA-correlated, high-I_n or random SNPs (random selection was repeated 30 times). The dotted line represents a decline in the performance of high-I_n SNPs due to the detection of a very large number of significant principal components; see Results for details.

Figure 5.

Analysis of Nine Indigenous Populations Typed for 9,160 SNPs

(A) Projection of all individuals of nine indigenous populations on the top three principal components using PCA on all available SNPs. (Ten significant principal components were actually detected.)

(B) Average correlation coefficient between true and predicted membership of the individuals to the nine populations, using PCA and k-means clustering on all available SNPs and sets of ten to 400 PCA-correlated, high-I_n or random SNPs (random selection was repeated 30 times).

Table 3.

Incremental Analysis of Nine Populations and Effect on the Selection of PCA-Correlated SNPs

Figure 6.

Applying PCA-Correlated SNPs for Structure and Ancestry Prediction of the Admixed Puerto-Rican Population

(A) PCA on 7,259 SNPs typed on Puerto Rican dataset A, as well as Europeans (Spanish and Caucasians), West Africans (Burunge), and Native Americans (Nahua and Quechua) (axes of variation are shown).

(B) Projection of 192 individuals from Puerto Rican dataset A on two significant principal components and variation across the European-West African axis.

(C) Comparison of ancestry coefficient of 192 Puerto Ricans across the West African-European axis and predicted ancestry coefficient using the top 200 PCA-correlated SNPs.

(D) Prediction of West African-European ancestry coefficient in Puerto Rican dataset A using PCA-correlated SNPs versus random SNPs.

(E) Using PCA-correlated SNPs selected as structure informative in Puerto Rican dataset A for ancestry prediction in Puerto Rican dataset B.
