Insights into the Genetic Structure and Diversity of 38 South Asian Indians from Deep Whole-Genome Sequencing

doi:10.1371/journal.pgen.1004377

Figure 1.

Size distribution and novelty of variants in SSIP.

Autosomal variants identified in the 36 SSIP samples, including single nucleotide polymorphisms (SNPs), small insertions/deletions (indels) of 2 to 50 bp, and large deletions of 51 bp to 1 Mb. SSIP SNPs and indels are defined as novel if they are absent from both SSMP and dbSNP137, whereas dbSNP132 was used to define the novelty of 1KGP SNPs and indels. The novelty of large deletions in SSIP and 1KGP is defined with respect to SSMP and DGV (release 2013-07-23).
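
The novelty rule above is a set-membership test against reference catalogues. The sketch below is a minimal illustration. It assumes that variants have already been normalized (left-aligned and split into biallelic records) and are matched on chromosome, position and alleles. The `Variant` type, the `classify_novelty` function and the toy records are hypothetical, and the study's actual matching criteria may differ.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical key for a normalized, biallelic variant."""
    chrom: str
    pos: int
    ref: str
    alt: str


def classify_novelty(calls: Iterable[Variant], *catalogues: Set[Variant]) -> Tuple[List[Variant], List[Variant]]:
    """Split calls into (known, novel).

    A call counts as novel only if it is absent from every catalogue passed in,
    e.g. SSMP and dbSNP137 for the SSIP SNPs and indels.
    """
    known: List[Variant] = []
    novel: List[Variant] = []
    for v in calls:
        if any(v in cat for cat in catalogues):
            known.append(v)
        else:
            novel.append(v)
    return known, novel


# Toy example with made-up records (not data from the study)
ssip_calls = [
    Variant("1", 1000, "A", "G"),
    Variant("1", 2000, "C", "T"),
    Variant("2", 500, "AT", "A"),
]
ssmp = {Variant("1", 1000, "A", "G")}
dbsnp137 = {Variant("2", 500, "AT", "A")}

known, novel = classify_novelty(ssip_calls, ssmp, dbsnp137)
print(len(known), len(novel))  # 2 1
```

Passing the catalogues as separate sets keeps the rule explicit. For the 1KGP comparison, the same function would be called with dbSNP132 instead.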

Table 1.

Summary of variants discovered in SSIP.

Figure 2.

Principal component analysis (PCA) of SSIP samples with 132 South Asians.

PCA of the 36 SSIP samples together with 132 South Asian samples from 25 well-defined Indian groups studied by Reich and colleagues [44], using 202,600 SNPs present in both datasets (panel A). Five groups (Great Andamanese, Onge, Nyshi, Aonaga and Siddi) were subsequently removed, leaving 104 samples from 20 Indian groups for a second PCA. In this second PCA, samples are colored first by group membership (panel B) and then by latitude of origin as North or South Indians (panel C; see Table S2 for the classification of North and South Indians). The color legend at the bottom left of the figure applies to panels A and B.
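
Panel A depends on first restricting both datasets to the SNPs they share and then running PCA. The following is a rough sketch only. It assumes both genotype matrices count the same allele as 0/1/2, and it uses a generic frequency-standardized PCA computed by singular value decomposition, which is not necessarily the software or normalization used in the study. `shared_snp_pca` and the random toy data are hypothetical.

```python
import numpy as np


def shared_snp_pca(geno_a, snps_a, geno_b, snps_b, n_components=2):
    """PCA on the SNPs common to two genotype panels.

    geno_a, geno_b: (samples x SNPs) arrays of 0/1/2 allele counts, assumed to
    count the same allele in both panels; snps_a, snps_b: SNP IDs for their columns.
    Returns (principal component scores, number of shared SNPs).
    """
    shared = sorted(set(snps_a) & set(snps_b))
    col_a = {s: i for i, s in enumerate(snps_a)}
    col_b = {s: i for i, s in enumerate(snps_b)}
    X = np.vstack([
        geno_a[:, [col_a[s] for s in shared]],
        geno_b[:, [col_b[s] for s in shared]],
    ]).astype(float)

    # Standardize each SNP by its allele frequency p: (g - 2p) / sqrt(2p(1 - p))
    p = X.mean(axis=0) / 2.0
    poly = (p > 0) & (p < 1)  # drop monomorphic SNPs, which have zero variance
    X, p = X[:, poly], p[poly]
    Z = (X - 2 * p) / np.sqrt(2 * p * (1 - p))

    # Left singular vectors scaled by singular values give the sample PC scores
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components], len(shared)


# Toy example with random genotypes (not data from the study)
rng = np.random.default_rng(0)
geno_a = rng.integers(0, 3, size=(6, 50))
geno_b = rng.integers(0, 3, size=(4, 40))
snps_a = [f"rs{i}" for i in range(50)]
snps_b = [f"rs{i}" for i in range(20, 60)]
pcs, n_shared = shared_snp_pca(geno_a, snps_a, geno_b, snps_b)
print(pcs.shape, n_shared)  # (10, 2) 30
```

The second PCA (panels B and C) would be the same computation repeated after the five excluded groups are dropped from the sample rows.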

Figure 3.

Principal component analysis (PCA) of 1,224 samples from 16 global populations.

PCA of 1,224 samples from SSIP, SSMP and 14 populations from Phase 1 of the 1000 Genomes Project (1KGP), color-coded by continent (panel A). Admixture among the 16 populations was also analyzed with ADMIXTURE, with the number of ancestral populations (K) varied from 2 to 8 (panel B). The black box highlights the position of the SSIP samples in the admixture plot.
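
For each K, ADMIXTURE fits two quantities. The first is the ancestry proportions Q, with one row per individual; each row is drawn as a stacked bar in panel B. The second is the ancestral allele frequencies F. Both are fitted by maximizing a binomial likelihood in which individual i's expected allele frequency at SNP j is the sum over k of q_ik f_kj. The sketch below only evaluates that log-likelihood for a given Q and F, dropping the constant binomial coefficient. It does not reproduce ADMIXTURE's optimizer, and `admixture_loglik` and the toy matrices are hypothetical.

```python
import numpy as np


def admixture_loglik(G, Q, F, eps=1e-12):
    """Log-likelihood of genotypes under the ADMIXTURE model, up to a constant.

    G: (N x M) genotypes as 0/1/2 allele counts
    Q: (N x K) ancestry proportions, each row summing to 1
    F: (K x M) allele frequencies in the K ancestral populations
    """
    P = Q @ F                      # expected allele frequency per individual and SNP
    P = np.clip(P, eps, 1 - eps)   # guard against log(0)
    return float(np.sum(G * np.log(P) + (2 - G) * np.log(1 - P)))


# Toy example: two individuals, two SNPs, K = 2 (made-up values)
G = np.array([[2, 0], [0, 2]])
F = np.array([[0.9, 0.1], [0.1, 0.9]])
Q_unmixed = np.array([[1.0, 0.0], [0.0, 1.0]])
Q_mixed = np.array([[0.5, 0.5], [0.5, 0.5]])
print(admixture_loglik(G, Q_unmixed, F) > admixture_loglik(G, Q_mixed, F))  # True
```

In this toy case, assigning each individual wholly to the ancestral population whose frequencies match its genotypes gives the higher likelihood, which is the kind of comparison the optimizer makes across candidate Q and F.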

Figure 4.

Unique SNP sharing between populations.

(A) Each row shows the distribution of SNPs shared uniquely between a reference population (vertical axis) and a target population (horizontal axis); the bars along the diagonal give the number of SNPs unique to the reference population. Here, a SNP is uniquely shared if it is present in the two respective populations and in no other population. (B) Distribution of SNPs in the reference population (horizontal axis) that are shared with only one other population, with the target populations grouped by continent into four broad categories: the Americas (AMR: CLM, MXL, PUR), Africans (AFR: ASW, LWK, YRI), Asians (ASN: CHB, CHS, JPT, SSMP, SSIP) and Europeans (EUR: CEU, FIN, GBR, IBS, TSI).
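
The counts in both panels can be derived by recording, for every SNP, the set of populations in which it is observed. The following is a minimal sketch. It assumes each population is represented as a set of SNP identifiers, and the function names, population-to-continent map and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def unique_sharing(pop_snps):
    """pop_snps: dict mapping population -> set of SNP IDs observed in it.

    Returns a Counter keyed by (pop,) for SNPs private to one population and by
    a sorted (popA, popB) tuple for SNPs found in exactly those two populations.
    """
    carriers = defaultdict(set)
    for pop, snps in pop_snps.items():
        for snp in snps:
            carriers[snp].add(pop)
    counts = Counter()
    for pops in carriers.values():
        if len(pops) <= 2:  # SNPs seen in 3+ populations are not uniquely shared
            counts[tuple(sorted(pops))] += 1
    return counts


def by_continent(counts, reference, continent_of):
    """Panel B style: SNPs of `reference` shared with exactly one other
    population, summed by the continent of that other population."""
    totals = Counter()
    for key, n in counts.items():
        if len(key) == 2 and reference in key:
            other = key[0] if key[1] == reference else key[1]
            totals[continent_of[other]] += n
    return totals


# Toy example with made-up SNP sets
pops = {
    "SSIP": {"s1", "s2", "s3", "s4"},
    "SSMP": {"s2", "s3", "s5"},
    "CEU": {"s3", "s4", "s6"},
}
counts = unique_sharing(pops)
print(counts[("SSIP",)], counts[("SSIP", "SSMP")], counts[("CEU", "SSIP")])  # 1 1 1
print(by_continent(counts, "SSIP", {"SSMP": "ASN", "CEU": "EUR"}))
```

In panel A, the row for a reference population R reads `counts[(R,)]` on the diagonal and the pair counts involving R elsewhere. Panel B sums those same pair counts by the target population's continent.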

Figure 5.

Assessing intra-population diversity between the samples.

The extent of SNP sharing between each pair of samples in a population is measured with a distance D scaled between 0 and 1 (vertical axis), where a higher value indicates greater heterogeneity in SNP content (that is, less SNP sharing) between the two samples. All pairwise values of D in each population are summarized in a boxplot: the whisker ends indicate the minimum and maximum distances between pairs of samples in that population, the edges of the box indicate the 1st and 3rd quartiles, and the horizontal line in the box marks the median pairwise distance. Populations are colored by continent (Americas: maroon; Africans: red; Asians: green; Europeans: blue). Each label on the horizontal axis gives the continent, population, number of samples and total number of sample pairs in that population.
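
The caption does not define D. Purely for illustration, the sketch below therefore assumes a Jaccard distance between two samples' SNP sets (1 minus shared SNPs divided by the union), which is also scaled from 0 to 1 and rises as sharing falls. It then computes the values a boxplot draws. The study's actual definition of D may differ, and `sharing_distance`, `pairwise_summary` and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import quantiles


def sharing_distance(a, b):
    """Hypothetical D: Jaccard distance between two samples' SNP sets
    (0 = identical SNP content, 1 = no SNPs in common)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_summary(samples):
    """samples: dict mapping sample ID -> set of SNPs (at least 3 samples).
    Returns the number of pairs and the five values drawn in a boxplot."""
    d = [sharing_distance(samples[x], samples[y])
         for x, y in combinations(sorted(samples), 2)]
    q1, med, q3 = quantiles(d, n=4, method="inclusive")
    return {"pairs": len(d), "min": min(d), "q1": q1,
            "median": med, "q3": q3, "max": max(d)}


# Toy example with made-up SNP sets
samples = {"A": {"s1", "s2", "s3"}, "B": {"s1", "s2", "s4"}, "C": {"s5"}}
print(pairwise_summary(samples))
```

For n samples, the number of pairs is n(n - 1)/2, which corresponds to the "total number of sample pairs" shown in each axis label. Quartile conventions also vary between plotting tools, so the box edges here may not match the figure exactly.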

Table 2.

Mitochondrial haplogroup assignment for the 36 SSIP samples.

Table 3.

Y-chromosome haplogroup assignment for the 11 male SSIP samples.

Table 4.

Analysis of admixture with ancient hominid genomes, anchored with one SSIP genome (SSI033 as G1 in the D statistic) and the chimpanzee genome.
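
Tables 4 and 5 rely on the D statistic. It compares two genomes, G1 and G2, against a third genome H3 (here an ancient hominid), using an outgroup (chimpanzee) to label alleles as ancestral or derived. The following is a minimal sketch using the common form (nABBA - nBABA) / (nABBA + nBABA). It assumes one allele per genome per site and omits the block-jackknife significance test. Sign conventions differ between tools; here a positive D means G2 shares more derived alleles with H3 than G1 does. `d_statistic` and the toy sequences are hypothetical.

```python
def d_statistic(g1, g2, h3, outgroup):
    """D = (nABBA - nBABA) / (nABBA + nBABA) for the tree (((G1, G2), H3), Outgroup).

    Each argument is a string or sequence with one allele per site (pseudo-haploid).
    The outgroup allele is taken as ancestral ('A'); H3 must carry the derived allele ('B').
    ABBA: G2 shares the derived allele with H3; BABA: G1 does.
    """
    abba = baba = 0
    for a1, a2, a3, o in zip(g1, g2, h3, outgroup):
        if len({a1, a2, a3, o}) != 2:  # keep biallelic sites only
            continue
        if a3 == o:                    # H3 carries the ancestral allele: uninformative
            continue
        if a1 == o and a2 == a3:
            abba += 1
        elif a2 == o and a1 == a3:
            baba += 1
    total = abba + baba
    return (abba - baba) / total if total else float("nan")


# Toy sites (made-up): two ABBA, one BABA, one BBBA and one invariant site
print(d_statistic("AAGGA", "GGAGA", "GGGGA", "AAAAA"))  # 0.3333333333333333
```

In Table 4's setup, g1 would be SSI033, h3 an ancient hominid genome and the outgroup the chimpanzee genome.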

Table 5.

D-statistic analysis with ancient genomes for 5 randomly selected paired samples from each 1KGP and SSMP population, anchored in each of the 5 iterations with a different SSIP sample (G1) and the chimpanzee genome.
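
Before any D statistic is computed, the design in Table 5 can be laid out as a list of (G1, G2) pairings. This is a hypothetical sketch. The caption does not say how the G1 samples were chosen, so here they are drawn at random without replacement, as are the 5 samples from each population. All sample and population IDs are illustrative placeholders.

```python
import random


def pairing_scheme(g1_pool, populations, n_iter=5, seed=None):
    """Hypothetical layout of the Table 5 design.

    g1_pool: list of SSIP sample IDs; populations: dict mapping each 1KGP/SSMP
    population to a list of its sample IDs. In iteration i, a different SSIP
    sample is G1 and the i-th randomly drawn sample of each population is G2.
    Returns (iteration, population, G1, G2) tuples.
    """
    rng = random.Random(seed)
    g1_per_iter = rng.sample(g1_pool, n_iter)  # distinct G1 for each iteration (assumed random)
    g2_draws = {pop: rng.sample(ids, n_iter) for pop, ids in populations.items()}
    return [
        (i, pop, g1, draws[i])
        for i, g1 in enumerate(g1_per_iter)
        for pop, draws in g2_draws.items()
    ]


# Placeholder IDs, not the study's sample names
ssip = [f"SSIP_{k}" for k in range(36)]
pops = {"CEU": [f"CEU_{k}" for k in range(20)], "YRI": [f"YRI_{k}" for k in range(20)]}
for row in pairing_scheme(ssip, pops, seed=1)[:4]:
    print(row)
```

Each resulting tuple would then be evaluated as D(G1, G2, ancient genome, chimpanzee). Fixing `seed` makes the random selection reproducible.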
