Fig 1.
Genomic region surrounding three ST8SIA2 promoter SNPs.
The 1.3-kb region used for the promoter assay and the 10-kb region in D63 are depicted in the two upper panels. The 54-kb and 18-kb regions in D1000 (double-headed black arrows) are depicted in the bottom panel, together with estimated mean recombination rates (cM/Mb). The 18-kb region is defined as the region between the two recombination hotspots flanking the three promoter SNPs. A # mark on the x-axis indicates the location of a recombination hotspot.
Fig 2.
Global distribution of the ST8SIA2 promoter types.
(A) Large pie charts show the proportion of promoter types in each population in D1000. African Caribbeans in Barbados (ACB) and Americans of African Ancestry in SW USA (ASW) are not shown because information about their homelands in Africa is lacking. Small pie charts represent individuals from 63 human samples (see S1 Table for details). (B) ADMIXTURE profile of the 18-kb region with K = 9 as the number of postulated ancestral populations [13] (see S4 Fig).
Fig 3.
Site frequency spectrum and barcode representation.
(A) Relative site frequency spectrum (rSFS), defined as the ratio of the observed to the expected proportion of each SFS class in the 54-kb region. The rSFS is calculated from D1000, with the expectation taken under the standard neutral model of constant population size. (B) Barcode representation of SNPs in the 54-kb region of EAS. The red dotted line shows the location of the three promoter SNPs. A red bar and a gray bar at an individual SNP site indicate the number of derived alleles linked to the CGC-type and the nonCGC-type, respectively. The unshaded area corresponds to the 18-kb core LD region. (C) Simulated distributions of Fc7 under the standard model of constant size (1,237 replications in gray) and the demographic model of changing population size (1,259 replications in magenta; [17]). The red vertical line represents the observed Fc7 in EAS.
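The rSFS in panel (A) can be illustrated with a minimal sketch, assuming polarized data (the derived-allele count is known at every segregating site) and a sample of n chromosomes. Under the standard neutral model of constant size, the expected proportion of segregating sites carrying i derived copies is (1/i)/a_n, where a_n = 1 + 1/2 + ... + 1/(n-1). The function name `relative_sfs` is hypothetical and is not taken from the original analysis.

```python
from collections import Counter


def relative_sfs(derived_counts, n):
    """Observed/expected SFS proportions for classes i = 1..n-1.

    derived_counts: derived-allele count at each segregating site (hypothetical input format).
    n: number of sampled chromosomes.
    """
    if any(not 1 <= c <= n - 1 for c in derived_counts):
        raise ValueError("segregating sites must carry 1..n-1 derived alleles")
    counts = Counter(derived_counts)          # number of sites in each SFS class
    s = len(derived_counts)                   # total number of segregating sites
    a_n = sum(1.0 / j for j in range(1, n))   # Watterson's normalizing constant
    rsfs = []
    for i in range(1, n):
        observed = counts.get(i, 0) / s       # observed proportion of class i
        expected = (1.0 / i) / a_n            # neutral expectation, constant size
        rsfs.append(observed / expected)
    return rsfs
```

A value above 1 in a class means that class is over-represented relative to the neutral expectation; with n = 4, an observed spectrum of 6 singletons, 3 doubletons and 2 tripletons matches the expectation exactly and gives ratios of 1.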
Fig 4.
Relative extended haplotype homozygosity (REHH) of the CGC-type in EAS, SAS and AMR.
REHH values are plotted against core allele frequencies. (A–C) Observed REHH values (magenta dots) and empirical distributions on chromosome 15 for EAS, SAS, and AMR. For a given frequency at a core SNP, red lines indicate the 95th percentile. Each inset depicts the empirical distribution of standardized ln(REHH) for genome-wide SNPs with allele frequencies comparable to the CGC-type (chromosomes 3–5 and 7–22). The magenta lines indicate the observed values. (D) The observed REHH value for EAS (magenta dot) together with the simulated distribution under the demographic model of changing population size [17] (10,000 replications). The inset shows the standardized ln(REHH) distribution for SNPs with derived allele frequencies comparable to the CGC-type. This simulation is based on 1,200 replications (the observed value is indicated by the magenta line). (E) Observed REHH values for EAS, SAS, and AMR (magenta dots) together with the distributions simulated by ms under the standard neutral model (10,000 replications). (F) Decay of EHH from SNP3 (at position 0) in EAS.
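The EHH and REHH quantities plotted above can be sketched as follows, under stated assumptions: haplotypes are strings of 0/1 alleles, the CGC-type chromosomes form one core class and all nonCGC-type chromosomes are pooled into a single second class, and EHH at a marker is the probability that two randomly chosen chromosomes of a class are identical over all SNPs between the core and that marker. REHH is then EHH of the core class divided by EHH of the other class. The standardization used for the insets is assumed here to be a z-score of ln(REHH) against SNPs in the same frequency bin. All function names are hypothetical.

```python
import statistics
from collections import Counter
from math import comb, log


def ehh(haplotypes, core, pos):
    """EHH from the core SNP index to marker index `pos` within one core class."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    # Compare only the markers beyond the core, so EHH equals 1 at the core itself.
    if pos >= core:
        segments = [h[core + 1:pos + 1] for h in haplotypes]
    else:
        segments = [h[pos:core] for h in haplotypes]
    groups = Counter(segments)  # identical extended haplotypes
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)


def rehh(core_haps, other_haps, core, pos):
    """REHH = EHH of the core class / EHH of the pooled other class (may divide by zero)."""
    return ehh(core_haps, core, pos) / ehh(other_haps, core, pos)


def standardized_ln_rehh(observed, empirical):
    """z-score of ln(observed REHH) against ln(REHH) of frequency-matched SNPs."""
    logs = [log(x) for x in empirical]
    return (log(observed) - statistics.mean(logs)) / statistics.stdev(logs)
```

For example, three core haplotypes `"0000"`, `"0000"` and `"0011"` give EHH = 1 at the core and 1/3 at the last marker, because only one of the three pairs remains identical across the whole span.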
Fig 5.
Ancestral recombination graph in the human ST8SIA2 promoter region.
Ancestral recombination events in the 10-kb core LD region are inferred by comparing tree topologies between two neighboring blocks, 1 and 2 (S6 Fig and S7 Table). The gene trees for blocks 1 and 2 are drawn together with the required recombination events (orange lines). Dots on branches represent SNPs in block 2; the five SNPs shown as red dots are present in D63 but absent in D1000. The count of each block-2 haplotype is summarized together with its geographic distribution in D63 and D1000.
Fig 6.
Promoter activities of the human, chimpanzee and gorilla ST8SIA2 promoter types measured by luciferase expression.
Each value represents the mean ± standard error of the mean over six independent transfection experiments. Data are presented as fold-increase relative to the human CGC-type. Chimpanzee (Patr) and gorilla (Gogo) possess the TGT-type promoter.
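The summary statistics in this figure can be reproduced with a short sketch. It assumes, since the caption does not specify the normalization, that each luciferase reading for a construct is divided by the mean reading of the human CGC-type before the mean and the standard error of the mean (sample standard deviation divided by the square root of the number of experiments) are taken. The function name `fold_change_summary` is hypothetical.

```python
import statistics
from math import sqrt


def fold_change_summary(values, reference):
    """Mean and SEM of fold-increase relative to the reference (CGC-type) mean.

    Assumed normalization: every replicate is divided by the reference mean.
    """
    ref_mean = statistics.mean(reference)
    folds = [v / ref_mean for v in values]
    mean = statistics.mean(folds)
    sem = statistics.stdev(folds) / sqrt(len(folds))  # SEM over replicates
    return mean, sem
```

With six replicates per construct, as in the caption, `len(folds)` is 6; normalizing within each transfection experiment instead of by the pooled reference mean would be an equally plausible alternative.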