Potential Signals of Natural Selection in the Top Risk Loci for Coronary Artery Disease: 9p21 and 10q11

Background
Coronary artery disease (CAD) is a complex disease and the leading cause of death in the world. Populations of different ancestry do not always share the same risk markers. Natural selective processes may be the cause of some of the population differences detected for specific risk mutations.

Objective
In this study, 384 single nucleotide polymorphisms (SNPs) located in four genomic regions associated with CAD (1p13, 1q41, 9p21 and 10q11) are analysed in a set of 19 populations from Europe, the Middle East and North Africa, and also in Asian and African samples from the 1000 Genomes Project. The aim of this survey is to explore for the first time whether the genetic variability in these genomic regions is better explained by demography or by natural selection.

Results
The results indicate significant differences in the structure of genetic variation and in the LD patterns among populations that probably explain the population disparities found in markers of susceptibility to CAD.

Conclusions
The results are consistent with a potential signature of positive selection in the 9p21 region and of balancing selection in the 9p21 and 10q11 regions. Specifically, in Europe three CAD risk markers in the 9p21 region (rs9632884, rs1537371 and rs1333042) show consistent signals of positive selection. The results of this study are consistent with a potential selective role of CAD in the configuration of genetic diversity in current human populations.


Introduction
Coronary artery disease (CAD) is a complex disease and the main cause of death in the world. In 2007, the first genome-wide significant risk region, 9p21, was simultaneously discovered by two independent groups [1,2] and was subsequently confirmed by multiple investigators around the world. Currently, there are a total of 50 variants predisposing to CAD of genomewide significance confirmed in independent populations [3]. Out of these 50 risk loci, 15 are associated with conventional risk factors for CAD: seven with low-density lipoprotein-cholesterol (LDL-C); one with high-density lipoprotein (HDL); two with triglycerides; four with hypertension, and one with coronary thrombosis. The remaining 35 variants operate through mechanisms yet to be determined [3].
Most genome wide association studies (GWAS) for CAD have been performed on populations of European descent, but increasing numbers of such investigations are now performed also on populations of Asian or African ancestry. In the literature currently available, some of the genetic effects observed are shared among ethnic groups but many times genetic effects are different across populations of different ancestry. The latter is the case of the 9p21 locus, which has been documented in Europeans and also in other ethnic groups including East and South Asians, but not among African Americans [4]. Demography and selection are the most important evolutionary forces that shape the population-specific patterns of mutations and linkage disequilibrium (LD) identified across populations of different ancestry. Demographic phenomena such as population expansions, subdivisions and bottleneck events could underlie differences among populations in a uniform way across genes. On the other hand, there are evolutionary processes with locus-specific effects. The genetic basis of common complex diseases may have partially been shaped by positive selection events, which simultaneously increased fitness and susceptibility to the disease. An additional difficulty to detect the role of selection is that methods to detect it have historically been challenged by the confounding effects of demography [5]. In any case, examples of positive selection have been identified in complex diseases such as type 1 diabetes, rheumatoid arthritis, or Crohn's disease [6].
Concerning CAD risk regions, in the current literature there are discrepancies regarding the role of natural selection. A recent survey proposed natural selection as a possible explanation for the observed differences between Africa, East Asia, America, and Oceania in risk allele frequencies (RAFs) for eight CAD risk single nucleotide polymorphisms (SNPs) [7]. Another study reported evidence for selection at several SNPs identified through GWAS on sets of genes implicated in cardiovascular diseases [8]. In addition, Soranzo et al. identified one haplotype in a region of long-range LD (12q24) that contains disease loci for CAD, hypertension and type I diabetes, and that recently spread by positive selection in Europeans [9]. On the contrary, a recent survey affirmed that genetic differences for CAD among world-wide populations are due to random and demographic processes [6]. However, the conclusions of this last paper were not demonstrated through the use of different methodological approaches. Indeed, the authors used two tests (iHS and LRH) exclusively based on the same class of analytical method (LD decay) to search for signals of selection in CAD risk loci. Despite all the studies performed so far, the genetic architecture of CAD is not completely understood yet. In CAD, as in any complex trait, risk variants at many different loci may contribute to the phenotype each with a small effect. By combining evidence from GWAS with evidence from selection scans, it may be possible to separate true causative regions from the background noise inherent in GWAS [8]. The role that natural selection plays in CAD population differentiation may be a good tool to improve the knowledge of this complex trait.
Population differences in risk loci associated with CAD were recently detected for the 1p13, 1q41, 9p21, and 10q11 regions. These genomic regions showed disparities in the specific risk markers associated with CAD between European and North African samples [10].
The present work is centred on these four genetic regions (1p13, 1q41, 9p21 and 10q11) to explore whether the population genetic diversity is better explained by demographic or by natural selection effects.
To do this, we have analysed these risk regions in a set of North European and Mediterranean populations. Several specific tests to detect positive and/or balancing selection were performed. In addition, these analyses were expanded to populations from Asia and Africa looking for signatures of selection shared across continents or belonging to a specific population group.

Ethics statement
The study has been specifically approved by the Ethical Committee of the University of Barcelona (Institutional Review Board: IRB00003099) and all the participants provided a written informed consent.

Sample description
Nineteen population samples from Europe, Middle East and North Africa were genetically tested. DNA samples of 868 healthy unrelated individuals of both sexes having their relatives for at least three generations born in the same geographical region were analyzed.
Population details of all samples, including sample size, geographic origin and coordinates are recorded in S1 Table. The geographic distribution of the analyzed samples is displayed in Fig 1. Europe is represented by samples from Poland, Spain (4 populations), France (2 populations), Italy (2 populations), Bosnia-Herzegovina, Greece, and Turkey. The Middle East is represented by two populations from Jordan and the North African area is represented by samples from Tunisia, Morocco (2 populations), Algeria and Libya. Genetic data from European (CEU, GBR, FIN, and TSI), Sub-Saharan African (LWK and YRI) and Asian (JPT and CHB) samples from the 1000 Genomes Project [11] were also included in the analyses (S1 Table).

Polymorphisms and genotyping
Genomic DNA was extracted from blood cells using a Blood Midi kit (Omega Biotek, USA) according to the manufacturer's procedures. DNA samples were genotyped for a combined set of 384 SNPs using a Custom GoldenGate Panel (Illumina Inc., San Diego, CA). These polymorphisms were located in four loci previously associated with CAD and replicated in independent studies [12,3,13,14,15], specifically the 1p13 (61 SNPs), 1q41 (38 SNPs), 9p21 (159 SNPs), and 10q11 (126 SNPs) chromosomal regions.
SNPs were selected as a representative set of the common variation in the four genomic regions, according to the following criteria: i) average coverage of 1 SNP every 1.5 kb, ii) minor allele frequency (MAF) higher than 0.05 in CEU and TSI HapMap populations, iii) priority given to markers not in strong linkage disequilibrium (LD) (r² < 0.8) in European populations, and iv) priority given to markers previously associated with CAD [4]. These criteria were applied giving preference to tag SNPs. The genomic location of the genetic variants is shown in S2 Table. In addition, all the variation concerning these four genomic regions reported in the 1000 Genomes Project was used to perform three specific selection analyses: Tajima's D, Long Range Haplotype (LRH) test and FST tests.

Data Cleaning and Quality Control
Genotyping rate per SNP and individual, cryptic relatedness, and LD pruning (r² ≥ 0.8) were assessed using PLINK version 1.07 [16]. Individuals with more than 5% of missing genotypes were eliminated. Non-polymorphic SNPs and SNPs with a genotyping rate lower than 0.95 were also removed from the analyses.
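As an illustration of the filters described above, the following Python sketch applies the same thresholds to a small in-memory genotype matrix. It is not PLINK's implementation: the 0/1/2/None genotype coding, the function names and the order in which the two filters are applied are assumptions made for the example.

```python
def missing_rate(genotypes):
    """Fraction of missing calls (None) in a list of genotypes."""
    return sum(g is None for g in genotypes) / len(genotypes)


def quality_control(matrix, max_ind_missing=0.05, max_snp_missing=0.05):
    """Filter a genotype matrix (rows = individuals, columns = SNPs).

    Genotypes are coded as counts of one allele (0, 1 or 2), None = missing.
    Returns (kept_individual_indices, kept_snp_indices).
    Assumption: individuals are filtered first, then SNPs.
    """
    # Step 1: drop individuals with more than 5% missing genotypes.
    kept_inds = [i for i, row in enumerate(matrix)
                 if missing_rate(row) <= max_ind_missing]

    # Step 2: drop non-polymorphic SNPs and SNPs with call rate below 0.95.
    kept_snps = []
    for j in range(len(matrix[0])):
        column = [matrix[i][j] for i in kept_inds]
        called = [g for g in column if g is not None]
        allele_count = sum(called)
        # Polymorphic means both alleles are observed at least once.
        polymorphic = 0 < allele_count < 2 * len(called)
        if polymorphic and missing_rate(column) <= max_snp_missing:
            kept_snps.append(j)
    return kept_inds, kept_snps


example = [
    [0, 1, 2, 0],
    [1, 1, 0, 0],
    [None, None, 1, 0],  # 50% missing: removed
    [0, 1, 0, 0],
]
kept_individuals, kept_markers = quality_control(example)
```

In this toy matrix the third individual is dropped for missingness and the last SNP is dropped because it is monomorphic among the retained individuals.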
Population allele frequencies and heterozygosities were calculated using the PopGenKit R software package [17]. Fit to Hardy-Weinberg equilibrium in each population was tested by means of Arlequin v3.5 software [18].
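For readers unfamiliar with these quantities, the sketch below shows how an allele frequency, the expected heterozygosity and a Hardy-Weinberg test can be obtained from genotype counts at one biallelic SNP. It uses a simple one-degree-of-freedom chi-square goodness-of-fit test, which is an assumption chosen for brevity and is not the procedure implemented in Arlequin; the function name and return layout are likewise hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square Hardy-Weinberg test for one biallelic SNP.

    Inputs are the observed counts of the three genotypes.
    Returns (allele_freq_a, expected_heterozygosity, chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p, 2 * p * q, chi2, p_value


in_equilibrium = hwe_chi_square(25, 50, 25)
no_heterozygotes = hwe_chi_square(50, 0, 50)
```

With many SNPs and populations, the resulting p-values would still need a multiple-testing correction such as the Bonferroni correction mentioned in the Results.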

Population Structure Analyses
Genetic structure was assessed by molecular variance (AMOVA) using Wright's F-statistics in the European and Mediterranean area and, in a broader context, in the European, African and Asian continents. Populations were clustered according to geographic criteria. Middle Eastern samples were clustered into the South European group due to affinities in their genetic distances. The samples were clustered in: 1) North Europe, South Europe, and North Africa, and 2) North Europe, South Europe, North Africa, Sub-Saharan Africa and Asia. AMOVAs were carried out using Arlequin v3.5 [18].
Genetic relationships among populations were assessed by Reynolds genetic distance and by a principal component (PC) analysis calculated through the Adegenet [19] and FactoMineR R packages [20].
The LD blocks based on the confidence interval of the LD measure D' [21] were estimated in the four genomic regions using Haplot software [22]. These analyses were performed twice: first, considering all markers in all samples and, second, taking into account only the chromosomal fragments where markers previously associated in Europeans [21,13,22], Africans [10,23,24] and Asians [25] are located. Permutation procedures to obtain a Monte Carlo statistical significance were used for each chromosomal region in order to evaluate population differences in LD structure through the VarLD software version 1.0 [26]. This analysis was carried out using 150 randomly selected individuals for each population group (North Europe, South Europe, North Africa, Sub-Saharan Africa, and Asia) to avoid biases due to differences in sample size. The LD statistics (D' and r²) for each pair of risk SNPs in each population analysed were calculated through the Haploview software version 4.2 [27].
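The two pairwise LD statistics used throughout (D' and r²) can be illustrated with a short Python sketch. It assumes that phased haplotypes are already available as pairs of 0/1 allele codes, which is a simplification: tools such as Haploview estimate haplotype frequencies from unphased genotypes. The function name and input format are hypothetical.

```python
def pairwise_ld(haplotypes):
    """Compute (D', r2) between two biallelic SNPs.

    `haplotypes` is a list of (a, b) tuples, where a and b are 0/1 allele
    codes at the first and second SNP on the same chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n        # allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n        # allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("LD is undefined when a SNP is monomorphic")

    d = p_ab - p_a * p_b                           # raw disequilibrium
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


complete = pairwise_ld([(1, 1)] * 5 + [(0, 0)] * 5)
independent = pairwise_ld([(1, 1), (1, 0), (0, 1), (0, 0)])
partial = pairwise_ld([(1, 1)] * 4 + [(1, 0)] * 2 + [(0, 0)] * 4)
```

The third example shows why both statistics are reported: D' reaches 1 whenever one of the four haplotypes is absent, while r² stays well below 1 unless the allele frequencies at the two SNPs also match.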

Selection Analyses
Detection of potential loci under selection was assessed using different methods. The first method used is based on the probability of observing locus-by-locus AMOVA statistics as a function of heterozygosity, given a null distribution generated under a hierarchically-structured island model of population differentiation [28]. This test detects loci under selection from genome scans that contrast patterns of genetic diversity within and between populations. A total of 20000 coalescent simulations in 50 groups, with 100 simulated demes per group, were performed. The observed locus-specific measures of population differentiation (FST) were compared to a null distribution obtained by simulation. The P-value of each locus was estimated from the joint distribution of heterozygosity and FST using a kernel density estimation procedure. The null distribution generated was summarized by quantiles of the joint distribution. The 1% and 99% quantiles of the distribution correspond to markers potentially under balancing or directional selection, respectively, at the 1% level, without multiple-test correction and assuming a one-tailed test. This method was performed using Arlequin v3.5 software [18].
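The logic of this outlier test can be sketched in a few lines of Python under strong simplifying assumptions. The example computes a per-locus FST as (HT - HS)/HT from unweighted population allele frequencies and places it in the tails of a user-supplied list of null FST values. It does not reproduce the AMOVA-based estimator, the coalescent simulations or the kernel density step on the joint heterozygosity/FST distribution used by Arlequin; all names and the example numbers are hypothetical.

```python
def locus_fst(freqs):
    """Return (total_heterozygosity, fst) for one biallelic locus.

    `freqs` holds the frequency of one allele in each population.
    Populations are weighted equally (a simplifying assumption).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)
    h_within = sum(2 * p * (1 - p) for p in freqs) / k
    if h_total == 0:
        return 0.0, 0.0  # monomorphic everywhere: no differentiation
    return h_total, (h_total - h_within) / h_total


def empirical_tails(observed, null_values):
    """Fractions of null values at or below, and at or above, `observed`.

    A small lower fraction points to unusually low differentiation
    (candidate balancing selection); a small upper fraction points to
    unusually high differentiation (candidate directional selection).
    """
    n = len(null_values)
    lower = sum(v <= observed for v in null_values) / n
    upper = sum(v >= observed for v in null_values) / n
    return lower, upper


identical = locus_fst([0.5, 0.5])
fixed_difference = locus_fst([0.0, 1.0])
moderate = locus_fst([0.2, 0.8])
tails = empirical_tails(0.9, [0.1, 0.2, 0.3, 0.95])
```

In practice the null values would come from simulations matched for heterozygosity, and a locus would be flagged only if it fell beyond the 1% or 99% quantile.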
Moreover, the spatial ancestry (SPA) analysis was used to detect positive signatures of selection [29]. In this analysis, SNP data were used to model allele frequency distributions in a geographic space. Applying the SPA approach, polymorphisms showing steep geographic gradients in allele frequencies can be identified through SPA scores reflecting the steepness of the geographic gradient. Large SPA scores are indicators of potential selection. In this study we focused on SPA scores above the 99th percentile, based on the 367 SNPs analysed. In order to detect continental adaptation events, analyses were performed only in Europe and North Africa. Asian and Sub-Saharan African samples were not tested for the SPA scores due to the low number of available populations for these geographic regions.
Spatial distribution of mean MAF for one of the three CAD SNPs potentially positively selected across populations was mapped using the geostatistical method known as kriging in the ArcGIS software (ESRI, Redlands, CA, USA). Since anisotropy was not detected in the semivariogram, we used the ordinary spherical interpolation kriging method [30]. Finally, Tajima's D [31], the extended haplotype homozygosity (EHH) and the cross-continental FST tests were conducted in three populations of the larger dataset of the 1000 Genomes Project (CEU, CHB, and YRI) through the web tool 1000 Genome Selection Browser 1.0 [32]. The Tajima's D test was performed using windows of 3 kb to search for signatures of balancing selection. The EHH test was applied to validate loci under positive selection found in previous tests, using a larger chromosome region of 20 Mb. The significance of the three tests was assessed by the rank score tracks provided by the same web tool [32], which compare each score to the rest of the genome. Rank scores were calculated empirically by sorting all the scores genome-wide and taking the -log10 of the rank divided by the number of values in the distribution, using the upper tail for the FST and the EHH tests, and the lower tail for the Tajima's D test. The p-values were assigned on the basis of the rank score tracks. In this study, Tajima's D, EHH average and FST scores above the 95th percentile of the top extreme genome-wide distribution were considered as significant. All p-values were corrected for multiple comparisons applying the false discovery rate (FDR) method [33] by means of the stats R package [17].
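The rank score and the FDR correction described above can be illustrated with the following Python sketch. The rank score follows the verbal definition in the text (the -log10 of rank divided by the number of values); the FDR step is assumed here to be the Benjamini-Hochberg procedure, which is the common reading of "the FDR method" but is not stated explicitly. Function names are hypothetical, ties receive arbitrary distinct ranks, and the code is not the Selection Browser's or R's implementation.

```python
import math


def rank_scores(scores, upper_tail=True):
    """Empirical rank score, -log10(rank / n), for each value.

    With upper_tail=True the largest value gets rank 1 (as for FST and
    EHH); with upper_tail=False the smallest value gets rank 1.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=upper_tail)
    result = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        result[idx] = -math.log10(rank / n)
    return result


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    # Walk from the largest to the smallest p-value, keeping a running
    # minimum so that adjusted values stay monotone.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # rank of this p-value in ascending order
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


scores_example = rank_scores([0.1, 0.9, 0.5, 0.3])
adjusted_example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A rank score of 1.3 corresponds to the top 5% of the genome-wide distribution, since -log10(0.05) is approximately 1.3.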

Genotyping and quality control
After pruning, the average coverage was 1 SNP every 1.8 kb. Genotyping rate for the 384 SNPs initially tested was 95.8%. Sixteen SNPs were not successfully genotyped and were removed from the study. One SNP showed a significant departure from Hardy-Weinberg equilibrium after Bonferroni correction. Thus, a total of 367 markers were included in the analyses after quality control: 59 SNPs in 1p13 (S1 Fig) [...]. Moreover, twenty individuals were removed for low genotyping rate (>5% of missing genotypes); consequently, 848 individuals were analysed. Individual genotypes are included in S1 Dataset. For the PC and Reynolds genetic distance analyses, a total of 176 SNPs were removed due to high LD (r² > 0.8); hence, only 190 SNPs were included in the population relationship analyses.

Levels of diversity and population relationships
The SNP minor allele frequencies (MAF) in population samples and global heterozygosities per marker and population are presented in S2 Table. Mean heterozygosities per SNP were moderate-high, ranging from 0.038 to 0.495. Per population, heterozygosities ranged from 0.259 to 0.345.
Hierarchical AMOVA estimates when populations were clustered in North Europe, South Europe, North Africa, Sub-Saharan Africa and Asia were FST = 0.085 and FCT = 0.080 (both p<0.0001). When Sub-Saharan African and Asian data were removed and the remaining populations were grouped in North Europe, South Europe and North Africa, AMOVA estimates were FST = 0.017 and FCT = 0.012 (both p<0.0001). These values indicate a statistically significant geographic structure of genetic variation in the three continents, and also across Europe and North Africa.
Population genetic distances are indicated in S3 Table. The highest average genetic distance was between North African and European samples (0.170±0.031). The Middle Eastern samples were more distant from North Africa (0.156±0.032) than from Europe (0.141±0.023).
Concerning population relationships, in the PC analysis (Fig 2) of the European, North African and Middle East populations, the two first axes accounted for 38.31% of the total genetic variance. In the first axis (24.99% of the total variance) North African samples appeared clearly separated from the European ones. The second component underlined the separation of the Basque Country on one side, and the Middle Eastern samples on the other.

LD analyses
Different LD blocks were identified among populations. At first glance, Asian and Sub-Saharan African samples showed different and characteristic LD patterns when compared with other continental populations (S5-S8 Figs). Significant differences in LD structure for the 4 genomic regions were observed in pairwise comparisons between Europe, Africa, and Asia (Table 1). A more detailed analysis within continents indicated that only the 9p21 region showed significantly different LD patterns in all subcontinental groups. Between North and South Europe the LD blocks were not significantly different in the regions 1p13, 1q41 and 10q11.
Focusing only on SNPs previously associated with CAD, LD statistics (D' and r²) for each pair of risk SNPs and for each analysed population are provided in S4 Table. Risk SNPs in the same haplotype block (red colour in S4 Table) showed a high degree of variation in LD (0.35 ≤ D' ≤ 1) in the populations analysed. The haplotype block information for these markers is provided in S5-S7 Tables for the 1p13, 9p21 and 10q11 regions, respectively.
The most remarkable differences in LD block structure were present in the 9p21 (Fig 3) and 10q11 regions (Fig 4). It is interesting to note the different positions of CAD-associated markers in Europe and North Africa. In the 9p21 region, markers associated in Europe (blue) and in Caucasians together with African Americans or Chinese (black) lay at the beginning of the region, whereas markers associated in North Africa (red) were located at the end (Fig 3). In this region, the risk SNPs rs9632884, rs6475606, rs4977574, rs1537371, rs1333042, and [...] Table). On the other hand, in the Sub-Saharan African and Asian samples LD values covered the range 0.091 ≤ D' ≤ 1 (grey in S4 Table). These SNPs were located in different haplotype blocks except in Tuscany and Andalusia (S6 Table).
In the 10q11 genomic region, the three risk SNPs analysed (Fig 4) presented LD values in the range 0.772 ≤ D' ≤ 1 in all the populations studied except in Sub-Saharan African and Asian samples (0.035 ≤ D' ≤ 1) (S4 Table). These three risk markers were located in the same haplotype block in all the populations analysed except in Yoruba samples (S7 Table). Regarding the 1p13 region (S9 Fig), the two risk markers, rs599839 and rs646776, presented high LD values in all the populations studied (D' > 0.888) (S4 Table). These two risk markers were located in the same haplotype block except in African populations (S5 Table).
The 1q41 genomic region showed previous associations with CAD only in samples of European origin and not in North Africans [10]. The most representative associated SNP in this region was the rs17465637 [3] (S10 Fig).

Detection of potential signals of selection
The results of the locus-by-locus FST statistics and their significance as potential indicators of selection are presented in S8 Table. In the European/Mediterranean area, 16 loci were [...] The potentially positively selected markers found in the above analyses (FST vs heterozygosity) totalled 14 and were located in regions 1p13, 1q41, and 9p21. Out of these 14 SNPs, seven were found only in the context of the three continents, six only in Europe and North Africa, and one in both groups (rs2811717 in the 9p21 region) (Fig 5). Fifteen markers located in chromosomes 9 and 10 were potentially under balancing selection: six in the context of the three continents, eight in Europe and North Africa and one in common (rs17155733 in the 10q11 region). The average allele frequency of these markers in the analysed populations is 0.50±0.35 (S2 Table). Consequently, the possible influence of negative selection can be discarded.
In the SPA analysis, three SNPs showed SPA scores above the 99th percentile in each of the two sample groups: North Africa (SPA value at the 99th percentile: 0.87) and Europe (1.95) (S8 Table). None of the three North African markers showed association with CAD in previous studies or was identified as being under potential selective pressure in the previous tests. On the contrary, all three markers with the highest SPA scores in Europe (rs9632884, rs1537371, and rs1333042) were among the SNPs that showed signs of positive selection in Europe and North Africa in the FST/HO tests. These SNPs have also been associated with CAD in previous studies [34,35,36] and are located in the 9p21 genomic region, the most studied CAD risk locus [37]. These three risk markers are in strong LD in most populations analysed (S4 Table). The geographical distribution of the MAF of one of these three SNPs, rs1333042, in the populations studied is represented in a smoothed spherical contour map in [...] Table). After applying correction for multiple testing, seven markers with extremely high FST values in the 9p21 region maintained their significance (red in S9 Table). These markers were located in an 11-kb region (chr9: 21932366-21933125) and are not included in the 367 SNPs analysed in the 26 populations. Comparing the FST estimated in the 26 populations analysed with the 1000 Genomes Browser data, the common signals detected for rs2811717 in the 9p21 region are noteworthy (S8 Table). For this SNP, the difference in cross-continental FST values between the 26 populations studied and the 1000 Genomes Browser data (0.29 vs. 0.65, respectively) is likely due to the different sets of populations included in each group (S8 Table).
Regarding the 1p13 region, the FST values of the three common markers detected in both analyses (rs602633, rs583104, and rs599839) differ markedly between the 1000 Genomes data for CEU, CHB and YRI (0.57-0.59) and the 26 populations analysed (0.27-0.29). In this case too, the difference is likely due to the different populations included in the analysis, mainly in the African continent (YRI alone versus 5 North African and 2 Sub-Saharan African populations together). The EHH average test, used to validate the results found in the 9p21 region, showed potential signals of positive selection (p ≤ 0.05) in the three populations analysed (grey in S10 Table). Notably, the CAD risk marker rs1333042, which showed potential signals of positive selection in the previous tests performed (FST vs HO and SPA score), is located in one of the 3-kb regions that show significant EHH values (region highlighted in grey in S10 Table). However, no EHH average showed significant signals of positive selection after correcting for multiple comparisons. Finally, concerning potential signatures of balancing selection, the Tajima's D analysis pointed out ten, five, and fourteen windows with Tajima's D values above the 95th percentile (p ≤ 0.05) of genome-wide distributions in CEU (9p21), CHB (10q11) and YRI (10q11), respectively (S8 Table). Two overlapping regions were found: one shared between Asian and Sub-Saharan African samples in 10q11, spanning 15 kb, and a second one across European samples in 9p21, spanning 24 kb (S8 Table). However, after applying corrections for multiple comparisons, no windows showed significant signals of balancing selection.
The most relevant findings of positive selection regarding the 1p13, 1q41, and 9p21 genomic regions, and of balancing selection in the 9p21 and 10q11 regions, are summarized in Tables 2 and 3.

Discussion
This study explores for the first time the genetic diversity in four genomic regions associated with CAD (1p13, 1q41, 9p21, and 10q11) through the genotyping of a set of European and Mediterranean populations and using European, Asian and Sub-Saharan African data from the 1000 Genomes Project [11]. The main aim was to evaluate the relative importance of demography and natural selection as a cause of population variability in some CAD risk regions identified in previous GWAS.
Population diversity indices indicate significant genetic structure in the context of the three continents (FST = 0.085; p<0.0001) and also in Europe and in the Mediterranean area (FST = 0.017; p<0.0001). The degree of genetic diversity observed in these four regions between continents is comparable to that obtained from genome-wide genotyping and sequencing studies [11,38]. Moreover, in the European and Mediterranean context, these regions of epidemiological importance exhibit the same genetic variation pattern described using sets of neutral markers (e.g. Alu insertion polymorphisms) and genome-wide arrays in previous population studies [39,40,41,42]. Regarding the LD haplotype blocks, this survey pointed to high levels of variation in LD and sometimes to a different chromosomal location of CAD risk markers in populations of different ancestry. The significant differences in LD structure found across European, North African, Asian, and Sub-Saharan African samples likely explain the differences among populations in the markers associated with CAD. These LD differences can be explained by the previously observed trend of lower haplotype diversity and higher LD values as the geographic distance from Africa increases [43], in agreement with the Recent-African-Origin hypothesis [44].
Even though the general patterns of genetic variation are in agreement with the genetic structure generated by demographic processes, we also observed potential signals of positive and balancing selection in some specific markers and haplotypes. Seven SNPs under potential positive selection showed association with diseases in previous studies [10,24,34,35,36,45,46,47], whereas none of the markers under balancing selection was identified as a risk factor in previous GWAS. Regarding the seven risk markers under positive selection, three of them are located in the 1p13 region and four in 9p21. In the 1p13 region, two of the three markers (rs602633 and rs583104) were associated with LDL-C [45,46]. The other positively selected SNP, rs599839, was associated with CAD in Caucasian populations [24]. These three markers were detected in two of the selection tests performed (FST vs HO with 367 markers and FST with the 1000 Genomes Selection Browser) in the context of three continents. The FST value detected in the 1000 Genomes data is of the same order of magnitude as recent positive selection signals (FST > 0.5) detected in previous studies performed on the lactase gene [48] and on genome-wide scans [49].
The 9p21 genomic region, the most complex and studied CAD risk region [3], showed consistent signals of positive selection. Indeed, positive signatures of selection were identified in each of the four tests performed (FST vs HO, SPA score, FST and LRH test). The consistency of the selection signals in the 9p21 genomic region lies in the fact that these different tests are not all based on the same aspect of the data, i.e. population differentiation. Briefly, four variants showed evidence of positive selection: rs9632884, rs6475606, rs1537371 and rs1333042. These SNPs were previously associated with CAD [34,35,36,47] and showed slightly different patterns of LD blocks across populations.
Thanks to the high density of populations available for Europe and North Africa, we were able to perform a spatial analysis of selection (SPA test) in these geographic areas. Thus, the positive signals of selection observed in the FST vs HO tests could be mainly attributed to Europeans. In fact, the European SPA scores are higher than those detected in North Africans. Additionally, the confirmation of three markers in the 9p21 genomic region (rs9632884, rs1537371, and rs1333042) in two tests (FST vs HO and SPA score in Europe) is a good indicator of consistency across our results. As an independent replication using the 1000 Genomes dataset, the region containing one of these three markers (rs1333042) has shown a significant EHH value (before correction for multiple testing) in the LRH test. Moreover, this SNP, used as representative of the other two risk markers, shows a South to North gradient of increase in MAF. The gradient observed is in agreement with CAD incidence [50] and with previous genetic studies based on the apolipoprotein E4 [51], or the genetic risk score of nitric oxide synthases [52], both associated with susceptibility to CAD. The South to North frequency gradient, in accordance with CAD incidence rates, may be correlated with a potential selective role of CAD in the configuration of genetic diversity in current human populations. On the contrary, another study showed a geographical pattern opposite to and uncorrelated with the disease incidence for genetic variants correlated with CAD [53]. The North to South cline in frequency detected in that work was likely due to the spatial distribution of the whole genome variation present in the European continent, mainly shaped by demography. The potential positive selection signals of the 9p21 region were also detected using the 1000 Genomes data. In fact, seven markers of this region passed the multiple testing corrections in the FST tests, showing FST values > 0.78.
In addition, the potential positive signal detected for rs2811717 both in the 26 populations studied and in the 1000 Genomes Browser data is noteworthy. The FST value of this SNP in Europe (0.094) is substantially higher than 0.028, the value used to identify the most prominent genome-wide candidate regions in a recent selection scan in Europe [54].
Regarding signatures of balancing selection, only the 9p21 and 10q11 regions showed evidence of balancing selection in the Tajima's D and in the FST vs HO tests. Although these regions did not pass the corrections for multiple testing, it is noteworthy that regions of 15 kb (encompassing five contiguous windows in 10q11) in Asian and African samples and 24 kb (corresponding to eight contiguous windows in 9p21) in European populations showed evidence of selection. The overlap in the signals observed for Asians and Africans in the 10q11 region suggests that this locus is likely subjected to selective processes in both populations, in agreement with Wang et al. [55], who estimated that 78% of selective events are shared by two or more populations. Regarding the 9p21 CAD risk region, the regions showing positive and balancing selection are physically separated by a minimum of 11 kb (the last positively selected marker in the region, rs1333042, versus the first marker under balancing selection, rs2383206). The two different natural selection signals seem to have population specificity, arguing for different evolutionary processes behind the positive and balancing selection in the same genomic region. While the plausible balancing selection signals found act mainly on European populations (Tajima's D test), the signals of positive selection in the 9p21 region involve the three continents (FST and EHH tests).
The heterogeneity observed in the potential signals of natural selection in the different populations analyzed might be correlated with environmental variables. A recent whole-genome scan reported evidence for selection on two markers implicated in cardiovascular disease by identifying the SNPs with the strongest correlation between allele frequencies and climate, specifically with winter solar radiation and summer precipitation rate in African and Western Eurasian populations [56]. Another study enumerated several genes involved in the causal pathways of atherosclerosis that may be subject to various degrees of selective pressure resulting from climatic and dietary changes and from the host response to pathogens [57]. These pathways may influence the genetic susceptibility to CAD and the heterogeneity observed in the signals of selection among the populations analyzed.
Regarding CAD risk loci, harmful mutations may have been selected because they are in LD with a relatively strong, still unknown beneficial polymorphism [6] that could be related to CAD. On the other hand, the positively selected loci could be related to other risk traits, since many GWAS SNPs are associated with more than one trait [58]. In the case of the 9p21 locus, several genetic variants have been associated with multiple cancers [59], glaucoma [60], intracranial and abdominal aortic aneurysms [61], vascular dementia and late-onset Alzheimer's disease [62]. The response of a variant to selection also depends on the genetic background. Due to pleiotropy, the existence of selection at a specific locus may be related to different risk traits located in the same genomic region [58].
Although many selection scans have highlighted potentially interesting signals using genome-wide SNP data, it is currently difficult to assess how much confidence should be placed in them in the absence of clear signals of widespread, strong selection or of biological/functional information [63]. Recent data suggest that it is unusual for selection to drive new mutations rapidly to fixation in a specific population (the "hard sweep" model). Several explanations have been proposed for these potential genome-wide selective signals. First, most selection on individual alleles may be relatively weak, so that alleles have not had time to sweep to fixation within continental populations. Second, the strength of selection may vary over time, and it may be rare for selection to be consistently strong for the 10,000 years or more required to drive an allele close to fixation. Finally, much of human adaptation may proceed by either polygenic adaptation or soft sweeps, which can be difficult to detect using standard methods [64]. This latter explanation (polygenic adaptation and/or soft sweeps) may be especially relevant for complex traits, such as cardiovascular diseases, in which several loci contribute to the disease. Current GWAS have identified several common variants associated with CAD, but there is an open debate about causal variants and the role of rare variants [65,66]. Consequently, the plausible signals of selection found here are expected to be weak and difficult to identify clearly. Novel methods to detect selection are therefore needed to address potential confounding effects caused by population history and structure and to establish definitive evidence of selection, the mechanism of selection, and the functional effects of the allelic variants under selection in complex traits.
One possible limitation of this survey is ascertainment bias in the selection of markers. Because these results were obtained by genotyping SNPs selected according to specific criteria (MAF higher than 0.05, with priority given to markers not in LD in European populations) rather than through direct sequencing, the observed patterns of allele frequency, LD and population differentiation could be affected [5]. It has been reported that the ascertainment bias introduced by many methods of SNP discovery may have a large effect on the estimation of LD and recombination, influencing the decay of LD with distance [67]. Several studies reported significant LD over distances longer than those predicted by standard models, whereas some data from short, intergenic regions showed less LD than would be expected [68]. Indeed, loci containing high-frequency alleles tend to have deeper than average genealogies, providing more opportunity for recombination and thereby less LD. Regions in which many sequences are used for ascertainment show more SNPs at low frequencies, and a higher number of haplotypes is represented in the data. To mitigate this bias, all the variation present in the 1000 Genomes Project [11] for these genomic regions was used to perform the FST, LRH and Tajima's D tests. The agreement between the results from these two datasets with different marker coverage (367 SNPs from our genotyping dataset and all the genetic variation from the 1000 Genomes Project) allows us to consider the 9p21 genomic region a potential candidate for positive and balancing selection, and the 10q11 region a candidate for balancing selection.
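Since the discussion of ascertainment bias turns on how LD is measured, the following minimal sketch shows the two pairwise LD statistics reported in this study, D' and r², computed from counts of the four two-locus haplotypes. It assumes phased haplotype counts are available, which may not match how LD was estimated from the genotype data here; the function name `ld_stats` and the example counts are hypothetical.

```python
from typing import Tuple


def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic loci from haplotype counts.

    n_AB, n_Ab, n_aB and n_ab are counts of the four possible haplotypes.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total <= 0:
        raise ValueError("haplotype counts must sum to a positive number")

    p_A = (n_AB + n_Ab) / total  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at locus 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")

    # Raw disequilibrium coefficient.
    d = n_AB / total - p_A * p_B

    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1.0 - p_B), (1.0 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1.0 - p_A) * (1.0 - p_B))

    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * (1.0 - p_A) * p_B * (1.0 - p_B))
    return d_prime, r2


if __name__ == "__main__":
    # Hypothetical counts: one haplotype is absent, so |D'| is 1,
    # while r^2 stays below 1 because allele frequencies differ.
    print(ld_stats(30, 0, 20, 50))
```

The example illustrates why D' = 1 does not by itself imply that two markers are interchangeable: r² also depends on how closely the allele frequencies at the two loci match, which is one of the quantities an ascertainment scheme based on MAF can distort.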
In the absence of additional functional information, our results are compatible with a potential selective role of CAD in shaping the genetic diversity observed in current human populations, but demographic processes cannot be ruled out.
Table. Geographic origin, population codification, sample size and geographic coordinates in decimal degrees for the population samples. (XLSX)
S2 Table. Allele frequencies and heterozygosities (mean and standard deviation (SD)) per marker and per population. (XLSX)
S3 Table. Reynolds's genetic distances estimated among North European and Mediterranean populations based on the SNPs located in the 1p13, 1q41, 9p21, and 10q11 CAD risk regions. (XLSX)
S4 Table. LD statistics (D' and r²) for each pair of risk SNPs and for each population analysed. Risk SNPs in the same haplotype are highlighted in red. Risk SNPs in the same haplotype block in LD (D' = 1) are highlighted in grey.
Table. Detection of selection results for the four genomic regions analysed (1p13, 1q41, 9p21 and 10q11). Observed (Obs) FST with selection significance (highlighted P value<0.01) for the three continents (Africa, Asia and Europe) and for Europe and North Africa. Global cross-continental FST scores in the CEU, CHB, and YRI samples from the 1000 Genomes Selection Browser. Significant high and low FST values (extreme top 5% values of the distribution of FST values across the genome) are highlighted in grey. P values were calculated on the basis of higher rank scores in the genomic distribution. SPA scores for continental European and North African allele gradients (SPA scores above the 99th percentile are highlighted in grey). Tajima's D test results in the CEU, CHB, and YRI samples of the 1000 Genomes Project. Signals of potential balancing selection (P value0.05) are highlighted in grey. P values were calculated on the basis of higher rank scores in the genomic distribution of Tajima's D values. (XLSX)
S9 Table. Global cross-continental FST scores in the CEU, CHB, and YRI samples of the 1000 Genomes Project. Significant high and low FST values (above the top 5% extreme values of the distribution of FST values across the genome) are highlighted in grey. P values were calculated on the basis of higher rank scores in the genomic distribution. Significant FST values after applying correction for multiple testing are highlighted in red. (XLSX)
S10 Table. EHH averages in the CEU, CHB, and YRI samples of the 1000 Genomes Project. Significant high EHH values (above the top 5% extreme values of the distribution of EHH values across the genome) are highlighted in grey. The region containing the CAD risk marker rs1333042 is highlighted in grey. P values were calculated on the basis of higher rank scores in the genomic distribution. (XLSX)