No Evidence for Strong Recent Positive Selection Favoring the 7 Repeat Allele of VNTR in the DRD4 Gene

The human dopamine receptor D4 (DRD4) gene contains a 48-bp variable number of tandem repeats (VNTR) polymorphism in exon 3, encoding the third intracellular loop of this dopamine receptor. The DRD4 7R allele, which seems to have a single origin, is commonly observed in various human populations, and the nucleotide diversity of the DRD4 7R haplotype at the DRD4 locus is reduced compared to the most common DRD4 4R haplotype. Based on these observations, previous studies have hypothesized that positive selection has acted on the DRD4 7R allele. However, the degree of linkage disequilibrium (LD) of the DRD4 7R allele with single nucleotide polymorphisms (SNPs) outside the DRD4 locus has not been evaluated. In this study, to re-examine the possibility of recent positive selection favoring the DRD4 7R allele, we genotyped HapMap subjects for the DRD4 VNTR and conducted several neutrality tests, including the long-range haplotype test and the iHS test, both based on extended haplotype homozygosity. Our results indicated that LD of the DRD4 7R allele was not extended compared to SNP alleles of similar frequency. Thus, we conclude that the DRD4 7R allele has not been subjected to strong recent positive selection.


Introduction
The human dopamine receptor D4 (DRD4; MIM 126452) gene, located on chromosome 11p15.5, contains a 48-bp variable number of tandem repeats (VNTR) polymorphism in exon 3, encoding the third intracellular loop of this dopamine receptor. Ten VNTR alleles with two (2R) to eleven (11R) repeats have been identified so far. Of these, the DRD4 4R and 7R alleles are commonly observed in human populations [1], although the DRD4 7R allele is very rare in East Asians [2]. Of particular interest, the receptor encoded by the DRD4 7R allele has functional properties different from those of the receptors encoded by the DRD4 2R and 4R alleles. The DRD4 7R protein, compared with DRD4 2R and DRD4 4R, has different binding affinity to clozapine and spiperone [1], and shows a blunted intracellular response to dopamine [3]. In addition, the repeat sequence of the DRD4 7R allele suppresses the expression level of DRD4 compared to the DRD4 2R and 4R alleles [4]. Furthermore, the DRD4 7R allele has been reported to be associated with behavioral and psychiatric phenotypes such as novelty seeking [5,6] and attention-deficit hyperactivity disorder [7,8], although there are conflicting results [9,10,11,12,13].
The analysis of the DNA sequences of the DRD4 VNTR alleles and their nucleotide variations suggested that the DRD4 7R allele originated from a rare mutational event [14,15]. Thus, the DRD4 7R allele observed in human populations is considered to have a single origin. Besides the increased frequency of the DRD4 7R allele, it has been reported that the DRD4 7R allele, unlike 4R, is in strong linkage disequilibrium (LD) with the other DRD4 polymorphisms spanning 6.3 kb [16]. Based on these unique features of the DRD4 7R allele compared to the most common DRD4 4R allele, it has been hypothesized that the DRD4 7R allele has been subjected to positive selection [14,16].
Extended LD has been regarded as a signature of recent positive selection [17,18,19]. However, the 6.3-kb genomic region previously studied [16] seems to be too narrow to evaluate the possibility of recent positive selection favoring the DRD4 7R allele. After the original work [16], the International HapMap Project [20,21] provided a map of several million well-defined single nucleotide polymorphisms (SNPs) in the human genome, and the SNP genotypes of subjects from African, European, and Asian populations are freely available in the public domain. The use of the HapMap data allows us to easily evaluate the degree of LD of the DRD4 VNTR alleles, since no further genotyping of SNPs flanking the DRD4 locus is necessary. In this study, to re-examine whether the DRD4 7R allele has been the target of recent positive selection, we genotyped DNA samples from the HapMap subjects for the DRD4 VNTR, and conducted neutrality tests including the Long-Range Haplotype (LRH) test [18] and the iHS test [22], which are based on the extended haplotype homozygosity (EHH).

Allele frequencies of the DRD4 VNTR
We detected seven VNTR alleles with different lengths (2R, 3R, 4R, 5R, 6R, 7R, 8R) in four HapMap populations [20,21]: YRI (Yoruba in Ibadan, Nigeria), CEU (CEPH Utah residents with ancestry from northern and western Europe), JPT (Japanese in Tokyo, Japan), and CHB (Han Chinese in Beijing, China) (Table 1). The population frequencies of the DRD4 7R allele were relatively high in the African (YRI) and European (CEU) populations, but very low in the East Asian (JPT and CHB) populations, as reported in a previous study [2]. In this study, DRD4 VNTR alleles with the same repeat length but different VNTR sequences were not distinguished.
To investigate whether the observed distribution of allele frequencies at the DRD4 VNTR deviates from the expectation under neutrality, we performed the Ewens-Watterson homozygosity test [23], in which the sum of squared allele frequencies (F) was calculated for the observed data and compared with the expected values generated by simulating samples under neutrality. The P-value was calculated as the proportion of simulated F values identical to or smaller than the observed F among the 10,000 simulated samples, which allowed us to examine both an excess and a deficiency of F. The observed F values in the HapMap populations were slightly larger than the expected ones (Table 1). However, no population showed a significant difference from the neutral expectation in a two-sided test (i.e., 0.025 < P-value < 0.975 in every population).
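The logic of the homozygosity test can be illustrated with a minimal Python sketch. This is not the Arlequin implementation: it assumes that neutral samples can be drawn with a Hoppe urn (which follows the Ewens sampling distribution) and retained only when they carry the observed number of alleles. The function names, the value of `theta`, and the default number of simulations are illustrative assumptions.

```python
import random

def homozygosity(counts):
    """Sum of squared allele frequencies (F) computed from allele counts."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

def hoppe_urn(n, theta, rng):
    """Draw allele counts for n gene copies from the Ewens sampling distribution."""
    counts = []
    for i in range(n):
        # With probability theta / (theta + i) the next copy is a new allele;
        # otherwise it copies the allele of a uniformly chosen earlier copy.
        if rng.random() < theta / (theta + i):
            counts.append(1)
        else:
            r = rng.randrange(i)
            acc = 0
            for j, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[j] += 1
                    break
    return counts

def ewens_watterson_p(counts, n_sim=1000, theta=1.0, seed=0):
    """Return (observed F, proportion of simulated F <= observed F).

    Samples are kept only if they have the observed number of alleles k;
    given k, the neutral distribution does not depend on theta (assumption
    of this sketch: theta only affects how many draws are rejected).
    """
    rng = random.Random(seed)
    n, k = sum(counts), len(counts)
    f_obs = homozygosity(counts)
    sims = []
    while len(sims) < n_sim:
        sample = hoppe_urn(n, theta, rng)
        if len(sample) == k:
            sims.append(homozygosity(sample))
    # Small tolerance so that "identical" F values are counted despite rounding.
    p = sum(f <= f_obs + 1e-12 for f in sims) / n_sim
    return f_obs, p
```

Under the two-sided criterion used in the text, a population would be flagged only if the returned proportion fell below 0.025 or above 0.975.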

Heterozygosity around the DRD4 locus
Recent positive selection alters patterns of genetic variation in the genomic region adjacent to the targeted gene. During fixation of an advantageous mutation, the variation in the surrounding region is expected to be eliminated due to the hitchhiking effect. In fact, a remarkable reduction in heterozygosity around the ABCC11 gene, which has been subjected to recent positive selection, is observed in East Asian populations [24]. To detect a signature of recent positive selection, the degree of heterozygosity around the DRD4 locus was evaluated. Figure 1 shows the heterozygosity averaged across SNPs within 25 kb on either side of the focal SNP in a 1.8-Mb genomic region containing the DRD4 locus in three HapMap populations (i.e., YRI, CEU, and JPT+CHB). The degree of heterozygosity at SNPs close to the DRD4 VNTR was not lower than that of the surrounding region in any HapMap population, implying that no derived allele at DRD4 has rapidly reached a high frequency (e.g., >0.75). In other words, the DRD4 locus has not experienced a strong and recent selective sweep.
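The windowed averaging described above can be sketched as follows. The sketch assumes that per-SNP heterozygosity is the expected heterozygosity 2p(1 - p) computed from the allele frequency p; the text does not state whether expected or observed heterozygosity was used, and the function names are illustrative.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def windowed_heterozygosity(positions, freqs, half_window=25_000):
    """Average heterozygosity over all SNPs within half_window bp of each focal SNP.

    positions: SNP coordinates in bp; freqs: allele frequency at each SNP.
    The focal SNP itself is included in its own window.
    """
    het = [expected_heterozygosity(p) for p in freqs]
    averaged = []
    for focal in positions:
        in_window = [h for pos, h in zip(positions, het)
                     if abs(pos - focal) <= half_window]
        averaged.append(sum(in_window) / len(in_window))
    return averaged
```

A selective sweep would appear as a local trough in the returned values around the selected site.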

Genetic differentiation between HapMap populations
Since local positive selection results in high genetic differentiation between two populations when one population is under selection and the other is not, high differentiation can be regarded as evidence of local selection operating at the locus or genomic region. Considering the higher population frequencies of the DRD4 7R allele in YRI and CEU than in JPT and CHB (Table 1), local positive selection favoring the 7R allele may have acted in the YRI and CEU populations. We examined the degree of differentiation, measured with F_ST, for YRI and CEU as candidate populations. The F_ST values of the DRD4 7R allele between YRI and JPT+CHB and between CEU and JPT+CHB were 0.15 and 0.07, respectively. We compared these values with those of SNPs on chromosome 11. The comparison revealed that 26,220 and 30,918 SNPs had F_ST values larger than that of the DRD4 7R allele between YRI and JPT+CHB and between CEU and JPT+CHB, respectively (the empirical P-values were 0.17 and 0.20 for YRI and CEU, respectively). The analysis of F_ST thus provided no evidence of local positive selection acting on the DRD4 7R allele.
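The comparison against chromosome-wide SNPs can be sketched as below. The text does not specify which F_ST estimator was used, so this example assumes the simple heterozygosity-based definition (H_T - H_S) / H_T for one allele in two equally weighted populations; the function names are illustrative.

```python
def fst_two_populations(p1, p2):
    """F_ST for one allele with frequencies p1 and p2 in two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)                          # pooled heterozygosity
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population
    if h_total == 0.0:
        return 0.0  # allele fixed or absent in both populations
    return (h_total - h_within) / h_total

def empirical_p(focal_value, background_values):
    """Proportion of background SNPs whose statistic is larger than the focal value."""
    larger = sum(1 for v in background_values if v > focal_value)
    return larger / len(background_values)
```

With this definition, an empirical P-value of 0.17 means that 17% of the background SNPs are more differentiated than the focal allele.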

Structure of LD around the DRD4 locus
LD plots based on |D'| were visualized for YRI and CEU (Figure 2). Although the frequency of the 7R allele was lower than that of the 4R allele, we found that |D'| values between the DRD4 7R allele and SNPs located outside the DRD4 locus were similar to those between the DRD4 4R allele and the same SNPs.
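For reference, |D'| between a recoded VNTR allele and a SNP can be computed as in the following sketch. It assumes phased haplotypes are available, whereas Haploview estimates haplotype frequencies from genotype data, so this is an illustration of the statistic rather than of the software; the function names are hypothetical.

```python
def recode_vntr(alleles, focal="7R"):
    """Collapse a multi-allelic VNTR into 1 (focal allele) versus 0 (all other alleles)."""
    return [1 if a == focal else 0 for a in alleles]

def abs_d_prime(a, b):
    """|D'| between two biallelic loci coded 0/1 on the same phased chromosomes."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b
    # D is scaled by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # at least one locus is monomorphic
    return abs(d) / d_max
```

Because |D'| is normalized by allele frequencies, it can be compared between the rarer 7R allele and the more common 4R allele.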

LRH and iHS tests for DRD4 VNTR 7R allele
We next calculated the EHH values [18] of the DRD4 7R and 4R alleles in the YRI and CEU populations (Figure 3). The 7R-bearing chromosomes had longer LD than the 4R-bearing chromosomes in YRI, whereas no such tendency was found in CEU, despite the much lower frequency of the DRD4 7R allele compared to the DRD4 4R allele. To examine whether the LD from the DRD4 7R allele has been extended by recent positive selection, we used the LRH test [18] based on the relative EHH (REHH) value, in which the other allele (i.e., the reference allele) at the same locus serves as an internal control to normalize recombination rate variation [25]. In this study, the REHH value for the focal DRD4 VNTR allele was calculated at a distance of 0.25 centimorgans (cM) from the DRD4 VNTR. The EHH value of the 7R-bearing chromosomes in YRI was 4.58 times greater than the EHH of the reference chromosomes on the centromere-[...] (Figure 4). In both populations, the observed REHH values of the 7R-bearing chromosomes were not significantly large compared to those of the other SNPs on chromosome 11 (i.e., the empirical P-values were 0.17 for YRI and 0.90 for CEU). We further performed the iHS test [22]. In the iHS test, the unstandardized iHS value was computed for SNPs with minor allele frequency similar to the frequency of the DRD4 7R allele. Large negative unstandardized iHS values indicate that minor-allele-bearing chromosomes have much longer LD than major-allele-bearing chromosomes. The unstandardized iHS values of the 7R-bearing chromosomes in YRI and in CEU were -1.23 and -0.65, respectively. These values were not strongly negative when compared with the unstandardized iHS values for SNPs on chromosome 11 (Figure 5). The empirical P-values for the DRD4 7R allele were 0.46 for YRI and 0.60 for CEU.
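The EHH and REHH statistics used above can be illustrated with a short sketch. It assumes phased haplotypes stored as lists of alleles and measures distance by marker index rather than by genetic distance (the study evaluated REHH at 0.25 cM); the function names are illustrative.

```python
from collections import Counter

def ehh(haplotypes, core_index, core_allele, end_index):
    """Probability that two random core-allele carriers are identical from core to end marker."""
    lo, hi = sorted((core_index, end_index))
    carriers = [tuple(h[lo:hi + 1]) for h in haplotypes
                if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return 0.0
    # Count pairs of identical extended haplotypes among all carrier pairs.
    same_pairs = sum(c * (c - 1) // 2 for c in Counter(carriers).values())
    return same_pairs / (n * (n - 1) // 2)

def rehh(haplotypes, core_index, core_allele, ref_allele, end_index):
    """EHH of the tested allele divided by EHH of the reference allele.

    Returns None when the reference EHH is 0, mirroring the exclusion rule
    described in the statistical analyses.
    """
    ref = ehh(haplotypes, core_index, ref_allele, end_index)
    if ref == 0:
        return None
    return ehh(haplotypes, core_index, core_allele, end_index) / ref
```

An REHH well above 1 indicates that haplotypes carrying the tested allele remain identical over a longer distance than those carrying the reference allele.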

Power simulation
Neither the LRH test nor the iHS test provided any evidence of longer LD of the 7R-bearing chromosomes. However, the selection intensity for the DRD4 7R allele might be too low to be detected. We therefore assessed the power of the LRH and iHS tests by computer simulation. Using the SelSim program [26], 120 chromosomes bearing 101 SNPs evenly distributed in a region of 1 centimorgan (cM) were simulated, where a derived allele at the 51st SNP was assumed to be subjected to positive selection. The final population frequency of the selected allele was set to either 0.15 or 0.3, since the present frequencies of the DRD4 7R allele in CEU and YRI are approximately 0.15 and 0.3, respectively.
The results demonstrated that the power of the LRH test [18] was always smaller than that of the iHS test [22] under the present parameter settings (Figure 6). Here, Ns of 50 corresponds to s of 0.005 in a population of size 10,000. Although our simulation indicated that the LRH test has very low statistical power for a selected allele with a small selection coefficient, the LRH test successfully detected significant evidence of positive selection acting at the G6PD gene [18]. Since the selection coefficient of the G6PD allele conferring resistance to malaria has been estimated to be more than 0.1 [27], we may say that the LRH test has enough power to detect such strong recent selection.
The iHS test achieved high statistical power (>0.8) for Ns (N is diploid population size and s is selection coefficient) of 250 when the final population frequencies of the selected allele were 0.15 and 0.3, but showed low power (<0.3) for Ns of 50. In addition, we should note that both LRH and iHS tests are expected to attain lower power for actual populations such as YRI and CEU than for the model assumed in our simulation because the variance of the test statistic for neutral SNPs is larger in the former due to their complicated population history. Thus, the statistical power of the present neutrality tests, LRH and iHS, for the HapMap subjects is too low if selection intensity has not been very high.

Discussion
No significant extended LD from the DRD4 7R allele to SNPs outside the DRD4 locus was detected in the present study. Our results seem to be inconsistent with the previous observation that the nucleotide diversity of the DRD4 haplotype bearing polymorphisms spanning 6.3 kb is more reduced in the DRD4 7R allele than the most common DRD4 4R allele [16]. However, it is not surprising that the diversity of the DRD4 7R allele is lower than that of the DRD4 4R allele if the DRD4 7R allele is derived from the DRD4 4R allele [14].
A previous study also showed that nonsynonymous substitutions have occurred more frequently than synonymous ones in the human DRD4 VNTR [14]. This implies that diversifying selection has operated on the DRD4 VNTR, especially in the human lineage, though the length polymorphism at the DRD4 VNTR has been commonly found even in nonhuman primate species [28]. However, it is unlikely that the observed bias towards nonsynonymous changes has been produced only by positive selection favoring the DRD4 7R allele, because the bias can be found not only in the VNTR motifs of the DRD4 7R allele but also in those of the other alleles [14].
For selectively neutral alleles with similar population frequencies, allele age is thought to be inversely proportional to the REHH value. It is not feasible to estimate the age of the DRD4 7R allele based only on the present DRD4 VNTR allele frequency data. However, the observed REHH and unstandardized iHS values for YRI and CEU (Figures 4 and 5) indicate that the DRD4 7R allele is not extraordinarily young compared to SNP alleles with similar frequencies. Hattori et al. [15] pointed out the possibility that five distinct DRD4 VNTR alleles had been derived from the DRD4 7R allele, based on sequencing analysis of the VNTR region. If this prediction is true, the original DRD4 7R allele is considered to have appeared a long time ago. These observations lead us to conclude that the DRD4 7R allele has not been subjected to strong recent positive selection and that the increased population frequencies of the DRD4 7R allele in African and European populations were probably caused by random genetic drift, though the possibility that diversifying selection has acted on the DRD4 VNTR in primate species including humans over a long evolutionary time cannot be excluded.

Samples
The genomic DNA samples used by the International HapMap Project were obtained from the Coriell Cell Repository: 56 subjects from YRI (Yoruba in Ibadan, Nigeria), 55 CEU (CEPH Utah residents with ancestry from northern and western Europe), 39 JPT (Japanese in Tokyo, Japan), and 44 CHB (Han Chinese in Beijing, China).

Genotyping
For genotyping the VNTR in exon 3 of the DRD4 gene, PCR was performed using the following pair of previously reported primers [29]: upstream 5'-ACTACGTGGTCTACTCGTCCGTGT-3' and downstream 5'-TCAGGACAGGAACCCACCGA-3'. PCR was performed with an initial denaturation at 95°C for 5 min, followed by 40 cycles of denaturation at 95°C for 30 s, annealing at 60°C for 45 s, and extension at 72°C for 30 s, and a final extension at 72°C for 7 min using a thermal cycler (GeneAmp PCR system 9700; Perkin-Elmer Applied Biosystems). The PCR-amplified DNA fragments were analyzed using a microfluidics-based platform (Agilent 2100 bioanalyzer; Agilent Technologies) for accurate sizing with the Agilent 2100 Bioanalyzer DNA 1000 kit (Agilent Technologies).

Statistical analyses
Using Arlequin version 3.5 [30], the Ewens-Watterson test [23] based on the Ewens sampling theory of neutral alleles [31] was performed to test whether the observed distribution of allele frequencies at the DRD4 VNTR deviates from the expectation under neutrality.
The allele frequency data of SNPs on chromosome 11, analyzed in all the HapMap populations, were obtained from the HapMap database (http://hapmap.ncbi.nlm.nih.gov/index.html). To evaluate the structure of LD around the DRD4 VNTR, the SNP genotype data of HapMap subjects were obtained from the HapMap database, and the absolute D' values, |D'|, for all pairwise combinations of SNPs with minor allele frequency of more than 0.1 and the DRD4 VNTR allele were estimated and visualized using Haploview software [33]. Here, SNPs spanning 200 kb were analyzed. As Haploview is unable to analyze a multi-allelic locus such as a VNTR, one VNTR allele was treated as the focal allele and all the other VNTR alleles were pooled into a single class. For example, the DRD4 7R allele and the other VNTR alleles were designated as "A" and "G", respectively, for the Haploview analysis. The phased haplotypes (diplotype) of each individual for the calculation of EHH [18] were estimated using fastPHASE version 1.2.0 [34]. As in the Haploview analysis, one VNTR allele was treated as the focal allele in the fastPHASE analysis and the other VNTR alleles were pooled into a single class. To conduct the LRH test [18], the empirical distributions of the REHH of SNPs on chromosome 11 were obtained for YRI and CEU. Only SNPs with allele frequency similar to that of the DRD4 7R allele were considered. The phased haplotype data were retrieved from the HapMap database. The REHH value was defined as the ratio of the EHH of the tested allele to the EHH of the reference allele (i.e., the other allele) at a distance of 0.25 cM from the core SNP. When the EHH of the reference allele was 0, the REHH value was excluded from the empirical distribution.
To perform the iHS test [22], the area under the EHH curve in plots of EHH versus distance was calculated until EHH reached 0.05. This integrated EHH (iHH), summed over both directions away from the core SNP, was computed for each allele of SNPs with minor allele frequency similar to the frequency of the DRD4 7R allele, and was denoted iHH_major for the major allele or iHH_minor for the minor allele. In this study, the test statistic iHS was defined as follows:

$$\text{unstandardized iHS} = \ln\left(\frac{\mathrm{iHH}_{\mathrm{major}}}{\mathrm{iHH}_{\mathrm{minor}}}\right)$$

Large negative values therefore indicate long haplotypes carrying the minor allele. Unlike the original study [22], we did not consider the ancestral status of each SNP allele because the ancestral state of each DRD4 VNTR allele is unknown.
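The integration and the log ratio can be sketched as follows. The sketch assumes trapezoidal integration over genetic distance and stops at the first marker where EHH is at or below the cutoff; the exact truncation and interpolation rules used in the study are not specified, and the function names are illustrative.

```python
import math

def ihh_one_side(distances, ehh_values, cutoff=0.05):
    """Trapezoidal area under the EHH curve on one side of the core.

    distances and ehh_values start at the core (distance 0, EHH 1) and move
    outward; integration stops at the first marker with EHH <= cutoff.
    """
    area = 0.0
    for i in range(1, len(distances)):
        width = abs(distances[i] - distances[i - 1])
        area += 0.5 * (ehh_values[i] + ehh_values[i - 1]) * width
        if ehh_values[i] <= cutoff:
            break
    return area

def unstandardized_ihs(ihh_major, ihh_minor):
    """ln(iHH_major / iHH_minor); negative when minor-allele haplotypes are longer."""
    return math.log(ihh_major / ihh_minor)
```

The iHH of an allele is the sum of `ihh_one_side` over the two directions away from the core, and the two resulting iHH values are then passed to `unstandardized_ihs`.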

Computer simulation for power calculation
Using the SelSim program [26], which is based on coalescent theory, 120 chromosomes bearing 101 SNPs evenly distributed in a region of 1 cM were simulated assuming a population with a constant size, N, of 5000 individuals. In the simulation, a derived allele at the 51st SNP was positively selected with a selection coefficient of s, where the dominance parameter was set to 0.5 (i.e., genic selection). It should be noted that the product of N and s (i.e., Ns), rather than each parameter separately, is what matters in this coalescent simulation. For the 120 chromosomes, the EHH values of the derived and ancestral alleles at the 51st SNP were computed in both directions. The REHH value at a distance of 0.25 cM from the selected SNP and the iHH value were calculated for the LRH test [18] and the iHS test [22], respectively. We performed 200 simulation runs for deterministic models with s of 0.01 (i.e., Ns of 50) and 0.05 (i.e., Ns of 250). In addition, 1000 runs for the neutral model (s = 0) were performed to obtain the null distributions of the average REHH and unstandardized iHS values. In the calculation of the statistical power, "positive selection" was regarded as successfully detected in a simulation run when the test statistic (average REHH in the LRH test or unstandardized iHS in the iHS test) in the selection model exceeded the 95th percentile of those observed in the neutral model. The proportion of runs with detection among the 200 runs was defined as the statistical power.
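The power calculation described above reduces to a threshold comparison, sketched below. The percentile is computed by linear interpolation, which is an assumption, and the function names are illustrative. For a statistic in which selection produces strongly negative values, such as the unstandardized iHS defined in this study, the values could be negated before calling the function; how the direction was handled in the original analysis is not stated.

```python
def percentile(values, q):
    """q-th percentile (0-100) of values using linear interpolation."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def detection_power(selected_stats, neutral_stats, q=95.0):
    """Proportion of selection runs whose statistic exceeds the neutral q-th percentile."""
    threshold = percentile(neutral_stats, q)
    detected = sum(1 for x in selected_stats if x > threshold)
    return detected / len(selected_stats)
```

In the setting of the text, `selected_stats` would hold the 200 statistics from runs under selection and `neutral_stats` the 1000 statistics from neutral runs.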