Genome-Wide Association Study of Young-Onset Hypertension in the Han Chinese Population of Taiwan

Young-onset hypertension has a stronger genetic component than its late-onset counterpart; thus, identifying genes related to its susceptibility is critical for the prevention and management of this disease. We carried out a two-stage association scan to map young-onset hypertension susceptibility genes. The first-stage analysis, a genome-wide association study, analyzed 175 matched case-control pairs; the second-stage analysis, a confirmatory association study, verified the first-stage results in a total of 1,008 patients and 1,008 controls. Single-locus association tests, multilocus association tests and pair-wise gene-gene interaction tests were performed to identify young-onset hypertension susceptibility genes. After stringent adjustment for multiple testing and consideration of gene annotation and single-nucleotide polymorphism (SNP) quality, four SNPs from two SNP triplets with strong association signals (−log10(p)>7) and 13 SNPs from 8 interactive SNP pairs with strong interactive signals (−log10(p)>8) were carefully re-examined. The confirmatory study verified the association for a SNP quartet located 219 kb and 495 kb downstream of LOC344371 (a hypothetical gene) and RASGRP3, respectively, on chromosome 2p22.3. RASGRP3 has been implicated in the abnormal vascular responsiveness to endothelin-1 and angiotensin II in diabetic-hypertensive rats. Intrinsic synergy involving IMPG1 on chromosome 6q14.2-q15 was also verified. IMPG1 encodes interphotoreceptor matrix proteoglycan 1, which has cation-binding capacity. These genes are novel hypertension targets identified in this first genome-wide hypertension association study of the Han Chinese population.


Introduction
Hypertension is a common disorder that is prevalent in most populations, especially in highly industrialized regions [1]. The prevention and management of hypertension have become important public health issues worldwide. Identifying hypertension susceptibility genes and understanding the genetic mechanisms of hypertension may contribute to the future development of genetic prevention, counseling and treatment for this disease. Efforts to identify hypertension genes have been ongoing for several decades [2], and some susceptibility genes have been located using different mapping strategies. One strategy is "candidate-gene linkage analysis". This method, which uses a large-pedigree linkage approach, is effective for mapping genes with large phenotypic effects that follow Mendelian laws of inheritance; its success in identifying novel hypertension genes is best illustrated by Lifton's work [3,4]. Using this approach, approximately 10 genes have been causally linked to hypertension, but they account for only a small fraction of the etiology of essential hypertension. Another strategy is the "genome-wide linkage approach". This method, which uses hundreds to thousands of short tandem-repeat polymorphisms and a large number of families, has suggested multiple potential locations of hypertension genes for further research in various studies; however, the indicated regions of interest are often too broad and are not consistent across studies [5]. Very few studies have fine-mapped the genes, let alone cross-verified them [6].
Much hope has thus been placed on the state-of-the-art genome-wide association study approach, which uses a large number of dense single-nucleotide polymorphism (SNP) markers. From 2006 to 2008, several dozen large-scale genome-wide association studies of various complex diseases were published [7]. To date, there have been only two large-scale genome-wide association studies on hypertension, both carried out by the Wellcome Trust Case Control Consortium (WTCCC) [8,9]. Neither identified clear susceptibility variants at the initial stage of data analysis, and the Family Blood Pressure Program failed to replicate the top six SNPs identified by the WTCCC [10]. These disappointing results underscore the need for stringent phenotype definitions and powerful statistical gene-mapping methods in genetic analyses of hypertension.
To increase the genetic contribution and homogeneity of the study trait, here we focus on young-onset hypertension (YOH), which has a stronger genetic component than its older counterpart [11]. Although clinical profile and candidate gene studies have sketched a blueprint for genetic susceptibility in YOH in the Han Chinese population [12][13][14][15], meticulous dissection of YOH by a systematic genome-wide association study has not been performed. This study aims to identify YOH susceptibility genes for the Han Chinese population based on a two-stage study design consisting of a genome-wide association study (GWAS) and a confirmatory association study (CMAS).

Study design and samples
We performed a two-stage case-control association scan, consisting of a GWAS for the first stage and a CMAS for the second stage, to identify YOH susceptibility genes. For the Academia Sinica Multi-Center YOH Genetic Study, we obtained complete genotypic and phenotypic data from 1,008 YOH individuals and established immortalized cell lines from their lymphocytes. We also obtained genotypic and phenotypic data from 1,008 normal controls from three projects: the Taiwan Han Chinese Cell and Genome Bank [16], the Cardiovascular Disease Risk Factor Two-Township Study [17] and the Nutrition and Health Survey in Taiwan [18]. This study was approved by the Internal Review Board of Academia Sinica, and every participant signed written informed consent at his/her initial clinic visit. All individuals in this study were Han Chinese.
In the first-stage association mapping (GWAS), 175 YOH patients with a normal body mass index (<23 kg/m²), triglyceride level (<150 mg/dl) and high-density lipoprotein cholesterol level (>40 mg/dl) were analyzed. Controls (n = 175) were selected from the Taiwan Han Chinese Cell and Genome Bank [16] by one-to-one matching on age (±5 years) and sex. In the second-stage association mapping (CMAS), the SNPs identified at the first stage were genotyped in the remaining 833 patients, and controls (n = 833) were selected by group matching balanced across three age groups (20-32, 32-44 and 44+ years) and the two sexes. The 1,008 normal controls consisted of 314 individuals from the Taiwan Han Chinese Cell and Genome Bank [16], 551 individuals from the Cardiovascular Disease Risk Factor Two-Township Study [17] and 143 individuals from the Nutrition and Health Survey in Taiwan, 2005-2008 [18]. The male-to-female ratio was 2.08 in both the case and control groups. Mean age (standard deviation) was 42.4 (6.2) years for female cases, 40.2 (7.7) for male cases, 42.8 (6.7) for female controls and 40.9 (8.3) for male controls.

Power calculation
We calculated the power of our two-stage case-control association study with GaTS software [19] under the scenarios described below. Given an additive-effect disease model with a YOH prevalence of 13.4% [18], a genetic relative risk of 2 and a disease allele frequency of 0.2-0.4, the power of our two-stage analysis was 0.87-0.90 for a stringent test size of 5.45 × 10^−7 (which corresponds to 0.05/91,713). The power fell to 0.29-0.38 when the genetic relative risk was reduced to 1.5. If the disease followed a multiplicative-effect model with the same prevalence (13.4%), genetic relative risk (2) and disease allele frequency (0.2-0.4), the power of our two-stage analysis increased to 0.96-0.99 for the same test size, and fell to 0.38-0.56 when the genetic relative risk was reduced to 1.5.
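GaTS computes power for the two-stage design directly. As a rough illustration of how power depends on the genetic relative risk, the allele frequency and the test size, the sketch below computes the approximate power of a single-stage allelic test under a multiplicative model with Hardy-Weinberg genotype frequencies. It is a normal approximation chosen for illustration, not the GaTS algorithm, and it will not reproduce the two-stage figures above; all function names are hypothetical.

```python
from statistics import NormalDist


def _risk_allele_freq(genotype_freqs):
    """Risk-allele frequency from frequencies of 0, 1 and 2 risk alleles."""
    return genotype_freqs[2] + 0.5 * genotype_freqs[1]


def case_control_allele_freqs(prevalence, grr, p):
    """Risk-allele frequency in cases and in controls.

    Assumes Hardy-Weinberg genotype frequencies in the population and a
    multiplicative model: genotype relative risks 1, grr and grr**2.
    """
    q = 1.0 - p
    geno = [q * q, 2 * p * q, p * p]
    rel_risk = [1.0, grr, grr ** 2]
    # Baseline penetrance chosen so that the overall prevalence matches.
    f0 = prevalence / sum(g * r for g, r in zip(geno, rel_risk))
    penetrance = [f0 * r for r in rel_risk]
    if max(penetrance) > 1.0:
        raise ValueError("scenario implies a penetrance above 1")
    cases = [g * f / prevalence for g, f in zip(geno, penetrance)]
    controls = [g * (1 - f) / (1 - prevalence) for g, f in zip(geno, penetrance)]
    return _risk_allele_freq(cases), _risk_allele_freq(controls)


def allelic_test_power(n_cases, n_controls, prevalence, grr, p, alpha):
    """Normal-approximation power of a one-stage, two-sided allelic test."""
    p1, p0 = case_control_allele_freqs(prevalence, grr, p)
    pooled = (n_cases * p1 + n_controls * p0) / (n_cases + n_controls)
    se = (pooled * (1 - pooled) * (1 / (2 * n_cases) + 1 / (2 * n_controls))) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)


# One stage-1-sized scenario from the text (multiplicative model, GRR 1.5).
print(allelic_test_power(175, 175, 0.134, 1.5, 0.3, 5.45e-7))
```

Because only the first-stage samples enter this call, it shows why the second stage is needed rather than estimating the power of the full design.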

Inclusion criteria and auxiliary measurements
Inclusion criteria for YOH patients were defined as follows: (1) a systolic blood pressure (SBP) ≥140 mmHg and/or diastolic blood pressure (DBP) ≥90 mmHg over a 2-month period or, for those who were on anti-hypertensive medication, SBP/DBP ≥120/80 mmHg at two consecutive visits over a 2-month period; (2) an initial diagnosis of hypertension between 20 and 51 years of age; (3) no secondary causes of hypertension (such as chronic renal disease, renal arterial stenosis, primary aldosteronism, coarctation of the aorta, thyroid disorders, Cushing's syndrome and pheochromocytoma), which were ruled out through extensive clinical investigations (including blood chemistry, renal function tests, endocrine procedures and abdominal sonography); (4) a fasting glucose level <126 mg/dl and no previous diagnosis of diabetes mellitus; (5) a body mass index <35 kg/m²; (6) both parents and both sets of grandparents self-identified as Han Chinese; and (7) legal residence in Taiwan.
All of the above studies followed the standard blood pressure measurement protocols established by the Nutrition and Health Survey in Taiwan [18]. Blood pressure was measured three times, with two consecutive pulse measurements in between, using the Omega 1400 NBP (Invivo Research Laboratories Inc., Orlando, FL, USA). The average of the last two blood pressure measurements was used to confirm hypertension status. In addition, trained nurses conducted personal interviews to collect information on socio-demographics, lifestyle and personal habits (smoking, drinking and physical activity), medical history and medications. For each eligible subject, 17.5 ml of venous blood was drawn from an antecubital vein into a Vacutainer(R) tube (BD, Franklin Lakes, NJ, USA) for clinical chemistry, and 5 ml was drawn into a sodium citrate-containing Monovette tube (Sarstedt AG & Co., Postfach, Nümbrecht, Germany) for DNA extraction.

SNP genotyping
In the first-stage association mapping (GWAS), YOH cases (n = 175) and normotensive controls (n = 175) were genotyped with the Affymetrix Human Mapping 100K Set (Affymetrix, San Diego, CA, USA), which contains 116,204 SNPs with a median inter-marker distance of 8.5 kb, with 92% of the genome lying within 100 kb of a SNP. Genomic DNA was isolated from leukocytes using a Puregene kit (Gentra Systems, Minneapolis, MN, USA). The DNA concentration was quantified and adjusted to 50 ng/μl using a NanoDrop ND-1000 Spectrophotometer (NanoDrop Technologies, DE, USA). Each individual was genotyped with 500 ng of genomic DNA according to the GeneChip Mapping Assay Protocol, and genotypes were called with BRLMM (Bayesian Robust Linear Model with Mahalanobis distance classifier algorithm).
In the second-stage association mapping (CMAS), the SNPs identified in the GWAS were genotyped with the Sequenom MassARRAY system (Sequenom, San Diego, CA, USA) in 833 YOH patients and 833 normotensive controls. The DNA concentration of each sample was measured fluorometrically using the PicoGreen dsDNA quantification reagent (Molecular Probes, Eugene, OR, USA) and then diluted to 25 ng/μl. PCR primers and primer extension probes were designed using SpectroDESIGNER software (Sequenom), and all PCR amplifications and primer extension reactions were performed on ABI 9700 PCR thermocyclers (Applied Biosystems, Foster City, CA, USA). PCR products were transferred from the microplate to a 384-well MassARRAY using SpectroCHIP (Sequenom). Time-resolved mass spectra were recorded with a MassARRAY mass spectrometer (Sequenom), and each spectrum was then quantified and called using SpectroTYPER and SpectroREADER software (Sequenom), respectively.

Statistical methods
This study conducted a two-stage association study in humans consisting of GWAS at the first stage and CMAS at the second stage. The detailed procedures are described as follows.
First, we evaluated SNP/genotyping quality by examining the genotyping call rate (GCR), Hardy-Weinberg equilibrium (HWE) status and minor allele frequency (MAF). The minimum GCR across the 350 samples was 0.972. Using the ALLELE procedure of SAS software (SAS Institute, Inc. Cary, NC, USA), we examined HWE with the Markov-chain Monte Carlo exact HWE test [20] with one million permutations. Among 112,990 autosomal SNPs, 410 deviated from HWE with −log10(pFDR) > 3 (pFDR is defined in the sixth step below) and were excluded from further analysis. Next, 838 SNPs with a GCR < 0.9 were excluded. Finally, 20,029 SNPs with a MAF < 0.01 were removed. The remaining 91,713 SNPs were used for further GWAS analysis.
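The three per-SNP quality filters can be sketched as follows. As an assumption, the sketch replaces the Markov-chain Monte Carlo approximation used with the ALLELE procedure by full enumeration of the exact HWE distribution (the Wigginton et al. recurrence), which is cheap for 350 samples. Genotypes are coded as 0/1/2 copies of one allele with None for a failed call, and the helper names are hypothetical. The per-SNP HWE p values would still need the FDR adjustment of the sixth step before the −log10(pFDR) > 3 cut is applied.

```python
def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Exact HWE test p value by full enumeration (Wigginton et al., 2005)."""
    n_rare_hom = min(n_hom1, n_hom2)
    n_common_hom = max(n_hom1, n_hom2)
    n = n_het + n_rare_hom + n_common_hom
    if n == 0:
        return 1.0
    n_rare = 2 * n_rare_hom + n_het            # copies of the minor allele
    probs = [0.0] * (n_rare + 1)               # indexed by heterozygote count
    # Start at the most likely heterozygote count (same parity as n_rare).
    mid = n_rare * (2 * n - n_rare) // (2 * n)
    if mid % 2 != n_rare % 2:
        mid += 1
    probs[mid] = 1.0
    # Walk down in heterozygote count using the ratio of successive terms.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1
    # Walk up in heterozygote count.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= n_rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2.0) * (het + 1.0))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1
    total = sum(probs)
    observed = probs[n_het]
    # p value: total probability of configurations no more likely than observed.
    return min(1.0, sum(pr for pr in probs if pr <= observed) / total)


def snp_qc_summary(genotypes):
    """Return (call rate, MAF, HWE exact p value) for one SNP."""
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    counts = [called.count(k) for k in (0, 1, 2)]
    freq = (counts[1] + 2 * counts[2]) / (2 * len(called)) if called else 0.0
    maf = min(freq, 1.0 - freq)
    return call_rate, maf, hwe_exact_p(counts[1], counts[0], counts[2])


def passes_qc(genotypes, min_call_rate=0.9, min_maf=0.01):
    """Apply the GCR >= 0.9 and MAF >= 0.01 filters described in the text."""
    call_rate, maf, _ = snp_qc_summary(genotypes)
    return call_rate >= min_call_rate and maf >= min_maf
```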
Second, we evaluated population admixture in the Taiwanese population using STRUCTURE software [21] and genomic control analysis [22]. For the former, we assumed K = 3 populations (Minnan, Hakka and Mainlander) under an admixture model, and admixture proportions of all samples were calculated separately for the normotensive and hypertensive groups using STRUCTURE software [21]. For the latter, the variance inflation fraction, $\lambda = \max\{1, [\mathrm{median}(T)]^2 / 0.675^2\}$, where $T$ denotes the trend test statistics, was calculated by the CASE-CONTROL procedure of SAS software (SAS Institute, Inc.).
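The genomic control factor can be written as a short function. This sketch assumes the trend test statistics are on the standard normal scale, so that the median of |Z| is about 0.675 under the null and λ is about 1 without stratification. The function names are hypothetical, and dividing the 1-df chi-square statistics by λ is the usual genomic control correction [22], not a step reported in this study.

```python
from statistics import median


def genomic_control_lambda(trend_z):
    """Inflation factor max{1, (median |Z| / 0.675)^2} from normal-scale trend statistics."""
    med = median(abs(z) for z in trend_z)
    return max(1.0, (med / 0.675) ** 2)


def gc_corrected_chi2(trend_z, lam):
    """Deflate the 1-df chi-square statistics (Z**2) by lambda."""
    return [z * z / lam for z in trend_z]


lam = genomic_control_lambda([0.3, -1.1, 0.7, 2.4, -0.5])
print(lam, gc_corrected_chi2([2.4], lam))
```

Values close to 1, such as the 1.097 reported in the Results, indicate little inflation from stratification.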
Third, we performed genome-wide single-locus association tests using exact conditional logistic regressions [23,24], where a dichotomous disease status of YOH was regressed on SNP genotypes in either a nominal genotype coding system (i.e., AA, AB and BB) or an ordinal genotype coding system (i.e., 0, 1 and 2 of allele A). Genetic effects of SNPs were examined using one million Monte Carlo samples generated from a hybrid network and Monte Carlo algorithm [25,26] by the LOGISTIC procedure of SAS software (SAS Institute, Inc.). Throughout this paper, we use the term CLR-NOMINAL analysis to describe the procedure of fitting a conditional logistic regression model to associate hypertension with a nominal-genotype-coding variable; the term CLR-ORDINAL analysis describes fitting the same model with an ordinal-genotype-coding variable.
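For 1:1 matched pairs such as the GWAS sample, the conditional likelihood depends only on the within-pair difference in the genotype covariate. The sketch below fits the CLR-ORDINAL model by Newton-Raphson and also returns the closed-form score test at β = 0. It uses large-sample (asymptotic) inference rather than the exact Monte Carlo inference of the LOGISTIC procedure described above, and the function names are hypothetical; a CLR-NOMINAL fit would replace the single allele count with the two indicators produced by nominal_indicators.

```python
import math


def nominal_indicators(x):
    """Nominal coding of an allele count x in {0, 1, 2}: two dummy variables,
    with x = 0 as the reference genotype."""
    return (1 if x == 1 else 0, 1 if x == 2 else 0)


def clr_matched_pairs(case_x, ctrl_x, max_iter=50, tol=1e-10):
    """Conditional logistic regression for 1:1 matched pairs with one ordinal
    covariate (allele count 0/1/2). Returns (beta, se, score_chi2).

    The MLE does not exist if every discordant pair differs in the same direction.
    """
    # Concordant pairs (equal covariate) contribute nothing to the likelihood.
    d = [a - b for a, b in zip(case_x, ctrl_x) if a != b]
    if not d:
        raise ValueError("no discordant pairs")
    # Score test at beta = 0 has the closed form (sum d)^2 / sum d^2.
    score_chi2 = sum(d) ** 2 / sum(v * v for v in d)
    beta = 0.0
    for _ in range(max_iter):
        # Conditional probability that the observed case, not its matched
        # control, is the case: 1 / (1 + exp(-beta * d)).
        prob = [1.0 / (1.0 + math.exp(-beta * v)) for v in d]
        score = sum(v * (1 - s) for v, s in zip(d, prob))
        info = sum(v * v * s * (1 - s) for v, s in zip(d, prob))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, math.sqrt(1.0 / info), score_chi2


beta, se, chi2 = clr_matched_pairs([1, 2, 1, 0, 1], [0, 1, 0, 1, 1])
print(beta, se, chi2)
```

In this toy example three discordant pairs favor the case and one favors the control, so the estimated per-allele odds ratio exp(beta) is 3.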
Fourth, we performed genome-wide multilocus association tests using either the haplotype association test or p-value combination test. The genome-wide haplotype association tests combined haplotype trend regression [27] and a sliding-window procedure to scan the human genome chromosome by chromosome. Multiple moving window sizes of 3, 5, 7 and 9 SNPs were used. Haplotype frequencies were estimated using the composite haplotype method [28], which requires less computational time than the expectation-maximization algorithm. Haplotypes with low frequencies were excluded using three thresholds of minimum haplotype frequencies, 0.01, 0.05 or 0.10. The analysis was carried out using HelixTree software (Golden Helix, Inc. Bozeman, MT, USA).
The genome-wide p-value combination test combined a truncated product p-value procedure [29] and a sliding-window procedure to scan the human genome chromosome by chromosome, where the p values were those obtained from the previous genome-wide single-locus association tests (CLR-NOMINAL or CLR-ORDINAL). Multiple moving window sizes of 3, 5, 7 and 9 were applied. The analysis was carried out using the PSMOOTH procedure of SAS software (SAS Institute, Inc.).
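The truncated product method combines the marginal p values in each window by multiplying only those at or below a truncation point τ. The sketch below evaluates the closed-form null distribution of Zaykin et al. [29] for independent p values and slides it along the chromosome. τ = 0.05 is an assumption, since the PSMOOTH setting is not stated here, and because adjacent SNPs in LD give correlated p values, the independence-based result should be read as approximate. Function names are hypothetical.

```python
import math


def tpm_pvalue(pvals, tau=0.05):
    """Truncated product method (Zaykin et al., 2002) for independent p values.

    W is the product of the p values <= tau; returns P(W <= w) under the null.
    """
    n = len(pvals)
    kept = [p for p in pvals if p <= tau]
    if not kept:
        return 1.0                      # nothing survives truncation
    w = math.prod(kept)
    total = 0.0
    for k in range(1, n + 1):
        # Probability that exactly k p values fall below tau, up to tau**k.
        weight = math.comb(n, k) * (1 - tau) ** (n - k)
        if w <= tau ** k:
            # tau**k * P(product of k Uniform(0, tau) draws <= w) simplifies to w * s.
            s = sum((k * math.log(tau) - math.log(w)) ** j / math.factorial(j)
                    for j in range(k))
            total += weight * w * s
        else:
            total += weight * tau ** k
    return min(1.0, total)


def sliding_tpm(pvals, window=3, tau=0.05):
    """Combined p value for every run of `window` adjacent SNPs."""
    return [tpm_pvalue(pvals[i:i + window], tau)
            for i in range(len(pvals) - window + 1)]


print(-math.log10(tpm_pvalue([0.0007, 0.0004])))
```

As a consistency check on the assumed τ, the rounded combined-sample p values reported in the Results for rs9308945 and rs6711736 (0.0007 and 0.0004) give −log10(p) ≈ 5.47 here, close to the value reported there for that pair. With τ = 1 the method reduces to Fisher's combination.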
Fifth, we performed genome-wide pair-wise SNP-SNP interaction tests for all possible SNP pairs by testing whether the odds ratios for the combined genotypes differed significantly between the case and control groups, using PLINK software [30]. The identified SNP pairs were further verified by exact conditional logistic regression models with interaction covariate(s) based on one million Monte Carlo samples, considering both nominal and ordinal genotype coding systems. The significance of the interactive effect, with four degrees of freedom under nominal coding and one degree of freedom under ordinal coding, was examined by a type III analysis. The conditional logistic regression analysis was run using the LOGISTIC procedure of SAS software (SAS Institute, Inc.).
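The first-step screen compares the association between two SNPs in cases with that in controls. A simplified sketch in the spirit of PLINK's fast epistasis test, not its exact implementation, is shown below: each SNP pair's 3×3 genotype-count table is collapsed into a 2×2 allele table, the log odds ratio is computed in each group with an assumed 0.5 continuity correction, and the difference is tested with a Z statistic. The table layout and function names are assumptions.

```python
import math


def allele_pair_table(geno_counts):
    """Collapse a 3x3 genotype table into a 2x2 allele table.

    geno_counts[a][b] = number of people carrying a copies of allele A at
    SNP 1 and b copies of allele B at SNP 2 (a, b in 0..2).
    """
    t = [[0.0, 0.0], [0.0, 0.0]]
    for a in range(3):
        for b in range(3):
            n = geno_counts[a][b]
            t[0][0] += n * a * b                  # A with B
            t[0][1] += n * a * (2 - b)            # A with other allele
            t[1][0] += n * (2 - a) * b            # other allele with B
            t[1][1] += n * (2 - a) * (2 - b)      # other with other
    return t


def log_or_and_var(t, pseudo=0.5):
    """Log odds ratio and its Woolf variance with a continuity correction."""
    a, b = t[0][0] + pseudo, t[0][1] + pseudo
    c, d = t[1][0] + pseudo, t[1][1] + pseudo
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def epistasis_z(case_counts, ctrl_counts):
    """Z test of equal SNP-SNP log odds ratios in cases and controls."""
    lor1, v1 = log_or_and_var(allele_pair_table(case_counts))
    lor0, v0 = log_or_and_var(allele_pair_table(ctrl_counts))
    z = (lor1 - lor0) / math.sqrt(v1 + v0)
    return z, math.erfc(abs(z) / math.sqrt(2))   # two-sided p value
```

Because every individual contributes four allele pairs, the variance treats allele pairs as independent; this is part of why such a fast screen needs the exact conditional logistic verification described above.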
Sixth, we performed multiple testing corrections. At the various stages of analysis, multiplicity was adjusted using either the false discovery rate (FDR) [31], with FDR-adjusted p values denoted pFDR, or a stringent p-value threshold. SNPs with −log10(pFDR) > 3 in HWE tests were excluded from subsequent analysis. SNPs, haplotype sets and SNP triplets with −log10(pFDR) > 3 were considered significant for marker-trait associations. SNP pairs with −log10(p) > 8 for interaction were identified as significant interactive pairs. All SNP markers identified by any of the GWAS procedures were annotated using GENOWATCH software [32]. For SNPs with at least one gene located within 100 kb of the flanking regions, the SNP-hypertension associations were further examined with more samples in the CMAS.
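The FDR adjustment can be sketched with the Benjamini-Hochberg step-up procedure. The text cites [31] without naming the variant, so treating Benjamini-Hochberg adjusted p values as pFDR is an assumption, and the helper names are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values (step-up), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_fdr(pvals, threshold=3.0):
    """Flag tests with -log10(pFDR) > threshold, i.e. pFDR < 10**-threshold."""
    return [q < 10 ** (-threshold) for q in bh_adjust(pvals)]


print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Comparing pFDR with 10^−3 directly, instead of taking a logarithm, also avoids an error if an adjusted p value is exactly zero.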
The CMAS was carried out with 1,008 YOH patients and 1,008 normotensive controls using two analysis strategies: independent data analysis and joint data analysis. The former analyzed only the independent CMAS samples (833 YOH patients and 833 controls), and the latter analyzed the combined GWAS and CMAS samples (1,008 YOH patients and 1,008 normotensive controls). Age and gender were adjusted for in the analyses, and genotyping quality control procedures were identical to those used at the first stage. An unconditional logistic regression model with either a nominal-genotype-coding covariate (ULR-NOMINAL analysis) or an ordinal-genotype-coding covariate (ULR-ORDINAL analysis) was fitted, and association/interaction tests were performed to confirm the GWAS findings. Odds ratios and the corresponding 95% confidence intervals were calculated to estimate the effect sizes of the identified SNPs. In addition, the linkage disequilibrium (LD) structure of the identified contiguous SNPs was examined using HAPLOVIEW software [33]. Haplotype-trait association was examined with a likelihood ratio test [34], and ten thousand permutations were performed to calculate empirical p values for the overall tests and the individual haplotype tests.
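The permutation step for empirical p values can be sketched as below. As simplifying assumptions, each individual's two haplotypes are taken as known (phase resolved), and a Pearson chi-square on the haplotype-by-group table stands in for the likelihood ratio test [34]; only the permutation logic (shuffle case-control labels across individuals, recompute the statistic, count statistics at least as large as observed) mirrors the text. All names are hypothetical.

```python
import random


def chi2_stat(case_haps, ctrl_haps):
    """Pearson chi-square for the haplotype-by-group contingency table."""
    n1, n0 = len(case_haps), len(ctrl_haps)
    n = n1 + n0
    stat = 0.0
    for h in sorted(set(case_haps) | set(ctrl_haps)):
        o1, o0 = case_haps.count(h), ctrl_haps.count(h)
        for observed, group_size in ((o1, n1), (o0, n0)):
            expected = (o1 + o0) * group_size / n
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_pvalue(people, is_case, n_perm=10_000, seed=1):
    """Empirical p value from shuffling case/control labels across individuals.

    people: one (haplotype1, haplotype2) tuple per individual (phase assumed known).
    """
    def statistic(labels):
        cases = [h for pair, c in zip(people, labels) if c for h in pair]
        controls = [h for pair, c in zip(people, labels) if not c for h in pair]
        return chi2_stat(cases, controls)

    observed = statistic(is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if statistic(labels) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Shuffling labels over individuals rather than over haplotypes keeps each person's two haplotypes together, which preserves the within-person structure under the null.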

GWAS at the first stage
Using 91,713 SNPs with good quality (see the discussion of statistical methods), we investigated marginal effects (genome-wide single-locus association test), joint effects (genome-wide multilocus association test) and interactive effects (genome-wide pair-wise interaction test) of SNPs on YOH.
First, STRUCTURE software and genomic control analyses were performed to evaluate population admixture/stratification. The STRUCTURE results show that the overall admixture structures of our case and control samples are very similar, suggesting that admixture in our population should not cause spurious associations in this study. In addition, the genomic control analysis shows that the variance inflation fraction was 1.097, close to 1, further suggesting that the impact of population admixture/stratification on our association study is minor. This conclusion is similar to the findings of previous studies [35][36][37].
Second, genome-wide single-locus association tests were carried out to detect marginal genetic effects on YOH. Exact p values were calculated for the CLR-NOMINAL and CLR-ORDINAL analyses associating YOH with SNPs. After an FDR correction was applied to the p values, no SNP satisfied −log10(pFDR) > 3 (see Figures S1(A) and S1(B)); that is, no individual SNP was significantly associated with YOH status in this study.
Third, we further examined the effects of multiple SNPs on YOH by two types of genome-wide multilocus association analyses: p-value combination analysis and haplotype analysis with sliding windows of 3, 5, 7 and 9 SNPs. Because the analyses of different window sizes identified similar association regions, here we show only the results of window size 3.
The p-value combination analysis integrated p values from either the CLR-NOMINAL or the CLR-ORDINAL analysis and identified 20 significant SNP triplets with −log10(pFDR) > 3 (see Table 1). Of these, 13 triplets were identified by one analysis and seven by both analyses. Among the seven triplets identified by both analyses, we focused on the three located in known or hypothetical gene regions (bold in Table 1); the unadjusted p values of these three triplets, on the −log10 scale, were greater than 7. The first triplet was rs9308945-rs6711736-rs6729869 on chromosome 2; the second was rs6711736-rs6729869-rs10495809 on chromosome 2; and the third was rs10517739-rs1444280-rs10517740 on chromosome 4. The two triplets on chromosome 2 share two SNPs and together form a SNP quartet. The seven distinct SNPs in the three triplets were genotyped in additional samples with Sequenom's MassARRAY and analyzed statistically in a CMAS, as described below. Genome-wide haplotype trend regression was also carried out but did not identify any windows (i.e., SNP triplets) with −log10(pFDR) > 3 (see Figures S2(A), S2(B) and S2(C)).
Fourth, the interactive effects of all possible SNP pairs (more than 4.2 billion) were exhaustively examined. Numerical results and gene information for the top 10 SNP pairs are summarized in Table 2. These 10 SNP pairs satisfied two conditions: (1) the test of the difference in odds ratios between the case and control groups for the combined genotypes gave −log10(p) > 8 (see column "ORT" in Table 2), and (2) at least one SNP in the pair was located in a gene region. Except for SNP pair rs1526555-rs765899, the significance of the remaining 9 SNP pairs was also confirmed with interaction tests in a CLR-NOMINAL analysis (see column "Nominal" in Table 2) and a CLR-ORDINAL analysis (see column "Ordinal" in Table 2) on the same data, where the significance of the CLR-NOMINAL and CLR-ORDINAL analyses was evaluated with an FDR correction over the 10 replication tests. The significance of the 10 SNP pairs was also re-examined in a CMAS, the results of which are described below.
In summary, the GWAS identified three SNP triplets with p-value combination tests and 10 SNP pairs with significant interactive effects that are located in gene regions. The three triplets contain seven distinct SNPs and the 10 interactive pairs contain 16 distinct SNPs; all 23 resulting SNPs were genotyped in the CMAS.

CMAS at the second stage
For confirmation, the 23 SNPs were genotyped in 833 YOH patients and 833 normal controls. Summary statistics and p values of single-locus association tests based on the independent samples (833 YOH patients and 833 normal controls) and on the combined samples (1,008 YOH patients and 1,008 normal controls) are presented in Table 3. Odds ratios and 95% confidence intervals were calculated for both the independent and the combined samples (see Table 4). After a multiple testing correction (−log10(pFDR) > 3), no single-locus association test was significant. This result suggests that no single SNP has a marginal effect large enough to produce a detectable YOH-SNP association, consistent with our GWAS findings.
Next, we verified the significant findings from the genome-wide multilocus association tests and the interaction tests, including only SNPs that passed the quality criteria in the CMAS. Among the 23 SNPs, two (rs10517740 and rs10500328) had a GCR < 0.9, and three (rs10517740, rs2206416 and rs10500328) deviated significantly from HWE (see Table 3); these were excluded from subsequent analysis. In addition, the SNP triplet on chromosome 4 (rs10517739, rs1444280 and rs10517740) contained the poor-quality SNP rs10517740, so the other two SNPs (rs10517739 and rs1444280) were also excluded. Similarly, the interactive SNP pair rs10488767-rs10500328 contained one of the three poor-quality SNPs, so its partner SNP (rs10488767) was excluded. Therefore, we examined only four distinct SNPs (rs9308945, rs6711736, rs6729869 and rs10495809) for p-value combination and 13 SNPs (rs618171, rs7805441, rs1115620, rs10506451, rs2502397, rs1886985, rs6129969, rs2214310, rs7950640, rs994531, rs1526555, rs2331706 and rs765899) for genetic interaction. We carried out confirmation analyses on the following two SNP groups: (1) the four distinct SNPs from the two SNP triplets on chromosome 2 and (2) the 13 distinct SNPs from the 8 interactive SNP pairs. All results are summarized in Table 5 and Table 6.
First, we confirmed the significance of the four contiguous SNPs (rs9308945, rs6711736, rs6729869 and rs10495809) located in a hypothetical gene on chromosome 2. They were examined by p-value combination analysis with window sizes of 2, 3 and 4 SNPs. The significance of each SNP was strengthened after considering the joint effect of multiple SNPs (see Table 5). The same pattern was observed for the independent samples and for the combined samples using either the ULR-NOMINAL or the ULR-ORDINAL analysis. For example, in the analysis of the combined samples, the marginal p values of the ULR-NOMINAL analysis for the four SNPs were 0.0007, 0.0004, 0.0017 and 0.0260, respectively (see Table 3). P values were greatly reduced when the truncated product p-value method was applied to the SNP pairs, triplets and quartet. The −log10(p) values of SNP pairs rs9308945-rs6711736, rs6711736-rs6729869 and rs6729869-rs10495809 were 5.4738, 5.1218 and 3.5135, respectively; those of SNP triplets rs9308945-rs6711736-rs6729869 and rs6711736-rs6729869-rs10495809 were 7.1739 and 5.8055, respectively; and that of SNP quartet rs9308945-rs6711736-rs6729869-rs10495809 was 7.8469 (see Table 5). After an FDR correction was applied to the p values, the SNP pairs, triplets and quartet satisfied −log10(pFDR) > 3.
We further examined this region with LD and haplotype analyses. The LD structures of the four SNPs in the case-only group, the control-only group and the combined group were highly consistent: the four SNPs formed a strong LD block in which the LD coefficient D′ for every SNP pair was greater than 0.95 (see Figures S3(A), S3(B) and S3(C)). Haplotype-based association tests for the two SNP triplets rs9308945-rs6711736-rs6729869 and rs6711736-rs6729869-rs10495809 and for the SNP quartet rs9308945-rs6711736-rs6729869-rs10495809 gave global p values of 0.0010, 0.0057 and 0.0055, respectively. For the SNP triplet rs9308945-rs6711736-rs6729869, two haplotypes had significantly different frequencies in the case and control groups: haplotype A-G-T had frequencies of 0.3368 and 0.3862 in the hypertensive and normotensive groups, respectively (p = 0.0008), and haplotype G-A-A had frequencies of 0.6349 and 0.5759 (p = 0.0001). For the SNP triplet rs6711736-rs6729869-rs10495809, two significant haplotypes were also identified: haplotype A-A-A had frequencies of 0.4694 and 0.4252 in the hypertensive and normotensive groups, respectively (p = 0.0059), and haplotype G-T-G had frequencies of 0.3328 and 0.3803 (p = 0.0019).
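The D′ values above follow Lewontin's normalization of the disequilibrium coefficient D by its maximum possible value given the allele frequencies. A minimal sketch, assuming haplotype and allele frequencies have already been estimated (HAPLOVIEW estimates them from genotypes), is shown below; the function name is hypothetical, and it returns |D′| as is usual in LD plots.

```python
def d_prime(p_ab, p_a, p_b):
    """Lewontin's |D'| for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2;
    p_a, p_b: frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0


print(d_prime(0.4, 0.4, 0.4), d_prime(0.16, 0.4, 0.4))
```

A value near 1 means that, given the allele frequencies, almost no recombinant haplotypes are observed, which is what a block with D′ > 0.95 indicates.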
Second, we investigated the 13 SNPs from the 8 significant interactive pairs identified by our GWAS. None of the 13 SNPs showed a significant marginal effect (see Table 3). Only the interactive effect of SNP pair rs1886985-rs6129969 was confirmed in the combined samples (see Table 6), with −log10(p) = 4.0000 in the ULR-NOMINAL analysis and −log10(p) = 2.3318 in the ULR-ORDINAL analysis. P values of the ULR-ORDINAL analyses satisfied −log10(pFDR) > 3.

Discussion
Hypertension is a common complex disorder characterized by multifactorial inheritance, polygenic effects and genetic heterogeneity. The complex etiology of hypertension has made it difficult to map disease-related genes, and to date no high-impact genes have been directly linked to the onset of hypertension. In this study, we not only carefully selected the phenotype (i.e., by focusing on YOH) but also employed statistical methods designed to increase the power of our analysis and to overcome genetic complexity. The choice of statistical gene mapping method is critical for successfully identifying genes responsible for complex disorders. The single-locus association method, which is useful for detecting marginal effects, may not be sufficient for investigating the joint and interactive (synergistic) effects underlying complex disorders. To increase test power, in addition to the single-locus association test, we used various multilocus association methods, including the p-value combination approach [29], haplotype analysis [27,34] and interaction analysis [23][24][25][26][30], to compensate for the limitations of the single-locus association test and to examine fully the genetic complexity of hypertension.
Table 4. For each SNP, the method used (M), chromosome (CHR), RS number (RS), nucleotide types (NT) and gene information (GI) are shown. Two genotypic odds ratios, $\mathrm{OR}_1$ and $\mathrm{OR}_2$, were calculated from the ULR-NOMINAL analysis, with model $\mathrm{logit}\{\Pr(Y = \mathrm{YOH} \mid I_{11}, I_{12}, I_{\mathrm{Gender}}, Z_{\mathrm{Age}})\} = \alpha_0 + \alpha_{11} I_{11} + \alpha_{12} I_{12} + \alpha_2 I_{\mathrm{Gender}} + \alpha_3 Z_{\mathrm{Age}}$, and one allelic odds ratio, $\mathrm{OR}_3$, from the ULR-ORDINAL analysis, with model $\mathrm{logit}\{\Pr(Y = \mathrm{YOH} \mid X, I_{\mathrm{Gender}}, Z_{\mathrm{Age}})\} = \alpha_0 + \alpha_1 X + \alpha_2 I_{\mathrm{Gender}} + \alpha_3 Z_{\mathrm{Age}}$, where $Y$ is YOH status, $I_{1j}$ ($j = 1, 2$) are indicators coding the three genotypes of a SNP, $I_{\mathrm{Gender}}$ is an indicator of gender, $Z_{\mathrm{Age}}$ is a covariate for age, and $X$ is the number of reference alleles. The results for combined samples and independent samples are shown. doi:10.1371/journal.pone.0005459.t004
Studying genome-wide interactions is challenging. In our GWAS, we conducted a two-step genome-wide interaction analysis to examine all possible pair-wise SNP-SNP interactive effects. The first step applied a computationally efficient algorithm, a test of the difference in odds ratios between the hypertensive and normotensive groups [30], to scan all possible pair-wise SNP-SNP interactions, using a stringent threshold of −log10(p) > 8 to control false positives. The second step verified the identified interactive effects using exact conditional logistic regressions [23][24][25], which are computationally intensive but accurate even for sparse data. Because more than 4.2 billion tests were conducted in the first step, this two-step procedure reduces, but may not eliminate, false positives. Verifying the identified interactive effects in a CMAS is therefore critically important.
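The scale of the first-step scan shows why the confirmatory stage matters. The arithmetic below counts the pair-wise tests among the 91,713 SNPs that passed quality control and, assuming independent tests purely for illustration, the number of pairs expected to pass −log10(p) > 8 by chance alone (about 42), alongside the corresponding Bonferroni threshold (−log10 of about 10.9).

```python
import math

n_snps = 91_713                      # SNPs passing first-stage QC
n_pairs = math.comb(n_snps, 2)       # all pair-wise SNP-SNP tests
threshold = 1e-8                     # screening cut: -log10(p) > 8
expected_null_hits = n_pairs * threshold   # assumes independent null tests
bonferroni = 0.05 / n_pairs          # family-wise 0.05 over all pairs
print(n_pairs, round(expected_null_hits, 1), round(-math.log10(bonferroni), 2))
```

Since tests on SNPs in LD are correlated, the independence assumption makes this only an order-of-magnitude guide.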
An important issue in population-based case-control studies is population admixture/stratification, which may cause spurious associations. This study analyzed Han Chinese samples from the Taiwanese population. Apart from 2-3% aboriginal people and foreign residents, the Taiwanese population consists of three major Han Chinese subgroups: Minnan (70%), Hakka (13%) and Mainlanders (14%). Previous studies showed that, relative to the Caucasian population, genetic distributions and linkage disequilibrium structure are highly homogeneous among the three Han Chinese subgroups, so the impact of population admixture on the results of case-control association studies in the Taiwan Han Chinese population is small [35][36][37]. Our population admixture analyses using genome-wide SNP markers support the same conclusion.
This study is the first two-stage GWAS of YOH in the Han Chinese population. By applying a p-value combination analysis and a pair-wise interaction analysis, we identified novel genetic variants associated with YOH as well as variants with interactive effects. At the first stage, the GWAS used conditional logistic regressions in conjunction with a p-value combination test to identify two significant SNP sets located in gene regions. The SNP quartet rs9308945-rs6711736-rs6729869-rs10495809 on chromosome 2p22.3 was re-confirmed in the second-stage analysis. Several studies have found suggestive linkage signals on chromosome 2p; in particular, the HERITAGE Family Study [38] and the NHLBI Family Blood Pressure Program [39] showed suggestive evidence at 2p22.3 in African Americans. The quartet lies 219 kb, 322 kb, 457 kb and 495 kb downstream of LOC344371 (hypothetical gene), MYADML (pseudogene), FAM98A (hypothetical protein) and RASGRP3, respectively. RAS guanyl nucleotide-releasing protein 3 (RASGRP3) is a guanine nucleotide exchange factor that activates RAS-subfamily GTPases, which function in signal transduction as GTP/GDP-regulated switches. In streptozotocin-induced diabetic spontaneously hypertensive rats, inhibition of RAS-GTPase signaling by chronic FPTIII treatment ameliorated the abnormal vascular responsiveness to endothelin-1 and angiotensin II in isolated carotid arteries, and a moderate reduction in mean arterial blood pressure was also observed. Whether RASGRP3 is involved in YOH development, and how the locus on 2p22.3 relates to it, awaits further functional studies.
Furthermore, the GWAS identified eight interactive SNP pairs that passed SNP quality examination and were located in gene regions. SNP pair rs1886985-rs6129969, which showed a significant pair-wise interaction in association with YOH, was confirmed in the second-stage analysis. SNP rs1886985 is located in IMPG1 on chromosome 6, and rs6129969 and rs2206416 are located in an intergenic region on chromosome 20. IMPG1, located on 6q14.2-q15, encodes interphotoreceptor matrix proteoglycan 1, which may participate in retinal adhesion and in maintaining photoreceptor viability [40]. IMPG1 contains 17 exons, including an alternatively spliced exon 2 [41]. A linkage study of a large Dutch family suggested that a Leu579Pro mutation in IMPG1 may have a causal role in benign concentric annular macular dystrophy [42]. No association between IMPG1 and hypertension or related traits has been reported previously. IMPG1 has a rat homologue (Gene ID 66014 in Rattus norvegicus).
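One common way to test a pair-wise SNP × SNP interaction is a likelihood-ratio test. It compares a logistic model with additive main effects of both SNPs against the same model plus their product term. This is shown as an assumption, not as the exact model used in this study. The dependency-free sketch below fits both models by Newton-Raphson on simulated, hypothetical data. It ignores the matched design, which a conditional model would respect.

```python
import math
import random


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _softplus(t):
    """log(1 + exp(t)) without overflow."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def _linear(beta, x):
    return sum(b * xj for b, xj in zip(beta, x))


def fit_logistic(rows, y, max_iter=50, tol=1e-8):
    """Newton-Raphson logistic regression; returns (coefficients, log-likelihood)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = math.exp(-_softplus(-_linear(beta, x)))  # fitted probability sigmoid(eta)
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = sum(yi * _linear(beta, x) - _softplus(_linear(beta, x))
                 for x, yi in zip(rows, y))
    return beta, loglik


def interaction_lrt(g1, g2, y):
    """1-df LRT for a SNP1 x SNP2 product term added to additive main effects."""
    reduced = [[1.0, a, b] for a, b in zip(g1, g2)]
    full = [[1.0, a, b, a * b] for a, b in zip(g1, g2)]
    _, ll_reduced = fit_logistic(reduced, y)
    _, ll_full = fit_logistic(full, y)
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail


if __name__ == "__main__":
    rng = random.Random(7)
    n = 2000
    # Hypothetical genotypes (allele frequency 0.3) under a purely interactive risk model.
    g1 = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
    g2 = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
    y = [1 if rng.random() < 1.0 / (1.0 + math.exp(1.0 - 0.8 * a * b)) else 0
         for a, b in zip(g1, g2)]
    print(interaction_lrt(g1, g2, y))
```

Because the product term is tested after adjusting for both main effects, two SNPs on different chromosomes can be tested jointly even if neither shows a marginal signal.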
We carried out a preliminary gene expression study comparing pooled samples from three SHR and three WKY rats at 4, 12, 26 and 38 weeks of age [43]. The use of SHR and WKY rats was approved by the Academia Sinica Institutional Animal Care and Utilization Committee. cDNA was hybridized to a NimbleChip array (Roche NimbleGen, Madison, WI, USA) and analyzed with GeneSpring 7.3.1 (Agilent Technologies, Palo Alto, CA, USA). At 4 weeks, before blood pressure rises in SHR rats, the IMPG1 mRNA level in SHR rats was 3.12-fold higher than in WKY rats. No such difference was seen at the other time points, suggesting that IMPG1 may be involved in the early phase of hypertension development. Proteoglycans are a major component of the animal extracellular matrix and are found in many adult tissues, including blood vessels [44] and nervous tissue [45]. They can bind cations, and their synthesis is affected by cation status [46]. IMPG1 expression may therefore modify blood vessel structure and affect the activity and stability of proteins and signaling molecules within the matrix. In-depth functional studies are, however, required to examine how this gene exerts its interactive effects in humans.
In this study, we applied the significance criteria −log10(pFDR) > 3 and −log10(p) > 8 for the genome-wide association and interaction tests to reduce false-positive and false-discovery rates. In addition, only significant SNPs, haplotypes and interactions located in regions of known genes with potential biological implications were further verified in the CMAS. These criteria may have caused us to miss biologically relevant SNPs with relatively small effects. It may therefore be worthwhile to examine more SNPs in the CMAS in the future by relaxing the significance criteria in the GWAS.

For example, we found two SNPs with −log10(p) > 5, neither of which was significant in single-locus association after FDR correction. The first SNP, rs1010330 on chromosome 2, had −log10(p) values of 5.5229 and 6.0000 in the CLR-NOMINAL and CLR-ORDINAL analyses, respectively; no genes have been identified near this SNP. The second SNP, rs864603 on chromosome 21, had −log10(p) values of 5.2218 and 5.5229 in the CLR-NOMINAL and CLR-ORDINAL analyses, respectively. SNP rs864603 is located in SYNJ1, and its 100-kb flanking region also contains C21orf59 and OR7E23P. These two SNPs may be investigated further using additional samples and denser SNP chips.

A second example involves the 20 significant SNP triplets identified in at least one of the GWAS analyses (CLR-NOMINAL or CLR-ORDINAL). Only the three SNP triplets that were significant in both types of analysis were verified. It would also be worthwhile to verify the SNP triplets that were significant in only one analysis, since each regression model has its own genetic interpretation.
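The screening criteria above are expressed on the −log10 scale. The short sketch below illustrates them. It assumes that pFDR denotes a Benjamini-Hochberg adjusted p value. The study may instead have used another FDR procedure (for example, Storey's q values), so the adjustment shown is illustrative only. The helper names are hypothetical. By default, the helper flags markers whose −log10 adjusted p exceeds the cut-off of 3 used here.

```python
import math


def neg_log10(p):
    """Return -log10(p), the scale on which the significance criteria are stated."""
    return -math.log10(p)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def flag_by_fdr(pvals, cutoff=3.0):
    """Indices of markers with -log10(BH-adjusted p) > cutoff."""
    return [i for i, q in enumerate(bh_adjust(pvals)) if neg_log10(q) > cutoff]
```

Applied genome-wide, the adjustment scales each p value by roughly the number of tests divided by its rank. This is why a SNP with −log10(p) just above 5 in a 100K scan can fail the FDR criterion.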
This study could be improved by recruiting more samples and using denser SNP chips. Using CaTS [19], we approximately estimated the power of our two-stage association study. In general, the two-stage design had sufficiently high power to detect SNP loci with a large main effect (e.g., genotype relative risk > 2). However, power was reduced by the relatively small first-stage GWAS sample of 175 case-control pairs. Some small-effect YOH loci may therefore have been missed, despite the larger sample size and higher power of the second-stage CMAS. More samples should be recruited for our next genome-wide scan. Nevertheless, given the reasonable sample size of the CMAS, the YOH-associated genes it confirmed are likely to be real. In addition, this genome-wide study was based on the Affymetrix Human Mapping 100K Set. The results could be improved by using denser SNP chips, such as the Affymetrix 500K/Array 6.0 chips and the Illumina 550K/1M BeadChips. We anticipate that more potential loci will emerge when a denser chip is used with a larger sample.
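CaTS computes power for the joint two-stage analysis. The sketch below is a much cruder, single-stage illustration and is not a re-implementation of CaTS. It approximates the power of a two-sided 1-df allelic test for unmatched cases and controls under a multiplicative model. It assumes Hardy-Weinberg equilibrium and a rare disease, so that the control allele frequency is close to the population frequency. The allele frequency, relative risk and significance level in the example are hypothetical.

```python
import math
from statistics import NormalDist


def case_allele_freq(freq, grr):
    """Risk-allele frequency in cases under a multiplicative model (per-allele relative risk grr)."""
    return freq * grr / (freq * grr + 1.0 - freq)


def allelic_test_power(n_cases, n_controls, freq, grr, alpha):
    """Normal-approximation power of a two-sided allelic test (2N alleles per group)."""
    z = NormalDist()
    p0 = freq  # control allele frequency, rare-disease approximation
    p1 = case_allele_freq(freq, grr)
    se = math.sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(p1 - p0) / se
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical: risk-allele frequency 0.3, relative risk 2, genome-wide alpha 1e-7.
    for n in (175, 1008):
        print(n, round(allelic_test_power(n, n, 0.3, 2.0, 1e-7), 3))
```

Under these assumptions, 175 cases versus 175 controls gives modest power at a genome-wide threshold, whereas 1,008 versus 1,008 gives power close to 1. This is consistent in direction with the limitation described above, though not a substitute for the CaTS estimates.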
YOH is a common disorder with a complex etiology involving biologically important variants with small to moderate effects. Single-locus association tests have limited power to discover this type of common disease variant. We observed this in our study, where no single SNP was identified as significantly associated with YOH. To overcome this difficulty, we performed several multilocus association tests and interaction tests and identified some novel YOH susceptibility genes. This highlights the value of more sophisticated statistical association methods in addition to traditional single-locus tests. Because of their different constructions, the multilocus association tests and interaction analyses detect genes acting under joint and interactive models, respectively.

The p-value combination method we employed (the truncated product of p values) accumulates association signals from proximal SNP markers. It is useful for mapping gene regions that contain multiple SNPs associated with YOH. Haplotype analysis relies on linkage disequilibrium and is more powerful for discovering gene regions containing specific YOH-related haplotypes or haplotype combinations. Interaction analysis is designed to detect combinations of SNPs that act together through pathways or regulatory mechanisms, even when they lie in distant regions or on different chromosomes. These methods provide complementary information for gene mapping. Further methods that handle genetic heterogeneity and complexity should be developed to make full use of genomic information for gene mapping.
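The truncated product method multiplies only the p values at or below a truncation point τ and judges the product W against its null distribution. The sketch below evaluates that null distribution by Monte Carlo. It assumes the combined p values are independent and uniform under the null. For adjacent SNPs in linkage disequilibrium this assumption fails, and permuting case-control labels would be needed instead. The τ = 0.05 default, the seed and the SNP-triplet example are illustrative assumptions, not parameters taken from this study.

```python
import random


def truncated_product(pvals, tau=0.05):
    """W = product of the p values that are <= tau (1.0 if none qualify)."""
    w = 1.0
    for p in pvals:
        if p <= tau:
            w *= p
    return w


def tpm_pvalue(pvals, tau=0.05, n_sim=10000, seed=0):
    """Monte Carlo p value for W, assuming independent Uniform(0, 1) p values under H0."""
    rng = random.Random(seed)
    w_obs = truncated_product(pvals, tau)
    hits = 0
    for _ in range(n_sim):
        w_null = truncated_product([rng.random() for _ in pvals], tau)
        if w_null <= w_obs:  # a smaller W means stronger combined evidence
            hits += 1
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical single-SNP p values for one window of three adjacent SNPs.
    print(tpm_pvalue([2e-4, 0.01, 0.3]))
```

Truncation lets several moderately small p values reinforce one another while ignoring clearly null markers in the same window.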
Replication of novel findings is an important issue in association studies. Our CMAS replicated several SNP loci identified by our GWAS in the same Taiwanese population. Confirming the results in independent populations would further strengthen their scientific credibility [47]. We are pursuing replication in independent populations in two ways. First, we are collaborating with a Hong Kong young-onset hypertension study group; this will test our findings in a Han Chinese population with different lifestyles and environments. Second, we are using the hypertension GWAS data of the Wellcome Trust Case Control Consortium, which will help validate our results in a non-Han Chinese population. In addition to these replication studies, we are conducting a microarray gene expression study to examine mRNA-level transcriptional differences in the identified genes.