Genetics of Plasminogen Activator Inhibitor-1 (PAI-1) in a Ghanaian Population

Plasminogen activator inhibitor 1 (PAI-1), a major modulator of the fibrinolytic system, is an important factor in cardiovascular disease (CVD) susceptibility and severity. PAI-1 is highly heritable, but the few genes associated with it explain only a small portion of its variation. Studies of PAI-1 typically employ linear regression to estimate the effects of genetic variants on PAI-1 levels, but PAI-1 is not normally distributed, even after transformation. Therefore, alternative statistical methods may provide greater power to identify important genetic variants. Additionally, most genetic studies of PAI-1 have been performed in populations of European descent, limiting the generalizability of their results. We analyzed >30,000 variants for association with PAI-1 in a Ghanaian population using median regression, a non-parametric alternative to linear regression. Three variants were associated with median PAI-1, the most significant of which was in the gene arylsulfatase B (ARSB) (p = 1.09 × 10⁻⁷). We also analyzed the upper quartile of PAI-1, the most clinically relevant part of the distribution, and found 19 SNPs significantly associated in this quartile. Of note, one association was found in period circadian clock 3 (PER3). Our results reveal novel associations with median and elevated PAI-1 in an understudied population. The lack of overlap between the two analyses indicates that the genetic effects on PAI-1 are not uniform across its distribution. Our results also provide evidence that the circadian pathway's effect on PAI-1 generalizes across populations, as a recent meta-analysis performed in Caucasian populations identified another circadian clock gene (ARNTL).


Introduction
Cardiovascular disease (CVD) consists of multiple conditions with overlapping environmental and genetic risk factors, symptoms, and disease etiologies. It causes ~48% of all

Study Cohort Description
All participants provided written consent for this study. The study comprised a subset of 1105 unrelated individuals from a cohort recruited in Sunyani, Ghana, between 2002 and 2005 [23]. We examined a subset of the cohort previously assessed in [12,13] (n = 992) together with an additional 113 subjects from above the 90th percentile of the plasma PAI-1 distribution. Ascertainment, DNA collection, and biomarker measurement protocols are described elsewhere [23]. DNA and demographic data, including age, body mass index (BMI), triglycerides, and PAI-1 levels, were collected for all study participants. This study was approved by the Committee for the Protection of Human Subjects at Dartmouth College.

Genotyping Scheme
DNA was genotyped using the Illumina Infinium HumanExome BeadChip (Exome Chip) platform (Illumina, Inc., San Diego, CA). The Exome Chip covers the exonic regions of the genome with approximately 240,000 markers. We supplemented these with an additional 8,439 common variants selected to target genes with prior evidence of association with CVD.

Quality Control Procedures
Participants with genotyping efficiency less than 95% were excluded from further analysis, as were subjects missing demographic and/or biomarker data. After quality control (QC) procedures, 1053 individuals (441 males, 612 females) remained. A total of 39,124 common variants (minor allele frequency ≥ 0.05) from the Exome Chip genotype data were analyzed. QC criteria for the selection of common variants included a genotyping efficiency > 95% and a Hardy-Weinberg equilibrium p-value > 0.001, after which 38,871 variants remained. All QC was carried out with PLINK [24].
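The sample- and variant-level filters described above can be sketched as follows. This is a minimal stand-alone illustration, not the PLINK pipeline that was actually used: it assumes genotypes are coded as 0/1/2 copies of the minor allele with None for a missing call, and all function names are hypothetical. PLINK's Hardy-Weinberg filter uses an exact test, whereas this sketch substitutes the simpler 1-df chi-square approximation.

```python
import math
from typing import List, Optional, Tuple

Call = Optional[int]  # copies of the counted (minor) allele: 0, 1 or 2; None = missing call


def call_rate(calls: List[Call]) -> float:
    """Fraction of non-missing genotype calls (genotyping efficiency)."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls: List[Call]) -> float:
    observed = [c for c in calls if c is not None]
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1.0 - freq)


def hwe_chisq_p(calls: List[Call]) -> float:
    """1-df chi-square test of Hardy-Weinberg equilibrium (an approximation to the exact test)."""
    observed = [c for c in calls if c is not None]
    n = len(observed)
    counts = [observed.count(k) for k in (0, 1, 2)]
    q = (counts[1] + 2 * counts[2]) / (2 * n)  # frequency of the counted allele
    p = 1.0 - q
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def run_qc(genotypes: List[List[Call]], min_sample_call: float = 0.95,
           min_variant_call: float = 0.95, min_maf: float = 0.05,
           min_hwe_p: float = 0.001) -> Tuple[List[int], List[int]]:
    """Return (kept sample indices, kept variant indices); genotypes[i][j] = sample i, variant j."""
    # Samples below 95% genotyping efficiency are dropped first
    samples = [i for i, row in enumerate(genotypes) if call_rate(row) >= min_sample_call]
    variants = []
    for j in range(len(genotypes[0])):
        col = [genotypes[i][j] for i in samples]
        if call_rate(col) <= min_variant_call:
            continue  # variant efficiency must exceed 95%
        if minor_allele_frequency(col) < min_maf:
            continue  # keep common variants only
        if hwe_chisq_p(col) <= min_hwe_p:
            continue  # HWE p-value must exceed 0.001
        variants.append(j)
    return samples, variants
```

Filtering samples before variants mirrors the order in the text; a variant's call rate, MAF, and HWE test are then computed only on the retained individuals.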

Preliminary Analyses
The distributions of demographic and biological variables were assessed separately in males and females to determine whether any significant differences existed between sexes. Normality of continuous traits was evaluated using the Shapiro-Wilk test, with p < 0.05 taken to indicate non-normality. For normally distributed continuous variables, Student's t test was used to assess mean differences between sexes; for non-normally distributed variables, the Wilcoxon rank-sum test was used. For discrete variables, the chi-square test was used. Analyses were performed using STATA 11 [25].
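For the non-normal variables, the rank-sum comparison between sexes can be sketched as below. This is the textbook normal-approximation form with a correction for ties, written for illustration; it is not claimed to reproduce STATA's output digit for digit, the Shapiro-Wilk step is not sketched, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def rank_sum_test(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, float]:
    """Wilcoxon rank-sum test via normal approximation with tie correction; returns (z, two-sided p)."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b, n = len(group_a), len(group_b), len(pooled)
    w = sum(midranks(pooled)[:n_a])  # rank sum of group A
    expected = n_a * (n + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n_a * n_b / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance <= 0:
        return 0.0, 1.0  # every value tied: the test carries no information
    z = (w - expected) / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))
```

In use, group_a and group_b would hold one trait (for example, PAI-1 levels) for males and females respectively; because only ranks enter the statistic, a skewed trait does not distort it.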
To be included in the study, both parents and both sets of grandparents had to be native to Ghana, reducing the likelihood of population stratification. However, as an added precaution, we explicitly tested for substructure using STRUCTURE [26]. This analysis was performed using 8521 variants pruned for LD (r² threshold = 0.5) and present in the JPT+CHB, YRI, and CEU HapMap data. STRUCTURE runs used an admixture model with correlated allele frequencies (burn-in = 10,000; iterations = 10,000) in a supervised analysis with K = 3. The STRUCTURE analysis revealed no significant evidence of population stratification within our dataset (S1 Fig).
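LD pruning of this kind can be sketched as a greedy filter on pairwise r² between genotype dosages. This is a simplified illustration assuming complete (non-missing) 0/1/2 dosages listed in positional order; the window size is a hypothetical parameter, since the text does not state which tool or window settings were used. Dosage-based r² also only approximates the haplotype-based r² that tools such as Haploview estimate.

```python
from typing import List, Sequence


def r_squared(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Pearson correlation of two genotype dosage vectors (0/1/2)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic variant carries no LD information
    return cov * cov / (var_a * var_b)


def greedy_ld_prune(dosages: List[Sequence[float]], threshold: float = 0.5,
                    window: int = 50) -> List[int]:
    """Keep a variant only if its r^2 with each of the last `window` kept variants is below threshold."""
    kept: List[int] = []
    for j, variant in enumerate(dosages):
        recent = kept[-window:]  # hypothetical window: compare against nearby kept variants only
        if all(r_squared(dosages[k], variant) < threshold for k in recent):
            kept.append(j)
    return kept
```

A greedy pass keeps the first variant of each correlated run and discards later proxies, leaving a roughly independent marker set suitable for ancestry inference.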

Median Regression Analysis
The distribution of PAI-1 levels was tested for normality and found to deviate even after log transformation (Shapiro-Wilk test p < 0.001). Therefore, median regression was performed with the quantreg package in R [27]. Regression models were adjusted for age, sex, BMI, triglycerides, and genotype at the PAI-1 4G/5G variant. Triglyceride levels were log-transformed.
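Median regression replaces the squared-error loss of linear regression with the check (absolute-error) loss at τ = 0.5, so a few extreme PAI-1 values cannot pull the fit the way they pull a least-squares line. The sketch below finds the exact intercept and slope for a single predictor by trying every line through two observations, which works because some optimal quantile-regression fit interpolates as many observations as it has parameters. It is an illustration only: it omits the covariates listed above, scales as O(n³), and is not how quantreg computes fits (its rq() function defaults to the Barrodale-Roberts simplex algorithm). Function names and data are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def check_loss(residuals: Sequence[float], tau: float) -> float:
    """Check (pinball) loss; at tau = 0.5 it is half the sum of absolute residuals."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def quantile_line(x: Sequence[float], y: Sequence[float], tau: float = 0.5) -> Tuple[float, float]:
    """Exact (intercept, slope) minimizing the check loss for one predictor.

    Some optimal fit passes through two observations with distinct x values,
    so it suffices to try every such pair. O(n^3): tiny data only.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # the pair does not determine a line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = check_loss([yk - intercept - slope * xk for xk, yk in zip(x, y)], tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct predictor values")
    return best[1], best[2]


# Hypothetical data: four points on y = 1 + 2x plus one extreme value
intercept, slope = quantile_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 100])
```

Here the median fit stays on y = 1 + 2x, while an ordinary least-squares line through the same five points has slope 20.2 because of the single extreme value; with a SNP coded 0/1/2 as the predictor, pairs sharing a genotype are simply skipped.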
Single-variant results were visualized using the qqplots package in R [28]. Because many of the genotyped variants were in moderate to high LD, Bonferroni correction was deemed overly conservative. Therefore, the false discovery rate (FDR) was also used to correct for multiple testing. FDR control is more robust to violations of the assumption of independent tests while still providing moderate control of type I error. Results were considered statistically significant if p < 2.57 × 10⁻⁶ (FDR q = 0.1).
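The contrast between the two corrections can be illustrated with a minimal Benjamini-Hochberg step-up procedure alongside a Bonferroni cutoff. This is the standard BH procedure, not necessarily the exact implementation used in this study; the p-values in the usage lines are invented for illustration and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(pvalues: Sequence[float], q: float = 0.1) -> Tuple[List[int], float]:
    """BH step-up procedure. Returns (indices of discoveries, largest p-value declared significant)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # step-up: remember the largest rank that passes
    discoveries = sorted(order[:k_max])
    cutoff = pvalues[order[k_max - 1]] if k_max else 0.0
    return discoveries, cutoff


def bonferroni(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices with p <= alpha / m."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]


# Invented p-values for ten tests
pvals = [0.5, 0.001, 0.9, 0.041, 0.008, 0.3, 0.039, 0.7, 0.042, 0.8]
bh_hits, bh_cutoff = benjamini_hochberg(pvals, q=0.1)
bonf_hits = bonferroni(pvals, alpha=0.05)
```

With these values BH at q = 0.1 declares five discoveries (p up to 0.042), while Bonferroni keeps only p = 0.001. The p-values 0.039 and 0.041 fail their own rank thresholds but are still included because a larger p-value passes at rank 5; this step-up behavior is why FDR control is less conservative than Bonferroni.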
To test whether median regression was more sensitive than linear regression in detecting linear and non-linear effects, we fitted linear regression models for the SNPs found to be significant with median regression. Linear regression models were adjusted for the covariates above. Analyses were performed with STATA 11 [25].
Additive models were used, with the major allele as referent. In cases where there were fewer than five individuals in a genotype group, SNPs were coded dominantly for the effect of the minor allele, i.e., the homozygous minor and heterozygote genotype groups were combined into one class and compared to the homozygous major genotype.
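The coding rule above can be written as a small helper. This is a sketch assuming each genotype is stored as the number of minor alleles (None when missing); the function name and return format are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def code_snp(calls: Sequence[Optional[int]], min_group: int = 5) -> Tuple[str, List[Optional[int]]]:
    """Additive coding (0/1/2 minor alleles, major homozygote as referent) unless any
    genotype group has fewer than `min_group` individuals; then minor-allele carriers
    (heterozygotes and minor homozygotes) are collapsed into one class (dominant coding)."""
    counts = Counter(c for c in calls if c is not None)
    if all(counts.get(g, 0) >= min_group for g in (0, 1, 2)):
        return "additive", list(calls)
    # 1 = carries at least one minor allele, 0 = homozygous major
    return "dominant", [None if c is None else int(c > 0) for c in calls]
```

Under dominant coding the regression coefficient compares carriers of at least one minor allele with homozygous-major individuals, which avoids estimating an effect from a handful of minor homozygotes.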

Exploratory Upper PAI-1 Quartile Regression
Because the upper extreme of the PAI-1 distribution is associated with clinical outcomes, we performed upper quartile regression to assess the impact of single variants within this target region of the PAI-1 distribution [29,30]. The quantreg package in R was used [27], with the option for robust standard errors. SNPs were coded as described above. For gene regions that contained more than one associated variant, pairwise LD was assessed using Haploview [31].
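Assuming "upper quartile regression" refers to quantile regression at τ = 0.75 (the tau argument of rq() in quantreg), the only change from median regression is an asymmetric weighting of residuals: observations above the fit are weighted by τ and those below by 1 − τ. The intercept-only sketch below shows that minimizing this loss recovers the sample upper quartile; with covariates, the same loss targets covariate-specific 75th percentiles. Function names and data are hypothetical, and the robust standard errors mentioned above are not sketched.

```python
from typing import Sequence


def check_loss(residuals: Sequence[float], tau: float) -> float:
    """Pinball loss: residuals above the fit weighted by tau, those below by (1 - tau)."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def constant_quantile_fit(y: Sequence[float], tau: float = 0.75) -> float:
    """Intercept-only quantile regression; an optimum is always attained at an observed value,
    so it suffices to evaluate the loss at each distinct observation."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: check_loss([v - c for v in y], tau))


upper = constant_quantile_fit([3, 9, 1, 7, 5, 2, 8, 4, 6], tau=0.75)
median = constant_quantile_fit([3, 9, 1, 7, 5, 2, 8, 4, 6], tau=0.5)
```

For the nine values 1 to 9 the τ = 0.75 fit is 7 and the τ = 0.5 fit is 5, the sample upper quartile and median; because a SNP can shift the 75th percentile without shifting the median, the two analyses need not find the same variants.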

Cohort Demographics
There were no significant differences in mean age, triglyceride levels, plasma PAI-1 levels, or distribution of PAI-1 4G/5G variant genotypes between sexes (Table 1). Females had higher BMIs than males (p < 0.001).
We tested the five significant or marginally significant SNPs (p < 10⁻⁵) using standard linear regression models adjusted for the same covariates as above (S1 Table). For each SNP, the effect estimate had the same direction in both models; however, in every model, the standard error reported by linear regression was greater than that reported by median regression. This resulted in wider 95% confidence intervals and larger p-values, indicating that median regression can increase sensitivity in a skewed data set (S1 Table).
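The link between a larger standard error and wider intervals and larger p-values follows directly from the Wald statistic β/SE. The sketch below uses a normal approximation, which is reasonable at n ≈ 1000 (STATA's linear regression reports t-based values); the β and SE values are hypothetical, not taken from S1 Table.

```python
import math
from typing import Tuple

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_summary(beta: float, se: float) -> Tuple[Tuple[float, float], float]:
    """95% confidence interval and two-sided p-value under a normal approximation."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return (beta - Z_975 * se, beta + Z_975 * se), p


# Same hypothetical effect estimate, two different standard errors
(ci_low_a, ci_high_a), p_a = wald_summary(0.20, 0.05)  # z = 4.0
(ci_low_b, ci_high_b), p_b = wald_summary(0.20, 0.08)  # z = 2.5
```

Holding β fixed, raising the SE from 0.05 to 0.08 moves z from 4.0 to 2.5 and the p-value from roughly 6 × 10⁻⁵ to roughly 0.012, while the interval widens in proportion to the SE.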

Quartile Regression Analyses
Quantile regression analyses were performed on the upper quartile of the PAI-1 distribution. Nineteen variants were significant after correction for multiple testing with FDR (Table 3, Fig 2). The most significant association in the upper quartile of the PAI-1 distribution was observed for rs4755779 on chromosome 11 (p = 1.44 × 10⁻¹⁰), while the largest negative and positive effects were observed for rs10462021 (β = -0.434) on chromosome 1 and rs116307792 (β = 0.249) on chromosome 3, respectively (Table 3). Of note, a 72.6 kb region on chromosome 11 containing both the pleckstrin homology-like domain, family B, member 1 (PHLDB1) and trehalase (TREH) genes (the PHLDB1/TREH gene region) harbored three SNPs that were significantly associated in the upper quartile, two of which (rs7389 and rs519982) remained significant after Bonferroni correction.

Discussion
Susceptibility to major thrombotic events is increased by unbalanced or impaired fibrinolysis, which is heavily influenced by variation in PAI-1 levels. We identified three novel variants that were significantly associated with median PAI-1. We further hypothesized that the effects of genetic variants on PAI-1 are non-uniform across its distribution, and tested this hypothesis by investigating the impact of common variants on the clinically relevant upper quartile. We found 19 SNPs that were significantly associated with PAI-1 levels in the upper quartile, including one region that harbored multiple associated variants. Our study not only revealed novel associations with PAI-1 levels but also provided the first evidence, in an African population, of quartile-specific genetic effects on PAI-1 levels.

Median regression
Of the three SNPs that were associated with median PAI-1, rs1071598 was the most significant. Located within the fourth exon of ARSB, rs1071598 is responsible for a valine-to-methionine amino acid change at position 376 (V376M) that is classified as "probably benign" with respect to its effect on ARSB protein function [32]. Although there is no strong evidence that this SNP affects ARSB protein function, the V376M substitution may affect structural stability: valine-to-methionine substitutions are predicted to cause over-packing of protein cores because methionine is a larger amino acid than valine [37]. According to SNPinfo, rs1071598 is also located within two base pairs of a putative exonic splicing enhancer motif, potentially affecting the relative frequency of splice variants. ARSB has been implicated in reactive oxygen species (ROS) production and the activation of ROS-mediated inflammatory cascades [38]. ARSB can also both replicate and mediate the effects of hypoxia in human tissue [39]. PAI-1 was recently identified as a hypoxia-inducible gene and has long been established as a biomarker of inflammation [40]. The shared involvement of ARSB and PAI-1 in inflammatory responses presents a potential link between PAI-1 levels and genetic variants in ARSB.
Another SNP, rs61997065, located in the only exon of LENG9, has an effect similar to that of the ARSB SNP. It causes a histidine-to-arginine substitution (H153R) predicted to be benign with respect to protein function. LENG9 is a member of the leukocyte receptor complex (LRC), an extended gene region on chromosome 19 that encodes immunoglobulin superfamily receptors [41]. Although LENG9 has been mapped to the LRC, its function is unknown.
The only SNP that was associated with increased PAI-1 levels was rs61997065, located in CPA2, which causes a valine-to-isoleucine substitution (V67I). This SNP is proximal to a predicted exonic splicing enhancer motif, suggesting a possible functional role [32]. CPA2 is a digestive exopeptidase found primarily in the pancreas but also expressed in the brain in both humans and rats [42,43]. Previous studies suggested a possible regulatory role of extrapancreatic CPA2 in the renin-angiotensin system (RAS) via differential processing of angiotensin I [44,45]. Multiple studies link the RAS and the fibrinolytic system [46,47]. Additionally, genetic variants in the RAS have previously been associated with mean PAI-1 levels in both Caucasian and African populations [12,48].

Upper quartile regression
Upper quartile regression analyses identified 19 associated variants. Of particular note were 1) two non-synonymous SNPs located in genes with a plausible connection to PAI-1, rs4755779 in EXT2 and rs10462021 in PER3, and 2) three SNPs located in the PHLDB1/TREH gene region on chromosome 11.
The EXT2 SNP, rs4755779, is a missense variant that causes a methionine-to-valine substitution (M42V), predicted to be benign with respect to protein function. EXT2 encodes a protein involved in heparan sulfate biosynthesis and has been associated with hereditary multiple exostoses and type 2 diabetes [49,50]. A plausible biological connection exists between EXT2 and PAI-1 via heparin-binding growth factors (HBGFs), which bind heparan sulfate. HBGFs have been implicated in the modulation of PAI-1 expression; in particular, HBGF-1 inhibits PAI-1 expression in human umbilical vein endothelial cells [51].
An associated missense variant in PER3, rs10462021, is responsible for a histidine-to-arginine substitution (H1139R) and is predicted to affect protein function, although the nature of this effect is unclear. PER3 is a member of the circadian rhythm pathway that affects inflammatory responses by increasing the secretion of pro-inflammatory cytokines [52]. Previous studies in model organisms have also reported an association between PER3 and susceptibility to CVD; PER3 knockout mice showed increased susceptibility to arteriosclerotic disease [53]. The identification of rs10462021 in PER3 is particularly noteworthy because variants in another prominent member of the circadian rhythm pathway, the aryl hydrocarbon receptor nuclear translocator-like gene (ARNTL), were associated with PAI-1 levels in a recent meta-analysis performed in Caucasians [11]. PER3 and ARNTL are major regulators of the circadian clock mechanism, a transcriptional timing apparatus governed by multiple positive and negative feedback loops [54]. ARNTL forms a heterodimer with CLOCK, which drives transcription of the PER and CRY gene families. PER and CRY proteins then heterodimerize to form a complex that inhibits the ARNTL/CLOCK complex, creating a negative feedback loop [54] (Fig 3). The interaction between the PER3/CRY and ARNTL/CLOCK heterodimers is notable because there is substantial evidence that ARNTL/CLOCK activates the PAI-1 promoter and increases PAI-1 expression [55,56].
The specific effects of PER3 and ARNTL on PAI-1 variation may be population specific, but the involvement of the circadian rhythm pathway appears to be generalizable. Differences in allele and genotype frequencies at the PER3 variant rs10462021, as reported in a study comparing world-wide populations [58], may be partly responsible for a population-specific effect. Allele frequency differences between populations of African and European descent may affect the ability to detect or replicate the effects of this variant. However, the association of multiple circadian clock genes with PAI-1 indicates the importance of the pathway, despite possibly variable effects of specific genes.
We also identified a 72.6 kb region on chromosome 11, containing two genes, PHLDB1 and TREH, with multiple associated variants. Of the three variants identified in this region, two (rs519982 and rs7389) passed Bonferroni correction. All three SNPs were in high LD with each other (0.94 < r² < 0.97), indicating that they likely represent a single association signal, which makes it difficult to predict which variant is functional. However, we can speculate based on the putative functions of the individual SNPs. SNP rs519982 is located in a region predicted to contain a transcription factor binding motif 14.9 kb upstream of the TREH start codon, and this location near the TREH gene boundary may have functional implications. SNP rs7389 is located in the 3' UTR of PHLDB1 and is predicted to affect a microRNA (miRNA) binding site; miRNA binding can inhibit protein translation [59].
Our second most significant association, rs6713972, is located in pleckstrin homology domain containing family B member 2 (PLEKHB2), which belongs to the same gene family as PHLDB1. Deficiency in another member of the pleckstrin homology domain-containing gene family, pleckstrin homology-like domain, family A, member 1 (PHLDA1), has been shown to protect against atherosclerosis in mice through regulation of cholesterol efflux, apoptosis, and peroxiredoxin-1 expression [60]. Additionally, like PAI-1, TREH is a stress response gene that has been associated with susceptibility to type 2 diabetes [61,62].
Median regression analyses revealed novel variants associated with PAI-1 levels that would not have been detected with linear regression. While linear regression may be appropriate for studies with extremely large sample sizes, in studies with modest sample sizes, such as ours, relying on "standard" linear analyses of a skewed trait can substantially reduce the power to detect associations.
Extending our analyses to include upper quartile regression allowed us to gain additional knowledge about the differential impact of genetic variants in this clinically significant portion of the PAI-1 distribution. Elevated PAI-1 levels are associated with increased susceptibility to CVD and, in some cases, with disease severity [5,63–65]. Knowledge of the genetic variants influencing PAI-1 levels at the upper end of the distribution may aid the development of targeted therapies that, although not relevant to the general population, could have a significant impact on a subset of the population already at increased risk of CVD.
Supporting Information
S1 Table. Corresponding Standard Linear Regression Results for SNPs found to be significantly associated with Plasminogen Activator Inhibitor-1 (PAI-1) levels by Median Regression. (DOCX)
S2 Table. Hardy-Weinberg Equilibrium Estimates and allele frequencies of SNPs significantly associated with Median Plasminogen Activator Inhibitor-1 (PAI-1) levels.