Association of Single Nucleotide Polymorphisms in the ST3GAL4 Gene with VWF Antigen and Factor VIII Activity

VWF is extensively glycosylated with biantennary core fucosylated glycans. Most N-linked and O-linked glycans on VWF are sialylated. FVIII is also glycosylated, with a glycan structure similar to that of VWF. ST3GAL sialyltransferases catalyze the transfer of sialic acids in the α2,3 linkage to termini of N- and O-glycans. This sialic acid modification is critical for VWF synthesis and activity. We analyzed genetic and phenotypic data from the Atherosclerosis Risk in Communities (ARIC) study for the association of single nucleotide polymorphisms (SNPs) in the ST3GAL4 gene with plasma VWF levels and FVIII activity in 12,117 subjects. We also analyzed ST3GAL4 SNPs found in 2,535 subjects of 26 ethnicities from the 1000 Genomes (1000G) project for ethnic diversity, SNP imputation, and ST3GAL4 haplotypes. We identified 14 and 1,714 ST3GAL4 variants in the ARIC GWAS and 1000G databases respectively, with 46% being ethnically diverse in their allele frequencies. Among the 14 ST3GAL4 SNPs found in ARIC GWAS, the intronic rs2186717, rs7928391, and rs11220465 were associated with VWF levels and with FVIII activity after adjustment for age, BMI, hypertension, diabetes, ever-smoking status, and ABO. This study illustrates the power of next-generation sequencing in the discovery of new genetic variants and a significant ethnic diversity in the ST3GAL4 gene. We discuss potential mechanisms through which these intronic SNPs regulate ST3GAL4 biosynthesis and the activity that affects VWF and FVIII.


Introduction
Von Willebrand factor (VWF) in the subendothelium tethers platelets to the site of vascular injury to initiate hemostasis, and protects coagulation factor VIII (FVIII) from enzymatic degradation [1;2]. VWF also contributes to thrombosis at the site of a ruptured atherosclerotic plaque and to platelet aggregation induced by high fluid shear stress in the area of severe vascular stenosis [3]. VWF and FVIII are synthesized primarily in endothelial cells [4][5][6][7][8]. Baseline levels of VWF and FVIII vary considerably among individuals and are regulated by genetic and environmental factors, including carbohydrate structures on the two molecules.
VWF and FVIII are extensively glycosylated. Each VWF monomer contains 13 potential N-linked and 10 O-linked glycosylation sites, with 4 additional glycosylation sites in the propeptide [9]. Together, the carbohydrates account for ~20% of the molecular mass of a VWF monomer [10]. Complex-type biantennary core fucosylated glycans represent ~60% of the glycans on VWF, as compared to 13% represented by ABO-related glycans [11][12][13]. Most of the N-linked and O-linked glycans on VWF are sialylated [12;14;15]. Enzymatically desialylated VWF is more adhesive [16;17], has an altered rate of cleavage by the metalloprotease ADAMTS-13 [18], and is rapidly cleared from the circulation through an asialoglycoprotein receptor [19]. Hypo-sialylated VWF is detected in the plasma of patients with pre-capillary pulmonary hypertension and those exposed to sialidase following microbial infection [19;20]. FVIII is also glycosylated, with complex-type biantennary core fucosylated oligosaccharides, of which 80 to >90% carry at least one sialic acid [21][22][23]. However, the functional importance of FVIII sialylation remains poorly understood.
Golgi-resident sialyltransferases of the ST3GAL family are type II membrane enzymes that catalyze the transfer of sialic acids in the α2,3 linkage to termini of N- and O-glycan chains. Six genes encoding these sialyltransferases have been identified in the mammalian genome (ST3GAL1-6) [24]. Inactivating the murine St3gal4 sialyltransferase gene results in bleeding associated with an autosomal-dominant reduction in plasma VWF levels, with a minimal impact on VWF multimeric structures [25]. The reactivity of the β-Gal-binding lectin RCA-I to VWF is increased in plasma from St3gal4-null mice because of an increase in the exposure of sub-terminal β-linked Gal on glycan branches [25]. Similar results have been reported in rabbits and humans [26], indicating an important role of ST3GAL4 sialyltransferase in the biogenesis and survival of VWF and potentially of FVIII. The human ST3GAL4 gene is located at q23.3-q24 of chromosome 11 [27], a locus that has been associated with the development of coronary artery disease [28]. The gene spans more than 65 kilobases (kb), with 14 exons ranging from 61 to 679 nucleotides [27]. There are 9 alternatively spliced transcripts in the coding region of the human ST3GAL4 gene, with tissue-specific patterns of expression [29]. The ST3GAL4 RNA is widely expressed in cells and tissues, including megakaryocytes and platelets, with the highest levels being found in the small intestine and colon [25]. Variations in the human ST3GAL4 gene remain undefined, and their influence on VWF, FVIII, and other sialylated proteins is not known.
We analyzed genetic and phenotypic data from the Atherosclerosis Risk in Communities (ARIC) study for the association of single-nucleotide polymorphisms (SNPs) in the human ST3GAL4 gene with the plasma level of VWF and with activity of FVIII in 12,117 subjects. We also analyzed ST3GAL4 SNPs from 2,535 subjects of 26 ethnicities from the 1000 Genomes (1000G) database, which has been interrogated with next-generation sequencing (NGS) technology for ethnic diversity, SNP imputation and construction of ST3GAL4 haplotypes.

Study population
ARIC (https://www2.cscc.unc.edu/aric/desc) is an ongoing prospective cohort study designed to assess subclinical atherosclerosis and clinical atherosclerotic events [30]. Baseline samples were collected from 1987 to 1989 from 15,792 adults aged 45 to 64 who were selected using probability sampling from Forsyth County, North Carolina; Jackson, Mississippi; the northwestern suburbs of Minneapolis, Minnesota; and Washington County, Maryland.
The 1000G (http://www.1000genomes.org/) project is designed to identify genetic variants that have frequencies of at least 1% in multi-ethnic populations using NGS. We analyzed ST3GAL4 variants from the April 2012 Integrated Variant Set release of the 1000G project (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.gz). The data were collected from 2,535 non-diseased subjects of 26 ethnicities from four continents. Exonic regions of the genomes were sequenced at a high coverage rate (average >20X), and the whole genome was shotgun-sequenced at a low coverage rate (2-6X). The false discovery rate was estimated at 1.6% for exonic SNPs, 1.8% for non-coding SNPs, and <5% for insertion-deletions (indels) [31].

VWF antigen and FVIII activity
VWF antigen was determined with a commercial ELISA kit from American Bioproducts (Parsippany, NJ) and reported as a percentage of the Universal Coagulation Reference Plasma (Thromboscreen, Pacific Hemostasis, Curtin Matheson Scientific, Inc, Wooddale, IL) [32]. The VWF measurement was taken during the first visits of ARIC subjects when they were recruited, from 1987 to 1989, and is therefore not expressed in the international units widely used today. FVIII activity was measured using a commercial kit (George King Biomedical Inc., Overland Park, KS), defined as the ability of a testing plasma sample to correct the clotting time of human FVIII-deficient plasma, and reported as a percentage of normal plasma [33]. The reliability coefficient (one minus the ratio of intra-individual variance to total variance), obtained from repeated tests on the same individuals over several weeks, was 0.68 for VWF and 0.86 for FVIII [32;33]. The data were adjusted for covariates that are known to influence VWF and FVIII [33][34][35], including age, race, gender, body mass index (BMI), hypertension, diabetes, ever-smoking status, and ABO genotype.
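The reliability coefficient described above can be illustrated with a minimal sketch. The function name, the input layout, and the choice of variance estimators (average per-subject sample variance for the intra-individual term, pooled sample variance for the total) are assumptions made for illustration; the cited reports may have used a different estimator.

```python
from statistics import mean, variance


def reliability_coefficient(repeats_by_subject):
    """Return 1 - (intra-individual variance / total variance).

    repeats_by_subject: list of lists, one inner list of repeated
    measurements per subject (each with at least two values).
    The estimators used here are illustrative assumptions.
    """
    # Intra-individual variance: average of each subject's sample variance.
    intra = mean(variance(values) for values in repeats_by_subject)
    # Total variance: sample variance of all measurements pooled together.
    pooled = [v for values in repeats_by_subject for v in values]
    total = variance(pooled)
    return 1.0 - intra / total


# Hypothetical repeated VWF readings (% of reference plasma) for three subjects.
example = [[100, 102], [150, 148], [80, 82]]
r = reliability_coefficient(example)
```

A value close to 1 means that most of the observed variance reflects differences between individuals rather than fluctuation within an individual.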

Data analysis
All data were analyzed using SAS Proc LIFETEST or SAS Proc PHREG. The association of ST3GAL4 SNPs with VWF antigen, FVIII activity, and FVIII-VWF ratio was evaluated using one-way ANOVA to test for mean differences among the three genotypes of each SNP, both overall and within each of the four groups defined by race and gender (EA male, EA female, AA male, and AA female). A multiple linear regression model was used to define the relationship between genotype and outcome before and after adjustment for the covariates. We also tested the ABO blood type by SNP interaction in the linear model to determine whether the association between SNP and outcome differs significantly by blood type, as the blood types correspond to different carbohydrate structures on the H antigen. With a Bonferroni adjustment for the number of SNPs tested, p < 0.0038 was considered statistically significant for the association of a given genotype with VWF antigen and FVIII activity. A p-value between 0.0038 and 0.05 was considered nominally significant.
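The significance thresholds above can be sketched as follows. The sketch assumes that 13 SNPs entered the correction (the 14 ARIC SNPs minus rs629882, which was removed after Hardy-Weinberg testing), since 0.05 / 13 ≈ 0.0038; the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def classify_p(p, alpha=0.05, n_tests=13):
    """Label a p-value using the two-tier rule described in the text.

    n_tests=13 is an assumption (14 SNPs minus the one excluded SNP).
    """
    threshold = bonferroni_threshold(alpha, n_tests)
    if p < threshold:
        return "significant"
    if p < alpha:
        return "nominal"
    return "not significant"


threshold = bonferroni_threshold(0.05, 13)  # about 0.0038
```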
For each SNP, we also fitted an additive model to test whether the effect scaled with the number of minor alleles. We determined the allelic dosing effect by assessing whether two copies of the allele had twice the effect of one copy ("additive"), a larger effect than twice that of one copy ("more than additive"), or a smaller effect than twice that of one copy ("below additive"). Finally, we used HaploReg (http://www.broadinstitute.org/mammals/haploreg/haploreg.php) to predict the effect of 13 non-coding variants in intron 1 of the ST3GAL4 gene on the regulatory regions of the gene.
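The allelic dosing comparison can be illustrated with a short sketch that contrasts the effect of two minor alleles with twice the effect of one. The use of genotype group means, the function name, and the 10% tolerance for calling an effect "additive" are assumptions for illustration only; the text does not state how closeness to the additive prediction was judged.

```python
def dosing_effect(mean_0, mean_1, mean_2, tolerance=0.1):
    """Classify the allelic dosing effect from genotype group means.

    mean_0, mean_1, mean_2: mean outcome for carriers of 0, 1, and 2
    copies of the minor allele. tolerance is a hypothetical cut-off
    on the ratio of observed to predicted two-copy effect.
    """
    one_copy = mean_1 - mean_0
    two_copies = mean_2 - mean_0
    if one_copy == 0:
        raise ValueError("one-copy effect is zero; ratio is undefined")
    # Ratio of the observed two-copy effect to the additive prediction.
    ratio = two_copies / (2 * one_copy)
    if abs(ratio - 1) <= tolerance:
        return "additive"
    if ratio > 1:
        return "more than additive"
    return "below additive"
```

For example, hypothetical group means of 100, 105, and 110 would be classified as additive, whereas 100, 105, and 120 would be classified as more than additive.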

Subjects included in the study
Of the 15,792 ARIC participants, 3,675 were excluded from analysis because they (1) were neither European American (EA) nor African American (AA) (n = 48), (2) did not give consent for genetic studies (n = 45), or (3) randomly lacked the following data: FVIII activity, VWF antigen, or both (n = 280); ST3GAL4 SNPs (n = 2,607); or ABO genotypes (n = 640). The final analysis included 12,117 subjects. Table 1 reports the observed means for VWF, FVIII, and the FVIII-VWF ratio by race and sex group for these subjects. Consistent with previous reports [35;36], VWF levels and FVIII activities varied significantly among the four race-by-gender groups. AA subjects had significantly higher VWF levels and FVIII activity than EA subjects (p < 0.001 and p < 0.003, respectively). Females had slightly higher VWF levels and FVIII activity than males. The ABO blood groups for the race-by-sex groups are listed in S1 Table.
Fourteen ST3GAL4 SNPs were identified in the ARIC GWAS database (Table 2). Among these, the SNPs rs2186717 and rs7928391 were completely linked (S3 Table). All 14 SNPs were genotyped in EA subjects, but only 4 SNPs were genotyped in AA subjects, and the rest were imputed. Thirteen of these SNPs (92.9%) were intronic, located in the first intron, and rs2298475 was exonic, located in exon 5.
To increase the number of ST3GAL4 SNPs for association studies and to investigate their ethnic diversity, we also examined the 1000G database, which includes genotype data from 2,535 subjects of 26 ethnicities sequenced with NGS [37]. Studying ethnic diversity may be critical for assessing the causal effect of a given genetic variant, as several studies [38;39], including our own [40], have shown that some known VWF mutations associated with the bleeding disorder von Willebrand disease in EA subjects have minor allele frequencies of 10-20% in AA subjects, making them unlikely to cause the disease in African Americans. The findings suggest that the functional influence of a given variant can be enhanced or reduced by its association with specific haplotypes that are defined by ethnically diverse variations. We identified 1,714 variants in the ST3GAL4 gene, including 986 novel variants whose identifiers were yet to be assigned by dbSNP. The locations of these SNPs in the ST3GAL4 gene are shown in Fig 1. Eleven of the 14 ST3GAL4 SNPs from the ARIC GWAS database were found in the 1000G database. Table 3 lists allele frequencies of these 11 SNPs among 10 representative ethnicities from America, Asia, Europe, and Africa, with 5 of them (46%) having significant ethnic diversity (>10-fold difference in allele frequencies among ethnicities, marked in grey).
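The >10-fold criterion for ethnic diversity can be sketched as a comparison of the highest and lowest allele frequencies across populations. The function names, the population labels, the example frequencies, and the handling of a zero frequency are all hypothetical and are not taken from Table 3.

```python
def fold_difference(freqs):
    """Ratio of the highest to the lowest allele frequency.

    freqs: dict mapping a population label to an allele frequency.
    Treating a zero minimum as an infinite fold difference is an
    assumption made for this sketch.
    """
    values = list(freqs.values())
    highest, lowest = max(values), min(values)
    if lowest == 0:
        return float("inf") if highest > 0 else 1.0
    return highest / lowest


def is_ethnically_diverse(freqs, min_fold=10.0):
    """Apply the >10-fold rule described in the text."""
    return fold_difference(freqs) > min_fold


# Illustrative (made-up) frequencies for one SNP in three populations.
example = {"POP_A": 0.02, "POP_B": 0.20, "POP_C": 0.35}
diverse = is_ethnically_diverse(example)  # 0.35 / 0.02 = 17.5-fold
```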
Haplotypes of the ST3GAL4 SNPs were constructed separately for EA and AA subjects (Fig 2). The SNP rs629882 was removed from subsequent analyses because it failed Hardy-Weinberg equilibrium testing (p-value = 5.19 × 10⁻⁶). We also used 1000G phased genotype data for European (EUR, n = 379) and African subjects (AFR, n = 246) for genotype imputations. The concordance rates defined by variations of alternative allele frequency between known  Table) and EA (S5 Table) subjects, respectively, indicating a high imputation accuracy.
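Hardy-Weinberg equilibrium testing of the kind mentioned above can be illustrated with a one-degree-of-freedom chi-square test on genotype counts. This is a minimal sketch under the assumption that a chi-square test is acceptable; the text does not state which test was used, and the function name and example counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium (1 degree of freedom).

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts in exact equilibrium give a statistic of zero.
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)
# A complete absence of heterozygotes gives a very small p-value.
chi2_bad, p_bad = hwe_chi_square(50, 0, 50)
```

A SNP whose p-value falls below a chosen cut-off would be flagged and excluded, as was done for rs629882.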

Association of ST3GAL4 SNPs with VWF antigen
Eleven ST3GAL4 SNPs were significantly associated with the plasma level of VWF antigen before adjustment for age, BMI, hypertension, diabetes, ever-smoking status, and ABO (Table 4). The association remained for two completely linked SNPs (rs2186717 and rs7928391) after adjustment for these covariates. The SNPs rs11220465 and rs4601794 were associated with VWF levels with nominal significance. The stratified analyses further showed that only EA females had a statistically significant difference in mean VWF antigen levels across genotypes for the two linked SNPs. Multiple linear regression modeling of the ABO by SNP interaction suggested that the association between ST3GAL4 SNPs and VWF was not modified by ABO blood groups.

Association of ST3GAL4 SNPs with FVIII activity
We also identified 11 SNPs that were significantly associated with FVIII activity across genotypes before adjustment for the covariates (Table 5). The association with FVIII remained after adjustment for the covariates for rs2186717, rs7928391, and rs11220465, which were also associated with VWF after the adjustments. In addition, rs4601794 again showed a nominally significant association with FVIII. ABO was not adjusted for because it has a very weak influence on FVIII [41;42]. None of the ST3GAL4 SNPs had a significant association with the FVIII-VWF ratio across genotypes (data not shown), but the p-value for rs2186717 and rs7928391 was 0.007 before adjustment for environmental covariates. We also measured allelic additive effects of these SNPs for VWF antigen and FVIII activity by counting the number of minor alleles for each SNP. We identified several SNPs that were additive (Table 6). In addition, we identified SNPs that were "more than additive" or "below additive", which indicates that the effect of two copies was greater or less, respectively, than would be predicted by twice the effect of a single copy.

Discussion
We analyzed the ARIC database for association of ST3GAL4 SNPs with VWF antigen and FVIII activity. We also examined the 1000G database for genotype imputation and for ethnic diversity among ST3GAL4 SNPs. The rationale for the study was: (1) the ST3GAL4 gene encodes a sialyltransferase that adds sialic acids to termini of N- and O-glycan chains on a glycoprotein backbone, (2) VWF and FVIII contain biantennary core fucosylated glycans [11][12][13] and are modified by sialyltransferases [12;14;15], (3) removing sialic acids from VWF alters its biosynthesis and adhesive activity [43;44], and (4) inactivating the ST3GAL4 gene results in a reduction in circulating VWF antigen in mice due to accelerated clearance [25]. In this association study of two large adult samples, we made several novel observations. First, 14 and 1,714 ST3GAL4 SNPs were identified in the ARIC GWAS and 1000G databases, respectively. The finding highlights the power of NGS for discovering new genetic variants. The density of variants in the ~65 kb ST3GAL4 gene is consistent with that of the whole genome. However, the first intron, which contains approximately 75% of the nucleotides in the human ST3GAL4 gene [27], contains all but one of the 14 SNPs found in the ARIC GWAS (Table 2) and an overwhelming majority of the SNPs found in the 1000G database (Fig 1). The large number of ST3GAL4 variants found in 1000G permitted the construction of specific ST3GAL4 haplotypes and allowed imputation to increase the detectable variants in the ARIC GWAS database. The study lays the foundations for studying more variations in the ST3GAL4 gene and their associations with VWF biology and disease states when more data become available from ARIC exome and whole-genome sequencing. Second, the SNPs rs2186717, rs7928391, and rs11220465 (the first two are perfectly linked) were associated with VWF levels and FVIII activity after adjustment for environmental covariates (Tables 4 & 5).
The intronic SNPs rs2186717, rs7928391, and rs11220465 are clustered in a region of less than 4000 bp in the first intron (<2% of the human ST3GAL4 gene). The SNP rs4601794 shows a weak association, but it is close to the three clustered SNPs, so that their effects, while minor individually, can be additive with or enhanced by other SNPs in specific haplotypes. In fact, this additive effect was found in 5 SNPs for VWF antigen and 6 SNPs for FVIII activity (Table 6). None of the SNPs is in a known intron-exon junction, so they are unlikely to affect RNA splicing. However, they may regulate ST3GAL4 biosynthesis and activity through other mechanisms. For example, rs2186717 is in a GT-rich sequence that is prone to homologous recombination [45] and is a preferred sequence for binding human DNA strand exchange protein, which is involved in the recombination process [46]. Because of its location in the 5'-untranslated sequence [27], the first intron may also contain binding sites for transcription factors. A GT-rich sequence has indeed been reported to be the substrate for the transcription factor Sp1 [47], which is reported to be important for transcribing the ST3GAL1 gene [48]. Similarly, the transcriptional regulation of the human ST3GAL4 gene results in two mRNA species defined by transcription-factor binding to sites in the non-coding first exon and first intron [49]. Finally, using the in silico program HaploReg (http://www.broadinstitute.org/mammals/haploreg/haploreg.php), we were able to predict the effect of 13 non-coding variants in intron 1 of the ST3GAL4 gene as regulatory SNPs (SNP effect on regulatory motifs, Table 7). These 13 variants can potentially modify at least one type of regulatory motif, suggesting that they are likely to have regulatory functions in haplotype blocks.
The genotype and phenotype associations identified in this study suggest that these SNPs influence the biogenesis and enzymatic activity of ST3GAL4 sialyltransferase, possibly causing differential sialylation on VWF and FVIII. Both VWF and FVIII are major sialylated proteins in the circulation, and their clearance is partly regulated by asialoglycoprotein receptor-mediated endocytosis [19;25;50]. In addition, ST3GAL4 SNPs may affect the synthesis and survival of FVIII indirectly by regulating VWF, which forms a protective complex with FVIII in the circulation. As shown in Tables 4 & 5, the impacts of the individual correlative SNPs on VWF antigen and FVIII activity are small, but are probably additive because they are closely clustered. The small effect of the ST3GAL4 SNPs may also be attributed to the compensatory activity of other homologous sialyltransferases and their alternatively spliced forms.
In summary, we have identified novel SNPs in the ST3GAL4 gene that are associated with VWF levels and FVIII activity. Although predominantly intronic, these SNPs may influence the synthesis and activity of ST3GAL4 sialyltransferase through different pathways. This association study lays the foundation for biological experiments to determine how these SNPs affect the expression and activity of the ST3GAL4 sialyltransferase. It will also be helpful to the study of variations in other genes in this family of sialyltransferases (ST3GAL1-6) [24] and their influence on the biogenesis and survival of VWF, FVIII, and other sialylated proteins.

Table 6. Allelic additive effects of the ST3GAL4 SNPs for VWF and FVIII. (Columns: SNP; Additive for VWF; Additive for FVIII.)
Supporting Information S1
Formal analysis: CX JSP GL DC FY.