Genetics of the thrombomodulin-endothelial cell protein C receptor system and the risk of early-onset ischemic stroke

Background and purpose: Polymorphisms in coagulation genes have been associated with early-onset ischemic stroke. Here we pursue an a priori hypothesis that genetic variation in the endothelial-based receptors of the thrombomodulin−protein C system (THBD and PROCR) may similarly be associated with early-onset ischemic stroke. We explored this hypothesis using a multi-stage design of discovery and replication. Methods: Discovery was performed in the Genetics of Early Onset Stroke (GEOS) Study, a biracial population-based case-control study of ischemic stroke among men and women aged 15–49 years, including 829 cases of first ischemic stroke (42.2% African-American) and 850 age-comparable stroke-free controls (38.1% African-American). Twenty-four single-nucleotide polymorphisms (SNPs) in THBD and 22 SNPs in PROCR were evaluated. Following LD pruning (r² ≥ 0.8), we advanced uncorrelated SNPs to association analyses. Associated SNPs were evaluated for replication in an early-onset ischemic stroke population (onset age < 60 years) consisting of 3676 cases and 21118 non-stroke controls from 6 case–control studies. Lastly, we determined whether the replicated SNPs were also associated with older-onset ischemic stroke in the METASTROKE database. Results: Among GEOS Caucasians, PROCR rs9574, which was in strong LD with 8 other SNPs, and one additional independent SNP, rs2069951, were significantly associated with ischemic stroke (rs9574, OR = 1.33, p = 0.003; rs2069951, OR = 1.80, p = 0.006) using an additive model adjusting for age, gender and population structure. Adjusting for risk factors did not change the associations; however, associations were strengthened among those without risk factors. PROCR rs9574 was also associated with early-onset ischemic stroke in the replication sample (OR = 1.08, p = 0.015), but not with older-onset stroke. There were no PROCR associations in African-Americans, nor were there any THBD associations in either ethnicity.
Conclusion: PROCR polymorphisms are associated with early-onset ischemic stroke in Caucasians.

Funding: This study was supported in part by NIH grants U01 NS069208, R01 NS100178, and R01 NS105150; an Epidemiology of Aging Training Program Grant, NIH/NIA T32 AG000262; the U.S. Department of Veterans Affairs, and the American Heart Association Cardiovascular Genome-Phenome Study (grant# 15GPSPG23770000), and an American Heart Association Discovery Grant supported by Bayer Group (grant# 17IBDG33700328). Further details regarding the data collection, organization, funding and relationships between METASTROKE and the other studies involved can be found below. Genetics of Early Onset Stroke (GEOS) Study (Baltimore, USA): GWAS data for the GEOS Study was supported by the National Institutes of Health Genes, Environment and Health Initiative (GEI) grant U01 HG004436, as part of the GENEVA consortium under GEI, with additional support provided by the Mid-Atlantic Nutrition and Obesity Research Center (P30 DK072488); and the Office of Research and Development, Medical Research Service, and the Baltimore Geriatrics Research, Education, and Clinical Center of the Department of Veterans Affairs. Genotyping services were provided by the Johns Hopkins University Center for Inherited Disease Research (CIDR), which is fully funded through a federal contract from the National Institutes of Health to the Johns Hopkins University (contract number HHSN268200782096C). Assistance with data cleaning was provided by the GENEVA Coordinating Center (U01 HG 004446; PI Bruce S Weir). Study recruitment and collection of datasets were supported by a cooperative agreement with the Division of Adult and Community Health, Centers for Disease Control and by grants from the National Institute of Neurological Disorders and Stroke (NINDS) and the NIH Office of Research on Women's Health (R01 NS45012, U01 NS069208-01). 
METASTROKE: METASTROKE is a collaboration of numerous international studies with the aim of validating associations from previous GWAS and identifying novel genetic associations through meta-analysis of GWAS datasets for ischemic stroke and its subtypes. Included studies are as follows: ASGC: Australian population control data were derived from the Hunter Community Study.

Introduction
Hemostasis is a dynamic balance between factors that promote clot formation and factors that promote antithrombotic activity and/or fibrinolysis. Central to this balance is the thrombomodulin-protein C antithrombotic system on the endothelial surface, which plays a key role in regulating both coagulation and inflammation. Thrombomodulin forms a 1:1 complex with thrombin on the vascular endothelium, thereby inhibiting the procoagulant actions of thrombin and converting protein C to activated protein C [1]. Activated protein C promotes fibrinolysis, inhibits thrombosis by inactivating coagulation factors Va and VIIIa, and reduces inflammation by decreasing white blood cell and nuclear factor kappa-B activation [2][3][4][5]. Activation of protein C by the thrombin-thrombomodulin complex is enhanced when the substrate protein C is presented by the endothelial cell protein C receptor. These relationships are illustrated in Fig 1. Given the central role of the thrombomodulin-protein C pathway in thrombosis and inflammation, the genes encoding these receptor proteins are promising stroke susceptibility candidate genes. Prior genetic studies across the cardiovascular disease (CVD) spectrum, including thrombosis [7], have demonstrated greater genetic risk in younger (vs. older) patients [6]. Variants in other prothrombotic genes have also been associated with ischemic stroke, again more consistently with early-onset than with later-onset disease [8,9,10]. As such, an a priori hypothesis to evaluate these two genes in the setting of ischemic stroke was developed and successfully funded. To this end, we tested the hypothesis that THBD (OMIM 188040) and PROCR (OMIM 600646) variants are associated with early-onset ischemic stroke using a 2-stage discovery and replication design, and then addressed whether the identified variants were also associated with older-onset disease.

Discovery population
The Genetics of Early Onset Stroke (GEOS) Study is a population-based case-control study designed to identify genes associated with early-onset ischemic stroke and to characterize interactions of identified stroke genes and/or SNPs with environmental risk factors. Participants (921 stroke cases and 941 controls) were recruited from the greater Baltimore-Washington area over four time periods between 1992 and 2008 [11]. The population is primarily composed of two self-reported ethnic groups, European-Americans (Caucasians) (EA; 54.5%) and African-Americans (AA; 40.4%), with the remaining 5.1% of individuals comprising other ethnicities, including Chinese, Japanese, other Asian, and other unspecified. Cases were patients hospitalized with a first cerebral infarction, identified by discharge surveillance at 59 hospitals in the greater Baltimore-Washington area and by direct referral from regional neurologists. Cases were enrolled in either the subacute or chronic post-stroke phase, following previously described case identification and enrollment procedures [8,11]. Ischemic strokes with the following characteristics were excluded from participation: stroke occurring as an immediate consequence of trauma; stroke within 48 hours after a hospital procedure; stroke within 60 days after the onset of a non-traumatic subarachnoid hemorrhage; and cerebral venous thrombosis. The abstracted hospital records of cases were reviewed and adjudicated for ischemic stroke subtype by a pair of neurologists per previously published procedures [12,13], with disagreements resolved by a third neurologist. The ischemic stroke subtype classification system retains information on all probable and possible causes and is reducible to the more widely used TOAST system [14], which assigns each case to a single category. All cases had age at first stroke between 15 and 49 years and were recruited within three years of stroke.
For these genetic analyses, we included only Caucasians and African-Americans, and excluded cases with known single-gene or mitochondrial disorders recognized by a distinctive phenotype (e.g., cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL), mitochondrial encephalopathy with lactic acidosis and stroke-like episodes (MELAS), homocystinuria, Fabry disease, or sickle cell anemia). Additional exclusions included: mechanical aortic or mitral valve at the time of the index stroke; untreated or actively treated bacterial endocarditis at the time of the index stroke; neurosyphilis or other CNS infections; neurosarcoidosis; severe sepsis with hypotension at the time of the index stroke; cerebral vasculitis by angiogram and clinical criteria; post-radiation arteriopathy; left atrial myxoma; major congenital heart disease; and cocaine use in the 48 hours prior to the index stroke.
Control participants without a history of stroke were identified by random-digit dialing. Controls were balanced to cases by age and region of residence in each study period and were additionally balanced for race/ethnicity in the latter two participant collection periods.
Traditional stroke risk factors and other study variables, including age, race/ethnicity, history of hypertension, diabetes, myocardial infarction (MI) and current smoking status (defined as use within one month prior to the event for cases and at a comparable reference time for controls), were also collected during a standardized interview. Age, race/ethnicity, and cigarette smoking status were determined by subject report (or proxy report, if a participant was unable to answer). Hypertension, diabetes mellitus, and MI were determined by asking study participants (or a proxy) whether a physician had ever told them that they had the condition. This study was conducted with the consent of all study subjects and was approved by the University of Maryland at Baltimore Institutional Review Board.

Genotyping
Genomic DNA was isolated from a variety of sample types, including cell lines (55.2%), whole blood (43.1%), mouthwash (0.4%) and buccal swabs (0.05%). Whole genome amplification (Qiagen REPLI-g kit, Valencia, CA, USA) was used to obtain sufficient DNA for genotyping in 1.3% of samples. The genotype data used in this study were obtained from two fixed-content SNP panels developed by Illumina (Illumina, San Diego, CA, USA): a genome-wide association (GWA) genotyping array, the HumanOmni1-Quad_v1-0_B BeadChip, and a cardiovascular disease (CVD) SNP panel, the ITMAT-Broad-CARe array, which included THBD and PROCR. Genotyping quality from both arrays was excellent, with individual SNP call rates > 98% and a between-panel concordance rate of 99.996% based on study duplicates (for further details, see S1 File) [11].

Analyses
Genetic association analyses were performed using the PLINK statistical software program [15]. Prior to the association analysis, we pruned the genotyped SNPs on the basis of linkage disequilibrium (LD), such that for any SNPs in high LD (r² ≥ 0.8) we retained only a single representative SNP. This LD pruning was performed within each ethnic group separately using PLINK. Within each ethnic group, we then used an additive logistic regression model to test for association of genotype with stroke, adjusting for age, gender, and population structure (principal components from the GWAS array or CVD panel). Secondary analyses were performed to determine whether any observed associations were more prominent in those with cardiovascular risk, based on the presence of the traditional risk factors described above and previously [11]. For all association analyses, we defined the Bonferroni-corrected significance threshold as 0.05 divided by the number of gene- and ethnicity-specific independent (LD-pruned) SNPs (i.e., p < 0.05 / # independent LD-pruned SNPs).
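The pruning and multiple-testing steps above can be illustrated with a short sketch. It is not PLINK's algorithm: PLINK's own pruning (for example `--indep-pairwise`) works in sliding windows along the chromosome, whereas this simplified, windowless version keeps a SNP only if its r² with every SNP already kept is below 0.8, and it assumes r² is the squared correlation of allele dosages. The function names, SNP IDs and genotypes are hypothetical toy values, not study data.

```python
# Minimal sketch, assuming dosage-based r² and a greedy, windowless pruning order.
# Function names, SNP IDs and genotypes are hypothetical, not PLINK's implementation.

def r_squared(g1, g2):
    """Squared Pearson correlation between two allele-dosage vectors (0/1/2 per person)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:  # monomorphic SNP: correlation undefined, treat as unlinked
        return 0.0
    return cov * cov / (v1 * v2)


def greedy_ld_prune(snps, threshold=0.8):
    """Keep a SNP only if its r² with every already-kept SNP is below `threshold`.

    `snps` is an ordered list of (snp_id, dosages); earlier SNPs take priority.
    """
    kept = []
    for snp_id, dosages in snps:
        if all(r_squared(dosages, d) < threshold for _, d in kept):
            kept.append((snp_id, dosages))
    return [snp_id for snp_id, _ in kept]


def bonferroni_alpha(n_independent, alpha=0.05):
    """Per-test significance threshold: alpha divided by the number of LD-pruned SNPs."""
    return alpha / n_independent


toy = [
    ("snp1", [0, 1, 2, 0, 1, 2]),
    ("snp2", [0, 1, 2, 0, 1, 2]),  # identical to snp1 (r² = 1), so it is pruned
    ("snp3", [0, 0, 1, 1, 2, 2]),  # r² = 0.25 with snp1, so it is kept
]
independent = greedy_ld_prune(toy)
print(independent, bonferroni_alpha(len(independent)))
```

The greedy order matters: whichever SNP of a correlated group appears first becomes its representative, which is one reason tools differ in which tag SNP they report.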

Replication and extension to older onset stroke
We sought to replicate any associated SNPs identified in the GEOS Study in an independent set of early-onset stroke studies (the Genetics of Early Onset Stroke Consortium) previously reported by Cheng et al. [16], after excluding the GEOS samples from the replication set; results were meta-analyzed using the GWAMA program. The studies included in the replication were: CADISP, Cervical Artery Dissection and Ischemic Stroke Patients [17]; MILANO, Besta Stroke Study; RACE, Risk Assessment of Cerebrovascular Events Study; SIFAP, Stroke in Young Fabry Patients; and WTCCC2, Wellcome Trust Case-Control Consortium 2 [16]. Details of each replication cohort are available in the supplementary data of Cheng et al. [16]. In short, only confirmed ischemic strokes, first-ever or recurrent, were included in these studies; TIAs and hemorrhagic strokes were excluded. SNPs whose associations replicated in the Genetics of Early Onset Stroke Consortium were then tested for association with later- or older-onset stroke via in silico lookup in the METASTROKE Consortium [17]; the mean age at stroke onset ranged from 57.3 to 81.6 years among the 14 contributing cohorts of METASTROKE (not including GEOS). Further details regarding the data collection, organization, and relationships between METASTROKE and the other studies involved can be found in the S1 File and S1 Dataset.
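GWAMA combines per-study effect estimates into a single pooled estimate. A common approach, assumed here for illustration, is the inverse-variance-weighted fixed-effect model, in which each study's log odds ratio is weighted by 1/SE²; GWAMA can also fit random-effects models. This sketch shows the technique only, not the GWAMA program or its interface, and the function name and inputs are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis of per-study log odds ratios.

    Returns (pooled log OR, pooled SE, two-sided p-value from a Wald z-test).
    """
    weights = [1.0 / (se * se) for se in ses]  # more precise studies get more weight
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard-normal tail probability
    return beta, se, p


# Hypothetical per-study log ORs and SEs (illustrative values, not consortium data).
pooled_beta, pooled_se, pooled_p = fixed_effect_meta([0.10, 0.30], [0.10, 0.10])
print(math.exp(pooled_beta), pooled_se, pooled_p)
```

With equal standard errors the pooled log OR is simply the mean of the study estimates, and the pooled SE shrinks by a factor of √2, which is how a modest per-study effect can reach significance in a large replication sample.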
The aggregated data that support the findings of this study are available from the corresponding author and participating studies upon reasonable request, as listed in the S1 Dataset. Further, each study can be contacted to obtain its data individually, and for the NIH-funded studies, study data are available on request from the database of Genotypes and Phenotypes (dbGaP) at https://www.ncbi.nlm.nih.gov/gap/.

Results
Characteristics of the young-onset stroke discovery and replication studies are provided in Table 1. After exclusions, the GEOS Discovery Stage included 448 ischemic stroke cases (mean age at stroke onset = 41.0 yrs) and 498 controls of EA ancestry, and 381 ischemic stroke cases (mean age at stroke onset = 41.9 yrs) and 352 controls of AA ancestry. Further demographic and risk factor characteristics by case-control status for the GEOS Discovery Stage are described in Table A in S1 File.
LD pruning resulted in 13 THBD SNPs in EAs and 13 THBD SNPs in AAs, plus an additional 4 THBD SNPs in EAs on the CVD chip, and in 5 PROCR SNPs in EAs and 11 PROCR SNPs in AAs (see Table 2 and Table B in S1 File). Association analyses in EAs revealed a significant association of PROCR rs9574 with ischemic stroke, with the rs9574C allele (MAF case/control = 0.49/0.41) associated with 1.33-fold increased odds of stroke compared with the G allele (p = 0.003; Table 2). Another independent PROCR SNP, rs2069951, was also significantly associated with ischemic stroke in EAs (OR = 1.80, p = 0.006). None of the PROCR SNPs were associated with stroke in AAs, nor were any associations observed between THBD SNPs and stroke in either ethnic group. An exploratory analysis of stroke subtypes (e.g., TOAST-defined large artery (LA), small artery, cardioembolic, and cryptogenic) did not reveal significant associations with any variants, although the sample sizes in EAs were small (range: 33 LA to 230 cryptogenic).
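The reported PROCR results can be checked against the Bonferroni rule from the Analyses section: with 5 LD-pruned PROCR SNPs in EAs, the threshold is 0.05/5 = 0.01, which both reported p-values meet. As a rough consistency check, and assuming the 0.49/0.41 figures are the rs9574C frequencies in cases and controls, the crude allelic odds ratio they imply is about 1.38, close to the adjusted additive-model estimate of 1.33. The two are not expected to match exactly, because the crude figure ignores age, gender and population structure. The helper name below is hypothetical.

```python
# Worked check using numbers reported above for European-Americans (EA).
n_procr_ea = 5                      # LD-pruned PROCR SNPs in EAs
alpha = 0.05 / n_procr_ea           # Bonferroni threshold (0.01)
reported_p = {"rs9574": 0.003, "rs2069951": 0.006}
significant = {snp: p < alpha for snp, p in reported_p.items()}


def allelic_odds_ratio(freq_cases, freq_controls):
    """Crude (unadjusted) allelic OR from effect-allele frequencies in cases and controls."""
    return (freq_cases / (1 - freq_cases)) / (freq_controls / (1 - freq_controls))


# Assumes 0.49/0.41 are rs9574C frequencies in cases/controls; result is about 1.38.
crude_or = allelic_odds_ratio(0.49, 0.41)
print(alpha, significant, round(crude_or, 2))
```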
To further characterize the EA PROCR associations, we performed a secondary analysis to evaluate the impact of concomitant vascular risk factors. We repeated the association analysis with additional adjustment for vascular risk factors (i.e., hypertension, diabetes mellitus, angina/MI, and current cigarette smoking) and found the results essentially unchanged (data not shown). However, when stratifying by the presence or absence of each vascular risk factor considered separately, we observed a stronger, though not statistically significant, association of rs9574 and rs2069951 with stroke in the absence of each risk factor, with a similar direction of association across risk factors (data not shown). To obtain a more comprehensive picture, we therefore compared subjects with zero vascular risk factors to those with at least one vascular risk factor. In the subset of EA participants without vascular risk factors (n = 167 cases and 315 controls), both SNPs were more strongly associated with stroke (rs9574, OR = 1.50, p = 0.0046; rs2069951, OR = 4.82, p = 0.0002; see Table 2).
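The additive model used in these analyses codes each genotype as the count of effect alleles (0, 1 or 2) and estimates a per-allele log odds ratio. PLINK fits this internally; the sketch below is a dependency-free Newton-Raphson fit written only to illustrate the model, and its function names are hypothetical. Adjusting for age, gender, principal components or vascular risk factors amounts to appending those values as extra columns of each design row, and a stratified analysis (e.g., participants with zero vascular risk factors) amounts to fitting the same model on the subset of rows in that stratum.

```python
import math


def solve(a, b):
    """Solve the small dense system a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def score_and_information(X, y, beta):
    """Gradient of the log-likelihood and Fisher information X'WX for logistic regression."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info


def fit_additive_logistic(rows, y, max_iter=50, tol=1e-10):
    """rows: [dosage, covariate1, ...] per person; y: 1 = case, 0 = control.

    Returns (per-allele OR, Wald SE of the log OR, two-sided p-value) for the dosage term.
    """
    X = [[1.0] + list(r) for r in rows]          # prepend an intercept column
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        grad, info = score_and_information(X, y, beta)
        step = solve(info, grad)                  # Newton-Raphson update
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_and_information(X, y, beta)
    unit = [0.0] * len(beta)
    unit[1] = 1.0
    se = math.sqrt(solve(info, unit)[1])         # diagonal element of the inverse information
    z = beta[1] / se
    return math.exp(beta[1]), se, math.erfc(abs(z) / math.sqrt(2.0))


# Toy data (hypothetical): carriers coded 1, non-carriers 0, no covariates.
rows = [[1]] * 50 + [[0]] * 50
y = [1] * 30 + [0] * 20 + [1] * 20 + [0] * 30
or_, se, p = fit_additive_logistic(rows, y)
print(or_, se, p)
```

With a 0/1 exposure and no covariates the fitted OR reproduces the 2×2 cross-product ratio (30·30)/(20·20) = 2.25, which makes a convenient check; with real genotype data the dosage column takes the values 0, 1 and 2.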

Replication of the PROCR rs9574 association with early onset stroke
We sought to replicate the PROCR rs9574 and rs2069951 associations in the Early-Onset Stroke Consortium [16], excluding the GEOS study. This replication sample included 3,676 cases and 21,118 controls. Only rs9574 replicated; the effect allele frequency of rs9574C was 0.39, and association analyses demonstrated an OR of 1.08 (p = 0.015; Table 2). The ischemic stroke replication results for the LD-pruned PROCR SNPs among Caucasians in the young-onset stroke consortium, inclusive and exclusive of GEOS, are shown in Table C in S1 File. There was no significant association between stroke subtypes and PROCR rs9574 in the replication samples.

Older-onset stroke
To determine whether PROCR rs9574 and/or the other previously identified LD-pruned SNPs were associated with older-onset stroke, these SNPs were evaluated in the METASTROKE cohort [18]. Lookups found no association of these SNPs with ischemic stroke overall or with any subtype (data not shown).

Discussion
We observed a significant association between PROCR rs9574 and early-onset ischemic stroke that replicated in a large independent sample of early-onset ischemic stroke. Prior studies have demonstrated that variants in PROCR are associated with venous thromboembolism (VTE) [19] and myocardial infarction [20,21], as well as with late fetal loss during pregnancy [22]. Specific to ischemic stroke, PROCR associations have been inconsistently reported [23,24], which may in part reflect differences in the age and ethnicity of the populations evaluated. Our study is the first to specifically identify and replicate PROCR associations in a young-onset ischemic stroke population of European descent. Our failure to detect an association in GEOS AAs may reflect low power and/or that a true causal variant is not well tagged in African-Americans. Our findings add to the growing evidence that prothrombotic mechanisms may be more important for younger than for older onset stroke, as demonstrated with other established prothrombotic variants including prothrombin G20210A [8], Factor XI [10] and Factor V Leiden [25]. This is also in line with the lack of association we observed between PROCR rs9574 and older-onset stroke in the METASTROKE lookup. Our findings are also consistent with the hypothesis that PROCR (and perhaps, by analogy, other thrombosis-related genes) may be more relevant, or easier to detect, when standard vascular risk factors are scarce, as these factors may induce risk via non-thrombotic mechanisms. This may also partially explain why we did not see replication in the older-onset METASTROKE population, which has a greater vascular risk factor burden. Differing genetic mechanisms may also partially explain why African-Americans did not demonstrate the associations seen in their Caucasian counterparts.
Strengths of our study include the well-phenotyped and relatively large young-onset discovery sample, as well as the large replication sample. Notably, GEOS cases are included in METASTROKE, as are other young-onset stroke cases; yet despite the inclusion of the GEOS samples, no association was seen in this primarily older-onset cohort. A potential limitation relates to the replication sample, which is predominantly of European rather than North American origin, although the MAFs were roughly similar on both sides of the Atlantic. Another limitation is that our population-based discovery design, with recruitment at over 50 regional hospitals, precluded consistent assessment of patent foramen ovale (PFO) and potential paradoxical embolism among cases. This is important because PROCR variants are known to increase the risk of venous thromboembolism, and an established mechanism by which PROCR variation could cause ischemic stroke is venous thrombosis with paradoxical embolization. Our study was also limited to non-fatal ischemic strokes, so the possibility that our findings are due to survival bias cannot be ruled out, though this is unlikely given the low case-fatality rate in this population [26]. Another limitation is that our study provides no information about the role of PROCR in ischemic stroke among young adults with a personal or family history of early-onset thrombotic events. Lastly, while some intronic variants can affect gene expression by introducing novel splice sites, activating novel promoters (which may direct sense or antisense transcription, altering mRNA, miRNA or lncRNA expression), or introducing or eliminating enhancer activity, our study does not provide any such mechanistic analyses. Despite these shortcomings, we have identified several PROCR variants in strong LD that are associated with ischemic stroke among young Caucasians, an association that replicated in an independent sample.
While these findings are interesting, it is too early to assess their clinical implications, for example regarding anticoagulation or genetic testing. Further replication and research are required to better understand these findings.

Conclusion
PROCR, but not THBD, polymorphisms are associated with early-onset ischemic stroke in Caucasians.
Supporting information
S1 File. Supplementary information. (Table A) GEOS characteristics by case-control status. (Table B) Results of linkage disequilibrium pruning by ethnic group using PLINK. For SNPs in high LD (r² ≥ 0.8), only a single representative SNP was retained. None of the SNPs listed here were associated with all-ischemic stroke (results not shown) in the GEOS Discovery population. (Table C)