Multi-omic analysis of stroke recurrence in African Americans from the Vitamin Intervention for Stroke Prevention (VISP) clinical trial

African Americans endure a nearly two-fold greater risk of suffering a stroke and are 2–3 times more likely to die from stroke compared to those of European ancestry. African Americans also have a greater risk of recurrent stroke and vascular events, which are deadlier and more disabling than incident stroke. Stroke is a multifactorial disease with both heritable and environmental risk factors. We conducted an integrative, multi-omic study on 922 plasma metabolites, 473,864 DNA methylation loci, and 556 variants from 50 African American participants of the Vitamin Intervention for Stroke Prevention clinical trial to help elucidate biomarkers contributing to recurrent stroke rates in this high-risk population. Sixteen metabolites, including cotinine, N-delta-acetylornithine, and sphingomyelin (d17:1/24:1), were identified in t-tests of recurrent stroke outcome or baseline smoking status. Serum tricosanoyl sphingomyelin (d18:1/23:0) levels were significantly associated with recurrent stroke after adjusting for covariates in Cox Proportional Hazards models. Weighted Gene Co-expression Network Analysis identified moderate correlations between sphingolipid markers and clinical traits including days to recurrent stroke. Integrative analyses of genetic variants in sphingolipid pathway genes identified 29 nominal associations with metabolite levels in a one-way analysis of variance, while epigenomic analyses identified xenobiotics, predominantly smoking-associated metabolites and pharmaceutical drugs, associated with methylation profiles. Taken together, our results suggest that metabolites, specifically those associated with sphingolipid metabolism, are potential plasma biomarkers for stroke recurrence in African Americans. Furthermore, genetic variation and DNA methylation may play a role in the regulation of these metabolites.

Introduction
informed consent and a subset of 2,100 participants agreed to be included in subsequent genetic studies. IRB approval from the University of Virginia and East Carolina University was obtained for the genetic, epigenetic, and metabolomic components.

The Vitamin Intervention for Stroke Prevention (VISP) trial
The Vitamin Intervention for Stroke Prevention (VISP) clinical trial was a multi-centered, double-blinded, randomized and controlled clinical trial designed to determine whether supplementation with a combination of pyridoxine (vitamin B6), cyanocobalamin (vitamin B12), and folic acid (vitamin B9) reduced recurrent cerebral infarction, myocardial infarction (MI), or fatal coronary heart disease [8]. Participants were enrolled within 120 days of suffering a nondisabling cerebral infarction, assigned a daily high-dose or low-dose B-vitamin formulation, and followed for two years. VISP inclusion/exclusion criteria are listed below.
Inclusion criteria included:
1. Participants aged 35 years or older with elevated baseline homocysteine levels at or above the 25th percentile;
2. Participant enrollment within 120 days of suffering a non-disabling cerebral infarction characterized by the sudden onset of a neurological deficit persisting at least 24 hours and observed on CT or MRI;
3. Geographical accessibility for follow-up;
4. Adequate means of transportation;
5. Compliance of 75% or greater with the vitamin regimen in the run-in period; and
6. Patient agreement to take study medication and avoid other vitamin supplements containing folic acid and B6.
VISP exclusion criteria included:
1. Stroke due to any form of intracranial hemorrhage, dissection of a cervico-cephalic artery, veno-occlusive disease, drug abuse, or vasculitis;
2. CT or MRI of brain showing lesion other than ischemic infarction as cause;
3. Modified Rankin Stroke Scale (RSS) score of 4-5 at time of eligibility determination;
4. Presence of specific potential sources of cardiogenic emboli, such as atrial fibrillation within 30 days of stroke or history of prosthetic cardiac valve, intracardiac thrombus or neoplasm, or valvular vegetation;
5. Presence of major neurologic illness apart from stroke that would prevent proper evaluation of recurrent stroke;
6. Presence of cancer, pulmonary disease, or other illness which, in the opinion of the study physician, would limit the life expectancy of the patient to less than two years;
7. Severe congestive heart failure;
8. Renal insufficiency requiring dialysis;
9. Untreated pernicious anemia or untreated B12 deficiency;
10. Uncontrolled hypertension defined as systolic blood pressure >185 mm Hg or diastolic >105 mm Hg on two readings separated by five minutes at time of eligibility determination;
11. Conditions that prevent reliable participation in the study, such as refractory depression, severe cognitive impairment, alcoholism, or other substance abuse;
12. Use of medications, within the last 30 days, that affect homocysteine, such as methotrexate, tamoxifen, L-dopa, or phenytoin, or bile acid sequestrants that can decrease folate levels;
13. Women of childbearing potential, defined as not having reached natural or surgical menopause or having had tubal ligation;
14. Participation in another trial in which active intervention is being received;
15. Participants on multivitamin supplements or single vitamins of B6 or folic acid were excluded unless they were willing to take the study supplements in place of the one(s) they usually took; and/or
16. Any surgical procedure requiring a general anesthesia or hospital stay of three days or more, any type of invasive cardiac instrumentation, or an endarterectomy, stent placement, thrombectomy, or any other endovascular treatment of an abnormal carotid artery performed within 30 days prior to randomization or scheduled to be performed within 30 days after randomization [8].
In this study, global metabolite data were generated from serum samples of 50 African American (AA) participants, enriched for recurrent stroke cases (N = 28). When possible, nonrecurrent controls were matched with recurrent stroke cases for age (within eight years), sex, number of cigarettes smoked per day (within 5 cigarettes), and disability measured by the modified Rankin Stroke Scale (RSS; within two points) (S1 and S2 Tables).

Stroke recurrence, vascular outcomes, and clinical trait definition
In this study, recurrent stroke was defined as an acute neurological ischemic event of at least 24 hours' duration with focal signs and symptoms and without evidence of primary intracranial hemorrhage or alternate explanation. Additionally, one of the following was present: 1) a one-point increase in the NIH stroke scale (NIHSS) in a previously normal section, 2) a new abnormality seen on CT or MRI, or 3) an extended abnormality seen on CT or MRI. Diagnoses were reviewed by a local neurologist, two endpoint reviewers, and, on a case-by-case basis, a Stroke Endpoint Review Committee. Underlying cause of death was decided by a Death Review Committee composed of physicians independent of VISP. Decisions were based on information available from hospital records, death certificates, coroners' reports, or physicians' questionnaires. Ischemic stroke subtype information is not available [8,9]. Composite vascular event was defined as any of the following: fatal coronary heart disease, a nonfatal hospitalized myocardial infarction (MI), resuscitation for cardiac collapse, coronary bypass surgery, coronary angioplasty, or VISP recurrent stroke. Fatal/disabling stroke, MI, or death (FDMD) was defined as an outcome variable which identified individuals who suffered a disabling recurrent stroke or MI during the clinical trial or those who died from any cause. All of these outcomes were expressed as dichotomous variables, and days from trial randomization to vascular event were recorded as time to event.
The "recurrent stroke ever" variable indicated whether an individual had suffered a stroke in addition to the VISP enrollment stroke. The MI variable indicated whether an individual suffered an MI during the VISP clinical trial. Known stroke risk factors such as the number of cigarettes smoked per day, smoking status (at enrollment and ever), blood pressure (BP) measurements (systolic and diastolic), self-reported hypertension (HTN) status, diabetes status, the number of strokes prior to VISP, the treatment arm, age, sex, and body mass index (BMI) were included in analyses. Additionally, four stroke severity/functional scales were included: Mini-Mental State (MMS) status scale, NIHSS, RSS, and Barthel Stroke Questionnaire Form (BAR).

Metabolomics
Global, untargeted metabolic profiling of 922 metabolites for the 50 AA VISP participants was performed by Metabolon, Inc. (Durham, NC) using gas chromatography-mass spectrometry and liquid chromatography-mass spectrometry protocols, as previously described [10]. Metabolite levels were measured using the fasting serum plasma samples from the baseline VISP visit.
Supernatants from a methanol extraction were used for analyses by two reverse phase (RP) ultra-performance liquid chromatography tandem mass spectrometry (UPLC-MS/MS) methods with positive ion mode electrospray ionization (ESI); RP UPLC-MS/MS with negative ion mode ESI; and hydrophilic interaction liquid chromatography/UPLC-MS/MS with negative ion mode ESI. All methods utilized a Waters ACQUITY UPLC system (Waters Corporation, Milford, MA) and a Thermo Scientific Q-Exactive high resolution/accurate MS interfaced with a heated ESI source and Orbitrap mass analyzer (ThermoFisher Scientific, Waltham, MA). Raw data were extracted, peak-identified, and processed using Metabolon's proprietary hardware and software. Compounds were identified by comparison to library entries of purified standards or recurrent unknown entities that contain the retention time/index, mass to charge ratio, and chromatographic data, including tandem MS spectral data, on all molecules present in the library (S1 Appendix). Metabolite peaks were quantified using area-under-the-curve, and a data normalization step was performed to correct variation resulting from instrument inter-day tuning differences, setting the medians equal to one. A log transformation was performed on the normalized data for subsequent analysis.
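The normalization and log transformation described above can be illustrated with a minimal Python sketch. It is not Metabolon's pipeline: it assumes that each metabolite is divided by its own run-day median and that the natural logarithm is used, neither of which is stated in the text. The function name `normalize_and_log` and the peak areas are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

def normalize_and_log(peak_areas, run_days):
    """Scale one metabolite so each run day has median 1, then take logs."""
    by_day = defaultdict(list)
    for area, day in zip(peak_areas, run_days):
        by_day[day].append(area)
    # One median per run day (assumed to be the normalization unit).
    day_median = {day: median(vals) for day, vals in by_day.items()}
    scaled = [area / day_median[day] for area, day in zip(peak_areas, run_days)]
    # Natural log is an assumption; the text does not give the base.
    return [math.log(v) for v in scaled]

# Made-up peak areas for a single metabolite measured on two run days.
areas = [100.0, 200.0, 400.0, 30.0, 60.0, 120.0]
days = ["day1", "day1", "day1", "day2", "day2", "day2"]
logged = normalize_and_log(areas, days)
```

After scaling, the two run days are directly comparable even though their raw intensities differ by a constant factor, which is the purpose of this kind of correction.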
Instrument variability was determined by calculating the median relative standard deviation (RSD) for the internal standards that were added to each sample prior to injection into the mass spectrometers (median RSD = 3%). Overall process variability was determined by calculating the median RSD for all endogenous metabolites (median RSD = 9%). These values for instrument and process variability met Metabolon's acceptance criteria.
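As an illustration of the variability metric above, the following sketch computes a median relative standard deviation (RSD). It assumes RSD is the sample standard deviation divided by the mean, expressed as a percentage; the names `rsd_percent` and `median_rsd` and all values are hypothetical.

```python
from statistics import mean, median, stdev

def rsd_percent(values):
    """Relative standard deviation (sample SD / mean) as a percentage."""
    return 100.0 * stdev(values) / mean(values)

def median_rsd(measurements):
    """Median RSD across compounds; measurements maps name -> values."""
    return median([rsd_percent(v) for v in measurements.values()])

# Made-up intensities for three internal standards across three injections.
standards = {
    "standard_a": [98.0, 100.0, 102.0],
    "standard_b": [95.0, 100.0, 105.0],
    "standard_c": [99.0, 100.0, 101.0],
}
instrument_rsd = median_rsd(standards)  # about 2.0 for these made-up values
```

Applying the same function to internal standards gives an instrument-level figure, while applying it to endogenous metabolites gives an overall process-level figure.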

Genetic association of selected variants
Previously, 2,100 VISP participants were genotyped on the Illumina HumanOmni1-Quad_v1-0_B BeadChip (Illumina, Inc) [9,15]. We selected 556 variants spanning ±10 kb of 24 genes associated with sphingolipid metabolism and the sphingomyelinase pathway (S3 Table). Individual level VISP genetic, metabolomics, and epigenetics data are considered sensitive controlled data and cannot be shared publicly, as specified by the IRBs of Wake Forest University School of Medicine, the University of North Carolina at Chapel Hill School of Medicine, and the University of Virginia School of Medicine, along with the NIH Data Access Committee. Controlled-access data can only be obtained if a user has been authorized by the appropriate Data Access Committee (DAC). The individual level Genomics and Randomized Trials Network (GARNET) VISP data are available in the database of Genotypes and Phenotypes (dbGaP) (Accession: phs000343.v3.p1) and can be requested through the dbGaP Authorized Access System (https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login), the Cerebrovascular Disease Knowledge Portal (https://cd.hugeamp.org/), and the Coriell Institute for Medical Research (https://catalog.coriell.org/). The authors will also share the data on request.
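The ±10 kb selection window used for the variants can be sketched as a simple coordinate filter. This is an illustrative assumption about how such a window might be applied, not the authors' procedure; the gene name, variant identifiers, and coordinates are placeholders rather than real annotations.

```python
FLANK_BP = 10_000  # 10 kb on either side of each gene

def variants_near_genes(variants, genes, flank=FLANK_BP):
    """Return (variant_id, gene) pairs for variants inside a flanked gene.

    variants: list of (variant_id, chromosome, position)
    genes:    list of (gene_symbol, chromosome, start, end)
    """
    selected = []
    for vid, chrom, pos in variants:
        for symbol, gchrom, start, end in genes:
            if chrom == gchrom and start - flank <= pos <= end + flank:
                selected.append((vid, symbol))
                break  # keep the first matching gene only
    return selected

# Placeholder coordinates, not taken from any genome build.
genes = [("GENE_A", "9", 100_000, 150_000)]
variants = [
    ("var1", "9", 95_000),    # within 10 kb upstream
    ("var2", "9", 89_999),    # just outside the upstream window
    ("var3", "10", 120_000),  # wrong chromosome
    ("var4", "9", 160_000),   # on the downstream boundary
]
selected = variants_near_genes(variants, genes)
```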

Statistical analysis
Baseline characteristics of the study participants with and without recurrent stroke were compared using t-tests and chi-squared (χ2) tests for continuous and categorical traits, respectively. Univariate Welch's two-sample t-tests were performed using all VISP participants (N = 50) and compared recurrent stroke, recurrent stroke ever, composite vascular event, and smoking statuses. VISP recurrent stroke analyses were stratified by sex, treatment arm, diabetes status (DM), and/or smoking status (S4 Table). A Bonferroni correction accounting for the number of metabolites tested was applied to determine the significance threshold of p ≤ 5.42e-05 (an error rate of 0.05 divided by the total number of metabolites, 0.05/922). A suggestive threshold of p ≤ 4.42e-04 (0.05/113, where 113 is the total number of metabolite sub-pathways) was also implemented. Statistical power was calculated as 0.786, assuming a large effect size (Cohen's d = 0.8), using the "pwr" package in R [16,17].
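To make the group comparison concrete, here is a minimal sketch of the Welch two-sample t statistic and its Welch-Satterthwaite degrees of freedom. The analyses in the paper were run in R; this Python version is only illustrative, it does not compute a p-value, and the function name `welch_t` and the data are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error, group a
    vb = variance(b) / len(b)  # squared standard error, group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up log-scaled levels of one metabolite in two groups.
recurrent = [1.0, 2.0, 3.0, 4.0, 5.0]
non_recurrent = [2.0, 4.0, 6.0, 8.0]
t_stat, dof = welch_t(recurrent, non_recurrent)
```

Because the variances are not pooled, the degrees of freedom are generally non-integer and smaller than in the equal-variance test. The p-value would then come from a t distribution with `dof` degrees of freedom and be compared against the Bonferroni threshold.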
A matched-pair analysis of a subset of 44 VISP participants (22 recurrent, 22 non-recurrent, matched on age, sex, cigarettes smoked per day, and enrollment stroke severity) was conducted using a matched-pair t-test. Statistical and suggestive significance were set at p ≤ 5.42e-05 (Bonferroni adjustment described above) and p ≤ 2.27e-03 (0.05/22 matched pairs), respectively. Statistical power of 0.8 was reached using a two-sided test, an effect size of 0.626, alpha = 0.05, and n = 22 matched pairs [16,17].
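The matched-pair t-test reduces to a one-sample t-test on the within-pair differences, as the sketch below shows. It is illustrative only; `paired_t` and the values are hypothetical, and no p-value is computed.

```python
import math
from statistics import mean, stdev

def paired_t(cases, controls):
    """Paired t statistic and degrees of freedom from within-pair differences."""
    diffs = [case - control for case, control in zip(cases, controls)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Made-up metabolite levels for four matched case/control pairs.
cases = [1.2, 0.8, 1.5, 1.1]
controls = [1.0, 0.9, 1.1, 0.9]
t_pair, df_pair = paired_t(cases, controls)
```

With 22 pairs, as in the study, the statistic would be referred to a t distribution with 21 degrees of freedom.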
Cox proportional hazards (PH) regression models were used to identify metabolites associated with time to event for VISP recurrent stroke or composite vascular event and were adjusted for age, sex, and current smoking. Sensitivity models were formed adjusting for treatment and diabetes status. Conditional PH models adjusting for matched pairs were fitted on the 22 pairs. Statistical and suggestive significance were set at p ≤ 5.42e-05 and p ≤ 4.42e-04, respectively (see above for threshold determination) [18].
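For readers unfamiliar with how a hazard ratio arises from a Cox model, the sketch below fits a single-covariate model by Newton-Raphson on the log partial likelihood. It is a simplified illustration, not the models used in the study: it omits the age, sex, and smoking adjustments, it corresponds to the Breslow form if event times are tied, and the function names and data are hypothetical.

```python
import math

def score_and_hessian(beta, times, events, x):
    """First and second derivatives of the Cox log partial likelihood."""
    grad, hess = 0.0, 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        grad += x[i] - s1 / s0
        hess -= s2 / s0 - (s1 / s0) ** 2
    return grad, hess

def fit_cox_single(times, events, x, max_iter=50, tol=1e-10):
    """Return (beta, hazard ratio) for one covariate."""
    beta = 0.0
    for _ in range(max_iter):
        grad, hess = score_and_hessian(beta, times, events, x)
        step = grad / hess
        beta -= step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)

# Made-up data: days to event, event indicator (1 = event), metabolite level.
times = [5, 8, 12, 20, 25, 30]
events = [1, 1, 0, 1, 1, 0]
level = [0.2, 1.0, 0.5, 0.1, 1.5, 0.8]
beta_hat, hazard_ratio = fit_cox_single(times, events, level)
```

A hazard ratio below 1, as in this toy data, means that higher covariate values are associated with a lower instantaneous event rate. In practice an established survival library would be used instead of hand-written optimization.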
In the metabolite cluster analyses, Weighted Gene Co-expression Network Analysis (WGCNA) modules comprising the 922 metabolites were constructed using a soft-threshold power of 5. Pearson correlations between the first principal component of each module (the module eigengene) and each trait (clinical or methylation beta values) were calculated. WGCNA was also performed on the methylation profiles, and correlations between these modules and the metabolite profiles or clinical traits used the same quality control steps above. These modules were calculated using the blockwise module function, a soft-threshold power of 14, and a maximum block size of 10,000. Parameters for module construction in all analyses consisted of a signed topological overlap matrix, a signed-hybrid network model, and a minimum of 30 loci per module.
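The role of the soft-threshold power in a signed-hybrid network can be sketched as follows: positive Pearson correlations are raised to the chosen power and non-positive correlations are set to zero. This is an illustrative reimplementation of the adjacency step only, not the WGCNA R package, and it omits the topological overlap and module detection steps. The function names and profiles are hypothetical.

```python
import math
from statistics import mean

def pearson(u, v):
    """Pearson correlation between two equal-length profiles."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den

def signed_hybrid_adjacency(profiles, power):
    """Adjacency a_ij = r_ij ** power if r_ij > 0, otherwise 0."""
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            r = pearson(profiles[i], profiles[j])
            adj[i][j] = r ** power if r > 0 else 0.0
    return adj

# Made-up profiles for three metabolites across four samples.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],  # strongly positively correlated with the first
    [4.0, 3.0, 2.0, 1.0],  # negatively correlated with the first
]
adjacency = signed_hybrid_adjacency(profiles, power=5)  # power used for metabolites
```

Raising correlations to a power above 1 shrinks weak correlations toward zero much faster than strong ones, which is why a higher power (14 for the methylation data) yields a sparser network.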
A multiple linear regression analysis adjusting for age, sex, batch effect, current smoking, and estimated cellular proportions [20,21] was performed to identify inferentially methylated CpG loci associated with metabolite measurements. Statistical and suggestive significance were set at thresholds of p ≤ 1.14e-10 (= 0.05/(473864 loci × 922 metabolites)) and p ≤ 1.06e-07 (= 0.05/473856 CpG loci), respectively. Assuming a large effect size for regression (f2 = 0.35), an alpha of 0.05, and numerator and denominator degrees of freedom of 13 and 35, respectively, the power was 0.6232 [16,17]. All analyses were performed in R, version 3.5 [22].
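The Bonferroni thresholds quoted in this section can be reproduced with simple arithmetic. The sketch below uses the test counts exactly as printed in the text; the helper name `bonferroni` and the dictionary labels are hypothetical.

```python
ALPHA = 0.05

def bonferroni(alpha, n_tests):
    """Per-test significance threshold for n_tests comparisons."""
    return alpha / n_tests

thresholds = {
    "metabolites": bonferroni(ALPHA, 922),
    "metabolite sub-pathways": bonferroni(ALPHA, 113),
    "matched pairs": bonferroni(ALPHA, 22),
    "CpG loci x metabolites": bonferroni(ALPHA, 473864 * 922),
    "CpG loci": bonferroni(ALPHA, 473856),
}

for label, value in thresholds.items():
    print(f"{label}: {value:.2e}")
```

Formatted to three significant figures, these values match the thresholds stated above (5.42e-05, 4.42e-04, 2.27e-03, 1.14e-10, and 1.06e-07).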

Baseline demographics for VISP metabolomics subset
Baseline characteristics of VISP study participants with and without recurrent stroke were compared using χ2 and t-tests for categorical and continuous variables, respectively, and are presented in Table 1. Of the 50 participants, 56% (N = 28) suffered a recurrent stroke during the 2-year follow-up. The average baseline age of participants who suffered a VISP recurrent stroke was approximately 1.4 years higher than that of participants without stroke recurrence: 65.07 years (standard deviation, SD 11.59) versus 63.68 years (SD 9.76), respectively. Individuals who experienced a recurrent event had worse enrollment stroke severity based on the Modified Rankin Stroke Scale. Only one non-recurrent participant had moderate disability, described as a Rankin score of 3, while 11 (39%) of the individuals who experienced stroke recurrence during the trial had moderate disability after the enrollment stroke (p<0.001).

Identification of metabolites associated with VISP stroke recurrence
Global, untargeted metabolic profiling of 922 metabolites was performed on each of the VISP participant serum samples. A series of statistical tests utilizing the metabolite profiles of VISP participants were performed, including a standard Welch's t-test to account for unequal sample sizes and variances within the complete dataset, Cox proportional hazards regression analyses (Cox PH) to reveal associations between the survival time of VISP patients and metabolite detection, and one-sample t-tests for matched pair analyses. Collectively, these approaches identified seven significant and 25 suggestive metabolites associated with cardiovascular outcomes or associated clinical traits.
Cox PH survival analyses for the time to recurrent stroke identified tricosanoyl sphingomyelin (d18:1/23:0) (hazard ratio (HR): 0.002 [95% CI: 0.000-0.036], p = 1.50e-05), as well as 11 suggestive associations: seven with time to recurrent stroke and four with composite vascular endpoint in the discovery model. Of the suggestively significant metabolites, 10 were associated with sphingomyelin metabolism. Threonate was independently associated with a composite vascular event (HR: 0.001 [0.001-0.135]; p = 4.00e-04) and not related to sphingolipid metabolism. Using global Schoenfeld residual tests, we did not observe any evidence of violation of the proportional hazards assumption for any of the discovery models (p-values ranging from 0.26 to 0.537). Sensitivity models including the addition of treatment arm and diabetes status indicated consistent results for these 12 metabolites. A conditional Cox PH model adjusting for pair was performed and resulted in nominal significance for these metabolites (Table 3).
WGCNA of metabolite clusters and clinical traits identified six minimal correlations for 45 VISP participants after sample filtering (p ≤ 0.01; Fig 1; Table 4). The strongest correlation observed was between a module comprising 77 lipid metabolites (green module) and BMI (r = -0.44, p = 0.002). The yellow module incorporated 81 metabolites, mostly involved in sphingolipid metabolism, and was associated with three stroke outcomes including days to composite vascular endpoint (r = 0.40, p = 0.006), days to VISP recurrent stroke (r = 0.40, p = 0.007), and VISP recurrent stroke status (r = -0.36, p = 0.01).

PLOS ONE

Integrative metabolomics-genomics
A one-way ANOVA was performed using the genotypes of 556 variants located within 10 kb upstream/downstream of 24 sphingolipid-metabolism-associated genes selected from the

Discussion
This study integrated metabolomics, epigenomics, and genomics data in analyses of recurrent stroke in AAs. The metabolites identified in the baseline smoking t-test analysis included those present in tobacco or cigarette smoke. Cotinine, an alkaloid found in tobacco leaves and the main metabolite of nicotine [24], was the most significant metabolite in a series of group-means comparisons. Serum cotinine levels have been used as an indicator of second-hand smoke exposure, where high second-hand smoke levels (serum cotinine >0.7 ng/mL) increased coronary heart disease risk up to 1.5-fold [25,26]. In total, ten metabolites were detected in our smoking analyses. While these associations were significant, they were not surprising and serve as a proof-of-concept, validating the utility of metabolomics in the VISP study. Notable differences in the levels of N-delta-acetylornithine, sphingomyelin (d17:1/24:1), and ceramide phosphoethanolamine (d18:1/16:0) were observed in t-tests of stroke recurrence. N-delta-acetylornithine was previously associated with NAT8, a gene correlated with creatinine levels and chronic kidney disease in AA [27]. This is of interest since the risk of stroke is 5-30 times higher in patients with chronic kidney disease [28]. Gamma-glutamylhistidine, identified in the matched-pairs analysis, is a component of gamma-glutamyl amino acid metabolism. Previous studies suggested that elevated gamma-glutamyl transferase is associated with increased risk of stroke, coronary heart disease, arterial HTN, and cardiovascular disease-related mortality [29].
Sphingolipid-related metabolites were implicated in Welch's t-tests, survival analyses, matched-pairs analysis, and WGCNA, where increased sphingomyelin levels appeared to confer a degree of neuroprotection or to delay event recurrence. Sphingolipids play a vital role in intracellular signal transduction and regulation of cellular proliferation, maturation, apoptosis, and cellular stress response, as well as being components of the cardiomyocyte cell membrane [30]. Inflammatory cytokines, such as TNF-α, may induce the synthesis of ceramides from sphingomyelins via sphingomyelinase [31]. While numerous studies report detrimental effects associated with increased ceramide levels, sphingosine-1-phosphate (S1P) has a neuroprotective function during ischemia [32]. This is presumably due to S1P regulating prosurvival mechanisms through the suppression of pro-apoptotic factors including caspase 3 and the activation of protein kinase B, or Akt [32]. Furthermore, increased ceramide levels have been reported in patients with HTN, with ceramide concentrations positively correlated with HTN severity [33]. Further investigation of the homeostatic balance between sphingomyelins and ceramides is needed to validate biological markers of stroke recurrence and stroke-related comorbidities such as HTN.
Examining variants within genes encoding sphingolipid metabolism enzymes identified 23 unique variants that were associated with metabolites within sphingolipid and fatty acid metabolism. In the present study, rs7025659 was associated with leukotriene B4, a pro-inflammatory lipid mediator derived from arachidonic acid [34]. A 2020 study reported that higher leukotriene B4 levels on days 0 and 7 after ischemic stroke were associated with poorer functional recovery based on the RSS [34]. Five genetic variants within ACER2 were associated with plasma sphinganine and sphingosine levels, while an additional ACER2 variant (rs10757056) was associated with sphingosine levels. ACER2 encodes alkaline ceramidase 2, which hydrolyzes ceramides to generate sphingosine and sphingosine-1-phosphate [35]. ACER2 expression is in part regulated by the hypoxia-inducible factor 2α, an atherosclerosis suppressor and known ischemic stroke marker [36,37]. Additionally, four variants within SGMS1 (rs10763500, rs11595661, rs12355439, and rs12358176) were implicated in the ANOVA. SGMS1 encodes the sphingomyelin synthase 1 protein, a transmembrane protein that is highly expressed in the brain [38] and converts ceramide into sphingomyelin [39]. While these results are promising, further studies with increased sample sizes are needed to determine the biological implications of these variants with regard to metabolism and stroke recurrence.
Integrative analyses consisting of methylation, metabolite profiles, and covariates have reinforced our univariate results implicating tobacco and smoking-related metabolites. This finding could strengthen the paradigm that secondhand smoke influences stroke and stroke recurrence [40]. Network analyses identified clusters of DNA methylation loci that were correlated with concentrations of prescribed analgesics, cardiovascular, neurological, and psychoactive drugs. These associations are to be expected in a population of individuals who have suffered prior strokes and cardiovascular events, as these drugs are commonly prescribed to post-stroke patients. It is possible that these drug associations could be useful for identifying individuals who are poor drug metabolizers or could indicate the efficacy and/or potency of particular therapeutics.
This study was performed in a subset of AA VISP clinical trial participants, a population more likely to experience a recurrent stroke than their white counterparts. Furthermore, risk factors for stroke, specifically HTN, diabetes mellitus, and chronic kidney disease, are more prevalent in this population. The recurrent stroke phenotype overall is understudied and its etiology is not well understood; therefore, the multi-omic analysis of this complex disease is a strength of this study. DNA used in the methylation analyses was extracted from whole blood samples upon enrollment in the trial. Although not optimal due to cellular heterogeneity, whole blood provides a valuable resource that is available for replication studies. To adjust for this limitation, cellular proportions were calculated in silico and used as covariates in the association models.
This study has a modest sample size of 50 individuals; thus, our statistical power was limited and the analyses could only confidently detect large-effect associations. This was a primary drawback, in addition to the lack of validation. Studies of AA with global metabolite, methylation, and genetic data, as well as adjudicated recurrent stroke outcomes, are rare, further limiting our ability to include a replication cohort. Although stroke subtype is not adjudicated in VISP, these stroke cases most likely represent the small vessel variety due to the inclusion/exclusion criteria and the higher proportion of small vessel (lacunar) ischemic stroke typically observed in AA [41].
While a matched-pairs design is an optimal approach, larger sample sizes could improve the detection of associations of small and/or medium effect size in our analyses. Multiple blood samples per individual would allow for more comprehensive matched-pairs analyses, while matching based on similar pharmacological profiles and time of blood draw would be ideal, as the consumption of pharmaceutical drugs, time of day, and time of year (i.e., season) all influence metabolite levels. We also cannot conclusively state that the metabolite levels were not altered by time post-stroke, since the samples were collected within 120 days of stroke onset. Acute metabolite profiles could potentially differ from sub-acute or chronic stroke metabolite profiles. Additionally, samples should be matched more closely on age, as our paired individuals were within eight years of age.
In conclusion, even with a limited sample size, findings from this study provide insight into associations between metabolites, DNA methylation, genetic variants and recurrent stroke, thus identifying potential plasma biomarkers in AA. Further studies are needed to highlight the pathways underlying these associations in regard to their biological and clinical applications.