Short report: Targeted analysis of whole exome sequencing data in Indian cryptogenic stroke patients

  • Priya Dev,

    Roles Data curation, Investigation, Writing – original draft

    Affiliation Department of Neurology, Institute of Medical Sciences, Banaras Hindu University, Varanasi, India

  • Jenefer M. Blackwell,

    Roles Formal analysis, Writing – original draft, Writing – review & editing

    Affiliation The Kids Research Institute Australia, University of Western Australia, Perth, Western Australia, Australia

  • Rajiv Kumar,

    Roles Conceptualization, Data curation, Formal analysis

    Affiliation Centre of Experimental Medicine & Surgery, Institute of Medical Sciences, Banaras Hindu University, Varanasi, India

  • Vijay Mishra,

    Roles Data curation, Formal analysis

    Affiliation Department of Neurology, Institute of Medical Sciences, Banaras Hindu University, Varanasi, India

  • Abhishek Pathak

    Roles Conceptualization, Funding acquisition, Project administration, Supervision

    * abhishekpathakaiims@gmail.com

    Affiliation Department of Neurology, Institute of Medical Sciences, Banaras Hindu University, Varanasi, India

Abstract

Cryptogenic stroke (CS) is an ischemic stroke of unknown cause with increasing incidence in India. Common and rare genetic variants have been associated with the risk of stroke. We carried out targeted analysis of whole exome sequencing on a small cohort of 16 CS patients compared to 16 healthy unaffected relatives to determine whether rare coding variants in genes previously associated with stroke could play a role in India. Variants were filtered for coverage (≥20x) and minor allele frequency (≤0.01). Putative deleterious variants were identified using a range of bioinformatic tools. Targeted analysis was performed by filtering for those variants present in a panel of 220 stroke-related genes. Phenotypes, pathways and cell compartments to which genes carrying putative deleterious (PHRED-scaled CADD scores ≥15) variants belonged were determined using Enrichr. STRING was employed to identify interacting proteins. We identified 17 potentially damaging variants specific to Indian CS patients in 15 genes contributing to phenotypes (e.g., hemorrhage; abnormal blood coagulation; dilated aorta, increased heart weight) and pathways (e.g., platelet degranulation, common pathway of fibrin clot formation; response to elevated platelet cytosolic Ca2+) that were not observed in unaffected relatives. STRING analysis identified 6 genes (ITGA2B, F13A1, F5, ATP7A, GLA, ABCC6) encoding interacting proteins that could be prioritised for follow-up studies. This should include secondary sequence validation, as well as extended pedigree and functional laboratory-based gene-editing studies to validate the clinical relevance of specific variants to CS. Although limited by small sample size, our study provides novel data on CS in a geographical region and ethnic group not well studied to date.

Introduction

Stroke is the second largest cause of death and the third largest cause of years of life lost worldwide. Ischemic stroke, an overt symptomatic expression of brain infarction, accounts for ~80% of all strokes, with most cases caused by a combination of environmental and genetic factors. Cryptogenic (unexplained) stroke (CS) accounts for ~30%–40% of ischemic stroke patients, and is increasing in the Indian population [1]. A better definition and identification of associated risk factors are required to manage CS.

Heritability for ischemic stroke is substantial (37.9%) and varies with subtype (40.3% for large-vessel disease; 32.6% for cardioembolic; and 16.1% for small vessel disease) [2]. Genome-wide studies have identified common variants in a large number of genes associated with stroke, including in India [3]. Rare genetic variants in monogenic disorders can also lead to ischemic stroke (OMIM#601367). Evidence is accumulating for the contribution of rare functional coding variants to genetic risk in complex diseases [4], while next-generation sequencing has made finding these more cost-effective. Here we employ targeted analysis of whole exome sequencing (WES) data in Indian CS patients to determine whether rare putative deleterious variants in previously identified stroke genes could play a role in disease risk.

Materials and methods

All procedures in this study were conducted according to the principles of the Declaration of Helsinki. This study was approved by the Institutional Ethics Committee, Institute of Medical Sciences (IMS), Banaras Hindu University (BHU), Varanasi, India with reference number: Dean/2018/EC/288. Patients were enrolled between 26 August 2020 and 30 March 2022. Written consent for participation was obtained from all the patients/persons responsible. All persons whose DNA was collected for this research consented to storage of the sample and future use of de-identified genetic and clinical data. All participants agreed to publication of de-identified genetic and clinical data.

Study subjects

The study was conducted at the Department of Neurology, IMS, Sir Sunderlal Hospital, BHU, Varanasi, India. We enrolled 16 consecutive CS patients (age range 47–84 years; 10 males, 6 females) with ischemic strokes of undetermined etiology. Samples were also collected from 16 unaffected family members (age range 20–59 years; 14 males, 2 females) for comparative analyses. This was not an extended family study per se, but data from unaffected family members (one per patient) were used to identify variants specific to CS patients.

CS was defined as an ischemic stroke not attributed to a definite source of large-vessel atherosclerosis, cardioembolism, or small vessel disease, according to the Trial of ORG 10172 in Acute Stroke Treatment (TOAST) classification [5]. Patients presenting within 48h of onset of clinical symptoms and with a modified Rankin Scale (mRS) score ≥1 at admission were included. Complete evaluation, including clinical assessment and neurological examination, was carried out in line with the National Institutes of Health Stroke Scale within 60 min of the patient’s arrival. The diagnostic assessment included non-contrast computed tomography or brain magnetic resonance imaging, hemogram, biochemical tests, electrocardiogram, transoesophageal echocardiography, vascular imaging (of intra- and extra-cranial vessels), assessment of prothrombotic state, and 24h Holter monitoring for atrial fibrillation. Patients with conditions requiring intensive care unit management, pregnancy, recurrent ischemic stroke, subarachnoid hemorrhage, traumatic brain injury, vascular aneurysm, arterial malformation, infective endocarditis, central nervous system infections or chronic liver or kidney diseases were excluded.

Library preparation and exome sequencing

DNA was extracted from 2 ml of peripheral blood using Qiagen DNA mini kits, with quantity/quality of genomic DNA (gDNA) measured by NanoDrop-2000 Spectrophotometer. DNA samples were sent to Dr Lal PathLabs Ltd (New Delhi, India) where library construction, sequencing and data analyses were undertaken. Libraries were constructed from 100ng gDNA using the Ion Ampliseq Exome RDY Panel kit (Thermo Fisher Scientific) and quantified with Qubit™ dsDNA HS (High Sensitivity) Assay Kit on Qubit 3.0 Fluorometer. Templates were generated using 25 pm of each library (Ion Chef Instrument; Thermo Fisher Scientific), followed by enrichment of templated ion sphere particles. WES was performed using Hi-Q chemistry on the Ion Proton system (Thermo Fisher Scientific).

Data processing and variant analysis

Sequences were aligned against the reference genome (GRCh37/hg19) using Torrent Suite v.5.12.0 and Variant Caller v.5.2.1 software, including coverage analysis and variant caller plugins (Thermo Fisher Scientific). The average (± standard deviation) coverage for CS patients was 93.49 ± 1.97%, for unaffected relatives 93.06 ± 1.71%. Variant discovery, genotype calling of multi-allelic substitutions and indels were performed using the Torrent Variant Caller version 4.6.0.7. The Torrent Coverage Analysis provided statistics and graphs describing the level of sequence coverage produced for targeted genomic regions (version 4.6.0.3). The Annotate variants 5.0 of Ion Reporter (Thermo Fisher) annotated the variants. The average (± standard deviation) total exonic variants for CS patients was 20749 ± 2562 (9840 ± 1470 nonsynonymous SNVs; 10547 ± 1024 synonymous SNVs; 361 ± 73 insertion/deletions) and for unaffected family members was 19880 ± 2058 (9340 ± 1122 nonsynonymous SNVs; 10209 ± 890 synonymous SNVs; 331 ± 58 insertion/deletions).

Variant prioritization and bioinformatics analysis

The Integrative Genomics Viewer (IGV; https://www.broadinstitute.org/igv/) was used to visualise sequencing data. Variant frequencies were obtained from public domain databases including the 1000 Genomes Project (https://www.internationalgenome.org/) and the Genome Aggregation Database (gnomAD; https://gnomad.broadinstitute.org/). Variants detected in the exome sequencing were filtered for coverage (≥20x) and minor allele frequency (≤0.01) in the public domain databases. All were compared with mutation databases including the Human Gene Mutation Database (https://www.hgmd.cf.ac.uk/ac/index.php/) and UniProt (https://www.uniprot.org/). Intronic, up/downstream, and synonymous variants were removed. Predicted deleteriousness of the detected variants was evaluated using DEOGEN2 (https://deogen2.mutaframe.com/), MutationTaster (https://www.mutationtaster.org/), Sorting Intolerant From Tolerant (SIFT; https://sift.bii.a-star.edu.sg/), Protein Variation Effect Analyzer (PROVEAN; https://provean.jcvi.org/index.php), Functional Analysis Through Hidden Markov Models (FATHMM; https://fathmm.biocompute.org.uk/) and Deleterious Annotation of Genetic Variants using Neural Networks (DANN). Variants predicted to be deleterious, i.e., damaging, disease-causing, or likely pathogenic using these tools, were taken forward in the targeted analysis.
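The coverage, frequency and consequence filters described above can be sketched as follows. This is a minimal illustration and not the sequencing provider's pipeline: the `Variant` record, its field names, the consequence labels, and the choice to retain variants absent from the frequency databases are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Consequence labels are hypothetical; real annotation tools use their own vocabularies.
EXCLUDED_CONSEQUENCES = {"intronic", "upstream", "downstream", "synonymous"}


@dataclass(frozen=True)
class Variant:
    gene: str
    variant_id: str            # illustrative identifier, not a real variant name
    depth: int                 # read depth at the variant site
    max_af: Optional[float]    # highest allele frequency across databases; None if unreported
    consequence: str


def passes_filters(v: Variant, min_depth: int = 20, max_maf: float = 0.01) -> bool:
    """Apply the three filters named in the text: depth, frequency, consequence."""
    if v.depth < min_depth:
        return False
    # Assumption: a variant missing from the public databases is treated as rare and kept.
    if v.max_af is not None and v.max_af > max_maf:
        return False
    return v.consequence not in EXCLUDED_CONSEQUENCES


def filter_variants(variants, min_depth: int = 20, max_maf: float = 0.01):
    """Return the variants that pass all filters, preserving input order."""
    return [v for v in variants if passes_filters(v, min_depth, max_maf)]
```

Both thresholds are inclusive in this sketch, matching the ≥20x and ≤0.01 limits stated in the text.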

Targeted exome analysis

For targeted analysis, variants predicted to be deleterious were filtered against an in-house gene panel (Dr Lal PathLabs Ltd) of 220 genes (S1 Table) previously identified as risk factors for stroke and related clinical phenotypes. The average (± standard deviation) quality threshold for the coverage across the targeted region for CS patients was 97.62 ± 1.73%, for unaffected relatives 97.56 ± 1.77%, with read depths > 20x. All filtered variants in these genes were separately examined in ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/), the public archive of interpretations of clinically relevant variants, which classified the variants identified as pathogenic or variants of uncertain significance (VUS), according to American College of Medical Genetics and Genomics (ACMG) 2015 guidelines. As a measure of putative deleteriousness that correlates well with the qualitative bioinformatic assessment of deleteriousness provided by the suite of tools employed by the sequencing company, we report PHRED-scaled Combined Annotation Dependent Depletion (CADD) scores [6]. Although there is no hard cut-off to determine potential pathogenicity in VUS, a cut-off of 15 for scaled CADD scores has been suggested [6]. Here we report only those variants that have achieved this CADD score cut-off and refer to these as putative (i.e., thought to be but not proven) deleterious variants. Genes with variants specific to CS patients were analysed using the knowledge base of known and predicted protein-protein interactions STRING database (https://string-db.org/). Phenotypes, pathways and cell compartments to which genes carrying putative deleterious (PHRED-scaled CADD scores ≥15) variants belonged were determined using the comprehensive gene set enrichment tool Enrichr (https://maayanlab.cloud/Enrichr/).
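To make the CADD cut-off concrete, the sketch below restricts variants to a gene panel and applies the scaled score threshold of 15. It also shows what a PHRED-scaled score means: a scaled score s corresponds to the top 10^(-s/10) fraction of all ranked substitutions, so 15 marks roughly the top 3.2%. The tuple layout and function names are assumptions for illustration, and the sketch does not handle in-frame deletions, which would need separate treatment if they lack a score.

```python
CADD_CUTOFF = 15.0  # scaled CADD cut-off used in the text


def top_fraction(scaled_cadd: float) -> float:
    """Fraction of ranked substitutions at or above a PHRED-scaled CADD score."""
    return 10 ** (-scaled_cadd / 10)


def putative_deleterious(records, panel_genes, cutoff: float = CADD_CUTOFF):
    """Keep records in the panel whose scaled CADD score meets the cut-off.

    records: iterable of (gene, variant_id, scaled_cadd) tuples (hypothetical layout);
    scaled_cadd may be None when no score is available, in which case the record is dropped.
    """
    panel = set(panel_genes)
    return [
        r for r in records
        if r[0] in panel and r[2] is not None and r[2] >= cutoff
    ]
```

For example, `top_fraction(20)` is 0.01, i.e. the top 1% of ranked substitutions.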

Results

The study cohort

S2 Table shows clinical data and family history of stroke-related diseases for the 16 CS patients (designated P1 to P16). All except P1 and P16 presented with hypertension; 4 (P3, P11, P12, P13) had diabetes; all had Fazekas scores of 2 or 3 for white matter hyperintensities. All except P10 had a family history of stroke; all except P1, P9 and P10 reported family history of hypertension. Five patients (P2, P3, P4, P7, P11) reported a history of diabetes in a first-degree relative; 2 CS patients (P12, P13) had first-degree relatives who had suffered myocardial infarctions. Unaffected family members had no comorbidities.

Genetic variants identified

In all, 26 VUS in 19 genes and one asymptomatic heterozygous pathogenic carrier at autosomal recessive PMM2 were identified in CS patients (Table 1). P2, P5 and P16 had no putative deleterious variants in the targeted panel. Of the 26 VUS, 23 were heterozygous at autosomal genes, one homozygous autosomal variant at ITGA2B, and two hemizygous X-linked VUS at ATP7A and at GLA. Although pathogenic variants at these three genes cause serious monogenic disorders, these CS patients did not present with clinical signs relevant to them. All VUS in CS patients had PHRED-scaled CADD scores ≥15 or were disruptive in-frame deletions and hence have the potential to be deleterious. Nine variants observed in CS patients were also observed in unaffected relatives (Table 2). Given that unaffected relatives were, on average, younger than CS patients we cannot discount the possibility that these variants could contribute to future episodes of CS. We therefore take a conservative approach here in taking forward only the 15 genes with putative deleterious variants specific to the CS group (Table 1), and the 18 genes carrying putative deleterious variants specific to unaffected relatives (Table 2), in gene-set enrichment analysis.
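The partition of variants into CS-specific, shared and relative-specific groups can be expressed as simple set operations. The sketch below assumes variants are pooled across each group and represented as (gene, variant_id) pairs; both the representation and the function names are hypothetical.

```python
def split_by_group(patient_variants, relative_variants):
    """Partition variants by whether they occur in patients, relatives, or both."""
    patients = set(patient_variants)
    relatives = set(relative_variants)
    return {
        "cs_specific": patients - relatives,        # seen only in CS patients
        "shared": patients & relatives,             # seen in both groups
        "relative_specific": relatives - patients,  # seen only in unaffected relatives
    }


def genes_of(variants):
    """Sorted list of distinct genes carrying any of the given variants."""
    return sorted({gene for gene, _ in variants})
```

Applied to the counts reported above, the 26 VUS in patients minus the nine also seen in relatives leaves the 17 CS-specific variants taken forward.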

Table 1. Pathogenic variants and variants of uncertain significance (VUS) identified in the CS patients.

https://doi.org/10.1371/journal.pone.0326554.t001

Table 2. Pathogenic variants and variants of uncertain significance (VUS) identified in unaffected relatives.

https://doi.org/10.1371/journal.pone.0326554.t002

Gene set enrichment and STRING analyses

Putative deleterious variants could contribute to genetic risk either as compound heterozygotes in a single gene (e.g., Table 1 P3 at CPS1; P15 at DOCK8) or at interacting proteins in common pathways. Gene-set enrichment analysis in Enrichr using the 15 genes carrying putative deleterious variants specific to CS patients (Table 3) showed significant enrichment for gene-sets in phenotypes (e.g., hemorrhage; abnormal blood coagulation; dilated aorta), pathways (e.g., common pathway of fibrin clot formation; platelet degranulation; response to elevated platelet cytosolic Ca2+), and cellular components (e.g., platelet alpha granule) that were not observed in Enrichr analysis of 18 genes carrying putative deleterious variants specific to unaffected relatives (S3 Table). This is of interest but should be tempered by the knowledge that our targeted gene panel was selected on the basis of prior association with stroke in other studies. The observations are therefore exploratory and provide a basis for hypothesis generation. STRING analysis demonstrated that 6 of the genes common across the CS patient enriched gene-sets, ITGA2B, F13A1, F5, ATP7A, GLA and ABCC6, interact at the protein level (Fig 1). This result will again reflect the limitations of our targeted gene study where the gene-set entered into the STRING analysis was constrained by the targeted analysis of the whole exome data. Nevertheless, the STRING analysis provides provisional functional support that patients who carry putative deleterious variants at more than one of these genes, e.g., P7 with variants at F13A1 and ATP7A, could be investigated for a blended deleterious variant phenotype involving these genes. Indeed, multiple patients carried putative deleterious variants across more than one gene in these enriched gene-sets (Table 1).
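As an illustration of why the choice of background matters for the enrichment caveat above, the following sketch computes a one-sided hypergeometric over-representation p-value, the kind of test that underlies gene-set enrichment tools. It is not Enrichr's implementation, and all parameter names are assumptions; the numbers a reader would supply (gene-set sizes, overlaps) are not taken from this study.

```python
from math import comb


def hypergeom_enrichment_p(hits: int, query_size: int, set_size: int,
                           background_size: int) -> float:
    """One-sided P(X >= hits) when query_size genes are drawn from a background
    of background_size genes, of which set_size belong to the gene set."""
    upper = min(query_size, set_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, query_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

With a query of 15 genes, passing `background_size=220` (the panel) instead of a genome-wide count would make the test conditional on the pre-selected panel, which generally yields a more conservative p-value when the panel is itself enriched for the gene set.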

Fig 1. Results of STRING analysis for 15 genes with VUS specific to CS patients.

Nodes (= query proteins) are represented by coloured circles; filled nodes indicate that the protein structure is known or predicted. Protein-protein interactions are represented by coloured lines as indicated in the key. These interactions indicate that the proteins contribute to a shared function but do not necessarily mean that they physically bind each other (https://string-db.org/).

https://doi.org/10.1371/journal.pone.0326554.g001

Discussion

Here we identified 17 putative deleterious variants specific to Indian CS patients in 15 genes previously associated with stroke. The target gene panel comprised genes predominantly identified through common alleles contributing to complex polygenic inheritance of stroke that could include genes influencing a range of comorbidities. The rare putative deleterious variants identified here have the potential to contribute to less complex oligogenic or monogenic inheritance if confirmed as functionally deleterious. A common theme identified amongst the genes carrying the rare variants specific to CS patients was a role in dysmorphic heart/aorta, abnormal blood coagulation, and/or in platelet function and clot formation. Here we focus our discussion on those genes carrying variants that (a) could contribute to monogenic or oligogenic Mendelian inherited disease, and (b) were highlighted (ITGA2B, F13A1, F5, ATP7A, GLA, ABCC6) as potential interacting proteins in our STRING analysis.

Firstly, three patients were either homozygous (P12 at ITGA2B) or hemizygous (P7 at ATP7A; P9 at GLA) for VUS which could be directly disease causing in monogenic disease. ITGA2B encodes the alpha subunit of the platelet membrane adhesive protein receptor complex GPIIb/IIIa. Its expression in ischemic stroke has been previously related to elevated platelet cytosolic Ca2+ [7]. ATP7A encodes a copper-transporting ATPase that predicts cuproptosis, a novel form of programmed cell death in ischemic stroke [8]. GLA encodes the lysosomal hydrolase alpha-galactosidase. Previous exome-based analysis identified two pathogenic variants in GLA in 172 ischemic stroke patients [9].

Secondly, carriage of putative deleterious alleles across more than one functionally related gene could contribute to oligogenic inheritance of disease risk. For example, previous studies have shown that a blended phenotype of deleterious variants at NOTCH3 and RNF213 is associated with temporal pole infarcts in stroke episodes [10]. In our study, CS patient P7 hemizygous at ATP7A also carried a putative deleterious variant at F13A1 encoding coagulation factor XIII, a key differentially expressed gene associated with ischemic stroke [11]. Similarly, patient P9 hemizygous at GLA also carried CS-specific putative deleterious variants at F5. GLA and F5 also fall within our STRING network of interacting proteins. F5, encoding coagulation factor V, is a novel biomarker in ischemic stroke [12].

Additional CS patients were heterozygous or compound heterozygous for putative deleterious variants at a number of genes that could be dominant for Mendelian inherited disease. Of interest amongst these, P6 carried a potentially deleterious variant at ABCC6 which belongs to a family of ATP-binding cassette transmembrane transporters. Known pathogenic variants at ABCC6 are associated with disorders (OMIM *603234) that include arterial calcification and myocardial infarction. In a study of mutations in Mendelian stroke genes in 1,033 early onset stroke patients, clinically relevant VUS were identified at ABCC6 (n = 53), RNF213 (n = 59) and NOTCH3 (n = 15) [13]. Whilst we have focussed here on the specific gene set identified as potentially interactive in our STRING analysis, all the genes carrying variants specific to the CS patient group are worthy of further investigation.

Limitations and conclusions

We acknowledge that small sample size was a major limitation of our study. Hence our findings should be viewed as exploratory and hypothesis-generating. A further limitation was the targeted analysis of variants in genes previously associated with stroke. A full and unbiased analysis of rare variants across the whole exome, and in a larger sample of patients, will likely identify more variants in genes potentially contributing to CS in our study population. This untargeted approach would also expand the potential for identification of important pathways and protein interactions that could be important in CS. The 17 potentially clinically relevant variants in 15 genes that we identified in 16 CS patients also require secondary validation using an alternative sequencing platform. If validated, further follow-up studies of extended families as well as functional gene-editing laboratory investigations will be required to determine the clinical significance of these variants. Specifically we identify 6 genes (ITGA2B, F13A1, F5, ATP7A, GLA, ABCC6) encoding interacting proteins in functionally relevant pathways that could be prioritised for follow-up. Overall, our results provide a novel contribution to genetic studies of CS in a geographical region and ethnic group not well studied to date.

Supporting information

S1 Table. Panel of 220 stroke-related genes included during targeted analysis of exome sequencing data.

https://doi.org/10.1371/journal.pone.0326554.s001

(DOCX)

S2 Table. Characteristics, clinical data and family history for stroke and stroke-related risk factors for CS patients.

https://doi.org/10.1371/journal.pone.0326554.s002

(DOCX)

S3 Table. Results of Enrichr analysis (P < 0.001; adjusted P-values <0.01) for top gene sets identified from 18 genes carrying 18 putative deleterious VUS (missense CADD score ≥15 or in-frame deletions) unique to the unaffected relatives of CS patients.

https://doi.org/10.1371/journal.pone.0326554.s003

(DOCX)

Acknowledgments

We would like to thank Dr. V N Mishra, Dr. Deepika Joshi, Dr. R N Chaurasia, Dr. Varun Kumar Singh, and Dr. Anand Kumar for their support in the study.

References

  1. Salomi BSB, Solomon R, Turaka VP, Aaron S, Christudass CS. Cryptogenic stroke in the young: role of candidate gene polymorphisms in Indian patients with ischemic etiology. Neurol India. 2021;69(6):1655–62. pmid:34979665
  2. Bevan S, Traylor M, Adib-Samii P, Malik R, Paul NLM, Jackson C, et al. Genetic heritability of ischemic stroke and the contribution of previously reported candidate gene and genomewide associations. Stroke. 2012;43(12):3161–7. pmid:23042660
  3. Kumar A, Chauhan G, Sharma S, Dabla S, Sylaja PN, Chaudhary N, et al. Association of SUMOylation Pathway genes with stroke in a genome-wide association study in India. Neurology. 2021;97(4):e345–56. pmid:34031191
  4. Bomba L, Walter K, Soranzo N. The impact of rare and low-frequency genetic variants in common disease. Genome Biol. 2017;18(1):77. pmid:28449691
  5. Adams HP Jr, Bendixen BH, Kappelle LJ, Biller J, Love BB, Gordon DL, et al. Classification of subtype of acute ischemic stroke. Definitions for use in a multicenter clinical trial. TOAST. Trial of Org 10172 in Acute Stroke Treatment. Stroke. 1993;24(1):35–41. pmid:7678184
  6. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46(3):310–5. pmid:24487276
  7. Cen G, Liu L, Wang J, Wang X, Chen S, Song Y, et al. Weighted gene co-expression network analysis to identify potential biological processes and key genes in COVID-19-related stroke. Oxid Med Cell Longev. 2022;2022:4526022. pmid:35557984
  8. Fan X, Chen H, Jiang F, Xu C, Wang Y, Wang H, et al. Comprehensive analysis of cuproptosis-related genes in immune infiltration in ischemic stroke. Front Neurol. 2023;13:1077178. pmid:36818726
  9. Härtl J, Hartberger J, Wunderlich S, Cordts I, Bafligil C, Sturm M, et al. Exome-based gene panel analysis in a cohort of acute juvenile ischemic stroke patients: relevance of NOTCH3 and GLA variants. J Neurol. 2023;270(3):1501–11. pmid:36411388
  10. Saito S, Hosoki S, Yamaguchi E, Ishiyama H, Abe S, Yoshimoto T, et al. Blended Phenotype of NOTCH3 and RNF213 variants with accelerated large and small artery crosstalk: a case report and literature review. Neurol Genet. 2024;10(5):e200176. pmid:39257469
  11. Lin L, Guo C, Jin H, Huang H, Luo F, Wang Y, et al. Integrative multi-omics approach using random forest and artificial neural network models for early diagnosis and immune infiltration characterization in ischemic stroke. Front Neurol. 2024;15:1475582. pmid:39697434
  12. Liu J, Si Z, Liu J, Zhang X, Xie C, Zhao W, et al. Machine learning identifies novel coagulation genes as diagnostic and immunological biomarkers in ischemic stroke. Aging (Albany NY). 2024;16(7):6314–33. pmid:38575196
  13. Park H-K, Lee K-J, Park J-M, Kang K, Lee SJ, Kim JG, et al. Prevalence of Mutations in Mendelian Stroke Genes in Early Onset Stroke Patients. Ann Neurol. 2023;93(4):768–82. pmid:36541592