Genetic Spectrum of Idiopathic Restrictive Cardiomyopathy Uncovered by Next-Generation Sequencing

Background
Cardiomyopathies represent a rare group of disorders often of genetic origin. While approximately 50% of genetic causes are known for other types of cardiomyopathies, the genetic spectrum of restrictive cardiomyopathy (RCM) is largely unknown. The aim of the present study was to identify the genetic background of idiopathic RCM and to map the identified genetic variants onto signalling pathways using in silico protein network analysis.
Patients and Methods
We used an Illumina MiSeq setup to screen 108 cardiomyopathy- and arrhythmia-associated genes in 24 patients with idiopathic RCM. Pathogenicity of genetic variants was classified according to the American College of Medical Genetics and Genomics classification.
Results
Pathogenic and likely-pathogenic variants were detected in 13 of 24 patients, resulting in an overall genotype-positive rate of 54%. Half of the genotype-positive patients carried a combination of pathogenic or likely-pathogenic variants and variants of unknown significance. The most frequent combination included mutations in sarcomeric and cytoskeletal genes (38%). A bioinformatics approach underlined the mechanotransducing protein networks important for RCM pathogenesis.
Conclusions
Multiple gene mutations were detected in half of the genotype-positive RCM cases, with a combination of sarcomeric and cytoskeletal gene mutations being the most common. Mutations of genes encoding sarcomeric, cytoskeletal, and Z-line-associated proteins appear to have a predominant role in the development of RCM.


Introduction
Restrictive cardiomyopathy (RCM) is one of the rarest cardiac disorders with a very poor prognosis, and heart transplantation is the only long-term treatment option [1]. After exclusion of secondary causes such as AL-amyloidosis and irradiation, the aetiology of RCM is most often genetic. Due to the rare incidence rate of RCM, there has been limited description of its genetic causes when compared to other cardiomyopathies, especially hypertrophic cardiomyopathy (HCM), but also dilated cardiomyopathy (DCM). The list of RCM-associated genes includes sarcomeric and cytoskeletal genes often similar to those genes observed in HCM and DCM, but in total the genotyping success rate is quite low, corresponding approximately to 30% [2][3][4][5][6][7][8][9]. Importantly, the mechanisms underlying different cardiac phenotypes, resulting from mutations in the same genes, and the basis for intrafamilial variability of cardiomyopathy phenotypes, are poorly understood [10][11][12][13]. Considerable progress in the understanding of genetic causes of cardiomyopathies has recently become possible, because of the development of high throughput, massively parallel genetic sequencing methods, namely next-generation sequencing (NGS). Compared to the conventional Sanger sequencing, NGS allows the coverage of a much wider panel of genes including giant genes such as titin, and enables the expansion of analysis to genes associated with cardiac arrhythmias, neuromuscular disorders, and cardiomyopathy phenocopies [14,15]. Recently, this approach was successfully applied to unravel the spectrum of genetic causes in patients with HCM and DCM [16][17][18]. In spite of several reports on the application of NGS technology for the dissection of genetic causes of RCM in individual patients, to our knowledge, a systematic analysis of the genetic background in RCM has not yet been performed [5]. 
Thus, in our study, we aimed to analyse 24 cases of idiopathic RCM using an NGS approach, with a panel of 108 cardiomyopathy and arrhythmia-associated genes.

Patient cohort and clinical examination
The study included 24 individuals with RCM, hospitalised or treated in clinics of the Federal Almazov Medical Research Centre, St. Petersburg, or Astrid Lindgren Children's Hospital, Karolinska University Hospital, Stockholm, during the period from 2003 to 2014. The study was performed according to the Declaration of Helsinki, and approval was obtained from Karolinska Institute Ethical Review Board and Almazov Medical Research Centre Ethical Committee. Written informed consent was obtained from all subjects and their representatives prior to investigation. On behalf of the children enrolled in the study, written informed consent was obtained from the next of kin. The diagnosis was based on the WHO/International Society and Federation of Cardiology Task Force clinical criteria and classified according to the European Society of Cardiology classification of cardiomyopathies [19]. RCM was defined as the condition of a heart with restrictive ventricular physiology in the presence of normal or reduced diastolic volumes (of one or both ventricles), normal or reduced systolic volumes, and normal ventricular wall thickness. In paediatric patients (0-18 years), the diagnosis was based on echocardiographic features of RCM such as atrial dilatation in combination with normal or nearly normal left ventricular size and preserved or nearly preserved systolic function (left ventricular end diastolic dimension z-score ≤ 3, left ventricular wall thickness z-score ≤ 3, and fractional shortening ≥ 0.25). In some patients, the increased left ventricular end diastolic pressure was further confirmed by cardiac catheterization. Patients with the neuromuscular phenotype were also included in the study. Acquired causes of RCM such as AL-amyloidosis, carcinoid, prior irradiation, and antitumor treatment were excluded, based on anamnesis, serum protein electrophoresis, bone marrow investigation, cardiac MRI, and heart biopsy, where necessary.
Patients with transthyretin amyloidosis or constrictive pericarditis were not included in this study cohort. In cases of transformation of clinical phenotype and myocardial morphology over time, patients were included in the study if the defined criteria of RCM were present at disease onset or during a prolonged period, predominantly contributing to the clinical manifestation. All patients were examined by a dedicated team of clinical physiologists or paediatric cardiologists and underwent echocardiography with Doppler, 12-lead ECG, Holter monitoring, and blood testing for creatine kinase (CK). In cases of elevated CK levels or other signs of muscle system involvement, a detailed neurological examination and electromyography were performed. To compare the spectrum of genetic variants, ten patients with early onset ventricular arrhythmias without diastolic dysfunction, ischemic heart disease, or structural cardiac abnormalities were genotyped using the same panel of genes.

Design of the target gene panel
First, a list of cardiomyopathy- and channelopathy-associated genes was compiled from the literature, including 108 genes that are implicated in cardiomyopathies or inherited arrhythmia syndromes (S1 Table). To ensure comprehensive coverage of the target genes, we extracted all annotated coding regions based on genes and track data from RefSeq, Ensembl, CCDS, Gencode, VEGA, SNP, and CytoBand. The resulting target region covered 426,332 bp and was used as the input data for SureSelect (Agilent Technologies, Santa Clara, California, USA), to design the custom capture oligonucleotides for in-solution target enrichment. Manual optimisation was carried out to re-adjust capture oligonucleotides in regions with lower capture efficiency. In total, 19,956 capture probes mapping to 424,430 bp were synthesised (BED file with target region is available upon request).

Gene enrichment and next-generation sequencing
DNA was purified from whole blood using a FlexiGene Kit following the manufacturer's recommendations, with an additional step of RNase treatment (Qiagen, USA). The amount of genomic DNA in the samples was determined using a Qubit 2.0 fluorometer (Life Technologies; Carlsbad, CA) and a QuantiFluor fluorometer. The quality of the genomic DNA samples was determined using a Nanodrop 1000 (ThermoFisher Scientific; Wilmington, DE) and agarose electrophoresis with SYBR Gold (Life Technologies; Carlsbad, CA). Then, 200 ng of genomic DNA per patient was digested by 8 pairs of restriction enzymes (25 ng of DNA per restriction pair) provided by the Haloplex custom target enrichment kit (Agilent; Waldbronn, Germany). Restriction quality was determined by measuring enrichment control DNA with a Bioanalyzer High Sensitivity DNA assay (Agilent; Waldbronn, Germany). The digested DNA was hybridised to the custom-designed Haloplex probes using a Veriti Thermal Cycler (Life Technologies; Carlsbad, CA) for 16 h. Herculase II Fusion DNA Polymerase (Agilent; Waldbronn, Germany) was used for amplification when preparing the sequencing library with the Haloplex Target Enrichment System (Agilent; Waldbronn, Germany). The PCR reaction was performed using the same equipment as in the hybridization step. The enrichment process was performed with a Haloplex probe during the 16-h period. All clean-up steps were performed with the Agencourt AMPure XP PCR purification bead system (Beckman Coulter; Pasadena, CA). The targeted DNA was captured on magnetic beads, washed twice with 80% ethanol, and subsequently dried at room temperature for 10 min. Library concentrations were measured using the Bioanalyzer High Sensitivity DNA assay (Agilent; Waldbronn, Germany). Different libraries with compatible barcodes were pooled in equal amounts and clustered at a concentration of 8 pM.
Sequencing of 2 × 250 cycles was performed on MiSeq instruments using MiSeq Reagent Kit v2 chemistry (Illumina; San Diego, CA).
Coverage metrics files were produced with SAMtools depth and were processed with a custom R script [22]. All samples were annotated using ANNOVAR [23]. All disease-related genetic variants were validated by Sanger sequencing, the gold-standard sequencing technology. The positive predictive value (PPV) was calculated as the number of true-positive variants divided by the sum of true positives and false positives, based on Sanger sequencing verification.
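
As a worked illustration of the PPV definition, the following Python sketch computes a PPV with a normal-approximation (Wald) 95% confidence interval clipped to the range 0-1. The counts in the usage note are hypothetical and were chosen only to give a value of similar size to the one reported in the Results; the study's actual true-positive and false-positive counts, and the interval method the authors used, are not stated in the text. The function name `ppv_with_ci` is an assumption.

```python
from math import sqrt


def ppv_with_ci(true_pos: int, false_pos: int, z: float = 1.96):
    """Return (ppv, lower, upper) using a Wald interval clipped to [0, 1].

    true_pos: NGS calls confirmed by Sanger sequencing.
    false_pos: NGS calls not confirmed by Sanger sequencing.
    z: normal quantile (1.96 for an approximate 95% interval).
    """
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("at least one verified call is required")
    ppv = true_pos / total
    # Wald standard error of a proportion; an assumed choice of method.
    half_width = z * sqrt(ppv * (1.0 - ppv) / total)
    lower = max(0.0, ppv - half_width)
    upper = min(1.0, ppv + half_width)
    return ppv, lower, upper
```

For example, `ppv_with_ci(121, 3)` uses hypothetical counts of 121 confirmed and 3 unconfirmed calls. The Wald interval is simple but behaves poorly near 100%, which is why the upper bound has to be clipped; a Wilson or exact interval would be a reasonable alternative.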

Variant classification
To estimate the pathogenic role of the identified genetic variants, we classified them according to ACMG guidelines [24]. All variants were checked against population databases (1000G, ESP, and ExAC). Additionally, rare SNPs within the frequency range 1:1000-1:10000 (MAF% 0.1-0.01) according to the 1000G, GO-ESP, or ExAC 0.2 databases were analysed and reported for each patient.
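
The rare-SNP frequency band described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the text does not say how frequencies from the three databases were combined, so the function below assumes that the highest reported frequency decides, and that a variant absent from all databases is not counted as a rare SNP. The names `RARE_MIN`, `RARE_MAX`, and `is_rare_snp` are hypothetical.

```python
from typing import Dict, Optional

RARE_MIN = 1e-4  # 1:10000, i.e. MAF 0.01%
RARE_MAX = 1e-3  # 1:1000, i.e. MAF 0.1%


def is_rare_snp(maf_by_db: Dict[str, Optional[float]]) -> bool:
    """Return True if the variant falls inside the rare-SNP frequency band.

    maf_by_db maps a database label (e.g. "1000G", "ESP", "ExAC") to the
    minor allele frequency as a fraction, or None if the variant is absent.
    Assumption: the highest frequency seen in any database is used.
    """
    observed = [freq for freq in maf_by_db.values() if freq is not None]
    if not observed:
        # Absent from all databases: treated here as novel, not as a rare SNP.
        return False
    return RARE_MIN <= max(observed) <= RARE_MAX
```

For instance, `is_rare_snp({"1000G": 0.0005, "ESP": None, "ExAC": 0.0002})` returns `True`, because the highest reported frequency (0.05%) lies inside the band.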

Bioinformatics approach to predict damaging effect of missense mutations
Pathogenicity of missense variants was assessed based on the MetaSVM predictions obtained from the dbNSFP database [25]. MetaSVM is a support vector machine, which classifies amino acid substitutions as tolerated or damaging by incorporating deleteriousness scores produced by 10 individual algorithms: SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, Mutation Taster, Mutation Assessor, FATHMM, LRT, SiPhy, and PhyloP.

Protein network analysis
A disease interaction network was generated by manual curation using the CIDeR database [26]. Text-mining tools such as iHop, Chilibot, and EvidenceFinder were used for literature mining of Pubmed abstracts and PMC full text articles [27][28][29]. Proteins with RCM variants were analysed for their physical and regulatory interactions. All interactions from the protein network were manually curated and supported by experimental evidence from the scientific literature. If available, we used data obtained for cell types related to cardiac cells.

Statistics
The difference in the distribution of genetic variants between the groups was analysed using Fisher's exact test. A P value of less than 0.05 was considered significant. Odds ratios (OR) and 95% confidence intervals (95% CI) were calculated to express the strength of the association.
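
The following Python sketch illustrates a two-sided Fisher's exact test and an odds ratio with a 95% confidence interval for a 2 × 2 table, using only the standard library. It is an illustration, not the authors' analysis code. Two choices are assumptions not stated in the text: the confidence interval uses the Woolf (log) method, and 0.5 is added to every cell when any cell is zero (Haldane-Anscombe correction). The function names are hypothetical.

```python
from math import comb, exp, log, sqrt


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def odds_ratio_ci(a: float, b: float, c: float, d: float, z: float = 1.96):
    """Return (OR, lower, upper) using the Woolf log method."""
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction (assumed) to avoid division by zero.
        a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (odds_ratio,
            exp(log(odds_ratio) - z * se),
            exp(log(odds_ratio) + z * se))
```

As a usage example, the comparison with the arrhythmia group could be framed as a table of patients with versus without a pathogenic or likely-pathogenic variant in sarcomeric or cytoskeletal genes: `fisher_exact_two_sided(13, 11, 0, 10)` for 13 of 24 RCM patients against 0 of 10 arrhythmia patients. Whether this is the exact table the authors tested is not stated in the text.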

Patient characteristics
The main clinical characteristics of the patient cohort are summarized in Table 1. The study cohort represents a mix of pediatric and adult patients with RCM, since enrollment was without age limit. The mean age at disease presentation in our group was 23 years. Importantly, more than half of the patients developed first symptoms of the disease before age 18, including two infant cases. The median survival time from diagnosis until death or heart transplantation was 3 years, being longer in the pediatric group than in the adult group (4 and 2 years, respectively). In 30% of patients a family history of cardiomyopathy was reported. Of note, in some patients we observed a transformation over time of the restrictive phenotype to either dilated or hypertrophic cardiomyopathy. Similar phenotypic variability was also observed in some familial cases of cardiomyopathy. Out of 8 patients with a family history, 4 belonged to families in which cardiac disorders were also represented by types of cardiomyopathy other than that of the proband. Further, in four patients there were signs of neuromuscular disease in addition to RCM (patients 1, 18, 19, 22).

Sequencing quality and coverage data
The median value of the per-sample average read depth in the 426,332 bp target region across the samples was 601. Taking the median value across all samples, 99.1% of the target region was covered to a depth of 15 or more, 95.1% to a depth of 50 or more, and 90.1% to a depth of 100 or more. The mean coverage over all 108 genes was 667-fold. The PPV based on Sanger sequencing was 97.58% (95% CI = 94.88-100).
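
The per-sample coverage metrics reported above (mean depth and the fraction of the target covered at depths of 15, 50, and 100 or more) can be sketched in Python as follows. The authors used a custom R script that is not shown in the text, so this is a stand-in under stated assumptions: the input is tab-separated text in the three-column layout written by `samtools depth` (chromosome, position, depth), and positions missing from the file are counted as zero depth by dividing by the known target size. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def parse_depth_lines(lines: Iterable[str]) -> List[int]:
    """Extract the depth column from `chrom<TAB>pos<TAB>depth` lines."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _chrom, _pos, depth = line.split("\t")[:3]
        depths.append(int(depth))
    return depths


def coverage_summary(
    depths: Sequence[int],
    target_size: int,
    thresholds: Sequence[int] = (15, 50, 100),
) -> Tuple[float, Dict[int, float]]:
    """Return (mean depth, {threshold: fraction of target at >= threshold}).

    target_size is the number of bases in the target region; bases without
    a reported depth are treated as uncovered.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    mean_depth = sum(depths) / target_size
    fractions = {
        t: sum(1 for d in depths if d >= t) / target_size
        for t in thresholds
    }
    return mean_depth, fractions
```

Applying `coverage_summary` to each sample and then taking the median of each metric across samples would give cohort-level figures of the kind reported here.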

Spectrum of genetic variants in patients with RCM
In total, we identified 5850 variants across the study group, on average 243±40 variants per patient. After annotation of all variants using ANNOVAR, we filtered out all common SNPs, which left 14 pathogenic or likely-pathogenic variants and 24 variants of unknown significance (VUS) (Table 2, see S2 Table). Pathogenic and likely-pathogenic variants were identified in 8 of the 108 genes studied, resulting in a genotype-positive rate of 54% (13 patients, Fig 1A). Of the 14 pathogenic and likely-pathogenic variants, 10 were identified in genes encoding sarcomeric proteins (MYH7, n = 4; MYBPC3, n = 3; TNNI3, n = 2; TNNT2, n = 1) and 4 in genes encoding structural and cytoskeletal proteins (BAG3, JUP, ACTN2, DES) (Fig 1B). Among the 24 variants of unknown significance, 19 were identified in structural and cytoskeletal genes (TTN, SYNE, MYOM1, CACNB2, FKTN, LDB3, EMD, MYOZ, DSP, TMPO), 2 in ion channel genes, and 2 in genes encoding mitochondrial proteins. As expected, most VUS were identified in the TTN gene. Among the 13 genotype-positive patients, the pathogenic or likely-pathogenic variant was detected within cytoskeletal genes in 31% (patients 1, 4, 8 and 19) and within sarcomeric genes in 31% (patients 2, 3, 17 and 20). In the remaining 5 patients (38%), pathogenic or likely-pathogenic variants in sarcomeric genes were accompanied by variants of unknown significance within cytoskeletal genes (Fig 1B). Additionally, for each patient we identified a list of rare SNPs with a population frequency of 1:1000-1:10000, which resulted in another 39 variants (S3 Table). As expected, the highest number of these variants was also found in the TTN gene (n = 16). No rare SNPs were detected in the group of sarcomeric protein genes or mitochondrial genes. In contrast, the number of variants in desmosomal and membrane-associated genes was significantly higher among rare SNPs than among pathogenic variants, likely-pathogenic variants, and variants of unknown significance (p = 0.003).
To investigate how the identified spectrum of RCM-associated genetic variants differs from the spectrum of genetic variants identified in other cardiac disorders, we compared the RCM-associated variants with variants identified in patients with early onset ventricular arrhythmias without diastolic dysfunction or structural cardiac abnormalities. Normal coronary angiography and young age at disease onset excluded an ischemic etiology of the arrhythmic disorders. In ten patients we identified 14 pathogenic or likely-pathogenic variants in ion-channel-encoding genes; no pathogenic or likely-pathogenic variants were found in sarcomeric or cytoskeletal genes. Hence, the observed genetic spectrum of RCM-associated variants does not represent a random combination and differs from that of arrhythmic cardiac disorders.

Major pathways involved in RCM pathogenesis
To understand the interaction landscape of the genes mutated in RCM patients, we created a tightly connected interaction network with 36 proteins corresponding to the genes described in this study (Fig 2). Including 30 interlinking proteins, the network consists of 66 proteins connected by 124 interactions (S4 Table). We sought to use published data derived from cells and tissues that are related to cardiac cells. Out of the 124 physical and regulatory interactions considered here, 38 and 33 were supported by experiments performed in cardiac and other types of muscle cells, respectively. Complementary interaction information was also found through protein structure analyses (6) and in vitro experiments (7). The largest group of interactions (79) consists of physical interactions between proteins, followed by regulatory interactions (34), protein modifications (10), and transport activities (1). Most proteins of the network belong to one of four functional groups: (i) sarcomeric proteins, (ii) mechanosensing and Z-line structures, (iii) nuclear membrane. The most interconnected protein is the plasma membrane protein ILK, which is involved in physical and regulatory interactions with 16 other proteins.
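
Identifying the most interconnected protein in such a network amounts to counting distinct interaction partners per node. The Python sketch below shows one way to do this from an edge list. It is an illustration only: the function names are hypothetical, the network in the study was curated manually, and the example in the usage note uses placeholder protein labels rather than interactions from S4 Table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def partner_counts(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct interaction partners for every protein.

    Each edge is an undirected pair (protein_a, protein_b). Repeated pairs
    (e.g. a physical and a regulatory interaction between the same two
    proteins) count as one partner; self-interactions are ignored.
    """
    partners: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(found) for protein, found in partners.items()}


def most_connected(edges: Iterable[Tuple[str, str]]) -> Tuple[str, int]:
    """Return (protein, partner count) for the best-connected node."""
    counts = partner_counts(edges)
    if not counts:
        raise ValueError("edge list is empty")
    # Ties are broken by protein label so the result is deterministic.
    return max(counts.items(), key=lambda item: (item[1], item[0]))
```

With placeholder edges such as `[("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]`, `most_connected` returns `("P1", 3)`.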

Discussion
In the current study, we focused on the genetic spectrum of RCM using an NGS approach and demonstrated genotype-positive cases in 54% of patients. Our data in general correspond well to a recently published study on another cohort of patients with RCM, which showed a genotype-positive rate of 60% [30]. Previous publications on genetic causes of RCM have mainly addressed mutations in the sarcomeric genes MYH7, ACTC1, TNNI3, and TNNT2 as leading causes of RCM [31]. RCM has been characterised as a "severe form" of HCM, with a higher effect on Ca2+ sensitivity and activity of actomyosin ATPase, because of specific functional properties of the mutations [32,33]. In our study, we have confirmed the role of sarcomeric proteins in the development of RCM, but have also extended the spectrum of pathogenic and likely-pathogenic variants, underlining the role of cytoskeletal proteins in RCM pathogenesis.
The validation of pathogenic or likely-pathogenic variants is a very challenging task. Despite the increased number of variants recognized by high throughput sequencing technologies, many still remain of unknown significance, pending additional genetic or functional evidence. Many such variants are reclassified from variants of unknown significance to likely-pathogenic or pathogenic variants as the number of reported patients and phenotypic descriptions increases. Most previously reported pathogenic or likely-pathogenic variants reside within genes encoding sarcomeric proteins, which is in line with the data in our study. However, the majority of these previously reported variants in genes encoding sarcomeric proteins relate to the much more frequent HCM subgroup of cardiomyopathies. Considerably less is known about the genetic causes of RCM than of HCM, and interpretations of genetic results in RCM may be skewed if based on knowledge of the genetic background in HCM. Therefore, even excluding genotype-positive cases, we consider all identified variants of unknown significance important to report and analyse, regardless of their predicted functional effect and presence in public databases.
Since approximately one half of the genotype-positive cases were associated with multiple pathogenic and likely-pathogenic variants, or variants of unknown significance, we speculate that, in some cases RCM might be the consequence of a combination of multiple variants, rather than resulting from one single disease-causing mutation. We propose that an individual combination of pathogenic and likely-pathogenic variants, variants of unknown significance, and rare SNPs underlies the intrafamilial phenotypic variability of RCM and the characteristic of being prone to phenotypic transformation.
Detected pathogenic and likely-pathogenic variants most frequently fall into the group of sarcomeric proteins, often in combination with variants of unknown significance in cytoskeletal and Z-line-associated proteins. Given the knowledge on participation of cytoskeletal and Z-line-associated proteins in force transmission, our data underscore the importance of mechanosensing and mechanotransducing proteins in the development of restrictive cardiac pathophysiology. We propose a model of RCM pathogenesis wherein sarcomeric or other gene mutations trigger a compensatory hypertrophic response such as in HCM, which cannot be fully implemented because of the concomitant defect in mechanotransducing proteins, resulting in a restrictive physiology. This hypothesis is further supported by bioinformatics modelling of major pathways involved in RCM, which has additionally highlighted the pivotal role of mechanosensing and mechanotransducing protein cascades.
Here we describe several new MYBPC3 mutations associated with RCM. While MYBPC3 mutations in HCM and DCM patients have been described in numerous reports, only one mutation was previously reported as disease-causative in RCM [11][34][35][36][37]. This finding further underlines the genetic and pathophysiological link between HCM and RCM [38,39]. Additionally, we reveal several variants in genes encoding ion channels and desmosomal proteins in patients with RCM. Whether these variants can affect myocardial relaxation is unknown. Recent studies using induced pluripotent stem cell-derived cardiomyocytes have identified alterations in cardiomyocyte contractility and prolonged relaxation time on a single-cell level caused by long QT syndrome-associated ion channel mutations [40]. The precise effect of these variants in the development of the RCM phenotype remains to be identified.
A major limitation of our study is the small number of patients included and the lack of segregation analysis, owing to the low number of family members and unavailability of these individuals for genetic testing. Such data might be of particular value in cases of multiple genetic variants in patients with RCM. Another limitation is the inability to draw a decisive conclusion regarding the functional effect of the multiple identified TTN variants. A high rate of spontaneous events owing to the huge size of the TTN gene, together with the difficulty of functional studies on giant muscle proteins, makes it difficult to attribute a clear pathogenic effect to most of the identified variants. Additional information on the TTN variants identified in cardiomyopathy patients and healthy subjects will help to elucidate the true roles of these variants in causing disease or their contribution to disease progression. Additionally, several genes recently reported in association with RCM, such as FLNC, were not included in the current analysis because the present study was undertaken prior to their identification as RCM-associated genes [41]. The study of their role in RCM development will be of significant importance for further RCM research.

Conclusion
In the present study we have identified a broad spectrum of RCM-associated variants using a next-generation sequencing approach, and our findings underline the role of cytoskeletal genes in RCM development. Simultaneous screening of a wide panel of genes allowed us to identify multiple gene mutations in almost half of genotype-positive patients. We hypothesize that RCM is often triggered by a combination of multiple mutations, rather than by one disease-causing mutation. Our data further underscore the importance of mechanosensing and mechanotransducing proteins in the development of restrictive cardiac physiology.