Molecular analysis of TSC1 and TSC2 genes and phenotypic correlations in Brazilian families with tuberous sclerosis

Tuberous sclerosis complex (TSC) is an autosomal dominant multisystem disorder characterized by the development of multiple hamartomas in many organs and tissues. It is caused by inactivating mutations in either of two genes, TSC1 or TSC2, and most hamartomas follow the two-hit tumor suppressor model. Comprehensive screening for mutations in both the TSC1 and TSC2 loci has been performed in several cohorts of patients, and a broad spectrum of pathogenic mutations has been described. In Brazil, there are no data regarding the incidence and prevalence of tuberous sclerosis or mutations in TSC1 and TSC2. We analyzed both genes in 53 patients with high suspicion of tuberous sclerosis using multiplex ligation-dependent probe amplification and a customized next generation sequencing panel. All variants were confirmed by Sanger sequencing. We identified 50 distinct variants in 47 (89%) of the patients. Five were large rearrangements and 45 were point mutations. The symptoms presented by our series of patients did not differ between male and female individuals, except for the more common occurrence of shagreen patch in women (p = 0.028). In our series, consistent with other studies, TSC2 mutations were associated with a more severe phenotypic spectrum than TSC1 mutations. This is the first study that sought to characterize the molecular spectrum of Brazilian individuals with tuberous sclerosis.

Introduction
Tuberous sclerosis complex (TSC) (OMIM 191100) is an autosomal dominant multisystem disorder that occurs in all ethnic groups and both sexes. Population studies have estimated the prevalence of the disease at 1 in 6000 to 9000 individuals, and at least 2 million people are affected worldwide [1]. The clinical findings and severity of TSC are highly variable; for this reason, clinical diagnostic criteria were established by a consortium in 1998 [2], and revised and updated by the same group in 2012 [3]. Most TSC patients have hamartomas in the brain, skin, kidneys, and heart. Involvement of the lung, gastrointestinal tract, bones, retina and/or gingiva is also common [4].
TSC is caused by inactivating mutations in either of two genes, TSC1 on chromosome 9q34 or TSC2 on chromosome 16p13, and follows the two-hit tumor suppressor model of pathogenesis in most hamartomas [5]. TSC1 is composed of 23 exons and encodes hamartin, a ubiquitously expressed 1164 amino acid protein [6], while TSC2 consists of 42 exons and encodes tuberin, a ubiquitously expressed 1807 amino acid protein [7]. The two proteins form a complex that regulates cell growth and tumorigenesis [8]. About one third of patients with TSC have a familial form, in which the disorder follows a clearly dominant inheritance pattern, whereas the other two thirds are sporadic cases resulting from de novo germline mutations in one of the TSC genes [9,10].
Comprehensive TSC1 and TSC2 mutation screening results have been reported in several cohorts of patients with TSC, as described in the Human Gene Mutation Database (HGMD) [11], and a broad spectrum of pathogenic mutations has been described. Among individuals who met the clinical criteria, 75-90% had an identifiable mutation in either TSC1 or TSC2, and the majority of mutation-positive TSC patients have a mutation in TSC2. In sporadic cases, TSC2 mutations are 2-10 times more common than TSC1 mutations. In contrast, in multi-generation families segregating TSC, approximately half show linkage to each of the genes. About 80-95% of the TSC1 and TSC2 mutations are small mutations (missense, nonsense, small deletions, small insertions and splice site mutations), and 5-20% are large duplications, large deletions or complex rearrangements. The high variability in mutation type and position renders molecular diagnosis of TSC challenging. This variability may explain, at least in part, the wide range of clinical symptoms observed in TSC patients, although the timing and location of the second hit event are more likely to contribute to the variability of clinical symptoms. Several studies have described possible genotype-phenotype correlations for TSC [10,12-18]. In general, TSC1-related disease is less severe than TSC2-related disease.
In Brazil, there are no data regarding TSC incidence or the prevalence of TSC1 and TSC2 mutations among affected individuals. Genetic diagnosis is particularly important for patients with suspected TSC who do not fulfill the clinical diagnostic criteria, and for genetic counseling. Therefore, the aims of this study were to describe the demographics and clinical phenotype of patients with TSC from different Brazilian regions and to characterize the germline TSC1 and TSC2 mutations observed in a group of individuals with a clinical diagnosis of TSC.

Patients and DNA samples
Twenty-two male and 31 female individuals with clinically diagnosed or highly suspected TSC were recruited at eight Oncogenetics services from four different Brazilian regions between August 2013 and May 2016. All patients were unrelated probands, including 17 familial and 36 sporadic cases. The study was approved by the institutional review board, Comitê de Ética em Pesquisa do Hospital de Clínicas de Porto Alegre (CEP-HCPA), under registration numbers GPPG 13-0260 and GPPG 15-0049. All individuals or their legal representatives signed a written informed consent form. Germline DNA samples were obtained from peripheral blood using a commercial kit (Flexigene Blood Kit, Qiagen, USA). Standardized clinical information was collected retrospectively by clinicians at each center after reviewing the medical records.

Large deletion/duplication analysis
All 53 unrelated individuals were screened for large TSC1 and TSC2 deletions and duplications by multiplex ligation-dependent probe amplification (MLPA) analysis. Commercial SALSA MLPA kits P124-C1 and P046-C1 (MRC-Holland, Amsterdam, The Netherlands) were used for TSC1 and TSC2 analysis, respectively, according to the manufacturer's instructions. The P124-C1 and P046-C1 probe mixes contain probes for each of the TSC1 and TSC2 exons, together with 9 and 8 reference probes, respectively, that detect different autosomal chromosomal locations. In addition, P046-C1 contains one probe for the PKD1 gene, which is adjacent to TSC2 and is associated with polycystic kidney disease. DNA samples from healthy individuals were used as normal copy number controls. MLPA amplification products were separated on an ABI3500 capillary sequencer (Applied Biosystems, Foster City, CA, USA) and the results were analyzed using Coffalyser.net. Ratios <0.7 were considered deletions and ratios >1.4 were considered duplications. The chromosomal microarray technique CytoScan HD (Affymetrix, USA) was used to confirm MLPA results when a deletion/duplication larger than 300 kb was identified by MLPA, as recommended by the manufacturer. The high-density, whole-genome CytoScan Array includes 2.69 million markers for copy number analysis. Chromosome Analysis Suite software (ChAS software 3.1) was used to analyze and visualize the microarray data and to compare the results with the built-in reference set of more than 400 samples.
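The ratio cutoffs stated above can be illustrated with a minimal sketch. This is not the Coffalyser.net procedure: the function name, the probe labels and the example ratios are hypothetical, and ratios exactly equal to a cutoff are assumed here to be called normal.

```python
def classify_mlpa_ratio(ratio, deletion_cutoff=0.7, duplication_cutoff=1.4):
    """Classify one normalized MLPA probe ratio using the cutoffs from the text.

    Ratios below 0.7 are called deletions, ratios above 1.4 duplications,
    and everything in between (boundaries included, by assumption) normal.
    """
    if ratio < deletion_cutoff:
        return "deletion"
    if ratio > duplication_cutoff:
        return "duplication"
    return "normal"


# Hypothetical probe ratios, for illustration only (not study data).
example_ratios = {
    "TSC2_probe_a": 0.52,
    "TSC2_probe_b": 1.02,
    "TSC2_probe_c": 1.55,
}

calls = {probe: classify_mlpa_ratio(r) for probe, r in example_ratios.items()}
for probe, call in calls.items():
    print(probe, call)
```

In practice a call would normally rest on several adjacent probes and on confirmation by an independent method, as was done here with the microarray and with sequencing.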

Next generation sequencing (NGS)
TSC1 (NM_000368.4) and TSC2 (NM_000548.3) amplicons were designed using the AmpliSeq Designer software (Thermo Fisher Scientific, CA, USA), targeting the complete coding sequence, 50 bp exon-intron junctions and 5' and 3' untranslated regions of TSC1 and 99.83% of the coding sequence, 50 bp exon-intron junctions and 5' and 3' untranslated regions of TSC2, resulting in a total of 112 amplicons. A region of 17 base pairs of TSC2 exon 29 remained uncovered. The amplicon library was prepared using the Ion AmpliSeq Library Kit 2.0 (Thermo Fisher Scientific, CA, USA), and NGS was performed using 20 ng of genomic DNA, an Ion 316 sequencing chip on an Ion Personal Genome Machine, and the 200 Sequencing kit (Thermo Fisher Scientific, CA, USA), with 500 flows. Data from the Ion Torrent runs were analyzed using the platform-specific pipeline software Torrent Suite v3.2.1 for base calling, trimming of adapter and primer sequences, and filtering out of poor-quality reads. The sequences were aligned to the hg19 human reference genome and, for variant calling, the sequence runs were imported into the Ion Reporter software v5.0. An allele call frequency cutoff of 10% was used to investigate mosaic and non-mosaic germline variants. Variants were filtered using a Phred score threshold of >500. Variants were also reviewed and annotated using this software. Integrative Genomics Viewer was used for visualization of the mapped reads [19].
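The allele frequency cutoff described above can be sketched as a simple filter on read counts. This is only an illustration and does not reproduce the Ion Reporter workflow: the `VariantCall` class, the variant labels and the read counts are hypothetical, and the cutoff is assumed to be strict (fractions of exactly 10% are dropped).

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    label: str        # hypothetical variant label
    alt_reads: int    # reads supporting the variant allele
    total_reads: int  # total read depth at the position

    @property
    def allele_fraction(self) -> float:
        # Guard against zero depth to avoid a division error.
        return self.alt_reads / self.total_reads if self.total_reads else 0.0


def passes_frequency_cutoff(call: VariantCall, cutoff: float = 0.10) -> bool:
    """Keep a call only if its variant allele fraction exceeds the cutoff."""
    return call.allele_fraction > cutoff


# Hypothetical calls, for illustration only (not study data).
example_calls = [
    VariantCall("variant_1", alt_reads=240, total_reads=500),  # about 48%
    VariantCall("variant_2", alt_reads=75, total_reads=500),   # 15%
    VariantCall("variant_3", alt_reads=20, total_reads=500),   # 4%
]

kept = [c.label for c in example_calls if passes_frequency_cutoff(c)]
print(kept)
```

A call such as the hypothetical `variant_3` would be discarded by this filter, which is why low-level mosaicism cannot be excluded with a 10% cutoff.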

NGS Validation by Sanger sequencing
For every sample with a variant of interest in one of the TSC genes, specific primers for the corresponding exon(s) were designed using Primer-BLAST (http://www.ncbi.nlm.nih.gov/tools/primer-blast/) and the reference sequences NM_000368.4 (TSC1) and NM_000548.3 (TSC2). In addition, a specific primer pair for TSC2 exon 29 (not covered in the NGS panel) was designed, and DNA from all individuals was sequenced by the Sanger method for this exon. Primers were also designed for the TSC1 and TSC2 promoter regions for variant screening in individuals with no pathogenic or likely pathogenic variants detected by MLPA or NGS. Primer sequences are available upon request. Forward and reverse primers were used to sequence the purified PCR products, using the BigDye Terminator v3.1 Cycle Sequencing Kit on an ABI 3500 Genetic Analyzer (Thermo Fisher Scientific, CA, USA). Sequences were aligned to their reference using CodonCode Aligner software implemented in MEGA 5.04. Variant calling and interpretation were based on the most recent American College of Medical Genetics guidelines [29]. Points were attributed to each variant according to these criteria, and variants were classified as pathogenic, likely pathogenic, variant of uncertain significance (VUS), or likely benign.

Statistical analysis
We compared the frequency of each clinical finding between male and female individuals. Statistical analysis was performed by conventional chi-squared or Fisher's exact test using SPSS software (version 19.0).
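For readers who wish to reproduce this type of comparison without SPSS, a two-sided Fisher's exact test on a 2x2 table can be sketched in plain Python. The function below is an illustrative implementation, not the SPSS routine, and the counts in the example are hypothetical; only the group sizes (22 males, 31 females) come from the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (e.g. male, female); columns are feature
    present / feature absent. The p-value is the sum of the hypergeometric
    probabilities of all tables with the same margins that are no more
    likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability of observing x "present" cases in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties are not lost to floating-point error.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts: feature present in 5 of 22 males and 17 of 31 females.
p_value = fisher_exact_two_sided(5, 17, 17, 14)
print(f"two-sided p = {p_value:.4f}")
```

Because one such test is run per clinical feature, any single nominally significant p-value should be read with the number of comparisons in mind.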

Results
We recruited 53 individuals with TSC from four different Brazilian regions (only the North region was not represented). Region of birth of the patients studied is summarized in S1A Fig Table. Median age at recruitment was 14 years (range: 6 months to 50 years) and average age at onset or recognition of the first symptoms was 3.3 years. Average age at TSC diagnosis was 7.1 years in familial cases and 2.6 years in sporadic cases. Fifty-two patients fulfilled the criteria for definite TSC established by the 1998 Tuberous Sclerosis Consensus Conference [2].
A phenotypic comparison between genders for each clinical feature is shown in Table 1. Only shagreen patch was more frequently observed in females (p = 0.028). This difference may be due to chance, since multiple comparisons were performed. Lymphangioleiomyomatosis (LAM) was detected in only one female patient. There were no differences in the frequency of any symptom when we compared familial and sporadic cases.
Overall, 50 distinct variants were identified in 47 (89%) of the 53 patients. MLPA analysis identified five (9%) patients who were heterozygous for large rearrangements in TSC2: four large deletions (7%) and one large duplication (2%). A complete TSC2 deletion observed in one family was confirmed by CytoScan HD as a heterozygous deletion of 2.0 Mb (108 genes including TSC2 and PKD1). Two apparent single-exon deletions (exon 8 in TSC1 and exon 19 in TSC2) were detected by MLPA. NGS and Sanger sequencing revealed point mutations at the hybridization probe sites of these exons, thus excluding the occurrence of the single-exon deletions. A PKD1 deletion was found in one patient, since the MLPA kit P046 contains a probe for this gene. In addition, we identified 13 distinct heterozygous small variants in the coding region of TSC1 and 32 in TSC2. We did not observe evidence of mosaicism (considering allele proportions between 10% and 50%). The read depth achieved per amplicon per subject is shown in S2 Table. A summary of the pathogenic and likely pathogenic variants is given in Table 2. Families with a likely benign variant or a VUS are shown in Table 3.
The distribution of small mutations within the TSC1 and TSC2 genes is shown in Fig 1. The structure of the tuberin domain that interacts with hamartin was recently solved and is shown accordingly [30]. We calculated the mean number of small mutations (including splice site changes) per nucleotide for each exon of both genes. The overall mutation frequency was higher at the TSC2 locus (0.006 mutations per nucleotide) than at the TSC1 locus (0.003 mutations per nucleotide). Exons 11, 19, 34 and 40 of the TSC2 gene had the highest frequency of mutations.
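A per-nucleotide mutation frequency of this kind can be computed as the number of small mutations divided by the length of the region considered. The sketch below assumes this simple definition, which may differ in detail from the calculation used for the figures above; the exon labels, counts and lengths are hypothetical.

```python
def mutations_per_nucleotide(n_mutations, length_nt):
    """Return the number of mutations per nucleotide for one region."""
    if length_nt <= 0:
        raise ValueError("region length must be positive")
    return n_mutations / length_nt


# Hypothetical exons: label -> (number of small mutations, length in nucleotides).
example_exons = {
    "exon_A": (3, 150),
    "exon_B": (0, 200),
    "exon_C": (1, 100),
}

# Frequency for each exon separately.
per_exon = {
    label: mutations_per_nucleotide(count, length)
    for label, (count, length) in example_exons.items()
}

# Overall frequency across all exons taken together.
total_mutations = sum(count for count, _ in example_exons.values())
total_length = sum(length for _, length in example_exons.values())
overall = mutations_per_nucleotide(total_mutations, total_length)

most_mutated = max(per_exon, key=per_exon.get)
print(per_exon, overall, most_mutated)
```

Normalizing by length in this way allows exons and genes of different sizes to be compared directly.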
Considering clinical data, the most commonly observed skin/mucosal findings were hypopigmented macules, facial angiofibromas, shagreen patches and ungual fibromas. Regarding central nervous system symptoms, the most common findings were cortical tubers, subependymal nodules, cognitive deficiency and seizures. Subependymal giant cell astrocytomas occurred in 23% of the patients. Other common findings were renal angiomyolipomas, multiple renal cysts and cardiac rhabdomyomas. We examined the clinical manifestations of patients with different types of mutations in different domains of TSC1 and TSC2 to assess whether mutation type or location in the gene correlated with specific clinical features. Seizures were more frequent in individuals with a TSC2 mutation than in individuals with a TSC1 mutation (p = 0.008). The same was true for astrocytomas (p = 0.0038). We did not observe a difference in symptoms between patients with mutations in the first four exons of TSC1 and those with mutations in the region that encodes the coiled-coil domain. However, patients with nonsense variants, regardless of their position in the gene, had cognitive impairment and seizures, while patients with other types of mutation did not have these symptoms. Regarding TSC2, the symptoms were similar in patients with mutations early in the protein, in the middle or in the GAP-related domain. The different types of mutations also did not result in specific phenotypes in this gene. For instance, the patient with an entire TSC2 deletion had a phenotype similar to that of patients with point mutations. Clinical data for patients with a synonymous variant or no mutation identified are summarized in S3 Table. All of these patients had at least two major diagnostic criteria for TSC and were sporadic cases.
Cognitive impairment, subependymal giant cell astrocytomas, and retinal hamartomas did not occur in the group without an identifiable mutation. Other symptoms were also observed less frequently in this group, although the differences were not statistically significant.

Discussion
This study sought to characterize the clinical and molecular profile of Brazilian individuals with tuberous sclerosis. Although many TSC1 and TSC2 disease-causing mutations have been identified in other populations, no studies of the Brazilian population had been undertaken. The overall mutation detection rate in our study (89%) was within the expected range. Approximately two thirds of TSC probands worldwide are simplex cases [8]. Family history directly correlates with the presence of a deleterious mutation in either TSC1 or TSC2 [9,31]. In our series, the majority of the probands (68%) had no family history of the disease (similar to other reports); of these cases, 63% had a variant in TSC1 or TSC2 identified. In the familial cases, 82% had an identifiable variant. In addition, the distribution of mutations was similar to that in other studies, showing a predominance of rearrangements and point mutations in TSC2 [32]. However, while TSC2 mutations are 4-5 times more common than TSC1 mutations in the literature, in our study TSC2 mutations were only 2.5 times more common than TSC1 mutations [9,31]. The reason why TSC2 mutations are more frequent than TSC1 mutations in our population is currently unknown. The coding region of TSC2 is about 50% larger than that of TSC1, the number of exons is nearly double, and the frequency of nonsense mutations and small indels is roughly proportional to the difference in gene size. In addition, TSC2 has a much higher GC content than TSC1 (60% vs. 43%), which could favor the occurrence of point mutations. On the other hand, TSC1 contains more repeat elements than TSC2 (32% vs. 25% of total sequence), which could favor the occurrence of gene rearrangements. However, TSC2 rearrangements were seen in our cohort, while TSC1 rearrangements were not. Mutations were distributed throughout all gene regions with the exception of the 3' region of TSC2.
Fifteen variants occur in the hamartin or tuberin functional domains and all frameshift and nonsense alterations outside these domains create a stop codon that produces an incomplete protein with partial or no functional domains. There was a high occurrence of splice site mutations at the donor site of exon 10 of TSC2, and no higher frequency of other types of mutation in other gene regions.
Using a combined approach of NGS and rearrangement analysis by MLPA, we were able to identify 20 novel variants (8 in TSC1 and 12 in TSC2) and 30 previously reported variants. Three deletions have already been described in the literature, including a large deletion encompassing TSC2 and PKD1. One possible explanation for the occurrence of a TSC phenotype with no identifiable germline TSC1 or TSC2 mutation in six probands (11%) is the presence of intronic mutations distant from the exon-intron boundaries, which could affect splicing or gene regulation and reduce the amount of normal mRNA transcript. In addition, somatic mosaicism could account for some of these cases, as described before [9], but this was not observed in any of the cases studied. However, we must emphasize that our variant call frequency threshold was >10%. Therefore, mosaicism at a variant call frequency of 10% or less would not have been detected, although it is not clear whether mosaicism at such levels has clinical significance. Finally, a third genetic locus related to TSC could exist.
Results from several studies over the past few years have provided insights into how tuberin and hamartin might affect cell proliferation, growth, adhesion, migration, or protein trafficking. It has been demonstrated that tuberin and hamartin interact directly with each other, forming a cytoplasmic protein complex [33]. The C-terminal putative coiled-coil domain of hamartin is necessary for interaction with the tuberin HEAT-repeat domain. Additionally, tuberin is phosphorylated at serine and tyrosine residues in response to growth factors, which affects the interaction between hamartin and tuberin [34]. The GAP-related domain (Fig 1) of tuberin is responsible for the inhibition of cell division by indirect modulation of mammalian target of rapamycin (mTOR), a central regulator of translation [35]. Considering the importance of these domains, mutations in the interaction domains or GAP-related domain, as well as in phosphorylated residues in tuberin, or loss-of-function mutations that exclude these domains, are likely to be pathogenic. Missense and splice site mutations may also directly affect these domains or interfere with protein folding, charge and hydrophobicity. Although we did not find mutations in tuberin phosphorylation sites, we identified mutations that affect hamartin or tuberin functional domains. Furthermore, all nonsense mutations observed cause a premature stop codon that excludes an important functional domain. Nonsense-mediated decay (NMD) could also explain the loss-of-function effect of nonsense and frameshift variants in TSC1 and TSC2. Canonical splice site changes were classified as potentially pathogenic by bioinformatics algorithms, and functional tests already described in the literature have proved their pathogenicity (they exclude the corresponding exons of the genes): c.975+1G>T and c.976-15G>A in TSC2 and c.1439-2A>G in TSC1 [36][37][38].
The mutation c.2355+1_2355+4del in TSC2, which may also affect the splice site, has been reported as pathogenic after functional validation [39]. The other splice site variants found in this study are predicted by in silico tools not to alter splicing.
It is always difficult to predict the effect of missense variants on protein function. Analysis of familial segregation may help, but the progressively smaller size of families, lack of family history information, and the predominance of simplex cases make segregation analysis challenging. We chose to use two in silico prediction tools that combine several pathogenicity scores to achieve a consensus classification and reduce misclassification of the variants. M-CAP uses existing pathogenicity likelihood scores and direct measures of evolutionary conservation to achieve a misclassification rate for pathogenic variants of less than 5%. PredictSNP 1 is a consensus classifier that combines six tools and provides a significant improvement in prediction performance over the individual tools and over other consensus classifiers, such as CONDEL and Meta-SNP [40,41]. We were able to classify as pathogenic or likely pathogenic three of the six missense variants found in our probands. The TSC2 exon 11 variant (c.1019T>C) has functional studies indicating significantly lower TSC2 expression [42]. The mutations p.Leu1562Pro and p.Pro1709Leu are located in the GAP-related domain of tuberin. Prolines are known to have a very rigid structure, sometimes forcing the backbone into a specific conformation. In the first mutation, a change from a leucine to a proline could disturb the GAP domain conformation; in the latter, the mutant residue is bigger and could lead to bumps in this functional domain. Additionally, a functional study showed that this mutation increases the ratio of T389-phosphorylated to total S6K compared with wild-type TSC2, which corresponds to an increase in mTORC1 activity [43]. The other three missense variants (p.Gly671Cys, p.Arg1329His and p.Ser1466Leu) are outside functional domains and occur concomitantly with other pathogenic mutations in TSC2. Both the p.Arg1329His and p.Ser1466Leu variants have been described at low frequencies in gnomAD.
The VUS detected in the present study would be good candidates for functional studies, which could help to establish their pathogenicity.
Clinical presentation did not differ between the genders, and signs and symptoms of TSC were most commonly observed in adults. Dermatological, central nervous system, and renal findings are described as the most common clinical features, observed in over 80% of the patients, while cardiac rhabdomyomas are present in 50%, and lymphangioleiomyomatosis in 40% of the female patients [44]. The frequencies of the most common symptoms in our cohort of patients were similar to those previously described, with the exception of lymphangioleiomyomatosis, which we observed in only one female patient. We observed that TSC2 variants were associated with a more severe phenotypic spectrum than TSC1 variants, which is consistent with other studies [9,31]. Although previous studies have found similar results, all statistical findings in our study may be due to chance, since multiple comparisons were performed and the sample size was limited. Finally, a previous study has described two individuals with a TSC2-PKD1 deletion with severe renal manifestations and skin alterations of TSC [45]. We did not observe more severe renal symptoms in patients with PKD1 deletion. We were unable to establish any additional meaningful genotype-phenotype correlations in this series, which could be due to the extensive molecular heterogeneity observed in this first series of Brazilian patients with TSC. Several limitations must be considered when analyzing the results of our study: (i) patients were classified as sporadic cases when no relatives presented symptoms of TSC; (ii) we were unable to recruit patients' relatives for a complete mutation segregation analysis, which is particularly difficult in sporadic cases, when relatives do not have symptoms of TSC and would need to undergo genetic testing; (iii) clinical data collection was performed carefully, but some characteristics were not evaluated in all patients, as shown in Table 1.
This occurs when evaluations are requested but not performed by the patients, or when the data are not available in the medical records. However, these limitations probably did not interfere with variant classification or the assessment of genotype-phenotype correlations. In 23% of the patients no pathogenic or likely pathogenic TSC1 or TSC2 germline variant was identified. The molecular cause of the TSC phenotype in these patients remains elusive.

Conclusion
Genetic testing is currently part of the TSC diagnostic criteria [3]. In individuals with suspected TSC, clinical diagnosis is complicated by a high degree of phenotypic variability and the potential for late onset of certain features of the disease. Thus, genetic testing can play an important role in confirming the diagnosis, enabling genetic counseling for families, and improving understanding of the etiology of the disorder. We designed a molecular diagnosis strategy for TSC that showed an overall variant detection rate of 89%; 69% of the patients had a pathogenic or likely pathogenic variant. No specific genotype-phenotype correlations were established in this cohort, but we confirmed findings described in other populations. Early genetic diagnosis of patients with TSC will become more important as better therapeutic interventions become available.