Two novel and correlated CF-causing insertions in the (TG)mTn tract of the CFTR gene

Two novel and related pathogenic variants of the Cystic Fibrosis Transmembrane conductance Regulator (CFTR) gene were structurally and functionally characterized. These alterations have not been previously described in the literature. We studied two patients diagnosed with Cystic Fibrosis (CF) on the basis of one mutated allele (p.Phe508del), a pathological sweat test and clinical symptoms. To complete the genotypes of both patients, an extensive genetic and functional analysis of the CFTR gene was performed. This analysis confirmed the presence of the p.Phe508del pathogenic variant and revealed, in both patients, an insertion of part of intron 10 into the (TG)m repeat of intron 9 of the CFTR gene, together with a poly-T stretch of variable length. The molecular lesions were very similar in the two patients, differing only in the number of T in the poly-T stretch. Functional characterization at the RNA level showed that, in both patients, the allele carrying the insertion undergoes complete aberrant splicing, yielding transcripts that lack exon 10. Consequently, the alleles with the insertions are not expected to contribute to the formation of a functional CFTR protein. The molecular and functional features of these alterations are compatible with the definition of novel CF-causing variants of the CFTR gene, and their identification completed the genetic characterization of both patients.


Introduction
Cystic Fibrosis (CF) is an autosomal recessive disease caused by pathogenic variants of the Cystic Fibrosis Transmembrane conductance Regulator (CFTR) gene [1][2][3][4]. The high intragenic variability, due to the very large number of CFTR variants that can be combined in trans, is often increased by the presence of two or more variants in cis, which constitute haplotypes and/or complex alleles [5,6]. Additional genetic variability is introduced by modifier genes, which can affect the severity of the CF phenotype through, for example, alternative chloride conduction, the regulation of splicing, the modulation of CFTR gene expression and the modulation of susceptibility … mRNA without exon 10. These findings point to two novel CF-causing variants of the CFTR gene.

Biochemical, microbiological and clinical characterization of patients
The biochemical, microbiological and clinical features of patients are summarized in Table 1.
The first case, a female born in 2001 (patient 1), was diagnosed on the basis of symptoms at the age of 10 months; at that time, the newborn screening program for CF had not yet been introduced in her region of origin. Family history of CF was negative. She developed gastrointestinal symptoms (underweight and diarrhoea) shortly after birth. The symptoms improved neither after switching from breast milk to a soy formula, because of a suspected cow's milk protein allergy, nor after switching from the soy formula to a rice formula, because of a suspected soy allergy. Serologic testing for coeliac disease, the faecal occult blood test, and stool and urine cultures were negative. After an episode of pseudo-Bartter's syndrome (salt loss) at 10 months, CF was strongly suspected. Sweat chloride values of 80 and 92 mmol/L confirmed the diagnosis. A first level genetic analysis showed only the c.1521_1523delCTT, p.Phe508del (legacy name F508del) pathogenic variant, in the heterozygous state. Segregation analysis demonstrated the paternal origin of the p.Phe508del pathogenic variant (absent in the mother). Faecal chymotrypsin activity was undetectable, and she developed pancreatic exocrine insufficiency (PEI). Consequently, pancreatic enzyme replacement therapy was initiated, which improved her nutritional status. To date, the patient body mass index …
The second case is a male born in 2012 (patient 2). The child was adopted soon after birth. Information about the health history and ancestry of his biological family is lacking, but he shares the same geographical origin as patient 1 (south of the Marche Region, Central Italy). To complete the genotypes of both patients, extensive genetic and functional analyses of the CFTR gene were performed (see below).

Sweat test and pancreatic status evaluation
Both patients underwent a sweat test twice, performed by a quantitative pilocarpine iontophoresis method [32], by using the Macroduct device (Delcon, Milan, Italy) for sweat collection and the Jenway PCLM3 analyser (VWR International, Milan, Italy) for chloride measurement. In accordance with recent guidelines [22], the sweat test was considered positive for both patients because all values were > 60 mEq/L.
For patient 1, exocrine pancreatic function was evaluated by measuring stool chymotrypsin [33] (test Chymo, Roche, Mannheim). A pathological level of stool chymotrypsin (< 4.2 U/g) was found in at least two independent measurements. For patient 2, exocrine pancreatic function was evaluated by measuring stool elastase 1 [34] (test PANEL FE Elastase Elisa, Bioserv Diagnostics, Germany). A pathological level of fecal elastase 1 (< 200 μg/g) was found in at least two independent measurements.
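The positivity criteria described in this section reduce to simple threshold rules. The minimal Python sketch below encodes only the cutoffs stated here (sweat chloride > 60 mEq/L in every measurement; chymotrypsin < 4.2 U/g or elastase 1 < 200 μg/g in at least two independent measurements). The function names are hypothetical, strict inequalities are assumed, and the sketch is an illustration of the rules rather than a clinical decision tool.

```python
# Minimal sketch of the positivity rules stated in the text (hypothetical helper names).
# Cutoffs are taken from the text; strict inequalities are assumed.
SWEAT_CHLORIDE_CUTOFF = 60.0   # mEq/L (numerically equal to mmol/L for chloride)
CHYMOTRYPSIN_CUTOFF = 4.2      # U/g stool, pathological below this value
ELASTASE1_CUTOFF = 200.0       # ug/g stool, pathological below this value


def sweat_test_positive(chloride_values):
    """True if there is at least one value and every value exceeds the cutoff."""
    return bool(chloride_values) and all(v > SWEAT_CHLORIDE_CUTOFF for v in chloride_values)


def pancreatic_insufficiency(stool_values, cutoff, min_pathological=2):
    """True if at least `min_pathological` independent measurements fall below `cutoff`."""
    return sum(1 for v in stool_values if v < cutoff) >= min_pathological


# Patient 1 sweat chloride values reported in the text (mmol/L).
print(sweat_test_positive([80, 92]))  # True
# Assumption: an undetectable chymotrypsin activity is represented as 0.0.
print(pancreatic_insufficiency([0.0, 0.0], CHYMOTRYPSIN_CUTOFF))  # True
```

Requiring every sweat value to exceed the cutoff mirrors the wording "all values were > 60 mEq/L"; a single borderline measurement therefore makes the function return False.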

CFTR mutational analysis
Genomic DNA was extracted from peripheral blood leukocytes using the QIAamp DNA Blood midi kit (Qiagen, Hilden, Germany) and quantified with a fluorimeter (Qubit, Invitrogen, CA, USA). The first level genetic analysis of the CFTR gene (RefSeq NM_000492.3, NG_016465.3) was performed with the INNO-LiPA CFTR19, CFTR17+Tn Update and CFTR Italian Regional kits (Fujirebio Europe). A second level genetic analysis was performed by sequencing: the proximal 5'-flanking region, all exons and the adjacent intronic regions were PCR-amplified and sequenced by a Sanger cycle sequencing protocol (ThermoFisher Scientific, Waltham, MA, USA) in a 96-well format that we have previously published [35], using a genetic analyzer (ABI PRISM 3130xl; Applied Biosystems, Foster City, CA, USA). For data analysis, a specific template for SeqScape software version 2.7 (Applied Biosystems) was used [36]. A third level genetic analysis was performed by multiplex ligation-dependent probe amplification (MLPA) (MRC-Holland). The primers specifically used for the analysis of the insertions are reported in Table 2. In particular, the pair F1-R1 was used for PCR amplification of the region containing the insertion. To analyze both the wild type and the anomalous amplicons observed at the PCR step, they were separately extracted and purified from 1% agarose gel using the QIAquick Gel Extraction kit (Qiagen) and individually sequenced by a Sanger cycle sequencing protocol (ThermoFisher). Primers F1 and F3 were used for forward sequencing, and primers R1 and R3 for reverse sequencing, of the purified amplicons. The hypothesis that the molecular lesion was duplicated (and not transferred from intron 10 to intron 9, which correspond to intron 9 and intron 8, respectively, in the legacy nomenclature) was verified by sequencing the region of interest in intron 10.
This was achieved by using specific primers for PCR amplification (Table 2, pair F2-R2), followed by extraction, purification and Sanger cycle sequencing of the amplicons (Table 2, primer F2 for forward sequencing and primer R2 for reverse sequencing). Primer R2, located outside the inserted region, ensured PCR amplification of the wild type region of intron 10.
The primers used for exon 10 analysis are located in the introns flanking exon 10, so that the amplicons span the entire exon and the adjacent intronic regions on both sides. This wide primer placement, together with the negative MLPA result, excluded the possibility that the results were influenced by duplications of part of exon 10 at other locations in the probands' genomes.

Fragment analysis
To confirm the correct size of the molecular alterations, a DNA fragment analysis protocol (ThermoFisher) was optimized. A specific oligonucleotide (forward: 5'-TTGATAATGGGCAAATATCTTAG-3') was labeled with a fluorescent dye (dROX; Applied Biosystems) and used for PCR amplification of the region of interest, paired with the reverse primer 5'-CCTTCCAGCACTACAAACTA-3'. Labeled amplicons were separated by capillary electrophoresis on an ABI PRISM 3130xl (Applied Biosystems) and analyzed using GeneMapper software version 4.1 (Applied Biosystems).
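Because the labeled primer pair amplifies the same region from both alleles, the length of an insertion can be estimated by subtracting the WT amplicon size from the anomalous amplicon size. The Python sketch below applies this subtraction to the sizes reported in the Results (483.9, 799.1 and 840.1 nt). It only illustrates the arithmetic and is not part of the GeneMapper workflow; the dictionary and function names are hypothetical.

```python
# Amplicon sizes (nt) reported in the Results for the dROX-labeled fragments.
FRAGMENT_SIZES_NT = {"WT": 483.9, "patient 1": 799.1, "patient 2": 840.1}


def estimated_insertion_nt(sample, sizes=FRAGMENT_SIZES_NT):
    """Estimate insertion length as (anomalous amplicon size - WT amplicon size)."""
    return round(sizes[sample] - sizes["WT"], 1)


for sample in ("patient 1", "patient 2"):
    print(sample, estimated_insertion_nt(sample))
# patient 1 315.2
# patient 2 356.2

# Size difference between the two anomalous amplicons.
print(round(FRAGMENT_SIZES_NT["patient 2"] - FRAGMENT_SIZES_NT["patient 1"], 1))  # 41.0
```

The estimates fall about 2 nt short of the sequence-derived lesion lengths of 317 and 358 bp, a small offset plausibly attributable to mobility-based sizing, while the 41 nt difference between the two anomalous amplicons matches the difference in T count between the patients (76 T versus 35 T).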

CFTR expression analysis
Expression analysis was performed with a protocol that, using an optimized set of primers, reveals any anomalous splicing of CFTR mRNA, as previously described [37]. Briefly, RNA was extracted from nasal brushings of the patients using the RNeasy mini kit (Qiagen), reverse transcribed and amplified by RT-PCR (Bio-Rad, Hercules, CA, USA), producing 7 amplicons spanning the entire CFTR mRNA. All 7 cDNA amplicons were extracted from the agarose gel and individually sequenced as described above. The primers specifically used for the analysis of exon 10 splicing are reported in Table 3. In particular, one forward primer (F4, located in exon 9) and two different reverse primers (R4, located in exon 12, and R5, located in exon 11) were used. All cDNA amplicons, spliced and unspliced, were recovered from the agarose gel, forward sequenced with primer F4 and reverse sequenced with primer R4 or R5. This allowed the evaluation of exon 10 splicing and of the presence of the p.Phe508del pathogenic variant, as well as of their segregation on different alleles at the cDNA (RNA) level. Gel electrophoresis runs were scanned with a CCD camera (VisiDoc-It; UVP, Cambridge, UK) and analyzed with VisionWorks LS software version 6.7.3 (UVP) for densitometric assays. Expression analysis results were confirmed by real-time PCR.
Table 2. Primers used for genomic DNA amplification and sequencing. Pairs F1-R1 and F2-R2 were used for PCR amplification. Primers F1, F2 and F3 were used for forward sequencing; primers R1, R2 and R3 were used for reverse sequencing. The position of the nucleotide at the 5'-end of each primer is indicated in both legacy and HGVS nomenclature. The annealing temperatures used (Ta) and the length of amplicons are indicated.

Statistical analysis
Expression data were evaluated by analysis of variance (ANOVA) using GraphPad Prism 5. A p value < 0.05 was considered statistically significant.
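For readers who want to check such a comparison outside GraphPad Prism, the sketch below computes the one-way ANOVA F statistic in plain Python. It illustrates the standard formula and is not the authors' pipeline; the group values in the example are hypothetical. A p value can then be obtained from the F distribution with (k - 1, N - k) degrees of freedom, for instance with `scipy.stats.f.sf`, or the whole test can be run with `scipy.stats.f_oneway`.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for two or more groups of observations."""
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    if df_within <= 0 or ss_within == 0:
        raise ValueError("within-group variance is undefined or zero")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical percent-skipping replicates for a WT control and one patient.
control = [0.5, 1.0, 1.5]
patient = [60.0, 65.0, 70.0]
print(one_way_anova_f(control, patient))
```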

Ethics statement
The analyses were performed and the data were collected for diagnostic purposes, in particular to complete the genotypes of both patients and to confirm the diagnosis of Cystic Fibrosis. In our Institution, ethics committee approval is not required for studies performed for diagnostic purposes. For both families, we obtained informed consent to genetic testing and authorization to publish the results.

Results
In both patients, genetic characterization by sequencing and MLPA confirmed the presence of the p.Phe508del pathogenic variant in the heterozygous state. No other known CF pathogenic variant was identified by these methods. The polymorphic (TG)mTn tract, known to modulate exon 10 splicing, appeared to be non-pathological. However, amplification of exon 10 and the surrounding intronic regions of the CFTR gene showed, in both patient 1 and patient 2, an extra amplicon about 300 nucleotides longer than the WT amplicon (Fig 1, lanes 1 and 3, respectively). This amplicon was absent in the father of patient 1 and in an individual from the general population (Fig 1, lanes 4 and 5, respectively), whereas it was present in the mother of patient 1 (Fig 1, lane 2). The molecular alterations of the two patients were similar in size, that of patient 2 being slightly longer (Fig 1, lane 3).
Table 3. Primers used for cDNA amplification and sequencing. Pairs F4-R4 and F4-R5 were used for RT-PCR. Primer F4 was used for cDNA forward sequencing; primers R4 and R5 were used for cDNA reverse sequencing. The position of the nucleotide at the 5'-end of each primer is indicated in both legacy and HGVS nomenclature. The annealing temperatures used (Ta) and the length of amplicons are indicated.
The structure of the anomalous amplicons was reconstructed by both fragment analysis and sequencing after their recovery from the agarose gel. In fragment analysis, the migration profile of each WT sample was compared with the profiles of the anomalous samples. This allowed us to determine the presence or absence of the molecular alterations and to estimate their size independently of sequencing. We selected the most efficient primer for distinguishing between the anomalous and WT alleles in several tested samples. The sizes of the amplicons, as determined by DNA fragment analysis, were 483.9, 799.1 and 840.1 nucleotides for the WT, patient 1 and patient 2 amplicons, respectively (Fig 2, panels 1, 2 and 3, respectively). Each anomalous amplicon was sequenced with distal (Table 2; F1 for forward sequencing and R1 for reverse sequencing) and proximal (Table 2; F3 for forward sequencing and R3 for reverse sequencing) primers. In particular, the proximal primers (especially the F3 forward primer, anchored to the end of (TG)m and to the start of Tn) allowed a more accurate count of the number of T. The sequencing and fragment analysis data were processed with SeqScape software (Applied Biosystems) to reveal the exact structure of the molecular lesions, as shown in Fig 3. The analysis showed a rearrangement at the level of the polymorphic (TG)mTn tract (Fig 2, panels 2 and 3) compared with the WT (Fig 2, panel 1): within the (TG)m repeat, a long run of T was found, followed by a sequence not corresponding to exon 10 of the CFTR gene.
This T run was longer in patient 2 (76 T, Fig 2, panel 3) than in patient 1 (35 T, Fig 2, panel 2), and the difference in size between the anomalous amplicons of case 1 and case 2 depended only on the different number of T. Alignment of the DNA sequences with reference templates revealed, in both patients, an insertion of part of intron 10 within the (TG)m repeat of intron 9 of the CFTR gene, preceded by the poly-T stretch (Fig 2, panels 2 and 3). The portion of intron 10 inserted in intron 9 results from an anomalous DNA duplication, as verified by a specific PCR amplification targeted to the WT sequence of intron 10 (Table 2; primers F2 and R2), which confirmed the presence of WT intron 10 in both patients.
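Counting the T in a base-called read amounts to locating runs of consecutive T. The Python sketch below reports the start and length of every run above a minimum length. The toy read, built as (TG)10 followed by 35 T and an arbitrary flank, is a hypothetical stand-in for the case 1 configuration, not the patient's actual sequence.

```python
import re


def poly_t_runs(seq, min_len=5):
    """Return (0-based start, length) for every run of at least `min_len` consecutive T."""
    pattern = re.compile("T{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


# Hypothetical toy read: (TG)10, then 35 T, then an arbitrary flank.
toy_read = "TG" * 10 + "T" * 35 + "GACA"
print(poly_t_runs(toy_read))  # [(20, 35)]
```

The single T of each TG unit is ignored by the minimum run length, so only the long homopolymer is reported.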
To determine whether the molecular alterations segregate independently of the p.Phe508del pathogenic variant, parental DNA studies were performed; however, they were possible only for patient 1. The father of patient 1 carried the p.Phe508del pathogenic variant (but not the novel insertion), and the mother carried the novel insertion (but not p.Phe508del). This confirmed that patient 1 is a compound heterozygote, with p.Phe508del on one allele and the novel insertion on the other. Although allelic segregation could not be verified in the parents of patient 2, it is highly probable that patient 2 is also a compound heterozygote for p.Phe508del and the other novel insertion. In addition, for both patients, segregation analysis was also performed at the cDNA level (see below).
To verify the functional effect of the molecular lesions, expression analysis of the entire CFTR mRNA was performed. All RNA amplicons not including exon 10 were WT after recovery from the agarose gel and sequencing, whereas the amplicon including exon 10 revealed an altered mRNA structure. Functional characterization at the RNA level, by RT-PCR and sequencing, revealed in both cases anomalous splicing of exon 10 (Fig 4A, lanes 2 and 3, lower arrow). This aberrant splicing appeared to depend entirely on the allele with the molecular alteration, as the other allele carried a non-pathological (TG)10T9 polymorphic tract in both patients. Indeed, WT controls homozygous for the (TG)10T9 tract showed no anomalous splicing (Fig 4A, lane 1, upper arrow; a representative example). In addition, for both patients, sequencing of the lower and upper cDNA bands shown in Fig 4 (after their separate recovery from the agarose gel) showed inclusion of exon 10 and the presence of the p.Phe508del pathogenic variant in the upper band, and exclusion of exon 10 and absence of p.Phe508del in the lower band.
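The parental test used for patient 1 follows a simple rule: if each of two heterozygous variants in the proband is inherited from a different parent, the two variants lie in trans. The Python sketch below expresses this rule. It assumes biparental inheritance with no de novo events, and the variant labels are hypothetical placeholders.

```python
from typing import Optional, Set


def parental_origin(variant: str, father: Set[str], mother: Set[str]) -> Optional[str]:
    """Return 'paternal' or 'maternal' when exactly one parent carries the variant, else None."""
    in_father, in_mother = variant in father, variant in mother
    if in_father and not in_mother:
        return "paternal"
    if in_mother and not in_father:
        return "maternal"
    return None  # both or neither parent carries it: origin cannot be assigned


def in_trans(v1: str, v2: str, father: Set[str], mother: Set[str]) -> bool:
    """True only if the two variants are assigned to different parents."""
    o1 = parental_origin(v1, father, mother)
    o2 = parental_origin(v2, father, mother)
    return o1 is not None and o2 is not None and o1 != o2


# Patient 1 trio, as described in the text (labels are placeholders).
print(in_trans("p.Phe508del", "insertion", {"p.Phe508del"}, {"insertion"}))  # True
```

Without parental genotypes, as for patient 2, the function returns False (phase unresolved); in that situation phase has to come from another source, such as the separated cDNA bands.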
These cDNA results confirmed, for both patients, that the novel insertion and the p.Phe508del pathogenic variant lie on different alleles. A quantitative densitometric assay of both CFTR amplicons, with and without exon 10, was performed (Fig 4B). The percentage of total exon 10 anomalous splicing (Fig 4B, right panel) was 65.3% (±4.8) in patient 1 and 59.3% (±7.8) in patient 2, compatible with the presence of one allele carrying the molecular alteration. These quantitative splicing data were confirmed by real-time PCR. Consequently, the mutated allele is not expected to contribute to the formation of functional CFTR protein. By contrast, no anomalous exon 10 splicing was found in the WT control.
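The percentage of exon 10 skipping is the intensity of the band lacking exon 10 divided by the summed intensity of both bands. The Python sketch below computes it per gel run and summarizes replicates as mean ± SD. The band intensities in the example are hypothetical, and the sketch assumes background-subtracted values without correcting for amplicon length; with intercalating stains, intensity scales with DNA mass, so a length-normalized variant may be preferable.

```python
from statistics import mean, stdev


def percent_skipping(skipped, full_length):
    """Percentage of exon 10 skipping from two background-subtracted band intensities."""
    total = skipped + full_length
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * skipped / total


def summarize(replicates):
    """Mean and sample SD of percent skipping over (skipped, full_length) replicate pairs."""
    percentages = [percent_skipping(s, f) for s, f in replicates]
    return mean(percentages), stdev(percentages)


# Hypothetical band intensities from three gel runs.
runs = [(650.0, 350.0), (600.0, 400.0), (700.0, 300.0)]
m, sd = summarize(runs)
print(f"{m:.1f}% (+/-{sd:.1f})")  # 65.0% (+/-5.0)
```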
These studies allowed us to reconstruct and schematize the structural organization of the molecular alterations (Fig 5). Each is an insertion that disrupts the canonical (TG)mTn tract, which regulates splicing at the intron 9-exon 10 junction. In particular, a physiological (TG)10 repeat is followed by an abnormal poly-T stretch of different length in the two cases: 10 T in case 1 and 51 T in case 2. After this poly-T stretch, a 306 bp portion of intron 10, starting with 25 T, is inserted, giving a total of 35 T and 76 T for case 1 and case 2, respectively. In both cases, a G follows the insertion. The overall length of the molecular alterations is 317 bp for case 1 and 358 bp for case 2. At the end of the molecular alterations, a (TG)4T7 repeat was found.
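The T counts and lesion lengths in the preceding paragraph follow from adding the lesion components. The Python sketch below reproduces that arithmetic, assuming, as described, that each lesion consists of an extra poly-T stretch, the 306 bp intron 10 segment (which itself begins with 25 T) and a single G. The constant and function names are hypothetical.

```python
INTRON10_SEGMENT_BP = 306  # duplicated intron 10 segment
SEGMENT_LEADING_T = 25     # T run at the start of that segment
TRAILING_G_BP = 1          # single G found after the insertion


def lesion_summary(extra_t):
    """Return (total contiguous T, total lesion length in bp) for a given extra poly-T stretch."""
    total_t = extra_t + SEGMENT_LEADING_T
    length_bp = extra_t + INTRON10_SEGMENT_BP + TRAILING_G_BP
    return total_t, length_bp


for case, extra_t in (("case 1", 10), ("case 2", 51)):
    print(case, lesion_summary(extra_t))
# case 1 (35, 317)
# case 2 (76, 358)
```

Both outputs agree with the values reported above, which supports reading the 317 bp and 358 bp totals as the extra T stretch plus the 306 bp segment plus the trailing G.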

Discussion
Currently, an extended CFTR mutational analysis in CF patients achieves a detection rate (DR) of up to 98%, leaving about 2% of genotypes with one or two unknown alleles [38]. This reflects the high genetic heterogeneity of the CFTR gene, which may be affected by many rare pathogenic variants. Because of their low frequency, these rare variants may escape even mutational search protocols with a high detection rate. In addition, several pathogenic variants have peculiar structural features and, consequently, may not be recognized during common mutational searches [39]. The very low frequency (possibly unique occurrence) and the structural peculiarity of the unknown alleles of the two cases described here probably explain why they were not recognized at the initial genetic analysis. Although over 2000 variants of the CFTR gene are known to date, some escape the most common analysis protocols, and only a small proportion of these variants have been functionally characterized experimentally.
The patients described here had a classical (TG)10T9 tract on one allele and a disrupted (TG)mTn tract on the other. As already known, the polymorphic (TG)mTn tract, located in CFTR intron 9, regulates the splicing of CFTR exon 10, and variations of this tract may have deleterious effects on pre-mRNA splicing [40]. Both novel pathogenic variants correspond to an insertion, within the canonical (TG)mTn tract, of 306 nucleotides from intron 10, preceded by a T repeat of different length (10 and 51 T for case 1 and case 2, respectively). In both cases, the position of the insertion does not allow the original (TG)mTn tract of these alleles to be defined. These molecular lesions alter the physiological splicing of the CFTR pre-mRNA. We functionally studied these samples by RT-PCR, gel electrophoresis and densitometric analysis, as well as by real-time PCR, obtaining with both approaches percentages of anomalous transcript above 50%. Considering the very limited contribution of the (TG)10T9 allele to anomalous exon 10 splicing (as shown both in the literature and in our experiments), these insertions appear to completely suppress physiological splicing and to cause the loss of functional CFTR protein. Consequently, when paired in trans with a severe CFTR pathogenic variant, they give rise to classic CF phenotypes. These molecular lesions have not been previously described in the literature and were found in two patients with similar phenotypes. Their molecular and functional features allow them to be classified as novel CF-causing variants of CFTR. They could be named c.1210-34TG[10]_1210-34TG[4]ins317 for case 1 and c.1210-34TG[10]_1210-34TG[4]ins358 for case 2. Their identification and functional characterization allowed the completion of the genetic characterization of both patients.
Both pathogenic variants derive from the duplication of the same portion of DNA from intron 10. The poly-T stretch at the beginning of the duplicated portion may have interacted or recombined with the repeated T of the (TG)mTn tract, which could constitute the first mutational event originating the insertion. A molecular mechanism of divergence from this common ancestral allele can then be hypothesized. Owing to the presence of several repeated T at the beginning of the inserted portion, it is possible that, during meiotic divisions, the mutated allele underwent two different rearrangements, leading to the formation of two novel pathogenic variants. In support of this hypothesis, both patients originate from the same limited geographical area.
Fig 5. At the level of CFTR intron 9, within the (TG)mTn tract, a poly-T stretch followed by an insertion of 306 bp of CFTR intron 10 was found. The molecular lesions were very similar in the two patients, differing only slightly in the number of T in the poly-T stretch preceding the common part of the lesion (10 T in case 1 and 51 T in case 2). The figure also reports the position of the primers (F1, F2, F3, R1, R2, R3) described in Table 2. See text for explanation. https://doi.org/10.1371/journal.pone.0222838.g005
Mutational search protocols with high sensitivity and specificity are useful for the structural and functional characterization of CFTR alleles that remain unknown after the initial genetic characterization. The discovery of novel rare CFTR pathogenic variants, together with their experimental functional characterization, is essential to improve our diagnostic, prognostic and, in the era of personalized CF medicine, therapeutic ability.