Novel mutations in COL4A3, COL4A4, and COL4A5 in Chinese patients with Alport Syndrome

Alport syndrome (AS) is a clinically and genetically heterogeneous, progressive nephropathy caused by mutations in COL4A3, COL4A4, and COL4A5, which encode type IV collagen. The large sizes of these genes and the absence of mutation hot spots have complicated mutational analysis by routine polymerase chain reaction (PCR)-based approaches. Here, to design a rapid and effective method for the genetic diagnosis of AS, we developed a strategy that uses targeted capture combined with next-generation sequencing (NGS) to analyze COL4A3, COL4A4, and COL4A5 simultaneously in 20 AS patients. All the coding exons and flanking sequences of COL4A3, COL4A4, and COL4A5 from the probands were captured and then sequenced on a HiSeq 2500. Candidate mutations were validated by classic Sanger sequencing and quantitative (q)PCR. Sixteen patients (16/20, 80%) showed X-linked inheritance, and four patients (4/20, 20%) showed autosomal recessive inheritance. None of the individuals had autosomal-dominant AS. Fifteen novel mutations, 6 known mutations, and 2 novel fragment deletions were detected by targeted capture and NGS. Of the novel mutations, including the fragment deletions, 12, 3, and 2 were detected in COL4A5, COL4A4, and COL4A3, respectively. A comparison of the clinical manifestations caused by different types of mutations in COL4A5 suggested that nonsense mutations and glycine substitutions by an acidic amino acid cause more severe disease than other missense mutations. Pathogenic mutations were detected in all 20 patients. These novel mutations expand the genotypic spectrum of AS. Our results demonstrate that targeted capture and NGS are effective in the genetic diagnosis of AS.


Introduction
Alport syndrome (AS) is an inherited nephropathy characterized by hematuria, proteinuria, and progressive renal failure, often associated with extrarenal manifestations such as sensorineural hypoacusis and ocular abnormalities [1]. AS is genetically heterogeneous and is caused by mutations in the genes encoding type IV collagen [2,3]. The α3, α4, and α5 chains of type IV collagen are encoded by COL4A3, COL4A4, and COL4A5, respectively [4]. Type IV collagen is a major constituent of the glomerular basement membrane (GBM) and is composed of six kinds of homologous α chains, which assemble into three different heterotrimers (α1α1α2, α3α4α5, and α5α5α6) [3]. Among them, the α3α4α5 heterotrimer is essential for the structure and function of the basement membrane in the glomeruli of the kidney, the cochlea, and the eye [3,5].
Clinically, AS is heterogeneous, and patients present with widely variable manifestations: the rate of progression to end-stage renal disease (ESRD) and the presence or absence of sensorineural deafness and ocular changes [12] depend on the mutation they carry [13][14][15]. In X-linked AS (XLAS), hemizygous males exhibit more severe symptoms than heterozygous females and usually reach ESRD before the age of 30 [13]. Female patients, however, have more variable symptoms, ranging from isolated hematuria to ESRD [16]. Individuals with homozygous mutations in COL4A3 or COL4A4, which result in autosomal recessive AS (ARAS), have clinical symptoms and a prognosis similar to those of males with XLAS, reaching ESRD in the first or second decade of life [17]. Early diagnosis and therapy can nevertheless improve the prognosis, and early, effective, and safe therapy can increase the life expectancy of patients with AS [4]. Heterozygous mutations in COL4A3 and COL4A4 produce heterogeneous clinical phenotypes and can cause autosomal dominant AS (ADAS), thin basement membrane nephropathy (TBMN), focal segmental glomerulosclerosis (FSGS), and familial benign hematuria (FBH) [18,19]. The early clinical manifestations of these diseases are similar, and genetic classification is therefore critical. However, the large size of the genes and the absence of mutational hot spots have significantly hindered mutational analysis with Sanger sequencing. Recent advances in targeted gene capture combined with next-generation sequencing (NGS) and validated bioinformatics tools have facilitated research and diagnostics in the field of nephrogenetics [20]. Therefore, in this study, we developed a strategy to analyze COL4A3, COL4A4, and COL4A5 simultaneously using targeted capture and NGS in 20 patients with AS.

Patients
Twenty patients (18 males and two females) from unrelated Chinese families were selected from 400 patients diagnosed with AS according to standard criteria between 2011 and 2014, and were enrolled in this study [21]. All of these patients met Gregory's criteria for AS, including a family history of nephritis, persistent hematuria and/or proteinuria, immunohistochemical evidence of complete or partial lack of type IV collagen epitopes in glomerular or epidermal basement membranes or both, widespread GBM ultrastructural abnormalities, and the presence or absence of ear and eye abnormalities. A brief clinical summary of the patients is shown in Table 1.
Peripheral blood samples were collected from all the participants. This study was approved by the Ethics Committee of Jinling Hospital and written informed consent was obtained from patients and both parents.
Targeted capture and next-generation sequencing
A custom capture array (NimbleGen, Roche, USA) was designed to capture all exons (1,625), splice sites, and the flanking intron sequences of 78 genes known to be associated with genetic kidney disease, including AS. The total size of the target regions of the capture array was 0.5 Mb. The pipeline described in previous studies [22,23] was followed to capture and enrich the targeted sequences, prepare the sequencing library, and perform NGS. Genomic DNA was extracted from peripheral blood using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany) and then sheared into fragments of 200-300 bp using an ultrasonicator (Covaris S2, Massachusetts, USA). Oligonucleotide adapters from Illumina (single reads) were ligated to the fragments. After ligation, successful adapter ligation was confirmed by four-cycle PCR using a high-fidelity polymerase with PCR primers containing a custom-synthesized barcode sequence (8 bp) as a sample index signature. PCR was used to generate a library for further analysis, and the adapter-ligated, indexed DNA fragments from 10 libraries were pooled and hybridized to customized oligonucleotide probes. Sequencing was performed by BGI (Tianjin, China): after hybridization of the sequencing primer, base incorporation was carried out on an Illumina HiSeq 2500 platform (Illumina, San Diego, CA) following the manufacturer's standard cluster generation and sequencing protocols, with 90 cycles of sequencing per read, generating paired-end reads of 90 bp at each end plus the 8-bp index tag.

Validation of candidate mutations by Sanger sequencing and quantitative (q)PCR
To validate the variations identified with NGS, the gene regions surrounding the variations were amplified by PCR and sequenced by Sanger sequencing. Purified PCR products were sequenced in both directions, using the same primers as for amplification, on an ABI3730xl DNA sequencer and were analyzed with the sequencer software (Sequencing Analysis 5.2). The two fragment deletions were identified by comparing the normalized sequencing depth of each exon among samples in the same batch. Exons with a depth ratio of the specific sample to the others of >1.4 were considered duplicated, while those with a ratio of <0.6 were considered deleted. The cut-off value for the Z-score was 2.58. qPCR was used to validate the fragment deletions on a StepOnePlus system (Applied Biosystems, CA, USA). The primers used for the amplification of COL4A3, COL4A4, and COL4A5 are listed in S1 Table.
Function prediction and alignment analysis of missense variations
Image analyses, error estimation, and base calling were performed with the Illumina pipeline (version 1.3.4) after the entire run was completed to generate primary data. Indexed primers were used to assign the reads in the primary data to their samples, and only reads that perfectly matched the theoretical adapter indexed sequences and those that matched the theoretical primer indexed sequences with a maximum of three mismatches were considered acceptable reads. Unqualified sequences were then removed from the primary data using a local dynamic programming algorithm; these comprised low-quality reads (defined as reads that contained >10% Ns in the read length, 50% reads with a quality value of <5 and with an average quality of <10) and adapter sequences, including indexed sequences.
The remaining sequences were termed clean reads and were used for further analysis. Clean reads were aligned to the reference human genomic sequence (GRCh37/hg19) of COL4A3 (NM_000091.4), COL4A4 (NM_000092.4), and COL4A5 (NM_033380.2) using the BWA software package (Burrows-Wheeler Aligner) [24]. SNPs and INDELs were identified with the SOAPsnp software and GATK IndelGenotyper (http://www.broadinstitute.org/gsa/wiki/index.php/), respectively [25]. The control databases used in the pipeline were the 1000 Genomes database (http://www.1000genomes.org), the dbSNP database, and a BGI in-house database comprising 2,087 normal subjects. PhyloP (phyloP46wayPlacental) was used to calculate the conservation of each SNP. The functional impact of missense variation was analyzed with the PolyPhen-2 (Polymorphism Phenotyping v2.2.5) [26] and SIFT (Sorting Intolerant From Tolerant v5.1) [27] algorithms. According to the criteria of the American College of Medical Genetics, we classified the variants into five categories: pathogenic, likely pathogenic, benign, likely benign, and uncertain significance [28].
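The read-level quality filter described above can be illustrated with a short sketch. The text leaves the exact combination of the three criteria open, so the sketch assumes one possible reading: a read is discarded if more than 10% of its bases are N, if at least 50% of its bases have a quality value below 5, or if its mean quality is below 10. The function name `is_clean_read` and its parameters are hypothetical and are not taken from the pipeline used in the study.

```python
from statistics import mean


def is_clean_read(seq, quals, max_n_frac=0.10, low_q=5,
                  max_low_q_frac=0.50, min_mean_q=10):
    """Return True if a read passes all three filters.

    Assumed interpretation (not confirmed by the source):
      1. more than 10% N bases          -> discard
      2. at least 50% of bases with Q<5 -> discard
      3. mean base quality below 10     -> discard
    `seq` is the base string, `quals` a list of per-base quality values.
    """
    if not seq or len(seq) != len(quals):
        return False  # malformed record
    n_frac = seq.upper().count("N") / len(seq)
    low_q_frac = sum(q < low_q for q in quals) / len(quals)
    if n_frac > max_n_frac:
        return False
    if low_q_frac >= max_low_q_frac:
        return False
    if mean(quals) < min_mean_q:
        return False
    return True


# Illustrative reads (made-up values, 20 bases each)
good = is_clean_read("ACGT" * 5, [30] * 20)          # passes all filters
too_many_n = is_clean_read("NNN" + "A" * 17, [30] * 20)  # 15% N, discarded
```

Adapter and index trimming, which the text also mentions, is a separate step and is not covered by this sketch.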

Identification of candidate mutations in COL4A3, COL4A4, and COL4A5
A total of 272 variants and 2 fragment deletions were identified in COL4A3, COL4A4, and COL4A5 from the 20 patients diagnosed with AS using targeted capture and NGS. Among these variants, 90.8% (247/272) were termed "polymorphisms" with high frequency (>0.01 in control databases). We identified 16 missense mutations, one nonsense mutation, three frameshift mutations, and one 5′-untranslated region (UTR) mutation. Among these, six were known mutations that had been reported previously and 15 were novel (Table 2); 66.67% (14/21) were amino acid substitutions of glycine and 9.52% (2/21) were amino acid substitutions of proline. These candidate mutations were validated by Sanger sequencing. The frequencies and prediction scores of these mutations are listed in Table 2.
In AS patient IID16, in addition to a heterozygous glycine substitution (G572A) in COL4A4, there was another heterozygous mutation, c.-23T>G, in the 5′-UTR of this gene. The heterozygous c.-23T>G mutation was inherited by his two sons (S1 Fig, III1 and III2), who presented persistent isolated hematuria and normal renal function. The clinical features of these two patients with isolated hematuria were consistent with a diagnosis of familial benign hematuria.
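The frequency-based separation of polymorphisms from candidate mutations can be sketched as follows. This is a minimal illustration that assumes each variant is represented as a dictionary with a hypothetical `control_freqs` field holding its allele frequency in each control database, and that a variant counts as a polymorphism when its highest control frequency exceeds 0.01. The data layout and function name are assumptions for illustration and do not describe the pipeline actually used.

```python
def split_by_frequency(variants, threshold=0.01):
    """Partition variants into (polymorphisms, candidates).

    A variant is treated as a polymorphism when its highest allele
    frequency across the control databases exceeds `threshold`.
    Variants absent from every database (no frequency available)
    are kept as candidates.
    """
    polymorphisms, candidates = [], []
    for variant in variants:
        freqs = [f for f in variant.get("control_freqs", {}).values()
                 if f is not None]
        max_freq = max(freqs, default=0.0)
        if max_freq > threshold:
            polymorphisms.append(variant)
        else:
            candidates.append(variant)
    return polymorphisms, candidates


# Made-up example records; identifiers and frequencies are placeholders.
example = [
    {"id": "variant_a", "control_freqs": {"1000G": 0.25, "dbSNP": None}},
    {"id": "variant_b", "control_freqs": {}},
    {"id": "variant_c", "control_freqs": {"in_house": 0.004}},
]
common, rare = split_by_frequency(example)
```

Variants that remain in the candidate list would then go on to functional prediction and Sanger validation as described in the Methods.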

Identification of fragment deletions in COL4A5
IID3 and IID10 carried fragment deletions in COL4A5, which were validated by qPCR (Fig 1). IID10 had no sensorineural hearing loss and the mildest glomerular injury. However, IID10 carried a heterozygous c.3946G>A (p.G1316S) mutation in COL4A3 (S2 Fig & Table 2) and a hemizygous deletion of exon 44 of COL4A5. One of these two mutations was inherited from each of his parents, both of whom were healthy (S2 Fig). A hemizygous deletion of exon 29 was identified in IID3; the deletion was inherited from his mother, who had hematuria. The sequencing depths of the COL4A5 exons of patients IID3 and IID10 are listed in S2 Table; the qPCR quantification results are shown in Fig 1.
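The depth-ratio rule used to flag fragment deletions (ratio <0.6 for deletion, >1.4 for duplication, Z-score cut-off 2.58) can be sketched as below. The text does not state how the ratio and the Z-score are combined or how depths are normalized, so this sketch assumes that depths are already normalized per sample, that the reference is the mean depth of the other samples in the batch, and that a call requires both the ratio threshold and |Z| >= 2.58. The function name `call_exon_cnv` is hypothetical.

```python
from statistics import mean, stdev


def call_exon_cnv(sample_depth, batch_depths, dup_ratio=1.4,
                  del_ratio=0.6, z_cutoff=2.58):
    """Classify one exon of one sample as deletion, duplication, or normal.

    `sample_depth` is the normalized depth of the exon in the sample of
    interest; `batch_depths` holds the normalized depths of the same exon
    in the other samples of the batch. Returns (label, ratio, z).
    Assumption: both the ratio threshold and the Z-score cut-off must hold.
    """
    if len(batch_depths) < 2:
        raise ValueError("need at least two reference samples")
    ref_mean = mean(batch_depths)
    ref_sd = stdev(batch_depths)
    if ref_mean <= 0:
        raise ValueError("reference depth must be positive")
    ratio = sample_depth / ref_mean
    z = (sample_depth - ref_mean) / ref_sd if ref_sd > 0 else 0.0
    label = "normal"
    if abs(z) >= z_cutoff:
        if ratio > dup_ratio:
            label = "duplication"
        elif ratio < del_ratio:
            label = "deletion"
    return label, ratio, z


# Made-up normalized depths for one exon across a batch
batch = [1.0, 1.1, 0.9, 1.05, 0.95]
label_del, _, _ = call_exon_cnv(0.05, batch)   # near-zero depth
label_norm, _, _ = call_exon_cnv(1.02, batch)  # within batch variation
```

A hemizygous deletion in a male would be expected to give a depth ratio close to zero for the affected exon, whereas a heterozygous deletion in a female would be expected to give a ratio near 0.5; either falls below the 0.6 threshold in this sketch.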

Discussion
Of the 20 AS patients, 16 (16/20, 80%) showed X-linked inheritance (mutations in COL4A5) and four (4/20, 20%) showed autosomal recessive inheritance (mutations in COL4A3 and COL4A4). This ratio is similar to that reported in the literature [4]. In our study, we did not find any individuals with ADAS caused by mutations in COL4A3 or COL4A4. Twenty-one mutations and two fragment deletions were identified by targeted capture and NGS: 16 missense mutations, one nonsense mutation, three frameshift mutations, and one 5′-UTR mutation. Among these, 15 were novel mutations and six were known mutations that had been reported previously. According to the criteria of the American College of Medical Genetics [28], the 5′-UTR mutation is a variant of uncertain significance, while the others are pathogenic mutations. Our results showed eight glycine substitutions and two proline substitutions in COL4A5, two glycine substitutions in COL4A4, and four glycine substitutions in COL4A3. Analyzing the genotype-phenotype correlations, we found that in COL4A5 the clinical phenotypes caused by glycine substitution by an acidic amino acid seem more severe than those caused by substitution by a neutral amino acid. Patients IID1 (with p.G1060E), IID4 (with p.G675D), and IID15 (with p.G878E) presented relatively more severe symptoms, such as more marked hematuria or proteinuria, a higher level of serum creatinine, more rapid disease progression, and extrarenal manifestations (Table 1), than patients IID7 (with p.G1229S), IID11 (with p.G669S), IID12 (with p.G1229S), and IID13 (with p.G1170V). Similar severity was observed in the clinical phenotypes caused by truncating mutations in COL4A5. The male patients IID6 (with p.R373*), IID18 (with p.S1371*), and IID20 (with p.P809Wfs*9) all presented extrarenal manifestations (Table 1). In addition, IID6 and IID18 had gross hematuria or proteinuria and a high level of serum creatinine (Table 1).
IID20 exhibited modest clinical manifestations (Table 1), possibly because of his relatively young age. However, because the sample is too small to support a firm conclusion, these observations need to be verified in a larger cohort. Patient IID10, who carried mutations in two genes (p.G1316S in COL4A3 and a hemizygous deletion in COL4A5), one inherited from each of his healthy parents (S2 Fig), had only moderate proteinuria, microscopic hematuria, normal renal function, and no extrarenal manifestations. In contrast, Mencarelli et al reported that double heterozygotes have more severe phenotypes, such as progression to ESRD by age 44 and sensorineural hearing loss, than individuals with heterozygous mutations in COL4A5 (in women) or COL4A4 [33]. Given the modest clinical symptoms of IID10, his phenotype could be caused mainly by the COL4A5 deletion, while the COL4A3 mutation, as an incidental finding, probably played an insignificant role. However, considering his young age, further follow-up of his clinical symptoms is needed. Individuals with heterozygous mutations of COL4A3 or COL4A4 exhibit a broad range of phenotypes. Recent studies have shown that about 40% of patients with familial benign hematuria (FBH) [34], 10% of familial FSGS patients [35], and <5% of AS patients carry heterozygous mutations in COL4A3 or COL4A4 [18]. In our study, the compound heterozygous mutations p.G997E and p.G1167R in COL4A3 were identified in the ARAS patient IID2. Interestingly, Xie et al reported a heterozygous p.G997E mutation in an FSGS patient whose pathological manifestations showed typical focal segmental glomerulosclerosis [29]. In contrast, patient IID2, whose renal biopsy showed lamellation of the GBM and significantly decreased expression of the α5 chain in the GBM, was diagnosed with ARAS. A similar situation was observed in the family of patient IID16. The proband carried two mutations in COL4A4 (p.G572A and c.-23T>G) and was diagnosed with AS. Both his sons (S1 Fig, III1 and III2), who carried the heterozygous c.-23T>G mutation inherited from their father, presented isolated hematuria consistent with the clinical features of FBH. Because the mutations in COL4A3 or COL4A4 are heterozygous, half of the α3 or α4 chains in the α3α4α5 heterotrimers of type IV collagen have a normal structure. Therefore, carriers of heterozygous mutations present a mild clinical phenotype. These carriers thus differ from the asymptomatic heterozygous carriers of some recessive metabolic diseases such as phenylketonuria. On the basis of these two cases, we propose that FBH, AS, and some cases of FSGS are type IV collagen-related disorders, and that patients who carry a single heterozygous mutation or compound heterozygous mutations may present with different diseases.
Genetic testing is of great and increasing importance for diagnosing AS, and it has some advantages over conventional methods. Compared with renal biopsy, genetic analysis of DNA extracted from peripheral blood is much less invasive for patients, especially for children. The sequencing results are independent of the stage of disease and the age of the proband. In addition, genetic testing is helpful in clinical diagnosis, especially in sporadic patients or patients whose renal biopsy data are not available.
In conclusion, based on targeted capture and NGS, 17 new mutations in genes encoding type IV collagen were identified in 20 AS patients; these will be added to the AS-related mutation spectra of COL4A3, COL4A4, and COL4A5. By combining genetic testing with the clinical and pathological phenotype, a comprehensive diagnosis of AS can therefore be made at the individual, tissue, and molecular levels.