Technical Evaluation: Identification of Pathogenic Mutations in PKD1 and PKD2 in Patients with Autosomal Dominant Polycystic Kidney Disease by Next-Generation Sequencing and Use of a Comprehensive New Classification System

Genetic testing of PKD1 and PKD2 is expected to play an increasingly important role in determining allelic influences in autosomal dominant polycystic kidney disease (ADPKD) in the near future. However, to date, genetic testing is not commonly employed because it is expensive, is complicated by genetic heterogeneity, and does not easily identify pathogenic variants. In this study, we developed a genetic testing system based on next-generation sequencing (NGS), long-range polymerase chain reaction, and a new software package. The new software package integrated seven databases and provided access to five cloud-based computing systems. The database integrated 241 polymorphic nonpathogenic variants detected in 140 healthy Japanese volunteers aged >35 years, who were confirmed by ultrasonography as having no cysts in either kidney. Using this system, we identified 60 novel and 30 known pathogenic mutations in 101 Japanese patients with ADPKD, with an overall detection rate of 89.1% (90/101) [95% confidence interval (CI), 83.0%–95.2%]. The sensitivity of the system increased to 93.1% (94/101) (95% CI, 88.1%–98.0%) when combined with multiplex ligation-dependent probe amplification analysis, making it sufficient for use in a clinical setting. Of the 94 patients in whom a pathogenic mutation was identified, 82 (87.2%) carried the mutation in PKD1 (95% CI, 79.0%–92.5%) and 12 (12.8%) carried it in PKD2 (95% CI, 7.5%–21.0%); this is consistent with previously reported findings. In addition, we were able to reconfirm our pathogenic mutation identification results using Sanger sequencing. In conclusion, we developed a high-sensitivity NGS-based system and successfully employed it to identify pathogenic mutations in PKD1 and PKD2 in Japanese patients with ADPKD.
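The two overall detection rates and their confidence intervals can be checked with a few lines of arithmetic. The sketch below assumes a normal-approximation (Wald) interval, which the text does not name but which reproduces the reported bounds for 90/101 and 94/101; the per-gene intervals are not symmetric about their estimates and so were presumably computed with a different method. The function name `wald_interval` is illustrative only.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width


# Counts taken from the abstract: 90 of 101 patients by NGS alone,
# 94 of 101 once MLPA analysis is added.
for label, detected in (("NGS alone", 90), ("NGS plus MLPA", 94)):
    low, high = wald_interval(detected, 101)
    print(f"{label}: {detected / 101:.1%} (95% CI, {low:.1%} to {high:.1%})")
```

With only 101 patients and proportions near 90%, a Wilson or exact interval would give slightly different bounds, so the choice of method is worth stating when such figures are reported.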

Diagnosis of ADPKD is usually based on family history, ultrasonography, computed tomography (CT), or magnetic resonance imaging (MRI) [8]. Genetic testing, however, can facilitate the diagnosis in patients whose renal phenotypes are unclear and in patients for whom information regarding family history is lacking; it may also help identify donors for renal transplantation [9]. Nevertheless, modern genetic testing methods are currently not part of the standard of care. This is partly because PKD1 and PKD2 are difficult to test by conventional direct sequencing methods such as Sanger sequencing: both genes show a high degree of allelic heterogeneity; their combined coding regions are long, amounting to 61 exons to analyze (46 in PKD1 and 15 in PKD2); and six PKD1 pseudogenes share a high degree of homology with most of PKD1 [10,11].
To address these sequencing shortfalls, long-range polymerase chain reaction (LR-PCR), which can amplify all the exonic regions with several sets of primers, was developed. As previously reported, the LR-PCR method required five different PCR conditions to amplify the 46 exons of PKD1. This was simpler than direct sequencing, but the procedure was still complex, and PKD2 was not assessed [12]. Thus, we developed unique primers, and combinations thereof, to amplify all the PKD1 and PKD2 exons simultaneously under similar PCR conditions, thereby simplifying the testing procedure for these genes.
Recently, several next-generation sequencing (NGS) platforms have been approved as in vitro diagnostic devices, indicating that genetic testing using NGS may have an important role to play in clinical testing in the near future. The overall mutation detection rate of PKD gene analysis using older NGS methods was reported to be lower than that using the Sanger method [13]; however, in recent articles, high sensitivity has been reported [14][15][16][17]. In our new method for detecting genetic mutations in PKD1 and PKD2, the latest benchtop NGS machine was used for the analysis of large genes; NGS was considered suitable for PKD genetic testing because of its high-throughput capability.
For the detection and identification of pathogenic mutations in PKD1 and PKD2 in each patient, a novel software package was developed to perform the analysis. This software package integrated seven databases, including the polymorphic variants detected in 140 healthy Japanese volunteers. An efficient and comprehensive genetic testing system based on LR-PCR, an NGS platform, and the software package was evaluated by testing 101 Japanese families with ADPKD to identify the pathogenic genetic mutations in all of the patients.

Participants and Materials
A total of 101 unrelated patients with ADPKD, all aged 19 years or older, were recruited at Kyorin University Hospital (N = 82) and Juntendo University Hospital (N = 19) in Japan from 2014 to 2015. ADPKD was diagnosed by imaging, in accordance with a previous report [18]. In addition, 140 healthy Japanese volunteers were recruited at the Medical Corporation Shinanokai, Samoncho Clinic, Tokyo, Japan. Volunteers were aged 35 years or older and were confirmed, by ultrasonography, as having no renal cysts.
[…] financial support in the form of salaries for MK, RH, DK, KK, and KS and did not have any additional role in the study design, data collection and analysis, decision to publish, or manuscript preparation. Similarly, FALCO Biosystems, World Fusion, Omixon, and Samon-cho clinic provided support in the form of salaries for TF, NG, TO, KK, KR, TH, and MT but did not have any additional role in the study design, data collection and analysis, decision to publish, or manuscript preparation. EH, SH, and KN […] There are no products in development or marketed products to declare. Otsuka Pharmaceutical has applied for patents of the functional capability of our genetic testing system; however, this does not alter our adherence to PLOS ONE policies on sharing data and materials as detailed online in the guide for authors. We have released all data, sequences of LR-PCR primers, reaction conditions, and the algorithm of the analysis software. All relevant data, materials, and the algorithm are within our article and its Supporting Information files. Although the algorithm is open source, the analysis software package is not completely open source because it includes the commercial software OMIXON TARGET, which is next-generation sequencing analysis software.
The experimental protocol was reviewed and approved by the following local ethics committees: Independent Ethics Committee of Kyorin University (Approval study ID: Kyorin-PKD-1), Independent Ethics Committee of Juntendo University Hospital (Approval number: 13-151) and Independent Research Ethics Committee of Otsuka Pharmaceutical Co., Ltd. (Approval number: 131007 and 131217), and written informed consent was obtained from all participants; the study was submitted and registered in the ClinicalTrials.gov registry (Identifier: NCT02322385). Genomic DNA from lymphocytes was extracted from 6 mL peripheral blood using the QIAamp DNA Mini Kit (QIAGEN, Venlo, Netherlands) at FALCO Biosystems Ltd in Kyoto, Japan, and was stored at 4˚C until use.

LR-PCR
We designed LR-PCR primers to amplify 18 long DNA fragments, including the exonic regions of PKD1 and PKD2. These fragments were amplified from participants' purified genomic DNA using the primers shown in S1 Table. Multiple LR-PCR products were amplified in each of the following combinations: A (LR-PCR 2 and 12), B (4 and 17), C (6 and 15), D (1, 5, and 9), E (3, 8, and 18), F (7, 13, and 16), and G (10, 11, and 14). The LR-PCR reactions were performed simultaneously on the same PCR plate using the following touchdown PCR regimen [19]: (i) 94˚C for 2 min; (ii) one cycle at 98˚C for 10 s and 74˚C for 5 min; (iii) one cycle at 98˚C for 10 s and 72˚C for 5 min; (iv) one cycle at 98˚C for 10 s and 70˚C for 5 min; (v) 30 cycles at 98˚C for 10 s and 68˚C for 5 min; and (vi) 68˚C for 7 min. The LR-PCR products were purified using the Agencourt AMPure XP kit (Beckman Coulter, Inc, Brea, CA, USA) and quantified as previously described [20].
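The pooling scheme and thermal program above can be written down as data and sanity-checked. The following is a minimal sketch: the pool letters, fragment numbers, temperatures, and hold times are transcribed from the text, while the variable names and the data layout are illustrative choices, and the time total ignores ramping between temperatures.

```python
# Pools A-G and the fragment numbers assigned to each, as listed in the text.
POOLS = {
    "A": [2, 12], "B": [4, 17], "C": [6, 15], "D": [1, 5, 9],
    "E": [3, 8, 18], "F": [7, 13, 16], "G": [10, 11, 14],
}

# Touchdown program: (number of cycles, [(temperature in deg C, hold in s), ...]).
PROGRAM = [
    (1, [(94, 120)]),              # (i) initial denaturation
    (1, [(98, 10), (74, 300)]),    # (ii)
    (1, [(98, 10), (72, 300)]),    # (iii)
    (1, [(98, 10), (70, 300)]),    # (iv)
    (30, [(98, 10), (68, 300)]),   # (v)
    (1, [(68, 420)]),              # (vi) final extension
]

# Every one of the 18 fragments should appear in exactly one pool.
fragments = sorted(f for pool in POOLS.values() for f in pool)
covers_all_fragments = fragments == list(range(1, 19))

# Two-step entries are amplification cycles; single-step entries are holds.
amplification_cycles = sum(n for n, steps in PROGRAM if len(steps) == 2)
hold_seconds = sum(n * sum(t for _, t in steps) for n, steps in PROGRAM)

print(covers_all_fragments, amplification_cycles, hold_seconds / 60)
```

Because all seven pools share this single program, one plate and one thermal cycler run cover both genes, which is the simplification the method relies on.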
With the Ion PGM instrument (Thermo Fisher Scientific Inc, Waltham, MA, USA), only weak coverage could be achieved for the PKD1 exon 1 region because it contains a GC-rich sequence. To circumvent this problem, corrective PCR was performed in which PKD1 exon 1 was re-amplified from the combination D LR-PCR product noted above, using the primers shown in S2 Table. The PCR products were purified according to the procedure described above.

Sequencing
NGS by Ion PGM. The LR-PCR and corrective PCR products were used to prepare libraries with an Ion Xpress Plus Fragment Library Kit (Thermo Fisher Scientific Inc). In each assay batch, the LR-PCR-derived libraries and corrective PCR-derived libraries from six patients were mixed at a ratio of 7:3 to yield barcoded libraries, each of which had a total nucleotide concentration of 26 pmol/L in the mixture. After treating the samples using an Ion PGM Template OT2 Kit (Thermo Fisher Scientific Inc) according to the manufacturer's protocol [21], emulsion PCR was performed using the Ion OneTouch 2 System (Thermo Fisher Scientific Inc), followed by enrichment of the beads using Ion OneTouch ES (Thermo Fisher Scientific Inc). The enriched emulsion-PCR products were prepared for sequencing using an Ion PGM Sequencing 200 Kit v2 (Thermo Fisher Scientific Inc) and then loaded onto an Ion 318 v2 chip (Thermo Fisher Scientific Inc) and sequenced with an Ion PGM.
NGS by MiSeq. In preparation for sequencing on a MiSeq sequencer (Illumina, Inc, San Diego, CA, USA), the LR-PCR products from seven patients were mixed and subjected to fragmentation processing to obtain fragments measuring approximately 300 bp; adaptor/tag sequences were added using a Nextera XT Kit (Illumina, Inc) according to the manufacturer's instructions [22]. Next, the fragments were processed to amplify clusters in a paired-end flowcell and sequenced on a MiSeq System according to the paired-end method with 150 bp read length, using MiSeq Reagent Kit v2 (Illumina, Inc).
Sanger Sequencing. In preparation for Sanger sequencing, the LR-PCR products were purified using an Agencourt AMPure XP kit (Beckman Coulter, Inc). They were then subjected to cycle sequencing with BigDye Terminator v1.1 Cycle Sequencing Kit (Thermo Fisher Scientific Inc) and purified using BigDye Xterminator Purification Kit (Thermo Fisher Scientific Inc) [23]. The sequencing products were subjected to electrophoresis and sequenced on a 3130xl Genetic Analyzer (Thermo Fisher Scientific Inc). SeqScape Software v2.6 (Thermo Fisher Scientific Inc) was used to detect mutations based on comparisons with reference sequences (PKD1: NG_008617, PKD2: NG_008604) [24].

Mutation Detection System
Dedicated analysis software was customized with Omixon Target (Omixon Ltd, Budapest, Hungary) to identify pathogenic genetic mutations specifically in PKD1 and PKD2 (Fig 1). Omixon Target is a software package for analyzing targeted sequencing data obtained from NGS platforms [25]. The software was set up to use hg19 [26] as the primary human reference sequence, and the target region was set to detect variants in all exons of PKD1 and PKD2 by extending the margins 30 bp beyond the exon-intron boundaries. This software package mapped and aligned read sequences onto the hg19 reference sequence using the Omixon-Read-Mapper software (Omixon Ltd) with FASTQ files imported from the sequencing instrument. After mapping, genomic variants were called using the Genome Analysis Toolkit (GATK; Broad Institute, Cambridge, MA, USA).
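The target-region definition, exons extended by 30 bp on each side, can be illustrated as follows. This is only a sketch of the idea, not the Omixon Target implementation: the `Interval` type, the function names, and the exon coordinates are hypothetical placeholders rather than real hg19 positions.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def pad_exon(exon, margin=30):
    """Extend an exon by `margin` bp beyond each exon-intron boundary."""
    return Interval(exon.chrom, max(1, exon.start - margin), exon.end + margin)


def in_target(chrom, pos, targets):
    """Return True if a called position falls inside any padded exon."""
    return any(t.chrom == chrom and t.start <= pos <= t.end for t in targets)


# Placeholder coordinates for illustration only (PKD1 lies on chromosome 16
# and PKD2 on chromosome 4, but these numbers are not real exon boundaries).
exons = [Interval("chr16", 1000, 1200), Interval("chr4", 5000, 5100)]
targets = [pad_exon(exon) for exon in exons]

print(in_target("chr16", 970, targets), in_target("chr16", 969, targets))
```

The 30 bp margin keeps canonical splice sites and nearby intronic substitutions inside the analyzed region while excluding deep intronic sequence.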
The called mutations were annotated using our customized database of variants in 140 healthy Japanese volunteers and databases reconstructed from the Single Nucleotide Polymorphism database (dbSNP) of the National Center for Biotechnology Information (NCBI) [27], the PKD mutation database (PKDB) [28], the predicted effects of SnpEff [29], and mutation variants reported in PubMed. The "position and base pattern match" rule was applied; that is, an annotation was applied if the position, reference, and actual mutation coincided for the variant and the annotation in each database. Other databases of conservation probability and pseudogene annotation were applied if the patterns matched at a position. The conservation probability data were sourced from the University of California, Santa Cruz (UCSC) Vertebrate Conservation Score [30] for the relevant regions of PKD1 and PKD2. This software package had a graphical user interface and ran on Windows.
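The "position and base pattern match" rule amounts to a keyed lookup. The sketch below assumes each variant-level database can be represented as a table keyed by chromosome, position, reference allele, and alternate allele, and each position-level resource (conservation, pseudogene annotation) as a table keyed by chromosome and position. The function name, the table layout, and all records shown are hypothetical and are not drawn from dbSNP, PKDB, or the study data.

```python
def annotate(variant, exact_dbs, positional_dbs):
    """Collect annotations for one called variant.

    variant:        (chrom, pos, ref, alt)
    exact_dbs:      name -> {(chrom, pos, ref, alt): annotation}
    positional_dbs: name -> {(chrom, pos): annotation}
    """
    chrom, pos, ref, alt = variant
    hits = {}
    # Variant-level databases: position, reference, and mutation must all agree.
    for name, table in exact_dbs.items():
        if (chrom, pos, ref, alt) in table:
            hits[name] = table[(chrom, pos, ref, alt)]
    # Position-level resources: a match on position alone is enough.
    for name, table in positional_dbs.items():
        if (chrom, pos) in table:
            hits[name] = table[(chrom, pos)]
    return hits


# Placeholder records, not taken from any real database.
exact_dbs = {
    "healthy_volunteers": {("chr16", 2000, "C", "T"): "polymorphism"},
    "PKDB": {("chr16", 2000, "C", "G"): "reported"},
}
positional_dbs = {"conservation": {("chr16", 2000): 0.98}}

same_allele = annotate(("chr16", 2000, "C", "T"), exact_dbs, positional_dbs)
other_allele = annotate(("chr16", 2000, "C", "A"), exact_dbs, positional_dbs)
print(same_allele, other_allele)
```

Requiring the alternate allele to match prevents a benign polymorphism from masking a different substitution at the same position.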

Mutation Classification
With reference to the classification protocol described by Rossetti et al. [11] and Audrézet et al. [31], the mutations detected in this study were categorized into four classes: definitely pathogenic, highly likely to be pathogenic, likely pathogenic, and likely neutral. Nonsense mutations, frameshift mutations, large rearrangements detected by multiplex ligation-dependent probe amplification (MLPA) analysis, typical splicing mutations previously confirmed as truncating mutations in PubMed literature, and in-frame changes of more than five amino acids were all classified as definitely pathogenic mutations. Three groups were classified as highly likely pathogenic mutations: missense mutations that had been reported previously in patients with ADPKD; atypical splicing mutations, that is, nucleotide substitutions in the vicinity of exon-intron junctions that the public cloud-based program NNSplice [32] predicted to cause splicing defects; and in-frame changes of fewer than six amino acids.
Novel missense mutations predicted as pathogenic by public cloud-based analyses including Sorting Intolerant from Tolerant (SIFT) [33], PolyPhen-2 [34], Align Grantham Variation Grantham Deviation (A-GVGD) [35] and MutationTaster [36] were classified as likely pathogenic mutations, according to the scoring method of Genkyst in a previous study [31]. Missense mutations with no predicted pathogenicity and silent mutations were classified as likely neutral variants (Fig 1).
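The four-class scheme can be summarized as a small decision function. This is a sketch of the rules as stated in the text, not the authors' software: the `Variant` fields and kind labels are illustrative, the in silico verdict is reduced to a single boolean (the Genkyst scoring details are given in [31], not here), and the "unclassified" fallback for cases the text does not cover is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    kind: str                        # e.g. "nonsense", "missense", "inframe"
    inframe_aa: int = 0              # amino acids changed, for in-frame variants
    reported_in_adpkd: bool = False  # previously reported in ADPKD patients
    predicted_damaging: bool = False # combined in silico verdict (assumed boolean)


def classify(v):
    # "typical_splice" here means a typical splicing mutation already
    # confirmed as truncating in the literature.
    if v.kind in {"nonsense", "frameshift", "large_rearrangement", "typical_splice"}:
        return "definitely pathogenic"
    if v.kind == "inframe":
        if v.inframe_aa > 5:
            return "definitely pathogenic"
        return "highly likely pathogenic"
    if v.kind == "atypical_splice":
        # Assumption: atypical splice variants without a predicted defect
        # are not assigned a class by the text.
        return "highly likely pathogenic" if v.predicted_damaging else "unclassified"
    if v.kind == "missense":
        if v.reported_in_adpkd:
            return "highly likely pathogenic"
        return "likely pathogenic" if v.predicted_damaging else "likely neutral"
    if v.kind == "silent":
        return "likely neutral"
    return "unclassified"


print(classify(Variant("inframe", inframe_aa=6)))
print(classify(Variant("missense", predicted_damaging=True)))
```

Ordering the checks from truncating changes down to silent ones mirrors the decreasing strength of evidence behind each class.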

MLPA Analysis
Samples that were found to be mutation-negative by NGS were subjected to MLPA analysis [37]. We screened for large rearrangements involving deletions and duplications in PKD1 and PKD2 using multiple PCR reactions for each exon. The SALSA MLPA PKD1 (P351), PKD2 (P352), and TSC2 (P046) kits were purchased from MRC-Holland, Inc (Amsterdam, Netherlands). After denaturation of sample DNA, a mixture of 105 MLPA probes was added to the sample. For each exon sequence of the sample DNA, two adjacent MLPA probes were hybridized and ligated into a single probe with DNA ligase. All ligated probes on each exon were amplified simultaneously using a common PCR primer pair. One PCR primer was labeled with a fluorescent tag, and the resulting PCR amplicons were visualized using capillary electrophoretic separation. Deletion of one or more exon sequences in the sample DNA was identified by a decrease in the corresponding peak height, which reflects the amount of amplified product.

Specific Amplification of PKD1 and PKD2 by LR-PCR
The LR-PCR products were sufficiently amplified as single bands visualized by 1% agarose gel electrophoresis (S1 Fig). All of the LR-PCR reactions were performed simultaneously using the same conditions and in the same plate.

Construction of a Database of Polymorphisms in Healthy Volunteers
All of the 140 healthy Japanese volunteers were imaged by ultrasonography to confirm absence of cysts. Based on the analysis of all healthy subjects using the variant identification/classification criteria, 241 nucleotide variants were detected in PKD1 and PKD2 (S3 Table). Four nonsynonymous variants (p.R3183Q was detected in three subjects, and p.A3879T, p.R1587C, and p.G1395R were each detected in one subject) were predicted to be likely pathogenic mutations and the other 237 were predicted to be likely neutral variants, according to the scoring protocol using cloud-based computing. Because these four variants were carried by six subjects, the specificity of our system was estimated to be 95.7% (134/140) (Fig 2). The 237 nonpathogenic variants were then assembled into a database comprising polymorphic variants of healthy Japanese volunteers. This database was loaded onto the PKD1/2 variant analysis system with other reconstructed databases, including PKDB, dbSNP, Pseudogene.org, UCSC conservation probability, and mutations previously collected from PubMed articles. We detected 663 mutations, including polymorphic variants, in 101 Japanese patients with ADPKD; among these mutations, we identified those predicted to be pathogenic (Fig 3).

Comparison with Sanger Sequencing and Other NGS Methods
Sanger sequencing is regarded as the gold standard for sequencing. To validate our system, Sanger sequencing was performed on DNA from 20 patients randomly selected from those we identified as having definite pathogenic mutations. Their mutations were confirmed by the Sanger method. In addition, our system could be applied using a MiSeq sequencer (another NGS platform). The pathogenic mutations identified in these 20 patients by Ion PGM sequencing and MiSeq sequencing were identical.
Fig 2. A total of 140 healthy Japanese volunteers were recruited; they were aged 35 years or older and were confirmed by ultrasonography as having no renal cysts. Four nonsynonymous variants predicted to be likely pathogenic mutations were found in six subjects, and the other 134 subjects carried only variants predicted to be likely neutral by the scoring protocol using cloud-based computing [31]. The specificity of the system was estimated to be 95.7%.

Discussion
Launched in 2011, NGS has become a popular medical tool because of its convenient benchtop method, cost-effectiveness, and suitability for targeted sequencing [40]. Benchtop NGS instruments are now used in routine genetic testing in reference laboratories. They can be used even for large genes such as PKD1 or PKD2 because of their high-throughput analytic capacity. For these reasons, a convenient ADPKD testing system based on NGS was developed for clinical use.
Six pseudogenes that share a high degree of homology with PKD1 could hinder the performance of genetic testing of PKD1. The LR-PCR primers in this study were designed to avoid misamplification of these pseudogenes. This system allowed all of the LR-PCR reactions to run simultaneously under the same conditions on the same instrument. This improvement simplifies the LR-PCR procedure.
Despite extensive efforts to differentiate pathogenic mutations from unclassified variants [11,31], it has been difficult to establish whether a genetic change is a pathogenic mutation or a polymorphism. In this study, we improved the system for distinguishing pathogenic mutations from nonpathogenic variants by following three steps: first, we combined the most up-to-date databases of PKD pathogenic mutations worldwide; second, we incorporated the data of 237 normal polymorphic variants from 140 healthy Japanese volunteers into the combined databases; third, we confirmed the results obtained from the combined databases using well-established bioinformatic tools to predict pathogenic relevance by cloud-based computing.
Our NGS system, in combination with the software and LR-PCR, achieved high sensitivity: together with MLPA analysis, the overall detection rate was 93.1% among 101 Japanese patients with ADPKD. Rossetti et al. also analyzed DNA samples from patients with ADPKD using NGS, in 2012, and reported an overall detection rate of 63% [13]. This low detection rate may be related to the insufficient performance of NGS sequencers at that time and the short read length of 75 bp used in their protocol. In our study, we used a read length of 200 bp with the latest method and sequencer. It is advantageous to use long read lengths for detecting variations such as insertion/deletion mutations. Furthermore, using the same samples, the pathogenic mutation results obtained by Sanger sequencing were found to be consistent with those obtained using our system. When our system was tested using a MiSeq sequencer applied to the same LR-PCR products, the pathogenic mutations identified were the same as those obtained using the Ion PGM. Thus, our system can work effectively with different NGS platforms. Previously it has been reported that the performance of the Ion PGM is lower than that of the MiSeq [41]. However, according to our results, even a mutation that comprised a 76 bp deletion (G4205indel; 12613_12690del77insA) could be detected using the Ion PGM; thus, its performance can be considered on par with standard sequencing methods.
Recent improvements in technology have led to NGS studies with favorable sensitivities of >90% [15][16][17]. However, the sensitivity of 99.2% reported by Tan et al. [15] was the matching rate between the variants obtained from 25 patients by their NGS method and those detected previously by the Sanger method. Further, they calculated the specificity in a manner similar to that of the reference alleles. Eisenberger et al. [16] also evaluated specificity as the matching rate of the polymorphic variants from 55 patients between NGS and the Sanger method. In contrast, our study evaluated the clinical sensitivity for pathogenic mutations using 101 patients, and we originally evaluated the clinical specificity using 140 volunteers who were confirmed as being healthy.
In a comprehensive analysis of Japanese patients with ADPKD, Kurashige et al. detected genetic mutations in PKD2 at a rate of 23.6% based on the Sanger sequencing method [42]. They reported that the frequency of mutations in PKD2 was significantly higher than in European patients with ADPKD. However, we identified genetic mutations in PKD2 at a rate of 12.8% (12/94) in Japanese patients with ADPKD recruited at two independent hospitals. The different population ratio might be attributable to population-selection bias. Renal function survival in PKD1 mutation carriers is 15-20 years shorter than in PKD2 mutation carriers [7,43]; therefore, patient selection at clinics where there are many patients with end-stage renal disease (ESRD) might lead to a higher proportion of PKD1 mutation carriers and conversely a lower proportion of PKD2 mutation carriers.
The identical mutations found in two or more unrelated patients were as follows: p.G233fs25X, p.R380C, p.L727P, p.R1672fs98X, p.L3287del, p.R3753W, and p.E4148D in the PKD1 gene and p.R322Q in the PKD2 gene. The genealogy of all the ADPKD patients was investigated and no connections between their pedigrees were identified, confirming them to be genetically unrelated. Thus, it is likely that these identical mutations arose as independent recurrent mutations, though we cannot rule out the possibility that they were derived from founder mutations originating from common remote ancestors. The p.R1672fs98X (c.5014-5015delAG) mutation in PKD1 was the most frequent recurrent mutation, observed in four unrelated patients. This mutation was reported to be a typical recurrent mutation arising through the mechanism of non-homologous end joining [31].
The detection rate of large rearrangements is reported to be about 4% in patients with ADPKD [44]. Previously, it was recommended that MLPA analysis be performed first in PKD gene mutation analysis, because the Sanger sequencing method was both complicated and expensive. However, our results indicate that this recommendation should be updated. The NGS system developed in this study is cost-effective and has high-throughput performance, and the paired-end mapping feature in the genotyping protocol makes it possible to detect large deletions located within an LR-PCR amplicon [16]. MLPA analysis would therefore be recommended only for those patients who emerge as being mutation-negative following sequencing with this NGS system.
Among the four nonsynonymous variants found in our sample of 140 healthy individuals (and initially classified as likely pathogenic mutations), variant p.R3183Q was found in three people. Thus, it is likely that this variant is a normal polymorphic variant in the population; it has been filed in dbSNP as rs79648977 and in PKDB as "indeterminate." On the other hand, variant p.G1395R has been filed as "likely pathogenic mutation" in PKDB, and the other two variants have no annotations. These variants were detected in healthy individuals (one person each) who had no renal cysts evident by ultrasonography. Because these volunteers were 35 years of age or older, the possibility of late-onset ADPKD emerging cannot be formally ruled out at this time. Ongoing follow-up of these individuals could provide further important information that will be helpful for future genetic analysis and understanding of ADPKD.
In conclusion, we developed an improved and efficient genetic testing system for ADPKD based on NGS, incorporating the easy-to-use LR-PCR approach to avoid amplification of pseudogenes and a newly developed software system to effectively identify pathogenic mutations. For Japanese patients with ADPKD, the sensitivity of this new system was 93.1%, the specificity was 95.7%, and the positive predictive value (PPV) and negative predictive value (NPV) were 94.0% and 95.0%, respectively. We anticipate that our system will facilitate genetic testing of ADPKD.
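The four summary figures follow from a single two-by-two table. The sketch below assumes that the predictive values were computed by pooling the 101 patients and the 140 healthy volunteers, which the text does not state explicitly but which reproduces the reported values; the variable names are illustrative.

```python
# Counts reported in the study.
true_positives = 94    # patients with a pathogenic mutation detected (NGS plus MLPA)
false_negatives = 7    # patients with no mutation detected (101 - 94)
true_negatives = 134   # healthy volunteers with only likely neutral variants
false_positives = 6    # healthy volunteers carrying a "likely pathogenic" variant

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)
ppv = true_positives / (true_positives + false_positives)
npv = true_negatives / (true_negatives + false_negatives)

print(f"sensitivity {sensitivity:.1%}, specificity {specificity:.1%}, "
      f"PPV {ppv:.1%}, NPV {npv:.1%}")
```

Unlike sensitivity and specificity, the predictive values depend on the ratio of patients to controls in the sample (here 101 to 140), so they would differ in a population with a different prevalence of ADPKD.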