Whole Exome Sequencing Identifies Recessive PKHD1 Mutations in a Chinese Twin Family with Caroli Disease

Background Mutations in PKHD1 cause autosomal recessive Caroli disease, which is a rare congenital disorder involving cystic dilatation of the intrahepatic bile ducts. However, the mutational spectrum of PKHD1 and the phenotype-genotype correlations have not yet been fully established. Methods Whole exome sequencing (WES) was performed on a sample from one twin with Caroli disease from a Chinese family from Shandong province. Routine Sanger sequencing was used to validate the WES findings and to carry out segregation studies. We also describe the genotype-phenotype correlation of the PKHD1 mutations in these twins. Results A combination of WES and Sanger sequencing revealed the genetic defect to be a novel compound heterozygous genotype in PKHD1, comprising the missense mutation c.2507T>C, predicted to cause a valine to alanine substitution at codon 836 (p.Val836Ala), and the nonsense mutation c.2341C>T, predicted to change an arginine to a stop codon at codon 781 (p.Arg781*). This compound heterozygous genotype co-segregates with the Caroli disease-affected pedigree members, but is absent in 200 normal chromosomes. Conclusions Our findings indicate that exome sequencing can be useful in the diagnosis of Caroli disease patients and associate a compound heterozygous genotype in PKHD1 with Caroli disease, which further increases our understanding of the mutation spectrum of PKHD1 in association with Caroli disease.


Introduction
Caroli disease is a rare and complex autosomal congenital disorder that presents as cystic dilatation of the intrahepatic bile ducts [1]. It most commonly manifests as jaundice, cirrhosis, and dilatation of renal tubules, as well as renal impairment in children with associated multicystic or polycystic kidney disease in the second to third decades of life [2,3]. It can also lead to persistent recurrent cholangitis caused by cholestasis, and, if left untreated, patients will eventually develop biliary cirrhosis, portal hypertension [4], and sometimes bile duct carcinoma [5]. The main mode of Caroli disease inheritance is an autosomal recessive form [6]. Since the causative mutation for autosomal recessive polycystic kidney disease (ARPKD), a major cause of renal and liver-related morbidity and mortality in neonates and infants [7], was identified, several cases of ARPKD with liver manifestations, including Caroli disease, have been reported. Mutations in polycystic kidney and hepatic disease gene 1 (PKHD1) are responsible for Caroli disease [8], and many causative mutations are known [9,10,11].
PKHD1 is located on chromosome 6p12.3-6p12.2 and contains a 16.2 kb coding sequence divided into 66 exons, separated by introns varying in size up to 472 kb [12,13]. It encodes fibrocystin/polyductin (FPC), a membrane-associated receptor-like protein [14,15] that is predominantly expressed in the apical domain of renal tubule epithelial cells, and may play an important role in collecting duct and biliary differentiation [16]. Recently, advances in next generation sequencing technologies have made whole exome sequencing (WES) a technically feasible and powerful tool for identifying pathogenic mutations in various Mendelian disorders [17], including rare diseases [18,19]. This approach is especially well suited to detecting mutations in PKHD1, which comprises 66 exons, in families with Caroli disease.
To date, the function of PKHD1 is still unclear, and we know little about the relationship between the PKHD1 genotype and clinical phenotype in Caroli disease. In the present study, therefore, we investigated a Chinese twin family with Caroli disease to detect PKHD1 mutations using WES, and to evaluate the clinical phenotype correlation associated with these mutations.

Subjects
The subjects were from a dizygotic male twin family in Shandong province, China. The proband (Patient A, born as the second twin), a 10-year-old boy, the first twin (Patient B), and their parents underwent detailed clinical and ultrasonographical examinations. Both twins were clinically diagnosed with Caroli disease.
The study protocol was approved by the Human Ethics Committee of the Affiliated Hospital of Medical College, Qingdao University (Shandong, China) and is compliant with the Code of Ethics of the World Medical Association and informed consent was obtained. The parents of the subjects in this manuscript have given written informed consent (as outlined in PLOS consent form) to publish these case details. Blood samples were collected from the twins and their parents. DNA was extracted with a standard phenol-chloroform extraction procedure, consisting of the lysis of white blood cells, followed by protein digestion, extraction of the DNA with phenol-chloroform, and precipitation of DNA with isopropanol.

Whole exome sequencing
WES was carried out on the proband using human exome capture, which was performed according to the protocol from Illumina's TruSeq Exome Enrichment Guide (SureSelectXT Target Enrichment System for Illumina Paired-End Sequencing Library, Agilent). The Agilent Human All Exon 50 Mb Exome Enrichment kit was used as the exome enrichment probe set. Genomic DNA libraries were prepared according to the manufacturer's instructions (Illumina, San Diego, CA). Briefly, 5 µg of genomic DNA in 80 µl of EB buffer was fragmented in a Bioruptor (Diagenode) to 100-500 bp fragments. DNA fragments of 150-250 bp were recovered by gel extraction, then end repair and size selection were performed, using T4 DNA polymerase and Klenow polymerase to cleave the 3' ends. An 'A' base was added to the 3' end using Klenow 3' to 5' exo minus, then the DNA fragments were ligated to the Illumina multi-PE-adaptor. The DNA product was subsequently amplified by 12 cycles of PCR after mixing it with 1 µl of Illumina multi-PE primer #1 (25 µM), 1 µl of Illumina multi-PE primer #2 (0.5 µM), and 1 µl of Illumina index primer (25 µM).
Captured DNA libraries were sequenced with Illumina HiSeq 2000, which yielded 200 (2×100) bp from the final library fragments using V2 reagent. Base calling was performed by 1.8 software (Illumina; data after 22nd June, 2011). The sequence reads obtained were aligned to the human genome reference sequence (NCBI36/hg18), and variations were identified using the software tool supplied with the instrument. In total, we obtained 62.09 M high-quality reads, of which 44.85 M were mapped to the reference genome; the mean depth of the target region was 114.83×. The proportion of targeted bases covered at a depth of at least 50× was 75.81%, at least 20× was 82.23%, at least 10× was 89.04%, at least 4× was 93.56%, and at least 1× was 96.09%. Based on these general statistics, we performed further analysis. All identified PKHD1 variations were annotated with information used to identify candidate mutations: the depth of coverage, conservation across species, percentage of reads with the variant, novelty, potential splice site alteration, and likelihood that a variation is deleterious to the protein. This information was extracted from reference data sets or computed in bulk for all variations.
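The coverage statistics reported above can be illustrated with a minimal Python sketch. It assumes a per-base depth list for the target region is available; the function name `coverage_summary` and the toy depth values are hypothetical and are not the study's data or pipeline. Only the read counts used for the mapping rate are taken from the text.

```python
from typing import Dict, Iterable, Sequence


def coverage_summary(depths: Sequence[int],
                     thresholds: Iterable[int] = (1, 4, 10, 20, 50)) -> Dict[str, float]:
    """Return the mean depth and the percentage of target bases at >= each threshold."""
    n = len(depths)
    if n == 0:
        raise ValueError("no target bases supplied")
    summary = {"mean_depth": sum(depths) / n}
    for t in thresholds:
        covered = sum(1 for d in depths if d >= t)
        summary[f"pct_ge_{t}x"] = 100.0 * covered / n
    return summary


# Toy per-base depths (hypothetical values for illustration only).
toy_depths = [0, 3, 12, 25, 60, 110, 150, 200]
toy_summary = coverage_summary(toy_depths)

# Mapping rate from the read counts reported in the text:
# 44.85 M mapped reads out of 62.09 M high-quality reads.
mapping_rate = 100.0 * 44.85 / 62.09

print(toy_summary)
print(f"mapping rate: {mapping_rate:.2f}%")
```

With real data, the depth list would typically come from a per-base depth report restricted to the capture targets; the thresholds mirror the 1×, 4×, 10×, 20×, and 50× levels quoted in the text.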

Mutation analysis and confirmation
Variants of PKHD1 identified by exome sequencing were confirmed using Sanger sequencing. Two fragments covering the coding sequence and the flanking intronic sequence of PKHD1 (MIM# 606702, GenBank NM_138694.3) were amplified using PKHD1 primer pairs for exon 23 (Forward: 5'-CTCCCTTACTGAGTTTCC-3' and Reverse: 5'-AACAATAAGTCCCTTTCC-3') and exon 24 (Forward: 5'-GATGAAACTCTGTAAGGTGGAT-3' and Reverse: 5'-GGAAGGGAGATGTTGGGT-3'). Identical amplification conditions were used for both primer pairs in a total volume of 25 µl containing 250 nM dNTPs, 100 ng of template DNA, 0.5 µM of each primer, and 1.25 U AmpliTaq Gold DNA polymerase in 1× reaction buffer (10 mM Tris HCl, pH 8.3, 50 mM KCl, 2.5 mM MgCl2). PCR amplifications were performed with an initial denaturing step at 94°C for 5 min, then 35 cycles of 94°C for 30 s, 59°C or 63°C (for exons 23 and 24, respectively) for 60 s, and 72°C for 30 s, followed by 10 min of final extension at 72°C. Amplified PCR products were purified and sequenced using the appropriate PCR primers and the BigDye Terminator Cycle Sequencing kit (Applied Biosystems, Foster City, CA), and run on an ABI 3730XL automated sequencer (Applied Biosystems) for mutational analysis.
Denaturing high-performance liquid chromatography (DHPLC) screening of the PKHD1 mutation

Clinical phenotype
The proband (Patient A) was healthy until the age of 1 year, when he was found to have asymptomatic splenomegaly. At 5 years of age he presented with anorexia and an upper abdominal mass and was diagnosed with Caroli disease. After 2 years, mild jaundice was apparent on the systemic skin, and the abdominal mass had increased, causing upper abdominal pain. This was accompanied by liver cirrhosis, hypersplenism, severe anemia, and a polycystic kidney. The first twin (Patient B) had no obvious clinical symptoms except for intrahepatic bile duct dilatation. Both parents were negative for the presence of liver and renal anomalies as shown by an ultrasonography, were non-consanguineous, and had no family history of genetic diseases.

Mutation analysis
As PKHD1 is a long gene composed of 66 exons, we performed WES and applied several filtering steps: variants present in dbSNP and the 1000 Genomes database were excluded as unlikely to be pathogenic, and the remaining nucleotide changes were selected if they were predicted to have a damaging effect on the PKHD1 protein by SIFT (Sorting Intolerant From Tolerant) and PolyPhen-2. The depths of coverage for the c.2341C>T and c.2507T>C mutations in exons 23 and 24 of PKHD1 were 77× and 105×, respectively, which suggests high reliability of sequencing. Sanger sequencing of the proband confirmed a compound heterozygous genotype consisting of a novel missense variant, c.2507T>C (p.Val836Ala, SIFT score 0.02, PolyPhen-2 score 0.998), predicted to cause a valine to alanine substitution at codon 836 in exon 24 (accession no. NM_138694.3; first nucleotide of the initiation codon numbered 1), and a known nonsense mutation, c.2341C>T (p.Arg781*), predicted to change an arginine to a stop codon at codon 781 in exon 23 (Figures 1, 2). Mutation p.Arg781* has previously been described in Caucasian-American patients and in patients from the Netherlands, France, Denmark, Germany, Portugal, and Belgium.
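The filtering logic described above can be sketched in Python as follows. This is an illustrative outline only, not the software used in the study. The `Variant` class, the cutoffs `SIFT_MAX` and `POLYPHEN_MIN`, and the placeholder variants are assumptions introduced for the example; only the two PKHD1 variants and their SIFT and PolyPhen-2 scores come from the text. Database membership is reduced to boolean flags, and the known nonsense variant is assumed, for illustration, not to be flagged by the database filter.

```python
from dataclasses import dataclass
from typing import List, Optional

SIFT_MAX = 0.05       # assumed cutoff: SIFT scores at or below this count as damaging
POLYPHEN_MIN = 0.85   # assumed cutoff: PolyPhen-2 scores at or above this count as damaging
TRUNCATING = {"nonsense", "frameshift", "splice_site"}


@dataclass
class Variant:
    hgvs: str
    consequence: str
    in_dbsnp: bool
    in_1000g: bool
    sift: Optional[float] = None
    polyphen2: Optional[float] = None


def is_candidate(v: Variant) -> bool:
    """Keep variants absent from both databases and predicted to damage the protein."""
    if v.in_dbsnp or v.in_1000g:
        return False
    if v.consequence in TRUNCATING:
        # Truncating changes are treated as damaging without a prediction score.
        return True
    if v.consequence == "missense":
        return (v.sift is not None and v.sift <= SIFT_MAX
                and v.polyphen2 is not None and v.polyphen2 >= POLYPHEN_MIN)
    return False


def filter_candidates(variants: List[Variant]) -> List[Variant]:
    return [v for v in variants if is_candidate(v)]


variants = [
    # Scores for c.2507T>C are those reported in the text.
    Variant("c.2507T>C", "missense", False, False, sift=0.02, polyphen2=0.998),
    # Assumption for illustration: not flagged by the database filter.
    Variant("c.2341C>T", "nonsense", False, False),
    # Hypothetical placeholder variants, not from the study.
    Variant("hypothetical_common_missense", "missense", True, True, sift=0.01, polyphen2=0.99),
    Variant("hypothetical_benign_missense", "missense", False, False, sift=0.40, polyphen2=0.10),
    Variant("hypothetical_synonymous", "synonymous", False, False),
]

candidates = [v.hgvs for v in filter_candidates(variants)]
print(candidates)
```

Treating truncating variants as damaging by consequence reflects the fact that SIFT and PolyPhen-2 score amino acid substitutions rather than premature stop codons.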
Co-segregation analysis of this pedigree revealed that the first twin also carries the same compound heterozygous genotype, and that each parent is a heterozygous carrier of a single mutation. The father harbors the p.Val836Ala mutant, while the mother carries the p.Arg781* variant. These two different variants, one missense and one nonsense, were each inherited from a different parent, so the compound heterozygous genotype co-segregates with Caroli disease in both twins.
Comparing the WES findings with the different phenotypes of the twins, we examined whether any variants in genes related to Caroli disease, hepatic function, or kidney function (such as NPHP3 and PKD1) could act as a genetic modifier for Caroli disease. However, no such variants were identified.
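The segregation pattern described above can be expressed as a small check in Python. This is a minimal sketch under stated assumptions: each family member is represented simply as the set of PKHD1 variants they carry in the heterozygous state, and the function name `is_compound_het` is hypothetical. The carrier sets follow the pedigree described in the text.

```python
from typing import Set


def is_compound_het(child: Set[str], father: Set[str], mother: Set[str]) -> bool:
    """True if the child carries at least one variant seen only in the father
    and at least one seen only in the mother (variants in trans)."""
    paternal_only = (child & father) - mother
    maternal_only = (child & mother) - father
    return bool(paternal_only) and bool(maternal_only)


# Carrier status as described in the text.
father = {"p.Val836Ala"}
mother = {"p.Arg781*"}
twin_a = {"p.Val836Ala", "p.Arg781*"}  # proband
twin_b = {"p.Val836Ala", "p.Arg781*"}  # first twin

print(is_compound_het(twin_a, father, mother))
print(is_compound_het(twin_b, father, mother))
```

Requiring one variant from each parent distinguishes a compound heterozygote from a child who inherits two variants on the same parental chromosome, which would leave one functional copy of the gene.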

DHPLC screening of the PKHD1 mutation
Analysis of 200 normal chromosomes from 100 healthy controls of Chinese Han origin by DHPLC found no evidence of the novel missense variant, c.2507T>C (p.Val836Ala), or the p.Arg781* variant (Figure 3). This suggests that the compound heterozygous genotype observed in this family is causative of the Caroli disease phenotype.

Bioinformatic analysis of PKHD1 mutation
We obtained PKHD1 family protein sequences from NCBI and UCSC websites and used Vector NTI software to perform multiple-sequence alignments in various animal species, including Mus musculus, Rattus norvegicus, Pan paniscus, Xenopus (Silurana) tropicalis, Falco cherrug, Zonotrichia albicollis and Homo sapiens (Figure 4). The p.Val836Ala variant was found to be located in a highly conserved region of the PKHD1 protein.

Discussion
Caroli disease, which has an estimated incidence of approximately 1 in 100,000 newborns, is a complex disorder of the intrahepatic bile ducts presenting with multiple saccular segmental and cystic dilatations [20]. When progressive, it can cause recurrent cholangitis, jaundice, the accumulation of intrahepatic stones, portal hypertension, liver failure, and even cholangiocarcinoma [4,21]. The pathogenesis of Caroli disease appears to be related to dilatations and malformation of a ductal plate, which are either diffuse or confined to only one part of the liver [22]. It can be divided into the pure form of Caroli disease and Caroli's syndrome, which presents with repeated bouts of cholangitis resulting from bile stasis, hepatolithiasis, gall bladder stones, and symptoms associated with hepatic fibrosis such as portal hypertension and poor hepatic reserve. The disease spectrum of clinical phenotypes caused by mutations in PKHD1 is relatively complex, ranging from perinatally-fatal ARPKD to Congenital Hepatic Fibrosis (CHF)-predominant presentations in adulthood with mild or no apparent kidney disease [23].
In 2002, Ward and colleagues first screened PKHD1 mutations in 14 probands with ARPKD (some cases of ARPKD with mainly liver manifestations, including Caroli disease diagnosed in adulthood), and revealed that eight of the affected individuals were compound heterozygotes [16]. Since then, several large-scale mutation detection studies have focused on the longest PKHD1 ORF (open reading frame) [24,25,26]. Mera et al. used direct sequencing to detect PKHD1 mutations in a cohort of 90 North American ARPKD/CHF patients, and identified 77 PKHD1 sequence variants, which supported previously published genotype-phenotype correlation findings [23]. Sandro et al. used DHPLC to identify a compound heterozygous genotype (c.10364delC/p.Ile3468Val) of PKHD1 in a 36-year-old female with Caroli disease [27]. To date, at least 300 different pathogenic mutations have been found throughout most of the coding exons of the human PKHD1 gene (http://www.humgen.rwth-aachen.de/). Approximately 60% of these are truncating mutations and 40% are missense mutations, suggesting that the recessive form of this disease results from loss of function of the normal protein to varying degrees [28].
Although severely affected ARPKD and CHF patients account for most known PKHD1 mutations, and patients with Caroli disease have a low rate of PKHD1 mutation detection [27], the genotype-phenotype relationship associated with PKHD1 mutations is relatively complex. ARPKD patients carrying two truncating mutations have a severe disease phenotype resulting in perinatal death, while other combinations of mutations, such as splicing and missense mutations, have a more variable but usually less severe phenotype [29]. Meral et al. analyzed the clinical, molecular, PKHD1 mutation, and imaging data of 73 patients with ARPKD and CHF (including 51 with Caroli syndrome). Although biallelic PKHD1 mutations were identified in only 43 families and a single heterozygous mutation in 20 families, the authors concluded that kidney and liver disease are independent, and that variability in severity does not reflect the type of PKHD1 mutation [11].
In this work, we identified a causative compound heterozygous PKHD1 genotype in a Chinese twin family with Caroli disease. Because of the large size of PKHD1, DHPLC or single-strand conformation polymorphism (SSCP) screening techniques have been used in all but one of the published studies; variants detected by screening were further characterized by targeted direct sequencing. However, DHPLC and SSCP have their own shortcomings, and more recently next generation sequencing technologies have been employed to accelerate the discovery of the genetic causes of human diseases. WES is a powerful tool for investigating the genetic underpinnings of human disease [30]. As Mendelian pathogenic mutations are frequently exonic, exome sequencing is an efficient method for examining many coding regions simultaneously, and has rapidly proven to be an important tool in genetic research [31]. It is especially well suited to large genes such as PKHD1, as it avoids the time-consuming and laborious nature of traditional PCR-based screening and has relatively lower costs. Therefore, we performed WES to screen for PKHD1 mutations.
We found a novel compound heterozygous genotype of PKHD1 (p.Val836Ala/p.Arg781*) in the twins of a Chinese family with Caroli disease. Their father harbors the p.Val836Ala mutant, while their mother carries the p.Arg781* variant. The p.Val836Ala mutation results in an amino acid change from valine to alanine; as both amino acids are neutral, the potential impairment of protein function is unclear, although the amino acid at position 836 is highly conserved between species. The p.Arg781* variation leads to a truncated PKHD1 protein, which lacks the succeeding seven IPT domains and two G8 domains, thus losing the functionality of the wild-type protein. Zerres et al. first reported p.Arg781* in ARPKD [7], showing that the same mutation can cause different clinical phenotypes.
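The correspondence between the cDNA positions and the predicted protein changes can be checked with simple codon arithmetic, sketched below in Python. The sketch assumes the standard genetic code and the numbering convention stated in the text (first nucleotide of the initiation codon numbered 1); the helper names `locate` and `substitute` are hypothetical. The reference codons themselves are not given in the text, so the code enumerates all valine and arginine codons rather than assuming a particular one.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def locate(c_pos: int) -> tuple:
    """Map a coding-sequence position (A of ATG = 1) to (codon number, position in codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitute(codon: str, pos_in_codon: int, ref: str, alt: str) -> str:
    """Replace the reference base at a 1-based codon position with the alternate base."""
    if codon[pos_in_codon - 1] != ref:
        raise ValueError("reference base does not match codon")
    return codon[:pos_in_codon - 1] + alt + codon[pos_in_codon:]


# c.2507T>C: codon 836, second base. Every valine codon (GTN) becomes GCN, alanine.
val_results = {CODON_TABLE[substitute(c, 2, "T", "C")]
               for c, aa in CODON_TABLE.items() if aa == "V"}

# c.2341C>T: codon 781, first base. Among arginine codons starting with C,
# only one yields a stop codon after C>T.
arg_to_stop = [c for c, aa in CODON_TABLE.items()
               if aa == "R" and c[0] == "C"
               and CODON_TABLE[substitute(c, 1, "C", "T")] == "*"]

print(locate(2507), val_results)
print(locate(2341), arg_to_stop)
```

Under these assumptions, position 2507 falls on the second base of codon 836 and position 2341 on the first base of codon 781, consistent with p.Val836Ala and p.Arg781*.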
As seen in the present family, the proband's clinical phenotype of Caroli syndrome included liver cirrhosis, hypersplenism, severe anemia, and a polycystic kidney, while the first twin suffered from pure Caroli disease. Therefore, the genotype-phenotype relationship associated with PKHD1 mutations is complex. Sgro et al. [9] reported the compound heterozygous genotype PKHD1 IVS55+1G>A/p.Trp2690Arg in patients detected prenatally with Caroli disease; IVS55+1G>A was inherited from the father and the missense mutation p.Trp2690Arg from the mother, which is consistent with a recessive pattern of inheritance. Our findings also lay the basis for a more accurate and rapid prenatal diagnosis of Caroli disease in its early stages.
In conclusion, we combined WES with Sanger sequencing to report a novel compound heterozygous genotype of PKHD1 causative of Caroli disease in a Chinese twin family. The disease phenotype differs markedly between the twins. Our study expands the known genotype-phenotype correlations of PKHD1, which might be useful for understanding the pathophysiological mechanisms of Caroli disease.