Genomic Characteristics of Genetic Creutzfeldt-Jakob Disease Patients with V180I Mutation and Associations with Other Neurodegenerative Disorders

Inherited prion diseases (IPDs), including genetic Creutzfeldt-Jakob disease (gCJD), account for 10–15% of cases of prion diseases and are associated with several pathogenic mutations, including P102L, V180I, and E200K, in the prion protein gene (PRNP). The valine to isoleucine substitution at codon 180 (V180I) of PRNP is the most common pathogenic mutation causing gCJD in East Asian patients. In this study, we conducted follow-up analyses to identify candidate factors and their associations with disease onset. Whole-genome sequencing (WGS) data of five gCJD patients with V180I mutation and 145 healthy individuals were used to identify genomic differences. A total of 18,648,850 candidate variants were observed only in the patient group, and 29 of them were validated as variants. Four of these validated variants were nonsense mutations, and six were observed in genes directly or indirectly related to neurodegenerative disorders (NDs), such as LPA, LRRK2, and FGF20. More than half of the validated variants were categorized in the Gene Ontology (GO) terms of binding and/or catalytic activity. Moreover, we found differential genome variants among the gCJD patients with V180I mutation, including one patient who uniquely survived 10 years after diagnosis of the disease. Elucidation of the relationships between gCJD and Alzheimer’s disease or Parkinson’s disease at the genomic level will facilitate further advances in our understanding of the specific mechanisms mediating the pathogenesis of NDs and of gold standard therapies for NDs.


Introduction
All five gCJD patients with V180I mutation were positive or weakly positive for 14-3-3 protein in the cerebrospinal fluid (CSF) (Table 2). In three of the patients, total tau (t-tau) protein titers were greater than the recommended cut-off level (1,000 pg/mL) [38]. The phospho-tau/total tau ratio (p/t tau ratio) has been recommended as a second screening tau test, with a recommended p/t tau ratio below 4 × 10⁻² [38]. The p/t tau ratio in the CSF of the four patients ranged from 2.209 × 10⁻² to 10.093 × 10⁻². Therefore, according to the recommended tau protein detection criteria, patients no. 1 and 3 were within the range established for CJD.
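The tau screening rule described above can be written as a short check. This is a minimal sketch, assuming that the two cut-offs quoted from ref. [38] are applied together and that p-tau and t-tau are expressed in the same units; the function name and the example values are hypothetical and are not patient data from Table 2.

```python
# Cut-offs as quoted in the text from ref. [38].
T_TAU_CUTOFF_PG_ML = 1000.0   # total tau must exceed this level
PT_RATIO_CUTOFF = 4e-2        # p-tau / t-tau ratio must be below this level


def within_cjd_range(t_tau_pg_ml: float, p_tau_pg_ml: float) -> bool:
    """Return True when both tau criteria are met (assumed to be combined with AND)."""
    if t_tau_pg_ml <= 0:
        raise ValueError("t-tau must be positive to compute the p/t tau ratio")
    ratio = p_tau_pg_ml / t_tau_pg_ml
    return t_tau_pg_ml > T_TAU_CUTOFF_PG_ML and ratio < PT_RATIO_CUTOFF


# Hypothetical readings, for illustration only.
example_positive = within_cjd_range(t_tau_pg_ml=2000.0, p_tau_pg_ml=50.0)   # ratio 0.025
example_negative = within_cjd_range(t_tau_pg_ml=2000.0, p_tau_pg_ml=200.0)  # ratio 0.100
```

Under this reading, a sample with elevated t-tau but a p/t tau ratio above 4 × 10⁻² would not fall within the CJD range.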
Because the CSF of patient no. 4 was not stored, the RT-QuIC assay with substrate replacement was performed using CSF samples from only four patients, and the sensitivity was 75% (Fig 1).
Abbreviation: RT-QuIC, real-time quaking-induced conversion. 14-3-3 protein detection, tau protein detection, and RT-QuIC with substrate replacement were performed using CSF. Scores within the range of CJD are displayed in bold.

Whole genome data analysis
Raw data for whole genome information are shown in S1 and S2 Tables. A total of 72–114 Gbp of sequence with Q30 was generated. The average read length was 126 bp, and about 51–92 Gbp were aligned to the reference genome (hg19) with a mapping quality of Q20. The genomic variants in the five gCJD patients with V180I mutation were compared with those of 134 of 135 healthy individuals (see Materials and Methods, S3 Table, and S1 File), and 125 of 18,648,850 variants were selected for validation (Fig 2). Of these 125 candidates, 76 were observed in all five patients, one was observed in four patients, two were observed in two patients, and 46 were observed in only one patient (S4 Table). To eliminate false positives, validation of the 125 candidate variants was performed using Fluidigm SNPtype Assays and Sanger sequencing. Of the 125 candidates, 29 were confirmed as variants, 80 were false positives, five were validated as variants but were also observed in healthy individuals, and the remaining 11 failed validation because of low gDNA quantity from patient no. 5 (Tables 3 and 4 and S4 Table).
Fig 1. Results of RT-QuIC analysis. ThT fluorescence was measured in relative fluorescence units (rfu) with saturation occurring at 65,000 for every 42 min. The RT-QuIC reactions were performed using a 10⁻³ dilution of brain homogenate from an sCJD patient with 129MM type (NHBX0/0001), a 10⁻⁵ dilution of CJD negative control brain homogenate (NHBZ0/0005), and 15 μL of a 10⁻¹ dilution of CSF from 4 gCJD cases with V180I (patients nos. 1, 2, 3, and 5) and 1 sCJD case with 129MM. Each point represents the mean of 4 replicate rfu readings.
Our analysis showed that there were 29 validated variants present in the patients in this study, with one patient harboring as many as 10 variants (patient no. 1). Each validated variant was observed in a single patient. None of the 29 variants were observed in patient no. 3. Variants with known functions with biological terms extracted from gene ontology (GO) analysis of genes harboring each variant are listed in Table 3. Four nonsense candidates were validated as variants, two of which were observed in patient no. 5, and the remaining nonsense variants were observed in patients no. 1 and 2. The other validated variants were missense variants.
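The candidate counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in the text; the dictionary keys are descriptive labels chosen for illustration.

```python
# Distribution of the 125 candidates by the number of patients carrying them.
candidates_by_carrier_count = {
    "all five patients": 76,
    "four patients": 1,
    "two patients": 2,
    "one patient": 46,
}

# Outcome of the Fluidigm / Sanger validation for the same 125 candidates.
validation_outcomes = {
    "confirmed, patient-only": 29,
    "false positive": 80,
    "validated but also in healthy individuals": 5,
    "failed (low gDNA from patient no. 5)": 11,
}

n_by_carrier = sum(candidates_by_carrier_count.values())
n_by_outcome = sum(validation_outcomes.values())

# Fraction of candidates that survived validation as patient-only variants.
confirmed_fraction = validation_outcomes["confirmed, patient-only"] / n_by_outcome

print(f"candidates by carrier count: {n_by_carrier}")
print(f"candidates by validation outcome: {n_by_outcome}")
print(f"confirmed fraction: {confirmed_fraction:.3f}")
```

Both tallies sum to the same 125 candidates, and fewer than a quarter of them were confirmed as patient-only variants, which illustrates why the validation step was needed.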

Genes harboring validated variants and analysis of biological interactions
Understanding the biological interactions among genes and diseases facilitates the identification of disease mechanisms and target proteins for new drugs. Based on such approaches, we can conveniently sort candidate genes that should be preferentially analyzed before applying other approaches using reverse genetics tools, such as methods for target gene disruption and gene silencing. Thus, genes harboring the 29 validated variants were used as seeds to analyze the biological interaction between genes and NDs, such as PrDs, AD, and PD.
The 29 genes were not found to be directly related to PrDs; however, interactions with other NDs, such as AD and PD, were detected. In addition, indirect relationships with PrDs were observed. Thus, using results from other published studies, candidate genes related to disease onset or susceptibility for CJD with the V180I mutation could be inferred.

Discussion
Characteristics of the patients
14-3-3 protein has been used as a biomarker for CJD, and all five patients in this study were positive for 14-3-3 protein in the CSF. However, positivity for 14-3-3 protein is not always observed in gCJD patients with V180I mutation, as reported previously [37]. Alternatively, to distinguish patients with CJD from those with other diseases having elevated t-tau protein in the CSF, studies have been performed to determine the applicability of the p/t tau protein ratio [38]. In cases of CJD, the p/t tau ratio in the CSF has been shown to be significantly lower than that in patients with other diseases.
Although the diagnostic specificity (100%) of the standard RT-QuIC technique has been proven, the sensitivity of the technique in cases with the V180I mutation has been found to be somewhat lower than that in other cases of CJD [34]. In this study, the sensitivity was 75% by RT-QuIC with substrate replacement, which has been used to improve the sensitivity of analysis while maintaining the high degree of specificity of standard RT-QuIC. The diagnostic criteria for familial CJD (fCJD or gCJD) include definite or probable CJD plus definite or probable CJD in a first-degree relative and/or a neuropsychiatric disorder plus a disease-specific PRNP gene mutation (http://www.cdc.gov/prions/cjd/diagnostic-criteria.html). Based on these diagnostic criteria, all five of the patients in this study could be accurately diagnosed with gCJD because of their neurological symptoms and the V180I mutation, which were further supported by biochemical analyses (Tables 1 and 2).
All guardians of the five patients informed us that the patients did not have family histories of CJD. Although gCJD with V180I is a genetic form, it is commonly observed in patients whose families show no neurodegenerative symptoms. In a previous study [34], 11 of 186 gCJD patients with V180I were confirmed to have a family history of disease. Three of these 11 patients had family histories of CJD, and the remaining patients had family histories of dementia. Although gCJD with V180I is not frequently reported in family histories, the possibility that V180I is related to the pathogenicity of the disorder cannot be excluded because the symptoms of gCJD with the V180I mutation have been shown to be unique compared with other types of CJD and with AD [34,36,37]. These distinct symptoms led us to postulate that this unique pattern of symptoms and the incomplete penetrance were caused by the influence of unknown genes (loci). Thus, we analyzed genomic differences between gCJD patients with V180I mutation and healthy individuals.

Whole genome data analysis
Interestingly, in this study, each of the 29 validated variants was observed in only one patient. Therefore, we concluded that it was difficult to identify gene variants associated with the disease mechanism of CJD and found in more than two patients. Accordingly, we focused on the 29 variants observed in single patients and analyzed the biological interactions among the genes harboring these variants by GO analysis [39]. Eleven genes were categorized in the GO term of binding (GO:0005488), and 10 were categorized in the term of catalytic activity (GO:0003824). All genes harboring nonsense mutations (GBP4, POLR1C, SMC5, and TWF1) were classified into at least one of these two GO terms. Although GO analysis provided specified categories of the core biological functions, these enrichment data could not be used to easily predict disease relationships among genes harboring validated variants. Four newly identified nonsense variants were detected. Because these variants were observed here for the first time, no functional data for the nonsense variants were available. However, it may be possible to use reported nonsense variants in the same gene to infer the roles of the newly identified nonsense variants. As an example, SMC5, which has a role in promoting chromatid homologous recombination, harbored the variant chr9_72912888 (G to T) in patient no. 5. The SMC5 gene encodes a protein of 1,101 amino acids, and the variant above results in the E354X mutation, which introduces a stop codon. Although this variant was reported here for the first time, similar nonsense mutations in tumor samples allow us to infer the phenotype [40]. Additionally, GBP4, an interferon signaling-related protein, harbored the newly identified variant chr1_89657064 (C to A) in the same patient, resulting in the E266X mutation.
Sixteen stop gain mutations have been reported, as listed in the Ensembl database (GRCh37, release  Changes in other nucleotides causing nonsense mutations in these four genes have not been reported to have any significant relationships with NDs. The four newly identified variants causing nonsense mutations must be analyzed further to determine their relationships with prion diseases and other NDs using reverse genetic tools, such as gene knockout or gene interference technology.
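To illustrate how a single G-to-T change can turn a glutamate codon into a stop codon, as in E354X, the sketch below applies the substitution to both glutamate codons of the standard genetic code. It assumes that the change falls on the first base of the codon as read on the coding strand; the text does not state the strand or codon position, so this is an illustration rather than a reconstruction of the actual SMC5 sequence. The function name is hypothetical.

```python
# Standard genetic code: glutamate (E) codons and stop codons, written as DNA.
GLUTAMATE_CODONS = ("GAA", "GAG")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def substitute_first_base_g_to_t(codon: str) -> str:
    """Apply a G>T substitution at the first codon position (assumed position)."""
    if len(codon) != 3 or codon[0] != "G":
        raise ValueError(f"expected a 3-base codon starting with G, got {codon!r}")
    return "T" + codon[1:]


mutated = {codon: substitute_first_base_g_to_t(codon) for codon in GLUTAMATE_CODONS}
all_become_stop = all(new_codon in STOP_CODONS for new_codon in mutated.values())

# E354X places the stop at residue 354 of a 1,101-residue protein, so at most
# 353 residues would precede it if a truncated product were made at all.
PROTEIN_LENGTH = 1101
STOP_POSITION = 354
retained_fraction = (STOP_POSITION - 1) / PROTEIN_LENGTH
```

Under this assumption, either glutamate codon becomes a stop codon, and the stop falls within roughly the first third of the SMC5 coding sequence.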

Genes harboring validated variants and analysis of biological interactions
LPA has been shown to be associated with AD (Figs 3 and 4 and S1A Fig), and we observed a newly identified LPA variant (chr6_161015089, R1177Q) in patient no. 4 [41,42]. Although this mutation is not a form related to the nonsense phenotype, PolyPhen-2 and Sorting Intolerant From Tolerant (SIFT) scores suggested that the mutation caused an abnormal functional change in LPA in patient no. 4 (Table 4).
The single-nucleotide polymorphism (SNP) rs189305274, which was observed in patient no. 4, is a variant in ACO1. The abnormal functional change in this target resulting from rs189305274 was predicted to be damaging by PolyPhen-2 and SIFT scores (Table 4). ACO1 was not shown to be directly associated with NDs. However, indirect interactions with NDs, including PrDs, were detected. ACO1 has been reported to inhibit or regulate amyloid beta (A4) precursor protein (APP) [43,44], which is involved in the induction of AD and acceleration of PrDs [27,45,46]. Thus, mutations that cause functional changes in ACO1 may delay PrDs progression by regulating APP. These results may explain why patient no. 4 is still alive and suggest that factors inhibiting ACO1 expression may be promising candidates for repression of CJD.
The S135C mutation (rs200152641) in FGF20 was observed in patient no. 2. FGF20, which was found to interact directly with PD, has been shown to be a candidate therapeutic protein for PD (Fig 3). An indirect interaction was observed between FGF20 and PrDs (Fig 4 and S1C Fig), and mitogen-activated protein kinase (MAPK) 1 and MAPK3 have been reported to inhibit PrP expression, supporting their potential application in therapeutic strategies against PrDs. Because FGF20 is known to induce MAPK1 and MAPK3, it may also have applications in the treatment of PrDs. Significant functional changes in FGF20 protein were predicted by PolyPhen-2 and SIFT (Table 4); such changes may influence MAPK1 and MAPK3 regulation, representing a potential therapeutic strategy for the management of PrDs. Thus, this variant may have influenced the rapid progression of disease in patient no. 2. POSTN and LRRK2 are also involved in the upregulation and activation of MAPK1 and/or MAPK3, respectively (S1B and S1E Fig).
TET1 is associated with late-onset AD (LOAD): the expression of TET1 is increased in the hippocampus of patients with preclinical AD and LOAD [47], and the exon region of TET1 (also called CXXC finger 6 [CXXC6]) has been shown to be related to LOAD [48]. However, although TET1 has been shown to be directly related to AD, we did not detect a direct or indirect relationship with PrDs.
The E578V mutation in POSTN was observed in patient no. 1. POSTN has been reported to promote upregulation of NOTCH-1 [49,50]. Moreover, NOTCH-1 activation is associated with dendritic atrophy in PrDs, and NOTCH-1 regulation may be controlled by prion infection [51].
LRRK2 has been reported to be related to AD. The SNP rs34410987, which causes a proline-to-leucine mutation (P755L) in LRRK2, as observed in this study, was first reported in nine Chinese patients with PD in 2006 and is an important mutation in the diagnosis of PD [52]. However, it is unclear whether this mutation is a causative agent for PD because it has also been observed in healthy individuals, and no significant associations with PD have been reported [53–56]. Despite these previous findings, we could not exclude the possibility that P755L in LRRK2 may be associated with PrDs. Further studies are needed to analyze the associations between diseases and gene variations. For example, in gastric cancer [57], genotype associations have been analyzed among various types of gastric cancer according to the Lauren classification system. Additionally, to determine whether P755L may be a causative mutation in PrDs, further analyses are needed. The occurrence of both V180I in PRNP and P755L in LRRK2 may represent a causative haplotype for gCJD. However, the number of patient samples was insufficient to confirm this possibility, and additional studies will be required.

Conclusions
Our results supported the possibility that PrDs may be associated with other NDs through the different gene variants found in five gCJD patients with V180I mutation, including a patient who survived more than 10 years after disease diagnosis. Six of the validated variants were located within genes (LPA, LRRK2, TET1, FGF20, ACO1, and POSTN) that have been shown to be associated with NDs such as PrDs, AD, and PD. Although these gene mutations having significant associations with NDs were observed only in patients with gCJD, more in-depth follow-up studies of the families of these patients are required to definitively conclude that these variants are involved in the pathogenesis of gCJD with V180I. Four variants causing nonsense mutations were newly identified. Further studies are needed to determine the relationships between these mutations and disease development because of the lack of significant phenotype data on the genes harboring these variants (e.g., SMC5, GBP4, POLR1C, and TWF1). Despite the lack of data, we cannot exclude the possibility that these genes may contribute to the overall risk of the disease. In order to confirm these data, additional analyses using reverse genetic methods, such as gene knockout or gene interference technologies, are needed.

Materials and Methods
This study was approved by the Institutional Review Board of the Korea Centers for Disease Control and Prevention (IRB No. 2014-06EXP-04-R-A). Written informed consent was obtained from the patients or their legal guardians.

Subjects
All five gCJD patients with V180I mutation described in this study were South Korean natives. Detailed information regarding these patients is given in Tables 1 and 2. No patients reported having a family history of CJD. In addition, we selected the whole genome sequences of 135 healthy Koreans stored at the Division of Center for Genome Science in KNIH as controls for variation filtering. However, the V180I mutation was observed in one of these individuals. Although this individual did not have gCJD, the genomic sequence of this individual was excluded.
Because DNA samples of these controls were not available for further analysis, we also collected DNA samples from 10 healthy individual volunteers (ages 20-40 years) for variant validation. Thus, a total of 145 DNA samples from healthy Koreans were used in this study.

DNA extraction
Genomic DNA was extracted from the patients and 10 control individuals described above. Total DNA was isolated from whole blood samples using a QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), according to the manufacturer's instructions. The extracted DNA was quantified using a Quant-iT PicoGreen dsDNA kit (Invitrogen, Carlsbad, CA, USA).

Biochemical analysis of the CSF of patients with gCJD
Detection of 14-3-3 and tau protein. CSF samples from all five patients were analyzed by western blot analysis for 14-3-3 protein expression. Enzyme-linked immunosorbent assays (ELISA) were used to analyze t-tau and p-tau proteins. These biochemical analyses were performed as previously described [58].
Whole genome sequencing. Whole genome sequencing of the five gCJD patients with V180I (cases 1-5) was performed using the Solexa sequencing technology platform (HiSeq2500; Illumina, San Diego, CA, USA). Briefly, 1-3 μg of gDNA was fragmented by sonication. Library span size was controlled using a Covaris System to generate ~500-bp inserts. The sonicated DNA was end-repaired using T4 DNA polymerase and Klenow polymerase. Tailing was performed to create sticky ends, and Illumina paired-end adaptor oligonucleotides were ligated to the sticky ends. Adaptor-ligated oligonucleotides were subjected to polymerase chain reaction (PCR) enrichment. Ligated DNA was then size-selected for lengths of about 500 bp, including adaptors of about 180 bp. Cluster generation was performed using amplified library DNA in a flow cell to produce clusters. Purified fragments were loaded onto the Illumina flow cell for sequencing on the Illumina HiSeq2500 instrument.
Raw data are described in S1 and S2 Tables. Reads were aligned to the Human Reference Genome (hg19) using Burrows-Wheeler Aligner (BWA) v0.6.1. Multisample SNP calling was performed using GATK and Picard tools with default parameters.
The genome sequences of the 134 healthy individuals were called and annotated in a manner similar to that of the patients. Next, variant call format (VCF) files were merged using VCFtools v4.1 (https://vcftools.github.io/). Differences in variant counts between groups were calculated and analyzed using PLINK v1.9 (pngu.mgh.harvard.edu/~purcell/plink).
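The group comparison described above amounts to keeping variants that are called in at least one patient and in none of the controls. The following is a minimal sketch of that idea using Python sets; it is not the VCFtools/PLINK workflow used in the study. The `Variant` tuple, the function name, and the toy call sets are hypothetical; the two named positions are taken from the text, and the third variant is invented for illustration.

```python
from typing import Dict, NamedTuple, Set


class Variant(NamedTuple):
    """A variant keyed by position and alleles (illustrative representation)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def patient_only_counts(
    patient_calls: Dict[str, Set[Variant]],
    control_calls: Set[Variant],
) -> Dict[Variant, int]:
    """Map each variant absent from all controls to the number of patients carrying it."""
    counts: Dict[Variant, int] = {}
    for variants in patient_calls.values():
        for variant in variants - control_calls:
            counts[variant] = counts.get(variant, 0) + 1
    return counts


# Toy data: two variants reported in the text plus one hypothetical shared variant.
smc5_variant = Variant("chr9", 72912888, "G", "T")
gbp4_variant = Variant("chr1", 89657064, "C", "A")
shared_variant = Variant("chr1", 1000, "A", "G")  # hypothetical, present in controls

patients = {
    "patient_1": {shared_variant},
    "patient_5": {smc5_variant, gbp4_variant, shared_variant},
}
controls = {shared_variant}

result = patient_only_counts(patients, controls)
```

Counting carriers per variant also gives the "observed in n patients" breakdown used later when selecting candidates for validation.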
Variant filtering. A total of 11,223,176 candidate variants without rs numbers were identified, of which 802,183 variants were not observed in the 134 healthy Korean sequences. Next, intronic variants were removed, and variants causing frameshift, inframe-deletion, missense, or stop loss mutations were selected. Additionally, 7,425,674 candidate variants with rs numbers were filtered in the same manner. Next, variants not annotated to any gene and intronic variants were removed, and 1,102 candidate exonic variants, including variants causing frameshift, inframe-deletion, missense, or stop loss mutations, were selected for further validation. Of these 1,102 candidates, 110 were found in all five of the patients, and 94 newly identified variants were also selected. Additionally, 20 variants found in 1-4 patients were selected according to their phred scores (over 600). These 224 candidates were selected for variant validation, and after filtering, as described in S4 Table, 125 candidates were processed for further validation.
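The filtering funnel above can be reconciled numerically. The sketch below only restates counts given in the text and checks that they add up as described; the variable names and dictionary labels are illustrative.

```python
# Starting pools of candidate variants, split by whether an rs number exists.
without_rs_number = 11_223_176
with_rs_number = 7_425_674
total_candidates = without_rs_number + with_rs_number

# Candidates chosen from the 1,102 exonic variants for validation design.
selected_groups = {
    "found in all five patients": 110,
    "newly identified": 94,
    "found in 1-4 patients, phred score over 600": 20,
}
n_selected = sum(selected_groups.values())

# Stages of the funnel, from all candidates down to those processed for validation.
funnel = [
    ("all candidate variants", total_candidates),
    ("exonic candidates", 1_102),
    ("selected for validation design", n_selected),
    ("processed for validation", 125),
]

for stage, count in funnel:
    print(f"{stage}: {count:,}")
```

The two starting pools sum to the 18,648,850 candidate variants quoted in the abstract, and the three selected groups sum to the 224 candidates mentioned before the final reduction to 125.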
Validation of selected variants. A total of 125 variants were validated using a Fluidigm BioMark Dynamic Array. Variant validation was performed among all five patients and 10 healthy individuals as described above. However, variant validation in patient no. 5 failed. Thus, the variants observed in patient no. 5 were revalidated by Sanger sequencing. However, owing to low gDNA quality, only three variants were revalidated using this method. PCR and sequencing were performed as previously described using the PCR primers listed in S5 Table [23].
Genome data analysis. All variants that showed different allele frequencies between groups were analyzed using Variant Effect Predictor in ENSEMBL release 81 (based on GRCh37), and the effects of variants on the annotated canonical transcripts for all genes were assessed using PolyPhen-2 and SIFT [60,61]. The effects of amino acid substitutions resulting from the different variants were predicted and considered damaging changes if indicated as 'probably damaging' or 'possibly damaging' by PolyPhen-2 or if indicated as 'deleterious' by SIFT. GO categorization was performed using the Panther Classification System (http://pantherdb.org; Table 3). Known information on the analyzed genes was extracted from GeneCards (http://www.genecards.org).
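The rule used above to call a substitution damaging can be expressed as a small predicate. This is a minimal sketch, assuming the predictions are available as text labels; the function name and the label normalization (lower-casing and treating underscores as spaces) are illustrative choices, not part of the published pipeline.

```python
from typing import Optional

POLYPHEN_DAMAGING_LABELS = {"probably damaging", "possibly damaging"}
SIFT_DAMAGING_LABEL = "deleterious"


def _normalize(label: Optional[str]) -> str:
    """Lower-case a prediction label and treat underscores as spaces."""
    if label is None:
        return ""
    return label.replace("_", " ").strip().lower()


def is_damaging(polyphen_label: Optional[str], sift_label: Optional[str]) -> bool:
    """Return True if PolyPhen-2 or SIFT flags the substitution as damaging."""
    return (
        _normalize(polyphen_label) in POLYPHEN_DAMAGING_LABELS
        or _normalize(sift_label) == SIFT_DAMAGING_LABEL
    )


# Illustrative label combinations (not taken from Table 4).
examples = {
    ("probably damaging", "tolerated"): is_damaging("probably damaging", "tolerated"),
    ("benign", "deleterious"): is_damaging("benign", "deleterious"),
    ("benign", "tolerated"): is_damaging("benign", "tolerated"),
}
```

Because the two predictors are combined with a logical OR, a variant is retained as damaging when either tool flags it, which is the more inclusive of the possible combinations.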
Biological network analysis. Biological networks among the genes (proteins) harboring validated variants were created using Pathway Studio 9.0 software (Ariadne, Rockville, MD, USA). The molecular interactions between the genes and NDs, such as AD, PD, and PrDs, were extracted from Elsevier's MedScan text-mining software, which contains biological articles and abstracts. Error information and interactions because of mismatches and mistyping were removed from the output data.

Table. Information for 1,102 variants. All information for the 1,102 variants, including variants generated during the validation process (see stage IV and V in Fig 3). The Fluidigm array design results are listed in the last two columns. Ninety-nine variants were not designed for Fluidigm SNPtypeAssays for the following reasons: the target design parameters were not met, and there were regions containing large repeats and/or complex genomic diversity; GC contents were outside of the range of product specification; adjacent variant(s) were found within 30 bp of the target SNP; and target sequences were not supported. 'High' indicates that the target could be designed to the product specifications; 'Medium' indicates that the target could be designed, but with limited support because of problems such as high GC content (over 65%) of the target or indel target. (XLSX)
S5 Table. Primer sequences for variants validation.