Exome Sequencing and Genetic Testing for MODY

Context: Genetic testing for monogenic diabetes is important for patient care. Given the extensive genetic and clinical heterogeneity of diabetes, exome sequencing might provide additional diagnostic potential when standard Sanger sequencing-based diagnostics is inconclusive.
Objective: The aim of the study was to examine the performance of exome sequencing for a molecular diagnosis of MODY in patients who have undergone conventional diagnostic sequencing of candidate genes with negative results.
Research Design and Methods: We performed exome enrichment followed by high-throughput sequencing in nine patients with suspected MODY. They were Sanger sequencing-negative for mutations in the HNF1A, HNF4A, GCK, HNF1B and INS genes. We excluded common, non-coding and synonymous gene variants, and performed in-depth analysis on filtered sequence variants in a pre-defined set of 111 genes implicated in glucose metabolism.
Results: On average, we obtained 45X median coverage of the entire targeted exome and found 199 rare coding variants per individual. We identified 0–4 rare non-synonymous and nonsense variants per individual in our a priori list of 111 candidate genes. Three of the variants were considered pathogenic (in ABCC8, HNF4A and PPARG, respectively); thus, exome sequencing led to a genetic diagnosis in at least three of the nine patients. Approximately 91% of known heterozygous SNPs in the target exomes were detected, but we also found low coverage in some key diabetes genes using our current exome sequencing approach. Novel variants in the genes ARAP1, GLIS3, MADD, NOTCH2 and WFS1 need further investigation to reveal their possible role in diabetes.
Conclusion: Our results demonstrate that exome sequencing can improve molecular diagnostics of MODY when used as a complement to Sanger sequencing. However, improvements will be needed, especially concerning coverage, before the full potential of exome sequencing can be realized.


Introduction
MODY (maturity-onset diabetes of the young) is a heterogeneous group of diabetes forms caused by single-gene defects in at least ten genes affecting pancreas development and beta-cell function [1,2,3]. The most common MODY forms are caused by mutations in the glucokinase gene (GCK) [4] and the hepatocyte transcription factor genes HNF1A and HNF4A [5,6]. GCK-MODY (MODY2) is a mild disease manifesting as slightly elevated fasting glucose that is well controlled without medical treatment and carries no risk for late diabetes-associated complications [7,8]. In contrast, HNF1A- and HNF4A-MODY (MODY3 and MODY1, respectively) typically lead to progressive beta-cell dysfunction and a high risk for late complications, and patients often benefit from sulfonylurea treatment [9,10,11]. HNF1B mutations result in a syndromic form of diabetes (MODY5), which includes renal failure, genital and pancreatic malformations, and liver dysfunction [12,13]. According to the OMIM database, mutations in seven other genes (BLK, CEL, INS, KLF11, NEUROD1, PAX4, PDX1) can cause inherited diabetes with a MODY phenotype. There are also other forms of monogenic diabetes, such as neonatal diabetes, which presents before six months of age, and syndromic diabetes, in which features other than diabetes dominate the clinical picture (reviewed in [1]).
Genetic testing in monogenic diabetes is important for diagnosis and treatment [1,2,3]. When MODY is suspected, the current approach involves PCR amplification and Sanger sequencing of candidate genes, frequently with an iterative approach based on clinical features. For example, most laboratories will first screen HNF1A, followed by HNF4A and GCK in subjects exhibiting the classical features of MODY; and first GCK, then HNF1A and HNF4A, if the diabetic phenotype is mild and fasting glucose 5.5-8.5 mmol/l [1,2]. If the patient presents with renal dysfunction, urogenital or pancreatic malformations, HNF1B is usually the first gene that is tested [1,2].
Although systematic studies are lacking, our experience is that molecular genetic testing reveals a mutation in one of the common MODY genes in about 50% of probands referred to our laboratory. The remaining cases would also benefit from a genetic diagnosis, but the cost of sequencing other candidate genes often precludes further testing. A standard, complete investigation of HNF1A, HNF4A and GCK includes sequencing of 31 exons, where each sequencing reaction must be evaluated separately. Hence, the current approach is expensive and time-consuming, and can establish a molecular diagnosis only within a limited number of genes. Whole-exome capture and high-throughput sequencing have great potential to detect causal gene variants in dominant and recessive disorders as well as in diseases due to de novo mutations [14,15,16,17,18]. Here, we describe our experience using exome sequencing in MODY patients referred to us for genetic testing.

Ethics Statement
The study was approved by the Regional Ethical Committee for Medical Research West and performed according to the Helsinki Declaration. We obtained verbal and written informed consent from the study participants.

Study Population
We carried out whole-exome sequencing on nine probands with MODY of unknown cause recruited from the Norwegian MODY Registry (Figure 1 and Table 1). In each family, diabetes was present in at least three generations with autosomal dominant inheritance, and at least one family member had an age at diagnosis of 11-28 years. All probands were negative for mutations in HNF1A, HNF4A, GCK, HNF1B and INS by Sanger sequencing.

Targeted Capture and Massive Parallel Sequencing
Targeted capture and massive parallel sequencing were performed at HudsonAlpha Institute for Biotechnology (Huntsville, AL) (File S1). In brief, the SureSelect Human All Exon Kit (Agilent Technologies, Santa Clara, CA) was used for exome enrichment, and sequencing was performed on a Genome Analyzer GAIIx (Illumina Inc., San Diego, CA). Samples were sequenced on one lane for paired-end 72-bp reads, and the samples with the lowest yield were complemented with one additional lane of single-read 76-bp reads (P01, P03, P04, P05, P07).

Read Mapping and Variant Analysis
We mapped paired-end reads and single reads to the reference human genome (UCSC NCBI37/hg19) using the Burrows-Wheeler Alignment tool (BWA) [19] (see also supporting Materials and Methods, File S1). PCR duplicates were removed with PICARD (http://picard.sourceforge.net), followed by base quality recalibration using GATK [20]. SNPs and indels were called by SAMtools [21] mpileup, correcting for overestimated mapping quality from BWA. SNPs were filtered by the following criteria: (1) SNPs should not be in a cluster with a window size of 10 bp; (2) sequencing depth should be at least 8X; and (3) quality score should be ≥30. We used Annovar [22] (Nov 22, 2011) and in-house scripts to annotate and filter variants after variant calling. The performance of our exome sequencing variant calling pipeline was tested against heterozygous genotypes present in the enrichment target regions derived from the Affymetrix 6.0 whole-genome genotyping array in seven of the individuals (File S1).
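The three SNP filtering criteria can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Snp` record, the function name `filter_snps` and the reading of "cluster" as "another called SNP on the same chromosome within the window" are assumptions made for illustration, and the quality threshold is taken as at least 30.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    """Hypothetical SNP call record (field names are illustrative)."""
    chrom: str
    pos: int
    depth: int
    qual: float


def filter_snps(snps, window=10, min_depth=8, min_qual=30.0):
    """Keep SNPs that pass the cluster, depth and quality criteria.

    Assumption: a SNP is 'in a cluster' if another called SNP lies on the
    same chromosome at most `window` bp away, judged before the depth and
    quality filters are applied.
    """
    ordered = sorted(snps, key=lambda s: (s.chrom, s.pos))
    kept = []
    for i, snp in enumerate(ordered):
        prev_close = (
            i > 0
            and ordered[i - 1].chrom == snp.chrom
            and snp.pos - ordered[i - 1].pos <= window
        )
        next_close = (
            i + 1 < len(ordered)
            and ordered[i + 1].chrom == snp.chrom
            and ordered[i + 1].pos - snp.pos <= window
        )
        if prev_close or next_close:
            continue  # criterion 1: clustered SNPs are discarded
        if snp.depth < min_depth:
            continue  # criterion 2: depth of at least 8X
        if snp.qual < min_qual:
            continue  # criterion 3: quality score of at least 30
        kept.append(snp)
    return kept


example_calls = [
    Snp("chr1", 100, 20, 50.0),  # clustered with chr1:105
    Snp("chr1", 105, 20, 50.0),  # clustered with chr1:100
    Snp("chr1", 200, 7, 50.0),   # depth too low
    Snp("chr1", 300, 10, 29.0),  # quality too low
    Snp("chr1", 400, 8, 30.0),   # passes all three criteria
    Snp("chr2", 405, 12, 40.0),  # other chromosome, so not clustered
]
passing = filter_snps(example_calls)
```

Whether clustering should be assessed before or after the depth and quality filters is a design choice the text does not specify; the sketch applies it to all calls first, which is the more conservative option.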

Selection of Genes of Interest
We selected genes previously implicated in monogenic diabetes and related syndromes [1,24], genes with important roles in the beta cell [25,26] and genes implicated by whole-genome-significant SNP associations with type 2 diabetes (T2D) or fasting glucose [24,27,28,29,30,31]. For the T2D and fasting glucose-associated regions, we selected the named gene or genes highlighted in the respective publications. Notably, experimental evidence directly supporting that these genes are responsible for the associations is mostly lacking. The 111 genes, of which 109 were included in the capture assay, totaled 272 kb of exonic sequence (Table S1).

Rare Variant Validation
All variants in the candidate mutation set were validated by PCR amplification of the variant-containing exon from the original patient sample, followed by Sanger sequencing. Frequency estimates were generated by genotyping 340 Norwegian healthy controls using the MassARRAY iPLEX system or by Sanger sequencing.

Results
By sequencing the exomes of the nine MODY probands (Table 1, Figure 1), we obtained 3.5-5.8 Gb of mappable sequence per sample, with 36-57X median coverage of the targeted exome and 88-93% of the targeted exome covered at least eight times (Table 2). The candidate diabetes genes showed similar coverage (Tables S1 and S2). The exceptions were DGKB and THADA, which were not present on the exome enrichment array, highlighting one problem with our approach: current target capture reagents may not include all exons of interest. The MODY genes HNF4A and HNF1B showed relatively good coverage (≥87% at 8X) throughout the entire coding regions, while GCK (83%), HNF1A (72%) and INS (58%) were less uniformly covered (Tables S1 and S2).
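The two coverage figures reported here (median depth and the fraction of target bases covered at least eight times) can be computed from per-base depths as in the following minimal sketch. The function name `coverage_summary` and the example depths are hypothetical; in practice the depths would come from a per-base depth report over the capture target.

```python
from statistics import median


def coverage_summary(depths, min_depth=8):
    """Return (median depth, fraction of bases with depth >= min_depth).

    `depths` is a sequence with one read-depth value per targeted base.
    """
    if not depths:
        raise ValueError("no target bases supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return median(depths), covered / len(depths)


# Ten hypothetical target bases; two of them fall below 8X.
example_depths = [0, 5, 8, 10, 40, 45, 50, 60, 70, 100]
median_depth, fraction_8x = coverage_summary(example_depths)
```

Applying the same function to the bases of a single gene gives per-gene values of the kind reported for GCK, HNF1A and INS.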
We identified an average of 14,463 substitutions and indels per sample (in the targeted exome) after quality control (Table 3). The quality of the data was investigated by comparison with 7,800 genotyped SNPs present in the Agilent capture region, obtained from the Affymetrix 6.0 genotyping array for seven of the nine individuals. Between 89 and 92% of the heterozygous genotyping-array SNPs present in the capture regions were detected (File S1 and Table S3).
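The detection rate for heterozygous array SNPs is a simple proportion, sketched below. The data layout (a mapping from site to an allele pair, and a set of called sites) and the function name `het_detection_rate` are assumptions for illustration, as is treating a SNP as "detected" whenever a variant was called at that site.

```python
def het_detection_rate(array_genotypes, called_sites):
    """Fraction of heterozygous array SNPs that were also called from the exome data.

    array_genotypes: dict mapping a site, e.g. ("chr1", 1000), to an allele
                     pair such as ("A", "G").
    called_sites:    iterable of sites at which the exome pipeline called a variant.
    """
    het_sites = {
        site for site, (a1, a2) in array_genotypes.items() if a1 != a2
    }
    if not het_sites:
        raise ValueError("no heterozygous array genotypes supplied")
    detected = het_sites & set(called_sites)
    return len(detected) / len(het_sites)


example_array = {
    ("chr1", 1000): ("A", "G"),
    ("chr1", 2000): ("C", "T"),
    ("chr2", 3000): ("G", "T"),
    ("chr2", 4000): ("A", "C"),
    ("chr3", 5000): ("A", "A"),  # homozygous, ignored
}
example_calls = {("chr1", 1000), ("chr1", 2000), ("chr2", 3000), ("chr3", 5000)}
rate = het_detection_rate(example_array, example_calls)
```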
Next, we developed a data reduction pipeline consisting of several steps (Table 3). We first excluded all variants not present in the actual coding sequence or in splice sites, as well as synonymous variants other than those occurring at canonical splice sites. We subsequently filtered against an in-house database of genetic variants from 50 Norwegian whole exomes and finally excluded variants with minor allele frequencies >0.5% in the 1000 Genomes Project. This reduced the number to 183-213 rare, coding single-nucleotide substitutions and coding indels per individual (Table 3). Combining all nine individuals, this resulted in 1,733 different variants located in 1,569 different genes. Only 50 variants were present in more than one of the nine individuals. At the gene level, 266 genes carried rare (most often different) variants in more than one individual, and 24 genes carried rare variants in three or more individuals. Hence, despite our stringent procedure for variant filtration, a large number of potential candidate genes emerged from the nine patient data sets.
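The filtering steps of the data reduction pipeline can be expressed as a single predicate, as in this minimal sketch. The `Variant` record and its field names are hypothetical stand-ins for annotation output, not the authors' in-house scripts, and the frequency cut-off is taken as a minor allele frequency above 0.5% (0.005).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotated variant (field names are illustrative)."""
    key: str                    # e.g. "chr11:17414570:C:T"
    in_coding: bool             # lies in coding sequence
    at_splice_site: bool        # lies at a canonical splice site
    synonymous: bool
    maf_1000g: Optional[float]  # None if absent from 1000 Genomes


def passes_reduction(v, inhouse_keys, max_maf=0.005):
    """Apply the reduction steps in the order described in the text."""
    # Step 1a: must be in coding sequence or at a splice site.
    if not (v.in_coding or v.at_splice_site):
        return False
    # Step 1b: drop synonymous variants unless at a canonical splice site.
    if v.synonymous and not v.at_splice_site:
        return False
    # Step 2: drop variants seen in the in-house exome database.
    if v.key in inhouse_keys:
        return False
    # Step 3: drop variants with 1000 Genomes MAF above the cut-off.
    if v.maf_1000g is not None and v.maf_1000g > max_maf:
        return False
    return True


def reduce_variants(variants, inhouse_keys, max_maf=0.005):
    return [v for v in variants if passes_reduction(v, inhouse_keys, max_maf)]


example_variants = [
    Variant("a", True, False, False, None),   # rare coding: kept
    Variant("b", False, False, False, None),  # non-coding: dropped
    Variant("c", True, False, True, None),    # synonymous: dropped
    Variant("d", True, True, True, None),     # synonymous at splice site: kept
    Variant("e", True, False, False, None),   # in in-house database: dropped
    Variant("f", True, False, False, 0.02),   # common in 1000 Genomes: dropped
    Variant("g", True, False, False, 0.005),  # exactly at cut-off: kept
]
rare_coding = reduce_variants(example_variants, inhouse_keys={"e"})
```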
We then focused on the candidate gene list (Table S1). We identified 14 rare coding variants in 12 of the 111 candidate genes. Thirteen of the variants were verified by Sanger sequencing (Tables 3 and 4). Frequency estimates in 340 healthy controls, computational predictions of deleteriousness and literature searches were performed for the remaining 13 variants (Table 4). When available, additional family members were sequenced for variants not present in the 1000 Genomes database, dbSNP or our 340 Norwegian controls (Figure 1). In subject P01, we identified the heterozygous nonsense mutation c.1071G>A/p.R357X, introducing a premature stop codon in PPARG exon 7. This mutation has previously been shown to cause severe insulin-resistant diabetes and partial lipodystrophy [32,33,34]. The proband's age at diagnosis was 38 years; she was included in our study because a family member was diagnosed at age 28 years. The proband's BMI was 27.6 kg/m² in 2002 and 30.1 kg/m² in 2010, and her insulin requirement increased over the same period from 1.4 to 2.2 U/kg/day. There was a high prevalence of micro- and macrovascular complications in most of the affected family members, although none were available for genetic testing (Table 1, Figure 1). Thus, at recruitment the proband had a MODY phenotype, but was insulin-resistant on follow-up. The same mutation has been reported in patients with a similar clinical picture [32,33,34]. We consider this mutation pathogenic.
In subject P03, we detected the novel heterozygous non-synonymous ABCC8 mutation c.4096G>A/p.A1366T. Amino acid 1366 is highly conserved and located in the ATP-binding domain. Other nearby amino acid substitutions are associated with either congenital hyperinsulinism or neonatal/adult-onset diabetes [35,36]. The proband was diagnosed with diabetes at 25 years of age and is currently treated with sulfonylurea and metformin. All four diabetic family members (none available for genetic testing) were treated with oral hypoglycaemic agents (OHA). The proband's age at diagnosis was late for ABCC8 diabetes, although other such cases have been described [35,37,38,39]. The large size of ABCC8 makes it less amenable to Sanger-based mutation screening, which may have led to the role of this gene in MODY being underestimated. Since the proband was sulfonylurea-sensitive, we categorized p.A1366T as probably pathogenic. Further sequencing studies of unselected MODY patients will elucidate whether ABCC8 is a more common cause of MODY than previously anticipated.
In subject P07, we identified a novel non-synonymous mutation, c.266G>A/p.R89Q, in HNF4A. This was surprising since HNF4A had already been screened. Sanger re-sequencing confirmed the mutation. When the first electropherogram was re-examined, the mutation was detectable; hence, it had been overlooked. Another substitution of the same codon, c.265C>T/p.R89W, has been identified in MODY [40]. The amino acid residue at position 89 of HNF4A is highly conserved from Drosophila to humans and is part of the DNA-binding domain. Both parents are of normal weight but developed diabetes in their early forties (Figure 1). Sanger sequencing revealed that the mutation was inherited from the maternal side of the family, which appears to have a stronger history of diabetes. After receiving the molecular diagnosis, the proband made a successful transfer from insulin to sulfonylurea. We consider the mutation p.R89Q pathogenic.
Table 3. Overview of all substitutions and indels detected in the nine probands before and after variant reduction.
Table 4. Rare coding variants identified in the 111 target candidate genes using whole exome sequencing in nine patients with suspected MODY.

Other Rare Variants in the Candidate Gene Set
In the 111 candidate genes, we also identified novel (not present in 50 in-house exomes, 340 healthy controls or 1000 Genomes) variants in potentially interesting genes implicated in susceptibility to diabetes, albeit thus far not in an autosomal dominant mode of inheritance: ARAP1, GLIS3, MADD, NOTCH2 and WFS1 (Table 4). Each genetic variant is discussed in some detail below.
In subject P02, we detected the novel heterozygous non-synonymous MADD variant c.3479G>C/p.S1160T. The MADD (MAP-kinase activating death domain) protein is known to have a role in apoptosis [41], and SNPs in the MADD region are associated with elevated pro-insulin and fasting glucose levels [42]. The subject was diagnosed with diabetes at 24 years of age. He was treated with sulfonylurea for ten years and is currently on insulin (0.5 U/kg/day). C-peptide and proinsulin were detectable but not elevated. His affected, lean brother also carried the variant. The brother was diagnosed with diabetes at 39 years of age and is treated with oral hypoglycaemic agents (OHA). No other family members were available for genetic analysis.
Individual P03 (who had a probably pathogenic ABCC8 mutation) also had a potentially interesting variant in GLIS3, which encodes a transcription factor expressed in beta-cells and important for insulin gene expression [43]. Mutations in GLIS3 can cause a recessive form of neonatal diabetes and congenital hypothyroidism (OMIM #610199) [44,45]. There are, however, no reports of dominant GLIS3 mutations, and the variant is located in a protein region with no known function. With only the proband available for genetic testing, it was not possible to study the segregation of the variant in the family. We consider the ABCC8 mutation more likely to be the pathogenic variant in this patient.
In subject P04, we detected the ARAP1 (previously CENTD2) variant c.1349G>A/p.R450H. It is predicted to be benign by PolyPhen and AlignGVGD (Table 4) but damaging by SIFT, and it co-segregates with diabetes in the core family (Figure 1). Common variants at this locus have been associated with type 2 diabetes, fasting glucose and pro-insulin level [28,46,47], and it has been suggested that this effect is mediated through reduced insulin secretion capacity [46,47]. However, as with most GWAS-associated regions, the causative variant has not yet been pinpointed, and the nearby STARD10 gene was recently suggested as a better biological candidate gene in the region [47]. STARD10 was not on our original candidate gene list, but a retrospective check revealed no rare variants in this gene.
The NOTCH2 variant c.4228C>T/p.R1410C, found in P05, did not co-segregate with diabetes (Figure 1). This gene was implicated as a type 2 diabetes locus in a recent GWAS meta-analysis [28]. It is also known that heterozygous mutations in NOTCH2 can cause Alagille syndrome (OMIM #610205) and Hajdu-Cheney syndrome (OMIM #102500). The patient showed no symptoms suggesting either of these diseases.
In subject P09, we detected the novel WFS1 variant c.2107C>T/p.R703C. The affected amino acid residue is strongly conserved, and the variant is predicted to be probably damaging by all three prediction programs (Table 4). Recessive mutations in WFS1 can lead to Wolfram syndrome (OMIM #222300), which includes diabetes, hearing impairment and psychiatric disease, while heterozygous carriers appear to show no major symptoms associated with diabetes. Wfs1 null mice and genetic association studies suggest a role in insulin secretion [48,49]. The proband developed diabetes at 14 years of age with no type 1 auto-antibodies and currently requires 0.5 U/kg/day insulin. There was no familial hearing impairment. The affected brother and father carried the variant, but the affected uncle did not (Figure 1). Age at diagnosis and insulin requirements are distinctly different between the affected brothers and their father and uncle.

Discussion
Exome sequencing has shown great potential for identification of disease mutations in monogenic disorders [14,15,16,17,18]. However, it is not clear how representative the early proof-of-principle studies are and whether this technology is ready to replace or complement traditional Sanger sequencing for clinical genetic testing.
Here, we show that exome sequencing can provide a significant diagnostic advantage in a substantial fraction of patients in whom Sanger sequencing is often inefficient, such as cases with atypical clinical presentation (family P03) or when clinical information is limited (family P01). For this group of patients, exome sequencing is an attractive option compared to the current "phenotypically" driven genetic testing, as it allows testing beyond the short list of genes typically tested by Sanger sequencing.
To illustrate how this technology could be utilized for routine diagnostic use, we restricted our analysis to a list of 111 candidate genes. Our list included known disease genes for monogenic diabetes, insulin resistance and diseases related to glucose homeostasis. We also explored rare variants in candidates such as genes encoding transcription factors important for pancreatic development and islet specification/differentiation [25,26], and genes identified in GWAS of diabetes and fasting glucose levels. Our study identified some rare variants in the latter gene categories (Table 4 and Results section). Although these variants are located in attractive candidates, claiming causality would require much more extensive proof than for genes already known to cause autosomal-dominant diabetes. Such evidence would include co-segregation between variant and disease in large families and the presence of the variant in other subjects with a similar phenotype, together with functional and clinical studies. In particular, the limited number of family members available for segregation analysis does not allow us to reach this level of support for the variants in our "candidate gene" category. Hence, we could not determine whether these variants cause MODY or may at least act as polygenic risk factors that warrant further investigation (Table 4).
A possible advantage of exome sequencing is that it also allows for an extensive search for completely novel diabetes genes in individuals with no genetic defect in the known diabetes genes. However, for diabetes, where the genes for several monogenic forms have already been detected [1], the search for remaining, unmapped disease loci will be hampered by significant locus and clinical heterogeneity. Our registry-based clinical sample, with limited access to extended pedigrees, was not powered to identify novel disease genes among the approximately 200 rare coding variants in each individual. Thus, international efforts to sequence the entire exomes of larger numbers of carefully selected subjects and to identify large multi-generational diabetes families may be a way forward.
For diagnostic utility, our study reveals that exome sequencing can increase the possibility of a genetic diagnosis in MODY. The coverage for certain key genes must, however, be improved before exome sequencing can replace Sanger sequencing in routine molecular diagnostics. Recent and ongoing improvements in capture hybridization and high-throughput sequencing technologies are promising, but the coverage problem may not be solved completely by new enrichment kits, higher read depths and longer reads. In the meantime, it might be attractive to use tailored hybridization capture for the disease of interest followed by very high-coverage sequencing of the disease-specific gene panels [50]. The increased coverage for the target genes must be weighed against the cost of developing, optimizing and keeping up-to-date disease-specific gene panels and the limited ability to detect unexpected phenotype-genotype correlations.
In conclusion, we still consider phenotypically driven Sanger sequencing the first choice for genetic testing in patients with classical features of MODY. Exome sequencing is currently an important complement when Sanger sequencing is negative, or in patients with atypical clinical presentation. In the near future, we believe that tailored hybridization capture for selected genes of interest and very high-coverage sequencing of specific gene panels will replace Sanger sequencing. Ongoing refinements in the design of capture reagents, sequencing technologies and bioinformatics will, however, most likely ultimately lead to exome and possibly whole-genome sequencing as the state of the art in molecular diagnostics of MODY.

Supporting Information
File S1 Supplementary Materials and Methods. (DOC)