Gene-Specific Function Prediction for Non-Synonymous Mutations in Monogenic Diabetes Genes

  • Quan Li,

    Affiliation Endocrine Genetics Lab, The McGill University Health Center (Montreal Children's Hospital), Montréal, Québec, Canada

  • Xiaoming Liu ,

    Xiaoming.Liu@uth.tmc.edu (XL); huiqi.qu@uth.tmc.edu (HQQ)

    Affiliation Human Genetics Center, Division of Epidemiology, Human Genetics and Environmental Sciences, The University of Texas School of Public Health, Houston, Texas, United States of America

  • Richard A. Gibbs,

    Affiliations Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America

  • Eric Boerwinkle,

    Affiliations Human Genetics Center, Division of Epidemiology, Human Genetics and Environmental Sciences, The University of Texas School of Public Health, Houston, Texas, United States of America, Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America

  • Constantin Polychronakos,

    Affiliation Endocrine Genetics Lab, The McGill University Health Center (Montreal Children's Hospital), Montréal, Québec, Canada

  • Hui-Qi Qu

    Affiliation Human Genetics Center, Division of Epidemiology, Human Genetics and Environmental Sciences, The University of Texas School of Public Health, Houston, Texas, United States of America

Abstract

The rapid progress of genomic technologies has provided new opportunities to meet the need for molecular diagnosis of maturity-onset diabetes of the young (MODY). However, it can be unclear whether a newly identified mutation causes MODY. A number of in silico methods have been developed to predict the functional effects of rare human mutations. The purpose of this study was to compare the performance of different bioinformatics methods in the functional prediction of nonsynonymous mutations in each MODY gene, and to provide reference matrices to assist the molecular diagnosis of MODY. Our study showed that the prediction scores of diabetes mutations by different methods were highly correlated, but that the methods were complementary rather than interchangeable. The available in silico methods for the prediction of diabetes mutations performed differently across genes. Applying the gene-specific thresholds defined in this study may improve the performance of in silico prediction of disease-causing mutations.

Introduction

To date, a number of methods have been developed to predict the functional effects of rare human mutations based on their impact on protein function and/or evolutionary conservation [1]–[3]. These methods are valuable in assisting the diagnosis of monogenic diseases. In diabetes, there is a common monogenic form, maturity-onset diabetes of the young (MODY). MODY accounts for ∼1% to 5% of all cases of diabetes and is mainly seen in young adults (≤25 years old) [4]. As an autosomal dominant form of diabetes, MODY is caused by gene mutations that lead to insufficient insulin production with little or no insulin resistance [5]. To date, at least 13 genes have been identified in which mutations cause MODY: HNF4A (MODY1) [6], GCK (MODY2) [7], HNF1A (MODY3) [6], PDX1 (MODY4) [8], HNF1B (MODY5) [9], NEUROD1 (MODY6) [10], KLF11 (MODY7) [11], CEL (MODY8) [12], PAX4 (MODY9) [13], INS (MODY10) [14], BLK (MODY11) [15], ABCC8 (MODY12) [16], [17], and KCNJ11 (MODY13) [16], [17].

MODY caused by mutations in different genes may differ in the severity of diabetes and in the response to diabetes medications [18]. For example, MODY2, which is caused by GCK mutations and accounts for ∼20% of all MODY cases [19], tends to present with mild hyperglycaemia without obvious glycosuria. Patients with MODY2 are often asymptomatic [20], or are identified only when affected women are diagnosed with gestational diabetes during pregnancy [21]. Most patients with MODY2 can achieve satisfactory blood glucose control with diet therapy and do not need hypoglycemic medication [22], [23]. In contrast, MODY3, caused by HNF1A mutations, is the most common type of MODY, accounting for ∼63% of all MODY cases [19]; it tends to present with obvious glycosuria because of impaired glucose-stimulated insulin secretion [24] as well as a decreased renal threshold for glucose [25]. MODY3 patients tend to respond well to sulphonylurea treatment and do not rely on insulin therapy [18], [26]. Because of these implications for pharmacogenetics and personalized medicine, molecular diagnosis of MODY is important for clinical decision-making and for genetic counseling [18], [26]. However, because MODY molecular diagnosis is often unavailable or expensive, it is not uncommon for MODY patients to be classified as having type 2 diabetes [27], [28] and occasionally type 1 diabetes [29].

The rapid progress of advanced genomic technologies has provided new opportunities to meet the need for MODY molecular diagnosis. The identification of mutations in MODY genes by sequencing will enable molecular diagnosis of MODY, but a new issue is emerging. Most mutations causing MODY are nonsynonymous single-nucleotide mutations that change an amino acid residue (according to The Human Gene Mutation Database [30], http://www.hgmd.org/). High-throughput sequencing technologies enable screening of large numbers of patients and parallel sequencing of large numbers of genes. If a known MODY mutation is identified in a patient suspected of having MODY, the molecular diagnosis can be established. However, the increased throughput of sequencing is likely to produce increasing numbers of missense variants whose causative role in MODY is uncertain. Bioinformatics tools, e.g. SIFT (http://sift.jcvi.org/) [31] and PolyPhen (http://genetics.bwh.harvard.edu/pph2/index.shtml) [32], are often used to assess the pathogenicity of a nonsynonymous mutation [27]. Because of the limitations of in silico methods, there is no gold standard for the functional prediction of a nonsynonymous mutation. To date, besides SIFT and PolyPhen, a number of bioinformatics methods based on different algorithms have been developed [1]–[3]. The purpose of this study was to compare the performance of different bioinformatics methods in the functional prediction of nonsynonymous mutations in each MODY gene, and to provide reference matrices to assist the molecular diagnosis of MODY.

Methods

Data source

The diabetes mutation data analyzed in this study were obtained from the Human Gene Mutation Database (HGMD) 2013.4 release (http://www.hgmd.org/) [30]. Because the purpose of this study was to assess the prediction performance of different in silico methods for nonsynonymous single-nucleotide mutations, insertion/deletion mutations (InDels) were not included. Altogether, 1,130 nonsynonymous single-nucleotide mutations in 24 genes have been reported to cause MODY or neonatal diabetes. Seven of these genes each harbor more than 30 single-nucleotide mutations, for a total of 1,091 diabetes mutations (Table 1), while each of the other 17 genes harbors ≤6 diabetes mutations. To enable statistical comparisons of different in silico methods across genes, the 17 genes with ≤6 diabetes mutations were excluded from this study. Of the 1,091 mutations, 155 mutations in ABCC8, GCK, INS, or KCNJ11 are associated with neonatal diabetes, either transient or permanent. The other 936 mutations in the 7 genes are associated with MODY.
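The gene selection rule above (keep genes with more than 30 mutations, then tally phenotypes among the retained mutations) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the input is a hypothetical list of (gene, phenotype) pairs, one per HGMD mutation, and the helper name `select_genes` and the toy counts are invented for the example rather than taken from Table 1.

```python
from collections import Counter


def select_genes(records, min_count=31):
    """Keep only genes with at least `min_count` mutations.

    `records` is a hypothetical list of (gene, phenotype) tuples, one per
    mutation. The study kept genes with more than 30 mutations, so the
    default is 31.
    """
    per_gene = Counter(gene for gene, _ in records)
    kept = {gene for gene, n in per_gene.items() if n >= min_count}
    retained = [r for r in records if r[0] in kept]
    # Tally phenotypes (e.g. MODY vs. neonatal diabetes) among retained mutations.
    phenotypes = Counter(pheno for _, pheno in retained)
    return kept, retained, phenotypes


# Toy input: 35 mutations in one gene, 5 in another (hypothetical counts).
toy = [("GCK", "MODY")] * 33 + [("GCK", "neonatal")] * 2 + [("BLK", "MODY")] * 5
kept, retained, phenotypes = select_genes(toy)
print(sorted(kept), len(retained), phenotypes["neonatal"])
```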

Control nonsynonymous single-nucleotide mutations in the diabetes genes were obtained from the NHLBI GO Exome Sequencing Project (ESP) [33], [34], the ARIC samples [35] in the CHARGE Exome Sequencing Project [36], and the 1000 Genomes Project [37], excluding mutations recorded in the HGMD database.
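Assembling the control set amounts to pooling variants from the population projects and removing anything recorded in HGMD. The sketch below assumes, hypothetically, that each variant is keyed by a (chromosome, position, reference allele, alternate allele) tuple; the function name `build_controls` and the coordinates are made up for illustration and do not correspond to real variants.

```python
def build_controls(population_variants, hgmd_variants):
    """Return control variants: seen in population data but absent from HGMD.

    `population_variants` maps a source name to a list of variant keys.
    Taking the union across sources removes variants reported by more than
    one project; the set difference then drops HGMD-recorded mutations.
    """
    pooled = set()
    for variants in population_variants.values():
        pooled |= set(variants)
    return pooled - set(hgmd_variants)


# Hypothetical variant keys, for illustration only.
sources = {
    "ESP": [("7", 100, "C", "T"), ("7", 200, "G", "A")],
    "ARIC": [("7", 200, "G", "A")],
    "1000G": [("7", 300, "A", "G")],
}
hgmd = [("7", 100, "C", "T")]
controls = build_controls(sources, hgmd)
print(sorted(controls))
```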

Functional prediction of nonsynonymous single-nucleotide mutations

Eleven methods, namely PhyloP [38], GERP++ RS [39], SiPhy [40], SIFT [31], PolyPhen-2 [32], the likelihood ratio test (LRT) [41], MutationTaster [42], MutationAssessor [43], FATHMM [44], RadialSVM score [3], and logistic regression (LR) score [3], are covered by the dbNSFP database [45], [46] and were compared in this study (Table 2). Of the 1,091 mutations, 104 mutations in GCK, HNF1A, HNF1B, HNF4A, and INS are nonsense mutations, i.e. they produce a premature termination codon; two other mutations in GCK replace a termination codon with an amino acid codon. PolyPhen-2 HDIV, PolyPhen-2 HVAR, MutationAssessor, FATHMM, RadialSVM score, and LR score are not applicable to nonsense mutations, which are assumed to be highly damaging. The other methods, except MutationTaster, had higher error rates (false negative rates, FNRs) for nonsense mutations than for amino acid substitutions: SIFT FNR = 72%, GERP++ RS FNR = 40%, PhyloP FNR = 35%, SiPhy FNR = 26%, LRT FNR = 21%, and MutationTaster FNR = 3%. Because the functional effect of nonsense mutations is less of an issue to assess than that of amino acid substitutions, these 106 mutations were excluded, and the final analysis involved 985 nonsynonymous single-nucleotide mutations.

The quantitative performances of the methods were compared by Spearman's rank correlation test [47] and ANOVA using IBM SPSS Statistics 19 (IBM SPSS Inc., Chicago, IL, USA). To redefine gene-specific thresholds for deleterious mutations, receiver operating characteristic (ROC) analysis was performed using the sensitivity and specificity values obtained by screening a series of cutoffs for each method in each gene. The redefined threshold was the cutoff with the maximum Matthews correlation coefficient (MCC) [48].
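The threshold screen described above can be sketched as follows. This is an illustration, not the study's SPSS procedure: it assumes scores are oriented so that higher means more damaging (for a method such as SIFT, where lower scores indicate damage, the scores would be negated first), it uses every observed score as a candidate cutoff because the cutoff series actually screened is not specified, and it keeps the lowest cutoff when two cutoffs tie on MCC. The helper names and toy data are hypothetical.

```python
import math


def confusion(scores, labels, cutoff):
    """Count TP, FP, TN, FN when scores >= cutoff are called deleterious.

    labels: 1 = diabetes (HGMD) mutation, 0 = control mutation.
    """
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        called = s >= cutoff
        if called and y == 1:
            tp += 1
        elif called:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0 by convention if a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_threshold(scores, labels):
    """Screen each observed score as a cutoff.

    Returns (cutoff, MCC, sensitivity, specificity) for the cutoff with the
    highest MCC; on ties the lowest such cutoff is kept.
    """
    best = None
    for cutoff in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, cutoff)
        m = mcc(tp, fp, tn, fn)
        if best is None or m > best[1]:
            sens = tp / (tp + fn) if tp + fn else 0.0
            spec = tn / (tn + fp) if tn + fp else 0.0
            best = (cutoff, m, sens, spec)
    return best


# Toy scores for six mutations in one gene (hypothetical).
scores = [0.95, 0.9, 0.6, 0.5, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
cutoff, best_mcc, sens, spec = best_threshold(scores, labels)
print(cutoff, round(best_mcc, 3), sens, round(spec, 3))  # → 0.5 0.707 1.0 0.667
```

Each candidate cutoff corresponds to one (1 − specificity, sensitivity) point on the ROC curve; choosing by MCC rather than by sensitivity or specificity alone penalizes both kinds of error at once.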

Table 2. Methods for function prediction for non-synonymous mutations*.

https://doi.org/10.1371/journal.pone.0104452.t002

Results and Discussion

In our analysis, the prediction scores of the diabetes mutations by different methods were highly correlated (Table 3). The highest correlations were between RadialSVM score and LR score (ρ = 0.957), PolyPhen-2 HDIV and PolyPhen-2 HVAR (ρ = 0.89), and PhyloP and GERP++ RS (ρ = 0.871), while all other correlations had ρ < 0.80. Therefore, although the correlations between methods were highly statistically significant, the methods may not be interchangeable, except for the above three pairs. In particular, FATHMM showed no obvious correlation with PhyloP, GERP++ RS, or LRT, and a less significant correlation with MutationTaster. The methods also differed significantly in their performance in detecting deleterious mutations (Table 4). These prediction errors highlight the limitations of in silico methods and the need for caution when applying in silico predictions to data interpretation. FATHMM had the lowest false negative rate (FNR = 1%) but also the highest false positive rate (FPR = 95%) [Matthews correlation coefficient (MCC) = 0.127]. Given its lack of correlation with PhyloP, GERP++ RS, and LRT, as well as its high FPR and low MCC, FATHMM results should be interpreted with caution. The highest MCCs were seen for RadialSVM score (MCC = 0.474, FNR = 5%), PolyPhen-2 HDIV (MCC = 0.447, FNR = 9%), PolyPhen-2 HVAR (MCC = 0.434, FNR = 16%), and LR score (MCC = 0.393, FNR = 4%).
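For readers reproducing correlations like those in Table 3 outside SPSS, Spearman's ρ is the Pearson correlation of the two methods' rank-transformed scores, with tied scores given the average of their ranks. The stdlib-only sketch below assumes two equal-length lists of scores for the same mutations; it computes ρ only (no P value) and would divide by zero if either list is constant. In practice, `scipy.stats.spearmanr` returns the same statistic together with a P value.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical scores from two methods for the same four mutations.
rho = spearman_rho([0.1, 0.4, 0.4, 0.9], [1.0, 2.0, 3.0, 4.0])
print(round(rho, 4))  # → 0.9487
```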

Table 3. Correlations of different Methods for function prediction for non-synonymous mutations causing diabetes [Spearman's ρ (P value)].

https://doi.org/10.1371/journal.pone.0104452.t003

Table 4. Method comparisons for function prediction for non-synonymous mutations causing diabetes.

https://doi.org/10.1371/journal.pone.0104452.t004

We further found that the prediction scores of every method except SIFT differed significantly across genes (Table 5). These gene-to-gene differences highlight another limitation of these in silico methods. The distributions of prediction scores presented in Table 5 may serve as a reference matrix to assist the assessment of the functional effects of new mutations in these diabetes genes.
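One way to use a Table 5 style reference matrix is to place a new mutation's score within the distribution of known diabetes mutations in the same gene. The sketch below is a hypothetical illustration: the reference scores are invented, the quartiles use Python's default `statistics.quantiles` method (SPSS or other software may interpolate percentiles slightly differently), and the helper `percentile_rank` counts tied scores as half.

```python
import statistics
from bisect import bisect_left, bisect_right


def summarize(scores):
    """Summary in the layout used by Table 5: max / P75 / median / P25 / min."""
    q1, med, q3 = statistics.quantiles(scores, n=4)  # default 'exclusive' method
    return {"max": max(scores), "p75": q3, "median": med,
            "p25": q1, "min": min(scores)}


def percentile_rank(reference, new_score):
    """Percentage of reference scores below new_score, counting ties as half."""
    ref = sorted(reference)
    below = bisect_left(ref, new_score)
    equal = bisect_right(ref, new_score) - below
    return 100.0 * (below + 0.5 * equal) / len(ref)


# Hypothetical scores of known diabetes mutations in one gene for one method.
ref = [0.2, 0.5, 0.6, 0.8, 0.9, 0.95, 0.99]
print(summarize(ref))
print(round(percentile_rank(ref, 0.85), 1))  # → 57.1
```

A new variant scoring well below the gene's P25 would sit in a range rarely occupied by known disease mutations, which is the kind of comparison the reference matrix is meant to support.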

Table 5. Prediction score comparisons of diabetes mutations in different genes [Mean±Standard Deviation, N (Maximum/Percentile 75/Median/Percentile 25/Minimum)].

https://doi.org/10.1371/journal.pone.0104452.t005

The varied performance of these methods across genes, and the different score distributions of each method in different genes, suggest that gene-specific thresholds for deleterious mutations may improve the prediction performance of these in silico methods. For each gene, we screened cutoffs and identified the gene-specific threshold with the maximum MCC. Nonsynonymous single-nucleotide mutations in the diabetes genes from the NHLBI GO Exome Sequencing Project (ESP) [33], [34], the ARIC samples [35] in the CHARGE Exome Sequencing Project [36], and the 1000 Genomes Project [37] were used as controls, excluding mutations recorded in the HGMD database. As shown in Table S1, the redefined thresholds improved the prediction performance of each method in most cases; FATHMM was the exception because of its nil or low FNRs in these diabetes genes. For example, the FNR of GERP++ RS for HNF4A mutations and the FNR of LRT for HNF1B mutations decreased without any obvious change in their FPRs. Conversely, redefined thresholds decreased the FPRs of LRT for INS mutations, MutationTaster for ABCC8 mutations, LR score for INS mutations, LRT for ABCC8 mutations, MutationTaster for INS mutations, and MutationTaster for HNF1B mutations, without obviously increasing their FNRs. Table 6 summarizes the overall performance of the methods, sorted by MCC from low to high (from left to right and from top to bottom), and shows the average difference in MCC and the corresponding P value for each pair of methods.

Table 6. Comparisons of the performances of different methods by MCCs [Average difference (P value)].

https://doi.org/10.1371/journal.pone.0104452.t006

The varied performance of different methods in different genes is related to the specific molecular mechanisms of diabetes mutations. Of the 41 INS mutations in this study, 34 cause neonatal diabetes. These mutations cause diabetes by inducing misfolding of the insulin protein rather than by inactivating the gene [49], [50], so the dominant inheritance of the disease reflects a dominant-negative mechanism rather than haploinsufficiency. The misfolded insulin protein interferes with cellular processes, leading to severe endoplasmic reticulum stress and potentially β cell death by apoptosis [50]. In contrast, a heterozygous carrier of a single inactivating INS mutation may still retain a sufficient insulin response to metabolic demand and thus not develop neonatal diabetes. For predicting neonatal diabetes mutations in INS, a protein structure-based method may therefore perform better than others. In this study, PolyPhen-2, which uses structure-based predictive features, performed better than the more sequence-based SIFT (Table S1). Unlike mutations in other monogenic diabetes genes, the neonatal diabetes mutations in ABCC8 and KCNJ11 are gain-of-function mutations [51]. Sequence-based methods such as SIFT also performed worse than PolyPhen-2 for these mutations.

We acknowledge the current publication bias in diabetes mutation data (i.e. the bias toward identifying and reporting diabetes-causing mutations in the general human population). Diabetes mutations have been identified in studies involving much larger numbers of individuals, whereas the genome sequencing projects involved limited numbers of subjects. A disease-causing mutation, however rare, is included as soon as it is identified. For example, for GCK and HNF1A, the numbers of reported diabetes mutations are much larger than the numbers of control mutations (479 vs. 22 and 324 vs. 78, respectively). We also emphasize the use of gene-specific mutations as functionally neutral controls: our analysis showed that thresholds redefined using genome-wide control mutations, rather than gene-specific controls, tended to perform poorly (data available upon request). Achieving a satisfactory MCC tends to require large numbers of both diabetes mutations and functionally neutral mutations. The performance of the gene-specific prediction model proposed in this study is expected to improve further as sequencing data from larger numbers of individuals become available.
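The effect of the imbalance noted above (for example, 479 diabetes mutations vs. 22 controls in GCK) can be shown with a small calculation. The sketch below assumes a hypothetical predictor that calls every mutation deleterious: its accuracy looks high, but its MCC is 0 (by the usual convention when a marginal total is zero), which illustrates why MCC is more informative than raw accuracy for such unbalanced data.

```python
import math


def accuracy_and_mcc(tp, fp, tn, fn):
    """Return (accuracy, MCC) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
    return accuracy, mcc


# Class balance from the text (479 diabetes vs. 22 control mutations) and a
# hypothetical predictor that calls every mutation deleterious.
acc, m = accuracy_and_mcc(tp=479, fp=22, tn=0, fn=0)
print(f"accuracy={acc:.3f}, MCC={m:.3f}")  # → accuracy=0.956, MCC=0.000
```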

In conclusion, the available in silico methods for the prediction of diabetes mutations perform differently across genes. Although the correlations between methods are highly statistically significant, the methods may not be interchangeable. Because of these gene-to-gene differences, applying gene-specific thresholds where possible (i.e. for genes with enough identified disease mutations for ROC analysis to be feasible) may improve prediction performance. For genes without sufficient mutations for ROC analysis, a consensus threshold should be used [52]. Nevertheless, the limitations of these methods warrant the continued development of new methods. For example, Johansen et al. recently developed a sequence conservation-based artificial neural network predictor called NetDiseaseSNP [53]. Capriotti et al. developed the Meta-SNP algorithm for detecting disease-associated nsSNVs, which integrates four methods: PANTHER, PhD-SNP, SIFT, and SNAP. They showed that these methods are orthogonal, capturing different biologically relevant relationships, and that integrating them achieved higher accuracy [54].
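Applying the recommendation above might look like the following sketch: use a gene-specific cutoff where one has been derived, and fall back to a single generic cutoff otherwise. All cutoff values and names here are hypothetical placeholders, not values from Table S1, and the fallback merely stands in for whatever consensus threshold is chosen; it does not implement Condel [52].

```python
# Hypothetical per-gene cutoffs for one method (higher score = more damaging).
# Placeholder numbers only; they are not taken from Table S1.
GENE_CUTOFFS = {"GCK": 0.62, "HNF1A": 0.55}
FALLBACK_CUTOFF = 0.50  # stands in for a consensus threshold


def is_deleterious(gene, score, cutoffs=GENE_CUTOFFS, fallback=FALLBACK_CUTOFF,
                   higher_is_damaging=True):
    """Use the gene-specific cutoff when one exists, otherwise the fallback.

    Set higher_is_damaging=False for methods where lower scores indicate damage.
    """
    cutoff = cutoffs.get(gene, fallback)
    return score >= cutoff if higher_is_damaging else score <= cutoff


for gene, score in [("GCK", 0.60), ("HNF1A", 0.60), ("KLF11", 0.60)]:
    print(gene, is_deleterious(gene, score))
```

The same score of 0.60 is called benign for GCK but deleterious for HNF1A and for a gene without its own cutoff, which is the practical consequence of gene-specific thresholds.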

Supporting Information

Table S1.

Method comparisons for gene-specific function prediction for non-synonymous mutations causing diabetes.

https://doi.org/10.1371/journal.pone.0104452.s001

(XLS)

Acknowledgments

We apologize to all colleagues whose work could not be cited owing to space limitations.

Author Contributions

Conceived and designed the experiments: XL HQQ. Performed the experiments: QL XL HQQ. Analyzed the data: QL XL HQQ. Contributed reagents/materials/analysis tools: RAG EB CP. Contributed to the writing of the manuscript: QL XL CP HQQ.

References

  1. Thusberg J, Vihinen M (2009) Pathogenic or not? And if so, then how? Studying the effects of missense mutations using bioinformatics methods. Human mutation 30: 703–714.
  2. Thusberg J, Olatubosun A, Vihinen M (2011) Performance of mutation pathogenicity prediction methods on missense variants. Human mutation 32: 358–368.
  3. Dong C, Wei P, Jian X, Boerwinkle E, Wang K, et al. Comparison of functional prediction methods for nonsynonymous SNPs in exome sequencing studies of human diseases. Submitted
  4. Fajans SS, Bell GI, Polonsky KS (2001) Molecular Mechanisms and Clinical Pathophysiology of Maturity-Onset Diabetes of the Young. N Engl J Med 345: 971–980.
  5. American Diabetes Association (2007) Standards of Medical Care in Diabetes–2007. Diabetes Care 30: S4–41.
  6. Odom DT, Zizlsperger N, Gordon DB, Bell GW, Rinaldi NJ, et al. (2004) Control of pancreas and liver gene expression by HNF transcription factors. Science 303: 1378–1381.
  7. Matschinsky FM (1990) Glucokinase as glucose sensor and metabolic signal generator in pancreatic beta-cells and hepatocytes. Diabetes 39: 647–652.
  8. Thomas H, Jaschkowitz K, Bulman M, Frayling TM, Mitchell SM, et al. (2001) A distant upstream promoter of the HNF-4alpha gene connects the transcription factors involved in maturity-onset diabetes of the young. Hum Mol Genet 10: 2089–2097.
  9. Wang L, Coffinier C, Thomas MK, Gresh L, Eddu G, et al. (2004) Selective Deletion of the Hnf1{beta} (MODY5) Gene in {beta}-Cells Leads to Altered Gene Expression and Defective Insulin Release. Endocrinology 145: 3941–3949.
  10. Malecki MT, Jhala US, Antonellis A, Fields L, Doria A, et al. (1999) Mutations in NEUROD1 are associated with the development of type 2 diabetes mellitus. Nat Genet 23: 323–328.
  11. Neve B, Fernandez-Zapico ME, Ashkenazi-Katalan V, Dina C, Hamid YH, et al. (2005) Role of transcription factor KLF11 and its diabetes-associated gene variants in pancreatic beta cell function. Proc Natl Acad Sci U S A 102: 4807–4812.
  12. Raeder H, Johansson S, Holm PI, Haldorsen IS, Mas E, et al. (2006) Mutations in the CEL VNTR cause a syndrome of diabetes and pancreatic exocrine dysfunction. Nat Genet 38: 54–62.
  13. Plengvidhya N, Kooptiwut S, Songtawee N, Doi A, Furuta H, et al. (2007) PAX4 mutations in Thais with maturity onset diabetes of the young. J Clin Endocrinol Metab 92: 2821–2826.
  14. Haneda M, Chan SJ, Kwok SC, Rubenstein AH, Steiner DF (1983) Studies on mutant human insulin genes: identification and sequence analysis of a gene encoding [SerB24]insulin. Proc Natl Acad Sci U S A 80: 6366–6370.
  15. Borowiec M, Liew CW, Thompson R, Boonyasrisawat W, Hu J, et al. (2009) Mutations at the BLK locus linked to maturity onset diabetes of the young and beta-cell dysfunction. Proc Natl Acad Sci U S A 106: 14460–14465.
  16. Ashcroft FM, Rorsman P (1989) Electrophysiology of the pancreatic beta-cell. Prog Biophys Mol Biol 54: 87–143.
  17. Ashcroft FM, Gribble FM (1998) Correlating structure and function in ATP-sensitive K+ channels. Trends Neurosci 21: 288–294.
  18. Pearson ER, Liddell WG, Shepherd M, Corrall RJ, Hattersley AT (2000) Sensitivity to sulphonylureas in patients with hepatocyte nuclear factor-1alpha gene mutations: evidence for pharmacogenetics in diabetes. Diabet Med 17: 543–545.
  19. Frayling TM, Evans JC, Bulman MP, Pearson E, Allen L, et al. (2001) beta-cell genes and diabetes: molecular and clinical characterization of mutations in transcription factors. Diabetes 50 Suppl 1: S94–100.
  20. Feigerlova E, Pruhova S, Dittertova L, Lebl J, Pinterova D, et al. (2006) Aetiological heterogeneity of asymptomatic hyperglycaemia in children and adolescents. Eur J Pediatr 165: 446–452.
  21. Ellard S, Beards F, Allen LI, Shepherd M, Ballantyne E, et al. (2000) A high prevalence of glucokinase mutations in gestational diabetic subjects selected by clinical criteria. Diabetologia 43: 250–253.
  22. Velho G, Froguel P, Gloyn A, Hattersley A (2004) Maturity onset diabetes of the young type 2.
  23. Martin D, Bellanné-Chantelot C, Deschamps I, Froguel P, Robert J-J, et al. (2008) Long-Term Follow-Up of Oral Glucose Tolerance Test–Derived Glucose Tolerance and Insulin Secretion and Insulin Sensitivity Indexes in Subjects With Glucokinase Mutations (MODY2). Diabetes Care 31: 1321–1323.
  24. Miura A, Yamagata K, Kakei M, Hatakeyama H, Takahashi N, et al. (2006) Hepatocyte nuclear factor-4alpha is essential for glucose-stimulated insulin secretion by pancreatic beta-cells. Journal of Biological Chemistry 281: 5246–5257.
  25. Menzel R, Kaisaki PJ, Rjasanowski I, Heinke P, Kerner W, et al. (1998) A low renal threshold for glucose in diabetic patients with a mutation in the hepatocyte nuclear factor-1alpha (HNF-1alpha) gene. Diabet Med 15: 816–820.
  26. Shepherd M, Shields B, Ellard S, Rubio-Cabezas O, Hattersley A (2009) A genetic diagnosis of HNF1A diabetes alters treatment and improves glycaemic control in the majority of insulin-treated patients. Diabetic Medicine 26: 437–441.
  27. Ellard S, Bellanne-Chantelot C, Hattersley AT (2008) Best practice guidelines for the molecular genetic diagnosis of maturity-onset diabetes of the young. Diabetologia 51: 546–553.
  28. Qu HQ, Li Q, Lu Y, Fisher-Hoch SP, McCormick JB (2012) Diabetes related DNA mutations in Americans of Mexican Origin with Health Disparities Disclosed by NextGen Sequencing. The American Society of Human Genetics 2012 Annual Meeting Available: http://www.ashg.org/2012meeting/abstracts/fulltext/f120120262.htm. Accessed 24 July 2014.
  29. Shields B, McDonald T, Ellard S, Campbell M, Hyde C, et al. (2012) The development and validation of a clinical prediction model to determine the probability of MODY in patients with young-onset diabetes. Diabetologia 55: 1265–1272.
  30. Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, et al. (2003) Human gene mutation database (HGMD): 2003 update. Human mutation 21: 577–581.
  31. Ng PC, Henikoff S (2003) SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 31: 3812–3814.
  32. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, et al. (2010) A method and server for predicting damaging missense mutations. Nat Methods 7: 248–249.
  33. Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, et al. (2012) Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337: 64–69.
  34. Fu W, O'Connor TD, Jun G, Kang HM, Abecasis G, et al. (2013) Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493: 216–220.
  35. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. The ARIC investigators. Am J Epidemiol 129: 687–702.
  36. Morrison AC, Voorman A, Johnson AD, Liu X, Yu J, et al. (2013) Whole-genome sequence–based analysis of high-density lipoprotein cholesterol. Nature 201: 3.
  37. An integrated map of genetic variation from 1,092 human genomes. Nature 491: 56–65.
  38. Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A (2010) Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res 20: 110–121.
  39. Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, et al. (2010) Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS computational biology 6: e1001025.
  40. Garber M, Guttman M, Clamp M, Zody MC, Friedman N, et al. (2009) Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 25: i54–i62.
  41. Chun S, Fay JC (2009) Identification of deleterious mutations within three human genomes. Genome Research 19: 1553–1561.
  42. Schwarz JM, Rodelsperger C, Schuelke M, Seelow D (2010) MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods 7: 575–576.
  43. Reva B, Antipin Y, Sander C (2011) Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res 39: e118.
  44. Shihab HA, Gough J, Cooper DN, Day IN, Gaunt TR (2013) Predicting the functional consequences of cancer-associated amino acid substitutions. Bioinformatics 29: 1504–1510.
  45. Liu X, Jian X, Boerwinkle E (2011) dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat 32: 894–899.
  46. Liu X, Jian X, Boerwinkle E (2013) dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum Mutat 34: E2393–2402.
  47. Wang L-L, Liu Y-H, Meng L-L, Li CG, Zhou S-F (2011) Phenotype Prediction of Non-Synonymous Single-Nucleotide Polymorphisms in Human ATP-Binding Cassette Transporter Genes. Basic & Clinical Pharmacology & Toxicology 108: 94–114.
  48. Qu HQ, Li Q, Rentfro AR, Fisher-Hoch SP, McCormick JB (2011) The definition of insulin resistance using HOMA-IR for Americans of Mexican descent using machine learning. PLoS One 6: e21041.
  49. Colombo C, Porzio O, Liu M, Massa O, Vasta M, et al. (2008) Seven mutations in the human insulin gene linked to permanent neonatal/infancy-onset diabetes mellitus. The Journal of clinical investigation 118: 2148.
  50. Støy J, Edghill EL, Flanagan SE, Ye H, Paz VP, et al. (2007) Insulin gene mutations as a cause of permanent neonatal diabetes. Proceedings of the National Academy of Sciences 104: 15040–15044.
  51. Edghill EL, Flanagan SE, Ellard S (2010) Permanent neonatal diabetes due to activating mutations in ABCC8 and KCNJ11. Reviews in endocrine and metabolic disorders 11: 193–198.
  52. Gonzalez-Perez A, Lopez-Bigas N (2011) Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel. Am J Hum Genet 88: 440–449.
  53. Johansen MB, Izarzugaza JM, Brunak S, Petersen TN, Gupta R (2013) Prediction of disease causing non-synonymous SNPs by the Artificial Neural Network Predictor NetDiseaseSNP. PloS one 8: e68370.
  54. Capriotti E, Altman RB, Bromberg Y (2013) Collective judgment predicts disease-associated single nucleotide variants. BMC Genomics 14 Suppl 3: S2.
  55. Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, et al. (2004) Aligning multiple genomic sequences with the threaded blockset aligner. Genome Research 14: 708–715.