Sibling method increases risk assessment estimates for type 1 diabetes

  • Hoang V. Lam

    Contributed equally to this work with: Hoang V. Lam, Dat T. Nguyen

    Affiliation Department of Endocrinology, Cho Ray Hospital, Ho Chi Minh City, Vietnam

  • Dat T. Nguyen

    Contributed equally to this work with: Hoang V. Lam, Dat T. Nguyen

    Affiliation Department of Science and Technology, Hoa Sen University, Ho Chi Minh City, Vietnam

  • Cao D. Nguyen

    cao.nguyen@health.wa.gov.au

    Affiliations Department of Business and Information System, Economics University, Ho Chi Minh City, Vietnam, Clinical Analysis and Modelling—Department of Health, Western Australia, Australia

Abstract

We present a risk assessment model that distinguishes between siblings affected and unaffected by type 1 diabetes (T1D) using only three single nucleotide polymorphism (SNP) genotypes. In addition, we calculated the heritability from genome-wide identity-by-descent (IBD) sharing between full siblings. We analyzed 1,253 pairs of affected individuals and their unaffected siblings (750 pairs from a discovery set and 503 pairs from a validation set) from the T1D Genetics Consortium (T1DGC), fitting a logistic regression model and evaluating it by the area under the receiver operating characteristic (ROC) curve (AUC). To calculate the heritability of T1D, we used the Haseman-Elston regression of the squared difference between the phenotypes of each pair of siblings on the estimate of their genome-wide IBD proportion. The model with only 3 SNPs achieved an AUC of 0.75 in both datasets, outperforming a model based on the presence of the high-risk DR3/4 HLA genotype (AUC of 0.60). The heritability of T1D on the liability scale ranged from approximately 0.53 to 0.92, close to the results obtained from twin studies, which range from 0.4 to 0.88.

Introduction

One of the main reasons for disease gene identification is to provide the ability to identify people who are at risk of disease. Thus, a central question for the field is whether validated marker data can be used to discriminate effectively between cases and controls. However, even markers with replicated, highly significant odds ratios may be poor classifiers, and most variants identified so far confer only small increments in risk and explain only a small proportion of phenotypic diversity [1]. T1D is a major chronic childhood disease caused by a combination of genetic and environmental influences. Genome-wide association studies (GWAS) have found over 60 genes that affect the risk of the disease, with the HLA loci having the greatest impact on susceptibility (reviewed in [2, 3]). However, the AUC for risk prediction using multiple identified variants ranges from 0.65 to 0.68 for T1D (see ref. [4] for more details), despite the fact that T1D has a very strong familial component, with heritability estimates from 0.4 to 0.8 [5–8].

The association of T1D with alleles at HLA loci, especially the HLA class II genes DR and DQ, is well validated [9]. The highest risk is seen in individuals who are heterozygous for the HLA-DRB1*03 and HLA-DRB1*04 types. HLA allele typing assists in determining risk for T1D and in studies of the pathogenesis of T1D. It is particularly useful in prevention and intervention trials that test potential preventative treatments in high-risk subjects [10]. However, the high cost of HLA genotyping is a major burden on such large-scale programs and is beyond the reach of smaller research groups.

In this study, we present a cost-effective predictive model that can distinguish T1D status in siblings from multiplex families. The model can be applied at birth for early prediction and prevention. By enabling early intervention, the 3-SNP model could help reduce mortality, morbidity and public health costs.

Materials and methods

T1D datasets

We used subjects from the Type 1 Diabetes Genetics Consortium (T1DGC) [11]. A subject was labelled as affected if the subject had documented T1D with onset at 37 years old, had used insulin within 6 months of diagnosis, and had no concomitant disease or disorder associated with diabetes. Most subjects came from families where more than one child was affected, and genotyping and clinical data were also collected for parents and unaffected sibs.

For each family, we randomly selected an affected subject to form a dataset, namely, probands. Next, sibs were selected from each family and were paired together with the probands to create 1,253 pairs of proband-sib. Then we randomly split the 1,253 pairs into two datasets, namely, a “discovery” dataset of 750 pairs and a “validation” dataset of 503 pairs, subject to the equal proportion of case vs control in each dataset.
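The split described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the pair identifiers and sib statuses are made up, and stratifying on the sib's affection status is our assumption about how "equal proportion of case vs control" was enforced.

```python
import random

def stratified_split(pairs, fraction, seed=0):
    """Split (pair_id, sib_affected) tuples so both parts keep a similar case/control mix."""
    rng = random.Random(seed)
    first, second = [], []
    for status in (True, False):
        stratum = [p for p in pairs if p[1] == status]
        rng.shuffle(stratum)
        cut = round(len(stratum) * fraction)
        first.extend(stratum[:cut])
        second.extend(stratum[cut:])
    return first, second

# Hypothetical data: 1,253 proband-sib pairs with an arbitrary sib status.
pairs = [(f"pair_{i}", i % 3 == 0) for i in range(1253)]
discovery, validation = stratified_split(pairs, fraction=750 / 1253)
print(len(discovery), len(validation))
```

Fixing the seed keeps the split reproducible, which matters when the validation set is meant to stay untouched during model fitting.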

Predictors

We have recently presented a 3-SNP set, namely rs2854275, rs3104413 and rs9273363, that can rapidly define the HLA-DR and HLA-DQ types relevant to T1D (see our methods in [12]). We used these SNPs, genotyped in the probands as well as in their sibs, to predict at birth the risk that a new sibling in a multiplex family will develop T1D.

Risk assessment model

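We used a logistic regression model to construct risk prediction models [13]. This method finds the logistic curve that best predicts the risk of disease, P = yes, on the basis of the continuous or categorical independent variables of an observation G = (g1, …, gn), formally:

$$\Pr(P = \text{yes} \mid G) = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{j=1}^{n} \beta_j g_j\right)\right)} \qquad (1)$$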

Logistic regression uses an approach called maximum likelihood estimation to estimate regression coefficients. In this case, the predictor G is a 9-dimension vector, including the 3 SNPs genotyped from a proband, the corresponding 3 SNPs genotyped from a sib and 3 binary indicators showing whether or not a genotype from the proband is equal to the corresponding genotype from the sib.
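To make the 9-element predictor concrete, the sketch below builds it for one proband-sib pair and passes it through a logistic function. The 0/1/2 genotype coding, the function names and the coefficient values are all assumptions for illustration; the fitted coefficients of the study are not given in the text.

```python
import math

def pair_features(proband, sib):
    """Build the 9-element predictor: 3 proband genotypes, 3 sib genotypes, 3 match flags."""
    if len(proband) != 3 or len(sib) != 3:
        raise ValueError("expected three SNP genotypes per subject")
    matches = [int(p == s) for p, s in zip(proband, sib)]
    return list(proband) + list(sib) + matches

def logistic_risk(features, intercept, coefficients):
    """Logistic model: 1 / (1 + exp(-(b0 + sum_j b_j * g_j)))."""
    score = intercept + sum(b * g for b, g in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-score))

# Genotypes coded as 0/1/2 allele counts (assumed coding), in the order
# rs2854275, rs3104413, rs9273363.
g = pair_features(proband=(1, 2, 0), sib=(1, 0, 0))
print(g)  # [1, 2, 0, 1, 0, 0, 1, 0, 1]

# Placeholder coefficients, NOT fitted values from the study.
risk = logistic_risk(g, intercept=-1.0, coefficients=[0.2] * 9)
print(round(risk, 3))
```

In practice the coefficients would come from maximum likelihood fitting on the discovery set rather than being set by hand.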

We measured the discriminative accuracy of the predictive models using receiver operating characteristic (ROC) analyses [14,15]. The ROC curve plots the relationship between the true positive rate (TPR, or sensitivity) and the false positive rate (FPR, or 1-specificity) across all possible threshold values that define the disease. The area under the ROC curve (AUC) is the probability that a randomly chosen case will have a higher estimated risk of developing the disease than a randomly chosen control. For an informative model, the AUC ranges from 0.5 (no better than chance) to 1 (perfect discrimination), where a higher number implies better discrimination between cases and controls. One important feature of the AUC is that it does not depend on the number of cases or controls tested, as described in ref. [16].
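The probabilistic definition of the AUC can be computed directly by comparing every case with every control. The sketch below does this on made-up risk scores (not study data) and also shows that replicating the controls leaves the AUC unchanged.

```python
def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical risk scores for illustration only.
cases = [0.9, 0.7, 0.6, 0.4]
controls = [0.8, 0.5, 0.3, 0.2]
print(auc(cases, controls))      # 0.75
print(auc(cases, controls * 3))  # 0.75, unchanged with three times as many controls
```

The pairwise loop is quadratic in sample size; for large datasets a rank-based formulation gives the same value more cheaply.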

Heritability

Heritability of disease traits is formally defined as the proportion of phenotypic variance in a population attributable to additive genetic factors [17]. Traditionally, heritability is estimated on the basis of parent-offspring correlations for continuous traits, or from the ratio of the incidence in first-degree relatives of affected persons to the incidence in first-degree relatives of unaffected persons [5]. In this study, we used the Haseman-Elston regression analysis [18], a simple estimation procedure for n sib pairs in which the squared difference between the phenotypes $P_{i1}$ and $P_{i2}$ of the ith pair of siblings is regressed on the estimate of their genome-wide IBD proportion $\hat{\pi}_i$, formally:

$$(P_{i1} - P_{i2})^2 = \alpha + \beta \hat{\pi}_i + \varepsilon_i \qquad (2)$$

Because T1D subjects were recruited from families worldwide [2,3], equation (2) is adjusted for stratification using a linear mixed model:

$$(P_{i1} - P_{i2})^2 = \alpha + \beta \hat{\pi}_i + u + \varepsilon_i \qquad (3)$$

where u is the random effect of the region from which the T1D samples were drawn. The fixed-effect coefficient β is estimated using the lme4 package [19] after determining that the genome-wide IBD values were normally distributed. We assume that the parents are not inbred, so an estimate of the narrow-sense heritability is simply:

$$\hat{h}^2 = -\frac{\hat{\beta}}{2\hat{\sigma}_P^2} \qquad (4)$$

where $\hat{\sigma}_P^2$ is an estimate of the total phenotypic variance.
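The core of the Haseman-Elston estimate can be sketched with ordinary least squares, assuming the standard result that the regression slope equals minus twice the additive genetic variance, so that heritability is the negative slope divided by twice the phenotypic variance. This sketch omits the regional random effect (the authors fitted a mixed model with lme4 in R), uses a simulated continuous trait rather than a binary disease status, and draws IBD proportions from a deliberately wide range so the slope is estimable with a modest sample. All names are hypothetical.

```python
import math
import random
import statistics

def haseman_elston_h2(pheno_pairs, ibd):
    """OLS of squared sib-pair phenotype differences on IBD sharing; returns (slope, h2)."""
    sq_diff = [(a - b) ** 2 for a, b in pheno_pairs]
    mean_x = statistics.fmean(ibd)
    mean_y = statistics.fmean(sq_diff)
    sxx = sum((x - mean_x) ** 2 for x in ibd)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ibd, sq_diff))
    slope = sxy / sxx
    # Total phenotypic variance, pooled over every sibling in the sample.
    var_p = statistics.pvariance([p for pair in pheno_pairs for p in pair])
    return slope, -slope / (2.0 * var_p)

def simulate_pairs(n, h2, seed=0):
    """Simulate unit-variance phenotypes whose additive parts correlate at the pair's IBD value."""
    rng = random.Random(seed)
    pheno_pairs, ibd = [], []
    for _ in range(n):
        pi = rng.uniform(0.2, 0.8)  # exaggerated spread, for illustration only
        shared = rng.gauss(0, 1)
        pair = []
        for _ in range(2):
            additive = math.sqrt(pi) * shared + math.sqrt(1 - pi) * rng.gauss(0, 1)
            pair.append(math.sqrt(h2) * additive + math.sqrt(1 - h2) * rng.gauss(0, 1))
        pheno_pairs.append(tuple(pair))
        ibd.append(pi)
    return pheno_pairs, ibd

pheno_pairs, ibd = simulate_pairs(n=20000, h2=0.8)
slope, h2_hat = haseman_elston_h2(pheno_pairs, ibd)
print(f"slope = {slope:.2f}, h2 = {h2_hat:.2f}")
```

With the narrow IBD spread seen in real full sibs, the slope is estimated much less precisely, which is one reason resampling-based standard errors are useful for this method.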

To account for ascertainment, which generates a much higher proportion of cases in our analyzed samples than in the population, the heritability estimated on the observed scale in the case-control sample, $h_O^2$, was transformed to the liability scale as [17,20]:

$$h_L^2 = h_O^2 \, \frac{K(1-K)}{z^2} \, \frac{K(1-K)}{P(1-P)} \qquad (5)$$

where K is the population prevalence of T1D, P is the proportion of cases in the sample and z is the height of the standard normal probability density function at the truncation threshold $T = \Phi^{-1}(1-K)$.
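The observed-to-liability transformation can be sketched as below, assuming the form given by Lee et al. [20]: the observed-scale heritability is multiplied by K(1-K)/z² and by K(1-K)/(P(1-P)). The function name and the input values are hypothetical and are not taken from Table 1.

```python
from statistics import NormalDist

def liability_scale_h2(h2_observed, prevalence, case_fraction):
    """Convert observed-scale heritability from an ascertained sample to the liability scale."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # T = Phi^{-1}(1 - K)
    z = std_normal.pdf(threshold)                     # normal density height at T
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1 - k) / z ** 2) * (k * (1 - k) / (p * (1 - p)))

# Conversion factor for an assumed prevalence of 1% and a balanced sample (P = 0.5).
factor = liability_scale_h2(1.0, prevalence=0.01, case_fraction=0.5)
print(round(factor, 3))
```

Because the factor depends strongly on K, reporting the liability-scale estimate for several plausible prevalences, as the text does, is a sensible way to show that sensitivity.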

Results

IBD estimate

IBD probabilities were calculated using Plink [21]. Fig 1 shows the distribution of the genome-wide additive coefficients. The average proportion of the genome-shared IBD between the sib pairs (the coefficient of additive genetic variance) was 0.516 (standard deviation 0.056), with a range of 0.226–0.715.
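For readers reproducing this step, the genome-wide IBD proportion of a pair can be derived from the per-pair probabilities of sharing one or two alleles IBD, which PLINK reports as Z1 and Z2 alongside the combined value PI_HAT. The sketch below assumes that the quantity summarized here corresponds to PI_HAT; the three pairs are made-up values.

```python
import statistics

def ibd_proportion(z1, z2):
    """Expected proportion of the genome shared IBD: 0.5 * P(IBD=1) + P(IBD=2)."""
    return 0.5 * z1 + z2

# Hypothetical (Z1, Z2) estimates for three sib pairs.
pairs = [(0.50, 0.25), (0.46, 0.31), (0.52, 0.20)]
pi_hat = [ibd_proportion(z1, z2) for z1, z2 in pairs]
mean_pi = statistics.fmean(pi_hat)
sd_pi = statistics.stdev(pi_hat)
print(round(mean_pi, 3), round(sd_pi, 3))
```

Full siblings are expected to share half of their genome IBD on average, so a sample mean close to 0.5 is a useful sanity check on the pedigree data.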

Fig 1. Histogram of the genome-wide additive genetic relationships of full-sib pairs estimated from genetic markers using Plink.

https://doi.org/10.1371/journal.pone.0176341.g001

Heritability

We used a non-parametric bootstrap method [22] to calculate the standard error of the heritability. We divided the sib-pair dataset into three subsets, namely affected-affected, affected-unaffected and unaffected-unaffected sib-pairs. From each subset, we randomly selected 300 sib-pairs with replacement to construct a new dataset of 900 samples in which the proportion of cases vs. controls was always fixed at 0.5. We repeated this bootstrap routine 10,000 times to generate 10,000 new datasets. Table 1 shows that the overall h2L of T1D ranges from 0.53 to 0.92, depending on the setting of the T1D prevalence K. Our heritability estimates using sib-pair methods are close to the results obtained from twin studies, which range from 0.4 to 0.88 [5–8]. The R program and IBD data are available in the Supporting Information files.
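The resampling scheme can be sketched as follows. This is an illustration rather than the authors' R program: the estimator here is a simple mean of hypothetical IBD values standing in for the heritability calculation, and the number of replicates is reduced from 10,000 to keep the example fast.

```python
import random
import statistics

def stratified_bootstrap(strata, estimator, n_per_stratum=300, n_boot=10000, seed=0):
    """Resample each stratum with replacement; return (mean, standard error) of the estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = []
        for pairs in strata:
            # rng.choices draws with replacement, so a pair can appear more than once.
            sample.extend(rng.choices(pairs, k=n_per_stratum))
        estimates.append(estimator(sample))
    return statistics.fmean(estimates), statistics.stdev(estimates)

# Hypothetical IBD proportions for the affected-affected, affected-unaffected
# and unaffected-unaffected subsets.
strata = [
    [0.55, 0.57, 0.53, 0.56],
    [0.48, 0.50, 0.52, 0.49],
    [0.51, 0.50, 0.53, 0.52],
]
mean_est, std_err = stratified_bootstrap(strata, statistics.fmean, n_boot=200)
print(round(mean_est, 3), std_err > 0)
```

Drawing the same number of pairs from each subset is what keeps the case proportion fixed across replicates, so the variation in the estimates reflects sampling of pairs rather than shifts in the case/control mix.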

Table 1. Estimates of the heritability of T1D on the liability scale (h2L) obtained using Haseman-Elston regression analysis.

https://doi.org/10.1371/journal.pone.0176341.t001

AUC

The AUC and the corresponding 95% CIs for the sib-pair logistic regression model obtained in the discovery and validation sets are shown in Table 2. The AUC for the model was 0.75 (95% CI, 0.72–0.77) in the discovery set, and when applied to the validation set, the AUC was also 0.75 (95% CI, 0.72–0.78). Thus, the model revealed consistency between the discovery and the validation sets. The overall AUCs in both datasets are far better than those of the model using only the presence / absence of the high risk HLA-DR3/4, namely AUC of 0.61 (s.e. 0.014–0.018). Fig 2 shows the ROC analyses from the sib-pair logistic regression model applied on the two datasets.

Fig 2. ROC analyses from logistic regression models on A) the T1D discovery set, n = 750 pairs (1,500 samples), and B) the T1D validation set, n = 503 pairs (1,006 samples). Each point represents a test defined by a different logit score.

https://doi.org/10.1371/journal.pone.0176341.g002

Table 2. AUC on T1D discovery and validation datasets generated from logistic regression models.

https://doi.org/10.1371/journal.pone.0176341.t002

Discussion

In the past 15 years, the genetics of common human diseases has been transformed by GWAS. These studies have been a powerful approach to identifying the genes involved in complex diseases and have led to the development of predictive genetic tests. Tests that use SNPs to predict an individual’s future risk of disease are among the most appealing methods for early disease prediction. Such tests can be conducted at birth and, combined with appropriate prevention strategies, may prevent individuals from developing a disease. These tests have the potential to become a cornerstone of epidemiology and are anticipated to have a large impact on health care (see further reviews in refs [23–25]).

It is important to note that the ROC curve is a simple and convenient overall measure of diagnostic test accuracy and does not depend on the prevalence of disease in the actual population. However, to measure the performance of a prediction model in clinical settings, the positive predictive value (PPV) and negative predictive value (NPV), which incorporate the disease prevalence in the testing population, are necessary. The PPV is the proportion of subjects who test positive who actually have the disease, and the NPV is the proportion of subjects who test negative who are actually free of the disease. Note that, like sensitivity and specificity, the positive and negative predictive values depend on the risk score threshold T (the logit score in this study). When a disease is rare, like T1D, the threshold should be selected toward the lower left portion of the ROC curve, where the sensitivity is low but the specificity is high [13]. For example, when siblings of affected probands are screened, the proposed 3-SNP model achieves PPVs of 2.7% and 7.8% and NPVs of 99.3% and 97.9% for prevalences of 1% and 3%, respectively. In this case, screening for a low-prevalence disease such as T1D is cost effective because the cost of screening is less than the cost of care if the disease is not detected before onset.

T1D, an autoimmune disease resulting from immune-mediated destruction of the insulin-producing β-islet cells of the pancreas, causes substantial morbidity and mortality and requires life-long insulin treatment. By the time the disease is detected clinically, the β-cells are almost completely destroyed, and no known treatment can restore them.
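The dependence of PPV and NPV on prevalence follows from Bayes' rule and can be sketched as below. The operating point (sensitivity 0.405, specificity 0.853) is not reported in the text; it is an illustrative value we chose because it approximately reproduces the predictive values quoted in the Discussion.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    tp = sensitivity * prevalence                  # true positives per screened subject
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false positives
    tn = specificity * (1.0 - prevalence)          # true negatives
    fn = (1.0 - sensitivity) * prevalence          # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# Assumed operating point, chosen for illustration only.
SENS, SPEC = 0.405, 0.853
for prevalence in (0.01, 0.03):
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Even with high specificity, the PPV stays low when the disease is rare, because false positives from the large unaffected group outnumber true positives; the NPV, by contrast, remains high.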

Even though preventing or curing type 1 diabetes in at-risk subjects remains elusive despite sustained effort and substantial investment in industrialized countries [26,27], diabetes prevention research has developed rapidly in recent years. In addition to current approaches to preventing type 1 diabetes, such as metabolic modifications, antigen-specific vaccination, pancreatic transplantation, stimulation of β-cell regeneration, or avoidance of environmental triggers of islet autoimmunity [28], advances in stem cell biology, cell encapsulation methodologies, and immunotherapy are expected to benefit patients in the long run [27,29,30]. Importantly, if young children at risk of early-onset diabetes could be identified, screening high-risk individuals could delay or even avoid the short-term as well as long-term complications of type 1 diabetes. In the short term, monitoring blood sugar can prevent new-onset diabetic ketoacidosis, the most severe acute diabetes-related central nervous system complication in young patients [31]. Early monitoring and modification of insulin sensitivity can also slow diabetic nephropathy, one of the major causes of morbidity and mortality in type 1 diabetes [32,33]. The mortality and morbidity of heart disease are significantly higher in type 1 diabetes patients than in the nondiabetic population. An early intervention to bring glycaemia as close to normal as possible could alleviate or delay the cardiovascular complications of diabetes in high-risk patients [34,35]. Several international diabetes prevention projects, such as DIPP [36], TEDDY [37], TRIGR [38] and TrialNet [39], have screened and monitored thousands of newborn infants for HLA-DQB1 alleles associated with susceptibility to type 1 diabetes. As shown in the results, our proposed method is more accurate than a model based on the presence of the high-risk HLA-DR3/4 genotype and much cheaper than typical HLA typing.

Genetic factors play a significant role in T1D, as indicated by the proportion of explained variance (h2). Because the heritability estimates suggest that genetic factors explain up to ~90% of the phenotypic variance in T1D, GWAS-based predictions could be significantly improved by incorporating many other factors. These include rare variants, structural variants, interactions between genes and environmental factors, non-linear gene-gene interactions, family history conditional on genotype at known loci, and signals in non-coding regions [20,23–25].

Supporting information

S1 File. Histogram R.

R program for producing histogram of IBD data.

https://doi.org/10.1371/journal.pone.0176341.s001

(R)

S2 File. Heritability R.

R program for calculating t1d h2 using Haseman-Elston regression analysis.

https://doi.org/10.1371/journal.pone.0176341.s002

(R)

S3 File. Sibling IBD data.

Plink IBD probabilities for all sib-pairs studied in this project.

https://doi.org/10.1371/journal.pone.0176341.s003

(TXT)

S4 File. Sibling IBD data.

Plink IBD probabilities for non-affected vs non-affected sib pairs.

https://doi.org/10.1371/journal.pone.0176341.s004

(TXT)

S5 File. Sibling IBD data.

Plink IBD probabilities for non-affected vs affected sib pairs.

https://doi.org/10.1371/journal.pone.0176341.s005

(TXT)

S6 File. Sibling IBD data.

Plink IBD probabilities for affected vs affected sib pairs.

https://doi.org/10.1371/journal.pone.0176341.s006

(TXT)

Acknowledgments

We are very grateful to Dr. Grant Morahan, Centre for Diabetes Research, University of Western Australia, for his invaluable comments and insightful input, and we sincerely thank the anonymous reviewers for their time and effort in improving our manuscript.

Author Contributions

  1. Conceptualization: CN HL.
  2. Data curation: CN.
  3. Formal analysis: HL DN CN.
  4. Investigation: HL DN CN.
  5. Methodology: HL DN CN.
  6. Project administration: HL DN.
  7. Resources: DN CN.
  8. Software: DN CN.
  9. Supervision: HL CN.
  10. Validation: DN.
  11. Visualization: DN.
  12. Writing – original draft: HL DN CN.
  13. Writing – review & editing: HL DN CN.

References

  1. Jakobsdottir J, Gorin MB, Conley YP, Ferrell RE, Weeks DE. Interpretation of genetic association studies: markers with replicated highly significant odds ratios may be poor classifiers. PLoS Genet. 2009 Feb 6;5(2):e1000337. pmid:19197355
  2. Morahan G, Varney M. The Genetics of Type 1 Diabetes. In: The HLA Complex in Biology and Medicine A Resource Book. New Delhi: JayPee Brothers Publishing; 2010:205–18.
  3. Morahan G. Insights into type 1 diabetes provided by genetic analyses. Current Opinion in Endocrinology, Diabetes and Obesity. 2012 Aug 1;19(4):263–70.
  4. Wei Z, Wang K, Qu HQ, Zhang H, Bradfield J, Kim C, et al. From disease association to risk assessment: an optimistic view from genome-wide association studies on type 1 diabetes. PLoS Genet. 2009 Oct 9;5(10):e1000678. pmid:19816555
  5. Reich T, James JW, Morris CA. The use of multiple thresholds in determining the mode of transmission of semi‐continuous traits. Annals of human genetics. 1972 Nov 1;36(2):163–84. pmid:4676360
  6. Kyvik KO, Green A, Beck-Nielsen H. Concordance rates of insulin dependent diabetes mellitus: a population based study of young Danish twins. Bmj. 1995 Oct 7;311(7010):913–7. pmid:7580548
  7. Kaprio J, Tuomilehto J, Koskenvuo M, Romanov K, Reunanen A, Eriksson J, et al. Concordance for type 1 (insulin-dependent) and type 2 (non-insulin-dependent) diabetes mellitus in a population-based cohort of twins in Finland. Diabetologia. 1992 Nov 1;35(11):1060–7. pmid:1473616
  8. Hyttinen V, Kaprio J, Kinnunen L, Koskenvuo M, Tuomilehto J. Genetic liability of type 1 diabetes and the onset age among 22,650 young Finnish twin pairs. Diabetes. 2003 Apr 1;52(4):1052–5. pmid:12663480
  9. Atkinson MA, Eisenbarth GS. Type 1 diabetes: new perspectives on disease pathogenesis and treatment. The Lancet. 2001 Jul 21;358(9277):221–9.
  10. Van Belle TL, Coppieters KT, Von Herrath MG. Type 1 diabetes: etiology, immunology, and therapeutic strategies. Physiological reviews. 2011 Jan 1;91(1):79–118. pmid:21248163
  11. Barrett JC, Clayton DG, Concannon P, Akolkar B, Cooper JD, Erlich HA, et al. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nature genetics. 2009 Jun 1;41(6):703–7. pmid:19430480
  12. Nguyen C, Varney MD, Harrison LC, Morahan G. Definition of high-risk type 1 diabetes HLA-DR and HLA-DQ types using only three single nucleotide polymorphisms. Diabetes. 2013 Jun 1;62(6):2135–40. pmid:23378606
  13. Hosmer DW Jr, Lemeshow S, Sturdivant RX. Applied logistic regression. John Wiley & Sons; 2013 Apr 1.
  14. Metz CE. Basic principles of ROC analysis. In Seminars in nuclear medicine 1978 Oct 1 (Vol. 8, No. 4, pp. 283–298). WB Saunders.
  15. Fawcett T. An introduction to ROC analysis. Pattern recognition letters. 2006 Jun 30;27(8):861–74.
  16. Lu Q, Obuchowski N, Won S, Zhu X, Elston RC. Using the optimal robust receiver operating characteristic (ROC) curve for predictive genetic tests. Biometrics. 2010 Jun 1;66(2):586–93. pmid:19508241
  17. Falconer DS, Mackay TFC. Introduction to Quantitative Genetics. Addison Wesley Longman Ltd; 1996.
  18. Haseman JK, Elston RC. The investigation of linkage between a quantitative trait and a marker locus. Behavior genetics. 1972 Mar 1;2(1):3–19. pmid:4157472
  19. Bates D, Maechler M, Bolker B, Walker S. lme4: Linear mixed-effects models using Eigen and S4. R package version. 2014 Jan;1(7).
  20. Lee SH, Wray NR, Goddard ME, Visscher PM. Estimating missing heritability for disease from genome-wide association studies. The American Journal of Human Genetics. 2011 Mar 11;88(3):294–305. pmid:21376301
  21. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics. 2007 Sep 30;81(3):559–75. pmid:17701901
  22. Efron B. Bootstrap methods: another look at the jackknife. In Breakthroughs in Statistics 1992 (pp. 569–593). Springer New York.
  23. Epstein CJ. Medical genetics in the genomic medicine of the 21st century. The American Journal of Human Genetics. 2006 Sep 30;79(3):434–8. pmid:16909381
  24. Manolio TA. Bringing genome-wide association findings into clinical use. Nature Reviews Genetics. 2013 Aug 1;14(8):549–58. pmid:23835440
  25. Jostins L, Barrett JC. Genetic risk prediction in complex disease. Human molecular genetics. 2011 Oct 15;20(R2):R182–8. pmid:21873261
  26. Atkinson MA, Eisenbarth GS, Michels AW. Type 1 diabetes. The Lancet. 2014 Jan 10;383(9911):69–82.
  27. Ziegler AG, Bonifacio E, Powers AC, Todd JA, Harrison LC, Atkinson MA. Type 1 diabetes prevention: a goal dependent on accepting a diagnosis of an asymptomatic disease. Diabetes. 2016 Nov 1;65(11):3233–9. pmid:27959859
  28. Gillespie KM. Type 1 diabetes: pathogenesis and prevention. Canadian Medical Association Journal. 2006 Jul 18;175(2):165–70. pmid:16847277
  29. Bresson D, von Herrath M. Immunotherapy for the prevention and treatment of type 1 diabetes. Diabetes care. 2009 Oct 1;32(10):1753–68. pmid:19794001
  30. Vetere A, Choudhary A, Burns SM, Wagner BK. Targeting the pancreatic β-cell to treat diabetes. Nature reviews Drug discovery. 2014 Apr 1;13(4):278–89. pmid:24525781
  31. Cameron FJ, Scratch SE, Nadebaum C, Northam EA, Koves I, Jennings J, et al. Neurological consequences of diabetic ketoacidosis at initial presentation of type 1 diabetes in a prospective cohort study of children. Diabetes care. 2014 Jun 1;37(6):1554–62. pmid:24855156
  32. Bjornstad P, Cherney D, Maahs DM. Early diabetic nephropathy in type 1 diabetes–new insights. Current opinion in endocrinology, diabetes, and obesity. 2014 Aug;21(4):279. pmid:24983394
  33. Bjornstad P, Snell-Bergeon JK, Rewers M, Jalal D, Chonchol MB, Johnson RJ, et al. Early Diabetic Nephropathy. Diabetes Care. 2013 Nov 1;36(11):3678–83. pmid:24026551
  34. Nathan DM, DCCT/Edic Research Group. The diabetes control and complications trial/epidemiology of diabetes interventions and complications study at 30 years: overview. Diabetes care. 2014 Jan 1;37(1):9–16. pmid:24356592
  35. Parikka V, Näntö-Salonen K, Saarinen M, Simell T, Ilonen J, Hyöty H, et al. Early seroconversion and rapidly increasing autoantibody concentrations predict prepubertal manifestation of type 1 diabetes in children at genetic risk. Diabetologia. 2012 Jul 1;55(7):1926–36. pmid:22441569
  36. Virtanen SM, Kenward MG, Erkkola M, Kautiainen S, Kronberg-Kippilä C, Hakulinen T, et al. Age at introduction of new foods and advanced beta cell autoimmunity in young children with HLA-conferred susceptibility to type 1 diabetes. Diabetologia. 2006 Jul 1;49(7):1512–21. pmid:16596359
  37. Hagopian WA, Lernmark Å, Rewers MJ, Simell OG, She JX, Ziegler AG, et al. TEDDY–the environmental determinants of diabetes in the young. Annals of the New York Academy of Sciences. 2006 Oct 1;1079(1):320–6.
  38. Krischer J, De Beaufort C. Study design of the Trial to Reduce IDDM in the Genetically at Risk (TRIGR). Pediatric diabetes. 2007;8:117–37. pmid:17550422
  39. Skyler JS, Greenbaum CJ, Lachin JM, Leschek E, Rafkin‐Mervis L, Savage P, et al. Type 1 Diabetes TrialNet–an international collaborative clinical trials network. Annals of the New York Academy of Sciences. 2008 Dec 1;1150(1):14–24.