Investigating spousal concordance of diabetes through statistical analysis and data mining

Objective Spousal clustering of diabetes merits attention. Whether old-age vulnerability or a shared family environment determines the concordance of diabetes is also uncertain. This study investigated the spousal concordance of diabetes and compared the risk of diabetes concordance between couples and noncouples by using nationally representative data.
Methods A total of 22,572 individuals identified from the 2002–2013 National Health Insurance Research Database of Taiwan constituted 5,643 couples and 5,643 noncouples through 1:1 dual propensity score matching (PSM). Factors associated with concordance in both spouses with diabetes were analyzed at the individual level. The risk of diabetes concordance between couples and noncouples was compared at the couple level. Logistic regression was the main statistical method. Statistical data were analyzed using SAS 9.4. C&RT and Apriori of data mining conducted in IBM SPSS Modeler 13 served as a supplement to statistics.
Results High odds of the spousal concordance of diabetes were associated with old age, middle levels of urbanization, and high comorbidities (all P < 0.05). The dual PSM analysis revealed that the risk of diabetes concordance was significantly higher in couples (5.19%) than in noncouples (0.09%; OR = 61.743, P < 0.0001).
Conclusions A high concordance rate of diabetes in couples may indicate the influences of assortative mating and shared environment. Diabetes in a spouse implicates its risk in the partner. Family-based diabetes care that emphasizes the screening of couples at risk of diabetes by using the identified risk factors is suggested in prospective clinical practice interventions.


Introduction
Various studies have reported genetic factors for diabetes mellitus [1][2][3][4], supporting its familial aggregation [5][6][7][8]. Nevertheless, few studies have investigated the clustering of diabetes [9,10], particularly in married couples, who are not genetically related. A cross-sectional study on concordant diseases in couples revealed that the odds of diabetes concordance were significantly elevated after adjustment for age alone (odds ratio [OR] = 1.70, 95% confidence interval [CI] = 1.06-2.74) but not after adjustment for age, smoking, and body mass index (OR = 1.41, 95% CI = 0.87-2.26) [11]. The findings regarding the spousal concordance of diabetes are therefore inconclusive. Moreover, age is considered a crucial determinant of diabetes. Studies have reported that old age is strongly associated with a high risk of diabetes [4,8,9,12]; the risk increases with age. Thus, middle-aged and elderly couples are susceptible to diabetes because of slowing metabolism and obesity. A common limitation across the studies on the family clustering of metabolic disorders is the lack of nonfamily counterparts who did not share the same environments. Hence, it is imperative to conduct a concordance study that compares the risk of diabetes between couples and noncouples to ascertain the effects of a common environment while examining the age vulnerability.
Most studies on family clustering have reported merely univariate statistics or investigated a very limited number of associated factors. However, familial clustering or concordance pertains to the common experiences of certain morbidities within a family and is conceivably related to the risk factors in individual family members. Therefore, examining the factors associated with diabetes in each spouse is crucial for obtaining a more comprehensive understanding of diabetes concordance in couples. Prior research has reported sex differences in the occurrence of diabetes. Men are more likely than women to be diagnosed as having hyperglycemia, particularly older men who smoke and drink [9,12,13]. A study indicated no significant association between income level and diabetes prevalence [14]; however, most studies have reported an association between income and diabetes, with low household income identified as the risk factor [15,16]. Moreover, the risk of diabetes and other metabolic syndromes varied with occupations because of varying work-related physical activities [13,16]. Although higher levels of urbanization were associated with higher risk of diabetes [15], the association remains inconsistent. In addition, studies have indicated that diabetes could be associated with certain chronic diseases such as HIV and psychiatric morbidities [17][18][19]. The effects of the potential associated factors on the spousal concordance of diabetes require investigation.
Few studies have examined a control group and associated factors for diabetes clustering in couples. Therefore, the present study sought to determine the spousal concordance of diabetes by adopting a mathematically matched group of noncouples to compare the risk of diabetes concordance between couples and noncouples by using nationally representative data.
The study was approved by the Research Ethics Committee of China Medical University Hospital, Taiwan.

Data source and study sample
The National Health Insurance (NHI) program, established in 1995, provides comprehensive health care benefits to more than 99.7% of the residents of Taiwan (N = 23.50 million). All the medical claims from this universal program are managed by the National Health Research Institutes (NHRI), which releases the population-based National Health Insurance Research Database (NHIRD). This retrospective study retrieved longitudinal data from the 2002-2013 registry of the NHIRD, which contains the reimbursement claims of 1 million randomly sampled beneficiaries. The NHRI has indicated that this NHIRD subset can completely represent all the enrollees. The claim diagnoses in the NHIRD were coded using the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM).
This study used the data field "relation" and encrypted individual identifiers to match married spouses from the NHIRD registry. Two individuals were identified as a couple only if they were recorded as an insured person and a dependent spouse (value "spouse" in the data field "relation") and their encrypted identifiers mutually matched. Furthermore, to obtain an initial diagnosis of diabetes throughout the observation period, individuals diagnosed as having diabetes mellitus (ICD-9-CM 250.x) in 2002 were excluded from the study. Patients younger than 16 years were also excluded. Initially, data of 5,680 couples were obtained. However, 43 patients were excluded because of inadequate or missing data (0.76%). Consequently, the current study identified a cohort of 5,643 married couples, comprising 11,286 individuals (5,643 insured and 5,643 dependent spouses).
To ensure that the case (couples) and control (noncouples) groups were similar except for couple status, the case group was matched with the control group on identical values of sex, age, and comorbidities through 1:1 propensity score matching (PSM) to reduce selection bias [20]. This procedure was performed twice, once for each member of a couple, to obtain two randomly selected noncouple counterparts (four individuals in total per matched set; hereafter, dual PSM). Thus, the three matched variables were tested twice for any significant differences between the two groups. The results indicated high similarity with no differences in sex, age, or comorbidities (all P = 1, Table 1), thus confirming that the couples and noncouples qualified for the comparison. PSM provides an alternative to adjusting for covariates at the level of multivariate analysis [21]. Consequently, 5,643 couples and 5,643 noncouples (N = 22,572 individuals) were included in the subsequent analysis.
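The matching step can be illustrated with a minimal Python sketch. It is not the authors' SAS procedure: it assumes that, because all three matching variables are categorical and matched on identical values, matched individuals share the same propensity score, so the sketch simply pairs each spouse with one randomly chosen, unused noncouple individual from the same stratum. The function name `match_one_to_one` and the field names `sex`, `age_group`, and `cci` are hypothetical.

```python
import random
from collections import defaultdict

def match_one_to_one(cases, pool, keys=("sex", "age_group", "cci"), seed=0):
    """Pair each case with one unused control that has identical values on `keys`."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for person in pool:
        buckets[tuple(person[k] for k in keys)].append(person)
    for candidates in buckets.values():
        rng.shuffle(candidates)  # random selection within each stratum
    pairs = []
    for case in cases:
        candidates = buckets.get(tuple(case[k] for k in keys))
        if candidates:
            pairs.append((case, candidates.pop()))  # matching without replacement
    return pairs

# Hypothetical records, for illustration only
insured = [{"id": "s1", "sex": "M", "age_group": 2, "cci": 0}]
pool = [{"id": "c1", "sex": "M", "age_group": 2, "cci": 0},
        {"id": "c2", "sex": "F", "age_group": 2, "cci": 0}]
pairs = match_one_to_one(insured, pool)
```

Under the dual PSM described above, such a routine would be run once for the insured spouses and once for the dependent spouses, and the two matched counterparts would then form one noncouple pair.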

Variables
The concordance of diabetes was determined using a dichotomous outcome variable. Concordance was reported if both spouses or counterparts were diagnosed using ICD-9-CM codes (250.x) for diabetes mellitus; otherwise, discordance was reported.
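As a minimal sketch of this outcome definition, the following Python functions flag a pair as concordant only when both members carry at least one ICD-9-CM code in category 250. The function names and the representation of claims as lists of code strings are assumptions made for illustration; the actual NHIRD field layout is not described here.

```python
def has_diabetes(icd9_codes):
    """True if any ICD-9-CM code falls in category 250 (diabetes mellitus)."""
    return any(code.strip().startswith("250") for code in icd9_codes)

def pair_outcome(codes_a, codes_b):
    """Return 1 for concordance (both members have diabetes), otherwise 0."""
    return int(has_diabetes(codes_a) and has_diabetes(codes_b))
```

The same function applies to couples and to matched noncouple pairs, which keeps the outcome definition identical across the two groups.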
The present study included two categories of independent variables that are possibly associated with diabetes: (1) characteristics of the insured spouse, comprising sex, age, premium-based monthly salary, occupation, urbanization level, region, catastrophic illness or injury, and comorbidities; and (2) characteristics of the dependent spouse, comprising age, catastrophic illness or injury, and comorbidities. The urbanization level and region were considered common environmental characteristics of the couples. The remaining variables were characteristics of the individual spouses. Legally, the Taiwan government allows only heterosexual marriage; thus, one sex, that of the insured spouse, was used to eliminate collinearity. Age did not pass the normality test, including skewness and kurtosis, and was therefore classified into five ordinal levels, according to the frequency distribution. Furthermore, premium-based monthly salary, occupation, region, and catastrophic illness or injury were defined on the basis of the official NHI classifications. The National Health Insurance Administration issues the Catastrophic Illness and Injury card to patients with severe illness or injury. Patients with catastrophic illness and injury conditions, such as regular dialysis or permanent disability, can apply for the card after the severity reaches the official criteria of the NHI program and is verified by a board-certified physician. Comorbidities were assessed using the Charlson comorbidity index (CCI) [22], a frequently used measure in clinical research. After original scoring from 0 to 6 conducted by weighting ICD-9-CM codes for each spouse, this study classified comorbidities into 0 (no comorbidities) and 1-3 (high comorbidities) because of the low-frequency distribution of CCI scores exceeding 3. The urbanization level was graded using a 5-point scale, with 1 and 5 indicating the highest and lowest urbanization levels, respectively.
All the 11 independent variables were measured on a categorical or an ordinal level. All the variables in the case-control design were defined at the pair level (couples versus noncouples).

Data analysis
In this study, data were analyzed through statistical analysis and data mining. Statistical methods included the chi-squared test and logistic regression. The chi-squared test determined the prevalence rates of diabetes concordance at the bivariate level. Logistic regression was used mainly for predicting diabetes concordance at the multivariate level, with the adjusted odds ratio (OR) and corresponding 95% confidence interval (CI). Because the members of the couples and noncouples were matched for the three variables, conditional logistic regression was used to analyze the matched pair data without the matching factors in the regression model [23,24]. The conditional likelihood was estimated within the same matched set for binary diabetes concordance [25]. Moreover, collinearity diagnostics were computed using indices including variance inflation and tolerance. For data mining, C&RT and Apriori, two methods under the hypothesis-free paradigm, were used to explore hidden patterns that statistics might fail to detect [26,27]. The application of data mining techniques in longitudinal analysis of a large clinical data source may reveal useful information on disease prediction and health care delivery [28][29][30]. C&RT, a decision tree, was used for classification [31]. The Apriori algorithm of association rules was used to mine for potential associations in the extracted research data [32]. Data mining largely served as a supplement to statistics. In contrast to theory-based statistical analysis, data mining is substantially more data-driven. Research that analyzes the individual-level factors associated with the couple concordance of diabetes is still lacking. Therefore, this study combined statistics and data mining in an exploratory modeling of the concordance factors. The joint findings engendered by the two approaches should increase the strength of evidence on diabetes concordance. Data were analyzed using SAS 9.4 and IBM SPSS Modeler 13.
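For readers unfamiliar with conditional logistic regression, the textbook form of the conditional likelihood for 1:1 matched sets is sketched below. The notation (set index i, covariate vectors x, coefficient vector β, and the collection D of outcome-discordant sets) is assumed here for illustration and is not taken from the authors' SAS code.

```latex
% i indexes matched sets; D is the collection of sets in which exactly one
% member has the outcome. x_{i1}: covariates of the member with the outcome;
% x_{i0}: covariates of the member without it.
L(\beta) = \prod_{i \in D}
  \frac{\exp\left(x_{i1}^{\top}\beta\right)}
       {\exp\left(x_{i1}^{\top}\beta\right) + \exp\left(x_{i0}^{\top}\beta\right)}
```

Because the matching factors take identical values within a set, their terms cancel in each ratio, which is why they need not appear in the regression model. Sets in which both members or neither member has the outcome contribute no information to this likelihood.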

Results
The common characteristics of 11,286 individual spouses were analyzed and merged in the unit of a couple. Table 2 lists the descriptive statistics of the 5,643 couples, including age and health characteristics. The largest proportion of couples were aged 16-44 years (33.34%), and most were without catastrophic illness or injury (91.29%) or any comorbidities (70.69%). A summary of cross-tabulations of the three characteristics and sex is listed in Table 2. Table 3 presents the characteristics of spouses and their associations with spousal concordance of diabetes. The prevalence rates of diabetes in the insured and dependent spouses were 18.41% (1,039/5,643) and 16.64% (939/5,643), respectively. When calculated in the unit of one couple, the prevalence rate of diabetes in either the insured or dependent spouse of the 5,643 couples was 24.67% (1,392/5,643); however, only 16.92% of the noncouples included one individual diagnosed as having diabetes (n = 955). The cross-tabulations of individual spouse characteristics and diabetes in only one spouse of a couple are presented as the intermediate results of concordance. Furthermore, the chi-squared test revealed that nine independent variables were significantly associated with spousal concordance: age, monthly income, occupation, region, catastrophic illness or injury, and comorbidities of the insured spouse, as well as age, catastrophic illness or injury, and comorbidities of the dependent spouse (all P < 0.0001). Overall, old age (≥65 years), low monthly income (≤US $760), catastrophic illness or injury, and CCI = 2 were significantly associated with a higher prevalence of spousal concordance. Insured spouses who were soldiers, social security insured, veterans, and associated with religious groups were more likely to develop spousal concordance of diabetes, compared with those involved in other occupations. This study did not detect any signs of collinearity. Table 4 presents the logistic regression results.
The results of the unadjusted model indicated that 10 independent variables were significantly associated with spousal concordance (all P < 0.05). After all other covariates were held constant, nine variables remained significantly associated with spousal concordance of diabetes (all P < 0.05). Male insured spouses were more likely to experience spousal concordance than their female counterparts were (OR = 1.587; 95% confidence interval [CI] = 1.181-2.133). Insured spouses aged 45-54, 55-64, and ≥65 years were more likely to experience spousal concordance (OR = 3.817, 8.084, and 17.127; 95% CI = 1.950-7.472, 4.224-15.473, and 8.962-32.732; respectively), compared with those aged 16-44 years. Moreover, insured spouses residing in areas with urbanization levels of 2 and 3 were more likely to experience spousal concordance (OR = 1.425 and 1.817; 95% CI = 1.004-2.021 and 1.167-2.828; respectively), compared with those in level 1 urbanization areas. The odds of spousal concordance were significantly lower in insured spouses residing in the northern region than those residing in Taipei (OR = 0.632; 95% CI = 0.420-0.951). Regarding health characteristics, the odds of spousal concordance were significantly higher in insured spouses with catastrophic illness or injury than in those without these factors (OR = 1.527; 95% CI = 1.004-2.001). The odds of spousal concordance were significantly higher in insured spouses with medium-high comorbidity (CCI = 2) than in those without comorbidities (OR = 1.556; 95% CI = 1.009-3.618). Dependent spouses aged 45-54, 55-64, and ≥65 years were more likely to experience spousal concordance (OR = 3.405, 8.338, and 13.882; 95% CI = 1.921-6.035, 4.895-14.201, and 8.162-23.609; respectively), compared with those aged 16-44 years. Moreover, dependent spouses with catastrophic illness or injury were more likely to experience spousal concordance (OR = 1.478; 95% CI = 1.005-2.071), compared with those without these factors.
In addition, dependent spouses with medium-high comorbidity (CCI = 2) were more likely to experience spousal concordance (OR = 1.904; 95% CI = 1.453-2.496), compared with those without comorbidities. Table 5 presents the results of couple-level analysis following 1:1 dual PSM. The chi-squared test revealed a significant association of marital status with diabetes concordance (P < 0.0001). Couples had a significantly higher prevalence of concordance than noncouples did (5.19% versus 0.09%). The percentage of couples in which one spouse was diagnosed with diabetes was higher than the percentage of noncouples in which one individual had diabetes. This pattern was consistent for both men and women (18.54% > 13.38% and 6.13% > 3.54%, respectively). Moreover, conditional logistic regression indicated that marital status was significantly associated with diabetes concordance (P < 0.0001). The odds of diabetes concordance were significantly higher in couples than in noncouples (OR = 61.743; 95% CI = 26.128-191.726).
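As a rough plausibility check on the size of this estimate, a crude (unconditional) odds ratio can be computed from the reported prevalence figures. The Python sketch below is not the conditional logistic regression used in the study, so its confidence interval is not expected to reproduce the reported one. The couple count of 293 is the numerator reported for the 5.19% prevalence; the noncouple count of 5 is an assumption inferred from 0.09% of 5,643 and is not stated in the text.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Crude OR with a Woolf (log-based) confidence interval for a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

n_pairs = 5643
couples_concordant = 293     # reported: 5.19% of 5,643 couples
noncouples_concordant = 5    # ASSUMPTION: inferred from 0.09% of 5,643

crude_or, ci_low, ci_high = odds_ratio(
    couples_concordant, n_pairs - couples_concordant,
    noncouples_concordant, n_pairs - noncouples_concordant,
)
print(round(crude_or, 2), round(ci_low, 2), round(ci_high, 2))
```

Under these assumed counts the crude odds ratio comes out close to 62, in line with the reported conditional estimate of 61.743. The wide interval reflects the very small number of concordant noncouple pairs.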
After feature selection, data mining was performed with a reduced set of relevant data. The following classification rules were identified for predicting spousal concordance: (1) CCI ≥ 1; fourth, fifth, and sixth categories of occupation; and residence in northern and southern regions for insured spouses; and (2) age ≥ 55 years and CCI ≥ 1 for dependent spouses. For predicting no spousal concordance, the classification rules were a monthly income of ≥US$960 and no comorbidities for insured spouses and age = 16-54 years and no comorbidities for dependent spouses. The prediction accuracy of C&RT was 85.7%-90.9%. The Apriori algorithm was not sensitive in detecting the association rules for the presence of spousal concordance. However, the acquired rules for predicting no spousal concordance included the male sex, age = 16-44 years, no catastrophic illness or injury, and no comorbidities for insured spouses, as well as age = 16-44 years and no catastrophic illness or injury for dependent spouses. Confidence in Apriori is an indication of the probability that the rule is correct. In this study, the confidence of the Apriori algorithm was 95.3%-98.2%, indicating a strong association between the extracted patterns and spousal concordance of diabetes. Overall, the indices of accuracy and confidence demonstrate effective data mining [33,34].
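To make the notion of rule confidence concrete, the following Python sketch computes the support and confidence of a single association rule over a set of records. It covers only the rule metrics, not the full Apriori search for frequent itemsets, and it is not the IBM SPSS Modeler implementation. The function name `rule_metrics`, the item labels, and the toy records are hypothetical.

```python
def rule_metrics(records, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    records: list of sets of items; antecedent and consequent: sets of items.
    """
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else float("nan")
    return support, confidence

# Toy records, for illustration only (not study data)
records = [
    {"age_16_44", "no_comorbidity", "no_concordance"},
    {"age_16_44", "no_comorbidity", "no_concordance"},
    {"age_16_44", "no_comorbidity", "concordance"},
    {"age_65_plus", "comorbidity", "concordance"},
]
support, confidence = rule_metrics(
    records, {"age_16_44", "no_comorbidity"}, {"no_concordance"}
)
```

Confidence is the share of records satisfying the antecedent that also satisfy the consequent, so a confidence of 95.3%-98.2% means that nearly all records matching the rule conditions had the predicted outcome.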

Discussion
High concordance in couples versus low concordance in noncouples
To our knowledge, this study is the first to investigate spousal concordance of diabetes in a matched case-control design. A contrast between a high concordance rate of diabetes in couples and a low rate in noncouples was identified. The dual PSM analysis revealed this phenomenon in both prevalence rates and ORs. The prevalence rate of spousal concordance was 5.19% (293/5,643) in couples, markedly higher than that in noncouples (0.09%).
The OR of 61.743 represents the marked effect of a common family environment on the development of diabetes in couples and deserves emphasis. Both couples and noncouples were matched by sex, age, and comorbidities; therefore, the high contrast in the concordance is not attributable to old-age vulnerability and is closely related only to the coupled status. Assortative mating and similarities between both members of a married couple in a common environment may explain the high concordance of diabetes in couples [35]. Studies have indicated resemblances between spouses [36,37], particularly in long-standing couples. Notably, collectivism in Taiwanese culture [38] may reinforce behavioral resemblances in couples. Furthermore, through cohabitation in the same family environment, concordant health behaviors, including exercise and dietary habits, and shared lifestyles in couples can be shaped [39][40][41][42] and might thus lead to a shared exposure, such as concordant obesity [43], to diabetes [44]. Hence, family-based intervention for modifiable health behaviors is a priority in clinical practice.

Individual-level characteristics predicting couple-level concordances
Statistical analysis and data mining yielded the combined results regarding factors associated with spousal concordance of diabetes. In addition to the couple status, nine factors, including personal and shared characteristics, of spousal concordance warrant attention. Most insured spouses were men, who could have a higher risk of diabetes than their female counterparts [9,23]. The prevalence rate of diabetes was higher in insured spouses (18.41% in insured spouses versus 16.64% in dependent spouses), which may explain the finding that insured men were more likely to experience spousal concordance of diabetes than were insured women (Table 4). Old age was markedly associated with high risks of concordant diabetes, particularly in spouses aged ≥65 years (both ORs > 13, accuracy = 85.7%-90.9%); this observation is in accordance with the findings of previous studies [45,46]. The urbanization level and region, which are the shared geographical characteristics of couples, were identified as the determinants of spousal concordance. Levels 2 and 3 of urbanization were associated with higher odds of spousal concordance, whereas residence in the northern region was associated with a lower risk. The geographical disparities in concordant diabetes warrant further research and require the attention [47,48] and indicate that medical conditions of individual spouses contribute to concordant diabetes in couples.
Overall, diabetes in a spouse may indicate the risk of diabetes in the partner. A previous study indicated that spousal diabetes is associated with a 26% increase in the risk of diabetes in the partner [49], echoing the present findings. The phenomenon of spousal concordance of diabetes is evident. Therefore, the clinical prevention of diabetes should target spouses whose married partners were diagnosed as having diabetes by applying the individual-level and shared geographical risk factors identified in this study, including old age, mid-range urbanization, and chronic morbidities.

Couple-oriented health insurance: couplitation
Health insurance schemes might adjust medical payments by sex, age, and morbidities, such as capitation reimbursement [50]. A family history of certain chronic and catastrophic illnesses among genetically related family members is considered for determining premiums. Nevertheless, the spouse history of diabetes is typically not involved in the risk rating of individual-level health insurance plans. Therefore, the present study proposes a novel yet reasonable direction of a couple-oriented insurance scheme, couplitation, that is aimed at developing comprehensive coverage and reimbursements for spouse-vulnerable chronic diseases [51][52][53], particularly diabetes. Couplitation may improve early detection through examination in a manner paralleling capitation. This spouse-related risk rating of an insurance scheme requires feasibility analysis in future studies.
The limitations of the present study are mainly related to the database used. First, the NHIRD does not include information on the educational level, health behaviors, laboratory test results, cohabitation duration, and other joint characteristics of the couples. The absence of these data weakens the statistical strength of this study. Second, the body mass index is a major risk factor for diabetes; the absence of this factor may result in residual confounding and thus bias the findings in an unknown direction. Third, a high level of awareness or knowledge of the symptoms of diabetes may lead to early diagnosis. Because the NHIRD lacks awareness-related data, the current study could not take this factor into consideration. Finally, all spouses retrieved from the database were limited to the insured-dependent relationship. The generalization of the study findings to all other relationships requires deliberation.

Conclusions
This study involved cohort and case-control designs, individual- and couple-level analyses, and statistical analysis and data mining, all of which were aimed at providing strong evidence. This study adds to the existing knowledge base by determining the evident effects of a common family environment and individual characteristics on diabetes concordance in couples. Old-age vulnerability in diabetes cannot explain this high concordance phenomenon in couples. Diabetes in one spouse indicates the risk of diabetes in the partner. Therefore, this study suggests that family-based diabetes health care and clinical intervention be conducted using the individual risk factors identified in this study. Future studies may focus on investigating the spousal concordance of a specific type of diabetes.