Joint binary response modelling for childhood comorbidity in Ethiopia

Childhood diarrhea and Acute Respiratory Infection (ARI) are two diseases with similar risk factors in tropical developing regions. The objective of this study was to employ a joint binary response model and identify risk factors for childhood diarrhea and ARI in children under the age of five. A joint binary response model that takes into account the interdependence of the two diseases was used. Explanatory variables such as residence, vaccination, mother’s education, and antenatal care visits during pregnancy were found to be statistically significant risk factors for diarrhea in the joint model, whereas residence, the number of children ever born, vaccination, mother’s education, and wealth index were found to be statistically significant risk factors for childhood Acute Respiratory Infection. We discovered a common odds ratio value (4.30) greater than one, indicating a positive relationship between the two childhood diseases. As a result, using a joint model to assess the risk factors for diarrhea and acute respiratory infection (ARI) was reasonable. Furthermore, the standard errors of the parameter estimates in the joint response model were found to be smaller than the corresponding standard errors in the separate models. The risk factors such as residence, vaccination, and mother’s education all had a significant effect on the two correlated dichotomous response variables, diarrhea and ARI.


Introduction
Childhood morbidity in low-income countries is commonly characterized by more than one health condition [1], and it is a challenge in many settings that can lead to death. Coinfection due to overlapping risk factors such as nutrition, sanitation, and overcrowding [2] is extremely common among children under the age of five seeking care [3]. Diarrhea and acute respiratory infection (ARI) are two diseases with common risk factors in tropical developing countries [4].
Diarrhea is one of the leading causes of death among Ethiopian children under the age of five. According to World Health Organization (WHO) estimates, diarrhea is responsible for more than 13% (more than one out of every ten) of child deaths in Ethiopia [5]. According to previous research, the percentage of children under the age of five who had diarrhea decreased from 24 percent in 2000 to 12 percent in 2016 [6]. Furthermore, acute respiratory infection (ARI), particularly pneumonia, is one of the leading causes of morbidity and mortality in Ethiopia, accounting for 18% of all deaths [5]. The country has made a remarkable commitment to reducing ARI morbidity and mortality in children [7]. Deaths can be reduced if health interventions target one or more concurrent diseases [1]. Usually, childcare programs have addressed single diseases such as diarrhea, acute respiratory infections, fever, and malaria infections [3]. Diarrhea and acute respiratory infections continue to be the leading causes of death in children under the age of five [8]. The co-occurrence of the two health conditions could be due to shared risk factors in childhood. For instance, diarrhea and acute respiratory diseases may share risk factors such as age as a child-dependent risk factor or poor sanitation and crowding as environmental risk factors [9]. Walker et al. [10] found that diarrhea and ARI have overlapping epidemiology, which is due in part to shared risk factors such as inadequate breastfeeding, malnutrition (including undernutrition, which is classified as underweight, stunting, and wasting), and a lack of zinc. According to Bbaale [4], breastfeeding for the first 6 months, socioeconomic status, type of house the child lived in, maternal occupation, child's age, and nutritional status were all significant common factors associated with pneumonia and diarrhea.
Moreover, Leung et al. [11] found that younger age of child, male gender, undernutrition, lower maternal education, low wealth quintile, and poor breastfeeding practices were associated with the concurrent occurrence of diarrhea and pneumonia in under-five children in Dhaka. Hence, the majority of the risk factors for ARI and diarrhea are modifiable, and because the two outcomes share some major risk factors, they can be addressed through integrated interventions [12].
Takele et al. [13] proposed a spatial effect model to explain the risk of diarrhea, cough, and fever comorbidity in Ethiopia and pointed out that childhood comorbidity is not distributed evenly across the country, with significant comorbidity hotspots in Tigray and Oromia. The findings suggest that using a combined morbidity management technique to reduce children's illnesses and deaths has a significant impact. Regions with high comorbidity rates in children should be examined for healthcare interventions.
For childcare programs conducted by health institutions to be successful, statistical methods are required to assess the features of diseases that coexist. A recent trend has been to employ a joint analysis that uncovers similar patterns of variation and assesses the risk factors for such diseases together [14]. Joint modelling accounts for the correlation between the two response variables while relating them to the same set of covariates. Modelling the two outcomes jointly improves control over type I error rates in multiple tests and increases the efficiency of the parameter estimates.
Most previous health research in Ethiopia has focused on studying a single disease at a time. Separate analyses, however, may fail to provide a comprehensive picture of the epidemiology of the diseases and the combined effects of childhood diseases on the population under consideration due to common and overlapping risk factors [15]. A few studies on joint response modeling of correlated outcomes have concentrated on continuous, discrete, or a combination of discrete and continuous outcomes [16].
Epidemiological methodology and research have advanced over time, moving from studying a single disease to studying several diseases concurrently. Thus, in this paper, we focused on comorbidity in children under the age of five by employing a joint binary response model that takes into account the interdependence of the two infections when assessing risk factors. Hence, the Ethiopian government and other institutions involved in the health sector can use the findings of this study to improve the design and implementation of interventions that address the country's exposure to multiple illnesses at an early age.

Data source
The empirical analysis in this paper was based on data from the 2016 Ethiopian Demographic and Health Survey (EDHS). The Demographic and Health Surveys (DHS) are a well-established source of reliable population data, with a strong emphasis on childhood diseases [6]. This study included 9917 children under the age of five, with each record containing information on childhood diseases as well as a list of covariates that may have an impact on children's health.

A description of the study variables
Let Y be a vector consisting of two dichotomous dependent variables, $Y_1$ (diarrhea status) and $Y_2$ (acute respiratory infection status). Each binary response variable was set to 1 if the child had the corresponding condition and to 0 otherwise. The probabilities of occurrence of the possible joint outcomes of $Y_1$ and $Y_2$ for a set of m paired observations can be illustrated as in Table 1.
Similarly, the joint and separate probabilities of $Y_1$ and $Y_2$ indicated in Table 1 can be written as $p_{ij} = P(Y_1 = i, Y_2 = j)$, $i, j = 0, 1$, and $p_k = P(Y_k = 1)$, $k = 1, 2$, respectively, where:
$p_{11}$ = the child has both diarrhea and ARI
$p_{01}$ = the child has no diarrhea but has ARI
$p_{10}$ = the child has diarrhea but not ARI
$p_{00}$ = the child has neither diarrhea nor ARI
The following predictors were included in the final models: child's age in months, child's sex, residence, antenatal care visit, vaccination, mother's education, number of children, and wealth index.
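The cell probabilities of Table 1 and the two marginal probabilities can be estimated from paired 0/1 data by simple cross-tabulation. The following Python sketch illustrates this; the function name and the eight toy records are hypothetical and are not taken from the EDHS.

```python
import numpy as np

def joint_and_marginal_probabilities(y1, y2):
    """Estimate p_ij = P(Y1 = i, Y2 = j) and p_k = P(Y_k = 1) from paired 0/1 data."""
    y1 = np.asarray(y1, dtype=int)
    y2 = np.asarray(y2, dtype=int)
    if y1.shape != y2.shape:
        raise ValueError("y1 and y2 must have the same length")
    joint = np.zeros((2, 2))
    for i in (0, 1):
        for j in (0, 1):
            # Share of children with Y1 = i and Y2 = j
            joint[i, j] = np.mean((y1 == i) & (y2 == j))
    p1 = joint[1, :].sum()  # P(Y1 = 1) = p_10 + p_11
    p2 = joint[:, 1].sum()  # P(Y2 = 1) = p_01 + p_11
    return joint, p1, p2

# Hypothetical toy data for eight children (illustration only).
diarrhea = [1, 1, 0, 0, 0, 0, 1, 0]
ari = [1, 0, 1, 0, 0, 0, 1, 0]
joint, p1, p2 = joint_and_marginal_probabilities(diarrhea, ari)
```

Here `joint[i, j]` plays the role of $p_{ij}$, so the four entries always sum to one.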

Separate and joint models
The primary goal of the standard binary regression model is to predict the response variable categories as a function of the covariates in the model. However, in the bivariate logistic regression model, we attempt to investigate the relationship between two distinct dichotomous response variables ($Y_1$ and $Y_2$) by modelling them jointly as a function of the explanatory variables under consideration. Each pair of dependent variables $(Y_{i1}, Y_{i2})$ has four possible outcomes, as shown in Table 1. These are: $(Y_{i1} = 1, Y_{i2} = 1)$, $(Y_{i1} = 1, Y_{i2} = 0)$, $(Y_{i1} = 0, Y_{i2} = 1)$, and $(Y_{i1} = 0, Y_{i2} = 0)$.
The joint probability of each of these four outcomes was modelled using three systematic components: the marginal probability of $Y_1$, the marginal probability of $Y_2$, and the odds ratio, denoted by $\psi$, which describes the dependence of one marginal distribution on the other. The bivariate logistic regression model, or bivariate logistic odds-ratio model, proposed in [17][18][19] is specified by modelling the marginal distribution of each $Y_j$ and the common odds ratio. The common odds ratio $\psi$ is defined as the ratio of the odds of $Y_1 = 1$ given that $Y_2 = 1$ to the odds of $Y_1 = 1$ given that $Y_2 = 0$, that is, $\psi = \frac{p_{11}\,p_{00}}{p_{10}\,p_{01}}$. The odds ratio is used to measure the association between the two dichotomous response variables ($Y_1$ and $Y_2$); a value of $\psi$ equal to one indicates a lack of association between $Y_1$ and $Y_2$.
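As a rough check on the definition of the odds ratio, the Python sketch below computes a crude (unadjusted) value from the prevalence figures reported in the Results section: 4.3% of children with both conditions, 6.7% with diarrhea only, and 11.7% with ARI only. Obtaining the share with neither condition by subtraction is an assumption of this sketch, and the function name is hypothetical.

```python
def common_odds_ratio(p11, p10, p01, p00):
    """Odds ratio psi = (p11 * p00) / (p10 * p01) for a 2x2 table of probabilities."""
    if p10 <= 0 or p01 <= 0:
        raise ValueError("p10 and p01 must be positive")
    return (p11 * p00) / (p10 * p01)

# Prevalence figures quoted in the Results section.
p11 = 0.043  # diarrhea and ARI
p10 = 0.067  # diarrhea only
p01 = 0.117  # ARI only
# Assumption: the remaining children had neither condition.
p00 = 1.0 - (p11 + p10 + p01)

psi_crude = common_odds_ratio(p11, p10, p01, p00)
```

The crude value comes out at about 4.2. It is not adjusted for covariates, so it need not coincide with the model-based estimate of 4.30.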
Bivariate logistic regression provides adjusted estimates of concordance by simultaneously estimating covariate effects on the odds ratio, which describes the pairwise association structure. Furthermore, using bivariate logistic regression analysis may provide more precision than unadjusted estimates from other possible categorical distributions. Logistic regression models can be used to express the influence of the independent variables on the marginal probabilities, that is, the marginal probability of diarrhea ($Y_1$) and ARI ($Y_2$), as well as on the odds ratio. Similarly, the relationship between covariates and each disease can be investigated using separate logistic regression models for each outcome:
$$\operatorname{logit} P(Y_{1j} = 1) = \mathbf{x}_j^{\top} \boldsymbol{\beta}_1^{M} \qquad (1)$$
$$\operatorname{logit} P(Y_{2j} = 1) = \mathbf{x}_j^{\top} \boldsymbol{\beta}_2^{M} \qquad (2)$$
where $\mathbf{x}_j$ is the covariate vector of the $j$th individual and $\boldsymbol{\beta}_1^{M}$ and $\boldsymbol{\beta}_2^{M}$ are the regression coefficients of the separate models. These standard logistic regression models, as shown in Eqs (1) and (2), do not take into account the correlation between the two childhood diseases. However, we assume that the two diseases of the same individual share a set of common unobservable features. Let $u_j$ denote the random intercept shared by the two diseases of the $j$th ($j = 1, 2, \ldots, m$) individual. Let $\phi_{1i}$ and $\phi_{2i}$ define dummy variables with $\phi_{1i} = 1$ for $i = 1$ and $\phi_{2i} = 1$ for $i = 2$, and zero otherwise. Then, the joint response model for the bivariate logit is given by
$$\operatorname{logit} P(Y_{ij} = 1 \mid u_j) = \phi_{1i}\, \mathbf{x}_j^{\top} \boldsymbol{\beta}_1 + \phi_{2i}\, \mathbf{x}_j^{\top} \boldsymbol{\beta}_2 + u_j, \quad i = 1, 2 \qquad (3)$$
In this formulation, all individuals' bivariate responses $(Y_{1j}, Y_{2j})$ are stacked into a single response vector. The random intercepts are assumed to differ from one individual to the next. Furthermore, the random intercepts are assumed to be normally distributed with mean zero and variance $\sigma_u^2$. Let $G$ denote the distribution of $u_j$. Given the shared random intercept, an individual's two responses are assumed to be independent. Using this conditional independence assumption, we can write the joint response model's likelihood function as follows:
$$L(\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \sigma_u^2) = \prod_{j=1}^{m} \int \prod_{i=1}^{2} P(Y_{ij} = y_{ij} \mid u_j)\, dG(u_j) \qquad (4)$$
Following Ghebremichael [15], we used maximum likelihood estimation to obtain estimates of the model parameters $(\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \sigma_u^2)$.
The estimation was based on maximizing the logarithm of the likelihood function. However, because the integrals in Eq (4) do not have closed-form solutions, the parameters of the joint model were estimated simultaneously via maximum likelihood by evaluating the integrals using adaptive Gaussian quadrature approximation [20].
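To illustrate how the integral over the random intercept in the likelihood can be evaluated numerically, the Python sketch below uses ordinary (non-adaptive) Gauss-Hermite quadrature. It simplifies the adaptive scheme cited in the text and does not reproduce it. The sketch assumes a shared random intercept that is normal with mean zero and enters both logits additively, as described above; all function and argument names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def individual_likelihood(y, eta, sigma_u, n_nodes=20):
    """Marginal likelihood of one child's pair of binary responses.

    y       : (y1, y2), each 0 or 1
    eta     : (eta1, eta2), fixed-effect linear predictors for the two outcomes
    sigma_u : standard deviation of the shared random intercept
    """
    nodes, weights = hermgauss(n_nodes)
    # Change of variable so that the nodes integrate against N(0, sigma_u^2)
    u = np.sqrt(2.0) * sigma_u * nodes
    y = np.asarray(y, dtype=float)[:, None]
    eta = np.asarray(eta, dtype=float)[:, None]
    p = expit(eta + u)  # shape (2, n_nodes)
    # Conditional independence: multiply the two Bernoulli terms at each node
    conditional = np.prod(p**y * (1.0 - p) ** (1.0 - y), axis=0)
    return float(np.sum(weights * conditional) / np.sqrt(np.pi))

def log_likelihood(Y, X, beta1, beta2, sigma_u):
    """Sum of log marginal likelihoods over children (rows of Y and X)."""
    eta1, eta2 = X @ beta1, X @ beta2
    return sum(
        np.log(individual_likelihood(Y[j], (eta1[j], eta2[j]), sigma_u))
        for j in range(len(Y))
    )
```

A general-purpose optimiser could then maximise `log_likelihood` over the coefficients and `sigma_u`. Adaptive quadrature additionally recentres and rescales the nodes for each individual, which this sketch omits.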

Parameters in the joint and separate models
The parameters in the joint response model must be interpreted conditionally on the random intercepts, in contrast to the unconditional interpretations of the parameters in the separate models. The former are conditional (subject-specific) parameters, while the latter are population-averaged (marginal) parameters. As a result, it is impossible to compare the two sets of parameters directly. The marginal effects of the covariates in Eq (3) are required to compare the parameters in the joint response model to those obtained from the separate models. The subject-specific parameters of the joint model can be marginalised as described below. From the separate models (1)-(2), we can write the marginal mean as:
$$E(Y_{ij}) = \frac{\exp(\mathbf{x}_j^{\top} \boldsymbol{\beta}_i^{M})}{1 + \exp(\mathbf{x}_j^{\top} \boldsymbol{\beta}_i^{M})}$$
Similarly, in the joint response model (3), the conditional mean for the $i$th response is given by:
$$E(Y_{ij} \mid u_j) = \frac{\exp(\mathbf{x}_j^{\top} \boldsymbol{\beta}_i + u_j)}{1 + \exp(\mathbf{x}_j^{\top} \boldsymbol{\beta}_i + u_j)}$$
The marginal information from the joint response model can be obtained by taking the expectation over the random intercept, as shown below:
$$E(Y_{ij}) = \int \frac{\exp(\mathbf{x}_j^{\top} \boldsymbol{\beta}_i + u)}{1 + \exp(\mathbf{x}_j^{\top} \boldsymbol{\beta}_i + u)}\, dG(u)$$
We used an approximation of the marginal mean as in [15] because there is no exact closed-form solution for the marginal mean in the above equation. The magnitude of the between-subjects variation determines the discrepancies between the two sets of parameters. When all individuals have nearly the same random intercept value, that is, when the between-subjects variance is close to zero, the marginal parameters are roughly equivalent to the conditional parameters.
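The text does not spell out which approximation of the marginal mean was used. One widely used choice for a logistic model with a normal random intercept divides the conditional coefficients by $\sqrt{1 + c^2 \sigma_u^2}$ with $c = 16\sqrt{3}/(15\pi)$. The Python sketch below assumes that choice and compares it with numerical integration; the function names and the example values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

# Constant of the logistic-normal approximation (assumed here, see lead-in).
C = 16.0 * np.sqrt(3.0) / (15.0 * np.pi)

def attenuate(beta_conditional, sigma_u):
    """Approximate marginal coefficients from subject-specific ones."""
    beta = np.asarray(beta_conditional, dtype=float)
    return beta / np.sqrt(1.0 + C**2 * sigma_u**2)

def marginal_mean_quadrature(eta, sigma_u, n_nodes=40):
    """E[expit(eta + u)] with u ~ N(0, sigma_u^2), by Gauss-Hermite quadrature."""
    nodes, weights = hermgauss(n_nodes)
    u = np.sqrt(2.0) * sigma_u * nodes
    return float(np.sum(weights / (1.0 + np.exp(-(eta + u)))) / np.sqrt(np.pi))

def marginal_mean_approx(eta, sigma_u):
    """Closed-form approximation: expit of the attenuated linear predictor."""
    return float(1.0 / (1.0 + np.exp(-attenuate(eta, sigma_u))))

# Hypothetical linear predictor and random-intercept standard deviation.
numeric = marginal_mean_quadrature(1.0, 1.0)
approx = marginal_mean_approx(1.0, 1.0)
```

The attenuation factor equals one when `sigma_u` is zero, which matches the remark that the two sets of parameters roughly coincide when there is little between-subjects variation.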

Results and discussion
The analyses of this study were based on 9917 valid observations of children under the age of five in Ethiopia. As shown in Table 2, diarrhea and ARI cases in the previous fifteen days were reported for male children (5.9% and 8.2%, respectively) and for female children (5.1% and 7.8%, respectively). Geographically, 4.6% of children with diarrhea and ARI lived in urban areas, compared to 22.1% who lived in rural areas. In terms of vaccination, 63.3% of children in Ethiopia were reported to have received various types of vaccination. The covariates residence, vaccination, antenatal visit, wealth index, and mother's education were found to be significantly associated with diarrhea and ARI (P<0.05). Table 3 shows that in the 2016 DHS, 4.3% of children in Ethiopia had both diarrhea and ARI. On the other hand, 11.7% of the children had ARI only, and 6.7% of the children had diarrhea only.
As shown in Table 4, the common odds ratio (4.30) is greater than one, which indicates a positive association between the two childhood diseases. As a result, the bivariate logistic regression model was deemed appropriate for investigating the risk factors associated with diarrhea and ARI. This method provides adjusted estimates of concordance by simultaneously estimating the effects of risk factors on the odds ratio. Furthermore, the model may provide better precision than unadjusted estimates obtained by considering a multinomial distribution. Regression equations are used to describe the effects of risk factors on the marginal probability of the two diseases and the odds ratio [21]. If $Y_1$ and $Y_2$ share some or all explanatory variables, and the effects of the shared explanatory variables are estimated jointly, the bivariate model is parametrically dependent. Covariates were screened before building the models, and those that showed a significant association with diarrhea and ARI in the chi-square and descriptive analyses were retained.
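Because the bivariate model is parameterised by the two marginal probabilities and the odds ratio, the four joint probabilities have to be recovered from these three quantities. The Python sketch below uses the standard closed-form solution of the quadratic equation that the odds ratio implies for $p_{11}$. The function name is hypothetical. In the example call, the crude marginal prevalences from the Results section (11.0% for diarrhea, 16.0% for ARI) are combined with the reported odds ratio purely for illustration, since the reported value is covariate-adjusted.

```python
import math

def joint_probabilities(p1, p2, psi):
    """Recover p11, p10, p01, p00 from marginals p1, p2 and odds ratio psi."""
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0) or psi <= 0.0:
        raise ValueError("need 0 < p1, p2 < 1 and psi > 0")
    if abs(psi - 1.0) < 1e-12:
        p11 = p1 * p2  # independence when psi = 1
    else:
        a = 1.0 + (p1 + p2) * (psi - 1.0)
        disc = a * a - 4.0 * psi * (psi - 1.0) * p1 * p2
        p11 = (a - math.sqrt(disc)) / (2.0 * (psi - 1.0))
    return {
        "p11": p11,
        "p10": p1 - p11,
        "p01": p2 - p11,
        "p00": 1.0 - p1 - p2 + p11,
    }

# Illustration only: crude marginals combined with the reported odds ratio.
cells = joint_probabilities(0.11, 0.16, 4.30)
```

With these inputs `cells["p11"]` is roughly 0.043, in line with the observed share of children with both conditions.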
Maximum likelihood estimates of the model parameters were obtained in R using the VGAM and ZeligChoice packages [22,23]. As shown in Table 3, in the separate models, we found that covariates such as vaccination, mother's education, and mother's antenatal care were significantly associated with child diarrhea status, whereas residence, number of children, vaccination, mother's education, and wealth index were significantly associated with childhood acute respiratory disease status. Standard models that ignore interdependence between two or more responses typically produce relatively biased estimates [15]. Thus, in assessing the potential risk factors for diarrhea and acute respiratory infection (ARI), we used a joint model that captures the dependence between the two childhood diseases.
Covariates such as vaccination, mother's education, and antenatal care were found to be significant factors for childhood diarrhea in the joint response model. Similarly, among children under the age of five, residence, the number of children, vaccination, mother's education, and wealth index were significant risk factors for acute respiratory infection (ARI). We noticed that children who did not receive vaccinations were 1.4 times more likely to develop diarrhea and ARI. In terms of maternal education level, children of mothers with secondary and higher education were 0.60 and 0.39 times as likely to be affected by diarrhea, respectively, as children of mothers with no education. These findings are consistent with those of an Egyptian study [24]. Furthermore, children born to mothers who had fewer antenatal care visits during their pregnancy were 1.61 times more likely to suffer from childhood diarrhea than children born to mothers who had regular antenatal care visits. Children living in rural areas, on the other hand, were 1.72 times more likely than their urban counterparts to have ARI. This could be due to a lack of healthcare and poor living conditions in rural areas. In terms of the number of children ever born, we found that as the number of children increases, so does the likelihood of ARI, i.e. a child from a family with more children is more vulnerable to acute respiratory disease. Furthermore, children born to mothers with a secondary school education were 0.60 times as likely to have ARI as children born to mothers with no education. In terms of wealth index, we found that children from wealthier families were less likely to have ARI than children from poorer families.
Being in a higher wealth status, as compared to the lowest (poorest), was found to reduce the likelihood of diarrhea and ARI, which is consistent with what was found in Uganda [4].
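To make the size of these effects more concrete, the Python sketch below converts an odds ratio into a probability for a given baseline. Two assumptions are made for illustration only: the reported multipliers are treated as odds ratios, and the overall diarrhea prevalence of 11.0% (4.3% plus 6.7%) is used as the baseline for the reference group. The function name is hypothetical.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability obtained by multiplying the baseline odds by an odds ratio."""
    if not 0.0 < baseline_probability < 1.0 or odds_ratio <= 0.0:
        raise ValueError("need 0 < baseline_probability < 1 and odds_ratio > 0")
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Illustrative baseline: overall diarrhea prevalence (assumption, see lead-in).
baseline = 0.11
p_secondary_education = apply_odds_ratio(baseline, 0.60)  # mother's secondary education
p_fewer_anc_visits = apply_odds_ratio(baseline, 1.61)     # fewer antenatal care visits
```

Under these assumptions an odds ratio of 0.60 lowers the probability from 11% to roughly 7%, and an odds ratio of 1.61 raises it to roughly 17%.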
More specifically, we found that the standard errors of the parameter estimates in the joint response model were lower than the corresponding standard errors in the marginal models. This indicates that the joint model outperforms the marginal models in terms of efficiency. As a result of these findings, the need for jointly modeling two correlated responses was validated.

Simulation result
Simulation can assist researchers in understanding the entire statistical model, making full use of parameter estimates, and communicating findings in a reader-friendly manner [25]. The means of the predicted probabilities of an event were calculated for each of the two categories of the dependent variable, as shown in S1-S5 Tables in S1 File. The absolute value of the difference between those two means was then calculated. If a model predicts well, cases with events should have high predicted values and cases without events should have low predicted values [26]. Along with the tables and graphs of the quantities of interest in S1 Fig in S1 File, the built joint model demonstrated its ability to make good predictions. The mean estimate for Y, the standard deviation, the median, and the 95% confidence interval measures of uncertainty around the average are all included in each row of statistics for expected and predicted values.
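The discrimination measure described above, the absolute difference between the mean predicted probability among cases with the event and among cases without it, can be sketched in Python as follows. The function name and the six toy records are hypothetical and do not come from the supplementary tables.

```python
import numpy as np

def mean_difference_discrimination(y, predicted):
    """Mean predicted probability for events, for non-events, and their absolute difference."""
    y = np.asarray(y, dtype=int)
    predicted = np.asarray(predicted, dtype=float)
    if not (np.any(y == 1) and np.any(y == 0)):
        raise ValueError("both outcome categories must be present")
    mean_events = predicted[y == 1].mean()
    mean_nonevents = predicted[y == 0].mean()
    return mean_events, mean_nonevents, abs(mean_events - mean_nonevents)

# Hypothetical outcomes and predicted probabilities (illustration only).
y = [1, 0, 0, 1, 0, 0]
predicted = [0.60, 0.10, 0.20, 0.40, 0.05, 0.25]
mean_events, mean_nonevents, difference = mean_difference_discrimination(y, predicted)
```

A larger difference means the model separates children with and without the event more clearly. A value near zero would indicate that the predictions carry little information about the outcome.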

Conclusion
In this paper, we focused on childhood comorbidity by employing a joint response model that takes into account the interdependence of the two diseases, diarrhea and ARI. Different sociodemographic and biological risk factors for diarrhea and ARI were considered. There was a 4.3 percent childhood comorbidity of diarrhea and ARI among 9917 under-five children from DHS (2016) in Ethiopia. There was a positive correlation between the two infections, which was taken into account in this study to obtain unbiased estimates. Previous research on diarrhea and ARI comorbidity modeled each disease separately, ignoring the possibility of a link between the two. The findings of this study contributed to the appreciation of joint models that deflate parameters that would otherwise be overestimated in separate models. Residence, vaccination, mother's education, and antenatal care visits during pregnancy were found to be significant risk factors for diarrhea in the joint model, whereas residence, the number of children ever born, vaccination, mother's education, and wealth index were statistically significant risk factors for ARI in childhood. The risk factors such as residence, vaccination, and mother's education all had a significant impact on the two correlated binary response variables, diarrhea and ARI.

Limitations
We did not take into account the spatial driving factors that contribute to geographic variations. Hence, future research should look into joint models for more than two diseases including spatial variation effect as well as other social and macroeconomic factors, in order to better explain variation in childhood multiple illnesses. Besides that, this study is limited to Ethiopia, limiting its applicability or generalizability to other countries. Furthermore, since cases were self-reported by family/caretakers and not subjected to clinical diagnosis, there might have been some recall bias and misclassification of diseases.