Quantifying non-communicable diseases’ burden in Egypt using State-Space model

The study aimed to model and quantify the health burden induced by four non-communicable diseases (NCDs) in Egypt, the first to be conducted in the context of a less developing county. The study used the State-Space model and adopted two Bayesian methods: Particle Filter and Particle Independent Metropolis-Hastings to model and estimate the NCDs’ health burden trajectories. We drew on time-series data of the International Health Metric Evaluation, the Central Agency for Public Mobilization and Statistics (CAPMAS) Annual Bulletin of Health Services Statistics, the World Bank, and WHO data. Both Bayesian methods showed that the burden trajectories are on the rise. Most of the findings agreed with our assumptions and are in line with the literature. Previous year burden strongly predicts the burden of the current year. High prevalence of the risk factors, disease prevalence, and the disease’s severity level all increase illness burden. Years of life lost due to death has high loadings in most of the diseases. Contrary to the study assumption, results found a negative relationship between disease burden and health services utilization which can be attributed to the lack of full health insurance coverage and the pattern of health care seeking behavior in Egypt. Our study highlights that Particle Independent Metropolis-Hastings is sufficient in estimating the parameters of the study model, in the case of time-constant parameters. The study recommends using state Space models with Bayesian estimation approaches with time-series data in public health and epidemiology research.


Introduction
The epidemiological burden of chronic diseases is increasing worldwide, in the developed countries (DCs) as well as in the less developing countries (LDCs), marking that non-communicable diseases (NCDs) are no longer related to affluence. NCDs are responsible for almost 70% of all deaths worldwide; 85% of these deaths occur in less developing countries [1]. Three main demographic factors drive the noticeable increase in NCDs; aging of the population, population growth, and unplanned urbanization, and other factors such as globalization of unhealthy lifestyles [1,2]. Egypt, one of the less developing countries where GDP per capita is the availability, quality, and accessibility of healthcare services significantly affect the population's health. The third type of SMPH is based on Multiple Indicator Multiple Cause (MIMIC) models in which health is dealt with as an unobserved construct or latent variable to be determined by its causes and indicators and to be estimated in a system of structural equations. Examples; Multiple Indicator Multiple Cause health Status Index (MIMIC-HSI) [19], and Multiple Indicator Multiple Cause Burden of Disease Index (MIMIC-BDI) [18]. MIMIC-HSI gives more availability for health status with multiple domains. MIMIC-HSI was used to measure the disability caused by some diseases [20]. It was also applied to studies concerned with the population's health and the individual's [21,22]. The main shortcoming of MIMIC-HSI is the exclusion of non-fatal health outcomes from the index's estimation [23]. Both MIMIC-HSI and DALYs presume the parameters' stability and constant severity weights over a given time [24]. In 2004, Kaltjob proposed (MIMIC-BDI) [18], the disease-related variables were added, and the independence between severity weights and duration of disease was not presumed. Kaltjob used his suggested metric to rank ten different diseases in the year 2000, as well as to investigate their burden on the French population [23]. MIMIC model, however, suffers from several shortcomings. It is found incompetent in circumstances with a small number of observations or observations with absent values [25] and is problematic for applying on time-series data [26,27]. Additionally, its presumptions of independence between the structural and measurement errors, and the stationary or normally distributed observations are not always applicable [26]. Furthermore, MIMIC's estimated coefficients are not consistent with diverse sample sizes [28]. Although the NCDs' encumbrance in Egypt is substantial, no attempt was conducted to measure their burden. Therefore, the study aims at filling this gap by measuring the NCDs burden trajectories in Egypt. The study's main objective is to develop non-communicable diseases' burden-related health status index (NCDs-BDI) in Egypt. In such an endeavor, health is dealt with as an unobserved (latent) construct characterized by its observable determinants and observable indicators [22]. This effort is the first to be conducted in the context of one of the less developing countries (LDCs), and among the few performed worldwide. Our suggested health metric used the State-Space Model (SSM) to represent the latent health variable's relationship with its causes and indicators. SSM avoids several drawbacks of the Multiple Indicator Multiple Cause (MIMIC-BDI) model [23]. In contrast to the MIMIC, SSM has several advantages. It allows the current state of the latent construct to depend on its previous state and, most importantly, does not impose restrictions on the number of the causes and indicators added to the model [29][30][31]. SSM is used in studies with a small number of observations. Additionally, it is applied to model time-series observations and studies wherein the number of time points is greater than the number of individual cases. It also allows examining the intra-observations variability [29,32]. We applied the State-Space Model with two Bayesian methods: Particle Filter (PF) or Sequential Monte Carlo (SMC) method and the Particle Independent Metropolis-Hastings (PIMH) method, and we estimated the burden trajectories of four NCD diseases: 1) cardiovascular diseases, 2) neoplasms, 3) diabetes and kidney diseases, and 4) chronic respiratory diseases. Additionally, we estimated the relationships between the burden and its causes and indicators and compared between the two estimation methods. In composing the non-communicable diseases' burden (NCDs-BDI) index, the study drew on [23][P. [13][14][15][16] conceptual framework for population health assessment. In building such a health metric, we conducted some adjustments on the determinants and indicators of the health construct to accommodate better the NCDs' impact (Fig 1, see colored boxes). Fig 1 summarizes the proposed leading causes and indicators of the disease-related population health index. The supposed determinants include biological and behavioral risk factors, the disease's prevalence, and disease-related disability weight. The study assumed that the biological risk factors (including high blood pressure or high blood glucose) and the behavioral risk factors (such as smoking, unhealthy diet, obesity, alcohol consumption, and physical inactivity) affect health status not only indirectly through increasing the disease incidence [33], but also directly. The risk factors may influence disease burden directly through the behavior of the patients towards their illness. The study assumed that patients with higher risk factors are with low ability to confront their diseases, do not respond quickly to their pains, nor comply with the new therapy, which, in turn, affect the burden of the diseases [34]. The biological and behavioral risk factors were not considered in Kaltjob the framework [18]. The increase in the incidence of a specific pathology causes a rise in the burden of disease-related population's health. Disease-related disability weights are essential determinants of the burden of disease. The study assumed that there is a positive relationship between disability weight and the burden of illness. Indicators of the disease-related population's health metric are presumed to include; premature mortality and health services utilization (hospital facilities utilization and ambulatory health service utilization). Considering premature deaths as an indicator or a cause of disease burden is controversial. In this study, contrary to kaltjob study [18] and in accordance to other studies [35,36], our framework adopted that premature death is an indicator (i.e., it is a consequence of the population's health status, not a cause). Accordingly, we assumed a positive relationship between the burden of disease and the mortality rates. This assumption is also one of DALYs' main features; the higher the age-specific mortality rates, the worse the health metric [37]. Population's health status, no doubt, is a significant determinant of health service use (both of the hospital facilities and ambulatory health service). Most of the studies agreed that the lower the population health status (i.e., higher disease burden), the higher the utilization of all health services [38][39][40][41]. On the other hand, it is not always the case that higher utilization of health services is induced by a higher disease burden. More heightened awareness of the population can make them quickly respond to their pain. Some studies assumed that increased use of health services might indicate an improvement in population health status because it is associated with therapy and early diagnosis of the diseases or periodical check-ups [42]. Nevertheless, the study assumed that the higher the disease burden, the greater is the utilization of health care services, particularly in a less developing country such as Egypt. Two significant external variables explain health indicators; health care services supply and the population's financial capabilities. Literature supported a positive effect of the availability of health services on its utilization [43]. Moreover, population socio-economic status and its related financial capabilities is a significant determinant of health services utilization. Population socio-economic status is strongly related to the population's awareness and lifestyle, the behavior against the disease symptoms and acute cases, and access to health services [38][39][40]. In societies such as Egypt, where there is no full health insurance coverage, and individuals' out-of-pocket health expenditure represents about 60% of total health expenditure, financial capabilities positively impact health service utilization, especially if the prices are affordable. Some literature showed a negative relationship between higher prices and utilization of health services [44]. In this study, one of the relationships that were suggested by Kaltjob studies [18,23], the effect of the disability weights on health services' use, is eliminated as it has no theoretical base. We believe that disability weights indirectly affect the use of health care services through the disease burden. To meet our objective, we organized the study into five sections. Following the introduction, section two displays variables, data sources and their limitations, and details the SSM model and inference methods. Section three delivers the estimates of disease-related population health metrics for four groups of diseases. The discussion and conclusion are provided in section four and five, respectively.

Variables and data sources
The proposed disease-related population health metric NCDs-BDI is estimated for the four groups of NCDs using Egypt macro-level time series data from 1990 to 2017. We used data on disease prevalence rate instead of the incidence rate. It is challenging to find incidence and average duration of disability for all diseases and sequelae [45], notably in LDCs. Estimates of disability weights using population-based surveys have been used as a component of DALYs' measures after 2010 [46]. Therefore, our study as well used the general population disability weights data. Data on the prevalence of each group of diseases (cases per 100,000 population), years of life lost by cause, and the general population disability weights were collected from the Institute for Health Metrics and Evaluation website (IHME) [47]. We used [48] estimates of the prevalence of the five biological and behavioral risk factors: High cholesterol, high blood glucose, high blood pressure, obesity, and smoking. Following [49], we used the number of beds and the number of physicians to measure the supply of health care services: health facilities and ambulatory services. The number of days spent in the hospital and the number of outpatients were used as a proxy of health services (facilities and ambulatory) utilization. Data on these variables were gathered from the Annual Bulletin of Health Services Statistics, Central Agency for Public Mobilization and Statistics, Egypt [50], and were categorized according to physicians' specialities. GDP per capita data are used as a proxy for health services financial access and were gathered from the World Bank's national accounts data [51].

Data manipulation
Handling missing data. Out of the 14 variables, four had missing observations; mainly, the two indicator variables that measure the use of health services (number of days spent in the hospitals and number of outpatients) and the supply of health services (number of specialists and number of beds). In applying the Particle Filter method of estimation, we used a single imputation method to fill in the missing values for the variables number of specialists and the number of beds. We calculated the averages in this method because the missing data were less than 40% [52]. For the indicators, the number of outpatients and number of days spent in hospitals, we used a technique stemmed from literature [53]. Whenever there was a missing value at time t, they estimated the states based on the available information up to time t-1 [53][Algorithm 2, P. 522]. Inspired by this technique, we used only the available information for each indicator or response variable in time t to calculate the likelihood function; otherwise, the likelihood function of the missing value is considered one and equal weights are assigned to the particles 1 N À � (see Algorithm 1). Accordingly, the likelihood functions were used to calculate the importance weights of the simulated particles according to formula 13, 14, and 15. Assigning a value one to the likelihood function for the missing value will allow us to ignore it in the process of estimating the importance weight as a multiplication of the three weights.
In the instance of applying the Particle Independent Metropolis-Hastings (PIMH), we used a new approach of multiple imputations technique with Amelia package in R [54]. This method has several advantages as it can fit different data mechanisms, keeps the data variability, and gives efficient results in small samples [55]. This new method uses the Expectation-Maximization Bootstrapping approach (EMB). Bootstrapping in Amelia refers to getting several copies from the same dataset and filling them using the expectation-maximization method. Copies of multiple samples ensure the uncertainty in the imputation process. This method uses all the available data, even if it is not used in the analytic model. Multiple imputation gives unbiased estimates and works well with missing at random or missing completely at random data [56]. It is also influential in longitudinal data [57].
Suppose that D is the data matrix, D*(i.i.d.)MVN(μ, σ). At first, we assumed initial values for μ and σ, then we drew values ðDÞ from the assumed multivariate normal distribution (MVN) with these initial values of μ and σ for each copy of the data sets. Afterwards, the expectation-maximization starts. The Expectations is performed using the estimated values of μ and σ (from the previous step) to draw random numbers from the normal distribution to fill in the missing data. Then, we used the complete data to maximize the likelihood function for the two parameters. Iterate until convergence [54]. The likelihood function is Lðm; s j DÞ Q N i¼1 f MVN ðd i j m; sÞ; d i is the i th observation. The most conservative assumption in this method is that the data should follow a multivariate normal distribution. If this assumption is relaxed, we can make some transformations to get it as close to normal as possible [54]. In many cases, if we have non-normal or discrete variables, Amelia's normal model works well in imputation [58].
Two steps detected the linear trend of the data. First, we applied the non-parametric Mann-Kendall (MK) test [59,60] to check the existence of a monotonic upward or downward trend of each series. The null hypothesis of the test assumes that there is no upward or downward trend. It can be applied in case of missing data, but this test doesn't confirm the linearity of the trend. The main advantage of this test, it doesn't require any presumptions of the data distribution. Second, we performed linear interpolation for the missing data, and checked the linearity of the trend to affirm the choice of the linear function in interpolation. We used t-test with Sieve-bootstrap to allow for dependence between observations, assuming that there is no linear trend in the null hypothesis [61,62]. The following table (Table 1) summarizes the results of the two tests: The results indicate the rejection of the null hypothesis in the two tests, implying the monotonic and linear trend in all of the series. Consequently, it was possible to apply the linear imputation in the five series.
Algorithm 1 Particle Filter end for 9: if y 1t = Nan then 10: Handling high correlation among the five biological and behavioral risk factors. The five variables that indicate biological and behavioral risk factors, logically, are highly correlated. Therefore, we used the suggested time series factor analysis (TSFA) to collect these variables in one factor that indicates risk factors' prevalence [63]. Time series factor analysis uses the same equation of ordinary exploratory factor analysis but with subscript t. The R package TSFA has been used in this analysis to get the factors that represent the prevalence of risk factors [64]. According to TSFA, we can relax the observation independence and normality; we only need to check if the data are stationary or not and apply differencing if required. Suppose that at time t, for t equals 1, . . ., T time points, we have k latent variables (η t ), and M indicators (y t ); the model's equation will be as follows [63]: where α is M vector of intercept parameters, β is M � K matrix of factor loadings, � t is M vector of measurement errors. We assumed that the intercept (α) is equal to zero in the application of the model [63]. We applied the unit root test Augmented Dickey-Fuller test (ADF) to detect data stationarity [65]. The ADF test depends on the following equation: where α is a constant, β the coefficient on a time trend, t is the deterministic trend, and p the lag order of the autoregressive process, and Δy t−p is the difference of p th lag order of the series y t . The test detects the null hypothesis of γ=0. The five variables are non-stationary (each series's mean and variance are not constant and function in time) and should be differenced. Obesity, cholesterol, and blood glucose are integrated of order two. Raised blood pressure and tobacco are integrated of order one (see Table 2). As we have integrated data of order greater than zero, the mean and variance of indicators will change over time, and the estimation of the constant parameters will be problematic. Consequently, we applied two differences to the five variables to reach stationarity. Then, the equation of the time series factor model will be [63][P. 6]: The two extracted factors were assumed to be correlated. The correlation between the two differenced factors was small (0.35). Many methods of rotations can be used in case of interdependent factors such as oblimin, quartimin, geomin, promax, promaj, simplimax, and it is called oblique rotation. Quartimin rotation was used as a rotation method in this analysis [66]. Moreover, we estimated the undifferenced factor scores using Bartlett factor scores to be consistent with the other variables (have the same number of data points), using the following formula [63][P. 12]: We were able to obtain not time-dependent parameters from the TSFA model using the differenced data series. The resulting Bartlett factor scores depend on the factor loading β extracted from the TSFA model 4 and the error covariance ω. [63]. The resulting factor scores were used in the rest of the study.

Model and statistical analysis
To estimate the latent states' trajectory and the parameters in the State-Space model (SSM), we performed a parallel estimation of the course of the latent states and the parameters using the Bayesian approach. We applied two techniques of the Bayesian approach (we used MATLAB in applying the two methods [67]): Particle Filter (PF) or Sequential Monte Carlo (SMC) method and the Particle Independent Metropolis-Hastings (PIMH) method. The PF assumes that the parameters are dynamic; therefore, we used the online estimation technique in which the estimation is performed sequentially as a new observation is becoming available. In contrast, the PIMH assumes that the parameters are static; hence, we used the offline estimation technique which depends on the entire observations of y 1:t , y for t = 1, . . ., T [68]. Particle filter (sequential Monte Carlo). We estimated the latent states' posterior density in the particle filter method based on the observed variables' available information. It is a sequential process of obtaining the latent states' posterior at time t based on the latent posterior at time t−1 and the new observed points at time t [69]. Assume that we have the following state equation: The state equation follows the Markov property; the value of the disease's burden (x t ) depends only on the value (x t−1 ). The latent variable also depends on the risk factors (u 1t ), the disease prevalence (u 2t ), the average of mild and moderate disability weights (u 3t ), the average of severe disability weights (u 4t ), and the state noise (� 1t ). Additionally, we have three measurement equations for the three indicators. The first measurement equation for (y 1t ) refers to the years of life lost due to death (YLL). The indicator variable (YLL) is assumed to be a function in the burden of disease (x t ) only, and it takes the form: The second measurement equation is for the ambulatory health services utilization (y 2t ) (proxied by the number of outpatients). It is assumed to be a function in the burden of disease (x t ), the ambulatory health services supply (measured by the number of specialists) (Z 2t ), and health services financial access proxied by GDP per capita (Z 3t ) (an estimate of the individual's financial capability); it is written as: The third measurement equation is for the indicator (y 3t ), the hospital facilities utilization (proxied by the number of days spent in hospitals). It is assumed to be influenced by the burden of disease, hospital services supply (proxied by the number of beds) (z 1t ), and GDP per capita (z 3t ).
where � 1t , � 2t , � 3t are the three measurements' noises respectively. For simplicity, we assumed that they follow Gaussian distribution. Regarding the normality assumption of the indicators, we have three response variables. The first response variable, years of life lost (YLL), is a continuous variable due to its calculation methods [70,71]. The other two response variables (the number of days spent in a hospital and the number of outpatients) have missing values. Therefore, in PIMH, these two response variables needed imputation using EM algorithm to assist random draws of missing values from the normal distribution even if the main distribution of the data is not normal. Consequently, the imputed versions of the data are of continuous type, and the assumption of normality in PIMH can be acceptable in case of using the imputed data.
In particle filter (PF) analysis, we have to choose between two ways of handling count data: either using robust linear models that overcome the shortfalls of the non-normality of the data or make the transformations to approach normality such as log transformation, square root, standardization, Box-Cox transformation [72,73]. We standardized all the variables to approach normality and have the same assumptions of the normal distribution in the two methods (PF and PIMH) to achieve proper comparison. The recursive computation of the latent states works as follows [74, P. 139-141]: According to the Bayes theorem, we can compute the posterior density from the following equation: Pðx t jy 1:t Þ ¼ Pðy t jx t ÞPðx t jy 1:tÀ 1 Þ pðy t jy 1:tÀ 1 Þ ; ð10Þ where the prior density of the latent variable is P(x t |y 1:t−1 ), the likelihood of the data is P(y t |x t ), and the normalising constant or the marginal likelihood is p(y t |y 1:t−1 ). We discarded the normalising constant for simplicity as it is not always tractable. The previous equation can be rewritten as: Pðx t jy 1:t Þ / Pðy t jx t ÞPðx t jy 1:tÀ 1 Þ: Accordingly, the first step assumes that the initial value of the state, x 0 at t = 1 follows a density p(x 0 ). Each iteration t starts with the posterior of x ðiÞ ðtÀ 1jtÀ 1Þ obtained from the previous iteration t−1; we can calculate x ðiÞ ðtjtÀ 1Þ from the state (transition) equation to get new samples. We started from t = 2, and the estimated states at t = 1 are assumed to be the average of the simulated particles from the uniform distribution (Algorithm 1). The importance weight of each particle w ðiÞ tjtÀ 1 is calculated according to: Where N is the number of particles, x ðiÞ tjtÀ 1 are iid samples from P(x t |y 1:t−1 ), and their corresponding weights w ðiÞ tjtÀ 1 . From Eq 12, as we do not have information about the target distribution we can rewrite the weights as follows: In each iteration, the particles with low weights are discarded, and the new iteration starts with the highly weighted particles. The weights in Eq 13 have three dimensions based on the three measurement Eqs (7, 8 and 9), so we need to estimate w 1t , w 2t , w 3t . The three weight functions are estimated as follows: According to the previous studies [18,23], and in agreement with our conceptual framework, the response variables are conditionally independent given the state variable and the parameters (see Fig 1). Assuming that the three indicator variables are independent, we multiplied the three unnormalized weights following [75]: The normalized weights (sum to one) are given by: The state estimation was calculated as the average of weighted particles using the following formula: Finally, we added an artificial noise ψ to assume state equations for the parameters, and to allow for the change of parameters through time using a random walk process: It should be noted that whatever was the prior distribution of the error term, it would not affect the final results [76]. Using the samples of x 1:t we got an unbiased estimator of the likelihood function which was used in the PIMH method according to the following formula [30]:  [77]. It allows for the aligned estimation of the latent states and parameters [30]. As it is an approach for inference by sampling, the parameters are drawn from a proposal density. The main idea is trying to find a Markov Chain for each parameter that converges to a stationary posterior distribution. The principal privilege of this method is that it reaches unbiased estimates [78].
Suppose we have proposal distribution Q, target distribution π, and proposed or candidate parameter θ 0 from proposal distribution Q(θ 0 jθ k−1 ). We should determine if the candidate parameter would be accepted in our chain and becomes θ k , or stay in the previous position θ k−1 . This decision is based on an acceptance probability (The numerator and the denominator contain the posterior distribution π so we do not need the normalized constant in this case as it will be cancelled out). α(θ 0 , θ k−1 ) [79, P.9-10].
aðy kÀ 1 ! y 0 Þ ¼ minð1; pðy 0 ÞQðy 0 ! y kÀ 1 Þ pðy kÀ 1 ÞQðy kÀ 1 ! y 0 Þ Þ ð20Þ Q(θ 0 ! θ k−1 )is the transition probability from θ 0 to θ k−1 , and Q(θ k−1 ! θ 0 )is the transition probability from θ k−1 to θ 0 . If the proposal distribution is a symmetric distribution then, Q(θ 0 ! θ k−1 ) = Q(θ k−1 ! θ 0 ), and the acceptance probability becomes as follows [80]: The acceptance probability checks if the new proposed point, under the posterior distribution, is more plausible than the previous one or not. If it is the case, then the acceptance probability would be equal to 1, and the new point will be accepted. Alternatively, we will generate a random number from U(0, 1). If this number is less than the acceptance probability we will accept it with probability α; otherwise, the new point will be rejected [79]. The parallel estimation of the latent states and the parameters was a challenge. The combined evaluation will be like two loops: one outer loop for parameters' estimation, and an inner loop for sequential Monte Carlo estimation of the latent states' trajectory and the related likelihood functions based on the estimated parameters in the outer loop (see Algorithm 2) [81]. The unbiased estimator of the likelihood estimated from the particle filter was used in PIMH to calculate the posterior of the parameters (the formula in Eq 19 was used in Algorithm 2, line 7).
In PIMH, the initial points of the parameters were zeros, and that for the measurement and the state's error variances were 0.01. In both techniques, PF and PIMH, we presumed that the coefficients were all positive except for the impact of the diseases' burden on health services' utilization (θ 2 and θ 5 ), and assumed that the proposal distribution is U (−1, 1). The independent Metropolis-Hastings sampler's positive parameters followed U(0, 1), and the variance of the measurement errors was U(0.1, 1.5). The burden of the disease's initial value follows U (−3, 3).  Table 3 displays the estimated parameters of the state and measurement equations according to the PF method. For each parameter, the estimation converged to a single value, ensuring that the parameters are time-invariant. The computation of the coefficients demonstrates some differences for different diseases. According to the PF findings, all the diseases' burdens are on the rise, (Fig 2). However, chronic respiratory diseases showed a sharp rise at the beginning of the 1990s, and it leveled off during the time interval 1995 to 2005, then it steeply increased after that. The other three diseases showed a gradual increase with neoplasm revealed a slight rise in its slop after 2005.

Results
The burden of the preceding year weakly predicts the disease burden in the current year for all the diseases (α 1 ). The risk factors (α 2 ) have a strong influence on the burden of neoplasms, chronic respiratory, and diabetes and kidney diseases except for cardiovascular diseases. On the contrary to our hypothesis, the disease prevalence exhibits a low positive impact (α 3 ) on the disease burden of the four groups. The mild/moderate weights apparently affect the burden of the four groups of diseases (α 4 ) and results show a salient positive impact of the severe weights (α 5 ) on the disease burden of chronic respiratory diseases and diabetes and kidney diseases. Considering the health metric's indicators, YLL shows high loadings (θ 1 ) in cardiovascular, and diabetes and kidney diseases; moderate in neoplasms, and low loading in chronic respiratory diseases. The number of outpatients (θ 2 ) has low positive loadings in neoplasms and cardiovascular diseases and low negative ones in the remaining two diseases. The number of days spent in hospitals (θ 5 ) indicates negative loadings in all the diseases, but neoplasms. The number of specialists' effect on the number of outpatients (θ 3 ) reveals a strong positive impact. GDP per capita (θ 4 ) has a weak influence on the number of outpatients in cardiovascular diseases, and it shows a strong positive effect in the rest of the diseases. On the other hand, the number beds' influence on the number of days spent in the hospitals (θ 6 ) was positive strong for the four diseases. Finally, GDP per capita (θ 7 ) strongly affects the number of days spent in a hospital only for chronic respiratory diseases. The estimated error variances of the three measurement equations for the different diseases range between 0.043 (cardiovascular) and 0.189 (diabetes and kidney disease), except for the third measurement error variance in neoplasms (0.551). state equation's error variances are between 0.366 (Diabetes and kidney diseases) and 1.254 (neoplasms).
Regarding PIMH, the resultant estimated parameters were calculated as an average of five imputed samples for each of the four diseases. We used 5000 iterations that were left after discarding the first 1250 burn-in iterations (Burn-in iterations are the first group of iterations that should be discarded from the chain [78]). The number of particles was 1000 in this method [30].
The judgment on the method was performed through several diagnostics. First, the chains' trace plots are stationary around specific values, showing high quality and samples' stability representing the posterior distribution [82] (S1-S4 Figs). The second examines the auto-correlation as an essential indicator of the convergence [81]. We have minimal correlations between the samples and the previous ones (the correlation vanishes after the fourth lag). Also, we have many independent samples that reflect the target distribution (Table in S1 Table). The final diagnostic compares the means of two different segments in the chain; usually the first 10% samples and the last 50% samples. The null hypothesis assumes that the two samples' means are the same [83]. The null hypothesis was accepted, which means that the samples are from the same distribution (p-values of the test are presented in S2 Table). The computation time of PIMH becomes higher, the number of iterations and particles increases (see S3 Table). PIMH results revealed an increasing trend of all diseases' burdens (always the first year is affected by the assumed initial values). Cardiovascular, neoplasm, and diabetes and kidney diseases showed a sharp rise, overall, notably after the 2011 revolution. After a sudden decline around 2015, neoplasm kept rising, cardiovascular and chronic respiratory rose then tended to decline, and diabetes and kidney diseases are plateauing after 2015, (Fig 3, and  Regarding NCDs-BDI determinants, results reveal that the rise in the previous year's burden caused an increase in this year's burden with a moderate impact. The higher prevalence of the disease and the prevalence of the risk factors caused a higher disease burden. The same results were reached regarding the moderate impact of the severity weights ( Table 4).
Years of life lost due to death (YLL) were significantly affected by diseases' burden (high loading). The average estimated loadings of the number of outpatients (θ 2 ) are very small and negative in all the diseases (ranges between -0.16 and -0.002). Similarly, the number of days spent in hospitals (θ 5 ) shows a very weak loading and indicates a negative relationship with the burden for all the diseases but neoplasms (Table 4).
For all diseases, the effects of the specialists' number (θ 3 ) and GDP per capita (θ 4 ) on the number of outpatients approached 0.5. On the other hand, the number of beds (θ 6 ) and the GDP per capita (θ 7 ) showed a moderately positive effect on the number of days spent in hospitals (around 0.47). Finally, the state and the measurement errors' variances show low variation between diseases 0.7 and 0.8.

Discussion
The study aimed to develop a burden of disease index in Egypt. In developing such an index, it adopted the conceptualization that it is essential to consider the demand for health care services induced by the disease alongside the disease-related morbidity and mortality. In this endeavor, we adopted the Kaltjob suggested framework [23]. However, we conducted three modifications: (a) We added the biological and behavioral risk factors as one of the major causes of the disease burden (were not considered in Kaltjob framework), (b) we considered premature deaths as an indicator (i.e., is a consequence of the population's health status) not a cause as formulated by Kaltjob, and (c) his presumed direct effect of the disability weights on health services' use is eliminated as it has no theoretical base. We believe that disability weights affect the use of health care services indirectly through the disease burden.
This endeavor is the first to be conducted in the context of a less developing county, Egypt, and among the few that had been performed worldwide. The study estimated four burdens of disease indices (NCDs-BDI) for four non-communicable diseases using two Bayesian estimation methods of the State-Space model: The Particle Filter (PF) and the Particle Independent Metropolis-Hastings (PIMH).
The estimated parameters using the Particle Filter method noticeably varied between diseases but static through time, while in the PIMH, the estimated parameters were very close to each other. Both methods; PF and PIMH came to the conclusion that all the diseases' burdens are on the rise. The slow rise in the burden of neoplasms and chronic respiratory diseases that began at the mid-nineties is most probably influenced by the health sector reform program. The health sector reform program started in 1997 and planned to be accomplished in 2015. It targeted comprehensive coverage of health services and the realization of better health indicators [84,85]. However, it was obstructed by the aftermath of political instability and economic hardships following the 25th January, 2011 revolution.
Most of the two methods' findings agreed with our assumptions and are in line with the literature. High prevalence of the risk factors, increased disease prevalence, and the increase in the disease's severity level all increase illness burden. The results also showed high loadings for the years of life lost due to death YLL in all the diseases, except for in PF estimation results, YLL has weak loadings for the neoplasms and chronic respiratory. The previous year's burden strongly predicts the current year's burden (the PF results have not confirmed this assumption). In contradiction to our assumption but in agreement with others [42], the use of health care services had low and negative loadings for all the diseases, except for neoplasms and cardiovascular diseases, suggesting a weak relationship between the burden and utilization. Nevertheless, the results unravel that neoplasms induce hospitalization demand, and cardiovascular induces outpatient clinics' demand. In the more developed countries, health services use can be a matter of high awareness and early check-ups, not a reason for high burden [42]. On the contrary, in Egypt, the negative relationship between disease burden and demand for health care services can be attributed to the lack of full health insurance coverage. Approximately half of the Egyptian population are covered by health insurance. Likewise, half of the retired people that are significantly exposed to chronic diseases have health insurance [86]. Lack of health insurance discourages the use of health care facilities, as it makes seeking health care (visiting doctors for diagnosis, staying at the hospital, buying drugs) costly, and consequently, out-of-pocket spending on health care represents 60% of total health expenditure. Additionally, some sick individuals seek pharmacists' advice and medical prescriptions, a widespread practice in Egypt.
The two methods assure the positive relationship between health services supplies and use, which coincides with our presumption and literature [43]. In accordance with other studies [38,40,44,87,88], findings show a strong positive relationship between the individual's financial capabilities and health care facilities' use (outpatients), indicating the affluent are more likely to seek health care than the vulnerable and uninsured groups. On the other hand, seeking pharmacists' advice is a major outlet for the poor in case of illness [89]. On the contrary, results reveal a weak relationship between individuals' financial capabilities and hospitalization (inpatients), reflecting that all people, the better-off and the poor, can not escape hospitalization if needed.
It is worth mentioning that although the two methods are different in the assumption of parameters' dynamism; they exhibited a substantial similarity in their findings. It is noteworthy to mention that using PIMH is promising in estimating the parameters and the latent states of the study model, as the parameters converged to a constant value. Diagnostics in PIMH methods assured the convergence to the posterior distribution. On the other hand, in PF, handling missing data in the indicators was much easier than the multiple imputation method applied with PIMH. We performed a sensitivity analysis to assess the performance of the Particle Filter inference method. The Particle Filter method was numerically sensitive to changing the initial values' distribution boundary; it gives estimated parameters of similar directions but with different magnitude. However, it retains the behavior of the latent state. Besides, we used root mean square discrepancy (RMSD) to assess the performance of the varying number of particles (Table in S4 Table). Another difference between the two methods is that PIMH is less numerically sensitive than PF but has greater computation time and (Tables in S3 and S4  Tables). Moreover, we should admit that using PIMH method gave us a higher estimated variance for measurement and state errors.

Conclusion
The study aimed to estimate an index of the burden of non-communicable diseases on the population's health. This metric will help providing policymakers in Egypt with tools to monitor and forecast future NCDs' progression and model their impact on several dimensions of the societies' demographic and socio-economic development. It also aimed to contribute to the efforts of modeling non-communicable disease trajectories. This attempt is the first to be conducted in Egypt.
The study provided evidence that the burdens of the four NCDs are on the rise. They are positively influenced by their recent past, risk factors, disease prevalence, disability weight, and disease-related deaths.
Our study provided evidence that the State-Space model is a concise representation of the latent variable and its indicators and determinants. The study opens richer insights for the usage of State-Space models with Bayesian estimation approaches in public health and epidemiology instead of ordinary econometric models. Using the previous techniques facilitates the investigation of disease dynamics and simultaneous estimation of latent construct and parameters. The negative and weak relationship between the burden and utilization of health care services found in Egypt's case cannot be generalized. The model should be applied in different countries to assess the assumed relationship between the burden and utilization of health care services in the model. It also highlights the need to enhance Egypt's health registration system as most of the data on demand for health care services are not available and if available, are incomplete.