Association between Multidrug-Resistant Tuberculosis and Risk Factors in China: Applying Partial Least Squares Path Modeling

Background: Multidrug-resistant tuberculosis (MDR-TB) resulting from various factors has raised serious public health concerns worldwide. Identifying the ecological risk factors associated with MDR-TB is critical to its prevention and control. This study aimed to explore the association between the development of MDR-TB and group-level (ecological) risk factors in China.
Methods: Data on MDR-TB in 120 counties were obtained from the National Tuberculosis Information Management System, and data on risk-factor variables were extracted from the Health Statistical Yearbook, provincial databases, and the meteorological bureau of each province (municipality). Partial Least Squares Path Modeling (PLS-PM) was used to detect the associations.
Results: The median proportion of MDR-TB in new TB cases was 3.96% (range, 0–39.39%). Six latent factors were extracted from the ecological risk factors, which together explained 27.60% of the total variance in the prevalence of MDR-TB. Based on the results of PLS-PM, TB prevention, health resources, health services, TB treatment, TB detection, geography and climate factors were all associated with the risk of MDR-TB, but socioeconomic factors were not significant.
Conclusions: The development of MDR-TB was influenced by TB prevention, health resources, health services, TB treatment, TB detection, geography and climate factors. Such information may help to establish appropriate public health intervention strategies to prevent and control MDR-TB and yield benefits to the entire public health system in China.


Introduction
Tuberculosis (TB) is still the world's leading infectious cause of adult deaths and a major public health burden in developing countries [1,2]. In 2013, there were an estimated 9.0 million (range, 8.6-9.4 million) new TB cases and 1.5 million TB deaths globally [2]. According to the 2014 World Health Organization (WHO) global TB report, China ranks second among the world's 22 high-burden countries, with a TB incidence of approximately 0.9-1.1 million, accounting for 11% of the total number of cases in 2013 [2].
The increasing prevalence of drug-resistant TB (DR-TB), especially multidrug-resistant TB (MDR-TB), is a serious threat to global TB control and has become a major public health concern in several countries [1,3]. Beginning in 1994, the WHO and the International Union Against Tuberculosis and Lung Disease (WHO/IUATLD) initiated the Global Project on Anti-TB Drug Resistance Surveillance [4]. Currently, 144 countries have been covered [5]. Since 1996, an increasing number of provinces (municipalities) in China have begun to conduct drug resistance surveillance based strictly on the WHO/IUATLD Guidelines [4,6]. China is one of the 27 countries with the highest burden of MDR-TB; according to the WHO, an estimated 5.7% (95% CI 4.5-7.0%) of new TB cases and 26% (95% CI 22-30%) of previously treated cases in China had MDR-TB [2,4].
Identifying the risk factors contributing to the development of MDR-TB is essential and may help in developing appropriate MDR-TB control strategies [7]. Previous studies have identified several individual-level risk factors for MDR-TB, including but not limited to age [7-11], sex [8], genetic susceptibility [12,13], occupation [7,14], previous treatment [7-11,15-22], smoking [7,11], alcohol abuse [11], human immunodeficiency virus (HIV) infection [11], and socioeconomic status [13,15,23-25]. However, as an infectious disease, MDR-TB could be due not only to individual-level factors but also to group-level factors, such as TB case notification rates, health service, health expenditure, the directly observed therapy short course and climatic factors [26-28]. Nevertheless, to the best of our knowledge, few studies in China specifically exploring group-level MDR-TB risk factors have been published. Therefore, we conducted the current ecological study to explore the relationships between group-level risk factors and MDR-TB in China. This research can provide clues to the etiology of the disease and knowledge that can be used for its control and prevention [29].

Study area
Based on a baseline survey of drug-resistant TB, the epidemic of TB, the prevention and treatment of TB, socioeconomic conditions, and geographical locations, three provinces (Henan, Zhejiang and Heilongjiang) and two municipalities (Chongqing and Tianjin) were selected as regions of study. These provinces and municipalities lie in the east-central, eastern, northern, southwestern and north-central regions of China (Fig 1). Data on the risk-factor variables in this study were obtained from the Health Statistical Yearbook, the provincial databases, and the meteorological bureau of each province (or municipality). In this study, we used an ecological study design to explore the association between MDR-TB and related risk factors, with the statistical unit being the drug resistance surveillance county. The data are at county level and include no personal information.

Statistical analysis
The data were analyzed using SAS statistical software, version 9.0 (SAS Institute, Inc., Cary, NC, USA), and SmartPLS, version 2.0 M3 (SmartPLS, Hamburg, Germany). The proportion of MDR-TB in new cases was first calculated for each of the 120 counties. Spearman rank correlations were then computed to examine the bivariate correlations among risk factors, given the non-normal distribution of the data. Structural Equation Modeling (SEM) was also employed to explore the relationship between ecological risk factors and the proportion of MDR-TB. The obtained model was tested and modified using the PLS-PM approach. A P-value less than 0.10 was considered statistically significant.
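The Spearman rank correlation mentioned above can be illustrated with a minimal sketch in plain Python. It ranks each variable (tied values share the average rank) and then computes the Pearson correlation of the ranks. The function names and the two example variables are hypothetical and the values are invented for illustration; the study itself used SAS for this step.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(ra, rb))
    var_a = sum((p - mean_a) ** 2 for p in ra)
    var_b = sum((q - mean_b) ** 2 for q in rb)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical county-level values (not taken from the study's data).
gdp_per_capita = [1.2, 3.4, 2.2, 8.9, 5.1]
doctors_per_1000 = [0.8, 1.9, 1.1, 4.0, 2.5]
rho = spearman(gdp_per_capita, doctors_per_1000)
```

Because only the ordering of values matters, the coefficient is unaffected by the skewness of the underlying distributions, which is why it suits data of this kind.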
SEM can be viewed as a combination of factor analysis and multiple regression [31] including two parts: a measurement model relating the measurement variables (MVs) to their own latent variable (LV, the unmeasurable variable) and a structural model relating some LVs to other LVs. In this study, an exploratory factor analysis (EFA) [32,33] was first used to explore the latent structure of the risk-factor variables; then, an initial SEM (Fig 2) was constructed based on the results of the EFA. However, considering that the data were clearly characterized by not only a skewed distribution but also a small sample size and highly correlated variables, a PLS algorithm was used to test and modify the proposed model [34,35]. PLS-PM is an iterative algorithm that offers explicit estimates of the LVs using fewer cases and fewer assumptions about the data distribution. Measurement models can be either reflective or formative, depending on the causality between the MVs and LVs (Fig 2). A reflective model is used to construct the measurement model to extract the commonly encountered correlation structure between MVs, while a formative model is an alternative when only one MV exists [35]. In reflective blocks of PLS-PM, MV loadings indicate how well the indicators reflect their LV. In formative blocks, weights/loadings allow us to determine the extent to which each indicator contributes to the formation of the constructs.
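To make the iterative nature of PLS-PM concrete, the following is a deliberately reduced sketch, not the SmartPLS implementation: one reflective LV (Mode A outer estimation) predicts one LV measured by a single indicator. It uses the centroid inner scheme for simplicity rather than the path-weighting scheme used in this study; with only two LVs the inner weight contributes just a sign. All function names and the synthetic data are hypothetical.

```python
import math
import random


def standardize(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def corr(a, b):
    """Pearson correlation of two equally long sequences."""
    za, zb = standardize(a), standardize(b)
    return sum(p * q for p, q in zip(za, zb)) / len(za)


def lv_score(weights, x):
    """Outer estimate: standardized weighted sum of the block's indicators."""
    n = len(x[0])
    return standardize([sum(w * col[i] for w, col in zip(weights, x))
                        for i in range(n)])


def two_block_pls(indicators, outcome, max_iter=100, tol=1e-8):
    """Toy PLS-PM with one reflective LV predicting a single-indicator LV."""
    x = [standardize(col) for col in indicators]  # manifest variables
    y = standardize(outcome)                      # endogenous LV score
    weights = [1.0] * len(x)                      # initial outer weights
    for _ in range(max_iter):
        score = lv_score(weights, x)
        # Inner estimate (centroid scheme): neighbouring LV times the sign
        # of its correlation with the current score.
        sign = 1.0 if corr(score, y) >= 0 else -1.0
        inner = [sign * v for v in y]
        # Mode A update: each weight is the correlation of the indicator
        # with the inner estimate.
        new_weights = [corr(col, inner) for col in x]
        converged = max(abs(a - b) for a, b in zip(weights, new_weights)) < tol
        weights = new_weights
        if converged:
            break
    score = lv_score(weights, x)
    loadings = [corr(col, score) for col in x]
    path = corr(score, y)  # standardized path coefficient (single predictor)
    return weights, loadings, path


# Synthetic demonstration data (hypothetical, not the study's data).
rng = random.Random(42)
factor = [rng.gauss(0, 1) for _ in range(200)]
indicators = [[f + rng.gauss(0, 0.5) for f in factor] for _ in range(3)]
outcome = [-0.6 * f + rng.gauss(0, 0.8) for f in factor]
weights, loadings, path = two_block_pls(indicators, outcome)
```

In a full model with several LVs, each inner estimate combines all connected LVs and the loop typically needs more iterations to converge; the structure of alternating outer and inner estimation stays the same.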
For PLS-PM, no global goodness-of-fit criterion exists because the method is assumed to be distribution free. Instead, there is a set of standard measures for PLS-PM, and the reliability and validity of the model estimation were evaluated according to these measures. For a reflective measurement model, composite reliability (CR), which measures internal consistency, is considered acceptable if its value is greater than 0.7 for established constructs [34,35]. Similarly, factor loadings, reflecting the MVs' variance explained by the construct, are considered acceptable when they are greater than 0.7, and indicators should be eliminated when their loadings are below 0.4 [36]. The average variance extracted (AVE) is used to evaluate the discriminant validity and should be greater than 0.5 for all latent variables [34,35]. For a structural model, the path coefficients are evaluated first in terms of sign and significance by applying a bootstrapping test. The determination coefficient R² is then used to reflect the level or share of the composites' explained variance. The SmartPLS software, which supports graphical modeling, was used for the statistical analysis. A bootstrapping procedure was adopted to assess statistical significance. In this study, the inner estimate of the standardized latent variable was accomplished through a path-weighting scheme, and 1000 resamples were specified in the bootstrapping procedure.
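The CR and AVE criteria above can be computed directly from the standardized loadings of a reflective block. The sketch below uses the usual formulas, CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)) and AVE = Σλ² / k, where λ are the k standardized loadings. The function names and the example loadings are hypothetical and are not values from Table 4.

```python
def composite_reliability(loadings):
    """CR from standardized loadings of one reflective block."""
    total = sum(loadings)
    error_variance = sum(1 - l * l for l in loadings)
    return total ** 2 / (total ** 2 + error_variance)


def average_variance_extracted(loadings):
    """AVE: mean squared standardized loading of the block."""
    return sum(l * l for l in loadings) / len(loadings)


def assess_block(loadings, cr_min=0.7, ave_min=0.5, drop_below=0.4):
    """Apply the thresholds quoted in the text to one reflective block."""
    return {
        "CR": composite_reliability(loadings),
        "AVE": average_variance_extracted(loadings),
        "CR_ok": composite_reliability(loadings) > cr_min,
        "AVE_ok": average_variance_extracted(loadings) > ave_min,
        # Positions of indicators whose loading falls below the cut-off.
        "drop": [i for i, l in enumerate(loadings) if abs(l) < drop_below],
    }


# Hypothetical loadings for a three-indicator block (illustration only).
report = assess_block([0.8, 0.7, 0.9])
```

For these illustrative loadings the block passes both thresholds and no indicator is flagged for removal.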

Prevalence of MDR-TB
Henan, Heilongjiang, Chongqing, Zhejiang and Tianjin started their anti-TB drug resistance projects in 2001, 2004, 2005, 2003 and 2008, respectively. The proportion of new TB cases with MDR across the 120 counties ranged from 0% to 39.39%. The distribution was clearly skewed (S1 Fig), with a median of 3.96% and an interquartile range (IQR) of 1.52%-8.61%. Table 1 shows detailed data on MDR-TB in the five provinces (municipalities).

Ecological risk factors for MDR-TB
Twenty-eight risk factors were considered (Table 2), including socioeconomic, health resource allocation, health service, TB prevention and treatment, climate and geography factors. Table 3 presents the measurements of possible ecological risk factors associated with MDR-TB, with the median and IQR. A preliminary analysis based on Spearman rank correlation showed significant correlations between various risk factors (S1 Table). For instance, significant correlations were found among x2-x8 (socioeconomic variables) and among x22-x25 (climate variables), suggesting that traditional regression models (e.g., ordinary linear regression) might be inappropriate.
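The median and IQR summaries used for the MDR-TB proportion and for the risk factors can be reproduced with the Python standard library, as in the minimal sketch below. The county counts are invented for illustration, the helper names are hypothetical, and the quantile method ("inclusive") is an assumption, since the text does not state which definition of quartiles was used.

```python
from statistics import median, quantiles


def mdr_proportion(mdr_cases, new_cases):
    """Percentage of new TB cases that are MDR in one county."""
    return 100.0 * mdr_cases / new_cases


def median_iqr(values):
    """Return (median, Q1, Q3); the quartile method is an assumption."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


# Hypothetical (MDR cases, new TB cases) per county; not the study's data.
counties = [(0, 40), (2, 50), (1, 60), (5, 45), (3, 80)]
proportions = [mdr_proportion(m, n) for m, n in counties]
summary = median_iqr(proportions)
```

Reporting the median with the IQR, rather than the mean with a standard deviation, is the conventional choice for skewed proportions such as these.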

The latent structure of the risk factor variables
Six latent factors ('Socioeconomic', 'Health resource', 'Health service', 'TB detection', 'TB treatment' and 'TB prevention') were extracted by an EFA from risk factors x1-x21 (Table 2). Together, these factors explained approximately 72.56% of the total variance for the 21 variables. Two latent factors ('Climate' and 'Geography') were extracted from risk factors x22-x28. The extracted factors can be interpreted through the following observations: the higher the score on the 'Socioeconomic' factor, the higher the socioeconomic level; the higher the score on the 'Health resource' factor, the better the allocation of health resources; the higher the score on the 'Health service' factor, the lower the health service level; the higher the score on the 'TB detection' factor, the better the TB detection work; the higher the score on the 'TB treatment' factor, the higher the TB cure rate; the higher the score on the 'TB prevention' factor, the better the TB prevention; the higher the score on the 'Climate' factor, the more humid the climate; and the higher the score on the 'Geography' factor, the higher the elevation.

Complex relationship between the proportion of MDR-TB and risk factors
Based on the results of the EFA, we constructed an initial model of the proportion of MDR-TB with risk factors (Fig 2). Based on the iterative PLS-PM procedure and the practical meanings of variables, four MVs (x13, x15, x20, and x26) were removed. The modified model depicted in Fig 3 includes the bootstrapping test results (P-values) for the loadings of the measurement model and the path coefficients of the structural model. Table 4 presents the evaluation of the measurement model, which showed that all factor loadings were higher than 0.4, the CR for each LV was above or close to 0.7, and the AVE was always greater than 0.5. Therefore, the measurement models were considered acceptable for evaluation of the structural model. The R² for the model was 0.276, indicating that 27.6% of the total variance in the proportion of MDR-TB was explained by the risk-factor variables.
In the modified PLS-PM (Fig 3), the remaining variables had a substantial relationship with their respective dependent variables. All of the path coefficients, interpreted as standardized beta coefficients, were statistically significant (P < 0.10) except for the path from 'Socioeconomic' to 'MDR'. Nevertheless, this path met the evaluation criteria and was retained. The 'TB prevention' factor had the largest effect, with a standardized path coefficient of -0.327 (i.e., there was a negative relationship between the 'TB prevention' and 'MDR' factors). Additionally, the 'TB treatment' and 'Climate' factors both had negative relationships with 'MDR', with standardized path coefficients of -0.229 and -0.324, respectively. Finally, the 'Health Resources', 'Health Services', 'TB Detection' and 'Geography' factors all had positive relationships with the 'MDR' factor, with standardized path coefficients of 0.153, 0.236, 0.179 and 0.216, respectively.
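The bootstrapping test behind these significance statements can be sketched in simplified form. The example below resamples cases with replacement 1000 times, recomputes a standardized coefficient on each resample, and divides the original estimate by the bootstrap standard error. As a stand-in for a PLS-PM path coefficient it uses the Pearson correlation, which equals the standardized slope when there is a single predictor; a real PLS-PM bootstrap re-estimates the whole model on every resample. The normal approximation for the P-value, the function names and the data are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def pearson(a, b):
    """Pearson correlation (standardized slope for a single predictor)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def bootstrap_test(x, y, n_resamples=1000, seed=0):
    """Return (estimate, bootstrap SE, t-like ratio, two-sided P-value)."""
    rng = random.Random(seed)
    n = len(x)
    estimate = pearson(x, y)
    draws = []
    while len(draws) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # skip degenerate resamples with zero variance
        draws.append(pearson(xs, ys))
    mean = sum(draws) / len(draws)
    se = math.sqrt(sum((d - mean) ** 2 for d in draws) / (len(draws) - 1))
    ratio = estimate / se
    # Normal approximation to the reference distribution (an assumption).
    p_value = 2 * (1 - NormalDist().cdf(abs(ratio)))
    return estimate, se, ratio, p_value


# Hypothetical scores for 30 units (illustration only, not the study's data).
x = list(range(30))
y = [2 * v + 3 * (-1) ** v for v in x]
estimate, se, ratio, p_value = bootstrap_test(x, y)
significant = p_value < 0.10  # threshold used in this study
```

The seed is fixed only so that the sketch is reproducible; the resampling logic does not depend on it.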
Eight latent factors ('Socioeconomic', 'Health resource', 'Health service', 'TB detection', 'TB treatment', 'TB prevention', 'Climate' and 'Geography') were extracted and included in the modified PLS-PM, which explained 27.6% of the total variance in the prevalence of MDR-TB. All of these risk factors exerted a significant influence on the development of MDR-TB with the exception of 'Socioeconomic'. Based on our data, socioeconomic factors were not found to be associated with the development of MDR-TB. However, some studies of individual-level risk factors for MDR-TB have reported that socioeconomic factors were related to MDR-TB [15,24,25]. This apparent discrepancy may be a result of the different levels of analysis (individual-level or group-level) and different populations. In some ecological studies of TB risk factors, socioeconomic factors were also found to be correlated with the development of TB [37,38]. 'TB prevention' and 'TB treatment' both exhibited a negative relationship with 'MDR' in this study; i.e., the earlier the implementation of the directly observed therapy short course (DOTS) and the higher the TB cure rate, the lower the prevalence of MDR-TB. This result is similar to the finding from a national survey of drug-resistant TB in China that underscored the need for interventions that increase continuity of treatment and reduce the rate of treatment default [39]. In contrast, 'Health resource', 'Health service' and 'TB detection' each had a positive relationship with the 'MDR' factor, indicating that the more suitable the allocation of health resources and the higher the health service level, the higher the prevalence of MDR-TB. This finding can be attributed, at least in part, to the elevated discovery rate of MDR-TB, presumably as a function of better health resources and health services. Note that the 'Health service' score runs contrary to the others. The reason is that the latent factor 'Health service' had a negative correlation with the child immunization rate and a positive correlation with infant mortality, so higher scores on the 'Health service' factor implied higher infant mortality and a lower child immunization rate, i.e., a lower level of health services.
'Climate' had a negative relationship with 'MDR', i.e., the more humid the climate, the lower the prevalence of MDR-TB. The 'Geography' factor had a positive relationship with 'MDR', i.e., the higher the elevation, the higher the prevalence of MDR-TB. A previous study based on "Anti-tuberculosis drug resistance in the world: Report no. 4" also showed that some ecological factors were associated with DR-TB, such as health expenditure, humidity and temperature (with a negative relationship) and TB epidemic, health expenditure and the directly observed therapy short course (with a positive effect) [27]. Another study of MDR-TB trends based on surveillance and population-representative survey data collected worldwide by the WHO found that better surveillance indicators and a higher GDP per capita were associated with declining MDR-TB, whereas a higher existing absolute burden of MDR-TB was associated with an increasing trend [40]. Therefore, a reasonable MDR-TB monitoring plan, as well as prevention and control strategies, should be formulated based on the relationship between the prevalence of MDR-TB and the related risk factors, e.g., increasing health resources, improving health service levels and strengthening DOTS implementation.
PLS-PM was used to construct an SEM for the relationship between ecological risk factors and MDR-TB. The data exhibited not only multicollinearity but also non-normality and a small sample size. As is well known, traditional multivariate regression may perform poorly on datasets with multiple independent variables showing multicollinearity. Moreover, although some machine learning methods, such as the support vector machine, are not heavily affected by multicollinearity, they appear to be unable to synthesize the manifest variables to obtain a latent score. PLS-PM is a "soft modeling" approach requiring fewer distributional assumptions, and the variables studied can be numerical, ordinal, or nominal; hence, the procedure requires no normality assumptions [34,35]. This is a very appealing feature for the analysis of the ecological risk factors involved in this study. Thus, taking the anti-TB drug resistance monitoring county as the unit of analysis, we employed an EFA to identify the LVs and explored the relationship between ecological risk factors and MDR-TB prevalence levels using PLS-PM.
This study had several limitations. First, we only focused on MDR-TB in new cases rather than retreatment cases; the reason was that few retreatment cases were detected in each province. Thus, we did not cover acquired and transmitted resistance MDR-TB cases. Second, information on HIV infection status for TB cases was not collected here because TB patients in China are not routinely tested for HIV.
In conclusion, this study expands the current knowledge of the ecological risk factors for MDR-TB in China. The results showed that TB prevention, health resources, health services, TB treatment, TB detection, geography and climate factors were associated with the occurrence of multidrug resistance. Despite the limitations mentioned above, understanding the risk factors associated with MDR-TB can provide context for future studies that will yield benefits to the entire public health system in China.