Leveraging machine learning to evaluate factors influencing vitamin D insufficiency in SLE patients: A case study from southern Bangladesh

Abstract

Vitamin D insufficiency appears to be prevalent in SLE patients. Multiple factors potentially contribute to lower vitamin D levels, including limited sun exposure, the use of sunscreen, darker skin complexion, aging, obesity, specific medical conditions, and certain medications. The study aims to assess the risk factors associated with low vitamin D levels in SLE patients in the southern part of Bangladesh, a region noted for a high prevalence of SLE. The research additionally investigates the possible correlation between vitamin D and the SLEDAI score, seeking to understand the potential benefits of vitamin D in enhancing disease outcomes for SLE patients. The study incorporates a dataset consisting of 50 patients from the southern part of Bangladesh and evaluates their clinical and demographic data. An initial exploratory data analysis is conducted to gain insights into the data, which includes calculating means and standard deviations, performing correlation analysis, and generating heat maps. Relevant inferential statistical tests, such as the Student’s t-test, are also employed. In the machine learning part of the analysis, this study utilizes supervised learning algorithms, specifically Linear Regression (LR) and Random Forest (RF). To optimize the hyperparameters of the RF model and mitigate the risk of overfitting given the small dataset, a 3-Fold cross-validation strategy is implemented. The study also calculates bootstrapped confidence intervals to provide robust uncertainty estimates and further validate the approach. A comprehensive feature importance analysis is carried out using RF feature importance, permutation-based feature importance, and SHAP values. The LR model yields an RMSE of 4.83 (CI: 2.70, 6.76) and MAE of 3.86 (CI: 2.06, 5.86), whereas the RF model achieves better results, with an RMSE of 2.98 (CI: 2.16, 3.76) and MAE of 2.68 (CI: 1.83,3.52). 
Both models identify Hb, CRP, ESR, and age as significant contributors to vitamin D level predictions. Despite the lack of a significant association between SLEDAI and vitamin D in the statistical analysis, the machine learning models suggest a potential nonlinear dependency of vitamin D on SLEDAI. These findings highlight the importance of these factors in managing vitamin D levels in SLE patients. The study concludes that there is a high prevalence of vitamin D insufficiency in SLE patients. Although a direct linear correlation between the SLEDAI score and vitamin D levels is not observed, machine learning models suggest the possibility of a nonlinear relationship. Furthermore, factors such as Hb, CRP, ESR, and age are identified as more significant in predicting vitamin D levels. Thus, the study suggests that monitoring these factors may be advantageous in managing vitamin D levels in SLE patients. Given the immunological nature of SLE, the potential role of vitamin D in SLE disease activity could be substantial. Therefore, it underscores the need for further large-scale studies to corroborate this hypothesis.

Introduction

SLE is characterized by inflammation, pain, and damage to organs and tissues. Symptoms of SLE vary greatly from person to person and can include fatigue, joint pain and stiffness, skin rash (typically on the face or scalp), fever, hair loss, mouth ulcers, anemia, swelling in the legs, sun sensitivity, chest pain, and enlarged lymph nodes. SLE can also affect internal organs and can lead to serious complications if left untreated. Treatment for SLE is tailored to each individual patient based on the severity of their symptoms and the organs affected. The relationship between vitamin D and SLE was first described in 1995, and it appears that vitamin D receptors are expressed on immune cells and may play an important role in regulating the immune system [1]. Moreover, individuals with SLE often have low levels of vitamin D, and the introduction of vitamin D supplements may improve the health condition of SLE patients [2].

SLE is a chronic autoimmune illness that can affect numerous organ systems in the body. The exact cause of SLE is not fully understood, but it is believed to involve a combination of genetic, environmental, and hormonal factors. In recent years, it has been observed that SLE patients with low vitamin D have a poorer prognosis. Several studies have investigated the association between vitamin D and SLEDAI, but the results have been inconsistent, with some studies showing a correlation and others finding no association. Vitamin D exists in two major forms, vitamin D2 and vitamin D3. Both are biologically inactive and must be hydroxylated in the liver and kidney to produce the active form, known as calcitriol (1,25-dihydroxyvitamin D3). Calcitriol binds to vitamin D receptors in the body and then modulates gene expression and cell differentiation [2].

Low vitamin D is a widespread public health issue that affects about 1 billion people globally. The incidence of low vitamin D varies widely depending on factors such as geographic location, skin pigmentation, age, and lifestyle. In 2007, a study reported that 80% of South Asian people were vitamin D insufficient, with 40% having severely low vitamin D levels [3]. Studies have shown that the prevalence of vitamin D insufficiency in North India ranges between 30% and 90%. In particular, 91% of young girls and 84% of pregnant women were found to be vitamin D insufficient. Low vitamin D in the North Indian population is also attributed to high skin pigmentation, which makes it harder for the skin to synthesize vitamin D from sunlight [4].

Bangladesh is a tropical country located at 24 degrees north latitude, which means that it receives abundant sunlight year-round. Despite this, several studies have found that the prevalence of low vitamin D among Bangladeshi women is very high, at almost 83%. This high prevalence can be attributed to a number of factors, such as cultural practices that restrict sun exposure, particularly for women, poor nutrition, and reduced vitamin D intake in the diet. The prevalence of vitamin D insufficiency is particularly high among low-income lactating women [5]. Islam & Amin [6] found that 38% of high-income women and 50% of low-income women had hypovitaminosis D (vitamin D levels <15.5 ng/mL). Furthermore, the study found that about 39% of young female university students in Bangladesh and 30% of veiled women had low vitamin D levels. These findings suggest that cultural practices and poor nutrition are major contributing factors to lower vitamin D in women in Bangladesh. Some important research related to vitamin D and SLE patients is presented in Table 1.

Table 1. Details of participants, cut-off value for vitamin D, number of SLE patients.

https://doi.org/10.1371/journal.pgph.0002475.t001

Related research work

This research addresses two prominent issues. The primary motivation is the high prevalence of vitamin D insufficiency in SLE patients, especially in the southern part of Bangladesh. Understanding the contributing factors to this condition and its potential impact on disease activity is crucial for patient care. In this study, while we utilize established algorithms, Linear Regression, and Random Forest, the novelty lies in their application to the specific problem of vitamin D insufficiency in SLE patients and the combination of these algorithms with an extensive feature importance analysis. Traditionally, studies investigating vitamin D insufficiency in SLE patients have relied on conventional statistical methods. However, our approach leverages the power of machine learning to uncover potentially intricate patterns and relationships in the data that could not be detected by standard techniques. For instance, we performed an in-depth feature importance analysis using the Random Forest’s intrinsic feature importance, permutation-based feature importance, and SHAP values, which helps to understand the relative contribution of each feature to the model’s predictions. This is a significant advancement over traditional statistical methods and can provide much richer insights. By integrating these machine learning models with traditional statistical analyses, our research aims to provide a comprehensive understanding of the dynamics influencing vitamin D levels in SLE patients. This innovative approach addresses the limitations of previous methods and enhances our understanding of vitamin D insufficiency in SLE patients. The findings from this study can potentially contribute to better clinical management strategies for SLE patients and drive further investigations in this field.

Materials and methods

Study sites, study design, and duration

Chattogram, also known as Chittagong, is the second-largest city in Bangladesh with a population of around 7.62 million and a total area of 5282.92 square km as shown in Fig 1. The city is home to 22 government hospitals, with Chattogram Medical College Hospital (CMCH) being one of the largest and most well-equipped. A case-control study was conducted at CMCH from December 1, 2017 to December 31, 2018, during which both qualitative and quantitative data were collected.

Ethical approval, sample size, inclusion, and exclusion criteria and sampling

Written permission to conduct this research was obtained from the Research Technical Committee, Ethical Committee, and Peer Review Committee of Chattogram Medical College. The research reference number is M/PG/2017/366, and the approval date is 27/12/2017. The study recruited 50 patients who came to the hospital during the study period and met a specified case definition based on the American College of Rheumatology (ACR, 1997) criteria for SLE classification. All patients gave written consent to participate in the study, and participation ran from December 2017 to December 2018.

The small sample size is considered to be a result of time constraints and limited funding resources, which made it difficult to conduct a larger study. The study had established a set of exclusion criteria to ensure that the results obtained were as accurate and reliable as possible. Patients with certain pre-existing conditions such as End-Stage Renal Disease, Diabetes Mellitus, severe sepsis, SLE with overlap, osteoporosis, osteomalacia, first-degree relatives of SLE or any connective tissue disease, and individuals with psychiatric conditions were excluded from the study as these conditions could potentially affect the results and skew the findings. Furthermore, bed-ridden patients, patients receiving bisphosphonates, pregnant women and lactating mothers, and patients who did not give their consent were also excluded from the study to minimize any potential risks and ensure the safety and well-being of the participants.

Laboratory data

The study performed CBC, ESR, CRP, ANA, anti-dsDNA, vitamin D level or 25(OH)D, 24-hour urinary total protein, and renal Lupus nephritis patients. The inactive form of vitamin D3, 25(OH)D, was measured in the serum of SLE patients using the electrochemiluminescence immunoassay method. Four milliliters of venous blood were obtained from each patient after eight hours of fasting. Blood, urine, and tissue samples were also collected from SLE patients for clinical investigation between December 2017 and December 2018. We also had access to these patients’ information until 3 July 2020.

Data collection, collation, and analysis

The study collected qualitative data from the patients through face-to-face interviews in the Medicine, Rheumatology, Nephrology, and Dermatology wards of the CMC hospital. A pre-tested questionnaire was used for this study, and all demographic data, such as age, gender, educational status, geographic area, and monthly family income were recorded. Clinical data, including drugs, duration of sun exposure (hours/day), and SLEDAI were also recorded. Laboratory testing data of the patients were also collected from the sample testing laboratory of the hospital. The following link provides free access to the data. https://gitlab.com/Jishan/sle-data/-/tree/main/Data.

Summary statistics

In this study, the vitamin D levels of SLE patients were described in relation to their clinical presentations, using mean and standard deviation. Laboratory-reported data for vitamin D were categorized according to the following classification: normal (N) levels being ≥ 30 ng/ml, deficient (D) levels being < 10 ng/ml, and insufficient (I) levels falling between 10 and 30 ng/ml [7, 10, 14, 21, 23, 24]. The vitamin D levels were then assessed against various demographic characteristics of the SLE patients. All statistical analyses were conducted using the open-source software JASP (Link: https://jasp-stats.org/). By examining the distribution of vitamin D levels across different demographic and clinical factors, this study offers valuable insights into the health concerns faced by underprivileged and marginalized SLE patients in Bangladesh. This information, in conjunction with the findings from the Random Forest regression model and SHAP values, can contribute to a better understanding of the factors that impact vitamin D levels among SLE patients and inform targeted interventions to address their unique health needs.
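The three-way classification above can be written as a small function. This is a minimal sketch rather than the authors' code: the function name and the example values are hypothetical, and the boundary handling (exactly 10 ng/ml counts as insufficient, exactly 30 ng/ml as normal) is inferred from the inequalities stated in the text.

```python
def classify_vitamin_d(level_ng_ml: float) -> str:
    """Map a serum 25(OH)D level in ng/ml to the categories N, I, or D."""
    if level_ng_ml < 0:
        raise ValueError("vitamin D level cannot be negative")
    if level_ng_ml >= 30:
        return "N"  # normal: >= 30 ng/ml
    if level_ng_ml < 10:
        return "D"  # deficient: < 10 ng/ml
    return "I"      # insufficient: 10 <= level < 30 ng/ml


# Hypothetical values, chosen only to exercise each branch
example_levels = [8.5, 19.56, 30.0]
example_labels = [classify_vitamin_d(v) for v in example_levels]
```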

Machine learning (ML) approach

Traditional regression analysis relies on assumptions such as linearity and additivity of the relationship between response and explanatory variables, statistical independence, homoscedasticity (constant variance), and normality of the errors [30]. Moreover, the presence of non-linearities and interactions can make it difficult to design efficient regression models. In this study, machine learning approaches are employed to understand the relationship between vitamin D and various factors related to SLE disease. The random forest (RF) regression model, a widely used ML approach, is capable of handling multicollinearity, feature interactions, and non-linearities [31]. The RF regression method trains each decision tree on a bootstrap sample of the training data and considers a random subset of the explanatory variables at each split, which yields a forest of largely uncorrelated decision trees. Importantly, the RF model is relatively robust to overfitting and computationally inexpensive. The number of trees in the forest and the size of the feature subset to consider while searching for the best split are two crucial hyperparameters of the RF model [32].

In this study, the optimal hyperparameters are obtained using a three-fold cross-validation (CV) approach. A linear regression (LR) model is also included as a baseline model for comparison. Models are evaluated using the root mean squared error (RMSE) and mean absolute error (MAE) regression evaluation metrics. Both RMSE and MAE scores are reported with 95% confidence intervals (CI) [33] obtained with the bootstrap resampling technique. The test data are resampled with replacement 50,000 times, and the 95% CI is constructed from the 2.5th and 97.5th percentiles of the resampled scores. Impurity-based feature importance, permutation-based feature importance, Shapley values-based SHAP feature importance, and partial dependence plots are also incorporated in this study to understand the individual feature contributions to vitamin D levels [34].
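The percentile bootstrap described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the demonstration data are synthetic, and the demonstration uses fewer resamples than the 50,000 reported in the text so that it runs quickly.

```python
import numpy as np


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


def bootstrap_ci(y_true, y_pred, metric, n_resamples=50_000, seed=0):
    """Return (point estimate, lower, upper) for a 95% percentile bootstrap CI."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    # Each row holds the indices of one resample of the test set, drawn with replacement
    idx = rng.integers(0, n, size=(n_resamples, n))
    scores = np.array([metric(y_true[i], y_pred[i]) for i in idx])
    lower, upper = np.percentile(scores, [2.5, 97.5])
    return metric(y_true, y_pred), float(lower), float(upper)


# Synthetic stand-in for a 10-row test set (20% of 50 patients); not the study data
demo_rng = np.random.default_rng(1)
y_true_demo = demo_rng.normal(20.0, 5.0, size=10)
y_pred_demo = y_true_demo + demo_rng.normal(0.0, 3.0, size=10)
point, low, high = bootstrap_ci(y_true_demo, y_pred_demo, rmse, n_resamples=2_000)
```

With a test set this small, the resampled scores can vary considerably between resamples, which is why the interval is reported alongside the point estimate.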

Permutation importance is a reliable feature importance measure, as it is model-agnostic and takes into account the feature interaction effects. In contrast, SHAP feature importance provides a more comprehensive explanation of an RF regression model’s output compared to impurity-based and permutation-based feature importance approaches. SHAP assigns a contribution value to each feature for each prediction made by the RF regression model, allowing for an examination of how each feature individually contributes to the model’s output and how the contribution of each feature depends on the values of the other features [34]. SHAP has been found to be a more effective approach for explaining predictions compared to traditional linear regression effect sizes, particularly for more complex models such as the RF regression model. This is because SHAP considers the interaction between features and can provide more accurate and comprehensive explanations of the model’s output. In this study, SHAP waterfall plots are presented to understand the influence of each feature on the predictions using Shapley values. By utilizing machine learning approaches, specifically the Random Forest regression model and SHAP feature importance, this study seeks to uncover a deeper understanding of the factors that contribute to low vitamin D levels in SLE patients. This comprehensive approach provides valuable insights into the complex relationships between vitamin D and various aspects of SLE disease, ultimately contributing to a better understanding of the health concerns facing underprivileged and marginalized SLE patients in Bangladesh.
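To make the model-agnostic nature of permutation importance concrete, the sketch below shuffles one column at a time and records the resulting increase in RMSE. It is an illustration only: the data are synthetic, a least-squares linear model stands in for the fitted RF model, and the function name `permutation_importance_rmse` is hypothetical. The third column plays the role of a non-informative feature.

```python
import numpy as np


def permutation_importance_rmse(predict, X, y, n_repeats=30, seed=0):
    """Mean increase in RMSE when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)

    def rmse(pred):
        return np.sqrt(np.mean((y - pred) ** 2))

    baseline = rmse(predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between column j and the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(rmse(predict(X_perm)) - baseline)
        importances[j] = np.mean(increases)
    return importances


# Synthetic data: column 0 matters most, column 1 a little, column 2 is pure noise
data_rng = np.random.default_rng(7)
X_demo = data_rng.normal(size=(200, 3))
y_demo = 3.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + data_rng.normal(scale=0.1, size=200)

# Stand-in model: ordinary least squares fitted with NumPy
coef, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
importances_demo = permutation_importance_rmse(lambda X: X @ coef, X_demo, y_demo)
```

Because the procedure only calls `predict`, the same function could be applied to any fitted regressor, which is the property the text refers to as model-agnostic.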

Experimental setup and model configuration

ML computations are performed on a multi-core machine with CPU: 11th Gen Intel Core @ 2.30GHz (8 cores), RAM: 32 GB, and OS: Windows 11 Home.

The following Python implementations are presented in this paper:

  • LR model implemented with the scikit-learn [35] module in Python
  • RF model implemented with the scikit-learn module [35] in Python

To implement ML models, the entire dataset is split into two parts, 80% for training and 20% for testing. GridSearchCV from the scikit-learn module [35] is used to optimize the hyperparameters of the RF regression model using 3-fold CV. The best hyperparameters found by GridSearchCV for the RF model are a complexity parameter of 0.00001, a maximum tree depth of 2, and 50 trees in the forest.
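The split and grid search described above might look roughly like the following with scikit-learn. This is a sketch under several assumptions: the data are synthetic, the candidate grid is hypothetical (only the reported best values come from the text), the random seeds are arbitrary, and the "complexity parameter" is assumed to correspond to scikit-learn's `ccp_alpha`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 50-patient dataset (5 hypothetical features)
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 5))
y = 20.0 - 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=1.0, size=50)

# 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical grid; it contains the reported best values (0.00001, 2, 50)
param_grid = {
    "ccp_alpha": [0.00001, 0.001, 0.1],  # assumed meaning of "complexity parameter"
    "max_depth": [2, 4, None],
    "n_estimators": [50, 100],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,  # 3-fold cross-validation on the training split only
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

best_params = search.best_params_
# The refitted best model is evaluated once on the held-out test split
test_rmse = float(np.sqrt(np.mean((y_test - search.predict(X_test)) ** 2)))
```

Keeping the test split out of the grid search means the reported test error is not influenced by the hyperparameter selection.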

In this study, overfitting was mitigated through several strategies:

Cross-validation: A 3-Fold cross-validation strategy was employed, which helps to ensure that the model generalizes well to unseen data. In cross-validation, the dataset is divided into ’k’ subsets or ’folds’. The model is then trained on k-1 folds and tested on the remaining fold. This process is repeated k times, with a different fold used as the testing set each time. This approach provides a robust estimate of the model’s predictive performance.

Hyperparameter tuning: Optimal values for the model’s hyperparameters were selected through a grid search approach, which finds the combination of hyperparameters that result in the best cross-validation performance.

Feature importance analysis: Through feature importance analysis methods like permutation feature importance and SHAP values, irrelevant features, or ’noise’, were identified. Models can overfit if they learn patterns from noise, so this analysis helped to ensure that the models focused on the genuine, informative features.

Ensemble method: The Random Forest algorithm was used, which is an ensemble of decision trees. By averaging the predictions of multiple decision trees, the Random Forest model effectively reduces the risk of overfitting.

Lastly, we validated our approach by computing bootstrapped confidence intervals. These provide uncertainty estimates for the test-set metrics and so help indicate whether the reported performance is stable rather than an artifact of one particular sample.

These steps helped to ensure that our model had good predictive performance, not just on the training data, but also on unseen data.

A comparison table that highlights the strengths and weaknesses of our method compared to previous methods is included as Table 2.

Table 2 illustrates that while traditional methods like correlation and t-tests are simple and widely accepted, they might not capture complex relationships in the data. On the other hand, our proposed methods, Linear Regression and Random Forest, are capable of modeling such complex relationships and providing insights into the relative importance of features. However, they also come with their own challenges, such as the need for careful tuning and the interpretability of the results. We believe that the combined use of traditional statistical tests and machine learning models in our study provides a more comprehensive understanding of the factors influencing vitamin D levels in SLE patients.

Results

In this study, 84% of the patients were female and 16% were male. In addition, 52%, 20%, and 10% of the patients were treated with high-dose Prednisolone, high-dose Prednisolone with methylprednisolone, and low-dose Prednisolone, respectively, and only 10% were not taking any steroidal medications. For the analysis, the study coded sex as 1 for male and 0 for female, BMI as 0 for a level of 24 or less and 1 for a level of 25 or over, sun exposure as 1 for more than one hour and 0 for less than one hour, and renal involvement as 1 for positive and 0 for negative. Table 3 provides detailed descriptive data.
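The binary coding scheme can be expressed as a short helper. This is a hypothetical sketch: the function name and input format are assumptions, and the handling of values the text leaves unspecified (a BMI between 24 and 25, or exactly one hour of sun exposure) is a choice made here for illustration.

```python
def encode_patient(sex: str, bmi: float, sun_hours: float, renal_involvement: bool) -> dict:
    """Encode one patient record with the 0/1 scheme described in the text."""
    return {
        "Sex": 1 if sex.strip().lower() == "male" else 0,
        # Assumption: the cut-off is applied at 25, so 24 < BMI < 25 is coded 0
        "BMI": 1 if bmi >= 25 else 0,
        # Assumption: exactly one hour of sun exposure is coded 0
        "Sun": 1 if sun_hours > 1 else 0,
        "RI": 1 if renal_involvement else 0,
    }


# Hypothetical patients, not taken from the study data
patient_a = encode_patient("female", 22.0, 0.5, False)
patient_b = encode_patient("male", 27.5, 2.0, True)
```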

The average age of participants is 25.26 years, with a standard deviation of 9.80 years, indicating a moderate spread around the mean. The youngest participant is 13 years old, while the oldest is 50 years old. The sex distribution shows a higher proportion of females in the sample. Participants have an average hemoglobin (HB) level of 10.12 g/dL, with a standard deviation of 1.03 g/dL, and their erythrocyte sedimentation rate (ESR) has a mean value of 61.68 mm/h and a relatively large standard deviation of 30.11 mm/h. The C-reactive protein (CRP) levels show an average value of 11.71 mg/L and a high standard deviation of 17.11 mg/L, suggesting a wide range of values. The Fatigue Severity Scale (FSS) scores have a mean value of 5.50 and a standard deviation of 1.03. The Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) scores exhibit an average value of 17.26 with a standard deviation of 13.31. Finally, the binary body mass index (BMI) indicator (1 for a BMI of 25 or over) has a mean of 0.20 and a standard deviation of 0.40, meaning that 20% of the participants fall in the higher BMI category, and the average vitamin D level is 19.56 ng/ml, with a standard deviation of 5.34 ng/ml. Overall, these descriptive statistics provide insights into the distribution and central tendencies of the data, which can be valuable for further analyses.

Fig 2 illustrates that the distribution of the response variable, Vitamin D, does not conform to a normal distribution. Nevertheless, it is still possible to model it using the linear regression (LR) model, as linear regression analysis does not necessitate normality for either the explanatory variables or the target variable. Moreover, a non-parametric random forest (RF) model does not require stringent assumptions.

Fig 2. Distribution of target variable (vitamin D levels).

https://doi.org/10.1371/journal.pgph.0002475.g002

Adopting an exploratory approach serves as an excellent initial step in identifying potential relationships between variables. Nonparametric methods, such as Spearman’s rank-order correlation (Fig 3), are useful for analyzing monotonic relationships that need not be linear. It is observed that some variables, including Age, Sex, BMI, Sun, RI, FSS, and CRP, exhibit an insignificant correlation with Vitamin D, indicating little or no association between these variables and Vitamin D levels. However, a positive correlation exists between hemoglobin (Hb) and Vitamin D, suggesting that individuals with higher Vitamin D levels also tend to have elevated hemoglobin levels. This may be attributed to various factors, such as the role of Vitamin D in regulating iron metabolism, which is crucial for hemoglobin production. Furthermore, a strong negative relationship is observed between SLEDAI and Vitamin D levels, indicating that individuals with higher SLEDAI scores typically have lower Vitamin D levels.
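Spearman's coefficient is the Pearson correlation computed on ranks, which is why it captures monotonic association regardless of linearity. The sketch below implements it with NumPy, assigning average ranks to ties. The function names are hypothetical, and the study itself reports using JASP for its statistical analyses, so this is an illustration of the technique rather than the authors' workflow.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j to the end of the current run of tied values
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# A nonlinear but strictly increasing relationship still gives rho = 1
rho_monotone = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
```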

Fig 3. Spearman’s correlation map between the variables.

https://doi.org/10.1371/journal.pgph.0002475.g003

A heat map serves as an effective tool for visualizing the relationship between two variables. The heat map depicted in Fig 4 demonstrates the connection between SLEDAI and Vitamin D. Patients with very high SLEDAI scores (ranging from 20 to 30) tend to exhibit Vitamin D insufficiency, while patients with low or mild SLEDAI scores (ranging from 1 to 5) generally have Vitamin D levels close to normal. This relationship aligns with the findings of Begum et al. [29], suggesting that patients with low SLEDAI scores are in the recovery stage and maintain normal Vitamin D levels.

Fig 4. Heat map presentation of vitamin D level & SLEDAI.

https://doi.org/10.1371/journal.pgph.0002475.g004

Table 4 was obtained by conducting independent two-sample t-tests to compare the mean vitamin D levels among SLE patients across different groups based on gender, sun protection usage, and sun exposure duration. The t-tests were performed to assess if there are any significant differences in the means between the groups. The resulting p-values, along with 95% confidence intervals for the differences in means, were reported to determine the presence of significant associations between the factors and vitamin D levels. These findings were then interpreted in conjunction with the correlation and heat map analyses to gain a comprehensive understanding of the relationships between the variables under study.
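For reference, the test statistic of an independent two-sample Student's t-test can be computed as sketched below. Several assumptions apply: the pooled-variance (equal variances) form is assumed because the text does not say which variant was used, the function name is hypothetical, and the two groups contain made-up values rather than study data. Only the statistic and degrees of freedom are computed here; the p-value additionally requires the t-distribution, as provided for example by `scipy.stats.ttest_ind`.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t statistic, degrees of freedom) for a pooled-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / dof
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, dof


# Hypothetical vitamin D levels (ng/ml) for two sun-exposure groups
more_sun = [22.1, 24.3, 19.8, 25.0]
less_sun = [17.2, 18.9, 15.4, 20.1, 16.8]
t_stat, dof = pooled_t_statistic(more_sun, less_sun)
```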

Table 4. Vitamin D level of SLE patients considering gender, residence, sun protection, and sun-exposure.

https://doi.org/10.1371/journal.pgph.0002475.t004

In Table 4, an analysis of the mean vitamin D levels in SLE patients is presented, considering factors such as gender, sun protection usage, and sun exposure duration. The findings indicate no significant association between gender and vitamin D levels (p = 0.7056), which contrasts with the results of Yan et al. [36], who reported a substantial effect of gender on vitamin D levels. This discrepancy may be attributed to differences in the study populations, methodologies, or sample sizes, and further investigation is warranted. In addition, the results show no significant relationship between the use of sun protection measures (such as sunscreen or umbrellas) and vitamin D levels in SLE patients (p = 0.6119). This finding is consistent with the observations from the heat map and correlation analysis, where factors such as age, sex, BMI, sun exposure, and sun protection measures exhibited weak or no correlation with vitamin D levels. However, there is a significant relationship (p = 0.0089) between vitamin D levels and the duration of sun exposure, where SLE patients who receive more than an hour of sun exposure have higher vitamin D levels compared to those with less sun exposure. This finding suggests that longer sun exposure may improve vitamin D levels in SLE patients, despite recommendations for photosensitive SLE patients to avoid sun exposure [37]. Magro et al. [38] also mentioned low vitamin D levels were common due to avoiding sun exposure and the use of sun protective measures. The heat map and correlation analysis further support this observation, as they reveal a positive association between sun exposure and vitamin D levels, particularly in individuals with low SLEDAI scores who are in the recovery stage.

To validate the findings from the descriptive statistics, t-test, and correlation analysis, machine learning models were employed, and the performance of the Random Forest (RF) model was compared with that of the baseline Linear Regression (LR) model using bootstrapped test data (Table 5). As demonstrated by the preceding descriptive statistics, t-test, and correlation results, variables such as Hb, SLEDAI, and sun exposure display significant relationships with vitamin D levels, while other variables like age, sex, BMI, sun protection measures, and CRP exhibit weak or negligible correlation. These insights offer valuable information regarding the potential predictors of vitamin D levels in SLE patients, and the RF and LR models were used to investigate these relationships further.

The performance of the RF model was compared to the LR model using bootstrapped test data, and the outcomes were reported in Table 5 based on the Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) measures. Table 5 reveals that the RF model outperformed the LR model in terms of both RMSE and MAE scores. Specifically, the RF model achieved the lowest RMSE score of 2.98 and the lowest MAE score of 2.68, each with a narrow CI. The superior performance of the RF model is likely due to its ability to capture complex, nonlinear relationships between the predictors and the target variable, as evidenced by the earlier findings from the descriptive statistics, t-test, and correlation analysis. However, it is essential to note that impurity-based feature importance in the RF model is sensitive to feature cardinality. To investigate potential issues with cardinality further, a non-informative variable consisting of random numbers is added before computing the feature importance. This approach helps ensure a more robust and reliable assessment of the variable importance and a better understanding of each variable’s contribution to the RF model’s predictive performance.

The findings from the RF model’s feature importance analysis, as shown in Fig 5A and 5B, are consistent with the earlier results from Table 3 (descriptive statistics), Table 4 (t-test results), and Fig 3 (Spearman correlation analysis). In these previous analyses, it was identified that SLEDAI, Hb, CRP, ESR, and age exhibit significant relationships with vitamin D levels. The RF model’s feature importance analysis reinforces these findings, as it also reveals that these five features are the most contributing factors towards the predictions of vitamin D levels, even in the presence of additional non-informative features.

Fig 5. Variable importance of the RF regression model on the training data: A. variable importance without non-informative feature, B. variable importance with non-informative feature.

https://doi.org/10.1371/journal.pgph.0002475.g005

Figs 5B, 6, and 7 further support these conclusions, as the contributions of BMI, FSS, Sun, RI, and gender are no better than the random noise. The five most contributing features (SLEDAI, Hb, CRP, ESR, and age) remain consistent across various feature importance approaches, including permutation-based feature importance and Shapley values-based feature importance.

Fig 6. Permutation-based feature importance of the RF regression model on the testing data.

https://doi.org/10.1371/journal.pgph.0002475.g006

Fig 7. Shapley values-based feature importance on the testing data.

https://doi.org/10.1371/journal.pgph.0002475.g007

The local explanation of the vitamin D level predictions in Fig 8 aligns with the feature importance results and shows how individual feature contributions are aggregated to form the final predictions. The single-variable partial dependence (PDP) plots in Fig 9 also confirm the importance of these features, as they demonstrate the relationship between the individual features and the predicted outcome of the RF regression model. The PDP plots are consistent with the variable importance, permutation importance, and SHAP importance measures, further corroborating the findings from Tables 3, 4, and Fig 3.

Fig 8. Local explanation of the vitamin D level predictions in two test observations: A. Predicted vitamin D level in test observation #1, B. Predicted vitamin D level in test observation #7.

https://doi.org/10.1371/journal.pgph.0002475.g008

Fig 7 displays the most influential features in descending order based on the magnitude of SHAP values across the test dataset. The blue and red dots situated to the right of zero represent a positive influence on the predicted vitamin D levels, while dots to the left signify a negative impact on the predictions. Local explanations of the predictions are provided in Fig 8, which illustrates the contributions of individual features: the final prediction is obtained by adding the positive (red) and negative (blue) contributions of each feature to the base expected value. Fig 8A shows that the final prediction for test observation #1 is 25.967, resulting from the accumulation of individual feature contributions on a base value of 19.569. Similarly, for test observation #7, the RF model predicts a vitamin D level of 16.362, as shown in Fig 8B. These observations indicate that the individual feature contributions align with the feature importance plot presented in Fig 7.
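The aggregation described above follows the additive property of SHAP values [34]: a prediction equals the base expected value plus the sum of the per-feature contributions. The worked arithmetic below uses only the numbers reported in the text. The symbols (\phi_0 for the base value, \phi_i for the contribution of feature i, M for the number of features) are notation introduced here, and the calculation for observation #7 assumes that the base value of 19.569 reported for observation #1 applies to it as well.

```latex
\hat{y}(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x)

% Test observation #1: net contribution of all features
\sum_{i=1}^{M} \phi_i\bigl(x^{(1)}\bigr) = 25.967 - 19.569 = 6.398

% Test observation #7 (assuming the same base value)
\sum_{i=1}^{M} \phi_i\bigl(x^{(7)}\bigr) = 16.362 - 19.569 = -3.207
```

Under that assumption, the features jointly raise the prediction for observation #1 by about 6.4 ng/ml above the base value and lower the prediction for observation #7 by about 3.2 ng/ml.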

This study employs single-variable partial dependence plots (PDPs) to visualize the relationship between individual features and the predicted outcomes of the RF regression model. Fig 9 presents the PDPs, each incorporating ten Individual Conditional Expectation (ICE) curves and a partial dependence (PD) line. In the top-left plot, the dashed orange line indicates minimal dependence of vitamin D levels on age, while the top-right plot reveals a nonlinear and monotonically increasing partial dependence of vitamin D levels on Hb values greater than 10. Similarly, a weak nonlinear partial dependence is observed for the CRP variable in the bottom-left plot. In contrast, a strong partial dependence is evident in the bottom-right plot, where the partial dependence of vitamin D levels on SLEDAI is nonlinear and declines sharply for values exceeding 5. Consequently, the PDPs align with the variable importance, permutation importance, and SHAP importance measures. Nonetheless, minor variations in the ICE curves (light blue lines) are noticeable in all plots except the bottom-right one. These findings highlight the importance of considering SLEDAI, Hb, CRP, ESR, and age as crucial factors in the clinical management of SLE patients and the need for further investigation into their potential therapeutic implications.

Fig 9. Partial dependence of vitamin D levels on the influential variables for the training data.

https://doi.org/10.1371/journal.pgph.0002475.g009
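How ICE curves and the PD line relate can be shown by computing them directly from their definitions: each ICE curve is one observation's prediction as a single feature is swept over a grid, and the PD line is the average of the ICE curves. The sketch below is a from-scratch illustration, not the study's implementation. The data are synthetic, the step at 5 is an arbitrary choice made for the example, and `ice_and_pdp` is a hypothetical helper name.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)

# Synthetic stand-in data (NOT the study's dataset): the target drops
# sharply once feature 0 exceeds 5; feature 1 has no effect.
n_samples = 300
X = rng.uniform(0.0, 10.0, size=(n_samples, 2))
y = np.where(X[:, 0] > 5.0, 10.0, 25.0) + rng.normal(scale=0.5, size=n_samples)

rf = RandomForestRegressor(n_estimators=200, random_state=2).fit(X, y)


def ice_and_pdp(model, X, feature, grid):
    """Return ICE curves (one row per observation) and their mean (PD line)."""
    ice = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # force the feature to the grid value
        ice[:, j] = model.predict(X_mod)
    return ice, ice.mean(axis=0)


grid = np.linspace(0.0, 10.0, 21)
ice_curves, pd_line = ice_and_pdp(rf, X, feature=0, grid=grid)
ice_flat, pd_flat = ice_and_pdp(rf, X, feature=1, grid=grid)

print("PD line, feature 0:", np.round(pd_line, 2))
print("PD range, feature 1:", round(float(np.ptp(pd_flat)), 3))
```

A PD line that falls steeply, as for feature 0 here, indicates strong dependence. A nearly flat line, as for feature 1, indicates that the model's predictions barely depend on that feature. Spread among the ICE curves around the PD line suggests that the effect differs between observations.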

Discussion

This research aimed to determine how vitamin D levels correlate with SLEDAI. No association between SLEDAI and vitamin D level is observed using ML. This finding is similar to those of Kim et al. [15], Muñoz-Ortego et al. [19], Abbasi et al. [39], and Petri et al. [40]. Gado et al. [23] reported that low vitamin D is common in SLE patients and observed a strong correlation between vitamin D level and both ESR and FSS. In the present research, the ML approach shows that lower levels of vitamin D are strongly connected to ESR. However, no relationship between vitamin D and FSS is observed, which is in line with Stockton et al. [41].

The SLE patients in this study have low vitamin D levels, with a mean of 19.54 ± 5.29 ng/ml, which indicates vitamin D insufficiency. This finding is similar to those of case-control studies by other researchers, such as Mok and Lau [42], Mandal et al. [21], and Farid et al. [22]. Lower levels of vitamin D are a common issue in many populations, not just in SLE patients, so all individuals should monitor and manage their vitamin D levels. The study conducted by Abaza et al. [43] on the Egyptian population found similar results, with the control group having optimal vitamin D levels of 79 ± 28.7 ng/ml, while the SLE patients were vitamin D insufficient with a mean level of 17.6 ± 6.9 ng/ml. This further supports the idea that lower levels of vitamin D may be a risk factor for poor SLE disease outcomes.

A positive relationship between age and vitamin D is observed, which is consistent with the findings of Khazaei et al. [44]. However, a study conducted by Arabi et al. [45] on Lebanese people observed that older patients had lower levels of vitamin D than younger patients, which may point to other factors influencing vitamin D levels in SLE patients, such as disease activity or medications. A statistically strong association between BMI and vitamin D is observed, as supported by research by El-Sherbiny et al. [46]. Similarly, studies have found a strong association between renal involvement in SLE patients and vitamin D levels, as supported by Park et al. [47].

Lower levels of vitamin D are common in rural slum areas due to various factors such as pollution, crowding, lack of education, less sun exposure, inadequate intake of a balanced diet and vitamin D supplements, and low socioeconomic conditions, as reported by Hossain et al. [48]. Additionally, a study by Karimzadeh et al. [25] found that vitamin D supplements did not significantly improve the level of vitamin D, which may be due to factors such as disease activity, poor absorption, or other underlying health conditions. These findings emphasize the importance of identifying and addressing the underlying causes of lower levels of vitamin D, as well as the need for more research to determine the most effective ways to improve the level of vitamin D in this population.

Based on the information provided in Table 6 and the references cited, it can be concluded that most SLE patients have vitamin D insufficiency. The mean value of vitamin D for SLE patients in our study is similar to the reference values found in other studies, such as those by Kamen et al. [7], Ruiz-Irastorza et al. [8], Ben-Zvi et al. [11], García-Carrasco et al. [24], and Karimzadeh et al. [25], which further supports the idea that lower levels of vitamin D are common in SLE patients.

Conclusion

The study finds a high prevalence of vitamin D insufficiency, particularly among females, suggesting a crucial role of vitamin D in SLE patients, who are predominantly female. Most SLE patients in this study have low vitamin D levels associated with high SLEDAI scores. Moreover, the study indicates that low levels of Hb, CRP, and ESR may contribute to these low vitamin D levels. The causal relationship between vitamin D and SLE outcomes remains unclear, but vitamin D supplementation may improve prognosis. Further studies are needed to confirm these findings, and any vitamin D supplementation should be undertaken under the guidance of a healthcare provider due to potential side effects and interactions.

Limitations

This study does not take into account corticosteroid therapy, complement levels, anti-dsDNA antibody titers, concomitant rheumatologic therapies, and ongoing vitamin D or calcium supplementation. However, future research will be conducted to include these factors.

Supporting information

S2 File. Questionnaire and permission letter.

https://doi.org/10.1371/journal.pgph.0002475.s002

(DOCX)

References

  1. Sakthiswary R, Raymond AA. The clinical significance of vitamin D in systemic Lupus Erythematosus: A systematic review. PLOS ONE, 2013; 8(1): e55275. pmid:23383135
  2. Yap KS, Northcott M, Hoi AB, Morand EF, Nikpour M. Association of low vitamin D with high disease activity in an Australian systemic lupus erythematosus cohort, Lupus Science and Medicine, 2015; 2(1): e000064. pmid:25893106
  3. Holick MF. Vitamin D deficiency. The New England Journal of Medicine, 2007; 357: 266–281. pmid:17634462
  4. Aparna D, Nazmul A. Vitamin D deficiency in South Asian populations: A serious emerging problem, Journal of Enam Medical College, 2013; 3(2): 63–66.
  5. Mahmood S, Rahman M, Biswas SK, Saqueeb SN, Zaman S, Manirujjaman M. Vitamin D and Parathyroid hormone status in female garment workers: A case-control study in Bangladesh. Hindawi BioMed Research International, 2017; 4105375: 1–7. pmid:28473985
  6. Islam QT, Amin MR. Vitamin D deficiency–current status and its impact in clinical medicine. Bangladesh Journal of Medicine, 2017; 28(1): 1–3.
  7. Kamen DL, Cooper GS, Bouali H, Shaftman SR, Hollis BW, Gilkeson GS. Vitamin D deficiency in systemic lupus erythematosus. Autoimmun Rev., 2006; 5(2): 114–7. pmid:16431339
  8. Ruiz-Irastorza G, Egurbide MV, Olivares N, Martinez-Berriotxoa A, Aguirre C. Vitamin D deficiency in systemic lupus erythematosus: prevalence, predictors and clinical consequences, Rheumatology, 2008; 47: 920–923. pmid:18411213
  9. Borba VZC, Vieira JGH, Kasamatsu T, Radominski SC, Sato EI, Lazaretti-Castro M. Vitamin D deficiency in patients with active systemic lupus erythematosus, Osteoporos Int, 2009; 20: 427–433. pmid:18600287
  10. Wu PW, Rhew EY, Dyer AR, Dunlop DD, Langman CB, Price H, et al. 25-Hydroxyvitamin D and cardiovascular risk factors in women with systemic lupus erythematosus, Arthritis & Rheumatism (Arthritis Care & Research), 2009; 61(10): 1387–1395. pmid:19790113
  11. Ben-Zvi I, Aranow C, Mackay M, Stanevsky A, Kamen DL, Marinescu LM, et al. The Impact of vitamin D on dendritic cell function in patients with systemic lupus erythematosus, PLoS ONE, 2010; 5(2): e9193. pmid:20169063
  12. Ruiz-Irastorza G, Gordo S, Olivares N, Egurbide M-V, Aguirre C. Changes in vitamin D levels in patients with systemic lupus erythematosus: Effects on fatigue, disease activity, and damage, Arthritis Care & Research, 2010; 62(8): 1160–1165. pmid:20235208
  13. Bonakdar ZS, Jahanshahifar L, Jahanshahifar F, Gholamrezaei A. Vitamin D deficiency and its association with disease activity in new cases of systemic lupus erythematosus, Lupus, 2011; 20: 1155–1160. pmid:21680639
  14. Ezzat Y, Sayed S, Gaber W, Mohey AM, Kassem TW. 25-Hydroxy vitamin D levels and its relation to disease activity and cardiovascular risk factors in women with systemic lupus erythematosus, The Egyptian Rheumatologist, 2011; 33(4): 195–201.
  15. Kim H-A, Sung J-M, Jeon J-Y, Yoon J-M, Suh C-H. Vitamin D may not be a good marker of disease activity in Korean patients with systemic lupus erythematosus, Rheumatol Int, 2011; 31: 1189–1194. pmid:20352222
  16. Szodoray P, Tarr T, Bazso A, Poor G, Szegedi G, Kiss E. The immunopathological role of vitamin D in patients with SLE: data from a single centre registry in Hungary, Scandinavian Journal of Rheumatology, 2011; 40(2): 122–126. pmid:20977384
  17. Fragoso TS, Dantas AT, Marques CDL, Junior LFR, Melo JHL, Costa AJG, et al. 25-Hydroxyvitamin D3 levels in patients with systemic lupus erythematosus and its association with clinical parameters and laboratory tests, Revista Brasileira de Reumatologia, 2012; 52(1): 55–65.
  18. Monticielo OA, Brenol JCT, Chies JAB, Longo MGF, Rucatti GG, Scalco R, et al. The role of BsmI and FokI vitamin D receptor gene polymorphisms and serum 25-hydroxyvitamin D in Brazilian patients with systemic lupus erythematosus, Lupus, 2012; 21: 43–52. pmid:21993390
  19. Muñoz-Ortego J, Torrente-Segarra V, Prieto-Alhambra D, Salman-Monte TC, Carbonell-Abello J. Prevalence and predictors of vitamin D deficiency in non-supplemented women with systemic lupus erythematosus in the Mediterranean region: A cohort study, Scandinavian Journal of Rheumatology, 2012; 41(6): 472–475. pmid:22830580
  20. Emam FE, El-Wahab TMA, Mohammed MS, Elsalhy AS, Rahem SIA. Assessment of serum vitamin D level in patients with systemic lupus erythematosus, Egyptian Society for Rheumatology and Rehabilitation, 2014; 41: 71–78.
  21. Mandal M, Tripathy R, Panda AK, Pattanaik SS, Dakua S, Pradhan AK, et al. Vitamin D levels in Indian systemic lupus erythematosus patients: association with disease activity index and interferon alpha, Arthritis Research & Therapy, 2014; 16: R49. pmid:24507879
  22. Farid E, Hassan AB, Jaradat AA, Al-Segai O. Prevalence of vitamin D deficiency in adult patients with Systemic Lupus Erythematosus in Kingdom of Bahrain, MOJ Womens Health, 2017; 6(1): 338–342. pmid:29528574
  23. Gado KH, Gado TH, Samie RMA, Khalil NM, Emam SL, Fouad HH. Clinical significance of vitamin D deficiency and receptor gene polymorphism in systemic lupus erythematosus patients, The Egyptian Rheumatologist, 2017; 39(3): 159–164.
  24. García-Carrasco M, Mendoza-Pinto C, Etchegaray-Morales I, Soto-Santillán P, Jiménez-Herrera EA, Robles-Sánchez V, et al. Vitamin D insufficiency and deficiency in Mexican patients with systemic lupus erythematosus: Prevalence and relationship with disease activity. Reumatol Clin., 2017; 13(2): 97–101. pmid:27084269
  25. Karimzadeh H, Shirzadi M, Karimifar M. The effect of Vitamin D supplementation in disease activity of systemic lupus erythematosus patients with Vitamin D deficiency: A randomized clinical trial, Journal of Research in Medical Sciences, 2017; 22(4). pmid:28400826
  26. Al-Kushi AG, Azzeh FS, Header EA, ElSawy NA, Hijazi HH, Jazar AS, et al. Effect of Vitamin D and calcium supplementation in patients with systemic lupus erythematosus. Saudi Journal of Medicine & Medical Sciences, 2018; 6: 137–142. pmid:30787840
  27. Lin T-C, Wu J-Y, Kuo M-L, Ou L-S, Yeh K-W, Huang J-L. Correlation between disease activity of pediatric-onset systemic lupus erythematosus and level of vitamin D in Taiwan: A case-cohort study, Journal of Microbiology, Immunology, and Infection, 2018; 51(1): 110–114. pmid:27147283
  28. Acosta-Colman I, Morel Z, Paats A, Ortíz N, Román L, Vázquez M, et al. Association between vitamin D deficiency and disease activity in Paraguayan patients with systemic lupus erythematosus, Revista Colombiana de Reumatología, 2022; 29(1): 19–25.
  29. Begum A, Miah MT, Ayaz K, Shanchay MSS, Das J. Association of Vitamin D level with disease activity in SLE patients at a tertiary care hospital. Journal of Bangladesh College of Physicians and Surgeons, 2022; 40(2): 105–110.
  30. James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning: with applications in R. Springer; 2015.
  31. Breiman L. Random forests. Machine Learning, 2001; 45: 5–32.
  32. Ahmed J, Green R, Alauddin M, Jaman MH, Saha G. Explainable Machine Learning Approaches to Assess the COVID-19 Vaccination Uptake: Social, Political, and Economic Aspects. Preprints, 2022; 2022060115.
  33. Sanchez-Lengeling B, Wei JN, Lee BK, Gerkin RC, Aspuru-Guzik A, Wiltschko AB. Machine learning for scent: Learning generalizable perceptual representations of small molecules. arXiv preprint, 2019;
  34. Lundberg S, Lee S.-I. A Unified Approach to Interpreting Model Predictions. 2017;
  35. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-Learn: Machine Learning in Python. Journal of Machine Learning Research, 2011; 12: 2825–2830.
  36. Yan X, Zhang N, Cheng S, Wang Z, Qin Y. Gender differences in vitamin D status in China. Medical Science Monitor, 2019; 25: 7094–7099. pmid:31541605
  37. Singh A, Kamen DL. Potential benefits of vitamin D for patients with systemic lupus erythematosus, Dermatoendocrinol, 2012; 4: 146–51. pmid:22928070
  38. Magro R, Saliba C, Camilleri L, Scerri C, Borg AA. Vitamin D supplementation in systemic lupus erythematosus: relationship to disease activity, fatigue, and the interferon signature gene expression, BMC Rheumatol, 2021; 5: 53. pmid:34857051
  39. Abbasi M, Rezaieyazd Z, Afshari JT, Hatef M, Sahebari M, Saadati N. Lack of association of vitamin D receptor gene BsmI polymorphisms in patients with systemic lupus Erythematosus. Rheumatology International, 2010; 30: 1537–1539. pmid:20473502
  40. Petri M, Bello KJ, Fang H, Magder LS. Vitamin D in systemic lupus erythematosus: modest association with disease activity and the urine protein-to-creatinine ratio. Arthritis Rheum., 2013; 65(7): 1865–71. pmid:23553077
  41. Stockton KA, Kandiah DA, Paratz JD, Bennell KL. Fatigue, muscle strength and vitamin D status in women with systemic lupus erythematosus compared with healthy controls, Lupus, 2012; 21: 271–278. pmid:22004972
  42. Mok C, Lau C. Pathogenesis of Systemic Lupus Erythematosus, Journal of clinical pathology, 2003; 56: 481–490. pmid:12835292
  43. Abaza NM, El-Mallah RM, Shaaban A, Mobasher SA, Al-Hassanein KF, Zaher AAA, et al. Vitamin D deficiency in Egyptian systemic lupus erythematosus patients: how prevalent and does it impact disease activity? Integrative Medicine Insights, 2016; 11: 27–33. pmid:27695278
  44. Khazaei Z, Khazaei S, Beigrezaei S, Nasri H. Vitamin D deficiency in healthy people and its relationship with gender and age, Journal of Parathyroid Disease, 2018; 6(1): 16–18.
  45. Arabi A, Baddoura R, El-Rassi R, Fuleihan GEH. Age but not gender modulates the relationship between PTH and vitamin D, Bone, 2010; 47: 408–412. pmid:20452474
  46. El-Sherbiny D, El-Badawy M, Elmahdi A. Body mass index in systemic lupus erythematosus: relation to disease activity, bone mineral density, and vitamin D level. The Egyptian Journal of Hospital Medicine, 2021; 82(1): 89–95.
  47. Park Y, Baek I, Kim K, Kim W, Cho C. SAT0466 Serum vitamin D deficiency is associated with active renal disease in systemic lupus erythematosus. Annals of the Rheumatic Diseases, 2018; 77: 1091.
  48. Hossain HT, Islam QT, Khandaker MAK, Ahasan HN. Study of serum vitamin D level in different socio-demographic population—A pilot study. Journal of Medicine, 2017; 19(1): 22–29.