Exploration of risk factors for the incidence of knee osteoarthritis in rural areas of northern China and the establishment of a prediction model

  • Junhui Ma ,

    Contributed equally to this work with: Junhui Ma, Qiang Ma

    Roles Conceptualization, Funding acquisition, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Respiratory and Critical Care Medicine, People’s Hospital of Ningxia Hui Autonomous Region, Yinchuan, China

  • Qiang Ma ,

    Contributed equally to this work with: Junhui Ma, Qiang Ma

    Roles Resources, Project administration, Methodology

    Affiliation Thoracic Surgery Department, People’s Hospital of Ningxia Hui Autonomous Region, Yinchuan, China

  • Chao Shi,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition

    Affiliation Department of Central Laboratory, People’s Hospital of Ningxia Hui Autonomous Region, Yinchuan, China

  • Bing Zhuan ,

    Roles Resources, Software, Supervision

    zhuanb518@163.com (BZ); 58263147@qq.com (JM)

    Affiliation Department of Respiratory and Critical Care Medicine, People’s Hospital of Ningxia Hui Autonomous Region, Yinchuan, China

  • Jun Ma

    Roles Conceptualization, Validation, Visualization, Writing – original draft, Writing – review & editing

    zhuanb518@163.com (BZ); 58263147@qq.com (JM)

    Affiliation Department of Sports Medicine, People’s Hospital of Ningxia Hui Autonomous Region, Yinchuan City, Ningxia, China

Abstract

Objective

This study sought to identify knee osteoarthritis (KOA) contributing factors and develop a preliminary forecasting model for its development.

Methods

Participants were systematically invited to complete an exhaustive medical questionnaire designed to capture relevant health and demographic information. Following data collection, univariate analyses were conducted to assess the significance of the variables obtained from the questionnaire. To delineate the association between identified risk factors and the occurrence of KOA, a binary logistic regression model was utilized. The reliability of the model was evaluated through internal validation, encompassing both calibration and discrimination analyses. Calibration was quantified using the Hosmer–Lemeshow χ² statistic to assess the model’s goodness of fit, while discrimination was gauged utilizing the receiver operating characteristic (ROC) curve, providing a comprehensive evaluation of the model’s predictive accuracy.

Results

In the present study, a total of 446 cases were analyzed, with 267 cases employed for model development and 179 cases reserved for internal validation. Univariate analysis revealed statistically significant differences between the two groups with respect to several variables, including family history of KOA, heating methods, stair usage, anxiety and depression, toilet type, and the frequency of consumption of vegetables, fruits, red meat, and dairy products. Binary logistic regression analysis identified advanced age, lower educational level, use of a squat toilet, family history of KOA, and psychological conditions such as anxiety and depression as significant risk factors for the development of KOA. Furthermore, a moderate predictive value was observed for incident KOA based on a combination of factors, including age, gender, weight, height, family history of KOA, toilet type, mode of transportation, dairy product consumption, and emotional state.

Conclusions

Our findings indicate that, in addition to established risk factors such as age, gender, height, and weight, lifestyle and dietary habits also play a pivotal role in the etiology of KOA. These factors not only serve as potential risk markers but also exhibit predictive utility for the onset of KOA, suggesting a comprehensive approach to prevention and intervention strategies.

Introduction

Osteoarthritis (OA), recognized as the most prevalent joint disorder globally [1], ranks as a significant contributor to disability and is among the most common chronic illnesses [2]. Characterized by pivotal pathological features such as cartilage degeneration, osteophyte formation, and subchondral bone sclerosis, OA results in chronic pain, functional impairment, and a diminished quality of life for affected individuals [3,4]. Therapeutic interventions, including arthroplasty and osteotomy, although effective, often entail substantial financial costs, thereby exacerbating the economic burden on OA patients [5].

KOA, a prevalent chronic joint disorder, ranks as the predominant cause of lower limb disability among the elderly population [6]. In the pursuit of preventive strategies, it is imperative to delineate the risk factors contributing to this condition. These factors can be categorized into two distinct groups: nonmodifiable risks and potentially modifiable risks. Nonmodifiable risks encompass intrinsic attributes such as age, gender [7], genetic susceptibility, and family history. Conversely, potentially modifiable risks include body mass index (BMI), occupational hazards [8], joint injury, quadriceps weakness, nutrient deficiencies, bone mineral density, and oestrogen insufficiency [9,10].

To date, numerous predictive models have been developed to estimate the incidence of KOA. Among these, the model proposed by Kerkhof et al. [11] stands out as the most robustly supported by empirical evidence. This model, derived from three distinct populations, shows that adding readily accessible questionnaire variables, genetic markers, OA at other joint sites, and biochemical markers contributes only marginally to the prediction of KOA incidence beyond the traditional parameters of age, gender, and body mass index (BMI) in an elderly cohort.

Nevertheless, the paucity of empirical research investigating the impact of dietary habits and lifestyle on KOA in the rural regions of northern China remains a notable gap in the current literature. Against this backdrop, the primary objective of our study was to delineate the risk factors associated with KOA utilizing a questionnaire-based survey approach, ultimately aiming to construct a robust risk prediction model.

Materials and methods

Study design and setting

This investigation was designed as a cross-sectional study, utilizing a comprehensive questionnaire (S1 File) and radiographic assessment of the knees. Data were collected in May 2022 in several randomly selected rural areas. The questionnaire, which covered basic demographic information, lifestyle factors, dietary habits, and transportation modes, was administered to all participants by a team of skilled interviewers. Each participant underwent a complimentary radiographic examination of both knees to facilitate the collection of detailed imaging data.

All participants gave oral informed consent; questionnaires were collected and knee X-ray examinations were performed only for those who consented. This study did not involve minors. The study was reviewed and approved by the Ethics Committee of Ningxia Hui Autonomous Region People’s Hospital (Ethics Approval Number: [2021]-YCSKY-010).

Participants

In the context of our study, the following inclusion and exclusion criteria were meticulously established to ensure the integrity and relevance of the research findings:

Inclusion Criteria: (1) Participants must be aged 40 years or older. (2) Gender is not a limiting factor, and individuals of any gender are eligible. (3) Participants must possess the cognitive capacity to comprehend the investigation procedure, provide informed consent by signing a written consent form, and explicitly express their willingness to partake in the study.

Exclusion Criteria: (1) Individuals who are unable to provide informed consent are ineligible. (2) Those with terminal illnesses or mental health disorders are excluded to avoid potential confounding variables. (3) Participants with inflammatory joint diseases are precluded from the study due to the potential impact on the investigation’s outcomes. (4) Individuals diagnosed with dementia are excluded to ensure the validity of the study’s data. (5) Those who have sustained acute soft tissue injuries to the knee within the past week are not eligible to minimize the influence of recent trauma on study results. (6) Participants who have undergone knee replacement surgery are excluded to focus on individuals with intact knee joints. (7) Pregnant and lactating women are excluded to avoid any potential confounding effects related to pregnancy or lactation. (8) Individuals with serious complications that could interfere with the study’s objectives or outcomes are excluded to maintain the homogeneity of the study population.

Modeling process

Data preprocessing: the missing rate of all continuous variables in this study was less than 10%. Based on the principles of data integrity and statistical rationality, multiple linear regression imputation was adopted for missing value handling. Specifically, a regression model incorporating other risk factors (e.g., age, BMI, dietary habits, and psychological status) was constructed, and missing values were predicted using the correlations between variables. Compared with mean/median imputation, this method can more accurately retain the distribution characteristics of data and reduce bias.
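The regression imputation described above can be illustrated with a minimal sketch. It assumes a single continuous target with missing values and fully observed predictors; the function names (solve, fit_ols, impute), the choice of predictors, and the toy records are hypothetical and are not taken from the study, which performed its analyses in SPSS.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def impute(records, target, predictors):
    """Return copies of `records` in which missing `target` values (None)
    are replaced by predictions from a regression fitted on complete rows."""
    complete = [r for r in records if r[target] is not None]
    coef = fit_ols([[r[p] for p in predictors] for r in complete],
                   [r[target] for r in complete])
    filled = []
    for r in records:
        r = dict(r)  # copy so the input records are left untouched
        if r[target] is None:
            r[target] = coef[0] + sum(c * r[p] for c, p in zip(coef[1:], predictors))
        filled.append(r)
    return filled


# Hypothetical toy data (not study data); the last record lacks a weight.
records = [
    {"age": 50, "height": 160, "weight": 65.0},
    {"age": 60, "height": 170, "weight": 69.0},
    {"age": 45, "height": 175, "weight": 73.0},
    {"age": 70, "height": 165, "weight": 65.5},
    {"age": 55, "height": 168, "weight": None},
]
result = impute(records, "weight", ["age", "height"])
```

To mirror the group-wise procedure reported in the text, such a function would be called once on the KOA group and once on the control group rather than on the pooled data.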

Prior to imputation, the distribution differences of continuous variables (e.g., height, weight, and dietary frequency) between the two groups were tested using independent samples t-tests (for normally distributed data) or Mann-Whitney U tests (for non-normally distributed data). Meanwhile, chi-square tests were used to examine the differences in the composition ratios of core categorical variables (e.g., family history and toilet type) between the two groups. The results showed that there were statistically significant differences in the distribution of family history and frequency of dairy product consumption between the case group (KOA group) and the control group (P < 0.05). Therefore, imputation operations were performed separately for the two groups to avoid confusion of data characteristics between groups and ensure that the imputed data better conformed to the actual situation of each group.

The variables with missing values and their corresponding missing rates in this study are as follows: height (1.35%, 6/446), weight (0.90%, 4/446), frequency of vegetable and fruit consumption (1.12%, 5/446), frequency of red meat and its product consumption (0.67%, 3/446), frequency of dairy product consumption (1.57%, 7/446), and anxiety and depression status (0.45%, 2/446). No missing data were observed for other variables, including age, gender, family history, toilet type, and heating method. All missing rates were calculated based on the 446 cases finally included in the analysis.

Dataset Balancing: To address the slight imbalance in sample size between the case group (149 cases) and the control group (118 cases), the Synthetic Minority Oversampling Technique (SMOTE) was used to balance the dataset during the data preprocessing stage prior to model training. Only the 60% model training set (267 cases) was balanced, while the 40% internal validation set (179 cases) retained its original sample distribution to ensure the authenticity of validation results.

Specifically, the control group (minority class) served as the basis. The SMOTE algorithm calculated the feature differences between each minority class sample and its neighboring samples, generating 31 new minority class samples. After balancing, both the case group and the control group in the training set had 149 samples, achieving a 1:1 distribution.
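The interpolation step of SMOTE can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the study: it handles continuous features only, the neighbourhood size k = 5 and the random seeds are assumed, and the control-group feature values are simulated placeholders.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points, each by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Simulated placeholder data: 118 control samples with two continuous features.
data_rng = random.Random(1)
controls = [[data_rng.gauss(165, 8), data_rng.gauss(65, 10)] for _ in range(118)]

# 149 cases versus 118 controls, so 31 synthetic controls are needed for a 1:1 ratio.
new_samples = smote(controls, n_new=149 - 118)
balanced_controls = controls + new_samples
```

Because every synthetic point lies on a segment between two observed minority samples, the generated values stay within the observed range of each feature. Categorical predictors such as toilet type would require a variant designed for mixed data, which this sketch does not cover.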

After sample generation, the Kolmogorov-Smirnov test was used to verify the distribution characteristics of key continuous variables (e.g., height, weight), and the chi-square test was used to verify the distribution of categorical variables (e.g., family history, toilet type). The results showed no statistically significant difference in distribution between the synthetic samples and the original minority class samples (P > 0.05), ensuring that no new bias was introduced during the balancing process.

In the initial phase of our analytical approach, we employed univariate statistical techniques to evaluate the association between individual risk factors and the onset of incident KOA. This exploratory analysis provided a foundation for the identification of significant risk factors. Building upon these findings, we subsequently constructed a multivariate logistic regression model, incorporating only those risk factors that were deemed statistically meaningful. This methodological strategy enabled us to derive a robust predictive model that accounts for the complex interplay of risk factors in the context of knee OA development.

Subsequently, the model was validated with respect to both calibration and discrimination. Calibration, the agreement between predicted probabilities and observed outcomes, was evaluated using the Hosmer–Lemeshow χ² goodness-of-fit statistic. This statistic compares observed and predicted event counts across deciles of predicted risk, with smaller values indicating better calibration.
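As an illustration of the calibration measure, the Hosmer–Lemeshow statistic can be computed as sketched below. The function name, the equal-size grouping rule, and the toy predictions are assumptions made for the example; statistical packages may form the risk groups slightly differently.

```python
def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic over `groups` risk-ordered bins.

    For each bin, the observed number of events is compared with the sum of
    predicted probabilities (the expected number of events).
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / size
        if 0.0 < mean_p < 1.0:
            chi2 += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return chi2


# Toy predictions (not study data): ten subjects at 20% risk, ten at 80% risk.
probs = [0.2] * 10 + [0.8] * 10
well_calibrated = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]   # 2/10 and 8/10 events
poorly_calibrated = [0, 0] + [1] * 8 + [0] * 8 + [1, 1]  # 8/10 and 2/10 events

hl_good = hosmer_lemeshow(well_calibrated, probs, groups=2)
hl_bad = hosmer_lemeshow(poorly_calibrated, probs, groups=2)
```

With ten groups, the statistic is conventionally referred to a χ² distribution with eight degrees of freedom; a small value (large p-value) gives no evidence of poor fit.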

Discrimination, on the other hand, was assessed to gauge the model’s ability to classify subjects into the correct risk groups. For this purpose, the area under the ROC curve (AUC) was adopted as the measure of discriminative performance. The ROC curve, which plots sensitivity against 1 − specificity across risk score cut-off points, provides a visual representation of the model’s discriminative capacity; larger AUC values indicate greater discriminative power. The diagnostic criteria for KOA are given in Table 1, and the definitions and descriptions of the predictors used in the model are provided in Table 2.
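The area under the ROC curve has an equivalent pairwise interpretation that is easy to compute directly: it is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below uses that formulation; the function name and the four example scores are illustrative and are not study data.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (case, control) pairs in which the case has the
    higher score, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Two controls (0) and two cases (1) with illustrative predicted risks.
labels = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, risks)  # 3 of the 4 case-control pairs are ordered correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.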

Statistical analysis

All statistical analyses were conducted utilizing SPSS version 25. A p-value of less than 0.05 was deemed statistically significant, whereas a p-value below 0.01 was considered highly significant.

Results

Population characteristics

A total of 572 questionnaires were collected for this study. After screening, 83 questionnaires were identified as invalid, and a further 43 were excluded based on the predefined exclusion criteria. Consequently, 446 cases (S1 Table) were eligible for inclusion. These cases were randomly allocated to a training group (60%) and a verification group (40%): 267 cases were used for model development, and 179 cases were reserved for internal validation (Fig 1).
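The case counts above can be reproduced with a simple random split. The sketch below is only an illustration: the seed, the function name, and the use of simple (non-stratified) shuffling are assumptions, since the text does not specify how the random allocation was carried out.

```python
import random


def split_cases(case_ids, train_fraction=0.6, seed=2022):
    """Shuffle the case identifiers and split them into training and validation sets."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # rounds down: 60% of 446 gives 267
    return ids[:n_train], ids[n_train:]


eligible = 572 - 83 - 43          # questionnaires minus invalid and excluded ones
train, valid = split_cases(range(eligible))
```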

Fig 1. Flowchart of participant inclusion and exclusion.

A comprehensive dataset consisting of 572 questionnaires was initially gathered. Upon meticulous review, 83 questionnaires were deemed invalid due to incomplete or inconsistent responses. Subsequently, an additional 43 participants were excluded based on predefined exclusion criteria. Consequently, a total of 446 cases were included in the present research, forming the basis for the subsequent analysis.

https://doi.org/10.1371/journal.pone.0338003.g001

The fundamental demographic characteristics of the KOA cohort and the control group are delineated in Table 3. It is observed that the KOA group comprised 149 incident cases that conformed to the diagnostic criteria outlined in Table 1, whereas the control group consisted of 118 participants. Notably, there were significant differences between the two groups in terms of occupation and educational level, as well as height. In contrast, no statistically significant disparities were found with respect to gender, weight, and age.

The relationship between risk factors and incident KOA

In the present study, we administered a comprehensive questionnaire consisting of 62 questions to our participants. Following univariate analysis, the variables that emerged as significant are delineated in Table 4. Notably, the KOA cohort exhibited a higher prevalence of family history compared to the control group. A marked discrepancy in heating methods was observed between the two groups, with the KOA group demonstrating a higher incidence of burning mineral fuel. Moreover, the KOA group reported a significantly higher frequency of utilizing squat toilets, ascending and descending stairs, and experiencing anxiety and depression relative to the control group. Statistically significant differences were also identified in the consumption frequencies of vegetables, fruits, red meat (and its derivatives), and dairy products between the two groups.

Table 4. Univariate Analysis of Risk Factors Between KOA Cases and Control Groups.

https://doi.org/10.1371/journal.pone.0338003.t004

Risk prediction models

The findings derived from the binary logistic regression analysis are presented in Table 5. The data reveal that the most prominent associations pertained to educational level, with the absence of a diploma demonstrating the strongest linkage, presenting an odds ratio (OR) of 4.474. Following this, notable associations were identified with respect to age, specifically within the 60–69-year-old bracket, yielding an OR of 3.927. The type of toilet facility, specifically the use of squat toilets, was also significantly associated with the condition, with an OR of 2.929. Furthermore, a family history of KOA emerged as a substantial risk factor, with an OR of 2.562. In addition to these factors, the odds ratio for anxiety and depression was determined to be 2.431, indicating a considerable association with the condition under investigation.
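For reference, the relationship between a binary logistic regression coefficient and the odds ratio it implies can be written in the standard textbook form below. The symbols p (probability of KOA), x_i (predictors), and β_i (coefficients) are generic notation introduced here for illustration and do not reproduce the fitted model in Table 5.

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_{i} \beta_i x_i ,
\qquad
\mathrm{OR}_i = e^{\beta_i}
```

Under this form, the OR of 4.474 reported for the absence of a diploma corresponds to a coefficient of ln 4.474 ≈ 1.50 relative to the reference category.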

Table 5. Binary logistic regression model analyzing the association between risk factors and the incidence of KOA.

https://doi.org/10.1371/journal.pone.0338003.t005

Validation of the risk prediction models

The Hosmer–Lemeshow χ² statistic for assessing goodness of fit and the area under the receiver operating characteristic curve (AUC) for the distinct risk factor groups across all risk prediction models are presented in Table 6. Notably, the inclusion of lifestyle variables into the model resulted in a significant enhancement of the AUC, increasing from 0.68 to 0.78. Model 5 has the largest AUC value (0.81), and Table 7 shows that it achieves the best predictive performance. Fig 2 illustrates the ROC curves for the five models, providing a visual representation of their predictive accuracy.

Table 6. Validation of the Risk Prediction Models: Calibration and Discrimination.

https://doi.org/10.1371/journal.pone.0338003.t006

Table 7. Performance Comparison of Classification Models.

https://doi.org/10.1371/journal.pone.0338003.t007

Fig 2. Receiver operating characteristic (ROC) curves comparing the performance of five risk prediction models.

https://doi.org/10.1371/journal.pone.0338003.g002

Discussion

The objective of our investigation was to delineate the risk factors and to develop a predictive model for KOA utilizing a questionnaire-based survey approach. As illustrated in Table 3, our analysis revealed that gender, body weight, and age did not exhibit statistically significant differences between the two study groups, a finding that contrasts with the outcomes of prior research. This discrepancy warrants further exploration and contextualization within the existing body of literature. The observed discrepancies in our findings may be attributed primarily to two pivotal factors. Firstly, regional disparities play a significant role, as our study population is situated in the arid climate of northwest China, where the inhabitants predominantly engage in agricultural activities. This unique demographic context may influence the prevalence and manifestation of KOA. Secondly, the limited sample size employed in our research may introduce bias into the statistical outcomes. Despite the lack of statistical significance in age distribution between the two groups, binary logistic regression analysis (refer to Table 5) revealed a notable trend: the incidence of KOA positively correlates with advancing age. This finding underscores the importance of considering age as a critical determinant in the etiology of KOA.

In the present study, significant statistical differences were observed in occupation, educational level, height [12], and BMI between the two groups. Specifically, the KOA cohort exhibited a higher prevalence of individuals engaged in farming, animal husbandry, and fishery compared to the control group. This occupational trend may serve as a potential risk factor for the development of KOA.

Utilizing a binary logistic regression model to examine the association between risk factors and the incidence of KOA, our findings, as presented in Table 5, indicate a notable disparity in KOA prevalence across different age groups. Specifically, individuals within the 60–69-year age bracket were found to be significantly more susceptible to KOA when compared to their 40–49-year-old counterparts.

Furthermore, this investigation elucidated that lifestyle and dietary habits are significant determinants of the incidence of KOA. Notably, our findings revealed that individuals who utilize squat toilets are at a heightened risk for developing KOA compared to those who use sit-down toilets, with an odds ratio (OR) of 2.929. Moreover, we observed that individuals who frequently engage in walking and cycling exhibit a greater propensity to develop KOA than those who predominantly use motor vehicles. These findings underscore the pivotal role of lifestyle choices in the etiology of KOA.

In the context of educational attainment, individuals without a high school diploma (odds ratio [OR] 4.474) or those with a junior high school diploma (OR 2.769) exhibited a significantly elevated risk of KOA compared to those with a high school education or higher. During the questionnaire administration, we observed that individuals with lower educational qualifications tended to experience poorer living conditions. Consequently, their dietary intake was often lacking in essential nutrients, such as vegetables, fruits, dairy products, and proteins. Notably, our findings revealed that dairy consumption served as a protective factor against KOA, particularly when compared to those who rarely consumed dairy products (refer to Table 5).

In the present study, we constructed five predictive models utilizing binary logistic regression to anticipate the onset of KOA. Given that our cohort is in its incipient stages, we confined our validation efforts to internal checks, as detailed in Table 6. Future longitudinal follow-ups will enable us to conduct external validation. As indicated in Table 6, Model 5 exhibited the highest discriminative power, incorporating variables such as age, gender, body weight, height, family history of KOA, lifestyle, dietary habits, and emotional state [13]. Notably, the most substantial enhancement in the model’s predictive capability was achieved with the integration of lifestyle habits and dietary patterns.

Conclusions

In summation, this study underscores that beyond established risk factors including age, gender, height, and weight, lifestyle and dietary habits emerge as significant contributory elements in the etiology of KOA. These modifiable factors not only serve as potential risk markers but also hold promise as predictive indicators for the onset of KOA, warranting further investigation and the development of targeted interventions.

References

  1. Abramoff B, Caldera FE. Osteoarthritis: Pathology, Diagnosis, and Treatment Options. Med Clin North Am. 2020;104(2):293–311. pmid:32035570
  2. International Foot and Ankle Osteoarthritis Consortium, Arnold JB, Bowen CJ, Chapman LS, Gates LS, Golightly YM, et al. International Foot and Ankle Osteoarthritis Consortium review and research agenda for diagnosis, epidemiology, burden, outcome assessment and treatment. Osteoarthritis Cartilage. 2022;30(7):945–55. pmid:35176480
  3. Bijlsma JWJ, Berenbaum F, Lafeber FPJG. Osteoarthritis: an update with relevance for clinical practice. Lancet. 2011;377(9783):2115–26. pmid:21684382
  4. Braun HJ, Gold GE. Diagnosis of osteoarthritis: imaging. Bone. 2012;51(2):278–88. pmid:22155587
  5. Veronesi F, Salamanna F, Martini L, Fini M. Naturally Occurring Osteoarthritis Features and Treatments: Systematic Review on the Aged Guinea Pig Model. Int J Mol Sci. 2022;23(13):7309. pmid:35806306
  6. Wang L-J, Zeng N, Yan Z-P, Li J-T, Ni G-X. Post-traumatic osteoarthritis following ACL injury. Arthritis Res Ther. 2020;22(1).
  7. Zou Z, Luo X, Chen Z, Zhang YS, Wen C. Emerging microfluidics-enabled platforms for osteoarthritis management: from benchtop to bedside. Theranostics. 2022;12(2):891–909. pmid:34976219
  8. Prieto-Alhambra D, Judge A, Javaid MK, Cooper C, Diez-Perez A, Arden NK. Incidence and risk factors for clinically diagnosed knee, hip and hand osteoarthritis: influences of age, gender and osteoarthritis affecting other joints. Ann Rheum Dis. 2014;73(9):1659–64. pmid:23744977
  9. Glyn-Jones S, Palmer AJR, Agricola R, Price AJ, Vincent TL, Weinans H, et al. Osteoarthritis. Lancet. 2015;386(9991):376–87. pmid:25748615
  10. Martel-Pelletier J, Barr AJ, Cicuttini FM, Conaghan PG, Cooper C, Goldring MB, et al. Osteoarthritis. Nat Rev Dis Primers. 2016;2:16072. pmid:27734845
  11. Kerkhof HJM, Bierma-Zeinstra SMA, Arden NK, Metrustry S, Castano-Betancourt M, Hart DJ, et al. Prediction model for knee osteoarthritis incidence, including clinical, genetic and biochemical risk factors. Ann Rheum Dis. 2014;73(12):2116–21. pmid:23962456
  12. Zhang W, McWilliams DF, Ingham SL, Doherty SA, Muthuri S, Muir KR, et al. Nottingham knee osteoarthritis risk prediction models. Ann Rheum Dis. 2011;70(9):1599–604. pmid:21613308
  13. Burston JJ, Valdes AM, Woodhams SG, Mapp PI, Stocks J, Watson DJG, et al. The impact of anxiety on chronic musculoskeletal pain and the role of astrocyte activation. Pain. 2019;160(3):658–69. pmid:30779717