Predicting arthritis risk with machine learning: Insights from the 2023 National Health Interview Survey data

Tianhua Chen; Zhiwei Long

doi:10.1371/journal.pone.0336018

Abstract

Arthritis is a common chronic disease that encompasses multiple subtypes, including osteoarthritis and rheumatoid arthritis. This study explored risk factors for arthritis using data from the 2023 U.S. National Health Interview Survey (NHIS). The study included 26,031 participants (6,849 in the arthritis group; 19,182 in the control group), and 21 variables differed significantly between groups by chi-square test. Fourteen key predictors were selected using support vector machine recursive feature elimination (SVM-RFE): age, general health, chronic obstructive pulmonary disease, gender, hypertension, coronary heart disease, body mass index (BMI), cancer, depression, dementia, asthma, diabetes, smoking status, and hepatitis. The nomogram constructed from these variables showed excellent predictive performance (AUC = 0.813); the slope of the calibration curve was close to 1 (P = 0.444), indicating high predictive accuracy; and decision curve analysis showed that its net benefit was better than that of a single predictor. The study demonstrated that a nomogram built from NHIS data with machine learning algorithms can effectively predict the risk of arthritis and provide an important reference for clinical management. The prediction model established in this study provides a theoretical basis for accurate prevention and treatment strategies for arthritis.

Citation: Chen T, Long Z (2025) Predicting arthritis risk with machine learning: Insights from the 2023 National Health Interview Survey data. PLoS One 20(11): e0336018. https://doi.org/10.1371/journal.pone.0336018

Editor: Marwan Salih Al-Nimer, University of Diyala College of Medicine, IRAQ

Received: June 12, 2025; Accepted: October 20, 2025; Published: November 26, 2025

Copyright: © 2025 Chen, Long. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The datasets analysed during the current study are available in the NHIS repository, https://www.cdc.gov/nchs/nhis/.

Funding: This study was supported by the Shaoyang Science and Technology Program Guided Project (Grant No. 2024PT6221). No other external funding was received for this research.

Competing interests: The authors declare that they have no competing interests.

1. Introduction

Arthritis is a common chronic disease that includes osteoarthritis (OA), rheumatoid arthritis (RA), psoriatic arthritis, and other subtypes of arthritis, and currently affects more than 500 million people worldwide [1]. Arthritis is characterized by pain, stiffness and functional limitations, which greatly reduces patients’ quality of life and creates a significant socio-economic burden due to disability and high medical costs [2]. OA is often caused by age-related cartilage degeneration and biomechanical stress, whereas RA is caused by autoimmune-mediated synovial inflammation [3,4]. Despite differences in subtypes, they share numerous risk factors and collectively impose a substantial disease burden. Risk factors such as aging, obesity, genetic predisposition, and comorbidities (e.g., hypertension, diabetes mellitus) are associated with its pathogenesis [5]. Current diagnostic methods rely on clinical assessment, imaging, and biomarkers, but early detection remains challenging, and therapeutic strategies often focus on symptom management rather than disease modification [6]. These limitations highlight the need to identify novel risk predictors and develop robust models for early intervention. Given that the NHIS database used in this study queries arthritis as a single category via the question “Have you ever had arthritis?”, the present study aims to explore the common risk factors for arthritis in a broad sense, rather than distinguishing between specific subtypes.

The NHIS is a nationally representative cross-sectional survey that systematically collects comprehensive health data, including demographics, chronic diseases, lifestyle behaviors, and socioeconomic determinants. Its strengths of large sample size (e.g., more than 25,000 participants per year), standardization of protocols, and integration of socioeconomic determinants make it well-suited for studying multifactorial diseases such as arthritis [7–9]. Previous NHIS-based studies have elucidated the relationship between arthritis and obesity, smoking, and cardio-metabolic complications (e.g., hypertension, diabetes), highlighting the role of arthritis in risk stratification [10,11]. In addition, NHIS data inform public health policy by quantifying differences in arthritis prevalence among population subgroups, such as the fact that the prevalence of OA is twice as high in women as in men. However, existing arthritis studies using the NHIS have focused primarily on prevalence estimates or isolated risk associations and lack comprehensive models that assess multifactorial interactions, and these analyses rarely employ advanced feature selection techniques to optimize predictive accuracy [12–14]. This gap highlights the untapped potential of NHIS in developing predictive tools for arthritis risk stratification.

Despite progress in understanding the pathogenesis of arthritis, critical challenges remain. First, most risk prediction models rely on a limited number of variables, neglecting the synergistic effects of comorbidities and lifestyle factors [15]. Second, conventional statistical methods often fail to handle high-dimensional data efficiently, potentially missing important predictors. Recent studies have highlighted the utility of machine learning (ML) algorithms, such as support vector machine-recursive feature elimination (SVM-RFE), in optimizing variable selection from complex datasets [16]. However, few studies have integrated ML techniques with nomogram construction to improve clinical interpretability. Building on this foundation, our study aims to (1) identify critical variables associated with arthritis risk using SVM-RFE and (2) develop a clinically actionable nomogram to quantify individualized risk. By synthesizing demographic, clinical, and behavioral data from NHIS, this work addresses the unmet need for multifactorial risk assessment tools and provides insights for targeted interventions and resource allocation in arthritis management.

2. Materials and methods

2.1. Data collection

Data were obtained from the NHIS database (https://www.cdc.gov/nchs/nhis/; accessed on December 17, 2024), a continuing survey that began in 1957 and documents the amount, distribution, and effects of illness and disability in the U.S. population. This study used adult participants from the 2023 NHIS data, with arthritis status identified from the NHIS questionnaire. Information identifying individual participants could not be accessed during or after data collection. Participants were excluded if they (1) were aged 18 years or younger; (2) had an unclear arthritis diagnosis or missing arthritis information; or (3) had missing data for other variables. A total of 26,031 participants were included (arthritis = 6,849; non-arthritis = 19,182). The flow chart for inclusion and exclusion is shown in Fig 1A, and a flowchart of the overall analytical framework is shown in Fig 1B.

Fig 1. Flow diagram of participant inclusion/exclusion criteria and the analytical workflow of this study.

A. Flowchart of Participant Inclusion and Exclusion, B. The overall analysis flow chart.

https://doi.org/10.1371/journal.pone.0336018.g001

2.2. Outcome definition

The outcome was taken from the 2023 NHIS Sample Adult Interview. In the codebook, the variable ARTHEV_A records responses to the question “Have you ever had arthritis?” Participants who answered “yes” were assigned to the arthritis group and coded as 1, while those who answered “no” were assigned to the non-arthritis group and coded as 0.

2.3. Variables definition

The variables included sociodemographics, health status, and related diseases (S1 Table), as detailed below:

The sociodemographic characteristics included in the study were as follows: age (18–44 years coded as 1, 45–64 years as 2, and 65 years or older as 3); sex (male coded as 1, female as 2); race (Hispanic (Mexican/Mexican American) coded as 1, Hispanic (all other groups) as 2, non-Hispanic as 3); education level (less than high school coded as 1, high school graduate as 2, greater than high school as 3); marital status (married or living with a partner coded as 1, divorced/separated/widowed/never married as 2); poverty status (1–3 coded as 1, 4–7 as 2, and greater than 8 as 3); region of residence (Northeast coded as 1, Midwest as 2, South as 3, and West as 4); smoking status (never smokers coded as 1, former or current smokers as 2).

The health status data were as follows: Body Mass Index (BMI) was classified into 4 categories: underweight (<18.5 kg/m², coded as 1), normal weight (18.5–24.9 kg/m², coded as 2), overweight (25.0–29.9 kg/m², coded as 3), and obesity (30.0–40.0 kg/m², coded as 4); overall health status was classified as healthy (coded as 1) or not very healthy (coded as 2); mental health was defined based on whether the individual had received counseling or treatment from a mental health professional in the past 12 months (received care, coded as 1; did not receive care, coded as 2); health insurance status was categorized as not covered (coded as 1) or covered (coded as 2).

The disease-related variables included: diabetes (“ever been diagnosed with diabetes?”, “yes” coded as 1, “no” coded as 2); hypertension (“ever been told you have high blood pressure?”, “yes” coded as 1, “no” coded as 2); cancer (“ever been diagnosed with cancer?”, “yes” coded as 1, “no” coded as 2); asthma (“ever been diagnosed with asthma?”, “yes” coded as 1, “no” coded as 2); chronic obstructive pulmonary disease (“ever been told you have chronic obstructive pulmonary disease, emphysema, or chronic bronchitis?”, “yes” coded as 1, “no” coded as 2); hepatitis (“ever been diagnosed with hepatitis?”, “yes” coded as 1, “no” coded as 2); stroke (“ever been told you had a stroke?”, “yes” coded as 1, “no” coded as 2); dementia (“ever been diagnosed with dementia?”, “yes” coded as 1, “no” coded as 2); coronary artery disease (“ever been told you have coronary artery disease?”, “yes” coded as 1, “no” coded as 2); depression (“ever been told you have depression?”, “yes” coded as 1, “no” coded as 2).
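
The coding rules in Sections 2.2 and 2.3 can be sketched as small recoding functions. This is a minimal Python illustration, not the authors' code (the analysis was run in R). The function names are hypothetical, and two choices are assumptions: values falling between the stated BMI bands (for example 24.95) are assigned to the lower band, and values outside the stated ranges (age under 18, BMI above 40.0) return None.

```python
from typing import Optional


def code_age(age_years: float) -> Optional[int]:
    """Age bands from Section 2.3: 18-44 -> 1, 45-64 -> 2, 65 or older -> 3."""
    if age_years < 18:
        return None  # assumption: below the stated bands, left uncoded
    if age_years < 45:
        return 1
    if age_years < 65:
        return 2
    return 3


def code_bmi(bmi: float) -> Optional[int]:
    """BMI bands from Section 2.3 (kg/m^2)."""
    if bmi < 18.5:
        return 1  # underweight
    if bmi < 25.0:
        return 2  # normal weight
    if bmi < 30.0:
        return 3  # overweight
    if bmi <= 40.0:
        return 4  # obesity, as bounded in the text (30.0-40.0)
    return None  # assumption: above 40.0 is not covered by the stated bands


def code_yes_no(answer: str) -> Optional[int]:
    """Comorbidity questions: "yes" -> 1, "no" -> 2."""
    return {"yes": 1, "no": 2}.get(answer.strip().lower())


def code_outcome(answer: str) -> Optional[int]:
    """Outcome (ARTHEV_A): "yes" -> 1 (arthritis), "no" -> 0 (non-arthritis)."""
    return {"yes": 1, "no": 0}.get(answer.strip().lower())


if __name__ == "__main__":
    print(code_age(52), code_bmi(27.3), code_yes_no("Yes"), code_outcome("no"))
```

Note that the outcome uses a 1/0 coding while the comorbidity items use 1/2, so the two should not be mixed when building a model matrix.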

2.4. Statistical analysis

A baseline table was constructed from 22 variables using the “tableone” package (v 0.13.2) [17]. Categorical variables were shown as counts (percentages), and the Chi-square test was applied to assess statistical differences between the arthritis group (1) and the non-arthritis group (0), with P < 0.05 used to identify significant variables. The 26,031 participants were divided into a training set and a test set at a ratio of 7:3, with 18,223 samples in the training set and 7,808 samples in the test set. Based on the variables with significant differences, SVM-RFE was then performed in the training set with 10-fold cross-validation via the “caret” package (v 6.0.93), and the variables retained at the model’s highest prediction accuracy were selected as key variables [18]. The model’s F1-score, Recall, Sensitivity, Specificity, Precision, and Area Under the Curve (AUC) (95% CI) were calculated in the test set to further evaluate its predictive performance. Finally, based on the key variables, a nomogram was constructed in the training set using the “rms” package (v 6.5-0) [19]. Receiver operating characteristic (ROC) analysis was conducted using the “pROC” package (v 1.18.0), decision curve analysis (DCA) was conducted using the “ggDCA” package (v 1.6), and a calibration curve for the nomogram was plotted using the “regplot” package (v 1.1) [17,20,21]. All statistical analyses were performed using R (v 4.2.2).
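
The reported split sizes can be checked with a few lines of arithmetic. The Python sketch below is an inference rather than something stated in the paper: it assumes the 7:3 split was stratified by outcome group and that the training count was rounded up within each group. Under that assumption the totals match the reported 18,223 and 7,808. The function name is hypothetical.

```python
import math


def stratified_train_counts(class_sizes, train_fraction):
    """Training-set size per class, rounding up within each class (assumed rule)."""
    return {label: math.ceil(n * train_fraction) for label, n in class_sizes.items()}


# Group sizes reported in Section 2.1.
class_sizes = {"arthritis": 6849, "non_arthritis": 19182}

train = stratified_train_counts(class_sizes, 0.7)
n_train = sum(train.values())
n_test = sum(class_sizes.values()) - n_train

print(train)            # per-class training counts
print(n_train, n_test)  # 18223 7808
```

A plain unstratified 70% of 26,031 would round to 18,222, so the reported 18,223 is at least consistent with a per-class split.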

3. Results

3.1. Identification of key variables

The baseline table results indicated significant differences (P < 0.001) between the arthritis and non-arthritis groups across 21 variables, including age, poverty status, region of residence, smoking status, BMI, overall health status, sex, race, health insurance status, diabetes, hypertension, cancer, asthma, education level, marital status, chronic obstructive pulmonary disease, hepatitis, stroke, dementia, coronary heart disease, and depression. In the arthritis group, the proportions of participants who were aged 65 and older, female, non-Hispanic, had a high school education or higher, divorced, separated, widowed, or never married, had a poverty status greater than 8, from the Southern region, former or current smokers, obese, healthy, with health insurance coverage, non-diabetic, hypertensive, non-cancerous, non-asthmatic, non-chronic obstructive pulmonary disease, non-hepatitis, without a history of stroke, non-dementia, non-coronary heart disease, and non-depression were relatively higher, with the following respective percentages: 59.4%, 61.3%, 91.6%, 61.2%, 52.3%, 65.6%, 38.0%, 52.2%, 41.0%, 68.5%, 87.1%, 97.7%, 71.4%, 60.3%, 77.5%, 80.6%, 86.8%, 96.2%, 92.4%, 97.3%, 86.8%, and 84.2% (Table 1). A total of 15 key variables were identified by the SVM-RFE machine learning algorithm, including age, overall health status, chronic obstructive pulmonary disease, sex, hypertension, coronary heart disease, BMI, cancer, depression, dementia, asthma, diabetes, poverty status, stroke, and hepatitis (Fig 2). The evaluation results of the model in the test set were AUC 0.751 (95% CI 0.738–0.764), F1 0.425, Recall/Sensitivity 0.320, Specificity 0.934, and Precision 0.635, indicating that the model had good predictive performance (Table 2).

Table 1. Baseline table.

https://doi.org/10.1371/journal.pone.0336018.t001
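
The group comparisons in the baseline table rest on the chi-square test. As a rough illustration of what that test computes for a single two-level variable, the Python sketch below implements the Pearson chi-square statistic for a 2×2 table, with the p-value for one degree of freedom obtained from the complementary error function. It is not the authors' tableone workflow: the function name and the counts are hypothetical, and no continuity correction is applied.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]] (df = 1).

    Rows are groups (e.g. arthritis / non-arthritis); columns are the two
    levels of a binary variable. Returns (statistic, p_value).
    No continuity correction is applied (an assumption of this sketch).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts, not taken from Table 1.
    stat, p = chi_square_2x2(60, 40, 45, 55)
    print(round(stat, 3), round(p, 4))
```

Variables with more than two levels (age band, BMI category, region) need the general r×c form and the chi-square distribution with (r−1)(c−1) degrees of freedom, which this sketch does not cover.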

Table 2. Evaluation model indicators of test set.

https://doi.org/10.1371/journal.pone.0336018.t002
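
The test-set indicators are linked by simple identities, which makes a quick consistency check possible. The Python sketch below defines the metrics from confusion-matrix counts and recomputes F1 from the reported precision (0.635) and recall (0.320). The function names are hypothetical, and the confusion-matrix counts in the usage lines are toy values, not data from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Recall (sensitivity), specificity, precision, and F1 from counts."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Reported test-set values from Section 3.1.
    print(f1_from_precision_recall(0.635, 0.320))
    # Toy counts for illustration only.
    print(classification_metrics(tp=30, fp=10, tn=50, fn=10))
```

With the reported inputs the harmonic mean comes out at roughly 0.4255, in line with the reported F1 of 0.425 given that precision and recall are themselves rounded. The combination of high specificity (0.934) and low recall (0.320) indicates that, at the classification threshold used, the model misses most arthritis cases while rarely flagging non-cases.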

Fig 2. Accuracy of SVM-RFE Algorithm for Feature Selection.

https://doi.org/10.1371/journal.pone.0336018.g002

3.2. Establishment of a robust nomogram model based on key variables

To further predict the occurrence of arthritis, a nomogram was constructed based on the key variables (Fig 3A). The ROC curve revealed an AUC value of 0.814 for the nomogram (Fig 3B). The calibration curve was then evaluated and the slope approaching 1 indicated that the model’s predictions were highly accurate (Fig 3C). Finally, the DCA results of this study showed that within the full risk threshold range (0–1) indicated by the abscissa, both the nomogram curve and the prediction curves corresponding to each single risk factor were located in the area above the “All” line and “None” line, with no cases of falling below the reference lines. Additionally, it was found that the net benefit value of the nomogram model curve was significantly higher than that of most single risk factor curves. This indicated that the gain of a single risk factor for clinical decision-making was limited, while the nomogram model integrated with multiple factors had better decision-making guidance value (Fig 3D). These results suggested that the nomogram based on the key variables demonstrated excellent predictive performance for arthritis.
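
For readers unfamiliar with decision curve analysis, the quantity on the vertical axis is net benefit, which weighs true positives against false positives at a chosen risk threshold. The Python sketch below uses the standard definition and also gives the "All" and "None" reference strategies. It is an illustration, not the ggDCA implementation; the function names are hypothetical, and the prevalence is simply the pooled 6,849 / 26,031 from Section 2.1.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a model at a risk threshold pt (0 <= pt < 1).

    NB = TP/n - FP/n * pt / (1 - pt), where TP and FP are counted among
    the n individuals after classifying predicted risk >= pt as positive.
    """
    if not 0 <= threshold < 1:
        raise ValueError("threshold must be in [0, 1)")
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Reference "All" strategy: everyone is classified as positive."""
    if not 0 <= threshold < 1:
        raise ValueError("threshold must be in [0, 1)")
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def net_benefit_treat_none():
    """Reference "None" strategy: nobody is classified as positive."""
    return 0.0


if __name__ == "__main__":
    prevalence = 6849 / 26031  # pooled arthritis prevalence in the sample
    for pt in (0.1, 0.2, 0.3, 0.5):
        print(pt, round(net_benefit_treat_all(prevalence, pt), 4))
```

The "All" line crosses zero where the threshold equals the prevalence (about 0.263 here) and is negative beyond it, so at higher thresholds the "None" line at zero is the reference that a model's curve has to beat.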

Fig 3. Construction and validation of the nomogram.

A. Nomogram, B. ROC Curve, C. Calibration Curve, D. DCA Decision Curve.

https://doi.org/10.1371/journal.pone.0336018.g003
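
The AUC reported for the ROC curve has a direct probabilistic reading: it is the chance that a randomly chosen participant with arthritis receives a higher predicted risk than a randomly chosen participant without it, with ties counted as one half. The Python sketch below computes it by comparing every case/non-case pair. It is a plain illustration of the definition, not the pROC implementation; the function name and the example scores are hypothetical.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes; scores: predicted risks.
    Ties between a positive and a negative score count as 0.5.
    Quadratic in sample size, which is fine for a small illustration.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical outcomes and predicted risks.
    print(auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because AUC depends only on the ranking of predicted risks, it says nothing about whether the risks are well calibrated, which is why the calibration curve is examined separately.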

4. Discussion

Arthritis, a leading cause of disability globally, is influenced by a complex interplay of demographic, socioeconomic, and comorbid factors [22]. Leveraging data from the NHIS, this study systematically evaluated 22 variables spanning sociodemographic characteristics, health behaviors, and chronic conditions to identify key predictors of arthritis. Through baseline comparisons and machine learning-based variable selection, 15 critical variables were identified, including age, sex, hypertension, and chronic obstructive pulmonary disease (COPD). A nomogram integrating these variables showed robust predictive performance (AUC = 0.814), outperforming individual predictors. These findings highlight the multifactorial nature of arthritis risk and provide a data-driven tool for individualized risk stratification, consistent with current efforts to optimize early diagnosis and resource allocation in arthritis management [23,24].

The present study found that the arthritis cohort had a higher proportion of older adults (≥65 years) and females (59.4% and 61.3%, respectively), which is consistent with previous findings. Aging is strongly associated with cartilage degeneration and systemic inflammation, while hormonal differences, particularly the role of oestrogen in immune modulation, may explain the sex differences [24–26]. Notably, the higher prevalence of hypertension (71.4%) and coronary heart disease (CHD, 86.8%) in arthritis patients aligns with evidence linking chronic inflammation to cardiovascular comorbidities. Pro-inflammatory cytokines such as TNF-α and IL-6, which are elevated in arthritis, accelerate endothelial dysfunction and atherosclerosis, creating a bidirectional relationship between arthritis and cardiovascular pathologies [25,27]. Similarly, obesity (BMI ≥ 30, 41.0%) contributes to mechanical joint stress and adipose-derived inflammation via adipokines like leptin, further supporting its role in arthritis pathogenesis [28].

COPD and hepatitis are factors less conventionally emphasized in arthritis research. The inclusion of COPD (77.5%) and hepatitis (96.2% non-hepatitis) as predictors underlines the systemic inflammatory cross-talk. Common mechanisms, such as neutrophil extracellular traps (NETs) in COPD and autoimmune dysregulation in viral hepatitis, may amplify joint inflammation [29]. Conversely, the lower prevalence of diabetes (60.3% non-diabetic) contrasts with some studies linking hyperglycemia to osteoarthritis progression. This discrepancy may reflect differences in population characteristics or confounding by antidiabetic therapies with anti-inflammatory properties.

The strong predictive ability of the nomogram (AUC: 0.814) outperforms single variable models, highlighting the value of multivariate risk assessment. This is consistent with prior studies advocating integrated models for chronic disease prediction [30]. For instance, a similar nomogram for rheumatoid arthritis achieved an AUC of 0.79 by incorporating age, BMI, and smoking [31]. Our model extends this by integrating understudied variables like hepatitis and dementia, thereby increasing granularity. Clinically, such tools allow for personalised risk quantification, supporting early intervention, e.g., targeting smoking cessation in high-risk obese individuals or intensifying comorbidity management in hypertensive patients. Additionally, the calibration slope close to 1 suggests reliability across risk strata, supporting its utility in diverse populations.

Decision curve analysis (DCA) further validates the clinical relevance of the nomogram, demonstrating superior net benefit across threshold probabilities. This is in contrast to traditional biomarkers like CRP or ESR, which lack specificity for arthritis [32]. However, the model’s reliance on self-reported NHIS data may introduce recall bias, a limitation mitigated by the survey’s rigorous validation protocols.

In conclusion, this study leveraged the NHIS database and machine learning approaches to identify 15 critical variables, including age, sex, poverty status, and comorbidities, that collectively inform arthritis risk stratification. The robust nomogram model (AUC = 0.814) not only highlights the multifactorial nature of arthritis but also provides a novel framework for individualized risk assessment, advancing both mechanistic understanding and clinical management. However, several limitations must be acknowledged. The cross-sectional design limits causal interpretation, and reliance on self-reported diagnoses may underestimate true prevalence. Additionally, the absence of biomarker or genetic data restricts insight into underlying pathways. The lack of stratification by arthritis subtypes in this study may hinder researchers from precisely identifying the key pathogenic factors for each subtype, thereby impeding the formulation of targeted preventive strategies at the level of the underlying etiology. Furthermore, due to data limitations, we were unable to distinguish between patient groups with arthritis and those with joint pain, which may have led to an overestimation of the inflammatory burden in the results. Moreover, although the statistical analysis methods employed in this study can effectively validate the predictive value of risk factors, they exhibit significant limitations in terms of assumptions regarding variable associations, validation scenarios, data processing, and causal inference. Future studies could further enhance the scientific rigor and clinical applicability of the model by incorporating nonlinear regression models, conducting external multi-center validations, and adopting causal inference methodologies. Future studies integrating longitudinal designs and multi-omics data may further refine predictive accuracy and biological relevance.

5. Conclusion

This machine learning study identifies 14 arthritis predictors (established/novel) to develop a clinical nomogram, emphasizing multivariable modeling for precision medicine and comorbidity management. Although limited by cross-sectional data, it provides a framework for early intervention. Future research requires longitudinal validation and mechanistic exploration of predictor interactions to refine prevention strategies.

Supporting information

S1 Table. Variable and its number information.

https://doi.org/10.1371/journal.pone.0336018.s001

(DOCX)

Acknowledgments

We would like to express our sincere gratitude to all individuals and organizations who supported and assisted us throughout this research. Special thanks to the following authors: Zhiwei Long. In conclusion, we extend our thanks to everyone who has supported and assisted us along the way. Without your support, this research would not have been possible.

References

1. Cao P-Y, Wu K, Qian J-H, Luo H-Q, Ren X-H. Prevalence and Associated Factors of Arthritis in Middle- and Old-aged Populations in China. Sichuan Da Xue Xue Bao Yi Xue Ban. 2017;48(2):268–71. pmid:28612540
2. Haas J-P, Weimann V, Feist E. Polyarticular juvenile idiopathic arthritis and rheumatoid arthritis : Common features and differences. Z Rheumatol. 2022;81(1):4–13. pmid:34713333
3. Tang S, Zhang C, Oo W, Fu K, Risberg M, Bierma-Zeinstra S, et al. Osteoarthritis. Nature reviews Disease primers. 2025;11:10.
4. Penghao J, Shuwen Q, Junchao H, Liping W, Yuemei W, Peng W. Hydrolysis of 2D nanosheets reverses rheumatoid arthritis through anti-inflammation and osteogenesis. Adv Mater. 2024;37.
5. Emma S, Damian GH, Marita C, Theo V, Mohsen N, Rachelle B, et al. The global burden of other musculoskeletal disorders: estimates from the Global Burden of Disease 2010 study. Ann Rheum Dis. 2014;73.
6. James MG, Maud W, Andra B, Heike AB-F, Annelies B, Giulio C. EULAR recommendations regarding lifestyle behaviours and work participation to prevent progression of rheumatic and musculoskeletal diseases. Ann Rheum Dis. 2022;82.
7. Salud PC, Miguel Angelo Junior D, Rosario O, Fernando RA, Rocío IG, Verónica CS, et al. Trends, characteristics and mortality of U.S. adults unable to do aerobic leisure-time physical activity: The U.S. National Health Interview Survey 1998-2018. Med Sci Sports Exerc. 2025.
8. QuickStats: Age-Adjusted Percentage* of Adults Aged ≥18 Years With Arthritis,† by Sex and Race and Hispanic Origin - National Health Interview Survey,§ United States, 2021. MMWR Morb Mortal Wkly Rep. 2023;72(4):109. pmid:36701259
9. Pocha CC, Chrusciel T, Salas J, Eisen S, Callahan LF, Ory MG, et al. Neighborhood Characteristics and Walking Behavior Among Adults With Arthritis: A National Health Interview Survey Study. Arthritis Care Res (Hoboken). 2025;77(1):136–42. pmid:39155669
10. Prevalence of doctor-diagnosed arthritis and arthritis-attributable activity limitation --- United States, 2007-2009. MMWR Morbidity and Mortality Weekly Report. 2010;59:1261–5.
11. Cross M, Smith E, Hoy D, Nolte S, Ackerman I, Fransen M, et al. The global burden of hip and knee osteoarthritis: estimates from the global burden of disease 2010 study. Ann Rheum Dis. 2014;73(7):1323–30. pmid:24553908
12. GBD 2021 Gout Collaborators. Global, regional, and national burden of gout, 1990-2020, and projections to 2050: a systematic analysis of the Global Burden of Disease Study 2021. Lancet Rheumatol. 2024;6(8):e507–17. pmid:38996590
13. Lundberg K, Bengtsson C, Kharlamova N, Reed E, Jiang X, Kallberg H, et al. Genetic and environmental determinants for disease risk in subsets of rheumatoid arthritis defined by the anticitrullinated protein/peptide antibody fine specificity profile. Ann Rheum Dis. 2013;72(5):652–8. pmid:22661643
14. Ryan ZW, Armaan J, Ziqing W, Shozen D, Malathi S, Gloria K, et al. Toward precision sleep medicine: variations in sleep outcomes among disaggregated Asian Americans in the National Health Interview Survey (2006-2018). J Clin Sleep Med. 2023;19.
15. Rosemarie K, Cecily L, Kurt G. Active epilepsy prevalence among U.S. adults is 1.1% and differs by educational level-National Health Interview Survey, United States, 2021. Epilepsy Behav. 2023;142.
16. Khouzam RN, Gardner JD, Bomb R, Holden AA. Multi-vessel giant coronary artery aneurysm in an elderly female. Ann Transl Med. 2017;5(10):210. pmid:28603725
17. Panos A, Mavridis D. TableOne: an online web application and R package for summarising and visualising data. Evid Based Ment Health. 2020;23(3):127–30. pmid:32665250
18. Zang X, Li C, Wang Y, Huang X, Wang X, Zhang W, et al. Protein profile of circulating extracellular vesicles reveals biomarker candidates for diagnosis of post-traumatic deep vein thrombosis. Clin Chim Acta. 2024;561:119721. pmid:38796050
19. Sachs MC. plotROC: A Tool for Plotting ROC Curves. J Stat Softw. 2017;79:2. pmid:30686944
20. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77. pmid:21414208
21. Huang Y, Cai L, Liu X, Wu Y, Xiang Q, Yu R. Exploring biomarkers and transcriptional factors in type 2 diabetes by comprehensive bioinformatics analysis on RNA-Seq and scRNA-Seq data. Ann Transl Med. 2022;10(18):1017. pmid:36267740
22. Yin K-J, Huang J-X, Wang P, Yang X-K, Tao S-S, Li H-M, et al. No Genetic Causal Association Between Periodontitis and Arthritis: A Bidirectional Two-Sample Mendelian Randomization Analysis. Front Immunol. 2022;13:808832. pmid:35154127
23. Sang Heon S, Jin Hyung J, Tae Ryom O, Eun Mi Y, Hong Sang C, Chang Seong K. Rheumatoid arthritis and the risk of end-stage renal disease: A nationwide, population-based study. Front Med (Lausanne). 10.
24. Ro J, Kim SH, Kim H-R, Lee S-H, Min HK. Impact of lifestyle and comorbidities on seropositive rheumatoid arthritis risk from Korean health insurance data. Sci Rep. 2022;12(1):2201. pmid:35140294
25. Yang Z-J, Liu Y, Liu Y-L, Qi B, Yuan X, Shi W-X, et al. Osteoarthritis and hypertension: observational and Mendelian randomization analyses. Arthritis Res Ther. 2024;26(1):88. pmid:38632649
26. Xiaopeng L, Hou O, Ching Lung C, Bernard MY. Is hypertension associated with arthritis? The United States national health and nutrition examination survey 1999-2018. Ann Med. 2022;54.
27. Shania B, Burkhard M, Jennifer A. Ferroptosis in arthritis: driver of the disease or therapeutic option? Int J Mol Sci. 2024;25.
28. Venetsanopoulou AI, Alamanos Y, Voulgari PV, Drosos AA. Epidemiology of rheumatoid arthritis: genetic and environmental influences. Expert Rev Clin Immunol. 2022;18(9):923–31. pmid:35904251
29. Jonathan K, Christie MB. Multimorbidity in rheumatoid arthritis: literature review and future directions. Curr Rheumatol Rep. 2023;26.
30. Cheng-Hsien H, Li-Yu L, An-Ping H. Editorial: rheumatoid arthritis and chronic obstructive pulmonary disease: pathogenesis and treatment challenges. Int J Rheum Dis. 2024;27.
31. Jie P, Yuhua H, Nengzhi P, Lili Y. Association between dietary niacin intake and nonalcoholic fatty liver disease: NHANES 2003-2018. Nutrients. 2023;15.
32. Zhang M, Wan L, Fang H, Zhang X, Wang S, Li F, et al. Integrated Multi-Level Investigation of Friend Leukemia Integration 1 Transcription Factor as a Novel Immune-Inflammatory Biomarker in Rheumatoid Arthritis: Bridging Bioinformatics, Clinical Cohorts, and Mechanistic Validation. J Inflamm Res. 2025;18:3105–23. pmid:40059948
