Simple Scoring System and Artificial Neural Network for Knee Osteoarthritis Risk Prediction: A Cross-Sectional Study

Background Knee osteoarthritis (OA) is the most common joint disease of adults worldwide. Since the treatments for advanced radiographic knee OA are limited, clinicians face a significant challenge of identifying patients who are at high risk of OA in a timely and appropriate way. Therefore, we developed a simple self-assessment scoring system and an improved artificial neural network (ANN) model for knee OA. Methods The Fifth Korea National Health and Nutrition Examination Surveys (KNHANES V-1) data were used to develop a scoring system and ANN for radiographic knee OA. A logistic regression analysis was used to determine the predictors of the scoring system. The ANN was constructed using 1777 participants and validated internally on 888 participants in the KNHANES V-1. The predictors of the scoring system were selected as the inputs of the ANN. External validation was performed using 4731 participants in the Osteoarthritis Initiative (OAI). Area under the curve (AUC) of the receiver operating characteristic was calculated to compare the prediction models. Results The scoring system and ANN were built using the independent predictors including sex, age, body mass index, educational status, hypertension, moderate physical activity, and knee pain. In the internal validation, both scoring system and ANN predicted radiographic knee OA (AUC 0.73 versus 0.81, p<0.001) and symptomatic knee OA (AUC 0.88 versus 0.94, p<0.001) with good discriminative ability. In the external validation, both scoring system and ANN showed lower discriminative ability in predicting radiographic knee OA (AUC 0.62 versus 0.67, p<0.001) and symptomatic knee OA (AUC 0.70 versus 0.76, p<0.001). Conclusions The self-assessment scoring system may be useful for identifying the adults at high risk for knee OA. The performance of the scoring system is improved significantly by the ANN. We provided an ANN calculator to simply predict the knee OA risk.

Nutrition Examination Survey (KNHANES V-1, online at http://knhanes.cdc.go.kr/knhanes) and the Osteoarthritis Initiative (OAI, online at www.oai.ucsf.edu). Since all data were available on the web and data analysis was secondary, no ethical statement was required for this work. The KNHANES V-1 was approved by the institutional review board (IRB) of the Korean Centers for Disease Control and Prevention (approval no. 2010-02CON-21-C), and all participants provided written consent. The OAI was approved by the IRB for the Committee on Human Research, University of California, San Francisco (approval no. 10-00532).

Data source and subjects
The KNHANES is an ongoing population-based and nationwide epidemiological survey conducted by the Korea Center for Disease Control and Prevention, Ministry of Health and Welfare [14]. The KNHANES consists of a health interview survey, a health examination survey (physical examination and clinical measurements), and a nutrition survey. In the KNHANES V-1 conducted in 2010, bilateral knee plain radiographs were assessed for all participants older than 50 years. Participants, drawn from 3840 households, were randomly selected from 192 survey locations using stratified sampling that considered sex, age, regional area, and type of residential area.
Initial candidates for this study included 3075 participants. Eligible participants were those who underwent both the right and left knee radiographic examination. To reduce the confounding factors that might influence knee OA, we excluded 315 participants who were receiving treatment for knee OA. We also excluded 16 participants who did not respond to the medical history interview and 79 participants with missing data in the health examination survey. Finally, a total of 2665 participants were included in this study.
The dataset was separated randomly into two independent groups, the training and internal validation groups (Fig 1). The training group, comprising two thirds (1777 participants) of the entire dataset, was used to construct an ANN model. The internal validation group, comprising one third (888 participants) of the entire dataset, was used to assess the ability to predict knee OA.
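The random two-thirds/one-third partition described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name `split_two_thirds`, the fixed seed, and the rounding rule are assumptions.

```python
import random

def split_two_thirds(ids, seed=0):
    """Randomly split ids into a training group (about 2/3) and a validation group (the rest)."""
    shuffled = list(ids)
    # A seeded generator keeps the split reproducible; the seed value is arbitrary.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * 2 / 3)
    return shuffled[:n_train], shuffled[n_train:]

# 2665 eligible participants, identified here by hypothetical integer ids.
train, valid = split_two_thirds(range(2665))
print(len(train), len(valid))
```

With 2665 participants, rounding two thirds gives the group sizes reported in the text (1777 and 888).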

Health interview survey and physical examination
The health interview survey, including knee OA symptoms, was conducted through a face-to-face interview by trained interviewers. In the KNHANES V-1, we defined participants as having knee pain or stiffness if they reported experiencing knee pain or stiffness for more than 30 days during the last 3 months. Each participant was also interviewed and completed a questionnaire exploring educational status, household income, alcohol consumption, smoking status, diabetes mellitus, hypertension, and physical activities. Walking, moderate, and heavy physical activities were measured as the average time per day. Hypertension was defined as a diastolic blood pressure (DBP) ≥90 mmHg, a systolic blood pressure (SBP) ≥140 mmHg, a self-reported physician diagnosis, or use of anti-hypertensive medications [15]. Moderate activities included activities such as carrying light objects, sweeping, mopping, vacuuming, and brisk walking. Heavy activities included occupational work involving heavy lifting and strenuous sports or recreation. Height, weight, and waist circumference were measured, and body mass index (BMI) was calculated.
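As an illustration of the derived variables in this section, the following sketch encodes the hypertension definition and the BMI calculation. The function and parameter names are hypothetical, and the thresholds are read as "at or above" 140/90 mmHg, which is an assumption about the comparison direction.

```python
def has_hypertension(sbp, dbp, physician_diagnosis=False, on_medication=False):
    """Any one criterion suffices: SBP >= 140, DBP >= 90, a diagnosis, or medication use."""
    return sbp >= 140 or dbp >= 90 or physician_diagnosis or on_medication

def body_mass_index(weight_kg, height_m):
    """BMI in kg/m^2: weight divided by the square of height."""
    return weight_kg / height_m ** 2

print(has_hypertension(sbp=132, dbp=92))
print(round(body_mass_index(70.0, 1.75), 1))
```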

Radiographic examination of the knee and definition of knee OA
In the KNHANES V-1, bilateral anteroposterior, lateral, and weight-bearing anteroposterior plain radiographs of knees were taken [16]. Radiographic changes relating to OA were assessed using the Kellgren/Lawrence (KL) grade [17]. The radiographic images were graded by two trained radiologists, and concordant grades were accepted. When there was a difference of 1 grade between the two radiologists, the higher grade was accepted. If the difference was more than 1 grade, a third radiologist was consulted, and the grade concordant with the third radiologist's grade was accepted. We defined radiographic knee OA as having KL grade ≥2 in one or both knees [5]. Participants with radiographic knee OA and concurrent knee pain were defined as having symptomatic knee OA [7]. Our definition of radiographic knee OA applied the same criteria as earlier epidemiologic studies [5], [7], [17]. Although prediction of radiographic knee OA with KL grade ≥2 is worthwhile, there is a continuous relationship between severity of knee OA and risk variables such as age, BMI, and pain [18][19][20]. Therefore, we also investigated the prediction models for more severe radiographic knee OA with KL grade ≥3 and 4.
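The grading rule and the outcome definitions above can be written compactly as below. This is a sketch under stated assumptions: the function names are hypothetical, radiographic knee OA is read as a KL grade of 2 or higher in the worse knee, and the case in which the third reading matches neither of the first two is not described in the text, so the sketch simply raises an error.

```python
def adjudicate_kl(grade_a, grade_b, grade_c=None):
    """Combine two radiologists' KL grades, consulting a third reading when they differ by more than 1."""
    diff = abs(grade_a - grade_b)
    if diff == 0:
        return grade_a                   # concordant grades are accepted
    if diff == 1:
        return max(grade_a, grade_b)     # the higher grade is accepted
    if grade_c is None:
        raise ValueError("a third reading is required when grades differ by more than 1")
    if grade_c in (grade_a, grade_b):
        return grade_c                   # the grade concordant with the third reading
    raise ValueError("third reading matches neither grade; this case is not specified")

def radiographic_oa(kl_right, kl_left, threshold=2):
    """Radiographic knee OA if either knee reaches the threshold (assumed to be KL >= 2)."""
    return max(kl_right, kl_left) >= threshold

def symptomatic_oa(kl_right, kl_left, knee_pain):
    """Symptomatic knee OA: radiographic knee OA with concurrent knee pain."""
    return radiographic_oa(kl_right, kl_left) and knee_pain

print(adjudicate_kl(1, 3, grade_c=3))
```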

Development of the scoring system
For risk prediction model development, the association between risk factors and radiographic knee OA was examined by multivariable logistic regression (LR) [21]. Based on the development dataset (KNHANES V-1), we included in a risk score model the comprehensive list of variables in Table 1 that were considered to be potentially associated with knee OA. To simplify the risk model, age was divided into three levels (<60, 60-69, and ≥70 years). BMI was also divided into three groups by the cut-off values for overweight (23 kg/m²) and obesity (25 kg/m²) based on the definition of obesity in the Asian regions [22]. Backward elimination was performed until we reached a final model with significant covariates. We intentionally used only categorized variables for LR to develop a simple scoring system. We developed a scoring system by assigning scores of 0-2 to multiple categories and scores of 0-1 to binary categories. This kind of scoring system, in which a total is calculated by summing arbitrary values for each risk factor, has been widely accepted for the prediction of diseases [23], [24].
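A minimal sketch of such an additive score is shown below. The three-level coding of age and BMI follows the cut-offs stated above; the handling of boundary values, the function names, and the representation of binary predictors as a list of flags (one point for each flag that is set) are assumptions, since the at-risk level of each binary predictor is not stated in this section.

```python
def age_points(age):
    """0 for <60, 1 for 60-69, 2 for 70 years or older."""
    if age < 60:
        return 0
    if age < 70:
        return 1
    return 2

def bmi_points(bmi, overweight=23.0, obesity=25.0):
    """0 below the overweight cut-off, 1 between the cut-offs, 2 at or above the obesity cut-off."""
    if bmi < overweight:
        return 0
    if bmi < obesity:
        return 1
    return 2

def risk_score(age, bmi, binary_flags, overweight=23.0, obesity=25.0):
    """Sum the categorical points and one point per binary predictor that is present."""
    return (age_points(age)
            + bmi_points(bmi, overweight, obesity)
            + sum(1 for flag in binary_flags if flag))

# Hypothetical participant: 72 years old, BMI 24.1, three binary predictors present.
print(risk_score(72, 24.1, [True, True, False, True, False]))
```

Passing different `overweight` and `obesity` values allows the same function to be reused in a population with other BMI definitions.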

Development of the artificial neural network
ANN models were constructed using NeuroSolution version 6.0 (NeuroDimension, Gainesville, FL). NeuroSolution is a professional software package that simplifies the construction of ANNs [25]. This software allowed simultaneous testing of different types of neural networks, including generalized regression neural network, multilayer perceptron, probabilistic neural network, radial basis neural network, feed-forward neural network, and support vector machine. To avoid over-fitting, the prediction models were internally validated using cross-validation. Performance of the prediction models was monitored during training and cross-validation to obtain optimal algorithm parameters, such as learning rate, momentum, and number of hidden nodes. The ANN was constructed using the training group. In order to establish a simple prediction model, the same predictors selected for the scoring system were used as the ANN input layer. The ANN model was trained with the five-grade scale of radiographic severity (KL grade of 0-4) as an output variable. This training scheme was similar to multivariate linear regression. However, it produced a nonlinear regression function that was optimized for prediction of individuals' KL grades [26]. Such an ANN training scheme has been widely used for analysis of polychotomous grade prediction [27], [28]. Finally, the ANN model was used for prediction of four clinical outcomes with different cut-off values. The four primary outcome variables were presence of radiographic knee OA with KL grade ≥2, ≥3, and 4, and symptomatic knee OA. In order to compare the performance of the ANN model, LR models for each clinical outcome were also constructed using the same training dataset.

External validation
Performance of the prediction model was evaluated in independent data, the OAI study. The OAI is a multicenter longitudinal cohort study; a prospective natural history study investigating the development and progression of knee OA in men and women aged 45-79 years at enrollment. Annual OAI interviews began in 2004 at 4 clinical sites: Baltimore, Columbus, Pittsburgh, and Pawtucket. The first 2 years of assessments have been completed, and those data have been publicly released [29]. We used version 0.2.2 AllClinical00, which comprised demographic, clinical, and knee imaging data. A total of 4731 of 4796 participants underwent radiographic examination of both knees and were eligible for the external validation group. In the OAI, osteophyte and joint space narrowing scores were assessed for each knee by trained radiologists according to the OARSI Atlas grades. For our analysis, we computed KL grades for each knee using the equations provided on the OAI website and used the greater of the right and left KL grades [18]. Due to the different definition of obesity in the non-Asian regions, the scoring system in the OAI adopted the modified cut-off values for overweight (25 kg/m²) and obesity (30 kg/m²) [22].

Statistical analysis
The prediction models were validated in two populations, the KNHANES V-1 (internal validation group) and OAI (external validation group). Area under the curve (AUC) of the receiver operating characteristic (ROC), accuracy, sensitivity, and specificity of the scoring model, LR, and ANN were calculated. We generated the ROC curves and selected cut-off points which maximized Youden's index [30]. Participants above the cut-off points were classified as being at high risk in each prediction model. We used SPSS 18.0 (SPSS Inc., Chicago, IL) for statistical analysis and MedCalc 12.3 (MedCalc, Mariakerke, Belgium) for ROC analysis.
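For readers unfamiliar with these quantities, the sketch below computes the AUC (as the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half) and the cut-off maximizing Youden's index, J = sensitivity + specificity - 1. It is an illustration only and does not reproduce the SPSS or MedCalc procedures; the function names and the toy data are hypothetical, and treating a score at or above the cut-off as high risk is an assumed convention.

```python
def auc(scores, labels):
    """AUC as the fraction of case/non-case pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, cutoff):
    """Classify score >= cutoff as high risk and return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

def best_youden_cutoff(scores, labels):
    """Return (cutoff, J) for the lowest cut-off that maximizes Youden's index."""
    best = None
    for c in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, c)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best

# Toy data, not study data: eight scores with their true outcome labels.
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(auc(toy_scores, toy_labels), best_youden_cutoff(toy_scores, toy_labels))
```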

Population characteristics
Characteristics of the KNHANES V-1 and OAI are presented in Table 1. Of 2665 participants from the KNHANES V-1, 958 (35.9%) had radiographic knee OA. Among these 958 participants, 285 with pain were classified as having symptomatic knee OA. In the OAI, 2638 (55.8%) of 4731 participants had radiographic knee OA. Among these 2638 participants, 1462 had symptomatic knee OA. Demographic features differed significantly between the internal validation group (KNHANES V-1) and the external validation group (OAI). In particular, the participants in the OAI had higher BMI and waist circumference and were more likely to have knee pain and stiffness, but were less likely to have diabetes mellitus and hypertension than those in the internal validation group.

Calculation of prediction models
Multivariable LR demonstrated that seven predictors had a statistically significant association with radiographic knee OA in the development dataset (Table 2). A numeric value was assigned to each variable, and we calculated each individual's score (range 0-9). The predictors selected for the scoring system included sex, age, BMI, educational status (graduated from college), hypertension, moderate physical activity, and knee pain. These predictors were also used to establish the LR and ANN models. Fig 2 presents the prevalence of radiographic and symptomatic knee OA for each risk score. In the KNHANES V-1, the prevalence of radiographic knee OA increased gradually, while the prevalence of symptomatic knee OA increased dramatically as the risk scores increased. Consistent results were observed when we applied the scoring system to the OAI. According to ROC analysis, a cut-off of 5 was selected as the indicator of the high risk group in both the KNHANES V-1 and OAI for all clinical outcomes (S1 Table).
The ANN was trained with the seven predictors selected for the scoring system as input variables. The model chosen for radiographic knee OA prediction was a multilayer perceptron neural network with a back-propagation algorithm [12]. The selected network had three neurons in the hidden layer. When the prediction performance of 10-fold cross-validation was assessed in the training group, the final model showed an AUC of 0.80 and an accuracy of 71.9% for radiographic knee OA with KL grade ≥2. This ANN model was superior to the binary ANN models, which were trained separately for each clinical outcome with a binary class as an output (S2 Table). Categorization by the binary ANN models caused the loss of information about the severity of knee OA, which might lead to performance degradation [31].
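To make the architecture concrete, the sketch below shows the forward pass of a multilayer perceptron with seven inputs, three hidden neurons, and a single continuous output that is then dichotomized at a cut-off. The weights are hypothetical placeholders rather than the fitted model, and the tanh hidden activation and linear output unit are assumptions, since the activation functions are not reported here.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer perceptron with a single linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical weights: 3 hidden neurons x 7 inputs. These are NOT the trained values.
W_HIDDEN = [[0.2] * 7, [0.1] * 7, [0.3] * 7]
B_HIDDEN = [0.0, 0.0, 0.0]
W_OUT = [1.0, 1.0, 1.0]
B_OUT = 0.5

def predicted_outcome(x, cutoff):
    """Dichotomize the continuous severity estimate at a chosen cut-off."""
    return mlp_forward(x, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT) >= cutoff

# Seven coded predictor values for a hypothetical participant.
print(predicted_outcome([1, 1, 1, 1, 1, 1, 1], cutoff=2.0))
```

Training a single network on the graded output and applying different cut-offs afterwards is what allows one model to serve several clinical outcomes.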

Performance of prediction models
The Spearman's correlations between input variables and KL grade were low, ranging from 0.05 to 0.38 in the development dataset. The KL grade was more strongly associated with the scoring system (r = 0.46, p<0.001) and the ANN (r = 0.59, p<0.001) in the internal validation group. In the external validation group, the scoring system (r = 0.26, p<0.001) and ANN (r = 0.36, p<0.001) also showed higher correlation with KL grade than the input variables, which ranged from 0.01 to 0.22. The scoring system yielded an accuracy of 70.5%, sensitivity of 54.0%, and specificity of 78.8% for radiographic knee OA with KL grade ≥2 in the internal validation. The ANN predicted radiographic knee OA with an accuracy of 73.6%, sensitivity of 73.2%, and specificity of 73.9%, and was significantly superior to the scoring system (p<0.001) and LR (p = 0.018) in the internal validation group. Both the scoring system and the ANN showed a lower discriminative ability in predicting radiographic knee OA (AUC 0.62 versus 0.67, p<0.001) and symptomatic knee OA (AUC 0.70 versus 0.76, p<0.001) in the external validation. Table 3 shows the results of the prediction models for 4 clinical outcomes in the internal and external validation groups. We observed increasing prediction performance with increasing KL grade. For example, the AUCs in the internal validation were 0.73, 0.76, and 0.81 for KL grade ≥2, ≥3, and 4, respectively. It is important, especially for clinicians, to identify the participants with radiographic knee OA among the participants complaining of knee pain [5]. Therefore, we also evaluated the discriminative ability to predict radiographic knee OA in participants with knee pain. Performance of the prediction models for radiographic knee OA with KL grade ≥2 among the participants with knee pain is shown in Table 4. In the internal and external validation subgroups with knee pain, the scoring system and ANN performed similarly to the results in Table 3.

Development of a risk prediction calculator
Risk stratification is important because it provides easier insight into severity [32]. Based on the ROC analysis of prediction models for radiographic knee OA, participants were classified into two groups, the low risk and high risk groups. In the KNHANES V-1, the high risk groups classified by the scoring system and ANN comprised 33.3% and 43.4% of participants, respectively. In the OAI, the high risk groups classified by the scoring system and ANN comprised 53.4% and 53.5%, respectively. Fig 4 shows odds ratios of radiographic knee OA in the different risk groups indicated by the scoring system and ANN. Although the prediction models for KL grade ≥2 showed the lowest discriminative power, the results demonstrated that the scoring system and ANN effectively predicted the risk for radiographic knee OA with KL grade ≥2. In the KNHANES V-1, the high risk group defined by the scoring system had an odds ratio of 4.81 compared to the low risk group, and the high risk group defined by the ANN had an odds ratio of 7.34. In the OAI, the odds ratios were lower than those in the KNHANES V-1. We developed a simple ANN calculator to measure the knee OA risk. This program is written in Visual C++. The calculator is designed for use in the self-assessment setting to predict an individual's risk group. Fig 5 shows a screen

Discussion
To our knowledge, this is the first study to develop a simple scoring system and an ANN model for knee OA risk prediction using large population-based data. This self-assessment scoring system may be useful for identifying patients at high risk for knee OA. We found that the performance of the scoring system was improved significantly by the ANN when the same information was given. The predictors, including sex, age, BMI, educational status, hypertension, moderate physical activity, and knee pain, can be self-assessed or easily identified by a public health center. Such a scoring system and ANN might be cost-effective screening tools for identifying patients with untreated knee OA. These patients can then receive further evaluation such as knee radiography and physical examination. However, these tools were designed for prediction of the disease; therefore, they should be used for screening, not for clinical diagnosis [7], [33].
The scoring system was developed to be easy and convenient for laypersons to perform a self-assessment of knee OA risk. We intended to establish the simplest form of this scoring system. This scoring system may also be applied to mass screening for knee OA or public education about knee OA. If a computer is available, the ANN calculator could provide not only the risk score but also a more accurate result computed by the ANN. Compared to other studies on risk prediction for knee OA, our scoring system and ANN had better performance. According to the Nottingham study, which suggested the first risk prediction model using conventional risk factors, the AUCs of radiographic and symptomatic knee OA were both 0.60 when their prediction model was applied to the OAI population [7]. Our scoring system predicted radiographic and symptomatic knee OA with AUCs of 0.62 and 0.70, respectively, and the ANN predicted them with AUCs of 0.66 and 0.76, respectively, for the OAI population. In addition, among the participants with pain, the scoring system and ANN predicted radiographic knee OA with consistently good discriminative ability. If our prediction models retain good performance after validation in patients complaining of knee pain in the outpatient clinic, it will be possible to use them as a cost-effective screening tool to determine candidates for knee radiography.
We suggested that it would be possible to develop a predictive instrument using machine learning techniques such as ANN. The internal and external validation using ROC analysis supported that the ANN had a statistically significant improvement in predicting knee OA. ANN was more effective in analyzing the underlying epidemiological patterns of knee OA compared with the other methods, the scoring system and LR. This finding is consistent with previous studies comparing ANN and conventional methods in various complicated disease prediction problems [12], [25], [34]. Since ANN can incorporate nonlinearity in high dimensional space, it was possible to consider all factors for the improvement of sensitivity and specificity in prediction [10]. However, several studies pointed out that ANN could be considered a black box due to its complexity [25]. Moreover, when the gradient descent learning algorithm is used, ANN tends to converge to local minima [35]. As a result, it suffers from the over-fitting problem. To avoid local minima, finding optimal parameters is important but difficult [35]. Despite its high performance, ANN is mathematically difficult to apply, and this limits its acceptance by many clinicians. To overcome this problem, we developed a practicable ANN calculator which can be easily adopted by users. A major problem with the previous prediction system for knee OA was also the difficulty in calculating the LR model [7]. However, the ANN calculator will be easy for laypersons or clinicians to use and will provide better performance for predicting knee OA.
Similar to earlier studies concerning prediction of knee OA, knee pain was selected as an important predictor [6]. Pain in OA patients is a leading cause of disability and the most common reason for total joint replacement surgery. However, pain is a subjective experience and is influenced by social and environmental factors. Knee radiography is used as the gold standard for knee OA because it reveals objective findings related to clinical outcomes [19], [26]. Even if a patient has radiographic knee OA without pain, recent research recommends early treatment to prevent the development of symptoms [3]. Therefore, both radiographic and symptomatic knee OA should be important clinical outcomes, and we evaluated the combination of risk factors for prediction of both types of knee OA.
Our prediction model included traditional risk variables such as female sex, age, obesity, and educational status. Educational level has been reported to be associated with physical factors in work-related musculoskeletal disorders [36]. Our results suggest that hypertension was associated with knee OA, and it was an unexpected predictor. The role of metabolic syndrome components such as hypertension, diabetes mellitus, and hyperlipidemia has been unclear [37]. However, recent studies supported the importance of systemic metabolic effects in the pathophysiology of knee OA [16], and suggested that prevention of metabolic syndrome may reduce knee OA risk [38]. Several traditional risk variables were not included in our prediction models. Knee injury and family history of knee OA were excluded because they were not surveyed in the KNHANES V-1. The percentage of interim knee injury in the Framingham knee OA study was 2.7%, and the odds ratio of interim knee injury with knee OA was 1.8, but it was not significant [39]. In a meta-analysis of observational studies, prior history of knee injury was a strong risk factor for the development of knee OA, and the odds ratio for case-control studies was 5.34 (95% CI 3.16-9.02) while those for cohort and cross-sectional studies were 3.74 (95% CI 2.16-6.47) and 3.34 (95% CI 1.95-5.75), respectively [40]. According to previous studies, there is no effect of moderate physical activity on knee OA when the risk model is adjusted for knee injury [41]. Moreover, a prospective cohort study demonstrated an association between greater daily time spent in light intensity physical activities and reduced risk of onset and progression of disability in adults with OA of the knee or risk factors for knee OA [42]. Since we did not adjust for knee injury and this study was based on a cross-sectional survey, it is difficult to determine whether moderate physical activity, which was significantly associated with knee OA in this study, is a direct risk factor for knee OA.
We found differences in prediction performance between the KNHANES V-1 and OAI. This finding might result from ethnic differences and genetic background [43]. The two populations have significant demographic and environmental differences influencing the onset and progression of knee OA. A previous study indicated that the undervalued performance was caused by a discrepancy in knee radiograph protocol [18]. While KL grades were directly obtained by two or three radiologists in the KNHANES V-1, the OAI employed OARSI Atlas grades instead of KL grades. Therefore, we needed to compute each KL grade from the osteophyte and joint space narrowing scores for the OAI. The different type of reading (original KL grade versus calculated KL grade) might affect the performance of the prediction models.
Tam et al. investigated a protocol for predicting knee OA rehabilitation outcome using ANN [44]. To select the treatment protocol offering the best improvement according to the clinical condition of the patient, they applied the ANN to develop a computerized prediction system. There was a significant correlation between the rankings of the observed and expected pain improvement in the study, with a Spearman's rho of 0.424 (p < 0.001) [44]. Lusina et al. have developed an Osteoarthritis Risk Calculator (OA Risk C) and illustrated its acceptability and feasibility in a pilot study of 45 subjects using the Osteoarthritis Policy (OAPol) Model, which is a validated, state-transition simulation of the natural history and management of knee OA [45]. The model included age, sex, race/ethnicity, obesity status, family history of knee OA, occupational exposure to OA risk, and history of knee injury. Eighty-four percent of pilot study participants reported that OA Risk C was easy to understand, and 89% agreed that the graphs depicting their risk were clear and comprehensible [45]. Kerkhof et al. investigated different types of risk prediction models for incident knee OA with questionnaire/easily obtainable variables, imaging variables, and genetic and biochemical markers [46]. The AUCs of the model with gender, age, BMI, questionnaire variables, and genetic risk score in the internal (Rotterdam Study-I), external (Rotterdam Study-II), and external (Chingford study) sets were 0.67, 0.62, and 0.64, respectively. The AUCs of the ANN for KL ≥2, ≥3, and 4 were 0.66, 0.68, and 0.72, respectively, in our external set. This indicates that our model shows a slightly better AUC than the Kerkhof model with genetic variables. That study noted that a genetic risk score is not a very good predictor of future radiographic knee OA in an elderly population [46].
There are several limitations to this study. First, the study was based on a cross-sectional survey, which has several inherent shortcomings from a clinical standpoint. For example, the prevalence of disease was based on a health interview survey taken on one occasion. BMI, physical activity status, and knee pain could differ according to the time of measurement. Second, we did not distinguish between tibiofemoral and patellofemoral knee OA. In recent years, these two knee OA subsets have been shown to differ in etiology, risk factors, and symptoms [47]. In this study, KL grade did not account for the difference between these knee OA subsets. Third, the predictors in our prediction models included knee pain, which is an important diagnostic criterion of symptomatic knee OA. In previous studies, whether pain was treated as a risk factor or a clinical outcome was a matter of the researchers' design [5], [7], [19]. Nonetheless, our study is worthwhile. When frequent knee pain occurs, our prediction models for knee OA may provide more accurate decision support than a prediction model without knee pain as an input variable. Fourth, our results only apply to subjects not undergoing OA treatment, since we excluded subjects who were receiving treatment for knee OA. It would be clinically interesting to identify factors associated with patients receiving OA treatment in a future study, since they are the most affected clinically.

Conclusions
In conclusion, the most important finding of this study is the identification of patients at high risk of knee OA who need additional evaluation and appropriate treatment before aggravation. We developed a scoring system and an ANN, and validated them in large populations. The scoring system and ANN can be easily used and might contribute to the advancement of clinical decision tools. Further studies should be targeted at constructing an extended prediction model for progressive knee OA through the collection of prospective data.
Supporting Information S1