Development and Validation of a Risk Score Predicting Substantial Weight Gain over 5 Years in Middle-Aged European Men and Women

Background: Identifying individuals at high risk of excess weight gain may help target prevention efforts at those at risk of the various metabolic diseases associated with weight gain. Our aim was to develop a risk score to identify these individuals and to validate it in an external population.
Methods: We used lifestyle and nutritional data from 53 758 individuals, followed for a median of 5.4 years, from six centers of the European Prospective Investigation into Cancer and Nutrition (EPIC) to develop a risk score predicting substantial weight gain (SWG) over the next 5 years (derivation sample). Assuming linear weight gain, SWG was defined as gaining ≥10% of baseline weight during follow-up. Proportional hazards models were used to identify significant predictors of SWG separately by EPIC center. Regression coefficients of the predictors were pooled using random-effects meta-analysis, and the pooled coefficients were used to assign weights to each predictor. The risk score was calculated as a linear combination of the predictors. External validity of the score was evaluated in nine other centers of the EPIC study (validation sample).
Results: Our final model included age, sex, baseline weight, level of education, baseline smoking, sports activity, alcohol use, and intake of six food groups. The model's discriminatory ability, measured by the area under the receiver operating characteristic curve, was 0.64 (95% CI = 0.63–0.65) in the derivation sample and 0.57 (95% CI = 0.56–0.58) in the validation sample, with variation between centers. Positive and negative predictive values for the optimal cut-off value of ≥200 points were 9% and 96%, respectively.
Conclusion: The present risk score confidently excluded a large proportion of individuals from being at any appreciable risk of developing SWG within the next 5 years. Future studies, however, may attempt to further refine the positive prediction of the score.


Introduction
Excess body weight is increasingly recognized as an important public health threat worldwide. In Europe, 30-80% of adults are overweight (body mass index (BMI) ≥25 kg/m²), and among them up to 36% are classified as obese (BMI ≥30 kg/m²) [1,2]. Overwhelming evidence suggests that excess body weight is associated with higher risks of numerous chronic diseases [3]. However, not only body weight status per se but also weight gain, irrespective of initial BMI, has been associated with many metabolic abnormalities [4][5][6], which in turn convey an increased mortality risk [7]. A Danish study further suggests that gaining weight up to the obese level is associated with a higher risk of impaired glucose tolerance than having maintained weight at the obese level since the beginning of adult life [8].
Given that no amount of weight gain appears to be safe, strategies for primary prevention are urgently needed. Although excess weight is in principle a matter of energy balance, susceptibility to weight gain appears to be determined by a complex interaction of genetic, environmental, socio-economic, cultural and behavioral factors [9]. Much emphasis has traditionally been placed on identifying single risk factors that are etiologically related to weight gain or to the development of overweight/obesity; however, understanding the combined effects of these risk factors and/or their marker variables is fundamental to identifying priorities for public health efforts. Additionally, in view of limited resources, prevention efforts may be targeted specifically at those individuals who are at highest risk of gaining substantial amounts of weight, and hence of the associated health risks, and who thus, in theory, might benefit most from prevention programs.
In recent years, prediction models to identify high-risk individuals have been proposed for several obesity-related diseases, including cardiovascular disease [10], type 2 diabetes [11], and cancer [12][13][14]. In the present study, we aimed to develop a risk score predicting the risk of substantial weight gain (SWG) within the following 5 years among primarily non-obese adults. Because this objective was addressed using data from the multi-center European Prospective Investigation into Cancer and Nutrition (EPIC), the study additionally offered the unique opportunity to investigate whether a single, universal trans-European prediction model for SWG is suitable.

Study population
The EPIC study is a multi-center prospective study designed primarily to investigate the relationship between diet, lifestyle and genetic factors and the incidence of cancer [15,16]. Briefly, between 1992 and 2000, a total of 521 330 men and women aged 25-70 years were recruited in 23 centers and regions in 10 European countries: Denmark, Sweden, Norway, the United Kingdom, France, Germany, the Netherlands, Spain, Italy and Greece. In the majority of centers, participants were invited from the general population. Exceptions were the French cohort (based on members of a health insurance scheme for teachers), the cohorts in Utrecht (the Netherlands) and Florence (Italy), which are based on women attending local population-based breast cancer screening programmes, components of the Italian and Spanish cohorts (including members of local blood donor associations), and most of the Oxford (UK) cohort (comprising health-conscious subjects, mainly vegetarians). In France, Norway, Utrecht, and Naples only women were recruited. Approval for this study was obtained from the ethical review boards of the International Agency for Research on Cancer (IARC) and from all local institutions where subjects had been recruited for the EPIC study: the Florence Health Authority Ethical Committee (Italy); the Norfolk Local Research Ethics Committee (UK); the Medical Ethics Committee of the Netherlands Organization for Applied Scientific Research (the Netherlands); the Ethics Committee of the Medical Association of the State of Brandenburg (Germany); and the Danish National Committee on Biomedical Research Ethics (Denmark). Written informed consent was obtained from all participants before they joined the EPIC study.
The prediction model was derived from data of 6 EPIC centers in 5 countries that participated in the Diet, Obesity and Genes (DiOGenes) project [17], namely: the United Kingdom (UK-Norfolk), the Netherlands (NL-Doetinchem and NL-Amsterdam/Maastricht, treated as two separate centers because of differences in the follow-up assessment of anthropometry), Italy (IT-Florence), Germany (GER-Potsdam), and Denmark (DK-Copenhagen/Aarhus). Subsequently, this model was externally validated in nine other EPIC centers.
Of the 146 543 initial participants in the derivation sample [17], data from 53 758 participants were finally used for model development (see Figure 1 for a flow chart of exclusions). Briefly, we excluded pregnant women, individuals with an extreme ratio of energy intake to energy requirement, participants who provided no or implausible anthropometric information at baseline or follow-up, and those who reported prevalent CVD, diabetes or cancer at baseline. Additionally, to maintain the same age range in all centers and to minimize confounding from changes in body composition and shape occurring at older ages [18] or from undiagnosed chronic disease, the study was restricted to participants aged ≥35 years at baseline and <65 years at the time of the second weight assessment. Finally, the study was restricted to non-obese individuals (BMI <30 kg/m²) at baseline. After applying the same exclusions and further excluding individuals with missing data on any candidate predictor, the final validation sample consisted of 130 446 men and women from nine EPIC centers not included in the derivation sample. The centers of Norway and Varese (Italy) were excluded from the validation sample because of missing information on physical activity.
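The core age, BMI and prevalent-disease rules above can be expressed as a simple filter. The sketch below is hypothetical: the `Participant` fields and `is_eligible` are illustrative names, not EPIC variable names, and the pregnancy, energy-reporting and anthropometric-plausibility exclusions are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical field names, chosen for illustration only.
    age_baseline: float        # years at baseline
    age_second_weight: float   # years at the second weight assessment
    bmi_baseline: float        # kg/m^2
    prevalent_cvd: bool
    prevalent_diabetes: bool
    prevalent_cancer: bool


def is_eligible(p: Participant) -> bool:
    """Apply the age window, non-obesity and prevalent-disease criteria."""
    in_age_range = p.age_baseline >= 35 and p.age_second_weight < 65
    non_obese = p.bmi_baseline < 30
    disease_free = not (p.prevalent_cvd or p.prevalent_diabetes or p.prevalent_cancer)
    return in_age_range and non_obese and disease_free


cohort = [
    Participant(40, 46, 24.5, False, False, False),  # eligible
    Participant(33, 39, 22.0, False, False, False),  # too young at baseline
    Participant(60, 66, 26.0, False, False, False),  # too old at second weighing
    Participant(50, 55, 31.2, False, False, False),  # obese at baseline
    Participant(45, 50, 27.0, False, True, False),   # prevalent diabetes
]
eligible = [p for p in cohort if is_eligible(p)]
```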

Dietary and lifestyle assessment
Dietary intake was assessed at baseline by means of validated country-specific dietary questionnaires that were designed to capture local dietary habits and to provide high compliance [16,19]. Participants were asked to report their average consumption of each food item over the past year. Food intake (gram/day, g/d) was calculated by multiplying the frequency of intake by portion size. The validity and reproducibility of the dietary questionnaires have been shown to be generally good [19,20]. Information on lifestyle factors was collected by questionnaire and/or face-to-face interview at baseline, including questions on highest level of education, occupational physical activity, sports activity, consumption of alcoholic beverages, and tobacco smoking [16].

Assessment of anthropometric measures
For each individual, two measures of body weight were available: one at baseline and one at follow-up. In most centers, height and weight were measured at baseline by trained personnel according to standardized procedures [21]. To reduce heterogeneity due to protocol differences in the clothing worn during measurement, body weight was corrected by subtracting 1.5 kg for individuals who were normally dressed and 1 kg for those who wore light clothing [21]. In the centers of France and Norway, only self-reported anthropometric values were collected. For part of the Oxford center, values were predicted using sex- and age-specific linear regression models derived from individuals with both measured and self-reported weight (referred to as the Oxford prediction equations) [21]. At follow-up, body weight was measured by trained staff in UK-Norfolk and NL-Doetinchem following the same protocol as at baseline, while participants in all other centers measured their weight at home according to the guidance provided. The accuracy of these self-reported weights was improved by applying the Oxford prediction equations.
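The clothing correction is a fixed subtraction by category. A minimal sketch, assuming string category labels: `corrected_weight` is a hypothetical helper, and the `"none"` category (no correction) is an assumption added for completeness, since the text names only normal and light clothing.

```python
# Clothing allowances (kg) as stated in the text; "none" is an assumed label
# for measurements without clothing, which receive no correction.
CLOTHING_CORRECTION_KG = {"normal": 1.5, "light": 1.0, "none": 0.0}


def corrected_weight(measured_kg: float, clothing: str) -> float:
    """Subtract the clothing allowance from a measured body weight."""
    if clothing not in CLOTHING_CORRECTION_KG:
        raise ValueError(f"unknown clothing category: {clothing!r}")
    return measured_kg - CLOTHING_CORRECTION_KG[clothing]
```

Raising on an unknown category, rather than silently applying no correction, makes coding errors in the clothing variable visible.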

Statistical Approaches
Definition of case status. SWG was defined as gaining ≥10% of baseline weight during follow-up. This threshold was chosen for two reasons. First, it was considered a major weight gain relative to the 5-year time horizon of the prediction. Second, it seems high enough to exclude random variation in body weight while still allowing for some weight gain as a natural part of the aging process. Follow-up time, i.e. the time between the first and second weight assessments, varied considerably between individuals in EPIC (range: 1.2-12.4 years). To account for these varying follow-up times and to additionally consider the velocity of weight gain, we used methods of survival analysis. Individual follow-up times (either the time to SWG or the interval between the first and second weight assessments) were used in the proportional hazards model to estimate relative risks, and the results were combined with the baseline survivor function estimated at t = 5 years to provide estimates of 5-year absolute risk, as described in detail below. Hence, each participant was followed for incidence of SWG from study entry to the second assessment of body weight (end of follow-up). Subjects not experiencing SWG during follow-up were censored at the time of their second weight assessment, and participants experiencing SWG constituted the set of cases. Because case status could only be determined at the time of the second weight assessment, the exact time needed to experience SWG was unknown for the cases. We therefore estimated the time theoretically needed for the cases to cross the threshold of ≥10% weight gain relative to baseline by assuming linear weight gain.
Figure 1. Flow chart of exclusions. Notes:
1 No follow-up questionnaire (e.g. due to death before follow-up body weight assessment, not yet approached for follow-up body weight assessment, emigration or non-response to invitation).
2 Pregnant at baseline or follow-up.
3 10% missing items on FFQ.
4 Ratio of energy intake (EI) to energy expenditure (EE) estimated from predicted resting energy expenditure.
5 Missing data on baseline or follow-up weight, waist or height; missing follow-up time.
6 Baseline height <130 cm, BMI <16 kg/m², 0 < waist <40 cm, waist >160 cm, follow-up weight >700 kg; combination of waist <60 cm and BMI >25 kg/m².
7 Annual weight change >5 kg (either direction) or annual waist change >7 cm (either direction).
8 Baseline cancer, diabetes or cardiovascular disease.
9 In contrast to the derivation of the model, where it is important to obtain unbiased estimates of relative risk, we think only original data should be used in the validation sample; we therefore excluded individuals with missing values.
doi:10.1371/journal.pone.0067429.g001
Potential predictor variables. Selection of candidate predictors was primarily based on observed associations with weight change in previous analyses of EPIC and on reported or hypothesized associations in the literature. A total of 21 characteristics were considered as candidate predictors. Specifically, we selected standard socio-demographic characteristics, including age, sex, and education, as well as lifestyle factors, namely physical activity (occupational and sports activity), alcohol consumption [22], and smoking status. For practical feasibility, the selection of dietary factors was restricted to main food items. In accordance with previous EPIC analyses, intake of fruits and vegetables [23], meat [24], and bread as an indicator of dietary fiber intake [25], complemented by consumption of fish, vegetable oil and dairy products as components of the Mediterranean diet [26], were selected as potential predictors. Additionally, we included intake of butter and margarine, chocolate, cake and cookies, and soft drinks as candidate predictors because of their high energy density and the findings of previous studies on their health effects [27][28][29].

Risk prediction model building
Candidate predictors were entered into a proportional hazards model using stepwise forward selection, with a pre-specified p-value of 0.1 for entering and staying in the model, as recommended by Parmer et al. [30]. Interaction terms were not included, to keep the model parsimonious and easy to use. To account for heterogeneity between centers due to differences in questionnaire design, follow-up procedures, and other unmeasured center effects, stepwise model selection was conducted separately by center. Variables significantly associated with SWG in the same direction in at least two centers, and not in the opposite direction in any center, were retained as predictors for the final model. Center-specific regression coefficients were obtained for all retained predictors by fitting them in a common center-specific model, and random-effects meta-analysis was used to calculate combined estimates. Score points (weights) for each predictor were assigned as the corresponding pooled β coefficient multiplied by 100 and rounded to two decimal places. For each individual, a risk score was computed as a linear combination of the weighted predictors, and the score was rescaled by adding 500 to avoid negative values in descriptive analyses. Finally, the probability of experiencing SWG within the following 5 years was calculated by inserting the individual risk score into the survival function obtained from the proportional hazards model. For this, the baseline survival probability at 5 years, i.e. the probability of not developing SWG within 5 years, was estimated separately by center at the average value of each predictor over all individuals in the derivation sample, and these center-specific values were again pooled using random-effects meta-analysis.
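The pooling and scoring steps can be sketched as follows. The text does not name the random-effects estimator; DerSimonian-Laird is used here as an assumption because it is a common choice. All function names and the example numbers are hypothetical.

```python
import math


def dersimonian_laird(betas, variances):
    """Pool center-specific log hazard ratios with a DerSimonian-Laird
    random-effects model. Returns (pooled beta, its SE, tau^2)."""
    k = len(betas)
    if k < 2:  # a single center: nothing to pool
        return betas[0], math.sqrt(variances[0]), 0.0
    w = [1.0 / v for v in variances]                             # fixed-effect weights
    beta_fe = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, betas))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                           # between-center variance
    w_re = [1.0 / (v + tau2) for v in variances]                 # random-effects weights
    beta_re = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    return beta_re, math.sqrt(1.0 / sum(w_re)), tau2


def score_points(pooled_beta):
    """Score points: pooled beta x 100, rounded to two decimals."""
    return round(100.0 * pooled_beta, 2)


def risk_score(points, values):
    """Linear combination of the weighted predictors, shifted by 500."""
    return 500.0 + sum(points[name] * values[name] for name in points)


# Hypothetical example: one predictor estimated in three centers.
beta, se, tau2 = dersimonian_laird([0.21, 0.35, 0.18], [0.004, 0.006, 0.005])
pts = {"predictor_a": score_points(beta), "predictor_b": -2.0}
score = risk_score(pts, {"predictor_a": 1, "predictor_b": 3})
```

Scaling by 100 keeps the points readable while preserving the relative weights of the Cox coefficients. The +500 shift changes only the origin of the score, not the ranking of individuals.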
Because missing data may bias the estimates of the regression coefficients used to construct the risk score, we used multiple imputation in the derivation population [31,32]. Briefly, multiple imputation creates several copies of the original data set in which missing values are replaced by plausible values sampled from their predictive distribution given the observed data. Standard statistical methods are applied to each imputed data set, and the results are then combined in a way that appropriately accounts for the uncertainty about the missing data [32]. We used 20 imputed data sets, and predictor selection was performed separately for each center and each imputed data set. As described by Vergouwe et al. [33], predictors significantly associated with SWG in at least 50% of the imputed data sets of a center were retained as center-specific predictors, from which the final set of predictors was selected as described above.
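A sketch of the combination and selection rules, under stated assumptions: `rubin_pool` applies Rubin's rules, the standard way to combine estimates across imputed data sets, though the text does not spell out the formulas. The two retention helpers encode the 50%-of-imputations rule and the "≥2 centers, same direction, none opposite" rule. Summarizing each center as +1, −1 or 0 is an assumed simplification, and all names are hypothetical.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Combine one coefficient across m imputed data sets (Rubin's rules).
    Returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (m - 1 denominator)
    return q_bar, within + (1.0 + 1.0 / m) * between


def retained_in_center(selected_flags):
    """Keep a predictor in a center if selected in >= 50% of imputations."""
    return sum(selected_flags) / len(selected_flags) >= 0.5


def retained_overall(center_directions):
    """center_directions: +1 / -1 for a significant positive / negative
    association in a center, 0 if not retained there. Keep the predictor if
    >= 2 centers agree in direction and no center points the other way."""
    pos = center_directions.count(1)
    neg = center_directions.count(-1)
    return (pos >= 2 and neg == 0) or (neg >= 2 and pos == 0)
```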

Evaluation of the risk score's predictive performance
The predictive performance of the risk score was evaluated in terms of discrimination and calibration in the derivation sample (internal validation) and in the independent EPIC centers (external validation). Discrimination was quantified by the c index for survival data, which describes the model's ability to distinguish between persons with longer and shorter event-free survival within a given time horizon [34,35]. A c index of 0.5 indicates no discriminatory ability beyond chance, and a value of 1.0 indicates perfect discrimination.
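To illustrate how a c index handles censored follow-up, the sketch below implements Harrell's concordance index. This is a stand-in: the exact estimator of references [34,35] may differ in detail. The optional `horizon` argument, an assumption of this sketch, censors follow-up administratively at a fixed time such as 5 years.

```python
def harrell_c(times, events, risk, horizon=None):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is usable when subject i has an event and subject j is
    still under observation at a later time; it is concordant when subject i
    also has the higher predicted risk. Ties in predicted risk count 1/2.
    """
    if horizon is not None:
        # Administrative censoring at the horizon.
        events = [bool(e) and t <= horizon for t, e in zip(times, events)]
        times = [min(t, horizon) for t in times]
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[j] > times[i]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

The double loop is O(n²), which is fine for illustration but slow for cohorts the size of EPIC. Production survival software uses sorted, tree-based algorithms instead.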
To define an appropriate cut-off point of the continuous risk score for separating high-risk from low-risk individuals, we used Youden's index, a simple criterion that selects, across a range of possible cut-off values, the one at which the sum of sensitivity and specificity is maximal [36,37]. It is defined as J = sensitivity + specificity - 1 and ranges from 0 to 1, with 1 implying perfect separation of diseased and non-diseased individuals by the continuous marker [37].
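The Youden search can be sketched as a scan over candidate thresholds. For simplicity, this sketch treats case status as binary (SWG observed or not) and ignores differences in follow-up. The function name and toy data are hypothetical.

```python
def youden_cutoff(scores, is_case):
    """Return (cut-off, J) maximizing J = sensitivity + specificity - 1,
    where a score >= cut-off classifies a person as high risk."""
    n_cases = sum(is_case)
    n_noncases = len(is_case) - n_cases
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):  # every observed score is a candidate
        tp = sum(1 for s, c in zip(scores, is_case) if c and s >= cut)
        tn = sum(1 for s, c in zip(scores, is_case) if not c and s < cut)
        j = tp / n_cases + tn / n_noncases - 1.0
        if j > best_j:  # strict ">" keeps the lowest cut-off among ties
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy data: six hypothetical scores with binary case status.
cut, j = youden_cutoff([100, 150, 200, 250, 300, 350], [0, 0, 1, 1, 0, 1])
```

In the toy data, a cut-off of 200 captures all three cases and misclassifies one of three non-cases, giving J = 2/3.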
Calibration, a measure of how reliable the predictions are, was evaluated using a modified version of the Hosmer-Lemeshow test for survival analysis introduced by D'Agostino and Nam [34]. For this purpose, the observed 5-year probabilities of SWG, estimated by the Kaplan-Meier approach, were compared with the average predicted probabilities across tenths of predicted risk; these were also plotted for visualization.
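A minimal sketch of this calibration check, under assumptions. Individuals are sorted into groups of predicted risk, the observed 5-year risk in each group is estimated by Kaplan-Meier, and the statistic is assumed to take the commonly cited form Σ n_g (KM_g − p̄_g)² / [p̄_g (1 − p̄_g)]. The p-value, taken from a chi-square reference distribution, is not computed here. Function names and the toy data are hypothetical.

```python
def km_risk(times, events, horizon):
    """Kaplan-Meier estimate of the cumulative risk 1 - S(horizon)."""
    surv = 1.0
    for t in sorted({t for t, e in zip(times, events) if e and t <= horizon}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1.0 - deaths / at_risk
    return 1.0 - surv


def nam_dagostino(times, events, predicted, horizon=5.0, groups=10):
    """Compare KM-observed and mean predicted risk across groups of predicted
    risk. Returns (rows, chi2) with rows = (n, observed, mean predicted)."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows, chi2 = [], 0.0
    for g in range(groups):  # assumes n >= groups so no group is empty
        idx = order[g * n // groups:(g + 1) * n // groups]
        obs = km_risk([times[i] for i in idx], [events[i] for i in idx], horizon)
        pred = sum(predicted[i] for i in idx) / len(idx)
        rows.append((len(idx), obs, pred))
        chi2 += len(idx) * (obs - pred) ** 2 / (pred * (1.0 - pred))
    return rows, chi2


# Toy data: 20 hypothetical participants, grouped into tenths.
toy_times = [float(i % 7 + 1) for i in range(20)]
toy_events = [i % 3 == 0 for i in range(20)]
toy_pred = [0.05 + 0.01 * i for i in range(20)]
rows, chi2 = nam_dagostino(toy_times, toy_events, toy_pred)
```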
Statistical analyses were performed using SAS (Statistical Analysis System, version 9.2; SAS Institute Inc, Cary, NC).

Results
Among the 53 758 men and women in the derivation population, a total of 7 431 individuals gained ≥10% of baseline weight during a median follow-up of 5.4 years, amounting to 329 685 person-years (PY). In the validation sample, 14 622 participants experienced SWG during a median follow-up of 3.7 years (525 749 PY). General characteristics for each center of the derivation sample and for the total validation population are presented in Table 1. In the derivation population, mean age at baseline was 50.2 years. Mean follow-up time differed considerably between centers, ranging from 3.6 years in UK-Norfolk to 8.8 years in IT-Florence. On average, individuals gained 3.8% of their baseline weight during follow-up, corresponding to a mean annual gain of 0.6% of baseline weight. At this rate, individuals would need on average 16.7 years to gain 10% of their baseline weight. Because of the all-women centers of France, IT-Naples and NL-Utrecht, the proportion of men was substantially lower in the validation sample than in the derivation sample (21.5% vs. 41.2%). Mean annual weight gain was higher in the validation than in the derivation sample (521 g/y vs. 395 g/y). This may be explained by the shorter follow-up in the validation sample, because annualized weight fluctuations are larger over shorter periods of time.
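The 16.7-year figure follows from simple arithmetic under the linear-gain assumption. Dividing the 10% threshold by the mean annual gain of 0.6% of baseline weight gives

$$
t_{10\%} \approx \frac{10\%}{0.6\%\ \text{per year}} \approx 16.7\ \text{years}.
$$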
The pooled estimates of relative risk for the association of the included predictors with risk of SWG, and the corresponding score points assigned to each predictor, are presented in Table 2. The pooled estimate of the background probability of avoiding SWG (analogous to 'survival') at 5 years, estimated at average values of the predictors, was 0.9331. This implies that under average conditions about 93% of the population will stay free of SWG, while 7% will experience SWG within 5 years. For each participant, the …
Table 2 note: *Predictors were identified using center-specific stepwise Cox regression in the derivation sample. Factors significantly (two-sided P-value <0.05) related to substantial weight gain in ≥2 centers were retained for the final model. Center-specific effects for the retained predictors were pooled using random-effects meta-analysis; these combined estimates of relative risk are presented in the table.
Table 3. Sensitivity, specificity, positive and negative predictive value for various cut-off points of the risk score in the derivation sample.
In addition to between-center differences, the score generally performed better among men than women (Table 3), while additionally including menopausal status at recruitment did not affect the observed discriminatory accuracy in women across centers (data not shown). Sensitivity, specificity and predictive values for various cut-off points of the score in the derivation sample suggested a threshold of ≥200 points as the optimal cut-off value to define high-risk individuals (Youden's index, J = 0.208) (Table 3). This threshold captured 74% of the cases who experienced SWG, and 46% of the persons who did not experience SWG had a score <200. The corresponding positive and negative predictive values were 9% and 96%, respectively.
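Converting a score into an absolute 5-year risk uses the pooled baseline survival of 0.9331 reported above. In the sketch below, `mean_score` is a hypothetical input: it is the score at average predictor values, which is not reported in this section. The difference (score − mean_score)/100 recovers the centered Cox linear predictor only up to the two-decimal rounding of the points.

```python
import math

# Pooled 5-year probability of remaining free of SWG at average predictor
# values (value reported in the text).
S0_5Y = 0.9331


def five_year_swg_risk(score, mean_score, s0=S0_5Y):
    """Absolute 5-year risk of SWG, 1 - S0(5) ** exp(centered linear predictor).

    Points are 100 x beta and the +500 shift cancels in the difference, so
    (score - mean_score) / 100 approximates the centered linear predictor.
    """
    linear_predictor = (score - mean_score) / 100.0
    return 1.0 - s0 ** math.exp(linear_predictor)
```

At the average score the function returns 1 − 0.9331, the roughly 7% background risk. Higher scores raise the risk monotonically through the exponent.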
The estimated probability of experiencing SWG during 5 years agreed very well with the observed proportion of incident cases across tenths of predicted risk in the derivation sample, although there was a slight overestimation of risk in the highest and lowest tenths (Figure 2a, p = 0.02). In the total validation population, the score was also able to adequately quantify absolute risk, though comparison of observed and predicted risk implied a slight overestimation of risk in the lower and upper ranges of the score and a slight underestimation in the middle range (Figure 2b, p <0.001). Inspection of calibration plots for each validation center indicated good calibration for the centers of Greece, UK-Health Conscious, UK-General Population and NL-Utrecht and adequate calibration in France and SWE-Malmoe (data not shown). In GER-Heidelberg we observed a systematic overestimation of risk, while in Spain calibration was poor, but no clear pattern of miscalibration was found.
The use of center-specific weights markedly improved discriminatory accuracy in the validation centers of France, Spain, Greece and GER-Heidelberg (Table 4). When center-specific risk scores were developed (based on center-specific predictor selection), model performance remained essentially unchanged compared with the re-estimated model in all centers except France, for which discrimination improved from 0.61 (0.60-0.62) to 0.65 (0.63-0.67). Calibration generally improved or remained unchanged in the re-estimated model across validation centers (data not shown). The exceptions were GER-Heidelberg, where risk remained consistently overestimated, and Spain, which showed both over- and underestimation of risk. Even in center-specific models, agreement between observed and predicted risk did not improve for these two centers.

Discussion
In this large multi-center prospective study of middle-aged European men and women, a risk score based on numerous easily assessable socio-demographic, dietary and lifestyle factors showed moderate discriminatory accuracy and was able to accurately predict the risk of experiencing SWG during the following 5 years.
Major strengths of the present study are its prospective design, its large sample size, the availability of information on a large number of risk factors for weight gain, the use of multiple imputation techniques to avoid potential bias in derivation of the score and the validation of the risk score in several independent, culturally diverse study populations.
Some methodological limitations need to be considered. At follow-up, most participants provided self-measured weight; however, we tried to correct for potential underreporting by applying prediction equations [21]. Further, only two measurements of body weight were available for each individual, and weight gain was assumed to be linear, which is a strong assumption about the course of weight gain. Weight gain is reversible, and body weight is well known to fluctuate over time [38]. This may lead to repeated cycles of weight loss and regain [39,40] that are not reflected in measurements at only two time points. Fluctuations, or non-linear weight gain in general, may have led to misclassification of cases and non-cases and to misspecification of the cases' time to event, which might have limited the performance of the risk score. Nevertheless, recent findings from the EPIC-Potsdam study based on 5 measurements of weight suggest that, at the population level, weight gain can be reasonably well approximated by a straight line over a follow-up period of 8 years [41].
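Under the linear-gain assumption, the time at which a case crossed the 10% threshold is interpolated from the two weight measurements, and non-cases are censored at the second assessment. A minimal sketch with a hypothetical helper `time_to_swg`:

```python
def time_to_swg(w_baseline, w_followup, followup_years, threshold=0.10):
    """Return (time, event) for the proportional hazards model.

    Cases (relative gain >= threshold) get the time at which the threshold
    would have been crossed if weight rose linearly between the two
    assessments; non-cases are censored at the second weight assessment.
    """
    gain = (w_followup - w_baseline) / w_baseline  # relative gain over follow-up
    if gain >= threshold:
        return followup_years * threshold / gain, True
    return followup_years, False


# A gain from 80 to 90 kg (12.5%) over 5 years crosses 10% after 4 years.
t_case, is_case = time_to_swg(80.0, 90.0, 5.0)
```

A non-linear trajectory, such as an early gain followed by partial loss, yields the same interpolated time as a steady gain with the same endpoints. This is the misspecification of time to event discussed above.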
The directions of association with SWG for some predictors in our model may be difficult to explain causally. It should be kept in mind, though, that in contrast to etiological studies trying to explain the cause of a disease, a prediction model aims to enable accurate predictions of the outcome [42]. Thus, predictors in a prediction model need not be well-established etiological factors with a strong biological rationale. They may also be markers of other lifestyle factors that influence mechanisms implicated in the regulation of body weight. Caution is therefore warranted to avoid interpreting the identified predictors as drivers of weight gain. Regarding the positive association of baseline smoking with SWG, for example, a sub-analysis showed that this relation was driven by the strong weight-increasing effect of smoking cessation during follow-up. Continued smoking was not related to a higher risk of SWG compared with non-smoking. This finding should be kept in mind when interpreting the results, and it emphasizes that weight management is warranted among individuals who attempt to quit smoking. Nevertheless, because future changes in smoking habits are unknown at the time of prediction, only baseline variables were included in the prediction model.
The discriminatory ability of the score was generally low to modest, which may be explained by the lack of information on some predictors in this analysis. Specifically, weight loss attempts [43], weight cycling [44,45] and large short-term weight changes [38,45] have been shown to predict future weight change. However, obtaining this type of information requires closer contact between participants and study personnel, and assessing it in all centers of such a large study is challenging. Also, although recent weight history may predict weight change in the near future, it is currently unknown whether it is a strong predictor of weight change over longer periods, e.g. 5 years.
In the field of chronic diseases, hopes have been raised that information on common genetic markers may improve discriminatory accuracy beyond non-invasive factors and biochemical measures [11,46]. The predictive ability of genetic factors, however, currently appears limited [11,46]. For example, adding seven SNPs to the breast cancer model developed by Gail et al. only modestly improved discriminatory accuracy [46]. Similarly, additionally including 20 diabetogenic SNPs barely improved discrimination of incident type 2 diabetes beyond lifestyle factors and metabolic markers in the EPIC-Potsdam cohort [47]. With respect to obesity, the EPIC-Norfolk study reported that 12 obesity-susceptibility loci explained 0.9% of the variation in BMI, with a c index of 0.57 for prediction of obesity [48]. Thus, despite overwhelming statistical significance and repeated replication, the explained variance and the predictive value of the currently identified obesity-susceptibility loci are low [49]. A considerable improvement in the model's accuracy from including genetic markers therefore appears unlikely. Additionally, it should be noted that very large independent relative risks are needed for a single predictor to meaningfully improve discrimination [46].
Table 4. Discriminatory ability of the overall risk score across centers compared to the re-estimated overall model and center-specific models in the derivation and validation sample.
The discriminatory ability of the present risk score was reduced in the external validation sample, an observation also commonly reported in external validation studies in the field of chronic diseases [10,11]. Several reasons may explain this phenomenon.
First, overfitting of the model in the derivation sample may be responsible for the poorer performance in the validation sample. However, given that the derivation sample was large and that the amount of optimism decreases with increasing sample size [50], this explanation appears unlikely. Second, lower predictive accuracy in external populations may be due to differences between the derivation and validation populations, especially with regard to methods of data collection, coding of predictors and endpoint, and the availability of all variables used to construct the score [50]. However, given the standardised methodology followed in EPIC, this explanation also seems rather unlikely. Some validation centers were sampled from specific groups rather than the general population (e.g. France, IT-Naples, NL-Utrecht and UK-Oxford), which may affect the model's performance, so we excluded those centers in sensitivity analyses. Nevertheless, the overall discriminatory accuracy barely changed (0.59 (0.58-0.60)). Interestingly, apart from the overall difference in predictive ability between the derivation and validation samples, discrimination varied considerably across individual cohorts within each sample. Specifically, discriminatory power ranged from 0.64 in DK-Copenhagen/Aarhus to 0.76 in NL-Amsterdam/Maastricht in the derivation sample and from 0.56 in France to 0.67 in IT-Naples in the validation sample. It is further noteworthy that centers of similar socio-cultural background in the derivation and validation samples showed comparable predictive accuracy (e.g. Denmark and Sweden, Potsdam and Heidelberg). This suggests that the prediction of weight gain depends on underlying socio-cultural factors that were not equally well represented by the predictors in the present model across the trans-European study populations.
The risk score adequately estimated risk in the total validation sample, while calibration was poor in some centers. It has been suggested that adjusting or recalibrating the score to local circumstances in external populations may increase predictive performance. In the present study, re-estimation of regression coefficients slightly improved calibration in most validation centers. The exceptions were GER-Heidelberg and Spain, where calibration remained poor even in center-specific models. An explanation for this finding may be the considerably shorter follow-up time in these two centers. While our prediction model was tailored to a 5-year period, average observed follow-up times in GER-Heidelberg and Spain were 2.1 and 3.3 years, respectively. Unfortunately, we did not have access to more recent data to investigate this issue further. Discriminatory ability improved markedly in four of the nine validation centers in the re-estimated model, whereas center-specific models generally did not lead to further improvement. The only exception was France, for which a population-specific model yielded a c index of 0.65 (0.63-0.67) compared with 0.61 (0.60-0.62) in the re-estimated model and 0.56 (0.55-0.57) in the overall model.
Despite the observed improvements in discrimination after re-estimating the parameters, the performance measures were generally moderate. We cannot rule out that important, possibly population-specific, predictors were not assessed in this study. However, our findings, based on a wide range of predictors and several culturally diverse study populations, suggest that the predictability of weight gain from data of large population-based studies may be limited in general.
The test characteristics of the risk score also challenge its practical implementation in prevention programs. The optimal cut-off value to define high-risk individuals was ≥200 points, which implies that preventive actions would be indicated for a substantial part of the population (55%). Of these high-risk individuals, 9% will indeed experience SWG within 5 years. On the other hand, 96% of the individuals with a score <200 will indeed not develop SWG. Note that the optimal cut-off point was defined here, as an example, using Youden's index, which treats sensitivity and specificity as equally important. This may not hold in practice. When implementing a risk score, the choice of cut-off value should depend on the importance attached to false positives and false negatives, accounting for misclassification costs.
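The predictive values and the share of people flagged as high risk follow from sensitivity, specificity and the background risk via Bayes' theorem. In the sketch below, using the 5-year background risk of 1 − 0.9331 as the prevalence is an assumption. The helper name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV, share classified high risk) via Bayes' theorem."""
    tp = sensitivity * prevalence                  # true positives
    fn = (1.0 - sensitivity) * prevalence          # false negatives
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false positives
    tn = specificity * (1.0 - prevalence)          # true negatives
    return tp / (tp + fp), tn / (tn + fn), tp + fp


# Assumption: the 5-year background risk stands in for prevalence.
ppv, npv, flagged = predictive_values(0.74, 0.46, 1.0 - 0.9331)
```

With these inputs the sketch gives a PPV of about 9%, an NPV of about 96% and about 55% of people classified as high risk, in line with the values reported above. Because the reported sensitivity and specificity are rounded, exact agreement is not expected.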
In conclusion, the present risk score was able to confidently exclude a large proportion of individuals from being at any appreciable risk of developing SWG within the next 5 years. Future studies, however, may attempt to further refine the positive prediction of the score, for example by considering additional predictors both in general and at the national level.