Identifying individuals at high risk of excess weight gain may help target prevention efforts at those at risk of the various metabolic diseases associated with weight gain. Our aim was to develop a risk score to identify these individuals and validate it in an external population.
We used lifestyle and nutritional data from 53 758 individuals followed for a median of 5.4 years from six centers of the European Prospective Investigation into Cancer and Nutrition (EPIC) to develop a risk score to predict substantial weight gain (SWG) for the next 5 years (derivation sample). Assuming linear weight gain, SWG was defined as gaining ≥10% of baseline weight during follow-up. Proportional hazards models were used to identify significant predictors of SWG separately by EPIC center. Regression coefficients of predictors were pooled using random-effects meta-analysis. Pooled coefficients were used to assign weights to each predictor. The risk score was calculated as a linear combination of the predictors. External validity of the score was evaluated in nine other centers of the EPIC study (validation sample).
Our final model included age, sex, baseline weight, level of education, baseline smoking, sports activity, alcohol use, and intake of six food groups. The model's discriminatory ability measured by the area under a receiver operating characteristic curve was 0.64 (95% CI = 0.63–0.65) in the derivation sample and 0.57 (95% CI = 0.56–0.58) in the validation sample, with variation between centers. Positive and negative predictive values for the optimal cut-off value of ≥200 points were 9% and 96%, respectively.
Citation: Steffen A, Sørensen TIA, Knüppel S, Travier N, Sánchez M-J, Huerta JM, et al. (2013) Development and Validation of a Risk Score Predicting Substantial Weight Gain over 5 Years in Middle-Aged European Men and Women. PLoS ONE 8(7): e67429. https://doi.org/10.1371/journal.pone.0067429
Editor: Hamid Reza Baradaran, Iran University of Medical Sciences, Iran (Republic of Islamic)
Received: September 28, 2012; Accepted: May 21, 2013; Published: July 16, 2013
Copyright: © 2013 Steffen et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This publication arises from a collaboration of two EU projects, the Diet, Obesity and Genes (DiOGenes) project and the Physical Activity, Nutrition, Alcohol, Cessation of Smoking, Eating out of Home and Obesity (PANACEA) project. DiOGenes is a pan-European study within the EU Sixth Framework Programme for Research and Technological Development (2005-2009) (FOOD-CT-2005-513946, http://www.diogenes.eu.org). PANACEA received funding from the EU in the framework of the Public Health Programme (project 2005328). This work was further supported by the European Commission: Public Health and Consumer Protection Directorate 1993–2004, the Research Directorate-General 2005, the Ligue contre le Cancer, the Societé 3M, the Mutuelle Générale de l'Education Nationale, and the Institut National de la Santé et de la Recherche Médicale; German Cancer Aid, the German Cancer Research Center, and the Federal Ministry of Education and Research (Germany); the Danish Cancer Society (Denmark); Health Research Fund (FIS) of the Spanish Ministry of Health RTICC 'Red Temática de Investigación Cooperativa en Cáncer (grant number C03/10, R06/0020); the participating regional governments and institutions of Spain; Cancer Research United Kingdom, the Medical Research Council, the Stroke Association, the British Heart Foundation, the Department of Health, the Food Standards Agency, and the Wellcome Trust (United Kingdom); the Italian Association for Research on Cancer and the National Research Council (Italy); the Dutch Ministry of Public Health, Welfare and Sports, the Dutch Ministry of Health, Dutch Prevention Funds, LK Research Funds, the Dutch Zorg Onderzoek Nederland, and the World Cancer Research Fund (Netherlands); the Swedish Cancer Society, the Swedish Scientific Council, and the Regional Government of Skane. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Excess body weight is increasingly recognized as an important public health threat worldwide. In Europe, 30–80% of adults are overweight (Body Mass Index (BMI) ≥25 kg/m²) and among them up to 36% are classified as obese (BMI ≥30). Overwhelming evidence suggests that excess body weight is associated with higher risks for numerous chronic diseases. However, not only body weight status per se, but also gain in body weight, irrespective of initial BMI, has been associated with many metabolic abnormalities, subsequently conveying an increased mortality risk. A Danish study further suggests that weight gain up to the obese level is related to higher risks of impaired glucose tolerance than maintaining weight at the obese level since the beginning of adult life.
Given there is no level of safe weight gain, strategies for primary prevention are urgently needed. Even though excess weight is in principle a matter of energy balance, susceptibility to weight gain appears to be determined by a complex interaction between genetic, environmental, socio-economic, cultural and behavioral factors . Much emphasis has traditionally been devoted to the identification of single risk factors that etiologically relate to weight gain or the development of overweight/obesity; however, understanding the combined effects of these risk factors and/or their marker variables is fundamental in order to identify priorities for public health efforts. Additionally, in view of limited resources, prevention efforts may be targeted specifically to those individuals who are at highest risk for gaining substantial amounts of weight and hence associated health risks, and thus – in theory – might benefit most from prevention programs.
In recent years, prediction models to identify high-risk individuals have been proposed for several obesity-related diseases, including cardiovascular disease, type 2 diabetes, and cancer. In the present study, we therefore aimed to develop a risk score predicting risk of substantial weight gain (SWG) within the following 5 years among primarily non-obese adults. Because this objective was addressed using data of the multi-center European Prospective Investigation into Cancer and Nutrition (EPIC), the present study additionally offered the unique opportunity to simultaneously investigate the suitability of one universal, trans-European prediction model for SWG.
Materials and Methods
The EPIC study is a multi-center prospective study designed primarily to investigate the relationship between diet, lifestyle and genetic factors and incidence of cancer. Briefly, between 1992 and 2000, a total of 521 330 men and women, aged 25–70 years, were recruited in 23 centers and regions in 10 European countries: Denmark, Sweden, Norway, the United Kingdom, France, Germany, The Netherlands, Spain, Italy and Greece. In the majority of centers, participants were invited from the general population. Exceptions were the French cohort (based on members of the health insurance for teachers), the cohorts in Utrecht (The Netherlands) and Florence (Italy), which are based on women attending local population-based breast cancer screening programmes, components of the Italian and Spanish cohorts (including members of local blood donor associations), and most of the Oxford (UK) cohort (comprising health-conscious subjects, mainly vegetarians). In France, Norway, Utrecht, and Naples only women were recruited. Approval for this study was obtained from the ethical review boards of the International Agency for Research on Cancer (IARC) and from all local institutions where subjects had been recruited for the EPIC study: the Florence Health Authority Ethical Committee (Italy); the Norfolk Local Research Ethics Committee (UK); the Medical Ethics Committee of the Netherlands Organization for Applied Scientific Research (the Netherlands); the Ethics Committee of the Medical Association of the State of Brandenburg (Germany); and the Danish National Committee on Biomedical Research Ethics (Denmark). Written informed consent was obtained from all participants before joining the EPIC study.
The prediction model was derived based on data from 6 EPIC centers from 5 countries which participated in the Diet, Obesity and Genes (DiOGenes) project, namely: the United Kingdom (UK-Norfolk), the Netherlands [(NL-Doetinchem and NL-Amsterdam/Maastricht); two separate centers because of differences in follow-up assessment of anthropometry], Italy (IT-Florence), Germany (GER-Potsdam), and Denmark (DK-Copenhagen/Aarhus). Subsequently, this model was externally validated in nine remaining EPIC centers.
From the 146 543 initial participants in the derivation sample , data of 53 758 participants were finally used to guide model development (for flow-chart of exclusions see Figure 1). Briefly, exclusions refer to pregnancy and individuals with an extreme ratio between energy intake and energy requirement, to participants who provided no or unrealistic information on anthropometrics at either baseline or follow-up or who reported prevalent CVD, diabetes or cancer at baseline. Additionally, to maintain the same age range in all centers and to minimize confounding from changes in body composition and shape occurring in older age  or from undiagnosed chronic disease, the present study was restricted to participants aged ≥35 years at baseline and <65 years at time of the second weight assessment. Finally, the present study was restricted to non-obese individuals (BMI<30) at baseline. After applying the same exclusions and further excluding individuals with missing data in any candidate predictor, the final validation sample consisted of 130 446 men and women, stemming from nine EPIC centers not included in the derivation sample. The centers of Norway and Varese (Italy) were excluded from the validation sample due to missing information on physical activity.
1 No follow-up questionnaire (e.g. due to death before follow-up body weight assessment, not yet approached for follow-up body weight assessment, emigration or non-response to invitation).
2 Pregnant at baseline or follow-up.
3 10% missing items on FFQ.
4 Ratio of energy intake (EI) to energy expenditure (EE) estimated from predicted resting energy expenditure.
5 Missing data on baseline or follow-up weight, waist or height, missing follow-up time.
6 Baseline height <130 cm, BMI <16 kg/m², 0< waist <40 cm, waist >160 cm, follow-up weight >700 kg. Combination of waist <60 cm and BMI >25 kg/m².
7 Annual weight change >5 kg (either direction) or annual waist change >7 cm (either direction).
8 Baseline cancer, diabetes or cardiovascular disease.
9 In contrast to the derivation of the model, where it is important to obtain unbiased estimates of relative risk, we think only original data should be used in the validation sample and we therefore excluded individuals with missing values.
Dietary and lifestyle assessment
Dietary intake was assessed at baseline by means of validated country-specific dietary questionnaires that were designed to capture local dietary habits and to provide high compliance. Participants were asked to report their average consumption of each food item over the past year. Food intake (gram/day, g/d) was calculated by multiplying the frequency of intake by portion size. The validity and reproducibility of the dietary questionnaires have been shown to be generally good. Information on lifestyle factors was collected by questionnaire and/or face-to-face interview at baseline, including questions on highest level of education, occupational physical activity, sports activity, consumption of alcoholic beverages, and tobacco smoking.
Assessment of anthropometric measures
For each individual, two measures of body weight were available: one at baseline and one at follow-up. In most centers, height and weight were measured at baseline by trained personnel according to standardized procedures . Body weight was corrected to reduce heterogeneity due to protocol differences in clothing worn during measurement by subtracting 1.5 kg in those individuals who were normally dressed and 1 kg in those participants who wore light clothing . In the centers of France and Norway only self-reported anthropometric values were collected. For part of the Oxford center, linear regression models were used to predict sex- and age-specific values from individuals with both measured and self-reported weight (referred to as Oxford prediction equations) . At follow-up, body weight was measured by trained staff in UK-Norfolk and NL-Doetinchem following the same protocol as during baseline measurements, while participants in all other centers measured their weight at home according to guidance provided. The accuracy of these self-reported weights was improved by using the Oxford prediction equations.
Definition of case status
SWG was defined as gaining ≥10% of baseline weight during follow-up. This threshold was chosen for two reasons. First, it was considered major weight gain in relation to the time horizon of the prediction comprising 5 years. Second, it seems high enough to exclude random variation in body weight while simultaneously allowing for some weight gain as natural part of the aging process. Follow-up time, i.e. time between first and second weight assessment, varied considerably between individuals in EPIC (range: 1.2–12.4 years). To best account for these varying follow-up times and to additionally consider the velocity of weight gain, we used methods of survival analysis for statistical analysis. Thus, individual follow-up times (either time to SWG or difference between first and second weight assessment) were used in the proportional hazards model to estimate relative risks, and the results were combined with the baseline survivor function estimated at t = 5 years to provide estimates of 5-year absolute risk as described in detail below. Hence, each participant was followed for incidence of SWG from study entry to the second assessment of body weight (end of follow-up). Those subjects not experiencing SWG during follow-up were censored at time of their second weight assessment and participants experiencing SWG constituted the set of cases. Because it was only possible to determine case status at the time of the second weight assessment, the exact time needed to experience SWG was unknown for the cases. We therefore estimated the time theoretically needed for the cases to cross the threshold of ≥10% baseline-based weight gain by assuming linear weight gain.
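The case definition and the linear interpolation of event time described above can be sketched as follows. This is a minimal illustration under the stated linear-weight-gain assumption; the function name `swg_event` and its argument names are hypothetical and do not come from the study's SAS code.

```python
def swg_event(baseline_kg, followup_kg, followup_years, threshold=0.10):
    """Return (is_case, time_years) for one participant.

    Assumes weight changes linearly between the two assessments, so the
    time at which a case crossed the threshold is a proportional share
    of the observed follow-up time.
    """
    rel_gain = (followup_kg - baseline_kg) / baseline_kg
    if rel_gain >= threshold:
        # Case: interpolate the time needed to reach the threshold gain.
        return True, followup_years * threshold / rel_gain
    # Non-case: censored at the second weight assessment.
    return False, followup_years


# Hypothetical participant: 70 kg at baseline, 84 kg after 8 years (20% gain).
is_case, t = swg_event(70.0, 84.0, 8.0)
print(is_case, round(t, 2))
```

A participant who gained 20% over 8 years is thus treated as having crossed the 10% threshold halfway through follow-up, which is what allows cases with follow-up longer than 5 years to contribute an event time inside or outside the 5-year horizon.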
Potential predictor variables
Selection of candidate predictors was primarily based on observed associations with weight change in previous analyses of EPIC and on reported or hypothesized associations in the literature. A total of 21 characteristics were included in the prediction model as candidate predictors. Specifically, we selected standard socio-demographic characteristics, including age, sex, and education, as well as lifestyle factors, namely physical activity (occupational and sports activity), alcohol consumption, and smoking status. In view of practical feasibility, selection of dietary factors was restricted to main food items. In accordance with previous EPIC analyses, intake of fruits and vegetables, meat, bread as an indicator of dietary fiber intake, complemented by consumption of fish, vegetable oil and dairy products as components of the Mediterranean diet, were selected as potential predictors. Additionally, we included intake of butter and margarine, chocolate, cake and cookies, and soft drinks as candidate predictors due to their high energy density and results on health from previous studies.
Risk prediction model building
Candidate predictors were entered into a proportional hazards model in a stepwise forward model selection process with 0.1 as pre-specified p-values for entering and staying in the model as recommended by Parmer et al. . Interaction terms were not included to keep the model parsimonious and easy to use. To account for heterogeneity between centers due to differences in questionnaire design, follow-up procedures, and other non-measured center effects, stepwise model selection was conducted separately by center. Variables statistically significantly associated with SWG in the same direction in at least two centers and not in the opposite direction were retained as predictors for the final model. Center-specific regression coefficients were obtained for all retained predictor variables by fitting them into a common center-specific model and random-effects meta-analysis was used to calculate combined estimates. Score points (weights) for each predictor were assigned based on the value of the corresponding pooled β-coefficients multiplied by 100 and rounded to two decimal places. For each individual, a risk score was computed as a linear combination of the weighted predictors. The score was rescaled by adding 500 to avoid negative values in descriptive analyses. The probability of experiencing SWG within the following 5 years was finally calculated by inserting the individual risk score points into the survival function obtained from the proportional hazards model. For this, the baseline survival probability at 5 years, i.e. the probability of not developing SWG within 5 years, was estimated separately by center using the average value of each predictor over all individuals in the derivation sample. These center-specific values were again pooled using random-effects meta-analysis.
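The pooling and scoring steps described above can be sketched in a few lines. The text does not name the random-effects estimator, so the DerSimonian-Laird method used here is an assumption, and the function names, predictor names and coefficient values in the usage example are hypothetical. The factor of 100, the rounding and the offset of 500 follow the description in the text.

```python
import math


def pool_random_effects(betas, ses):
    """DerSimonian-Laird random-effects pooling of center-specific
    coefficients (assumed estimator; not specified in the text)."""
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-center variance, truncated at zero.
    tau2 = max(0.0, (q - (len(betas) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * b for wi, b in zip(w_star, betas)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star))


def score_points(pooled_beta):
    """Pooled coefficient multiplied by 100, rounded to two decimals."""
    return round(pooled_beta * 100, 2)


def risk_score(points, values, offset=500):
    """Linear combination of weighted predictors plus the rescaling offset."""
    return sum(points[name] * values[name] for name in points) + offset


# Hypothetical coefficients from two centers for one predictor.
beta, se = pool_random_effects([0.0, 1.0], [0.1, 0.1])
print(beta, se)
```

When the center-specific estimates agree, the between-center variance is estimated as zero and the result reduces to the usual inverse-variance (fixed-effect) average.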
Because missing data may be associated with bias in estimates of regression coefficients which were used for constructing the risk score, we used multiple imputation techniques in the derivation population. Briefly, in multiple imputation missing data are replaced by several plausible values sampled from their predictive distribution based on the observed data by creating multiple copies of the original data set. Standard statistical methods are then applied to each imputed data set and the results are finally combined by appropriately accounting for the uncertainty about missing data. We used 20 imputation cycles and selection of predictors was performed for each center and separately by imputation data set. As described by Vergouwe et al., predictors that were significantly associated with SWG in at least 50% of the imputed data sets in each center were retained as center-specific predictors from which the final set of predictors was selected as described above.
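The two-stage retention rule (at least 50% of imputed data sets within a center, then the same direction in at least two centers and the opposite direction in none) can be illustrated with a short sketch. The data structures and function names are hypothetical; each stepwise selection result is represented as a dict mapping a predictor name to the sign of its coefficient (+1 or -1).

```python
from collections import Counter


def center_predictors(selected_per_imputation, min_share=0.5):
    """Keep predictors selected with the same sign in at least
    `min_share` of one center's imputed data sets."""
    counts = Counter()
    for selected in selected_per_imputation:
        counts.update(selected.items())  # counts (name, sign) pairs
    n = len(selected_per_imputation)
    return {name: sign for (name, sign), c in counts.items()
            if c / n >= min_share}


def final_predictors(center_results, min_centers=2):
    """Keep predictors retained in >= min_centers centers with one
    consistent direction and never in the opposite direction."""
    signs = {}
    for result in center_results:
        for name, sign in result.items():
            signs.setdefault(name, []).append(sign)
    return sorted(name for name, s in signs.items()
                  if len(set(s)) == 1 and len(s) >= min_centers)


# Hypothetical selection results for three centers.
centers = [{"age": -1, "smoking": 1}, {"age": -1, "fish": 1}, {"smoking": -1}]
print(final_predictors(centers))
```

In this toy example only age survives: smoking points in opposite directions in two centers, and fish is selected in a single center.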
Evaluation of the risk score's predictive performance
The predictive performance of the risk score was evaluated by means of discrimination and calibration in the derivation sample (internal validation) and in the independent EPIC centers (external validation). Discrimination was quantified by the c index developed for survival analysis which describes the model's ability to distinguish between persons with longer event-free survival and those with shorter event-free survival within a given time horizon. The c index ranges from a minimum of 0.5 (no discriminatory accuracy) to a theoretical maximum of 1.0 (perfect discrimination).
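As an illustration of what a survival c index measures, the sketch below computes Harrell's concordance for censored data. The text does not state which variant of the c index was used, so Harrell's estimator is an assumption, and the function name and toy data are hypothetical.

```python
def harrell_c(times, events, scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is usable when subject i has an observed event
    strictly before subject j's event or censoring time. The pair is
    concordant when the subject with the earlier event has the higher
    risk score; tied scores count one half.
    """
    concordant = ties = usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    ties += 1
    return (concordant + 0.5 * ties) / usable


# Toy data: higher score goes with earlier event, so concordance is perfect.
print(harrell_c([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1]))
```

The quadratic loop is only suitable for small examples; the function also raises a division error when no usable pair exists.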
To define an appropriate cut-off point for the continuous risk score for discrimination between high-risk and low-risk individuals, Youden's index, a simple measure that identifies the cut-off value at which the sum of sensitivity and specificity is maximized across a range of possible cut-off values, was used. It is defined as J = sensitivity + specificity − 1 and ranges from 0 to 1, with 1 implying perfect separation of diseased and non-diseased by the continuous marker.
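The cut-off search can be written directly from the definition of J. The following is a minimal sketch with hypothetical names and toy data; individuals with a score at or above the cut-off are classified as high-risk, matching the "≥" convention used for the score.

```python
def youden_cutoff(scores, is_case, cutoffs):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1."""
    n_cases = sum(1 for y in is_case if y)
    n_noncases = len(is_case) - n_cases
    best = None
    for c in cutoffs:
        true_pos = sum(1 for s, y in zip(scores, is_case) if y and s >= c)
        true_neg = sum(1 for s, y in zip(scores, is_case) if not y and s < c)
        j = true_pos / n_cases + true_neg / n_noncases - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best


# Toy data: two non-cases with low scores, two cases with high scores.
print(youden_cutoff([100, 150, 210, 250], [0, 0, 1, 1], [150, 200, 250]))
```

When several cut-offs share the same J, this sketch keeps the first one encountered, so the order of `cutoffs` matters in that case.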
Calibration, as a measure of how reliable the predictions are, was evaluated by using a modified version of the Hosmer-Lemeshow test for survival analysis introduced by D'Agostino and Nam. For this purpose, the observed probabilities of SWG at 5 years estimated by the Kaplan-Meier approach were compared with the average predicted probabilities across tenths of predicted risk, which were also plotted for visualization.
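A rough sketch of this comparison is given below: participants are grouped by predicted risk, the observed 5-year risk in each group is estimated by Kaplan-Meier, and the squared differences are summed into a chi-square-type statistic. The form of the statistic is our reading of the D'Agostino-Nam approach and should be treated as an assumption; the function names are hypothetical, and the sketch stops short of computing a p-value.

```python
def km_event_prob(times, events, horizon):
    """Kaplan-Meier estimate of the probability of an event by `horizon`."""
    surv = 1.0
    at_risk = len(times)
    # At tied times, events are processed before censorings.
    for t, e in sorted(zip(times, events), key=lambda p: (p[0], not p[1])):
        if t > horizon:
            break
        if e:
            surv *= 1.0 - 1.0 / at_risk
        at_risk -= 1
    return 1.0 - surv


def calibration_chi2(pred, times, events, horizon=5.0, groups=10):
    """Sum over risk groups of n * (observed - predicted)^2 / (p * (1 - p))."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    chi2 = 0.0
    for g in range(groups):
        idx = order[g * len(order) // groups:(g + 1) * len(order) // groups]
        p_bar = sum(pred[i] for i in idx) / len(idx)
        km = km_event_prob([times[i] for i in idx],
                           [events[i] for i in idx], horizon)
        chi2 += len(idx) * (km - p_bar) ** 2 / (p_bar * (1.0 - p_bar))
    return chi2


# Toy data: two events and two censorings within 5 years.
print(km_event_prob([1, 2, 3, 4], [1, 0, 1, 0], 5.0))
```

The sketch requires at least as many participants as groups and predicted probabilities strictly between 0 and 1.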
Statistical analyses were performed using SAS (Statistical Analysis System, version 9.2; SAS Institute Inc, Cary, NC).
Among 53 758 men and women in the derivation population, a total of 7 431 individuals gained ≥10% of baseline weight during a median follow-up of 5.4 years, amounting to 329 685 person-years (PY). In the validation sample, 14 622 participants experienced SWG during a median follow-up of 3.7 years (525 749 PY). General characteristics for each center of the derivation sample and the total validation population are presented in Table 1. In the derivation population, mean age at baseline was 50.2 years. Mean follow-up time differed considerably between centers, ranging from 3.6 years in UK-Norfolk to 8.8 years in IT-Florence. On average, individuals gained 3.8% of their baseline weight during follow-up, representing a mean annual proportion of baseline-based weight gain of 0.6%. This implies that individuals would need on average 16.7 years to gain 10% of their baseline weight. Due to the all-women centers of France, IT-Naples and NL-Utrecht, the proportion of men was substantially lower in the validation sample in comparison to the derivation set (21.5 vs. 41.2%). Mean annual weight gain was higher in the validation than in the derivation sample (521 g/y vs. 395 g/y), which may be explained by the shorter duration of follow-up in the validation sample and by the fact that weight fluctuations are higher over shorter periods of time.
The pooled estimates of relative risk for the association of included predictors with risk of SWG and corresponding score points assigned to each predictor are presented in Table 2. The pooled estimate of the background probability of avoiding SWG (analogous to ‘survival’) at 5 years estimated at average values of the predictors was 0.9331, implying that under average conditions about 93% of the population will stay free of SWG while 7% will experience SWG within 5 years. For each participant, the probability of SWG during the next 5 years [P(SWG,5y)] was calculated by inserting the individual's risk score into the following survival function while correcting for the averages of the participants’ risk factors:
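The survival function itself is not reproduced in this text. As a hedged sketch, a proportional hazards model with a pooled baseline survival of 0.9331 and score points equal to 100 times the coefficients would give the standard form below. The mean score of the derivation sample (`mean_score`) is not stated in the text and must be supplied; the value of 206 used in the example is our own back-calculation from the reported probabilities and is an assumption, not a figure from the source.

```python
import math

S0_5Y = 0.9331  # pooled baseline probability of avoiding SWG at 5 years (from the text)


def p_swg_5y(score, mean_score, s0=S0_5Y, points_per_unit=100.0):
    """5-year probability of SWG: 1 - S0 ** exp((score - mean) / 100).

    `mean_score` centers the score at the average risk profile;
    `points_per_unit` undoes the multiplication of coefficients by 100.
    """
    linear_predictor = (score - mean_score) / points_per_unit
    return 1.0 - s0 ** math.exp(linear_predictor)


# Assumed mean score of 206 (back-calculated, not from the source).
for points in (100, 200, 300):
    print(points, round(100 * p_swg_5y(points, 206.0), 1))
```

At the mean score the function returns 1 − 0.9331, i.e. the roughly 7% average 5-year risk quoted above, and with the assumed mean it approximately reproduces the probabilities reported for the individual score values.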
The probability of experiencing SWG within the following 5 years for 100, 150, 200, 250, 300, 350, and 400 score points was 2.4, 3.9, 6.3, 10.2, 16.3, 25.4, and 38.3%, respectively. The discriminatory ability of the model measured by the c index (95% CI) was 0.64 (0.63–0.65) in the derivation sample. This means that individuals who experienced SWG during 5 years had higher predicted risks than persons not experiencing SWG in 64% of the cases. The discriminatory accuracy showed some variation across centers, with c indexes (95% CI) ranging from 0.64 (0.62–0.65) in DK-Copenhagen/Aarhus to 0.71 (0.68–0.75) in NL-Amsterdam/Maastricht. The overall discriminatory accuracy in the validation sample was 0.57 (0.56–0.58). As in the derivation sample, it differed across single centers, varying between 0.56 (0.55–0.57) in France and 0.67 (0.64–0.71) in IT-Naples. In addition to between-center differences, the score generally performed better among men than women (Table 3), while the additional inclusion of menopausal status at recruitment did not affect the observed discriminatory accuracy in women across centers (data not shown).
Information on sensitivity, specificity and predictive values according to various cut-off points of the score in the derivation sample suggested a threshold of ≥200 points as the optimal cut-off value to define high-risk individuals (Youden's index, J = 0.208) (Table 3). This threshold captured 74% of the cases who experienced SWG. Furthermore, 46% of the persons who did not experience SWG had a score <200. The corresponding positive and negative predictive values were 9% and 96%, respectively.
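The relation between these figures can be checked with Bayes' rule. The sketch below takes the reported sensitivity (74%) and specificity (46%) and, as an assumption, uses the pooled average 5-year risk of SWG (1 − 0.9331) as the prevalence; the text does not say which prevalence underlies the reported predictive values, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Reported values at the >=200-point cut-off; prevalence is an assumption.
ppv, npv = predictive_values(0.74, 0.46, 1.0 - 0.9331)
print(round(ppv, 2), round(npv, 2))
```

Under this assumption the rounded sensitivity and specificity give J ≈ 0.20 and predictive values in line with those reported, which shows how a low outcome prevalence yields a low positive predictive value despite a sensitivity of 74%.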
The estimated probability of experiencing SWG during 5 years agreed very well with the observed proportion of incident cases across tenths of predicted risk in the derivation sample although there was a slight overestimation of risk in the highest and lowest tenths of risk (Figure 2a, p = 0.02). In the total validation population, the score was also able to adequately quantify absolute risk, though comparison of observed and predicted risk implied a slight overestimation of risk in the lower and upper range of the score values and a slight underestimation in the middle range of the score (Figure 2b, p<.001). Inspection of calibration plots for each validation center indicated good calibration for the centers of Greece, UK-Health Conscious, UK-General Population and NL-Utrecht and adequate calibration in France and SWE-Malmoe (data not shown). In GER-Heidelberg we observed a systematic overestimation of risk, while in Spain calibration was poor, but no clear pattern of miscalibration was found.
Corresponding ranges of points for tenths in the derivation sample were <145, 145–<165, 165–<181, 181–<194, 194–<206, 206–<218, 218–<231, 231–<246, 246–<267, and ≥267. P for calibration = 0.02. Corresponding ranges of points for tenths in the validation sample were <162, 162–<185, 185–<200, 200–<212, 212–<223, 223–<234, 234–<246, 246–<259, 259–<280, and ≥280. P for calibration <0.001.
The use of center-specific weights led to a marked improvement in discriminatory accuracy in the validation centers of France, Spain, Greece and GER-Heidelberg (Table 4). When center-specific risk scores were developed (based on center-specific selection strategy), model performance remained essentially unchanged in comparison to the re-estimated model for all centers. The only exception was France for which discrimination improved from 0.61 (0.60–0.62) to 0.65 (0.63–0.67). Calibration generally improved or remained unchanged in the re-estimated model across validation centers (data not shown). Exceptions were the centers of GER-Heidelberg where risk remained continuously overestimated and Spain showing over- and underestimations of risk. Even in center-specific models, agreement between observed and predicted risk did not improve for those two centers.
In this large multi-center prospective study of middle-aged European men and women, a risk score based on numerous easily assessable socio-demographic, dietary and lifestyle factors was found to exhibit moderate discriminatory accuracy and ability to accurately predict risk of experiencing SWG during the following 5 years.
Major strengths of the present study are its prospective design, its large sample size, the availability of information on a large number of risk factors for weight gain, the use of multiple imputation techniques to avoid potential bias in derivation of the score and the validation of the risk score in several independent, culturally diverse study populations.
Some methodological limitations need to be considered. At follow-up, most participants provided self-measured weight. However, we tried to correct for potential underreporting by applying prediction equations. Further, only two measurements of body weight were available for each individual and weight gain was considered linear, which is a strong assumption about the course of weight gain. Weight gain is reversible, and it is well known that body weight tends to fluctuate over time, which may lead to repeated cycles of weight loss and recovery that are not reflected in a two-point-in-time measurement. Fluctuations or non-linear weight gain in general may have resulted in misclassification of cases and non-cases and additionally in misspecification of the cases' time to event, which might have limited the performance of the obtained risk score model. Nevertheless, recent findings from the EPIC-Potsdam study based on 5 measurements of weight suggest that weight gain can be reasonably well approximated by a straight line over a follow-up period of 8 years on the population level.
Directions of associations with SWG for some predictors in our model may be difficult to explain on a causal basis. It has to be kept in mind though that, in contrast to etiological studies trying to explain the cause of a disease, a prediction model aims to develop a good predictor to enable accurate predictions of the outcome . Thus, predictors in a prediction model do not necessarily need to be well-established etiological factors with a strong biological background. They could also be a marker of other lifestyle factors which influence mechanisms that are implicated in the regulation of body weight. Thus, caution may be warranted to avoid misinterpretation of the identified predictors in terms of driving weight gain. Regarding the positive association of baseline smoking with SWG, for example, we explored in a sub-analysis that this relation was driven by the strong weight-increasing effect of smoking cessation during follow-up, while continuous smoking was not related to a higher risk of SWG compared to non-smoking. This finding may be kept in mind when interpreting the results and emphasizes that weight management is warranted among individuals who attempt to quit smoking. Nevertheless, because future changes in smoking habits are unknown at the time of prediction, only baseline variables were included in the prediction model.
The discriminatory ability of the score was generally low to modest, which may be explained by lack of information on some predictors in this analysis. Specifically, weight loss attempts, weight cycling and large short-term weight changes have been shown to determine future weight change. However, to obtain this type of information, a closer contact between participants and study personnel is required and assessment of this information in all centers of such a large study is challenging. Also, although recent weight history may predict weight change in the near future, it is currently unknown whether this information is a strong factor to predict weight change over longer periods, e.g. 5 years.
In the field of chronic diseases, hopes have been raised that information on common genetic markers may be used to improve discriminatory accuracy beyond non-invasive factors and biochemical measures. The predictive ability of genetic factors, however, currently appears limited. For example, the addition of seven SNPs to the breast cancer model developed by Gail et al. only modestly improved discriminatory accuracy. Similarly, the additional inclusion of 20 diabetogenic SNPs barely improved discrimination of incident type 2 diabetes beyond lifestyle factors and metabolic markers in the EPIC-Potsdam cohort. With respect to obesity, the EPIC-Norfolk study reported that 12 obesity-susceptibility loci explained 0.9% of variation in BMI, with a c index of 0.57 for prediction of obesity. Thus, despite overwhelming statistical significances and repeated replications, the explained variance and the predictive value of the currently identified obesity-susceptibility loci are low and a considerable improvement of the model's accuracy due to inclusion of genetic markers appears unlikely. Additionally, it should be noted that very large independent relative risks are needed for a single predictor to meaningfully improve discrimination.
The discriminatory ability of the present risk score was reduced in the external validation sample, an observation that is also commonly reported for external validation studies in the field of chronic diseases , . Several reasons may be thought of to explain this phenomenon. First, overfitting of the model in the derivation sample may be responsible for the poorer performance in the validation sample; however, given that the sample size of the development sample was large and that the amount of optimism decreases with larger sample size , this explanation appears unlikely. Second, lower predictive accuracy in external populations may be due to differences between the derivation and validation population, especially with regard to methods of data collection, coding of predictors and endpoint, and the availability of all variables used to construct the score . However, given the standardised methodology followed in EPIC, this explanation also seems rather unlikely. To account for the fact that some validation centers were sampled from specific groups rather than the general population, e.g. France, IT-Naples, NL-Utrecht and UK-Oxford, which may affect the model's performance, we excluded those centers in sensitivity analyses. Nevertheless, the overall discriminatory accuracy did barely change (0.59 (0.58–0.60)). Interestingly, apart from the overall difference in predictive ability between derivation and validation sample, there was considerable variation in discrimination across single cohorts of the derivation and validation sample, respectively. Specifically, discriminatory power ranged from 0.64 in UK-Cop./Aarhus to 0.76 in NL-AmMa in the derivation set and varied between 0.56 (France) and 0.67 in IT-Naples in the validation set. It is further noteworthy that a comparable predictive accuracy was exhibited among centers of similar socio-cultural background in the derivation and validation sample (e.g. in Denmark and Sweden, in Potsdam and Heidelberg). 
This suggests that the prediction of weight gain depends on underlying socio-cultural factors that the predictors included in the present model did not capture equally well across the trans-European study populations.
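To make the discrimination figures discussed above more concrete, the following minimal sketch computes a c index for a binary outcome as the share of case/non-case pairs in which the case has the higher score (ties count one half). This is a simplified illustration, not the procedure used in this study: the survival-analysis version of the c index additionally has to handle censored follow-up. The function name `c_index_binary` and the toy scores are hypothetical.

```python
def c_index_binary(scores, outcomes):
    """Share of (case, non-case) pairs ranked correctly by the score.

    scores   -- risk score per individual (higher = higher predicted risk)
    outcomes -- 1 if the individual experienced the event, else 0
    Assumption: binary outcome without censoring (a simplification).
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_score in cases:
        for other_score in non_cases:
            if case_score > other_score:
                concordant += 1.0   # case correctly ranked above non-case
            elif case_score == other_score:
                concordant += 0.5   # tie counts as half a concordant pair
    return concordant / (len(cases) * len(non_cases))


# Hypothetical toy data: six individuals, two of whom gained weight.
toy_scores = [150, 180, 210, 230, 250, 190]
toy_outcomes = [0, 0, 1, 0, 1, 0]
print(c_index_binary(toy_scores, toy_outcomes))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, which puts the values between 0.56 and 0.76 reported for the single centers into perspective.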
The risk score adequately estimated risk in the total validation sample, while calibration was poor in some of the centers. It has been suggested that adjusting or re-calibrating the score to the local circumstances of external populations may increase predictive performance. In the present study, re-estimation of the regression coefficients slightly improved calibration in most validation centers, except for GER-Heidelberg and Spain, in which calibration remained poor even in center-specific models. An explanation for this finding may be the considerably shorter follow-up time in these two centers. While our prediction model was tailored to a time period of 5 years, average observed follow-up times in GER-Heidelberg and Spain were 2.1 and 3.3 years, respectively. Unfortunately, we did not have access to more recent data to investigate this issue further. Discriminatory ability markedly improved in four of the nine validation centers in the re-estimated model, whereas center-specific models generally did not lead to further improvements in discriminatory ability. The only exception was France, for which a population-specific model yielded a c index of 0.65 (0.63–0.67) compared to 0.61 (0.60–0.62) in the re-estimated model and 0.56 (0.55–0.57) in the overall model.
Despite the observed improvements in discrimination after re-estimation of the parameters, the performance measures were generally moderate. Although we cannot rule out that important, possibly population-specific, predictors were not assessed in this study, our findings, based on a wide range of predictors and several culturally diverse study populations, suggest that the predictability of weight gain from data collected in large population-based studies may be limited in general.
Test characteristics of the risk score also challenge its practical implementation in prevention programs. The optimal cut-off value to define high-risk individuals was ≥200 points, which implies that preventive actions would be indicated for a substantial part of the population (55%). Of these high-risk individuals, 9% will indeed experience SWG within 5 years. On the other hand, 96% of the individuals with a score <200 will indeed not develop SWG. It is of note that the optimal cut-off point was defined, by way of example, using Youden's index, which treats sensitivity and specificity as equally important. This, however, might not hold in practice. When implementing a risk score in practice, the choice of a cut-off value should depend on the importance attached to false positives and false negatives, accounting for misclassification costs.
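The figures in the preceding paragraph can be combined in a short worked calculation. The sketch below backs out the approximate prevalence of SWG, sensitivity, specificity and Youden's index (J = sensitivity + specificity − 1) from the share of individuals flagged as high risk (55%) and the two predictive values (9% and 96%). Because these inputs are rounded, the results are rough approximations rather than the values estimated in the study; the function name `implied_test_characteristics` is hypothetical.

```python
def implied_test_characteristics(flagged_share, ppv, npv):
    """Derive 2x2-table proportions from the share flagged and PPV/NPV.

    flagged_share -- proportion of the population at or above the cut-off
    ppv           -- proportion of flagged individuals who develop the outcome
    npv           -- proportion of non-flagged individuals who do not
    All returned values are proportions between 0 and 1.
    """
    true_pos = flagged_share * ppv
    false_pos = flagged_share * (1 - ppv)
    true_neg = (1 - flagged_share) * npv
    false_neg = (1 - flagged_share) * (1 - npv)

    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return {
        "prevalence": true_pos + false_neg,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Youden's index weights sensitivity and specificity equally.
        "youden_j": sensitivity + specificity - 1,
    }


# Rounded inputs taken from the text: 55% flagged, PPV 9%, NPV 96%.
result = implied_test_characteristics(0.55, 0.09, 0.96)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these rounded inputs, roughly 7% of individuals develop SWG, about 73% of them are flagged, and about 46% of those who do not develop SWG fall below the cut-off. This illustrates why the high negative predictive value largely reflects the low prevalence of the outcome.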
In conclusion, the present risk score was able to confidently exclude a large proportion of individuals from being at any appreciable risk of developing SWG within the next 5 years. Future studies, however, may attempt to further refine the positive prediction of the score, for example by considering additional predictors both in general and at the national level.
Special thanks go to Wolfgang Bernigau for his major support in revising this article.
Conceived and designed the experiments: AS TIAS SK PHMP HB. Performed the experiments: AS. Analyzed the data: AS. Wrote the paper: AS. Gave critical constructive feedback to the draft of the manuscript on data analysis, interpretation of results, and writing of the manuscript: NT MJS JMH JRQ EA MD BT KL HBB DvdA AM DP RT VK PV A. Trichopoulou PO DT BH PW KO JH A. Tjønneland GF LD FC KTK NW LM AMM.
- 1. Berghofer A, Pischon T, Reinhold T, Apovian CM, Sharma AM, et al. (2008) Obesity prevalence from a European perspective: a systematic review. BMC Public Health 8: 200.
- 2. Branca F, Nikogosian H, Lobstein T (2007) The Challenge of Obesity in the WHO European Region and the Strategies for Response. Copenhagen: WHO Regional Office for Europe.
- 3. Guh DP, Zhang W, Bansback N, Amarsi Z, Birmingham CL, et al. (2009) The incidence of co-morbidities related to obesity and overweight: a systematic review and meta-analysis. BMC Public Health 9: 88.
- 4. Fogarty AW, Glancy C, Jones S, Lewis SA, McKeever TM, et al. (2008) A prospective study of weight change and systemic inflammation over 9 y. Am J Clin Nutr 87: 30–35.
- 5. Juntunen M, Niskanen L, Saarelainen J, Tuppurainen M, Saarikoski S, et al. (2003) Changes in body weight and onset of hypertension in perimenopausal women. J Hum Hypertens 17: 775–779.
- 6. Norman JE, Bild D, Lewis CE, Liu K, West DS, et al. (2003) The impact of weight change on cardiovascular disease risk factors in young black and white adults: the CARDIA study. Int J Obes Relat Metab Disord 27: 369–376.
- 7. Hu FB, Willett WC, Li T, Stampfer MJ, Colditz GA, et al. (2004) Adiposity as compared with physical activity in predicting mortality among women. N Engl J Med 351: 2694–2703.
- 8. Black E, Holst C, Astrup A, Toubro S, Echwald S, et al. (2005) Long-term influences of body-weight changes, independent of the attained weight, on risk of impaired glucose tolerance and Type 2 diabetes. Diabet Med 22: 1199–1205.
- 9. World Health Organization (2000) Obesity: Preventing and Managing The Global Epidemic. Report of a WHO Consultation. Report No: 894. Geneva: WHO.
- 10. Cui J (2009) Overview of risk prediction models in cardiovascular disease research. Ann Epidemiol 19: 711–717.
- 11. Buijsse B, Simmons RK, Griffin SJ, Schulze MB (2011) Risk Assessment Tools for Identifying Individuals at Risk of Developing Type 2 Diabetes. Epidemiol Rev 33(1): 46–62.
- 12. Driver JA, Gaziano JM, Gelber RP, Lee IM, Buring JE, et al. (2007) Development of a risk score for colorectal cancer in men. Am J Med 120: 257–263.
- 13. Gail MH, Brinton LA, Byar DP, Corle DK, Green SB, et al. (1989) Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 81: 1879–1886.
- 14. Spitz MR, Hong WK, Amos CI, Wu X, Schabath MB, et al. (2007) A risk model for prediction of lung cancer. J Natl Cancer Inst 99: 715–726.
- 15. Riboli E, Kaaks R (1997) The EPIC Project: rationale and study design. European Prospective Investigation into Cancer and Nutrition. Int J Epidemiol 26 Suppl 1: S6–14.
- 16. Riboli E, Hunt KJ, Slimani N, Ferrari P, Norat T, et al. (2002) European Prospective Investigation into Cancer and Nutrition (EPIC): study populations and data collection. Public Health Nutr 5: 1113–1124.
- 17. Saris WH (2005) DiOGenes: an integrated multidisciplinary approach to the obesity problem in Europe. Nutrition Bulletin 30: 188–193.
- 18. Snijder MB, van Dam RM, Visser M, Seidell JC (2006) What aspects of body fat are particularly hazardous and how do we measure them? Int J Epidemiol 35: 83–92.
- 19. Kaaks R, Slimani N, Riboli E (1997) Pilot phase studies on the accuracy of dietary intake measurements in the EPIC project: overall evaluation of results. European Prospective Investigation into Cancer and Nutrition. Int J Epidemiol 26 Suppl 1: S26–36.
- 20. Margetts BM, Pietinen P (1997) European Prospective Investigation into Cancer and Nutrition: validity studies on dietary assessment methods. Int J Epidemiol 26 Suppl 1: S1–5.
- 21. Haftenberger M, Lahmann PH, Panico S, Gonzalez CA, Seidell JC, et al. (2002) Overweight, obesity and fat distribution in 50- to 64-year-old participants in the European Prospective Investigation into Cancer and Nutrition (EPIC). Public Health Nutr 5: 1147–1162.
- 22. Bergmann MM, Schutze M, Steffen A, Boeing H, Halkjaer J, et al. (2011) The association of lifetime alcohol use with measures of abdominal and general adiposity in a large-scale European cohort. Eur J Clin Nutr 65: 1079–1087.
- 23. Buijsse B, Feskens EJ, Schulze MB, Forouhi NG, Wareham NJ, et al. (2009) Fruit and vegetable intakes and subsequent changes in body weight in European populations: results from the project on Diet, Obesity, and Genes (DiOGenes). Am J Clin Nutr 90: 202–209.
- 24. Vergnaud AC, Norat T, Romaguera D, Mouw T, May AM, et al. (2010) Meat consumption and prospective weight change in participants of the EPIC-PANACEA study. Am J Clin Nutr 92: 398–407.
- 25. Du H, van der A D, Boshuizen HC, Forouhi NG, Wareham NJ, et al. (2010) Dietary fiber and subsequent changes in body weight and waist circumference in European men and women. Am J Clin Nutr 91: 329–336.
- 26. Romaguera D, Norat T, Vergnaud AC, Mouw T, May AM, et al. (2010) Mediterranean dietary patterns and prospective weight change in participants of the EPIC-PANACEA project. Am J Clin Nutr 92: 912–921.
- 27. Malik VS, Schulze MB, Hu FB (2006) Intake of sugar-sweetened beverages and weight gain: a systematic review. Am J Clin Nutr 84: 274–288.
- 28. Pala V, Krogh V, Berrino F, Sieri S, Grioni S, et al. (2009) Meat, eggs, dairy products, and risk of breast cancer in the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Am J Clin Nutr 90: 602–612.
- 29. Von Rüsten A (2012) Evaluation of food-based dietary guidelines of Germany concerning their potential of chronic disease prevention with suggestions for improvement: Results from the EPIC-Potsdam study (submitted) [PhD Thesis]. Berlin: Technical University.
- 30. Parmar MKB, Machin D (1995) Survival analysis – a practical approach. Chichester: John Wiley & Sons.
- 31. Donders AR, van der Heijden GJ, Stijnen T, Moons KG (2006) Review: a gentle introduction to imputation of missing values. J Clin Epidemiol 59: 1087–1091.
- 32. Sterne JA, White IR, Carlin JB, Spratt M, Royston P, et al. (2009) Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 338: b2393.
- 33. Vergouwe Y, Royston P, Moons KG, Altman DG (2010) Development and validation of a prediction model with missing predictor data: a practical approach. J Clin Epidemiol 63: 205–214.
- 34. D'Agostino RB, Nam BH (2004) Evaluation of the performance of survival analysis models: discrimination and calibration measures. Handbook of Statistics. New York: Elsevier. 1–15.
- 35. Pencina MJ, D'Agostino RB (2004) Overall C as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Stat Med 23: 2109–2123.
- 36. Youden WJ (1950) Index for rating diagnostic tests. Cancer 3: 32–35.
- 37. Bewick V, Cheek L, Ball J (2004) Statistics review 13: receiver operating characteristic curves. Crit Care 8: 508–512.
- 38. Colditz GA, Willett WC, Stampfer MJ, London SJ, Segal MR, et al. (1990) Patterns of weight change and their relation to diet in a cohort of healthy women. Am J Clin Nutr 51: 1100–1105.
- 39. Lahti-Koski M, Mannisto S, Pietinen P, Vartiainen E (2005) Prevalence of weight cycling and its relation to health indicators in Finland. Obes Res 13: 333–341.
- 40. Vergnaud AC, Bertrais S, Oppert JM, Maillard-Teyssier L, Galan P, et al. (2008) Weight fluctuations and risk for metabolic syndrome in an adult cohort. Int J Obes (Lond) 32: 315–321.
- 41. von Rüsten A, Steffen A, Floegel A, van der A DL, Masala G, et al. (2011) Trend in Obesity Prevalence in European Adult Cohort Populations during Follow-up since 1996 and Their Predictions to 2015. PLoS ONE 6.
- 42. Moons KG, Altman DG, Vergouwe Y, Royston P (2009) Prognosis and prognostic research: application and impact of prognostic models in clinical practice. BMJ 338: b606.
- 43. Korkeila M, Rissanen A, Kaprio J, Sorensen TI, Koskenvuo M (1999) Weight-loss attempts and risk of major weight gain: a prospective study in Finnish adults. Am J Clin Nutr 70: 965–975.
- 44. Field AE, Wing RR, Manson JE, Spiegelman DL, Willett WC (2001) Relationship of a large weight loss to long-term weight change among young and middle-aged US women. Int J Obes Relat Metab Disord 25: 1113–1121.
- 45. Kroke A, Liese AD, Schulz M, Bergmann MM, Klipstein-Grobusch K, et al. (2002) Recent weight changes and weight cycling as predictors of subsequent two year weight change in a middle-aged cohort. Int J Obes Relat Metab Disord 26: 403–409.
- 46. Gail MH (2008) Discriminatory accuracy from single-nucleotide polymorphisms in models to predict breast cancer risk. J Natl Cancer Inst 100: 1037–1041.
- 47. Schulze MB, Weikert C, Pischon T, Bergmann MM, Al-Hasani H, et al. (2009) Use of multiple metabolic and genetic markers to improve the prediction of type 2 diabetes: the EPIC-Potsdam Study. Diabetes Care 32: 2116–2119.
- 48. Li S, Zhao JH, Luan J, Luben RN, Rodwell SA, et al. (2010) Cumulative effects and predictive value of common obesity-susceptibility variants identified by genome-wide association studies. Am J Clin Nutr 91: 184–190.
- 49. Vimaleswaran KS, Loos RJ (2010) Progress in the genetics of common obesity and type 2 diabetes. Expert Rev Mol Med 12: e7.
- 50. Steyerberg EW (2009) Clinical Prediction Models – A Practical Approach to Development, Validation, and Updating. Gail M, Krickeberg K, Samet J, Tsiatis A, Wong W, editors. New York: Springer Science + Business Media.