Risk Profiles for Weight Gain among Postmenopausal Women: A Classification and Regression Tree Analysis Approach

Purpose Risk factors for obesity and weight gain are typically evaluated individually while “adjusting for” the influence of other confounding factors, and few studies, if any, have created risk profiles by clustering risk factors. We identified subgroups of postmenopausal women homogeneous in their clustered modifiable and non-modifiable risk factors for gaining ≥ 3% weight. Methods This study included 612 postmenopausal women 50–79 years old, enrolled in an ancillary study of the Women's Health Initiative Observational Study between February 1995 and July 1998. Classification and regression tree and stepwise regression models were built and compared. Results Of 27 selected variables, the factors significantly related to ≥ 3% weight gain were weight change in the past 2 years, age at menopause, dietary fiber, fat, alcohol intake, and smoking. In women younger than 65 years, less than 4 kg weight change in the past 2 years sufficiently reduced risk of ≥ 3% weight gain. Different combinations of risk factors related to weight gain were reported for subgroups of women: women 65 years or older (essential factor: < 9.8 g/day dietary fiber), African Americans (essential factor: currently smoking), and white women (essential factor: ≥ 5 kg weight change for the past 2 years). Conclusions Our findings suggest specific characteristics for particular subgroups of postmenopausal women that may be useful for identifying those at risk for weight gain. The study results may be useful for targeting efforts to promote strategies to reduce the risk of obesity and weight gain in subgroups of postmenopausal women and maximize the effect of weight control by decreasing obesity-relevant adverse health outcomes.

Classification and regression tree (CART) analysis is an innovative approach that uncovers complex interactions among variables that may be overlooked in traditional analyses. [11,27] The CART methodology allows us to identify homogeneous subgroups of individuals based on shared factors related to an outcome of interest (i.e., ≥3% weight gain in our study) and examine the risk magnitude of given risk factors within the subgroup. [27,28] We employed this analytic approach to generate profiles of postmenopausal women who are at risk of 3% or more of weight gain after 3 years of follow-up, based on a set of modifiable and non-modifiable characteristics that have been known to be independently associated with postmenopausal obesity.

Study population
The study population consisted of women who were enrolled in an ancillary study of the Women's Health Initiative Observational Study (WHI-OS) at WHI clinical centers at Baylor College of Medicine and Wake Forest School of Medicine between February 1995 and July 1998. Women were eligible for the study if they were 50-79 years old, postmenopausal (defined as having had a hysterectomy or not having had menstrual bleeding for the previous 6 months [if ≥56 years old] or 12 months [if 50-55 years old]), planned to live in the area near clinical centers for at least 3 years after study enrollment, and were able to provide written consent. For the purpose of the study, women who self-reported their ethnicity/race as non-Hispanic white or African American (AA) were included. Details of the study rationale and design have been described elsewhere. [29] Of the 834 participants, 9 who did not complete a required baseline screening questionnaire and 213 who had missing information on either the outcome of interest (i.e., ≥3% weight gain since baseline, n = 7) or exposure variables (n = 206) were excluded, resulting in a final study population of 612 women (73% of 834 participants). This study was approved by the institutional review boards at the University of Texas MD Anderson Cancer Center, Baylor College of Medicine, and Wake Forest School of Medicine.

Data collection
Standardized written protocols and periodic quality assurance visits by the coordinating center were used to assure uniform data collection procedures. At a baseline screening visit, participants completed self-administered questionnaires including demographic and socioeconomic factors, medical and reproductive history, lifestyle behaviors, and general health characteristics. Trained staff performed anthropometric measures including height, weight, and waist and hip circumferences. We assumed that the baseline measurement of the exposure variables had not changed throughout the third annual visit (AV3) unless their measurements were reassessed at AV3 for their changes during the preceding 3 years. In the latter cases (i.e., patients with follow-up measurements at AV3), such as those who smoked or used exogenous estrogen, the AV3 measurements were postulated to dominate the 3-year period; thus, a new variable was created to account for changes in the measurements between baseline and AV3.
Of 42 variables initially selected from a literature review for their association with obesity and weight gain, 4 were excluded after a multicollinearity test and 11 after a univariate analysis, leaving 27 variables for this study. Demographic and socioeconomic characteristics included age, race, and employment status (i.e., full-time, part-time, or unemployed). To evaluate weight gain outcomes, age was dichotomized at 65 years because, in the preliminary CART analysis using weight change as a continuous outcome, women younger than 65 years had different patterns of weight change compared with those 65 years or older (average percent weight change = 2.2% in age ≥65 years vs. -0.5% in age < 65 years).
Reproductive history variables included oral contraceptive (OC) use, age at menopause, number of pregnancies, and postmenopausal hormone therapy. Exogenous estrogen use (both opposed and unopposed estrogen use) was classified as never, former, and current; former users were defined as those who stopped estrogen use upon enrollment, and current users included those who began to use estrogen either before or after enrollment and still took the medications at AV3.
Lifestyle variables included smoking status, dietary intake, sleep disturbance, depression, presence or absence of a lifetime sex partner, and physical activity. Data about dietary intake were obtained using the Food Frequency Questionnaire, and only the following variables known to be related to obesity were included: total calories, dietary alcohol and fiber, and percent of calories from saturated fatty acids (SFA), monounsaturated fatty acids (MFA), and polyunsaturated fatty acids (PFA). Metabolic equivalent task (MET) values were assigned for strenuous-, moderate-, and low-intensity activities as 7, 4, and 3 METs, respectively. [15] A total physical activity variable (MET·hours·week⁻¹) was then calculated by multiplying the MET level for the activity by the hours exercised per week and summing the values for all types of activities. [30][31][32] Total physical activity was stratified using 10 METs as a cutoff point on the basis of current recommendations from the American College of Sports Medicine and the American Heart Association. [31] General health characteristics measured at AV3 included weight change for the past 2 years when participants were followed up at AV3. The weight change variable was created by subtracting the lowest weight from the highest weight for the previous 2 years. Additionally, lifetime variables of general health included BMI at 35 years and an intentional loss of 10 pounds or more within the past 20 years (except during times when participants were pregnant or sick).
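The total physical activity calculation described above can be illustrated with a minimal Python sketch. The MET values (7, 4, 3) and the cutoff of 10 come from the text; the function names, the example hours, and the choice to treat exactly 10 as belonging to the higher stratum are assumptions made only for illustration, not the study's code.

```python
# MET values stated in the text: strenuous = 7, moderate = 4, low intensity = 3.
MET_VALUES = {"strenuous": 7.0, "moderate": 4.0, "low": 3.0}


def total_met_hours_per_week(hours_per_week):
    """Sum (MET level x weekly hours) over all activity types."""
    return sum(MET_VALUES[kind] * hours for kind, hours in hours_per_week.items())


def activity_stratum(met_hours, cutoff=10.0):
    """Dichotomize total activity at the cutoff.

    Assumption: a value equal to the cutoff falls in the higher stratum;
    the paper does not say which side is inclusive.
    """
    return ">=10" if met_hours >= cutoff else "<10"


# Hypothetical participant: weekly hours by intensity (not study data).
example = {"strenuous": 0.5, "moderate": 1.0, "low": 2.0}
total = total_met_hours_per_week(example)  # 7*0.5 + 4*1.0 + 3*2.0
print(total, activity_stratum(total))
```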

Outcome variable
The outcome was a binary variable, weight gain. Weight change was estimated by subtracting the weight at baseline from the weight at AV3. The percentage of weight change was calculated by dividing the weight change by the baseline weight (range, -24% to 38%) and then classified as less than 3% or 3% or more. The cutoff point of 3% was determined based on the strong consensus about the percentage of weight gain at which risk for obesity-related health effects (e.g., cardiovascular disease and type 2 diabetes) begins to change. [33] Because a gain of 3% is considered potentially clinically relevant while allowing for small weight fluctuations [33], weight gain in our study was defined as a gain of 3% or more of baseline body weight.
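As a brief illustration of the outcome definition above, the following Python sketch computes the percent weight change and the binary outcome. The function names and example weights are hypothetical; only the formula and the 3% threshold are taken from the text.

```python
def percent_weight_change(baseline_kg, av3_kg):
    """Percent change from baseline: (AV3 weight - baseline weight) / baseline weight."""
    return (av3_kg - baseline_kg) / baseline_kg * 100.0


def gained_3_percent(baseline_kg, av3_kg, threshold=3.0):
    """Binary outcome: True when the gain is 3% or more of baseline weight."""
    return percent_weight_change(baseline_kg, av3_kg) >= threshold


# Hypothetical participants (not study data).
print(gained_3_percent(80.0, 84.0))  # gain of 4 kg on 80 kg
print(gained_3_percent(80.0, 81.0))  # gain of 1 kg on 80 kg
print(gained_3_percent(70.0, 63.0))  # weight loss
```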

Statistical analysis
Multicollinearity was tested using the coefficient of multiple determination (R²), tolerance, and variance-inflation factor for each exposure variable, with the remaining covariates as its predictors. Differences in characteristics of participants by weight gain were evaluated using unpaired 2-sample t-tests for continuous variables and chi-square tests for categorical variables. If continuous variables were skewed or had outliers, the Wilcoxon rank-sum test was used instead.
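For reference, the three multicollinearity diagnostics named above are related by standard definitions. Here R_j² denotes the coefficient of multiple determination from regressing exposure variable x_j on the remaining covariates; this notation is introduced only for illustration, and the exclusion thresholds applied in the study are not stated in the text.

```latex
\[
  \mathrm{Tolerance}_j = 1 - R_j^{2},
  \qquad
  \mathrm{VIF}_j = \frac{1}{\mathrm{Tolerance}_j} = \frac{1}{1 - R_j^{2}}
\]
```

A variable that is well predicted by the other covariates has R_j² near 1, so its tolerance approaches 0 and its variance-inflation factor grows large.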
CART analysis was implemented to explore the successive binary divergences of the exposure variables in order to identify subgroups on the basis of their homogeneity in relation to gaining 3% or more weight. CART built a tree via recursive partitioning, and the tree development included three steps: growing the tree, pruning the tree, and validating the tree structure. [12,27,28,34,35] First, a large and complex tree was grown with data from all study variables, each of which was evaluated based on the improvement score using the Gini index for the nominal outcome (< 3% vs. ≥3% weight gain in our study) and the sum of squares for the continuous outcome (weight change in our study) to determine the optimum cutoff value (continuous variables) or groupings (nominal variables) that gave the best discrimination between the two outcome classes; the strongest predictor variable and its splitting value were then used to split the data into two subgroups (i.e., daughter nodes). The subgroups were then split repeatedly into smaller subgroups representing the most homogeneous split (i.e., terminal node) or daughter nodes in a previous layer. Each terminal node was set to require a minimum of 5 individuals. Because the original tree was too large and statistically uninformative, we then pruned the tree to eliminate branches of the original tree to produce the "right-sized tree," representing the lowest misclassification. Finally, based on the lowest cross-validated error rate, as determined by a cost-complexity pruning algorithm using 10-fold cross-validation, the optimal tree was selected from our pruned trees, which was the best fit and did not over-fit the data. CART is a nonparametric procedure that does not need any assumptions about the data distribution. [36] Analyses were performed by applying rpart version 4.1-8 for the open-source R statistical software.
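The splitting step described above can be sketched in a few lines of Python. This is a simplified illustration of a single Gini-based binary split on one continuous variable, not the rpart implementation used in the study: candidate thresholds are taken at observed values (an assumption made for brevity), the data are hypothetical, and pruning and cross-validation are omitted. The minimum of 5 individuals per node follows the text.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcome labels (1 = gained >= 3% weight)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels, min_node=5):
    """Return (threshold, improvement) for the split 'value < threshold' that
    gives the largest decrease in weighted Gini impurity.

    Splits leaving fewer than min_node individuals on either side are skipped.
    Returns (None, 0.0) when no admissible split improves on the parent node.
    """
    n = len(labels)
    parent = gini(labels)
    best = (None, 0.0)
    for threshold in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x < threshold]
        right = [y for x, y in zip(values, labels) if x >= threshold]
        if len(left) < min_node or len(right) < min_node:
            continue
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        improvement = parent - weighted
        if improvement > best[1]:
            best = (threshold, improvement)
    return best


# Hypothetical data: weight change over the past 2 years (kg) and the outcome.
weight_change = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
gained = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(best_split(weight_change, gained))
```

Growing a full tree would repeat this search over every candidate variable at each node and recurse into the two daughter nodes until no admissible split remains.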
Further, stepwise logistic regression was performed to compare findings with those produced by the CART; it produced odds ratios (ORs) and 95% confidence intervals (CIs) of exposure variables for weight gain, stratified by race or age. A 2-tailed P value of <0.05 was considered significant. R version 2.15.1 was used.

Results
Baseline characteristics between < 3% vs. ≥3% weight gain stratified by age (< 65 years vs. ≥65 years) are presented (Table 1). Among women < 65 years, those with ≥3% weight gain were more likely to have undergone an early menopausal transition (P = 0.03) and more likely to have intentionally lost 10 pounds or more during the past 20 years (P = 0.02). In women 65 years or older, women who gained ≥3% weight were more likely to consume fewer total calories (P = 0.02), less dietary alcohol (P = 0.04), and less dietary fiber (P = 0.003). In both age groups, women with ≥3% weight gain were more likely to have had a greater change in weight during the 2 years prior to AV3 (P = 0.001 in the younger group; P = 0.02 in the older group). Participants were also stratified by race (AA vs. white women), and their characteristics between < 3% vs. ≥3% weight gain were compared (S1 Table).

Classification tree
Risk profiles of women who gained ≥3% weight, stratified by age. In the preliminary classification analysis using weight change as a continuous outcome, age, classified as < 65 years vs. ≥65 years, was identified as the most determinant variable for weight-gain outcome. In addition, the older and younger women differed in the cluster of characteristics associated with ≥3% weight gain. In women < 65 years, the prevalence of gaining ≥3% weight was 42% (Fig 1A). The first split from total participants < 65 years (root node, n = 403), indicating a dominant effect, was according to weight change for the past 2 years. With those with < 4.1 kg weight change for the past 2 years as the reference, Fig 1A presents naïve ORs for other terminal nodes. The percentage of women gaining ≥3% weight decreased from 42% (root node) to 30% for women who had < 4.1 kg weight change for 2 years (terminal node 1). Women with ≥4.1 kg weight change during the past 2 years were further split by age at menopause. Compared with women with < 4.1 kg weight change for the past 2 years, women who had ≥4.1 kg weight change and entered menopause at < 44 years were more likely to gain ≥3% weight (61%, OR = 3.76, 95% CI, 2.17-6.59, terminal node 5). When women who entered menopause at ≥44 years were further split by alcohol intake and dietary fiber, the percentage of women who gained ≥3% weight increased to 60% for women who consumed < 6 g/day alcohol and < 10.3 g/day dietary fiber [...] (Fig 1B). In women ≥65 years, the percentage of women gaining ≥3% weight was 26%. The first split representing a dominant effect was ≥9.8 g/day dietary fiber intake. With those with ≥9.8 g/day dietary fiber intake as the reference, Fig 1A also shows naïve ORs for other terminal nodes. The lowest risk of gaining ≥3% weight was observed among women who consumed ≥9.8 g/day dietary fiber (22%, terminal node 1).
When women who consumed < 9.8 g/day dietary fiber were then split by age at menopause, the prevalence of gaining ≥3% weight increased to 72% for women who entered menopause at < 51 years (OR = 9.27, 95% CI, 3.28-30.36, terminal node 3). In contrast, women who entered menopause at ≥51 years had a reduced risk (23%) of gaining ≥3% weight (terminal node 2).
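To make the naïve ORs reported for the terminal nodes concrete, the sketch below computes an unadjusted odds ratio and a 95% CI from a 2×2 table that compares one terminal node with the reference node. The text does not state how the intervals were obtained; the Woolf (log-based) interval used here is an assumption, and the counts are hypothetical rather than taken from the figures.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a = gainers in the node of interest,  b = non-gainers in that node,
    c = gainers in the reference node,    d = non-gainers in the reference node.
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not study data): 30 of 50 women gained >= 3% weight in
# the node of interest versus 60 of 200 in the reference node.
or_, lower, upper = odds_ratio_ci(30, 20, 60, 140)
print(f"OR = {or_:.2f}, 95% CI, {lower:.2f}-{upper:.2f}")
```

With small terminal nodes the log-based interval becomes wide, in line with the broad intervals reported for the smaller nodes; an exact method may be preferable in that setting.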
Risk profiles of women who gained ≥3% weight, stratified by race. Additionally, we identified, within racial groups, homogeneous subgroups based on risk characteristics related to ≥3% weight gain. Compared to white women, AA women presented different risk profiles. In AA women, the prevalence of ≥3% weight gain was 37% (Fig 2A), and the dominant variable that split the root node was smoking status. The highest risk of gaining ≥3% weight was observed in women who were current smokers (87%, terminal node 5). Variables that were involved in the next splits were dietary fiber intake, weight change for the past 2 years, and the percentage of calories from MFA. Among never and former smokers, the combination of lower dietary fiber intake, greater weight change for the past 2 years, and higher percentage of calories from MFA increased the risk of ≥3% weight gain (76%, terminal node 4). Likewise, in white women, the percentage gaining ≥3% weight was 36% (Fig 2B). The variable demonstrating the greatest impact on ≥3% weight gain was weight change for the past 2 years. Women who had < 5 kg weight change for the past 2 years, when further split, presented different risk patterns according to dietary fiber intake. For example, among women with < 9.5 g/day dietary fiber intake (terminal node 2), the percentage gaining ≥3% weight increased to 65%; however, women with ≥9.5 g/day dietary fiber intake had decreased risk (23%, terminal node 1). Additionally, women with ≥5 kg weight change for the past 2 years were then split by BMI at 35 years. While women who had a BMI < 20 kg/m² at 35 years had a decreased risk of ≥3% weight gain (23%, terminal node 3), women who had a BMI ≥20 kg/m² at 35 years had an increased risk, but not in a linear pattern; that is, in women who had a BMI of 20 kg/m² to 21.7 kg/m² at 35 years, the risk of ≥3% weight gain (69%, terminal node 4) was higher than the risk in women with a BMI ≥21.7 kg/m² at 35 years (43%, terminal node 5).

Stepwise regression
The findings from regression analyses were overall comparable to those in CART analyses (Tables 2 and 3). In women < 65 years, weight change for the past 2 years, age at menopause, and dietary fiber intake were related to gaining ≥3% weight; these findings parallel those in the corresponding tree. However, the alcohol intake that emerged in the tree was not significant in the regression analysis. For women ≥65 years, dietary fiber intake, which was the primary splitter in the tree, was found to be the only significant factor in the analysis. Additionally, among AA women, smoking status, dietary fiber intake, and weight change for the past 2 years, involved as splitters in the tree, were significant factors in the stepwise analysis. For white women, all variables that were splitters in the tree (i.e., dietary fiber intake, weight change for the past 2 years, and BMI at 35 years) were also significant in the regression analysis; however, age and BMI at baseline and ≥10 pounds lost intentionally within the past 20 years, which were significant in the regression, were not significant predictors in the CART analysis.

Discussion
Using the CART approach, we sought to construct risk profiles for ≥3% weight gain over 3 years in postmenopausal women within the context of a wide array of modifiable and nonmodifiable factors. The individual factors selected in our study have been well documented for their association with obesity and weight gain; however, to our knowledge, such associations in a population of postmenopausal women by clustering these variables into unique risk profiles have not been reported. We demonstrated that when women were stratified by age or race, complex combinations of risk factors differ among subgroups. In addition, the factors that emerged in CART analyses were confirmed using the traditional stepwise regression analyses, and the main predictor of weight gain was identified as weight change in the past 2 years for women < 65 years and white women, dietary fiber for women ≥65 years, and smoking status for AA women. The most dominant factor that predicted weight gain within 3 years in every age and race subgroup was weight change in the past 2 years. Recent studies showed that in postmenopausal women, after weight-loss intervention, weight regain was associated with weight gain after menopause [3,37], indicating that positive weight change (i.e., weight gain) after menopause was a risk factor for regaining weight. Our novel finding was that 5 kg weight fluctuations (including weight gain as well as weight loss) during the past 2 years were related to gaining ≥3% weight at AV3.
In agreement with previous studies [2,3], age at menopause, regardless of the age subgroup, was the next factor increasing the risk of weight gain. Early menopause is associated with weight gain because withdrawal of estrogen reduces lean body mass while increasing fat mass. In addition, compared to premenopausal women, postmenopausal women have a greater ratio of upper body fat to lower body fat. [2] Among postmenopausal women, exogenous estrogen use has been reported to attenuate this shift, reducing the risk of obesity-relevant diseases. [2,38] However, we did not observe a significant role of exogenous estrogen use in decreasing the risk of weight gain. Additionally, our subgroup analysis within estrogen users according to duration of hormone use did not show any apparent differences. This is consistent with another study [38] suggesting that hormonal therapy status did not predict postmenopausal weight gain or fat accumulation but rather diminished the shift of fat from hip to waist.
Across subgroups, the most frequently involved factor in the risk of gaining weight was dietary fiber intake. All participants had a decreased risk of ≥3% weight gain when they consumed 10 g/day or more dietary fiber, except AA women, who had reduced risk if they had ≥22 g/day dietary fiber. Dietary fiber promotes satiety and may reduce energy absorption or stimulate energy expenditure. [19][20][21][22] Smoking has a weight-suppressant effect, and weight gain commonly follows smoking cessation. [39] In our small subgroup (n = 8) of AA women, on the contrary, the risk of gaining ≥3% weight was 87% in current smokers, which might be an artifact; however, several studies [26,40,41] reporting a positive association between smoking and obesity suggest that the effect of smoking on weight loss is minimal in the short term and that smoking rather contributes to obesity and weight gain in the long term.
This study had limitations. The self-reporting of the dietary intake, smoking, and physical activity data limits study conclusions regarding these variables due to the likely prevalence of underreporting of dietary intake and smoking and overreporting of physical activity, especially in obese people. Further studies are warranted to collect data on additional variables, including obesity-relevant genes and biomarkers, to increase the accuracy of predictions based on the classification. This study is exclusively based on postmenopausal women, which limits the generalizability to other populations. Additionally, the CART method is exploratory (i.e., it is not based on the probabilistic method), indicating that a composite of trees derived from other populations can be useful to illustrate the possible variability of interactions among risk factors related to weight gain. Despite its shortcomings, CART analysis has advantages. It is well-suited to summarize multiple covariate inter-relationships and provides a simple and easily viewed tree, which is useful for decision making. CART can deal with large numbers of variables and decrease type II errors. [42]
In conclusion, this study revealed that among 27 selected modifiable and non-modifiable variables, greater weight change during the past 2 years, a BMI of ≥20 kg/m² at 35 years, early menopause, lower intake of dietary fiber, higher intake of fat and alcohol, and smoking were the most relevant factors for gaining ≥3% weight. We used an analytic tree as a means of identifying higher and lower risk groups. Identifying factors related to weight gain within the subgroups of postmenopausal women may allow researchers to target efforts to promote strategies to reduce the risk of obesity and weight gain and maximize the effect of weight control by decreasing obesity-relevant adverse health outcomes.

Supporting Information
S1 Table. Characteristics of participants, stratified by percent weight change according to race, enrolled in an ancillary study of the Women's Health Initiative Observational Study at Baylor College of Medicine or Wake Forest University School of Medicine between February 1995 and July 1998. (DOCX)