Predictors for Physical Activity in Adolescent Girls Using Statistical Shrinkage Techniques for Hierarchical Longitudinal Mixed Effects Models

We examined associations among longitudinal, multilevel variables and girls’ physical activity to determine the important predictors for physical activity change at different adolescent ages. The Trial of Activity for Adolescent Girls 2 study (Maryland) contributed participants from 8th (2009) to 11th grade (2011) (n=561). Questionnaires were used to obtain demographic and psychosocial information (individual- and social-level variables); height, weight, and triceps skinfold to assess body composition; interviews and surveys for school-level data; and self-report for neighborhood-level variables. Moderate to vigorous physical activity minutes were assessed from accelerometers. A doubly regularized linear mixed effects model was used for the longitudinal multilevel data to identify the most important covariates for physical activity. Three fixed effects at the individual level and one random effect at the school level were chosen from an initial total of 66 variables, consisting of 47 fixed effects and 19 random effects variables, in addition to the time effect. Self-management strategies, perceived barriers, and social support from friends were the three selected fixed effects, and whether intramural or interscholastic programs were offered in middle school was the selected random effect. Psychosocial factors and friend support, plus a school’s physical activity environment, affect adolescent girls’ moderate to vigorous physical activity longitudinally.


Introduction
The prevalence of childhood overweight and obesity has increased dramatically in the United States over the past decades such that one-third of adolescents are either overweight or obese [1]. Physical inactivity, a risk factor for obesity [2], is high. National data suggest that only 8 percent of adolescents participate in the recommended amount of physical activity; that is, 60 minutes per day [3]. The health problems that can result from physical inactivity and obesity, such as type 2 diabetes, high blood pressure, and sleep disorders, have been on the rise in children and adolescents in recent years [4]. The Council on Sports Medicine and Fitness and Council on School Health state that increasing physical activity in children and adolescents can be effective in reducing the prevalence of obesity and resulting future health problems [5].
Physical activity declines during adolescence. Kimm and colleagues observed a 64% decline among white and a 100% decline in black girls between the ages of 9-10 years to 18-19 years [6]. Further, the decline is steeper in girls than in boys [7]. In order to create effective interventions and strategies to increase physical activity among adolescent girls, it is useful to determine important predictors of physical activity among this population. While previous studies examined factors that influence physical activity participation in adolescents [8], most have used cross-sectional designs. In addition, while it is now recognized that physical activity is a complex behavior that is determined by factors at multiple levels [9], only a few have included variables at the individual, social, and environmental levels [10][11][12][13]. There are even fewer studies of multi-level predictors of longitudinal physical activity among adolescents [14].
Social ecological models support the notion that multi-level factors influence health behavior and are becoming increasingly used to determine predictors of physical activity [9]. The Trial of Activity for Adolescent Girls (TAAG) developed a social ecological framework that guided the development of the intervention and measurement protocols for the trial, which was designed to reduce the amount of physical activity decline among adolescent girls [15]. The model included psychosocial theories relevant to behavior change as well as appraisals of environmental settings in which adolescents can be active (e.g., schools, community, home). The theories used included operant conditioning (i.e., increased positive reinforcement, reduced barriers for physical activity and reduced positive reinforcement for sedentary behaviors) [16], social cognitive theory (i.e., increased self-efficacy, positive outcome expectations, positive role models) [17], and organizational change (i.e., enhancing environments to promote physical activity) [18]. Constructs from this framework can be used to predict physical activity longitudinally.
Using the TAAG social ecological framework, the trial's subsequent study (Trial of Activity for Adolescent Girls 2 [TAAG 2]) provides longitudinal data for adolescent girls across six middle schools in Maryland to identify predictors of physical activity. The study used an objective measure of physical activity, accelerometry, which is considered the state of the art for physical activity assessment. The data are hierarchical and longitudinal in that there are individual-level and school-level predictors over two time points. It is necessary to use a mixed effects model in order to account for the within-individual correlations in observations over time, as well as to account for the within-group correlation in observations across the six schools. Young et al. selected and estimated predictors in TAAG at separate time points independently using the lasso variable selection technique [19]. In this paper, in order to determine significant predictors of physical activity over time, a new method of variable selection for linear mixed effects models was applied. This model considers not only the change in physical activity over time but also allows the predictors to be longitudinal (i.e., the values of predictors can change over time). Moreover, the clustering effects of schools can also be assessed in this model. The variable selection technique produces a parsimonious hierarchical longitudinal mixed effects model that selects important predictors of physical activity in adolescent girls over time and eliminates irrelevant predictors. These predictors can be used to guide future interventions.

Study Design
TAAG was a school and community-based, multisite trial targeted at girls with the goal of reducing the typical decline in physical activity during adolescence [20,21]. The study was initially conducted at 36 middle schools across six geographically diverse areas in the United States, specifically Arizona, California, Louisiana, Maryland, Minnesota, and South Carolina. This study was approved by the University of Maryland Institutional Review Board.
As a part of TAAG, data were collected to assess the sustainability of the program in the spring of 2006 in a group of 8th grade girls at all 36 schools. A follow-up study was conducted in the spring of 2009 with the girls at the Maryland field center when the girls were in 11th grade. The data for this analysis include those collected during the 2006 and 2009 time points from the Maryland site.

Data
Data for this analysis include those collected at the individual and school levels and are consistent with the constructs of the TAAG social ecological framework and its embedded behavior theories. Table 1 displays the key concepts used to identify relevant variables and the theories/constructs they represent. Girls completed surveys at 8th and 11th grade that queried on demographic, psychosocial, and perceived environmental factors known to be associated with adolescent MVPA [8]. Middle school-level variables were collected when the girls were in 8th grade.

Individual Demographic and Health Behavior Factors
Girls self-identified as non-Hispanic white, black or African American, Hispanic or Latino, Asian/Pacific Islander, American Indian or Alaska Native, or other. They reported participation in a free or reduced-price school lunch program, parent employment status, parent education, 1- or 2-parent household status, time spent at home alone unsupervised, and cigarette use. Height and weight were measured using standard procedures; body mass index (kg/m²) was calculated and girls were placed into normal weight, overweight, and obese categories based on the Centers for Disease Control and Prevention charts [22]. Percent fat was assessed from skinfold thicknesses using previously described procedures [23]. Physical activity self-efficacy was assessed using an eight-item questionnaire developed for adolescent girls [24][25], with Cronbach alpha between 0.81 and 0.84 and test-retest correlation of 0.69 at 8th grade [26]. Perceived physical activity barriers were assessed from an instrument adapted for TAAG with high internal consistency and reliability (Cronbach alpha = 0.88, 2-week test-retest reliability = 0.90) [27]. Outcome-expectancy value about physical activity was measured from nine items consisting of belief and corresponding value statements [28]. The belief statements had Cronbach alpha values that ranged between 0.82 and 0.84, and the 8th grade test-retest correlation was 0.68 [26]. The Cronbach alpha values for the value statements were between 0.92 and 0.94, with a test-retest correlation of 0.58 [26]. Seven items from the Physical Activity Enjoyment Scale (PACES) were used to determine enjoyment [29], and one item with a 5-point Likert-type scale was used to identify enjoyment of physical education class [30]. PACES has been shown to be positively associated with physical activity among adolescent girls [29]. Depressive symptoms, which are associated with lower physical activity [31], were assessed from the Center for Epidemiological Studies-Depression Scale [32], a screen for major depressive disorder in adolescent populations [33]. The self-management strategies score was measured from an eight-item scale representing cognitive and behavioral strategies associated with physical activity, an instrument shown to explain small, but statistically significant, improvements in college-aged women's physical activity [34].

[Table 1. Description of how key constructs apply to the settings and theories included in the Trial of Activity for Adolescent Girls Social Ecological Framework. Column headings include Characteristic, Environmental Setting, and Operant conditioning.]
Participation in sports, physical activity classes or lessons over the past year was assessed by asking if girls had participated in any of a list of 14 sports either at school or outside of school (e.g., softball, soccer, gymnastics) or any of a list of 17 classes or lessons (e.g., dance, martial arts, lacrosse). Items were summed. The number of years in which girls enrolled in physical education class was assessed for middle and high school.

Social factors
Social support from parents and peers was assessed from separate scales developed for the Amherst Health and Activity Study [35], with Cronbach alpha from 0.74 to 0.79 and test-retest reliability of 0.86 [26]. Support from teachers was determined from two items regarding girls' perception of how important the teachers thought physical activity was for girls. A similar three-item scale was used to assess the perceived attitudes of boys, and a one-item scale to assess the perceived attitudes of girls [36]. Hours spent home alone after school without adult supervision, which is associated with 6th grade physical activity, were assessed from two questions [37].

Perceived environmental factors
Ten items were used to assess perceived neighborhood walkability and safety, with five response options ranging from disagree a lot to agree a lot [38]. Three items with four response options queried on the level of difficulty to get home from school every day from after-school activities at school or to get to and get home from after-school activities at another location. Test-retest reliability for the instrument, indicated by kappa coefficients, ranged from 0.38 to 0.41 [26]. Perceived access to a list of 14 physical activity facilities was queried with the following question: "Is it easy to get to and from this place from home or school?" [39]. The intraclass correlation was 0.78 and Cronbach alpha ranged between 0.80 and 0.81 [26].

Middle school factors
A 43-item, 30-45 minute structured interview was administered to school principals, which included questions on physical education policies, physical activity promotions, policies that support or constrain physical activity, transportation policies for active commuting, structured and unstructured physical activity opportunities, and collaborations with community organizations that provide physical activity programs. School websites were used to obtain each middle school's race/ethnic, free and reduced price lunch, and mathematics and English state examination score profiles. MVPA in physical education class was assessed by the System for Observing Fitness Instruction Time [40].

Outcome-MVPA
The outcome variable was the average minutes of MVPA per day in the adolescent girls at each time point, which was collected using Actigraph accelerometers (model 7164, Fort Walton Beach, FL). Girls wore the accelerometers for 7 consecutive days over their right hip; data were reduced and processed using previously described methods [41]. Occasional missing data were replaced via an imputation technique based on the Expectation-Maximization algorithm; details on the imputation method have been published [42]. MVPA was defined as 30-second accelerometer counts exceeding 1,499 [41]. Only girls with observations at both the 2006 and 2009 time points were included in the analysis.
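As a rough illustration of the MVPA definition above, the following Python sketch counts the 30-second epochs whose counts exceed 1,499 and converts them to minutes. The function names are hypothetical, and the sketch assumes each day is already a clean list of epoch counts; the wear-time rules, data reduction, and imputation described in [41] and [42] are not reproduced here.

```python
from typing import Sequence

MVPA_THRESHOLD = 1499   # counts per 30-second epoch, as stated in the text
EPOCH_MINUTES = 0.5     # each epoch covers 30 seconds


def mvpa_minutes(epoch_counts: Sequence[int]) -> float:
    """Minutes of MVPA in one day of 30-second epoch counts."""
    n_epochs = sum(1 for count in epoch_counts if count > MVPA_THRESHOLD)
    return n_epochs * EPOCH_MINUTES


def mean_daily_mvpa(days: Sequence[Sequence[int]]) -> float:
    """Average MVPA minutes per day across the monitored days."""
    if not days:
        raise ValueError("at least one day of data is required")
    return sum(mvpa_minutes(day) for day in days) / len(days)
```

An epoch with exactly 1,499 counts is not classified as MVPA under this rule, because the threshold must be exceeded.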

Analyses
Missing data from the individual level predictors were imputed using the Sequential Regression Imputation Method [43].

Modeling strategies
In order to account for the clustering effects of the school and the correlation between individual girls' time points, a hierarchical longitudinal linear mixed effects (LME) model was used [44]. The fixed effects account for the mean responses of the model at different time points, which are shared across all participants. The random effects account for individual-specific (longitudinal) and cluster-specific (school-level) correlation. Ignoring the correlation among observations within individuals and within clusters can lead to incorrect conclusions, such as inaccurate standard error estimates for fixed effects and incorrect hypothesis test results [45].
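One way to write such a model is sketched below. The notation is an assumption introduced for illustration (it is not taken from the paper): $i$ indexes girls, $j$ schools, and $t$ time points; $x_{ijt}$ holds the individual-level predictors and time, and $z_{j}$ holds the school-level predictors that carry random coefficients.

```latex
\begin{aligned}
y_{ijt} &= x_{ijt}^{\top}\beta + z_{j}^{\top} b_{j} + u_{ij} + \varepsilon_{ijt},\\
b_{j} &\sim N(0, D), \qquad
u_{ij} \sim N(0, \sigma_u^{2}), \qquad
\varepsilon_{ijt} \sim N(0, \sigma^{2}).
\end{aligned}
```

In this sketch, $\beta$ contains the fixed effects shared by all participants, $u_{ij}$ induces correlation between a girl's two measurements, and $b_{j}$ induces correlation among girls who attended the same school.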
A large number of variables reflecting the TAAG social ecological model were collected (a total of 66 variables of interest, consisting of 47 fixed effects at the individual level, collected at both time points, and 19 school-level random effects predictors, collected only in 2006). While these variables represent the TAAG model and empirical studies indicate they are important predictors of adolescent physical activity, including all of them would lead to a large model that is difficult to interpret and whose parameter estimates can be unstable. Additionally, as more variables are included in the model, the risk of correlation between variables, or multicollinearity, increases.
In order to reduce the size of the model for predicting MVPA, the doubly regularized selection and estimation technique for LME models developed by Wang et al. was applied to the fixed and random effects [46]. This method places two penalty functions on the LME model: a penalty for the fixed effects and a penalty for the random effects. Together, the penalties shrink the coefficients of unimportant predictors to zero while retaining important predictors in the model.
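A generic form of a doubly penalized objective is sketched below. This is an illustrative assumption rather than the exact criterion of Wang et al. [46]: $\ell$ denotes the log-likelihood of the LME model, $\beta_k$ the fixed effect coefficients, $d_m$ the variance components of the random effects, $p$ and $q$ unspecified penalty functions, and $\lambda_1, \lambda_2 \ge 0$ the two tuning parameters.

```latex
(\hat{\beta}, \hat{D}) \;=\; \arg\min_{\beta,\, D}\;
\Big\{ -2\,\ell(\beta, D)
\;+\; \lambda_{1} \sum_{k} p\big(|\beta_{k}|\big)
\;+\; \lambda_{2} \sum_{m} q\big(d_{m}\big) \Big\}
```

Larger values of $\lambda_1$ or $\lambda_2$ force more fixed effect coefficients or variance components, respectively, to exactly zero; a random effect whose variance component is shrunk to zero is dropped from the model.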
The process is summarized as: (1) the data are standardized such that each variable has mean 0 and norm 1; (2) a grid search of two tuning parameters associated with each penalty function is applied to the model (one for the fixed effects and one for the random effects), which control the strength of the shrinkage; (3) the model that gives the minimum Bayesian Information Criterion [47] is chosen as the optimal model that gives the best balance of fit and parsimony; and (4) the model is refit with only the selected fixed and random effects, with no constraints applied to the coefficients, using PROC GLIMMIX in SAS (SAS Institute, Cary, NC). Using this selection and estimation algorithm is more efficient than traditional methods such as best subset selection, in which a model has to be fit for every possible combination of fixed and random effects (with 66 candidate variables, 2^66, or roughly 7.4 × 10^19, subsets), which is often computationally infeasible.
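Steps (1) through (3) can be sketched in Python as follows. This is a minimal skeleton under stated assumptions, not the authors' implementation: the penalized LME fit itself is left as a hypothetical callable `fit(lam_fixed, lam_random)` that returns a log-likelihood and the number of nonzero parameters, and the BIC shown is the standard definition, which may differ from the variant used in the paper.

```python
import itertools
import math


def standardize(column):
    """Step 1: center a column to mean 0 and scale it to Euclidean norm 1."""
    mean = sum(column) / len(column)
    centered = [value - mean for value in column]
    norm = math.sqrt(sum(value * value for value in centered))
    if norm == 0:
        return centered  # constant column: nothing to scale
    return [value / norm for value in centered]


def bic(log_lik, n_params, n_obs):
    """Standard BIC: -2 * log-likelihood + (number of parameters) * log(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def select_by_bic(fit, lambdas_fixed, lambdas_random, n_obs):
    """Steps 2-3: grid search over both tuning parameters, keep the minimum BIC.

    `fit` is a hypothetical stand-in for the doubly penalized LME fit; it must
    return (log_likelihood, number_of_nonzero_parameters).
    """
    best = None
    for lam_f, lam_r in itertools.product(lambdas_fixed, lambdas_random):
        log_lik, n_params = fit(lam_f, lam_r)
        score = bic(log_lik, n_params, n_obs)
        if best is None or score < best[0]:
            best = (score, lam_f, lam_r)
    return best  # (bic, lambda_fixed, lambda_random)
```

The grid has only as many fits as there are pairs of tuning parameter values, which is what makes the approach tractable compared with enumerating every subset of variables.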

Results
Demographic and descriptive statistics for the initial variables that were evaluated across the six schools are displayed in Table 1. The average daily MVPA across the 561 girls was 20.8 minutes in 8th grade and 19.9 minutes in 11th grade. The average body mass index increased in five of the six schools, although the mean percent of those overweight and obese declined. Also of note are the reduction in difficulty getting to and from after-school activities, the increase in hours spent home alone each week (mean (SD) 7.3 (7.8) hours in 8th grade to 10 (8.9) hours in 11th grade), and the decline in sports team participation and in participation in physical activity classes or lessons. Mean scores for the psychosocial and perceived neighborhood variables did not substantially change over time.
Also provided in Table 1 are the middle school level variables. There was substantial diversity in school level demographics (percent students who were non-Hispanic white ranged from 23 to 67), participation in free or reduced-price lunch (range 17% to 60%), and percent of students receiving passing scores on the state mathematics examination (range 28% to 76%). Less than 5% of students walked or bicycled to any of the middle schools. All but one school allowed unstructured physical activity opportunities either before, during, or after school. The number of school physical activity programs ranged from 7 to 20.
The variable selection analyses resulted in three selected fixed effects variables and one random effect variable. The fixed effects variables were self-management strategies, perceived barriers, and social support from friends. Although it was not selected, time was included in the final model to determine the effect of time on average daily MVPA. For the random effects variables, the predictor for offering interscholastic or intramural physical activity programs was selected. Therefore, the final LME model used for analysis contained four fixed effects (the three selected variables plus time) and one random effect. The remaining 62 of the 66 candidate variables were eliminated.
Based on the nonzero variance components of the random effects selection, there was heterogeneity in MVPA across the population from the effects of interscholastic or intramural physical activity programs (σ = 6.64) offered in middle schools attended. This result can be interpreted as attending a middle school that offered interscholastic or intramural physical activity programs introduced additional school-level randomness to a girl's MVPA.

Discussion
This study was conducted to determine the longitudinal, multi-level predictors of physical activity among adolescent girls. Starting with 66 variables at the individual, social, perceived environment, and school levels that have been documented to be associated with physical activity, the doubly regularized selection and estimation technique reduced those to four variables to be included in hierarchical, longitudinal modeling. Our results indicated that greater use of self-management strategies, more social support from friends, fewer perceived barriers, and having interscholastic and intramural sports programs available in middle school predicted greater MVPA.
We identified multi-level factors that were longitudinally associated with physical activity. Hearst and colleagues examined predictors of 24-month change among boys and girls initially aged 10-16 years [14]. Among girls, they found that age, baseline MVPA, pubertal status, and perceived barriers (all individual-level variables) were associated with follow-up physical activity. While our results regarding perceived barriers agreed with their results, Hearst et al. did not find social support to be a significant factor, nor did they measure physical activity-related cognitive/behavioral strategies.
The variables selected have considerable potential for interventions and are, in fact, often included in youth physical activity interventions [48]. Many interventions are based in social cognitive theory, which posits that health behaviors are influenced by the environment (e.g., access to safe exercise settings and social support), the individual (e.g., beliefs about the causes and outcomes of behavior), and the behavior itself (e.g., perceptions about physical activity) [17], as well as the ecological model that includes sociocultural and environmental variables [49]. The selected variables represent these constructs. Potentially effective interventions for older adolescent girls should include developing cognitive and behavioral strategies, such as identifying benefits, reducing barriers, setting realistic goals, fostering social support (especially from friends) and building physical activity skills, and providing opportunities and programs that are easily accessible for girls.
It is interesting to examine the variables that were not selected as important predictors for longitudinal physical activity. Self-efficacy, or a person's belief that she has the capability to successfully engage in a specific health behavior, is a key concept in social cognitive theory [17]. However, it was not selected by the statistical shrinkage technique. Hearst et al. also did not find self-efficacy to be associated with 24-month MVPA change [14]. Lytle et al. found that self-efficacy declined in the intervention compared with the control TAAG schools from 6th to 8th grade, although increased self-efficacy was an intervention mediator for increased MVPA [26]. They posited that girls exposed to the intervention may have become more aware of the challenges associated with increasing physical activity, which resulted in lower self-efficacy overall, although those who were able to increase their self-efficacy became more active.
Our previous cross-sectional work examining multi-level variables and MVPA found that in 8th grade, barriers, time spent alone, school-level socioeconomic status and academic achievement, and perceived neighborhood safety were associated with MVPA, whereas in 11th grade, barriers, physical activity enjoyment, self-efficacy, availability of intramural programs at school, sidewalk availability, and close distance from home to any school were associated with MVPA [19]. Self-efficacy may be an inconsistent predictor of longitudinal physical activity across adolescence.
The change in MVPA from 8th to 11th grade was not significant. We previously found a significant decline from 6th to 8th grade in metabolic equivalent-weighted MVPA but not in daily MVPA among the entire TAAG cohort [50]. From 6th to 8th grade the annual percent change in mean daily MVPA was -2.1%, compared with -1.4% from 8th to 11th grade in the TAAG 2 cohort. While it is well documented that physical activity declines during adolescence, there is less information available regarding the period in which it may stabilize, particularly when physical activity is assessed with accelerometry. Our results suggest that the slope of the decline in overall MVPA may become less steep across middle and high school. More studies are needed that track physical activity throughout adolescence into young adulthood using accelerometry methods.
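The -1.4% figure is consistent with the mean daily MVPA values reported in the Results (20.8 minutes in 8th grade and 19.9 minutes in 11th grade), assuming a simple, non-compounded percent change divided evenly over the three years. That calculation method is an assumption made here for illustration:

```latex
\frac{19.9 - 20.8}{20.8} \times 100\% \approx -4.3\%,
\qquad
\frac{-4.3\%}{3\ \text{years}} \approx -1.4\%\ \text{per year}.
```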
The variable selection technique used in the linear mixed effects model has clear advantages and potential for clustered, longitudinal studies. It is an automatic and objective procedure to select the most important variables from a large pool of candidate variables. Although our initial selection of variables was theoretically based, no prior knowledge about which variables to include is needed, so the technique can also serve as an exploratory tool. Hearst et al. [14] and McKay et al. [13] chose several sets of predictor variables and fit models with different variable sets, which requires prior knowledge to determine which predictor variables to select. The regularization approach employed here (adding penalties to the objective function) considers all the variables simultaneously, not different combinations of variables as in traditional selection methods. Unlike univariate selection, in which correlation coefficients or univariate models are calculated or fit for each variable against the response and the variables with the highest ranks (e.g., according to p-values) are selected [51], the selection technique we used considers the correlation among variables, so multicollinearity is avoided. Specifically, the method used in this paper is designed for linear mixed effects models, so clustering and longitudinal effects can be considered. The item parceling used by Heitzler et al. [11] has potential problems, as it depends on the order of variables and may also suffer from the subjective grouping of variables. Factor analysis, on the other hand, belongs to the "variable extraction" category of dimension reduction rather than variable selection. In this approach, the original variables are projected into lower dimensional spaces by finding the "best" linear combinations of the original variables, and those linear combinations are used as the new variables in subsequent modeling. In contrast, variable selection chooses a possible best subset from the original variables, allowing for meaningful interpretation of the results to be maintained.
Strengths of this study include a large, diverse sample of girls who were representative of middle school girls living in the Baltimore/Washington DC region in 2008, with excellent follow-up in 2011. Daily MVPA was measured by accelerometry and, for the most part, the self-reported measures had established validity and reliability. We were able to include a comprehensive set of variables that were multi-level and have been associated with MVPA. The variable selection technique that we used to build the hierarchical longitudinal model resulted in information useful for future intervention planning.
There are also study limitations. Only one geographical region was included in the study so results may not be generalizable to other areas of the U.S. We studied only girls, but since predictors of physical activity are known to differ between boys and girls [7], we would not expect the results to be generalizable to boys. There was only one follow-up point over a 3-year period and results may not be stable over a different follow-up.
The low amount of physical activity among adolescent girls is a significant public health problem and a critical factor for obesity prevention. We used a broad array of variables across multiple levels and applied an innovative statistical technique to identify the important predictors of longitudinal physical activity among middle to high school-aged girls. Greater use of self-management strategies, more social support from friends, fewer barriers, and having interscholastic and intramural sports programs available in middle school predicted greater MVPA. This is important information needed to create effective public health interventions and policies to halt physical activity decline and reverse the current child/adolescent obesity epidemic.