How Does Physical Activity Intervention Improve Self-Esteem and Self-Concept in Children and Adolescents? Evidence from a Meta-Analysis

Objective To perform a systematic review and meta-analysis for the effects of physical activity intervention on self-esteem and self-concept in children and adolescents, and to identify moderator variables by meta-regression. Design A meta-analysis and meta-regression. Method Relevant studies were identified through a comprehensive search of electronic databases. Study inclusion criteria were: (1) intervention should be supervised physical activity, (2) reported sufficient data to estimate pooled effect sizes of physical activity intervention on self-esteem or self-concept, (3) participants’ ages ranged from 3 to 20 years, and (4) a control or comparison group was included. For each study, study design, intervention design and participant characteristics were extracted. R software (version 3.1.3) and Stata (version 12.0) were used to synthesize effect sizes and perform moderation analyses for determining moderators. Results Twenty-five randomized controlled trial (RCT) studies and 13 non-randomized controlled trial (non-RCT) studies including a total of 2991 cases were identified. Significant positive effects were found in RCTs for intervention of physical activity alone on general self outcomes (Hedges’ g = 0.29, 95% confidence interval [CI]: 0.14 to 0.45; p = 0.001), self-concept (Hedges’ g = 0.49, 95%CI: 0.10 to 0.88, p = 0.014) and self-worth (Hedges’ g = 0.31, 95%CI: 0.13 to 0.49, p = 0.005). There was no significant effect of intervention of physical activity alone on any outcomes in non-RCTs, as well as in studies with intervention of physical activity combined with other strategies. Meta-regression analysis revealed that higher treatment effects were associated with setting of intervention in RCTs (β = 0.31, 95%CI: 0.07 to 0.55, p = 0.013). Conclusion Intervention of physical activity alone is associated with increased self-concept and self-worth in children and adolescents. 
There is also a stronger association with school-based and gymnasium-based interventions than with other settings.

SE is defined as feelings of one's personal self-worth (SW) [14], reflecting a person's evaluation of his or her own worth. SC is a person's perception of himself or herself, namely, what a person thinks about himself or herself [15,16]. Both have a pervasive impact on human mental status and behavior [17,18]. Positive SC is viewed as a desirable outcome in many educational and psychological situations, and SC is regarded as a mediating variable for promoting the achievement of certain outcomes, such as academic achievement [19]. Furthermore, physical SC is suggested to be a mediator of the association between PA and SE, which is inversely related to depression [4]. SE has been recognized as a component of a variety of psychopathologies. A search of the DSM-IV-TR [20] shows that the term "self-esteem" appears in 24 different diagnostic contexts as a criterion for disorders. For teenagers, it is suggested that low SE predicts adolescents' reports of mental status and health-compromising behaviors, such as depression, anxiety, eating problems, and suicidal ideation [21][22][23]. A low level of SE in children and adolescents also predicts poor health, criminal behavior, and limited economic prospects during adulthood [24,25]. Thus, it is important to determine effective interventions for improving SE and SC in juveniles.
Although extensive research has evaluated the effects of PA on SE and SC in juveniles, the findings are contradictory. Many studies found significant positive effects of PA on SE and SC [13,26,27], others did not detect such effects [28][29][30], and several others suggested negative effects [31][32][33]. Therefore, it is critical to comprehensively synthesize the available evidence to determine the effects of PA on SE and SC in children and adolescents. In addition, whether the effects of PA intervention on SE and SC depend on moderators should be clarified to reveal under which conditions the effects exist. Meta-analysis of all available evidence is an appropriate design to address these questions.
Despite the overall synthesis, these studies have several limitations. The majority of the studies included in Ekeland's meta-analysis suffered from both high risk of bias and small sample size. In addition, as Dishman suggested, attention to important moderator variables is warranted to better clarify the research questions [36]. Although Ekeland and colleagues examined a potential moderator (study quality) of the association between PA and SE, the evaluation is compromised by the limited number of studies in the subgroups [34]. Additionally, neither of the two meta-analyses specifically explored any other potential moderators of PA intervention on SE or SC, such as participant type and intervention setting.
Since the meta-analysis conducted by Ahn et al., eleven trials examining the effects of PA on SE or SC have been published. The availability of these studies makes it possible to perform more comprehensive meta-regression analyses to identify additional moderators. Therefore, it is necessary to conduct an updated meta-analysis to provide a more accurate estimate for these research questions.
The purpose of the present study was thus to perform a meta-analysis of the available literature to evaluate the efficacy of PA intervention on SE and SC in juveniles, and to conduct a meta-regression analysis to identify effect moderators. We aimed to determine whether PA intervention exerts positive effects, and in which participants and settings the positive effects persist. Based on dose-response models [36,37] and a relevant meta-analysis [3], we examined several potential moderators by meta-regression analysis, including target population, PA setting, PA characteristics (intensity per session, frequency, and length of intervention), and study quality.

Selection of study
We followed the PRISMA guidelines [38] to report this systematic review and meta-analysis (S1 File). The electronic databases of PubMed, EBSCO, and Web of Science (up to July 2014) were searched for RCTs or non-RCTs in children and adolescents without restriction of population, publication type, or language. The following MeSH terms and their combinations in the Title/Abstract/Subject fields were used in the search: physical activity / exercise / sport*; self esteem* / self worth* / self concept* / self perception*; children / adolescent* / boy* / girl* / teen*; random* / intervention* / trial*. The asterisk means that longer words containing the word or word fragment were included in the search. Furthermore, the reference lists of eligible articles were scrutinized by hand to identify additional studies.
Studies were included when the following inclusion criteria were met: (1) the intervention was supervised PA or PA combined with other strategies; (2) the study reported sufficient data to estimate pooled effect sizes of PA intervention on SE or SC; (3) participants' ages ranged from 3 to 20 years; (4) the study included a non-PA control or comparison group. When multiple reports representing the same study were found, the most relevant or complete report was included. Reports stratified by gender were treated as separate reports. Because various studies treat SE, SC, and SW as equivalent in concept and operationalization [13,39,40], we took all three outcome measurements into account and evaluated their benefits from PA intervention. Only the most relevant self outcome type was included in the analysis.

Statistical methods
Relevant data from the included studies were extracted independently by two authors using EpiData 3.1 and Excel software. The following information was extracted from each study: first author, year, study design, participant characteristics, outcome measure, PA intervention design, and effect sizes. Any disagreements were discussed until consensus was reached. When the eligible studies did not present sufficient data, the corresponding or first authors were contacted.
Study quality was assessed using the modified Cochrane risk of bias tool [41] for RCTs and the modified Methodological Index for Non-Randomized Studies (MINORS) [42] for non-RCTs. The former consists of seven items: randomization sequence generation, inclusion and exclusion criteria, balance between groups at baseline, allocation concealment, blinding of participants, dropout and withdrawals, and follow-up. A score of 1 was given for each item that was met, so the quality scale ranges from 0 to 7 points, with higher scores indicating higher study quality. Studies that achieved ≥ 5 points were considered to be of high quality. The MINORS includes twelve items: stated aim of the study, inclusion of consecutive participants, prospective collection of data, endpoint appropriate to the study aim, unbiased evaluation of endpoints, follow-up period appropriate to the major endpoint, loss to follow-up not exceeding 5%, a control group having the gold standard intervention, contemporary groups, baseline equivalence of the sample size, and statistical analyses adapted to the study design. The scoring approach was similar to that for RCTs, and studies that achieved ≥ 6 points were considered to be of high quality.
Subtypes were coded separately by two authors. There were three primary classifications: study design, PA intervention, and participant characteristics. Study design was coded according to research design (RCT or non-RCT), outcome measure (SE, SC, SW), and study quality (score ≥ 5 or score < 5 in RCTs; score ≥ 6 or score < 6 in non-RCTs). PA intervention was coded by PA intervention type (PA alone or PA combined with other skills), intervention setting (school-based, gymnasium-based, family-based, clinic-based, detention facility-based, or camp-based), intensity of PA intervention (minutes per session), frequency (1, 2, 3, 4, 5, or 6 times per week), and length of intervention (weeks). Participant characteristics consisted of gender (female or male) and sample type (normal, overweight, cerebral palsy, youth offender, sedentary, disability, or asthma).
We used R software (version 3.1.3) and Stata (version 12.0) to conduct all statistical analyses. Hedges' adjusted g was used as the measure of effect size because of its unbiased properties for small sample sizes compared with Cohen's d [43]. The effect size of each included study was calculated by computing the mean difference in gains (posttest minus pretest) between the intervention group and control group and dividing by the pooled standard deviation of the pretest scores [44], since pretest standard deviations are not influenced by different treatments and thus tend to be consistent across studies [45]. For pooling estimates, we combined the data on PA intervention outcomes from the included studies using a random effects model, with each effect size weighted by the DerSimonian and Laird method. When the heterogeneity was zero, the fixed effects model was applied. We converted DerSimonian and Laird results to Hartung and Knapp results when the pooling analysis included more than five studies [46], since in such cases the distribution of the intervention effects is unknown and does not necessarily follow the normal or t-distribution when study numbers are small [46]. All analysis results were reported with 95% confidence intervals (CIs). Statistical significance was defined as p < .05.
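The effect size calculation described above can be sketched in a few lines. The following Python snippet is a minimal illustration, not the code used in this analysis (which was run in R and Stata). The function name `hedges_g_pre_post` and its argument names are hypothetical, and the small-sample correction factor J = 1 - 3/(4df - 1) is the commonly used approximation, assumed here rather than taken from the text.

```python
import math


def hedges_g_pre_post(m_pre_t, m_post_t, sd_pre_t, n_t,
                      m_pre_c, m_post_c, sd_pre_c, n_c):
    """Difference in gains between groups, scaled by the pooled pretest SD.

    Suffix _t is the intervention group, _c is the control group.
    All names are illustrative assumptions.
    """
    df = n_t + n_c - 2
    # Pooled standard deviation of the pretest scores.
    sd_pooled = math.sqrt(
        ((n_t - 1) * sd_pre_t ** 2 + (n_c - 1) * sd_pre_c ** 2) / df
    )
    # Mean gain (posttest minus pretest) in each group, then their difference.
    gain_diff = (m_post_t - m_pre_t) - (m_post_c - m_pre_c)
    d = gain_diff / sd_pooled
    # Approximate small-sample bias correction (assumed form).
    j = 1 - 3 / (4 * df - 1)
    return j * d


# Hypothetical study: both groups start at 20 (SD 5, n = 30);
# the intervention group gains 3 points and the control group gains 1.
g = hedges_g_pre_post(20, 23, 5, 30, 20, 21, 5, 30)
print(round(g, 3))
```

Scaling by the pretest standard deviation means that a treatment that changes the spread of scores does not distort the denominator, which is the rationale given in the text.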
To determine pooled effect sizes for different subgroups according to subtype of outcome measurement, PA intervention, and study design, we conducted separate meta-analyses for each subtype of PA intervention (PA alone or PA combined with other strategies), in combination with subtype of study design (RCTs or non-RCTs) and outcome measure (SE, SC, or SW).
Heterogeneity of effect size was assessed using the Q value and the I² statistic. To explore sources of heterogeneity, we performed planned meta-regression analysis. Based on a previous analysis [35], potential between-study moderators were examined, including study quality, intervention format (intensity, frequency, and length), setting of intervention, and participant type. Meta-regression analyses according to subtypes of study design were also conducted.
Additionally, we conducted sensitivity analyses. An outlier was identified if the relative residual standardized mean difference effect size fell in the region of z < -1.96 or z > 1.96 when the study was omitted. Publication bias was assessed by a funnel plot accompanied by Begg's test [47] and Egger's test [48]. If there was significant publication bias, adjustment was made using the trim and fill method [49].
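To make the pooling and heterogeneity statistics concrete, the following is a minimal Python sketch of DerSimonian and Laird random effects pooling together with Q and I². It is an illustration under stated assumptions, not the analysis code: the function name `dersimonian_laird` is hypothetical, the inputs are assumed to be per-study effect sizes with their sampling variances, the confidence interval uses a normal approximation (1.96), and the Hartung and Knapp adjustment is not shown.

```python
import math


def dersimonian_laird(effects, variances):
    """Random effects pooling with the DerSimonian and Laird tau^2 estimate."""
    k = len(effects)
    w = [1.0 / v for v in variances]          # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "se": se,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "tau2": tau2,
        "I2": i2,
    }


# Hypothetical effect sizes and variances for four studies.
result = dersimonian_laird([0.10, 0.35, 0.50, 0.20], [0.04, 0.05, 0.06, 0.03])
print(result["pooled"], result["Q"], result["I2"])
```

When the estimated between-study variance is zero, the random effects weights equal the fixed effects weights, so the sketch reduces to the fixed effects model in that case.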
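As an illustration of the regression-based check for funnel plot asymmetry, the sketch below computes the intercept used in Egger's test: each study's standardized effect (effect divided by its standard error) is regressed on its precision (1 divided by the standard error), and an intercept far from zero suggests asymmetry. This is a simplified, unweighted least squares version written for clarity. The function name `egger_intercept` is hypothetical, and the p-value step (comparing the t statistic with a t distribution on k - 2 degrees of freedom) is left out to keep the example dependency-free.

```python
import math


def egger_intercept(effects, std_errors):
    """Return (intercept, standard error of intercept, t statistic)."""
    k = len(effects)
    if k < 3:
        raise ValueError("at least three studies are needed")
    x = [1.0 / s for s in std_errors]                  # precision
    y = [e / s for e, s in zip(effects, std_errors)]   # standardized effect
    mx = sum(x) / k
    my = sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance and the usual OLS standard error of the intercept.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (k - 2)
    se_int = math.sqrt(s2 * (1.0 / k + mx * mx / sxx))
    t = intercept / se_int if se_int > 0 else float("nan")
    return intercept, se_int, t


# Hypothetical data: smaller studies (larger SE) show larger effects.
b0, se_b0, t_stat = egger_intercept([0.25, 0.30, 0.45, 0.70],
                                    [0.10, 0.15, 0.25, 0.40])
print(b0, se_b0, t_stat)
```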
Among the included studies, 25 were RCTs and 13 were non-RCTs (Tables 1 and 2). Three assessment variables were used to measure the outcome variable: SE in 19 studies, SC in 7 studies (1 study used self-image to measure SC), and SW in 12 studies. It is worth noting that, although several studies used self-perception scales, only one subscale (SE, SC, or SW) was considered as the outcome in each study. About 16 questionnaires or scales were applied to measure the 3 variables. The most commonly used were the Rosenberg Self-Esteem Scale (RSE) (N = 7), Self-Perception Profile for Children (SPPC) (N = 6), Self-Esteem Inventory (SEI) (N = 5), and Piers-Harris Children's Self-Concept Scale (PHCSCS) (N = 3).
Twenty-four studies used intervention of PA alone and the other 14 studies used intervention of PA combined with other strategies. Intervention setting varied across studies. Generally, the risk of bias for the included RCTs ranged from moderate to high. Twelve RCTs clearly stated that they used a computer-generated randomization sequence. Ten studies reported allocation concealment, 7 studies stated the blinding method, and 7 studies provided follow-up information. On the other hand, the majority of studies reported inclusion and exclusion criteria for participants (N = 22) and explained the reasons for dropout (N = 21). The detailed characteristics of the included RCTs are presented in Table 1. For non-RCTs, the risk of bias for the included studies was generally low. Scores ranged from 4 to 8 points. The detailed characteristics of the included non-RCTs are shown in Table 2.

Meta-analysis of studies with intervention of PA alone
Pooling data from 18 RCTs showed a small but significant positive effect for intervention of PA alone (Hedges' g = 0.29; 95% CI: 0.14 to 0.45; p < 0.001) (Fig 2). There was low heterogeneity across studies (Q total = 9.26; p = 0.93; I² = 1.5%), suggesting that the beneficial effect of intervention of PA alone was relatively consistent across RCTs. Stratified analyses by outcome revealed significant pooled effect sizes for intervention of PA alone on SC (Hedges' g = 0.49; 95% CI: 0.10 to 0.88; p = 0.014) and SW (Hedges' g = 0.31; 95% CI: 0.13 to 0.49), with no heterogeneity between the subgroup studies. No significant pooled effect size was found on SE.

Meta-analysis of studies with intervention of PA combined with other strategies
There was no significant pooled effect size for intervention of PA combined with other strategies on general self outcome. High heterogeneity was found in 8 RCTs (Q total = 7; p = 0.05; I² = 60.2%), but not in 7 non-RCTs (Figs 4 and 5). With regard to subtype of outcome, no significant pooled effect size was found in any subgroup, regardless of type of study design. Only for the SE outcome was there high heterogeneity across RCTs (Q total = 14.65; p = 0.002; I² = 79.5%).

Meta-regression analysis
For RCTs, the associations between general self outcome and PA intervention were not substantially altered by intensity of intervention (minutes/session), frequency of intervention (times/week), length of intervention (weeks), participant type (normal vs non-normal), or study quality (low vs high). However, there was a significant association between intervention effect sizes and settings (school, gymnasium, clinic, detention facility, family, or others). Specifically, there was a stronger association with school/gymnasium-based settings than with other settings (regression coefficient β = 0.31, 95% CI: 0.07 to 0.55; p = 0.01) (Table 3). In multivariate meta-regression analysis, after adjusting for potential confounders, the association of intervention effect sizes with settings persisted (β = 0.31, 95% CI: 0.03 to 0.64; p = 0.07).
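The setting comparison reported above can be read as a meta-regression with a single binary moderator. The sketch below is a hypothetical, simplified version using inverse-variance weighted least squares. The function name `meta_regression`, the 0/1 coding of the moderator (1 for school- or gymnasium-based, 0 for other settings), and the optional `tau2` argument for a between-study variance are all assumptions made for illustration. It does not reproduce the multivariate models or the software used in this analysis.

```python
import math


def meta_regression(effects, variances, moderator, tau2=0.0):
    """Weighted least squares of effect size on one moderator."""
    w = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(w)
    # Weighted means of the moderator and the effect sizes.
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    if sxx == 0:
        raise ValueError("moderator must vary across studies")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, moderator, effects))
    beta = sxy / sxx
    intercept = y_bar - beta * x_bar
    # Standard error of the slope when sampling variances are treated as known.
    se_beta = math.sqrt(1.0 / sxx)
    return {
        "intercept": intercept,
        "beta": beta,
        "se_beta": se_beta,
        "ci": (beta - 1.96 * se_beta, beta + 1.96 * se_beta),
    }


# Hypothetical data: studies coded 1 (school/gymnasium) show larger effects.
fit = meta_regression([0.05, 0.15, 0.40, 0.45, 0.35],
                      [0.03, 0.04, 0.02, 0.05, 0.03],
                      [0, 0, 1, 1, 1])
print(fit["beta"], fit["ci"])
```

With a 0/1 moderator, the slope is the difference between the weighted mean effect sizes of the two groups of studies, which is how a coefficient such as β in the text can be interpreted.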

Sensitivity analysis and publication bias analysis
For non-RCTs with intervention of PA alone, sensitivity analysis revealed that heterogeneity between studies was mainly caused by one study conducted by Percy et al. [51]. After we omitted this study from the analysis, there was no significant heterogeneity for general self (I² from 69.1% to 0%) or SE (I² from 74.7% to 0%) (S1 Fig). Exclusion of this study from the analysis did not substantially alter the overall effect size. For RCTs with intervention of PA combined  4).

Discussion
Results of this meta-analysis suggest that intervention of PA alone is an effective method to improve SW and SC in juveniles, although the effect sizes were small in magnitude. The lack of publication bias and very low heterogeneity in RCTs evaluating intervention of PA alone and in non-RCTs evaluating intervention of PA combined with other strategies suggest that our results were relatively robust. However, caution is required in interpreting the effects of intervention of PA alone in non-RCTs and of intervention of PA combined with other strategies in RCTs, since there was significant heterogeneity across studies.
We further identified that one non-RCT with intervention of PA alone and a small sample size [51], as well as one RCT with intervention of PA combined with other strategies [26], contributed the majority of the observed heterogeneity. However, sensitivity analyses showed that omitting these studies did not substantially alter the pooled effect sizes.
Meta-regression analysis revealed that the associations between PA intervention and self outcome could be altered by the setting of intervention in RCTs. There were stronger effects in studies with school-based or gymnasium-based interventions compared with studies with family-based, clinic-based, and detention facility-based interventions. One possible explanation is that schools and gymnasiums are places where services are both mandated and free for juveniles [35,52]. It is also possible that studies focused on other settings are few, making the statistical power insufficient [53,54]. More studies with PA interventions focusing on these settings are warranted to clarify this issue.
Our results confirm the findings of the meta-analysis conducted by Ekeland and colleagues [34]. Our findings were also consistent with those of the meta-analysis conducted by Ahn et al. for RCTs, but differed widely for the non-RCT analysis [35]. This might be because the non-RCT studies in their analysis included those with within-subject design (i.e., pretest-posttest single group design) and between-subject design (i.e., posttest-only-control group design), while only independent-groups design (pretest-posttest-control group design) was included in the current meta-analysis. In contrast to a previous analysis by Ekeland et al., we did not find any beneficial effect of intervention of PA combined with other strategies on self outcomes, regardless of study design. Considering the baseline difference between the treatment group and control group, our analysis pooled effect sizes by computing effect sizes in each treatment condition and then subtracting the effect size of the control group from that of the intervention group. However, previous studies only considered the post-intervention mean difference when pooling effect sizes, which may induce statistical errors when there is a significant baseline difference in the outcome for relevant studies [55,56].
For RCTs with intervention of PA alone, our results were consistent with the previous studies [34,35]. However, the effect sizes on different outcome measurements were not equal. For example, Ekeland et al. found a moderate effect size of intervention of PA alone on SE, compared with small effect sizes on general self outcome and SW, a moderate effect size on SC, and no significant effect on SE in our meta-analysis. This may be due to the different outcome definitions used in the two analyses. The definitions of our outcomes were based on the final measurements of outcomes for each study rather than the planned outcomes, since there is a wide range of different definitions for SE and SC. Indeed, some studies used SC or SW questionnaires to measure SE. For example, one study [13] conducted before 1990 was considered to have assessed the SE outcome in Ekeland et al.'s analysis; however, we coded the outcome as SC since it used a "Thomas Self-Concept Values Test". Compared with the analyses conducted by Ekeland et al. and Ahn et al., our study performed additional analyses to evaluate the effect sizes of PA intervention on different outcomes according to outcome measurements. Our results showed that these concepts may not be completely equivalent in representing the individual self. Therefore, terminology in the research questions of interest should be used with caution, and further research is warranted. Moreover, the small sample sizes of the included studies in the SE outcome subgroup may influence the effect size of intervention of PA alone. Thus, studies with larger sample sizes are needed to further clarify this issue.
It is worth noting that we found no beneficial effect of PA intervention on self outcomes in non-RCTs, regardless of study design and outcome measurements. This differs from the findings of the meta-analysis conducted by Ahn et al., which showed a relatively large effect size (Hedges' g = 0.78) of PA intervention on SE. This may be due in part to the wider range of non-RCT study designs included in Ahn et al.'s meta-analysis, such as within-subject design (i.e., pretest-posttest single group design) and between-subject design (i.e., posttest-only-control group design) [35]. Moreover, different methods of synthesizing effect sizes may contribute to the discrepancy. However, it should be noted that there was no significant beneficial effect of PA on SC in Ahn et al.'s analysis for non-RCTs. We suppose that non-RCT design might lead to more bias in effect sizes. For example, potentially different conditions between the treatment group and the control group may induce the difference. The effect of a potential baseline difference also cannot be excluded. These factors may partially explain the high heterogeneity in the analysis of non-RCTs.

Strengths and limitations of the review
The present meta-analysis was based on prospective RCTs and non-RCTs involving multiple types of juveniles. The total sample size was relatively large. Analyses for subtype of PA intervention design (PA alone, or PA combined with other strategies), in combination with subtype of study design (RCTs or non-RCTs) and outcome measure (SE, SC, or SW), were conducted. Moreover, we performed several sensitivity analyses and meta-regression to explore the potential sources of heterogeneity and moderators. However, there were several limitations. First, the wide range of content in PA interventions made it difficult to identify intervention type. Studies with intervention of PA alone may have been classified as studies with intervention of PA combined with other strategies. There is a similar difficulty in identifying study design, in that some studies that were claimed to be RCTs were actually non-RCTs. In the current study we classified them as non-RCTs if we could not find sufficient information indicating that an RCT design was used. This may have led to incorrect categorization. Furthermore, there were differences in the measurement of self outcome across studies. These factors may affect the pooled effect sizes. The limited number of studies in some subtype groups, such as different sample types, settings of PA intervention, and outcome measurements, precluded further subgroup analyses or meta-regression analysis.

Conclusion
In conclusion, this meta-analysis provides further evidence that intervention of PA alone plays a role in improving SW and SC in children and adolescents. The results support current recommendations for increasing PA to promote physical and mental health. Results of this review also reveal that the setting of PA intervention potentially affects the effect of PA intervention on self outcome, and there is a stronger association with school-based and gymnasium-based interventions compared with other settings. However, the inherent limitations of imbalanced sample type, outcome measurement, and setting of PA intervention within the included studies prevent us from reaching definitive conclusions. Future well-designed RCTs and multiple levels of PA intervention targeting various sample types are warranted to validate the findings of the current analysis.

Practical Implications
For children and adolescents, intervention of PA alone is beneficial to SW and SC. Moreover, school-based and gymnasium-based PA interventions may exert stronger effects on developing SE and SC.
Twenty-four studies were school-based, 2 interventions were family-based, 5 interventions were gymnasium-based, 3 interventions were clinic-based, 3 interventions were performed in detention facilities, and 1 study was camp-based. The length of intervention ranged from 4 to about 80 weeks (2 academic years). Among them, 7 interventions lasted less than 8 weeks, 24 interventions lasted between 8 and 20 weeks, and 7 interventions lasted over 20 weeks. Exercise intensity for each session ranged from 20 to about 120 minutes. Only 2 studies had sessions of less than 30 minutes; 11 studies had sessions of 30-45 minutes; 21 interventions had sessions of more than 45 minutes; and the remaining 4 studies did not indicate the exact time. Most PA interventions were administered 2 times (N = 9), 3 times (N = 16), or 5 times (N = 6) per week. The remaining studies held sessions 1 time (N = 5), 4 times (N = 1), or 6 times (N = 1) per week. Sample sizes of the 38 studies ranged from 17 to 464 (the median sample size was 79). Among the 38 studies, 26 studies involved both male and female participants, 8 involved only males, and 4 involved only females. Participants' ages ranged from 4 to 20 years. In most studies participants were either from the normal population (N = 12) or overweight or obese (N = 12); 4 studies targeted children with cerebral palsy; 4 studies focused on youth offenders; 3 studies targeted sedentary children or adolescents; 2 studies focused on individuals with learning or cognitive disability; and 1 study targeted children with asthma.

Table 2. Characteristics of included non-randomized controlled trial studies.