Health-Related Quality of Life, Self-Efficacy and Enjoyment Keep the Socially Vulnerable Physically Active in Community-Based Physical Activity Programs: A Sequential Cohort Study

Physical inactivity is most commonly found in socially vulnerable groups. Dutch policies target these groups through community-based health-enhancing physical activity (CBHEPA) programs. As robust evidence on the effectiveness of this approach is limited, this study investigated whether CBHEPA programs contribute to an increase in and the maintenance of physical activity in socially vulnerable groups. In four successive cohorts, starting at a six-month interval, 268 participants from 19 groups were monitored for twelve months in seven CBHEPA programs. Data collection was based on repeated questionnaires. Socio-economic indicators, program participation and coping ability were measured at baseline. Physical activity, health-related quality of life and on-going program participation were measured three times. Self-efficacy and enjoyment were measured at baseline and at twelve months. Statistical analyses were based on a quasi-RCT design (independent t-tests), a comparison of participants and dropouts (Mann-Whitney test), and multilevel modelling to assess change in individual physical activity, including group level characteristics. Participants of CBHEPA programs are socially vulnerable in terms of low education (48.6%), low income (52.4%), non-Dutch origin (64.6%) and health-related quality of life outcomes. Physical activity levels were not below the Dutch average. No increase in physical activity levels over time was observed. The multilevel models showed significant positive associations between health-related quality of life, self-efficacy and enjoyment, and leisure-time physical activity over time. Short CBHEPA programs (10–13 weeks) with multiple trainers and gender-homogeneous groups were associated with lower physical activity levels over time. At twelve months, dropouts' leisure-time physical activity levels were significantly lower compared to continuing participants, as were health-related quality of life, self-efficacy and enjoyment outcomes. 
BMI and care consumption scored significantly higher among dropouts. In conclusion, Dutch CBHEPA programs reach socially vulnerable, but not necessarily inactive, groups in terms of socio-economic and health-related quality of life outcomes. Our findings suggest that CBHEPA programs particularly contribute to physical activity maintenance in socially vulnerable groups, rather than to an increase in physical activity behaviour over time.


Background
Physical inactivity has been identified by the WHO as the fourth leading risk factor for global mortality [1,2]. Health disorders associated with inactivity, including impaired health-related quality of life, as well as direct and indirect economic costs, impose a substantial burden on societies and health systems [3]. In the Netherlands, socially vulnerable groups, e.g., those with low socio-economic status (SES) or of non-Dutch origin, are less engaged in sport and physical activity (PA) than high SES groups [4,5]. Over the past decade, Dutch policy has been to promote community-based health-enhancing physical activity (CBHEPA) programs in order to improve physical activity behaviour and health-related quality of life, in particular targeting socially vulnerable groups [6,7].
The relationship between PA behaviour and health-related quality of life is, however, a rather complex one. Demographic factors, as well as biological, psychosocial, behavioural, social and cultural factors, influence this relationship [2,8,9]. CBHEPA programs aim to change individual PA behaviour and to enhance PA maintenance and program adherence, using concepts such as attitude, subjective norms, self-efficacy [10,11], social support [12,13] and PA enjoyment [14,15]. The need to address interpersonal aspects alongside individual approaches is widely recognised in PA promotion [16,17]. Consequently, the theoretical grounds of CBHEPA programs are based on an ecological perspective on human health [18,19]. The ecological perspective emphasises the need to take into consideration interaction between factors within and across different levels, such as individual, group and community level [20,21].

Evaluating the effectiveness of CBHEPA programs
The ecological perspective used in CBHEPA programs, as well as differences described in the literature between PA initiation and PA maintenance [22], pose several challenges to evaluating the effectiveness of CBHEPA programs. Firstly, most research on the explanatory variables and correlates of PA behaviour has focused on individual level factors [2]. The multiple levels addressed by CBHEPA programs require a multilevel approach to hypothesis testing, taking into account the interdependencies within and between individuals, groups and communities [18,19,21,23–25].
Secondly, Dutch CBHEPA programs often target specific societal groups within a community, such as the socially vulnerable. Identifying indicators and instruments suitable to measure PA behaviour and health-related quality of life in these groups is a challenge [26].
Thirdly, alongside measurement issues, recent literature indicates that factors predicting initial change in PA behaviour differ from those predicting PA maintenance [22,27–30]. So far, no uniform standards are in use to define PA maintenance [31]. A commonly used definition is being physically active once a week for a period of at least six months [32]. Some studies indicate that factors relevant for PA behaviour initiation are best defined in terms of pre-motivational and motivational factors, such as awareness, knowledge and (health) risk perception, attitude, self-efficacy and social influence [22]. In PA maintenance, post-motivational factors, i.e. psychological constructs bridging the gap between intention and behaviour, such as self-regulatory processes, the ability to cope with stressors in daily life [33,34] and so-called maintenance self-efficacy, are factors of importance [22,27,35,36]. In addition, PA enjoyment is found to be a moderator of self-efficacy in PA behaviour [17]. Studies indicate that not only self-control and discipline, but also enjoyment, pleasure and 'not worrying', are key values in maintaining an active and healthy lifestyle [14,15,37].
Fourthly, evaluating CBHEPA programs requires group effects to be taken into consideration. Several studies illustrate the importance of group support and group dynamics for the effectiveness of (CBHE)PA programs. Group dynamics in CBHEPA programs are, however, often implicit and not accounted for. CBHEPA programs are usually group-based for organisational reasons (cost-covering), rather than for behavioural change reasons [38]. Nevertheless, some studies indicate that group dynamics strategies, explicitly applied in group-based PA interventions, are more effective in establishing change in PA behaviour than individually targeted interventions with social support, which, in turn, are more effective than individual interventions without additional social support [39,40].
Although many strategies have been developed to increase PA levels [41,42], effect sizes are usually small to moderate [2]. Most evidence is built on correlational, cross-sectional studies at participant level, lacking insight into causal relationships between factors influencing PA [2,41,43]. Longitudinal designs including time varying determinants of PA behaviour and maintenance are rare [18]. In view of the aims of Dutch group-based CBHEPA programs, our study focuses on evaluating participants' PA behaviour and maintenance in relation to multilevel explanatory factors and time varying covariates. With a sequential cohort study, we aim to contribute to the evidence base of CBHEPA programs and their potential to increase and sustain PA levels and health-related quality of life in inactive, socially vulnerable people. The advantage of a sequential cohort design, monitoring CBHEPA program participants for a specified period of time, is that multiple (intermediate) outcomes can be studied simultaneously over time, which can increase the power of the statistical procedures used to determine whether a change has taken place. It also allows us to control for possible history and maturity effects [44]. Consequently, to measure effects, a sequential cohort design is a promising alternative to a randomised controlled trial (RCT) design, which is considered less appropriate to assess the effectiveness of CBHEPA programs [45,46]. In this paper, we address the question: Do CBHEPA programs contribute to an increase in and maintenance of physical activity in socially vulnerable groups over time?

Methods
To assess the outcomes of CBHEPA programs at participant level, we examined on-going Dutch CBHEPA programs, summarised under the denominator 'Communities on the Move' (CoM). CoM was developed and disseminated by the Netherlands Institute for Sports and PA (NISB) from 2003 to 2012. CoM targets inactive, socially vulnerable groups with the aim of enhancing PA levels, hence contributing to participants' health-related quality of life. Since 2012, CoM has been subject to a comprehensive evaluation study, including assessment of its effectiveness at participant level [21]. or more groups per CBHEPA program. Recruitment of participants within groups was based on a non-randomised, purposive sampling approach. Participation was on a voluntary basis.
A total of 268 participants was included at baseline, mostly women (86.7%). Personal and socio-economic indicators showed that mainly middle-aged participants (mean age 58.6 years;

Data collection
Our study was based on a sequential cohort design. Participants were recruited and monitored in four sequential cohorts. Data collection for cohort 1 started in autumn 2012, and for cohort 4 in spring 2014. In order to reach the generally hard-to-reach socially vulnerable groups [47], we applied a personalised approach, reaching out to gatekeepers, such as the exercise trainer, and making ourselves known to CBHEPA participants. Data were collected by a researcher (first author) and a group of trained assistants at three points in time: T0 at baseline, T1 at six months and T2 at twelve months (Fig 1). Questionnaires were developed based on validated survey instruments available for the Dutch population. In doing so, we tried to select the instruments most appropriate for the socially vulnerable target group. Socio-economic indicators, program participation and sense of coherence to assess coping ability were measured at baseline. Data on socio-economic indicators (age, income, education, employment status, living conditions) were collected in accordance with standardised questions of the Local and National Monitor Public Health in the Netherlands [48,49]. Data on individual motivations to participate in the CBHEPA program were collected using an open-ended question. Data on past and present sport and PA behaviour were collected, assessing program participation time prior to baseline measurement and (former) sports club membership. People's ability to cope with stressors in daily life was measured using the SoC three-item, three-point scale for sense of coherence [50–53]. Questions were: Do you usually see solutions to problems and difficulties that other people find hopeless (manageability)? Do you usually feel that your daily life is a source of personal satisfaction (meaningfulness)? And: Do you usually feel that the things that happen to you in your daily life are hard to understand (comprehensibility)?
PA behaviour, health-related quality of life and on-going program participation were measured three times. PA and sport behaviour were measured using the validated Short Questionnaire for Sport and Physical Activity (SQUASH), measuring self-reported work-related, domestic, leisure-time and sport-related physical activities in minutes per week [54,55]. The SQUASH generates data that can be compared with national and regional data, as Dutch trend analyses for PA behaviour over the past two decades are based on the SQUASH, offering a vast body of reference data for our study [5].
Health-related quality of life data were measured at all three time points using two indicators. First, the five-dimension, three-level descriptive Euro Quality of Life questionnaire (EQ-5D-3L) assessed self-reported levels of complaints on 'mobility', 'self-care', 'daily activity', 'pain' and 'anxiety' [56,57]. Based on the outcomes of the EQ-5D-3L, the EQ-Index (ranging from -1 to 1) was computed, defining a 'health state' using the Dutch time-trade-off value set [58,59]. Second, perceived health was measured using a visual analogue scale (EQ-VAS), ranging from 0 to 100 [56]. EQ-VAS measures how participants perceive their health at a particular point in time [59].
PA self-efficacy and PA enjoyment were measured at baseline and at the last measurement (T2). PA self-efficacy was measured using a five-item, five-point scale [60]. Statements included: I am confident that I am able to continue to participate in the PA program during the coming months, and I am confident that I am able to continue to participate in the PA program when I am tired. PA enjoyment was measured using a nine-item, five-point scale, translated and adapted from the Physical Activity Enjoyment Scale [61]. Statements included: When I do exercise or sports, I enjoy it; When I do exercise or sports, it is fun to do; and When I do exercise or sports, I feel bored.
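As an illustration of how such multi-item scales are typically turned into a single score, the sketch below sums item responses, giving the 5-25 range for the five-item self-efficacy scale and the 9-45 range for the nine-item enjoyment scale reported with the baseline results. The function name `scale_score` is hypothetical, and the reverse-coding of negatively worded items (such as 'I feel bored') is an assumption; the source does not state how these items were scored.

```python
def scale_score(responses, reverse_items=(), points=5):
    """Sum Likert responses into one scale score.

    responses     : list of integer answers, each between 1 and `points`
    reverse_items : zero-based positions of negatively worded items
                    (ASSUMPTION: the paper does not list which items,
                    if any, were reverse-coded)
    """
    total = 0
    for position, answer in enumerate(responses):
        if not 1 <= answer <= points:
            raise ValueError(f"answer {answer} outside 1..{points}")
        # A reverse-coded item maps 1 -> points, 2 -> points - 1, and so on.
        total += (points + 1 - answer) if position in reverse_items else answer
    return total


# HYPOTHETICAL respondent: five self-efficacy items, nine enjoyment items,
# with the last enjoyment item ("I feel bored") treated as reverse-coded.
self_efficacy = scale_score([4, 5, 4, 3, 4])
enjoyment = scale_score([5, 4, 5, 4, 5, 4, 5, 4, 1], reverse_items={8})
print(self_efficacy, enjoyment)
```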
In the supporting information (S1 Table) an overview is presented of variables measured over time in relation to PA behaviour.
At each measurement, questionnaires were individually completed by participants during or after a group training session at the sports venue. Informed consent was arranged orally on the spot and confirmed in writing for each respondent. The researcher explained the purpose of the study at each session. Both the researcher and trained assistants helped respondents who had difficulty filling out the questionnaire by giving instructions or by adopting an interview style. The number of assistants varied with group composition: from one for groups with only Dutch native speakers to a maximum of five in groups with migrant respondents. Dutch was the working language, since ethnic diversity within groups was large (>10 countries of origin). Interpretation, if needed, was provided by an assistant or a Dutch speaking fellow group member from a similar background. Completion of the baseline questionnaire took on average 35-40 minutes, and of the follow-up questionnaires on average 20-25 minutes. After filling out the questionnaire, respondents were treated to fruit snacks and drinks.
Follow-up rate for all four cohorts at T1 was 60% (n = 161). In response to these follow-up rates, additional data collection strategies were initiated during the third year (2014). Participants and ex-participants were contacted in places where they habitually assembled, usually a community centre. Follow-up questionnaires were sent to home addresses, accompanied if possible by a telephone reminder after two weeks. Overall follow-up rate at T2 was 55% (n = 146), showing a 91% recovery rate of T1 participants.
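The follow-up percentages can be retraced from the reported counts. The short sketch below uses only the numbers given in the text (268 at baseline, 161 at T1, 146 at T2); reading the 'recovery rate' as the T2 count divided by the T1 count is an interpretation, not something the source defines. The counts give about 60.1%, 54.5% and 90.7%, which is in line with the rounded figures reported.

```python
# Counts taken from the text.
BASELINE_N = 268
T1_N = 161
T2_N = 146


def percentage(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


t1_follow_up = percentage(T1_N, BASELINE_N)   # share of baseline sample at T1
t2_follow_up = percentage(T2_N, BASELINE_N)   # share of baseline sample at T2
# ASSUMPTION: 'recovery rate' is read here as T2 respondents relative to T1.
t2_relative_to_t1 = percentage(T2_N, T1_N)

print(f"{t1_follow_up:.1f} {t2_follow_up:.1f} {t2_relative_to_t1:.1f}")
```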
Reasons for program dropout were either personal (health issues or life events) or program related (program activities ceased to exist). Reasons for not being willing to participate in follow-up measurements, given in 5% of cases, were: reluctance to fill out questionnaires in general, not being able to fill out the questionnaire by themselves, doubt about the relevance of the questions, and sometimes people told the researchers that there was no need, since 'nothing changes anyway'.
Information about the organisation of the CBHEPA program and group composition was collected during each session by the researcher and assistants, and reported in observational notes. In this way, information was gathered about the measurements, e.g., difficulties in understanding questions or concepts, as well as additional information on group developments and participants.

Data analysis
In order to investigate the effectiveness of CBHEPA programs comprehensively, addressing the question whether CBHEPA programs contribute to an increase in and maintenance of physical activity in socially vulnerable groups, we tested three hypotheses using a combination of statistical procedures (SPSS 22). Alongside significance, effect sizes (Cohen's d and Pearson's r) were reported for the main outcomes of interest.
First, based on a rather traditional approach, we compared groups who had participated for a year with groups which had just started. The hypothesis was: Participation in a CBHEPA program for one year leads to higher PA levels and health-related quality of life outcomes in its participants compared to starters (H1). A quasi-randomised controlled trial (quasi-RCT) design was used to measure change in PA behaviour and health-related quality of life outcomes between groups. The T0 comparability of the different cohorts was first tested. Then baseline group means of cohort 4 (nine groups; n = 91), treated as 'control group by proxy', were compared with T2 group means after twelve months for cohorts 1 and 2 (four groups; n = 38), using an independent t-test. It was decided to compare group means using independent t-tests to take into account the interdependency of observations within PA groups. Cohort 3 was not included in this analysis since the measurements overlapped with measurements in cohorts 1 and 2.
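To make the group-means comparison concrete, the following minimal sketch computes a pooled-variance independent t statistic for four 'active' group means against nine 'control by proxy' group means, which yields the 11 degrees of freedom (4 + 9 - 2) that appear with the reported t values. The group means below are invented for illustration only, the function name `pooled_t_test` is hypothetical, and the use of the equal-variance form of the test is an assumption; the analysis itself was run in SPSS.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(sample_a, sample_b):
    """Independent two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). ASSUMPTION: equal variances in
    both sets of group means.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_value = (mean(sample_a) - mean(sample_b)) / standard_error
    return t_value, df


# HYPOTHETICAL group means of LOG LTPA (not data from the study).
active_t2_groups = [2.4, 2.6, 2.5, 2.7]                          # 4 groups
starter_t0_groups = [2.3, 2.5, 2.2, 2.6, 2.4, 2.5, 2.3, 2.4, 2.6]  # 9 groups

t, df = pooled_t_test(active_t2_groups, starter_t0_groups)
print(f"t({df}) = {t:.2f}")
```

Because each group contributes a single mean, the unit of analysis is the group rather than the participant, which is why the degrees of freedom are so much smaller than the number of respondents.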
Second, we compared participants who remained active in the CBHEPA programs with those who were no longer active ('program dropouts'). The hypothesis was: CBHEPA participants perform better on physical activity and health-related quality of life outcomes than participants who dropped out of the CBHEPA program (H2). The Mann-Whitney U test was used to compare PA levels and health-related quality of life outcomes.
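A minimal sketch of this comparison is given below: it computes the Mann-Whitney U statistic, its normal (z) approximation, and the effect size r = |z| / sqrt(N). The function names and the minutes-per-week values are hypothetical. The sketch assumes no continuity correction and no tie correction in the variance, so its z values may differ slightly from those produced by SPSS.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_z(group_x, group_y):
    """Return (U for group_x, z approximation, effect size r)."""
    n_x, n_y = len(group_x), len(group_y)
    ranks = average_ranks(list(group_x) + list(group_y))
    u_x = sum(ranks[:n_x]) - n_x * (n_x + 1) / 2
    expected_u = n_x * n_y / 2
    # ASSUMPTION: variance without tie correction.
    sd_u = sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    z = (u_x - expected_u) / sd_u
    r = abs(z) / sqrt(n_x + n_y)
    return u_x, z, r


# HYPOTHETICAL LTPA minutes/week (not data from the study).
dropouts = [180, 240, 300, 150, 270]
participants = [420, 300, 510, 360, 450, 390]
u, z, r = mann_whitney_z(dropouts, participants)
print(f"U = {u}, z = {z:.2f}, r = {r:.2f}")
```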
Third, since these types of analysis still did not provide deeper insight into the main question of whether CBHEPA programs contribute to an increase in and maintenance of physical activity in socially vulnerable groups over time, we developed an integrated multilevel model. The hypothesis was: Participation in a CBHEPA program leads to an increase in and maintenance of its participants' daily physical activity levels over time (H3). A longitudinal multilevel analysis was used to examine the growth model of PA levels over time. As a result of our data collection strategy, our dataset was characterised by intra-individual interdependencies in the repeated measurements, as well as inter-individual interdependencies in the group-wise measurements. Therefore, multilevel modelling was used because it is less sensitive to absence of normality in the data and lack of independent sampling of participants and observations. It takes into account group interdependencies, which are considered of importance for effectiveness in CBHEPA programs [44,62]. Another advantage of multilevel analysis of longitudinal data is its ability to handle missing data [63]. This includes the ability to handle models with varying measurement occasions [64,65]. Unlike fixed occasion models, for example MANOVA, multilevel regression models do not assume equal numbers of observations, or fixed measurement occasions, so respondents with missing observations pose no special problems, and all cases can remain in the analysis. This is an advantage, because larger samples increase the precision of the estimates and the power of the statistical tests [44]. To deal with missingness, we assumed the data to be missing at random (MAR), indicating that the missingness may depend on other variables in the model, and through these be correlated with the unobserved values [44].
For our data, three levels were defined: intrapersonal, estimating variance of repeated measurements within individuals; interpersonal, estimating variance of fixed factors between individuals; and group level, estimating variance between groups (Table 3). Leisure-time physical activity (LTPA) was used as primary outcome indicator, since the CBHEPA programs included in our study offered leisure-time PA schemes. We therefore assumed that LTPA was a more sensitive indicator for change than overall PA behaviour. Since LTPA was not normally distributed, we used a log transformed LTPA variable (LOG LTPA). Three-level regression models were developed to assess change over time in LTPA (minutes/week) (Fig 2).
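The log transformation can be sketched as follows. The source does not state the base of the logarithm or how zero minutes per week were handled, so both the base-10 logarithm and the offset of 1 used here are assumptions, and the function name `log_ltpa` is hypothetical.

```python
import math


def log_ltpa(minutes_per_week: float, offset: float = 1.0) -> float:
    """Log-transform LTPA minutes/week.

    ASSUMPTIONS: base-10 logarithm and an offset of 1 so that zero
    minutes maps to 0 instead of being undefined.
    """
    if minutes_per_week < 0:
        raise ValueError("minutes per week cannot be negative")
    return math.log10(minutes_per_week + offset)


# HYPOTHETICAL right-skewed LTPA values in minutes/week.
raw_minutes = [0, 60, 120, 355, 900, 2400]
transformed = [log_ltpa(m) for m in raw_minutes]
print([round(value, 2) for value in transformed])
```

The transformation compresses the long right tail of the minutes-per-week distribution, which is the usual motivation for modelling the logged outcome.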
Forward multilevel modelling was used [62], starting with a null model based on LOG LTPA as outcome indicator, time (repeated measurements) and program participation. Interaction terms for time and program participation were included. Then stepwise fixed factors, such as gender, age, ethnic origin, educational level and program participation time were included, as well as SoC (coping ability), followed by time varying covariates for health-related quality of life, BMI, PA self-efficacy and PA enjoyment. Model estimation was based on restricted maximum likelihood (REML). REML estimates the variance components after removing the fixed effects from the model. REML estimates have less bias than full maximum likelihood estimates, are more realistic and therefore thought to be more suitable when the number of groups is small [44]. As we were dealing with repeated measurements, we used the autoregressive structure (AR(1)) as first order covariance structure. For random effects, we used the scaled identity covariance structure [66]. The group level was defined as first level, since participants are nested within groups; the participants were defined as second level and the repeated measurements as third level. Parallel multilevel modelling procedures were conducted, taking into consideration two different indicators for health-related quality of life: one for perceived health (EQ-VAS) and one for self-reported levels of health problems (EQ-Index). An example of the syntax developed for multilevel modelling in SPSS 22 is presented in the supporting information (S1 Text).
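For orientation, the three-level growth model described above can be written schematically as follows. This is a sketch under assumptions, not the authors' exact specification (which is given as SPSS syntax in S1 Text): the indices t (measurement occasion), i (participant) and j (group), the coefficient symbols, the dummy coding of time, and the grouping of predictors into the vectors x (fixed factors) and z (time varying covariates) are all introduced here for illustration.

```latex
\begin{align*}
\mathrm{LOG\,LTPA}_{tij} ={}& \beta_0
  + \sum_{k=1}^{2} \beta_{1k}\, T_{k,t}
  + \beta_2\, D_{tij}
  + \sum_{k=1}^{2} \beta_{3k}\, \bigl(T_{k,t} \times D_{tij}\bigr) \\
 & + \boldsymbol{\gamma}^{\top} \mathbf{x}_{ij}
  + \boldsymbol{\delta}^{\top} \mathbf{z}_{tij}
  + v_{j} + u_{ij} + e_{tij}, \\[4pt]
v_{j} \sim{}& \mathcal{N}(0, \sigma_v^2), \qquad
u_{ij} \sim \mathcal{N}(0, \sigma_u^2), \qquad
\operatorname{Cov}\bigl(e_{tij}, e_{t'ij}\bigr) = \sigma_e^2\, \rho^{\,|t - t'|}.
\end{align*}
```

Here the T terms are dummies for the two follow-up occasions relative to baseline, D indicates program participation versus dropout, v and u are the random intercepts for groups and for participants nested within groups (scaled identity structure), and the last expression is the AR(1) structure for the repeated measurements.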
The authors declare that the study was conducted in accordance with general ethical guidelines for behavioural and social research in the Netherlands, peer-reviewed and approved by the review board of the Wageningen School of Social Sciences. Guarantees of anonymity were given prior to each round of data collection. Participants were able to withdraw from the study at any time for any reason.
Baseline sport and PA outcomes showed that mean overall PA level scored 1513 minutes/week (sd: 1094). Most time was spent on household PA, on average 778.6 minutes/week (sd: 848.3). Many participants (83.4%) were involved in LTPA (e.g., walking, cycling and gardening) at baseline, on average 355 minutes/week (sd: 473). Fewer participants (43.3%) were involved in sports, on average 70.8 minutes/week (sd: 140.4). The majority were not members of a sports club (75.9%). Prior to the baseline inquiry, over half of the participants (52.2%) had participated for less than three months in the CBHEPA program, 15.3% between three and six months, and 32.5% longer than six months. The majority (68.9%) participated once a week, 28.5% more than once a week and 2.6% less than once a week. Mean PA self-efficacy (scale 5-25; Cronbach's α = 0.70) scored relatively highly: 20.12 (sd: 3.97). Mean PA enjoyment (scale 9-45; Cronbach's α = 0.73) also scored relatively highly: 39.9 (sd: 6.1) (Table 4).
Individual motivations to join a CBHEPA program were mostly health and physical fitness, followed by sociability, value attribution to physical activity, enjoying physical activity and weight loss. Participants often reported more than one motivation (Fig 3).

Measuring effectiveness using a 'control group by proxy'
At baseline, no significant differences were found between cohorts 1 and 2 (four groups; n = 70) and cohort 4 (nine groups; n = 91) for gender, age, income, and low and moderate educational levels (z-approximation of Mann-Whitney U test). High educational levels were found significantly more often in groups of cohort 4 (z = 2.27, p = 0.024). For PA levels, no significant differences (t-test) were found between cohorts 1, 2 and 4 for baseline group means LOG LTPA (t(11) = -0.04, p = 0.97) and for group means (log transformed) total PA behaviour (t(11) = -0.42, p = 0.68) (Table 5). For health-related quality of life, no significant differences were found between cohorts 1, 2 and 4 in baseline group means for EQ-Index, EQ-VAS and BMI, indicating comparability in health-related conditions between the groups. Also, no significant differences were found between cohorts 1, 2 and 4 in baseline group means SoC scores and group means PA self-efficacy scores. For PA enjoyment, baseline group means scores were significantly lower in cohort 4 than in cohorts 1 and 2 (Table 5). The effect size (Cohen's d) was 1.5, indicating a large difference in self-reported PA enjoyment between the cohorts at baseline.
To measure the effectiveness of CBHEPA programs, the next step was to compare T2 group means (measured after twelve months) of cohorts 1 and 2 (four groups; n = 38) with baseline group means of cohort 4 (nine groups; n = 91) for PA and health-related quality of life outcomes (t-test). No significant differences were found between the 'active' and 'control group by proxy' for LOG LTPA (t(11) = 1.14, p = 0.28) and (log transformed) total PA (t(11) = -0.57, p = 0.58). Also, no significant differences were found for the health-related quality of life indicators EQ-Index, EQ-VAS, BMI and PA self-efficacy. For PA enjoyment, the T2 group means scores were significantly higher after twelve months among the 'active' participants than in the groups just starting (t(11) = -4.85, p = 0.001) (Table 5). The effect size (Cohen's d) was 2.9, nearly double the effect size at baseline, indicating a large effect. The dataset used for the group means comparison can be found in the supporting information (S1 Dataset).
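The reported effect size for PA enjoyment can be related to the t statistic with a standard conversion for two independent groups, d = |t| * sqrt(1/n1 + 1/n2). The sketch below applies it to the reported t value with four and nine groups. Whether the authors derived Cohen's d in this way is not stated in the text, so the conversion is an assumption and the function name `cohens_d_from_t` is hypothetical; the result is nonetheless consistent with the reported value of 2.9.

```python
from math import sqrt


def cohens_d_from_t(t_value: float, n_1: int, n_2: int) -> float:
    """Convert an independent-samples t statistic to Cohen's d.

    ASSUMPTION: pooled-variance t-test on n_1 and n_2 independent units
    (here: group means, not individual participants).
    """
    return abs(t_value) * sqrt(1 / n_1 + 1 / n_2)


# Values from the text: t(11) = -4.85, four 'active' and nine starting groups.
d_enjoyment = cohens_d_from_t(-4.85, 4, 9)
print(round(d_enjoyment, 1))  # 2.9
```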
We did not find evidence to support hypothesis (H1) that participation in a CBHEPA program for one year leads to higher physical activity levels and health-related quality of life among its participants compared to a starting control group. We did find, however, significant differences in PA enjoyment scores between groups in cohorts 1, 2 and 4 at baseline as well as at T2.

CBHEPA participants versus program dropouts
Over the course of six months, between group comparisons showed that program dropouts scored significantly lower for LTPA in minutes/week (z = 1.99, p = 0.047) and perceived health status (EQ-VAS; z = 2.88, p = 0.004). No between group differences were found for overall PA, EQ-Index, BMI and contact with care professionals (Table 6).
We did find evidence to support the hypothesis (H2) that CBHEPA participants performed better on physical activity and health-related quality of life outcomes than participants who dropped out of the CBHEPA program. The hypothesis (H2) was confirmed at T1 for perceived health and LTPA, and at T2 for LTPA and for variables relating to self-reported health complaints, BMI and care consumption. At T2 we also found significant differences for PA self-efficacy and PA enjoyment. For all but one of the indicators showing significant differences, effect sizes based on the z-scores (r) were small (r<0.20). PA enjoyment showed a medium effect size (r>0.30) (Table 6).

Increase in leisure-time physical activity over time
Tables 7 and 8 summarise the results of the three-level growth models for LTPA. Table 7 presents the results of the analysis of LOG LTPA as outcome variable with perceived health (EQ-VAS) as health-related quality of life indicator. Starting with the null model (M0), stepwise correction was made for gender, age, ethnic origin and low educational level. Age proved to be the only factor improving the fit of the model, based on a significant decrease in REML (not reported in the table), but this effect disappeared when the SES factors were clustered (M1). Participation time, i.e. how long people participated in the CBHEPA program prior to the evaluation study, significantly improved the fit of the model (M2). Findings relating to the fixed effects at intrapersonal level in all models showed no significant within-subject differences in LOG LTPA at the three points of measurement. Time in interaction with program dropout in the full growth model (M8) showed a significant decrease in LOG LTPA among program dropouts compared to participants (E = -0.426, p<0.050). After correction for SES variables, the change in LOG LTPA with perceived health showed a significant downward trend in the full growth model (M8) at T1 and T2 compared to baseline (F(2, 9.889), p<0.001). Differences between T1 and T2 were not significant.
Findings relating to the fixed effects at interpersonal level showed that women scored significantly lower at baseline on LOG LTPA (p<0.010) than men, but not in follow-up measurements. No significant differences were found between participants for age or ethnic origin. Findings relating to the full model (M8) for educational level suggested that LOG LTPA was significantly higher (p<0.050) among participants with higher educational levels, but that there was no significant difference in educational level between participants and program dropouts.
The time varying covariates in the successive models showed a significant improvement in the fit of the model at each step, except for SoC (M5), based on calculated differences in REML. This indicated that each covariate partly explained the variance in LOG LTPA. Perceived health (EQ-VAS) was significantly associated with higher levels of LOG LTPA in all models, whereas BMI and SoC were not. PA self-efficacy and PA enjoyment were also significantly associated with higher levels of LOG LTPA (p<0.050).
Findings relating to the fixed effects in the full model (M8) at group level showed that short CBHEPA programs (10-13 weeks) with multiple trainers, addressing gender-homogeneous groups, were significantly associated with lower LOG LTPA levels, whereas continuous CBHEPA programs with a single, known trainer, addressing gender-heterogeneous groups, were not. Calculated effect sizes (Cohen's d) for the different group types at the three points in time showed a medium effect at T0 (d = 0.51), and small effects at T1 (d = -0.12) and T2 (d = 0.07).
The variance of the intercepts between CBHEPA groups across the eight models was not significant, indicating that groups did not vary significantly in LTPA. The intercepts of participants (id) nested in PA groups, significant in the null model (M0), showed a gradual decline across the eight models. None of the included factors or covariates, however, significantly explained individual variance within groups (Table 7).
Table 8 presents the results of the parallel modelling of LOG LTPA as outcome variable with self-reported health complaints (EQ-Index) as health-related quality of life indicator. The estimation results for the models M0 to M2 were the same as reported in Table 7. Findings for modelling LOG LTPA and self-reported health complaints (EQ-Index) were similar to those for modelling LOG LTPA and perceived health (EQ-VAS). The full growth model (M8) for LOG LTPA with self-reported health complaints showed a significant downward trend at T1 and T2 compared to baseline (F(2, 11.206), p<0.001). Differences between T1 and T2 were not significant. The dataset used for the multilevel analysis of the growth model can be found in the supporting information (S2 Dataset).
Findings relating to the fixed effects at intrapersonal level in all models showed no significant within-subject differences in LOG LTPA at the three points of measurement. Time in interaction with program dropout in the full model (M8) showed a significant decrease in LOG LTPA in program dropouts compared to participants (E = -0.42, p< 0.050).
Findings relating to the fixed effects at interpersonal level showed that women scored significantly lower at baseline on LOG LTPA (p<0.010) than men, but not in follow-up measurements. No significant differences were found between participants for age or ethnic origin. Findings relating to the full model (M8) for differences in educational level suggested that LOG LTPA was significantly higher (p<0.050) among participants with higher educational levels, but that there was no significant difference in educational level between participants and program dropouts.
The time varying covariates in the successive models showed that lower scores on self-reported health complaints were significantly associated (p<0.050) with higher levels of LOG LTPA in all models, whereas BMI and SoC were not. PA self-efficacy and PA enjoyment were both significantly associated (p<0.050) with higher levels of LOG LTPA. SoC did, however, improve the fit of the model significantly (M5), indicating that SoC explained part of the variance in this model. Findings relating to the fixed effects in the full model (M8) at group level were similar to those for the model LOG LTPA with perceived health: short CBHEPA programs (10-13 weeks) with multiple trainers, addressing gender-homogeneous groups, were significantly associated with lower LOG LTPA levels, whereas continuous CBHEPA programs with a single, known trainer, addressing gender-heterogeneous groups, were not. The development of the intercepts of CBHEPA groups across the eight models was similar to the pattern reported for the modelling of LOG LTPA and perceived health described above, as were the values for effect sizes (Cohen's d) for the different group types at the three points in time.
In relation to the REML values in the parallel growth models for the two health-related quality of life indicators, the growth model for LOG LTPA with EQ-Index (REML = 475.34) showed a slightly better model fit than the growth model for LOG LTPA with EQ-VAS (REML = 483.53). It is possible that perceived health is more strongly correlated with the other factors and covariates included in the model, such as BMI, SoC, PA self-efficacy and PA enjoyment, than the EQ-Index is.
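The model comparisons above rest on differences in REML deviance values. The sketch below shows one common way such a difference can be evaluated, under stated assumptions: that a lower deviance indicates better fit, and that for two nested models differing by one parameter the difference is referred to a chi-square distribution with one degree of freedom. Whether the authors applied exactly this test is an assumption; the function names and the nested-model deviances (490.00 and 485.00) are hypothetical. Only the values 483.53 and 475.34 come from the text, and because those two parallel models are not nested, only their raw difference is computed.

```python
from math import erfc, sqrt


def deviance_difference(deviance_reference, deviance_alternative):
    """Positive result means the alternative model has the lower (better) deviance."""
    return deviance_reference - deviance_alternative


def p_value_one_df(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    if chi_square <= 0:
        return 1.0
    # For 1 df, the survival function equals erfc(sqrt(x / 2))
    return erfc(sqrt(chi_square / 2.0))


# Hypothetical nested comparison: adding one covariate lowers the deviance by 5.0
step_improvement = deviance_difference(490.00, 485.00)
step_p = p_value_one_df(step_improvement)
print(f"difference = {step_improvement:.2f}, p = {step_p:.3f}")

# Reported values for the two parallel (non-nested) growth models
parallel_gap = deviance_difference(483.53, 475.34)
print(f"EQ-VAS model minus EQ-Index model = {parallel_gap:.2f}")
```

The one-degree-of-freedom shortcut only applies when the compared models differ by a single parameter; a step that adds several parameters at once would need the chi-square distribution with the corresponding degrees of freedom.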
We did not find evidence to confirm the hypothesis (H3) that participation in a CBHEPA program leads to an increase in its participants' leisure-time physical activity levels over time. The positive association over time between health-related quality of life outcomes, physical activity self-efficacy and enjoyment, and leisure-time physical activity is, however, supported in the multilevel regression model.

Discussion
In order to evaluate the effectiveness of group-based CBHEPA programs, the aim of this study was to assess whether or not CBHEPA programs contribute to increasing and maintaining physical activity in socially vulnerable groups over time. Based on a combination of statistical analyses, our findings do not unequivocally support the proposition that participation in a CBHEPA program leads to an increase in overall PA levels (quasi-RCT) or an increase in leisure-time PA at participant level after twelve months, as was hypothesised. The multilevel models showed significant associations between individual factors, such as educational level and gender, and leisure-time PA. Participants with higher educational levels reported higher leisure-time PA. Women scored significantly lower at baseline than men, but the gender-related difference in PA was not found in follow-up measurements. No significant differences were found between participants for age or, somewhat surprisingly, for ethnic origin. Health-related quality of life, PA self-efficacy and PA enjoyment were intrapersonal time varying covariates, significantly associated with higher levels of physical activity. Short CBHEPA programs (10-13 weeks) with multiple trainers were group-related factors associated with lower leisure-time PA over time compared to on-going CBHEPA programs with a known, single trainer.
At twelve months, leisure-time PA levels of program dropouts were significantly lower compared to continuing participants, as were health-related quality of life, PA self-efficacy, and PA enjoyment outcomes. BMI and care consumption also scored significantly higher among dropouts. On the basis of our findings, it seems that intrapersonal time varying covariates are more relevant in explaining PA maintenance than interpersonal characteristics (e.g., gender, age or ethnic origin) or group level characteristics.

Population reached
A first aspect relating to CBHEPA program effectiveness is whether or not the intended target population is reached. Socio-economic baseline data show that, compared to Dutch population data, CBHEPA program participants more often have low educational levels (48.6%), low income (52.4%) and low employment rates (11%). Statistics Netherlands shows that 27% of the general population has a low level of education (no, or only primary, school), 10% have low income, and over 90% are employed [67][68][69]. Likewise, health-related quality of life indicators at baseline are lower than comparative research outcomes in Dutch population groups [58], and participants show a weaker SoC compared to other Dutch studies [70]. With an average BMI of 29.5 found in CBHEPA participants, the majority of the target group are overweight or obese. BMI data for the general population show 30% overweight (BMI 25-30) and 14% obesity (BMI>30) for women, and 47% overweight and 13% obesity for men [71]. BMI values require, however, a nuanced perspective since 32% of the CBHEPA participants are older than 65 years and over 60% are of non-Dutch origin, including a substantial number of participants from Asian backgrounds. The literature indicates that BMI is less appropriate as a measure of overweight in older and/or Asian population groups [72][73][74]. In terms of socio-economic and health-related quality of life outcomes at baseline, CBHEPA programs reach the intended target group (Table 9).
Overall PA levels, at an average of 216 minutes per day, are not low compared to Dutch trend analyses on sport and PA (Table 9). The latest trend report describes an increase from 169 to 202 minutes for Dutch adults (age 15-64) spent in PA during 2000-2011, mainly resulting from an increase in light and moderate intensity activities (in particular activities at work/school and at home). For older people (age 65 plus), there was an increase in PA from 100 to 130 minutes [5]. Our findings indicate that more than half of younger CBHEPA participants (<65 years) were less active compared to the age-specific Dutch reference value (202 min/day) at all measurement points, whereas a majority of older CBHEPA participants (≥65 years) were more active compared to the age-specific Dutch reference value (130 min/day). These results suggest that CBHEPA programs reach both relatively inactive and active people. In terms of physical activity, it seems that, compared to the reference physical activity levels for adults, CBHEPA programs reach more inactive younger people (<65 years) than inactive older people (≥65 years).

Increase in PA levels over time?
A second aspect regarding CBHEPA program effectiveness is whether or not CBHEPA programs contribute to increasing and maintaining physical activity in socially vulnerable groups over time. Our findings do not show an increase over time. What is more, a significant decrease compared to baseline was observed. An American longitudinal multilevel study on community-based PA (neighbourhood walking) similarly reported a downward trend in PA over time [75]. There are several possible explanations for our findings. First, for practical reasons of recruitment, participants were included at baseline only after the start of a CBHEPA program. Some programs had already existed for a number of years. At baseline, half of the participants had been active in the program for three months or more, resulting in the absence of genuine baseline data for PA and health-related quality of life.
Second, all data were assessed with self-report measures. For measuring PA, this is considered less reliable than an objective measure like an accelerometer [76]. We did not, however, find validated objective measurement instruments that were suitable for our target group and interpretable without additional self-report measures such as those collected with SQUASH. Self-report measures may also induce a question-behaviour effect: asking questions about a behaviour may change the behaviour in question [77,78]. This usually leads to bias in a socially normative direction. During the repeated measurements, participants may also have become more experienced in answering the questions and at the same time may have developed a more realistic perspective on their own PA behaviour and health-related quality of life. A meta-analysis, though, found the question-behaviour effect on health-related behaviour to be rather small [79].
Third, the absence of an expected increase in leisure-time PA can be explained from a time allocation perspective. People tend to allocate only a certain amount of time daily to leisure time activities in general, and to PA or sport more particularly. This perspective is elaborated in the SLOTH model, a time-budget model incorporating Sleep, Leisure, Occupation, Transportation and Home-based activities, which identifies possible economic factors of influence on individuals' choices about utilisation of time in relation to PA behaviour and maintenance [80,81].

PA maintenance in participants and program dropouts
Comparison of the multilevel models for the two health-related quality of life indicators reveals that perceived health (EQ-VAS) is possibly more strongly correlated with other factors explaining leisure-time PA, such as BMI, SoC, PA self-efficacy and PA enjoyment, than self-reported health complaints (EQ-Index). Both models, however, offer solid indications that PA maintenance is strongly related to health-related quality of life on the one hand, and PA self-efficacy and PA enjoyment on the other. These findings are in line with other studies showing evidence for the interrelatedness of health and PA behaviour [8] and the role of (post) motivational factors in PA maintenance [29,35,36].
Our findings indicate that leisure-time PA, health-related quality of life indicators, BMI, PA self-efficacy, and PA enjoyment score worse among program dropouts. One explanation is that health impairments are the main reason given by participants for quitting the program. Dutch CBHEPA programs targeting socially vulnerable groups may, therefore, need to focus on actions to prevent lapses resulting from health complaints, and help people cope with risk situations for lapses, thus reinforcing program adherence and PA maintenance [27,82].

Group level characteristics
Our findings show that group effects do have an impact on (leisure-time) PA behaviour and maintenance. Short CBHEPA programs (10-13 weeks) with multiple trainers, addressing gender-homogeneous groups, were significantly associated with lower leisure-time PA levels than on-going CBHEPA programs with a single, known trainer, addressing gender-heterogeneous groups. The observed decline in effect sizes over time may result from participants of short-term programs being less well represented in the follow-up measurements. The findings from this quantitative multilevel study are, however, supported by several qualitative studies on group effects, indicating that group dynamics, group composition and social support, and exercise trainer characteristics contribute substantially to effective PA programs [38,39,83,84].

Methodological issues
Our findings should be interpreted in the context of several strengths and limitations. A first strength of our study is that we evaluated on-going field practice, rather than conducting an experimental setup, to investigate the determinants of PA behaviour and maintenance in socially vulnerable groups. Creating controlled experimental conditions is of limited value in contributing substantially to the (practice-based) body of evidence needed to understand what works for whom in CBHEPA programs [45,85,86]. For example, the use of adequate control groups can be problematic, since matching for non-observable differences, such as initial motivation, is not easily done. Therefore, our study locked onto natural experiments (the CBHEPA programs) by design. Natural experiments have an important contribution to make to the health and PA inequalities agenda, including assessment of effective interventions, an area which is acknowledged as lacking an evidence base [87]. In our experience, the sequential cohort design, in which the intervention effects are measured repeatedly using the T0 measurements as point of reference, proves a feasible approach. In addition, it offers the possibility to compare between cohorts, i.e. in our case between program adherents and starters [44].
A second strength is the use of multilevel modelling in this study to monitor physical activity development over time in socially vulnerable groups. Multilevel analysis and repeated measurements are not often used to assess CBHEPA program effectiveness, and our use of these techniques adds to the commonly used individual-level research design paradigm [25,75]. The inclusion of intra-individual factors (covariates), as well as inter-individual and group-level factors contributes to the strength of the study.
A third strength is the longitudinal nature of the study, addressing a critical need for data on patterns of PA behaviour and maintenance and how these may change over time. As some researchers indicate, a multilevel perspective allows researchers to identify significant and potentially modifiable factors, and this in turn can inform policy changes and facilitate the design of interventions to change health and PA behaviour at societal level [25,88].
Limitations of our study relate first to the limited number of determinants of PA behaviour in socially vulnerable groups that were included in our data collection. Given our target group, we were challenged to balance our information needs and the target group's responsive capacity and competences. Questionnaire use can be difficult in socially vulnerable groups. Lack of health literacy, lack of basic skills in reading and writing and different beliefs about health concepts across cultures may lead to difficulties in understanding and interpreting the questions [47,89], eventually leading to non-response [88]. Alternatives, however, such as translations, working with images or digital devices, suffer similar limitations [90,91]. During our study, we did experience a number of these barriers in data collection. Steps were taken to deal with response difficulties by limiting the number of questions, reducing the number of indicators, or choosing restricted scales, such as the SoC three-item instead of the SoC thirteen-item instrument [51]. This forced us to limit ourselves to collecting information about the most important explanatory factors for PA behaviour and maintenance found in CBHEPA programs, such as health-related quality of life, PA self-efficacy and PA enjoyment. Using a personalised data collection strategy [47], advocated by CBHEPA professionals and practitioners, was successful in reaching and including a satisfactory number of participants. We cannot, however, rule out the possibility that other contextual influences (e.g., family situation, community or neighbourhood), not included in our study, may also have been important in explaining PA behaviour and maintenance. In particular, neighbourhood factors have been found to play a significant role in PA and other health behaviours [92].
A second limitation relates to the validity of the standardised instruments compiled in our questionnaire, when using them in our target group. The SQUASH instrument in particular was perceived as complicated by participants, because of its number of items and the seven-day recall structure. Moreover, participants had (to be able) to reflect on their PA behaviour and make time calculations. To tackle this issue, we monitored the data collection procedure closely throughout our study by making observational notes, and by reviewing the forms for missing items, illegible handwriting, inadequate answers and logical inconsistencies among responses after each data collection session. Errors thus identified were resolved by checking back with the participant, the trainer or the assistant [93].
A third limitation of our study relates to potential sources of bias. Recruitment of participants, done in collaboration with practice and on a voluntary basis, may have suffered from a selection bias. Only people willing to participate were included. This recruitment approach also resulted in a lack of genuine baseline data, since the researcher could not contact participants before PA groups had started. Similarly, in comparing participants and program dropouts, a selection bias may have played a role, as we relied on people willing to fill out questionnaires after having quit the CBHEPA program.
The survey settings, usually the PA group setting at the sports venue, may have influenced people's responses. Using the sports venue as a common factor throughout the study has, however, contributed to minimising this bias. In addition, using multilevel analysis helped to correct for possible interdependencies in responses within groups.

Future research
Over the past decade, the ecological perspective has gained ground as a new paradigm in research on PA behaviour and maintenance [19,94-96]. It is to be expected that this will lead to more transdisciplinary research [97] and the use of hierarchical data structures and multilevel statistical procedures [25,75,88]. What our study shows is that studying socially vulnerable groups from the perspective of PA and health inequalities, applying multilevel modelling, still requires social concepts to be highly abstracted in order to make them measurable and interpretable. Concise, interpretative mixed-method research, combining quantitative and qualitative research data in one study, could help identify the contextualised explanatory factors for particular groups in more detail, hence improving the accuracy of statistical procedures [98].

Conclusion
Dutch CBHEPA programs reach relatively socially vulnerable, but not necessarily inactive, groups, in terms of socio-economic and health-related quality of life outcomes. No increase in leisure-time physical activity behaviour could be observed over time, but health-related quality of life, self-efficacy and enjoyment were found to contribute to physical activity maintenance. A decrease became manifest in physical activity as well as in health-related quality of life outcomes among dropouts. Our findings suggest that CBHEPA programs contribute to physical activity maintenance in socially vulnerable groups. These programs should, therefore, be valued for their potential in encouraging program adherence, rather than being made accountable for increasing physical activity.