Validation of the SQUASH Physical Activity Questionnaire in a Multi-Ethnic Population: The HELIUS Study

Purpose: To investigate the reliability and validity of the SQUASH physical activity (PA) questionnaire in a multi-ethnic population living in the Netherlands.
Methods: We included participants from the HELIUS study, a population-based cohort study. In this study we included Dutch (n = 114), Turkish (n = 88), Moroccan (n = 74), South-Asian Surinamese (n = 98) and African Surinamese (n = 91) adults, aged 18–70 years. The SQUASH was self-administered twice to assess test-re-test reliability (mean interval 6–7 weeks) and participants wore an accelerometer and heart rate monitor (Actiheart) to enable assessment of construct validity.
Results: We observed low test-re-test reliability; intraclass correlation coefficients ranged from low (0.05 for moderate/high intensity PA in African Surinamese women) to acceptable (0.78 for light intensity PA in Moroccan women). The discrepancy between self-reported and measured PA differed on the basis of the intensity of activity: self-reported light intensity PA was lower than measured but self-reported moderate/high intensity PA was higher than measured, with wide limits of agreement. The discrepancy between questionnaire and Actiheart measures of moderate intensity PA did not differ between ethnic minority and Dutch participants after correction for relevant confounders. Additionally, the SQUASH overestimated the number of participants meeting the Dutch PA norm; Cohen's kappas for the agreement were poor, the highest being 0.30 in Dutch women.
Conclusion: We found considerable variation in the test-re-test reliability and validity of self-reported PA with no consistency based on ethnic origin. Our findings imply that the SQUASH does not provide a valid basis for comparison of PA between ethnic groups.


Introduction
Physical activity is an important determinant of health [1]. In the Netherlands recent figures indicate that 61% of the population meets the Dutch healthy Physical Activity (PA) guideline of 30 minutes moderate intensity PA daily [2]. However, in ethnic minority and migrant groups this percentage is much lower, 45%, 49% and 56% in Turkish, Moroccan and Surinamese respectively [3]. Hence, low PA may contribute to the disparities in health observed among these populations.
Physical activity levels are often evaluated on the basis of self-report [4]. Questionnaires are the preferred method for assessing PA in large-scale monitoring and observational studies due to their low cost compared to accelerometers or other objective measures and because they provide detailed estimates of the contexts and domains where PA takes place. However, they are prone to measurement error and bias due to misreporting, either deliberately (due to social desirability) or because of difficulties with remembering and estimating the frequency and duration of different activities [5,6]. In a multi-ethnic population additional bias may arise if particular constructs are interpreted differently by different groups, or if certain groups are more likely to provide socially desirable answers [6][7][8]. Bias may also arise due to the use of items that are more suited to one group's language and/or cultural experiences, for example questions about cycling. Additionally, participants with low PA levels (as is presumably the case among ethnic minorities) may have less structured PA patterns and thus have more difficulty with recalling activities [9].
The SQUASH (Short QUestionnaire to ASsess Health enhancing physical activity) is a commonly used instrument in the Netherlands to assess PA, particularly in surveillance studies [3]. It was developed by the Dutch National Institute of Public Health and the Environment (RIVM) to measure PA with respect to occupation, leisure time, household, transportation means, and other daily activities. The SQUASH was designed to give an indication of the habitual activity level and was structured in such a way that it would be possible to assess compliance with physical activity guidelines [10]. While it has been shown to be valid in measuring PA among the Dutch population [10,11], its reliability and validity among ethnic minority groups have not been established.
We aimed to evaluate the test-re-test reliability and construct validity of the SQUASH questionnaire in measuring self-reported habitual PA in a multi-ethnic population using a combined accelerometer and heart rate measurement device (Actiheart). We expected that the SQUASH would be relatively reliable (i.e. would perform consistently) in all groups. However, as the SQUASH was designed for the Dutch population, we expected that construct validity may be lower in the ethnic minorities.

Study population
We included participants of the baseline examination of the HELIUS study (Healthy Life in an Urban Setting), a multi-ethnic cohort study that included adults aged 18-70 years in Amsterdam, the Netherlands [12]. The HELIUS study protocols were approved by the AMC Ethical Review Board, and all participants provided written informed consent. The SQUASH validation study aimed to include 100 persons (50 men and 50 women) per ethnic group: African Surinamese, South Asian Surinamese, Moroccan, Turkish and Dutch origin. This was based on the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) criteria, which indicate that for a validation study a sample size of 30-49 participants is considered "fair," while 50-99 participants is considered "good" [13]. Of 1077 persons invited to participate, 151 could not be contacted for an appointment, 92 were not eligible for the study and 356 refused to participate. Of the remaining 478 participants, 4 did not wear the Actiheart for the full five days, 8 had either 'bad' or incomplete data and 4 had an 'other' ethnic origin. Thus 462 participants were included in the analysis of validity. Of these, 119 persons did not complete the SQUASH a second time, resulting in 343 participants in the reliability study.

Study procedure
The study took place from November 2012 to November 2013. HELIUS participants who indicated willingness to participate in sub-studies as part of the informed consent procedure were sent an invitation letter and brochure explaining the aims of the study and study procedures, followed by a telephone call within 2 weeks. During this call, individuals were screened for eligibility. Exclusion criteria were: 1) inability to complete the calibration of the Actiheart, i.e. chest pain upon exertion or difficulties in climbing stairs due to cardiac reasons; 2) use of medication that might interfere with heart rate measurements (β-blockers); 3) pregnancy.
Participants were invited to the study location for the placement of the Actiheart and were sent the SQUASH questionnaire for completion prior to their appointment. On the day of their appointment, participants were requested to avoid ingesting a heavy meal and to abstain from coffee, alcohol and cigarettes for 3 hours, as these substances may influence heart rate measurements and thus, the calibration study. During the visit to the study location we calibrated the Actiheart (see below for description of this procedure) and set it up for long-term recording. Participants were asked to wear the Actiheart continuously for a five day period during free-living (including at least one weekend day), after which they returned it to the research location.
Participants also received a second copy of the SQUASH with instructions to complete it four weeks after their appointment. Postage-paid envelopes were provided for the return of the questionnaire. We provided travel compensation in the form of a gift card and, upon receipt of the second copy of the SQUASH, participants received personalised feedback based on the Actiheart measurements.

Self-reported physical activity (SQUASH)
Physical activity was measured using the SQUASH questionnaire [14]. The SQUASH includes questions on multiple activities referring to a 'normal' week in recent months: active commuting (walking, cycling), physical activity at work or school, household activities, and leisure time activities (sports, walking, gardening, cycling). Respondents were asked to fill in how many days a week they engaged in the activities, the average time per day spent on each activity (hours and minutes) and the intensity at which they did the activity (low, moderate, high). Missing and unlikely values were handled according to standardised methodology described in the user manual that accompanies the SQUASH, as described by Wendel-Vos et al. [14]. Briefly, results from the SQUASH were converted to minutes per week spent in light and moderate/high intensity activities based on age-specific Metabolic Equivalent Tasks (METs) derived from Ainsworth's compendium of physical activity [15]. The Dutch PA norm (NNGB) includes different cut-points for moderate- and high-intensity activities for adults younger than 55 years (4.0 and 6.5 METs, respectively) and adults aged 55 years and older (3.0 and 5.0 METs, respectively) [2]. We categorized participants according to whether they met the NNGB by summing up the number of days per week for each moderate and high intensity activity lasting at least 30 minutes. A minimum of 5 days resulted in participants being categorized as achieving the NNGB.
The questionnaire was available in Dutch and was also translated to Turkish (using forward and back-translation techniques). Surinamese people generally speak the Dutch language fluently while the main language used among Moroccan people in the Netherlands is Berber, which is not a written language; thus the SQUASH was not translated for these two groups. Based on previous experience we expected that at least 70% of Moroccan participants could complete the SQUASH in Dutch. We aimed to test the validity of the questionnaire as it is intended for use, i.e. self-completed, and therefore did not offer help with filling it in. However, it is possible that some participants received help from family members.
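The NNGB categorization described above can be illustrated with a minimal sketch. It is not the official SQUASH scoring syntax: the function names and the input layout (one record per reported activity, with an already assigned MET value) are assumptions made for illustration, and the handling of missing or unlikely values from the user manual is omitted.

```python
def moderate_cutoff_mets(age_years):
    """Lower MET bound for moderate intensity: 4.0 below age 55, 3.0 from age 55."""
    return 4.0 if age_years < 55 else 3.0


def meets_nngb_squash(activities, age_years):
    """Return True if the reported activities add up to at least 5 qualifying days.

    activities: list of dicts with keys 'mets', 'days_per_week' and
    'minutes_per_day' (hypothetical layout, one dict per reported activity).
    """
    cutoff = moderate_cutoff_mets(age_years)
    # Sum days per week over activities of at least moderate intensity
    # that last at least 30 minutes per day.
    qualifying_days = sum(
        a["days_per_week"]
        for a in activities
        if a["mets"] >= cutoff and a["minutes_per_day"] >= 30
    )
    return qualifying_days >= 5


example = [
    {"mets": 4.5, "days_per_week": 3, "minutes_per_day": 40},
    {"mets": 6.0, "days_per_week": 2, "minutes_per_day": 30},
    {"mets": 2.5, "days_per_week": 7, "minutes_per_day": 60},  # light, ignored
]
print(meets_nngb_squash(example, age_years=40))
```

Because the cut-point depends on age, the same activity (for example 3.5 METs on 5 days) counts towards the norm for a 60-year-old but not for a 40-year-old.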

Objectively measured physical activity
Objectively measured physical activity was assessed using Actiheart devices (version 4, CamNtech Ltd, UK). The Actiheart is a chest-worn monitoring device that records heart rate and physical activity in order to measure activity energy expenditure. Brage et al [16] reported that Actiheart measures of movement and heart rate generally agreed well with criterion measures of acceleration and heart rate: the linear relationship between movement and acceleration was strong (R² = 0.99, P<0.001) and correlations with activity intensity were high (R² > 0.84, P<0.001). During the visit to the study location the Actiheart was individually calibrated for each participant using a sub-maximal linear step test. Details of this step test have previously been described [16] but briefly: the step test consists of 8 min of stepping up and down a 200 mm high step, starting with one foot placement per second for the first minute, steadily increasing and followed by a 2 minute recovery phase. Upon completion of the step test ECG signals from the Actiheart were downloaded into the manufacturer's software. The Actiheart was placed in the position with the best ECG signal (established prior to conducting the step test) and participants were instructed to wear the device for five full days, including the day of placement.
Upon return of the Actiheart the long-term data were downloaded and processed in the manufacturer's software. This involved an initial cleaning step, followed by a branched equation algorithm to translate the accelerometer and heart rate data into an estimate of minutes per day spent in several MET-categories [17]. We included data from 4 x 24 hour days per participant; the day that the Actiheart was placed was not included as it did not cover a full 24 hour period. Incomplete days and data of participants that did not wear the Actiheart for 4 days were excluded from further analysis (see method section for specification of numbers excluded).
For each complete day measured, the minutes of moderate- and high-intensity PA were determined by using age adjusted MET values as described in the questionnaire section above. Light intensity PA was calculated as all activity from 1.5 to below 4.0 METs for the younger age group, and from 1.5 to below 3.0 METs for the older age group. In order to compare Actiheart outcomes (minutes per day) with the SQUASH, which derives minutes per week, we calculated the average minutes per day of the days recorded and multiplied this by 7. The NNGB was calculated as follows: if the amount of time spent in moderate and high intensity PA was at least 30 minutes on a single day then those days were summed for determining the NNGB. Participants were categorized as achieving the NNGB when the number of such days was at least 5/7 of the total number of days measured (i.e. 3 of the 4 days in this study as we only included 4 full days of Actiheart measurements).
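The two conversions in this paragraph can be sketched as follows. The function names are hypothetical, and rounding the 5/7 threshold up to the next whole day is an assumption: the text only states the resulting value for four measured days (3 of 4).

```python
def weekly_minutes(daily_minutes):
    """Scale the mean of the recorded days (min/day) to minutes per week."""
    return sum(daily_minutes) / len(daily_minutes) * 7


def meets_nngb_actiheart(daily_mvpa_minutes):
    """True if enough measured days contain at least 30 min of moderate/high PA."""
    n_days = len(daily_mvpa_minutes)
    active_days = sum(1 for minutes in daily_mvpa_minutes if minutes >= 30)
    # Integer ceiling of 5/7 * n_days (assumed rounding); 4 days -> 3 days.
    required_days = (5 * n_days + 6) // 7
    return active_days >= required_days


four_days = [35, 40, 10, 30]  # hypothetical min/day of moderate/high intensity PA
print(weekly_minutes(four_days))
print(meets_nngb_actiheart(four_days))
```

Integer arithmetic is used for the threshold so that a full week of data gives exactly 5 required days without floating-point rounding effects.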

Other variables measured
Potential covariates (Ethnicity, migration generation level, BMI, waist circumference and educational level) included in the analyses were obtained from the HELIUS study baseline data, as described by Stronks et al [12].
A participant was considered as being of non-Dutch ethnic origin if at least one of their parents was born abroad. Additionally, Surinamese subgroups (South-Asian or African) were classified according to self-reported ethnic origin. Generation status was defined on the basis of birthplace. Those born outside the Netherlands and having at least one parent born outside the Netherlands were considered first generation migrants; those born in the Netherlands and having one or both parents born abroad were considered second generation [18]. Trained research assistants measured weight and height in duplicate in barefoot subjects wearing light clothes only. Waist circumference was measured using a tape measure at the level midway between the lowest rib margin and the iliac crest. Body mass index (BMI) was calculated as weight (kg) divided by height squared (m²) [19]. Participants' highest completed education at the time of filling in the HELIUS questionnaire was categorized as "never been to school or elementary schooling only", "lower vocational schooling or lower secondary schooling", "intermediate vocational schooling or intermediate/higher secondary schooling (general)", and "higher vocational schooling or university".
Social desirability was examined using the Socially Desirable Response Set (SDRS) from Hays et al [20]. The SDRS consists of a set of five propositions, scored using a 5-point Likert scale (e.g. I am always courteous even to people who are disagreeable). Participants receive 1 point per question for the most extreme score, with questions 1 and 5 scored 'opposite' to the other 3 questions. Thus scores range from 0 to 5 points, with 5 representing a high tendency to provide socially desirable answers.
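As an illustration of this scoring rule, the sketch below awards one point per item answered at its extreme. Which end of the Likert scale counts as the extreme depends on how the response options are coded; here it is assumed, purely for illustration, that items 2 to 4 score at the value 5 and items 1 and 5 at the value 1. The function name and input layout are hypothetical.

```python
# Items scored in the opposite direction to the other three (from the text).
REVERSED_ITEMS = {1, 5}


def sdrs_score(responses, extreme_high=5, extreme_low=1):
    """Count items answered at their extreme category (score range 0 to 5).

    responses: dict mapping item number (1-5) to the Likert answer (1-5).
    The coding direction (extreme_high / extreme_low) is an assumption.
    """
    score = 0
    for item, answer in responses.items():
        target = extreme_low if item in REVERSED_ITEMS else extreme_high
        if answer == target:
            score += 1
    return score


print(sdrs_score({1: 1, 2: 5, 3: 5, 4: 5, 5: 1}))  # every item at its extreme
```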

Analysis
Test-re-test reliability. Participants filled in the SQUASH twice, with an intended interim period of 4 weeks. Only participants who filled in the second questionnaire within the required time (maximum 12 weeks after the first) were included. We expected that this period was long enough to ensure that participants could not recall what they filled in the first time and short enough to prevent large changes in physical activity levels. For continuous measures (min/week engaged in light and moderate/high intensity physical activity) we calculated Intraclass Correlation Coefficients (ICC) using a two-way random effects model for absolute agreement. ICCs higher than 0.7 were considered strong. We expected reliability coefficients to be in the range of 0.6-0.7, as was found in a recent review of reliability of PA questionnaires [5]. Additionally we calculated the 95% Limits of Agreement between subjective and objective PA as a parameter of measurement error [21].
Construct validity. Spearman's correlations were calculated to compare the minutes per week spent in light and moderate/high intensity physical activity as measured by the Actiheart and the SQUASH (filled in at the first time point). Validity coefficients were expected to be higher than 0.5 [22]. As the SQUASH is designed with the Dutch population in mind, we expected that the validity coefficients would be lower among the ethnic minority groups. We used the Bland-Altman method to evaluate bias; 95% limits of agreement were used for describing the total error between the two methods. The Actiheart values were regarded as the 'standard' against which the difference between the two measures was compared [21]. Differences at the individual level between the two measures of moderate to high intensity PA should in theory be within range of the NNGB, which specifies that persons are sufficiently active if they engage in moderate to high intensity PA for at least 30 minutes per day, five days per week.
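For readers unfamiliar with the ICC variant named above, the following sketch computes a two-way random effects, absolute agreement ICC from its ANOVA mean squares. The analysis itself was run in SPSS; this NumPy version is only an illustration, and the choice of the single-measures form is an assumption, since the text does not state whether single or average measures were used.

```python
import numpy as np


def icc_absolute_agreement(scores):
    """Two-way random effects, absolute agreement, single-measures ICC.

    scores: array of shape (n_participants, k_administrations), e.g. the
    minutes/week reported at the first and second SQUASH administration.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)  # between participants
    ss_cols = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)  # between administrations
    ss_error = np.sum((x - grand_mean) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical minutes/week at the two administrations for five participants.
example = [[120, 150], [300, 280], [60, 200], [450, 400], [0, 90]]
print(round(icc_absolute_agreement(example), 2))
```

Unlike a consistency ICC, this form penalises a systematic shift between administrations, such as higher reported PA at the second time point.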
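The Bland-Altman quantities used here (mean bias and 95% limits of agreement) can be sketched in a few lines. The sign convention, self-report minus reference, is an assumption chosen so that a positive bias means over-reporting; the data below are invented for illustration.

```python
import numpy as np


def bland_altman(self_report, reference):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of (self_report - reference); the 95% limits of
    agreement are bias +/- 1.96 * SD of the differences.
    """
    diff = np.asarray(self_report, dtype=float) - np.asarray(reference, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical minutes/week: questionnaire versus device for three participants.
bias, lower, upper = bland_altman([100, 200, 300], [90, 210, 280])
print(round(bias, 1), round(lower, 1), round(upper, 1))
```

Wide limits relative to the quantity of interest (here, the 30 minutes per day of the NNGB) indicate poor agreement at the individual level even when the mean bias is small.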
Ethnic differences. Ethnic differences between objective (Actiheart) and subjective (SQUASH) measures were studied using linear regression with the Dutch group as reference accounting for determinants of PA (education level and BMI) as well as for relevant confounders (age and social desirability).
Analysis was conducted with IBM SPSS 20.

Results
The characteristics of participants are shown in Table 1. The average age (SD) of participants ranged from 40.2 (9.6) years among Turkish women to 53.0 (6.5) years among Dutch origin men. Among the ethnic minority groups, the majority were first generation migrants (78 to 97%). Body weight varied amongst the different sub-groups; the highest mean BMI amongst men was observed in Turkish participants, while amongst women this was observed in African Surinamese (28.5 (3.9) and 30.7 (7.0) respectively). Educational level also differed per ethnic group, with Dutch origin participants having completed a higher level of education than the ethnic minority groups; 62-66% had a high education level compared to 15% among Turkish men and 17% among Moroccan women. The ethnic minority groups scored higher on social desirability (range 1.3-2.0 out of 5 points) compared to Dutch origin participants who scored between 0.7-1.0. The response rate differed between ethnic groups, and was lowest in the Turkish women (26%) and Moroccan men (31%). Information from the HELIUS study enabled us to compare responders and non-responders on main characteristics (full data not shown). Statistically significant differences between responders and non-responders per ethnic group were as follows: among Dutch, participants had a lower BMI than non-responders. Among South Asian Surinamese and Turkish, more women participated than men. Among African-origin Surinamese and Moroccan more first generation persons participated. No other differences between groups were observed.
Table 2 presents the results of the test-re-test reliability of the SQUASH. The average test-re-test time span was 6.5 weeks. Reliability varied greatly with no consistent patterns on the basis of ethnicity or the outcome variable used. The ICCs for continuous measures ranged from extremely low (0.05 for moderate/high intensity PA in African Surinamese women) to acceptable (0.78 for light intensity PA in Moroccan women, 0.64 for moderate/high intensity PA in Dutch women). Participants reported higher PA at the second time point for all continuous variables, as is reflected in the negative mean differences in the tables; Moroccan men were the only exception, reporting lower light intensity PA at the second time point. The limits of agreement between the measures at the two time points were wide for all groups. For meeting the norm for PA (NNGB) Cohen's kappa ranged from poor (0.01 among South Asian Surinamese men) to, at best, moderate (0.38 in African Surinamese men).
Construct validity for light and moderate/high intensity PA is presented in Table 3. Objectively measured light intensity PA was on average higher than what was reported using the SQUASH in all groups. At the population level, mean differences between the two measures of light intensity PA ranged from 1053 minutes/week (150 min/day) in Dutch men to 3144 minutes/week (449 min/day) in Turkish women. The 95% limits of agreement were wide (ranging from ±2930 to ±4215 min/week). Bland-Altman plots indicate that differences between the measures were larger with higher duration of objectively measured activity (see Figs A and B in S1 File "Bland Altman Plots for Light Intensity Physical Activity" in men and women respectively). Objectively measured moderate/high intensity PA was lower than was reported using the SQUASH. At the population level, mean differences between the two measures were, amongst Moroccan women (mean difference 251 minutes/week, 36 min/day) and Dutch men (mean difference 68 minutes/week, 10 min/day). African Surinamese women were the only exception (mean difference 8 minutes/week, 1 min/day). Although the differences between the Actiheart and SQUASH measures of moderate/high intensity PA were smaller than they were for light intensity PA, wide limits of agreement in all groups indicate poor agreement between SQUASH and Actiheart measures at the individual level. Finally, Spearman's correlations between the Actiheart and SQUASH measures of PA varied per group and were smaller than expected, ranging from 0.08 for moderate/high intensity PA in Turkish men to 0.41 for moderate/high intensity PA in Dutch women.
Table 4 shows the results of the analysis of the ethnic differences in the discrepancy between the objective and subjective measures of light intensity and moderate/high intensity PA. Among women, differences with the Dutch reference group in light intensity PA seem to be explained by differences in educational level, except in Turkish women, where the differences persisted even after additional adjustment for BMI and social desirability (Beta 1135 (95%CI 325, 1945)). Differences in moderate and high intensity PA were only statistically significant in Moroccan women. Among men, significant differences with Dutch men were observed in light intensity PA and these persisted after full adjustment, with the exception of South Asian Surinamese men (Beta 464 (95%CI -311, 1239)).
Table 4. Ethnic differences in the discrepancy between objective (Actiheart) and subjective (SQUASH questionnaire) measures of Physical Activity in minutes per week.
Finally, Table 5 shows the level of agreement between the Actiheart and the SQUASH in categorizing people on the basis of fulfilling the NNGB. As with other outcome measures, there were no consistencies in findings on the basis of ethnicity. With the exception of Moroccan women, the SQUASH categorized a larger number of individuals as meeting the NNGB than did the Actiheart. Cohen's kappa values were poor, the highest being 0.30 in Dutch women, while the lowest (0.08) was among African Surinamese men. We also looked at the percentage of agreement between the Actiheart and the SQUASH; correct classification was defined as both methods classifying participants in the same way, as either achieving or not achieving the NNGB. Using this approach 64.2 and 64.9% of the cases, among Dutch women and Moroccan men respectively, were correctly classified. Among the other groups this was lower, varying from 45.2% in South Asian Surinamese men to 58.3% in Moroccan women.
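The two agreement statistics reported here, percentage agreement and Cohen's kappa, can both be derived from a 2×2 cross-classification of the two methods. The sketch below uses invented cell counts, not values from Table 5, and the function and argument names are hypothetical.

```python
def agreement_stats(both_yes, squash_only, actiheart_only, both_no):
    """Return (proportion agreement, Cohen's kappa) for a 2x2 table.

    both_yes / both_no: both methods classify as meeting / not meeting the norm.
    squash_only / actiheart_only: only that method classifies as meeting it.
    """
    n = both_yes + squash_only + actiheart_only + both_no
    observed = (both_yes + both_no) / n
    # Agreement expected by chance, from the marginal proportions of each method.
    p_yes = ((both_yes + squash_only) / n) * ((both_yes + actiheart_only) / n)
    p_no = ((both_no + actiheart_only) / n) * ((both_no + squash_only) / n)
    expected = p_yes + p_no
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


observed, kappa = agreement_stats(both_yes=20, squash_only=5, actiheart_only=10, both_no=15)
print(round(observed, 2), round(kappa, 2))
```

Kappa corrects the raw percentage for chance agreement, which is why a group can show more than 60% agreement and still have a kappa of 0.30 or lower.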

Discussion
We aimed to validate self-reported PA using the SQUASH questionnaire in a multi-ethnic population and found that the SQUASH had a lower than expected test-re-test reliability in all ethnic groups. Compared to objective measures, the SQUASH largely underestimated the time spent in light intensity PA. Self-reported moderate and high intensity PA was higher than measured (overestimated), although differences at the group level were smaller than they were for light intensity PA. Nonetheless, wide limits of agreement (larger than 30 minutes per day) indicated poor validity at the individual level. The questionnaire also overestimated the number of participants that met the norm for PA. Validity coefficients were not lower in ethnic minority groups than in Dutch participants and we found no differences between the ethnic minority groups and Dutch population in the discrepancy between objectively and subjectively measured moderate/high intensity PA. This is the first study to validate the SQUASH in a multi-ethnic population using an objective measure of PA, thus we are unable to compare our results with those among similar populations. A study by Wendel-Vos et al reported the SQUASH to be fairly reliable in a Dutch population, the Spearman's correlation for test-re-test in that study was 0.58 (total PA). While we cannot directly compare our results due to methodological differences, our results indicate a somewhat lower reliability. Wendel-Vos et al included office workers (who are presumably higher educated) while we included a population-based sample of, on average, lower educated participants [10]. Socio-cultural background, sex, age, literacy and cognitive ability have been reported to influence reliability estimates of PA questionnaires [5]. Thus the lower test-re-test statistics in our study might reflect the ethnic and educational diversity of our population. 
The ICCs derived in our population for test-re-test reliability were also lower than what was reported in a recent review by Helmerhorst et al where median ICCs were 0.765 [5]. That review found reliability to be higher in studies with a shorter test-re-test interval. We had a relatively long test-re-test interval of, on average, 6-7 weeks. In designing the study we decided to use a longer time period for the test-re-test in order to minimize the chance that respondents remembered the answers they gave at the first administration, as this may lead to an inflation of reliability estimates [24]. This might have increased the chance that the second measurement in our study reflects real changes in the activity patterns over time in addition to measurement error and might be another reason for the low reliability statistics. Wendel-Vos et al also studied the validity of the SQUASH based on accelerometer data [10]. The Spearman's correlation was 0.45, which is moderate and comparable to what we found among Dutch women (0.41) but higher than our findings among Dutch men (0.17). In our study, the Actiheart and the SQUASH consistently classified 50% and 64% of Dutch men and women, respectively, on the basis of the NNGB, which is lower than the 69.5% reported by de Hollander et al in a similar study [11].
Consistent with our findings, two previous studies of the validity of PA questionnaires in multi-ethnic populations showed mixed results. In a Singapore study, Nang et al examined the validity of two different questionnaires (the long IPAQ and the SP2PAQ, a questionnaire based on several existing instruments) in three ethnic groups (Chinese, Malay and Indian) and found this to vary depending on the questionnaire studied as well as the outcome measure (either moderate or vigorous activity) [25]. In a Swedish study, Arvidsson et al found estimates to vary between Swedish and Iraqi participants, based on the outcome measure studied [26].
Our finding that, compared to Actiheart measures, light intensity PA is underestimated with the SQUASH while moderate and high intensity activities are overestimated by all ethnic groups is consistent with studies among 'general' populations in different settings [6,27-29]. Structured and more vigorous PA can be more easily recalled compared to more vaguely defined light PA like "Standing work, such as cooking, washing the dishes, ironing, feeding or bathing a child". Additionally, the SQUASH was not designed to assess activities with an intensity of MET<2, which may explain the discrepancy with the Actiheart measures of light intensity activity in this study; the Actiheart provides specification for MET levels of 1.5 and above. Walking and cycling are measured in three domains by the SQUASH: during commuting; as a leisure-time activity; and, in the case of cycling, as a sport. Participants may inadvertently record these activities at multiple moments while filling in the questionnaire. The questionnaire was designed to be short, thus frequency, duration and intensity of each activity are asked in a single question. This means that participants have to consider the different dimensions of their routine PA simultaneously. Restructuring the questionnaire to take a more step-wise approach might improve reporting of PA, as might explicit inclusion of activities that are more commonly practiced by specific ethnic groups [30]. For example, dancing is popular among African origin Surinamese but may not be considered as a form of exercise by participants when filling in the questionnaire.
Our analysis indicated that the differences between reported and measured PA in the ethnic minority groups compared to Dutch participants were limited in the case of light intensity PA and not statistically significant in the case of moderate/high intensity PA, with the exception of Moroccan women. However, we observed large discrepancies in the categorization of participants in terms of meeting the NNGB; low Cohen's kappa values indicate that the agreement between the two measures of PA was poor, in the Dutch as well as the ethnic minority groups. Our findings imply that the SQUASH does not correctly categorize individuals in terms of their PA level and thus does not provide a valid basis for the comparison of PA within or between different ethnic groups.

Strengths and limitations
In the present study we used a combined accelerometer and heart rate sensor to validate the SQUASH. The Actiheart circumvents the limitations of accelerometers by providing indications of both time spent in physical activity and intensity levels [16]. It has been found to provide accurate measures of walking, running, sedentary, light- and moderate-intensity PAs in adults [16,31,32]. Individual calibration improves energy expenditure estimates and as the Actiheart is worn continuously during the measurement period, including during showering, water sports and sleeping, it provides a more complete picture of PA. However, despite the fact that the Actiheart can measure activity intensity, it does not provide information about the location or purpose of individual activities.
There are also some potential limitations. Firstly, with regard to our study design, PA measures using the Actiheart may not be representative of habitual PA as the devices were worn for a maximum of 5 days during a single period. This decision was based on practical considerations as well as a desire to limit respondent burden. Despite the fact that weekend days were included in the study period, a full week might have better captured habitual PA. Additionally, the SQUASH was completed prior to wearing the Actiheart; differences between the measures may be due to true differences in PA during the week that participants wore the Actiheart. An alternative study design in which participants completed the SQUASH for the week in which they wore the Actiheart may have been preferable. Furthermore, within-individual seasonal differences in PA were not assessed [33]. Secondly, the response rate was low among some ethnic groups, which may affect the external validity of our findings; this response rate is similar to what is commonly seen in studies among ethnic minority groups [34]. Education, language and literacy levels are likely to be important confounders, particularly as in this study the SQUASH was self-administered by participants. We did not assess level of literacy, but we found no difference in response on the basis of education level. However, we cannot rule out that language formed a barrier for response among Moroccan participants as we did not provide a translated version of the questionnaire. Thirdly, considerably fewer participants among the ethnic minorities filled in the SQUASH a second time, thus the sample size for the test-re-test reliability analyses was poor to fair. Higher reporting of PA at the second time point may be due to participants being made aware of their PA as a result of taking part in this study. To minimize this possibility we gave feedback on PA levels only after full completion of the study.

Conclusions
We found considerable variation in the reliability and validity of the SQUASH with no consistent patterns on the basis of ethnicity. Although the ethnic minority participants were not any worse at reporting PA than were the Dutch origin men and women, our findings imply that the SQUASH does not provide a valid basis for the comparison of PA between different ethnic groups.

Acknowledgments
We are grateful to the participants of the HELIUS study and the management team, research nurses, interviewers, research assistants and other staff who have taken part in gathering the data of this study. In addition, we thank Sanne Schepers for her assistance during the data gathering, Wanda Wendel-Vos (National Institute for Public Health and the Environment) for her advice and the loan of the Actiheart devices, Leo Walsma (Wave Medical BV) for his support with issues relating to the Actiheart, and Kate Westgate (MRC Epidemiology Unit, Cambridge) for advice regarding the handling of the Actiheart data.