Reproducibility and Relative Validity of a Food Frequency Questionnaire Developed for Female Adolescents in Suihua, North China

Background This study aims to evaluate the reproducibility and validity of a food frequency questionnaire (FFQ) developed for female adolescents in the Suihua area of North China. The FFQ was evaluated against the average of 24-hour dietary recalls (24-HRs). Methodology/Principal Findings A total of 168 female adolescents aged 12 to 18 completed nine sets of three consecutive 24-HRs (one set per month) and two FFQs over nine months. The reproducibility of the FFQ was estimated using intraclass correlation coefficients (ICCs), and its relative validity was assessed by comparing it with the 24-HRs. The mean values of the 24-HRs were lower than those of the FFQs, except for protein (in FFQ1) and iron (in FFQ2). The ICCs between FFQ1 and FFQ2 indicated moderate correlation (0.4–0.8) for all nutrients and food groups. However, all the ICCs decreased after adjusting for energy. The weighted κ statistic showed moderate agreement (0.40–0.60) for all nutrients and food groups, except for niacin and calcium, which showed poor agreement (0.35). The relative validity results indicate that the crude Spearman's correlation coefficients between FFQ1 and the 24-HRs ranged from 0.41 (for Vitamin C) to 0.65 (for fruit). The coefficients for each nutrient and food group between FFQ2 and the 24-HRs were higher than those between FFQ1 and the 24-HRs, indicating good correlation. Although all energy-adjusted Spearman's correlation coefficients were lower than the crude coefficients, de-attenuation to correct for intra-individual variability improved the correlation coefficients. The weighted κ coefficients of nutrients and food groups ranged from 0.32 for beans to 0.52 for riboflavin between FFQ1 and the 24-HRs, and from 0.32 for Vitamin C to 0.54 for riboflavin between FFQ2 and the 24-HRs. Conclusion The FFQ developed for female adolescents in the Suihua area is a reliable and valid instrument for ranking individuals within this study.


Introduction
Dietary assessment methods should be valid and reliable, and should impose as little burden as possible on participants and research staff [1]. Among the various instruments designed to assess nutrient and food intake, the food frequency questionnaire (FFQ) is used to estimate an individual's perceived usual food intake over a defined period (e.g., a year or several months) [2,3]. Compared with traditional dietary assessment methods, such as the 24-hour dietary recall and dietary records, the FFQ is less expensive, less difficult to complete, and easier to administer and evaluate, despite the cruder information it provides [4,5].
FFQs are widely used throughout the world in epidemiology. Their use among adolescents has increased, but most have been designed for and tested among adolescents in Western countries [6][7][8][9]. Only a few validation studies on other ethnic groups have been conducted. An FFQ should be tailored to its target population because of the vast differences in the food items that contribute to the daily supply of nutrients, as well as the wide range of dietary habits among populations. Food intake also varies considerably with the ethnic, social, and cultural background of the study population [10].
Suihua is located in northeastern China, where economic conditions are less favorable than in the southern part of the country. Some chronic nutrition-related diseases, such as iron deficiency or iron deficiency anemia (IDA), are prevalent [11]. Thus, the relationship between diet and anemia was investigated among adolescent girls in Suihua. A new FFQ was designed for adolescent girls, with the specific aim of selecting food groups that provide major contributions to the intake of iron and other nutrients. This study describes the reproducibility of this FFQ and its relative validity against the 24-HRs in female adolescents.

Ethics Statement
This study was approved by the Research Ethics Committee of Harbin Medical University. Written informed consent was obtained from either the participants or their parents (for participants below 18 years old) before they were enrolled in the study.

Study population
Female adolescents aged 12-18 years were enrolled in the study using a multi-level cluster sampling method. The participants were recruited from three middle schools in the Suihua district of Heilongjiang Province in China. Of 200 potential participants, 168 were included in the final sample. Participants who did not satisfactorily complete the FFQs (n = 18) or who were missing more than three of the consecutive 24-HRs (n = 14) were excluded.

Study design
The study began in March 2009 and ran for nine months. A set of three consecutive 24-HRs was collected from each participant every month. The first FFQ (FFQ1) was administered during the first set of three consecutive 24-HRs, while the second FFQ (FFQ2) was administered in November 2009 during the last set of three consecutive 24-HRs. The study design is shown in Figure 1.

FFQ and 24-HR
The FFQ was developed in accordance with the methodology proposed by Willett [1]. Revisions ensured that the list of foods reflected the Chinese diet indicated in the 2002 National Health and Dietary Survey [12]. Additional revisions improved estimates of the intake of iron-rich food. Animal food was classified as red meat (livestock meat, animal liver, and blood), white meat (poultry and seafood), eggs, and milk. Food from plants was classified as cereal, vegetable, fruit, and soybean (Table 1). Consumption information across 8 food groups and 86 food items was collected. The respondents were asked to recall their average consumption of each food item during the previous period on a graded scale with seven levels, ranging from never to ≥3 times a day. For seasonal fruits and vegetables, participants were asked to indicate how often these foods were eaten during the season. Portion size was also considered in the survey. Each subject was asked to choose from a set of colored photographs showing different-sized portions of 23 specific food items. Photographs of dishes, bowls, and cups were used to represent the average portion size for the other 63 food items. All FFQ data were double entered, and discrepancies were resolved by referring to the original forms. Nutrient calculator software (Fei Hua V2.3, Institute for Nutrition and Food Security, Chinese Center for Disease Control and Prevention), based on the China Food Composition Tables [13], was used by a registered dietician to calculate the daily intakes of calories and nutrients.
Two dieticians visited each subject nine times to obtain a complete set of three consecutive 24-HRs at each visit. The interviews were meal sequence-based and covered a detailed assessment and description of the food consumed. The subjects were required to describe, qualitatively and quantitatively, all food consumed during the previous day by choosing the matching pictures, which enabled the dietician to estimate food intake. The 24-HR interviews were conducted face-to-face in a classroom on the evening of each visit. The dieticians were instructed to ask the participants key questions to obtain a better qualitative description of food items (e.g., green leafy vegetables, livestock meat, animal liver, blood, etc.). We calculated daily mean intakes of energy, 13 nutrients, and 8 food groups from the nine sets of three consecutive 24-HRs. The mean 24-HR data were used as the standard against which the relative validity of the FFQ was measured.

Statistical Analysis
Means and standard deviations (SDs) for energy, nutrients, and food intake were determined for the FFQs and the nine sets of three consecutive 24-HRs. The residual method was employed to remove variation due to energy intake [1]. The reproducibility between the first and second FFQ administrations was estimated using the Wilcoxon signed rank test, intraclass correlation coefficient (ICC), weighted kappa (κ) statistic, and misclassification (quartiles method) analyses [14]. The validity of the FFQ relative to the 24-HRs was assessed by the Wilcoxon signed rank test, Spearman's correlation, weighted kappa (κ) statistic, and misclassification (quartiles method) analyses. To take into account within-person variations caused by day-to-day fluctuations and seasonal variations, we "de-attenuated" the Spearman's correlation coefficients using the within- and between-person components of variation from the 24-HRs [15]. The Kruskal-Wallis rank sum test was used to compare the ratios of heme iron to total iron intake across the different measurement methods.
All statistical analyses were performed using SPSS (Version 13.0) and Excel 2003. Unless otherwise stated, a P value <0.05 was considered significant.
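The residual method mentioned above can be sketched as follows. This is a minimal illustration and not the authors' SPSS procedure: the function name `energy_adjust` and the example values are hypothetical. Nutrient intake is regressed on total energy intake by ordinary least squares, and each residual is added to the intake predicted at the mean energy intake.

```python
from statistics import mean


def energy_adjust(nutrient, energy):
    """Residual-method energy adjustment (illustrative sketch).

    Regress nutrient intake on total energy intake (ordinary least squares),
    then return each residual plus the intake predicted at mean energy.
    """
    if len(nutrient) != len(energy) or len(nutrient) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_e = mean(energy)
    mean_n = mean(nutrient)
    sxx = sum((e - mean_e) ** 2 for e in energy)
    if sxx == 0:
        raise ValueError("energy intake has no variance")
    sxy = sum((e - mean_e) * (n - mean_n) for e, n in zip(energy, nutrient))
    slope = sxy / sxx
    intercept = mean_n - slope * mean_e
    # Intake predicted for a participant with the mean energy intake.
    expected_at_mean = intercept + slope * mean_e
    return [
        n - (intercept + slope * e) + expected_at_mean
        for n, e in zip(nutrient, energy)
    ]


# Made-up example values (not study data): energy in kcal/day, iron in mg/day.
energy = [1800.0, 2100.0, 2400.0, 2000.0, 2600.0]
iron = [18.0, 22.0, 25.0, 19.0, 28.0]
adjusted = energy_adjust(iron, energy)
```

By construction, the adjusted values keep the original mean intake but are uncorrelated with energy intake, which is why correlations computed on them can differ from the crude ones.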

Results
The mean age (SD) and body mass index of the participants were 16.1 (1.3) years and 25.9 (4.5) kg/m², respectively.
The mean intakes of energy, 13 nutrients, and 8 food groups were calculated from the questionnaire data using the nutrient calculator software. The comparison of data from the participants is shown in Table 2.
All the mean values of the 24-HRs were lower than those of the FFQs, except for protein (FFQ1) and iron (FFQ2). For FFQ1, the differences from the 24-HRs were statistically significant for several nutrients (fiber, niacin, Vitamin C, and calcium) (P<0.05). Compared with those of the 24-HRs, the mean values of FFQ2 were significantly higher for fiber, Vitamin E, and soybeans (P<0.05). The mean values of FFQ2 were higher than those of FFQ1 for Vitamin C (P<0.05). Table 2 shows that the average daily intake of iron was 24.88 mg as calculated using the 24-HRs, 25.42 mg using FFQ1, and 22.9 mg using FFQ2. However, the nutrition calculation software showed that heme iron accounted for only 17.3% of total iron intake in the 24-HRs, 21.1% in FFQ1, and 19.9% in FFQ2. The Kruskal-Wallis rank sum test was used to compare the ratios of heme iron to total iron intake across the different measurement methods. The results showed no significant differences (P>0.05).
The crude and energy-adjusted ICCs between FFQ1 and FFQ2 were calculated to assess the reproducibility of the FFQ (Table 3). All nutrients and foods were moderately correlated (0.4-0.8). After adjusting for energy, all the ICCs decreased. The average crude ICC was 0.59, with a range of 0.43 (fiber) to 0.74 (riboflavin), while the average energy-adjusted ICC was 0.38, with a range of 0.32 (Vitamin C) to 0.47 (carbohydrates).
The degree of misclassification associated with the categorized intakes assessed using the FFQs was examined as the proportion of participants classified into the same, adjacent, and opposite quartiles (Table 4). The proportion of subjects classified within one quartile (in the same and adjacent categories) by both FFQs ranged from 70.8% (for retinol) to 92.9% (for eggs). Extreme misclassification into opposite quartiles was below 8% for all nutrients and food groups. The highest values were observed for retinol and niacin, at 7.8% and 6.0%, respectively. The weighted κ statistic showed moderate agreement, ranging from 0.40 to 0.60 for all nutrients and food groups, except niacin and calcium, which fared poorly at 0.35.
The crude, energy-adjusted, and de-attenuated Spearman's correlation coefficients between the FFQs (FFQ1, FFQ2, and averaged FFQ) and the averages of the nine sets of three consecutive 24-HRs are presented in Table 5. These values enable the assessment of the relative validity of the FFQ. The crude Spearman's correlation coefficients between FFQ1 and the 24-HRs ranged from 0.41 (for Vitamin C) to 0.65 (for fruit) with a mean value of 0.54. The energy-adjusted coefficients ranged from 0.32 (for Vitamin E) to 0.46 (for riboflavin) with a mean of 0.39, while the de-attenuated coefficients ranged from 0.44 (for fat and Vitamin C) to 0.66 (for milk) with a mean of 0.51. The coefficients for each nutrient and food group between FFQ2 and the 24-HRs were higher than those between FFQ1 and the 24-HRs. All crude coefficients, whether between FFQ1 and the 24-HRs or between FFQ2 and the 24-HRs, were greater than 0.4, indicating good correlation. All energy-adjusted Spearman's correlation coefficients were lower than the crude coefficients, as was observed for the ICCs between FFQ1 and FFQ2. However, de-attenuation to correct for intra-individual variability improved the Spearman's correlation coefficients and led to changes in mean values for FFQ1 and the 24-HRs (0.52 to 0.56) and for FFQ2 and the 24-HRs (0.58 to 0.60).
The classification in quartiles (Table 6) yielded similar results for both FFQs, with an average of more than 80% of the subjects classified in the same or adjacent quartiles. The weighted κ coefficients for nutrients and food groups between the FFQs and the 24-HRs are also shown in this table. FFQ1 values ranged from 0.32 (for beans) to 0.52 (for riboflavin), with an average of 0.42. The weighted κ value for FFQ2 also ranged from 0.32 (for Vitamin C) to 0.54 (for riboflavin), with an average value of 0.43. The lowest value for the averaged FFQ was 0.32 (for Vitamin C) and the highest was 0.54 (for both riboflavin and calcium), with an average value of 0.43.
Table 4. Comparison and weighted κ of adjusted daily mean intakes of food groups and nutrients based on FFQs.
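The text does not state which ICC form was computed in SPSS. As a hedged illustration only, the sketch below uses a one-way random-effects ICC for two administrations per participant, ICC = (MSB − MSW) / (MSB + (k − 1)·MSW) with k = 2. The function name `icc_oneway` and the example intakes are hypothetical.

```python
from statistics import mean


def icc_oneway(first, second):
    """One-way random-effects ICC for two repeated measurements per subject.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), with k = 2 administrations.
    Illustrative sketch; the ICC model used in the study is not specified.
    """
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 subjects")
    k = 2
    grand = mean(list(first) + list(second))
    subject_means = [(a + b) / 2 for a, b in zip(first, second)]
    # Between-subject mean square (n - 1 degrees of freedom).
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square (n * (k - 1) degrees of freedom).
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2
        for a, b, m in zip(first, second, subject_means)
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("no variation in the data")
    return (ms_between - ms_within) / denominator


# Made-up riboflavin intakes (mg/day) from two FFQ administrations.
ffq1 = [0.8, 1.1, 0.9, 1.4, 1.0, 1.3]
ffq2 = [0.9, 1.0, 1.0, 1.3, 0.9, 1.5]
icc = icc_oneway(ffq1, ffq2)
```

An ICC near 1 means that most of the variation lies between participants, which is the property that matters when the instrument is used to rank individuals.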
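The quartile cross-classification and weighted κ described above can be sketched as follows. Several details are assumptions rather than the authors' procedure: quartiles are assigned by rank with ties broken by input order, and the κ uses linear disagreement weights, because the weighting scheme is not stated in the text. All function names and the example data are hypothetical.

```python
def quartiles(values):
    """Assign each value a rank-based quartile 0..3 (ties broken by order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank * 4 // len(values)
    return result


def cross_classify(qa, qb):
    """Proportions classified into the same, adjacent, and opposite quartiles."""
    n = len(qa)
    same = sum(a == b for a, b in zip(qa, qb)) / n
    adjacent = sum(abs(a - b) == 1 for a, b in zip(qa, qb)) / n
    opposite = sum(abs(a - b) == 3 for a, b in zip(qa, qb)) / n
    return same, adjacent, opposite


def weighted_kappa(qa, qb, categories=4):
    """Cohen's weighted kappa with linear disagreement weights (assumed)."""
    n = len(qa)
    observed = [[0.0] * categories for _ in range(categories)]
    for a, b in zip(qa, qb):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(categories)]
    col = [sum(observed[i][j] for i in range(categories))
           for j in range(categories)]
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(categories):
        for j in range(categories):
            weight = abs(i - j) / (categories - 1)
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row[i] * col[j]
    if expected_disagreement == 0:
        raise ValueError("all subjects fall in a single category")
    return 1 - observed_disagreement / expected_disagreement


# Made-up calcium intakes (mg/day) from two FFQ administrations.
ffq1 = [310, 420, 515, 380, 610, 450, 290, 700]
ffq2 = [350, 400, 480, 520, 590, 430, 300, 650]
q1, q2 = quartiles(ffq1), quartiles(ffq2)
same, adjacent, opposite = cross_classify(q1, q2)
kappa = weighted_kappa(q1, q2)
```

Reporting the three proportions alongside κ is useful because κ alone does not show whether disagreement comes from near misses (adjacent quartiles) or from gross misclassification (opposite quartiles).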
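De-attenuation of this kind is commonly computed as r_true = r_observed × sqrt(1 + (s²w / s²b) / n), where s²w and s²b are the within- and between-person variances estimated from the reference method and n is the number of replicate recall days per person. The sketch below assumes this common form and a balanced design; the exact computation from [15] is not reproduced in the text, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def variance_components(recalls):
    """Within- and between-person variance from balanced replicate recalls.

    `recalls` is a list with one inner list of daily intakes per person;
    every person must have the same number of recall days (assumption).
    """
    n = len(recalls)
    k = len(recalls[0])
    if n < 2 or k < 2 or any(len(person) != k for person in recalls):
        raise ValueError("need a balanced design with >= 2 persons and days")
    person_means = [mean(person) for person in recalls]
    grand = mean(person_means)
    ms_between = k * sum((m - grand) ** 2 for m in person_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for person, m in zip(recalls, person_means) for x in person
    ) / (n * (k - 1))
    var_within = ms_within
    var_between = (ms_between - ms_within) / k
    return var_within, var_between


def deattenuate(r_observed, var_within, var_between, n_days):
    """Correct an observed correlation for day-to-day variation in recalls."""
    if var_between <= 0 or n_days <= 0:
        raise ValueError("between-person variance and n_days must be positive")
    return r_observed * sqrt(1 + (var_within / var_between) / n_days)


# Made-up example: observed r = 0.50, variance ratio 3.0, 27 recall days
# (27 would correspond to nine complete sets of three recalls).
r_corrected = deattenuate(0.50, var_within=3.0, var_between=1.0, n_days=27)
```

Because the correction factor is always at least 1, the de-attenuated coefficient is never smaller in magnitude than the coefficient it corrects, and the correction shrinks as the number of recall days grows.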

Discussion
The number of food items listed in FFQs varies widely. A review suggested that the number of items listed ranges from 5 to 350 [16]. The FFQ used in this study, which is composed of 86 food items, is considered to have an optimal number of items. With regard to time frame, varying time intervals between FFQ1 and FFQ2, from 15 days [17] to several years [18], have been reported in previous studies. Reproducibility tests are based on the assumption that diet does not change between two questionnaires; thus, reproducibility may ideally be obtained by two closely administered questionnaires [19][20][21][22]. In this case, however, subjects would likely remember and repeat their responses. In the present study, FFQ administration was repeated after a nine-month interval because the diet of individuals in Suihua is almost unchanged from March to November because of factors related to the local climate. The repeat administration can reduce the daily and seasonal variations in the study population. However, the time reference can reflect changes in intake caused by seasonality, which may have occurred in this study, possibly lowering true correlations, especially for nutrients for which fruits and vegetables are the main source (e.g., Vitamin C and fiber).
In previous studies, the ICCs generally ranged from 0.4 to 0.8 for nutrient intake and from 0.3 to 0.8 for food groups [23][24][25][26]. The reproducibility of our instrument is similar because moderate correlations for all nutrients and food groups (0.43 to 0.74 for nutrients and 0.56 to 0.67 for food groups) were obtained. The ICCs ranged from 0.32 to 0.54 for nutrients and 0.37 to 0.51 for food groups after adjustments for energy. In our study, energy adjustment did not improve the correlations for nutrients and food items. According to Willett [1], energy adjustment increases correlation coefficients when the variability of nutrient consumption is related to energy intake. However, correlation coefficients decrease when the variability of nutrient consumption depends on systematic errors of overestimation and underestimation. In the present study, the lower correlations may be explained by an increase in correlated measurement error as a consequence of controlling for total energy intake. In addition, the comparison of average daily food and nutrient intakes derived from FFQ1 and FFQ2, based on joint classification by quartiles and the findings from the weighted κ statistic, showed poor agreement for niacin and calcium (0.35). This result can be partially attributed to the indeterminate description of food intake; unclear descriptions may influence the calculated nutrient contents. The nine-month time interval is another possible factor because observed reproducibility may be lower than the true value, as differences in responses may reflect true changes in dietary habits as well as variations in responses [27].
A major component of the validation process is selecting the appropriate reference method by which to assess the test measurement. No gold standard exists for dietary intake measurement, but it is crucial for the errors of the two methods used in the current study to be as independent of each other as possible.
In a review on the validation of FFQs, the authors showed that 75% of the studies validated the FFQs against repeated 24-HRs [16]. Correlations ranging from 0.45 to 0.70 were evident among validation studies on dietary questionnaires [28,29]. The results of an evaluation of relative validity depend on several factors, including choice of reference method, degree of homogeneity of the intake values within the population, recall period, and the number of days of record collection [30]. In our study, relative validity was tested by comparing FFQ1 and FFQ2 with the average of the nine three consecutive 24-HRs. Good correlations were obtained, especially for the second questionnaire, in which the recalls were generally better. This result may have been caused by a learning effect in the second questionnaire [31]. The reference method used in our study was the average of the nine three consecutive 24-HRs over nine months. Our study population was a group of female adolescents with similar lifestyles (students in three middle schools who often dine in the school cafeteria), which may have contributed to the moderate correlations. The ability to understand abstract concepts and form mental images of one's diet that are as close as possible to actual situations [32] is another crucial factor in producing reliable estimates of habitual intake using the FFQ. Colored photographs of food taken in classrooms were used as a picture-sorting technique designed to make the task less tedious and encourage the interviewees to provide more accurate dietary information [33].
In this study, the average nutrient intake values were lower for the 24-HR data compared with the average values of the FFQs. Most daily intake values of nutrients from different measures are similar to the standards indicated in the Dietary Guidelines for the Chinese population [34], except for calcium, which was lower than the recommended criterion (1,000 mg per day for female adolescents). Although the daily average intake of iron from the FFQs and the 24-HRs satisfied the Dietary Guidelines, the nutrition calculation software showed that the intakes of heme iron were still lower in Suihua female adolescents. Thus, this result can also explain the high prevalence of IDA in the Chinese population [12]. The results also showed an excessive intake of fat and Vitamin E, and an insufficient intake of vegetables, meat (red meat, poultry, fish, and shrimp), and milk. Although this result may be inaccurate, it may serve as reference for the evaluation of the habitual food intake of female adolescents.
Table 6. Comparison and weighted κ of adjusted daily mean intakes of food groups and nutrients based on the average of 24-HRs.
The first limitation of this study is that the food groups designed did not include beverages, which may influence energy intake. Typically, the beverage consumption of Chinese female adolescents, especially girls living in a poverty-stricken, cold area like Suihua, is much lower than that of Western females. Another limitation is the small sample used. However, every step of the sampling was randomly performed so that the participants were representative of all teenage girls in the district.
In summary, our dietary assessment is both reproducible and valid, with the observed correlations similar to those reported by other cohort studies. This study suggests that the FFQ can measure the standard intake of major nutrients for female adolescents living in Suihua, North China. Whether this instrument can assess relationships between diet and disease in Chinese adolescent girls will be addressed in future work.

Author Contributions
Conceived and designed the experiments: LW WX. Performed the experiments: WX CS LZ XZ JW HW. Analyzed the data: WX CS LZ. Contributed reagents/materials/analysis tools: LW. Wrote the paper: WX LW.