Developing Item Banks for Measuring Pediatric Generic Health-Related Quality of Life: An Application of the International Classification of Functioning, Disability and Health for Children and Youth and Item Response Theory

The purpose of this study was to develop item banks by linking items from three pediatric health-related quality of life (HRQoL) instruments using a mixed methodology. Secondary data were collected from 469 parents of children aged 8-16 years. The International Classification of Functioning, Disability and Health-Children and Youth (ICF-CY) served as a framework to compare the concepts of items from three HRQoL instruments. The structural validity of the individual domains was examined using confirmatory factor analyses. Samejima's Graded Response Model was used to calibrate items from different instruments. The known-groups validity of each domain was examined using the status of children with special health care needs (CSHCN). Concepts represented by the items in the three instruments were linked to 24 different second-level categories of the ICF-CY. Eight item banks representing eight unidimensional domains were created based on the linkage of the concepts measured by the items of the three instruments to the ICF-CY. The HRQoL results of CSHCN in seven out of eight domains (except personality) were significantly lower compared with children without special health care needs (p<0.05). This study demonstrates a useful approach to compare the item concepts from the three instruments and to generate item banks for a pediatric population.


Introduction
Over the past decade, pediatric research has shifted its attention from advancing treatments and survival rates for children with various diseases and disorders to improving their functional status and health-related quality of life (HRQoL). In parallel to this paradigm shift, the World Health Organization (WHO) elaborated the concept of health by emphasizing its components and determinants, which include body functions, body structures, activities and participation, and environmental and personal factors [1,2]. The use of this bio-psycho-social model helps us understand the psychological, social, and environmental determinants of health outcomes, especially in children and adolescents with various diseases, at different developmental stages and who are often under-served [3][4][5].
Several generic and disease-specific HRQoL instruments have been developed for use in pediatric populations. The commonly used generic HRQoL instruments are the Child Health and Illness Profile (CHIP) [6], the Child Health Questionnaire (CHQ) [7], the KIDSCREEN-52 [8], the KINDL-R [9], and the Pediatric Quality of Life Inventory (PedsQL) [10]. Ideally, pediatric HRQoL instruments should be brief in content, related to the child's age and developmental stage, and demonstrate good measurement properties such as reliability, validity, and responsiveness [11]. In addition, good generic instruments should be able to assess the HRQoL in children with varying health conditions and across different languages and cultures [12,13]. Using a large sample of children enrolled in the Medicaid program, our previous study based on the classical test theory (CTT) method suggested that none of the existing pediatric HRQoL instruments (the CHIP, KIDSCREEN, KINDL-R, or PedsQL) was superior to any other across the different psychometric properties [14,15]. Although the CTT has often been applied to compare or develop new HRQoL instruments, the use of the CTT alone may neglect item-level information, which might bias the conclusions in comparative effectiveness research and clinical applications [16]. It is therefore important to use qualitative methodologies to compare the heterogeneity of item content, followed by advanced quantitative methodologies (e.g., item response theory; IRT) [16,17] to quantify the measurement properties of individual items for the design of appropriate pediatric instruments [4,18].
The International Classification of Functioning, Disability and Health for Children and Youth (ICF-CY) taxonomy represents an international classification system for pediatric health and disability. The ICF-CY has been used in research and clinical practice to understand the components and determinants of pediatric health outcomes and to help design HRQoL instruments [19]. The ICF-CY can also serve as a framework to compare the contents of items in existing HRQoL instruments, thus providing evidence of content validity [5,20,21]. Although researchers have used the ICF-CY to investigate the contents of items in pediatric HRQoL instruments [4,5,22-24], few studies have used the ICF-CY to conduct head-to-head comparisons among items from different pediatric HRQoL instruments combined with quantitative methods to validate these items and generate item banks for use in pediatric settings [25].
Designing and administering HRQoL items in children must account for several pediatric issues related to age, neurocognitive development, and special health care needs. Although there is a consensus that children as young as 8 years old are able to and should self-report their HRQoL [26,27], the use of parent-proxy reports remains important, especially when the children are mentally disabled, too young, or too sick to self-report [26,28,29]. Parent-proxy reports of a child's HRQoL are of particular importance for children enrolled in public insurance programs such as Medicaid [30] because the parents are responsible for evaluating their child's health outcomes and making decisions about health services. Children from low-income families and those enrolled in Medicaid have a greater likelihood of a poorer HRQoL related to multiple chronic conditions than children of high-income families and those in private health insurance programs [31].
The main purpose of this study was to use a mixed qualitative and quantitative methodology to develop initial item banks on the basis of three of the most frequently used pediatric legacy HRQoL instruments (KIDSCREEN-52, KINDL-R, and PedsQL). The KIDSCREEN-52 and KINDL-R are among the most widely used pediatric generic HRQoL instruments in Europe, whereas the PedsQL is the most popular generic instrument in the United States. These three instruments capture the common aspects of HRQoL, including physical, psychological, and social domains, and are suitable for children and adolescents. Our first objective was to apply the ICF-CY framework to compare the contents of items from the three pediatric legacy instruments and map these items into the domains suggested by the ICF-CY. The second objective was to apply IRT methodology to develop item banks using items mapped to the same HRQoL domain represented by the ICF-CY. The advantage of IRT is that it can calibrate items from different instruments on the same metric by capturing comparable underlying constructs of HRQoL. The developed item banks were further validated using the Children with Special Health Care Needs (CSHCN) Screener [32].

Study sample, inclusion and exclusion criteria, and data collection
The secondary data came from a 2009 survey of parents whose children, aged 2-17 years, were enrolled in Florida's Medicaid program. Only families with 12 months of continuous enrollment in Medicaid were considered eligible for this study. Eligible families were sent a primer letter followed by a phone call for recruitment (n = 5,789). Families with disconnected and non-working numbers were excluded (n = 2,783). A total of 908 parents among the remaining 3,006 eligible participants completed the telephone interview (response rate 30.2%: 908/3,006), which was conducted by trained interviewers using a structured questionnaire that included the three legacy instruments (KIDSCREEN-52, KINDL-R, and PedsQL) and demographic information. Age-appropriate versions of the instruments were administered for each child and adolescent (Table 1). The present study analyzed the data of 469 surveys that were collected from the parents of children aged 8-16 years to allow for the comparison of all items across the three instruments.
The University of Florida's Institutional Review Board approved the study protocol. This study is part of the Quality Assurance project of the Florida Health Department for public insurance programs, and the survey was conducted via telephone. Per the University's IRB, we obtained a waiver for collecting written informed consent. Instead, we collected verbal agreement from all study participants over the phone when they were enrolled.

Survey instruments
The three pediatric legacy instruments, the KIDSCREEN-52, KINDL-R, and PedsQL, were chosen for the present study and are given in Table 1. Additionally, the parent participants answered the CSHCN Screener [32] for the analysis of the known-groups validity of the item banks. The domains and total scores were transformed into a 0-100 point scale (100 = best HRQoL and 0 = worst HRQoL). The characteristics of the three HRQoL instruments and the CSHCN Screener are described as follows:
The KIDSCREEN-52 is the most commonly administered pediatric HRQoL instrument in Europe [8,33]. The instrument contains 10 domains (52 items) including physical well-being (5 items), psychological well-being (6 items), moods and emotions (7 items), self-perception (5 items), autonomy (5 items), parent relationship and home life (6 items), social support and peers (6 items), social acceptance and bullying (3 items), school environment (6 items), and financial resources (3 items).
The KINDL-R was developed to assess HRQoL in healthy, chronically ill, and acutely ill children [9]. The Kid/Kiddo KINDL was especially designed for older children and adolescents between 8 and 16 years of age. Each version has 6 domains with 4 items per domain. The domains include physical well-being, psychological well-being, self-esteem, family functioning, friends, and school functioning.
The PedsQL 4.0 was developed to assess the WHO's core concept of health (physical, mental, and social functioning) plus school functioning [10]. This instrument contains 23 items measuring the problems associated with performing daily functions. The four domains include physical functioning (8 items), emotional functioning (5 items), social functioning (5 items), and school functioning (5 items).
CSHCN are defined as having a chronic condition (physical, developmental, behavioral, or emotional) and requiring health-related services and/or medication. The CSHCN Screener consists of five questions to evaluate the presence and duration of health conditions captured by three domains (dependency on prescription medications, service use above routine levels, and functional limitations). If a parent responds "yes" to a health consequence item, two follow-up items are asked to determine if the consequence is due to a medical or health condition. Both follow-up items must be answered "yes" to qualify the child as a CSHCN for that domain.
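The qualification rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and field names are hypothetical, the second follow-up is assumed here to concern the duration of the condition, and treating a child as a CSHCN when at least one question qualifies is an assumption about how the per-question results are combined, not a rule quoted from the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ScreenerItem:
    """One screener question with its two follow-ups (hypothetical field names)."""
    consequence: bool   # parent answered "yes" to the health consequence item
    follow_up_1: bool   # first follow-up answered "yes"
    follow_up_2: bool   # second follow-up answered "yes" (assumed: duration)


def item_qualifies(item: ScreenerItem) -> bool:
    # Both follow-ups must be "yes", and they are only asked after a "yes"
    # on the consequence item.
    return item.consequence and item.follow_up_1 and item.follow_up_2


def is_cshcn(items: Iterable[ScreenerItem]) -> bool:
    # Assumption: qualifying on at least one question classifies the child as CSHCN.
    return any(item_qualifies(i) for i in items)


# Hypothetical responses for one child
responses = [
    ScreenerItem(consequence=True, follow_up_1=True, follow_up_2=False),
    ScreenerItem(consequence=True, follow_up_1=True, follow_up_2=True),
    ScreenerItem(consequence=False, follow_up_1=False, follow_up_2=False),
]
print(is_cshcn(responses))
```

Keeping the per-question rule separate from the overall classification makes it easy to swap in a different combination rule if the screener's scoring manual specifies one.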

Mapping, linking, and validation methodology
The ICF-CY framework. This study used the ICF-CY as a framework to link the items from the three pediatric HRQoL instruments. The ICF-CY includes four major components, and each has its respective classification codes and categories. The four components are 'body functions', represented as code letter b; 'body structure', represented as code letter s; 'activities and participation', represented as code letter d; and 'environmental factors', represented as code letter e. The numeric codes following these letters represent the chapter number (one digit), the second level (two digits), and the third and fourth levels (one digit each). The letters with the suffix of numeric codes are termed categories. In this study, the items from the three HRQoL instruments were linked to the second-level categories of the ICF-CY [18], and these categories were used to form the domains of HRQoL and the item banks. The specific steps to map, link, and calibrate the items from the three instruments and to develop and validate the item banks are described as follows:
Step 1: Mapping items from the three pediatric HRQoL instruments using the ICF-CY framework. The rules suggested by Cieza et al. [18] were used to link the meaningful concepts of the items in the three HRQoL instruments to the ICF-CY. Prior to linking the items, the meaningful concepts of individual items were extracted by two authors using a data extraction form (see below). Per Cieza et al. [18], when the concept from each item of the three HRQoL instruments could not be linked with a specific ICF-CY category, the item was identified as 'not definable (nd)' [18]. For example, the KINDL item of 'my child worried about his/her future' was assigned 'nd' because it could not be represented by any specific ICF-CY category. Additionally, the abbreviation 'nc' (not covered) was used when the ICF-CY classification did not include a specific concept of the item [18].
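The code structure described above (a component letter, a one-digit chapter, a two-digit second level, and optional one-digit third and fourth levels) can be sketched as a small parser that reduces any code to its second-level category and groups items accordingly. The item names and item-to-code links below are hypothetical placeholders, not the study's actual mapping; 'nd' and 'nc' are simply treated as unlinked.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

COMPONENTS = {
    "b": "body functions",
    "s": "body structures",
    "d": "activities and participation",
    "e": "environmental factors",
}

# letter, chapter (1 digit), optional second level (2 digits),
# optional third and fourth levels (1 digit each)
_CODE = re.compile(r"^([bsde])(\d)(?:(\d{2})(?:(\d)(\d)?)?)?$")


def component(code: str) -> Optional[str]:
    """Return the component name for a code, or None if it is not a valid code."""
    m = _CODE.match(code.strip().lower())
    return COMPONENTS[m.group(1)] if m else None


def second_level(code: str) -> Optional[str]:
    """Truncate a code to its second-level category (e.g. 'b1522' -> 'b152')."""
    m = _CODE.match(code.strip().lower())
    if m is None or m.group(3) is None:
        return None  # 'nd', 'nc', malformed, or chapter-level only
    return m.group(1) + m.group(2) + m.group(3)


def group_by_category(links: Dict[str, str]) -> Dict[str, List[str]]:
    """Group item names by the second-level category they are linked to."""
    banks: Dict[str, List[str]] = defaultdict(list)
    for item, code in links.items():
        category = second_level(code)
        if category is not None:
            banks[category].append(item)
    return dict(banks)


# Hypothetical links from items to ICF-CY codes
links = {"item_a": "b152", "item_b": "b1522", "item_c": "d450", "item_d": "nd"}
print(group_by_category(links))
```

Items that end up under the same second-level key are candidates for the same domain, which mirrors how the categories were used to form the item banks.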
In addition to the rules of Cieza et al. [18], other rules for item linkage were applied in this study. If an item from the three HRQoL instruments was linked to more than one category of the ICF-CY, multiple categories were reported for that particular item. However, to create different item banks with each measuring a unidimensional concept of HRQoL, we chose the most relevant ICF-CY category to represent the content of each individual item.
In line with the rules of Cieza et al. [18], the linking procedure was conducted independently by two authors (PG and ICH). To resolve any disagreements, a third rater was consulted, and a final decision based on a consensus among the three raters was made. Items that were linked from different instruments and placed in the same ICF-CY category are supposed to measure the same underlying construct of HRQoL. Subsequently, specific domains were created for these items that capture the same underlying constructs, and individual item banks were developed to represent the specific domains of HRQoL.
Step 2: Psychometric analysis using IRT. Confirmatory factor analyses (CFA) were used to test the structural validity of the individual domains. Specifically, CFA was used to test unidimensionality and local independence, which are two basic assumptions of IRT analysis. For the unidimensionality of individual domains, various fit indices were used to assess the structural validity, including the goodness-of-fit index χ² (a non-significant chi-square indicates a good model fit) and the root mean square error of approximation (RMSEA) (a value below 0.08 indicates a good model fit, and values below 0.05 indicate a close fit) [34]. Items with acceptable magnitudes of factor loading on the corresponding domains (λ > 0.4; p < 0.05) were considered to be appropriate for the IRT application. Non-significant items and items with a lower factor loading (λ < 0.4; p > 0.05) were removed from the analysis. For local independence, residual correlations of paired items from the same domains were investigated, and one of the paired items with a high residual covariance (> 10.0) was considered for removal because both items might measure similar content.
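The screening criteria in this step amount to a few threshold checks, which the following Python sketch makes explicit. It assumes the factor loadings, p-values, residual covariances, and RMSEA have already been estimated by CFA software; the function names and all numbers are hypothetical, and the label "not good by these cutoffs" is added here for completeness rather than taken from the text.

```python
from typing import Dict, List, Tuple


def keep_item(loading: float, p_value: float,
              min_loading: float = 0.4, alpha: float = 0.05) -> bool:
    """Retention rule from the text: loading > 0.4 and p < 0.05."""
    return loading > min_loading and p_value < alpha


def dependent_pairs(residual_cov: Dict[Tuple[str, str], float],
                    cutoff: float = 10.0) -> List[Tuple[str, str]]:
    """Item pairs whose residual covariance exceeds the cutoff (> 10.0)."""
    return sorted(pair for pair, value in residual_cov.items() if value > cutoff)


def rmsea_label(rmsea: float) -> str:
    """Classify RMSEA using the cutoffs quoted in the text."""
    if rmsea < 0.05:
        return "close fit"
    if rmsea < 0.08:
        return "good fit"
    return "not good by these cutoffs"


# Hypothetical CFA output for one domain: item -> (loading, p-value)
cfa = {"item_a": (0.72, 0.001), "item_b": (0.31, 0.20), "item_c": (0.55, 0.010)}
retained = [name for name, (lam, p) in cfa.items() if keep_item(lam, p)]
flagged = dependent_pairs({("item_a", "item_c"): 12.4, ("item_a", "item_b"): 3.1})

print(retained)            # items passing the loading rule
print(flagged)             # pairs where one item is a candidate for removal
print(rmsea_label(0.06))
```

Which item of a flagged pair to drop remains a judgment call in the text, so the sketch only reports the pair.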
Following the CFA, we used Samejima's Graded Response Model (GRM), a unidimensional IRT model for items with categorical response categories, to test and calibrate item parameter estimates and to calculate domain scores for specific HRQoL domains that were identified in the previous step. We examined different measurement properties to further remove some items from the item banks, including item thresholds and discrimination as well as item and test information functioning (IIF/TIF). Items with discrimination values > 1.0, higher or lower threshold values (i.e., able to measure easiest or most challenging underlying HRQoL, respectively), and a higher IIF were considered for retention. Additionally, standardized local dependence (LD) χ² statistics were examined to test local independence between paired items in a specific domain [35]. Items that failed to satisfy more than one of the criteria described above were deleted from the item banks. Various fit indices were adopted to assess the appropriateness of individual domains, including marginal reliability estimates (≥ 0.60 as acceptable), the M₂ statistic [36,37], and RMSEA (< 0.08 as a good model fit for unidimensionality and < 0.05 as a close fit).
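For readers less familiar with the GRM, the following Python sketch shows how category response probabilities follow from an item's discrimination and threshold parameters in the standard logistic form of the model: the probability of responding in category k or higher is 1 / (1 + exp(-a(θ - b_k))), and each category probability is the difference between adjacent cumulative probabilities. The parameter values are hypothetical and are not estimates from this study, and the sketch illustrates the model only, not the estimation software used.

```python
import math
from typing import List, Sequence


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities under the logistic graded response model.

    theta      : latent trait level (e.g. underlying HRQoL in a domain)
    a          : item discrimination
    thresholds : ordered thresholds b_1 < ... < b_{m-1} for m response categories
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities of responding in category k or higher,
    # padded with 1.0 (lowest category or higher) and 0.0 (above the highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative values.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical five-category item with mostly negative thresholds
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-2.0, -1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

With mostly negative thresholds, as in this hypothetical item, the categories separate respondents best at lower trait levels, which is the sense in which such items capture lower levels of underlying HRQoL.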
Step 3: Validation analysis using the known-groups approach. The CSHCN Screener [32] [40] was used for the IRT analysis, and the SAS 9.1 software [41] was used for the remaining analyses.
Step 1: Mapping items from the three pediatric HRQoL instruments using the ICF-CY framework. Table 3 displays an overview regarding the mapping of the concepts of specific items to the second-level categories of the ICF-CY. The concepts represented by the items were linked to 24 different second-level ICF-CY categories, including nine categories in the body functions component, 13 categories in the activities and participation component, and two categories in the environmental factor component. None of the items from the three HRQoL instruments were assigned to the body structures component, and three items each were not definable or not covered. Meaningful concepts in items from the KIDSCREEN-52, KINDL-R, and PedsQL were almost equally represented by the body functions categories: six, four, and five second-level categories, respectively. In contrast, meaningful concepts of items from the PedsQL were greatly represented by the activities and participation categories: 10 second-level categories for the PedsQL compared with six and five second-level categories for KINDL-R and KIDSCREEN-52, respectively. The environmental factor component was not well represented by any of the three instruments as only one item each from the KIDSCREEN-52 and PedsQL was linked to a single category of the environmental factor component.

Description of sample
Appendix S1 shows the detailed mapping results between the concepts of individual items from the three instruments and the specific codes for the ICF-CY's body functions, activity and participation, and environmental factor components. This specifically informs how each item is represented by the ICF-CY components and categories. A higher representation for the items from the HRQoL instrument to a specific ICF-CY category suggests that the same concept was captured by potentially redundant items from this specific instrument. Items from the three instruments that were linked to the same ICF-CY category represented the same underlying construct, and a specific domain comprising items measuring this construct was created. As a result, 10 initial HRQoL domains were created based on the criterion of all the items in that domain that potentially measured the same underlying construct and had a minimum of four items per domain, including personality, emotion, mobility, energy, social function, task accomplishment, family function, school function, cognition, and experience of self (Table 3 and Appendix S2). However, two of the ten domains (cognition and experience of self) were not considered further as they contained only a few items measuring those domains. Three of the six items measuring the experience of self domain were deleted because they had factor loadings < 0.4 (see Step 2 below and Appendix S2). The remaining three items were considered too few to measure the experience of self. Four out of the five items from the cognition domain were assigned to different ICF-CY categories (b140, b144, b160, and b164), suggesting that these items might not measure the same underlying construct (Appendix S2).
Step 2: Psychometric analysis using IRT. CFA was performed to test the structural validity of the individual HRQoL domains with the goal of retaining items to meet the assumptions of unidimensionality and local independence to conduct the IRT analysis (Table 3 and Appendix S2). In the CFA, items that were significantly associated with the corresponding domains with factor loadings (λ > 0.4; p < 0.05) and residual covariance (< 10.0) were considered for retention (Table 3). As a result, a total of six items were deleted due to either a low factor loading (< 0.4) or a high residual covariance (> 10.0), including one item in the mobility domain, three items in the social function domain, and two items in the task accomplishment domain (Appendix S2).
Item parameter estimates from the GRM for eight specific HRQoL domains are presented in Table 4. Items that did not satisfy one of the criteria explained in the Methods section were excluded from the analysis (Appendix S2). For example, item 23 of the PedsQL (name: Pedsql23) was deleted because of a poor item fit and a high correlation (> 10.0) with item 22 from the PedsQL (name: Pedsql22). A total of 21 items were deleted across different domains due to poor item fit, high correlation, and/or poor discrimination value (Appendix S2). Overall, we did not consider 50 items to be assigned to any of the eight domains, leaving 49 items in the eight item banks that measure eight HRQoL domains (Table 3, Table 4, and Appendix S2). After deleting specific items across different domains, the RMSEA values for all eight domains suggested a good model fit (< 0.08), and the marginal reliability of the eight domains ranged from 0.63 to 0.87. The majority of the threshold parameters (representing item difficulty) corresponding to the response categories of each item in a specific domain had negative values, suggesting that the majority of items captured lower levels of underlying HRQoL (Table 4).
Step 3: Validation analysis using known-groups approach. Table 5 shows known-groups validity related to CSHCN for the eight specific domains. Overall, the eight domains were able to distinguish the underlying HRQoL between children with and without special health care needs. Bivariate analyses suggested that the HRQoL domain scores for seven domains (with the exception of personality) in the CSHCN were significantly

Discussion
This study shows that the contents of items from the three pediatric HRQoL legacy instruments were well represented by the ICF-CY categories. Specifically, compared with the KIDSCREEN-52 and KINDL-R, the PedsQL was found to better represent the activity and participation components of the ICF-CY. The KINDL-R and PedsQL corresponded less with respect to the environmental factors, which is consistent with a previous study that linked items of these instruments to the ICF-CY categories [5]. However, we only linked one item in the KIDSCREEN-52 to a single category of the environmental factor component, which contrasts with the results of a previous study showing that 24% of the items in the KIDSCREEN-52 could be linked to six different categories in the environmental factor component [5]. This discrepancy might have arisen because the latter study linked each item to more than one concept, resulting in the total number of concepts exceeding the number of items in the instrument. However, because our purpose was to develop different item banks with each capturing a unidimensional concept of HRQoL, we chose the most relevant ICF-CY category to represent the content of an individual item. Because our selection criteria for the linkage methodology differed from those of previous studies [4,5,22-24], our findings do not allow for recommending which pediatric HRQoL instrument should be chosen. Instead, we argue that different pediatric HRQoL instruments contain items with different measurement properties and that including items from different instruments will yield item banks with more robust measurement properties.
This study uses the rules recommended by Cieza et al. [18] to link items from the three pediatric HRQoL instruments to the specific ICF-CY categories. Consequently, eight specific unidimensional domains emerged, which provides a foundation for developing item banks to measure pediatric HRQoL. CFA were used to test for structural validity, and IRT was used to calibrate items from different instruments on the same metric and to calculate domain scores for individuals. Items were deleted from the item banks if they were not represented by the appropriate ICF-CY categories or if they demonstrated poor performance on the basis of psychometric properties. This combined use of qualitative and quantitative approaches allowed for the concomitant establishment of the content validity of individual domains and the confirmation of content validity using IRT. This strategy directly led to robust item banks that measure unidimensional HRQoL constructs. We identified several items represented by the same ICF-CY category, indicating an overlap in the concepts across different pediatric HRQoL instruments. Our linkage also identified some items that were not included in the item banks as the ICF-CY unfortunately does not represent meaningful concepts for these items. Future studies might replicate our approach to explore the potential similarities or discrepancies in these findings and expand our item banks based on the ICF-CY framework to link existing items from other pediatric HRQoL instruments or to add novel items.
The linking process was conducted based on a parent-proxy report rather than on a child self-report. The use of a parent versus a child version of the instruments might not result in discrepant item mapping results because the contents in the child and parent versions are almost the same. However, data collected from a parent-proxy versus a child self-report might lead to different quantitative results in terms of construct validity and item parameters because parents and children possess different perceptions with regard to interpreting and answering pediatric HRQoL items [42]. A greater discrepancy between proxy- and self-reports has been found in the more abstract domains (e.g., emotional functioning) compared with the less abstract domains (e.g., physical functioning) [43].
We found that the mean underlying HRQoL scores for seven of the eight domains (except personality) were significantly lower (p < 0.05) in the CSHCN compared to the children without special health care needs in the bivariate and multivariate analyses (Table 5). These findings suggest a strong capacity of the item banks to differentiate CSHCN from children without special health care needs with respect to the emotional domains (emotional function and social function) and the activity and participation domains (mobility, energy, task accomplishment, family function, and school function). The effect size (ES) for the underlying HRQoL scores was the highest for the emotional function domain, followed by the energy domain. This finding echoes our previous study suggesting that CSHCN require more physical, developmental, and emotional support than children without special health care needs [44]. Not surprisingly, children with and without special health care needs had similar personality domain scores because personality is a trait that is less likely to be related to different levels of special health care needs. The linkage of ICF-CY categories to the concepts measured by the items in the three pediatric HRQoL instruments helped with the development of the item banks, which in turn assists clinicians in using the measurement tools to evaluate the comparative effectiveness of different interventions [3,45]. However, there is limited evidence of the ability of HRQoL tools to measure the impact of environmental and personal factors on a child's health status. The ICF-CY framework provides an important foundation for creating the unidimensional item banks. Through the ICF-CY framework we were able to identify items from different pediatric HRQoL instruments that specifically capture life experiences, activities and participation appropriate for the child's age and developmental stage.
The broader perspective embedded in the ICF-CY framework provides a valuable opportunity to develop measures to capture factors influencing physical and emotional functioning and activity and participation, further facilitating the ability of researchers and clinicians to design appropriate interventions targeting these modifiable factors [3].
The ICF framework was primarily developed to measure disability, functional status, and social participation rather than quality of life [46]. The components of the ICF are more objective (e.g., ability to perform specific functioning) than subjective (e.g., satisfaction with health). In this regard, the linkage exercise using the ICF-CY framework might not have distinguishable concepts for some items that measure specific pediatric HRQoL. The concepts for some categories in the ICF-CY are general and thus susceptible to a broader description of the meaningful concepts for the items. For example, the ICF-CY body function category b152 (emotional functions) can be explained by several aspects regarding emotions such as sadness, laughter, and fear. In turn, this general nature resulted in linking a specific ICF-CY category to several items from the pediatric HRQoL instruments.
Several limitations should be noted when interpreting the findings of this study. First, the generalizability of the findings is limited due to the use of a Medicaid population from Florida. Second, the linkage was conducted based on the perspective of the investigators and did not include the perspectives of the parents and children themselves. Indeed, investigators, parents and children may interpret the meaning of items differently [47]. With an emphasis on a patient-centered approach, future studies might consider engaging parents and children as stakeholders alongside researchers in the item mapping process to strengthen the content validity and develop a robust methodology for synthesizing findings from different stakeholders. Third, the CSHCN status reported by parents was used for evaluating the known-groups validity. This information may result in varying outcomes compared with the use of categorical approaches, such as disease diagnoses. Finally, the survey response rate (30.2%) was lower than that of previous studies (usually 60%) focusing on Medicaid pediatric populations [48][49][50]. However, responders and nonresponders did not differ significantly on children's age and sex (p > 0.05). The lower response rate is in part due to the length of the survey (approximately 50 minutes per survey), which may have deterred subjects from participating. Although the lower response rate may threaten the generalizability of our findings, this population is important to assess because these families were below 100% of the federal poverty level and are at greater risk of poor health status due to poor socioeconomic circumstances.

Conclusions
The ICF-CY serves as a useful framework to compare the concepts of items from the three pediatric HRQoL legacy instruments and to generate item banks for a pediatric population. This study provides useful insights regarding the content coverage of items from three instruments represented by the ICF-CY framework. This study has implications for researchers to refine pediatric HRQoL instruments and for clinicians to use these instruments in clinical practice.