Child-Report Measures of Occupational Performance: A Systematic Review

Introduction Improving occupational performance is a key service of occupational therapists, and a client-centred approach to care is central to clinical practice. It is therefore important to comprehensively evaluate the quality of the psychometric properties reported across measures of occupational performance in order to guide assessment and treatment planning. Objective To systematically review the literature on the psychometric properties of child-report measures of occupational performance for children aged 2–18 years. Methods A systematic search of the following six electronic databases was conducted: CINAHL; PsycINFO; EMBASE; PubMed; the Health and Psychosocial Instruments (HAPI) database; and Google Scholar. The quality of the studies was evaluated against the COSMIN taxonomy of measurement properties, and the overall quality of psychometric properties was evaluated using pre-set psychometric criteria. Results Fifteen articles and one manual were reviewed to assess the psychometric properties of the six measures that met the inclusion criteria: the PEGS, MMD, CAPE, PAC, COSA, and OSA. For most of the measures (PEGS, CAPE, PAC, OSA), good-quality studies had been conducted to evaluate the psychometric properties; however, the quality of the studies for the other two measures (MMD, COSA) was relatively weak. When integrating the quality of the psychometric properties of the measures with the quality of the studies, the PAC stood out as having superior psychometric qualities. Conclusions The overall quality of the psychometric properties of most measures was limited. There is a need for continuing research into the psychometric properties of child-report measures of occupational performance, and to revise and improve the psychometric properties of existing measures.


Introduction
[...] vary in what aspect of occupational performance they measure, specifically "capacity" versus "actual" performance [2]. However, it is important to note that "true" performance is difficult to determine accurately because individuals may over- or under-estimate their own performance. Consequently, systematic reviews of available measures are integral to guiding the selection of the most appropriate self-report measures.

Assessments of Occupational Performance and Children
Commonly, childhood assessments of occupational performance rely on developmental tests that are based on the assumption that normalising processes, such as occupational performance, are integral to achieving better functioning [11]. The tools employed in the client-centred approach to occupational performance assessment should capture children's perceptions of their strengths and capabilities in their daily activities, rather than focusing solely on impairment [12,13]. Occupational performance assessments should include identification of the child's occupations, which occupations are motivating and important, and the compatibility between the characteristics of the child and their environment that creates successful occupational performance [12,13].
There can be challenges when adopting a client-centred approach to the assessment of occupational performance [3]. There are differing views about who should be the focus of assessment when the client is a child, as children are commonly in environments where the standards and expectations are set by others (e.g., at school by teachers) [3]. There is therefore uncertainty over how the child will be able to determine his or her needs and goals relating to occupational performance. As a consequence, many "self-report" instruments measuring the occupational performance of children are in fact teacher- or parent-reports [14]. There is a need, however, to incorporate measures that are child-based in order to gather data that are meaningful to the child. Additionally, occupational therapists and other allied health professionals have a duty to choose self-report measures that have established validity, reliability and clinical utility in order to inform holistic interventions for the child [13,15].
Participation and Activity. Under the International Classification of Functioning, Disability and Health (ICF), activity is defined as the 'execution of a task or action by an individual', while participation refers more broadly to 'involvement in a life situation' [29]. Enabling participation in occupations to promote health and wellbeing is a typical goal of occupational therapy within a rehabilitation context; participation measures are therefore commonly used when assessing occupational performance. Many systematic reviews of diagnostic groups have focused on measures of participation domains, commonly following the International Classification of Functioning, Disability and Health-Children and Youth (ICF-CY) domains of activity and participation, such as learning and applying knowledge, general tasks and demands, mobility, self-care and major life areas [30]. The authors of one systematic review [19] reported that measures covered all ICF-CY domains of participation and activity for children with ABI, and that self-care in particular was covered well. They concluded that the occupational therapy assessments took a more holistic view of occupational performance than medical assessments, which were commonly related to bodily functions [19].
Self-care. A 2012 review focused on self-care as a specific domain of participation and activity. Ireland and Johnston [21] systematically reviewed the validity, clinical utility, and reliability of measures that evaluated the self-care skills of children (0-12 years) with the congenital musculoskeletal condition osteochondrodysplasia. The authors found that the available measures (the Functional Independence Measure for Children (WeeFIM), the Activities Scale for Kids (ASK), and the Pediatric Evaluation of Disability Inventory (PEDI)) ranged from adequate to excellent in reliability, and that there was evidence of validity for this particular diagnostic group. This review indicates that assessments of self-care that employ these measures can give at least an adequate understanding of the level of self-care of children in this diagnostic group.
Occupational Performance. Some systematic reviews have focused on the broader domain of occupational performance. For example, Parker and Sykes [31] used thematic analysis to systematically assess studies that examined the effects of the Canadian Occupational Performance Measure (COPM), used as an outcome measure, on clinical occupational therapy practice. The authors concluded that the COPM had the greatest impact within clinical practice, and that further research into other clinical areas, as well as more training in using the COPM as an outcome measure, was needed [31]. Whilst this review is important in terms of clinical implications, the analysis is qualitative and thus does not shed light on the psychometric properties of the COPM.
These reviews have focused on the relevance of instruments measuring occupational performance in the context of these specific diagnostic groups and their needs. However, there is a notable lack of systematic reviews that focus on self-report measures used to assess occupational performance in children [32].

Study Aim
There is still a paucity of systematic reviews on child-report measures of occupational performance, even though improving occupational performance is a key service for occupational therapists. Consequently, the purpose of this systematic review is to identify instruments that measure occupational performance in children through child-report methods, and to appraise the psychometric properties of these measures. This systematic review focuses on the psychometric properties of instruments that are written in English and used by occupational therapists with children aged 2-18 years. The COSMIN taxonomy of measurement properties and definitions for health-related patient-reported outcomes was used to evaluate each instrument in the domains of reliability and validity [33]. COSMIN aims to improve the selection of health measurement instruments by providing a checklist on the methodological qualities of the tools [34]. Consideration of responsiveness, the ability of a measure to detect change in a construct over time [34], was deemed to be outside the scope of this review. Evaluating responsiveness as a psychometric property involves assessing all articles that used the included assessments as outcome measures. Given that including responsiveness would increase the size of this systematic review substantially, we are of the opinion that an investigation of this property warrants a separate and more detailed systematic review. It is expected that this systematic review will assist in the choice of instruments measuring occupational performance by providing an objective account of the advantages and disadvantages of the self-report measures available for children.

Methods
The methodology and reporting of this systematic review were guided by the PRISMA statement [35]. The PRISMA statement is a checklist of 27 items that are considered crucial for ensuring transparency when conducting systematic reviews. Please refer to S1 Table for the completed PRISMA checklist for the current review.

Eligibility Criteria
Eligible sources comprised both research articles and published manuals that detailed the psychometric properties of instruments designed to measure the occupational performance of children. Occupational performance assessments should include identification of a child's occupations, which occupations are motivating and important, and how the characteristics of that person combine with the environment in which the occupation occurs to create successful occupational performance [10,12,36]. Within this search, self-report instruments that measured skills and behaviours relating to occupational performance in children (2-18 years) were included. To be included, instruments additionally needed to be primarily designed for use with children, be used by occupational therapists, and be written in English. Articles and instruments were excluded if infants or adults were included in the sample, if the selected instrument was not the main focus (i.e., the selected instrument was used only to establish the construct validity of another instrument), or if the full text could not be retrieved. Dissertations and conference papers were excluded.

Information Sources
A preliminary systematic literature search was performed on 25th April 2014 by two authors using the following four electronic databases: Embase; PubMed; PsycINFO; and CINAHL. Both subject headings and free text were used when searching each database. Date restrictions were imposed in free text searching. See Table 1 for a complete list of search terms used across all searches. A second literature search, using the title of each instrument and its acronym, was conducted on 7th July 2014 in CINAHL, PsycINFO, EMBASE, and PubMed (see Table 1 for search terms) to identify psychometric articles. Additionally, manuals for the instruments were retrieved for appraisal. A third literature search, again using the title of each instrument and its acronym, was conducted in Google Scholar and EBSCO Host's Health and Psychosocial Instruments (HAPI) database by two research assistants from 11th November 2014 to 17th November 2014. The aim of the search in Google Scholar was to identify any recently published articles (publication year: 2013-2014). The HAPI database was used for more specific searching and results (see Table 1 for search terms used).

Study Selection
All abstracts were rated by a reviewer against the following inclusion criteria: the measure had to assess occupational performance; the abstract had to contain a child-based tool; the main target group of the instrument had to be children; and the instrument needed to be used by occupational therapists. The names of instruments were retrieved from the identified abstracts. A flowchart depicting this process is shown in Fig 1. Inter-rater reliability between the two reviewers was determined using a random sample of 40% of the abstracts and was deemed acceptable (weighted kappa = 0.72).
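As an illustration of the agreement statistic reported above, the following is a minimal sketch of Cohen's weighted kappa for two raters. The review does not state which weighting scheme or software was used, so the linear weights, the three ordered screening categories and the example ratings below are assumptions made purely for illustration.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters over ordered categories.

    `categories` lists the rating levels from lowest to highest.
    `weights` is "linear" or "quadratic" (an assumption: the review
    does not report which scheme was applied).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")

    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    def disagreement(i, j):
        # Distance between two categories, scaled to the range 0..1.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Joint counts of (rater A category, rater B category) and the marginals.
    observed = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    # Weighted disagreement actually observed.
    obs = sum(disagreement(i, j) * c for (i, j), c in observed.items()) / n
    # Weighted disagreement expected if the raters were independent.
    exp = sum(
        disagreement(i, j) * marg_a[i] * marg_b[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)
    if exp == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1.0 - obs / exp


if __name__ == "__main__":
    # Hypothetical screening decisions for four abstracts (not data from the review).
    levels = ["exclude", "unsure", "include"]
    reviewer_1 = ["exclude", "unsure", "include", "include"]
    reviewer_2 = ["exclude", "include", "include", "include"]
    print(round(weighted_kappa(reviewer_1, reviewer_2, levels), 2))
```

Weighted kappa gives partial credit when two raters choose adjacent categories, which is why it suits ordered rating scales better than unweighted kappa.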

Data Collection Process and Data Extraction
First, data from studies and manuals for the development and validation of occupational performance assessment instruments were extracted under the following descriptive categories: study design, purpose of the study, study population, age of the population, and instrument characteristics. Additionally, the COSMIN [33] criteria were used to assess the methodological quality of the studies.

Table 1 (fragment). Search terms.
PsycINFO: (measurement) OR (Psychological assessment)) OR (Cognitive assessment)) OR (Questionnaires)) OR (Neuropsychological assessment)) OR (Testing)) OR (Testing methods)) OR (Rating scales)) OR (Screening OR Screening tests) OR (Treatment outcomes)) OR (Evaluation) AND (Occupational therapy)
Limitations: English language; Human, Preschool age (2-5 years), School Age (6-12 years), Adolescence

Methodological Quality
The COSMIN taxonomy of measurement properties and definitions for health-related patient-reported outcomes was used to assess the methodological quality of the included studies [33,34]. The COSMIN checklist contains nine domains: internal consistency, reliability (relative measures, including test-retest reliability, inter-rater reliability and intra-rater reliability), measurement error (absolute measures), content validity (including face validity), structural validity, hypotheses testing, cross-cultural validity, criterion validity, and responsiveness. As interpretability was not considered to be a psychometric property, it was not included in this review. Definitions of each of the measurement properties of the COSMIN are shown in Table 2.

Table 1 (fragment, continued). Search terms.
Limitations: English Language; Human
PsycINFO: (Psychometrics/ OR statistical reliability/ OR statistical validity/ OR "error of measurement"/) AND ("Name of instrument" OR "Acronym of instrument")
Limitations: English language; Human
Embase: (Validation study/ OR validity/ OR psychometry/ OR reliability/ OR measurement accuracy/ OR measurement error/ OR measurement precision/ OR measurement repeatability/) AND ("Name of instrument" OR "Acronym of instrument")
Limitations: English language; Human
Google Scholar: ("Psychometrics" OR "Measurement" OR "Test Construction" AND "instrument name (ACRONYM)" AND "child report") ("reliability" OR "psychometry" OR "validity" OR "validation study" OR "instrument validation" AND "instrument name (ACRONYM)" AND "child report")
Limitations: Publication year: 2013-2014
Third search: Psychometric articles
Database and Search Terms using Subject Headings; Limitations
HAPI: ("Psychometrics" OR "Measurement" OR "Test Construction" AND "instrument name (ACRONYM)" AND "child report") ("reliability" OR "psychometry" OR "validity" OR "validation study" OR "instrument validation" AND "instrument name (ACRONYM)" AND "child report")
Limitations: None
doi:10.1371/journal.pone.0147751.t001

[...] methods proposed by Terwee et al. [37]. Items are rated on a 4-point scale (excellent, good, fair, and poor), with an overall methodological quality score for each psychometric property calculated from the lowest-rated item in each domain. However, strict adherence to this rating system appears to inhibit differentiation between more subtle psychometric qualities of assessments [38]; therefore, a revised scoring system was introduced for this review [39]. The outcome was presented as a percentage rating (Poor = 0-25.0%, Fair = 25.1%-50.0%, Good = 50.1%-75.0%, Excellent = 75.1%-100.0%).
To take account of those COSMIN items that do not have all four response options available, the following formula was used to calculate the total score for each psychometric property, in order to capture the quality of the psychometric properties as accurately as possible:

\[
\text{Total score for psychometric property} = \frac{\text{Total score obtained} - \text{Minimum score possible}}{\text{Maximum score possible} - \text{Minimum score possible}} \times 100
\]

For example, the COSMIN psychometric property content validity has 5 items to be scored. The following scores are allocated: poor = 1, fair = 2, good = 3 and excellent = 4. For 1 of the 5 items, poor is not available as a rating, but all items have excellent as a rating option. Thus the minimum possible score for this property is 6 (4 items × score of 1 + 1 item × score of 2 = 6) and the maximum possible score is 20 (5 items × score of 4). If, for example, a measure received a score of 15 for content validity, the total score for the psychometric property was calculated as (15 - 6) / (20 - 6) × 100 = 64.3%, which classifies content validity as having good quality.
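The revised scoring described above can be sketched in a few lines of Python. This is an illustrative reading of the formula and the percentage bands given in the text, not the authors' own implementation; the function names and the individual item scores in the example (five items each rated "good") are hypothetical and chosen only so that the total matches the worked content validity example of 15.

```python
def cosmin_percentage(item_scores, item_minimums, max_per_item=4):
    """Rescale summed COSMIN item scores to a 0-100% rating.

    item_scores   -- score given to each item (poor=1, fair=2, good=3, excellent=4)
    item_minimums -- lowest score available for each item (2 where "poor" is not an option)
    """
    if len(item_scores) != len(item_minimums) or not item_scores:
        raise ValueError("one minimum is required for every scored item")
    for score, lowest in zip(item_scores, item_minimums):
        if not lowest <= score <= max_per_item:
            raise ValueError(f"score {score} is outside the range {lowest}-{max_per_item}")

    total = sum(item_scores)
    min_possible = sum(item_minimums)
    max_possible = max_per_item * len(item_scores)
    if max_possible == min_possible:
        raise ValueError("items must allow more than one possible score")
    return (total - min_possible) / (max_possible - min_possible) * 100


def quality_band(percentage):
    """Map a percentage onto the bands used in the review.

    The published bands are stated to one decimal place (e.g. Fair = 25.1%-50.0%);
    values between bands are assigned here to the higher band, which is an assumption.
    """
    if percentage <= 25.0:
        return "Poor"
    if percentage <= 50.0:
        return "Fair"
    if percentage <= 75.0:
        return "Good"
    return "Excellent"


if __name__ == "__main__":
    # Content validity: 5 items, one of which cannot be rated "poor".
    minimums = [1, 1, 1, 1, 2]
    scores = [3, 3, 3, 3, 3]  # hypothetical item ratings summing to 15
    pct = cosmin_percentage(scores, minimums)
    print(f"{pct:.1f}% -> {quality_band(pct)}")
```

Subtracting the minimum possible score before dividing ensures that a property whose items cannot all be rated "poor" still maps onto the full 0-100% range, so properties with different response options remain comparable.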
To ensure consistency in the COSMIN checklist ratings, an additional rater was trained by the second author, who has extensive experience in the area and was also one of the raters. Both raters scored all the papers, and consensus was reached where there were differences in ratings. The first author helped resolve differences in ratings where consensus could not be reached between the two raters.
Once the quality of the studies that examined the psychometric properties had been assessed using the COSMIN system, the quality of the psychometric properties reported for the measures was evaluated using criteria set out by Terwee et al. [40]. Table 3 provides a summary of the criteria for rating the psychometric quality of the measures. Studies that received a poor COSMIN rating were excluded from further analysis and were awarded a score of NE (not evaluated). Finally, an overall quality score for each measurement property for all assessments was determined using the criteria introduced by Schellingerhout et al. [41], which integrate the scores from the COSMIN ratings with the psychometric quality ratings by Terwee et al. [40], thus generating an overall quality rating.

Table 2. COSMIN: Definitions of domains, psychometric properties, and aspects of psychometric properties for Health-Related Patient-Reported Outcomes (adapted from Mokkink, Terwee [42]).

Psychometric property: Definition (a)

Domain: Validity. The extent to which an instrument measures the construct/s it claims to measure.
Content validity: The degree to which the content of an instrument adequately reflects the construct to be measured.
Face validity (b): The degree to which instrument (items) appear to be an adequate reflection of the construct to be measured.
Construct validity: The extent to which the scores of an instrument are consistent with hypotheses, based on the assumption that the instrument is a valid measure of the construct being measured.
Structural validity (c): The extent to which instrument scores adequately reflect the dimensionality of the construct to be measured.
Hypothesis testing (c): Item construct validity.
Cross-cultural validity (c): The extent to which the performance of the items on a translated or culturally adapted instrument adequately replicates the performance of the items of the original version of the instrument.
Criterion validity: The degree to which the scores of an instrument satisfactorily reflect a "gold standard".

Domain: Responsiveness.
Responsiveness: The capability of an HR-PRO instrument to detect change in the construct to be measured over time.
Interpretability (d): The extent to which qualitative meaning can be given to an instrument's quantitative scores or score change.

Domain: Reliability. The extent to which the measure is free from measurement error.
Internal consistency: The level of correlation amongst items.
Reliability: The proportion of total variance in the measurements due to "true" differences amongst patients.
Measurement error: The error of a patient's score, systematic and random, not attributed to true changes in the construct measured.

Table 3. Criteria of psychometric quality rating (adapted from Terwee et al. [40]).

Data Items, Risk of Bias and Synthesis of Results
All data items for each measure were obtained. 'NR' was recorded for items that were not reported. Inclusion of 'methodological limitations items' during the rating of the COSMIN checklist enabled assessment of risk of bias at an individual study level. The results were extracted and grouped under the following headers: 1) purpose of instrument, 2) year published, and 3) the instrument characteristics.

Systematic Literature Search
Following the removal of duplicate abstracts across six databases, a total of 79 measures were reviewed. Of these 79 measures, 73 were excluded for the following reasons: they were not measures of occupational performance (n = 64) or they were not self-report measures by children (n = 9). Thus, 6 measures met the inclusion criteria. Systematic searches across six databases retrieved 1,766 article abstracts, which were screened for inclusion in this review. Of these articles, 21 full-text articles were assessed for eligibility; 3 articles were excluded as the psychometric properties could not be rated and 3 were excluded as adults were included in the sample (see Fig 1 for full details). One manual was located through the secondary search (using the names of the identified measures) and the tertiary search (using the Google Scholar and HAPI databases; see Table 1 and Fig 1). In summary, the psychometric properties were obtained for a total of 6 occupational performance measures, which were assessed through 15 articles and 1 manual.

Included Occupational Performance Measures
The characteristics of the included measures are reported in Table 4. All 6 measures were published after 2003. Regarding the purpose of the instrument, 4 measures are used to evaluate children's perceptions of their competence in performing activities. The remaining measures, the CAPE and the PAC, are used to identify participation patterns, perceptions of enjoyment, and preferences in leisure and recreation activities from children's own perspectives. All of the measures use a Likert response scale to evaluate perceptions and preferences. The CAPE additionally reported the use of a dichotomous (i.e., yes or no) rating system and the use of categorical scales for participation patterns. Information on the development and validation of the 6 included occupational performance measures is reported in Table 5.
All measures demonstrated some evidence of development and validation, although a few included relatively small sample sizes. Of the 6 measures, 4 were developed using children with and without disabilities, 1 using children with disabilities only, and 1 using typically developing children. With regard to the age of participants, 2 measures were developed with children up to 12 years of age (i.e., PEGS and MMD) and the rest with both children and adolescents (6-18 years).
Table 6 summarises the quality ratings of the psychometric studies of all 6 measures as evaluated against the COSMIN quality criteria. Hypothesis testing was the most frequently reported property; all 6 measures had study ratings ranging from fair to excellent quality. This was followed by cross-cultural validity; ratings across the 5 measures ranged from poor to excellent quality. Conversely, no measure reported criterion validity. The ratings of the quality of the studies of the 4 measures reporting on internal consistency and reliability ranged from fair to excellent quality. Structural validity was reported by 3 measures, with ratings of either fair or good quality. The studies of the 2 measures reporting on measurement error were rated as being of excellent quality, and content validity ranged from poor to excellent quality.
Table 7 summarises the quality of the psychometric properties of the 6 measures based on the quality criteria described by Terwee et al. [40] (see Table 3). Table 8 provides an overall psychometric quality rating for each of the psychometric properties using the criteria from Schellingerhout et al. [41] (a description of the criteria is provided at the bottom of Table 8). This overall level of evidence score is derived by integrating: 1) the methodological quality of the studies that evaluated the psychometric properties of measures using the COSMIN checklist (Table 6), and 2) the quality criteria for psychometric properties of assessments (Table 7).

Discussion
The purpose of this systematic review was to identify and evaluate the quality of psychometric properties of child-report instruments developed to measure the occupational performance of children. We identified 6 child-report instruments that evaluated a component of occupational performance of children between the ages of 2 and 18 years. Additionally, we systematically searched for and retrieved 15 articles and 1 manual detailing the psychometric properties of the included instruments. This systematic review of child-report measures provides a concise summary of the current selection of psychometric properties of these measures. The COSMIN framework was employed to guide a comprehensive summary of the psychometric properties of six instruments [34]. The application of the COSMIN checklist-based taxonomy allowed for a critical evaluation of the quality and extent of psychometric evidence of the 15 research articles and 1 manual on the 6 child-report occupational performance instruments [33,34]. Responsiveness was outside the scope of the current review.

Table 5 (fragments). Development and validation of the included measures.
[43]: To translate, adapt, and assess a Swedish-language version of the PEGS.
Nordtorp et al. [49]: To examine the test-retest reliability, measurement error, and internal [...]
[...]: To assess the structural validity of the COSA; hypothesis testing to test the external validity of the COSA. N = 502; child clients of occupational therapist and physical therapist researchers and clinicians internationally. Total sample: R = 6-17y 10m, M = 11y 11.7m, SD = 2y 10.4m.
Romero-Ayuso & Kramer [55]: To assess the internal consistency and cross-cultural validity of the Spanish version of COSA for children with ADHD; hypothesis testing to test whether COSA is an appropriate measure for children with ADHD.

Quality of the Studies using the COSMIN Taxonomy
The COSMIN checklist provides information about the quality of the studies that examined the measures' psychometric properties [33,34]. With regard to reliability, internal consistency was detailed for half of the measures (CAPE, PAC, COSA), whilst four of the measures detailed reliability testing (CAPE, PAC, PEGS, OSA). This review indicated good to excellent study quality for both internal consistency and reliability for the majority of the measures, except for the MMD and COSA (one study), which received a fair rating for internal consistency.
Only two of the six measures (CAPE, PAC) reported measurement error. Both of these measures received an excellent score for study quality. Consequently, the assessment of study quality of reliability for both the CAPE and the PAC is comprehensive, as these measures included psychometric properties for internal consistency, reliability, and measurement error; no other measures included properties for these three elements. Because not all three psychometric properties were reported, a true indication of overall reliability for four measures (PEGS, MMD, COSA, OSA) is not possible. Consideration of measurement error is essential when selecting outcome measures for a study, as low error allows the measure to be used to detect smaller treatment effects. A measurement error that is low in relation to a measure's minimal important change (MIC) means that clinical trials using that measure require smaller sample sizes than trials using measures where the opposite applies [58]. Therefore, future studies of the PEGS, MMD, COSA, and OSA should attempt to gain a more comprehensive picture of the psychometric properties relating to reliability by including assessment of internal consistency, reliability and measurement error.
Within the COSMIN taxonomy, construct validity consists of content validity, structural validity and hypothesis testing [33,34]. Detailing these components of construct validity is important, and a lack of reporting can have implications in clinical practice. For instance, when a scale or measure is used without documented measurement properties (such as construct validity), negative consequences can occur, such as an error in clinical judgment or the inaccurate interpretation of assessment results by practitioners. It is crucial that practitioners are able to investigate how well a measure assesses what it claims to assess, as well as how well it holds its meaning across varied contexts and sample groups, so that it can be used with confidence within clinical settings. The PAC was the only included measure to report all three elements of construct validity, with ratings of study quality ranging between good and excellent for each element. The PEGS, MMD, COSA, and OSA did not provide any evidence of content validity, highlighting a need for further research into the psychometric properties of these instruments. This review also revealed that the PEGS, CAPE, and MMD did not have any published information in the domain of structural validity, again emphasising a need for further research. For hypothesis testing, the majority of the measures (PEGS, CAPE, PAC, OSA) had studies of a good or excellent level of quality, whilst the COSA and MMD had studies of a poor or fair level of quality. Taken together, the COSA and the [...]
Cross-cultural validity was reported for all measures except the MMD, with study quality ranging from poor (COSA) to excellent (PEGS).
This indicates that three of the measures (PEGS, CAPE, PAC) were adequately translated or culturally adapted from the original version, whilst the translated or culturally adapted COSA and OSA were limited in their study quality. It is important to note that none of the six measures reported criterion validity. Thus, comparisons between these measures and a "gold standard" measure of occupational performance could not be made. However, as there is no widely accepted gold standard of assessment for the occupational performance of children, it is no surprise that we were unable to recover evidence of criterion validity.

Overall Quality of Psychometric Properties
Varying results were found for overall quality of measurement properties using the level of evidence criteria by Schellingerhout et al. [41]. The occupational performance self-report measure with the most robust psychometric properties to date was the PAC, given that 7 of the 8 psychometric properties were evaluated, with overall quality ratings of moderate to strong with positive results for four psychometric properties. Evidence for reliability was however found to be conflicting between studies, with strong negative results for measurement error. CAPE had five scores of moderate to strong quality, however produced negative ratings for its reliability and measurement error. The measures with the least evidence in terms of sound psychometric properties was the PEGS and MMD with ratings of indeterminate, limited poor and strong negative results. Interestingly, despite the studies investigating the reliability of the PEGS and CAPE rated as strong for methodological quality, negative results were found in terms of psychometric quality criteria. The same pattern was evident for measurement error for the CAPE and PAC. Four of the six assessments have psychometric properties of indeterminate overall quality due to not reporting on statistical analyses, such as factor analysis or having a doubtful design. Both the COSA and OSA assessments were rated as only having one psychometric property with moderate positive evidence, with both receiving ratings of indeterminate or conflicting levels of evidence. These findings highlight the need for further, rigorous testing of the properties of these measures before they may be deemed as being psychometrically sound.
The results of the current systematic review also revealed that a number of child-report measures of occupational performance were developed and/or validated with small sample sizes (< 300 children). For example, the MMD was developed and validated using a total sample size of 62 children [8]. The other measure developed and/or validated with small sample sizes was the PEGS [43,44]. Validation studies that use a limited sample size are not a reliable basis for conclusions about the psychometric properties of a measure, as findings from a small number of participants may not generalise to a wider population. This can result in ill-informed clinical assessment. Thus, future studies of the COSA, MMD, and PEGS using large normative samples are needed in order to increase the generalisability of the results of these measures to the general population. This will allow clinicians to make better informed assessments of children's occupational performance.

Limitations
Whilst this systematic review aimed to be rigorous, there were a few limitations. Information published in languages other than English was not included; thus, some relevant findings regarding the psychometric properties of child-report measures of occupational performance may have been excluded. Furthermore, we did not directly contact all authors who published research on the psychometric properties of occupational performance measures, so some information may have been overlooked. Evaluating the quality of responsiveness as a psychometric property was outside the scope of this systematic review. Future studies could assess the responsiveness of child-report measures to change in occupational performance.

Conclusion
As occupational performance is central to the practice of occupational therapy, it is important to use measures with excellent psychometric quality in order to accurately assess and treat clients. The current systematic review reported the results of 15 studies and one manual providing evidence on the psychometric properties of six child-report measures of occupational performance. In order to consistently rate the reliability and validity information reported about the measures, the COSMIN taxonomy was used. Whilst good quality studies had been conducted to evaluate the psychometric properties of the majority of instruments (PEGS, CAPE, PAC, OSA), the quality of the studies for two of the measures was relatively weak (MMD, COSA). When integrating the quality of the reported psychometric properties with the quality of the studies, only the PAC stood out as having superior psychometric qualities. These findings are concerning given that these measures are used routinely in clinical practice in the assessment and treatment of children. Thus, this review highlights the need for more research examining the psychometric properties of child-report measures of occupational performance, and for improvement of the psychometric properties of existing measures using sound techniques.

Supporting Information
S1 Table. PRISMA checklist for the current review.