Evaluating the Psychometric Quality of Social Skills Measures: A Systematic Review

Introduction: Impairments in social functioning are associated with an array of adverse outcomes. Social skills measures are commonly used by health professionals to assess and plan the treatment of social skills difficulties. There is a need to comprehensively evaluate the quality of psychometric properties reported across these measures to guide assessment and treatment planning.
Objective: To conduct a systematic review of the literature on the psychometric properties of social skills and behaviours measures for both children and adults.
Methods: A systematic search was performed using four electronic databases: CINAHL, PsycINFO, Embase and PubMed; the Health and Psychosocial Instruments database; and grey literature using PsycEXTRA and Google Scholar. The psychometric properties of the social skills measures were evaluated against the COSMIN taxonomy of measurement properties using pre-set psychometric criteria.
Results: Thirty-six studies and nine manuals were included to assess the psychometric properties of thirteen social skills measures that met the inclusion criteria. Most measures obtained excellent overall methodological quality scores for internal consistency and reliability. However, eight measures did not report measurement error, nine measures did not report cross-cultural validity and eleven measures did not report criterion validity.
Conclusions: The overall quality of the psychometric properties of most measures was satisfactory. The SSBS-2, HCSBS and PKBS-2 were the three measures with the most robust evidence of sound psychometric quality in at least seven of the eight psychometric properties that were appraised. A universal working definition of social functioning as an overarching construct is recommended. There is a need for ongoing research in the area of the psychometric properties of social skills and behaviours instruments.


Social Functioning
Most theorists agree that social functioning is a complex construct that encompasses social skills as well as social behaviour and cognition during inter-personal interactions [1]. Social functioning involves the integration of emotional, linguistic and cognitive skills, which develop from early childhood to adolescence [1]. Social functioning is foundational for the development and maintenance of meaningful relationships and community participation and is critical for both physical health and psychological well-being [2]. Impairments in social functioning manifest in approximately one in every ten children [3], with greater levels of social impairment reported in developmental disabilities [4][5][6][7][8][9]. Impairments in social functioning are associated with an array of adverse outcomes in adolescence and adulthood, such as delinquency [10], social withdrawal and isolation [11].

Theoretical Frameworks on Social Functioning
Theoretical models of social functioning are most commonly embedded within psychology, with more recent models emerging from neuroscience [1]. These models are commonly embedded within a social information processing (SIP) framework and are focused on infant development and the cognitive processes needed for social skills in adults [12]. Researchers have extended SIP to include both cognitive and affective dimensions. In their extensions of SIP, Mostow et al. [13] and Guralnick [14] included emotional, cognitive and behavioural predictors of peer-related social competence.
While few comprehensive theoretical models exist, analogous perspectives and definitions are evident throughout the literature [1,15-18]. There appears to be consensus among theorists that social functioning is an overarching construct which is reliant on a range of cognitive, emotional and linguistic skills, reflecting a person's overall performance in the area of social development [1,19,20].
Cognitive functions, social emotional and linguistic skills. In their model of socio-cognitive integration of abilities, Beauchamp and Anderson [1] considered social skills and functions from a range of perspectives, integrating them into a model of social competence. The socio-cognitive integration of abilities model defines the core dimensions of social functioning (biological-psychological-social) and their interactions within a developmental framework founded on empirical research and clinical principles.
Cognitive and executive functions have been central to most models of social functioning [1]. In most models, cognitive function is used to reflect a range of higher cognitive processes, such as: attentional control (e.g., selective and sustained attention, response inhibition, self-monitoring, self-regulation) and skills linked to executive functioning (e.g., working memory, planning, problem-solving, strategic behaviour). Researchers have linked deficits in these skills to poor social outcomes, including: antisocial behaviour, emotional dysregulation, delinquency and peer-rejection [21][22][23].
Another principal component of most social functioning models is socio-emotional skills. These skills have been reported to include: face-emotion perception, theory of mind, and empathy. Face-emotion perception is fundamental to recognising emotion, which is needed for reciprocal social interactions [24,25]. Theory of mind is a social cognitive skill and involves understanding the emotion, intention and perceptions of others and how the knowledge or beliefs of someone else may differ from one's own [26]. Empathy involves identifying the emotional state of another, the capacity to take the perspective or role of the other, and the evocation of shared affective responses. Empathy is associated with pro-social behaviour and comprises both affective and cognitive components [27][28][29].
There is an array of literature highlighting the influence of communication skills on social functioning. However, linguistic skills are infrequently incorporated into models of social functioning [1,30]. In particular, pragmatic language skills have been described as fundamental to social functioning as they are needed to: a) integrate verbal and non-verbal communication; b) detect and interpret underlying meaning in social cues; c) respond appropriately during social interactions; and d) regulate emotions [30][31][32].
Approaches to social development and social competence. Some theorists view social functioning from a hierarchical perspective with 'social skills,' and 'social cognition' representing different levels of social behaviour under the auspice of social functioning. However, other theorists use the constructs 'social competence', 'social skills' and 'social functioning' interchangeably. In developmental literature, there is general consensus that children may exhibit externalising or internalising behaviour when social competence is lacking [33]. Social competence has often been conceptualised to reflect effectiveness or success in social interactions. However, there is a noteworthy lack of agreement on the nature of its relationship to social functioning (if viewed as a separate construct) and how to define, measure and approach the skills attributable to social competence [33].
Researchers have adopted four approaches to social functioning and competence in the field of social development: 1) a social skills approach, focusing on specific skills and pro-social behaviours; 2) a peer status approach, focusing on sociometric status and peer acceptance or rejection; 3) a relationship approach, based on one's ability to form and maintain friendships and positive relationships with parents, teachers and intimate relationships in adulthood; and 4) an adaptive approach, that takes into account the need for individuals to adjust their social interactions and behaviour across a variety of contexts and types of social situations [16,34]. Despite these divergent approaches, there is consensus among theorists that social functioning rests on a set of skills and behaviours. There is also agreement among theorists that these skills and behaviours are mediated by a range of internal and external factors.
Mediating factors of social development and social functioning. Much research has focused on factors that mediate social functioning. Mediating factors have often been categorised as external factors or internal factors which influence a person's natural predisposition when interacting with others [1]. Internal factors include brain development and integrity, personality and temperament and are often conceptualised within the domain of cognitive skills. External factors have been described to comprise environmental influences, such as: family factors [35], parent behaviours [36], socioeconomic status (SES) [37] and culture [38]. Finally, there is agreement that the mediating effects of the stated internal and external factors either facilitate well-developed social functioning or result in maladaptive social behaviours [1,19]. Surprisingly, little is known about the contexts within which social interactions occur, which may either inhibit or facilitate social functioning.
The influence of contextual factors on social functioning. The contexts within which the social interaction occurs (e.g., school, home, work) may also influence the way in which the individual interacts with others, as well as the nature and quality of social interactions [39,40]. Despite research having a strong focus on adaptive behaviour [16], there has been limited research into the influence of contextual factors on a person's social functioning; perhaps one of the most neglected areas in contemporary conceptual models [1,33].

Assessments of Social Functioning
In recognition of the importance of social functioning, the assessment and treatment of social difficulties has become a focus of research over the past decades, particularly in the area of developmental disabilities [19,41,42]. Given the array of skills subsumed within the domain of social functioning, some measures attempt to broadly address these different domains, while others specifically target the assessment of a sub-set of skills [19].
The assessments examining social functioning are mostly in the form of pen-and-paper questionnaires, with the majority of identified assessments relying on child self-report, while some resort to proxy reporting by peers, parents and teachers [19]. Reliance on self-report alone is problematic due to: poor construct and/or criterion validity (low correlations with other assessments), susceptibility to social desirability, and reliance on a child's ability to comply with instructions [19,43]. Further, one-time parent and teacher ratings have been described as inadequate methods of measurement, as they rely solely on the objectivity of the rater and perform poorly in capturing behavioural changes over time, across contexts and between different interactants [19,44].
In addition to the form of the assessment, it is important for the measure to be based on a sound theoretical model and for the underlying psychometric properties of the assessment to be evaluated [1]. Measures can have different prognostic and/or analytical functions. Measures can be used prognostically to: a) predict a later outcome; b) determine suitability for a particular intervention; c) report on the responsiveness to a particular intervention; or d) determine the amount of intervention required (dosage) [45]. Measures may also be used analytically to: a) explain or understand the contexts; b) classify or identify subgroups of patients; c) allow exploration of relationships between factors; d) detect within-subject change or between-subgroup differences; and e) enable comparison of patients to other population subgroups or norms [45]. If an outcome measure is used to evaluate changes in clients over time following a particular intervention, the quality of responsiveness becomes important. Conversely, if the measure is used as a screening measure to accurately diagnose the presence or absence of a condition, identification accuracy and therefore the interpretability of the measure is of primary concern as it indicates the overall precision of making a diagnosis [46].

Reviews of Social Functioning Measures
In response to the plethora of assessments measuring social functioning, four reviews aiming to provide an overview of these measures have been conducted [9,19,47,48]. While these reviews provided valuable information on some of the many assessments focusing on social functioning, they have several limitations. Two of the earlier reviews by Demaray et al. [47] and Merrell [48] only evaluated a small number of measures and the review by Matson and Wilkins [9] did not adopt a systematic design [19]. However, a more recent systematic review by Crowe et al. [19] built on the review by Matson and Wilkins [9]. Crowe et al. [19] evaluated 86 assessments under the broad domain of social functioning. The review focused on: fairly recently published assessments (i.e., 1988-2010), whether psychometric properties were reported and the popularity of the measures (i.e., number of citations) [19]. Therein lies the main limitation of the Crowe et al. [19] review: it reported only whether authors had reported their measures' psychometric properties, and did not report on all psychometric properties. Moreover, the review lacked rigour in evaluating the quality of the psychometric properties of the measures in a systematic and uniform manner.

Limitations of Current Assessments Measuring Social Functioning
Across the literature, several limitations to current social functioning assessments have been identified. These limitations include discrepancies regarding the definition of social functioning and, subsequently, a lack of connection between measures and a theoretical model [19,49]. We add two further limitations to current social functioning assessments: 1) a lack of observational assessments, and 2) a lack of uniform reporting on the psychometric properties of these assessments.
Definitions of social functioning. Many assessments lack a clear connection to a theoretical framework and clear definitions of the domains of social functioning being measured. Under the umbrella term social functioning are the following constructs which are often used interchangeably: pro-social behaviour [50], social adjustment [12], social cognition [51], social competence [52], social outcomes [53] and social skills [54]. The discrepancies present several challenges to scientific literature; including comparisons across studies, evaluation of the quality of assessments being used and most importantly, barriers to determining the effectiveness of treatments aiming to ameliorate difficulties surrounding social functioning [1,49].
Lack of observational assessments. There is a near complete absence of well validated measures that assess social functioning through direct observation. Observational measures may provide numerous benefits, including a socially and ecologically valid approach whereby individuals can be assessed performing the skills within the contexts in which they experience the difficulties. However, the few observational assessments that currently exist are study-specific, unpublished or were modified from adult or aggression measures [19].
Uniform reporting of psychometric properties. Evident within this area of research is a lack of uniform reporting on both the description and psychometric properties of such measures. As noted by Crowe and colleagues [19], the literature on the psychometric properties of assessments of social functioning is continually being updated. Further reviews are needed to provide an updated and comprehensive account of the psychometric properties of the available assessments.

Study Aim
The purpose of this systematic review was twofold. Firstly, this review aimed to provide an overview of information on existing assessments that measure areas of social functioning across the lifespan, highlighting current gaps in the age, type, or context in which assessments can be administered. Within the area of social functioning, we focused the review on assessments of social skills and social behaviour.
Secondly, within this review, a central aim was to comprehensively evaluate the quality of psychometric properties reported across these assessments. To guide this aim, we used the COSMIN taxonomy of measurement properties and definitions for health-related patient-reported outcomes [55].

Methods
The PRISMA statement was used to guide the methodology and reporting of this systematic review. The PRISMA statement checklist contains a total of 27 item areas that are deemed essential for the transparent reporting of systematic reviews [56]. A completed PRISMA checklist applicable to the current review is accessible (see S1 Table).

Eligibility Criteria
Eligibility criteria for studies in this review included research articles or published manuals on the psychometric properties of instruments designed to measure the social skills and behaviours of the general population. We adopted the following, widely used definition of social skills to guide our review, which comprises both skills and behavioural elements that result in positive social interactions, encompassing: 1) cooperation, 2) verbal and non-verbal communication, 3) engagement and participation, 4) empathy, and 5) self-regulation and adaptive behaviours in situations where interpersonal interaction occurs [16,57]. Within this search, instruments measuring these skills and behaviours in both children and adults were included. For instruments to be included in this review, their main components or subscales needed to meet the definition we adopted of social skills. Instruments or published articles written in languages other than English were not eligible. As we were interested in evaluating the quality of psychometric properties of contemporary measures being used in recent research, instruments were excluded if they were published before 1994. For the purpose of this review, instruments that had an update of their psychometric properties in the last 20 years at the time of the search were regarded as contemporary. Instruments were further excluded if they were developed for a specific target population (e.g., autism spectrum disorders) rather than a normative sample. Articles were excluded if no psychometric properties were reported. Conference abstracts, reviews, case reports, student dissertations and editorials were also excluded.

Information Sources
A systematic literature review was performed using four electronic databases: CINAHL, PsycINFO, Embase and Medline. Furthermore, Ovid's Health and Psychosocial Instruments (HAPI) database was used to identify potential instruments that met the inclusion criteria. The HAPI database provides access to information on instruments relevant to health-related disciplines, including the fields of social sciences, organisational behaviour, and library and information sciences. Database searches were conducted between 3 May 2014 and 15 May 2014. Search strategies included both free text words and subject headings (see Table 1), and comprised all journal articles up to May 2014. The second author conducted the searches because of her expertise in conducting systematic reviews. The databases were accessed from the libraries of Curtin University and James Cook University.
We searched for grey literature using Google Scholar and PsycEXTRA. PsycEXTRA is the American Psychological Association's grey literature database which accompanies the PsycINFO database. It combines bibliographic records with full-text professional and lay-audience literature in the behavioural and social sciences. To be comprehensive, we also searched the websites of three major publishers of assessments in social sciences (Pearson, Acer and Western Psychological Services) to identify potential assessments not identified in earlier search strategies.
The HAPI database identified 22 instruments that potentially met the inclusion criteria, thus warranting further scrutiny. Search of the grey literature identified an additional 25 records; the subject headings, free text terms and limitations are reported in Table 1. Reference lists of the included articles were searched for additional literature.

Study Selection
Two independent abstract reviewers rated the abstracts on the following inclusion criteria: abstracts had to describe an instrument or outcome measure; address its psychometric measurement properties; and assess social skills and behaviours. A random sample of 40% of the abstracts was examined to determine the inter-rater reliability: Weighted Kappa = 0.79 (95% CI 0.72-0.86). To ensure all instruments measured social skills and/or social behaviour, two doctoral candidates with knowledge in the area of social skills reviewed the instruments together. At this level, the abstracts of the articles and descriptions of the assessment were located to ensure the instrument met the adopted definition of social skills [16,57].
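The inter-rater statistic reported above can be illustrated with a minimal sketch of Cohen's weighted kappa for two raters. The review does not state which weighting scheme or software was used, so the linear weights, the function name `weighted_kappa` and the screening categories below are assumptions made purely for illustration.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters over ordered categories.

    `categories` lists the rating levels in order. `weights` is "linear"
    or "quadratic"; the choice of scheme is an assumption, not taken
    from the review.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Joint counts of (rater A level, rater B level) and each rater's marginals.
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Weighted disagreement actually observed, and expected by chance.
    observed = sum(disagreement(i, j) * c / n for (i, j), c in joint.items())
    expected = sum(
        disagreement(i, j) * marg_a[i] / n * marg_b[j] / n
        for i in range(k)
        for j in range(k)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - observed / expected


# Hypothetical screening decisions for eight abstracts (illustrative only).
levels = ["exclude", "unsure", "include"]
a = ["include", "exclude", "unsure", "include", "exclude", "exclude", "include", "unsure"]
b = ["include", "exclude", "include", "include", "exclude", "unsure", "include", "unsure"]
kappa = weighted_kappa(a, b, levels)
```

With only two categories the linear-weighted statistic reduces to the unweighted Cohen's kappa, which is a convenient sanity check.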

Data Collection Process / Data Extraction
To capture the data contained within the included studies and manuals (n = 45), we used the Cochrane Handbook for Systematic Reviews section 7.3a [59], and the Systematic Reviews Centre for Reviews and Dissemination [60]. Data were extracted under the following headings: study design, purpose of the study, study population, age of the population, and instrument characteristics. The COSMIN checklist was used both to capture the data and to assess their methodological quality [55].

Methodological Quality
The psychometric quality of the included instruments was then analysed using the COSMIN taxonomy of measurement properties and definitions for health-related patient-reported outcomes [61]. The COSMIN checklist [55] is a standardised tool for assessing the methodological quality of studies on measurement properties and consists of nine domains: internal consistency, reliability (relative measures: including test-retest reliability, inter-rater reliability and intra-rater reliability), measurement error (absolute measures), content validity (including face validity), structural validity, hypotheses testing, cross-cultural validity, criterion validity, and responsiveness. Responsiveness as a psychometric property was not evaluated in this review. Definitions of all the psychometric properties, as defined in the COSMIN statement, are provided in Table 2.
Interpretability is not considered to be a psychometric property under the COSMIN framework and was therefore not described in this review. Each domain of the COSMIN checklist includes 5 to 18 items focussing on different aspects of study design and statistical analyses. Terwee et al. [62] proposed using a 4-point rating scale per item (excellent, good, fair, and poor), obtaining an overall methodological quality score per psychometric property by taking the lowest rating of any item in the corresponding domain. As this rating system appears to be so severe that it inhibits differentiation between more subtle psychometric qualities of instruments [63], a revised scoring was introduced. The outcome was presented as percentage of rating (Poor = 0-25.0%, Fair = 25.1%-50.0%, Good = 50.1%-75.0%, Excellent = 75.1%-100.0%). Given that some COSMIN items only have excellent and good as an option for rating, we calculated the total score for each psychometric property using the following formula to most accurately capture the quality of the psychometric properties:

\[
\text{Total score for psychometric property} = \frac{\text{Total score obtained} - \text{Minimum score possible}}{\text{Maximum score possible} - \text{Minimum score possible}} \times 100
\]

To ensure consistency of the COSMIN checklist ratings, the first author trained two independent doctoral candidates to complete the COSMIN checklist. To ensure accuracy, the two raters completed COSMIN checklists together for 10 of the 13 instruments.
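The revised scoring can be sketched in a few lines of code. This is an illustration under stated assumptions rather than the authors' implementation: the numeric coding of item ratings (poor = 1, fair = 2, good = 3, excellent = 4), the per-item (minimum, maximum) ranges and the function names are hypothetical; only the formula and the percentage bands come from the text.

```python
def cosmin_percentage(item_scores, item_ranges):
    """Rescale summed item ratings to 0-100 using each item's possible range.

    item_scores: numeric rating obtained on each checklist item.
    item_ranges: (minimum possible, maximum possible) rating for each item.
    The numeric coding of ratings is an assumption for this sketch.
    """
    if len(item_scores) != len(item_ranges) or not item_scores:
        raise ValueError("scores and ranges must be non-empty and of equal length")
    obtained = sum(item_scores)
    minimum = sum(low for low, _ in item_ranges)
    maximum = sum(high for _, high in item_ranges)
    if maximum == minimum:
        raise ValueError("items must allow more than one possible total score")
    return 100.0 * (obtained - minimum) / (maximum - minimum)


def quality_label(percentage):
    """Map a percentage to the bands stated in the text."""
    if percentage <= 25.0:
        return "Poor"
    if percentage <= 50.0:
        return "Fair"
    if percentage <= 75.0:
        return "Good"
    return "Excellent"


# Hypothetical domain with four items; the third item only offers
# good (3) or excellent (4), so its minimum is 3 rather than 1.
scores = [4, 3, 4, 2]
ranges = [(1, 4), (1, 4), (3, 4), (1, 4)]
pct = cosmin_percentage(scores, ranges)
label = quality_label(pct)
```

Subtracting the minimum possible total is what keeps items with a restricted rating range from inflating the percentage.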

Data Items, Risk of Bias and Synthesis of Results
All data items for each instrument were obtained. When an item was not reported, an 'NR' was recorded. Risk of bias was assessed at an individual study level during the rating of the COSMIN checklist through the inclusion of 'methodological limitations items'. The results were synthesised and grouped as follows: 1) development and validation of the instrument, 2) the psychometric properties of the instruments, and 3) the instrument characteristics.

Systematic Literature Search
After the removal of duplicate abstracts across the four databases, a total of 1,897 studies were screened for inclusion in this review. Of these studies, 129 full-text articles on 53 measures were assessed for eligibility (see Fig 1). Of these 53 measures, 13 measures met the inclusion criteria and 40 were excluded for the following reasons: 24 were published before 1994, 6 were diagnostic specific, and 10 did not meet the definition of social skills adopted for the purpose of this review. See Table 3 for an overview of the 40 social skills instruments and the reasons for exclusion. Through additional searches another 9 manuals were located. Thus, the psychometric properties were obtained for a total of 13 social skills measures which were accessed using 36 articles and 9 manuals.

Table 2. Psychometric properties: domains and definitions (a).

Domain: Reliability. The degree to which the measurement is free from measurement error.
Internal consistency: The degree of the interrelatedness among the items.
Reliability: The proportion of the total variance in the measurements which is because of "true" differences among patients.
Measurement error: The systematic and random error of a patient's score that is not attributed to true changes in the construct to be measured.

Domain: Validity. The degree to which an instrument measures the construct(s) it purports to measure.
Content validity: The degree to which the content of an instrument is an adequate reflection of the construct to be measured.
Face validity (b): The degree to which (the items of) an instrument indeed looks as though they are an adequate reflection of the construct to be measured.
Construct validity: The degree to which the scores of an instrument are consistent with hypotheses based on the assumption that the instrument validly measures the construct to be measured.
Structural validity (c): The degree to which the scores of an instrument are an adequate reflection of the dimensionality of the construct to be measured.
Hypothesis testing (c): Idem construct validity.
Cross-cultural validity (c): The degree to which the performance of the items on a translated or culturally adapted instrument are an adequate reflection of the performance of the items of the original version of the instrument.
Criterion validity: The degree to which the scores of an instrument are an adequate reflection of a "gold standard".

Domain: Responsiveness.
Responsiveness: The ability of an HR-PRO instrument to detect change over time in the construct to be measured.

Domain: Interpretability (d).
Interpretability: The degree to which one can assign qualitative meaning to an instrument's quantitative scores/score change.

Included Social Skills Measures
Information on the development and validation of the 13 included social skills measures is reported in Table 4. All measures had some evidence of development and validation through the use of a normative study population/sample, without targeting a specific diagnostic group. Of the 13 measures, 12 were developed using children up to 12 years of age, with 6 of these measures also using an adolescent sample (13-18 years). No measure was developed using an adult population alone (older than 18 years) and only 1 measure (i.e., the Evaluation of Social Interaction [ESI]) was developed and validated using all three age groups (i.e., children, adolescents and adults).
The characteristics of the included measures are reported in Table 5. Of the 13 measures, 6 were published within the last 5 years (since 2009). Regarding the measure type, 9 measures used self-, parent- or teacher-report, with 3 of these measures using only teacher-report. Of the remaining measures, 3 were observation-based and only 1 used a semi-structured interview (see Table 5). Regarding the response options within the measures, 12 reported the use of Likert scales and only the ESI reported the use of a criterion-referenced rating scale. Of the 12 measures using a Likert response scale, 11 reported the use of a 3- to 5-point scale, and the Peer Social Maturity Scale (PSMS) reported the use of a 7-point scale. Additionally, the Interaction Rating Scale (IRS) reported the use of a dichotomous (yes or no) rating system for its scale.

Psychometric Properties
The quality ratings of the psychometric properties of all 13 measures, which were evaluated against the COSMIN quality criteria, are summarised in Table 6. The overall means and standard deviations of each psychometric property across all social skills measures were also calculated. Structural validity was most frequently reported; the mean rating across 12 measures was 78.1 (SD = 14.8), indicating excellent quality. The least reported psychometric property was criterion validity; 3 measures had a mean rating of 80.7 (SD = 7.9), indicating excellent quality. Overall, 11 measures had evidence for internal consistency; the mean rating across these measures was 84.5 (SD = 12.9), indicating excellent quality. The mean rating of the 11 measures reporting on reliability and hypothesis testing was 74.7 (SD = 11.9) and 56.8 (SD = 17.5) respectively; indicating good quality. The mean rating of the 6 measures that reported on measurement error was 70.0 (SD = 12.8); indicating good quality. Content validity was reported by 8 measures; mean rating 74.0 (SD = 17.0), indicating good quality. Cross-cultural validity was reported by 4 measures; mean rating 54.3 (SD = 13.2), indicating good quality.

Discussion
In this systematic review, we identified and evaluated the quality of psychometric properties of instruments that measure social skills and behaviours developed after 1994. We identified 13 instruments that evaluated a component of social skills and behaviours that fit the definition we used in the review. The vast majority of instruments (11) were developed mainly for school-aged children and adolescents, with two instruments solely developed for children 2-5 years (i.e., QRSH-PR and SEEC).
Alongside the validity evidence, reliability findings also need to be reported. This systematic review of social skills and behaviour instruments using the COSMIN framework provided a comprehensive summary of this. Application of the COSMIN checklist-based taxonomy provided the framework for a critical evaluation of the quality and extent of psychometric evidence of the 45 research articles and manuals on the 13 social skills and behaviour instruments. Based on the COSMIN taxonomy, the social skills and behaviour instrument with the most robust psychometric properties to date was the SSBS-2, given that all eight psychometric

Reliability and Validity
The COSMIN checklist provides information about the instruments' reliability properties, including internal consistency. While these aspects of reliability were not reported for a number of instruments, the current review showed good to excellent reliability for the majority of the instruments. However, only six of the thirteen instruments reported on measurement error. When selecting appropriate outcome measures for a study, consideration of the measurement error of the instruments is important, as a small measurement error will allow the instrument to detect smaller treatment effects and allow for stronger conclusions to be drawn. Thus, clinical trials will require smaller sample sizes if the measurement error of an instrument is small in relation to its minimal important change (MIC), compared with instruments where the opposite applies [159].
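The relationship between measurement error and required sample size can be illustrated with a small calculation. The sketch below is not taken from [159]. It assumes a two-arm trial, the standard normal-approximation formula for comparing two means, and that observed-score variance is the sum of true-score variance and the squared standard error of measurement (SEM). The function name and all numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(mic, sd_true, sem, alpha=0.05, power=0.80):
    """Approximate participants per arm needed to detect a difference of `mic`.

    Assumptions (illustrative only):
    - two-sided test on the difference between two group means,
    - normal approximation: n = 2 * (z_alpha + z_beta)^2 * variance / mic^2,
    - observed variance = true-score variance + SEM^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    var_observed = sd_true ** 2 + sem ** 2
    return ceil(2 * (z_alpha + z_beta) ** 2 * var_observed / mic ** 2)


# Hypothetical instrument: MIC of 5 points, true-score SD of 10 points.
for sem in (2, 5, 8):
    print(f"SEM = {sem}: about {n_per_group(mic=5, sd_true=10, sem=sem)} per group")
```

Under these assumptions, the required sample size grows as the SEM grows relative to the MIC, which is the pattern described above.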
The results of the current systematic review revealed considerable variability in the sample sizes used for the development and validation of measures. For instance, the ESI was developed and validated using a total sample of 6,552 people classified under various diagnostic categories [117], whereas the IRS-BC [124] was validated with a sample of 20 children. Other measures that were developed and validated with small samples included the IRS-A [122,123] and the SP [151,152]. Large normative samples used for development and validation increase the generalisability of the results of measures to a population, and allow clinicians to make informed assessments about a client's functioning in relation to a representative sample of people with similar characteristics (e.g., age, sex). In contrast, validation studies using a limited sample size are not considered adequate for reaching conclusions about the clinical findings of the measure, as the small number of participants does not allow for informed clinical assessment.
The theoretical construct being measured by an instrument must be clearly defined and then a body of evidence of the instrument's construct validity must be accrued. Within the COSMIN taxonomy, construct validity comprises content validity, structural validity and hypothesis testing. Assessment of content validity revealed excellent quality for the PKBS-2, QRSH-PR, SP, SSBS-2 and SSIS. However, the SCI, ESI, three versions of the IRS, MESSY-II and PSMS did not provide any evidence of content validity, highlighting a need for further research. The SCI, ESI, MESSY-II, four versions of the SEARS, PSMS, PKBS-2, SSBS-2 and SEEC all had published evidence of structural validity, leading to ratings of 'excellent'. Conversely, the three versions of the IRS did not have any published information in this domain, again highlighting a need for further research. The majority of the social skills and behaviour instruments (11 of 13) provided evidence of hypothesis testing rated from 'good' to 'excellent'. Only the SP and the SEEC did not provide any evidence of hypothesis testing. Cross-cultural validity (four measures: SSIS, MESSY-II, PKBS-2, and SSBS-2) and criterion validity (three measures: HCSBS, PSMS, and SSBS-2) were the least reported validity properties.
When a scale is used without documented measurement properties (such as construct validity), there can be negative consequences, such as errors in clinical judgment or practitioners interpreting assessment findings inaccurately. Being able to investigate how well a scale measures what it claims to measure, and its ability to hold its meaning across varied contexts and sample groups, is vital so that it can be used with confidence in clinical settings. This systematic review of social skills and behaviour instruments provides a concise summary of the current state of the psychometric evidence for these scales.
The importance of external or environmental influences has been emphasised by numerous theorists and researchers; however, few studies have examined the psychometric properties of an instrument in the different environmental contexts within which the social interaction occurs. It is likely that the environment (family, organisational or institutional structures, community, education, and culture) has a substantial impact on the social functioning of children and youth with and without disabilities. Therefore, future investigations of cross-cultural validity would seem valuable in the development of instruments that purport to measure social skills and behaviours.

Definition of Social Functioning as a Construct
In line with discrepancies within the theoretical models and frameworks [1,19], there was only moderate consensus between the instruments when their stated purposes were compared. The instruments variously stated that they measured social competence, child-child interactions, social behaviours, or social and emotional problems, adjustment, or functioning. All terms were in agreement with widely accepted and long-standing definitions of the components of social functioning as involving internal or person-related factors (i.e., cognitive, affective, linguistic and personality traits) [57,160]. However, it remains problematic that a uniform overarching definition of social functioning remains elusive within a body of research that focuses on the reliable, valid, and responsive measurement of social functioning. This is particularly so when current conceptual models highlight the influence of external factors as well [1]. The purpose of an instrument needs to include a robust definition or statement about the construct(s) that the instrument seeks to measure [46]. In the 13 instruments evaluated in this review, the articulated constructs may be viewed as the instrument developers' attempt to operationalise an important aspect of social functioning. Before evaluating the merits and weaknesses of the published data about multiple social interaction assessments, the first step is to situate the constructs they measure within theoretical underpinnings. This is necessary whether the instrument measures a single entity (e.g., expressive language) or part of a broader overarching construct (e.g., social functioning). Accordingly, we recommend communication and open discourse among researchers and practitioners who strive to operationalise social functioning as a universally acceptable and defined construct.
In the absence of a universally agreed framework, the overlapping and poorly differentiated social functioning constructs may prove confusing for both researchers and practitioners. One way to overcome the heterogeneity of definitions is to apply a universally accepted framework with clear interrelated concepts. The International Classification of Functioning (ICF) [161] has the potential to serve as such a framework.

Application of the International Classification of Functioning (ICF)
The World Health Organization promotes the ICF as a potential guiding framework for professionals, organisations and governments seeking to address social and health inequalities among people with disabilities [161]. The ICF is non-discipline specific, theoretically neutral, and is based on the social model of disability. The ICF offers the advantage of being compatible with the current leading psychological models and theories described previously [1,16]. Such theories recognise internal factors such as brain integrity and personality (ICF: body structure and function and person factors), external factors such as family and organisational/institutional places (ICF: environmental factors), and engagement and participation within the person's natural environments (ICF: activity and participation). Furthermore, cross-cultural validity testing would provide evidence of the relationships between ICF person and environment factors. Social skills and behaviour, as widely defined in this review, encompass several key aspects of the ICF, including body structure and function (i.e., voice and speech functions); activity and participation factors (learning and applying knowledge, communication, domestic life, interpersonal relations and interactions, and community, social and civic life); person factors (i.e., age, gender, education level, culture); and environmental factors (support and relationships, attitudes, services, systems, and policies).

Limitations
This systematic review has a number of limitations. Information published in languages other than English was not included; therefore, some research findings may have been overlooked. Not all authors who published research on the psychometric properties of social skills and behaviour instruments were directly contacted; therefore, some information may have been missed. Evaluating the quality of responsiveness as a psychometric property was outside the scope of this systematic review, owing to the size of the review. We are of the opinion that evaluating the responsiveness of the included instruments warrants a review in itself, given that the number of papers to be evaluated would increase substantially. Instruments developed for specific clinical populations were also outside the scope of this systematic review; they may have sound psychometric qualities for clinical use, and further research is needed to evaluate this.

Implications for Practice and Future Research
A number of implications arise from the findings of this systematic review. Measuring social functioning is complex as it involves numerous distinct yet related social skills that are used during interactions with multiple interactants with varying levels of social competence within a multitude of contexts. Therefore, it is unlikely that one measure can address all the assessment needs of researchers and practitioners. As such, it is not prudent to recommend one single measure for use. The SSBS-2 is the social skills and behaviour measure with the most robust psychometric properties. A particular strength is that all eight psychometric properties have been investigated. The SSBS-2 is recommended for use as a screening measure in educational settings. The PKBS-2 also has sound psychometric properties and is recommended for use as a diagnostic measure of social and emotional problems in children with significant behavioural, emotional and developmental problems. The SSIS has sound psychometric properties, but was clearly developed as an outcome measure and therefore needs to detect change over time following an intervention. Responsiveness is therefore one of the most important psychometric properties of the SSIS. As this review did not evaluate responsiveness, it is not possible to recommend the SSIS for the purpose for which it was developed. The HCSBS is another measure with robust psychometric properties and is recommended as a screening measure for home and community contexts. While other measures appraised in this review show great promise in terms of the available evidence of the quality of their psychometric properties, more research is needed to evaluate the psychometric properties that have not been reported on to date.
It is important that researchers and practitioners utilise instruments with sound psychometric properties in support of evidence-based and research practices. It is recommended that practitioners collaborate with researchers to further develop the body of knowledge related to the reliability and validity of the social skills and behaviour scales. There is a need for ongoing research in the area of the psychometric properties of social skills and behaviours instruments. The body of psychometric evidence for instruments is dynamic and constantly being added to. It is strongly recommended that a universal working definition of social functioning as an overarching construct plus any related sub-constructs be generated. This would ensure that a consistent approach to evaluating the outcomes of social skills interventions is followed. In particular, it is recommended that the cross-cultural validity and criterion validity of all 13 instruments be further investigated.
There is a need to evaluate the responsiveness of the instruments and therefore to evaluate their suitability for use as outcome measures of social skills and behaviour. Measures can be used prognostically to report on the responsiveness to a particular intervention, or analytically to detect within-subject change or between-subgroup differences [45]. In an evidence-based practice era for all professionals who may work with clients presenting with impaired social functioning, the development of appropriate and psychometrically sound measurements is crucial to substantiate the effectiveness of interventions and programs. Consequently, instruments require statistical evaluation to determine stability over time in the absence of an intervention, as well as reliability, prior to thorough investigations of responsiveness and sensitivity to change over time. Of the included measures, a considerable number had been validated within the last 10 years. Only the SEEC [150] and SCI [145] had not had their psychometric properties re-evaluated or updated within the last 10 years. Furthermore, the SEEC and SCI had each been evaluated on only a single occasion. Future evaluation of psychometric properties is needed to determine the stability of these measures over time.

Conclusion
This systematic review presented the results of 45 studies and manuals that reported evidence of the psychometric properties of 13 social skills and behaviour instruments used with children and youth. The COSMIN taxonomy was used to rate the reliability and validity information reported about the instruments. Three social skills and behaviour scales were found to have the strongest level of psychometric evidence, reported in at least seven of the eight psychometric properties that were appraised. The authors recommend that practitioners and researchers consider using the robust SSBS-2, HCSBS and PKBS-2 with children and youth for the purposes and contexts for which they have been developed. It is also recommended that a more consistent definition of social skills and behaviours as a construct be generated. Only the SSBS-2 has reported on all the psychometric properties evaluated in this systematic review. There is a need for the authors of the measures included in this systematic review to evaluate and report on the quality of the psychometric properties that have not been assessed to date. The body of psychometric evidence for any scale or measure is constantly changing and evolving, and it is important for practitioners to be knowledgeable about the best instruments and outcome measures for use when monitoring and assessing children's social functioning.
Supporting Information S1