Clinimetric properties of lower limb neurological impairment tests for children and young people with a neurological condition: A systematic review

Background
Clinicians and researchers require sound neurological tests to measure changes in neurological impairments necessary for clinical decision-making. Little evidence-based guidance exists for selecting and interpreting an appropriate, paediatric-specific lower limb neurological test aimed at the impairment level.

Objective
To determine the clinimetric evidence underpinning neurological impairment tests currently used in paediatric rehabilitation to evaluate muscle strength, tactile sensitivity, and deep tendon reflexes of the lower limb in children and young people with a neurological condition.

Methods
Thirteen databases were systematically searched in two phases, from the date of database inception to 16 February 2017. Lower limb neurological impairment tests were first identified which evaluated muscle strength, tactile sensitivity or deep tendon reflexes in children or young people under 18 years of age with a neurological condition. Papers containing clinimetric evidence of these tests were then identified. The methodological quality of each paper was critically appraised using standardised tools and clinimetric evidence synthesised for each test.

Results
Thirteen papers were identified, which provided clinimetric evidence on six neurological tests. Muscle strength tests had the greatest volume of clinimetric evidence; however, this evidence focused on reliability. Studies were variable in quality with inconsistent results. Clinimetric evidence for tactile sensitivity impairment tests was conflicting and difficult to extrapolate. No clinimetric evidence was found for impairment tests of deep tendon reflexes.

Conclusions
Limited high-quality clinimetric evidence exists for lower limb neurological impairment tests in children and young people with a neurological condition. Results of currently used neurological tests, therefore, should be interpreted with caution. Robust clinimetric evidence on these tests is required for clinicians and researchers to effectively select and evaluate rehabilitation interventions.


Introduction
child's comprehension of the test requirements, and therefore their performance. Physical disabilities that may also influence a neurological test protocol and results include, but are not limited to, the presence of muscle contractures, spasticity or variations in tone, and previous orthopaedic surgery. There is little evidence-based guidance to help clinicians and researchers select and interpret an appropriate, paediatric-specific lower limb neurological test for children and young people with a neurological disorder. [12,20,21] While clinimetric evidence of activity and participation measures in children and young people with neurological diagnoses has been identified, [22] evidence of impairment measures remains limited. A recent systematic review found no conclusive clinimetric evidence to support the use of handheld dynamometry to measure muscle strength in children and young people with cerebral palsy, due to the poor methodological quality of primary papers. [23] Other systematic reviews have identified the lack of high quality clinimetric evidence for upper limb tests in children and young people with a neurological condition. [1,9] The clinimetric evidence for other lower limb neurological impairment tests for children and young people with cerebral palsy and other neurological conditions remains unknown. Therefore, the aims of this study were to:
• Identify neurological impairment tests currently used to evaluate the lower limb neural integrity of muscle strength, tactile sensitivity, and deep tendon reflexes in children and young people with a neurological condition
• Identify clinimetric evidence for neurological impairment tests used in children and young people with a wide range of neurological conditions
• Critically appraise and synthesise the clinimetric evidence underpinning the lower limb neurological tests
• Make recommendations regarding their use in clinical practice and research settings.

Table 1
Reliability: The extent to which repeated scores for a neurological test in a stable child are the same (consistent), [8,10] measuring the proportion of variability that is due to "true" (a) differences and "free" from measurement error. [8]
Test-retest (b): Degree to which an individual achieves the same result on a repeated test(s) without involvement from a health practitioner. [16]
Inter-rater: Degree to which different health practitioners achieve the same result on the same occasion of testing. [8]
Intra-rater: Degree to which the same health practitioner achieves the same result on different occasions of testing in a stable child. [8]
Validity: Degree to which a neurological test measures what it intends to measure. [8]
Face validity: Degree to which the neurological test appears to reflect the items required to measure the intended construct. [8]
Content validity: Degree to which the domain (muscle strength, tactile sensitivity or deep tendon reflexes) is comprehensively sampled by the items within the test.

Method
This study was undertaken in two phases based on the works by Bialocerkowski and colleagues. [21,24] The first phase systematically identified lower limb neurological tests measuring muscle strength, tactile sensitivity or deep tendon reflexes, in children and young people. [25] The second phase systematically identified studies evaluating the clinimetric properties of these neurological tests specific to children and young people with a neurological condition.

Phase 1: Identification of neurological tests
Search terms, identifying lower limb neurological impairment tests for children (aged 2-18 years) with a neurological condition, were generated from previous search strategies. [

Study selection
Duplicates were removed from identified papers before two researchers (RC and BH) independently evaluated them against the following inclusion criteria:
2. Participants with a neurological condition affecting the lower limb. These conditions included diseases of the nervous system, musculoskeletal system and connective tissue, injuries to the head or unspecified part of trunk and certain other consequences of external causes, certain conditions originating in the perinatal period, congenital malformations, deformations and chromosomal abnormalities that affect the central or peripheral nervous system, including the spinal cord, peripheral nerves, nerve roots, autonomic nervous system and muscles. [2]
3. Papers reported using a neurological impairment test that measured or evaluated muscle strength (b730), and/or tactile sensitivity (b270), and/or deep tendon reflexes (b750) at the "body functions and structures" level of the ICF-CY. [4]
4. Neurological impairment tests were suitable for use within the clinical setting, using equipment that was typically available, inexpensive and portable. [15,31]
5. Quantitative studies with a level of evidence rated I-IV [32] (including systematic reviews (I), randomised controlled studies (RCTs) (II), pseudo-RCTs (III), comparative studies (III-2, III-3), and case series with pre/post studies (IV)).
6. Full text or abstract papers published in a peer-reviewed journal, as listed in Ulrichsweb. [32]
7. Published in the English language between 1985 and February 2017, as papers published after the mid-1980s were considered to coincide with the period in which evidence-based practice (EBP) was used to optimise clinical care. [33]
Papers were excluded if:
1. The average age of participants could not be determined, or the average age of participants was younger than 2 years or older than 18 years.
2. Participants were diagnosed with conditions limited to metabolic, orthopaedic or cardiovascular conditions (including, but not limited to, systemic connective tissue disorders and other osteopathies, episodic and paroxysmal disorders and inflammatory diseases of the central nervous system).
3. Neurological tests were classified as activity or participation measures, as these measures represented a different ICF-CY construct. [4]
4. Papers reported only spasticity or primitive reflexes, as these were not the focus of this study. [28][29][30]
5. Neurological impairment tests had a low level of clinical utility due to expense or limited transportability of equipment (e.g. isokinetic dynamometer) or the specialised diagnostic nature of testing (e.g. electromyography or nerve conduction studies). [14]
6. Papers were editorials or opinion pieces, as they are not quantitative studies. [34]
If eligibility was unclear, the two researchers (RC and BH) undertook a review of the full text article. A third reviewer (AB) was consulted to reach consensus in cases of continued disagreement. Included papers were reviewed in full text and the names of all relevant neurological tests were extracted by the same two researchers (RC and BH) and compared for agreement. If required, the third reviewer (AB) determined consensus.

Phase 2: Identification of clinimetric properties of neurological tests
Neurological impairment tests identified in Phase 1 were systematically searched for their clinimetric properties from their date of inception to 16 February 2017 using four health databases: CINAHL, EMBASE, Medline and Scopus. [24] By translating the validated Terwee, Jansma, Riphagen, et al. [35] protocol for each specific database (S2 Table), the search strategy involved combining:
• a neurological test search, to identify measures of muscle strength, or tactile sensitivity, or deep tendon reflexes limited to the lower limb;
• a population search, including paediatric participants aged less than 18 years;
• a neurological test search, derived from the neurological impairment test names identified in Phase 1; and
• filtering for measurement properties, as outlined by Terwee, et al. [35]
Papers were included if:
1. all paediatric participants were aged less than 18 years, as clinimetric properties are population specific. [17]
2. participants had a neurological condition affecting the lower limb. Neurological conditions were defined using the International Classification of Diseases (ICD-10) as per Phase 1.
3. papers contained clinimetric evidence on a lower limb neurological impairment test that evaluated muscle strength, tactile sensitivity and/or deep tendon reflexes in the lower limb as per the ICF-CY framework outlined in Phase 1.
4. quantitative studies had a level of evidence rated II-IV [34] (including randomised controlled studies (RCTs) (II), pseudo-RCTs (III), comparative studies (III-2, III-3), and case series with pre/post studies (IV)).
5. papers were published in full text in the English language and peer reviewed.
Consensus between two individual reviewers (RC and BH) was reached using the same method as Phase 1. Papers that contained additional evidence outside the scope of this paper were included only if data could be extrapolated that met the inclusion criteria. Systematic reviews (level I evidence) identified in this process were searched for primary papers that met the inclusion criteria through secondary searching. Additional primary papers that met the inclusion criteria were identified through secondary searching by hand through the reference lists of included papers and identified systematic reviews.

Quality assessment
The methodological quality of the included clinimetric papers was evaluated independently by two reviewers (RC and BH) using two critical appraisal tools: the Brink and Louw critical appraisal tool [6] and the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN). [8] These critical appraisal tools [6,8] have previously been used in a number of published systematic reviews on health-related outcome measures [21,[36][37][38][39] to evaluate the quality of psychometric evidence. Brink and Louw's [6] tool assessed the impact of 13 items on the overall quality of the primary paper's method, without calculating a composite score. [6,21] For each included primary paper, the percentage of "yes" responses [6] was calculated by dividing the number of "yes" responses by the number of applicable items and converting the result to a percentage. [36,40] This provided an arbitrary evaluation of the overall methodological quality of each paper. Due to its wide use in health-related research, the COSMIN was used to grade the methodological quality of included papers. [1,20,22,31,41] The COSMIN uses weighted items based on overall importance and a 'worst score counts' method. [42] Consensus for each item was gained through discussion, and a third researcher (AB) was consulted if required. Kappa coefficients and 95% confidence intervals (CI) were calculated to assess the inter-reviewer reliability of the item responses.
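The scoring steps described above can be sketched as follows. This is a minimal illustration only: the function names, the response labels ("yes", "no", "n/a") and the four-level rating scale are assumptions made for the example and are not taken from the appraisal tools themselves. The kappa shown is the unweighted Cohen's kappa for two reviewers, and the 95% CI is omitted.

```python
from collections import Counter

# Assumed rating scale, ordered from worst to best (hypothetical labels).
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def percent_yes(responses):
    """Percentage of "yes" among applicable items ("n/a" items are excluded)."""
    applicable = [r for r in responses if r != "n/a"]
    if not applicable:
        raise ValueError("no applicable items")
    return 100.0 * applicable.count("yes") / len(applicable)


def worst_score_counts(item_ratings):
    """Overall rating equals the lowest rating given to any single item."""
    return min(item_ratings, key=RATING_ORDER.index)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1.0 - expected)
```

For example, a paper with three "yes" responses, one "no" and one non-applicable item would score 75% under this sketch, while a single "poor" item rating would make the overall 'worst score counts' rating "poor".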

Data extraction
Additional data were extracted from each study, including the name of the authors, date of publication, name of the neurological test, type of clinimetric property evaluated, participant characteristics, rater characteristics, measurement characteristics, results of the clinimetric evaluation and information on the clinical utility of the test. Clinical utility was described based on information contained within the included papers on the portability, cost, and feasibility of using the equipment on children and young people with a neurological condition in a clinical setting. [14]

Best evidence synthesis
Evidence on each clinimetric property for each neurological test within primary papers was narratively synthesised and interpreted in combination with the methodological quality of the primary paper. Reliability correlation coefficients from the primary papers were interpreted using guidelines from Katz et al.: [43] low = <0.40, moderate = 0.40-0.59, moderately high = 0.60-0.79 and very high = >0.80. The level of evidence for each neurological test was determined using guidelines from Terwee et al. [7] and Dobson et al., [31] which combined the quality of the paper for each neurological test with the consistency of the clinimetric evidence for that test (Table 2). [20,31,44]
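The interpretation bands for reliability coefficients can be expressed as a small lookup, sketched below. The function name is hypothetical. Two boundary choices are assumptions, because the stated bands leave them open: a coefficient of exactly 0.80 is treated as "very high", and values between 0.59 and 0.60 are treated as "moderate".

```python
def interpret_reliability(coefficient):
    """Map a reliability coefficient to a descriptive band.

    Bands follow the thresholds quoted in the text; the handling of
    exactly 0.80 and of values between 0.59 and 0.60 is an assumption.
    """
    if coefficient > 1.0:
        raise ValueError("reliability coefficients cannot exceed 1.0")
    if coefficient < 0.40:
        return "low"
    if coefficient < 0.60:
        return "moderate"
    if coefficient < 0.80:
        return "moderately high"
    return "very high"
```

Under this sketch, an ICC of 0.95 would be labelled "very high" and an ICC of 0.56 "moderate".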

Hip Adductors (n = 1) [47]: supine; hip and knees extended 0°; SB; HHD, Spark; medial thigh, proximal to knee (d); contralateral lower extremity; make; peak. [48]

Body positions and measurement methods were comparable to those used with HHD, with the addition of smaller muscle groups, such as the ankle invertors and evertors (Table 6). The MMT in primary papers [49,55,56] did not require any equipment; therefore, it is also a portable test. [14]

Charcot-Marie-Tooth paediatric scale. The Charcot-Marie-Tooth Pediatric Scale (CMTPedS) was evaluated in one paper with children and young people with Charcot-Marie-Tooth (Table 3, Table 4). [45] The reliability component of the paper on the CMTPedS met 75% of quality items using the Brink and Louw [6] criteria (Table 3), yet was rated as poor using the COSMIN checklist [8] because the sample size was considered small (Table 4). The CMTPedS had very high reported inter-rater reliability (ICC = 0.95). [45] The CMTPedS score, however, was a composite score, including upper and lower limb test components, with subsets of muscle strength and tactile sensitivity tests comprising 36% of items within the test. While this test has high portability, the need for other equipment, including HHD, [45] at an approximate cost of USD$2657, [45] in conjunction with the need for additional training, reduces its overall clinical utility. [14]

ASIA impairment scale. The International Standards for Neurological Classification of Spinal Cord Injury American Spinal Injury Association (ASIA) scale was evaluated in one paper with children and young people with spinal cord injury (Table 3, Table 4). [57] The paper on the ASIA scale met 50% of quality items using the Brink and Louw [6] criteria (Table 3) and was rated as poor using the COSMIN checklist [8] because of methodological flaws considered major (Table 4). High intra-rater reliability was reported for the ASIA scale (ICC = 0.71 to 0.98), with wide variation in 95% CI (0.23 to 0.99). [57] The ASIA impairment scale is a composite score, including upper and lower limb test components with subsets of motor scores and tactile sensitivity tests (including pinprick and light touch). This test has high portability, without the need for other equipment, although it requires some training. [57]

Richmond quantitative measurement system. Clinimetric evidence for the Richmond Quantitative Measurement System was identified in one paper in children and young people with cerebral palsy (Table 3, Table 4). [55] In this paper, [55] 63% of items scored "yes" with the Brink and Louw critical appraisal tool, [6] but the paper was rated as poor using the COSMIN checklist [8] (Table 3, Table 4). The Richmond Quantitative Measurement System had moderate to very high inter-rater reliability (ICC = 0.56 to 0.97) (Table 5). [55] This was determined in one study with poor methodological quality (Table 4). The Richmond Quantitative Measurement System requires equipment with specialised software and training, reducing its portability and clinical utility.

Standing heel rise. One study provided evidence on the reliability of the Standing Heel Rise [52] in children and young people with cerebral palsy (Table 3, Table 4). In this paper, [52] 75% of the Brink and Louw [6] items were rated as "yes"; however, the paper [52] was rated as poor using the COSMIN checklist [8] because the sample size was considered small (Table 4). Intra-rater reliability was very high (ICC = 0.84-0.99) [52] using the protocol by Van Vulpen et al. [52] This protocol was portable; however, it involved the additional use of infra-red beams connected to a receiver, which detected the heels lifting 1.7 cm off the ground. There was no detail regarding training requirements or equipment costs for the Standing Heel Rise.

Synthesis of evidence
A best evidence synthesis for each of the six neurological tests showed that HHD had conflicting evidence on reliability for children with cerebral palsy, moderate inter-rater reliability for children with spina bifida and moderate intra-rater reliability for children with Duchenne's muscular dystrophy (Table 7). MMT had conflicting evidence regarding intra-rater and inter-rater reliability in children with Duchenne's muscular dystrophy and spina bifida respectively. Moderate evidence was found for the Charcot-Marie-Tooth Pediatric Scale [45] and the Standing Heel Rise. [52] These tests had consistent evidence across multiple studies and were published in papers higher in methodological quality, which resulted in greater evidence ratings despite small bodies of evidence (Table 7). Conflicting evidence on intra-rater reliability was found for both the motor and sensory constructs of the ASIA impairment scale when used on children and young people with a spinal cord injury. The Richmond Quantitative Measurement System [55] had unknown evidence of inter-rater reliability due to the poor methodological quality of the published papers.

Discussion
This is the first study to systematically identify clinimetric evidence on lower limb neurological impairment tests used on children and young people across a range of neurological disorders. Reliability was the only clinimetric property for which evidence was identified, and this evidence was available for only six of the 21 identified neurological tests, demonstrating the paucity of evidence for neurological impairment testing. Clinimetric evidence for tactile sensitivity was identified in two primary papers [45,57] containing composite measures. However, tactile sensitivity evidence could only be extrapolated from one primary paper. [57] The limited to moderate body of evidence on reliability of lower limb muscle strength tests and composite tests including subsets of tactile sensitivity and

Existing clinimetric evidence must be interpreted in conjunction with the methodological quality of the paper. [6,44] Ten of the 13 included papers in this study had greater than 60% of "yes" items for methodological quality using the Brink and Louw method, [6] whereas the COSMIN checklist graded 12 of the 13 papers as 'poor quality' due to small sample sizes (less than 30). [8,44] The small sample size of children within primary papers has previously been highlighted as a potential limitation. [1,20] Benfer et al. [41] argued that smaller sample sizes are common in paediatrics, yet these studies may have adequate power to support their small sample. [49] Similar systematic reviews have used a 'second-worst' method with a modified COSMIN to combat this issue; however, this technique has not been validated. [20,22,23,41,58] The Brink and Louw [6] critical appraisal tool highlighted specific methodological flaws of included papers, the most prevalent being the lack of reported stability of a child's condition across testing sessions. Ensuring the stability of a child's condition means any identified differences are due to measurement error [16] and not changes in their condition. [59][60][61][62] The time between testing sessions should be considered relative to the underlying diagnosis to ensure there is no expected change or fatigue. The stability results for a participant's neurological condition reported in five [45,49,50,52,54] of the six primary papers [45,49,50,52,54,57] (Item 8, Table 3) should be interpreted with caution, as these papers did not state whether the time frame between sessions was appropriate for their population group or whether the child or carer believed there to be no change in the child's status between testing sessions. Reliability cannot be inferred without measuring whether a child's condition is stable across testing sessions. [16] Reliability coefficients in primary papers could therefore be lower than reported due to the absence of stability measures.
Clinimetric evidence was only identified for muscle strength tests, and was limited to evidence on reliability. Reliability has also been the primary identified clinimetric property in a similar review of upper limb tests of muscle strength in children with cerebral palsy. [20] The paucity of additional primary papers since Mulder-Brouwer et al.'s [23] study also highlights the lack of growth in the body of literature since 2013. The inconsistent reporting of evidence on reliability in the identified neurological tests makes interpretation and research translation difficult. Reliability was quantified using ICC or weighted kappa for 14 of the 15 primary papers; however, the use of different measurement protocols made it difficult to draw conclusions and prevented a meta-analysis. Mahony et al. [49] calculated an ICC from ordinal data instead of the appropriate weighted kappa, confounding the interpretation of their reliability values.
Reliability is defined as a measure that is consistent and free from random or systematic error (Table 1). [16,63] Additional statistics, such as the 95% CI and standard error of measurement (SEM), aid in the interpretation of the test's reliability. [59,63] Wide CIs in the few primary papers reporting 95% CI indicated variation in this measurement property in children. [47,57] The SEM provides clinicians and researchers with information on the systematic and random error of a patient's score that is not attributed to true change. [54,63] Reliability for neurological tests reported in the seven papers [45,48,50,51,[55][56][57] that did not report SEM should therefore be interpreted with caution. Comparisons of SEM, where reported, could not be made between primary papers in this study due to different units of measurement, muscle groups tested and protocols used. A standardised measurement protocol would therefore provide the same units for SEM to aid in reporting random error [16] and assist in synthesising results from multiple primary studies.
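To illustrate why SEM values are tied to the units of the original measurement, the sketch below uses one commonly applied estimate, SEM = SD x sqrt(1 - ICC). This formula is a general convention assumed here for illustration; it is not stated in the included papers, which may have derived SEM differently, and the function name and example values are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Estimate the SEM from a sample SD and a reliability coefficient.

    Uses SEM = SD * sqrt(1 - ICC), an assumed convention. The result is
    expressed in the same units as the SD (for example newtons for a
    dynamometer reading), so SEMs in different units are not comparable.
    """
    if sd < 0:
        raise ValueError("standard deviation must be non-negative")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1 for this estimate")
    return sd * math.sqrt(1.0 - icc)
```

With a hypothetical between-child SD of 10 N and an ICC of 0.84, this estimate gives an SEM of about 4 N; the same ICC with an SD recorded in kilograms would give an SEM in kilograms, which is why a shared protocol and unit are needed before pooling.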
Results of this study indicate that the same clinician should perform each neurological test, as intra-rater reliability coefficients were consistently higher than inter-rater reliability coefficients (Fig 2). All clinicians who used the neurological tests in the included papers were reported to have six or more years of clinical experience. Without reporting the clinician's experience in using the neurological test, or comparing to clinicians with less than six years of clinical experience, the effect clinician experience has on the outcome of neurological testing on children and young people with a neurological condition is unknown. [16] A recent reliability study of manual muscle testing in children and young people with spina bifida suggested experienced clinicians should assist in training novice clinicians to improve measurement reliability. Tan et al.'s 2016 (in press) [64] study reported an overall weighted kappa of 0.95 (CI 0.94-0.96) for MMT using the Daniels and Worthingham's protocol, yet the methodological quality would have been graded as poor using the COSMIN checklist due to a small sample size. [8,64] Manual muscle testing is typically recommended in weaker muscles, with strength equal to or less than gravity strength, [54,64] yet this test becomes more variable when the clinician needs to apply increasing amounts of resistance (i.e. grade IV to V). [65] Clinicians should use the make method when performing handheld dynamometry and manual muscle testing, as the larger body of evidence and increased reliability (Fig 2) support this method compared to the break test. Evaluation of the ankle plantarflexors was an exception to this finding. [53] The ankle plantarflexors are known to be a strong muscle group that acts upon a short lever arm, making it challenging for clinicians to apply sufficient manual resistance for muscle testing. [66][67][68][69] This often limits muscle testing of the ankle plantarflexor group to the relative strength of the clinician performing the test. [70] However, clinician strength was not a reported variable in the primary papers. The moderate evidence for the standing heel rise test may suggest using this as an alternative test for measuring plantarflexion muscle strength in ambulatory children.
Inconsistent muscle strength testing methods between the primary papers confirm that a standardised test protocol for muscle strength testing does not exist. There is also wide variability in grading scales when using MMT, with four different scales reported in the three MMT papers [49,55,56] with clinimetric evidence and the motor subscale of the ASIA scale. [57] The conflicting evidence on reliability for handheld dynamometry found in papers of fair quality means additional high quality research, using a standardised measurement protocol, is required to make recommendations. Consensus between clinicians on a standard protocol is recommended prior to further clinimetric testing. Without clinimetric evidence, lower limb rehabilitation trials for children and young people are at risk of bias due to the use of neurological tests with unknown clinimetric properties. [71] Reliability is only one of many clinimetric properties, which include validity, responsiveness and clinical utility. [7,14] The Charcot-Marie-Tooth Pediatric Scale (CMTPedS) [45] has clinimetric evidence of both reliability and validity; however, the age range of participants differed between the reliability and validity studies. The evidence on reliability, which was included in this study, was for children and young people aged 5-15 years. However, evidence on the validity of the CMTPedS was for children and young people aged 3-20 years. [45] Evidence of validity for the CMTPedS could not be included in this study because the age range of participants exceeded 18 years of age, as per the exclusion criteria. Without clinimetric evidence presented for different age groups, it is unclear whether the validity evidence for the CMTPedS [45] is specific to the paediatric population.
Clinimetric evidence for the ASIA scale [57] was included in this study by extrapolating data for children and young people aged 4-15, while the 16-21 year old age group data were not included in this study.
Currently, there is no universally accepted upper age limit [25] for a paediatric population across paediatric systematic reviews. [20,22,30,39,58] A definition of paediatrics as children less than 18 years was used in this study to align with previous systematic reviews with a paediatric population [1,[20][21][22]39] and Medical Subject Headings definitions for a targeted search strategy. [27] The comprehensive search strategy used in this study [24,26,27] ensured the identification of lower limb neurological impairment tests that were specific to children and young people with a neurological condition. [42] Future studies may broaden the paediatric age range up to 22 years of age, as suggested by Clark et al. [25] Until future research supports this upper age limit, papers should report evidence for different paediatric age ranges to allow for greater research translation. [25,72] In contrast to previous reviews, [9,20,23,30,31,41] this study covers a broad paediatric age range and multiple neurological conditions. Recommendations for a clinimetrically-sound neurological test require a standardised test protocol with population-specific evidence, as clinimetric properties from other populations are not inherently transferable. [13,14,42] The majority of papers identified in this review had clinimetric evidence of neurological impairment tests used on children with cerebral palsy, which likely reflects cerebral palsy as the most prevalent paediatric neurological condition with motor and sensory impairment. [73] This study was limited to three components of a neurological examination at the 'body function and structures' level of the ICF-CY, as other neurological impairment tests such as spasticity are dependent on the diagnosis of the child. [4] For a comprehensive neurological examination, other components should be included, such as measures from the 'activity' and 'participation' levels of the ICF-CY. [4,74] Selection of these neurological tests will be dependent on the diagnosis of the child. Limited evidence for clinimetrically-sound measures of 'activity' for children and young people with a neurological condition has been found, [22,58] demonstrating a similar shortage of high-quality studies in these constructs.
Synthesising best evidence, through combining a consistent body of clinimetric evidence with robust methodological qualities, can guide clinicians and researchers to select appropriate paediatric-specific lower limb neurological tests. [7,31] Guidance on best evidence of clinimetrically-sound measures cannot be made with reliability evidence alone. Without evidence of reliability, validity, responsiveness and clinical utility, recommendations to clinicians for neurological tests can only be made with caution until further clinimetric evaluation can be used to support best practice. [7,75]

Conclusion
There is a lack of robust clinimetric evidence on neurological impairment tests to use on children and young people with a lower limb neurological condition. Clinimetric evidence was only found on the reliability of neurological impairment tests evaluating muscle strength. Standardised testing protocols, such as the make method with manual or belt stabilisation in a stable, population-specific group, are recommended as a starting point for further clinimetric studies. In the absence of clinimetrically-sound neurological tests, clinicians should use the best available evidence. Without clinimetrically-sound neurological tests it is difficult for clinicians and researchers to select and perform a test in clinical practice, which becomes increasingly complex when requiring a combination of these tests for a thorough neurological examination. High quality, population-specific studies are required to provide a strong body of clinimetric evidence for clinicians and researchers to make future recommendations for use of a neurological examination in clinical practice and research.
Supporting information S1