
Prechtl’s method to assess general movements: Inter-rater reliability during the preterm period

  • Angélica Valencia

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – review & editing

    anmvdiaz@gmail.com

    Affiliations Faculty of Psychology, Universidad Cooperativa, Cali, Colombia, Cognitions Humaine et Artificielle -EPHE-PSL, CHArt Laboratory, Aubervilliers, France

  • Carlos Viñals

    Contributed equally to this work with: Carlos Viñals, Elsa Alvarado, Marcela Balderas

    Roles Data curation, Investigation, Methodology

    Affiliation Cerebral Palsy Department, Instituto Nacional de Rehabilitación: Luis Guillermo Ibarra Ibarra, México City, México

  • Elsa Alvarado

    Contributed equally to this work with: Carlos Viñals, Elsa Alvarado, Marcela Balderas

    Roles Data curation, Investigation, Methodology

    Affiliation Cerebral Palsy Department, Instituto Nacional de Rehabilitación: Luis Guillermo Ibarra Ibarra, México City, México

  • Marcela Balderas

    Contributed equally to this work with: Carlos Viñals, Elsa Alvarado, Marcela Balderas

    Roles Data curation, Investigation, Methodology

    Affiliation Cerebral Palsy Department, Instituto Nacional de Rehabilitación: Luis Guillermo Ibarra Ibarra, México City, México

  • Joëlle Provasi

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – review & editing

    Affiliation Cognitions Humaine et Artificielle -EPHE-PSL, CHArt Laboratory, Aubervilliers, France

Abstract

Introduction

Prechtl’s method (GMA) is a test for the functional assessment of the young nervous system. It involves a global and a detailed assessment of the general movements (GMs) and has demonstrated validity. Data on the reliability of both assessments in the preterm period are scarce. This study aimed to evaluate the inter-rater reliability for the global and detailed assessments of the preterm writhing GMA.

Materials and methods

The study participants were 69 infants born at <37 gestational weeks and admitted to the neonatal intensive care unit. They were randomly assigned to five pairs of raters, who assessed the infants’ GMs from videos recorded during the preterm period. Outcome variables were (a) the GMs classification (normal versus abnormal; normal versus abnormal subcategories), obtained through the global assessment, and (b) the general movements optimality score (GMOS), obtained through the detailed assessment. Gwet’s AC1 and the intraclass correlation coefficient (ICC) were calculated for the GMs classification and the GMOS, respectively.

Results

The global assessment presented an AC1 = 0.84 [95% CI = 0.54,1] for the GMs binary classification and an AC1 = 0.67 [95% CI = 0.38,0.89] for the GMs classification with abnormal subcategories. The detailed assessment presented an ICC = 0.72 [95% CI = 0.39,0.90] for the GMOS.

Conclusions

In the global assessment, inter-rater reliability was high for the binary GMs classification and substantial for the classification with abnormal subcategories; it was good for the detailed assessment. However, the small sample size limited the precision of these estimates. Future research should involve larger samples of preterm infants to improve estimate precision. Challenging items such as neck and trunk involvement, poor repertoire GMs, and tremulous movements may affect the inter-rater reliability of the preterm writhing GMA. Therefore, ongoing training and calibration among raters are necessary. Further investigation in clinical settings can enhance our understanding of the preterm writhing GMA’s reliability.

Introduction

Preterm infants are at risk of neurodevelopmental disorders [1]. International guidelines therefore recommend appropriate neurodevelopmental assessments to identify infants who could benefit from early intervention [2]. Prechtl’s method (GMA) is a video-based test for the functional assessment of the young nervous system. The GMA has high clinical utility for premature infants because it is an observational assessment of the quality of general movements (GMs). GMs are generated endogenously and involve the entire body in a sequence of spontaneous movements [3]. Variability, complexity, and fluidity determine the quality of GMs and reflect the integrity of the young nervous system. These characteristics can be negatively affected by brain abnormalities [4, 5], which makes GMs a neurodevelopmental marker [6]. The GMA evaluates GMs in three developmental periods. The preterm writhing GMA (before term age) and the writhing GMA (from term age) identify various neurodevelopmental disorders, including functional, motor, and cognitive impairments [7, 8]. The fidgety GMA (from 9 weeks post-term) mainly detects cerebral palsy and cognitive disorders [9, 10].

The three GMA periods include both (a) a global and (b) a detailed assessment of the GMs [11]. During the preterm writhing GMA and writhing GMA, the global assessment involves analyzing the quality of GMs and classifying them as either normal or abnormal. The detailed assessment consists of scoring the GMs’ characteristics in the limbs to obtain the GMs optimality score (GMOS). During the fidgety GMA, the global assessment classifies GMs as present, abnormal, or absent, while the detailed assessment scores the motricity and posture to obtain the motor optimality score (MOS).

Clinicians, therapists, and neuroscientists worldwide have been trained by the GMs Trust group to use both the global and detailed assessments of the GMA. As a result, studies in clinical and research contexts have evaluated the GMA’s validity and reliability in preterm infants. Validity and reliability are psychometric properties that express the appropriateness of a test: validity indicates the accuracy of its scores, and reliability their consistency. Inter-rater reliability evaluates scoring consistency when different raters assess the same patients with the same test, whereas agreement is the degree to which the scores are identical [12]. Studies found that the preterm writhing GMA has high validity (94%) [13] for predicting neurodevelopment in preterm infants and high inter-rater agreement (90%) [14]. Specifically, the inter-rater reliability of the global assessment was high (k > 0.80) for the preterm writhing GMA and writhing GMA [15], moderate (k = 0.50) to substantial (k = 0.80) for the preterm writhing GMA and writhing GMA in combined analyses [16, 17], and substantial (k = 0.64) to high (k = 0.92) for the fidgety GMA [15, 18, 19]. A recent study evaluated the reliability of a checklist for guiding raters during the global assessment and found inter-rater reliability values (k = 0.68–0.80) for the preterm writhing GMA and writhing GMA similar to those of previous studies [16]. Although published data on the reliability of the detailed assessment are limited, studies have reported excellent inter-rater reliability (ICC > 0.87) for both the fidgety MOS and the fidgety MOS-revised [19, 20]. However, to our knowledge, data on inter-rater reliability for the detailed assessment in the preterm writhing period are scarce.

The inter-rater reliability of the preterm writhing GMA needs to be established because most studies either focus on the fidgety GMA, combine preterm writhing GMA and writhing GMA analyses, or do not evaluate the detailed assessment. Furthermore, data on the inter-rater reliability of both the global and detailed assessments of the preterm writhing GMA within the same sample of preterm infants are scarce. This gap highlights the need for studies that complement the existing evidence on the inter-rater reliability of the preterm writhing GMA.

Consequently, this study aimed to evaluate the inter-rater reliability of the preterm writhing GMA for (a) the global and (b) the detailed assessment. The study addressed the global assessment because of its utility in preterm infants, and the detailed assessment because studies suggest using it to complement the global assessment [21]. Given the well-documented validity of the preterm writhing GMA in predicting later development in high-risk preterm infants [13], understanding its reliability is crucial for clinical practice. Establishing this reliability informs clinicians about the method’s ability to consistently identify at-risk infants, thus facilitating timely interventions during this critical period of neuroplasticity [22]. We hypothesized that the inter-rater reliability of the preterm writhing GMA would be high (AC1 > 0.80) for the global assessment and excellent (ICC > 0.75) for the detailed assessment.

Materials and methods

Design and setting

This psychometric study on the inter-rater reliability of the preterm writhing GMA is part of a broader prospective longitudinal study on the neurodevelopmental assessment of preterm infants. The infants were recruited between January and November 2017 from the Necker Enfants-Malades and Armand Trousseau University hospitals in France. This research followed the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) [12].

Ethical approval

This study obtained the approval of the Ethical Committee for the Protection of Persons of Île-de-France V (CPP Ref: d-10-16) and followed the principles set out in the Declaration of Helsinki. The infants’ parents were sent an informative letter outlining the study’s objectives and methods. In reply, they gave written informed consent to their children’s inclusion in the study and to the video evaluation of their infants’ GMs.

Participants and raters

The sample size complied with Bonett’s parameters, which recommend 28 participants for reliability studies with five raters and an expected intraclass correlation coefficient of 0.80 [23]. A convenience sample was recruited based on the following inclusion criteria: birth at <37 gestational weeks and admission to the neonatal intensive care unit (NICU). Infants with congenital anomalies or with a severe illness at the time of the GMs assessment were excluded. A computerized randomization method was used to select pairs of raters from a list of five GMA-certified raters. As shown in Table 1, each participant (n = 69) was randomly assigned to one of the five pairs of raters.
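As a rough check on the sample size figure, Bonett’s (2002) approximation for the number of subjects n needed to estimate an ICC of planned value ρ with k raters, so that the 95% CI has total width w, is n = 8z²(1 − ρ)²(1 + (k − 1)ρ)² / [k(k − 1)w²] + 1. The sketch below assumes a desired CI width of w = 0.20, which the text does not state; with ρ = 0.80 and k = 5, that assumption gives about 28 subjects. The function name is illustrative only.

```python
from statistics import NormalDist


def bonett_sample_size(rho: float, k: int, width: float, alpha: float = 0.05) -> float:
    """Approximate number of subjects for estimating an ICC (Bonett, 2002).

    rho   -- planned ICC value
    k     -- number of raters per subject
    width -- desired total width of the (1 - alpha) confidence interval
    Returns the unrounded value.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value (about 1.96)
    numerator = 8 * z**2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2
    return numerator / (k * (k - 1) * width**2) + 1


# Assumed CI width of 0.20 (not stated in the text)
print(round(bonett_sample_size(rho=0.80, k=5, width=0.20), 1))  # 28.1
```

Narrowing the desired width or lowering the planned ICC increases the required number of subjects.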

Table 1. Assignment of subjects to be evaluated by each pair of raters.

https://doi.org/10.1371/journal.pone.0301934.t001
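The text does not describe the randomization procedure in detail. The sketch below shows one possible reading, under stated assumptions: five distinct pairs are drawn at random from the ten possible pairs of five raters, and each infant is then independently assigned to one pair. Rater labels, the seed, and function names are hypothetical. Because this simple randomization does not balance group sizes, some pairs can end up with few infants, as reported for pair 4.

```python
import random
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional, Tuple

RATERS = ["R1", "R2", "R3", "R4", "R5"]  # hypothetical labels for the five certified raters


def assign_infants(
    n_infants: int, n_pairs: int = 5, seed: Optional[int] = None
) -> Tuple[List[Tuple[str, str]], Dict[int, int]]:
    """Hypothetical scheme: draw distinct rater pairs, then randomize each infant to a pair."""
    rng = random.Random(seed)
    possible_pairs = list(combinations(RATERS, 2))  # 10 possible pairs of 5 raters
    pairs = rng.sample(possible_pairs, n_pairs)  # draw n_pairs distinct pairs at random
    # Simple (unrestricted) randomization to pair numbers 1..n_pairs; group sizes may differ.
    assignment = {infant: rng.randint(1, n_pairs) for infant in range(1, n_infants + 1)}
    return pairs, assignment


pairs, assignment = assign_infants(69, seed=42)
sizes = Counter(assignment.values())
for number, (a, b) in enumerate(pairs, start=1):
    print(f"pair {number}: {a} + {b}, infants = {sizes[number]}")
```

A blocked or stratified scheme would equalize group sizes; whether the study used one is not stated.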

Data collection

The data consisted of videos of the participants’ GMs, filmed at <37 corrected weeks before hospital discharge. Following the GMA parameters, a remotely controlled video camera was positioned above the infants to capture the entire body. The infants were filmed lightly dressed in the supine position. Filming began when the infants exhibited spontaneous movements after routine nursing care. Infants of <36 corrected weeks were filmed awake or asleep; infants of ≥36 corrected weeks were filmed in state 4 (eyes open, no crying, movement present) [24]. During filming, interaction with caregivers and movement-limiting objects were avoided. Rater A recorded and edited the videos. She also collected the participants’ clinical information from their electronic medical records after completing the GMs assessment.

General movements assessment

The raters received the videos via a secure online platform and had fifteen days to complete the GMs assessment. Based on these videos, the raters conducted a global and a detailed GMs assessment according to the preterm writhing GMA criteria [11]. In the global assessment, the raters classified GMs as normal if the moving extremities showed variable speed and spatial occupation, along with complex and fluent joint rotations. The raters classified GMs as abnormal if they showed decreased variability, complexity, and fluency. Abnormal GMs comprised four subcategories: (a) poor repertoire, if the limbs showed monotonous speed and spatial occupation; (b) cramped synchronized, if the limbs showed high rigidity; (c) chaotic, if the movements were disorganized; and (d) hypokinetic, if no GMs were observed. Next, the raters performed the detailed assessment, using the scoring sheet to score (from 0 to 2) items covering the sequence of the GMs, neck and trunk involvement, the upper extremities, and the lower extremities separately. The item scores were summed to obtain the GMOS (from 5 to 42). The GMOS was not calculated when GMs were hypokinetic.
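The classification and scoring rules above can be encoded for analysis roughly as follows. This is a hypothetical sketch, not the official GMOS scoring sheet: item names are placeholders, and only the 0 to 2 item range, the summing of item scores, the collapse to normal versus abnormal, and the omission of the GMOS for hypokinetic GMs are taken from the text.

```python
from enum import Enum
from typing import Mapping, Optional


class GMClass(Enum):
    """Global-assessment categories for preterm writhing GMs."""
    NORMAL = "normal"
    POOR_REPERTOIRE = "poor repertoire"
    CRAMPED_SYNCHRONIZED = "cramped synchronized"
    CHAOTIC = "chaotic"
    HYPOKINETIC = "hypokinetic"


def binary_class(gm: GMClass) -> str:
    """Collapse the five categories to the binary normal vs abnormal classification."""
    return "normal" if gm is GMClass.NORMAL else "abnormal"


def gmos_total(gm: GMClass, item_scores: Mapping[str, int]) -> Optional[int]:
    """Sum detailed-assessment item scores; no GMOS is computed for hypokinetic GMs."""
    if gm is GMClass.HYPOKINETIC:
        return None
    for item, score in item_scores.items():
        if score not in (0, 1, 2):
            raise ValueError(f"{item!r} must be scored 0, 1 or 2, got {score}")
    return sum(item_scores.values())


# Placeholder item names; the real sheet has more items (its total ranges from 5 to 42).
scores = {"sequence": 2, "neck_trunk": 1, "upper_speed": 2, "lower_speed": 1}
print(binary_class(GMClass.POOR_REPERTOIRE), gmos_total(GMClass.POOR_REPERTOIRE, scores))
# abnormal 6
```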

The raters were in different countries and did not communicate with each other. They were blinded to the participants’ assignment, clinical data, and identity (except filming age in corrected weeks) and conducted the evaluations independently. After 45 minutes of viewing the videos, the raters took a 5-minute break to recalibrate their perception. Beforehand, they had participated in two online training sessions, which included studying a preterm writhing GMA teaching video and reaching agreement on two cases.

Outcome variables

The outcome variables were (a) the GMs classification (normal versus abnormal; normal versus abnormal subcategories) obtained through the global assessment and (b) the GMOS (from 5 to 42) obtained through the detailed assessment.

Statistical analysis

Descriptive statistics were chosen according to each variable’s level of measurement: mean and standard deviation (±SD) for continuous variables, median and interquartile range (IQR) for ordinal variables, and frequency and percentage for nominal variables. The analysis considered two types of GMs classification: first, the binary classification of GMs as normal versus abnormal; second, the classification of GMs as normal versus the abnormal subcategories (poor repertoire, cramped synchronized, chaotic, or hypokinetic). The Kolmogorov-Smirnov test did not indicate a departure from normality for the GMOS.

To evaluate the inter-rater reliability of the global assessment, the study calculated the percentage of agreement and Gwet’s AC1 coefficient for the GMs classification. The AC1 was used because it corresponds more closely to the percentage of agreement than Cohen’s kappa (k) and is less affected by the prevalence of each category [25]. Additionally, the AC1 has already been used to assess the inter-rater reliability of the writhing GMA in postoperative infants [26]. The AC1 was interpreted as indicating fair (≤ 0.40), moderate (0.41 to 0.60), substantial (0.61 to 0.80), and high (0.81 to 1.00) inter-rater reliability [27].
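To make the prevalence argument concrete, the following sketch computes the percentage of agreement, Cohen’s kappa, and Gwet’s AC1 for two raters using the standard two-rater formulas. For AC1, chance agreement is p_e = Σ π_q(1 − π_q)/(Q − 1), where π_q is the raters’ average marginal proportion for category q and Q is the number of categories. The ratings are invented for illustration, not study data, and the functions are our own rather than the AgreeStat implementation. When most infants fall into one category, kappa can be low despite 90% agreement, whereas AC1 stays close to the percentage of agreement.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Proportion of subjects on which both raters chose the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[str], b: Sequence[str], categories: Sequence[str]) -> float:
    """Cohen's kappa: chance agreement from the product of each rater's marginals."""
    n, pa = len(a), percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    pe = sum((ca[c] / n) * (cb[c] / n) for c in categories)
    return (pa - pe) / (1 - pe)


def gwet_ac1(a: Sequence[str], b: Sequence[str], categories: Sequence[str]) -> float:
    """Gwet's AC1 for two raters and a fixed list of Q (>= 2) nominal categories."""
    n, pa = len(a), percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    pi = [(ca[c] + cb[c]) / (2 * n) for c in categories]  # average marginal proportions
    pe = sum(p * (1 - p) for p in pi) / (len(categories) - 1)
    return (pa - pe) / (1 - pe)


def agreement_label(value: float) -> str:
    """Labels used in this study for AC1 values [27]."""
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "high"


# Invented ratings for 20 infants, most of them rated abnormal by both raters
cats = ["normal", "abnormal"]
rater_a = ["abnormal"] * 17 + ["normal", "abnormal", "normal"]
rater_b = ["abnormal"] * 17 + ["normal", "normal", "abnormal"]
pa = percent_agreement(rater_a, rater_b)
kappa = cohen_kappa(rater_a, rater_b, cats)
ac1 = gwet_ac1(rater_a, rater_b, cats)
print(f"{pa:.2f} {kappa:.2f} {ac1:.2f}", agreement_label(kappa), agreement_label(ac1))
# 0.90 0.44 0.88 moderate high
```

For the five-category classification, the same functions apply with all five category labels passed as `categories`.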

To evaluate the inter-rater reliability of the detailed assessment, the intraclass correlation coefficient (ICC) was calculated for the GMOS using a one-way random-effects, absolute-agreement, single-measure model. This model suits designs in which subjects are evaluated by different pairs of randomly selected raters [28]. The ICC was interpreted as indicating poor (≤ 0.40), fair (0.41 to 0.59), good (0.60 to 0.74), and excellent (0.75 to 1) inter-rater reliability [29]. The standard error of measurement (SEM) was calculated from the SD of the differences between the two raters of each pair [30].
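The one-way ICC for single measures and the difference-based SEM can be sketched in a few lines of Python. We assume the usual one-way ANOVA formula ICC(1,1) = (MS_B − MS_W)/(MS_B + (k − 1)MS_W) and take the SEM formula in [30] to be the SD of the between-rater differences divided by √2; both that reading of [30] and the toy GMOS values are our assumptions, and this is not the SPSS procedure used in the study.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def icc_oneway_single(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(1,1): one-way random effects, absolute agreement, single measure.

    Each row holds one subject's ratings; the raters may differ between subjects.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects with the same number (>= 2) of ratings")
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def sem_from_differences(ratings: Sequence[Sequence[float]]) -> float:
    """Assumed SEM formula: SD of the two raters' differences divided by sqrt(2)."""
    diffs = [row[1] - row[0] for row in ratings]
    return stdev(diffs) / math.sqrt(2)


def icc_label(value: float) -> str:
    """Labels used in this study for ICC values [29]."""
    if value <= 0.40:
        return "poor"
    if value < 0.60:
        return "fair"
    if value < 0.75:
        return "good"
    return "excellent"


# Toy GMOS values for five infants, each scored by two raters (not study data)
gmos = [[10, 12], [20, 19], [30, 33], [25, 25], [15, 14]]
icc = icc_oneway_single(gmos)
print(f"ICC = {icc:.3f} ({icc_label(icc)}), SEM = {sem_from_differences(gmos):.2f}")
# ICC = 0.978 (excellent), SEM = 1.28
```

The one-way model treats rater identity as part of the error, which is why it fits a design where each infant is scored by a different, randomly drawn pair.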

Reliability coefficients were calculated for each pair of raters and then averaged to obtain a single inter-rater reliability index. This study considered a two-tailed p-value of < 0.05 as significant and calculated 95% confidence intervals (CI). The data analysis was performed using SPSS Statistics version 25.0 and AgreeStat 360 [31].
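The pooling step can be sketched as an unweighted mean across pairs, together with the SD and range that the Results report alongside each pooled coefficient. Whether the study weighted pairs by their number of infants is not stated, so the unweighted mean is an assumption, and the per-pair values below are invented.

```python
from statistics import mean, stdev
from typing import Dict, Mapping


def pool_pair_coefficients(by_pair: Mapping[str, float]) -> Dict[str, float]:
    """Unweighted mean, SD and range of per-pair reliability coefficients."""
    values = list(by_pair.values())
    return {
        "mean": mean(values),
        "sd": stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


# Invented per-pair values, for illustration only
summary = pool_pair_coefficients({"pair 1": 0.70, "pair 2": 0.90, "pair 3": 0.80, "pair 5": 1.00})
print(f"{summary['mean']:.2f} ± {summary['sd']:.2f} "
      f"(from {summary['min']:.2f} to {summary['max']:.2f})")
# 0.85 ± 0.13 (from 0.70 to 1.00)
```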

Results

Participants and raters

Eighty-two infants were recruited to ensure sufficient data in the contingency table. Thirteen were excluded because of withdrawal (n = 1), congenital disease (n = 1), full-term birth (n = 3), or unavailability for filming before 37 corrected weeks (n = 8). In total, 69 infants were included, 35 (51%) of whom were male. Table 2 presents the participants’ clinical characteristics. The participants were filmed at 35±1 corrected weeks for an average of 3±1 minutes, yielding 5±1 sequences of GMs.

Table 2. Clinical characteristics of participants (n = 69).

https://doi.org/10.1371/journal.pone.0301934.t002

The raters (n = 5) were three physicians and two psychologists with an average of 18±2 years of experience in child neurodevelopment. Three raters are clinical rehabilitation specialists and the other two work in research. All raters are certified by the GMs Trust group and have 6±5 years of experience with the global and detailed assessments of the preterm writhing GMA. The results include rater pairs 1, 2, 3, and 5. Pair 4 was excluded because too few participants (n = 4) had been assigned to it (see Table 1). Thus, four pairs of raters completed the global and detailed assessments of the GMs in 65 preterm infants during the preterm writhing period.

Inter-rater reliability for the global assessment

Table 3 shows the distribution of subjects across GMs categories. Overall agreement across GMs categories was 84% (from 69% to 100%). Poor repertoire GMs was the category with the highest disagreement among raters (from 69% to 75%). No GMs category reached 100% agreement across all pairs of raters.

Table 3. Distribution of subjects per pair of raters and category of the GMs evaluated during the preterm period (n = 65).

https://doi.org/10.1371/journal.pone.0301934.t003

Table 4 presents the inter-rater reliability for GMs classification. The GMs binary classification (normal versus abnormal) obtained an inter-rater agreement percentage of 88% (from 80% to 100%) with a coefficient AC1 = 0.84±0.5 (from 0.68 to 1). The GMs classification with abnormal subcategories (poor repertoire, cramped synchronized, chaotic, or hypokinetic) demonstrated an inter-rater agreement of 72% (from 69% to 77%) and an AC1 = 0.67±0 (from 0.63 to 0.73).

Table 4. Inter-rater reliability per pair of raters for the GMs classification during the preterm period (n = 65).

https://doi.org/10.1371/journal.pone.0301934.t004

Inter-rater reliability for the detailed assessment

Table 5 presents the inter-rater reliability for the GMOS. The GMOS obtained an ICC = 0.72±8 (from 0.66 to 0.79). The lowest inter-rater reliability was related to the item assessing the neck and trunk involvement with an ICC = 0.44±1 (from 0.19 to 0.66). No GMOS item obtained a perfect inter-rater agreement among all pairs of raters.

Table 5. Inter-rater reliability for the GMOS by pair of raters during the preterm period (n = 64).

https://doi.org/10.1371/journal.pone.0301934.t005

Discussion

This study aimed to assess the inter-rater reliability of the preterm writhing GMA, providing evidence on both (a) the global and (b) the detailed assessment in the same sample of preterm infants. We therefore first discuss the inter-rater reliability estimates for the GMs classification and the GMOS and then address the precision of these estimates.

Inter-rater reliability for the global assessment

As hypothesized, the inter-rater reliability of the global assessment was high (AC1 = 0.84 [95% CI = 0.54,1]) for the GMs binary classification (normal versus abnormal). However, it was substantial (AC1 = 0.67 [95% CI = 0.38,0.89]) for the GMs classification with abnormal subcategories (poor repertoire, cramped synchronized, chaotic, or hypokinetic). Our findings align with prior research on the preterm writhing, writhing, and fidgety GMA, which found higher inter-rater reliability for the GMs binary classification (k > 0.80) than for the GMs classification with abnormal subcategories (k = 0.50) [15, 17, 19]. We therefore discuss the factors that may underlie this disparity in the inter-rater reliability of the global assessment.

The lower inter-rater reliability for the GMs classification with abnormal subcategories can be attributed to the nature of reliability coefficients, which tend to decrease when there are more than two classification categories [35]. Additionally, items that are difficult to interpret can reduce the inter-rater reliability for the GMs classification with abnormal subcategories [30]. The poor repertoire GMs category may be one such item, because it showed the highest bias (0.23) and the highest disagreement (76%) among raters. Although using the checklist to guide the GMs assessment might have reduced disagreement, our inter-rater reliability estimates agree with those obtained with the checklist (k = 0.68–0.80) [16]. These observations align with previous studies suggesting that poor repertoire GMs lack precision in identifying neurodevelopmental disturbances in preterm infants [36]. Studies have shown that infants with poor repertoire GMs can transition to normal GMs by term age [37]. This imprecision of poor repertoire GMs could also affect the inter-rater reliability for the GMs classification with abnormal subcategories.

Therefore, clinical studies recommend combining the binary GMs classification with other neurological measures and neuroimaging to enhance the identification of preterm infants at neurodevelopmental risk [38]. Also, researchers have suggested using the global assessment of the preterm writhing GMA in the framework of longitudinal neurodevelopmental follow-up monitoring [21].

Inter-rater reliability for the detailed assessment

Contrary to our hypothesis, the detailed assessment demonstrated good inter-rater reliability (ICC = 0.72 [95% CI = 0.35,0.89]) for the GMOS. This observation differs from previous studies that reported higher inter-rater reliability for the fidgety MOS and the fidgety MOS-revised [19, 20]. Therefore, we will now consider the factors that may have influenced the inter-rater reliability value for the GMOS in this study.

Earlier studies have shown that differences in raters’ expertise can affect the inter-rater reliability of the fidgety GMA [18]. Although the raters in this study had comparable expertise in the preterm writhing GMA, they came from different professional backgrounds (clinical and research). The pair of raters with a research background (pair 2) showed higher inter-rater reliability (ICC = 0.79 [95% CI = 0.56,0.90]) for the GMOS than raters from clinical fields. This observation aligns with a study that found lower inter-rater reliability for the global assessment of the writhing GMA and fidgety GMA in clinical settings than in research settings [17]. Differences in professional background among raters in this study may therefore have influenced inter-rater reliability for the GMOS.

Additionally, the item assessing neck and trunk involvement may be challenging, because it showed the highest disagreement (ICC = 0.44 [95% CI = 0.12,0.78]) among raters. The response options for this item make it hard to discriminate between little and no involvement. The item evaluating the presence of tremulous movements may also be challenging, because it showed the second-highest disagreement among raters. Two previous studies support these observations. One study observed tremulous movements in both normal and abnormal GMs during the preterm writhing and writhing periods [21]. The other study showed that tremulous movements during the writhing period are clinically imprecise in identifying neurodevelopmental disturbances in preterm infants [39]. The limited precision of these challenging items may have reduced the inter-rater reliability for the GMOS.

Given the clinical utility of the detailed assessment, previous studies recommend using it in combination with the global assessment to gain a deeper understanding of specific parameters and trajectories of the GMs in preterm infants [21].

The precision of estimates

We also observed wide 95% confidence intervals, which suggest potential limitations in the precision of the inter-rater reliability estimates. Confidence intervals for any inter-rater reliability estimate depend on two factors: sample size and sample variability related to the assessed parameter (in this case, the GMs) [40]. Therefore, we will address how these two factors could have influenced the precision of inter-rater reliability estimates for both the global and the detailed assessments.

Firstly, a small sample size can lead to increased error and increased uncertainty in inter-rater reliability estimates [40]. While our sample size is suitable for reliability studies involving categorical and numerical variables [23, 35], the relatively small number of subjects (n = 69) might affect the precision of inter-rater reliability estimates for both the global and detailed assessments. Two previous studies with smaller sample sizes reported similar findings. One study (n = 39) on the global assessment of the preterm writhing GMA reported high inter-rater reliability (k > 0.80 [95% CI = 0.40,1]) for the GMs classification but noted a wide 95% confidence interval [15]. Another study (n = 24), focusing on the detailed assessment, reported high inter-rater reliability (ICC = 0.87) for the fidgety MOS but noted a slightly higher measurement error than expected.
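The dependence of precision on sample size can be illustrated by solving Bonett’s approximation, n = 8z²(1 − ρ)²(1 + (k − 1)ρ)² / [k(k − 1)w²] + 1, for the expected CI width w. The sketch below is a planning approximation only, not how the reported intervals were computed; it assumes k = 2 raters per infant, ρ = 0.72 (the pooled GMOS ICC), and roughly 16 infants per pair (65 infants over four pairs) versus all 65 infants. Because w shrinks with the square root of n − 1, quadrupling the sample roughly halves the expected width.

```python
import math
from statistics import NormalDist


def expected_icc_ci_width(rho: float, k: int, n: int, alpha: float = 0.05) -> float:
    """Approximate total CI width for an ICC, from Bonett's (2002) sample-size formula."""
    if n < 2:
        raise ValueError("need at least two subjects")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value (about 1.96)
    numerator = 8 * z**2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2
    return math.sqrt(numerator / (k * (k - 1) * (n - 1)))


# Assumed: two raters per infant, ICC of 0.72, ~16 infants per pair vs 65 infants overall
for n in (16, 65):
    print(n, round(expected_icc_ci_width(rho=0.72, k=2, n=n), 2))
# 16 0.49
# 65 0.24
```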

Secondly, increased variability within the sample reduces the precision of inter-rater reliability estimates [41]. The variation in the proportion of abnormal GMs (from 53% to 80%) and the spread of the GMOS (25.3±8) may suggest variability within our sample. Thus, the relative heterogeneity of GMs within the sample could have influenced the precision of the inter-rater reliability estimates for both the global and the detailed assessments. These observations contrast with the findings of a recent study involving a more heterogeneous sample of preterm and term infants with diverse clinical characteristics [20], in which the fidgety MOS-revised showed higher inter-rater reliability (ICC = 0.98 [95% CI = 0.97,0.99]). Notably, however, that study included a considerably larger sample (n = 252).

Limitations and future research

Our sample size met Bonett’s parameters for inter-rater reliability studies [23], but the relatively small number of subjects was a limitation of this study. Future research should therefore consider larger samples of preterm infants to increase the precision of inter-rater reliability estimates for the preterm writhing GMA. While the random assignment of participants to randomly formed pairs of raters may minimize bias [28], the convenience recruitment of participants from the NICU was another limitation. Recruiting participants in the NICU might explain the high rate of periventricular leukomalacia (PVL) in our sample. Future studies on high-risk preterm infants could therefore recruit participants upon admission to the NICU to improve the sample’s representativeness and the generalizability of inter-rater reliability estimates for the preterm writhing GMA.

Conclusions

Reliability in identifying preterm infants at neurodevelopmental risk is a critical concern in assessment. This study provides insights into the inter-rater reliability of the preterm writhing GMA for evaluating the functional integrity of the young nervous system. In the global assessment, we observed high inter-rater reliability for the binary GMs classification, the most reliable classification, and substantial inter-rater reliability for the classification with abnormal subcategories. The detailed assessment showed good inter-rater reliability for the GMOS. However, our small sample size limited the precision of these estimates. Several challenging items, such as neck and trunk involvement, poor repertoire GMs, and tremulous movements, may have contributed to inconsistency among raters. Ongoing training and rater calibration are therefore necessary to enhance the inter-rater reliability of the preterm writhing GMA. The preterm writhing GMA seems to show better inter-rater reliability in research settings than in clinical settings. Given its utility, further investigation in clinical settings is needed to better understand its inter-rater reliability in identifying preterm infants at high risk of neurodevelopmental issues.

References

  1. 1. Blencowe H, Vos T, Lee AC, Philips R, Lozano R, Alvarado MR, et al. Estimates of neonatal morbidities and disabilities at regional and global levels for 2010: introduction, methods overview, and relevant findings from the Global Burden of Disease study. Pediatr Res. 2013;74: 4–16. pmid:24366460
  2. 2. Wang CJ, Mcglynn EA, Brook RH, Leonard CH, Piecuch RE. Quality-of-Care Indicators for the Neurodevelopmental Follow-up of Very Low Birth Weight Children: Results of an Expert Panel Process. Pediatrics. 2006;Volume 117: 2080–2092. pmid:16740851
  3. 3. Einspieler C, Prechtl HFR. Prechtl’s assessment of general movements: A diagnostic tool for the functional assessment of the young nervous system. Ment Retard Dev Disabil Res Rev. 2005;11: 61–67. pmid:15856440
  4. 4. Cioni G, Bos A, Einspieler C, Ferrari F, Martijn A, Paolicelli PB, et al. Early neurological signs in preterm infants with unilateral intraparenchymal echodensity. Neuropediatrics. 2000;31: 240–51. pmid:11204281
  5. 5. Spittle AJ, Brown NC, Doyle LW, Boyd RN, Hunt RW, Bear M, et al. Quality of general movements is related to white matter pathology in very preterm infants. Pediatrics. 2008;121: e1184–9. pmid:18390959
  6. 6. Peyton C, Einspieler C. General Movements: A Behavioral Biomarker of Later Motor and Cognitive Dysfunction in NICU Graduates. Pediatr Ann. 2018;47: e159–e164. pmid:29668025
  7. 7. Olsen JE, Cheong JLY, Eeles AL, FitzGerald TL, Cameron KL, Albesher RA, et al. Early general movements are associated with developmental outcomes at 4.5–5 years. Early Hum Dev. 2020;148: 105115. pmid:32615517
  8. 8. Ferrari F, Cioni G, Einspieler C, Roversi MF, Bos AF, Paolicelli PB, et al. Cramped synchronized general movements in preterm infants as an early marker for cerebral palsy. Arch Pediatr Adolesc Med. 2002;156: 460–467. pmid:11980551
  9. 9. Novak I, Morgan C, Adde L, Blackman J, Boyd RN, Brunstrom-Hernandez J, et al. Early, accurate diagnosis and early intervention in cerebral palsy: Advances in diagnosis and treatment. JAMA Pediatr. 2017;171: 897–907. pmid:28715518
  10. 10. Einspieler C, Bos A, Libertus M, Marschik P. The General Movement Assessment Helps Us to Identify Preterm Infants at Risk for Cognitive Dysfunction. Front Psychol. 2016;7: 406. pmid:27047429
  11. 11. Einspieler C, Prechtl HFR, Bos A, Ferrari F, Cioni G. Prechtl’s method on the qualitative assessment of general movements in preterm, term and young infants. Clin Dev Med. 2004;167: 1–91.
  12. 12. Kottner J, Audig L, Brorson S, Donner A, Gajewski BJ. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. 2011;64: 96–106. pmid:21130355
  13. 13. Craciunoiu O, Holsti L. A Systematic Review of the Predictive Validity of Neurobehavioral Assessments During the Preterm Period. Phys Occup Ther Pediatr. 2016;2638: 1–16. pmid:27314272
  14. Valentin T, Uhl K, Einspieler C. The effectiveness of training in Prechtl’s method on the qualitative assessment of general movements. Early Hum Dev. 2005;81: 623–627. pmid:15975743
  15. Mutlu A, Einspieler C, Marschik PB, Livanelioglu A. Intra-individual consistency in the quality of neonatal general movements. Neonatology. 2008;93: 213–216. pmid:17992022
  16. Aizawa CYP, Einspieler C, Genovesi FF, Ibidi SM, Hasue RH. The general movement checklist: A guide to the assessment of general movements during preterm and term age. J Pediatr (Rio J). 2021;97: 445–452. pmid:33147443
  17. Bernhardt I, Marbacher M, Hilfiker R, Radlinger L. Inter- and intra-observer agreement of Prechtl’s method on the qualitative assessment of general movements in preterm, term and young infants. Early Hum Dev. 2011;87: 633–639. pmid:21616611
  18. Peyton C, Pascal A, Boswell L, deRegnier R, Fjørtoft T, Støen R, et al. Inter-observer reliability using the General Movement Assessment is influenced by rater experience. Early Hum Dev. 2021;161. pmid:34375936
  19. Fjørtoft T, Einspieler C, Adde L, Strand LI. Inter-observer reliability of the “Assessment of Motor Repertoire—3 to 5 Months” based on video recordings of infants. Early Hum Dev. 2009;85: 297–302. pmid:19138831
  20. Örtqvist M, Marschik PB, Toldo M, Zhang D, Fajardo‐Martinez V, Nielsen‐Saines K, et al. Reliability of the Motor Optimality Score‐Revised: a study of infants at elevated likelihood for adverse neurological outcomes. Acta Paediatr. 2023. pmid:36895106
  21. Einspieler C, Marschik PB, Pansy J, Scheuchenegger A, Krieber M, Yang H, et al. The general movement optimality score: a detailed assessment of general movements during preterm and term age. Dev Med Child Neurol. 2016;58: 361–368. pmid:26365130
  22. Hadders-Algra M. The neuronal group selection theory: promising principles for understanding and treating developmental motor disorders. Dev Med Child Neurol. 2000;42: 707–715. pmid:11085302
  23. Bonett DG. Sample size requirements for estimating intraclass correlations with desired precision. Stat Med. 2002;21: 1331–1335. pmid:12111881
  24. Prechtl HFR. The behavioural states of the newborn infant (a review). Brain Res. 1974;76. pmid:4602352
  25. Wongpakaran N, Wongpakaran T, Wedding D, Gwet KL. A comparison of Cohen’s Kappa and Gwet’s AC1 when calculating inter-rater reliability coefficients: a study conducted with personality disorder samples. BMC Med Res Methodol. 2013;13: 61. Available: http://www.biomedcentral.com/1471-2288/13/61
  26. Crowle C, Galea C, Morgan C, Novak I, Walker K, Badawi N. Inter-observer agreement of the General Movements Assessment with infants following surgery. Early Hum Dev. 2017;104: 17–21. pmid:27914275
  27. Landis JR, Koch GG. The Measurement of Observer Agreement for Categorical Data. Biometrics. 1977;33: 159–174. pmid:843571
  28. Hallgren K. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial. Tutor Quant Methods Psychol. 2012;8: 23–34. pmid:22833776
  29. Cicchetti DV. Interreliability Standards in Psychological Evaluations. Psychol Assess. 1994;6: 284–290.
  30. De Vet H, Terwee C, Mokkink L, Knol DL. Measurement in medicine: A practical guide. Cambridge: Cambridge University Press; 2011.
  31. Gwet KL. Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters. 4th ed. Gaithersburg: Advanced Analytics LLC; 2014.
  32. Ment LR, Bada HS, Barnes P, Grant PE, Hirtz D, Papile LA, et al. Practice parameter: Neuroimaging of the neonate: Report of the Quality Standards Subcommittee of the American Academy of Neurology and the Practice Committee of the Child Neurology Society. Neurology. 2002;59: 1663–1664. pmid:12451226
  33. de Vries LS, Eken P, Dubowitz LMS. The spectrum of leukomalacia using cranial ultrasound. Behav Brain Res. 1992;49: 1–6. pmid:1388792
  34. Hadchouel A, Delacourt C. Premature infants bronchopulmonary dysplasia: Past and present. Rev Pneumol Clin. 2013;69: 207–216. pmid:23867575
  35. Sim J, Wright CC. The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Phys Ther. 2005;85: 257–268. pmid:15733050
  36. Nakajima Y, Einspieler C, Marschik PB, Bos AF, Prechtl HFR. Does a detailed assessment of poor repertoire general movements help to identify those infants who will develop normally? Early Hum Dev. 2006;82: 53–59. pmid:16153788
  37. Olsen JE, Brown NC, Eeles AL, Lee KJ, Anderson PJ, Cheong JLY, et al. Trajectories of general movements from birth to term-equivalent age in infants born <30 weeks’ gestation. Early Hum Dev. 2015;91: 683–688. pmid:26513629
  38. Morgan C, Romeo DM, Chorna O, Novak I, Galea C, Del Secco S, et al. The pooled diagnostic accuracy of neuroimaging, general movements, and neurological examination for diagnosing cerebral palsy early in high-risk infants: A case control study. J Clin Med. 2019;8. pmid:31694305
  39. Spittle A, Walsh J, Potter C, McInnes E, Olsen J, Lee K, et al. Neurobehaviour at term-equivalent age and neurodevelopmental outcomes at 2 years in infants born moderate-to-late preterm. Dev Med Child Neurol. 2017;59: 207–215. pmid:27775148
  40. Gardner MJ, Altman DG. Confidence intervals rather than P values: estimation rather than hypothesis testing. Br Med J (Clin Res Ed). 1986;292: 746–750. pmid:3082422
  41. O’Brien SF, Yi QL. How do I interpret a confidence interval? Transfusion. 2016;56: 1680–1683. pmid:27184382