Figures
Abstract
Introduction
Prechtl’s method (GMA) is a test for the functional assessment of the young nervous system. It involves a global and a detailed assessment of the general movements (GMs) and has demonstrated validity. Data on the reliability of both assessments in the preterm period are scarce. This study aimed to evaluate the inter-rater reliability for the global and detailed assessments of the preterm writhing GMA.
Materials and methods
The study participants were 69 infants born at <37 gestational weeks and admitted to the neonatal intensive care unit. They were randomly assigned to five pairs of raters. Raters assessed infants’ GMs using preterm videos. Outcome variables were (a) the GMs classification (normal versus abnormal; normal versus abnormal subcategories) and (b) the general movements optimality score (GMOS), obtained through the global and detailed assessments. The Gwet’s AC1 and the intraclass correlation coefficient (ICC) were calculated for the GMs classification and the GMOS, respectively.
Results
The global assessment presented an AC1 = 0.84 [95% CI = 0.54,1] for the GMs binary classification and an AC1 = 0.67 [95% CI = 0.38,0.89] for the GMs classification with abnormal subcategories. The detailed assessment presented an ICC = 0.72 [95% CI = 0.39,0.90] for the GMOS.
Conclusions
Inter-rater reliability was high and substantial for the global assessment and good for the detailed assessment. However, the small sample size limited the precision of these estimates. Future research should involve larger samples of preterm infants to improve estimate precision. Challenging items such as assessing the neck and trunk, poor repertoire GMs, and tremulous movements may impact the preterm writhing GMA’s inter-rater reliability. Therefore, ongoing training and calibration among raters is necessary. Further investigation in clinical settings can enhance our understanding of the preterm writhing GMA’s reliability.
Citation: Valencia A, Viñals C, Alvarado E, Balderas M, Provasi J (2024) Prechtl’s method to assess general movements: Inter-rater reliability during the preterm period. PLoS ONE 19(4): e0301934. https://doi.org/10.1371/journal.pone.0301934
Editor: Claudia Brogna, Fondazione Policlinico Universitario Gemelli IRCCS, ITALY
Received: June 8, 2023; Accepted: March 24, 2024; Published: April 18, 2024
Copyright: © 2024 Valencia et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: The author received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Preterm infants are at risk of neurodevelopmental disorders [1]. Thus, international guidelines recommend performing appropriate neurodevelopmental assessments to identify infants who could benefit from early intervention [2]. Prechtl’s method (GMA) is a video-based test for the functional assessment of the young nervous system. The GMA has high clinical utility for premature infants because it consists of an observational assessment of the quality of general movements (GMs). GMs originate endogenously in the entire body in a sequence of spontaneous movements [3]. Variability, complexity, and fluidity are the characteristics that determine the quality of GMs and reflect the integrity of the young nervous system. These characteristics of GMs can be negatively affected by brain abnormalities [4, 5] thus becoming a neurodevelopmental marker [6]. The GMA facilitates the evaluation of GMs in three different developmental periods. The preterm writhing GMA (before term age) and writhing GMA (from term age) identify various neurodevelopmental disorders, including functional, motor, and cognitive issues [7, 8]. The fidgety GMA (from 9 weeks post-term) mainly detects cerebral palsy and cognitive disorders [9, 10].
The three GMA periods include both (a) a global and (b) a detailed assessment of the GMs [11]. During the preterm writhing GMA and writhing GMA, the global assessment involves analyzing the quality of GMs and classifying them as either normal or abnormal. The detailed assessment consists of scoring the GMs’ characteristics in the limbs to obtain the GMs optimality score (GMOS). During the fidgety GMA, the global assessment classifies GMs as present, abnormal, or absent, while the detailed assessment scores the motricity and posture to obtain the motor optimality score (MOS).
Clinicians, therapists, and neuroscientists worldwide have received training from the GMs Trust group in using both the global and detailed assessments of the GMA. As a result, studies in clinical and research contexts have evaluated the GMA’s validity and reliability in preterm infants. Validity and reliability are psychometric properties that express the appropriateness of a test. Validity indicates the scoring accuracy, and reliability shows its consistency. Inter-rater reliability evaluates scoring consistency when different raters assess the same patients with the same test, and agreement is the degree to which scoring is identical [12]. Studies found that the preterm writhing GMA presents high validity (94%) [13] predicting neurodevelopment in preterm infants and high inter-rater agreement (90%) [14]. Specifically, the inter-rater reliability of the global assessment was high (k > 0.80) for preterm writhing GMA and writhing GMA [15], from moderate (k = 0.50) to substantial (k = 0.80) for preterm writhing GMA and writhing GMA in combined analyzes [16, 17], and from substantial (k = 0.64) to high (k = 0.92) for fidgety GMA [15, 18, 19]. A recent study evaluated the reliability of a checklist for guiding raters during the global assessment. That study found that the inter-rater reliability values (k = 0.68–0.80) for the preterm writhing GMA and writhing GMA were similar to those in previous studies [16]. While limited published data exist on the reliability of detailed assessment, studies have report excellent inter-rater reliability (ICC > 0.87) for both the fidgety MOS and the fidgety MOS-revised [19, 20]. However, to our knowledge, data on inter-rater reliability for detailed assessment in the preterm writhing period are scarce.
Preterm writhing GMA inter-rater reliability needs to be identified because most studies either focus on fidgety GMA, combine preterm writhing GMA and writhing GMA analyses, or do not evaluate the detailed assessment. Furthermore, there is a scarcity of data regarding the inter-rater reliability for the global and detailed assessments of the preterm writhing GMA within the same sample of preterm infants. This gap highlights the need for studies to complement existing evidence on the inter-rater reliability of the preterm writhing GMA.
Consequently, this study aimed to evaluate the preterm writhing GMA inter-rater reliability for (a) the global and (b) the detailed assessment. Specifically, this study addressed the global assessment because of its utility on preterm infants. It also considered the detailed assessment because studies suggest using it to complement the global assessment [21]. Given its well-documented high validity in predicting later development in high-risk preterm infants [13], understanding the reliability of the preterm writhing GMA is crucial for clinical practice. Establishing this reliability is essential to inform clinicians about the preterm writhing GMA’s ability to consistently identify at-risk babies, thus facilitating timely interventions during this critical period of neuroplasticity [22]. According to the hypotheses, inter-rater reliability of the preterm writhing GMA would be high (AC1 > 0.80) for the global assessment and excellent (ICC > 0.75) for the detailed assessment.
Materials and methods
Design and setting
This psychometric study on the inter-rater reliability of the preterm writhing GMA is part of a broader prospective longitudinal study on the neurodevelopmental assessment of preterm infants. The infants were recruited between January and November 2017 from the Necker Enfants-Malades and Armand Trousseau University hospitals in France. This research followed the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) [12].
Ethical approval
This study obtained the approval of the Ethical Committee for the Protection of Persons of Île-de-France V (CPP Ref: d-10-16) and followed the principles set out in the Declaration of Helsinki. The infants’ parents were sent an informative letter outlining the study’s objectives and methods. In reply, they gave written informed consent to their children’s inclusion in the study and the video evaluation of the infant’s GMs.
Participants and raters
The sample size selection complied with Bonett’s parameters that recommend 28 participants for reliability studies with five raters and an intraclass correlation coefficient value of 0.80 [23]. The sample was recruited conveniently based on the following inclusion criteria: infants born at <37 gestational weeks and admitted to the neonatal intensive care unit (NICU). Infants with congenital anomalies and a severe illness at the time of GMs assessment were excluded. The study used a computerized randomization method to select pairs of raters from a list of GMA-certified raters (n = 5). According to Table 1, each participant (n = 69) was randomly assigned to be evaluated by a different pair (n = 5) of raters.
Data collection
The data consisted of the participants’ GMs video filmed at <37 corrected weeks before their discharge from hospital. Following the GMA parameters, a remotely controlled video camera was positioned above the infants to capture the entire body. The infants were filmed in the supine position wearing little clothing. The filming began when the infants exhibited spontaneous movements following routine nursing care. Infants of <36 corrected weeks were filmed awake or asleep. Infants of ≥36 corrected weeks were filmed in stage 4 (eyes open, no crying, movement present) [24]. During the filming, interaction with caregivers and movement-limiting objects was avoided. Rater A recorded and edited the videos. She also collected the participants’ clinical information from their electronic medical records after completing the GMs assessment.
General movements assessment
The raters received the videos via an online secure platform with a fifteen-day time frame to complete the GMs assessment. Based on these videos, the raters conducted a global and a detailed GMs assessment according to the preterm writhing GMA criteria [11]. In the global assessment, the raters classified GMs as normal if they exhibited varying speed and spatial occupation of moving extremities, along with complex and fluid articulatory rotations. The raters classified the GMs as abnormal if they showed decreased variability, complexity, and fluency. Abnormal GMs included these four subcategories: (a) poor repertoire if the limbs presented monotonous speed and spatial occupation, (b) cramped synchronized if the limbs presented high rigidity, (c) chaotic if the movements were disorganized and (d) hypokinetic if no GMs were observed. Next, the raters performed the detailed assessment using the scoring sheet to evaluate (from 0 to 2) the following items separately: sequence of the GMs, neck and trunk involvement, superior extremities, and lower extremities. The total of the scores for the items gave the GMOS (from 5 to 42). GMOS was not calculated when GMs were hypokinetic.
The raters were in different countries and did not communicate with each other. The raters were blinded to the participants’ assignment, clinical data, and individual identity information (except filming age in corrected weeks) and conducted the evaluations independently. After 45 minutes of viewing the videos, the raters took a 5-minute break to calibrate their perception. They participated in two online previous training sessions, which included studying a preterm writhing GMA pedagogical video and reaching an agreement on two cases.
Outcome variables
The outcome variables were (a) the GMs classification (normal versus abnormal; normal versus abnormal subcategories) obtained through the global assessment and (b) the GMOS (from 5 to 42) obtained through the detailed assessment.
Statistical analysis
The data reporting used statistical descriptors according to the level of measurement of the variables, with mean and standard deviation (±SD) for continuous variables, median and interquartile range (IQR) for ordinal variables, and frequency and percentage for nominal variables. The analysis considered two types of GMs classification. Firstly, the binary classification of GMs in normal versus abnormal. Next, the classification of the GMs in normal versus abnormal GMs subcategories (poor repertoire, cramped synchronized, chaotic, or hypokinetic). The Kolmogorov-Smirnov test confirmed the normal distribution of the GMOS.
The study calculated the percentage of agreement and Gwet’s AC1 coefficient for the GMs classification to evaluate the inter-rater reliability of the global assessment. The AC1 was used because it corresponds better with the percentage of agreement than the k coefficient and controls the problems associated with the prevalence [25]. Additionally, the AC1 has already been used to assess inter-rater reliability of the writhing GMA in postoperative infants [26]. The interpretation of the AC1 considered fair (≤ 0.40), moderate (from 0.41 to 0.60), substantial (from 0.61 to 0.80), and high (from 0.81 to 1.00) inter-rater reliability [27].
The evaluation of the inter-rater reliability of the detailed assessment required the calculation of the intraclass correlation coefficient (ICC) for the GMOS. The model of One-way Random Effect, absolute agreement, and single assessment was used. That model is suitable for designs in which subjects are evaluated by different pairs of randomly selected raters [28]. The interpretation of the ICC considered poor (≤ 0.40), fair (from 0.41 to 0.59), good (from 0.60 to 0.74), and excellent (from 0.75 to 1) inter-rater reliability [29]. The standard error of measurement (SEM) was calculated with the formula using the SD of the differences between each pair of raters [30].
Reliability coefficients were calculated for each pair of raters and then averaged to obtain a single inter-rater reliability index. This study considered a two-tailed p-value of < 0.05 as significant and calculated 95% confidence intervals (CI). The data analysis was performed using SPSS Statistics version 25.0 and AgreeStat 360 [31].
Results
Participants and raters
Eighty-two infants were recruited to ensure sufficient data in the contingency table. Thirteen infants were excluded due to withdrawal (n = 1), congenital disease (n = 1), full-term birth (n = 3), and unavailability for filming before reaching 37 corrected weeks (n = 8). In total, 69 infants were included with 35 (51%) males. Table 2 presents the participants’ clinical characteristics. The participants were filmed at 35±1 corrected weeks for an average of 3±1 minutes to obtain 5±1 sequences of GMs.
The raters (n = 5) were three physicians and two psychologists with an average of 18±2 years of experience in child neurodevelopment. Three raters are clinical rehabilitation specialists and the other two work in research. The raters are GMs Trust group certified with 6±5 years of experience in working on the global and detailed assessments of the preterm writhing GMA. The results considered the pairs of raters numbered 1,2,3, and 5. The number 4 pair of raters was excluded due to insufficient participants (n = 4) assignment (see Table 1). Thus, four pairs of raters completed the global and detailed assessments of the GMs in 65 preterm infants during the preterm writhing period.
Inter-rater reliability for the global assessment
Table 3 shows the distribution of the subjects across GMs categories. The GMs categories presented an agreement percentage of 84% (from 69% to 100%). The category with the highest disagreement among the raters was poor repertoire GMs (from 69% to 75%). None of the GMs categories obtained 100% of agreement among all pairs of raters.
Table 4 presents the inter-rater reliability for GMs classification. The GMs binary classification (normal versus abnormal) obtained an inter-rater agreement percentage of 88% (from 80% to 100%) with a coefficient AC1 = 0.84±0.5 (from 0.68 to 1). The GMs classification with abnormal subcategories (poor repertoire, cramped synchronized, chaotic, or hypokinetic) demonstrated an inter-rater agreement of 72% (from 69% to 77%) and an AC1 = 0.67±0 (from 0.63 to 0.73).
Inter-rater reliability for the detailed assessment
Table 5 presents the inter-rater reliability for the GMOS. The GMOS obtained an ICC = 0.72±8 (from 0.66 to 0.79). The lowest inter-rater reliability was related to the item assessing the neck and trunk involvement with an ICC = 0.44±1 (from 0.19 to 0.66). No GMOS item obtained a perfect inter-rater agreement among all pairs of raters.
Discussion
This study aimed to assess the inter-rater reliability of the preterm writhing GMA, providing evidence on both (a) the global and (b) the detailed assessment of the same sample of preterm infants. We will therefore discuss inter-rater reliability estimates for both the GMs classification and the GMOS, then the precision of these estimates will also be addressed.
Inter-rater reliability for the global assessment
As expected in our hypothesis, the inter-rater reliability of the global assessment was high (AC1 = 0.84 [95% CI = 0.54,1]) for the GMs binary classification (normal versus abnormal). However, it was substantial (AC1 = 0.67 [95% CI = 0.38,0.89]) for the GMs classification with abnormal subcategories (poor repertoire, cramped synchronized, chaotic, or hypokinetic). Our findings align with prior research on preterm writhing, writhing, and fidgety GMA, which found higher inter-rater reliability for the GMs binary (k > 80) classification compared to the GMs classification with abnormal subcategories (k = 50) [15, 17, 19]. Therefore, we will discuss the factors that influence this disparity in the inter-rater reliability for the global assessment.
The decrease in inter-rater reliability for the GMs classification with abnormal subcategories can be attributed to the nature of reliability coefficients, which tend to diminish when there are more than two classification categories [35]. Additionally, items that are difficult to interpret can impact the inter-rater reliability for the GMs classification with abnormal subcategories [30]. The poor repertoire GMs category could be a challenging item due to the highest bias (0.23) and the highest disagreement (76%) among raters. Although disagreement could have been reduced by using the checklist to guide GMs assessment, our inter-rater reliability estimates agree with those obtained for the checklist (k = 0.68–0.80) [16]. Observations in this study align with previous studies that have suggested the lack of precision of poor repertoire GMs in identifying neurodevelopmental disturbances in preterm infants [36]. Studies have shown that infants with poor repertoire GMs can transition into normal GMs by the time they reach term age [37]. This imprecision of poor repertoire GMs could also impact inter-rater reliability for the GMs classification with abnormal subcategories.
Therefore, clinical studies recommend combining the binary GMs classification with other neurological measures and neuroimaging to enhance the identification of preterm infants at neurodevelopmental risk [38]. Also, researchers have suggested using the global assessment of the preterm writhing GMA in the framework of longitudinal neurodevelopmental follow-up monitoring [21].
Inter-rater reliability for the detailed assessment
Contrary to our hypothesis, the detailed assessment demonstrated good inter-rater reliability (ICC = 0.72 [95% CI = 0.35,0.89]) for the GMOS. This observation differs from previous studies that reported higher inter-rater reliability for the fidgety MOS and the fidgety MOS-revised [19, 20]. Therefore, we will now consider the factors that may have influenced the inter-rater reliability value for the GMOS in this study.
Earlier studies have shown that the differences among raters’ expertise levels can affect inter-rater reliability for the fidgety GMA [18]. Although the raters have comparable expertise levels in preterm writhing GMA, they come from different professional clinical and research backgrounds. Raters (pair 2) with a research background demonstrated higher inter-rater reliability (ICC = 0.79 [95% CI = 0.56,0.90]) for the GMOS compared to raters from clinical fields. This observation aligns with a study that revealed lower inter-rater reliability for the global assessment of the writhing GMA and fidgety GMA in clinical settings than in research settings [17]. Professional background differences among raters in this study may have influenced inter-rater reliability for the GMOS.
Additionally, the item assessing neck and trunk involvement may be challenging because it presented the highest disagreement (ICC = 0.44 [95% CI = 0.12,0.78]) among raters. The response options for this item make it hard to discriminate between little and no involvement. The item evaluating the presence of tremulous movements may also be challenging because it had the second-highest rater disagreement. Two previous studies could support these observations. One study observed tremulous movements in both normal GMs and abnormal GMs [21] during the preterm writhing and writhing period. The other study demonstrated the clinical imprecision of tremulous movements during the writhing period in identifying neurodevelopmental disturbances in preterm infants [39]. The lack of precision of these challenging items may have impacted the inter-rater reliability for the GMOS.
Given the clinical utility of the detailed assessment, previous studies recommend using it in combination with the global assessment to gain a deeper understanding of specific parameters and trajectories of the GMs in preterm infants [21].
The precisión of estimates
We also observed wide 95% confidence intervals, which suggest potential limitations in the precision of the inter-rater reliability estimates. Confidence intervals for any inter-rater reliability estimate depend on two factors: sample size and sample variability related to the assessed parameter (in this case, the GMs) [40]. Therefore, we will address how these two factors could have influenced the precision of inter-rater reliability estimates for both the global and the detailed assessments.
Firstly, a small sample size can lead to increased error and increased uncertainty in inter-rater reliability estimates [40]. While our sample size is suitable for reliability studies involving categorical and numerical variables [23, 35], the relatively small number of subjects (n = 69) might affect the precision of inter-rater reliability estimates for both the global and detailed assessments. Two previous studies with smaller sample sizes reported similar findings. One study (n = 39) on the global assessment of the preterm writhing GMA reported high inter-rater reliability (k > 0.80 [95% CI = 0.40,1]) for the GMs classification but noted a wide 95% confidence interval [15]. Another study (n = 24), focusing on the detailed assessment, reported high inter-rater reliability (ICC = 0.87) for the fidgety MOS but noted a slightly higher measurement error than expected.
Secondly, an increased variability within the sample reduces the precision of inter-rater reliability estimates [41]. The oscillation in the proportion of abnormal GMs (ranging from 53% to 80%) and the standard deviation (25.3±8) of the GMOS may suggest variability within our sample. Thus, the relative heterogeneity related to the GMs within the sample could have influenced the precision of inter-rater reliability estimates for both the global and the detailed assessments. These observations contrast with the findings of a recent study involving a more heterogeneous sample of preterm and term infants with diverse clinical characteristics [20]. In that study, the fidgety MOS-revised exhibited higher inter-rater reliability values (ICC = 0.98 [95% CI = 0.97,0.99]). It is important to note that the study included a significantly larger sample (n = 252).
Limitations and future research
Our sample size met Bonett’s parameters for inter-rater reliability studies [23] but the relatively low number of subjects was a limiting factor for this study. Therefore, future research should consider larger samples of preterm infants to increase the precision of inter-rater reliability estimates for the preterm writhing GMA. While the randomized assignment of participants to randomly formed pairs of raters might minimize bias [28], the convenience recruiting participants from the NICU was another limitation for this study. Participants recruiting in NICU might to explain the high rate of PVL in our sample. Therefore, future studies on high-risk preterm infants could consider recruiting participants upon admission to the NICU to improve the sample’s representativeness and the generalizability of inter-rater reliability estimations for the preterm writhing GMA.
Conclusions
Reliability in identifying preterm infants at neurodevelopmental risk is a critical concern in assessments. This study provides insights into the inter-rater reliability of the preterm writhing GMA for evaluating the functionality of a young nervous system. We observed high and substantial inter-rater reliability for the global assessment, with the binary GMs classification being the most reliable. The detailed assessment showed good inter-rater reliability for the GMOS. However, our small sample size limited the precision of these estimates. Several challenging items, such as assessing neck and trunk involvement, poor repertoire GMs, and tremulous movements contributed to substantial inconsistency among raters. Therefore, ongoing training and rater calibration is necessary to enhance inter-rater reliability for the preterm writhing GMA. The preterm writhing GMA seems to have better inter-rater reliability in research settings than in a clinical environment. Given the utility of the preterm writhing GMA, further investigation in clinical settings is necessary to better understand its inter-rater reliability in identifying preterm infants at a high risk of neurodevelopmental issues.
References
- 1. Blencowe H, Vos T, Lee AC, Philips R, Lozano R, Alvarado MR, et al. Estimates of neonatal morbidities and disabilities at regional and global levels for 2010: introduction, methods overview, and relevant findings from the Global Burden of Disease study. Pediatr Res. 2013;74: 4–16. pmid:24366460
- 2. Wang CJ, Mcglynn EA, Brook RH, Leonard CH, Piecuch RE. Quality-of-Care Indicators for the Neurodevelopmental Follow-up of Very Low Birth Weight Children: Results of an Expert Panel Process. Pediatrics. 2006;Volume 117: 2080–2092. pmid:16740851
- 3. Einspieler C, Prechtl HFR. Prechtl’s assessment of general movements: A diagnostic tool for the functional assessment of the young nervous system. Ment Retard Dev Disabil Res Rev. 2005;11: 61–67. pmid:15856440
- 4. Cioni G, Bos A, Einspieler C, Ferrari F, Martijn A, Paolicelli PB, et al. Early neurological signs in preterm infants with unilateral intraparenchymal echodensity. Neuropediatrics. 2000;31: 240–51. pmid:11204281
- 5. Spittle AJ, Brown NC, Doyle LW, Boyd RN, Hunt RW, Bear M, et al. Quality of general movements is related to white matter pathology in very preterm infants. Pediatrics. 2008;121: e1184–9. pmid:18390959
- 6. Peyton C, Einspieler C. General Movements: A Behavioral Biomarker of Later Motor and Cognitive Dysfunction in NICU Graduates. Pediatr Ann. 2018;47: e159–e164. pmid:29668025
- 7. Olsen JE, Cheong JLY, Eeles AL, FitzGerald TL, Cameron KL, Albesher RA, et al. Early general movements are associated with developmental outcomes at 4.5–5 years. Early Hum Dev. 2020;148: 105115. pmid:32615517
- 8. Ferrari F, Cioni G, Einspieler C, Roversi MF, Bos AF, Paolicelli PB, et al. Cramped synchronized general movements in preterm infants as an early marker for cerebral palsy. Arch Pediatr Adolesc Med. 2002;156: 460–467. pmid:11980551
- 9. Novak I, Morgan C, Adde L, Blackman J, Boyd RN, Brunstrom-Hernandez J, et al. Early, accurate diagnosis and early intervention in cerebral palsy: Advances in diagnosis and treatment. JAMA Pediatr. 2017;171: 897–907. pmid:28715518
- 10. Einspieler C, Bos A, Libertus M, Marschik P. The General Movement Assessment Helps Us to Identify Preterm Infants at Risk for Cognitive Dysfunction. Front Psychol. 2016;7: 406. pmid:27047429
- 11. Einspieler C, Prechtl HFR, Bos A, Ferrari F, Cioni G. Prechtl’s method on the qualitative assessment of general movements in preterm, term and young infants. Clin Dev Med. 2004;167: 1–91.
- 12. Kottner J, Audig L, Brorson S, Donner A, Gajewski BJ. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. 2011;64: 96–106. pmid:21130355
- 13. Craciunoiu O, Holsti L. A Systematic Review of the Predictive Validity of Neurobehavioral Assessments During the Preterm Period. Phys Occup Ther Pediatr. 2016;2638: 1–16. pmid:27314272
- 14. Valentin T, Uhl K, Einspieler C. The effectiveness of training in Prechtl’s method on the qualitative assessment of general movements. Early Hum Dev. 2005;81: 623–627. pmid:15975743
- 15. Mutlu A, Einspieler C, Marschik PB, Livanelioglu A. Intra-individual consistency in the quality of neonatal general movements. Neonatology. 2008;93: 213–216. pmid:17992022
- 16. Aizawa CYP, Einspieler C, Genovesi FF, Ibidi SM, Hasue RH. The general movement checklist: A guide to the assessment of general movements during preterm and term age. J Pediatr (Rio J). 2021;97: 445–452. pmid:33147443
- 17. Bernhardt I, Marbacher M, Hilfiker R, Radlinger L. Inter- and intra-observer agreement of Prechtl’s method on the qualitative assessment of general movements in preterm, term and young infants. Early Hum Dev. 2011;87: 633–639. pmid:21616611
- 18. Peyton C, Pascal A, Boswell L, deRegnier R, Fjørtoft T, Støen R, et al. Inter-observer reliability using the General Movement Assessment is influenced by rater experience. Early Hum Dev. 2021;161. pmid:34375936
- 19. Fjørtoft T, Einspieler C, Adde L, Strand LI. Inter-observer reliability of the “Assessment of Motor Repertoire—3 to 5 Months” based on video recordings of infants. Early Hum Dev. 2009;85: 297–302. pmid:19138831
- 20. Örtqvist M, Marschik PB, Toldo M, Zhang D, Fajardo‐Martinez V, Nielsen‐Saines K, et al. Reliability of the Motor Optimality Score‐Revised: a study of infants at elevated likelihood for adverse neurological outcomes. Acta Paediatr. 2023. pmid:36895106
- 21. Einspieler C, Marschik PB, Pansy J, Scheuchenegger A, Krieber M, Yang H, et al. The general movement optimality score: a detailed assessment of general movements during preterm and term age. Dev Med Child Neurol. 2016;4: 361–368. pmid:26365130
- 22. Hadders-Algra M. The neuronal group selection theory: promising principles for understanding and treating developmental motor disorders. Dev Med Child Neurol. 2000;42: 707–715. pmid:11085302
- 23. Bonett DG. Sample size requirements for estimating intraclass correlations with desired precision. Stat Med. 2002;21: 1331–1335. pmid:12111881
- 24. Prechtl HFR. The behavioural states of the newborn infant (a review). Brain Res. 1974;76. pmid:4602352
- 25. Wongpakaran N, Wongpakaran T, Wedding D, Gwet KL. A comparison of Cohen’s Kappa and Gwet’s AC1 when calculating inter-rater reliability coefficients: a study conducted with personality disorder samples. 2013. Available: http://www.biomedcentral.com/1471-2288/13/61
- 26. Crowle C, Galea C, Morgan C, Novak I, Walker K, Badawi N. Inter-observer agreement of the General Movements Assessment with infants following surgery. Early Hum Dev. 2017;104: 17–21. pmid:27914275
- 27. Landis JR, Koch GG. The Measurement of Observer Agreement for Categorical Data. Biometrics. 1977;33: 159–174. pmid:843571
- 28. Hallgren K. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial. Tutor Quant Methods Psychol. 2012;8: 23–34. pmid:22833776
- 29. Cicchetti D V. Interreliability Standards in Psychological Evaluations. Psychol Assess. 1994; 284–290.
- 30. De Vet H, Terwee C, Mokkink L, Knol DL. Measurement in medicine: A practical guide. Measurement in Medicine: A Practical Guide. 2011.
- 31. Gwet KL: Gwet K. Handbook of inter-rater reliability. The definitive guide to measuring the extent of agreement amongst raters 4th edition. Gaithersburg: Advanced Analytics LLC; 2014. 2010.
- 32. Ment LR, Bada HS, Barnes P, Grant PE, Hirtz D, Papile LA, et al. Practice parameter: Neuroimaging of the neonate: Report of the Quality Standards Subcommittee of the American Academy of Neurology and the Practice Committee of the Child Neurology Society. Neurology. 2012;59: 1663–1664. pmid:12451226
- 33. de Vries LS, Eken P, Dubowitz LMS. The spectrum of leukomalacia using cranial ultrasound. Behavioural Brain Research. 1992;49: 1–6. pmid:1388792
- 34. Hadchouel A, Delacourt C. Premature infants bronchopulmonary dysplasia: Past and present. Rev Pneumol Clin. 2013;69: 207–216. pmid:23867575
- 35. Sim J, Wright CC. The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Phys Ther. 2005;85: 257–268. pmid:15733050
- 36. Nakajima Y, Einspieler C, Marschik PB, Bos AF, Prechtl HFR. Does a detailed assessment of poor repertoire general movements help to identify those infants who will develop normally? Early Hum Dev. 2006;82: 53–59. pmid:16153788
- 37. Olsen JE, Brown NC, Eeles AL, Lee KJ, Anderson PJ, Cheong JLY, et al. Trajectories of general movements from birth to term-equivalent age in infants born <30 weeks’ gestation. Early Hum Dev. 2015;91: 683–688. pmid:26513629
- 38. Morgan C, Romeo DM, Chorna O, Novak I, Galea C, Del Secco S, et al. The pooled diagnostic accuracy of neuroimaging, general movements, and neurological examination for diagnosing cerebral palsy early in high-risk infants: A case control study. J Clin Med. 2019;8. pmid:31694305
- 39. Spittle A, Walsh J, Potter C, Mcinnes E, Olsen J, Lee K, et al. Neurobehaviour at term-equivalent age and neurodevelopmental outcomes at 2 years in infants born moderate-to-late preterm. Dev Med Child Neurol. 2017;59: 207–215. pmid:27775148
- 40. Gardner MJ, Altman DG. Statistics in Medicine Confidence intervals rather than P values: estimation rather than hypothesis testing. Br Med J (Clin Res Ed). 1986;292: 746–750. pmid:3082422
- 41. O’Brien SF, Yi QL. How do I interpret a confidence interval? Transfusion (Paris). 2016;56: 1680–1683. pmid:27184382