Improving the staging of neck injuries using a new index, the Neck Functional Holistic Analysis Score: Clustering approach to determine degrees of impairment

Background Traumatic cervical spine injuries are amongst the traffic injuries that can cause most harm to a person. Classifying subtypes of clinical presentations has been a method used in other pathologies to diagnose more efficiently and to address the appropriate treatment and the prognosis. The management of patients suffering from cervical injuries could be improved by classifying the severity of the impairment. This will allow clinicians to propose better treatment modalities according to the severity of the injury. Materials and methods The present study is a retrospective cohort study performed with the clinical data from 772 patients stored at Fisi-(ON) Health Group. All the patients treated for cervical spine injuries are evaluated using the EBI-5® system, which is based on inertial measurement unit (IMU) technology. The normalized range of motion of each patient was incorporated into a single index, the Neck Functional Holistic Analysis Score (NFHAS). Results Clustering analysis of the patients according to their NFHAS resulted in five groups. The Kruskal-Wallis H test showed that there were statistically relevant differences in the ROM values and NFHAS of the patients depending on the cluster they were assigned to: FE X2(4) = 551.59, p = 0.0005; LB ROM X2(4) = 484.58, p = 0.0005; RT ROM X2(4) = 557.14, p = 0.0005; NFHAS X2(4) = 737.41, p = 0.0005. Effect size with ηp2 for the comparison of groups were: FE = 0.76, LB = 0.68, RT = 0.76 and NFHAS = 0.96. Conclusion The NFHAS is directly correlated to the available ROM of the patient. The NFHAS serves as a good tool for the classification of cervical injury patients. The degree of impairment shown by the cervical injury can now be staged correctly using this new classification.


Introduction
Traumatic cervical spine injuries are amongst the traffic injuries that can cause most harm to a person. Traumatisms to the cervical spine can potentially injury the spinal cord, cause respiratory disfunction or produce internal bleeding, all of which could be fatal to the patient. Most of neck injuries result from an acceleration / deceleration of the head because of the collision. The injury results from a whiplash mechanism that takes the following sequence: in the first 50 milliseconds (ms) the head moves directly back; between 50 and 75 ms the spine takes the shape of an "S", the upper part of the spine would be in flexion while the lower part would remain in extension; finally, from 75 to 100 ms the head goes to hyperextension in relation to the spine. The severity of the injury increases with the impact speed since the head experiences twice the deceleration of the car at the moment of impact [1]. Exhaustive attention is made to discard severe injuries, with different protocols such as the Canadian C-Spine, the Low risk criteria, and the flexion-extension radiography. All these protocols make relationships between the movement of the cervical spine and the integrity of the structure. However, discarding structural damage does not imply that no damage has been done to the cervical spine.
The ICD-11 includes many diagnostic terms to address cervical spine injuries [2]. The most identifiable pathology from traffic car accidents is Whiplash Associated Disorder (WAD). This injury exemplifies the presence of pathology in the absence of identifiable structural damage. A preliminary investigation in 2015 highlighted the possibility of having spinal cord injuries in subsamples of WAD patients [3], while others argue about the existence of morphologic and compositional changes in the neck muscles of WAD patients [4][5][6]. None of these findings are definitive and normally radiological signs are negative. The usual examination of this patients mainly involves visual examination and the evaluation of patient reported symptoms. Diagnosis, severity, and prognosis are established relying on subjective information gathered and reported by both the clinician and the patient, respectively.
Neck injuries pose a challenge for health professionals because of their high incidence (300 cases/100000 inhabitants per year), the absence of conclusive clinical tests, the controversial relationship with insurance compensations and the malingering [7,8]. The given diagnosis is mainly based on the subjective complaints from the patient and the causal factor of being involved in a traffic collision [9]. Some studies indicate that almost 85% of injured people from car accidents report neck complaints after the accident [10,11]. It has also been noted that the incidence of patients reporting neck injuries increases with decreasing severity of the collision [7].
Some attempts have been made to classify patients into degrees of injury such as the Quebec Task Force (QTF) classification for WAD. This grading is made relying on gross kinematic observations and reported symptomatology [12]. Grades 3 or 4 are easily identifiable because the first one implies the existence of neurological signs and the second one implies structural compromises. However, grades 1 and 2 are not easily distinguishable since it is difficult to draw a line that clearly separates stiffness from decreased range of motion (ROM) [13]. Visual examination demonstrated poor reliability for ROM assessment [14] and hand held goniometers do not provide good reliability either for the assessment of neck ROM [15]. Assigning a QTF grade 1 or 2 is quite an arbitrary decision based upon consensus rather than on the objective measurement of the ROM and the proper appraisal of what is a ROM limitation and what is not. Furthermore, decreased ROM in grade 2 is not further stratified. Mild reductions of ROM are being equated to severe reductions, while both situations can imply very different outcomes for the patient and handling difficulties for the health professional.
Other investigations have already used the kinematics for the subclassification of pain [16] or the identification of different types of abnormal movement patterns in a joint [17]. Classifying subtypes of clinical presentations can help to diagnose more efficiently, propose the best treatment and address prognosis. Subclassifying can highlight different disfunctions that used to be present under the same diagnostic label and allow for a full understanding of the problem [18]. This classification will give the best results if it is based on objective measures. With the increasing availability of cheap and reliable measuring devices based on inertial measurement units (IMU) and the advancement of health informatics, more professionals start to rely on objective evaluations [19]. Still, there is a lack of a clear consensus on which main variables from neck movement should be addressed on a basis [16,[20][21][22][23][24][25] and which is their association to the blunt trauma suffered [26]. Even the previous state of the cervical spine of the patient, difficult linking the trauma to the findings obtained [7]. Nevertheless, some investigators have been able to use kinematics to correctly differentiate patients suffering from WAD from others that do not [27]. As kinematics can differentiate healthy from injured patients, there could be also differences between the kinematics of the injured ones. Moreover, the progression of pain along treatment has shown to correlate with the progression of kinematics such as ROM [28]. A classification based on ROM impairment could help with the management of these patients since it could identify the ones with more severe symptoms.
According to the previous information, three main objectives will be sought in this research:

Study design
The present research is a retrospective cohort study formulated with the clinical data from 772 patients. All the patients had suffered a traffic car accident and have been diagnosed with neck pathology. None of the included patients was still in treatment during the time this investigation was undertaken. Personal data from the patients was anonymized from their medical records before the research team could work with the data. The principles of the Helsinki declaration were considered throughout the design and performance of the investigation. The project received ethics approval from Hospital Clínico Universitario San Carlos, Madrid, España (Spain), with identification code 18/405-E under the name "Estudio de datos cinemáticos de la columna cervical". This investigation has a single blind nature. The ROM measurements of the patients were performed by teams independent to the research team, no contact existed between them. All the teams were trained alike in the use of the EBI-5 1 , which is an IMU based system.

Subjects
The inclusion criteria were: age over 18 years old, injuries due to a traffic car accident, diagnosis of a cervical spine pathology, evaluated between August and October 2018, measured with the EBI-5 1 [29]. Only the initial ROM evaluation conducted before rehabilitation was selected. The exclusion criteria were: rehabilitation prior to ROM assessment, neurological signs or structural compromise of the neck.
Information from all patients in this investigation was comprised into the medical records of Fisi(ON) Health group, a provider of medical services. Convenience sampling was used. All the subjects had signed an informed consent form prior to evaluation and treatment. Patients in the database are from all over Spain, 90 possible measuring spots were available at the time.
All the included patients followed the same steps through medical care provided by Fisi (ON) Health group. Within a period of ten days from the accident, the patients are appointed for medical examination. If the patients are diagnosed with a neck injury and rehabilitation is prescribed, they are appointed for biomechanical evaluation before starting the treatment. The information from that biomechanical assessment was gathered for this investigation. This flux is represented in Fig 1.

Data acquisition and variables
The EBI-5 1 is an accelerometry based system with a precision of ±0.1 o validated and independently tested by an audit from App + metrological company. The device uses two IMU for the recordings. One is placed upon the occiput with an elastic headband, and the other one between the spinous processes of T2 and T3 with a double-sided hypoallergenic sticker. All participants perform the maximum number of repetitions in a period of 45 seconds for each plane of motion: flexion-extension (FE), lateral bending (LB) and rotations (RT). All motions are performed in an upright sitting position at a self-determined velocity. The selection of 45 seconds of measuring time allows for all patients to have enough time to perform enough number of repetitions of the movement to have a clear picture of their actual mobility. The patients can make the movements at their own pace depending on the degree of injury or age capability, without external interference to adapt to preset procedures. Since the device uses two IMUs, all compensation movements are excluded from the recording. The measure given by the system is the difference between the measure of one IMU against the other.
To accomplish our objectives several variables were gathered. All variables used in this investigation are described in Table 1.
Five categories of medical diagnoses were identified in the sample, with an additional "others" category for diagnoses that could not be contained in the previous five. The ICD-11 code is included to link the individual diagnoses with the global reference [2]. Diagnoses where matched to the closest term on ICD-11, the investigators did not reevaluate the diagnose of any patient. All the diagnoses used were the ones given by medical doctors during routine care. The normalized ROM values for each plane of motion where obtained using the following equations: Where ROM represents the full mean range of motion of each plane of motion, SD the standard deviation and nROM the normalized ROM. Normative data was obtained from the work of Swinkels et al. [30].
The values of the nROM (FE, LB, RT) are employed as vertices of a polygon located in a cartesian coordinate system. Since the coordinate system axis are orthogonal to each other, the distance between the vertices can be calculated using the Pythagoras theorem. Each of the ROM values was placed in the coordinate system as: (FE, 0, 0), (0, LB, 0), (0, 0, RT). The equation to calculate the sides of this polygon is: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi Where a and b are any two of the ROM with their x, y and z coordinates. The letter D represents the distance between two ROM vertices.
With the three sides obtained, the area of the polygon can be calculated using Heron's formula: A ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi sðs À aÞðs À bÞðs À cÞ p Where a, b and c are the sides of the polygon, and s represents the semiperimeter calculated by: The index created to encompass the ROM of all the three planes (FE, LB, RT) is the Neck Functional Holistic Analysis Score (NFHAS). This index is calculated by dividing the area created with the value of the three ROM (FE, LB, RT) by the area of the regular triangle that would be obtained with a perfect score of 100% in all three ROM.
The clustering of the patients was performed with MATLAB 2018b with the k-means function. This function was used to iteratively obtain 20 clusters. Within cluster variance helped to determine the final optimal number of clusters. Silhouette analysis confirmed the decision of the most appropriate number of clusters.

Statistical analysis
IBM SPSS, Version 20.0 (IBM Corp. Released 2011. IBM SPSS Statistics for Windows, Version 20.0. Armonk, NY: IBM Corp.) was used for the statistical analysis. Homoscedasticity of the data was checked using the Kolmogorov-Smirnov test. None of the studied variables followed a normal distribution according to this test.
Two different classifications were tested in this investigation to find differences in the impairment of the patients. The first classified patients using classical diagnoses, the second classified the patients according to the NFHAS. The ROM of the created groups was used to assess the differences.
Statistical inference was made using Kruskal-Wallis test. The α value was set at 0.05. In the event that statistically relevant differences were encountered, post-hoc comparisons were made using Mann-Whitney's U test. Alfa value was adjusted by a Bonferroni correction. This correction is made with the following formula: Where the m value is the number of tested hypothesis and α is the original significance level.
Effect sizes were presented by means of ηp 2 for the Kruskal-Wallis H test, and r in the case of Mann-Whitney's U [31][32][33].
Spearman's rank-order correlation was used to test the relationship of age with every ROM. It was also used to test the relationship of age and every ROM with the NFHAS.

Descriptive statistics
A total of 772 patients were included in this research. In order to have a balanced sampling, researchers requested that this database was a gender-balanced sample [34]. Descriptive statistics of the analyzed variables can be seen in Table 2. Young people suffer more accidents; therefore, the skewness observed in age answers to the natural behavior of the phenomenon. The skewness of the ROM data is conditioned by the natural limit of ROM since few people present values larger than 100% of the normative data interval.
Normalized values for the ROM where obtained with reference values from the matching age interval [30]. Median value for the normalized flexion-extension is 76.05% with an IQR of 30.83, median value for the normalized lateral bending is 78.94 o and an IQR of 29.94; median value for the normalized rotation is 77.04 o and an IQR of 30.74.

Analysis of the groups defined by classical diagnosis
According to the ICD-11 five different types of classical diagnoses were identified plus the extra category labeled as others. A 29.6% of patients were diagnosed with cervical sprain (ICD-11:NA23.4), those where further stratified into grades I (26.6%), grade II (1.7%) and grade III (1.3%). A 60.1% of patients were diagnosed with cervicalgia (ICD-11:ME84.0). Only 2.7% of patients were identified with WAD (ICD-11:NA23.4). Finally, 7.6% of the diagnoses had to be grouped into a various category (ICD-11:NA23.4Z) due to the lack of resemblance to any item in the ICD-11.
The Kruskal-Wallis H test was applied to compare the age and ROM from the different groups created by medical diagnosis. This test ranks the variables and compares the mean ranks of these variables between the different groups. The comparison of the ranks made by the test can be seen in Table 3. The distribution of the patients is asymmetric, thus reflecting a possible overdiagnosis of some pathologies and an underdiagnosis of others.
Spearman's rank-order correlation was run to determine the relationships of every ROM and age. Between age and ROM, only the LB showed a significant correlation with r s = -0.81 and p = 0.024. however, all ROM showed moderate positive correlations between themselves. The FE with the LB resulted in a r s = 0.54 and p = 0.0005 FE with RT resulted in a r s = 0.65 and p = 0.0005. The LB with RT resulted in a r s = 0.56 and p = 0.0005.

Clustering based on functional results
Within cluster variance was calculated from having just 1 cluster, up to 20 (Fig 2). By applying the elbow technique, two optimal options were obtained: 5 or 6 clusters. When using 5 clusters, the sum of squares equaled 1.177x10 6 , while with 6 clusters the sum of squares results in 1.066x10 6 , which results in a difference of just 0.111x10 6 . This difference in variance is smaller than the one previously achieved by increasing the number of clusters. When performing a silhouette analysis, the use of 5 clusters resulted in an average silhouette value of 0.7236, while The Fig 3 shows and compares the results of classifying the patients both by classical diagnosis and the NFHAS categories.
Post-hoc tests with a Bonferroni adjustment of α = 0.05/10 = 0.005 on the ROM and NFHAS using Mann-Whitney's U test are shown in Table 4.
All the post-hoc comparisons reflect statistical differences between the groups. Effect sizes are lower between adjacent groups. All the NFHAS values of each group present a large effect size of the difference.
Spearman's rank-order correlation was run to determine the relationship of the NFHAS with every ROM (FE, LB, RT) and the age of the patients. There was no significative correlation between NFHAS and age r s = -0.37, p = 0.308. On the other hand, strong and significant positive correlations were found for every ROM with the NFHAS. Results for the FE and NFHAS were r s = 0.85, p = 0.0005, for the LB and NFHAS r s = 0.81, p = 0.0005 and for the RT and NFHAS r s = 0.86, p = 0.0005.

The NFHAS relation to the movement
The neck movements of daily living tasks rarely involve just one plane of motion [35] or a single joint. These daily living tasks depend on available range of motion, strength, and appropriate motor control. When the motion of the neck is evaluated, addressing three separate planes of motion greatly increases the difficulty of determining the overall degree of ROM impairment.
The normalized values of the ROM present high correlation coefficients between themselves. Therefore, a combination of them in a single value can facilitate the evaluation of the global range of motion. One option to join all the information into a single value would be to use arithmetical averages; however, these are less robust in the presence of outliers. The implementation of the NFHAS mitigates the effect of outliers on the interpretation of ROM impairment.
Furthermore, Spearman's rank-order correlation analysis of the NFHAS with the original normalized ROM demonstrates a high correspondence. The NFHAS fluctuates consistently with the variations of ROM in each plane. This relationship shows that the NFHAS provides a full picture of the overall ROM limitation and, therefore, can be used to stage the patient into degrees of ROM impairment.

The classical diagnoses as stages of movement impairment
Health professionals use classical diagnoses to explain the problems experienced by the victim of a traffic car accident. It was within the scope of this research to check whether the classical diagnoses for the neck injuries are given depending on the severity of signs assessed with the clinical exploration. Inference statistics showed that there was no difference in the ROM impairment between groups with different diagnoses. The scatter plot presented in panel (A) of Fig 2 shows that the multiple diagnoses do not clearly distinguish patients with different impairments of movement. If the classical diagnoses had any relation to the impairment, some clustering should be observed in the data. However, patients with poor results can be seen  There are patients with all levels of impairment inside each cluster. The low effect sizes recovered using the ηp 2 also suggest that there is no relationship between giving a specific diagnosis and the degree of ROM impairment. Since the ROM integrity will later be used to decide the best treatment and the end of rehabilitation, it would be desirable to have a true reflection of the impairment severity in the diagnosis. Based on these results, classical diagnoses lack information regarding the degree of the patient's injury. This uncertainty in the assignment of a diagnosis matching the actual capabilities of the patient empowers the "cured by a verdict" myth present in the insurance environment. The way the patients are diagnosed in each country affects the global consideration of the pathology and, therefore, might predispose health professionals to face the patient with mistrust [36]. Guidelines for the assessment of the WAD advise to objectively quantify the signs of the patient with the objective of correctly staging them and proposing the best therapeutic options [37]. It seems, however, that the staging of the patient according to objective signs is not done on a basis in the gathered sample of cases. The degree of impairment should be in combination with the QTF classification to give a more complete overview of the patient's health. Current guidelines focus mainly on patient reported measures [38] but could benefit from the addition of objective measures in their workflows.

The NFHAS as clustering tool
Since the classical diagnosis of neck injuries does not have relation to the degree of impairment, the NFHAS was proposed as a quantitative value to perform clustering. Using the K-means algorithm, 5 main groups were generated. These groups demonstrate statistically relevant differences between them and can allocate different types of impairments.
The key differentiating characteristics do not overlap between groups, which indicates a good classification system [39]. Study of the effect sizes reveals that differences between adjacent groups have a low effect, while for groups that are further apart the effect size of the differences becomes moderate to high. It is important to notice that the effect size of the comparison is always high regarding the NFHAS. The associated ROM values vary in a lower proportion between adjacent groups. While the global movement of the patient is highly different between groups, individual ROM values show smaller differences between adjacent groups.
The grading given to the patient is inverse to the NFHAS value. NFHAS type I patients have greater ROM than NFHAS type V patients. This grading of the patients is highly correlated to all the original ROM normalized values and the NFHAS score. The number of stages was decided using the elbow technique, the silhouette analysis, and the clinical meaning of the number of stages. There is not much difference in the within clusters sum of squares between 5 or 6 clusters, nevertheless, there is a clearly lower slope beyond 6 clusters. Silhouette analysis had a greater mean value with 5 clusters than with 6. However, the final decision of choosing 5 clusters answers to the clinical relevance of having an additional cluster. When 6 clusters were made, the division did not give a sufficiently different group regarding impairment. The classification would not benefit from an additional staging of these patients. With 5 groups, patients inside group I are mainly over 90% of the total normalized ROM in every plane, and therefore can be considered as not ROM limited; type II patients have a mild ROM loss; type III a moderate ROM loss; type IV a grave ROM loss and type V a severe loss of ROM. This staging is like the one already used in the decision of the degree of final impairment. The QTF also identifies five categories with increasing severity; however, their grading just considers three categories as uncomplicated damage to soft tissues. This widely used classification only matches the degree of severity of the signs with the staging of the patient on a qualitative basis rather than a quantitative one. This makes possible that a patient with low disability and no neurological signs is classified in the same group as one with high disability and no neurological signs either [9].
A study in patients with low back pain had similar results when staging their patients according to magnetic resonance findings. The study reports a classification where they had one group labeled as 'no or few findings', a second group as 'mild spinal degeneration', a third group as 'moderate/severe spinal degeneration', a fourth group as 'moderate/severe spinal degeneration and mild sacroiliac joint findings', and a fifth group as 'mild spinal degeneration and moderate/severe sacroiliac joint findings' [40]. It seems appropriate that we have 1 group with no injury or patients considered as not significantly injured, and latter groups with scaling seriousness. The lower bound of the first group lies at 85% of the global functionality, which equates to a mean 90% of the ROM in each plane of motion. A study performed in a WAD population using the NDI identified the 15% of disability as the optimal cutoff for differentiating between real injured and not injured. Although the NDI reflects self-reported disability and not the objective ROM exploration, it is interesting to see that their cutoff matches numerically the one we proposed for the NFHAS type I [41]. Even more, some literature reports that 29 to 38% of the individuals exposed to rear end low energy impacts present symptoms of minimal severity that would disappear within 24 hours of the impact [42]. None of the patients considered in the present study was measured before 48 hours of the impact, since they had to be attended first in the emergency room and later be visited by a physician who prescribed the biomechanical assessment.
Having a more appropriate staging of impairments, like the one proposed, may allow for a proper modelling of the expected outcome of each group depending on the treatments applied. By clustering the neck injured by their ROM impairment, the quality and specificity of the treatments offered to these groups of patients can be improved [39].

Limitations
It is difficult to gather equal numbers of patients with every diagnosis, so the first limitation is the extreme asymmetry of the groups depending on the clinical diagnosis. This investigation has also highlighted that there is a growing trend to include neck injured patients into the cervicalgia generic diagnosis more than in any other. The diagnosis of the patients was not controlled by the investigation team, and there was no interaction between the investigation team and any of the medical doctors that prescribed those diagnoses, therefore, it only reflects the situation. However, we do not expect that the results would change substantially if we had matching groups for every diagnosis.
Another limitation is that the NFHAS only contains information about the ROM. In prospect investigations, more relevant attributes related to function should be evaluated to be included in the NFHAS and improve the classification. Correlating information of pain level, strength, psychosocial factors, or the function capability of other extremities (such as the upper limb) with the NFHAS could further improve the insight on the WAD pathology.
As it was out of the scope of this paper, no longitudinal assessment of these patients was made. In future studies, patients into each NFHAS group must be examined for their response to rehabilitation. Studying if the groups respond differently to rehabilitation can help in the management of these patients.
Patients are appointed with a medical doctor within ten days of the traffic car accident, however, the information of the exact number of days since the accident was not controlled. This investigation identified degrees of injury that could be present at any stage of the injury, so not having controlled the number of days from the accident should not affect the results substantially. However, in future longitudinal investigations, the information on the time passed since the accident could have influence on the patient's recovery and should be therefore gathered.

Conclusions
Classical cervical neck pathology diagnoses are not sensitive enough to correctly stage patients according to their impairment. The NFHAS is directly correlated to the available ROM of the patient. The NFHAS serves as a good tool for the classification of patients into meaningful groups according to movement impairment.