A Rasch Analysis of the Charcot-Marie-Tooth Neuropathy Score (CMTNS) in a Cohort of Charcot-Marie-Tooth Type 1A Patients

The Charcot-Marie-Tooth Neuropathy Score (CMTNS) was developed as a main efficacy endpoint for application in clinical trials of Charcot-Marie-Tooth disease type 1A (CMT1A). However, the sensitivity of the CMTNS for measuring disease severity and progression in CMT1A patients has been questioned. Here, we applied a Rasch analysis in a French cohort of patients to evaluate the psychometric properties of the CMTNS. Overall, our analysis supports the validity of the CMTNS for application to CMT1A patients, though with some limitations: certain items of the CMTNS are more suitable for moderate to severe forms of the disease, and some items have disordered categories. We suggest that additional items and/or categories be considered to better assess mild-to-moderate patients.


Background
Charcot-Marie-Tooth (CMT) disease is the most common inherited disorder of the peripheral nervous system [1,2]. CMT type 1A (CMT1A), caused by a duplication of the myelin protein encoding gene PMP22 [3,4], accounts for 50% of patients with CMT [1,2,5]. A typical feature of CMT1A is weakness of the foot and lower leg muscles, which may lead to foot drop and a high-stepped gait with frequent tripping or falls. Currently, there are no approved treatments for CMT1A, although there has been considerable interest in the potential of ascorbic acid (AA) as a therapy, leading to six clinical trials investigating the efficacy of AA in CMT1A [6-11]. Unfortunately, no beneficial clinical effects of AA were identified in any of these trials, as confirmed by two meta-analyses [12,13]. Recently, PXT3003 (a fixed combination of baclofen, naltrexone and sorbitol) showed preliminary evidence of efficacy in an exploratory phase 2 clinical trial [14], which was also confirmed by a meta-analysis [12]. A conclusion shared by all of these studies is that selecting a clinically meaningful efficacy endpoint for CMT1A trials is challenging. Among the reasons for this are questions surrounding the relevance of efficacy endpoints, which remains an active topic of discussion. In this regard, the Charcot-Marie-Tooth Neuropathy Score (CMTNS) was first proposed and validated by Shy et al. to provide a reliable measure of impairment in CMT [15]. The CMTNS is composed of 9 items evaluating different functions related to the disease: 5 of impairment ('Sensory Symptoms', 'Pin Sensibility', 'Vibration', 'Strength Arms' and 'Strength Legs'), 2 of activity limitations ('Motor Symptoms Arms' and 'Motor Symptoms Legs') and 2 electrophysiological measures ('Ulnar CMAP' and 'Ulnar SNAP').
Each item is scored from 0 to 4 and the total sum of the item scores provides a global measure of disease severity, with higher scores indicating worsening function [15].
The CMTNS has been used as the primary or main endpoint in most completed clinical trials for CMT1A to date. However, the ability of the CMTNS to measure responses to treatment has not been demonstrated and, among all the studies published, meta-analysis reveals significant improvements on the CMTNS only under PXT3003 versus placebo [12]. With this in mind, the sensitivity of the CMTNS to change and its psychometric properties are still debated. In particular, it has been suggested that some components of the CMTNS are too insensitive, mainly because of floor and ceiling effects [16]. Therefore, a modified version of the scale (CMTNS-v2) has been proposed by Murphy et al. [17] in an attempt to reduce these effects and to standardize patient assessment. This version has also been questioned recently and a 'weighted' alternative has been suggested as a potential improvement [18]. Finally, a modified CMTNS (called CMTNS-Mod) has also been proposed by adding three functional measures (9-hole peg test, foot dorsiflexion and walk test) while removing four of the initial items ('Ulnar SNAP', 'Pin Sensibility', 'Vibration' and 'Strength Arms') [19]. However, none of these modified versions have been evaluated in natural history or therapeutic trials. As the CMTNS is the only CMT-specific outcome measure available and has been widely applied, it is important to review its properties and find directions in which it could be improved. Firstly, it is important to demonstrate the sensitivity of CMTNS scores to disease progression. In this regard, the CMTNS showed modest changes over time in longitudinal studies. Shy et al. reported a mean increase of 0.69 points per year for natural progression [20]. In parallel, clinical trials showed that CMT1A patients under placebo deteriorate even more slowly, with a mean increase of 0.16 points per year [12].
To be a valid indicator of disease severity, the CMTNS should also comply with the requirements of modern measurement theory, such as unidimensionality (the scale measures only one construct, which allows the items to be summed to form a scale with a single dimension), internal construct validity and reliability [21]. One well-accepted way to provide such evidence is to perform a Rasch model analysis [22], which has been widely employed in clinical scale construction and validation [21,23,24]. The Rasch model assesses a latent trait, such as disease severity, from the responses of patients to a set of items [25]. It provides a range of diagnostic information that can be used to determine how well each item contributes to the measurement of the latent trait and, in doing so, helps to assess the validity of the scale and to identify possible avenues for improvement.
Here, we performed a Rasch analysis of the CMTNS in a cohort of 277 mild-to-severe CMT1A patients obtained by merging data from two French clinical trials [9,14] and one non-investigational study.

Participants and setting
CMT1A patients involved in this study initially participated in the French phase 2 clinical trial of ascorbic acid led by Micallef et al. [9] and/or in the phase 2 clinical trial of PXT3003 led by Attarian et al. [14] and/or in a subsequent non-investigational clinical study (BMK-CMT) sponsored by Pharnext. The ethics committee "Comité de Protection des Personnes Sud-Méditerranée I" approved this study and the RCB ID is 2010-023097-40. Participants provided their written informed consent to participate in the study. Patients were included from six hospital sites in France: Marseille, Lille, Limoges, Lyon, Nantes and Paris. A total of 277 patients completed the CMTNS scoring in at least one of the 3 aforementioned clinical studies. Among them, there were 110 men and 167 women, with ages ranging from 18 to 69 and an average age of 45. Patient CMTNS scores ranged from 2 to 31 and were classified into mild (CMTNS ≤ 10, 47 patients), moderate (11 ≤ CMTNS ≤ 20, 201 patients) or severe (CMTNS ≥ 21, 29 patients).

The Rasch model
The Rasch model is a mathematical framework initially proposed to analyze rating scales. It evaluates a latent variable that is not directly measurable (e.g., disability, cognition or quality of life) from a set of categorical items. In this model, the raw score of each item is transformed onto an interval scale by a logistic function, provided the data meet the model assumptions. Both the person's ability and the item difficulty, also referred to as person and item parameters, are defined on the same dimension. If a person's ability is known, it is possible to predict how that person is likely to perform on a given item. Specifically, the probability of a response is modeled as a logistic function of the difference between the person and the item parameters. The Rasch model was initially developed for dichotomous items and then extended to polytomous items, such as those of the CMTNS, in which successive integer scores represent categories of increasing level of disability.
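As an illustration of the logistic relationship described above, the following minimal sketch computes the response probability for a dichotomous item. The function and argument names (`rasch_probability`, `theta`, `beta`) are hypothetical and chosen for this example only; it is not the code used in the study.

```python
import math


def rasch_probability(theta: float, beta: float) -> float:
    """Probability of a positive response under the dichotomous Rasch model.

    theta: person parameter (in logits); beta: item parameter (in logits).
    Both names are illustrative assumptions, not taken from the study.
    """
    # Logistic function of the difference between person and item parameters.
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


# When the person and item parameters coincide, the probability is 0.5.
p_equal = rasch_probability(0.0, 0.0)
# A person located one logit above the item has a probability above 0.5.
p_above = rasch_probability(1.0, 0.0)
```

Because only the difference between the two parameters enters the formula, persons and items can be placed on the same latent dimension.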

Rasch model analysis
The Rasch model analysis provides an integrated framework for evaluating whether a sum score (such as the CMTNS) satisfies the set of requirements listed hereafter. These first include the three assumptions of the Rasch model, namely local independence, unidimensionality and invariance, briefly explained as follows:
Local independence means that, conditionally on the latent person ability, the response of a particular individual to an item depends neither on the responses to other items nor on the responses given by other people to the same item. This is examined through the residual correlations between items, which should be no more than 0.3 for each pair of items [26].
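The residual-correlation check can be sketched as follows. This is a simplified illustration that assumes the standardized item residuals have already been extracted from a fitted model; the helper names (`pearson`, `locally_dependent_pairs`) and the toy data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def locally_dependent_pairs(residuals, threshold=0.3):
    """Return the item pairs whose residual correlation exceeds the threshold.

    residuals: dict mapping an item name to its list of residuals
    (one value per person, same person order for every item).
    """
    flagged = []
    for item_a, item_b in combinations(sorted(residuals), 2):
        r = pearson(residuals[item_a], residuals[item_b])
        if r > threshold:
            flagged.append((item_a, item_b, r))
    return flagged


# Toy residuals for three hypothetical items and four persons.
toy = {
    "A": [1.0, -1.0, 0.5, -0.5],
    "B": [1.0, -1.0, 0.5, -0.5],
    "C": [-1.0, 1.0, 0.5, -0.5],
}
flagged_pairs = locally_dependent_pairs(toy)
```

An empty list of flagged pairs is consistent with local independence under the 0.3 criterion.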
Unidimensionality. The Rasch model assumes that the response to each item depends on a unique latent trait. It can be assessed by creating two subsets of items using a Principal Component Analysis (PCA) of the item residuals, with those loading negatively forming one set, and those loading positively forming the second set. Each person parameter estimated from one set of items is then compared to those derived from the other set of items using a t-test. If less than 5% of these tests are significant at the 5% level, then unidimensionality is supported [27]. Another approach to examine unidimensionality is to apply a generalization of the Martin-Löf test to the two subsets of items defined previously [28]. A non-significant p-value for this test at the 5% level supports the assumption of unidimensionality.
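The comparison of person estimates between the two item subsets can be sketched as below. The sketch assumes that person estimates and their standard errors from each subset are already available, and it uses a normal-approximation critical value of 1.96 for the 5% level; both the function name and that simplification are assumptions made for illustration.

```python
import math


def proportion_significant(theta_a, se_a, theta_b, se_b, critical=1.96):
    """Share of persons whose two subset-based estimates differ significantly.

    theta_a, theta_b: person estimates from the two item subsets.
    se_a, se_b: the corresponding standard errors.
    critical: assumed two-sided critical value (normal approximation).
    """
    significant = 0
    for ta, sa, tb, sb in zip(theta_a, se_a, theta_b, se_b):
        t = (ta - tb) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > critical:
            significant += 1
    return significant / len(theta_a)


# Toy data for four hypothetical persons: only the third differs markedly.
share = proportion_significant(
    [0.0, 1.0, 2.0, -1.0], [0.5, 0.5, 0.5, 0.5],
    [0.1, 1.2, 0.0, -1.1], [0.5, 0.5, 0.5, 0.5],
)
unidimensionality_supported = share < 0.05
```

With real data, a share below 0.05 would support unidimensionality under the criterion stated above.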
Invariance means that item difficulties remain the same across different groups, such as age or gender groups. The invariance of items is assessed through an analysis of variance of the residuals in which the grouping variable of interest is the main factor. If the between-group variance is statistically significant, the item bias is called Differential Item Functioning [29,30]. When it is present, the probability of an item response cannot be explained entirely by the person and item parameters, as it is also influenced by other group properties such as age and gender. Here, each item was checked for Differential Item Functioning across two grouping factors: gender (male or female) and age (younger or older than 45 years).
Once the three assumptions of local independence, unidimensionality and invariance are met, it is possible to use the Rasch model to further evaluate the scale by investigating overall goodness-of-fit, reliability, fit of individuals or items, and consistency of items, as described below:
Overall goodness-of-fit. Most publications dealing with Rasch analysis estimate the overall goodness-of-fit using a chi-square test [22,23,31]. If the data fit the Rasch model, a summary chi-square interaction statistic should be non-significant. However, recent studies show that chi-square approaches are problematic: these indices are too powerful and the appropriate number of degrees of freedom is often unclear [32,33]. By contrast, Andersen's likelihood-ratio test [34] shows high power and an acceptable type-I error rate in Rasch model estimation [35]. To perform this test, subjects are split into g = 1, ..., G score-level subgroups, in each of which a conditional likelihood is computed and compared to the total conditional likelihood computed in the complete sample of subjects. The statistic of the test is given by:
$$LR = 2 \left( \sum_{g=1}^{G} \ln L_C^{(g)} - \ln L_C \right)$$
where $L_C^{(g)}$ is the conditional likelihood of subgroup g and $L_C$ is the total conditional likelihood. This statistic has an asymptotic chi-square distribution with degrees of freedom equal to the number of parameters estimated in the score groups minus the number of parameters estimated in the complete data set. A non-significant p-value for this test indicates goodness-of-fit for the Rasch model.
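For illustration, Andersen's statistic (twice the difference between the summed subgroup conditional log-likelihoods and the total conditional log-likelihood) and its degrees of freedom can be computed as in the sketch below. The function name, argument names and numerical values are hypothetical; the log-likelihoods are assumed to come from a separate conditional maximum likelihood fit.

```python
def andersen_lr(subgroup_loglik, total_loglik, n_params_subgroups, n_params_total):
    """Andersen likelihood-ratio statistic and its degrees of freedom.

    subgroup_loglik: conditional log-likelihoods, one per score-level subgroup.
    total_loglik: conditional log-likelihood in the complete sample.
    n_params_subgroups: number of parameters estimated in each subgroup.
    n_params_total: number of parameters estimated in the complete sample.
    """
    statistic = 2.0 * (sum(subgroup_loglik) - total_loglik)
    degrees_of_freedom = sum(n_params_subgroups) - n_params_total
    return statistic, degrees_of_freedom


# Made-up values for two score groups with 8 parameters each.
lr_stat, lr_df = andersen_lr([-100.0, -120.0], -225.0, [8, 8], 8)
```

The p-value would then be read from a chi-square distribution with `lr_df` degrees of freedom, which is not computed here because the Python standard library offers no chi-square distribution function.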
Reliability of the CMTNS scale is estimated by the Person Separation Index (PSI), given by the proportion of true variance relative to the sum of the true and error variances. In practice, it measures the internal consistency and the discrimination power of the scale, i.e. the ability of the scale to discriminate between persons with different levels of the trait. It is analogous to Cronbach's alpha [36], but it uses the person estimates in logits instead of the raw scores. A PSI value greater than 0.7 is considered acceptable.
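One common way to compute such a separation index is sketched below. It assumes that the observed variance of the person estimates is the sum of the true and error variances, and that the error variance is estimated by the mean squared standard error; the function name and toy values are hypothetical, and software packages may differ in detail.

```python
import statistics


def person_separation_index(theta, se):
    """Proportion of true variance in the person estimates.

    theta: person estimates in logits; se: their standard errors.
    Assumes observed variance = true variance + error variance.
    """
    observed_variance = statistics.variance(theta)
    error_variance = statistics.fmean([s ** 2 for s in se])
    return (observed_variance - error_variance) / observed_variance


# Toy example with five hypothetical persons.
psi = person_separation_index([-2.0, -1.0, 0.0, 1.0, 2.0], [0.5] * 5)
acceptable = psi > 0.7
```

In this toy example, the observed variance is 2.5 and the error variance 0.25, giving a PSI of 0.9.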
Item fit can be assessed by several indicators. The residual item fit statistics are expected to approximate a Normal distribution (mean close to 0 with a SD close to 1), which is tested using a chi-square test [21]. A significant chi-square test based p-value may indicate misfit. A similar analysis can be performed to test person fit. In addition, two fit statistics can be computed: infit (inlier-sensitive fit) and outfit (outlier-sensitive fit). Infit is more sensitive to the overall pattern of responses and less influenced by outliers, so infit problems are a greater threat to measurement than outfit problems. Infit and outfit are reported both as mean squares and as standardized fit t-statistics. The mean squares indicate the amount of distortion of the measurement system, whereas the t-statistics indicate how likely it is that the item misfits [37]. Mean squares greater than 1.3 indicate underfit to the Rasch model, i.e., the data are less predictable than the model expects; mean squares less than 0.7 indicate overfit to the Rasch model, i.e., the data are more predictable than the model expects. High t-statistics (> 2.0) show that the item underfits and distorts or degrades the measurement system, while low t-statistics (< -2.0) mean that the data are too predictable (overfit) but do not degrade the measurement. Underfit and overfit to the model have different implications for measurement. Underfit degrades the quality of the measurement and should prompt reflection on its cause. Overfit might mislead one into concluding that the quality of the measure is better than it really is, and has fewer practical implications than underfit [38].
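The two mean-square statistics can be sketched as follows, using their usual definitions: outfit is the unweighted mean of the squared standardized residuals, and infit is the sum of squared residuals divided by the sum of the model variances. The expected values and variances are assumed to come from a fitted model; the function names and toy numbers are hypothetical.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item.

    observed: observed responses, one per person.
    expected: model-expected responses for the same persons.
    variance: model variances of those responses.
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(s / w for s, w in zip(squared_residuals, variance)) / len(observed)
    # Infit: information-weighted, hence less affected by outlying responses.
    infit = sum(squared_residuals) / sum(variance)
    return infit, outfit


def classify_mean_square(value, lower=0.7, upper=1.3):
    """Label a mean square using the 0.7 and 1.3 cut-offs given in the text."""
    if value > upper:
        return "underfit"
    if value < lower:
        return "overfit"
    return "acceptable"


# Toy dichotomous responses from four hypothetical persons.
infit, outfit = fit_mean_squares(
    [1, 0, 1, 1], [0.5, 0.5, 0.8, 0.2], [0.25, 0.25, 0.16, 0.16]
)
```

In the toy data, the last response is very unexpected (expected value 0.2, observed 1), which inflates the outfit more than the infit.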
Consistency of items. A particularly useful output of the Rasch analysis is the person-item map (also sometimes referred to as a 'Wright map'). This map displays the difficulty of the items on the same latent dimension as the impairment of the patients. For each item, the threshold of a category is defined as the location at which the cumulative probability of selecting this category versus all the other options reaches 0.5. Thresholds should therefore follow the same order as the categories. Disordered categories in an item occur when the ordinal numbering of categories is not in accord with their fundamental meaning or when individuals have difficulty in consistently discriminating between categories. In this case, the disordered categories should be rearranged, and Item Characteristic Curves, which represent the probability of selecting each category of an item, can be plotted to examine whether the disorder stems from under- or over-selection of one category.

Implementation
The Rasch model has several mathematical variants. Here, we used the Partial Credit Model [37], which allows a different response format for each item, as is the case for the CMTNS. A more detailed introduction to the Partial Credit Model can be found in Wang et al. [39]. Analyses were performed with R (http://cran.r-project.org). The dimensionality, local dependency and invariance analyses were carried out using custom-made R functions, while the other Rasch analyses were performed with the R package eRm [40]. Statistical significance was considered at the 5% level and Bonferroni correction for multiple testing was applied where appropriate.
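To make the Partial Credit Model more concrete, the sketch below computes category probabilities for a single item from a person parameter and a list of item thresholds, using the standard form of the model. It is written in Python for illustration only and is not the R/eRm code used in the study; the function name and threshold values are hypothetical.

```python
import math


def pcm_category_probabilities(theta, thresholds):
    """Category probabilities P(X = k), k = 0..m, under the Partial Credit Model.

    theta: person parameter (logits).
    thresholds: the m threshold parameters of the item (logits).
    """
    # Cumulative sums of (theta - threshold); category 0 has an empty sum of 0.
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# A hypothetical five-category item (scores 0 to 4) with ordered thresholds.
probs = pcm_category_probabilities(0.0, [-1.5, -0.5, 0.5, 1.5])
```

Plotting these probabilities against `theta` for each category would give the Item Characteristic Curves of the item.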

Results
We performed a Rasch analysis of the CMTNS using responses from the 277 individuals included in the study. A well-targeted sample of at least 150 individuals is required to reach 99% confidence that the estimated item difficulty is within ±0.5 logits of its stable value [41]. Our sample of 277 CMT1A patients was therefore adequate for the analysis. A preliminary quality control based on a significant p-value of the person fit chi-square test excluded 15 individuals (5.4%). The remaining 262 individuals were included in the Rasch analysis.

Local independency, unidimensionality and invariance
We investigated the compliance of the CMTNS with the main assumptions of the Rasch model. Firstly, local independency was shown by the absence of pairwise correlations between item fit residuals greater than 0.3 (Fig 1). Then, unidimensionality was supported by the fact that only 2 of 262 patients (0.8%, well below 5%) had a significant p-value following the PCA approach described in the methods. In addition, the Martin-Löf test was not significant (p = 0.919). Finally, the response residuals of the different subgroups (gender, age) did not display significant Differential Item Functioning for any item, which supports the invariance of items. These results led us to conclude that the CMTNS meets the assumptions of the Rasch model in our cohort of CMT1A patients.

Overall goodness-of-fit and reliability
A non-significant p-value of the Andersen's likelihood-ratio test (p = 0.435) indicates a good overall fit of the CMTNS to the Rasch model. The PSI calculated on our data equals 0.715, pointing to acceptable reliability of the CMTNS, although this value is not particularly high.

Item fit
At the item level, 'Ulnar SNAP' is the only item of the CMTNS that has a significant chi-square based p-value at the 5% level (p = 0.044). However, it is not significant after Bonferroni correction. Residuals of all items have a distribution with means close to 0 and SD close to 1 (see Table 1). None of the infit t-statistics exceeds 2 (Fig 2), which is to say that no item distorts the measurement. Both the infit and outfit t-statistics of the 'Strength Legs' and 'Strength Arms' items are below -2, and the fit mean square of 'Strength Legs' is slightly lower than 0.7, which indicates that responses to these two items are too predictable, i.e. that the items overfit the model.

Consistency of items
The person-item map (Fig 3) displays the locations of person abilities and item difficulties along the same latent dimension. Although the category thresholds of most items cover the mild-to-severe range of disability well, item difficulty locations cluster in the range of patients with higher person parameters (right side of the latent dimension), which means that the items are more likely to differentiate patients with a higher level of disease severity. For instance, 'Motor Symptoms Arms' shows the highest item difficulty, meaning that mild-to-moderate patients are more likely to answer '0' (i.e. no disability in arms) for this item. Three items have disordered categories ('Sensory Symptoms', 'Motor Symptoms Legs' and 'Ulnar SNAP'), indicated in red on the person-item map (Fig 3). To further investigate these disordered items, we examined the Item Characteristic Curves (Fig 4). Category 2 in 'Motor Symptoms Legs' (i.e. ankle-foot orthosis on at least one leg or ankle support) was under-selected, which causes the observed disorder. In 'Sensory Symptoms' and 'Ulnar SNAP', categories 0 and 4 were clearly over-selected compared to the other categories, which means that these items are not well suited to discriminating between CMT1A patients.

Proposed modification of the CMTNS
To improve item fit to the model, a common strategy is to collapse adjacent categories when they have disordered thresholds. Given our results, we collapsed Categories 2, 3 and 4 into one category in 'Sensory Symptoms' and 'Ulnar SNAP', and Categories 2 and 3 in 'Motor Symptoms Legs'. The person-item map of the modified data shows that all items are now well-ordered (Fig 5). However, after these modifications, the PSI of the CMTNS does not improve (0.713 versus 0.715 previously), and the infit t-statistic of 'Sensory Symptoms' increased from 1.49 to 1.88. Although item categories are well ordered after our modifications, these modifications do not enhance the overall fit of the CMTNS to the Rasch model.
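The collapsing step amounts to a simple recoding of item scores, which can be sketched as follows. The item names and the categories collapsed are those given above; the renumbering of the remaining categories to consecutive integers (for example, Category 4 of 'Motor Symptoms Legs' becoming 3) and the names `COLLAPSE_MAPS` and `collapse_scores` are assumptions made for this illustration.

```python
# Recoding tables: original category -> collapsed category.
# Consecutive renumbering after collapsing is an assumption of this sketch.
COLLAPSE_MAPS = {
    "Sensory Symptoms": {0: 0, 1: 1, 2: 2, 3: 2, 4: 2},
    "Ulnar SNAP": {0: 0, 1: 1, 2: 2, 3: 2, 4: 2},
    "Motor Symptoms Legs": {0: 0, 1: 1, 2: 2, 3: 2, 4: 3},
}


def collapse_scores(patient_scores):
    """Recode one patient's item scores; items without a map are unchanged."""
    return {
        item: COLLAPSE_MAPS.get(item, {}).get(score, score)
        for item, score in patient_scores.items()
    }


# A hypothetical patient with three of the nine items shown.
recoded = collapse_scores(
    {"Sensory Symptoms": 4, "Motor Symptoms Legs": 4, "Vibration": 3}
)
```

The recoded scores would then be refitted with the same model to check whether the thresholds become ordered.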

Discussion
The CMTNS was developed by Shy et al. [15] as the first composite clinical scale dedicated to quantifying impairment and measuring progression in CMT patients. Although the validity of the CMTNS to assess severity has never been questioned, its sensitivity to change and its ability to measure a response to treatment are still debated. Subsequent versions of the CMTNS have been proposed, such as the CMTNS-v2 by Murphy et al. [42] to attempt to reduce floor and ceiling effects, the CMTNS-Mod by Mannil et al. [19], which adds three functional measures while removing four of the initial items, and finally a 'weighted' alternative of the CMTNS-v2 by Sadjadi et al. [18] resulting from a Rasch analysis. None of these modified versions have been evaluated in natural history or therapeutic trials. In order to further investigate some key properties of the original CMTNS and to identify possible directions of improvement, we performed a validation of this scale based on a Rasch analysis in a cohort of 277 mild-to-severe French CMT1A patients, made possible by the integration of 3 studies including 2 published clinical trials [9,14]. Our first result is that, overall and in the context of the Rasch analysis, the CMTNS appears to be a valid measurement for CMT1A: the three main assumptions of the Rasch model (local independency, unidimensionality and invariance) were met, and the scale showed good overall fit to the Rasch model and acceptable reliability. When analyzed individually, only two items ('Strength Legs' and 'Strength Arms') showed an overfit to the model (infit and outfit t-statistics < -2), which has little implication for the quality of the measurement. In the Rasch analysis of the CMTNS-v2 by Sadjadi et al. [18], all of the items showed good fit, supporting the idea that they belong in the scale and contribute to the overall score of impairment.
Although the CMTNS-v2 presents some modifications to the CMTNS in terms of categories or instruments of measure, they are very similar and can be discussed together here.
As a limitation, the person-item map suggests that the items of the CMTNS are more suitable for assessing moderate to severe forms of the disease, with the exception of 'Ulnar SNAP'. Sadjadi et al. [18] arrived at the same conclusion, though in the CMTNS-v2, SNAP is measured on the radial nerve instead of the ulnar nerve, underlining the consistency of this result. This finding is also supported by a comparison study of the CMTNS-v2 and a pediatric version of the CMTNS (called CMTPedS and proposed in Burns et al. [31]) where the authors observed a lack of sensitivity of the CMTNS-v2 for assessing mild patients [43].
Finally, we found that 3 items ('Sensory Symptoms', 'Motor Symptoms Legs' and 'Ulnar SNAP') had disordered categories, meaning that they are not adapted to discriminate CMT1A patients well. Category 2 in 'Motor Symptoms Legs' (i.e. ankle-foot orthosis on at least one leg or ankle support) was under-selected while extreme categories (0 and 4) in items 'Sensory Symptoms' and 'Ulnar SNAP' were evidently over-selected. A slight modification of the CMTNS by collapsing the disordered categories corrected this problem but did not improve the overall fit to the Rasch model.
In conclusion, the choice of clinical endpoints for assessing disease severity, progression and response to treatment remains an active topic in the field of chronic neuromuscular diseases. In this context, the CMTNS (first and second versions) is the only clinical scale specific to CMT and, as such, has been widely applied in natural history and therapeutic trials. In light of our results and a review of the literature, the decision to apply the CMTNS as a measure of severity, disease progression or therapeutic efficacy in clinical practice should be made after careful consideration. Our current position is that, by integrating different components of the disease, the CMTNS remains an appropriate measure of impairment, particularly useful for classifying patients into mild, moderate and severe. Finally, further refinement of the CMTNS and/or its modified versions is certainly worth consideration in order to overcome the limitations identified here and move towards an optimal scale.