Cross-cultural adaptation and psychometric properties’ evaluation of the modern standard Arabic version of Cumberland Ankle Instability Tool (CAIT) in professional athletes

  • Vasileios Korakakis ,

    Contributed equally to this work with: Vasileios Korakakis, Mohsen Abassi

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliation Aspetar, Orthopaedic and Sports Medicine Hospital, Doha, Qatar

  • Mohsen Abassi ,

    Contributed equally to this work with: Vasileios Korakakis, Mohsen Abassi

    Roles Conceptualization, Data curation, Investigation, Project administration, Writing – review & editing

    Affiliation Aspetar, Orthopaedic and Sports Medicine Hospital, Doha, Qatar

  • Argyro Kotsifak ,

    Roles Formal analysis, Methodology, Writing – review & editing

    ‡ These authors also contributed equally to this work.

    Affiliation Aspetar, Orthopaedic and Sports Medicine Hospital, Doha, Qatar

  • Hassine Manai ,

    Roles Data curation, Resources, Writing – review & editing

    ‡ These authors also contributed equally to this work.

    Affiliation Aspetar, Orthopaedic and Sports Medicine Hospital, Doha, Qatar

  • Anas AbuEsba

    Roles Data curation, Resources, Writing – review & editing

    ‡ These authors also contributed equally to this work.

    Affiliation Aspetar, Orthopaedic and Sports Medicine Hospital, Doha, Qatar



To cross-culturally adapt the Cumberland Ankle Instability Tool into modern standard Arabic and to assess its psychometric properties.


Cross-cultural adaptation followed a combination of published guidelines, and a sample of 107 athletes was recruited for psychometric evaluation. All measurement properties recommended by the Consensus-based Standards for the selection of health status Measurement Instruments were evaluated, including face, structural, convergent, and discriminant validity; reproducibility; distribution-based responsiveness; and interpretability. We also used a structured content analytic method to evaluate content validity.


The tool presented excellent internal consistency (α = 0.92) and reliability (ICC 0.75–0.98), and good convergent validity compared with the Lower Extremity Functional Scale (ρ = 0.67). In reproducibility testing, the minimal detectable change ranged from 0.41 to 6.0 points. In responsiveness assessment, the effect sizes were large (Glass’s Δ range 2.03–2.08, Cohen’s d range 2.22–2.53), and the area under the curve was 0.869. Unidimensionality was supported by a 1-factor solution explaining 63.8% of the variance.


The Arabic version of Cumberland Ankle Instability Tool presented acceptable psychometric properties comparable to the original version. The questionnaire is understood across most of the Arabic speaking world and can be used in research and clinical practice to assess patients suffering from chronic ankle instability.


Chronic ankle instability (CAI) is a common consequence of acute lateral ankle sprain (LAS). [1] Despite adequate initial treatment, more than 30% of patients with LAS will develop CAI, which in turn leads to persisting complaints of “giving way”, recurrent LAS injuries, and pain. [2] Evidence indicates that balance, proprioception, reaction time and strength are impaired in CAI patients compared with healthy controls. However, the inclusion of heterogeneous participants in most studies limits the generalizability of these findings to the entire “chronically unstable” population. [3] The International Ankle Consortium, based on the best available evidence, provided a position statement on selection criteria for patients with CAI to be used in future research. [4] These criteria include the total score of the Cumberland Ankle Instability Tool (CAIT), a patient-rated outcome measure (PROM) that aims to determine the presence of functional ankle instability and to grade its severity. [5] The 9-item questionnaire asks about the degree of instability during functional activities on Likert scales; the total score ranges from 0 to 30, with lower scores indicating more severe instability.

Although the number of available PROMs has increased dramatically over the past decades, most of these instruments were developed for English-speaking patients. To be used in populations with a different language and culture, PROMs require a specific methodology that aims at adequate linguistic translation and, more importantly, at cultural adaptation that maintains the content validity of the instrument across cultures. [6] The CAIT has already been cross-culturally adapted into several languages. [7–13] The CAIT uses lay terminology in simple English sentences, making it feasible to translate into modern standard Arabic (MSA) for a lay population. Regional Arabic dialects often create barriers to clear communication, whereas MSA is widely used and understood in the Middle East and North Africa (MENA) region. Accordingly, a cross-culturally adapted questionnaire understood across most of the Arabic-speaking world would yield the most practical and usable tool. Therefore, the main objectives of this study were: i) to cross-culturally adapt the CAIT for a wide spectrum of Arabic-speaking athletes with CAI, and ii) to evaluate its psychometric properties.


The cross-cultural adaptation process adhered to published guidelines. [6, 14, 15] The assessment of the CAIT’s psychometric properties followed quality criteria for the evaluation of health status questionnaires [16] and the Consensus-based Standards for the selection of health status Measurement Instruments (COSMIN) recommendations. [17]

This study was conducted in our institutional rehabilitation department from May 2017 to March 2018. Ethics approval was obtained from the Institutional Review Board (Anti-Doping Lab Qatar, SCH-ADL-A-070) and all participants gave written informed consent.

Translation and cross-cultural adaptation

The CAIT [5] was developed for English-speaking patients and translated into MSA. The process followed adapted steps from published recommendations [6, 14, 15] (Table 1) to ensure uniformity between source and target language.

Table 1. The steps of translation and cross-cultural adaptation of the CAIT questionnaire for Arabic-speaking patients.

Sample size calculation and participants

Sample size calculation was based on the intraclass correlation coefficient (ICC) and the maximum width of the 95% confidence intervals (95%CI) from previous cross-cultural adaptation publications. [7–13] The formula used to calculate the sample size [18] was n = 16p(1−p)/w², where p was the lowest expected ICC (0.826) and w was the maximum reported width (0.156) of the 95%CI. The minimum required sample size was 95 individuals, but we recruited a larger sample of professional athletes to ensure stability of the variance-covariance matrix in the dimensionality analysis [16] and to account for non-attendance at rehabilitation sessions or retest occasions, ensuring that reproducibility testing would be done in “stable” patients/individuals. [16] Four groups of participants who had not taken part in the translation and cross-cultural adaptation process were included in the study: 60 patients with CAI (n = 30) or LAS (n = 30), 47 asymptomatic basketball and football players “at risk” for ankle sprain (healthy group), and 60 patients with lower limb injuries other than ankle injuries or instability (i.e. muscle injuries, meniscal tears, knee sprains) to evaluate interpretability of the tool (Table 2). The participants were recruited through direct contact during their physiotherapy or training sessions. Participants had to be ≥18 years of age, speak Arabic as a first language, participate in at least 7 hours of physical activity per week, and be willing to give informed consent. Standard inclusion criteria, described in detail elsewhere, [19] were used for the CAI group (history of at least one significant ankle sprain, recurrent sprain, “feelings of instability and/or giving way”, CAIT≤24). Participants in the LAS group had a recent (≤4 days) lateral ankle sprain, while at-risk healthy individuals had no ankle injury in the last year and no instability in their lifetime.
General exclusion criteria were any other lower limb injury and previous ankle fracture or surgery. The remaining exclusion criteria, concerning the presence and type of injury, depended on the respective group allocation.
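
As a quick check of the sample size arithmetic described above, the formula n = 16p(1−p)/w² can be evaluated with the reported values of p and w. This is only an illustrative sketch; the function name is hypothetical, and rounding up to the next whole participant is an assumption about how the result was rounded.

```python
import math


def sample_size_for_icc(p: float, w: float) -> int:
    """Evaluate n = 16 * p * (1 - p) / w**2 and round up.

    p: lowest expected ICC; w: maximum width of the 95% CI.
    Rounding up to a whole participant is an assumption.
    """
    return math.ceil(16 * p * (1 - p) / w ** 2)


# Values reported in the text: p = 0.826, w = 0.156
n_required = sample_size_for_icc(p=0.826, w=0.156)
print(n_required)  # 95, matching the minimum sample size stated in the text
```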

Table 2. Descriptive characteristics of the participants in the study.


The CAIT-Arab (S1 File) was administered to the athletes (n = 107; the group with injuries other than ankle injuries was excluded) and completed twice within 4 to 5 days in the presence of one of the investigators to standardize the procedure. On completion of the questionnaire, if the investigator identified an item missed through oversight, the patient was asked to respond to it. Based on previously published methodology, [20, 21] we only included participants who self-rated their condition as unchanged at the second administration. Finally, the CAIT was administered a third time to both CAI and LAS groups by the same investigator, following a 6-week rehabilitation programme (S2 File).

Validity testing

Face validity [22–24] was assessed: a) in 3 reported steps of the translation and adaptation process, b) by the individuals who appraised the extent to which the instrument assessed their condition during pre-testing of the CAIT, and c) formally during the content analysis procedure (see content validity).

Content validity [22, 24] was tested individually for each item of the CAIT-Arab through a structured content analytic method. [25] The items were distributed to 7 judges (Arabic-speaking patients representing the MENA region population) during the fifth step of the translation/adaptation process and to another 6 judges (2 sport physicians, 2 physiotherapists, and 2 professional athletes; all holding higher degrees in relevant areas) after the pre-testing process. The 13 judges matched each of the 20 items based on their content to a five-point Likert scale (1 = poor, 2 = fair, 3 = good, 4 = very good, or 5 = excellent match). Finally, content validity was also assessed by the 15 patients who participated in the pre-testing phase of the CAIT-Arab.

No criterion scale exists for functional CAI; hence we evaluated convergent validity [22–24] using the Lower Extremity Functional Scale [26] (LEFS-MSAr), expecting a moderate correlation (ρ = 0.50) with the CAIT-Arab based on the data and methodology of the scale development publication. [5] Patients with an ankle injury or instability (n = 60) were asked to complete the LEFS-MSAr during the initial assessment, before the administration of the CAIT-Arab.

For criterion validity we used discriminant validity [22, 24] in order to evaluate whether CAIT could discriminate between individuals with and without functional ankle instability. History of ankle sprain was used as the discriminative measure [5] and we hypothesized that individuals without an ankle sprain would score close to the maximum possible score of the scale, while patients with CAI would score lower.

Structural validity [22, 24] of the CAIT-Arab was tested by exploratory factor analysis (EFA), given that the unidimensionality of the tool has not been confirmed in previous studies. [5, 8, 10, 11] The CAIT has been suggested to be a unidimensional scale; [5] however, evaluation in other cross-cultural adaptations revealed 2-factor [10, 11] or 3-factor solutions. [8]

Construct validity was also evaluated by known-groups validity using the contrasted-groups approach. [22, 24] We hypothesized that individuals at risk for an ankle injury would score significantly higher on the CAIT than the CAI and LAS groups and the group with injuries other than ankle injuries.

Reliability testing

Inter-item reliability [22, 24] was assessed by using Cronbach’s alpha coefficient (α) (coefficient α was planned to be calculated individually for all possible sub-scales). [24]
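
For readers unfamiliar with the coefficient, a minimal sketch of Cronbach's α follows, using the standard formula α = k/(k−1) × (1 − Σ item variances / variance of total scores). The study itself used SPSS; the function name and the response matrix below are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix.

    responses: list of rows, one row per respondent, one value per item.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(responses[0])  # number of items
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses (4 respondents x 3 items), for illustration only
demo_responses = [[0, 1, 1], [2, 2, 3], [3, 3, 3], [4, 4, 5]]
alpha = cronbach_alpha(demo_responses)
print(round(alpha, 2))
```

When every item ranks respondents identically, the item variances sum to a small share of the total-score variance and α approaches 1.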

Test-retest reliability [16, 22–24] was assessed to evaluate temporal stability. The CAIT-Arab was administered twice (range 4–5 days); the interval between administrations was long enough to ensure that participants did not recall their original responses, but short enough to ensure clinical stability of the condition.

Utility evaluation

To review the acceptability and the ease of administration of the CAIT-Arab we recorded the percentage of unanswered questions and the time spent by the participants filling it out. [22, 23]

Ceiling and floor effects

The CAIT-Arab would be considered to have ceiling and floor effects [16] if more than 15% of the patients scored the maximum and minimum possible score respectively. Relative to each item of the questionnaire ceiling and floor effects were considered to have occurred if at least 75% of the patients scored the maximum or minimum score to that item, respectively.
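
The total-score rule above (more than 15% of patients at the maximum or minimum possible score) can be sketched as follows. The function name and the scores are hypothetical; the 0 to 30 range is the CAIT total score range given in the introduction.

```python
def ceiling_floor_effects(scores, min_score=0, max_score=30, threshold=0.15):
    """Flag floor/ceiling effects when MORE than `threshold` of respondents
    obtain the minimum/maximum possible total score."""
    n = len(scores)
    floor_share = sum(s == min_score for s in scores) / n
    ceiling_share = sum(s == max_score for s in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical total scores for 10 respondents, not study data
demo_scores = [30, 30, 28, 25, 22, 20, 18, 15, 12, 9]
print(ceiling_floor_effects(demo_scores))
```

The item-level rule in the text uses a different threshold (75%) and would be applied to each item's own minimum and maximum.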


Responsiveness [16, 27, 28] reveals the ability of a questionnaire to detect clinically important changes over time. We used distribution-based methods for responsiveness assessment. The CAIT-Arab was administered to both CAI and LAS groups on two occasions 6 weeks apart. We hypothesized that clinically meaningful differences should be displayed in scores obtained by these patients as a result of rehabilitation.


Interpretability [16, 24] is the degree to which one can assign qualitative meaning to a self-rated outcome measure’s quantitative scores or change in scores. We assessed interpretability of CAIT-Arab scores: a) by presenting normative scores of individuals at risk for an ankle injury, and b) by comparing the scores of CAI and LAS patients between two time points following treatment of known efficacy. We hypothesized that CAIT-Arab scores would increase following rehabilitation, and we expected at least a 3-point increase for a true difference, according to the previously reported minimal clinically important change of the scale. [29]

Statistical analyses

Statistical analyses were performed using SPSS v19.0. We used non-parametric tests because CAIT scores are on an ordinal scale. The level of significance was set at p<0.05. Descriptive statistics were used to calculate the characteristics of the participants, the scores of the CAIT-Arab and LEFS-MSAr questionnaires, the mean scores for each item for comparability of language and similarity of interpretability, acceptability, and the ceiling and floor effects. Missing values were excluded listwise. The group with injuries other than ankle injuries was used only for known-groups and discriminant validity testing.

Validity testing.

For item-content relevance analyses the judges’ ratings were evaluated based on the validation procedure of Aiken’s item-content validity coefficient (V). [30] The V statistic summarizes the judges’ ratings of an item’s content match with its construct and allows their statistical significance to be tested; its values range from 0 to 1 (1 = perfect agreement). The values were then compared against a right-tailed binomial probability table provided by Aiken [30] (V scores >0.70 were considered as having acceptable validity, p<0.01). Convergent validity was assessed with Spearman’s rho (ρ) between the scores obtained from the CAIT-Arab and LEFS questionnaires. [26]
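
To illustrate how Aiken's V is obtained for a single item, the sketch below uses the usual definition V = Σ(rᵢ − lo) / (n(c − 1)), where rᵢ are the judges' ratings, lo is the lowest scale category, c is the number of categories and n is the number of judges. The ratings shown are hypothetical; the significance lookup against Aiken's table is not reproduced here.

```python
def aiken_v(ratings, lowest=1, categories=5):
    """Aiken's item-content validity coefficient for one item.

    ratings: one rating per judge on a scale lowest..lowest+categories-1.
    Returns a value between 0 (all lowest) and 1 (all highest).
    """
    n = len(ratings)
    return sum(r - lowest for r in ratings) / (n * (categories - 1))


# Hypothetical ratings from 13 judges on the 5-point match scale
demo_ratings = [5, 5, 4, 5, 4, 5, 5, 4, 5, 5, 4, 5, 5]
v = aiken_v(demo_ratings)
print(round(v, 2))  # 0.92 for these illustrative ratings
```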

To explore the factorial validity of CAIT-Arab an EFA (principal axis factoring) with varimax rotation was used. Eigenvalues over 1 were chosen and extracted, and items loading more than 0.40 were regarded as loading on a specific factor. Items loading more than 0.40 on 2 factors were assigned to the factor with a higher correlation. [31] The group with other than ankle injuries was excluded from dimensionality assessment.

Discriminant validity (cut-off score of the CAIT) was assessed with a receiver operating characteristic (ROC) curve and the area under the curve (AUC). The point closest to the upper left corner of the diagram represents the optimal cut-off score, which most effectively discriminates between patients with ankle instability and those without. [32–34] Additionally, the AUC reflects the probability of correctly discriminating between unstable and stable ankles. This area varies from 0.5 (the questionnaire does not discriminate more effectively than chance) to 1.0 (perfect discrimination). [33, 34]

Known-groups differences were calculated using the Kruskal-Wallis test. For post hoc comparisons we used the Mann-Whitney U-test with Bonferroni correction for multiple testing, with the number of comparisons given by the formula k(k−1)/2, where k is the number of groups (adjusted significance level p = 0.0083). The Wilcoxon test was used for within-group differences between administrations.
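
The adjusted significance level quoted above follows directly from the number of pairwise comparisons. A short sketch, assuming the four participant groups described in the Methods and a nominal level of 0.05 (the function name is illustrative):

```python
from math import comb


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Divide alpha by the number of pairwise comparisons, k(k - 1)/2."""
    return alpha / comb(k, 2)


# Four groups (CAI, LAS, at-risk healthy, other lower limb injuries)
print(comb(4, 2))                           # 6 pairwise comparisons
print(round(bonferroni_alpha(0.05, 4), 4))  # 0.0083
```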

Reproducibility testing.

The internal consistency of the CAIT’s sub-scales was assessed using Cronbach’s α. Values of ≥0.70 have been proposed as a measure of good internal consistency. [16] Reproducibility was evaluated using both Spearman’s rho and the 2-way random effects model intraclass correlation coefficient, type agreement (ICC(2,1)), because systematic differences are considered to be part of the measurement error. [16, 35] As a measure of agreement, the absolute measurement error was expressed as the standard error of measurement (SEM_agreement = SD × √(1 − ICC)), including the systematic differences in order to distinguish them from real changes, e.g., due to treatment or natural history. [16] In addition, the minimal detectable change (MDC95 = 1.96 × √2 × SEM) was calculated, which corresponds to the minimal within-person change in score that, with p<0.05, can be interpreted as a real change above measurement error. [16, 36] Bland-Altman methods were used to indicate absolute agreement for test–retest measurements, including a scatter plot of differences between administrations with 95% limits of agreement (mean change in scores of repeated administrations). [37]
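
The two agreement formulas in this paragraph chain together: the SEM is derived from the SD and the ICC, and the MDC95 is derived from the SEM. The sketch below shows that chain with hypothetical inputs (SD = 5.0, ICC = 0.90), which are not values from this study.

```python
import math


def sem_agreement(sd: float, icc: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at the 95% level: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem


# Hypothetical inputs for illustration only
sem = sem_agreement(sd=5.0, icc=0.90)
print(round(sem, 2), round(mdc95(sem), 2))
```

Because the MDC95 scales linearly with the SEM, a lower ICC or a more heterogeneous sample (larger SD) both inflate the change score needed to exceed measurement error.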


There is still no consensus on the most suitable statistical analysis to assess responsiveness. [16, 28, 33, 38] The Wilcoxon test, using scores separated by 6 weeks was conducted to evaluate longitudinal validity (data also used for interpretability). Also, effect size (ES) by using both baseline and pooled standard deviation (SD) (for the purpose of interpretability) and standardised response mean (SRM) were calculated [28] and interpreted according to published recommendations (values of 0.20, 0.50, and 0.80 or greater represent small, moderate and large responsiveness, respectively). [39]
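
The three distribution-based indices mentioned above differ only in the denominator applied to the mean change. The sketch below assumes the common definitions: Glass's Δ divides by the baseline SD, Cohen's d by a pooled SD (here the root mean of the two variances, an assumption that suits equal group sizes), and the SRM by the SD of the individual change scores. The paired scores are hypothetical and will not reproduce the values reported in this study.

```python
from math import sqrt
from statistics import mean, stdev


def responsiveness_indices(pre, post):
    """Distribution-based responsiveness indices for paired scores."""
    change = [b - a for a, b in zip(pre, post)]
    mean_change = mean(post) - mean(pre)
    sd_pre, sd_post = stdev(pre), stdev(post)
    pooled_sd = sqrt((sd_pre ** 2 + sd_post ** 2) / 2)  # assumed pooling rule
    return {
        "glass_delta": mean_change / sd_pre,   # baseline SD
        "cohen_d": mean_change / pooled_sd,    # pooled SD
        "srm": mean(change) / stdev(change),   # SD of change scores
    }


# Hypothetical pre- and post-rehabilitation scores for 5 patients
pre_scores = [10, 14, 16, 18, 12]
post_scores = [20, 24, 25, 27, 22]
print(responsiveness_indices(pre_scores, post_scores))
```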


An overview of the measurement properties of the CAIT from the present study and from all studies assessing its psychometric properties [5, 7–13] is presented in Table 3.

Table 3. Summary of measurement properties of all adapted versions of CAIT questionnaire.

Translation and cross-cultural adaptation

Minor linguistic discrepancies were easily resolved through the collaboration of content experts and native Arabic speakers of the committees during the consensus meetings. In item 5, the sentence “on the ball of my foot” could not be translated into Arabic and as a result the decision was made to partially abandon a literary description in favour of an image that clearly demonstrates the area referred to in the question.

Validity testing

The face validity of the CAIT-Arab was appraised as excellent by participants at pre-testing, patients with CAI and LAS, members of the expert committees, authors, and judges at item-content relevance testing. Content validity was assessed through a structured content analytic method [25] and regarded as being well addressed by all 13 judges and 15 patients at pre-testing. All 9 items presented V values ranging from 0.83 to 0.98 (p<0.01).

A moderate significant correlation for convergent validity evaluation was found between LEFS-MSAr and CAIT-Arab (rho = 0.67, p<0.001) with a clear ceiling effect of the LEFS, as expected. [5]

The ROC curve showed a distinct discrimination score (≤23 points) that can identify patients with an ankle sprain (AUC = 0.869, p<0.001). The maximum Youden’s index related to this cut-off score was 0.752, with a sensitivity of 100%, a specificity of 75.2%, a positive likelihood ratio of 4.0 and a negative likelihood ratio of 0.
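
The reported Youden's index and likelihood ratios can be recovered from the sensitivity and specificity alone, using the standard definitions J = sensitivity + specificity − 1, LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The function name below is illustrative.

```python
def diagnostic_summary(sensitivity: float, specificity: float):
    """Return Youden's index, LR+ and LR- from sensitivity and specificity."""
    youden = sensitivity + specificity - 1
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return youden, lr_positive, lr_negative


# Values reported in the text: sensitivity 100%, specificity 75.2%
youden, lr_pos, lr_neg = diagnostic_summary(1.0, 0.752)
print(round(youden, 3), round(lr_pos, 1), lr_neg)  # 0.752 4.0 0.0
```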

Factorial validity testing extracted one factor (Fig 1) with an eigenvalue over Kaiser’s criterion of 1, explaining 63.79% of the total variance (Table 4) (KMO = 0.896; Bartlett’s sphericity test: χ²(36) = 811.613, p<0.001).

Fig 1. Exploratory factor analysis.

Scree plot for CAIT-Arab dimensionality assessment depicting 1-factor solution.

Table 4. Exploratory factor analysis with varimax rotation suggesting 1-factor solution for CAIT-Arab.

In terms of normative values for the CAIT-Arab, Kruskal-Wallis tests revealed significant between-group differences (p<0.001) in scores at the first and second administration. Wilcoxon tests found no within-group differences between the two administrations (p>0.05). No significant differences were found in CAIT scores between CAI and LAS patients (p = 0.310). Both CAI and LAS group patients scored significantly lower (p<0.017) than both athletes at risk and patients with other lower limb injuries (Table 5).

Table 5. Total scores for the CAIT-Arab questionnaire at both administrations.

Reliability testing

Reliability results are presented in Table 3. The Cronbach alpha for internal consistency was 0.92, and the Cronbach alpha if item deleted (for each item) varied from 0.91 to 0.92.

The Bland-Altman plot (Fig 2) showed no systematic bias; the mean differences were plotted around the zero line and within the limits of agreement (3.7 to -4.0) with a few outliers.

Fig 2. Bland-Altman plot.

A Bland-Altman plot visualizing the agreement for test-retest with the limits marked as mean difference ±SD in a 30-point scale.

Utility evaluation

The CAIT-Arab revealed a maximum response rate, which might be affected by the presence of one of the investigators. The completion of the questionnaire required 2 to 3 minutes revealing the ease of administration.

Ceiling and floor effects

No ceiling effect (0%) or floor effect (0%) was found for the CAIT-Arab total score at the first or second administration. Moreover, at the first administration most individual items of the scale were not scored at their maximum or minimum by more than 75% of the patients (floor range 6.7–33.3%, ceiling range 3.3–36.7%), except item 9, for which 80% of the participants scored the minimum. At the second assessment no individual item presented a floor or ceiling effect in more than 75% of the patients (floor range 6.7–66.7%, ceiling range 3.3–36.7%).


The Wilcoxon test revealed statistically significant changes of the CAIT-Arab from first (Median = 16.0) to second (Median = 24.5) administration for this group (p<0.001) representing large effect sizes (Table 3).

Based on a change score equal to or larger than the MDC95 (6 points) at the final administration of the LEFS-Arab, 80.0% of the patients were rated as improved, while 20% showed no change.


Normative scores of patients with ankle injury and of individuals at risk for an ankle injury are presented in Table 5. Statistically significant changes over time [pre-treatment (mean±SD = 14.5±5.7; median = 16.0) and post-treatment (mean±SD = 23.8±4.4; median = 24.5)] were found for CAIT-Arab (p<0.0001) with a large effect size (2.03).


The CAIT-Arab is a brief, valid and reliable outcome measure, available to be used across the MENA region in Arabic-speaking patients with CAI. As a psychometrically robust tool, it can be used to identify and assess the severity of CAI, as well as to evaluate outcomes for clinical and research purposes.

Translation and cross-cultural adaptation

There are at least 12 major sets of guidelines available for questionnaire translation, [15] and to our knowledge there is no consensus on a set of rigid procedures for translation and cross-cultural adaptation. We implemented a rigorous adaptation process by following a mixed methodology from published guidelines [6, 14, 15] and including content experts, bilingual and bicultural committee members, and native Arabic speakers. We also introduced a new step into the translation validation process, using a formal evaluation of comparability of language and similarity of interpretability [14] and testing for cognitive equivalence [15] with native Arabic-speaking patients representing the Arab population (Qatar, Jordan, Syria, Lebanon, Morocco, Tunisia, and Egypt). The process captured the differences in Arabic dialects and resulted in a widely comprehensible and practical tool for use across the MENA region. Finally, a problem with a phrase that could not be translated into Arabic was resolved by abandoning a literary description in favour of an image, a methodology previously applied in cross-cultural research. [20, 21]

Validity testing

As hypothesised, the CAIT-Arab demonstrated good translational and construct validity. Evaluation of content validity by a structured analytic method [25] added to the psychometric properties of the CAIT; to our knowledge this is reported for the first time, since most previous studies used floor and ceiling effects to examine this form of validity. [7, 9, 11, 13] The results also confirmed our hypothesis regarding the convergent validity of the CAIT-Arab with the LEFS, [26] with scores presenting a moderate correlation as previously reported, probably because of a ceiling effect of the LEFS. [5] The ceiling effect of the LEFS has been attributed to its insufficient sensitivity to identify problems related to CAI, as most of the tasks included in the scale are not sufficiently challenging for the lower leg. [5] Nonetheless, the CAIT was only moderately correlated with the SF-36, [8, 10] FAAM, [11] FAOS, [13] and Karlsson score [12] in previous studies, which supports the administration of condition-specific outcome measures in the clinical evaluation of CAI.

Guidelines suggest confirmatory factor analysis for the assessment of a PROM’s structural validity when a theoretical model exists or the factor structure has been determined previously. [16] However, given that previous analyses revealed both a 2-factor [10, 11] and a 3-factor solution [8] for the CAIT, even though it has been suggested to be a unidimensional scale, [5] we decided to re-explore its structure using EFA. Internal consistency is an important measurement property for questionnaires that intend to measure a single underlying construct by using multiple items, [16] but it can be affected by the sample’s configuration. In the development publication of the CAIT [5] the unidimensionality of the scale was not confirmed, and this was attributed to the lack of homogeneity in the sample used in the Rasch analysis. The EFA revealed that the CAIT-Arab was a unidimensional construct for functional CAI, which can be partially explained by the recruitment of a relatively homogeneous athletic population.

Hiller et al. [5] established a cut-off score of ≤27 as indicative of CAI, which differs from the 23 points calculated in the present study. It has been reasonably argued that with this cut-off value a patient could be classified as having an unstable ankle based only on a low score on the first question. [13] Additionally, in other studies evaluating the CAIT’s psychometric properties this value was lower and ranged from 11 to 25 points. [12, 13, 40] A plausible explanation is that some studies [5, 12] used a history of ankle sprain alone to define group membership (CAI or not) when calculating the cut-off score, instead of using self-reported ankle instability. [13, 40] Recently, the International Ankle Consortium [4] recommended that a cut-off score of ≤24 should be used in CAI diagnostic criteria, a score consistent with the present report and confirmed by another relevant study. [40]

Reliability testing

Excellent reliability was demonstrated for all participants (ICC = 0.97), in accordance with all previous studies (Table 3). The MDC reflects the smallest within-patient change in score that can be interpreted as a true change (i.e. because of treatment) beyond the measurement error. [16] The MDC of the CAIT-Arab score at the individual level was 6 points, comparable to the MDCs reported (4.8 to 6.5) in studies that used similar methodology. [10, 11] It must be noted that studies that used subjectively experienced functional ankle instability to group and analyse participants’ data reported lower MDC values (2.28–3.08). [13, 29]


In the present adaptation we used only distribution-based methods to assess responsiveness, and we acknowledge this as a limitation of the study. [16] Clinically meaningful score differences with large effect sizes and an SRM of 1.73 were displayed, reflecting the ability of the CAIT-Arab to detect changes over time. The ES reported here (Glass’s Δ = 2.03 and Cohen’s d = 2.22) is greater than that in other studies, which ranged from 0.70 to 1.07. [7–9] The 6-week interval between the two evaluations in the present study can probably explain the size of the difference from the other studies, in which treatment lasted 3–4 weeks. From a clinical perspective, a treatment’s effect size is strongly affected by the interval between test and retest and must be interpreted with caution.


We acknowledge that the sample of patients with CAI was relatively small and exclusively male. The extent to which our results can be generalized to female or non-professional athletes is unknown. Also, our methodology considered only classical test theory; given the inconsistency in the available literature regarding the unidimensionality of the scale, a rigorous Rasch analysis is much needed to re-examine in detail the internal structure of the CAIT. Finally, we acknowledge as a limitation of the study the use of distribution-based methods to assess responsiveness of the CAIT. These methods are considered measures to interpret changes in the condition, or to interpret the magnitude of the intervention, rather than measures of the quality of the instrument or the validity of the change score; therefore, future studies should use anchor-based methods to evaluate responsiveness of the CAIT.


The cross-cultural adaptation of the CAIT for Arabic-speaking patients with CAI proved valid, reliable and responsive, and it can be used for clinical and research purposes. In addition, to avoid barriers from regional Arabic dialects, we cross-culturally adapted the CAIT into modern standard Arabic, resulting in a widely comprehensible and practical tool for use across the Middle East-North Africa region.

Supporting information

S1 File. Modern standard Arabic version of Cumberland Ankle Instability Tool (CAIT).


S2 File. Acute first time and recurrent lateral ankle sprain rehabilitation guidelines.



The authors would like to thank Noor Al Marri, Jaleleddine Ahmed, Omar Al Sayrafi and Abdalla Itani for their valuable contribution in the cross-cultural adaptation process.


  1. Attenborough AS, Hiller CE, Smith RM, Stuelcken M, Greene A, Sinclair PJ. Chronic ankle instability in sporting populations. Sports Med. 2014;44(11):1545–56. pmid:24981244
  2. van Rijn RM, van Os AG, Bernsen RMD, Luijsterburg PA, Koes BW, Bierma-Zeinstra SMA. What Is the Clinical Course of Acute Ankle Sprains? A Systematic Literature Review. Am J Med. 2008;121(4):324–31. pmid:18374692
  3. Thompson C, Schabrun S, Romero R, Bialocerkowski A, van Dieen J, Marshall P. Factors Contributing to Chronic Ankle Instability: A Systematic Review and Meta-Analysis of Systematic Reviews. Sports Med. 2018;48(1):189–205. pmid:28887759
  4. Gribble PA, Delahunt E, Bleakley C, Caulfield B, Docherty C, Fourchet F, et al. Selection criteria for patients with chronic ankle instability in controlled research: a position statement of the International Ankle Consortium. Br J Sports Med. 2014;48(13):1014. pmid:24255768
  5. Hiller CE, Refshauge KM, Bundy AC, Herbert RD, Kilbreath SL. The Cumberland Ankle Instability Tool: A Report of Validity and Reliability Testing. Arch Phys Med Rehabil. 2006;87(9):1235–41. pmid:16935061
  6. Beaton DE, Bombardier C, Guillemin F, Ferraz MB. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine. 2000;25(24):3186–91. pmid:11124735
  7. De Noronha M, Refshauge KM, Kilbreath SL, Figueiredo VG. Cross-cultural adaptation of the Brazilian-Portuguese version of the Cumberland Ankle Instability Tool (CAIT). Disabil Rehabil. 2008;30(26):1959–65. pmid:18720109
  8. Cruz-Diaz D, Hita-Contreras F, Lomas-Vega R, Osuna-Perez MC, Martinez-Amat A. Cross-cultural adaptation and validation of the Spanish version of the Cumberland Ankle Instability Tool (CAIT): an instrument to assess unilateral chronic ankle instability. Clin Rheumatol. 2013;32(1):91–8. pmid:23053687
  9. Rodriguez-Fernandez AL, Rebollo-Roldan J, Jimenez-Rejano JJ, Gueita-Rodriguez J. Psychometric properties of the Spanish version of the Cumberland Ankle Instability Tool. Disabil Rehabil. 2015;37(20):1888–94. pmid:26292174
  10. Ko J, Rosen AB, Brown CN. Cross-cultural adaptation and validation of the Korean version of the Cumberland ankle instability tool. Int J Sports Phys Ther. 2015;10(7):1007–14. pmid:26676061
  11. Hadadi M, Ebrahimi Takamjani I, Ebrahim Mosavi M, Aminian G, Fardipour S, Abbasi F. Cross-cultural adaptation, reliability, and validity of the Persian version of the Cumberland Ankle Instability Tool. Disabil Rehabil. 2017;39(16):1644–9. pmid:27416105
  12. 12. Kunugi S, Masunari A, Noh B, Mori T, Yoshida N, Miyakawa S. Cross-cultural adaptation, reliability, and validity of the Japanese version of the Cumberland ankle instability tool. Disabil Rehabil. 2017;39(1):50–8. pmid:26878332
  13. 13. Vuurberg G, Kluit L, van Dijk CN. The Cumberland Ankle Instability Tool (CAIT) in the Dutch population with and without complaints of ankle instability. Knee Surg Sports Traumatol Arthrosc. 2018;26(3):882–91. pmid:27714439
  14. 14. Sperber AD. Translation and Validation of Study Instruments for Cross-Cultural Research. Gastroenterol. 2004;126(1):S124–S8.
  15. 15. Wild D, Grove A, Martin M, Eremenco S, McElroy S, Verjee-Lorenz A, et al. Principles of Good Practice for the Translation and Cultural Adaptation Process for Patient-Reported Outcomes (PRO) Measures: report of the ISPOR Task Force for Translation and Cultural Adaptation. Value Health. 2005;8(2):94–104. pmid:15804318
  16. 16. Terwee CB, Bot SDM, de Boer MR, van der Windt DAWM, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42. pmid:17161752
  17. 17. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: An international Delphi study. Qual Life Res. 2010;19(4):539–49. pmid:20169472
  18. 18. Stratford PWS, G.F. Sample size estimation for the comparison of competing measures' reliability coefficients. Physiother Can. 2003;55:225–9.
  19. 19. Gribble PA, Delahunt E, Bleakley C, Caulfield B, Docherty C, Fourchet F, et al. Selection Criteria for Patients With Chronic Ankle Instability in Controlled Research: A Position Statement of the International Ankle Consortium. J Orthop Sports Phys Ther. 2013;43(8):585–91. pmid:23902805
  20. 20. Korakakis V, Malliaropoulos N, Baliotis K, Papadopoulou S, Padhiar N, Nauck T, et al. Cross-cultural Adaptation and Validation of the Exercise-Induced Leg Pain Questionnaire for English- and Greek-Speaking Individuals. J Orthop Sports Phys Ther. 2015;45(6):485–96. pmid:25927499
  21. 21. Malliaropoulos N, Korakakis V, Christodoulou D, Padhiar N, Pyne D, Giakas G, et al. Development and validation of a questionnaire (FASH—Functional Assessment Scale for Acute Hamstring Injuries): to measure the severity and impact of symptoms on function and sports ability in patients with acute hamstring injuries. Br J Sports Med. 2014;48(22):1607–12. pmid:25287515
  22. 22. Devon HA, Block ME, Moyle-Wright P, Ernst DM, Hayden SJ, Lazzara DJ, et al. A psychometric toolbox for testing validity and reliability. J Nurs Scholarsh. 2007;39(2):155–64. pmid:17535316
  23. 23. Bannigan K, Watson R. Reliability and validity in a nutshell. J Clin Nurs. 2009;18(23):3237–43. pmid:19930083
  24. 24. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63(7):737–45. pmid:20494804
  25. 25. Dunn JGH, Bouffard M, Rogers WT. Assessing item content-relevance in sport psychology scale-construction research: Issues and recommendations. Meas Phys Educ Exerc Sci. 1999;3(1):15–36.
  26. 26. Binkley JM, Stratford PW, Lott SA, Riddle DL. The Lower Extremity Functional Scale (LEFS): Scale development, measurement properties, and clinical application. Phys Ther. 1999;79(4):371–83. pmid:10201543
  27. 27. Guyatt GH, Deyo RA, Charlson M, Levine MN, Mitchell A. Responsiveness and validity in health status measurement: a clarification. J Clin Epidemiol. 1989;42(5):403–8. pmid:2659745
  28. 28. Terwee CB, Dekker FW, Wiersinga WM, Prummel MF, Bossuyt PM. On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Qual Life Res. 2003;12(4):349–62. pmid:12797708
  29. 29. Wright CJ, Linens SW, Cain MS. Establishing the Minimal Clinical Important Difference and Minimal Detectable Change for the Cumberland Ankle Instability Tool. Arch Phys Med Rehabil. 2017;98(9):1806–11. pmid:28137476
  30. 30. Aiken LR. Three Coefficients for Analyzing the Reliability and Validity of Ratings. Educ Psychol Meas. 1985;45(1):131–42.
  31. 31. Bortz J. Statistik für Sozialwissenschaftler. 4th ed: Springer Verlag; 1999.
  32. 32. Beaton DE, Boers M, Wells GA. Many faces of the minimal clinically important difference (MCID): a literature review and directions for future research. Curr Opin Rheumatol. 2002;14(2):109–14. pmid:11845014
  33. 33. de Vet HC, Bouter LM, Bezemer PD, Beurskens AJ. Reproducibility and responsiveness of evaluative outcome measures. Theoretical considerations illustrated by an empirical example. Int J Technol Assess Health Care. 2001; 17(4):479–87. pmid:11758292
  34. 34. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiol. 1982;143(1):29–36. pmid:7063747
  35. 35. McGraw KO, Wong SP. Forming Inferences about Some Intraclass Correlation Coefficients. Psychol Methods. 1996;1(1):30–46.
  36. 36. Beckerman H, Roebroeck ME, Lankhorst GJ, Becher JG, Bezemer PD, Verbeek ALM. Smallest real difference, a link between reproducibility and responsiveness. Qual Life Res. 2001;10(7):571–8. pmid:11822790
  37. 37. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307–10. pmid:2868172
  38. 38. Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis. 1987;40(2):171–8. pmid:3818871
  39. 39. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd Ed. ed. New Jersey: Lawrence Erlbaum Associates; 1988.
  40. 40. Wright CJ, Arnold BL, Ross SE, Linens SW. Recalibration and validation of the Cumberland Ankle Instability Tool cutoff score for individuals with chronic ankle instability. Arch Phys Med Rehabil. 2014;95(10):1853–9. pmid:24814563