Validation of a Polish version of the National Institutes of Health Stroke Scale: Do moderate psychometric properties affect its clinical utility?

  • Adam Wiśniewski,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Neurology, Laboratory for Experimental Biotechnology, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, Bydgoszcz, Poland

  • Karolina Filipska,

    Roles Formal analysis, Investigation, Writing – review & editing

    Affiliation Department of Neurological and Neurosurgical Nursing, Laboratory for Experimental Biotechnology, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, Bydgoszcz, Poland

  • Marlena Puchowska,

    Roles Data curation, Methodology

    Affiliation Department of Neurology, Laboratory for Experimental Biotechnology, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, Bydgoszcz, Poland

  • Katarzyna Piec,

    Roles Data curation, Investigation, Resources

    Affiliation Department of Neurology, Laboratory for Experimental Biotechnology, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, Bydgoszcz, Poland

  • Filip Jaskólski,

    Roles Data curation, Formal analysis, Methodology

    Affiliation Department of Neurology, Laboratory for Experimental Biotechnology, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, Bydgoszcz, Poland

  • Robert Ślusarz

    Roles Supervision, Visualization, Writing – review & editing

    Affiliation Department of Neurological and Neurosurgical Nursing, Laboratory for Experimental Biotechnology, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, Bydgoszcz, Poland


9 Jun 2022: Wiśniewski A, Filipska K, Puchowska M, Piec K, Jaskólski F, et al. (2022) Correction: Validation of a Polish version of the National Institutes of Health Stroke Scale: Do moderate psychometric properties affect its clinical utility?. PLOS ONE 17(6): e0270016.



The National Institutes of Health Stroke Scale (NIHSS) is a validated tool for assessing the severity of stroke. It has been adapted into several languages; however, a Polish version with large-scale psychometric validation, including repeatability and separate assessments of anterior and posterior stroke, has not been developed. We aimed to adapt and validate a Polish version of the NIHSS (PL-NIHSS) while focusing on the psychometric properties and site of stroke.


The study included 225 patients with ischemic stroke (102 anterior and 123 posterior circulation stroke). Four NIHSS-certified researchers estimated stroke severity using the most appropriate scales to assess the psychometric properties (including internal consistency, homogeneity, scalability, and discriminatory power of individual items) and ultimately determine the reliability, repeatability, and validity of the PL-NIHSS.


The PL-NIHSS achieved a Cronbach’s alpha coefficient of 0.6885, which indicates moderate internal consistency and homogeneity. Slightly more than half of the individual items provided sufficient discriminatory power (r > 0.3). A favorable coefficient of repeatability (0.6267; 95% confidence interval: 0.5737–0.6904), narrow limits of inter-rater agreement, and excellent intraclass correlation coefficients and weighted kappa values (> 0.90) demonstrated high reliability of the PL-NIHSS. Highly significant correlations with other tools confirmed the validity and predictive value of the PL-NIHSS. In posterior stroke, the PL-NIHSS achieved the required Cronbach’s alpha coefficient (0.7107). Additionally, stroke location did not affect other psychometric features or instrument reliability and validity.


We developed a valid and reliable tool for assessing stroke severity in Polish-speaking participants. Its moderate psychometric properties did not limit its clinical applications.


Clinometric scales are used to objectively evaluate the severity of stroke. Undoubtedly, the National Institutes of Health Stroke Scale (NIHSS) has played the most important role in stroke assessment for several years [1]. It is widely accepted due to its simplicity, high reproducibility, and ease of performance [2] and was designed to be used not only by neurologists, but also by other thoroughly trained members of the stroke team [3]. Furthermore, beyond its objective and reliable estimation of stroke severity, numerous studies have stressed the usefulness of the NIHSS in assessing the clinical prognoses, outcomes, and risks for large intracranial vessel occlusions, thus emphasizing its predictive value [4, 5]. Several researchers from various countries have adapted and validated the NIHSS, demonstrating its high reproducibility and highlighting its clinical utility [6–15]. However, less attention has been focused on the psychometric properties, such as internal consistency or the discriminatory power of individual items, because these factors have only been analyzed in individual reports [16]. Notably, the NIHSS psychometric parameters that determine homogeneity, stability, and individual component discriminatory power are as important as its overall utility and clinical validity. Obtaining the appropriate values for all these components will determine the overall quality of the diagnostic tool, and it is of utmost importance that these features are independent of the language, country, region, and culture. In light of this observation, the lack of a reliable and in-depth analysis of the NIHSS is a shortcoming; therefore, a comprehensive assessment of the NIHSS is essential to better define its structural features as well as overall clinical and practical relevance.

The language barrier and lack of a standardized stroke evaluation tool in Poland have resulted in a clinical need for a reliable and valid instrument that enables members of the stroke team to evaluate Polish-speaking patients. The aim of the current study was to develop and validate a Polish version of the NIHSS (PL-NIHSS) and to assess its psychometric properties, including internal consistency, homogeneity, and scalability in relation to its overall reliability and clinical accuracy.


Study design and participants

This prospective, observational, single-center study was conducted between December 2019 and August 2020 in the Stroke Unit of the Department of Neurology at the University Hospital No. 1, Bydgoszcz, Poland. We enrolled 225 patients with ischemic stroke, including 102 patients with anterior and 123 patients with posterior circulation stroke. All participants met the requirements of the updated definition of stroke proposed by the American Heart Association/American Stroke Association [17].

The clinical and functional parameters were assessed within 24 hours of stroke onset using the PL-NIHSS and Glasgow Coma Scale (GCS). The questionnaires were completed by four investigators, including two stroke physicians, a stroke research nurse, and a physiotherapist, all of whom were NIHSS-certified and had several years of experience in the intensive stroke unit.

Estimation of the inter-rater reliability of the PL-NIHSS was based on evaluations by three randomly selected researchers. The time difference between each assessment did not exceed 2 hours. Repeatability was assessed by analyzing the total PL-NIHSS values assessed by two randomly selected examiners. Three hours later, one researcher randomly selected from the initial three researchers re-assessed the patient (test-retest) using the PL-NIHSS to estimate intra-rater reliability. Subsequently, a randomly selected researcher (from the total group of researchers) evaluated the patient within the first 24 hours of onset of stroke using the GCS to evaluate the construct validity of the PL-NIHSS and again at 3 months using the Barthel Index and modified Rankin Scale (mRS) to assess its predictive validity.

The following exclusion criteria were used: (1) significant speech impairment or disturbances of consciousness that prevented a patient from providing informed consent to participate in the study, and (2) specific stroke therapy (intravenous thrombolysis and/or endovascular treatment), which can significantly contribute to discernible fluctuations in the clinical condition. The baseline characteristics of the participants are summarized in Table 1.

Table 1. Baseline characteristics of ischemic stroke subjects (n = 225).


Adaptation of the English version of the NIHSS into Polish was performed in accordance with standards proposed by the International Quality of Life Assessment Project [18]. Two forward translations were used to create an intermediate version that was translated back for comparison with the original version. After analyzing for any contradictions or misinterpretations and obtaining agreement on the consistency and equivalence, the scale was reviewed by Polish-speaking neurologists who estimated how well it was comprehended and rated its overall acceptance. Each item received the required minimum of three points (out of a total of four points) in the content validity rating [19], and after considering minor corrections and suggestions, a preliminary version of the PL-NIHSS was established (S1 Table). Subsequently, the items that assessed speech disorders, inattention, or visual extinction (Fig 1) were modified and adapted to the cultural aspects that would be better recognized and understood by the Polish population. The word complexity, knowledge of phrases, and commonness of idioms were considered while maintaining the content and meaning of the original items. The researchers completed the PL-NIHSS training based on repeated clinical examinations of all the items. The same rules were also adapted for the assessment of individual components included in the original NIHSS [20].

Fig 1. Pictogram showing modified words, phrases, and pictures for better assessment of speech disorders, inattention, and extinction in a Polish-speaking population.

Ethical statement

The study protocol was approved by the Bioethics Committee of the Nicolaus Copernicus University in Toruń at Collegium Medicum of Ludwik Rydygier in Bydgoszcz (KB number 732/2019). All participants read and understood the study protocol and provided informed written consent to participate in the study.

Statistical evaluation methods

STATISTICA v13.1 (Dell Technologies, Round Rock, TX, USA) was used for the statistical analyses. The following tests were performed: Spearman’s rank correlation (estimation of construct and predictive validity), intraclass correlation coefficient (evaluation of inter-rater and intra-rater agreement), and weighted Cohen’s kappa (intra-rater agreement). Cronbach’s alpha coefficient and Bland–Altman analysis were used to assess the psychometric properties of the PL-NIHSS [21, 22]. A p-value < 0.05 was considered statistically significant.


A Cronbach’s alpha coefficient of 0.6885 was achieved in all patients with stroke with individual values of 0.6387 and 0.7107 for anterior and posterior stroke, respectively. The characteristics of individual items are summarized in Table 2.

Table 2. Psychometric properties of individual items of the Polish version of the National Institutes of Health Stroke Scale (PL-NIHSS).

In the group that included both types of stroke (irrespective of location), only 8/15 (53.3%) items achieved a satisfactory and required discriminant level (r>0.3) [23]. Of those, only three, including items for facial palsy, dysarthria, and extinction or inattention, achieved a high correlation with the others (r>0.5). Limb ataxia was the least correlated with the other components. However, when limb ataxia and right arm motor function were excluded, the overall alpha coefficient increased. In the patients with anterior stroke, eight items met the minimum requirements for discriminatory power; of those, only items for visual field, best gaze, and extinction or inattention achieved high values. Notably, the motor function of the right arm and limb ataxia were distinguished from the other items by negative correlation values. Removing four items (motor function of right arm, motor function of right leg, limb ataxia, and best language) improved the overall accuracy of the PL-NIHSS. In the patients with posterior stroke, eight items achieved a satisfactory discriminant level, and half of them, including items for facial palsy, motor function of the left arm, motor function of the left leg, and dysarthria were highly correlated with the others. Only one item (sensory) was negatively correlated with the others; however, removing four items (sensory, limb ataxia, level of consciousness-commands, and visual field) increased the overall alpha coefficient. The median inter-item correlation for the entire stroke group was 0.1834, while the values were 0.1807 and 0.1737 for anterior and posterior stroke, respectively.
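The item-level quantities discussed above (overall alpha, the correlation of each item with the others, and the alpha obtained after removing an item) can be sketched in a few lines. This is a minimal illustration and not the authors' STATISTICA procedure: it assumes the standard Cronbach's alpha formula with sample variances and assumes that "correlation with the others" means the Pearson correlation between an item and the sum of the remaining items. The function names and the four-patient `toy_scores` matrix are hypothetical and unrelated to the study data.

```python
import math
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per patient."""
    k = len(rows[0])
    item_vars = [statistics.variance([r[i] for r in rows]) for i in range(k)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_analysis(rows):
    """Return (item-rest correlation, alpha if item deleted) for each item."""
    k = len(rows[0])
    report = []
    for i in range(k):
        item = [r[i] for r in rows]
        rest_rows = [r[:i] + r[i + 1:] for r in rows]  # scale without item i
        rest_total = [sum(r) for r in rest_rows]
        report.append((pearson(item, rest_total), cronbach_alpha(rest_rows)))
    return report


# Hypothetical toy data: 4 patients x 3 items; the third item is unrelated noise.
toy_scores = [[0, 0, 1], [1, 1, 0], [2, 2, 1], [3, 3, 0]]
alpha_all = cronbach_alpha(toy_scores)
report = item_analysis(toy_scores)

for index, (r_rest, alpha_deleted) in enumerate(report):
    flag = "ok" if r_rest > 0.3 else "weak"  # r > 0.3 threshold used in the text
    print(f"item {index}: r = {r_rest:.3f} ({flag}), alpha if deleted = {alpha_deleted:.3f}")
```

In this toy matrix the noise item correlates negatively with the rest of the scale, and removing it raises alpha, which mirrors the pattern described for limb ataxia and right arm motor function.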

The results of the inter-rater and intra-rater agreements are summarized in Table 3.

Table 3. Inter-rater and intra-rater reliability of the Polish version of the National Institutes of Health Stroke Scale (PL-NIHSS).

Excellent weighted kappa values (κ > 0.9) and intraclass correlation coefficients (ICC > 0.9) among all the items indicated high reproducibility of the PL-NIHSS. A favorable coefficient of repeatability (CR = 0.6267; 95% confidence interval [CI] = 0.5737–0.6904) and narrow limits of agreement (lower: -0.6408, 95% CI = -0.7128 to -0.5689; upper: 0.6142, 95% CI = 0.5422–0.6862) were observed in Bland–Altman analyses (Fig 2), thus emphasizing the accuracy of the PL-NIHSS. The vast majority of paired total scores (n = 211; 93.8%) were identical and fell within the limits of agreement, whereas the maximum difference in the total score between the examiners was two points, which was observed in only three cases.

Fig 2. Bland–Altman diagram indicating the repeatability of the Polish version of the National Institutes of Health Stroke Scale (PL-NIHSS).

The distribution of plots is based on the mean and difference from the total PL-NIHSS scores obtained by two randomly selected examiners. The limits of agreement occupy the area between the dashed lines. The 95% confidence interval of the regression line is located between the orange bold lines.
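The repeatability figures reported above can be checked approximately from the counts given in the text. The sketch below is illustrative and is not the authors' STATISTICA procedure. It assumes that the limits of agreement are the mean difference ± 1.96 sample standard deviations and that the coefficient of repeatability is 1.96 times the root mean square of the paired differences. The text reports 211 identical pairs and three pairs differing by two points, so the remaining 11 of the 225 pairs must differ by one point. The direction of each difference is not reported; the sign split used here is a hypothetical choice, and only the limits of agreement (not the CR) depend on it.

```python
import math
import statistics


def bland_altman(differences, z=1.96):
    """Return (mean difference, lower limit, upper limit, repeatability coefficient)."""
    n = len(differences)
    mean_diff = statistics.fmean(differences)
    sd_diff = statistics.stdev(differences)  # sample SD, n - 1 in the denominator
    lower = mean_diff - z * sd_diff
    upper = mean_diff + z * sd_diff
    # Assumed definition: z times the root mean square of the differences.
    cr = z * math.sqrt(sum(d * d for d in differences) / n)
    return mean_diff, lower, upper, cr


# Differences between two raters' total scores, rebuilt from the reported counts.
# The sign split (5/6 for one-point and 1/2 for two-point differences) is hypothetical.
diffs = [0] * 211 + [1] * 5 + [-1] * 6 + [2] * 1 + [-2] * 2

mean_diff, lower, upper, cr = bland_altman(diffs)
print(f"mean difference = {mean_diff:.4f}")
print(f"limits of agreement = {lower:.4f} to {upper:.4f}")
print(f"coefficient of repeatability = {cr:.4f}")
```

Under these assumptions the computed values agree with the reported CR and limits of agreement to four decimal places, which suggests, but does not prove, that similar definitions were used in the original analysis.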

We observed a moderate but significant correlation between the PL-NIHSS score and the initial GCS score (r = -0.4460, p < 0.0001), which indicated satisfactory construct validity (Fig 3A). On the 90th day after the onset of stroke, we also observed high correlations between the PL-NIHSS score and both the Barthel Index (r = -0.8648, p < 0.0001) and the mRS (r = 0.8310, p < 0.0001), which reflected the predictive validity of the instrument (Fig 3B and 3C). We found no significant differences in the assessment of the reliability (ICC, kappa, CR, limits of agreement) or validity (correlation coefficient) between the patient groups with anterior and posterior stroke or between each subgroup and the overall group.

Fig 3.

Construct (A) and predictive (B, C) validity of the Polish version of the National Institutes of Health Stroke Scale (PL-NIHSS). (A) Significant correlation with Glasgow Coma Scale (GCS) on the first day of stroke. Significant correlations with (B) modified Rankin Scale (mRS) and (C) Barthel index on the 90th day of stroke.
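The validity coefficients above are Spearman rank correlations, which can be sketched as a Pearson correlation computed on tie-averaged ranks. The example below is a minimal illustration under that standard definition. The function names and the six-patient `toy_nihss` and `toy_barthel` lists are hypothetical and are not study data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: higher baseline severity paired with lower 90-day independence.
toy_nihss = [2, 4, 4, 7, 10, 15]
toy_barthel = [100, 90, 85, 60, 40, 10]
rho = spearman(toy_nihss, toy_barthel)
print(f"Spearman rho = {rho:.3f}")
```

A negative coefficient is expected for the Barthel Index and GCS, where higher scores mean a better condition, and a positive one for the mRS, where higher scores mean greater disability.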


To our knowledge, this study describes the first adaptation and validation of a Polish version of the NIHSS (PL-NIHSS). In this novel report, we highlighted its moderate psychometric properties, assessed its repeatability using Bland–Altman statistics, and analyzed its internal consistency, reliability, and validity based on the stroke location (anterior or posterior).

An ideally constructed stroke scale should be characterized by appropriate psychometric parameters, which demonstrate the correct structure of the tool. Particularly, it should be characterized by scalability (internal consistency and homogeneity) by confirming that each component of the instrument is equally important and measures the same attribute [24]. According to Nunnally’s principle, the Cronbach’s alpha coefficient used for this assessment should reach a minimum of 0.7 [21]. Each item on the scale should also significantly correlate with the others (discriminatory power), and its removal should not increase the overall reliability of the scale. We observed a sufficient alpha coefficient only in posterior stroke (slightly exceeding the limit), whereas the required value was not achieved in the groups with anterior and overall stroke. Only slightly more than half of the assessed items had appropriate discriminatory power in the overall stroke group as well as in the anterior and posterior stroke subgroups. Additionally, some items did not correlate with the others at all, thus contributing to a reduction in the quality of the entire tool. The median correlation coefficients were far below those expected. Our findings emphasized doubtful homogeneity of the adapted version of the NIHSS and are inconsistent with the data reported by Sun et al. [16], who demonstrated a Cronbach’s alpha coefficient of 0.92 and a mean inter-item correlation of 0.44. However, they analyzed only 48 patients with stroke, and the small sample size may have significantly affected their overall study reliability [25]. The moderate psychometric properties observed in our study indicated a lack of homogeneity and internal consistency and therefore suggest a structural disadvantage of the NIHSS. Accordingly, further research to improve the existing NIHSS version should be supported in order to develop a scalable tool in accordance with the current international guidelines.

Irrespective of the design imperfections, the significant clinical utility of the validated version of the NIHSS should be emphasized; it was particularly manifested in the high reliability and validity observed in our study. Our findings are consistent with those of other studies on this topic; however, we noted higher individual item agreement values than those reported by most other investigators. Only one report, by Jurjans et al. [26], found that all the items of a Latvian validated tool achieved excellent ICC (> 0.95) in both inter-rater and intra-rater assessments. The authors of validation reports of other scales found moderate, and sometimes even poor, agreement between the selected items [9–15]. Notably, the sample sizes in the present and the Latvian study were larger than those in the other studies, thus emphasizing the significance of the results in this study as well as highlighting the high reproducibility of the PL-NIHSS. Simultaneously, our research supports the wide use and assessment of the NIHSS by qualified, trained, and certified members of the stroke team and not just neurologists. A clear advantage of our study over others is the assessment of repeatability based on the agreement achieved between raters regarding the total score and not just individual items. To our knowledge, this is the first study to emphasize a satisfactory coefficient of repeatability and narrow limits of agreement using Bland–Altman statistics, thus confirming the stability and reliability of the validated tool. The high construct and predictive validity of the PL-NIHSS was reflected in the significant, high, and moderate correlations with other instruments used in similar situations in other studies.

Another strength of our study is the assessment of the psychometric parameters, reliability, and validity depending on the stroke location. Many reports have demonstrated that the NIHSS is more accurate when used to assess the severity of anterior stroke whereas the clinical condition of posterior stroke is often underestimated [27]. Therefore, unlike previous studies, we attempted to validate the PL-NIHSS with both types of stroke and found that specifying the type of stroke did not negatively affect the parameters, thus, confirming the reproducibility, repeatability, and validity of the tool. This result verified the high accuracy of the validated instrument, irrespective of the area of brain vascularization. Surprisingly, better psychometric properties, such as internal consistency or homogeneity, were noted in the patients with posterior stroke. These differences between the compared groups confirmed that better scalability of the PL-NIHSS did not translate into a more accurate assessment of stroke severity or increase its validity and reliability. Furthermore, we hypothesized that the psychometric properties of the validated instrument did not affect or limit its clinical utility. Nevertheless, we believe that the optimal situation occurs when the commonly used scale is characterized by high psychometric values as well as high reliability and validity.

The current study has some limitations. The study sample size was moderate, although it was larger than those in other studies. Our study was a single-center study; therefore, verification of our postulates, particularly regarding the psychometric aspects, is required in multi-center studies, preferably with international cooperation. Due to the requirement for obtaining informed written consent, some patients with stroke were procedurally excluded, and therefore, the data did not cover the entire stroke profile (especially of patients with severe strokes).


We developed a valid and reliable Polish version of the NIHSS suitable for use in everyday practice by trained and certified staff of the Polish-speaking stroke unit. The moderate psychometric properties emphasized in the PL-NIHSS did not affect its clinical usefulness. However, considering the international requirements for commonly used diagnostic tools, further research should be pursued to improve the design and structural quality of the current NIHSS.

Supporting information

S1 Table. Polish version of the National Institutes of Health Stroke Scale.



Special thanks to the Members of the Student Research Club at the Department of Neurology at Collegium Medicum in Bydgoszcz for contributing to the development of the database.


  1. Lyden P, Brott T, Tilley B, Welch KM, Mascha EJ, Levine S, et al. Improved reliability of the NIH Stroke Scale using video training. NINDS TPA Stroke Study Group. Stroke. 1994;25:2220–2226. pmid:7974549
  2. Lyden PD, Lu M, Levine SR, Brott TG, Broderick J, NINDS rtPA Stroke Study Group. A modified National Institutes of Health Stroke Scale for use in stroke clinical trials: preliminary reliability and validity. Stroke. 2001;32:1310–1317. pmid:11387492
  3. Lyden P, Raman R, Liu L, Grotta J, Broderick J, Olson S, et al. NIHSS training and certification using a new digital video disk is reliable. Stroke. 2005;36:2446–2449. pmid:16224093
  4. Heldner MR, Hsieh K, Broeg-Morvay A, Mordasini P, Bühlmann M, Jung S, et al. Clinical prediction of large vessel occlusion in anterior circulation stroke: mission impossible? J Neurol. 2016;263:1633–1640. pmid:27272907
  5. Kharitonova T, Mikulik R, Roine RO, Soinne L, Ahmed N, Wahlgren N, et al. Association of early National Institutes of Health Stroke Scale improvement with vessel recanalization and functional outcome after intravenous thrombolysis in ischemic stroke. Stroke. 2011;42:1638–1643. pmid:21512176
  6. Goldstein LB, Samsa GP. Reliability of the National Institutes of Health Stroke Scale. Extension to non-neurologists in the context of a clinical trial. Stroke. 1997;28:307–310. pmid:9040680
  7. Kasner SE, Chalela JA, Luciano JM, Cucchiara BL, Raps EC, McGarvey ML, et al. Reliability and validity of estimating the NIH stroke scale score from medical records. Stroke. 1999;30:1534–1537. pmid:10436096
  8. Dewey HM, Donnan GA, Freeman EJ, Sharples CM, Macdonell RA, McNeil JJ, et al. Interrater reliability of the National Institutes of Health Stroke Scale: rating by neurologists and nurses in a community based stroke incidence study. Cerebrovasc Dis. 1999;9:323–327. pmid:10545689
  9. Berger K, Weltermann B, Kolominsky-Rabas P, Meves S, Heuschmann P, Böhner J, et al. The reliability of stroke scales. The German version of NIHSS, ESS and Rankin scales. Fortschr Neurol Psychiatr. 1999;67:81–93. pmid:10093781
  10. Pezzella FR, Picconi O, De Luca A, Lyden PD, Fiorelli M. Development of the Italian version of the National Institutes of Health Stroke Scale: It-NIHSS. Stroke. 2009;40:2557–2559. pmid:19520997
  11. Hussein HM, Abdel Moneim A, Emara T, Abd-Elhamid YA, Salem HH, Abd-Allah F, et al. Arabic cross cultural adaptation and validation of the National Institutes of Health Stroke Scale. J Neurol Sci. 2015;357:152–156. pmid:26210056
  12. Prasad K, Dash D, Kumar A. Validation of the Hindi version of National Institute of Health Stroke Scale. Neurol India. 2012;60:40–44. pmid:22406778
  13. Domínguez R, Vila JF, Augustovski F, Irazola V, Castillo PR, Rotta Escalante R, et al. Spanish cross-cultural adaptation and validation of the National Institutes of Health Stroke Scale. Mayo Clin Proc. 2006;81:476–480. pmid:16610567
  14. Oh MS, Yu KH, Lee JH, Jung S, Ko IS, Shin JH, et al. Validity and reliability of a Korean version of the National Institutes of Health Stroke Scale. J Clin Neurol. 2012;8:177–183. pmid:23091526
  15. Cincura C, Pontes-Neto OM, Neville IS, Mendes HF, Menezes DF, Mariano DC, et al. Validation of the National Institutes of Health Stroke Scale, Modified Rankin Scale and Barthel Index in Brazil: the role of cultural adaptation and structured interviewing. Cerebrovasc Dis. 2009;27:119–122. pmid:19039215
  16. Sun TK, Chiu SC, Yeh SH, Chang KC. Assessing reliability and validity of the Chinese version of the stroke scale: scale development. Int J Nurs Stud. 2006;43:457–463. pmid:16146632
  17. Sacco RL, Kasner SE, Broderick JP, Caplan LR, Connors JJ, Culebras A, et al. An updated definition of stroke for the 21st century: a statement for healthcare professionals from the American Heart Association/American Stroke Association. Stroke. 2013;44:2064–2089. pmid:23652265
  18. Gandek B, Ware JE Jr. Methods for validating and norming translations of health status questionnaires: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol. 1998;51:953–959. pmid:9817112
  19. Lynn MR. Determination and quantification of content validity. Nurs Res. 1986;35:382–385. pmid:3640358
  20. Lyden P. Using the National Institutes of Health Stroke Scale: a cautionary tale. Stroke. 2017;48:513–519. pmid:28077454
  21. Nunnally JC, Bernstein IH. Psychometric theory. McGraw-Hill, New York 1994.
  22. Bland JM, Altman DG. Statistics notes: Cronbach’s alpha. BMJ. 1997;314:572–572. pmid:9055718
  23. Kline P. An easy guide to factor analysis. Routledge 2008.
  24. Polit DF, Beck CT. Nursing Research: Principles and Methods. Lippincott Williams & Wilkins, Hagerstown 2003.
  25. Streiner DL, Norman GR. Health measurement scales–a practical guide to their development and use. Oxford University, New York 2003.
  26. Jurjans K, Noviks I, Volceka D, Zandersone L, Meilerte K, Miglāne E, et al. The adaption and evaluation of a Latvian version of the National Institutes of Health Stroke Scale. J Int Med Res. 2017;45:1861–1869. pmid:28703630
  27. Schneck MJ. Current stroke scales may be partly responsible for worse outcomes in posterior circulation stroke. Stroke. 2018;49:2565–2566. pmid:30355229