Psychometric properties of the Portuguese version of the National Eye Institute Visual Function Questionnaire-25

Background To investigate the psychometric properties of the Brazilian Portuguese version of the National Eye Institute Visual Function Questionnaire (NEI VFQ-25) in a group of patients with different eye diseases.

Methods Cross-sectional study. All subjects completed the Portuguese version of the NEI VFQ-25. A second questionnaire covering clinical and demographic data was also administered. Rasch analysis was used to evaluate the psychometric properties of the NEI VFQ-25.

Results The study included 104 patients with cataract, 65 with glaucoma and 83 with age-related macular degeneration. Mean age was 70.7 ± 9.9 years, with 143 female (56.7%) and 109 male patients (43.2%). Mean visual acuity was 0.47 and 1.17 logMAR in the better and worse eye, respectively. According to Rasch analysis, seven items were found to misfit. Those items belonged to the following subscales: general health, social function, mental health, ocular pain and role limitations. The principal component analysis of the residuals showed that 55.5% of the variance was explained by the principal component. Eight items loaded positively onto the first contrast with a correlation higher than 0.4. These items belonged to the following subscales: near vision, distance vision, mental health and dependency. After excluding those items, we were able to isolate items from the NEI VFQ-25 related only to a visual functioning component. Finally, the principal component analysis of the residuals of this revised version of the NEI VFQ-25 (items related to visual function) showed that the principal component explained 61.2% of the variance, with no evidence of multidimensionality.

Conclusions The Portuguese version of the NEI VFQ-25 is not a unidimensional instrument. We were able to find items that belong to a different trait, possibly related to a socio-emotional component. Thus, in order to obtain psychometrically valid constructs, the visual functioning and socio-emotional components should be analyzed separately.

We have no ethical or legal restrictions on sharing our data set and we have uploaded it as a supporting information file.

José Paulo Cabral Vasconcellos 1 ¶

We also evaluated differential item functioning, which assesses whether the items have different meanings for different groups in the sample. The raw differences in item calibration between groups were examined to identify differential item functioning. Differential item functioning was considered absent if the difference was less than 0.50 logits, minimal but probably inconsequential if it ranged between 0.50 and 1.0 logits, and notable if it was >1.0 logit. [12,27]
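The differential item functioning thresholds stated above can be written as a small classifier. This is only a minimal sketch: the function name and the example calibrations are hypothetical and are not taken from the study data or from the Rasch software the authors used.

```python
def classify_dif(calibration_group_a: float, calibration_group_b: float) -> str:
    """Classify DIF from the raw difference in item calibration (in logits).

    Thresholds follow the text: <0.50 absent, 0.50 to 1.0 minimal, >1.0 notable.
    The two calibrations passed in are hypothetical inputs for illustration.
    """
    difference = abs(calibration_group_a - calibration_group_b)
    if difference < 0.50:
        return "absent"
    if difference <= 1.0:
        return "minimal but probably inconsequential"
    return "notable"


# Hypothetical item calibrations for two subgroups (not study data).
print(classify_dif(0.20, 0.50))
print(classify_dif(-0.40, 0.35))
print(classify_dif(1.50, 0.00))
```

Taking the absolute value treats a difference in either direction the same way, which matches the way the thresholds are described in the text.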

Results of Rasch analysis are shown in Table 3. Three items (Q4, Q17, and Q19) were found to misfit (from subscales: ocular pain and role limitations) with infit mean scores >1.3. Figure

For the current work, the principal component analysis of the residuals showed that the variance explained by the principal component was comparable between the empirical calculation (50.9%) and the model (51.9%). This suggests that the questionnaire was not unidimensional. Moreover, the unexplained variance in the first contrast was 3.38 eigenvalue units and in the second contrast was 2.56 eigenvalue units, with no further contrasts exceeding 2.0 eigenvalue units. These findings suggested the presence of a second dimension in the scale. We analyzed the principal components/contrast plots of item loadings and found items belonging to different clusters with high positive loadings (correlation >0.4) onto the first contrast. We also identified 3 different clusters and excluded items from General health (Q1), Mental Health (Q3, Q21, Q22 and Q25), Role limitations (Q18) and Dependency (Q20, Q23 and Q24) that belonged to a secondary dimension underlying the first contrast, which could be biasing the person measures.

This suggests that these nine items cannot be grouped with other items in the scale to measure a single latent trait (visual functioning). These items are probably related to a social-emotional component. Of note, in the current sample, 187 patients (74.4%) answered that they were not currently driving (Q15). Within this group, 158 patients (84.5%) reported that they had never driven (Q15a). Therefore, questions related to driving were not assessed in the Rasch analysis due to missing data.
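The item misfit criterion can be illustrated with a short check. This is a sketch under stated assumptions: the upper bound of 1.3 comes from the Results text, the lower bound of 0.7 is the one quoted in the reviewer correspondence, and the function name is hypothetical. The Q24 values (infit 0.96, outfit 0.87) are those quoted from Table 3 in the correspondence; the second example uses made-up values.

```python
def is_misfit(infit_mnsq: float, outfit_mnsq: float,
              lower: float = 0.7, upper: float = 1.3) -> bool:
    """Return True if either mean-square statistic falls outside [lower, upper]."""
    return any(value < lower or value > upper
               for value in (infit_mnsq, outfit_mnsq))


# Q24 as quoted from Table 3 in the reviewer correspondence: within range.
print(is_misfit(0.96, 0.87))
# Hypothetical item with a high infit mean square (not study data).
print(is_misfit(1.45, 1.10))
```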
Differential item functioning was tested for some of the variables from Table 1 and Table 2, such as: age, gender, race, job status, marital status, education, level of income, low vision and type of eye disease (cataract, glaucoma and AMD). There was no differential item functioning for any of the variables mentioned. These results suggest that items could be interpreted similarly across subgroups of the sample.

After excluding items that were considered misfitted (Q4, Q17 and Q19) and also those items with high loadings on the principal component analysis of the residuals, such as: General health (Q1), Mental Health (Q3, Q21, Q22 and Q25), Role limitations (Q18), Dependency (Q20, Q23 and Q24), we were able to isolate items from the NEI VFQ-25, related only to the visual function component. According to Table 4, no items were misfitted. We also

(Table 5).

We also investigated the association between demographic and clinical variables

Table 3, four items (Q4, Q19, Q24 and Q25) were found to misfit. Those items belonged to the subscales of mental health, ocular pain and role limitations. In Figure 1 we can observe a scatterplot with the items that we considered misfitted. When the variance of the principal component is considered high (60% or greater), there is a low likelihood of additional components [13]. In the current study, the principal component analysis of the residuals showed that the variance explained by the principal component was 51.9%. In addition to that, if the variance explained by the principal component for the real data and the model are similar, the chances of finding additional constructs are low. When patterns within variance are unexplained by the principal component, a second construct can be measured. This finding is reported by the first contrast in the residuals.
According to a previous study, a contrast should have an eigenvalue higher than 2.0 to be considered evidence of a second construct, that is, greater than the magnitude seen with random data [13]. Our initial analysis showed a first contrast with 3.38 eigenvalue units, suggesting that the Brazilian Portuguese version of the NEI VFQ-25 was not unidimensional. The loading of items onto the contrasts allows identification of which items tap different constructs. In our analysis, nine items loaded positively onto the first contrast with a correlation higher than 0.4. These items belonged to the following subscales: General health (Q1), Mental Health (Q3, Q21, Q22 and Q25), Role limitations (Q18), Dependency (Q20, Q23 and Q24). This suggests that these nine items cannot be grouped with other items in the scale to measure a single latent construct, such as QoL related to visual function. The purpose of using principal component analysis and inspecting contrast plots is to find patterns in the dataset and discover groups of items that share the same patterns of unexpectedness, which could represent a secondary dimension, indicating that the instrument is not unidimensional.
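The dimensionality rules of thumb described above (variance explained of 60% or greater, a first contrast above 2.0 eigenvalue units, and loadings above 0.4) can be sketched as follows. The function names are hypothetical, and this only checks summary statistics that a Rasch program would already have produced; it does not perform the principal component analysis itself. The two pairs of summary values are the ones reported in this manuscript for the original and revised item sets, while the item loadings are made up for illustration.

```python
def assess_dimensionality(variance_explained_pct: float,
                          first_contrast_eigenvalue: float,
                          variance_cutoff: float = 60.0,
                          eigenvalue_cutoff: float = 2.0) -> dict:
    """Apply the two rules of thumb from the text to PCA-of-residuals summaries."""
    return {
        "low_variance_explained": variance_explained_pct < variance_cutoff,
        "second_dimension_suspected": first_contrast_eigenvalue > eigenvalue_cutoff,
    }


def items_loading_on_contrast(loadings: dict, cutoff: float = 0.4) -> list:
    """Return the items whose loading on the first contrast exceeds the cutoff."""
    return [item for item, loading in loadings.items() if loading > cutoff]


# Values reported in the manuscript: original item set, then revised item set.
print(assess_dimensionality(51.9, 3.38))
print(assess_dimensionality(61.2, 1.64))

# Hypothetical loadings (not study data).
print(items_loading_on_contrast({"Q1": 0.55, "Q5": 0.10, "Q20": 0.41}))
```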

We were able to isolate items from the NEI VFQ-25, related only to a visual function component, after excluding items that were considered misfitted and also those items with high loadings on the principal component analysis of the residuals. When we re-examined the fit statistics of this revised version of the NEI VFQ-25, no items were misfitted (Table 4). Moreover, the final variance of the principal component was 61.2% and the unexplained variance by the first contrast was 1.64 eigenvalue units (Table 5).

Another important characteristic of a good instrument is that items function similarly for persons at the same level of ability. Differential item functioning was tested for the following variables: age, sex, race, job status, marital status, education, level of income, low vision and type of eye disease (cataract, glaucoma and AMD). Differential item functioning occurs when subgroups of people with comparable levels of ability respond differently to an item, which implies a response to some characteristic other than item difficulty. We were not able to find evidence of differential item functioning for any of the variables mentioned. Thus, our results suggest that items from the Portuguese version of the NEI VFQ-25 could be interpreted similarly across subgroups of the sample, including different eye diseases, such as cataract, glaucoma and AMD.

We found that patients with worse visual acuity and those with a lower education level had lower Rasch-calibrated NEI VFQ-25 scores.
Even though patients with AMD had lower Rasch-calibrated scores of NEI VFQ-25 compared to cataract and glaucoma patients, when adjusting for visual acuity, the correlation with different types of eye disease in the multivariable analysis was not statistically significant, implying that visual acuity may be

that Rasch analysis can offer an alternative to traditional scoring methods enabling one to estimate the latent variable of interest (visual function) and assess the performance of each item as a contributor to the final measurement. In a subsequent work, Marella and colleagues have suggested that the NEI VFQ-25 questionnaire does not seem to be unidimensional, and that the questionnaire items may actually be measuring two different underlying constructs, one related to visual functioning and another to socio-emotional status. This is important, as it would indicate that a single composite score is not appropriate to represent responses to this questionnaire. [12,13] In addition to dimensionality, Rasch analysis can provide information about appropriateness of the response categories, measurement precision, and item fit to the construct. [14,15] Rasch analysis of the English version of the NEI VFQ-25 has also suggested that the subscales represented on the questionnaire would not be valid in their current format. [16]

As a widely used instrument to assess vision-related QoL, the NEI VFQ-25 has been translated into several different languages. When a questionnaire is translated into a new language, a linguistic validation is necessary but not sufficient unless the

Socio-economic questionnaires were also administered along with the NEI VFQ-25 to all patients.

comorbidities, we investigated the presence or history of the following conditions: diabetes mellitus, arthritis, high blood pressure, heart disease, depression, asthma, and cancers. A simple summation score was used to create a comorbidity index.
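A simple summation score of this kind can be sketched as a count of the listed conditions a patient reports. This is an illustrative assumption about how such an index might be computed: the function name, the dictionary layout and the example patient are hypothetical, and the study's actual coding of the responses is not described here.

```python
# Conditions listed in the text; "cancer" stands in for "cancers".
CONDITIONS = (
    "diabetes mellitus",
    "arthritis",
    "high blood pressure",
    "heart disease",
    "depression",
    "asthma",
    "cancer",
)


def comorbidity_index(history: dict) -> int:
    """Count how many listed conditions are present or in the patient's history."""
    return sum(1 for condition in CONDITIONS if history.get(condition, False))


# Hypothetical patient (not study data).
patient = {"diabetes mellitus": True, "asthma": True, "depression": False}
print(comorbidity_index(patient))
```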

Results of Rasch analysis are shown in Table 3. Three items (Q4, Q17, and Q19) were found to misfit (from subscales: ocular pain and role limitations) with infit and/or outfit mean scores >1.3. Figure

limitations (Q17 and Q18) and dependency (Q20 and Q23). This suggests that these nine items cannot be grouped with other items in the scale to measure a single latent trait (visual functioning). These items are probably related to a social-emotional component. Of note, in the current sample, 187 patients (74.4%) answered that they were not currently driving (Q15). Within this group, 158 patients (84.5%) reported that they had never driven (Q15a). Therefore, questions related to driving were not assessed in the Rasch analysis due to missing data.

Differential item functioning was tested for some of the variables from Table 1 and Table 2, such as: age, gender, race, job status, marital status, education, level of income, low vision and type of eye disease (cataract, glaucoma and AMD). There was no differential item functioning for any of the variables mentioned. These results suggest that items could be interpreted similarly across subgroups of the sample.

After excluding items that were considered misfitted (Q4, Q17 and Q19) and also those items with high loadings on the principal component analysis of the residuals, such as: General health (Q1), Mental Health (Q3, Q21, Q22 and Q25), Role limitations (Q18), Dependency (Q20, Q23 and Q24), we were able to isolate items from the NEI VFQ-25, related only to the visual function component. According to Table 4, no items were misfitted. We also performed a principal component

(Table 5). The mean (± SD) of the person measures was -3.02 ± 1.09 logits. In Figure 2, we show the Wright item-person maps of the revised version of the NEI VFQ-25 (only items related to visual function). We investigated targeting using Wright person-item maps (Figure 2). We found that our sample did not ideally match the items for both original versions of NEI VFQ-25. Even after removal of misfitting items, most of the items (right side, top of the scale) were not able to cover people with the most visual ability (left side, bottom of the scale). The separation index for person measures was 2.44, with reliability of 0.86. We also reported the psychometric properties of the socioemotional component of the NEI VFQ-25 (Table 5).

We also investigated the association between demographic and clinical variables

We have changed the text and figure legend as suggested.
We have included the following sentence: "The NEI VFQ was originally developed with 51 items to capture the influence of vision impairment on multiple dimensions of health related QoL, such as emotional status and social functioning. Mangione et al validated a shorter and reliable version with 25 items."

Targeting: Did you inspect the person-item map and calculate the difference between item and person means before revising the questionnaire? Knowing if the difficulty of the items adequately targets the ability of the sample provides valuable information.
We have made the proper changes as suggested.
We have included this sentence in the Results section: "We investigate targeting using Wright person-item maps. We found that our sample did not ideally matched items for both original versions NEI VFQ-25. Even after removal of misfitting items, most of the items (right-side, top of the scale) were not able to cover people with most visual ability (left side, bottom of the scale)."

Figure 2: The mean person achievement measure is much lower than the mean item difficulty, suggesting a lack of items at the low-ability end. Could you please explain and describe that? This finding is completely counter-intuitive as most questionnaires struggle with items that are too easy, in particular in questionnaires trying to capture VRQOL.
We have made the proper changes as suggested.
We have now included the following sentence in the Discussion section: "The Wright person-item map revealed that items did not adequately covered all spectrum of visual abilities. In fact, most of uncovered percentage of patients represents persons with more visual ability (Figure 2). This may be explained by our sample did not include enough people with significant visual impairment."

Table 2: Does the SAP MD values only refer to glaucoma patients or to the overall sample?
We thank the reviewer for the comments.
The SAP MD values refer only to glaucoma patients.
Could you please add more information on visual ability of the participants? Please report n/% of the sample in the categories none/mild/moderate/severe visual impairment.
We thank the reviewer for the comments.
Table 2 reports that 24.6% of the sample had low vision in one or both eyes.
Line 252: The authors say, that four items were found to misfit with Infit and/or Outfit MNSQ values <0.7 or >1.3. In Table 3 Infit and Outfit MNSQ values for Q24 are within these values, so why was this item found to misfit? Same for Q25.
We thank the reviewer for the comments.
We have reviewed the manuscript and fixed the error. "Three items (Q4, Q17, and Q19) were found to misfit (from subscales: mental health and role limitations)."
Line 226: The authors say that they investigated the relationship between final Rasch-calibrated NEI VFQ-25 scores with socioeconomic variables using a linear regression
We thank the reviewer for the comments.
We have now uploaded the table as requested.
We have corrected the error.

Author's Response Change in the Manuscript
The purpose of this manuscript is to investigate the psychometric properties of the Brazilian Portuguese version of the National Eye Institute Visual Function Questionnaire (NEI VFQ-25) in a group of patients with different eye diseases by Rasch analysis. However, the authors found that the Portuguese version of the NEI VFQ-25 is not a unidimensional instrument in measuring psychometric properties. Although they suggest that analyzing both visual function and socioemotional components separately may be a valid method, they did not test the validity of this method. In addition, some mistakes in grammar were found in the manuscript.
We thank the reviewer for the comment.
We have corrected the grammar errors in the text.

Author's Response Change in the Manuscript
In this paper, Abe et al. investigated the psychometric properties of the Brazilian Portuguese version of the National Eye Institute Visual Function Questionnaire-25. They found the Portuguese version of NEI VFQ-25 is not a unidimensional instrument, and suggested the visual functioning and socioemotional components should be analyzed separately. In general, the paper is well written and in a good quality. The study is well designed, the data are convincing, the analysis is cogent.
We thank the reviewer for the comment.
N/A
In the methods part, the author may want to elaborate how the NEI VFQ-25 questionnaires were conducted. Were they self-administered or interviewer-administered?
We thank the reviewer for the suggestion.
We have included the following sentence: "The questionnaire was interview-administered."

In the methods part (line 206), "Socio-economic questionnaires were also administered along with the NEI VFQ-25 to all patients." A sample of this questionnaire should be attached to this manuscript, preferably as supplementary materials.
We thank the reviewer for the comment.
We have now attached the socio-economic questionnaire as supplementary material.
3. In the results part (line 252), "Four items (Q4, Q19, Q24 and Q25) were found to misfit (from subscales: general health, mental health, ocular pain and role limitations) with infit and/or outfit mean scores >1.3.". However, from Table 3, the infit MNSQ and outfit MNSQ of Q24 are 0.96 and 0.87 respectively, and are both less than 1.3.
We thank the reviewer for the comment.
We have reviewed the manuscript and fixed the error. "Three items (Q4, Q17, and Q19) were found to misfit (from subscales: mental health and role limitations)."

As authors wrote in the intro (line 110-115), "Marella
We thank the reviewer for the comment.
In the Discussion section we have included the following