An alternative application of Rasch analysis to assess data from ophthalmic patient-reported outcome instruments

Purpose To highlight the potential shortcomings associated with the current use of Rasch analysis for validation of ophthalmic questionnaires, and to present an alternative application of Rasch analysis to derive insights specific to the cohort of patients under investigation. Methods An alternative application of Rasch analysis was used to investigate the quality of vision (QoV) for a cohort of 481 patients. Patients received multifocal intraocular lenses and completed a QoV questionnaire one and twelve months post-operatively. The rating scale variant of the polytomous Rasch model was utilized. The parameters of the model were estimated using joint maximum likelihood estimation. Analysis was performed on data at both post-operative assessments, and the outcomes were compared. Results The distribution of the location of symptoms altered between assessments, with the most annoyed patients completely differing. One month post-operatively, the most prevalent symptom was starbursts, compared to glare at twelve months. The visual discomfort from the most annoyed patients is substantially higher at twelve months. The current most advocated approach for validating questionnaires using Rasch analysis found that the questionnaire was “Rasch-valid” one month post-operatively and “Rasch-invalid” twelve months post-operatively. Conclusion The proposed alternative application of Rasch analysis to questionnaires can be used as an effective decision support tool at population and individual level. At population level, this new approach enables one to investigate the prevalence of symptoms across different cohorts of patients. At individual level, the new approach enables one to identify patients with poor QoV over time. This study highlights some of the potential shortcomings associated with the current use of Rasch analysis to validate questionnaires.

single potentially non-representative cohort of patients, occasionally with a relatively small sample size, e.g. [10]-[12], [16], [20], [21]. However, it is well recognized that the analysis of fit for the Rasch model is a never-ending process, since continued use of the instrument requires constant monitoring of the item and person responses to maintain quality control [22].
At its inception, the Rasch model aimed to assess the psychometric properties of intelligence and attainment tests. In such a context, individuals are examined via tests consisting of several items. When there is sufficient similarity among the individuals in the way they approach the tests, the responses to the items are expected to follow some specific patterns. The individuals misfitting the model are those whose responses deviate from the expected patterns, and it could be envisaged that these responses are partially based on guessing or are due to some carelessness from the respondents. On the other hand, items misfitting the model can be interpreted as follows: either the items do not contribute to an adequate assessment of the examinees or there is an underlying multi-dimensional structure among the individuals. However, the misfit statistics on their own do not provide enough ground to remove items from the tests. On the contrary, misfitting items are worth keeping since they provide useful information on the underlying dimensional structure among the individuals.
In the context of test-based ophthalmic instruments such as LogMAR or Snellen charts for visual acuity testing, the responses to the items are sufficiently similar among patients with similar visual function. Hence, the responses are expected to follow some specific patterns, and serious item misfit generally indicates an unanticipated problem which may be attributed to the quality of the items. However, for ophthalmic questionnaires, which are based on items that are often independent, the misfitted items may be due to various reasons including an underlying multi-dimensional structure among the patients. For instance, a consistent difference in response propensity introduced by variation in the characteristics of the respondents such as lifestyle, age and gender may contribute significantly to item and/or person misfits. The misfitted items and/or persons therefore may not necessarily be outliers. Even if they were, medical care implies that patients are taken as individuals with their own problems, and not as a group. Furthermore, misfitting items may actually be relevant for the quality of care (although they may imply a different latent trait). In other words, Rasch validation, as currently performed, might help qualify a technique or a therapy but it does not provide any insight into the cause of particular patients being affected differently by the same item.
Issues associated with the removal of items are well known in the Rasch analysis literature; see e.g. [23,24] and the references therein. For instance, it is well known that removing items from a questionnaire is very likely to increase the intrinsic variance within the data, which could affect the estimation of the person and item measures. This could be problematic, in particular when comparing items/persons across different conditions. Moreover, removing some items could cause other items, which were not initially misfitted, to misfit, leading to a downward spiral in the number of items in the questionnaire. Massof et al. [25] presented a study on a visual function questionnaire in which they maintained misfitted items and provided a meaningful interpretation of the items. They showed that items with infit statistics greater than 2.5 times the standard deviation from the expected value are related to mobility tasks, whereas items with infit statistics lower than 2.5 times the standard deviation from the expected value are associated with reading tasks. Furthermore, building on these misfit statistics, the authors used Principal Component Analysis to demonstrate the non-unidimensional aspect of the visual function trait under investigation.
In contrast with the current validation practice, which consists of using Rasch analysis to dismiss [9]-[18] or approve [9], [14], [21]-[30] an ophthalmic questionnaire based solely on the misfit statistics of the items, this work introduces an alternative, meaningful and relevant application of the Rasch model to analyze data collected via ophthalmic questionnaires. The proposed approach aims to present Rasch analysis as a decision support tool for deriving valuable insights specific to the cohort of patients under investigation, at both population and individual level. At the population level, such an approach enables the investigation of the prevalence of ophthalmic symptoms across different cohorts of patients pre-operatively and post-operatively, in order to assess the effectiveness of a treatment, e.g. different types of intraocular lenses (IOLs) or different surgical procedures. At the individual level, the new approach can be applied across a population at different time points and identify patients who experienced most visual discomfort pre-operatively and/or post-operatively, so that additional appropriate care and monitoring can be dedicated to them. Ultimately, this new perspective will pave the way for a more adequate application of Rasch analysis within the context of ophthalmic questionnaires, so that insights gained from the analysis can be exploited to enhance the quality of care and patient care experience. However, this paper does not attempt to advocate an alternative method of validation of ophthalmic questionnaires, and our future work will investigate this aspect of ophthalmic questionnaire development.
The remaining part of this paper is organized as follows. Section 2 briefly presents the Rasch model and highlights the key mathematical features and their meaning. Then, a brief overview and illustration of Rasch analysis for dichotomous response data is provided. Section 3 presents an application of Rasch analysis on data from an ophthalmic questionnaire as an effective decision support tool for a post-operative follow-up of patients, at both population and individual level. The overarching aim of the process is to improve our understanding of how patients' responses to the questionnaire evolve over time, which ultimately should provide the opportunity to improve the patient care experience. Finally, Section 4 concludes the paper and highlights some potential further research.

Background
In a series of seminal research works [1]-[19], Rasch introduced a probabilistic framework for analyzing the ability of pupils using a model for the items of a test, which is known as the Rasch model. This section will briefly present a basic set of assumptions and the general framework that underpins the Rasch model from its original form to its most commonly used version, implemented in most of the software packages dedicated to Rasch analysis.
The Rasch model formulation is based on a two-dimensional data matrix, denoted $U$, obtained by administering a test, which consists of n items, to m examinees or persons. Each component $u_{pi}$ of the matrix $U$ denotes the response of the examinee or the person p to the item i. The responses to the items, i.e. $u_{pi}$, can be dichotomously or polytomously scored, hence the denomination dichotomous or polytomous Rasch model, respectively. The general form of the data matrix $U$ is shown in Fig 1. The Rasch model [1], [3] owes its key desirable mathematical features to a certain number of assumptions, and the most fundamental assumptions will be described in this section.
The fundamental assumptions behind the Rasch model are:
Assumption 1 [1,31] The response of an examinee or a person p to an item i, $u_{pi}$, depends solely on the examinee's ability, characterized by the parameter $a_p$, and the difficulty of the item, characterized by the parameter $d_i$.
Basically, the main purpose of a test is to estimate the location of an individual with a certain ability, taking the test, on the line defined by the difficulty level of the different test items [31]. This is illustrated in Fig 2, where the ability of the person p is between $d_3$ and $d_4$, which represent the difficulty level of items 3 and 4, respectively. Therefore, it is expected that the person p will be able to answer correctly all the items with difficulty below his/her ability $a_p$. If the score for a correct answer to each item is 1, then the total expected score for the person p, from this test, is 3.
Assumption 2 [1,3] The ability and the difficulty characterize the person and the item, respectively, such that if an examinee p was k times as able as an examinee q then $a_p = k a_q$. Similarly, if an item i was k times as difficult as an item j, then $d_i = k d_j$. Thus, using Eq (1) in Assumption 2, Assumption 1 reduces to the following.

Assumption 3 (Unidimensionality)
The response of an examinee p to an item i, $u_{pi}$, depends solely on the ratio $a_p / d_i$, denoted $\xi_{pi}$.
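As a brief worked illustration of how Assumption 2 leads to Assumption 3, the following sketch uses only the generic symbols of Assumption 2 (examinee q, item j and factor k). If the ability and the difficulty are both scaled by the same factor k, the ratio is unchanged, so the two person-item encounters are governed by the same parameter:

```latex
\xi_{pi} = \frac{a_p}{d_i} = \frac{k\,a_q}{k\,d_j} = \frac{a_q}{d_j} = \xi_{qj},
\qquad \text{whenever } a_p = k\,a_q \text{ and } d_i = k\,d_j .
```

For instance, with the purely hypothetical values $a_q = 2$, $d_j = 4$ and $k = 3$, we have $a_p = 6$ and $d_i = 12$, and both ratios equal 0.5.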
Another key assumption behind the Rasch model is that:
Assumption 4 (Specific objectivity) [3] For any given set of items with some given difficulties and any population of examinees with some given abilities, the responses of the examinees to the items are stochastically independent.
This assumption considers that, on the one hand, the responses of examinees with the same ability to the n items in the test are independent. On the other hand, the responses of the examinees to an item with a given difficulty are independent. Thus, this assumption enables the Rasch model to treat the examinees and the items independently. However, this assumption is not always satisfied in practice.

Dichotomous Rasch model
If the responses to test items consist of only two categories then, without loss of generality, we can assume that the response of any examinee p to any item i, $u_{pi}$, can only be either 0 or 1. The dichotomous Rasch model, Eq (2), estimates the probability of any instance of response $u_{pi}$ in terms of $\hat{a}_p$, the estimated ability of the person p, and $\hat{d}_i$, the estimated difficulty of item i. Some details on the derivation of the dichotomous Rasch model as well as its mathematical properties are presented in Appendix A.
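For reference, the dichotomous Rasch model is commonly written in the logit parameterization shown below. This is the textbook form, stated under the assumption that the locations $\hat{a}_p$ and $\hat{d}_i$ are expressed in logits, i.e. as natural logarithms of the multiplicative ability and difficulty of Assumption 2; the notation used in Appendix A may differ.

```latex
P\left(u_{pi} \mid \hat{a}_p, \hat{d}_i\right)
  = \frac{\exp\left[u_{pi}\,(\hat{a}_p - \hat{d}_i)\right]}
         {1 + \exp\left(\hat{a}_p - \hat{d}_i\right)},
  \qquad u_{pi} \in \{0, 1\}.
```

In multiplicative terms this reads $P(u_{pi} = 1) = \xi_{pi} / (1 + \xi_{pi})$ with $\xi_{pi} = a_p / d_i$. In particular, when $\hat{a}_p = \hat{d}_i$ the probability of a correct response is 0.5.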
Parameter estimation and goodness of fit measures for the Rasch model
Estimating parameters of the Rasch model. There are a variety of methods which can be used to estimate the set of parameters $(\hat{a}_p, \hat{d}_i)$ of the Rasch model (2); see [32], [33] for an overview. However, the most commonly implemented methods in software packages dedicated to Rasch analysis include the joint maximum likelihood estimation (JMLE) and the marginal maximum likelihood estimation (MMLE).
The JMLE procedure assumes some initial known estimates of the parameters of the persons and items, then uses Newton-Raphson iterations to improve jointly the estimates of parameters, until a specific convergence criterion has been satisfied. This approach requires the removal of items and persons with perfect scores (i.e. all their scores are either equal to one or equal to zero for the dichotomous model).
The MMLE approach assumes a known distribution of the persons' parameters, which is used to estimate the items' parameters. In contrast with the JMLE approach, MMLE enables estimation of the parameters of items and persons with all scores equal to one or zero. However, the reliability of the parameters estimated using the MMLE approach depends upon the relevance of the assumed distribution of the person parameters. Hence, the MMLE approach could be prone to greater bias compared to the JMLE approach.
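The JMLE procedure described above can be sketched in a few lines of Python. This is a minimal illustration for the dichotomous model, not the implementation used in this study: the function name `jmle_dichotomous`, the log-odds starting values, the step clipping and the convergence tolerance are all assumptions made for the example, and the small response matrix is invented.

```python
import numpy as np

def jmle_dichotomous(U, max_iter=500, tol=1e-8):
    """Sketch of JMLE for a 0/1 response matrix U (persons x items).

    Returns person locations a and item locations d (in logits),
    with the item locations centred at zero.
    """
    U = np.asarray(U, dtype=float)
    m, n = U.shape
    r_p = U.sum(axis=1)  # raw score of each person
    r_i = U.sum(axis=0)  # raw score of each item
    # JMLE cannot handle perfect scores, so they must be removed beforehand.
    if np.any((r_p == 0) | (r_p == n)) or np.any((r_i == 0) | (r_i == m)):
        raise ValueError("remove persons/items with perfect scores first")

    # Assumed starting values: log-odds of the raw scores.
    a = np.log(r_p / (n - r_p))
    d = np.log((m - r_i) / r_i)
    d -= d.mean()

    for _ in range(max_iter):
        # Newton-Raphson step for the person locations.
        P = 1.0 / (1.0 + np.exp(-(a[:, None] - d[None, :])))
        W = P * (1.0 - P)
        step_a = np.clip((r_p - P.sum(axis=1)) / W.sum(axis=1), -1.0, 1.0)
        a_new = a + step_a

        # Newton-Raphson step for the item locations, using the updated persons.
        P = 1.0 / (1.0 + np.exp(-(a_new[:, None] - d[None, :])))
        W = P * (1.0 - P)
        step_d = np.clip((r_i - P.sum(axis=0)) / W.sum(axis=0), -1.0, 1.0)
        d_new = d - step_d
        d_new -= d_new.mean()  # fix the origin of the logit scale

        change = max(np.abs(a_new - a).max(), np.abs(d_new - d).max())
        a, d = a_new, d_new
        if change < tol:
            break
    return a, d

# Invented example: 6 persons, 5 items, no perfect scores.
U_example = np.array([
    [1, 1, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
])
a_hat, d_hat = jmle_dichotomous(U_example)
```

At convergence the expected raw scores implied by the estimates match the observed raw scores, which is the defining property of the maximum likelihood solution for this model.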
Measuring goodness of fit for the Rasch model. The most commonly used goodness of fit measure for the Rasch model, i.e. how well the observed data fit the model, is to test the normality of residuals. Each residual represents a piece of information not covered by the model, and large residuals raise doubts about the match between the model and data [31], [34].
In Rasch analysis, the goodness of fit measures, also called misfit statistics, consist of the infit and outfit test statistics, which are based on the standardized residuals. The outfit statistic, also referred to as the outlier-sensitive fit statistic, is a measure that is sensitive to unexpected observations by persons on items that are very easy or very hard for them, and vice-versa. The infit statistic, also referred to as the inlier-pattern-sensitive fit statistic, is a measure that is sensitive to unexpected patterns of observations by persons on items that are targeted for them, and vice-versa. The most commonly used misfit statistics for Rasch analysis are the mean-squares misfit statistics and the z-standardized misfit statistics. Some details on the derivation of these statistics are provided in Appendix B.
Mean-squares fit statistics (Outfit MNSQ and Infit MNSQ) describe the level of randomness in the response data, and their expected values are 1. Values of mean-squares fit statistics which are very low compared to 1 indicate a high degree of predictability of responses to the items by the model, i.e. the model overfits the data. On the other hand, values of mean-squares fit statistics which are very high compared to 1 indicate a high degree of unpredictability of responses to the items by the model, i.e. the model provides a distorted representation of the data. A general guideline is that values of mean-squares fit statistics greater than 1.5 suggest a deviation of the model from the unidimensionality assumption within the data. The value 1.5 is rather a rough approximation of the z-score for an area of 0.95 (or 95%) for the cumulative function of the standard normal distribution, which is about 1.64. This means that 95% of the values of mean-squares fit statistics are generally below the threshold of 1.5 (or, to be more accurate, 1.64). On the other hand, values of mean-squares fit statistics less than 0.5 suggest an overfitting of the model.
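To make the mean-squares statistics concrete, the sketch below computes Outfit and Infit MNSQ for a dichotomous response matrix using the conventional definitions: outfit is the plain average of the squared standardized residuals, and infit is the same average weighted by the model variance. The function name `mnsq_fit` is hypothetical, and the derivation in Appendix B may be organised differently.

```python
import numpy as np

def mnsq_fit(U, a, d):
    """Outfit and Infit mean-squares for persons and items (dichotomous model).

    U : 0/1 response matrix (persons x items)
    a : person locations in logits, d : item locations in logits
    """
    U = np.asarray(U, dtype=float)
    a = np.asarray(a, dtype=float)
    d = np.asarray(d, dtype=float)
    P = 1.0 / (1.0 + np.exp(-(a[:, None] - d[None, :])))  # expected response
    W = P * (1.0 - P)                                      # model variance
    sq_resid = (U - P) ** 2                                # squared residuals
    z2 = sq_resid / W                                      # squared standardized residuals
    return {
        "person_outfit": z2.mean(axis=1),
        "person_infit": sq_resid.sum(axis=1) / W.sum(axis=1),
        "item_outfit": z2.mean(axis=0),
        "item_infit": sq_resid.sum(axis=0) / W.sum(axis=0),
    }

# One person located 2 logits above two equally difficult items, who
# answers one item correctly and, unexpectedly, fails the other.
fit = mnsq_fit([[1, 0]], a=[2.0], d=[0.0, 0.0])
```

In this small example the unexpected failure dominates the person's statistics, so both mean-squares are well above the guideline value of 1.5 discussed above.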
The z-standardized misfit statistics describe the improbability of the model to fit the data, and their expected values are 0. The values of z-standardized misfit statistics which are very low compared to 0 (less than -1.9) indicate an overfitting of the model; on the other hand, the values of z-standardized misfit statistics which are very high compared to 0 (greater than 1.9) indicate that the model is less likely to fit the data. The z-standardized misfit statistics are generally used when the mean-squares statistics fail.

Polytomous Rasch model
When the item response data have more than two response options, a generalized version of the Rasch model known as the polytomous Rasch model is used. The polytomous Rasch model inherits most of the properties of the dichotomous Rasch model. The main difference between these two models lies in the introduction of the concept of thresholds in the polytomous version. These thresholds play an important role for a polytomous model since they enable the identification of critical points along the latent trait continuum. Furthermore, for a polytomous model, each item response category has a unique probability distribution associated with it, and at a threshold the relative probabilities of two adjacent item response categories are equal.
There are two types of polytomous Rasch models commonly used in the literature: the Rating Scale Model (RSM) [35], in which the threshold estimate for a given response category is identical for all the items, and the Partial Credit Model (PCM) [36], in which the estimates of the thresholds can vary across the items and response categories. The PCM can be viewed as a generalization of the RSM.
The RSM, Eq (3), is formulated with the following notation: p = 1, . . ., m denotes an examinee's index, i = 1, . . ., n denotes an item's index, η = 0, . . ., k denotes an item response category, and t = 0, . . ., k − 1 denotes a threshold's index; the parameter $\hat{h}_t$ denotes the common threshold associated with all the items for the category response t, whereas the rest of the parameters are identical to those defined for the dichotomous model in the previous sections.
In the PCM formulation, $\hat{d}_{it}$ denotes the joint item difficulty and threshold parameter, while the remaining notations are identical to those in the RSM model (3).
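For orientation, one common way of writing the two polytomous models with the notation above is sketched below. The indexing convention is an assumption made here: the threshold $\hat{h}_t$ is taken to separate categories t and t + 1, and a sum with no terms (for category 0) is taken to be zero.

```latex
% Rating Scale Model: thresholds shared by all items
P\left(u_{pi} = \eta\right)
  = \frac{\exp\left[\sum_{t=0}^{\eta-1}\left(\hat{a}_p - \hat{d}_i - \hat{h}_t\right)\right]}
         {\sum_{l=0}^{k}\exp\left[\sum_{t=0}^{l-1}\left(\hat{a}_p - \hat{d}_i - \hat{h}_t\right)\right]},
  \qquad \eta = 0, \dots, k.

% Partial Credit Model: item-specific thresholds
P\left(u_{pi} = \eta\right)
  = \frac{\exp\left[\sum_{t=0}^{\eta-1}\left(\hat{a}_p - \hat{d}_{it}\right)\right]}
         {\sum_{l=0}^{k}\exp\left[\sum_{t=0}^{l-1}\left(\hat{a}_p - \hat{d}_{it}\right)\right]},
  \qquad \eta = 0, \dots, k.
```

Under this convention the RSM is recovered from the PCM by setting $\hat{d}_{it} = \hat{d}_i + \hat{h}_t$, and the ratio of the probabilities of two adjacent categories, $P(u_{pi} = \eta + 1) / P(u_{pi} = \eta) = \exp(\hat{a}_p - \hat{d}_i - \hat{h}_\eta)$, equals one exactly at the threshold, in line with the property stated above.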

A brief overview and illustration of Rasch analysis for dichotomous response data
Step 1. This step uses the data from the response matrix to estimate the initial values of the difficulty and ability parameters for each item and person, respectively.
Step 2. This step uses the initial parameter estimates, from Step 1, to obtain some optimal estimates of the difficulty of items and the ability of persons; the most commonly used techniques to achieve this include the joint maximum likelihood estimation (JMLE) and the marginal maximum likelihood estimation (MMLE).
Step 3. This step consists of the identification of items and persons with unexpected response patterns using goodness of fit measures, e.g. mean-square or z-standardized misfit statistics.
Steps 1 and 2 are often combined into a single step known as the calibration step, whereas the last step is generally termed the fit analysis.
Illustration of Rasch analysis to assess a LogMAR chart for visual acuity testing. In order to illustrate the aforementioned steps, we consider the following data matrix for dichotomous responses, where 10 patients undergo a visual acuity test using the 9-item LogMAR chart depicted in Fig 3. In the response data matrix, a score of 1 corresponds to a correct answer to an item by a patient (i.e. at least 3 correct answers are given in a row of the chart), whereas a score of 0 corresponds to an incorrect answer (i.e. at most 2 correct answers are given in a row of the chart). In this situation, the concepts of person ability and item difficulty in the Rasch model correspond to the patient's location (in logit) in terms of visual acuity, and the item's location (in logit) in terms of difficulty to read. The higher the location of a patient (respectively, an item), the higher the visual acuity of the patient (respectively, the difficulty to read the item).
Remark 1 Due to the following conditions, Rasch analysis can be an appropriate approach for the assessment of a LogMAR chart for visual acuity testing:

1. for a LogMAR chart, the responses to items are sufficiently similar between patients with similar visual function;
2. the responses to the items are expected to follow specific patterns according to the patient's location, in terms of visual acuity, and the item's location, in terms of difficulty to read; for instance, a patient with a given location is expected to read correctly most of the items with lower locations, and the misfit statistics (e.g. Outfit and Infit MNSQ) enable the identification of any unexpected response patterns from a patient and for an item;
3. the scenario complies with the most fundamental assumptions behind the Rasch model (namely, Assumptions 1, 2, 3, and 4).
Step 1: Estimation of the initial locations for items and patients. In this step, the following rows and columns in Table 1 are calculated: columns $r_p$, $m_p$, $\hat{a}^0_p$ and the corresponding rows for the items ($r_i$, ...), where $\hat{a}^0_p$ and $\hat{d}^0_i$ are the initial estimates of the locations for patient p and item i, respectively, and $\hat{d}^{0(\mathrm{Adj})}_i$ is the adjusted initial location for item i, chosen so that the mean of the adjusted locations for the items is zero.
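A possible way of computing the quantities of Step 1 is sketched below. It rests on assumptions that are not confirmed by the text: that $r$ denotes a raw score, that $m$ denotes the corresponding proportion of correct answers, and that the initial locations are the log-odds of these proportions, which is a common choice of starting values. The function name `initial_locations` and the small response matrix are invented for the example.

```python
import numpy as np

def initial_locations(U):
    """Assumed Step 1 quantities for a 0/1 response matrix U (patients x items)."""
    U = np.asarray(U, dtype=float)
    n_patients, n_items = U.shape
    r_p = U.sum(axis=1)        # raw score per patient (assumed meaning of r_p)
    m_p = r_p / n_items        # proportion correct per patient (assumed meaning of m_p)
    r_i = U.sum(axis=0)        # raw score per item
    m_i = r_i / n_patients     # proportion correct per item
    a0 = np.log(m_p / (1.0 - m_p))   # initial patient location: log-odds of success
    d0 = np.log((1.0 - m_i) / m_i)   # initial item location: log-odds of failure
    d0_adj = d0 - d0.mean()          # adjusted so that the item locations average zero
    return a0, d0, d0_adj

# Invented example with no perfect scores (these would give infinite log-odds).
a0, d0, d0_adj = initial_locations([
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
])
```

Patients or items with perfect scores would produce infinite log-odds here, which is consistent with the requirement that they be removed before JMLE is applied.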
Step 2: Estimation of the optimal locations for items and patients. In this step, the initial estimates of the visual acuity location of each patient, $\hat{a}^0_p$, and the initial adjusted estimates of the difficulty location of each item, $\hat{d}^{0(\mathrm{Adj})}_i$, are improved by maximizing the likelihood of the response of each patient to each item to obtain the optimal patient's visual acuity location ($\hat{a}^{*}_p$) and item difficulty location ($\hat{d}^{*}_i$). The joint maximum likelihood estimation (JMLE) was used to obtain the optimal parameters $\hat{a}^{*}_p$ and $\hat{d}^{*}_i$.
Step 3: Identification of items and patients with unexpected response patterns. From the optimal locations results for items and patients, presented in Tables 2 and 3 and depicted in Fig 4, the following observations can be drawn for this cohort of patients.
Table 3. Estimates of patient location (in logit), in terms of visual acuity, and the corresponding standard error, mean square (MNSQ) infits and outfits. The fit statistics, highlighted in bold, are those exceeding the threshold of 1.5. (Column headings: Patient ID; Patient location (in logit); Standard Error; Outfits MNSQ.)

• The most difficult item to read was Item 9, followed by Item 7 and Item 8, respectively, whereas Items 2, 3, 4 were the easiest to read, followed by Items 1, 5, 6. Furthermore, since Items 2, 3, 4 have the same values for the location estimates, the model suggested that these three items have the same degree of difficulty for this cohort of patients. Likewise, Items 1, 5, 6 have the same degree of difficulty for this cohort of patients.
• The patient with the highest visual acuity within the cohort was Patient 2, followed by Patients 1, 3, 4, 5, Patient 6 and Patient 7, respectively, whereas Patients 9 and 10 had the lowest visual acuity, followed by Patient 8. Although Patient 2 did not answer Item 8 correctly, he/she is most likely to respond correctly to all the items, thus his/her location is higher than the location of the hardest item (Item 9). On the other hand, although Patients 9 and 10 responded correctly to two items, the erratic patterns in their responses suggest that they are less likely to answer correctly any of the items on the chart. Thus, the estimates of their locations are lower than the location of the easiest item to read.
• The relatively high Outfit MNSQ value, compared to 1, for Item 9 reflected the outlying response pattern for this item. In fact, only the patient with the highest visual acuity (Patient 2) and one of the patients with the lowest visual acuity (namely Patient 10) responded correctly to this item. This is a rather unexpected response pattern for Item 9.
• The relatively high Outfit and Infit MNSQ values, compared to 1, for Patient 10, highlighted the outlying patterns of his/her responses. Indeed, this patient answered correctly only one relatively easy item (Item 6) and the hardest item (Item 9), which is unexpected.
• The relatively high Outfit MNSQ value, compared to 1, for Patient 1, indicated that this patient only failed the hardest item (Item 9) and a relatively easy item (Item 1). The latter wrong response is rather unexpected.

Application of Rasch analysis to assess data from an ophthalmic PRO instrument
Test-based ophthalmic instruments, such as visual acuity tests using Snellen or LogMAR charts, where the responses to the items are sufficiently similar among patients and expected to follow specific patterns, comply with the main assumptions behind the Rasch model. Therefore, the model can be used to assess whether these instruments are appropriate for their purpose. In such cases, serious item misfit generally indicates an unanticipated problem which may be attributed to the quality of the items. However, in the context of ophthalmic questionnaires, the unidimensionality assumption of the responses to the items is not always satisfied and, as a consequence, some of the major assumptions of the Rasch model, namely Assumptions 3 and 4 presented in Section 2, do not always fully hold. Due to the nature of the responses, which encompass any potential underlying multi-dimensional structure among the patients, the misfit statistics may be interpreted differently. For instance, a consistent difference in response propensity introduced by various respondents' characteristics such as lifestyle, age and gender may contribute significantly to item and/or person misfits.
The currently most advocated practice for validating ophthalmic PRO questionnaires is either to collapse some item response categories or to drop items or questions which misfit the Rasch model [4], [10], [26]. If for any reason all the items misfit the model, or some estimation problems are encountered during the process, then the entire questionnaire is dismissed [9]-[18]. However, even for tests based on items where responses are sufficiently similar between patients, it is well recognized that, in order to maintain quality control, continuous monitoring of items and patient responses is required [22].
The main objective of this study is to introduce an application of Rasch analysis which is specific to the cohort under investigation, as an alternative to the current misuse of the method to dismiss [9]-[18] or approve [9], [14], [21]-[30] a questionnaire based on the misfit statistics of data from a single and potentially non-representative cohort of patients, occasionally with a relatively small sample size, e.g. [10]-[12], [16], [20], [21].
In this section, we will present a case study to illustrate how the proposed approach enables the use of Rasch analysis as a decision support tool for post-operative patient follow-up, in order to improve patient care experience.

The PRO instrument
The PRO instrument used for this study is a previously developed Quality of Vision (QoV) questionnaire [7], from which only the bothersome scale was used in order to reduce the number of questions. The questionnaire obtained information on the presence of various dysphotopsias and visual disturbances that a patient may experience, and the annoyance of each side effect to the patient. Patients reported the degree of annoyance of the nine vision-related symptoms presented in Table 4. The choice of these nine symptoms was motivated by their substantive representativeness of QoV. This QoV questionnaire uses pictures to further aid understanding of the dysphotopsias or visual disturbances being questioned. A sample of the pictures used is provided in Supporting Information (S1 Fig). In addition to the original questionnaire, a linear 0-10 scale was incorporated to capture each patient's own view of their overall QoV, in order to gain a better understanding of post-operative satisfaction.

Participants
The participants consist of a cohort of 481 patients who had implantation of multifocal intraocular lenses (IOLs) from Cathedral Eye Clinic, Belfast. Patients were thoroughly assessed and informed of the risks of the procedure and all patients gave their informed consent for their anonymized data to be used for research purposes. The patients received multifocal IOLs following either refractive lens exchange (RLE) or cataract extraction surgery. Full ophthalmologic examination was performed on each patient approximately one month and one year post-operatively following the implantation of the IOLs. In each case the QoV questionnaire was completed with an optometrist to ensure understanding of the questions.
The summary statistics of the patients are presented in Table 5. Among these 481 patients, 125 and 160 declared not suffering at all from any of the nine symptoms, one month and one year post-operatively, respectively. Therefore, these patients have been discarded from the analysis so that the JMLE method operates properly.
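The exclusion described above can be sketched as a simple filter on the response matrix. The coding is hypothetical: responses are assumed to be integers from 0 ("Not at all") up to a maximum category, and the function name `drop_extreme_persons` is invented. As an additional assumption, the sketch also removes patients who gave the maximum response to every question, since such extreme scores are equally problematic for JMLE.

```python
import numpy as np

def drop_extreme_persons(U, max_category):
    """Remove patients whose total score is the minimum or maximum possible.

    U            : integer response matrix (patients x symptoms), coded 0..max_category
    max_category : highest response code (hypothetical coding)
    Returns the filtered matrix and the boolean mask of retained patients.
    """
    U = np.asarray(U)
    totals = U.sum(axis=1)
    max_total = max_category * U.shape[1]
    keep = (totals > 0) & (totals < max_total)
    return U[keep], keep

# Invented responses of four patients to three symptoms, coded 0..3.
responses = np.array([
    [0, 0, 0],   # no symptoms at all: excluded
    [1, 0, 2],
    [3, 3, 3],   # maximum annoyance on every symptom: excluded
    [0, 1, 0],
])
kept, mask = drop_extreme_persons(responses, max_category=3)
```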

Contextualization of the Rasch model
In order to properly interpret the outputs of the Rasch model, we need to establish the meaning of the terminology used in Rasch analysis within the context of the ophthalmic questionnaire of interest. In this context,
1. the ability parameter $\hat{a}_p$, associated with an examinee p in the Rasch model, corresponds to the location (in logit), in terms of perception of visual discomfort, for the patient p; the lower the value of this parameter the lower the perception of visual discomfort, whereas the higher the value of the parameter the higher the perception of visual discomfort.
2. the difficulty parameter $\hat{d}_i$, associated with an item i in the Rasch model, corresponds to the location (in logit), in terms of "non-prevalence" within the cohort, for the symptom i; the lower the value of this parameter the higher the proportion of patients within the cohort affected by the symptom, whereas the higher the value of the parameter the lower the proportion of patients affected by the symptom.
3. the probability for a patient p to give a response category η to the question associated with symptom i, given his/her location in terms of perception of visual discomfort, $\hat{a}_p$, and the location of the symptom in terms of its prevalence within the cohort, $\hat{d}_i$, is given by the RSM (3), where the parameter $\hat{h}_t$ denotes the common threshold associated with all the items for the category response t.
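The response-category probabilities in item 3 can be illustrated numerically with the short sketch below. It assumes the usual Rating Scale Model form, with thresholds indexed so that threshold t separates categories t and t + 1; the function name `rsm_category_probabilities` and all numerical values are hypothetical.

```python
import numpy as np

def rsm_category_probabilities(a_p, d_i, h):
    """Probabilities of the k + 1 response categories under the Rating Scale Model.

    a_p : patient location (logit), d_i : symptom location (logit)
    h   : the k common thresholds h_0, ..., h_{k-1} (assumed indexing)
    """
    h = np.asarray(h, dtype=float)
    steps = a_p - d_i - h                                  # one term per threshold
    log_num = np.concatenate(([0.0], np.cumsum(steps)))    # category 0 has an empty sum
    log_num -= log_num.max()                               # guard against overflow
    num = np.exp(log_num)
    return num / num.sum()

# Hypothetical patient and symptom with three thresholds (four categories).
probs = rsm_category_probabilities(a_p=0.5, d_i=1.5, h=[-1.0, 0.0, 1.0])
```

In this example the patient's location relative to the symptom, 0.5 − 1.5 = −1, coincides with the first threshold, so the first two response categories are equally likely, and a patient with a higher location would shift probability towards the higher categories.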
The calibration step of Rasch analysis enables the researcher to compare and contrast different populations at a glance and to determine whether the examined items hold the same weight or relevance within the particular cohort. An example might be how car drivers are affected by glare compared to non-drivers: a different positioning of the items might highlight the differential importance of glare in these two populations of patients. The fit analysis, however, would help to quickly highlight individuals who potentially have ocular problems, including higher levels of astigmatism or macular problems such as cystoid macular oedema (CMO) or age-related macular degeneration (AMD), which might produce values of the misfit statistics deviating significantly from the expected values for the Rasch model.

Results and discussion
The different types of response categories for the questions, described in Table 4, suggest a polytomous Rasch model as the most appropriate option. The Rating Scale Model (RSM) [35] was used to analyse the questionnaire data, and the parameters of the model were estimated by means of the joint maximum likelihood estimation (JMLE) method, implemented using the MATLAB software [37].
The objective of the analysis was not to select only symptoms which fit the Rasch model but to ensure that most of the symptoms affecting the QoV, in general, are covered as suggested by Messick [23]. Furthermore, the interpretation of the outputs of the model is specific to the data of the response matrix under investigation.
Analysis of the questionnaire data collected one month post-operatively. From the estimates of the symptoms' locations in Table 6 and depicted in Fig 5, the most prevalent symptom within the cohort was Starbursts (ST), followed by Glare (GL), Blurred vision (BV), Haloes and Fluctuation (HL, FL), Hazy vision (HV) and Double images (DI), respectively, whereas the cohort under investigation was barely affected by Difficulty in depth perception (DDP) and Distortion (DS). These results are corroborated by the Item Characteristic Curves (ICCs) depicted in Fig 6, where the ICC for the response category "Not at all" dominates nearly all the ICCs for the other response categories for Distortion, and the ICCs of the response categories "Not at all" and "A little" dominate all the ICCs for the other response categories for Difficulty in depth perception. Furthermore, the ICCs suggest that the response category "Quite" is the least reported by this cohort of patients.
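To make the notion of one response category "dominating" the others concrete, the following Python sketch evaluates rating scale model category probabilities along the patient location axis. It is a minimal illustration, not the Matlab implementation used in this study: the function name `rsm_category_probs`, the category coding 0 to 3, and the threshold values are all hypothetical.

```python
import numpy as np

def rsm_category_probs(a_p, d_i, thresholds):
    """Category probabilities under the rating scale model (standard Andrich form).

    a_p: patient location (logit); d_i: symptom location (logit);
    thresholds: K common thresholds h_1..h_K shared by all items.
    Returns K + 1 probabilities for the categories 0..K.
    """
    h = np.asarray(thresholds, dtype=float)
    steps = a_p - d_i - h
    # Category 0 corresponds to an empty sum, i.e. a log-numerator of 0.
    log_num = np.concatenate(([0.0], np.cumsum(steps)))
    log_num -= log_num.max()  # guard against overflow in exp
    num = np.exp(log_num)
    return num / num.sum()

# Hypothetical thresholds, for illustration only (not estimates from this study).
thresholds = [-1.0, 0.0, 1.0]
labels = ["Not at all", "A little", "Quite", "Very"]

for location in (-3.0, 0.0, 3.0):
    probs = rsm_category_probs(location, 0.0, thresholds)
    print(location, labels[int(np.argmax(probs))], np.round(probs, 3))
```

Plotting these probabilities against the patient location for a fixed symptom gives one curve per response category; a category that is never the most probable one over the range of observed locations is "not expressed" in the sense used in this section.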
The relatively high Outfit and/or Infit MNSQ values, compared to 1, for Groups 2, 10, 16 and 17 indicated that most of the patients in these groups were annoyed by both the most and the least prevalent symptoms but not by some of the other symptoms. However, this did not make these patients outliers.
The patients from this cohort who experienced most discomfort with their vision, and thus require additional care and monitoring, were those with higher location estimates. The top 10 patients, within the cohort, who experienced most discomfort with their vision are those in the rows highlighted in grey in Table 7, i.e. from Groups 12 to 17. From the questionnaire responses for these patients presented in Table 8, most of them reported significant discomfort from Glare (GL), Haloes (HL) and Starbursts (ST) but less from Distortion (DS) and Double images (DI) and to a certain extent Difficulty in depth perception (DDP). However, for the other symptoms their perception of visual discomfort is quite mixed.
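The selection of the patients with the highest location estimates, and the comparison of that selection between two assessments, can be sketched as follows. This is an illustrative Python fragment only: the helper `top_k_patients`, the patient IDs and the location values are hypothetical and are not taken from Tables 7 to 11.

```python
import numpy as np

def top_k_patients(patient_ids, locations, k=10):
    """Return the k (patient ID, location) pairs with the highest location estimates."""
    ids = np.asarray(patient_ids)
    loc = np.asarray(locations, dtype=float)
    # Sort in descending order of location; a stable sort keeps ties in input order.
    order = np.argsort(-loc, kind="stable")[:k]
    return [(int(ids[j]), float(loc[j])) for j in order]

# Hypothetical location estimates (logit) for five patients at two assessments.
ids = [101, 102, 103, 104, 105]
one_month = [-1.2, 0.4, 2.1, -0.3, 1.5]
one_year = [0.9, -0.8, -1.1, 2.4, 1.7]

top_month = {pid for pid, _ in top_k_patients(ids, one_month, k=2)}
top_year = {pid for pid, _ in top_k_patients(ids, one_year, k=2)}

# Patients who are among the most annoyed at both assessments.
print(top_month & top_year)
```

An empty intersection would correspond to the situation where the most annoyed patients differ completely between the two assessments.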
Analysis of the questionnaire data collected one year post-operatively. From the estimates of the symptoms' locations in Table 9 and depicted in Fig 7, the most prevalent symptom within the cohort was Glare (GL), followed by Starbursts (ST), Fluctuation (FL), Haloes (HL) [...]. The ICCs are depicted in Fig 8, where the ICC for the response category "Not at all" dominates nearly all the ICCs for the other response categories for Distortion. Moreover, the ICCs suggest that the response category "Quite" was barely reported by the patients this time round. However, this does not provide enough ground to dismiss this response category. Only a continuous analysis of data collected from various cohorts of patients might enable the confirmation of an excessive subscaling of the response options, if any.
The relatively high Infit MNSQ values, compared to 1, for the symptom "Distortion" indicated that this symptom affected patients who were the most and the least annoyed with their vision, but this did not make this symptom irrelevant. The relatively high Outfit and/or Infit MNSQ values, compared to 1, for Groups 4, 9, 11, 12, 15 and 16 indicated that most of the patients in these groups were annoyed by both the most and the least prevalent symptoms but not by some of the symptoms in between. However, this did not make these patients outliers.

Table 7. Patients' location estimates (in logit), in terms of their perception of visual discomfort, and the corresponding standard errors, infit MNSQ and outfit MNSQ values, obtained from QoV questionnaire data collected one month post-operatively. The patient IDs, highlighted in bold, correspond to the top 10 patients with the most visual discomfort, one year post-operatively.

One year post-operatively, the top 10 patients who were most annoyed with their vision are those in the rows highlighted in grey in Table 10, i.e. from Groups 15 to 18. From the questionnaire responses, presented in Table 11, most of them reported significant discomfort from Glare (GL), Haloes (HL), Starbursts (ST), Blurred vision (BV) and Hazy vision (HV), but not from Distortion (DS). Their perception of visual discomfort from the other symptoms is mixed.
The distribution of the locations of patients (in logit), in Fig 10(a), showed little overall variation in the level of perception of visual discomfort within the cohort between one month and one year post-operatively. However, the results point out that Patient 263 (Group 18) was significantly annoyed with his/her vision one year post-operatively, which was not the case one month post-operatively. From the distribution of patients per group, in Fig 10(b), there was a relative increase in the fractions of patients who experienced both less and more visual discomfort one year post-operatively compared to eleven months earlier.
Table 10. Patients' location estimates (in logit), in terms of their perception of visual discomfort, and the corresponding standard errors, infit MNSQ and outfit MNSQ values, obtained from QoV questionnaire data collected one year post-operatively. The patient IDs, highlighted in bold, correspond to the top 10 patients with the most visual discomfort, one month post-operatively.

[Table columns: Standard Error; Outfit MNSQ; Infit MNSQ; Percentage of patients per group. Group 1 patient IDs: 1, 3, 10, 15, 16, 20, 22, 38, 48, 54, 57, 59, 61, 64, 65, 70, 77, 101, 103, 111, 116, 117, 124, 127, 128, 129, 130, 134, 135, 142, 146, 148, 150, 153, 158, 159, 166, 167, 171, 173, 181, 182, 188, 190, 192, 196]
The cohort of the top 10 patients who were most annoyed with their vision one month post-operatively (Table 8) is entirely different from the cohort of the top 10 who experienced most discomfort one year post-operatively (Table 11). The top 10 patients who were most annoyed with their vision one month post-operatively, highlighted in bold in Table 10, have shown a significant improvement in their perception of visual discomfort. On the other hand, the top 10 patients who were most annoyed with their vision one year post-operatively, highlighted in bold in Table 7, were generally mildly annoyed with their vision one month post-operatively. The location distribution results, depicted in Fig 11, showed that the level of perception of visual discomfort from the top 10 patients is substantially higher one year post-operatively compared to one month post-operatively.
Remark 3. Following the approach advocated in previous studies, e.g. [10]- [6], [29]- [7], [8]- [30], which use the misfit statistics of the items and the Item Characteristic Curves (ICCs) to dismiss [9]- [18] or approve [9], [14], [21]- [30] an ophthalmic questionnaire, the following conclusions can be drawn on the QoV questionnaire used in this study: one month post-operatively, the values of the Outfit MNSQ and Infit MNSQ statistics, for all the symptoms, were below the 1.5 threshold and were not far from the expected value of 1; all the response categories were expressed in the ICCs of most of the symptoms, except for Double images (DI), Distortion (DS) and Difficulty in depth perception (DDP); hence, the QoV questionnaire is "Rasch-valid" after the removal of the aforementioned three symptoms.
On the other hand, the same questionnaire, administered to the same cohort of patients, becomes "Rasch-invalid" eleven months later, i.e. one year post-operatively, since the response category "Quite" was no longer expressed in the ICCs of all the symptoms and the values of the Infit MNSQ statistics for the symptoms Double images (DI) and Distortion (DS) exceeded the 1.5 threshold.

Alternative approach in the application of Rasch model to assess ophthalmic PROs data
This paper presents Rasch analysis as an intelligent decision support system for deriving valuable insights from data collected via ophthalmic questionnaires. At the population level, such an approach enables one to investigate the prevalence of ophthalmic symptoms across different cohorts of patients, through a better characterization of patient groups pre-operatively and an appropriate follow-up post-operatively, in order to assess the effectiveness of a treatment, e.g. different types of intraocular lenses (IOLs) or different surgical procedures. At the individual level, the new approach can be applied across a population at different time points to identify patients who experienced most visual discomfort pre-operatively and/or post-operatively, so that additional appropriate care and monitoring can be dedicated to them. This new perspective will pave the way for a more adequate application of Rasch analysis within the context of ophthalmic questionnaires, so that insights gained from the analysis can be exploited to enhance the quality of care and the patient care experience.
For illustrative purposes, the new approach was used to investigate the prevalence of QoV-related symptoms across a cohort of patients at different time points. The analysis of the questionnaire data characterized the variation in the prevalence of symptoms from one month to one year post-operatively, and identified the patients who experienced the most visual discomfort at these two time points and who could therefore receive additional care and monitoring.
The purpose of this paper was not to advocate an alternative validation method for ophthalmic questionnaires or to supersede Rasch analysis, but to highlight the importance of continuous assessment and monitoring of questionnaire data through Rasch analysis, instead of simply dismissing or approving questionnaires based on a study of a single cohort at a given time point, and to present Rasch analysis as a decision support tool for deriving insights from data obtained using ophthalmic questionnaires. We will use the proposed alternative application of Rasch analysis to assess and compare the effectiveness of various IOLs, and to investigate the impact of patient characteristics such as lifestyle, age and gender on the perception of visual discomfort post-operatively. Our future work will also further investigate validation methods for ophthalmic questionnaires.

A Appendix A-Derivation of the dichotomous Rasch model
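If the responses to test items consist of only two categories then dichotomous item response models can be applied. Without loss of generality, we can assume that the response of any examinee p to any item i, $u_{pi}$, can only be either 0 or 1. From Assumption 3, the response of an examinee p to item i, $u_{pi}$, depends on a single parameter $\xi_{pi}$, which goes from 0 to $\infty$. Thus, the response probability for an examinee p to an item i can be defined by any continuous and monotonic function of $\xi_{pi}$ which takes on only the values from 0 to 1 as $\xi_{pi}$ goes from 0 to $\infty$. Rasch suggested [1,3] the following simple function:
$$P(u_{pi} = 1 \mid \xi_{pi}) = \frac{\xi_{pi}}{1 + \xi_{pi}}, \tag{7}$$
$$P(u_{pi} = 0 \mid \xi_{pi}) = 1 - P(u_{pi} = 1 \mid \xi_{pi}) = \frac{1}{1 + \xi_{pi}}. \tag{8}$$
Therefore, Eqs (7) and (8) can be written in a general form, as follows:
$$P(u_{pi} \mid \xi_{pi}) = \frac{\xi_{pi}^{u_{pi}}}{1 + \xi_{pi}}. \tag{9}$$
Substituting $\xi_{pi}$ by $a_p / d_i$ in Eq (9) yields
$$P(u_{pi} \mid a_p, d_i) = \frac{(a_p / d_i)^{u_{pi}}}{1 + a_p / d_i}. \tag{10}$$
However, the above formulation restricted the parameter $\xi_{pi}$ to vary from 0 to $\infty$. Since $\xi_{pi} = a_p / d_i$, this formulation restricted the ability and the difficulty parameters, $a_p$ and $d_i$, respectively, to be either both positive or both negative. It would be preferable to have a formulation where both the ability and the difficulty parameters can be used irrespective of their signs. One way to address this limitation is to consider a logarithmic transformation of both the ability and difficulty parameters, as follows:
$$\hat{a}_p = \ln a_p, \tag{11}$$
$$\hat{d}_i = \ln d_i. \tag{12}$$
Now, the rescaled ability and difficulty parameters, $\hat{a}_p$ and $\hat{d}_i$, respectively, vary from $-\infty$ to $+\infty$, and the following inverse transformation enables the recovery of the initial ability and difficulty parameters $a_p$ and $d_i$:
$$a_p = e^{\hat{a}_p}, \tag{13}$$
$$d_i = e^{\hat{d}_i}. \tag{14}$$
Substituting $a_p$ and $d_i$ by $e^{\hat{a}_p}$ and $e^{\hat{d}_i}$, respectively, in Eq (10) yields
$$P(u_{pi} \mid \hat{a}_p, \hat{d}_i) = \frac{e^{u_{pi}(\hat{a}_p - \hat{d}_i)}}{1 + e^{\hat{a}_p - \hat{d}_i}}. \tag{15}$$
Thus,
$$P(u_{pi} = 0 \mid \hat{a}_p, \hat{d}_i) = \frac{1}{1 + e^{\hat{a}_p - \hat{d}_i}}, \tag{16}$$
$$P(u_{pi} = 1 \mid \hat{a}_p, \hat{d}_i) = \frac{e^{\hat{a}_p - \hat{d}_i}}{1 + e^{\hat{a}_p - \hat{d}_i}}. \tag{17}$$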
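The equivalence between the ratio form of the model and its logit form can be checked numerically. The Python sketch below assumes that the single parameter is the ratio of ability to difficulty, $\xi_{pi} = a_p / d_i$, and that the rescaled parameters are the natural logarithms of the original ones; the function names and the numerical values are hypothetical.

```python
import math

def p_correct_ratio(a_p, d_i):
    """P(u_pi = 1) in the ratio form, with xi_pi = a_p / d_i (both positive)."""
    xi = a_p / d_i
    return xi / (1.0 + xi)

def p_correct_logit(a_hat, d_hat):
    """P(u_pi = 1) in the logit form, with a_hat = ln(a_p) and d_hat = ln(d_i)."""
    odds = math.exp(a_hat - d_hat)
    return odds / (1.0 + odds)

# Hypothetical positive ability and difficulty values.
a_p, d_i = 2.0, 0.5

ratio_form = p_correct_ratio(a_p, d_i)
logit_form = p_correct_logit(math.log(a_p), math.log(d_i))
print(ratio_form, logit_form)
```

Both calls describe the same probability; only the scale on which the parameters are expressed differs, and on the logit scale the parameters are free to take any sign.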

A.1 Some mathematical properties of the Rasch model
Linearity. The extension of the logarithmic transformation (11)- (12) to the parameter $\xi_{pi}$ leads to the following result:
$$\ln \xi_{pi} = \ln a_p - \ln d_i = \hat{a}_p - \hat{d}_i. \tag{18}$$
Hence, after the above logarithmic transformation, the response probability of an examinee p to an item i is governed by the difference between $\hat{a}_p$ and $\hat{d}_i$. In other words, the response probability depends only on the distance between the examinee's ability and the item difficulty parameters, both on the logit scale, i.e. a line similar to the one described in Fig 2. Therefore, the derived model becomes an additive model.
Separation of parameters. Assumption 3 and Assumption 4 confer on the Rasch model some desirable mathematical features, which enable the estimation of the two classes of parameters of the model, i.e. $\hat{a}_p$ and $\hat{d}_i$, from the data response matrix, independently from one another. Given the Rasch model (15) and its parameters $\hat{a}_p$ and $\hat{d}_i$, which are not known yet, and a response data matrix U, the probability of the whole response data matrix, i.e. the likelihood, denoted L, consists of the following continued product:
$$L = \prod_{p=1}^{m} \prod_{i=1}^{n} P(u_{pi} \mid \hat{a}_p, \hat{d}_i) = \prod_{p=1}^{m} \prod_{i=1}^{n} \frac{e^{u_{pi}(\hat{a}_p - \hat{d}_i)}}{1 + e^{\hat{a}_p - \hat{d}_i}}. \tag{19}$$
The most desirable parameters $\hat{a}_p$ and $\hat{d}_i$ for the Rasch model are those for which the likelihood, L, is maximal. However, obtaining these parameters from (19) can be tedious due to the complexity of the expression of the likelihood L. On the other hand, the parameters $\hat{a}_p$ and $\hat{d}_i$ which maximize L are identical to those which maximize the logarithm of L. The logarithm of L, i.e. the log-likelihood of the data matrix U, reads
$$\ln L = \sum_{p=1}^{m} s_p \hat{a}_p - \sum_{i=1}^{n} s_i \hat{d}_i - \sum_{p=1}^{m} \sum_{i=1}^{n} \ln\left(1 + e^{\hat{a}_p - \hat{d}_i}\right), \tag{20}$$
where $s_p = \sum_{i=1}^{n} u_{pi}$ and $s_i = \sum_{p=1}^{m} u_{pi}$ denote the total score of the examinee p and the item i, respectively.
In order to estimate the desirable parameters $\hat{a}_p$ and $\hat{d}_i$, we need to solve the system (21)- (22), and the corresponding solution needs to satisfy the conditions (23)- (24).
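Assuming that the system and the conditions referred to above are the first-order and second-order optimality conditions of the log-likelihood of the dichotomous model, they would take the following form, with m examinees, n items and the total scores $s_p$ and $s_i$. The mapping of these expressions onto the equation numbers cited in the text is an assumption.

```latex
\begin{aligned}
\frac{\partial \ln L}{\partial \hat{a}_p}
  &= s_p - \sum_{i=1}^{n} \frac{e^{\hat{a}_p - \hat{d}_i}}{1 + e^{\hat{a}_p - \hat{d}_i}} = 0,
  & p &= 1, \dots, m, \\
\frac{\partial \ln L}{\partial \hat{d}_i}
  &= -s_i + \sum_{p=1}^{m} \frac{e^{\hat{a}_p - \hat{d}_i}}{1 + e^{\hat{a}_p - \hat{d}_i}} = 0,
  & i &= 1, \dots, n, \\
\frac{\partial^2 \ln L}{\partial \hat{a}_p^2}
  &= -\sum_{i=1}^{n} \frac{e^{\hat{a}_p - \hat{d}_i}}{\left(1 + e^{\hat{a}_p - \hat{d}_i}\right)^2} < 0,
  & p &= 1, \dots, m, \\
\frac{\partial^2 \ln L}{\partial \hat{d}_i^2}
  &= -\sum_{p=1}^{m} \frac{e^{\hat{a}_p - \hat{d}_i}}{\left(1 + e^{\hat{a}_p - \hat{d}_i}\right)^2} < 0,
  & i &= 1, \dots, n.
\end{aligned}
```

The first two lines state that each observed total score must equal its expected value under the model; the last two confirm that a stationary point is a maximum in each parameter.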
An additional condition, namely $\sum_{i=1}^{n} \hat{d}_i = 0$, is added to the system (21) in order to have the item parameters $\hat{d}_i$ centered at zero. It is worth mentioning that the parameters obtained from (21) and (23) are not free of deficiencies. Indeed, these estimates assume that the person score $s_p$ is independent of the difficulty of the items in the test, and likewise that the item score $s_i$ is independent of the ability distribution of the persons tested. However, neither of these assumptions is generally satisfied in practice. An adjustment of the observed scores $s_p$ and $s_i$ to the corresponding item difficulty and person ability distributions is required to estimate the desirable test-free person parameters $\hat{a}_p$ and sample-free item parameters $\hat{d}_i$ [31].
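The estimation procedure described above can be sketched in a few lines of Python for the dichotomous model. This is a minimal illustration under stated assumptions, not the Matlab implementation used in this study: the function `jmle_rasch`, the alternating Newton-Raphson updates, the stopping rule and the small response matrix are all hypothetical, and no bias correction is applied to the estimates.

```python
import numpy as np

def _prob(a, d):
    """Matrix of P(u_pi = 1) for all persons (rows) and items (columns)."""
    return 1.0 / (1.0 + np.exp(-(a[:, None] - d[None, :])))

def jmle_rasch(U, n_iter=500, tol=1e-8):
    """Joint maximum likelihood estimates for the dichotomous Rasch model."""
    U = np.asarray(U, dtype=float)
    m, n = U.shape
    s_p, s_i = U.sum(axis=1), U.sum(axis=0)
    # Zero or perfect total scores have no finite estimate.
    if np.any((s_p == 0) | (s_p == n)) or np.any((s_i == 0) | (s_i == m)):
        raise ValueError("remove persons/items with extreme scores first")
    a, d = np.zeros(m), np.zeros(n)
    for _ in range(n_iter):
        # Newton-Raphson step for the person parameters, items held fixed.
        P = _prob(a, d)
        a_new = a + (s_p - P.sum(axis=1)) / (P * (1 - P)).sum(axis=1)
        # Newton-Raphson step for the item parameters, persons held fixed.
        P = _prob(a_new, d)
        d_new = d - (s_i - P.sum(axis=0)) / (P * (1 - P)).sum(axis=0)
        # Centre the item parameters at zero; shifting both sets of
        # parameters by the same amount leaves the likelihood unchanged.
        shift = d_new.mean()
        d_new, a_new = d_new - shift, a_new - shift
        change = max(np.abs(a_new - a).max(), np.abs(d_new - d).max())
        a, d = a_new, d_new
        if change < tol:
            break
    return a, d

# Hypothetical 6 persons x 4 items response matrix (no extreme scores).
U = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1]])
a_hat, d_hat = jmle_rasch(U)
print(np.round(a_hat, 3), np.round(d_hat, 3))
```

Because the total scores are sufficient statistics, persons with the same total score receive the same location estimate, and likewise for items, which is one way to sanity-check the output.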

B Appendix B-Derivation of the misfit statistics for the Rasch model
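For the dichotomous Rasch model, the response of person p to an item i, $u_{pi}$, is a variable following a Bernoulli distribution, i.e. it takes only two values, e.g. 0 and 1. The Rasch model estimates the probability of any instance of response $u_{pi}$ as
$$P(u_{pi} \mid \hat{a}_p, \hat{d}_i) = \frac{e^{u_{pi}(\hat{a}_p - \hat{d}_i)}}{1 + e^{\hat{a}_p - \hat{d}_i}},$$
where $\hat{a}_p$ is the estimated ability parameter of the person p and $\hat{d}_i$ is the estimated difficulty parameter of item i. The expected value of instances of $u_{pi}$, denoted $\hat{u}_{pi}$, is given by
$$\hat{u}_{pi} = P(u_{pi} = 1 \mid \hat{a}_p, \hat{d}_i) = \frac{e^{\hat{a}_p - \hat{d}_i}}{1 + e^{\hat{a}_p - \hat{d}_i}}.$$
The variance of instances of $u_{pi}$ is given by
$$\operatorname{Var}(u_{pi}) = \hat{u}_{pi}\,(1 - \hat{u}_{pi}). \tag{26}$$
The residual, i.e. the difference between the observed value of $u_{pi}$ and its estimated value $\hat{u}_{pi}$ obtained via the Rasch model, is given by $u_{pi} - \hat{u}_{pi}$. The standard residual, i.e. the residual divided by the expected standard deviation of instances of $u_{pi}$ obtained from (26), is given by
$$z_{pi} = \frac{u_{pi} - \hat{u}_{pi}}{\sqrt{\hat{u}_{pi}\,(1 - \hat{u}_{pi})}}.$$
The expected value of the standard residuals, denoted $\hat{z}_{pi}$, is given by
$$\hat{z}_{pi} = E(z_{pi}) = 0.$$
The variance of the standard residuals is given by
$$\operatorname{Var}(z_{pi}) = 1.$$
Therefore, the standard deviation of the standard residuals, $z_{pi}$, is 1.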
For a large response data matrix, the standard residuals approximate a standard normal distribution with a mean of 0 and a standard deviation of 1, i.e. $z_{pi} \sim N(0, 1)$; consequently, the squares of the standard residuals approximate a chi-square distribution with one degree of freedom, i.e. $z_{pi}^2 \sim \chi^2_1$. Either of the above reference distributions, i.e. $N(0, 1)$ and $\chi^2_1$, can be used to assess the significance of the deviation of the standard residuals from their expected values. On the one hand, the analysis of the standard residuals enables the identification of ill-defined items, if any, which require further refinement to be in line with reasonable expectations. On the other hand, the standard residuals enable the identification of persons, if any, whose responses deviated from reasonable expectations [31].
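The computation of the standard residuals can be sketched as follows, assuming the dichotomous model and estimates of the person and item parameters that are already available. The function name `standardized_residuals` and the numerical values are hypothetical.

```python
import numpy as np

def standardized_residuals(U, a_hat, d_hat):
    """Standard residuals z_pi for a 0/1 response matrix U (persons x items)."""
    U = np.asarray(U, dtype=float)
    a_hat = np.asarray(a_hat, dtype=float)
    d_hat = np.asarray(d_hat, dtype=float)
    # Expected response: probability of a response of 1 under the Rasch model.
    expected = 1.0 / (1.0 + np.exp(-(a_hat[:, None] - d_hat[None, :])))
    # Bernoulli variance of each response.
    variance = expected * (1.0 - expected)
    return (U - expected) / np.sqrt(variance)

# Hypothetical responses and parameter estimates (logit) for 2 persons x 2 items.
U = [[1, 0],
     [0, 1]]
z = standardized_residuals(U, a_hat=[0.0, 1.0], d_hat=[0.0, 1.0])
print(np.round(z, 3))
```

Large absolute values of $z_{pi}$ flag responses that are unexpected given the estimated person and item locations, for instance a response of 0 to an item located well below the person.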

B.1 Item misfit statistics
The infit mean square statistic for item i, denoted Infit MNSQ$_i$, is given by the following weighted sum of the mean square residuals:
$$\text{Infit MNSQ}_i = \frac{\sum_{p=1}^{m} z_{pi}^2 \operatorname{Var}(u_{pi})}{\sum_{p=1}^{m} \operatorname{Var}(u_{pi})}.$$
The outfit mean square statistic for item i, denoted Outfit MNSQ$_i$, is given by the unweighted sum of the mean square residuals:
$$\text{Outfit MNSQ}_i = \frac{\sum_{p=1}^{m} z_{pi}^2}{m}.$$
Although some of the statistical properties of the above outfit and infit statistics are not fully known, they are generally assumed, in the Rasch analysis literature, to approximate a scaled chi-squared distribution with an expected value of 1. The cube-root transformation suggested by Wilson and Hilferty [38] converts them into statistics that approximate a standard normal distribution (i.e. with a mean of 0 and a standard deviation of 1). The transformed outfit and infit statistics are referred to as the outfit z-standardized and the infit z-standardized, respectively, in the Rasch analysis literature.
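The infit z-standardized statistic for item i, denoted Infit ZSTD$_i$, is given by
$$\text{Infit ZSTD}_i = \frac{3\left(k_i^{1/3} - 1\right)}{q_i} + \frac{q_i}{3},$$
where $k_i$ = Infit MNSQ$_i$ and $q_i$ is the standard deviation of the infit mean square statistic for item i. The outfit z-standardized statistic for item i, denoted Outfit ZSTD$_i$, is given by
$$\text{Outfit ZSTD}_i = \frac{3\left(\hat{k}_i^{1/3} - 1\right)}{\hat{q}_i} + \frac{\hat{q}_i}{3},$$
where $\hat{k}_i$ = Outfit MNSQ$_i$ and $\hat{q}_i$ is the standard deviation of the outfit mean square statistic for item i.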
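The mean square and z-standardized statistics can be sketched in Python for the dichotomous model as follows. The function names `zstd` and `fit_statistics` and the numerical values are hypothetical. The text does not spell out how the standard deviations of the mean squares are obtained; the expressions used here are derived from the variance and the fourth central moment of a Bernoulli variable and should be read as an assumption.

```python
import numpy as np

def zstd(mnsq, q):
    """Cube-root (Wilson-Hilferty) standardization of a mean square statistic."""
    return 3.0 * (np.cbrt(mnsq) - 1.0) / q + q / 3.0

def fit_statistics(U, a_hat, d_hat, axis=0):
    """Infit/outfit statistics per item (axis=0) or per person (axis=1)."""
    U = np.asarray(U, dtype=float)
    a_hat = np.asarray(a_hat, dtype=float)
    d_hat = np.asarray(d_hat, dtype=float)
    P = 1.0 / (1.0 + np.exp(-(a_hat[:, None] - d_hat[None, :])))
    W = P * (1.0 - P)            # variance of each response
    C = W * (1.0 - 3.0 * W)      # fourth central moment of a Bernoulli variable
    sq_res = (U - P) ** 2        # squared residuals
    N = U.shape[axis]            # number of responses entering each statistic
    outfit = (sq_res / W).sum(axis=axis) / N
    infit = sq_res.sum(axis=axis) / W.sum(axis=axis)
    # Assumed standard deviations of the outfit and infit mean squares.
    q_out = np.sqrt((C / W**2).sum(axis=axis) / N**2 - 1.0 / N)
    q_in = np.sqrt((C - W**2).sum(axis=axis)) / W.sum(axis=axis)
    return {"infit_mnsq": infit, "outfit_mnsq": outfit,
            "infit_zstd": zstd(infit, q_in), "outfit_zstd": zstd(outfit, q_out)}

# Hypothetical responses (6 persons x 4 items) and parameter estimates (logit).
U = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0],
              [0, 1, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]])
a_hat = np.array([-1.0, 0.1, 1.0, 0.1, 1.0, 0.1])
d_hat = np.array([-0.5, -0.5, 0.2, 0.8])

item_fit = fit_statistics(U, a_hat, d_hat, axis=0)
person_fit = fit_statistics(U, a_hat, d_hat, axis=1)
print({name: np.round(values, 2) for name, values in item_fit.items()})
```

Calling the same function with `axis=1` sums over items instead of persons and therefore yields the corresponding person statistics.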

B.2 Person misfit statistics
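As for items, the mean square misfit statistics for a person p are given by:
$$\text{Infit MNSQ}_p = \frac{\sum_{i=1}^{n} z_{pi}^2 \operatorname{Var}(u_{pi})}{\sum_{i=1}^{n} \operatorname{Var}(u_{pi})},$$
$$\text{Outfit MNSQ}_p = \frac{\sum_{i=1}^{n} z_{pi}^2}{n}. \tag{31}$$
The z-standardized misfit statistics for a person p are given by:
$$\text{Infit ZSTD}_p = \frac{3\left(k_p^{1/3} - 1\right)}{q_p} + \frac{q_p}{3},$$
$$\text{Outfit ZSTD}_p = \frac{3\left(\hat{k}_p^{1/3} - 1\right)}{\hat{q}_p} + \frac{\hat{q}_p}{3},$$
with $k_p$ = Infit MNSQ$_p$ and $\hat{k}_p$ = Outfit MNSQ$_p$, whereas $q_p$ and $\hat{q}_p$ are the standard deviations of the infit mean square and the outfit mean square statistics for person p, respectively.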
Supporting information
S1 Dataset. One month post-operative data. This file contains the questionnaire data collected one month post-operatively. The column names indicate the 9 symptoms corresponding to the items. The rows correspond to the patients. The values 1, 2, 3 and 4 correspond to the severity levels "Not at all", "A little", "Quite", and "Very", respectively. (XLSX)
S2 Dataset. One year post-operative data. This file contains the questionnaire data collected one year post-operatively. The column names indicate the 9 symptoms corresponding to the items. The rows correspond to the patients. The values 1, 2, 3 and 4 correspond to the severity levels "Not at all", "A little", "Quite", and "Very", respectively.