Assessing unmet needs in patients with cancer: An investigation of differential item functioning of the Needs Evaluation Questionnaire across gender, age and phase of the disease

The Needs Evaluation Questionnaire (NEQ) is a self-administered instrument used in oncology clinical practice and research. Objective The main aim of this study was to provide evidence of the broad employability of the NEQ with patients of different gender and age with cancer in different phases of the disease and care process, using an Item Response Theory (IRT) approach and investigating Differential Item Functioning (DIF). Methods The NEQ was completed by 762 patients visiting, consecutively, outpatient clinics or admitted to oncology wards. Patients included in the study had different primary tumor sites and were in different phases of the disease and care process. The properties of the questionnaire were analyzed by applying IRT to test how well each item of the scale concurs in measuring unmet needs, how reliable the whole scale is, and whether the scale was metrically invariant across gender, age, and phase of the disease. Results Results showed that the NEQ performed well in measuring unmet needs and measurement equivalence of the scale across gender, age, and phase of the disease was verified. Conclusions The current study supports the utility and broad employability of the NEQ, thus providing empirical evidence that it is psychometrically sound and metrically equivalent across different groups of cancer patients. As such, the scale could be an effective tool when planning psychosocial interventions to improve the care process and patients’ quality of life.


Introduction
Patients with cancer have different needs which are, in part, related to the illness and in part to the care process [1][2][3][4][5][6]. As stated by Osse et al. [7] an unmet need corresponds to a desire to receive support with a demand perceived by the patient as not adequately met by the care system, and as such it is a request to the health system in general and to the staff involved in caring in any clinical setting. These needs are distributed in several main thematic areas: information and dialogue with physicians [8][9][10][11], assistance/care [12,13], psychosocial support [9,11,14,15], spiritual issues [16], and sexual problems [17,18].
Many patients still do not voluntarily express their concerns to oncologists or nurses. As far as the control of symptoms, some patients feel that suffering (pain, anxiety, depression, anger, etc.) is inevitable when you have cancer, or that there is no effective treatment to address those issues which doctors do not include spontaneously among their concerns [1]. In many cancer patients, especially in males and the elderly, low levels of spontaneous communication of unmet needs have been observed [19][20][21]. In oncology practice, therefore, the introduction of routine evaluation of patients' unmet needs, based on clinical interviews and specific questionnaires, has become a common practice. Indeed, the evaluation of unmet needs of cancer patients offers oncologists individual responses and helps solve specific problems (e.g. information on the diagnosis or therapy, control of symptoms, psycho-emotional support) and, in subgroups of patients with similar socio-demographic and clinical characteristics, helps identify intervention protocols and targeted services.
To assess the unmet needs in patients with cancer several specific scales have been developed but only a few of them have been proven reliable and valid measurements [22,23]. Among them, the Needs Evaluation Questionnaire (NEQ) [8,24], a self-administered scale particularly useful both in clinical practice and research due to its agility and ease of administration. The items on the NEQ are divided into five main areas: information needs, needs related to assistance/care, relational needs, psycho-emotional needs and material needs. The scale was originally developed by Tamburini and colleagues for use in hospital oncology settings, but recently the validity of this scale with outpatients from oncology Day Hospital settings, follow up ambulatory, and rehabilitation units was supported [25].
The psychometric properties of the NEQ have been investigated according to Classical Test Theory [24,26], a traditional and widely used approach to evaluate psychological assessment instruments. To provide further evidence of the suitability of the NEQ in measuring unmet needs in patients with cancer, in the current study we investigated the psychometric properties of the scale by employing Item Response Theory (IRT). IRT is a parametric statistical modeling procedure which involves fitting a hypothetical model to sample data, assuming that the characteristics of items on a test (i.e. item parameters) and the characteristics of individuals (i.e. latent traits) are related to the probability of a positive response (i.e. a trait-consistent endorsement of an item). Since applications of IRT have potential benefits in testing the accuracy of assessment instruments, we propose that IRT can help in testing the psychometric properties of the NEQ.
Preliminarily, we studied the characteristics of the single items and the characteristics of the whole scale. Indeed, IRT analyses provide: item location and discrimination parameters that enable evaluation of the level of the construct targeted by the item, and how well an item performs in measuring that level of the underlying construct; and the Test Information Function (TIF) which, instead of providing a single value (e.g. coefficient alpha) for reliability, evaluates the precision of the test at different levels of the measured construct [27,28].
Then, since the main aim of this study was to provide evidence of the broad employability of the NEQ, our study aimed at investigating the measurement equivalence of the scale across different groups. Indeed, the NEQ has been used with patients of varying age, gender, and phase of the disease and care process (e.g. inpatients and outpatients). Nonetheless, the needs described by the items of the NEQ may be understood or interpreted differently by different respondents (e.g. the readability of the items might be greater for younger than older people, the meaning of the items might change for men or women, the appropriateness of the vocabulary might not be the same for patients during diagnosis and/or treatment or patients during follow up). As such, testing the invariance of the NEQ across these groups might be especially useful to ensure valid interpretation of scores as well as group differences. This fundamental measurement issue is adequately addressed by IRT procedures that allow the assessment of Differential Item Functioning (DIF) [27,29], which examines the relationship between item response and another variable, called the group variable (e.g. gender). This latter depends on measurement of an underlying construct (e.g. unmet needs) to ascertain whether, after controlling for the underlying construct, the response to an item is related to group membership. For example, a randomly selected woman with a specific unmet need and a randomly selected man with the same unmet need should have the same chance of endorsing an item referring to this need. If this not the case, the item is biased. Thus, DIF analysis involves three factors: item response, trait level, and subgroup membership. If a test includes many items with DIF, the differences in the scores are not an exclusive function of the measured trait but there are some artifacts in the measurement process due to group membership.
To sum up, in order to provide evidence of the suitability of the NEQ in measuring unmet needs in patients with cancer we investigated: how well each item of the scale concurs in measuring unmet needs, how reliable the whole scale is in measuring unmet needs, and whether items have different measurement properties in different groups testing the equivalence of the NEQ items across gender, age, and phase of the disease.

Participants
The present sample is part of a wider survey on unmet needs of cancer patients called I.B.I.S. (Indagine sui Bisogni Insoddisfatti nella Sanità) Project [Survey on Unmet Health Care Needs]. The survey involved patients from six different oncology units in Tuscany, Italy. Participation was proposed to all patients visiting, consecutively, outpatient clinics or admitted to oncology wards, regardless of site or stage of the tumour. The percentage of patients who accepted to take part in the research in the different medical units ranged from 71.0% to 95.4%.
Clinical data and, in particular, phase of disease and care process were provided by oncologists. Exclusion criteria were: age under 18 or over 90, cognitive impairment or psychiatric diseases symptoms, severe symptoms due to illness or side effects of therapy that precluded to complete questionnaires independently. The total sample was composed of 762 patients (65% female) with a mean age of 61.71 years (SD = 12.06).

Measure and procedure
Participants filled out a paper-and-pencil self-report battery which included the NEQ [8,24], a self-administered instrument with 23 dichotomous items (i.e. yes/no answer) assessing needs in five areas: informative needs, needs related to assistance /care, relational needs, needs for psycho-emotional support, material needs (see Appendix). This battery was presented by the psycho-oncologist or by nurses or physicians, depending on the setting (ward, day hospital, rehabilitation ambulatory or follow up ambulatory), the second day after admission to the ward or during waiting periods at the day hospital or ambulatory. Patients were assured that participation was totally free and voluntary and that non-adherence did not alter care received by ward staff. They did not receive any assistance in completing the battery. Approximately, they employed a time of between 5 and 10 min to answer the NEQ.

Statistical analyses
To apply IRT modeling it is important to ascertain if there is a common factor among the items; if so, item parameters reflect the relation between the common latent trait and the item responses [30][31][32]. Since the NEQ has been described as multidimensional [24][25][26] -due to clusters of items that reflect specific needs (i.e. needs belonging to the same domain)-we started testing if the scale can be considered "unidimensional enough" so that the item parameter estimates properly reflect the latent trait held in common among the items (i.e. unmet needs). Indeed, the robustness of unidimensional IRT model parameter estimates to multidimensionality violations has been demonstrated, concluding that if the data have a "strong" common factor or multiple highly correlated factors, then IRT item parameter estimates are not seriously distorted [30,31]. Thus, to verify this prerequisite we tested a second-order model (i.e. we tested the hypothesis that seemingly distinct but related constructs can be accounted for by one common underlying higher order construct) in which the higher-order latent variable was the general unmet needs whose influence is shared among the five firstorder specific needs (information needs, needs related to assistance/care, relational needs, psycho-emotional needs and material needs) as defined by Annunziata et al. [26]. This Confirmatory Factor Analysis (CFA) for dichotomous data was implemented in Mplus software [33].
The goodness of fit of the IRT model was evaluated using M 2 statistic and the associated root mean square error of approximation (RMSEA) value. M 2 statistic, like other chi-square statistics, is generally unrealistic because there will be some error in any strong parametric model, thus the RMSEA provides a metric for model error [34]. The item fit under the IRT model was tested for each item computing the S-χ 2 statistics. Due to the large sample size and the above mentioned limitations of the chi-square statistics, α was fixed at .01. IRT models use the original response data to estimate probabilities of responses as a function of the latent trait (theta). Item parameters are: the location parameters (b), which indicate the trait level where there is a 0.5 probability of endorsing the affirmative option, and the discrimination parameter (a), which indicates the ability of an item to discriminate people of different levels of the underlying trait.
We investigated the Test Information Function (TIF) which provides test reliability estimations, indicating the precision of the whole test for each level of the latent trait [27]. This means that the more information the test provides at a particular ability level, the smaller the error associated with ability estimation is and thus the higher the test's reliability. In terms of graphical presentation, the test information curve shows how well the construct is measured at different levels of the underlying measured trait.
To test the measurement invariance of the NEQ across genders, ages, and phases, DIF analyses were performed. The DIF detection procedure is based on a nested model comparison approach. First, a more parsimonious model is tested with all parameters constrained to be equal across groups for a studied item against an augmented model. Here, one or more parameters of studied item are freed to be estimated distinctly for the two groups (a focal group and a reference group). This procedure involves comparing differences in log-likelihoods (distributed as chi-square) associated with nested models. To adjust for multiple comparisons and the aforementioned limitations of the chi-square statistics, α was fixed at .01. Initial DIF estimates can be obtained by treating each item as a studied item while using the rest as "anchor" items. Anchor items are assumed without DIF and are used to estimate the trait, and to link the two groups being compared in terms of trait levels. Anchor items are selected through a process of log-likelihood comparison performed iteratively and called "purification" procedure. During this iterative process the DIF status of items may change as a result of using a less than optimal conditional variable at various steps in the analyses. Since DIF analyses examine differences in item parameters, two types of DIF can be detected: uniform DIF that refers to location parameters, and non-uniform DIF that refers to discrimination parameters.

Results and discussion
The second-order model showed a goof fit (CFI = .96; TLI = .98; RMSEA = .055). The factor loadings were all significant (p < .01) and adequate in size, ranging from .43 to .94. The firstorder latent variables were all significantly related to the second order latent variable and second-order loadings ranged from .77 to .98. The unidimensional IRT model showed satisfactory fit (M 2 = 1144.55, df = 230, p = 0.0001; RMSEA = .07). Each item had a non-significant S-χ 2 value, indicating that all items fit under unidimensional IRT model. Item calibration showed that all the b values were above the mean trait level with the exception of item 2 which was slightly below the mean. Specifically, as showed in Table 1, eleven items were about a half standard deviation above the mean, four items about one, and the remaining eight items one and a half standard deviations or more above the mean. In this case, higher values identify needs perceived to a lesser extent (e.g. Item 16: "I need economic help"), while lower values represent compelling needs (e.g. Item 2: "I need more information about my future condition"). The range of item discrimination parameters was between .65 and 3.90. According to Baker [36], seven items showed moderate, five high, and the remaining eleven items very high discriminative power.
The TIF (Fig 1) showed that within a large range of trait, the amount of test information was equal to or greater than 4 indicating that the instrument was sufficiently informative for this range of the trait. Indeed, if we interpret the information magnitude by computing the associated reliability (r = 1-1/Information), reliability was equal to or greater than .75 within the aforementioned range.
In the first step of gender DIF analyses (in which the male group was the reference group), item 18 was identified as the studied item (p = .0013). Then, using all the other items as "anchor" items, the analysis was repeated. During this iterative process the DIF status of item 18 did not change. Specifically, whereas discrimination parameters were invariant across groups, this item showed uniform DIF referring to location parameters (i.e. male respondents were consistently less likely than female respondents to endorse this item). Nevertheless, since just one item out of 23 exhibits DIF, the NEQ can be considered invariant across gender.
Concerning age, comparing Under 65 (reference group) and Over 65, no item exhibited DIF from the first step of the analysis. As such, the NEQ can be considered invariant across age.
For the phase of the disease, comparing Diagnosis/Treatment (reference group) and Follow Up/Rehabilitation, item 10 was identified as the studied item (p = .0077). Then, using all the other items as "anchor" items, the analysis was repeated. During this iterative process the DIF status of item 10 did not change. Specifically, whereas discrimination parameters were invariant across groups, this item showed uniform DIF referring to location parameters (i.e. Follow Up/Rehabilitation patients were consistently less likely than Diagnosis/Treatment patients to endorse this item). Again, since just one item exhibits DIF, the NEQ can be considered invariant across the phase of the disease.
From visual inspection of the TIFs of the different groups, we can see that the scale functions exactly in the same way across groups (Fig 2). Indeed, the amount of test information was sufficiently high within the same range of the trait, and the standard errors of the measurement were very similar attesting to measurement equivalence of the scale.

Conclusion
The NEQ was originally developed by Tamburini and colleagues [8,24] for use in oncology wards to assess the needs of cancer patients and their families. Indeed, patients diagnosed with cancer have many needs with regard to relief from physical and psycho-social distress and to improve their quality of life. Some of the most important needs concern the possibility to be cured or to have their life prolonged, the control of symptoms, information and dialogue with clinicians, and material, psychological and spiritual support [1][2][3][4]. Therefore, the introduction of a routine and customized assessment of the unmet needs of cancer patients has become very common. As such, it is important that the instrument employed to assess the needs be psychometrically sound and equally suitable for patients with different characteristics, e.g. men or women, old or young, inpatients or outpatients in oncology settings. For these reasons, the current paper aims to provide evidence of the suitability of the NEQ in measuring unmet needs investigating whether the scale is metrically invariant across gender, age, and phase of the disease, and how reliable the whole scale is across these groups.
Overall, the spread of location parameters indicated that the described needs were perceived differently. Additionally, all the items have an adequate discriminative power, i.e. able to distinguish among people who have or do not have each specific need described by the item. With regard to reliability, the scale was sufficiently informative for a large range of the trait. This result might be indicative of the utility of the NEQ in recognizing among patients with cancer those who truly have a significant number of unmet needs in order to care for them and promote intervention strategies.
The NEQ appears to perform equally well regardless of gender, age, and phase of the disease of patients with cancer. Indeed, with only two exceptions, the items on the scale did not show a different functioning across groups. With regard to reliability of the whole scale, the NEQ was equally informative for quite a large range of the traits in these different subsamples. This finding confirms that the meaning, the wording, and the readability of the items are the same for men and women, younger and older patients, patients during diagnosis and/or treatment or patients during follow up and/or rehabilitation. As such, the scale ensures valid interpretation of scores as well as of group differences regardless of the specificity of the patient's characteristics.
Due to its agility and ease of administration the scale has been proven to be particularly useful both in clinical practice and research [8,24,25]. The current study supports the utility and large employability of the scale, providing empirical evidence that the NEQ is psychometrically sound and metrically equivalent across different groups. Moreover, the scale assesses with adequate reliability the measured construct effectively distinguishing individuals with different levels of unmet needs. A practical consequence of this is that the scale can be used to detect patients who need support so they may receive targeted effective answers (e.g. information on diagnosis or therapy, control of symptoms, psycho-emotional support). Additionally, no relevant differences were found in the item functioning when comparing men and women, older and younger respondents, and patients in a different phase of the disease. As a result, no different scoring rules or interpretation are needed when using the scale with different populations, and it can be used in subgroups of patients with different demographic and clinical characteristics inside wide intervention protocols and targeted services.
However, this research has some limitations which must be noted. First, there are some problems in generalizing the obtained findings since we employed an Italian sample. Moreover, due to the small sample sizes, differences among some particular subsamples (e.g. relapse) or extremely different groups (e.g. very young vs very old patients) were not investigated. Arguably, it would be noteworthy to investigate if scale functions differently for some kinds of patients, at least for some items/needs. Future studies might confirm and extend the current results and go beyond the limitations of the present findings. To extend the use of the current scale it would be relevant to test the psychometric properties of different language versions of the scale in different oncology settings. In the same way, DIF analysis across language and settings should be performed to provide additional evidence of the invariance property of the scale. As such, the NEQ might be a robust assessment that provides comparable scores from different contexts. Finally, to provide evidence of the suitability of the NEQ in test-retest research design (e.g. in studies aiming at verifying the efficacy of intervention programs on patient's needs), it would be relevant to test the stability of the instrument over time.
In conclusion, the NEQ seems to be an effective tool across different age, gender, and phases of the disease, and care process patients in the assessment of unmet needs.