Measurement Properties of Questionnaires Measuring Continuity of Care: A Systematic Review

Background: Continuity of care is widely acknowledged as a core value in family medicine. In this systematic review, we aimed to identify the instruments measuring continuity of care and to assess the quality of their measurement properties.
Methods: We did a systematic review using the PubMed, Embase and PsycINFO databases, with an extensive search strategy including 'continuity of care', 'coordination of care', 'integration of care', 'patient centered care', 'case management' and their linguistic variations. We searched from 1995 to October 2011 and included articles describing the development and/or evaluation of the measurement properties of instruments measuring one or more dimensions of continuity of care: (1) care from the same provider who knows and follows the patient (personal continuity), (2) communication and cooperation between care providers in one care setting (team continuity), and (3) communication and cooperation between care providers in different care settings (cross-boundary continuity). We assessed the methodological quality of the measurement properties of each instrument using the COSMIN checklist.
Results: We included 24 articles describing the development and/or evaluation of 21 instruments. Ten instruments measured all three dimensions of continuity of care. Instruments were developed for different groups of patients or providers. For most instruments, three or four of the six measurement properties were assessed (mostly internal consistency, content validity, structural validity and construct validity). Six instruments scored positive on the quality of at least three of six measurement properties.
Conclusions: Most included instruments have problems with either the number or quality of their assessed measurement properties or the ability to measure all three dimensions of continuity of care. Based on the results of this review, we recommend the use of one of the four most promising instruments, depending on the target population: Diabetes Continuity of Care Questionnaire, Alberta Continuity of Services Scale-Mental Health, Heart Continuity of Care Questionnaire, and Nijmegen Continuity Questionnaire.


Introduction
Continuity of care is an important characteristic of good health care. [1][2][3][4] In the literature, continuity often refers to the extent to which care is provided by the same person (personal continuity). Personal continuity is relatively easy to measure, as it can be expressed as an index based on duration of provider relationship, density of visits, dispersion of providers or sequence of providers. [5]
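As a rough illustration of such indices, the sketch below computes three commonly used visit-based measures from an ordered list of provider identifiers: the Usual Provider of Care index (density), the Bice-Boxerman Continuity of Care index (dispersion) and a sequential continuity index (sequence). The choice of these three indices and the function names are assumptions made here for illustration; they are not taken from reference [5].

```python
from collections import Counter


def upc(visits):
    """Usual Provider of Care: share of visits made to the most-seen provider."""
    if not visits:
        raise ValueError("at least one visit is required")
    counts = Counter(visits)
    return max(counts.values()) / len(visits)


def coc(visits):
    """Bice-Boxerman Continuity of Care index (dispersion of visits over providers)."""
    n = len(visits)
    if n < 2:
        raise ValueError("at least two visits are required")
    counts = Counter(visits)
    return (sum(c * c for c in counts.values()) - n) / (n * (n - 1))


def secon(visits):
    """Sequential continuity: share of consecutive visit pairs with the same provider."""
    if len(visits) < 2:
        raise ValueError("at least two visits are required")
    pairs = list(zip(visits, visits[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical patient with four visits, three of them to provider "A".
example_visits = ["A", "A", "B", "A"]
print(upc(example_visits), coc(example_visits), secon(example_visits))
```

All three indices range from 0 to 1 and use only who was seen and in what order, which is why they capture personal continuity but not the communication and cooperation between providers.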
From the 1990s onward, however, continuity of care has increasingly been seen as a multidimensional concept. [6] Besides personal continuity, it also includes the seamless provision of care by a group of professionals in the medical home (team continuity), and continuity between different care settings, e.g. general practice and specialist care (cross-boundary continuity). [6][7][8] As more and more care providers are involved in individual patient care, the communication and cooperation aspects of care become increasingly important.
Measuring continuity of care in its multidimensional meaning requires a robust measurement instrument. Reviews have shown that many instruments have been developed over time. [9][10][11][12][13] These reviews, however, did not include recent publications and focused solely on one concept. Because other concepts such as coordination and integration of care overlap substantially with continuity of care [6], a scope limited to continuity seems too narrow for a complete overview of instruments. Moreover, existing reviews have not systematically appraised the measurement properties of the instruments found. Therefore, we performed a systematic review to identify the instruments measuring continuity of care, to assess the dimensions of continuity in those instruments, and to evaluate their measurement properties.

Search Strategy
We searched the computerized bibliographic databases of PubMed, Embase and PsycINFO from 1995 to October 2011. We chose to start searching in 1995, as the multidimensional concept only emerged from then on. [6] It is therefore very unlikely that relevant instruments developed before 1995 would use multidimensional definitions of continuity of care. We used the keywords 'continuity of care', 'coordination of care', 'integration of care', 'patient centered care', 'case management' and their linguistic variations in combination with a search filter developed for finding studies on measurement properties of measurement instruments (see Appendix S1). [14] We restricted our search to English or Dutch language articles. Reference lists were screened to identify additional relevant studies.

Selection Criteria
We included all articles describing the development and/or evaluation of the measurement properties of an instrument measuring what we define in this review as continuity of care [6][7][8]: (1) care from the same provider who knows and follows the patient (personal continuity), (2) communication and cooperation between care providers in one care setting (team continuity), and (3) communication and cooperation between care providers in different care settings (cross-boundary continuity). Instruments measuring only one or two of these dimensions were also included. Instruments based on a single item or index, and instruments that also measured concepts other than these three dimensions of continuity of care, were excluded.
Two reviewers (AU and CH) independently screened titles, abstracts and reference lists of the studies retrieved by the literature search. If there was any doubt as to whether the article met the inclusion criteria, consensus was reached between the reviewers. The full-text articles were reviewed by two independent reviewers (AU and CH) against the inclusion and exclusion criteria. If necessary, a third independent reviewer (HS) was consulted.

Data Extraction
Data extraction and assessment of measurement properties and methodological quality were performed by two reviewers (AU and CH) independently. In case of disagreement, a third reviewer (CT) made the decision. One of the instruments identified was developed and validated by AU [15;16], so CH and CT scored this instrument. All instruments were questionnaires with pre-defined answering categories. The following data were extracted:
1. Dimensions of continuity of care. For each questionnaire we identified which dimensions of continuity of care (personal, team and/or cross-boundary continuity) were measured.
2. Measurement properties. We describe the measurement properties of each questionnaire divided over three domains, according to the COSMIN taxonomy [17]: (1) reliability (including internal consistency, reliability, measurement error), (2) validity (including content validity, structural validity and hypothesis testing (construct validity)), and (3) responsiveness. These measurement properties are defined in Table 1. In addition, interpretability is described. Interpretability is the degree to which one can assign qualitative meaning to quantitative scores. [17] This means that investigators should provide information about clinically meaningful differences in scores between subgroups, floor and ceiling effects, and the minimal important change. [18] Interpretability is not a measurement property, but an important characteristic of a measurement instrument. [17]
3. Quality assessment. Assessment of the methodological quality of the included studies was carried out using the COSMIN checklist. [19] This checklist consists of nine boxes with methodological standards for how each measurement property should be assessed. [20] Each item was rated on a 4-point scale (poor, fair, good or excellent). An overall score for the methodological quality of a study was determined by taking the lowest rating of any of the items in the nine boxes.
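The 'lowest rating counts' rule used in the quality assessment can be sketched in a few lines of Python. This is a minimal illustration, not part of the COSMIN materials: the names RATING_ORDER and overall_quality are hypothetical, and the item ratings in the example are invented.

```python
# Ratings ordered from worst to best, as on the 4-point scale described above.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def overall_quality(item_ratings):
    """Return the lowest rating among the rated items (worst score counts)."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # list.index raises ValueError for a label that is not on the scale.
    return min(item_ratings, key=RATING_ORDER.index)


# Invented item ratings for one measurement property in one study.
print(overall_quality(["excellent", "good", "fair", "excellent"]))
```

Because the minimum is taken, a single weakly rated item caps the overall score regardless of how the remaining items were rated.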

Best Evidence Synthesis: Levels of Evidence
Some studies evaluated the same measurement properties for a specific questionnaire. To determine the overall quality of each measurement property established in different studies we combined the results of the different studies for each questionnaire, taking into account the number of studies, the methodological quality of the studies and the direction (positive or negative) and consistency of their results.
The overall rating for a measurement property could fall into one of 8 categories (+++, ++, +, +/-, ?, -, -- or ---) [21;22] (Table 2). For example, when two studies of the same questionnaire show good methodological quality on evaluating 'reliability', then the overall rating would be either '+++' or '---' (Table 2), depending on the result (positive or negative) of the measurement property, for which we used criteria based on Terwee et al. [23] (Table 1). These criteria were derived from existing guidelines and consensus within the research group of Terwee et al.
In this case, when both studies showed an intraclass correlation coefficient (ICC) <0.70, the overall rating would be '---'. This means that there is strong evidence (multiple studies of good methodological quality) for low levels of reliability. However, when there is only one study of fair methodological quality showing ICC >0.70, the overall rating would be '+'. When one study shows ICC >0.70, while another study shows ICC <0.70, the overall rating would be '+/-'. When there are only studies of poor methodological quality, the overall rating would be '?', independent of the result of the measurement property.
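The worked examples above can be expressed as a small Python sketch. Table 2 is not reproduced in this text, so the intermediate levels are assumptions: strong evidence is taken as consistent findings in at least two studies of good quality or one study of excellent quality, moderate evidence as consistent findings in at least two studies of fair quality or one study of good quality, and limited evidence as a single study of fair quality. Ignoring poor-quality studies when better ones exist, treating ICC = 0.70 as positive, and the function names rate_icc and synthesize are likewise assumptions.

```python
def rate_icc(icc, threshold=0.70):
    """Rate a reliability result as '+' or '-' (boundary handling is an assumption)."""
    return "+" if icc >= threshold else "-"


def synthesize(studies):
    """Combine (quality, result) pairs for one measurement property of one questionnaire.

    quality is 'poor', 'fair', 'good' or 'excellent'; result is '+' or '-'.
    """
    if not studies:
        raise ValueError("no studies to synthesize")
    # Assumption: poor-quality studies carry no weight in the synthesis.
    usable = [(q, r) for q, r in studies if q != "poor"]
    if not usable:
        return "?"  # only studies of poor methodological quality
    results = {r for _, r in usable}
    if len(results) > 1:
        return "+/-"  # conflicting findings
    sign = results.pop()
    qualities = [q for q, _ in usable]
    n_good = sum(q in ("good", "excellent") for q in qualities)
    if "excellent" in qualities or n_good >= 2:
        level = 3  # strong evidence (assumed definition)
    elif n_good == 1 or len(qualities) >= 2:
        level = 2  # moderate evidence (assumed definition)
    else:
        level = 1  # limited evidence: one study of fair quality
    return sign * level


# The examples from the text, with invented ICC values.
print(synthesize([("good", rate_icc(0.62)), ("good", rate_icc(0.55))]))
print(synthesize([("fair", rate_icc(0.81))]))
print(synthesize([("fair", rate_icc(0.81)), ("fair", rate_icc(0.60))]))
print(synthesize([("poor", rate_icc(0.90))]))
```

Under these assumptions the four calls reproduce the ratings given in the text for the corresponding scenarios.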

Results
The search strategy resulted in 4749 articles from PubMed, 2366 articles from Embase and 349 articles from PsycINFO (Figure 1). From these searches, we included 23 articles in this review. We included one additional article, not yet published, which describes the validation of an included measurement instrument. [16] Reference tracking did not result in additional articles. In total, we included 24 articles describing the development and/or evaluation of 21 questionnaires measuring continuity of care [15;16;24-45]. Table 3 presents an overview of the identified questionnaires. Seventeen questionnaires measured continuity of care from the perspective of the patient [15;16;24-27;29-35;37-41;43-45], four from the perspective of the care provider/program director [28;36;42]. Of the instruments measuring continuity from the perspective of the patient, three were developed for diabetic patients [29;33;44], three for patients with a mental illness [24;30;37;41;43], and the remainder for other patient groups.
Most questionnaires were originally developed in English, except for the Dutch questionnaires of Casparie et al. [27] and Uijen et al. [15;16], the Chinese questionnaire of Wei et al. [44], and the Swedish questionnaire of Ahgren et al. [25]. The methodological quality of the studies is presented in Table 5 for each questionnaire and measurement property. Most studies assessed the internal consistency, content validity, structural validity and construct validity of the instruments, although frequently the methodological quality of the studies regarding these measurement properties was fair or poor. The reliability and measurement error were only assessed in a minority of the studies and the methodological quality regarding these measurement properties was often fair or poor. Cross-cultural validity, criterion validity and responsiveness were not assessed in any of the studies.

Table 1 (excerpt).

Hypothesis testing (construct validity)
Definition: The degree to which the scores of an instrument are consistent with hypotheses (e.g. with regard to internal relationships, relationships to scores of other instruments, or differences between relevant groups) based on the assumption that the other instrument ...
+ : Correlation with an instrument measuring the same construct ≥0.50 OR at least 75% of the results are in accordance with the hypotheses AND correlation with related constructs is higher than with unrelated constructs
? : Solely correlations determined with unrelated constructs
- : Correlation with an instrument measuring the same construct <0.50 OR <75% of the results are in accordance with the hypotheses OR correlation with related constructs is lower than with unrelated constructs

Responsiveness
Definition: The ability of an instrument to detect change over time in the construct to be measured
+ : (Correlation with an instrument measuring the same construct ≥0.50 OR at least 75% of the results are in accordance with the hypotheses OR AUC ≥0.70) AND correlation with related constructs is higher than with unrelated constructs
? : Solely correlations determined with unrelated constructs
- : Correlation with an instrument measuring the same construct <0.50 OR <75% of the results are in accordance with the hypotheses OR AUC <0.70 OR correlation with related constructs is lower than with unrelated constructs

a The word 'true' must be seen in the context of the classical test theory, which states that any observation is composed of two components: a true score and error associated with the observation. 'True' is the average score that would be obtained if the scale were given an infinite number of times. It refers only to the consistency of the score and not to its accuracy.
MIC = minimal important change, SDC = smallest detectable change, LOA = limits of agreement, ICC = intraclass correlation coefficient, AUC = area under the curve.
The synthesis of results per questionnaire and their accompanying level of evidence are presented in Table 6. Six instruments (CPCI [31], CCI [26], CPCQ [40], HCCQ [34;39], CCCQ [45] and NCQ [15;16]) scored positive on the quality of at least three measurement properties. Information regarding the interpretability of the instruments was missing in most studies.

Discussion
In this systematic review we found 21 instruments measuring what we define as continuity of care. We found six instruments that we would probably not have found had we focused our review solely on continuity of care, instead of taking into account related concepts such as coordination and integration. [...] [36] and CPCI measures 'attributes of primary care' [31].
Most included instruments have problems with either the ability to measure all three dimensions of continuity of care or the number or quality of their assessed measurement properties.
Only about half of the questionnaires measured all three dimensions of continuity of care (personal, team and cross-boundary continuity). For most instruments, three or four measurement properties were assessed (mostly internal consistency, content validity, structural validity and construct validity). Only six instruments (CPCI [31], CCI [26], CPCQ [40], HCCQ [34;39], CCCQ [45] and NCQ [15;16]) scored positive on the quality of at least three measurement properties. These findings do not mean that the other questionnaires are of poor quality, but imply that studies of high methodological quality are needed to properly assess their measurement properties.

Strengths and Limitations
One of the strengths of this review is that our search not only focused on the concept of 'continuity of care', but also took into account the related concepts 'coordination of care', 'integration of care', 'case management' and 'patient centered care'. This resulted in the inclusion of instruments which measure the same aspects of care but are defined in different ways.
To our knowledge, this is the first review on measurement instruments for continuity of care that systematically appraised the measurement properties of the instruments found. This allows us to compare the instruments on the quality of their measurement properties.
We used a robust and standardized method to assess the quality of the measurement properties, which contributes considerably to the continuity knowledge base.
A limitation of this study is that we searched from 1995 onwards. Measurement instruments developed before this time were not included in our review. However, because of the changing definitions of continuity over time, we consider it very unlikely that we missed relevant instruments [6]. Another limitation is that the raters had to make a large number of judgements on each study and each measurement instrument. Although the COSMIN checklist [19] and the quality criteria for the measurement properties [23] are defined as objectively as possible, different raters could come to different judgements. That is why two reviewers assessed the measurement properties and methodological quality of the studies, and in case of disagreement a third reviewer was consulted.

Table 6. Quality of measurement properties and the interpretability per instrument.

Comparison with Existing Literature
Previous reviews have identified many instruments measuring continuity of care or one of its related concepts, such as patient centered care or integrated care. [9][10][11][12][13] Most reviews have limited their search to only one concept. We found only one review, identifying measures of integrated care, that broadened its search to concepts such as continuity of care, care coordination and seamless care, but this review did not systematically appraise quality measures of the instruments. [13] Most instruments included in previous reviews have not been included in our review, for several reasons. Some studies did not describe the development or evaluation of the measurement properties at all, some did not measure what we define in this review as continuity of care, and some measured a much broader concept than continuity of care (e.g. all key areas of primary care including accessibility and thoroughness of physical examination).
We found no review assessing the quality of the measurement properties of the included instruments. Hudon et al. systematically assessed the quality of the included articles, i.e. whether all relevant information such as characteristics of the study population was described. [10] However, the quality of the measurement properties was not assessed.

Implications for Practice and Research
The choice of instrument will depend on the characteristics of the study population, the ability and desire to measure all three dimensions of continuity, the population in which the instrument was developed and/or validated, the quality of the measurement properties and the interpretability of the instrument.
For a comprehensive measurement of continuity of care, we recommend using the DCCQ [44] for diabetic patients, as both other questionnaires for diabetic patients (DCCS [29] and ECC-DM [33]) either do not measure all three dimensions of continuity of care or show lower quality of their measurement properties and interpretability.
For patients with a mental illness, we recommend using the ACSS-MH [24;30;37]. Both other questionnaires available for patients with a mental illness (CONNECT [43] and CONTINU-UM [41]) are only validated in primary care, do not measure all three dimensions of continuity of care or show lower quality of their measurement properties and interpretability.
For patients with heart failure or atrial fibrillation, we only found the HCCQ [34;39]. As this instrument measures relational, team and cross-boundary continuity and shows good quality of the measurement properties, it seems to be a suitable questionnaire for this patient group.
For patients with a (chronic) illness (irrespective of the type of (chronic) illness), we found the CPCI [31], VCC [27], CPCQ [40], the instrument of Gulliford et al. [32] and the NCQ [15;16]. For a comprehensive measurement of continuity of care, the NCQ is the only questionnaire that has been validated in primary and secondary care and shows the highest quality of its measurement properties and interpretability.
The instruments developed to measure continuity for patients with cancer (CCCQ [45] and the instrument of King et al. [38]), patients previously hospitalized (CCI [26] and PCCQ [35]), and users of welfare services (instrument of Ahgren et al. [25]) all have problems regarding the limited number of dimensions of continuity measured, the limited quality of the measurement properties or the low interpretability of the instrument. The instruments developed to measure continuity of care from the perspective of the provider (CCPS-I [42], CCPS-P [42], CRP-PIM [36] and CSI [28]) need to be used with caution because of the limited quality of the measurement properties and interpretability.
For future research, we believe it is especially important to further evaluate the measurement properties and interpretability of the promising DCCQ, ACSS-MH, HCCQ and NCQ. Responsiveness has not been evaluated for any of these instruments, although this is an important characteristic of a questionnaire, especially when used to measure change in continuity of care. As the DCCQ and NCQ were originally developed in Chinese and Dutch, respectively, their cross-cultural validity needs to be evaluated.

Supporting Information
Appendix S1 Search strategy.