Screening for Vulnerability in Older Cancer Patients: The ONCODAGE Prospective Multicenter Cohort Study

Background
Geriatric assessment is an appropriate method for identifying older cancer patients at risk of life-threatening events during therapy. Yet it is underused in practice, mainly because it is time- and resource-consuming. This study aims to identify the best screening tool for detecting older cancer patients who require geriatric assessment by comparing the performance of two short assessment tools, the G8 and the Vulnerable Elders Survey (VES-13).

Patients and Methods
The diagnostic accuracy of the G8 and the VES-13 was evaluated in a prospective cohort study of 1674 cancer patients accrued before treatment in 23 health care facilities; 1435 were eligible and evaluable. Outcome measures were multidimensional geriatric assessment (MGA), sensitivity (primary), specificity, negative and positive predictive values and likelihood ratios of the G8 and VES-13, and predictive factors of 1-year survival rate.

Results
Patient median age was 78.2 years (70-98) with a majority of females (69.8%), various types of cancer including 53.9% breast, and 75.8% Performance Status 0-1. MGA, G8, and VES-13 were impaired in 80.2%, 68.4%, and 60.2% of patients, respectively. Mean time to complete G8 or VES-13 was about five minutes. Reproducibility of the two questionnaires was good. G8 appeared more sensitive (76.5% versus 68.7%, P = 0.0046) whereas VES-13 was more specific (74.3% versus 64.4%, P<0.0001). Abnormal G8 score (HR = 2.72), advanced stage (HR = 3.30), male sex (HR = 2.69) and poor Performance Status (HR = 3.28) were independent prognostic factors of 1-year survival.

Conclusion
With good sensitivity and independent prognostic value on 1-year survival, the G8 questionnaire is currently one of the best screening tools available to identify older cancer patients requiring geriatric assessment, and we believe it should be implemented broadly in daily practice. Continuous research efforts should be pursued to refine the selection process of older cancer patients before potentially life-threatening therapy.

Data Availability: The authors confirm that, for approved reasons, some access restrictions apply to the data underlying the findings.

Introduction
Cancer occurs predominantly in the older population, yet patients over 60 years are significantly under-represented in clinical trials in oncology [1,2]. Consequently, oncologists are confronted with a paucity of clear therapeutic directives, and older patients are often offered reduced treatments and face worse outcomes [3], including an increased risk of toxicity or even early death [4]. With growing numbers of older cancer patients, and considerable heterogeneity among them, effective tools are required for oncologists to better define the trade-off between treatment benefits and toxicity risk.
Several recent reports have strongly suggested that different components of comprehensive (CGA) or multidimensional geriatric assessment (MGA) can be useful in oncology to predict early death [4], functional decline [5], toxicity [6,7] and ultimately survival [8][9][10], and to adapt cancer treatment [11]. However, despite recommendations from the International Society of Geriatric Oncology (SIOG) [12], CGA is still underused in practice. The likely reason is that it is time- and resource-consuming, which makes it unaffordable for community and small cancer hospitals. Furthermore, true CGA (in contrast to MGA, which involves the administration of a range of assessments) is conducted by an experienced geriatrician who interprets and can act upon the MGA results, and geriatricians are rarely available in most cancer treatment structures. This has made the development of shortened instruments essential [13,14]. To be acceptable for the whole community, such instruments should be quick to perform (less than 10 min) by a nurse or physician trained to complete the tool, but not necessarily trained in geriatrics.
In response to a French National Cancer Institute (INCa) call for proposals, and following escalating appeals for validated geriatric screening tools [15][16][17], we developed the G8 screening tool to identify older cancer patients requiring geriatric assessment. The G8 tool originated from a regional multicenter prospective cohort of 364 cancer patients treated by first-line chemotherapy [18,19]. The reference test (or "gold standard") was defined as at least one abnormal geriatric assessment test among seven. Preliminary results indicated 85% sensitivity and 65% specificity, which was promising given that a screening test should prioritize sensitivity in order to minimize the number of patients not detected.
We subsequently launched the national ONCODAGE multicenter study to validate the G8 tool prospectively. The primary objective was to validate the G8 instrument as a screening tool to identify older cancer patients (>70 years) requiring geriatric assessment by comparing it with a reference test of MGA. Secondary objectives included assessing the diagnostic accuracy of G8 in specific sub-populations, the diagnostic accuracy of VES-13 and comparing it to that of G8, the within-patient reproducibility of both tests, and the prognostic value of both tests in terms of 1-year survival. Additional exploratory analyses included the assessment of the diagnostic accuracy of G8 and VES-13 using a modified reference test (at least two MGA tests with abnormal scores), and sensitivity analyses to assess the impact of missing questionnaires in the definition of the reference test.

Patients
We recruited patients from 23 health care facilities, including the 15 INCa-accredited Regional Coordination Units for Geriatric Oncology. Eligible patients were older than 70 years and were included either before any first-line treatment, or between any two steps of a pre-defined first-line treatment sequence (chemotherapy, endocrine therapy, targeted treatment, surgery or radiotherapy) for various types of histologically confirmed cancer (colon, lung, upper aero-digestive tract (UAT)/head and neck, breast, prostate, and non-Hodgkin's lymphomas (NHL)). Patients with known central nervous system metastases were excluded. Patients were informed of the study and provided their signed informed consent prior to enrollment and G8/MGA assessment. The protocol was approved by the regional ethics committee (Comité de Protection des Personnes Sud-Ouest et Outre Mer III), and the study was conducted in accordance with the Declaration of Helsinki and Good Clinical Practices (Trial registration: NCT00963911).

Test methods
The G8 index test
At the first visit after enrollment, patients received a full clinical examination and completed the G8 test with a nurse, a clinical research assistant (CRA), or a physician. The G8 consists of eight items: patient age (>85, 80-85, <80), and seven items from the original 18-item MNA (appetite changes, weight loss, mobility, neuropsychological problems, body mass index, medication, and self-rated health). The total score ranges from 0 to 17, with lower scores indicating a higher risk of impairments [18]. The cut-off value for an 'impaired' G8 score was ≤14, and the time taken to complete the test was recorded. The G8 questionnaire is provided in S1 Appendix.
The VES-13 questionnaire
VES-13 is a self-administered questionnaire that was completed during the first visit after enrollment. In three pre-identified centers, patients also filled in the questionnaire at the following geriatric visit. VES-13 consists of four groups of questions: age, self-perceived health, difficulty performing six specific activities, and difficulty performing daily living tasks due to health concerns. The score ranges from 0 to 10, and a score ≥3 was considered to show impairment.

Multidimensional geriatric assessment (MGA) reference test
Patients underwent a geriatric evaluation in the month following the completion of G8 and VES-13 (± seven days) before treatment began. The nurse completed six of the seven instruments of the MGA as already described [18] (MNA, Timed Get up and Go (TUG), Activities of Daily Living (ADL), Instrumental ADL (IADL), Mini Mental State Examination (MMSE), and Geriatric Depression Scale (GDS-15)), and the geriatrician rated comorbidity on the Cumulative Illness Rating Scale (CIRS-G), recorded the time required for the consultation, identified patients who needed personalized geriatric interventions, and, if necessary, proposed further geriatric evaluation (outside the scope of this study). Both the geriatrician and the nurse were blinded to the G8 results.
Abnormal scores for each instrument were established according to the following published cut-offs [18]: at least one Grade ≥3 comorbidity on the CIRS-G [20] (excluding the cancer being treated); ADL ≤5; IADL ≤7 across genders; MNA ≤23.5; MMSE ≤23/30; GDS-15 ≥6; and TUG >20 seconds. Based on preliminary analyses [18], we considered the reference test to be 'impaired' if scores on the seven instruments were available and one or more of them was abnormal, or if the score on one or more instruments could not be calculated due to one or more missing items or an unavailable instrument.
The reference test was defined as normal if scores for the seven instruments were available and normal. In a subsequent exploratory analysis, the reference test was modified and we considered the reference test to be 'impaired' if the seven instruments were available with two or more abnormal scores, or if the score for two or more instruments could not be calculated due to one or more missing items or unavailable instruments.
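The classification rule described above can be sketched as follows. This is an illustrative sketch only, not the study's analysis code: the function name, the dictionary layout, and the representation of CIRS-G as the highest comorbidity grade (excluding the cancer being treated) are assumptions. For the exploratory definition, the text does not say how a patient with exactly one abnormal and one missing instrument is handled; the sketch treats such a patient as normal, which is also an assumption.

```python
from typing import Dict, Optional

# Cut-offs quoted in the text; each rule returns True when the score is abnormal.
# Representing CIRS-G by the highest comorbidity grade is an assumption of this sketch.
ABNORMAL_RULES = {
    "CIRS-G": lambda max_grade: max_grade >= 3,
    "ADL": lambda score: score <= 5,
    "IADL": lambda score: score <= 7,
    "MNA": lambda score: score <= 23.5,
    "MMSE": lambda score: score <= 23,
    "GDS-15": lambda score: score >= 6,
    "TUG": lambda seconds: seconds > 20,
}


def reference_test_impaired(scores: Dict[str, Optional[float]], threshold: int = 1) -> bool:
    """Return True if the MGA reference test counts as impaired.

    threshold=1 mirrors the main definition, threshold=2 the exploratory one.
    A missing instrument is passed as None or simply left out of `scores`.
    """
    abnormal = 0
    missing = 0
    for name, is_abnormal in ABNORMAL_RULES.items():
        value = scores.get(name)
        if value is None:
            missing += 1
        elif is_abnormal(value):
            abnormal += 1
    # Impaired when enough instruments are abnormal, or enough are missing.
    return abnormal >= threshold or missing >= threshold


# Hypothetical patient with a single abnormal instrument (ADL of 5).
patient = {"CIRS-G": 2, "ADL": 5, "IADL": 8, "MNA": 27, "MMSE": 28, "GDS-15": 2, "TUG": 12}
print(reference_test_impaired(patient, threshold=1))
print(reference_test_impaired(patient, threshold=2))
```

Keeping the cut-offs in a single table makes it straightforward to rerun the classification under either threshold without touching the counting logic.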

Statistical methods
We defined the following populations: the included population, the eligible population, and the eligible and evaluable population. The included population corresponded to all patients included, regardless of eligibility and availability of G8 and MGA results. The eligible population included all patients who did not violate any eligibility criteria. The eligible and evaluable population for diagnostic accuracy assessment was defined as all eligible patients for whom G8 as well as at least one instrument of MGA were available and were administered less than one month apart (± one week).
Diagnostic accuracy was measured by the classification probabilities (sensitivity and specificity), positive and negative predictive values (PPV and NPV), and positive and negative diagnostic likelihood ratios (+/-DLR). The McNemar test was applied to compare the sensitivity of G8 and VES-13, as well as their specificity.
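As an illustration of these measures, the sketch below computes them from a 2x2 table of index test against reference test, and compares two paired tests with a McNemar test. The counts are hypothetical, the function names are assumptions, and the exact binomial form of the McNemar test is chosen here for simplicity; the text does not state which variant the authors used.

```python
from math import comb


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Classification probabilities, predictive values and likelihood ratios.

    Assumes every margin is non-zero and specificity is strictly between 0 and 1.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "dlr_pos": sensitivity / (1 - specificity),
        "dlr_neg": (1 - sensitivity) / specificity,
    }


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    To compare sensitivities, restrict to patients with an impaired reference
    test: b = positive on test A only, c = positive on test B only.
    """
    n = b + c
    if n == 0:
        return 1.0
    smaller = min(b, c)
    # Binomial(n, 0.5) probability of a split at least as uneven as observed.
    tail = sum(comb(n, k) for k in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, not taken from the study.
for name, value in diagnostic_accuracy(tp=80, fp=30, fn=20, tn=70).items():
    print(f"{name}: {value:.3f}")
print(f"McNemar p-value: {mcnemar_exact(b=15, c=5):.4f}")
```

The McNemar test uses only the discordant pairs because patients classified identically by both tools carry no information about which tool is more sensitive or more specific.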
The required sample size was estimated based on our preliminary work [18]. Assuming 90% sensitivity for the G8 tool [18], we calculated that the enrollment of 750 patients with at least one abnormal MGA instrument would allow us to estimate sensitivity with sufficient precision (2.4%) and to obtain 95%CIs between 87.6% and 92.4%. Based on an estimated 50% of patients with at least one abnormal MGA instrument [21], 1500 eligible and evaluable patients were required. Assuming 10% ineligibility, this involved recruitment of 1650 patients.
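The arithmetic behind these figures can be sketched as below. The function names are assumptions, and the normal-approximation (Wald) interval is used only for illustration: the text does not say which interval method the authors applied. With 750 patients and 90% sensitivity, the Wald half-width comes out at roughly 2.1 percentage points, somewhat narrower than the 2.4% quoted, so the published precision presumably rests on a different or more conservative method that this sketch does not attempt to reproduce.

```python
from math import sqrt

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wald_half_width(p: float, n: int, z: float = Z_95) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * sqrt(p * (1 - p) / n)


def required_enrollment(n_positive: int, prevalence: float, ineligible: float) -> int:
    """Patients to recruit so that about n_positive have an impaired reference test.

    Follows the text: divide by the expected prevalence, then inflate by the
    expected ineligibility rate. round() guards against floating-point noise.
    """
    evaluable = n_positive / prevalence
    return round(evaluable * (1 + ineligible))


print(f"Wald half-width: {wald_half_width(0.90, 750):.4f}")
print(f"Patients to recruit: {required_enrollment(750, 0.50, 0.10)}")
```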
The study population was described in terms of clinical and demographic characteristics with counts and percentages for qualitative variables and summary statistics (mean and variance where appropriate; percentiles otherwise) for quantitative variables. Sensitivity, specificity, PPV, NPV, +/-DLR and area under the ROC curve were estimated with their (two-sided) 95%CI.
Reproducibility analysis was based on estimation of the Kappa agreement statistics for dichotomous data (normal v abnormal score) [22].
Reproducibility of G8 was assessed by comparing the score on the actual G8 with the scores extracted from the corresponding seven questions of MNA completed during the MGA for all patients. Reproducibility of VES-13 was assessed based on a subgroup of patients included in three pre-identified centers who completed the questionnaire on two occasions. A priori sample size estimation suggested that enrollment of at least 180 subjects would ensure sufficient precision to estimate the reproducibility of VES-13 in this subgroup.
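For dichotomous ratings (normal v abnormal), the Kappa statistic corrects the observed agreement for the agreement expected by chance. A minimal sketch follows, with a hypothetical function name and made-up ratings; it is not the study's code.

```python
from typing import Sequence


def cohen_kappa(first: Sequence[bool], second: Sequence[bool]) -> float:
    """Cohen's kappa for two paired series of binary ratings (True = abnormal)."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("ratings must be paired and non-empty")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    p_first = sum(first) / n    # proportion rated abnormal on the first occasion
    p_second = sum(second) / n  # proportion rated abnormal on the second occasion
    # Agreement expected if the two ratings were independent.
    expected = p_first * p_second + (1 - p_first) * (1 - p_second)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both ratings are constant")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for eight patients assessed twice.
visit_1 = [True, True, True, False, False, False, True, False]
visit_2 = [True, True, False, False, False, False, True, True]
print(f"kappa = {cohen_kappa(visit_1, visit_2):.2f}")
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.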
The prognostic value of the screening tools was assessed by analyzing one-year overall survival using a Cox proportional hazards model. Candidate prognostic factors included age, sex, ECOG PS (Eastern Cooperative Oncology Group performance status), stage (metastatic v non-metastatic), and the G8 score. Significant factors at the univariate stage (p<5%) were subsequently included in a multivariate model. The final model was based on a manual stepwise backward selection approach with statistical significance set at 5%. An exploratory model was also fitted to examine the prognostic value of the reference test MGA score. Hazard ratios (HR) are reported with 95%CIs.
Results are presented according to the STARD guidelines [23] for reporting of studies of diagnostic accuracy, and the study protocol is available in S2 Appendix.

Enrollments
Between August 2008 and March 2010, 1674 patients were included in the ONCODAGE study. Initial exclusion of 77 ineligible patients left 1597 patients (eligible population). A further 162 patients were excluded from analyses due to protocol violations, participation withdrawals, or missing G8 or MGA (Fig. 1). Delay between G8 and MGA exceeded 37 days in 15 cases, and G8 was inadequately completed by three patients. The final eligible and evaluable population for the principal analyses consisted of 1435 patients, with a median age of 78 years, of whom 69.5% were female (Table 1). At first consultation, patients were mostly seen by medical oncologists (45.6%), surgeons (21.7%), radiotherapists (13.5%), or other cancer specialists (19.0%).
After the first 779 enrollments, we convened an international independent data monitoring committee to examine recruitment across different tumor sites and discuss initial statistical hypotheses. No modification was proposed by the committee of experts.

The multidimensional geriatric assessment reference standard results
On average, it took one hour to complete the MGA overall (67.5 minutes, ±24.6; range 10 minutes to 3 hours) and it was completed in less than one and a half hours for 75% of patients. Almost all (91.6%) patients completed the seven instruments entirely. Rates of completion varied across instruments from 97.0% for the GDS-15 to 99.8% for the ADL and MNA. The proportion of patients with abnormal scores varied from 15.3% for the ADL to 47.8% for the IADL (Table 2). Similar results were found for the eligible population (n=1597).
Overall, 1151 (80.2%) eligible and evaluable patients were considered to have an impaired reference test. For the large majority of these patients (1031, 89.6%), this was because they had an abnormal score on one or more of the seven available instruments. For the remaining 120 patients (10.4%), the score from at least one instrument was missing and could not be determined. For 85 of these patients, although at least one score was missing, the score on one of the remaining instruments was abnormal, so their reference test was considered impaired. For the remaining 35 patients, with only five or six available scores, all available scores were normal. Their reference test was considered to be impaired for the purposes of the main analyses (see further discussion and analyses in results). Of the 1031 patients overall with altered scores, 306 (29.7%) had an altered score on one instrument, 236 (22.9%) on two, 173 (16.8%) on three, 132 (12.8%) on four, 94 (9.1%) on five, 73 (7.1%) on six and 17 (1.6%) on seven.
Footnote: *In total, ten G8 were incomplete, but six 'abnormal' scores could be imputed from the incomplete assessments.
Proportions of subjects with at least one impaired score varied across disease stage (93.0% for metastatic patients (M1) v 75.1% for non-metastatic (M0)) and across tumor site (73.0% prostate, 74.2% breast, 86.6% NHL, and 91 to 92% for colon-rectum, lung and UAT/head and neck). At least one geriatric intervention was proposed by the geriatrician at the end of MGA in 72.2% of cases, with between one and four interventions per patient proposed in 64.3% of the population. The most frequently proposed interventions were nutritional support (524 patients, 37.0%), home assistance (499 patients, 35.2%), standard treatment adaptation (345 patients, 24.3%), psychological support (258 patients, 18.3%) and physiotherapy (231 patients, 16.3%).

Validation of the G8 test
In the eligible and evaluable population (n=1435), G8 was mostly administered by a nurse or CRA (87%) and less frequently by a physician (12.9%). It took an average of 4.4 minutes to complete (±2.8, range: 1-60 minutes) with 98.7% completed in ten minutes or less. The final G8 scores ranged from 1.5 to 17, with 68.4% of patients showing impaired scores (≤14). The proportions of patients with impaired G8 scores varied according to disease stage (85.6% for M1 and 63.0% for M0) and tumor site (36.9% prostate, 62.9% breast, 70.5% NHL, and 85-88% for colon-rectum, lung and UAT/head and neck).
The diagnostic accuracy of G8 is outlined in Table 3. G8 sensitivity was 76.5% and specificity 64.4%. The AUC compared to the reference standard MGA was 0.804, 95%CI 0.78 to 0.83 (Fig. 2).
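Predictive values depend on how common an impaired reference test is, not only on sensitivity and specificity. The sketch below, with a hypothetical function name, combines the rounded sensitivity and specificity quoted above with the 80.2% proportion of impaired reference tests. Because the inputs are rounded, the results are approximate: about 0.90 for the PPV and about 0.40 for the NPV.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a given prevalence, using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded figures quoted in the text for G8 against the MGA reference test.
ppv, npv = predictive_values(sensitivity=0.765, specificity=0.644, prevalence=0.802)
print(f"PPV ~ {ppv:.2f}, NPV ~ {npv:.2f}")
```

With four in five patients having an impaired reference test, even a moderate false negative rate yields many more false negatives than true negatives in absolute terms, which is why the NPV is low despite reasonable sensitivity.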

G8 diagnostic accuracy by subgroups
In terms of disease stage, sensitivity and PPV were higher for M1 patients than for M0 patients, whereas specificity and NPV were better for M0 patients (Table 4). A subgroup analysis explored the influence of the presence or absence of at least one treatment in the last three months. Neither G8 sensitivity nor specificity appeared to be particularly affected by this factor.

Diagnostic accuracy of VES-13
In the eligible and evaluable population (n=1435), VES-13 was mainly administered with the assistance of a nurse or CRA (78.7%), with 12.1% being completed by the patient alone, and 8.8% with the assistance of the physician. On average, it took 5.7 minutes to complete (±3.2, range: 1-30 minutes), with 98% completed in less than 10 minutes.
VES-13 showed impaired scores in 60.2% of patients. Sensitivity and specificity were 68.7% and 74.3%, respectively (Table 3). The AUC for VES-13 compared to MGA was 0.79, 95%CI 0.77 to 0.82. Sensitivity and PPV were higher for M1 than for M0 patients.

Exploratory analyses
Diagnostic accuracy of G8 and VES-13 with the modified standard (two abnormal MGA tests)
Using a modified definition of the reference test (two abnormal MGA tests), 56.7% of patients (813 cases) were considered abnormal. G8 sensitivity was 86.5% and specificity 55.3% (Table 3). The AUC was 0.82, 95%CI 0.80 to 0.84. Sensitivity and specificity of VES-13 with this modified reference test were 78.5% and 63.7%, respectively (Table 3).
Sensitivity analysis to assess the impact of missing questionnaires of MGA
For 35 out of 1435 patients, scores for at least five instruments were available and normal. However, MGA was considered abnormal because one or two questionnaires were not fully completed, despite normality on all other available instruments. To specifically account for these patients with missing questionnaires, we investigated

Discussion
This is the first study designed to determine the diagnostic accuracy of the G8 questionnaire and, by far, the largest prospective cohort available to validate a screening tool in geriatric oncology, previously published studies having included 41 to 419 patients [24][25][26][27][28][29][30][31][32]. The G8 test proved to be convenient, easy and quick to administer. It was generally completed in less than five minutes and was mostly administered by a nurse with no specific expertise in geriatrics. So far, the G8 tool exists only in French, but can easily be applied in other languages using the official English MNA translation (S1 Appendix), or one of the 22 other official translations.
The proportion of G8 impaired scores was 68.4%. In the validation against the reference test (0 v ≥1 abnormal MGA tests; 80% of the population), sensitivity, of foremost importance for a screening tool, was good at 76.5% and specificity was satisfactory at 64.4%. With the modified reference test (<2 v ≥2 abnormal MGA tests; 56.7% of the population), sensitivity of G8 was improved to 86.5% while specificity was reduced to 55.3%.
The population of the study was homogeneously defined, including only first-line cancer treatment patients. As it has been developed in a large number of investigating centers in France, including community hospitals, and as it can be equally administered by a nurse or a physician, the G8 questionnaire can be smoothly implemented in daily practice. However, the large representation of breast cancer patients in our study (over 50%) needs to be considered when generalizing the results. Consequently, we provide diagnostic accuracy estimations per tumor location in the subgroup analyses. There was a significant proportion of patients (9.1%) with missing metastatic status, for whom physicians decided not to perform the usual pre-treatment work-up because of low risk of metastasis and age.
Our target population is clearly identified with simple, quantitative criteria: at least one or two abnormal questionnaires among seven consensus geriatric tools. However, while MGA appears to be the best available and most reproducible instrument to identify it, this target population (reference test) has no unbiased definition. Two different thresholds considering one [18,24,27,29] or two [18,26,30,31,32,33] abnormal questionnaires have been proposed in the literature, with different sets of questionnaires that more or less cover geriatric domains [34]. So far, no objective arguments have been raised that enable us to choose the best threshold [35]. Selecting patients with at least one abnormal questionnaire (80% of patients in our series, 66% to 94% in other published studies) [18,24,27,29] reduces the risk of missing unfit patients but also limits the validity of the screening procedure. With two abnormal tests as the threshold, the target population is smaller (56.7% in our series vs. 43% to 76% in the literature) [18,26,30,31,32,33], which may enable us to concentrate our efforts on the most vulnerable patients.
Considering sensitivity as the most important criterion (to limit the occurrence of false negative cases), we believe G8 to be the best available tool, although VES-13 remains a good alternative with lower sensitivity but higher specificity. In this study we assessed the performance of VES-13, which is currently the most widely used screening tool for older cancer patients, although it was originally designed to predict functional decline or death over a two-year period in community-dwelling elders. The higher sensitivity of the G8 screening tool has been reported in the literature. A previous independent report of 113 patients found greater sensitivity for G8 (85.7% v 57.1% for VES-13), although the AUC was not statistically different [33]. A recent systematic review [36] compared all available screening methods to CGA and reported a median sensitivity for VES-13 of 68% (range 39 to 88%), and median specificity of 78% (range 62 to 100%). The median sensitivity for G8 was higher at 87% (range 77 to 92%) with a median specificity of 61% (range 39 to 75%) [36].
VES-13 has been studied as a screening tool in oncology in a number of previous reports [26,30,31,33,37] and two of them concluded that VES-13 could be a useful preliminary screening tool, with sensitivities of 73% and 87% [28,30]. However, similar analyses reported lower sensitivities ranging from 55 to 68.7% [26,31,33,37]. These variations may result from differences in administration. In Luciani et al.'s study, VES-13 was administered by a physician and thus possibly over a longer time and in more detail [38] than in the present study or others, where VES-13 was administered predominantly by a nurse or CRA.
Additional tools are available, such as the Barber Questionnaire, which was developed as a screening procedure for older adults in general practice [39], but results reported for older adults with breast cancer are disappointing [31]. Further geriatric tools have been proposed for screening purposes, such as cancer-specific geriatric assessment [40], the abbreviated (a)CGA [41], and the Groningen Frailty Index (GFI) [42]. However, overall, most of these instruments have only been presented in feasibility or pilot studies [25], and initial results suggest that they miss too many cases of vulnerable patients [26].
The false negative rate (undetected unfit patients) was lower with G8 than with VES-13: 23.5% v 31.3% of patients, respectively, with MGA, and 13.5% v 21.5% with the modified MGA. The survival analyses demonstrated the strong prognostic role of the G8 score (HR = 2.72, p<0.0001) along with male sex (HR 2.69, p<0.0001), poor ECOG status (HR 3.28, p<0.0001) and metastatic disease (HR 3.30, p<0.0001). The definition of the reference test was supported by the association observed between impaired MGA and poorer survival at one year in the exploratory survival analyses (HR 2.96, p=0.0018). With specificity of 64.4% and NPV of 40.3%, the G8 tool still needs improvement. The somewhat low reproducibility observed for questions such as neuropsychological problems may be an area for improvement, although part of the explanation may be the delay between the two tests, which lasted up to one month.
Considering these results, the G8 questionnaire may be proposed to cancer patients over 70 years. Given its simplicity, this approach may allow physicians to discriminate fit from unfit patients. In this way, fit patients can benefit from standard treatment without extensive evaluation and efforts can be centered on unfit patients who need careful medical attention. For the latter, if resources are available, management should be multidisciplinary and based on appropriate geriatric assessment. If not, they should be offered at least cautious medical attention and case management when advanced practice nurses can be involved.
In summary, no consensus currently exists to define the target elderly population who may benefit from further medical attention before cancer treatment. Our definition, which uses a threshold of one or two abnormal geriatric assessment questionnaires, is probably the most reliable to date, and our exploratory survival analyses demonstrated the prognostic value of impaired MGA, but the search for a refined reference test remains an issue. This study responds to a critical need for easy and quick-to-use screening tools to identify older patients requiring more detailed assessment and possible geriatric interventions. Screening has been encouraged for all patients who may benefit from full CGA in a recent SIOG/EUSOMA publication [43]. First and foremost, this study documents the high rate of older patients with geriatric impairments for whom oncologists lack clear directives and skills for practice and care. The G8 screening method recognizes the heterogeneity of the older patient population and provides oncologists with a useful, efficient tool to improve care.
Supporting Information
S1 Appendix. The G8 questionnaire.