Combining simple blood tests to identify primary care patients with unexpected weight loss for cancer investigation: Clinical risk score development, internal validation, and net benefit analysis

Background
Unexpected weight loss (UWL) is a presenting feature of cancer in primary care. Existing research proposes simple combinations of clinical features (risk factors, symptoms, signs, and blood test data) that, when present, warrant cancer investigation. More complex combinations may modify cancer risk to sufficiently rule-out the need for investigation. We aimed to identify which clinical features can be used together to stratify patients with UWL based on their risk of cancer.

Methods and findings
We used data from 63,973 adults (age: mean 59 years, standard deviation 21 years; 42% male) to predict cancer in patients with UWL recorded in a large representative United Kingdom primary care electronic health record between January 1, 2000 and December 31, 2012. We derived 3 clinical prediction models using logistic regression and backwards stepwise covariate selection: Sm, symptoms-only model; STm, symptoms and tests model; Tm, tests-only model. Fifty imputations replaced missing data. Estimates of discrimination and calibration were derived using 10-fold internal cross-validation. Simple clinical risk scores are presented for models with the greatest clinical utility in decision curve analysis. The STm and Tm showed improved discrimination (area under the curve ≥ 0.91), calibration, and greater clinical utility than the Sm. The Tm was simplest including age-group, sex, albumin, alkaline phosphatase, liver enzymes, C-reactive protein, haemoglobin, platelets, and total white cell count. A Tm score of 5 balanced ruling-in (sensitivity 84.0%, positive likelihood ratio 5.36) and ruling-out (specificity 84.3%, negative likelihood ratio 0.19) further cancer investigation. A Tm score of 1 prioritised ruling-out (sensitivity 97.5%). At this threshold, 35 people presenting with UWL in primary care would be referred for investigation for each person with cancer referred, and 1,730 people would be spared referral for each person with cancer not referred. Study limitations include using a retrospective routinely collected dataset, a reliance on coding to identify UWL, and missing data for some predictors.

Conclusions
Our findings suggest that combinations of simple blood test abnormalities could be used to identify patients with UWL who warrant referral for investigation, while people with combinations of normal results could be exempted from referral.


Author summary
Why was this study done?
• The risk of an early and late stage cancer diagnosis is increased during the 3 to 6 months following the first record of unexpected weight loss (UWL) in primary care. UWL presents a diagnostic challenge as it is associated with a wide range of other benign and serious conditions.
• Diagnostic strategies that avoid the harms of unnecessary invasive and costly cancer investigation are required for patients with UWL. Our research has shown that the absence of individual or pairs of co-occurring clinical features does not reduce the likelihood of cancer enough to sufficiently rule-out patients from further cancer investigation. It has also identified that primary care clinicians commonly request multiple blood tests when patients present with UWL.
• We aimed to identify whether the presence or absence of risk factors, symptoms, signs, and blood test results could be used together to rule-out more accurately the need for cancer investigation in patients with UWL.
What did the researchers do and find?
• We analysed the electronic health records of 63,973 adults with UWL recorded between January 1, 2000 and December 31, 2012 to derive 3 clinical scores including symptoms, symptoms and test results, and test results, to predict the risk of cancer within 6 months.
• The scores including test results were discriminative between patients with and without cancer, were well calibrated at the levels of risk that decisions to investigate are made in primary care, and showed superior clinical utility compared to the symptoms-only model.

Introduction
Unexpected weight loss (UWL) is a presenting feature of cancer for which there remains no consensus on the most appropriate investigation strategy in primary care [1]. Patients with UWL recorded by their primary care clinician are more likely to be diagnosed with the following cancers within 3 months: pancreatic, cancer of unknown primary, gastro-oesophageal, lymphoma, hepatobiliary, lung, bowel, and renal tract [2]. This association is greatest in males once aged 60 years or older and in females 80 years or older [2,3]. Current investigation guidelines focus on selecting patients for single-site cancer investigation based on simple combinations of clinical features (individual risk factors, signs, symptoms, and blood test abnormalities) [3-5]. As most patients presenting to primary care with UWL will not have cancer, diagnostic strategies that avoid the harms of unnecessary invasive and costly investigation are also required for patients at a low risk of cancer [1]. Our previous work has shown that the presence of individual co-occurring clinical features increases the likelihood of cancer sufficiently to rule-in cancer investigation [5]. However, the absence of individual co-occurring clinical features, including pairs of normal inflammatory markers, does not reduce the likelihood of cancer sufficiently to rule-out patients from further cancer investigations [5,6]. Primary care clinicians commonly request multiple blood tests when patients present with UWL [5,7]. There is little guidance on how clinicians should interpret these blood tests in combination or which are most relevant for use in clinical practice [1,6]. When baseline investigations are normal, a watchful waiting approach may be preferable to invasive testing [8].
Prediction models have been developed to identify the most helpful combinations of clinical features for use in clinical practice [9,10]. However, these studies were based on small cohorts from secondary care; they recommend conflicting approaches and include some investigations uncommon in primary care. Research using data from primary care is therefore needed to investigate whether the absence of risk factors and co-occurring clinical findings in the context of normal test results could reduce the risk of cancer to sufficiently rule-out patients with UWL from invasive cancer investigation.
We aimed to derive and internally validate prediction models using co-occurring risk factors, symptoms, signs, and blood test data to identify those clinical features that could be used together to stratify cancer risk in patients attending primary care with UWL.

Methods
The protocol was approved by the Independent Scientific Advisory Committee (ISAC) of the MHRA (protocol number 16_164A2A) [11]. Ethics approval for observational research using the CPRD with approval from ISAC was granted by a National Research Ethics Service committee (Trent Multiresearch Ethics Committee, REC reference number 05/MRE04/87). We followed the TRIPOD (S1 TRIPOD Checklist) reporting guidelines [12]. Stata (version 15) was used for all analyses.

Cohort design and population
We selected a cohort of patients with UWL indicated by the presence of a code for UWL that has previously been shown to be linked to measured weight loss [13,14]. Patients were selected for the derivation cohort if UWL was first coded between January 1, 2000 and December 31, 2012 in the Clinical Practice Research Datalink (CPRD). The CPRD is an anonymised database of primary care records covering a representative 6.9% of the United Kingdom population [15]. Patients were included if they were ≥18 years of age, registered with a CPRD general practice, eligible for linkage to NCRAS and Office for National Statistics (ONS) data, and had at least 12 months of data before their first UWL code (the "index date"). These UWL Read codes equated to a mean weight loss of ≥5% within a 6-month period in our previous internal validation study of weight-related coding in CPRD [13]. UWL may be coded following a range of clinical scenarios, including UWL reported as the patient's presenting complaint, after targeted history taking, following weight measurement as part of the clinical examination, or as part of a routine health check or chronic disease review [5]. Patients were excluded if they had a prescription of weight-reducing medication (orlistat) or a code for bariatric surgery in the previous 6 months, or if they had been previously diagnosed with cancer.

Outcome definition
The outcome was any cancer diagnosed within 6 months of the index date identified in the CPRD or NCRAS, using an existing library of codes [2]. Patients were followed up until the date of the first cancer diagnosis or for 6 months, whichever occurred first. Six months was chosen as previous research has shown that this is the period associated with an increased risk of cancer diagnosis following a presentation of UWL to primary care [2]. Cancers classified as nonmelanoma skin cancer, in situ, benign, ill-defined, or uncertain were excluded.

Predictor variables
Sociodemographic features, recorded on or before the index date, were extracted for each patient (Table 1). Preexisting comorbidities were identified using a previously described approach [13]. Clinical features shown to be associated with cancer when recorded in the 3 months before to 1 month after the UWL date were identified within that time period [5]. Continuous results of blood tests commonly requested within this time period were also identified using entity codes in CPRD, and outliers and erroneous results were dropped [5] (Table 2).

Multiple imputation
Multiple imputation was used to replace missing values for smoking status, alcohol intake, body mass index, and blood tests using the mi suite of commands in Stata [16,17]. Fifty imputed datasets were created. Multinomial logistic regression was used to impute categorical variables, and predictive mean matching (PMM) with 5 donors was used to impute continuous variables [18]. The imputation model included the outcome, all candidate variables to be included in the final prediction models, and auxiliary variables to increase the likelihood that the missing at random assumption was satisfied. These were a combination of variables found to predict missingness, personal characteristics, comorbidities, risk factors, other markers of inflammation, or full blood count components (S1 Text). For the primary analysis, continuous test results were dichotomised as abnormal/normal in each imputed dataset using standard laboratory ranges (S1 Table). Rubin's rules were used to combine results across the imputed datasets [16].
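The pooling step can be illustrated with a minimal sketch of Rubin's rules for a single coefficient. The study used Stata's mi commands; the Python function below and its input values are hypothetical and only show the arithmetic of combining a point estimate and its variance across imputed datasets.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)      # average of the m point estimates
    within = mean(variances)               # average within-imputation variance
    between = variance(estimates)          # between-imputation variance (m - 1 denominator)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance

# Hypothetical log odds ratios and squared standard errors from 5 imputed datasets
# (the study used 50; these numbers are illustrative only).
est = [1.40, 1.42, 1.38, 1.41, 1.39]
var = [0.014, 0.015, 0.014, 0.016, 0.015]
pooled, total_var = pool_rubin(est, var)
```

The total variance exceeds the average within-imputation variance whenever the estimates differ between imputed datasets, which is how uncertainty due to missing data is carried into the confidence intervals.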

Model derivation
Three prediction models were derived in the complete dataset: a symptoms-only model (Sm), a symptoms and tests model (STm), and a simple tests-only model (Tm). The mim Stata command was used to select variables for each model in the imputed data using backwards stepwise logistic regression, using a p-value of <0.01 for inclusion. Candidate variables for Sm included age-group, sex, smoking status, and clinical features found to be associated with a cancer diagnosis within 6 months in males and females (Table 2) [5]. Candidate variables for the STm also included the blood tests most commonly requested by GPs in patients with UWL and tests used in prognostic scores for patients with cancer (Table 3) [5,19,20]. For the Tm, candidate variables included age-group, sex, smoking status, and the blood tests. As we intended to derive a parsimonious model, we chose aggregate tests over their components; for example, the total white cell count was included rather than the white cell subtypes (Table 4). The most complex model (STm) had at least 15 events per variable [21].

Internal cross-validation
Ten-fold internal cross-validation was used to assess overall model performance using the mean predicted probability for each patient across all 50 imputation datasets with the cvauroc command in Stata [22]. Model performance was assessed using discrimination and calibration statistics. Discrimination was quantified using the area under the curve (AUC) and 95% confidence intervals calculated using bootstrap resampling. Calibration plots were generated using Stata's pmcalplot command to assess how the predicted probabilities derived by each model correspond to the observed proportion of patients diagnosed with cancer [23].
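As a rough illustration of the two performance measures, the sketch below computes the AUC as the probability that a patient with cancer receives a higher predicted risk than a patient without, together with an observed-to-expected ratio as one simple calibration summary. The study used Stata's cvauroc and pmcalplot commands; the function names and the toy data here are assumptions made for illustration.

```python
def auc(outcomes, predicted):
    """Proportion of case/non-case pairs in which the case has the higher predicted risk."""
    cases = [p for y, p in zip(outcomes, predicted) if y == 1]
    non_cases = [p for y, p in zip(outcomes, predicted) if y == 0]
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5  # ties count as half
    return wins / (len(cases) * len(non_cases))

def observed_expected_ratio(outcomes, predicted):
    """Observed number of events divided by the sum of predicted probabilities."""
    return sum(outcomes) / sum(predicted)

# Toy data: 1 = cancer diagnosed within 6 months, 0 = not (illustrative only).
y = [0, 0, 0, 0, 1, 0, 1, 1]
p = [0.01, 0.02, 0.02, 0.05, 0.04, 0.10, 0.30, 0.60]
toy_auc = auc(y, p)
toy_oe = observed_expected_ratio(y, p)
```

In a cross-validated analysis, the predicted probabilities passed to these functions would come from models fitted on the folds that exclude each patient, so that no patient contributes to their own prediction.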

Decision curve analysis
We then used decision curve analysis (DCA) to compare the standardised net benefit (SNB) and proportion of investigations avoided by the Sm, STm, and Tm with scenarios where no prediction model was used (i.e., treat everyone or treat nobody) across a range of risk thresholds (threshold probabilities) using the dca command in Stata [24]. In general, the strategy with the highest net benefit (the highest plotted curve) is considered to have the greatest clinical utility at any given risk threshold [25]. Net benefit represents the proportion of the studied population with true positive results minus the proportion with false positives multiplied by the odds of cancer at each risk threshold. To ease interpretation, we calculated the SNB to give the proportion of the maximum achievable utility attained by each model (SNB = NB / prevalence of cancer) [26]. An alternative presentation of DCA is the proportion of patients who would avoid further investigation without missing a cancer diagnosis at each risk threshold [27].
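The net benefit definition above can be written out as a short sketch. The function names are hypothetical, and the example counts are taken from the per-100,000 figures reported in Box 1 for an STm score of 4; pairing that score cutoff with a 1% risk threshold is an assumption made purely to show the arithmetic.

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """True positive proportion minus false positive proportion weighted by the threshold odds."""
    odds = threshold / (1 - threshold)
    return true_pos / n - (false_pos / n) * odds

def standardised_net_benefit(true_pos, false_pos, n, threshold, prevalence):
    """Net benefit as a proportion of the maximum achievable (the prevalence)."""
    return net_benefit(true_pos, false_pos, n, threshold) / prevalence

def net_benefit_investigate_all(prevalence, threshold):
    """Reference strategy in which every patient is investigated."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

n = 100_000
prevalence = 1_419 / n   # 1,333 referred + 86 not referred with cancer (Box 1)
nb = net_benefit(1_333, 31_152, n, threshold=0.01)
snb = standardised_net_benefit(1_333, 31_152, n, 0.01, prevalence)
nb_all = net_benefit_investigate_all(prevalence, 0.01)
```

The strategy of investigating nobody has a net benefit of zero at every threshold, so a model is only useful at a given threshold if its net benefit exceeds both zero and the investigate-all value.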

Clinical risk scores
Finally, to demonstrate how these models could be used in clinical practice, we followed established methods to develop 2 simple clinical risk scores for the STm and Tm [28]. The risk score associated with each variable was derived by multiplying each coefficient by the same conversion factor and rounding the result to the nearest whole number. We calculated the mean point score for each patient across the imputation datasets and constructed a 2 × 2 table using each total score as the cutoff. We calculated the sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), positive predictive value (PPV), and negative predictive value (NPV) for each score.

Table note: Candidate covariates included in the backwards stepwise selection procedure (using p < 0.01 to retain covariates) were: sex, age-group, smoking status, abdominal mass, abdominal pain, appetite loss, back pain, chest pain (noncardiac), chest signs, change in bowel habit, dyspepsia, dysphagia, iron deficiency anaemia, jaundice, lymphadenopathy, venous thromboembolism, albumin, alkaline phosphatase, bilirubin, C-reactive protein, erythrocyte sedimentation rate, haemoglobin, liver enzymes (AST/ALT), lymphocytes, mean cell volume, monocytes, neutrophils, and platelets. Conversion factor = 0.32. The conversion factor is used to translate coefficients into the risk score, rounded to the nearest integer.
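The point-assignment step might look like the sketch below. The coefficients are the natural logs of two adjusted odds ratios reported for the Tm (raised C-reactive protein 4.50, raised liver enzymes 2.02), but the multiplier of 3 is a hypothetical value chosen for illustration, since exactly how the reported conversion factor is applied is not detailed here. The resulting points are therefore not the published scores.

```python
import math

def points_from_coefficients(coefficients, conversion_factor):
    """Multiply each log odds ratio by a common factor and round to a whole number."""
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return {name: round(beta * conversion_factor) for name, beta in coefficients.items()}

def total_score(points, abnormal_findings):
    """Sum the points for the findings that are present in one patient."""
    return sum(points[name] for name in abnormal_findings)

# Log odds ratios derived from adjusted odds ratios quoted in the text (Tm).
coefficients = {
    "raised_crp": math.log(4.50),
    "raised_liver_enzymes": math.log(2.02),
}
HYPOTHETICAL_FACTOR = 3  # assumption for illustration, not the study's value

points = points_from_coefficients(coefficients, HYPOTHETICAL_FACTOR)
patient_score = total_score(points, ["raised_crp", "raised_liver_enzymes"])
```

Using one common factor for every coefficient keeps the relative weights of the predictors roughly intact while producing integers that can be added by hand.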

Sensitivity analysis
We refitted the models using the missing indicator method to assess our approach to multiple imputation. We refitted the models using continuous blood test results to explore the impact on discrimination and calibration statistics. We used Stata's mfpmi command to select the most appropriate functional form for each continuous covariate in relation to the outcome [29].

Study participants
In the derivation cohort of 63,973 adults aged ≥18 years with UWL recorded, 908 (1.4%) were diagnosed with cancer within 6 months of the index date, of whom 902 (99.3%) were aged ≥40 years. the same, and the confidence intervals overlapping when comparing variables included in the final imputed and missing indicator models.

Model development
In the final Sm, 12 of 17 candidate variables were associated with cancer (Table 2), of which concurrent jaundice (adjusted odds ratio 6.62 (95% CI 3.77 to 11.63)) and lymphadenopathy (4.67 (2.08 to 10.47)) were most predictive. Out of 29 candidate variables, 17 were retained in the final STm (Table 3), of which a raised C-reactive protein (4.09 (3.24 to 5.16)) and concurrent jaundice (2.33 (1.27 to 4.28)) were most predictive. Out of 12 candidate predictor variables, 9 were retained in the final Tm (Table 4), of which raised C-reactive protein (4.50 (3.60 to 5.64)) and raised liver enzymes (2.02 (1.51 to 2.71)) were most predictive.

Internal validation
The AUC for both the STm (0.92 (0.91 to 0.93)) and the Tm (0.91 (0.90 to 0.92)) showed discrimination, which was superior to the Sm (0.79 (0.78 to 0.81)) (Table 5). However, the calibration statistics showed that the Sm was better calibrated compared to the STm and Tm. The calibration plots showed that the difference in calibration statistics was mainly due to underprediction in the highest decile of risk for the STm and Tm that was not seen for the Sm (Fig 1). Refitting the STm and Tm with continuous blood test results instead of dichotomised blood test results made negligible difference to model performance (Table 5, S2 Fig, S2 Table).

Decision curve analysis
The STm had the greatest clinical utility (Fig 2A). The STm had higher SNB than the Sm for the risk thresholds of 0.4% to 18%, and the Tm had greater net benefit than the Sm for 0.5% to 15% (Fig 2A, S3 Table). At a cancer risk threshold of 1%, these differences translate into a 55% reduction in further cancer testing if using the STm compared to investigating all patients, a 19% reduction compared to using the Sm, or a 2% reduction compared to the Tm (Fig 2B, S3 Table).

Examples of applying the clinical risk scores
Figs 3 and 4 show diagnostic accuracy statistics corresponding to each possible point score for the STm and the Tm, respectively. S4 and S5 Tables show how these statistics apply to 100,000 patients with UWL for the STm and the Tm risk score thresholds, respectively. Box 1 gives examples of how the STm score might be used in clinical practice, for example, by choosing the optimal threshold to sufficiently rule-out further cancer investigation.

Table note: An AUC of 0.5 represents chance, and 1 represents perfect ability to discriminate between patients who will and patients who will not be diagnosed with cancer [47]. Perfect calibration has a calibration slope of 1, a calibration-in-the-large (CITL) of 0, and an observed-to-expected (O:E) ratio of 1 [41].

Summary of findings
Combinations of multiple simple test results were discriminative between patients with and without cancer, were well calibrated at the levels of risk that decisions to investigate are made in primary care, and showed superior clinical utility when compared to symptoms and signs. We present stand-alone risk scores that could be used by GPs to guide test selection and interpretation. The simplest includes age-group, sex, and 7 primary care blood tests (alkaline phosphatase, liver enzymes, albumin, C-reactive protein, haemoglobin, platelets, and total white cell count). They could be used to select patients with UWL who do not warrant further cancer investigation in addition to those that do.

Strengths and limitations
Our study design aimed to minimise bias. We excluded patients with objective evidence of intentional weight loss, restricted co-occurring clinical features to the time of the UWL presentation [5], and included only the first UWL code [2,30]. We were reliant on electronic health record (EHR) codes to define UWL as weight is not recorded frequently enough [13]. It is unclear how coding for UWL is affected by recording bias, which occurs when GPs preferentially code clinical features that they associate with cancer and can lead to inflated estimates of association for these features [31]. We excluded patients with a past history of cancer and only included cancers coded within 6 months of the UWL date to ensure that we investigated a first diagnosis of a cancer associated with UWL. By utilising multiple imputation to replace missing risk factor and continuous test result data, we could produce precise estimates for combinations of multiple covariates. Previous studies have not done this and have had to focus on single or pairs of blood test abnormalities [3]. We included auxiliary variables to increase the likelihood that missing values could be accurately predicted by the observed data (that they are missing at random) [16]. However, there is no established method to formally evaluate whether this was successful. Imputation allowed us to combine multiple blood test results and to show that once blood tests are modelled with sex and age, there appears to be no need to include additional risk factors and clinical features.

Fig 2. Decision curve analysis comparing the three models in terms of net benefit (Fig 2a) and investigations avoided (Fig 2b). DCA, decision curve analysis. https://doi.org/10.1371/journal.pmed.1003728.g002
We dichotomised each blood test for the primary analysis to derive a simple risk score for use in clinical practice using the upper or lower boundary of the normal reference range. This can have limitations. Firstly, by dichotomising a continuous variable, information is lost by grouping slightly and extremely abnormal results together. Secondly, choosing raised values to define abnormal might be unhelpful for cancer sites associated with low values (and vice versa). Thirdly, we chose the upper limit of the normal range to define abnormal for blood tests where there is no consensus on how to define abnormal. Refitting the models to include continuous linear and fractional polynomial terms made no meaningful difference to model performance.
We required a testing strategy appropriate for a composite of all cancer types. The literature reports that the direction of blood test abnormalities is similar for most cancers and that a pro-inflammatory state underpins many cancers and cancer cachexia [19,20,32-39]. While this supports our approach, it remains likely the composite cancer outcome is partly responsible for the underprediction observed at the highest decile of risk. It is unlikely that this would have adverse clinical consequences because GPs' decision to refer for invasive testing is likely to be triggered at lower thresholds than this.
We used 10-fold internal cross-validation to derive estimates of predictive performance [40]. However, internal validation may produce overoptimistic estimates and so external validation remains necessary to assess the generalisability of our findings in settings, populations, and subgroups of interest [41,42]. Primary care data will be identified from alternative clinical systems for the same time period, or from the same clinical systems for a later time period, to account for variation in UK practice; from international settings with alternative approaches to weight measurement and weight loss recording and where a different spectrum of patients consults primary care; and from systems where similar blood tests are used with differing degrees of missingness.

Findings in context
One previous study developed a risk score to predict cancer in a cohort of 256 patients referred for the investigation of UWL (AUC 0.90 (95% CI 0.88 to 0.92)) including the following: age ≥80 years, white blood cells, albumin, alkaline phosphatase, and lactate dehydrogenase [10]. The AUC was notably lower when externally validated (0.70 (0.61 to 0.78)) in a cohort of 290 consecutive patients referred to hospital with UWL [9]. This study also developed a simpler 3-variable score that included age, alkaline phosphatase, and albumin and gave an AUC of 0.74 (0.66 to 0.81) [9]. The models we developed here produced higher AUCs than these models and, more importantly, include a cohort of all patients presenting to primary care with UWL, not those referred to hospital for further investigation of UWL.
Two existing primary care prediction models incorporate multiple symptoms and risk factors to estimate cancer risk over a 2-year period for 1.24 million females (AUC 0.85 (95% CI 0.84 to 0.85)) and 1.26 million males (0.87 (0.88 to 0.89)) aged 25 to 89 years [43,44]. They also demonstrate good calibration at the lower deciles of risk and miscalibration at the highest decile of risk. The relative timing of symptoms was not reported. Blood tests were not included, except that haemoglobin results in the 12 months before to 2 months after study entry were used to define anaemia as a baseline risk factor. Consequently, the design and reporting of these models make it impossible to understand the diagnostic value of symptoms and blood tests co-occurring with UWL.

Implications for research and clinical practice
DCA allowed us to demonstrate the importance of simple tests in comparison to symptoms and signs. It was used to define how risk thresholds related to the probable "yield" of referrals in diagnosing cancer through hospital-based invasive investigation [25]. However, DCA cannot itself define the acceptable balance between the number of people referred to each person with cancer referred or between the number of people spared referral to each person with cancer not referred. These trade-offs are for patients, clinicians, and society at large to decide and could be evaluated further in health economic analysis. Box 1 illustrates 4 examples of how a risk score could inform these decisions. Moreover, these examples could help GPs develop watchful waiting strategies for patients with intermediate risks of having cancer, perhaps scheduling periodic review and blood test reevaluation to examine whether the risk has changed.
There is a dearth of research on how risk scores for cancer are best adopted into clinical practice. We intended to derive an intuitive risk score that mirrors clinical practice by focussing on simple combinations of clinical features including commonly used and available blood tests. Each score could be completed by GPs by hand, used as an online calculator, or integrated into the EHR. However, the limited literature shows that risk scores are underused, and GPs find predictions difficult to interpret or are distrustful of them, especially when they conflict with intuitive clinical judgement [45,46]. Further research is therefore required to understand their uptake and to assess whether their use impacts on cancer outcomes.

Box 1. Consequences of applying the STm score in clinical practice (Tables 3 and S4)

Clinical example
A 52-year-old woman with UWL, no other clinical features, a low albumin, high alkaline phosphatase, and a raised C-reactive protein corresponds to an STm score of 4: sensitivity 93.9%, specificity 68.4%. At this threshold, 23 people would be referred for each person with cancer referred, and 784 people would be spared investigation for each person with cancer not referred. Per 100,000 people with UWL, 1,333 with cancer would be referred, 31,152 would be referred unnecessarily, 67,429 correctly spared referral, and 86 people with cancer would not be referred.

Example of maximising sensitivity to rule-out cancer
An STm score of 1 prioritises ruling-out cancer by maximising sensitivity: sensitivity 98.6%, specificity 43.4%. At this threshold, 40 people with UWL would be referred for each person with cancer referred, and 2,139 people with UWL would be spared referral for each person with cancer not referred. Per 100,000 people with UWL, 1,399 people with cancer would be referred, 55,797 people would be unnecessarily referred, 42,784 correctly spared referral, and 20 people with cancer would not be referred.

Example of a threshold to balance ruling-in and ruling-out cancer
An STm score of 6 would balance ruling-in (PLR 5.09) and ruling-out (NLR 0.16) the need for referral: sensitivity 86.3%, specificity 83.0%. At this threshold, there would be 14 people referred for investigation for each person with cancer referred, and 422 would be spared investigation for each person with cancer not referred. Per 100,000 people with UWL, 1,225 people with cancer would be referred, 16,759 patients would be unnecessarily referred, 81,822 correctly spared referral, and 194 people with cancer would not be referred.

Example of a threshold close to the NICE PPV threshold of 3%
An STm score of 2 is the closest to a PPV of 3%, the threshold chosen by NICE to warrant further investigation: sensitivity 97.9%, specificity 50.9%. At this threshold, 35 people would be referred for each person with cancer referred, and 1,730 patients would be spared investigation for each person with cancer not referred. Per 100,000 patients with UWL, 1,390 people with cancer would be referred, 48,403 patients would be unnecessarily referred, 50,178 correctly spared referral, and 29 people with cancer would not be referred.
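The per-100,000 figures in Box 1 follow from sensitivity, specificity, and the cohort prevalence (908 of 63,973). The sketch below reproduces that arithmetic for the STm score of 4 and the likelihood ratios for the score of 6; the function names are hypothetical, and because the inputs are the rounded percentages quoted in the box, the outputs agree with the published counts only approximately. The "people referred for each person with cancer referred" figure appears to correspond to the ratio of unnecessary referrals to cancers referred, which is the assumption used here.

```python
def referral_consequences(sensitivity, specificity, prevalence, population=100_000):
    """Expected 2 x 2 table counts when everyone at or above a score cutoff is referred."""
    with_cancer = population * prevalence
    without_cancer = population - with_cancer
    true_pos = sensitivity * with_cancer       # people with cancer referred
    false_neg = with_cancer - true_pos         # people with cancer not referred
    true_neg = specificity * without_cancer    # correctly spared referral
    false_pos = without_cancer - true_neg      # referred unnecessarily
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "true_neg": true_neg,
        "false_neg": false_neg,
        "unnecessary_referrals_per_cancer_referred": false_pos / true_pos,
        "spared_per_cancer_not_referred": true_neg / false_neg,
    }

def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios for a score cutoff."""
    plr = sensitivity / (1 - specificity)
    nlr = (1 - sensitivity) / specificity
    return plr, nlr

prevalence = 908 / 63_973
score_4 = referral_consequences(0.939, 0.684, prevalence)   # clinical example
plr_6, nlr_6 = likelihood_ratios(0.863, 0.830)              # balanced threshold
```

Lowering the cutoff moves patients from the spared group to the referred group, so sensitivity rises at the cost of more unnecessary referrals for each cancer found.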