Development and validation of a prediction model for adenoma detection during screening and surveillance colonoscopy with comparison to actual adenoma detection rates

Objective The adenoma detection rate (ADR) varies widely between physicians, possibly due to patient population differences, hampering direct ADR comparison. We developed and validated a prediction model for adenoma detection in an effort to determine if physicians’ ADRs should be adjusted for patient-related factors. Materials and methods Screening and surveillance colonoscopy data from the cross-sectional multicenter cluster-randomized Endoscopic Quality Improvement Program-3 (EQUIP-3) study (NCT02325635) was used. The dataset was split into two cohorts based on center. A prediction model for detection of ≥1 adenoma was developed using multivariable logistic regression and subsequently internally (bootstrap resampling) and geographically validated. We compared predicted to observed ADRs. Results The derivation (5 centers, 35 physicians, overall-ADR: 36%) and validation (4 centers, 31 physicians, overall-ADR: 40%) cohort included respectively 9934 and 10034 patients (both cohorts: 48% male, median age 60 years). Independent predictors for detection of ≥1 adenoma were: age (optimism-corrected odds ratio (OR): 1.02; 95%-confidence interval (CI): 1.02–1.03), male sex (OR: 1.73; 95%-CI: 1.60–1.88), body mass index (OR: 1.02; 95%-CI: 1.01–1.03), American Society of Anesthesiology physical status class (OR class II vs. I: 1.29; 95%-CI: 1.17–1.43, OR class ≥III vs. I: 1.57; 95%-CI: 1.32–1.86), surveillance versus screening (OR: 1.39; 95%-CI: 1.27–1.53), and Hispanic or Latino ethnicity (OR: 1.13; 95%-CI: 1.00–1.27). The model’s discriminative ability was modest (C-statistic in the derivation: 0.63 and validation cohort: 0.60). The observed ADR was considerably lower than predicted for 12/66 (18.2%) physicians and 2/9 (22.2%) centers, and considerably higher than predicted for 18/66 (27.3%) physicians and 4/9 (44.4%) centers. Conclusion The substantial variation in ADRs could only partially be explained by patient-related factors. These data suggest that ADR variation could likely also be due to other factors, e.g. physician or technical issues.


Introduction
Colonoscopy combined with polypectomy, when necessary, has been shown to decrease colorectal cancer (CRC) incidence [1] and CRC-related mortality. [1,2] However, the protective benefit is reduced by the occurrence of interval-CRC, i.e. CRC occurring within the colonoscopy surveillance interval. Three main reasons for the occurrence of interval-CRC have been suggested in literature, namely: 1) missed lesions during colonoscopy (accounting for approximately 50-60% of the cases), 2) incomplete resection, and 3) newly developed cancers. [3] The proportion of patients undergoing colonoscopy in which at least one adenoma is detected, the adenoma detection rate (ADR), has been shown to be inversely associated with the development of interval-CRC. [4,5] Quality improvement in colonoscopy therefore aims, among other things, at increasing and thereby achieving sufficient physicians' ADRs. American and European guidelines recommend thus ADRs to be !25%, [6,7] i.e. !30% in male and !20% in female patients. [7] The influence of several modifiable factors, such as procedural and technological factors, on ADR have been studied. [8] Moreover, training programs to improve the ADR have been developed and evaluated. [9,10] Nevertheless, ADRs vary widely between physicians, [11] possibly caused by patient population differences. ADR comparison between centers and physicians is therefore challenging and even more complicated by the strict domain in which the ADR should be calculated, i.e. for patients at average risk of adenoma detection in a screening population. Average risk individuals are those without additional risk factors for adenoma detection, e.g. a family history of CRC, a personal history of CRC or colorectal adenomas, and are not preselected for colonoscopy through, for example, stool tests, e.g. fecal immunochemical tests, or other diagnostic tests. Several initiatives aim to provide physicians with feedback on their ADR through online databases, such as the Gastro-Intestinal Quality Improvement Consortium (GIQuIC). [12] The feedback could be improved if next to these unadjusted ADRs the expected ADR for a physician's patient population could be predicted.
Therefore, the aim of the present study was to develop and validate a prediction model for colorectal adenoma detection based on patient risk factors in a screening and surveillance population. The secondary aim was to compare the observed individual physicians' and centers' ADRs to the predicted proportion of patients with !1 adenoma based on the developed model.

Materials and methods
The present study has been performed and reported according to the TRIPOD statement for the reporting of multivariable prediction models (S2 File). [26] Data source We used the data from the cluster-randomized cross-sectional Endoscopic Quality Improvement Program-3 (EQUIP-3) study (Clinical Trials Registration: NCT02325635) that ran from September 2013 until January 2015. [9] The EQUIP-3 study was approved by the Mayo Clinic Institutional Review Board and was considered minimal risk and exempt from patient-level consent. [9] In the EQUIP-3 study centers were randomized, after a lead-in phase, to receive a quality improvement program aimed at increasing the ADR or no intervention. During the lead-in phase data on all colonoscopies performed in all participating centers was collected enabling analyses of the colonoscopy quality metrics without the influence of the quality improvement program. The centers randomized to receive the quality improvement program, received the first EQUIP training as described in the EQUIP-1 study, [27] consisting of baseline measurement of ADR, followed by an in-person 1-hour powerpoint based training emphasizing on improvement of adenoma detection and flat lesion recognition. Furthermore in these centers posters about EQUIP were placed in each endoscopy room, and the ADR of all endoscopists was one-on-one discussed, typically with low performers. Each center and individual then received regular follow-up ADR reports, approximately monthly during the post-intervention phase. [9] Data was collected through the GIQuIC-form (S1 File) [12] by the physician or nurse at the end of the procedure. Pathology results were entered subsequently when available. Predictor assessment was thus blinded for the outcome.

Sample split
The original dataset was split into two cohorts based on the performing center. This enabled validation of the model in geographically different centers, which is a second-best after fully external validation. In short, in the first cohort, i.e. the derivation cohort, the model will be developed and internally validated. Subsequently, the fitted model will be geographically validated in the second cohort, i.e. the validation cohort. The number of patients, centers, physicians, and randomization to the intervention were balanced between the derivation and validation cohort.

Participants
Sixty-six physicians from nine centers in the United States, i.e. within California, Illinois, Indiana, New Mexico, New York, Ohio, Tennessee, and Virginia, participated. Patients undergoing outpatient colonoscopy were included in the EQUIP-3 study if 1) they did not have a history of colorectal surgery; 2) indication for colonoscopy was screening or surveillance, as indicated in the GIQuIC database; and 3) bowel preparation was adequate, i.e. sufficient to accurately detect polyps !6 mm.
In the present study we excluded patients: 1) <50 years, because screening is recommended !50 years; 2) with a known increased risk of colorectal neoplasia who are normally discarded in ADR calculations, i.e. colonoscopy for a high-risk genetic CRC syndrome or surveillance colonoscopy because of inflammatory bowel disease; and 3) a personal history of CRC, because these patients will probably have undergone colorectal surgery. Screening and surveillance colonoscopies were both included, because these are performed side-to-side in daily practice and might both be used in ADR calculations. Furthermore, a prediction model for adenoma detection in both a screening and surveillance population potentially facilitates ADR comparisons across physicians and centers in both screening and surveillance settings.
A minuscule proportion (N = 53, 0.2%) of colonoscopies were probably performed in a patient already included in the study, we therefore excluded these second colonoscopies from the database.

Outcome
The outcome of the prediction model was the detection of !1 histologically confirmed colorectal adenoma per patient. The histological assessment was performed in daily practice, and thus not completely blinded for patient factors, e.g. sex and age of the patient. However, the pathologist did not have access to the GIQuIC form and was consequently blinded to factors such as BMI, ASA class, and race and ethnicity, and it is therefore unlikely that these factors could have influenced the pathologists' judgment.
As a proxy for clinical condition and co-morbidities the American Society of Anesthesiology physical status (ASA) was used [28]. Because no ASA V patients and only a small number of ASA IV patients were included, we categorized this variable as: "ASA I", "ASA II" and "ASA III or IV".
The indication of colonoscopy was surveillance, i.e. a personal history of colorectal adenomas or surveillance marked as indication on the GIQuIC form, or colorectal cancer screening. Family history, i.e. !1 first-degree relative <60 years diagnosed with the condition, of colorectal adenomas and family history of CRC were analyzed as dichotomous variables. The precolonoscopic risk on adenoma detection, i.e. high or low, was not included as a possible predictor due to possible multicollinearity with indication, family history of CRC and family history of colorectal adenomas.

Statistical analysis
All statistical analyses were performed with R language environment for statistical computing version 3.1.3. [29] Sample size. We refer to the original paper for sample size calculations. [9] The number of patients with !1 adenoma detected exceeded ten per possible predictor considered, and should therefore be sufficient for the analysis.
Missing data. To correct for possible errors with data-entry on the GIQuIC-form we recoded the following implausible values as missing: BMI <15 or >55kg/m 2 , height <140 or >220cm, weight <40 or >250kg. We assumed missing data of these variables to be missing at random, in other words the fact that the data is missing is not related to the value that is missing. [30] Multiple imputation based on iterative (10 iterations) chained equations with predictive mean matching was performed, creating 20 multiple imputed datasets, using the MICEpackage for R. [31] The multiple imputation procedure was performed based on center, physician, all possible predictors and detection of: !1 adenoma, !3 adenomas, any polyp, advanced adenoma(s), adenocarcinoma(s), and serrated lesion(s). [32] The imputed values for BMI were calculated from the imputed height and weight.
We assumed patient's race and ethnicity being categorized as "unknown or patient declined to provide" to be missing not at random, i.e. we expect that there is a reason why these values are not filled out in the GIQuIC database, and we therefore retained this category in the modeling process.
Descriptive statistics. The number of patients per center and per physician is presented as medians and ranges. Continuous baseline characteristics are presented as mean ± standard deviation, and categorical data as frequencies with proportions. The proportion of patients with !1 adenoma detected, i.e. the ADR, is reported per subgroup.
Univariable odds ratios (ORs) including 95%-confidence intervals (CIs) for the detection of !1 adenoma were estimated for all possible predictors based on logistic regression modeling. An odds ratio was regarded statistically significant if the 95%-CI did not include 'one'.
Model development and internal validation within the derivation cohort. A multivariable logistic regression model with detection of !1 adenoma as outcome was fitted with the following possible predictors: age, sex, BMI, race, ethnicity, ASA class, indication, family history of CRC, family history of colorectal adenomas, interaction between race and sex, and interaction between ethnicity and sex. The model estimates were adjusted for randomization to a quality improvement intervention by adding the dichotomous variable "colonoscopy performed after intervention received versus no intervention received" to the model. The final model was selected using stepwise backwards selection based on Akaike's information criterion.
The model's discriminative ability was assessed with the apparent C-statistic, which is equivalent to the area under the receiver operating characteristic curve. Subsequently internal validation was performed using 1000 bootstrap resamples per imputed dataset to calculate the shrinkage factor and optimism-corrected C-statistic. Regular bootstrap resampling without taking clustering into account was performed. [33] The optimism-corrected model coefficients and intercept were calculated based on the shrinkage factor. The model calibration was visually assessed with a calibration plot. Geographical validation and model-update within the validation cohort. Because we expected differences in the baseline adenoma prevalence and overall effect of predictors due to geographic variation we performed logistic re-calibration by fitting a logistic regression model with the original model's linear predictor as the independent variable and detection of !1 adenoma as the dependent variable. The intercept and coefficient of this new logistic model are the re-calibrated intercept and slope-correction, respectively. [34] The discriminative ability (C-statistic) and calibration (with a calibration plot) of the updated model were assessed.

Predicted proportion of patient with !1 adenoma detected compared to actual ADRs
In both cohorts the probability for the detection of !1 adenoma per patient was predicted with the final model. The sum of the predicted probabilities of adenoma detection per patient was calculated resulting in the predicted proportion of patients with !1 adenoma per physician and center. These predicted proportions were graphically compared to the physicans' and centers' observed ADR, i.e. the proportion of patients with !1 adenoma detected, including 95% Wilson CIs. The observed ADR was considered considerably lower than predicted if the upper bound of the 95% CI was lower than the predicted ADR, and considerably higher than predicted if the lower bound of the 95% CI was higher than the predicted ADR.

Study population
The 22316 patients were divided between the derivation and validation cohort. In the derivation cohort 9934 patients (1223 patients were excluded) were examined by 35 physicians in five centers (Fig 1). In the validation cohort 10034 patients (1125 patients were excluded) were examined by 31 physicians in four centers.

Missing values
No outcome data were missing. A substantial proportion (28.2-35.1%) of patients had missing values for height, weight or BMI (Fig 1).

Baseline characteristics and ADR per subgroup of patients
The provider and patient characteristics, and ADR per variable are summarized in Table 1. The reasons for being at high risk of adenoma detection are summarized in S1 Table. Within the derivation cohort the mean age was 60.2 years, 47.6% were male, and the mean BMI was 28.3 kg/m 2 . The overall ADR was 35.9%, with ADRs for female (29.6%) and male (42.9%) patients above the recommended thresholds [7] of !20% and !30% respectively. Within the validation cohort the patients were slightly older (mean age 61.0 years), almost the same proportion was male (47.8%), and patients had a higher mean BMI (29.2 kg/m 2 ). The overall ADR (40.0%) was higher, and again the ADR for female (34.9%) and male (45.6%) patients was above the recommended thresholds [7].
Regarding the baseline characteristics the derivation and validation cohort differed the most with respect to the included proportion of African-American and Hispanic or Latino patients, and colonoscopies with screening as indication (Table 1).

Univariable association between predictors and detection of !1 adenoma
In the derivation cohort statistically significant positive predictors for detection of !1 adenoma were age (per year increase), male sex, BMI (per kg/m 2 increase), ASA class II compared to class I, ASA class III or IV compared to I, Hispanic or Latino ethnicity and surveillance compared to screening as indication ( Table 2).
Some univariable associations within the validation cohort clearly differed from those observed in the derivation cohort. The OR of ASA class III or IV compared to I was lower and statistically non-significant. The OR of family history of CRC was above 1 in the validation cohort, but was still non-significant. The OR for African-American, Asian, and Hispanic or  Five centers were included in the derivation cohort, and four centers in the validation cohort. c Thirty-five physicians were included in the derivation cohort, and thirty-one physicians were included in the validation cohort.  Table 1 (S1 Table). Latino patients was <1 within the validation cohort, while these variables had an OR >1 in the derivation cohort.

Model development and validation within the derivation cohort
After stepwise backwards selection the following patient-related predictors were selected in the multivariable model (Table 3): age, sex, BMI, ASA class, indication (surveillance versus screening), and Hispanic or Latino ethnicity. The discriminative ability was modest (apparent C-statistic: 0.630 and optimism-adjusted C-statistic: 0.626). Calibration of the model was visually accurate, however, the model overestimated the probability of the detection of !1 adenoma among patients with a low observed and especially high observed adenoma detection (Fig 2A).

Geographical validation within the validation cohort
The re-calibrated model with an intercept of -2.406 and overall slope correction of 0.757 (final model estimates not shown) had a modest discriminative ability (C-statistic 0.603). The model tended to overestimate the adenoma detection among patients with a high observed adenoma detection (Fig 2B).

Predicted proportion of patients with !1 adenoma compared to observed ADR
Within the derivation cohort the median observed ADR was 35.2% (range: 20.0-52.5%) per physician and 36.9% (range: 27.9-39.5%) per center. Six physicians and no center had an ADR below the recommended threshold [6,7] of !25%. The median predicted proportion of patients with !1 adenoma was 36.7% (range: 31.0-40.2%) per physician and 36.5% (range: 35.0%-38.0%) per center. The upper bound of the 95%-CI of the observed ADR was lower than the predicted proportion of patients with !1 adenoma for five physicians and one center. The lower bound of the 95%-CI of the observed ADR was higher than the predicted proportion of patients with !1 adenoma for five physicians and one center (Figs 3 and 4A).
Within the validation cohort the median observed ADR was 41.9% (range: 4.2-65.0%) per physician and 43.4% (range: 23.5-51.9%) per center. Five physicians and one center had an Prediction of adenoma detection rates during screening and surveillance colonoscopy  Prediction of adenoma detection rates during screening and surveillance colonoscopy observed ADR <25%. The median predicted proportion of patients with !1 adenoma was 39.1% (range: 31.9-44.2%) per physician and 39.1% (range: 36.4-41.7%) per center. The upper bound of the 95%-CI of the observed ADR was lower than the predicted proportion of patients with !1 adenoma for seven physicians and one center. The lower bound of the 95%-CI of the observed ADR was higher than the predicted proportion of patients with !1 adenoma for 13 physicians and three centers (Figs 5 and 4B). Prediction of adenoma detection rates during screening and surveillance colonoscopy Overall, there was no obvious trend of smaller sample sizes per physician or center at the extremes of observed ADRs. The predicted proportion of patients with !1 adenoma varied less than the observed ADR.

Model performance including physician as predictive factor
Given the moderate discriminative ability of the prediction model, a sensitivity analysis was performed in which the performing physician was added as a variable during model development within the derivation cohort. After stepwise backwards selection performing physician was selected in the model and ethnicity was left out which led to a modest increase in the apparent C-statistic from 0.630 to 0.647.

Discussion
This study shows that age, sex, BMI, ASA class, surveillance vs. screening, and a Hispanic or Latino ethnicity are independent patient-related predictors of colorectal adenoma detection in a screening and surveillance population. These patient risk factors only modestly account for the variation in individual physicians' and centers' ADR. The final multivariable model had a moderate discriminative ability within the derivation and validation cohort, which slightly increased after the addition of the performing physician as predictor. The lowest observed ADRs were lower than predicted, while the highest observed ADRs were higher than predicted. Approximately one in six physicians performed colonoscopy with an ADR below that predicted by patient risk factors.
Our finding that increasing age is an independent predictor of adenoma detection is in line with the recommendation to start screening colonoscopy from 50 years onwards, since adenoma detection increases with age. [35] The increased risk (OR: 1.73, 95%-CI: 1.60-1.88) for adenoma detection in male versus female patients is consistent with the current recommended difference in adenoma detection rate for male patients (!30%) and female patients (!20%) that equals an OR of 1.71. [7] Increasing BMI was included as a predictor in almost every previous prediction model for adenoma detection [18,20,24,25] as is the case in our prediction model.
A recent Asian study found a self-reported family history of colorectal adenomas to be associated with an increased risk of adenoma detection. [36] Moreover, a family history of advanced adenomas has been shown to increase the risk of advanced adenoma detection. [37] However in the present study a family history of adenomas was not independently predictive for adenoma detection. Interestingly a family history of CRC was not found to be an independent predictor either, while a family history of CRC has been previously included in prediction models for both adenoma [18,[23][24][25] and advanced adenoma detection. [16][17][18][19]22,23] Some possible predictors for adenoma detection that have been associated with (advanced) adenoma detection or CRC development are not collected within the GIQuIC database. Smoking was included in prediction models for adenoma detection. [18,[23][24][25] Furthermore alcohol consumption has been associated with the development of (advanced) adenomas, [38] and has been included as independent predictor in two prediction models for advanced adenomas. [21,22] Certain medication, such as aspirin has been associated with a decreased risk of CRC development [39] and could therefore have possibly added to the discrimination between patients with and without adenomas. [25] In our model ASA class, which is routinely assessed to make sedation decisions, might function as a proxy for medical condition in general. Nevertheless, assignment of ASA-class might heavily depend on the assessor, and certain medical conditions, if predictive, would be less prone to observer judgment and might be more specific. Lastly, dietary components, e.g. dietary fibers have been associated with a decreased risk of colorectal adenoma development, [40] and fried food, picked food, white meat or green vegetable consumption were predictive of advanced adenoma detection in a Chinese population. [15] However, dietary factor assessment in clinical practice might suffer from recall bias. These factors should be considered to be included as variables within ADR monitoring databases such as GIQuIC, if addition of any of these predictors would improve prediction of adenoma detection.
The moderate discriminative ability of our patient-factor based prediction model for adenoma detection is in line with a recent study in which adjustment of the ADR for age, sex, race/ethnicity, and family history of CRC did reduce the variability in ADRs but had only a small effect on the differences of ADRs between physicians. [41] In another study excluding different patient subgroups from ADR calculations changed ADRs substantially, but had only a small effect on the ADR ranking among physicians [42] suggesting that factors, such as physician-related, procedural or technological factors, are likely to influence the ADR. A physicianeffect might also be part of the explanation of the ADR increase during the first ten years of the German CRC screening program, because physicians who started to perform screening colonoscopy detected more adenomas compared to physicians who stopped performing colonoscopies. [43] Unfortunately, no physician-factors were known in our study, therefore their influence on the variation in ADR could not be assessed. Future identification of modifiable physician-factors next to for example withdrawal time could create opportunities for further quality improvement in colonoscopy.
In this study, to the best of our knowledge, we describe the first prediction model for adenoma detection in a screening and surveillance population and its application to compare the predicted proportion of patients with !1 adenoma to the individual physician's and center's observed ADRs. The study is strengthened by the prospective data collection. The large sample size decreased the risk of model overfitting, which was further decreased by the internal and geographical validation, however, a full external validation is still preferable before the model would be widely applied.
Our study also has some limitations. Firstly, although handled in the most appropriate way with multiple imputations, the substantial proportion of missing data on BMI might have influenced our results. Moreover, data on race and ethnicity was possibly missing not at random and could therefore not be handled with multiple imputation. Data on smoking status was unknown for all patients. Secondly, the race and ethnicity sub classification was based on the possibilities within the GIQuIC form; therefore we could for example not specify whether Asian patients originally came from Asian countries with a low or high colorectal adenoma prevalence. Thirdly, due to the design of data collection the pathologists could not be blinded for all predictors during histology assessment. Fourthly, the number of patients who underwent surveillance colonoscopy earlier or later than recommended is unknown. This could have influenced the ADR in the surveillance group, however compared to screening colonoscopy more patients with !1 adenoma were detected during surveillance colonoscopy in the present study. Lastly, since adenoma detection is not considered to be perfect with reported per adenoma miss rates up to 20% of all adenomas, [44] the reference standard for detection, i.e. colonoscopy, is not perfect. If the performing physician or procedural factors are truly influencing the adenoma detection per patient, the outcome assessment might have been biased. Unfortunately, it is not possible to determine to what extent this has influenced the performance of our model and the selection of predictors.

Conclusions
We developed and validated a prediction model for colorectal adenoma detection in a screening and surveillance population based on patient-related predictors, i.e. age, sex, BMI, ASA class, surveillance as indication, and Hispanic or Latino ethnicity, with a modest discriminative ability. Additional patient-related predictors that were not included in the database, e.g. smoking, alcohol use, medication and medical history, could possibly have added to the model performance. However, these data suggest that variation in ADR between physicians could likely be due to a combination of patient-related and other factors, e.g. physician, procedural or technical issues. These data also suggest that a low individual physician's ADR, can only partially be explained by patient mix, and efforts to improve ADR to meet current guidelines should be pursued.  Table. Reasons of high-risk assessment for adenoma detection and adenoma detection rate per high-risk assessment reason. (DOCX)