
Prediction of findings at screening colonoscopy using a machine learning algorithm based on complete blood counts (ColonFlag)

  • Robert J. Hilsden,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing

    rhilsden@ucalgary.ca

    Affiliation Departments of Medicine and Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada

  • Steven J. Heitman,

    Roles Resources, Writing – review & editing

    Affiliation Departments of Medicine and Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada

  • Barak Mizrahi,

    Roles Formal analysis, Methodology, Writing – review & editing

    Affiliation Medial Cancer Research, Kfar Malal, Israel

  • Steven A. Narod,

    Roles Methodology, Writing – original draft, Writing – review & editing

    Affiliations Familial Breast Cancer Research Unit, Women's College Research Institute, Women's College Hospital, Toronto, ON, Canada, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada

  • Ran Goshen

    Roles Conceptualization, Methodology, Resources, Supervision, Writing – original draft, Writing – review & editing

    Affiliation Medial Early Sign, Kfar Malal, Israel

Abstract

Adenomatous polyps are a common precursor lesion for colorectal cancer. ColonFlag is a machine-learning-based algorithm that uses basic patient information and complete blood counts (CBC) to identify individuals at elevated risk of colorectal cancer for intensified screening. The purpose of this study was to determine whether ColonFlag can also predict the presence of high risk adenomatous polyps at colonoscopy. This study was conducted at a large colon cancer screening center in Calgary, Alberta. The study population included asymptomatic individuals between the ages of 50 and 75 who underwent a screening colonoscopy between January 2013 and June 2015. All subjects had at least one CBC result within the year prior to colonoscopy. Based on age, sex, red blood cell parameters, inflammatory cells and platelets, the ColonFlag algorithm generated a score from 0 to 100. We compared the ability of the ColonFlag test result to discriminate between individuals who were found to have a high risk polyp and those with a normal colonoscopy. Among the 17,676 individuals who underwent a screening colonoscopy, 1,014 (5.7%) were found to have a high risk precancerous lesion and 60 (0.3%) were found to have colorectal cancer. At a specificity of 95%, the odds ratio for a positive ColonFlag was 2.0 for those with an advanced precancerous lesion compared with those with a normal colonoscopy. The odds ratio did not vary according to patient subgroup, colorectal cancer location or stage. ColonFlag is a passive test that can use routine blood test results to help identify individuals at elevated risk of harboring high risk precancerous polyps as well as frank colorectal cancer. These individuals may be targeted in an effort to achieve greater compliance with conventional screening tests.

Introduction

Population-based screening programs for colorectal cancer have been established in many countries.[1–3] The dual goals of screening are to reduce the incidence of colorectal cancer and its associated mortality. To achieve these goals, a screening test must detect both high risk precancerous lesions (advanced adenomatous and sessile serrated polyps) and early invasive cancers. The US Multi-Society Task Force on Colorectal Cancer ranks colonoscopy and annual fecal immunochemical testing as valid screening options.[1] Due to cost considerations, most population-based screening programs are based on the fecal immunochemical test. Despite the established benefits and cost-effectiveness of colorectal cancer screening, uptake remains suboptimal.[4, 5] Colorectal cancer screening requires the active participation of the individual, who must collect a stool sample and/or undergo a more invasive test such as colonoscopy. Screening could be enhanced by a passive test that uses electronic medical records to identify individuals at elevated risk of harboring an asymptomatic colorectal cancer or a high risk precancerous lesion.

We have previously described the development and validation of the ColonFlag score (previously known as MeScore), an algorithm that combines patient factors (age and gender) with complete blood count (CBC) information to predict the presence of colorectal cancer at the time of testing.[6–8] ColonFlag was developed using data from healthy Israelis (Maccabi Health Care Services, MHS) and colorectal cancer patients. The model was trained using the MHS database and the Israeli National Cancer Registry, which documents invasive colorectal cancer but not precancerous lesions. ColonFlag was validated in additional cohorts from Israel (MHS), the UK (The Health Improvement Network database) and the US (Kaiser Permanente). In the MHS validation study, the area under the receiver operating characteristic curve (AUC) for the detection of colorectal cancer was 0.82 ± 0.01. At a specificity of 90%, 56% of colorectal cancer cases were detected. Similar results were achieved in the UK and US study samples.[7, 9] The identification and removal of high risk polyps at colonoscopy is expected to enhance efforts to prevent invasive colon cancer. To date, the ability of ColonFlag to predict the presence of high risk precancerous lesions at colonoscopy has not been evaluated.
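The AUC quoted above has a simple probabilistic reading: it is the probability that a randomly chosen cancer case receives a higher score than a randomly chosen control, with ties counting one half. The minimal sketch below computes it directly from that pairwise definition; the function name and the example scores are illustrative assumptions, not data or code from the ColonFlag studies.

```python
from itertools import product


def auc(case_scores, control_scores):
    """Area under the ROC curve, computed as the share of (case, control)
    pairs in which the case scores higher; tied pairs count one half."""
    concordant = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical scores on a 0-100 scale; not data from any ColonFlag study.
cases = [88, 72, 95, 60]
controls = [40, 65, 30, 72, 55]
print(auc(cases, controls))  # 0.875
```

The pairwise form is quadratic in sample size; for cohorts of tens of thousands a rank-based formula gives the same value faster.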

Methods

Study design and setting

This study was conducted at Alberta Health Services’ Forzani & MacPhail Colon Cancer Screening Centre in Calgary, AB, Canada by linking the Centre’s electronic medical records with provincial laboratory data. This study was approved by the Health Research Ethics Board of Alberta (HREBA.CC-16-0162), which waived the requirement for informed consent. The Centre is a publicly funded endoscopy unit that performs only screening-related colonoscopies. Colonoscopies performed for other indications, including the investigation of signs or symptoms of gastrointestinal disease such as rectal bleeding or iron deficiency anemia, are done at hospital endoscopy units. The Centre follows published clinical practice guidelines to determine an individual’s eligibility for a first screening or post-polypectomy surveillance colonoscopy. All patients must be free of any medical condition that places them at higher risk for a colonoscopy-related adverse event (predominantly ASA Class I/II). All patients undergo consultation with a trained nurse, and those who do not meet the Centre’s eligibility criteria (for example, those with signs or symptoms of gastrointestinal disease or new anemia) are directed elsewhere. Colonoscopies at the Centre are performed by experienced gastroenterologists and colorectal surgeons who also perform endoscopies at hospital endoscopy units.

Study population

The study population included individuals between the ages of 50 and 75 who underwent a colonoscopy at the Centre between January 2013 and June 2015. To be included in the study, the patient must have undergone a successful colonoscopy (complete to the cecum unless incomplete due to an obstructing mass) with a bowel preparation rated by the endoscopist as adequate to detect polyps greater than 5 mm in size.

Three subgroups of patients were eligible for the study: (1) individuals at average risk for colorectal cancer, (2) individuals with a personal history of polyps and (3) individuals with a family history of polyps or colorectal cancer. Patients were excluded if they had a positive guaiac or immunochemical fecal occult blood test, a prior history of colorectal cancer, a known or suspected genetic predisposition to cancer, or no CBC result within the year prior to their colonoscopy.
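The inclusion and exclusion rules above can be expressed as a single predicate. This is a minimal sketch, assuming a hypothetical per-patient record whose field names are illustrative only; it also assumes that "within the year prior" means the 365 days strictly before the procedure date, which the text does not define precisely.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Optional

STUDY_START = date(2013, 1, 1)
STUDY_END = date(2015, 6, 30)
CBC_LOOKBACK = timedelta(days=365)  # assumption: "within the year prior" = 365 days


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-patient record; field names are illustrative only."""
    age: int
    colonoscopy_date: date
    reached_cecum: bool
    obstructing_mass: bool        # incomplete exam accepted only for this reason
    adequate_prep: bool           # adequate to detect polyps > 5 mm
    positive_fobt: bool           # positive guaiac or immunochemical test
    prior_crc: bool
    genetic_predisposition: bool  # known or suspected
    latest_cbc_date: Optional[date]


def is_eligible(c: Candidate) -> bool:
    in_window = STUDY_START <= c.colonoscopy_date <= STUDY_END
    complete = c.reached_cecum or c.obstructing_mass
    # Assumption: "prior" means strictly before the procedure date.
    recent_cbc = (
        c.latest_cbc_date is not None
        and c.colonoscopy_date - CBC_LOOKBACK <= c.latest_cbc_date < c.colonoscopy_date
    )
    return (
        50 <= c.age <= 75
        and in_window
        and complete
        and c.adequate_prep
        and not (c.positive_fobt or c.prior_crc or c.genetic_predisposition)
        and recent_cbc
    )


example = Candidate(age=58, colonoscopy_date=date(2014, 3, 10), reached_cecum=True,
                    obstructing_mass=False, adequate_prep=True, positive_fobt=False,
                    prior_crc=False, genetic_predisposition=False,
                    latest_cbc_date=date(2013, 9, 2))
print(is_eligible(example))                                 # True
print(is_eligible(replace(example, latest_cbc_date=None)))  # False
```

Keeping the rules in one function makes it easy to count how many candidates each criterion removes, as in the 10,009 exclusions for a missing CBC.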

Data sources

We obtained data on colonoscopies from the Centre’s endoscopy reporting program endoPRO (Pentax Medical). Data elements included age, sex, date of procedure, indication, depth of endoscope insertion, bowel preparation quality and unique lifetime identifier. Pathology data were obtained from the Centre’s Pathology Database, which includes a structured summary of the pathology report. The summary is completed by a trained nurse who reconciles each polyp reported at colonoscopy with the pathology report. This provides the best possible classification of each resected polyp when more than one polyp was placed in a single specimen container. The Centre also regularly links to the Alberta Cancer Registry to identify all colorectal cancers diagnosed in the Centre’s patients and to obtain staging information using the 6th Edition of the American Joint Committee on Cancer Staging Handbook.[10] Prior to August 2013, the Centre’s Pathology Database did not record the presence of dysplasia in sessile serrated polyps.

Complete blood count results were obtained from Alberta Health Services’ Analytics department through a deterministic record linkage using the patients’ unique lifetime identifier. This department receives all laboratory results performed in Alberta. All CBC results from January 1, 2010 to the date of colonoscopy were obtained. Results of CBCs collected during a hospital stay or emergency room visit were excluded. CBC components could include one or more of the following: hemoglobin, hematocrit, mean corpuscular volume, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, red blood cell count, red cell distribution width, white blood cell count, platelet count and count and/or percentage of neutrophils, lymphocytes, monocytes, eosinophils and basophils.
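The CBC extraction described above amounts to an exact-match (deterministic) join on the lifetime identifier, followed by a date window and a care-setting filter. The sketch below assumes hypothetical tuple layouts and setting labels ("inpatient", "emergency"); the provincial data's actual fields are not described in the text.

```python
from collections import defaultdict
from datetime import date

EXTRACT_START = date(2010, 1, 1)
EXCLUDED_SETTINGS = {"inpatient", "emergency"}  # hypothetical setting labels


def link_cbcs(colonoscopies, cbcs):
    """Deterministic linkage: exact match on the unique lifetime identifier.

    colonoscopies: dict mapping patient_id -> colonoscopy date
    cbcs: iterable of (patient_id, collection_date, setting, values) tuples
    Returns dict patient_id -> list of (collection_date, values), oldest first.
    """
    linked = defaultdict(list)
    for patient_id, collected, setting, values in cbcs:
        procedure_date = colonoscopies.get(patient_id)
        if procedure_date is None:
            continue  # no matching colonoscopy record
        if setting in EXCLUDED_SETTINGS:
            continue  # hospital-stay and emergency-room CBCs were excluded
        if EXTRACT_START <= collected <= procedure_date:
            linked[patient_id].append((collected, values))
    return {pid: sorted(rows, key=lambda r: r[0]) for pid, rows in linked.items()}


# Illustrative records only.
colonoscopies = {"A1": date(2014, 5, 1)}
cbcs = [
    ("A1", date(2013, 11, 20), "outpatient", {"hemoglobin": 141}),
    ("A1", date(2014, 2, 3), "emergency", {"hemoglobin": 128}),
    ("A1", date(2009, 6, 1), "outpatient", {"hemoglobin": 150}),
    ("B2", date(2014, 1, 9), "outpatient", {"hemoglobin": 135}),
]
print(link_cbcs(colonoscopies, cbcs))
# {'A1': [(datetime.date(2013, 11, 20), {'hemoglobin': 141})]}
```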

Colonoscopy outcomes

We classified each colonoscopy into one of five categories based on the most advanced lesion found: (Group 1) invasive colorectal cancer, (Group 2) high risk precancerous lesions (advanced adenomatous polyp or sessile serrated polyp with conventional cytological dysplasia), (Group 3) non-advanced adenomatous polyp or non-dysplastic sessile serrated polyp, (Group 4) non-neoplastic findings (i.e. distal hyperplastic polyps and “polyps” classified as normal tissue) and (Group 5) no findings. An advanced adenoma was defined as one greater than 10 mm in size or containing villous elements or high-grade dysplasia.[11]
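Because each colonoscopy is assigned to the group of its most advanced lesion, classification reduces to taking the lowest group number across all lesions, with Group 5 when there are none. This is a minimal sketch with hypothetical lesion dictionaries. It applies the size cut-off exactly as worded in the text ("greater than 10 mm"), and it assumes all hyperplastic polyps fall in Group 4, although the text names only distal ones.

```python
def lesion_group(lesion: dict) -> int:
    """Map one lesion (hypothetical dict layout) to outcome Groups 1-4."""
    kind = lesion["kind"]
    if kind == "cancer":
        return 1
    if kind == "adenoma":
        advanced = (
            lesion.get("size_mm", 0) > 10  # text's wording: "greater than 10 mm"
            or lesion.get("villous", False)
            or lesion.get("high_grade_dysplasia", False)
        )
        return 2 if advanced else 3
    if kind == "sessile_serrated":
        return 2 if lesion.get("cytological_dysplasia", False) else 3
    if kind in {"hyperplastic", "normal_tissue"}:
        return 4  # assumption: proximal hyperplastic polyps treated the same way
    raise ValueError(f"unrecognised lesion kind: {kind!r}")


def colonoscopy_group(lesions: list) -> int:
    """Classify a colonoscopy by its most advanced lesion (lowest group number)."""
    return min((lesion_group(lesion) for lesion in lesions), default=5)


findings = [
    {"kind": "hyperplastic"},
    {"kind": "adenoma", "size_mm": 6},
    {"kind": "sessile_serrated", "cytological_dysplasia": True},
]
print(colonoscopy_group(findings))  # 2
print(colonoscopy_group([]))        # 5
```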

ColonFlag scoring

For this study, we used the ColonFlag algorithm previously described for the detection of colorectal cancer. ColonFlag uses a computer algorithm that incorporates age, sex, hemoglobin, red blood cell parameters, white blood cell parameters and platelets to generate a raw score ranging from 0 to 100.[7] To calculate the ColonFlag score, the model requires at minimum the following six CBC components: hemoglobin, red blood cell count, hematocrit, mean corpuscular volume, mean corpuscular hemoglobin and mean corpuscular hemoglobin concentration. Additional CBC components improve the performance of ColonFlag but are not mandatory. If a CBC does not include one or more of the inflammatory cell parameters or the platelet count, the algorithm imputes a value based on the age- and gender-specific means of the population. Values were missing for 0.1%–1.8% of the inflammatory cell parameters and 0.3% of platelet counts. All available CBC results completed prior to the date of the colonoscopy were included.
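The two input rules stated above (six mandatory components; imputation of missing inflammatory-cell and platelet values from age- and sex-specific population means) can be sketched as a pre-processing step. This is not ColonFlag's implementation. The component keys, the 10-year age bands and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# The six components the text lists as mandatory (keys are hypothetical).
REQUIRED = ("hemoglobin", "rbc_count", "hematocrit", "mcv", "mch", "mchc")
# Optional components imputed when absent (keys are hypothetical).
IMPUTABLE = ("wbc_count", "neutrophils", "lymphocytes", "monocytes",
             "eosinophils", "basophils", "platelets")


def age_band(age: int) -> int:
    return age // 10 * 10  # assumption: 10-year bands; the real strata are not stated


def build_stratum_means(population):
    """population: iterable of (age, sex, cbc_dict).
    Returns {(age_band, sex): {component: mean of observed values}}."""
    observed = defaultdict(lambda: defaultdict(list))
    for age, sex, cbc in population:
        for key in IMPUTABLE:
            if cbc.get(key) is not None:
                observed[(age_band(age), sex)][key].append(cbc[key])
    return {stratum: {key: mean(vals) for key, vals in comps.items()}
            for stratum, comps in observed.items()}


def prepare_cbc(cbc, age, sex, stratum_means):
    """Reject CBCs lacking a mandatory component; impute missing optional ones."""
    missing = [key for key in REQUIRED if cbc.get(key) is None]
    if missing:
        raise ValueError(f"cannot score: missing required components {missing}")
    filled = dict(cbc)
    means = stratum_means.get((age_band(age), sex), {})
    for key in IMPUTABLE:
        if filled.get(key) is None and key in means:
            filled[key] = means[key]
    return filled


# Illustrative values only.
population = [
    (55, "F", {"platelets": 250, "wbc_count": 6.0}),
    (58, "F", {"platelets": 270, "wbc_count": 7.0}),
    (62, "M", {"platelets": 220, "wbc_count": 6.5}),
]
means = build_stratum_means(population)
cbc = {"hemoglobin": 132, "rbc_count": 4.4, "hematocrit": 0.40,
       "mcv": 91, "mch": 30, "mchc": 330, "wbc_count": 5.8}
print(prepare_cbc(cbc, age=57, sex="F", stratum_means=means)["platelets"])  # 260
```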

Statistical analysis

Statistical analysis was performed using R version 3.0.1 (R Development Core Team, Vienna, Austria) and SPSS version 23.0 (IBM, Armonk, New York). The raw ColonFlag scores were transformed into percentiles based on the distribution of scores among all patients included in the analysis. A ColonFlag score was considered positive if it fell in the top 10% or top 5% of this distribution, corresponding to a preset specificity of 90% or 95%, respectively. Bootstrap resampling was used to estimate the mean sensitivity and odds ratio, with associated 95% confidence intervals, for the detection of screening-relevant lesions. For each of 500 bootstrap samples, we calculated the sensitivity and odds ratio; the mean of each test characteristic and its 95% confidence interval were then determined from the bootstrap distribution.
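The positivity rule and the bootstrap described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' R or SPSS code. It assumes that flags are fixed once from the full-sample score distribution, that subjects are then resampled with replacement, and that ties at the cut-off are broken by input order. The 95% interval is taken from the 2.5th and 97.5th percentiles of the 500 resampled values. The data are simulated.

```python
import random
import statistics


def flag_top_fraction(scores, top_fraction):
    """Flag the highest-scoring `top_fraction` of subjects as positive.

    Assumption: ties at the cut-off are broken by input order, so exactly
    round(top_fraction * n) subjects are flagged.
    """
    n = len(scores)
    k = round(top_fraction * n)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flags = [False] * n
    for i in order[:k]:
        flags[i] = True
    return flags


def sensitivity(rows):
    """rows: (flagged, is_case) pairs; returns the share of cases flagged."""
    case_flags = [flagged for flagged, is_case in rows if is_case]
    if not case_flags:
        return None  # undefined if a resample happens to contain no cases
    return sum(case_flags) / len(case_flags)


def bootstrap(rows, statistic, n_boot=500, seed=0):
    """Mean and 2.5th/97.5th percentiles of `statistic` over resamples."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        resample = rng.choices(rows, k=len(rows))  # subjects drawn with replacement
        value = statistic(resample)
        if value is not None:
            values.append(value)
    cuts = statistics.quantiles(values, n=40)  # cut points in 2.5% steps
    return statistics.mean(values), cuts[0], cuts[-1]


# Simulated data: 50 cases whose scores run higher than 950 controls'.
gen = random.Random(1)
is_case = [True] * 50 + [False] * 950
scores = [gen.gauss(60 if case else 50, 10) for case in is_case]
flags = flag_top_fraction(scores, 0.05)  # positive = top 5% of all scores
rows = list(zip(flags, is_case))
mean_sens, lo, hi = bootstrap(rows, sensitivity)
print(f"sensitivity {mean_sens:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

The same `bootstrap` helper accepts any statistic computed from the resampled rows, for example an odds ratio that returns `None` when a cell of its 2×2 table is empty.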

The primary goal of the study was to estimate the performance of the ColonFlag test for identifying patients with advanced precancerous colon lesions. We estimated the odds ratio and associated 95% confidence interval for a positive ColonFlag in patients with colorectal cancer (Group 1), in those with an advanced precancerous lesion (Group 2) and in those with non-advanced adenomas (Group 3). All comparisons used participants with no findings at colonoscopy (Group 5) as the reference group. We also estimated these odds ratios for subgroups according to anatomic site and cancer stage. Next, we estimated the sensitivity of ColonFlag for the detection of either colorectal cancer or an advanced precancerous lesion (Groups 1 and 2 combined), compared with participants with non-advanced polyps, non-neoplastic findings or no findings (Groups 3, 4 and 5 combined).
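Both comparisons above reduce to a 2×2 table of ColonFlag positivity by case/control status, with different outcome groups defining "case" and "control". The sketch below uses made-up counts; the function names are illustrative, and no continuity correction is applied for empty cells.

```python
def two_by_two(subjects, case_groups, control_groups):
    """Cross-tabulate ColonFlag positivity for cases vs. controls.

    subjects: iterable of (outcome_group, flagged_positive) pairs.
    Subjects whose group is in neither set are left out of the table.
    Returns (a, b, c, d) = (case+, case-, control+, control-).
    """
    a = b = c = d = 0
    for group, positive in subjects:
        if group in case_groups:
            if positive:
                a += 1
            else:
                b += 1
        elif group in control_groups:
            if positive:
                c += 1
            else:
                d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    # Raises ZeroDivisionError when b or c is zero; no continuity correction here.
    return (a * d) / (b * c)


def sensitivity(a, b):
    return a / (a + b)


# Hypothetical counts, not the study's data.
subjects = ([(2, True)] * 20 + [(2, False)] * 180
            + [(3, True)] * 30 + [(3, False)] * 370
            + [(5, True)] * 50 + [(5, False)] * 950)

# Primary comparison: advanced precancerous lesion (Group 2) vs. no findings (Group 5).
print(round(odds_ratio(*two_by_two(subjects, {2}, {5})), 2))  # 2.11
# Sensitivity for Groups 1+2 against Groups 3, 4 and 5 combined.
a, b, c, d = two_by_two(subjects, {1, 2}, {3, 4, 5})
print(sensitivity(a, b))  # 0.1
```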

Results

Patient characteristics

There were 27,685 potentially eligible individuals, of whom 10,009 were excluded because no CBC result was available for the preceding 12-month period. The characteristics of the 17,676 individuals included in the final study sample are shown in Table 1. The majority of individuals were at average risk for colorectal cancer; 9.1% of all potentially eligible individuals were considered to be at higher than average risk because of a previous history of polyps and 21.5% because of a family history of colorectal cancer (Table 1). Colorectal cancer was detected in 60 individuals and a high risk precancerous lesion in 1,014. Among the remaining individuals, 26.3% had one or more non-advanced adenomas or non-dysplastic sessile serrated polyps.
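The counts reported above are internally consistent with the abstract. A quick arithmetic check, using only figures stated in the text:

```python
eligible = 27_685
excluded_no_cbc = 10_009
included = eligible - excluded_no_cbc
print(included)                             # 17676
print(f"{excluded_no_cbc / eligible:.1%}")  # 36.2%
print(f"{1_014 / included:.1%}")            # 5.7%
print(f"{60 / included:.1%}")               # 0.3%
```

Roughly a third of otherwise eligible patients lacked a CBC in the preceding year, which bounds the share of a screening population the test could reach in this setting.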

Table 1. Study sample: Baseline characteristics and colonoscopy outcomes.

https://doi.org/10.1371/journal.pone.0207848.t001

ColonFlag test performance

Compared with those with no findings at colonoscopy, the odds ratio for a positive ColonFlag test at a specificity of 95% was 5.1 (95% CI 2.3–8.9) for colorectal cancer, 2.0 (95% CI 1.6–2.6) for an advanced precancerous lesion and 1.7 (95% CI 1.5–2.0) for a non-advanced adenoma/serrated polyp. The results at specificities of 90% and 95% are shown in Table 2.
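An odds ratio is easily misread as a ratio of positive rates. Rearranging its definition shows what an odds ratio of 2.0 implies for the proportion testing positive, where $p_0$ is the share of the reference group (Group 5) that tests positive and $p_1$ the share of the comparison group. The value $p_0 = 0.05$ used below is an illustrative assumption, not a reported result:

$$
\mathrm{OR} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)}
\quad\Longrightarrow\quad
p_1 = \frac{\mathrm{OR}\cdot p_0}{1 - p_0 + \mathrm{OR}\cdot p_0}
$$

$$
p_1 = \frac{2.0 \times 0.05}{1 - 0.05 + 2.0 \times 0.05} = \frac{0.10}{1.05} \approx 0.095
$$

Because positivity is rare at a 95% specificity cut-off, the odds ratio here is close to the ratio of positive rates (about 9.5% versus 5% under this assumption).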

Table 3 shows the sensitivity of ColonFlag for colorectal cancer or an advanced precancerous lesion in the full patient cohort. In this analysis, those with a non-advanced polyp, a non-neoplastic polyp or no findings at colonoscopy are included in the disease-free group. For colorectal cancer, there were no significant differences in sensitivity by anatomic site or cancer stage, although the study had limited power to detect a meaningful difference (Table 4).

Table 3. Sensitivity and odds ratio of a positive ColonFlag for colorectal cancer or an advanced precancerous lesion in all study patients, by patient subgroup.

https://doi.org/10.1371/journal.pone.0207848.t003

Table 4. Sensitivity and odds ratio of ColonFlag for different categories of colorectal cancer in all study patients.

https://doi.org/10.1371/journal.pone.0207848.t004

Discussion

We have previously demonstrated that ColonFlag can enhance the detection of invasive colorectal cancers.[6, 7, 9] In this study, we have shown that the ColonFlag score can also detect high risk precancerous lesions, such as advanced adenomatous polyps. In general, the performance of ColonFlag was comparable across patient subgroups, anatomic locations and cancer stages. ColonFlag appeared to perform less well in participants with a personal history of polyps. This finding is perhaps surprising given that ColonFlag likely detects subtle iron deficiency and alterations in the immune system resulting from the lesion, which would not be expected to differ between those with and without a family history of cancer. Further work to validate ColonFlag’s performance in this patient group is warranted.

We focused on the sensitivity of ColonFlag for colorectal cancer and high risk precancerous polyps, rather than the detection of adenomas in general, consistent with the goals of screening programs. In this context, flagging a patient whose only finding is a non-advanced adenoma would be considered a false positive. We conducted two sets of analyses: one in which only those with no findings were included as controls, and a second in which those with non-neoplastic findings, non-advanced polyps or no findings at colonoscopy were included as controls. There was only a modest difference in the odds ratio between the two, indicating that in clinical practice the performance of ColonFlag is not greatly influenced by the presence of non-advanced adenomas.

The sensitivity of ColonFlag for colorectal cancer is low compared with well-established screening tests such as colonoscopy or FIT.[1, 12] ColonFlag should not be viewed as a substitute for these tests; rather, ColonFlag could be a useful tool for identifying unscreened individuals at higher risk of harboring a colorectal cancer or high risk precancerous lesion. People who are told that they are at greater risk for colorectal cancer are more likely to undergo screening.[13] Ideally, ColonFlag could be incorporated into an electronic medical record and run whenever a CBC is conducted. If a patient’s ColonFlag score exceeded a preset level, that person would be flagged and prioritized for formal screening.[14] In this way, reductions in colorectal cancer incidence and mortality could be achieved by increasing screening uptake among previously unscreened populations.
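The workflow proposed above (score each new CBC, flag unscreened patients above a preset level) can be sketched as an event handler. Everything here is hypothetical: the class, the `already_screened` input and the placeholder scoring function, which simply returns a precomputed value in place of a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ScreeningFlagger:
    """Hypothetical EMR hook; not part of any real ColonFlag integration."""
    threshold: float                               # the "preset level" on the score scale
    score_fn: Callable[[Dict[str, float]], float]  # stand-in for the scoring model
    flagged: List[Tuple[str, float]] = field(default_factory=list)

    def on_new_cbc(self, patient_id: str, inputs: Dict[str, float],
                   already_screened: bool) -> bool:
        """Score a newly resulted CBC and queue unscreened high scorers."""
        if already_screened:
            return False  # the text targets previously unscreened people
        score = self.score_fn(inputs)
        if score >= self.threshold:
            self.flagged.append((patient_id, score))
            return True
        return False


def placeholder_score(inputs):
    # Placeholder only: returns a precomputed value instead of running a model.
    return inputs["score"]


flagger = ScreeningFlagger(threshold=95.0, score_fn=placeholder_score)
flagger.on_new_cbc("P1", {"score": 97.2}, already_screened=False)
flagger.on_new_cbc("P2", {"score": 40.0}, already_screened=False)
flagger.on_new_cbc("P3", {"score": 99.0}, already_screened=True)
print(flagger.flagged)  # [('P1', 97.2)]
```

Injecting the scorer as a function keeps the hook independent of any particular model and easy to test.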

ColonFlag does not require any action by either the patient or the health care provider. It can act as a passive means to identify individuals at high risk for colorectal cancer or high risk precancerous lesions. However, ColonFlag is limited by the need for at least one recent CBC result.

This study has several limitations. First, detailed information on a patient’s personal or family history of colorectal cancer or polyps was not available. Therefore, these two groups will include a mix of patients with a history of colorectal cancer or polyps in one or more first degree or more distant relatives. Second, we did not have complete characterization of all sessile serrated polyps. The Centre’s pathology database did not routinely record the presence of conventional dysplasia in these polyps prior to 2013. Therefore, some high-risk polyps in the earlier years will be misclassified as non-high risk. However, the prevalence of conventional dysplasia in sessile serrated polyps is low (<5%).[15] Moreover, ColonFlag was trained and developed using a registry of invasive cancers and in the development stage, high risk precancerous lesions were considered negative findings. In the future, re-training of ColonFlag may increase its overall performance. Finally, within this study, eligibility criteria included the absence of gastrointestinal symptoms or unexplained anemia. Therefore, our study patients, classified as average risk for colorectal cancer, are likely to be at slightly lower risk of colorectal cancer than the (unscreened) general population.

The performance of future versions of ColonFlag could possibly be improved by using a broader array of standard laboratory tests[8] or by including additional risk factors in the algorithm. For example, smoking, body mass index and family history of colorectal cancer are commonly recorded in a patient’s electronic medical record. Several risk prediction models for colorectal neoplasia have been developed, although none has seen widespread validation or adoption.[16] However, combining ColonFlag with additional clinical risk factors routinely available in an EMR could improve its ability to discriminate high risk from low risk patients.
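One common way to combine a score with other EMR risk factors is a logistic regression in which the score enters as one covariate among several. The form below is a hypothetical illustration only. The covariates and coefficients $\beta_0,\dots,\beta_4$ are assumptions that would have to be estimated and externally validated; no such model was fitted in this study.

$$
\log\frac{\Pr(\text{advanced neoplasia})}{1-\Pr(\text{advanced neoplasia})}
= \beta_0 + \beta_1\,\text{ColonFlag} + \beta_2\,\text{smoker} + \beta_3\,\text{BMI} + \beta_4\,\text{family history}
$$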

In summary, the ColonFlag model was able to identify individuals at elevated risk of having colorectal cancer or a high risk precancerous polyp using only routinely collected complete blood counts and the patient’s age and sex. These findings support embedding ColonFlag into laboratory information systems or electronic health records to identify individuals who warrant targeted efforts to enhance screening compliance.

References

  1. Rex DK, Boland CR, Dominitz JA, Giardiello FM, Johnson DA, Kaltenbach T, et al. Colorectal Cancer Screening: Recommendations for Physicians and Patients from the U.S. Multi-Society Task Force on Colorectal Cancer. American Journal of Gastroenterology. 2017;112(7):1016–30. https://dx.doi.org/10.1038/ajg.2017.174. pmid:28555630.
  2. Canadian Task Force on Preventive Health Care, Bacchus CM, Dunfield L, Gorber SC, Holmes NM, Birtwhistle R, et al. Recommendations on screening for colorectal cancer in primary care. CMAJ Canadian Medical Association Journal. 2016;188(5):340–8. https://dx.doi.org/10.1503/cmaj.151125. pmid:26903355.
  3. Altobelli E, Lattanzi A, Paduano R, Varassi G, di Orio F. Colorectal cancer prevention in Europe: burden of disease and status of screening programs. Preventive Medicine. 2014;62:132–41. https://dx.doi.org/10.1016/j.ypmed.2014.02.010. pmid:24530610.
  4. Klabunde C, Blom J, Bulliard JL, Garcia M, Hagoel L, Mai V, et al. Participation rates for organized colorectal cancer screening programmes: an international comparison. Journal of Medical Screening. 2015;22(3):119–26. https://dx.doi.org/10.1177/0969141315584694. pmid:25967088.
  5. White A, Thompson TD, White MC, Sabatino SA, de Moor J, Doria-Rose PV, et al. Cancer Screening Test Use—United States, 2015. MMWR—Morbidity & Mortality Weekly Report. 2017;66(8):201–6. https://dx.doi.org/10.15585/mmwr.mm6608a1. pmid:28253225.
  6. Kinar Y, Akiva P, Choman E, Kariv R, Shalev V, Levin B, et al. Performance analysis of a machine learning flagging system used to identify a group of individuals at a high risk for colorectal cancer. PLoS ONE. 2017;12(2):e0171759. https://dx.doi.org/10.1371/journal.pone.0171759. pmid:28182647.
  7. Kinar Y, Kalkstein N, Akiva P, Levin B, Half EE, Goldshtein I, et al. Development and validation of a predictive model for detection of colorectal cancer in primary care by analysis of complete blood counts: a binational retrospective study. Journal of the American Medical Informatics Association. 2016;23(5):879–90. https://dx.doi.org/10.1093/jamia/ocv195. pmid:26911814.
  8. Goshen R, Mizrahi B, Akiva P, Kinar Y, Choman E, Shalev V, et al. Predicting the presence of colon cancer in members of a health maintenance organisation by evaluating analytes from standard laboratory records. British Journal of Cancer. 2017;116(7):944–50. https://dx.doi.org/10.1038/bjc.2017.53. pmid:28253525.
  9. Hornbrook M, Goshen R, Choman E, O'Keeffe-Rosetti M, Kinar Y, Liles EG, et al. Early colorectal cancer detected by machine-learning model using gender, age and complete blood count data. Digestive Diseases & Sciences. 2017.
  10. American Joint Committee on Cancer. AJCC Cancer Staging Handbook. 7th ed. New York: Springer; 2010.
  11. Driman DK, Marcus VA, Hilsden RJ, Owen DA. Pathological reporting of colorectal polyps: Pan-Canadian Consensus Guidelines. Can J Pathol. 2012;4:81–90.
  12. Robertson DJ, Lee JK, Boland CR, Dominitz JA, Giardiello FM, Johnson DA, et al. Recommendations on fecal immunochemical testing to screen for colorectal neoplasia: a consensus statement by the US Multi-Society Task Force on colorectal cancer. Gastrointestinal Endoscopy. 2017;85(1):2–21.e3. https://dx.doi.org/10.1016/j.gie.2016.09.025. pmid:27769516.
  13. Atkinson TM, Salz T, Touza KK, Li Y, Hay JL. Does colorectal cancer risk perception predict screening behavior? A systematic review and meta-analysis. Journal of Behavioral Medicine. 2015;38(6):837–50. https://dx.doi.org/10.1007/s10865-015-9668-8. pmid:26280755.
  14. Ritvo PG, Myers RE, Paszat LF, Tinmouth JM, McColeman J, Mitchell B, et al. Personal navigation increases colorectal cancer screening uptake. Cancer Epidemiology, Biomarkers & Prevention. 2015;24(3):506–11. https://dx.doi.org/10.1158/1055-9965.EPI-14-0744. pmid:25378365.
  15. Rex DK, Ahnen DJ, Baron JA, Batts KP, Burke CA, Burt RW, et al. Serrated Lesions of the Colorectum: Review and Recommendations From an Expert Panel. Am J Gastroenterol. 2012;107(9):1315–29. pmid:22710576.
  16. Ma GK, Ladabaum U. Personalizing colorectal cancer screening: a systematic review of models to predict risk of colorectal neoplasia. Clinical Gastroenterology & Hepatology. 2014;12(10):1624–34.e1. https://dx.doi.org/10.1016/j.cgh.2014.01.042. pmid:24534546.