A machine learning approach to support triaging of primary versus secondary headache patients using complete blood count

  • Fei Yang,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    terry.yangfei@gmail.com

    Affiliation Roche Information Solutions, F. Hoffmann-La Roche AG, Basel, Switzerland

  • Tong Meng,

    Roles Formal analysis, Writing – review & editing

    Affiliation Roche Molecular Systems, Santa Clara, California, United States of America

  • Ben Torben-Nielsen,

    Roles Formal analysis, Writing – original draft, Writing – review & editing

    Affiliation Roche Information Solutions, F. Hoffmann-La Roche AG, Basel, Switzerland

  • Carsten Magnus,

    Roles Formal analysis, Writing – original draft, Writing – review & editing

    Affiliation Roche Information Solutions, F. Hoffmann-La Roche AG, Basel, Switzerland

  • Chuang Liu,

    Roles Conceptualization, Writing – original draft, Writing – review & editing

    Affiliation Product Development Data and Statistical Sciences, Real World Data Enabling Platform, F. Hoffmann-La Roche AG, Basel, Switzerland

  • Emilie Dejean

    Roles Conceptualization, Formal analysis, Writing – original draft, Writing – review & editing

    Affiliation Roche Diagnostics International Ltd, Rotkreuz, Switzerland

Abstract

Headaches account for up to 4.5% of emergency department visits, where they present a significant diagnostic challenge. While primary headaches are benign, secondary headaches can be life-threatening. It is essential to rapidly differentiate between primary and secondary headaches as the latter require immediate diagnostic work-up. Current assessment relies on subjective measures; time constraints can result in overuse of diagnostic neuroimaging, prolonging diagnosis, and adding to economic burden. There is therefore an unmet need for a time- and cost-efficient, quantitative triaging tool to guide further diagnostic testing. Routine blood tests may provide important diagnostic and prognostic biomarkers indicating underlying headache causes. In this retrospective study (approved by the UK Medicines and Healthcare products Regulatory Agency Independent Scientific Advisory Committee for Clinical Practice Research Datalink (CPRD) research [20_000173]), UK CPRD real-world data from patients (n = 121,241) presenting with headache from 1993–2021 were used to generate a predictive model based on a machine learning (ML) approach for primary versus secondary headaches. An ML-based predictive model was constructed using two different methods (logistic regression and random forest) and the following predictors were evaluated: ten standard measurements of the complete blood count (CBC) test, 19 ratios of the ten CBC test parameters, and patient demographic and clinical characteristics. The model’s predictive performance was assessed using a set of cross-validated model performance metrics. The final predictive model showed modest predictive accuracy using the random forest method (balanced accuracy: 0.7405). The sensitivity, specificity, false negative rate (incorrect prediction of secondary headache as primary headache), and false positive rate (incorrect prediction of primary headache as secondary headache) were 58%, 90%, 10%, and 42%, respectively.
The ML-based prediction model developed could provide a useful time- and cost-effective quantitative clinical tool to facilitate the triaging of patients presenting to the clinic with headache.

Introduction

Headache is a common nervous system disorder, affecting approximately 50% of the general population [1, 2]. It can be classified as a primary headache disorder, typically migraine, tension, or cluster headaches [1], or a secondary headache disorder, which includes: giant cell arteritis, meningitis/encephalitis, subarachnoid hemorrhage (SAH), cerebral venous thrombosis, idiopathic intracranial hypertension, brain tumor, and ischemic stroke [1, 3, 4]. Although secondary headaches are rare compared with primary headaches, they are extremely important to recognize as they can require immediate intervention [1, 3], with further and rapid diagnostic evaluation including neuroimaging and lumbar puncture often needed [3, 5].

In primary care, headache is one of the most common presenting symptoms, with most cases due to primary causes, although many headache sufferers do not receive a specific diagnosis [6]. Headaches also account for up to 4.5% of all visits to the emergency department (ED) [2, 3, 5, 7, 8], with one observational study revealing a similar prevalence of primary and secondary headache etiologies presenting to the ED (48 vs. 52%, respectively) [2]. Thus, accurate diagnosis of the underlying cause of headache and treatment initiation can be critical.

Headache has a historic reputation as being one of the most poorly classified neurologic disorders, and the International Headache Society classification system was developed to provide a hierarchy of criteria for diagnosis [9]. In the United Kingdom (UK) primary care setting, recognition of the importance of effective headache diagnosis and management has prompted the additional training of general practitioners (GPs) with special interest in headache, and the establishment of headache clinics in general practice [10, 11]. Despite this, a varied degree of confidence in headache diagnosis and treatment still exists amongst emergency and primary clinicians, due in part to a lack of specific experience in neurology and poor use of the International Headache Society classification system [2, 6, 10]. A study conducted in Canada reported that misdiagnosis or diagnostic uncertainty occurred in more than one-third (35.7%) of cases of neurologic complaints in the ED, when comparing the initial diagnosis to the diagnosis made by the consulting neurologist [12]. In the UK, where there is a concerning shortfall of neurologists [10], the fear of clinical neurology by doctors outside the discipline has been described as amounting to ‘neurophobia’ [10]. This can, together with patient anxiety and legal concerns, result in multiple appointments and unnecessary investigations, which may subsequently increase diagnosis times, healthcare costs, and economic burden [1, 4, 13].

The effective triaging of patients presenting with primary versus secondary headache is an important and currently unmet need [1, 5, 14]. Consideration of patients’ medical history and physical examination are currently the most important aspects of headache assessment, and clinicians must be vigilant for “red flag” symptoms that are characteristic of serious secondary causes [1, 3]. This qualitative approach could be complemented by a quantitative methodologic tool, which could potentially streamline diagnosis, facilitate clinical decision-making, and reduce unnecessary investigations [2, 5, 14].

A complete blood count (CBC) is one of the most commonly requested blood tests [15] and its results may provide important diagnostic and prognostic biomarkers indicating underlying causes of headache. Several CBC parameters have been investigated for their ability to distinguish between primary and secondary headaches. A retrospective study of patients presenting to the ED with headache reported that leukocytosis or an increase in the percentage of polymorphonuclear leukocytes (PMNs) had a sensitivity of 89.8%, a specificity of 46.7%, a positive predictive value of 82.1%, and a negative predictive value of 62.8% for diagnosing SAH within 6 or 12 hours of ED admission [16]. The ratio of neutrophils to lymphocytes (NLR) has also received increasing attention as a diagnostic and prognostic marker of inflammation and can be easily calculated from standard measurements of CBC tests. A retrospective study found NLR to be higher in people presenting to the ED with a migraine attack versus people without a headache [17]. A further retrospective, single-center analysis in ED patients presenting with headache accompanied by nausea and vomiting found that NLR could distinguish between those with migraine and those with SAH [18]. Although this retrospective study involved a limited number of clinical cases, median NLR values were found to be significantly higher in patients with SAH compared with those with migraine and other headaches (both p < 0.001) [18].

Measurements derived from CBC tests have also been evaluated in the prediction of secondary headache severity. In one retrospective, single-center review, patients with SAH whose leukocyte count was >15 × 10⁹/L during admission were more than three times more likely to develop vasospasm [19]. In another study of patients with SAH, the authors reported that patients admitted with spontaneous SAH with a leukocyte count of >20,000 had a mortality rate of 50% [20]. Furthermore, mean platelet volume and platelet distribution width have been shown to be increased in patients with cerebral venous thrombosis and brain parenchymal lesions compared with patients with cerebral venous thrombosis without lesions [21].

Aligning with the concept of using routine blood test results as a triaging tool to assist physicians when deciding whether to perform neuroimaging on patients presenting with severe headache, this study was designed to evaluate a machine learning (ML)-based approach to classify primary versus secondary headache using real-world data (RWD) derived from a large patient group.

Materials and methods

Study design

This was a retrospective, observational study of RWD from patients presenting with a complaint of headache to the clinic. Because access to RWD from the ED was limited, and after evaluating suitability for the study objectives, primary care data from the UK Clinical Practice Research Datalink (CPRD) were used.

Database

The CPRD includes longitudinal electronic health records generated by GP practices in the UK. The current study included patients from two CPRD databases [22–24]: CPRD GOLD (using a data-cut of July 2021), and CPRD Aurum (using a data-cut of June 2021).

Study population

Eligible patients with a record of presenting to a GP practice with a complaint of headache were identified in the CPRD GOLD and CPRD Aurum databases separately, according to the list of codes provided in S1–S10 Tables. The date of the patient’s first clinic visit with headache symptoms was defined as the index date. Patients were included for analysis if they: 1) received a diagnosis of either primary or secondary headache within a 30-day window after the index date, 2) had laboratory results available for all ten specific parameters from the CBC test within a 30-day window after the index date and 3) had data classed as of “research acceptable” quality by the CPRD. Qualifying patients from both CPRD GOLD and CPRD Aurum were then merged to form an analytical dataset. To avoid possible duplicate individuals in the analytical dataset, the following steps were employed: 1) if a patient had multiple sets of records that fulfilled the inclusion criteria, only data from their latest index date were included; and 2) for clinics that changed enrolment from CPRD GOLD to CPRD Aurum, qualifying patients were only identified during the period that the clinic was enrolled for CPRD Aurum and not for CPRD GOLD.
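The date-based inclusion criteria above can be read as a simple window check. The following Python sketch is illustrative only: the field names (index_date, diagnosis_date, cbc_date, research_acceptable) are hypothetical and are not CPRD column names, and treating the 30-day window as inclusive of both day 0 and day 30 is an assumption.

```python
from datetime import date, timedelta

# Assumed window length, following the 30-day criterion in the text.
WINDOW = timedelta(days=30)


def within_window(index_date, event_date, window=WINDOW):
    """True if the event falls on the index date or up to `window` after it."""
    return index_date <= event_date <= index_date + window


def is_eligible(record):
    """Apply the three inclusion criteria to one (hypothetical) patient record."""
    return (
        record["research_acceptable"]
        and within_window(record["index_date"], record["diagnosis_date"])
        and within_window(record["index_date"], record["cbc_date"])
    )


# Illustrative record, not taken from the study data.
example = {
    "research_acceptable": True,
    "index_date": date(2019, 3, 1),
    "diagnosis_date": date(2019, 3, 20),
    "cbc_date": date(2019, 3, 31),
}
eligible = is_eligible(example)
```

A record whose CBC result falls one day later (1 April 2019 in this example) would fail the check under these assumptions.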

The study was conducted in accordance with the principles founded in the Declaration of Helsinki of 1975 (revised 2013). The study was approved by the UK Medicines and Healthcare products Regulatory Agency Independent Scientific Advisory Committee for CPRD research (20_000173). All patient data were anonymized; thus, the requirement for patient consent was waived. Individual patients can opt out of sharing their records with the CPRD. An overview of the study is available online [25].

Outcome definitions and study variables

Primary headaches were defined as migraine, tension-type headache, and cluster headache, while secondary headaches were defined as those caused by ischemic stroke, cerebral venous thrombosis, hemorrhage (including SAH), arteritis, and angiitis (S1–S6 Tables). In the event that a patient had diagnoses contributing to both primary and secondary headaches during the same episode, the patient was categorized as having secondary headache.

Measured values of the following ten CBC parameters were included: red blood cell (RBC) count, platelet count, mean corpuscular volume (MCV), white blood cell (WBC) count, neutrophil count, lymphocyte count, monocyte count, eosinophil count, basophil count, and hemoglobin. The following 19 ratios of the individual parameters were also derived as variables: platelet/RBC, WBC/RBC, RBC/neutrophil, RBC/monocyte, monocyte/eosinophil, platelet/MCV, platelet/lymphocyte, platelet/eosinophil, MCV/WBC, MCV/neutrophil, neutrophil/lymphocyte, neutrophil/eosinophil, lymphocyte/monocyte, lymphocyte/eosinophil, hemoglobin/lymphocyte, hemoglobin/eosinophil, hemoglobin/RBC, MCV/monocyte, and MCV/hemoglobin.
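As a minimal sketch of how the 19 ratio variables could be derived from the ten CBC parameters, the Python snippet below builds each ratio from a dictionary of measured values. The key names, the example values, and the handling of a zero denominator are assumptions made for illustration; they are not taken from the study code.

```python
# Ratio definitions as (numerator, denominator), in the order listed in the text.
RATIOS = [
    ("platelet", "rbc"), ("wbc", "rbc"), ("rbc", "neutrophil"),
    ("rbc", "monocyte"), ("monocyte", "eosinophil"), ("platelet", "mcv"),
    ("platelet", "lymphocyte"), ("platelet", "eosinophil"), ("mcv", "wbc"),
    ("mcv", "neutrophil"), ("neutrophil", "lymphocyte"),
    ("neutrophil", "eosinophil"), ("lymphocyte", "monocyte"),
    ("lymphocyte", "eosinophil"), ("hemoglobin", "lymphocyte"),
    ("hemoglobin", "eosinophil"), ("hemoglobin", "rbc"),
    ("mcv", "monocyte"), ("mcv", "hemoglobin"),
]


def add_ratio_features(cbc):
    """Return a copy of `cbc` extended with the 19 ratio features.

    A ratio with a zero denominator is set to None. This is an assumption;
    the text does not describe how such cases were handled.
    """
    features = dict(cbc)
    for num, den in RATIOS:
        key = f"{num}/{den}"
        features[key] = cbc[num] / cbc[den] if cbc[den] != 0 else None
    return features


# Illustrative values only, not taken from the study data.
example = {
    "rbc": 4.5, "platelet": 250.0, "mcv": 90.0, "wbc": 7.0,
    "neutrophil": 4.0, "lymphocyte": 2.0, "monocyte": 0.5,
    "eosinophil": 0.2, "basophil": 0.05, "hemoglobin": 14.0,
}
features = add_ratio_features(example)
```

With the ten measured values plus the 19 ratios, each patient has 29 CBC-related features before age and sex are added.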

Descriptive analysis

First, patients with primary headaches and secondary headaches included in the analytical cohort were described, and the differences between the two groups were assessed using the t-test for continuous variables and the chi-squared test for categorical variables. The Hotelling T²-test was then used to examine whether data from the primary and secondary headache groups could be differentiated based on several collective variables, and was performed using the T2.test function from the R-project for statistical computing [26].

Data from patients with outlier results (the top and bottom 1% of the values) for any of the ten CBC parameters or body mass index were excluded from the analysis.
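One possible way to implement this kind of trimming is sketched below in Python. The text states only that the top and bottom 1% of values were excluded; the percentile definition (linear interpolation), the inclusive bounds, and computing the bounds per variable over the whole cohort are assumptions made here for illustration.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("no values")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def drop_outliers(rows, columns, lower_q=1.0, upper_q=99.0):
    """Keep rows whose value lies within the percentile bounds for every column."""
    bounds = {}
    for col in columns:
        values = sorted(row[col] for row in rows)
        bounds[col] = (percentile(values, lower_q), percentile(values, upper_q))
    return [
        row for row in rows
        if all(bounds[c][0] <= row[c] <= bounds[c][1] for c in columns)
    ]


# Synthetic example: 200 patients with WBC values 1.0, 2.0, ..., 200.0.
rows = [{"wbc": float(v)} for v in range(1, 201)]
kept = drop_outliers(rows, ["wbc"])
```

Because a patient is dropped if any single variable is extreme, the fraction of patients removed can exceed 2% when several variables are trimmed together.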

Prediction model development, performance, and validation

To develop the prediction model for primary headaches versus secondary headaches, two separate ML-based approaches (logistic regression and random forest) were used and evaluated. A total of 31 candidate predictors for the prediction models were preselected, including: demographic variables (age and sex), ten parameters from the CBC test (blood cell counts) and 19 variables derived from the ratios of the ten parameters from the CBC test. Earlier reports indicate that variables comprising specific blood cell ratios are medically relevant [17, 18].

Data normalization and optimization were performed by min–max scaling to standardize the scale of CBC-related values included in the model (preliminary tests showed that min–max scaling outperformed z-score scaling). No data balancing technique was used; rather, all available data in the analytical cohort were used.
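Min–max scaling maps each value x to (x − min) / (max − min), so that the scaled variable lies between 0 and 1 on the data used to learn the bounds. A minimal Python sketch follows. Learning the bounds on the training data and reusing them for held-out data is common practice and is assumed here; the text does not state how this was handled. The function names and example values are illustrative.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) learned from training values."""
    return min(values), max(values)


def min_max_scale(values, lo, hi):
    """Map values to the [0, 1] range using previously learned bounds."""
    span = hi - lo
    if span == 0:
        # Constant variable: every value maps to 0 (an arbitrary convention).
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Illustrative WBC counts, not taken from the study data.
train_wbc = [4.0, 6.0, 8.0, 10.0]
lo, hi = fit_min_max(train_wbc)
scaled_train = min_max_scale(train_wbc, lo, hi)
scaled_test = min_max_scale([7.0], lo, hi)
```

Held-out values outside the training range would fall below 0 or above 1 under this scheme.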

Model construction

The ability of the blood cell count ratios to predict headache type was determined by comparing the predictive performance metrics of each prediction model with and without the 19 variables derived from the ratios of the ten parameters from the CBC test. As part of the model specification, the following feature selection techniques were assessed with the aim of simplifying the model while maintaining good performance: lasso regularization [27], recursive feature elimination [28], weight of feature importance (analyzed by logistic regression), and correlation analysis for any candidate predictor with headache type.

For evaluation of the prediction models developed using logistic regression and random forest, five-fold cross-validation was used to obtain the mean and standard deviation for each of the following model performance metrics: 1) accuracy; 2) balanced accuracy; 3) average precision; 4) F1-score; and 5) area under the curve. Since the data were imbalanced and secondary headache is more serious than primary headache, the model was tuned by changing the predicted probability threshold cut off from 0.5 to 0.3. This was done with the aim of finding a prediction model that balanced the correct detection of secondary headaches with a false negative rate (i.e. incorrect prediction of secondary headache as primary headache) of less than 10%, whilst minimizing overcalling of primary headache as secondary headache. The final prediction model was then evaluated using a single 80:20 train/test dataset split. To visualize and summarize the performance of the prediction model, a confusion matrix was generated.
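The effect of lowering the probability threshold can be illustrated with a small Python sketch. Secondary headache is treated as the positive class (label 1), in line with the definition of the false negative rate above. The labels and predicted probabilities are synthetic, and classifying a patient as secondary when the probability is greater than or equal to the threshold is an assumption about the comparison used.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Return (TP, FN, FP, TN) with secondary headache (1) as the positive class."""
    tp = fn = fp = tn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def balanced_accuracy(tp, fn, fp, tn):
    """Mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Synthetic example: 4 secondary (1) and 6 primary (0) headache patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.2, 0.45, 0.35, 0.25, 0.1, 0.1, 0.05]

default = confusion_counts(y_true, y_prob, 0.5)
lowered = confusion_counts(y_true, y_prob, 0.3)
```

In this toy example, lowering the threshold from 0.5 to 0.3 detects one more secondary headache at the cost of two additional primary headaches being flagged as secondary, which is the trade-off described above.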

Standard descriptive statistical analyses were conducted using R software (R 4.1.3) [29], and ML analyses were implemented in Python (3.9.0) using the scikit-learn library [30, 31]. Matplotlib was used for visualization [32].

Results

Cohort description

A total of 121,241 patients satisfied the inclusion/exclusion criteria and formed the analytical cohort, comprising 108,906 patients with primary headache and 12,335 patients with secondary headache (Fig 1). Baseline demographic and clinical characteristics of the analytical cohort, stratified into primary and secondary headache groups, are presented in Table 1. Overall, patients presenting with primary headache were significantly younger than those presenting with secondary headache (mean 44 vs. 70 years; p < 0.001). The majority of primary headache patients were between 21 and 50 years of age (n = 64,004; 58.7%), whereas the majority of secondary headache patients were between 61 and 80 years of age (n = 6,454; 52.3%). In both headache groups, there were considerably more female than male patients (78.1% vs. 21.9% and 65.7% vs. 34.3%, for primary and secondary headache groups, respectively).

Fig 1. Patient flow and determination of the analytical cohort.

*Patient data flagged as “research acceptable” by CPRD. Two consultation types ("follow-up/routine visit" and "mail from patient") occurring on the index date were removed from the data set. CPRD, Clinical Practice Research Datalink.

https://doi.org/10.1371/journal.pone.0282237.g001

Table 1. Baseline demographic and clinical characteristics of the analytical cohort (n = 121,241).

https://doi.org/10.1371/journal.pone.0282237.t001

Descriptive statistical analyses

Results from descriptive statistical analyses revealed statistically significant differences (p < 0.001) between the population means for almost all of the ten parameters from the CBC tests and the 19 ratio variables measured in patients between headache groups (Tables 2 and 3). However, there were substantial overlaps in the range of values for these variables with broadly similar mean values (S1 and S2 Figs). Further analysis using the Hotelling T²-test showed that the two headache groups could not be properly differentiated (Table 4).

Table 2. Descriptive statistics for variables from the CBC test for the analytical cohort (n = 121,241).

https://doi.org/10.1371/journal.pone.0282237.t002

Table 3. Descriptive statistics for blood cell ratios for the analytical cohort (n = 67,974).

https://doi.org/10.1371/journal.pone.0282237.t003

Table 4. Hypothesis testing results using the Hotelling T²-test (n = 67,974).

https://doi.org/10.1371/journal.pone.0282237.t004

Predicting primary headaches versus secondary headaches

Table 5 shows five-fold cross-validated performance metrics of the prediction models, with and without the ratio variables of blood cell count parameters before varying the probability threshold, using logistic regression and random forest methods separately. The performance metrics suggested that the random forest method without the blood cell count ratio variables had an overall better predictive performance for predicting primary headaches versus secondary headaches. Spearman’s correlation matrix and feature weight analysis suggested that age group, sex, total WBC count, neutrophil, and monocyte count correlated strongly with headache type (S3 Fig).

Table 5. Performance metrics of the prediction models using logistic regression and random forest methods with five-fold cross-validation.

https://doi.org/10.1371/journal.pone.0282237.t005

After changing the probability threshold, the final prediction model showed modest performance, with a balanced accuracy of 0.7405 (Table 6), reflecting a false negative rate (incorrect prediction of secondary headache as primary headache) of 10% and a false positive rate (incorrect prediction of primary headache as secondary headache) of 42% (Fig 2). The sensitivity (correct prediction of secondary headache as secondary headache) and specificity (correct prediction of primary headache as primary headache) of the final model were 58% and 90%, respectively (Fig 2).
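Balanced accuracy is the mean of sensitivity and specificity. As a quick arithmetic check, the short Python snippet below uses the rounded sensitivity and specificity reported above; the small difference from 0.7405 is assumed to reflect rounding of the two percentages.

```python
# Rounded values reported in the text.
sensitivity = 0.58
specificity = 0.90

# Balanced accuracy is the unweighted mean of the two.
balanced_accuracy = (sensitivity + specificity) / 2

# Difference from the reported balanced accuracy of 0.7405.
difference = abs(balanced_accuracy - 0.7405)
```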

Fig 2. Confusion matrix (normalized to ratio instead of patient count) of the final prediction model on the test set using the random forest method.

Features included in the model are: age group, sex, and laboratory blood test results of ten CBC parameters. CBC, complete blood count.

https://doi.org/10.1371/journal.pone.0282237.g002

Table 6. Performance metrics of the final prediction model on the test set using the random forest method.

https://doi.org/10.1371/journal.pone.0282237.t006

Discussion

In summary, descriptive statistical analysis revealed substantial overlap in values for the ten CBC parameters and 19 ratio variables measured in patients in both headache groups, and patients with primary and secondary headaches could not be properly differentiated based on results from the Hotelling T²-test. However, we have developed an ML-based prediction model with a modest predictive accuracy to differentiate between primary and secondary headaches on the basis of readily available patient characteristics and routine blood test results.

Diagnostic procedures and acute treatment for headaches may vary across different countries, depending on factors such as catchment area, structure of the care facility, in-house protocols, and local medical staff [3]. In current clinical practice, an excess of patients presenting to the ED with a severe headache are referred for neuroimaging, despite current guidelines recommending against routine neuroimaging for headaches [33]. In one European study, neurologic examination was performed in 72.5% of patients presenting to the ED with headache; 60.9% subsequently underwent technical investigation and 53.2% had non-contrast cranial computed tomography [3]. However, unnecessary investigations should be avoided [1]; it is not appropriate, for example, to routinely use computed tomography in headache patients, because of the high cost and radiation exposure [16].

Our study suggests that a CBC-based ML algorithm could mitigate this problem by simplifying the triage of patients who require such diagnostic procedures. Our findings indicate that age group, sex, and ten parameters that are usually collected during CBC tests represent convenient, measurable variables for use in an ML prediction model to differentiate patients with primary and secondary headache. We also developed a prediction model with modest performance, correctly classifying almost 60% of secondary headache patients and 90% of primary headache patients. Future studies may look at including certain clinical characteristics (such as a history of brain trauma, hypertension, epilepsy, or stroke) in the prediction model to assess whether their addition could improve its performance further.

In a similar study, the sensitivity of leukocytosis or increase in the percentage of PMNs in cases of patients with SAH was investigated, with a view to developing a non-invasive blood test to facilitate diagnosis [16]. Investigators concluded that CBC had an excellent sensitivity (89.8%) in the exclusion of SAH in non-traumatic headache. Specificity, however, was poor (46.7%), as leukocytosis can result from other headache etiologies such as migraine, temporal arteritis (giant cell arteritis), and hypertension, suggesting CBC levels could only be used to rule out, rather than confirm, SAH [16].

Our ML model has potentially important clinical implications. The reasonably low error rate (10%) of misclassifying secondary headache as primary headache could aid clinicians’ decision-making on which patients to refer for further examination (e.g. neuroimaging and/or lumbar puncture), thereby avoiding unnecessary procedures and reducing the drain on healthcare resources. It is envisaged that our model could be used alongside taking a detailed headache history and, if indicated, a thorough neurologic examination. In cases of abnormal findings, neuroimaging should be performed to rule out secondary headaches. Currently, if there is a clinical suspicion of SAH, computed tomography is undertaken, followed by lumbar puncture if the scan is inconclusive [34]. As our model is able to distinguish between primary and secondary headaches with a sensitivity of 58%, a specificity of 90%, a false negative rate of 10% and a false positive rate of 42%, it is hoped that it may help reduce the number of unnecessary procedures. Furthermore, the model may be of particular use in countries where neuroimaging is not readily available, but should be used with caution, particularly when headache history is sparse.

As a point to consider, because primary headache accounted for approximately 90% of all headache cases in this study, a default prediction model with good prediction accuracy may have been informed by an imbalanced data set (i.e. a model that predicts primary headache for all patients will be correct in 90% of cases). However, the model developed herein accounted for such imbalanced data and focused on the accurate prediction of secondary headaches, resulting in a modest predictive model with a balanced accuracy of 0.7405. Furthermore, as healthcare progresses, in the future our model could be expanded to include other relevant parameters to further improve the model performance.
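The point about class imbalance can be made concrete with the cohort sizes reported in the Results. The Python sketch below evaluates a hypothetical baseline that predicts primary headache for every patient; this baseline is used here only for illustration and is not a model from the study.

```python
# Cohort sizes reported in the Results.
n_primary = 108_906
n_secondary = 12_335
total = n_primary + n_secondary

# Hypothetical baseline: predict "primary headache" for every patient.
accuracy = n_primary / total           # all primary cases are correct
sensitivity = 0 / n_secondary          # no secondary headache is detected
specificity = n_primary / n_primary    # every primary headache is correct
balanced_accuracy = (sensitivity + specificity) / 2
```

Such a baseline reaches an accuracy of roughly 0.90 but a balanced accuracy of only 0.5, which is why metrics suited to imbalanced data are more informative here than plain accuracy.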

When comparing random forest and logistic regression methods, the former marginally outperformed the latter in nearly all prediction performance metrics, particularly those suitable for imbalanced data sets (balanced accuracy, average precision, and F1-score). This is likely because the random forest model is fundamentally a large number of uncorrelated individual decision trees operating as an ensemble/committee and is hence better suited to capture interactions between variables. The logistic regression model, on the other hand, has a linear function and cannot capture such interactions.

Correlated features influence the prediction performance of the model, which the performance metrics suggested was slightly compromised by the blood cell ratio features. Although this seems counterintuitive under the premise that “more data is better”, data are required to be independent and identically distributed, and it follows that such correlations can be detrimental to the performance of ML techniques. The blood cell ratio features are highly inter-dependent, being strongly correlated both with each other and with the original blood cell count features, thus violating this premise.

We attempted to simplify the model, whilst maintaining good performance, by using feature selection techniques, including lasso regularization and recursive feature elimination. As correlation and feature weight analyses indicated that age group, total WBC count, monocyte count, and neutrophil count were of the greatest significance for the model, only these features were included. However, none of the ML feature selection techniques yielded a better result compared with the final model that included age group, sex, and laboratory results of ten parameters usually collected during a CBC test. This is probably due to the final prediction model demonstrating only a marginal improvement when compared with the ML selection techniques focusing on a certain metric, e.g. accuracy. In addition, many simpler models with fewer features resort to the baseline “guess” model of predicting every patient as primary headache regardless of the data.

The key strength of this study is the large sample size of >120,000 patients and the use of RWD from the UK. Awareness in the UK of the difficulties around headache diagnosis and treatment has prompted the training of GPs with special interest in headache [10] and the establishment of a network of headache centers [11]; our findings could potentially inform these endeavors.

The use of primary care data rather than data directly from ED settings is a limitation of the study, due to the extrapolation performed, which makes it difficult to assess the real-life advantages of the ML approach to differentiating headache types in the ED environment. Further studies using ED data are warranted to validate the algorithm described here for this differentiation. This study considered primary and secondary headaches as two groups of heterogeneous conditions; future work could evaluate the diagnostic accuracy of measurements from CBC tests in different forms of primary and secondary headache. As this was a retrospective study of anonymized data, some patients included in the study may, for example, have comorbidities, such as an underlying infection or inflammatory condition, or may be using medications that alter certain CBC markers. Further research should be carried out in specific patient populations, such as immunocompromised individuals, to elucidate the potential prognostic value of CBC and CBC-derived ratio parameters in differentiating primary and secondary headaches.

In conclusion, this study demonstrated the use of an ML approach to create a prediction model with a modest level of performance to differentiate patients with primary and secondary headache in clinical settings.

Supporting information

S1 Table. List of read codes used to identify diagnosis of primary headache disorders in CPRD GOLD. NOS, not otherwise specified.

https://doi.org/10.1371/journal.pone.0282237.s001

(DOCX)

S2 Table. List of medical and read codes used to identify diagnosis of primary headache disorders in CPRD Aurum. NOS, not otherwise specified.

https://doi.org/10.1371/journal.pone.0282237.s002

(DOCX)

S3 Table. List of read codes used to identify diagnosis of secondary headache disorders in CPRD GOLD. NOS, not otherwise specified.

https://doi.org/10.1371/journal.pone.0282237.s003

(DOCX)

S4 Table. List of medical and read codes used to identify diagnosis of secondary headache disorders in CPRD Aurum.

CVA, cerebrovascular accident; NOS, not otherwise specified.

https://doi.org/10.1371/journal.pone.0282237.s004

(DOCX)

S5 Table. List of read codes used to identify headache symptoms in CPRD GOLD. C/O, complains of; NOS, not otherwise specified.

https://doi.org/10.1371/journal.pone.0282237.s005

(DOCX)

S6 Table. List of medical and read codes used to identify headache symptoms in CPRD Aurum. C/O, patient complains of; NOS, not otherwise specified.

https://doi.org/10.1371/journal.pone.0282237.s006

(DOCX)

S7 Table. List of codes used to define laboratory blood test type in CPRD GOLD.

https://doi.org/10.1371/journal.pone.0282237.s007

(DOCX)

S8 Table. List of codes used to define laboratory blood test type in CPRD Aurum.

https://doi.org/10.1371/journal.pone.0282237.s008

(DOCX)

S9 Table. List of codes used to define consultation type in CPRD GOLD.

https://doi.org/10.1371/journal.pone.0282237.s009

(DOCX)

S10 Table. List of codes used to define consultation type in CPRD Aurum.

https://doi.org/10.1371/journal.pone.0282237.s010

(DOCX)

S1 Fig.

Distribution of the following 10 parameters from CBC test results by headache group (data cleaned by removing the extreme values of blood test results): A. RBC count (10¹²/L), B. platelet count (10⁹/L), C. MCV (fL), D. WBC count (10⁹/L), E. neutrophil count (10⁹/L), F. lymphocyte count (10⁹/L), G. monocyte count (10⁹/L), H. eosinophil count (10⁹/L), I. basophil count (10⁹/L), J. hemoglobin (g/dL). CBC, complete blood count; MCV, mean corpuscular volume; RBC, red blood cell; WBC, white blood cell.

https://doi.org/10.1371/journal.pone.0282237.s011

(DOCX)

S2 Fig.

Distribution by headache group of the following ratios: A. platelet/RBC, B. WBC/RBC, C. RBC/neutrophil, D. RBC/monocyte, E. monocyte/eosinophil, F. platelet/MCV, G. platelet/lymphocyte, H. platelet/eosinophil, I. MCV/WBC, J. MCV/neutrophil, K. neutrophil/lymphocyte, L. neutrophil/eosinophil, M. lymphocyte/monocyte, N. lymphocyte/eosinophil, O. hemoglobin/lymphocyte, P. hemoglobin/eosinophil, Q. hemoglobin/RBC, R. MCV/monocyte and S. MCV/hemoglobin. Solid vertical lines indicate the mean of the distribution. MCV, mean corpuscular volume; RBC, red blood cell; WBC, white blood cell.

https://doi.org/10.1371/journal.pone.0282237.s012

(DOCX)

S3 Fig.

A. Spearman’s correlation matrix and B. feature weight analysis for the features age group, sex and 10 variables from the CBC test, derived from the logistic regression model. CBC, complete blood count; MCV, mean corpuscular volume; RBC, red blood cell; WBC, white blood cell.

https://doi.org/10.1371/journal.pone.0282237.s013

(DOCX)

Acknowledgments

The authors thank Marie Stobbe (Roche Diagnostics) for project coordination and support, Iori Namekawa (F. Hoffmann-La Roche Ltd.) for data management, Martine Kallemeijn (Roche Diagnostics) for critical review of the manuscript, and Simon John Davidson (Freelance Hemostasis Consultant, London, UK) for his input into study design and conception. Third-party medical writing assistance, under the direction of the authors, was provided by Anna King, PhD and Elizabeth Hilsley, BSc, of Ashfield MedComms (Macclesfield, UK), an Inizio company.

References

  1. Ahmed F. Headache disorders: differentiating and managing the common subtypes. Br J Pain. 2012;6: 124–132. pmid:26516483
  2. Cerbo R, Villani V, Bruti G, Di Stani F, Mostardini C. Primary headache in emergency department: prevalence, clinical features and therapeutical approach. J Headache Pain. 2005;6: 287–289. pmid:16362689
  3. Doretti A, Shestaritc I, Ungaro D, Lee JI, Lymperopoulos L, Kokoti L, et al. Headaches in the emergency department—a survey of patients’ characteristics, facts and needs. J Headache Pain. 2019;20: 100. pmid:31690261
  4. Evans RW. Diagnostic testing for the evaluation of headaches. Neurol Clin. 1996;14: 1–26. pmid:8676838
  5. Edlow JA, Panagos PD, Godwin SA, Thomas TL, Decker WW. Clinical policy: Critical issues in the evaluation and management of adult patients presenting to the emergency department with acute headache. Ann Emerg Med. 2008;52: 407–436. pmid:18809105
  6. Bösner S, Hartel S, Diederich J, Baum E. Diagnosing headache in primary care: a qualitative study of GPs’ approaches. Br J Gen Pract. 2014;64: e532–e537. pmid:25179066
  7. Munoz-Ceron J, Marin-Careaga V, Peña L, Mutis J, Ortiz G. Headache at the emergency room: etiologies, diagnostic usefulness of the ICHD 3 criteria, red and green flags. PLoS One. 2019;14: e0208728. pmid:30615622
  8. Knox J, Chuni C, Naqvi Z, Crawford P, Waring W. Presentations to an acute medical unit due to headache: a review of 306 consecutive cases. Acute Med. 2012;11: 144–149. pmid:22993744
  9. International Headache Society. The international classification of headache disorders, 3rd edition. Cephalalgia. 2018;38: 1–211. https://doi.org/10.1177/0333102417738202
  10. Ridsdale L, Doherty J, McCrone P, Seed P, The Headache Innovation Evaluation Group. A new GP with special interest headache service: observational study. Br J Gen Pract. 2008;58: 478–483. https://doi.org/10.3399/bjgp08X319440
  11. British Association for the Study of Headaches (BASH): Headache centres. Available from: https://www.bash.org.uk/about/headache-centres/ [accessed June 2022].
  12. Moeller JJ, Kurniawan J, Gubitz GJ, Ross JA, Bhan V. Diagnostic accuracy of neurological problems in the emergency department. Can J Neurol Sci. 2008;35: 335–341. pmid:18714802
  13. Holle D, Obermann M. The role of neuroimaging in the diagnosis of headache disorders. Ther Adv Neurol Disord. 2013;6: 369–374. pmid:24228072
  14. Friedman BW, Lipton RB. Headache in the emergency department. Curr Pain Headache Rep. 2011;15: 302–307. pmid:21400252
  15. Ma I, Guo M, Lau CK, Ramdas Z, Jackson R, Naugler C. Test volume data for 51 most commonly ordered laboratory tests in Calgary, Alberta, Canada. Data Brief. 2019;23: 103748. pmid:31372413
  16. Kilic TY, Aksay E, Atilla OD, Sezik S, Camlar M. The diagnostic value of complete blood count parameters in patients with subarachnoid hemorrhage. Turk J Emerg Med. 2017;17: 128–131. pmid:29464214
  17. Karabulut KU, Egercioglu TU, Uyar M, Ucar Y. The change of neutrophils/lymphocytes ratio in migraine attacks: a case-controlled study. Ann Med Surg. 2016;10: 52–56. pmid:27551404
  18. Eryigit U, Altunayoglu Cakmak V, Sahin A, Tatli O, Pasli S, Gazioglu G, et al. The diagnostic value of the neutrophil-lymphocyte ratio in distinguishing between subarachnoid hemorrhage and migraine. Am J Emerg Med. 2017;35: 1276–1280. pmid:28366288
  19. McGirt MJ, Mavropoulos JC, McGirt LY, Alexander MJ, Friedman AH, Laskowitz DT, et al. Leukocytosis as an independent risk factor for cerebral vasospasm following aneurysmal subarachnoid hemorrhage. J Neurosurg. 2003;98: 1222–1226. pmid:12816268
  20. Parkinson D, Stephensen S. Leukocytosis and subarachnoid hemorrhage. Surg Neurol. 1984;21: 132–134. pmid:6701748
  21. Kamisli O, Kamisli S, Kablan Y, Gonullu S, Ozcan C. The prognostic value of an increased mean platelet volume and platelet distribution width in the early phase of cerebral venous sinus thrombosis. Clin Appl Thromb Hemost. 2013;19: 29–32. pmid:22815317
  22. Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, et al. Data resource profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol. 2015;44: 827–836. pmid:26050254
  23. Wolf A, Dedman D, Campbell J, Booth H, Lunn D, Chapman J, et al. Data resource profile: Clinical Practice Research Datalink (CPRD) aurum. Int J Epidemiol. 2019;48: 1740–1740g. pmid:30859197
  24. Medicines & Healthcare products Regulatory Agency. Clinical Practice Research Datalink [Internet]. National Institute for Health and Care Research. 2022 [cited December 7, 2022]. Available from: https://cprd.com/.
  25. Medicines & Healthcare products Regulatory Agency. A machine learning approach to classify types of headache from laboratory blood test results [Internet]. National Institute for Health and Care Research. 2021 [cited December 7, 2022]. Available from: https://cprd.com/protocol/machine-learning-approach-classify-types-headache-laboratory-blood-test-results.
  26. The Comprehensive R Archive Network. Scalable robust estimators with high breakdown point: package ’rrcov’. 2022. [cited June 2022] Ver: 1.7–2. Institute for Statistics and Mathematics; Vienna University, Vienna, Austria. Available from: https://cran.r-project.org/web/packages/rrcov/rrcov.pdf
  27. Muthukrishnan R, Rohini R. LASSO: a feature selection technique in predictive modeling for machine learning. IEEE International Conference on Advances in Computer Applications (ICACA); 24 October 2016; Coimbatore, India. https://doi.org/10.1109/ICACA.2016.7887916
  28. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Machine Learning. 2002;46: 389–422. https://doi.org/10.1023/A:1012487302797
  29. R Core Team. The R Project for Statistical Computing (version 4.1.3). 2022. [cited July 2022]. Available from: https://www.r-project.org/.
  30. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12: 2825–2830.
  31. Buitinck L, Louppe G, Blondel M, Pedregosa F, Mueller A, Grisel O, et al. API design for machine learning software: experiences from the scikit-learn project. European Conference on Machine Learning and Principles and Practices of Knowledge Discovery in Databases; September 23–27; Prague, Czech Republic 2013.
  32. Hunter JD. Matplotlib: a 2D graphics environment. Comput Sci Eng. 2007;9: 90–95. https://doi.org/10.1109/MCSE.2007.55
  33. Callaghan BC, Kerber KA, Pace RJ, Skolarus LE, Burke JF. Headaches and neuroimaging: high utilization and costs despite guidelines. JAMA Intern Med. 2014;174: 819–821. https://doi.org/10.1001/jamainternmed.2014.173
  34. Marcolini E, Hine J. Approach to the diagnosis and management of subarachnoid hemorrhage. West J Emerg Med. 2019;20: 203–211. pmid:30881537