Machine learning-based diagnosis for disseminated intravascular coagulation (DIC): Development, external validation, and comparison to scoring systems

The major challenge in the diagnosis of disseminated intravascular coagulation (DIC) comes from the lack of specific biomarkers, which has led to the development of composite scoring systems. DIC scores are simple and rapidly applicable. However, the optimal fibrin-related markers and their cut-off values remain to be defined, so the scores require optimization for use. The aim of this study was to optimize the use of DIC-related parameters through a machine learning (ML)-approach. Further, we evaluated whether this approach could provide diagnostic value in DIC diagnosis. For this, 46 DIC-related parameters covering both clinical findings and laboratory results were investigated. We retrospectively reviewed 656 DIC-suspected cases at the initial order for a full DIC profile and labeled their evaluation results (Set 1; DIC, n = 228; non-DIC, n = 428). Several ML algorithms were tested, and an artificial neural network (ANN) model was established via independent training and testing using 32 selected parameters. This model was externally validated with 217 DIC-suspected cases from a different hospital (Set 2; DIC, n = 80; non-DIC, n = 137). The ANN model showed higher AUC values than the three scoring systems in set 1 (ANN 0.981; ISTH 0.945; JMHW 0.943; and JAAM 0.928) and than the ISTH criteria in set 2 (ANN 0.968; ISTH 0.946). Additionally, the relative importance of the 32 parameters was evaluated. Most parameters had contextual importance; however, their importance in the ML-approach differed from that in the traditional scoring systems. Our study demonstrates that ML could optimize the use of clinical parameters with robustness for DIC diagnosis. We believe that this approach could play a supportive role in physicians' medical decisions by being integrated into the electronic health record system. Further prospective validation is required to assess the clinical consequences of the ML-approach and its clinical benefit.

Introduction
Disseminated intravascular coagulation (DIC) is a life-threatening condition which arises as a secondary complication of a range of underlying conditions including sepsis, severe trauma, and advanced cancer [1]. The Scientific and Standardization Committee on DIC of the International Society on Thrombosis and Haemostasis (ISTH) defines DIC as 'an acquired syndrome characterized by the intravascular activation of coagulation with a loss of localization arising from different causes.' [2] Although this definition highlights the key features of DIC, the major challenge in its diagnosis comes from the lack of a single potent marker, which has led to the development of composite scoring systems derived from underlying conditions and laboratory results [2][3][4].
The diagnostic criteria most widely used as a gold standard are the ISTH criteria, which consist of platelet (PLT) count, prothrombin time (PT), fibrinogen, and fibrin-related markers (e.g. D-dimer or fibrin degradation products; FDP) [2]. Although the ISTH criteria have been validated by various studies and their performance was shown to be satisfactory, several issues remain [5][6][7][8]. In particular, the optimal fibrin-related markers and the individual laboratory cut-off values for moderate and strong increases have not yet been clearly defined [9][10][11][12]. Furthermore, the sensitivity of the ISTH criteria is regarded by some as lacking when compared to other scoring systems [13]. Two other well-established scoring systems are the Japanese Ministry of Health and Welfare's criteria (JMHW criteria) and the Japanese Association for Acute Medicine's criteria (JAAM criteria; Table 1) [4][5][6][7][8][9][10][11][12][13][14]. These criteria have respective advantages and limitations depending on the underlying conditions, and a number of refinements have been made [10,13,15].
Artificial intelligence (AI), in which computers mimic human intelligence through machine learning algorithms, has drawn media attention, and the application of AI has gained momentum across various fields. Similar trials have appeared in the medical field, particularly using clinical data and medical images [16][17][18][19]. An artificial neural network (ANN) resembles human neuronal connections by building a multi-layered network and can be trained to approximate functions or categorize complex patterns [20,21]. There are two remarkable characteristics of this machine learning (ML)-approach: 1) non-linear pattern recognition and 2) improvement by learning. These features are ideal not only for considering various clinical conditions, but also for giving standardized results with wide extensibility. ANNs have demonstrated positive medical applications in areas such as the diagnosis of myocardial infarction, cancer, and diabetic retinopathy [22][23][24]. In this study, we demonstrated ML-approaches for DIC diagnosis and established an optimized ANN model which integrates both the clinical findings and the laboratory results.

Patients
This study was approved by the institutional review board and the ethics committee of Yonsei University Health System (Seoul, Korea; IRB 4-2016-0698). The current study used medical records and participating centers have waived by completing the questionnaires. All data were treated confidentially with anonymized numbers. Patients with a full DIC profile were defined as cases in which all laboratory tests, including complete blood count (CBC) with differential counts, global coagulation tests (PT, PT % activity, international normalized ratio [INR], activated partial thromboplastin time [aPTT], and thrombin time), fibrinogen, D-dimer, FDP, and anti-thrombin III (AT III), had been ordered on the same day (this order set is defined as the 'DIC profile'). Eligible cases had full DIC profile orders (n = 837) between April and October 2016 at a tertiary hospital (Severance Hospital, Seoul, Korea; Fig 1A). After excluding consecutive orders from identical patients and outpatient clinic orders, patients with an initial full DIC profile after admission (n = 769) were enrolled. Cases from pediatric patients, routine orders at admission, long-term hospitalized patients, or patients with previous transfusion therapy were excluded (n = 113). Finally, the development set (set 1; n = 656) consisted of the remaining DIC-suspected cases requiring an evaluation of DIC. Because the cases were enrolled at the initial evaluation point, no cases had previously been treated with transfusion therapy (plasma product or cryoprecipitate) or AT III. A similar approach was used for the external validation set (set 2; n = 217), derived from data obtained from another tertiary hospital (Gangnam Severance Hospital, Seoul, Korea). Demographics and clinical characteristics of the patients are presented in Table 2. There were no cases of heparin-induced thrombocytopenia, thrombotic microangiopathy (TMA), or antiphospholipid syndrome.

Data collection and labeling DIC status
We retrospectively followed the timeline of the physicians' diagnostic process. Therefore, the presence of clinical signs, symptoms, and underlying DIC-related conditions and the full set of laboratory results were obtained on the same day as the DIC profile (Tables 2 and 3, Text A in S1 File) [9]. CBC was obtained from K2-EDTA tubes using automated hematology analyzers (ADVIA 2120i; Siemens Healthcare Diagnostics, IL, USA), which provide commonly reported clinical parameters and additional research use only (RUO) parameters, such as large unstained cells (LUC; %), delta neutrophil index (DNI), and TMA score [25][26][27]. Global coagulation tests and fibrin-related markers were measured using an ACL-TOP 750 analyzer (Instrumentation Laboratory, Bedford, MA, USA), with the samples collected in 3.2% sodium citrate tubes. Notably, the external validation hospital used different automated hematologic analyzers (XN-9000 and CS-5100 system; Sysmex, Kobe, Japan) and had a different DIC profile: protein C was included instead of FDP, and RUO parameters were not provided. For this reason, four parameters were excluded in set 2 (Fig 2). To curate the DIC status (non-DIC: 0, DIC: 1), each patient's medical record, clinical manifestation, and laboratory results were retrospectively reviewed by medical experts. Each case was carefully reviewed by two experts individually, and the patient's evaluation result, which occurred within a week, was assigned comprehensively depending on the laboratory data change, clinical manifestation, clinical intervention, and final diagnosis. If a discrepancy occurred, the case was reviewed by a third expert and labeled after a consensus was reached.
Fig 1. In the training phase, the development set (n = 656) was randomly split into training and test sets in 80:20 ratio and hyperparameters were determined for an optimal modeling. All layers have 32 nodes with an input-layer and two-hidden layers. The relative importance of input features was calculated based on the 'Connection Weight' approach, after the ANN model was established. https://doi.org/10.1371/journal.pone.0195861.g001

Model development
Using the collected datasets, we tested several ML algorithms including logistic regression, linear regression, ridge regression, random forest, gradient boosting machine, deep learning, and ANN with DaVinci Labs (Solidware Inc., Seoul, Korea), which supports AI-based data analysis; the performance of the ANN model was the best among the seven algorithms. In the training phase, the development set was randomly split into training and test sets in 80:20 ratio. Next, auto-tuning of hyper-parameters (e.g. number of hidden layers, epochs) with respect to the performance of the model on the test set was conducted. After several iterations of the auto-tuning and training processes, an ANN model (2 hidden layers, 10 epochs) to evaluate DIC status was established (Fig 1B). Additionally, we calculated the relative importance of the input variables with the 'Connection Weight' approach [28]. Briefly, the importance of a variable is proportional to the sum of the absolute values of the products of the connection weights along the paths through which this variable is propagated. We also conducted an external validation of the ANN model using set 2. As set 2 was missing four variables, the ANN model was re-trained without the four variables and evaluated.
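The variable-importance calculation described above can be illustrated with a minimal sketch. It assumes a fully connected network whose weights are available as plain nested lists; the function names, the toy weights, and the normalization to percentages are hypothetical and are not taken from the software used in this study. Some descriptions of the connection-weight method keep the sign of the products; this sketch follows the absolute-value wording given above.

```python
from typing import List

Matrix = List[List[float]]


def abs_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product of the element-wise absolute values of a and b."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(abs(row[k]) * abs(b[k][j]) for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def connection_weight_importance(layers: Matrix) -> List[float]:
    """Relative importance (%) of each input variable.

    layers[l][i][j] is the weight from unit i of layer l to unit j of
    layer l + 1. Because |a * b * c| = |a| * |b| * |c|, summing the
    absolute products over all input-to-output paths equals multiplying
    the absolute weight matrices layer by layer.
    """
    acc = [[abs(w) for w in row] for row in layers[0]]
    for weights in layers[1:]:
        acc = abs_matmul(acc, weights)
    raw = [sum(row) for row in acc]          # one value per input variable
    total = sum(raw)
    return [100.0 * r / total for r in raw]  # normalize so the shares sum to 100


# Toy network (hypothetical weights): 3 inputs -> 2 hidden -> 2 hidden -> 1 output.
W1 = [[0.5, -1.0], [0.1, 0.2], [-2.0, 0.0]]
W2 = [[1.0, -0.5], [0.5, 1.0]]
W3 = [[1.0], [-2.0]]
importance = connection_weight_importance([W1, W2, W3])
```

In this toy case the third input receives the largest share because of its large first-layer weight, even though it connects to only one hidden unit.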

Analysis
Performance of the ANN model was compared to the three scoring systems (ISTH, JMHW, and JAAM; Table 1). Sensitivity, specificity, positive and negative predictive values, and area under curve (AUC) values were calculated following each criterion. D-dimer was used as the fibrin-related marker for the ISTH criteria as both set 1 and set 2 had this parameter (cut-off values for the moderate and strong increases were based on the 25% and 75% quartiles of all patients in each hospital, respectively) [9]. JMHW and JAAM scores could not be evaluated in set 2 because FDP was lacking in the DIC profile of set 2. Statistics software R version 3.4.3 was used for data analysis. Datasets were visualized using the 'ComplexHeatmap' package [29]. Performance evaluation was achieved via receiver operating characteristic (ROC) curve analysis and calculation of AUC using the 'pROC' package [30]. The cut-off value (0.501) for the ANN model was determined by the 'OptimalCutpoints' package using the Youden method [31]. Statistical analyses were performed by Student's t-test for parametric data and Mann-Whitney U test for non-parametric data. P values below 0.05 were considered statistically significant.
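As an illustration of how the ISTH score could be computed with hospital-specific D-dimer cut-offs, the following is a minimal sketch. The point thresholds follow the published ISTH overt-DIC scoring scheme as commonly stated and are an assumption here, since Table 1 is not reproduced in this text; the function names, the quartile method, and the handling of values exactly at a cut-off are likewise hypothetical. The analysis in this study was performed in R, so this Python sketch is not the code that was used.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def ddimer_cutoffs(ddimer_values: Sequence[float]) -> Tuple[float, float]:
    """Return the 25% and 75% quartiles of D-dimer for one hospital."""
    q1, _median, q3 = quantiles(ddimer_values, n=4, method="inclusive")
    return q1, q3


def isth_score(
    plt_count: float,          # platelet count, x10^3/uL
    pt_prolongation_s: float,  # PT minus the laboratory reference PT, seconds
    fibrinogen_g_per_l: float,
    ddimer: float,
    moderate_cut: float,       # assumed: 25% quartile of the hospital's patients
    strong_cut: float,         # assumed: 75% quartile of the hospital's patients
) -> int:
    """ISTH overt-DIC score; a total of 5 or more is compatible with overt DIC."""
    score = 0
    if plt_count < 50:
        score += 2
    elif plt_count < 100:
        score += 1
    if ddimer >= strong_cut:       # strong increase of the fibrin-related marker
        score += 3
    elif ddimer >= moderate_cut:   # moderate increase
        score += 2
    if pt_prolongation_s > 6:
        score += 2
    elif pt_prolongation_s >= 3:
        score += 1
    if fibrinogen_g_per_l < 1.0:
        score += 1
    return score


# Hypothetical values for illustration only.
moderate, strong = ddimer_cutoffs([1.0, 2.0, 3.0, 4.0, 5.0])
example = isth_score(40, 7.0, 0.8, 4.5, moderate, strong)
overt_dic = example >= 5
```

Deriving the cut-offs separately for each hospital, as described above, means the same D-dimer value can score differently in set 1 and set 2.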

Patient characteristics
We conducted a retrospective cross-sectional study of DIC-suspected patients at initial evaluation with full DIC profiles. All available cases with full DIC profiles were reviewed in two different hospitals. After excluding cases from consecutive orders, pediatric patients, routine orders at admission or from long-term hospitalized patients, and patients with previous transfusion therapy, the development dataset was constructed from 656 patients with initial full DIC profiles. Among the 656 patients admitted to either the general ward (n = 330) or the intensive care unit (ICU; n = 326) in Set 1, 228 (34.8%) and 428 (65.2%) patients were labeled as DIC and non-DIC status, respectively (Table 2). Univariate analysis showed no differences in age or gender between the two groups (values in parentheses below are DIC vs. non-DIC). However, the proportion in ICU (68.0 vs. 40.0%, P < .001), Acute Physiology and Chronic Health Evaluation (APACHE) II scores (30.2 vs. 24.3, P < .001), which were only calculated for ICU patients [32], and 28-day mortality (64.0 vs. 18.0%, P < .001) were higher in the DIC group. Moreover, the proportions of patients showing organ failure (50.9 vs. 13.6%, P < .001) and systemic inflammatory response syndrome (SIRS; 77.2 vs. 34.1%, P < .001) were higher in the DIC group. Additionally, bleeding was more common in the DIC group (26.3 vs. 19.9%), although the difference did not reach statistical significance (P = .072). The above clinical conditions showed similar results in Set 2.
We also investigated the DIC-related conditions; in the DIC group, sepsis/infection was the most common condition (70.6%), followed by solid cancer (38.2%). Sepsis/infection (70.6 vs. 32.7%, P < .001), solid cancer (38.2 vs. 19.2%, P < .001), and hepatic failure (14.9 vs. 4.0%, P < .001) were the underlying conditions more prevalent in the DIC group. Post major surgery status (14.9 vs. 31.1%, P < .001) was more prevalent in the non-DIC group; we believe that this resulted from physicians' inclination to order a DIC profile after major surgery. Other associated conditions such as tissue damage, hematologic malignancy, obstetric complications, vascular abnormalities, and toxic or immunologic insult showed no significant difference between the two groups, although the number of such cases was relatively small.
Most laboratory results showed a difference between the DIC and non-DIC groups. Global coagulation parameters, except thrombin time, showed different results (P < .001; Table 3). CBC components also presented different results, except for two RBC indices and some differential counts. The relatively lower levels of RBC count and hemoglobin in the DIC group may be caused by its higher proportion of bleeding patients compared with the non-DIC group. We visualized the two datasets with a heat map (Fig 2), which enabled us to survey the landscape of the data distributions. Sepsis/infection, SIRS, and ICU admission were more commonly observed in the DIC group. PT and PLT count showed opposite tendencies, as expected. The general patterns represented in the heat map confirmed the similar composition of the two data sets, although vascular abnormalities were more common in the validation set due to the vascular surgery center located at this hospital. Missing values were presented as blanks, and seven variables contained missing values. APACHE II scores were only calculated if a patient was admitted to ICU. PLT changes (%/24hr) were only available for patients with a previous PLT count result. WBC differential counts, PLT distribution width (PDW), mean PLT volume (MPV), and RUO parameters were not reported by the hematologic analyzers in cases of severe thrombocytopenia or leukopenia.

Established model and variable importance
We first tested the ANN model with 46 investigated variables and gradually excluded negligible variables. The laboratory parameters with trivial impacts on the performance were excluded: mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC) and WBC counts. Because of the small number of the cases with trivial effects, the following clinical variables were also excluded: anticoagulant use, bleeding, thrombosis, hematologic malignancy, immunologic insult, hepatic failure, obstetric complication, tissue damage, and vascular abnormalities. Consequently, 32 representative variables were used in the developed ANN model including clinical signs and symptoms, underlying conditions, and laboratory parameters.
To provide an interpretable model for each clinical variable, we calculated the relative importance. It is noteworthy that statistical significance does not guarantee variable importance level in ANN, and vice versa. Recently, the 'Connection Weight' approach was reported to be an efficient method of identifying variable importance in ANN models [33]. Fig 3 shows the importance calculated using this approach for the 32 clinical variables. Most parameters such as PLT count (8.74%), PLT changes (4.86%), D-dimer (4.13%), and FDP (3.96%) had contextual importance in accordance with DIC features, whereas fibrinogen level (2.43%) showed relatively low importance. PLT count and PLT changes ranked as the first and third most important variables, and these results were expected owing to the evident statistical differences between the DIC and non-DIC groups (57 vs. 168 ×10³/μL, P < .001; -0.36 vs. -0.15, P < .001). To minimize inter-laboratory variations, three PT parameters (sec, INR, percent activity) were separately used in the model, because INR and PT % activity are values standardized using normal pooled plasma. Although separately evaluated, PT parameters accounted for a total of 8.94% of the entire importance level and also played a significant role. Interestingly, the importance of previously overlooked parameters, including PDW (variations in PLT size and shape), red cell distribution width (RDW; variations in RBC size and shape), and RUO parameters, was not negligible in the ANN model. Most parameters presented a contextual importance; however, their importance in the ML-approach was different from that in the traditional approach.

Performance
The performance of the four methods was compared in terms of AUC values, sensitivity, specificity, and predictive values (Table 4). Among the four methods, the ANN model showed the best AUC value with P < .001, while the three DIC criteria presented no differences (Fig 4A). The AUC (95% confidence interval; CI) of the ANN model was 0.981 (0.973-0.989) and the three DIC criteria had AUCs of 0.945 (0.929-0.962) for the ISTH, 0.943 (0.927-0.959) for the JMHW, and 0.928 (0.909-0.946) for the JAAM, respectively. The optimal cut-off value by Youden index for the ANN model was 0.501, with a sensitivity and specificity (95% CI) of 89.9% (85.2-93.5) and 96.0% (93.7-97.7), respectively. Additionally, the sensitivity and specificity of the three DIC criteria were 82.0% (76.4-86.8) and 93.7% (91.0-95.8) for the ISTH, 91.2% (86.8-94.6) and 84.3% (80.6-87.7) for the JMHW, and 94.7% (91.0-97.3) and 79.0% (74.8-82.7) for the JAAM, respectively. All methods showed relatively lower performance than the previous prospective study in the ICU setting using the ISTH criteria (sensitivity 91%, specificity 97%) [5]. The difference may be attributed to the study design, the patient composition and ward setting, the hematologic analyzer, and/or variance in the expert opinion. Nevertheless, the ISTH criteria showed relatively low sensitivity and high specificity, while the JMHW showed relatively high sensitivity and low specificity, as reported [9]. Furthermore, we reviewed the performance in the external validation set (n = 217). The ANN model was re-trained without the four variables (FDP and RUO parameters) which were included in set 1, and the AUC value of this model without the four variables was 0.975 (0.966-0.984). Using this model, we tested set 2, and the AUCs were 0.968 (0.945-0.986) for the ANN model and 0.946 (0.916-0.976) for the ISTH (Fig 4B, Table 4). Both models showed slightly compromised results, while the ISTH remained at a constant AUC with slightly skewed performance (low sensitivity and high specificity). The compromised AUC portion in the ANN model may primarily come from the different hospital setting, the different reference intervals, and/or the unstandardized measured values from analyzers. Nevertheless, the ANN model showed overall higher performance than the ISTH criteria. Because of the small number of cases in set 2, it was inevitable that the 95% CI overlapped with the performance of the ISTH criteria.
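The AUC and the Youden-index cut-off reported above can be illustrated with a minimal sketch. The labels and scores below are hypothetical toy values, not study data, and the function names are illustrative; the study itself used the R packages 'pROC' and 'OptimalCutpoints', which this Python sketch does not reproduce. The AUC is computed here as the proportion of (DIC, non-DIC) pairs in which the DIC case receives the higher score, with ties counted as one half.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(
    labels: Sequence[int], scores: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (cut-off, sensitivity, specificity) maximizing J = sens + spec - 1.

    A case is called positive when its score is at or above the cut-off.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    best_j = float("-inf")
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        sens, spec = tp / n_pos, tn / n_neg
        if sens + spec - 1.0 > best_j:
            best_j = sens + spec - 1.0
            best = (cut, sens, spec)
    return best


# Hypothetical model outputs: 1 = DIC, 0 = non-DIC.
labels: List[int] = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]
scores: List[float] = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.65, 0.70, 0.80, 0.90]
auc = roc_auc(labels, scores)
cutoff, sensitivity, specificity = youden_cutoff(labels, scores)
```

Because the Youden index weights sensitivity and specificity equally, a different cut-off may be preferable when missed DIC cases and false alarms carry unequal clinical costs.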

Discussion
This study demonstrated an ML-approach for DIC diagnosis that optimally integrates the DIC-related parameters. The established model included 32 clinical parameters, and the study shed light on the buried roles of clinical parameters overlooked in the scoring systems. However, the clinical implication of the included variables remains uncertain and requires further investigation. We suggest that a number of additional cases with the excluded variables should be obtained to precisely evaluate the role of anticoagulant use, bleeding, thrombosis, hematologic malignancy, immunologic insult, hepatic failure, obstetric complication, tissue damage, and vascular abnormalities in the ML-approach. Therefore, the current ANN model may need to be further updated and validated with bigger data sets. Nevertheless, we believe that this approach may facilitate the diagnosis of DIC and that the performance can be further improved by adding diverse training data and applying more advanced algorithms and parameters.
The major limitation of this study is the DIC labeling procedure. Labeled results could be biased by the medical experts and by the limitations of the retrospective approach. We employed a supervised learning method, which is generally used for classification and risk prediction in medicine [17]. In this approach, supervised labels determine the developed model. Although we labeled DIC status after expert agreement with careful medical record reviews, the labeled results can be incorrect or uncertain [34]. Moreover, we only enrolled cases at the initial orders, with varying elapsed time to diagnosis; therefore, consecutive monitoring of the DIC profile was not possible except for PLT changes. Because DIC involves rapid and dynamic changes in blood vessels, an ML model reflecting consecutive changes in laboratory parameters should be developed in the future. These issues are potential limitations and could have affected the current model. Recently, several advancements in ML algorithms have been reported to overcome variations in human expert opinion. We expect that rapid advancement in ML algorithms may address such issues in the future. ML cannot go beyond what is contained in the data, meaning that more powerful and specific tests are still required for DIC diagnosis. Some studies have shown the usefulness of several methods in diagnosing DIC, such as thromboelastography, clot waveform analysis, damage-associated molecular patterns, histone-DNA complexes, and circulating histones [13]. Additional data from such potential assays may also improve the performance.
Developing an AI system that gives contextual rationale is another important issue in medical application. ANN is occasionally described as a 'black box' as it provides little explanatory insight into the variables [28]. However, recent studies have illuminated a substantial part of this 'black box' with a range of approaches. In order to provide intuitive information on clinical parameters, we calculated the relative importance, and the values were mostly consistent with DIC features; absolute PLT count and changes, fibrin-related markers, and PT prolongation were also important features in the ANN model, whereas fibrinogen level had relatively low importance. Additionally, some overlooked laboratory parameters such as PDW, RDW, and RUO parameters contributed considerably in the ANN model. As a result of DIC progression, increases in PDW and MPV may be caused by morphological transformation on PLT activation and young PLT production by megakaryopoiesis [35], which may explain the supportive role of PDW (3.28%) and MPV (1.48%). Additionally, mechanical damage to RBC during DIC progression, such as schistocyte production, may explain the importance of RDW (3.03%) [36]. Furthermore, TMA score (4.43%), an RUO parameter originally developed for the detection of TMA and reported to be linked to thrombocytopenia-associated multiple organ failure, was revealed to be a supportive classifier for DIC [26]. DNI is another RUO parameter, reflecting immature granulocyte percentages in circulating blood, and has been reported to be a useful marker for sepsis. Because DIC is commonly associated with sepsis, the contribution of DNI (2.49%) in the ANN model may relate to this association [27]. Although most of the ranks of variable importance were understandable, it was difficult to interpret the relationships of some variables such as eosinophil (3.82%), lymphocyte (3.81%), and monocyte (3.31%) percentages. Those CBC differential count parameters may be negatively related to neutrophilia, which is primarily induced by infection or malignancy and frequently accompanies DIC.
Laboratory results vary among institutions even when the sample is identical. Because many laboratory parameters are required in this tool, standardization of these parameters remains problematic and must be addressed to reduce variation between institutions. In particular, D-dimer assays, the salient variable in DIC evaluation, use various measuring principles and lack standardized calibrators and reporting units, which leads to wide inter-laboratory and inter-method variability (Table A in S1 File) [37]. We believe that the best way is to use a normalized value such as a scaled value or z-score; however, this is not practically possible for all laboratory parameters and remains to be solved. This standardization issue should always be considered in ML-approaches using laboratory parameters.
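The z-score normalization mentioned above can be sketched as follows. This is a minimal illustration that assumes each institution's values are standardized against that institution's own mean and sample standard deviation; the site names, units, and values are hypothetical, and within-site z-scoring additionally assumes that the patient mix is comparable between sites, which may not hold in practice.

```python
from statistics import mean, stdev
from typing import Dict, List


def zscore_by_site(values_by_site: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Standardize one laboratory parameter separately within each site."""
    standardized: Dict[str, List[float]] = {}
    for site, values in values_by_site.items():
        mu = mean(values)
        sd = stdev(values)  # sample standard deviation; needs at least 2 values
        standardized[site] = [(v - mu) / sd for v in values]
    return standardized


# Hypothetical D-dimer results reported in different units by two laboratories.
ddimer = {
    "site_a": [200.0, 400.0, 600.0],  # e.g. reported in ng/mL
    "site_b": [0.2, 0.4, 0.6],        # e.g. reported in ug/mL
}
z = zscore_by_site(ddimer)
```

After this transformation the two sites yield the same standardized values despite the 1000-fold difference in reporting units, which is the property that makes a normalized input attractive for a model shared between institutions.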
In conclusion, our study demonstrates a novel strategy to optimize the DIC diagnostic process with DIC-related parameters using an ML-approach. The results showed some improvement in diagnostic power in the retrospective design and provided additional insights into the importance of the DIC-related parameters. We believe this approach could be implemented in the electronic health record system as a clinical decision support system in the near future. However, further prospective validation is required to assess the relationship between the ML-approach and its clinical benefit.
Supporting information
S1 File. Appendix A. Abbreviations; Text A. Description for clinical and laboratory variables used in this study; Table A.