Single-modal and multi-modal false arrhythmia alarm reduction using attention-based convolutional and recurrent neural networks

This study proposes a deep learning model that effectively suppresses false alarms in intensive care units (ICUs) without ignoring true alarms, using single- and multi-modal biosignals. Most existing approaches in the literature are either rule-based methods, which require prior knowledge of arrhythmia analysis to build rules, or classical machine learning approaches, which depend on hand-engineered features. In this work, we apply convolutional neural networks to automatically extract time-invariant features, an attention mechanism to put more emphasis on the important regions of the segmented input signal(s) that are more likely to contribute to an alarm, and long short-term memory units to capture the temporal information present in the signal segments. We trained our method efficiently using a two-step training algorithm (i.e., pre-training and fine-tuning the proposed network) on the dataset provided by the PhysioNet Computing in Cardiology Challenge 2015. The evaluation results demonstrate that the proposed method obtains better results than other existing algorithms for the false alarm reduction task in ICUs. The proposed method achieves a sensitivity of 93.88% and a specificity of 92.05% for alarm classification when three different signals are considered. In addition, our experiments on 5 separate alarm types yield strong results when only a single-lead ECG is considered (e.g., a sensitivity of 90.71%, a specificity of 88.30%, and an AUC of 89.51 for the Ventricular Tachycardia alarm type).


Introduction
The electrocardiogram (ECG) is a biomedical signal that carries information about the electrical activity of the heart and about heart conditions over a period of time. Monitoring and interpreting ECG signals is one of the most useful tools for medical staff in ICUs to check patients' heart conditions, such as arrhythmia, ventricular hypertrophy, and myocardial infarction. Cardiac arrhythmias can cause serious and even potentially fatal symptoms if they [...] Tachycardia (VTA), and Ventricular Flutter/Fibrillation (VFB). The performance of the proposed model is evaluated using the publicly available alarm dataset for ICUs provided by the "PhysioNet computing in cardiology challenge 2015". The experimental results show that the proposed method can significantly reduce the rate of false alarms in ICU equipment for the five mentioned life-threatening arrhythmias without suppressing true alarms. The main contributions of this work are summarized as follows:
• We present a multi-modal model that integrates three main signals, arterial blood pressure (ABP), photoplethysmograph (PPG) and ECG, in order to enhance the accuracy of arrhythmia detection and reduce the false alarm rate in ICUs. A multi-modal approach that analyzes a set of independent sources/signals for alarm detection can significantly improve the alarm detection performance. The reason is that each independent channel or source of data is prone to distinct noise and/or artifacts, so a pattern hidden in one channel by noise and/or artifacts can be revealed by other clean channels.
• We develop a network architecture for automatic feature extraction that utilizes a convolutional neural network (CNN) with two consecutive one-dimensional convolutional layers composed of different filter sizes, attention and long short-term memory (LSTM) units, and a classification layer. The CNN part extracts a vector of features from each segment of a single channel, while the attention and LSTM units are trained, respectively, to identify the parts of the segment that are most effective for detection and to capture long-range dependencies between segments of an input signal. Typically, some indicators appear in the signals as early as a few hours before cardiac events [19][20][21]. Since considering the entire length of the signals is not necessarily feasible, an attention mechanism combined with a memory-based approach can divide the signals into different partitions and put a higher weight on the most important ones, which saves space/computation and enhances the accuracy.
• We apply two loss functions, Mean False Error (MFE) and Mean Squared False Error (MSFE), instead of Mean Squared Error (MSE), the loss function commonly used in deep learning algorithms, to reduce the performance degradation caused by the class-imbalanced dataset. These loss functions propagate the training error of a misclassified sample without considering whether it belongs to the major or the minor class.
In the next section (Methodology), we describe the proposed false arrhythmia alarm reduction method. The Dataset section describes the dataset used in this study. In the Experimental Results section, we present the experimental results and compare the performance of the proposed algorithm to other state-of-the-art algorithms, followed by the conclusion in the Conclusion section.

Methodology
We develop a deep learning model that classifies arrhythmias from segments of three common physiological signals (ECG, ABP, and PPG) using a two-stage approach to further reduce the false alarm rate. In the first part, we develop three pre-trained networks to extract features of interest from the three biosignals separately. In the second part, a shallow neural network uses the features extracted by the pre-trained networks to perform the classification task. At each time step, the pre-trained networks extract features from their corresponding input signals; the extracted features are then averaged and fed to a fully-connected layer of 256 neurons, followed by a dropout block. Finally, a softmax layer determines the probability of the input signal belonging to each class of interest (true or false alarm). Fig 1 shows an overall view of the proposed model for reducing false arrhythmia alarms in ICUs. It should be noted that the dropout block is used only in the training phase and is disabled during the testing phase. In the following sections, we describe the details of the different parts of the proposed model.
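The second stage described above (average the per-signal features, apply a 256-unit fully-connected layer, dropout, and softmax) can be sketched in NumPy as follows. This is only an illustrative sketch: the ReLU activation, the inverted-dropout scaling, the 256-dimensional input features, the random weights, and all names (`fuse_and_classify`, `params`) are assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_and_classify(feats, params, train=False, drop_p=0.5, rng=None):
    """Average per-signal features, then FC -> dropout -> softmax.

    feats: list of arrays with shape (batch, feat_dim), one per signal.
    """
    fused = np.mean(np.stack(feats, axis=0), axis=0)
    # ReLU is an assumed activation; the text does not name one.
    hidden = np.maximum(0.0, fused @ params["W1"] + params["b1"])
    if train:
        # Dropout is applied only during training (inverted dropout, assumed).
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(hidden.shape) >= drop_p
        hidden = hidden * keep / (1.0 - drop_p)
    logits = hidden @ params["W2"] + params["b2"]
    return softmax(logits)  # columns: probability of each alarm class

# Hypothetical usage with random weights and random stand-in features.
rng = np.random.default_rng(0)
feat_dim, hidden_dim, n_classes = 256, 256, 2
params = {
    "W1": rng.normal(scale=0.05, size=(feat_dim, hidden_dim)),
    "b1": np.zeros(hidden_dim),
    "W2": rng.normal(scale=0.05, size=(hidden_dim, n_classes)),
    "b2": np.zeros(n_classes),
}
feats = [rng.normal(size=(4, feat_dim)) for _ in range(3)]  # ECG, ABP, PPG
probs = fuse_and_classify(feats, params)
```

With `train=False` the function is deterministic, which mirrors the statement that dropout is not active at test time.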

Pre-processing
Prior to the feature extraction and classification steps, the ECG, ABP, and PPG signals were subjected to normalization and segmentation. In the normalization step, the signals are scaled to a range of 0 to 1. The segmentation is performed separately for each of the three signals using a sliding 200-sample window with an overlap of 25%. These segments are fed to their corresponding networks (i.e., the ECG, ABP, and PPG NETs shown in Fig 1) as input sequences. It is worth mentioning that the pre-processing does not include any noise removal and/or filtering steps to remove muscle artifacts and baseline wander.
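A minimal sketch of the two pre-processing steps (scaling to [0, 1] and cutting 200-sample windows with 25% overlap) is given below. The function names, the handling of a flat signal, and the choice to drop trailing samples that do not fill a whole window are assumptions made for illustration.

```python
import numpy as np

def min_max_normalize(signal):
    """Scale a 1D signal to the range [0, 1]."""
    signal = np.asarray(signal, dtype=float)
    lo, hi = signal.min(), signal.max()
    if hi == lo:
        # Flat signal: return zeros to avoid division by zero (assumed behavior).
        return np.zeros_like(signal)
    return (signal - lo) / (hi - lo)

def sliding_segments(signal, window=200, overlap=0.25):
    """Cut a 1D signal into fixed-length windows with fractional overlap.

    Trailing samples that do not fill a whole window are dropped (assumed).
    """
    signal = np.asarray(signal, dtype=float)
    step = int(round(window * (1.0 - overlap)))  # 150 samples for the defaults
    n_segments = (len(signal) - window) // step + 1
    if n_segments <= 0:
        return np.empty((0, window))
    return np.stack([signal[i * step:i * step + window]
                     for i in range(n_segments)])

# Hypothetical usage on a synthetic 1000-sample signal.
raw = np.sin(np.linspace(0.0, 20.0, 1000)) * 3.0 + 1.0
normalized = min_max_normalize(raw)
segments = sliding_segments(normalized)
```

With these defaults consecutive windows share 50 samples, so the last 50 samples of one segment equal the first 50 of the next.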

The model architecture
The following subsections describe the main parts of the automatic feature extraction network. We train a feature extraction network for each of the three input signals separately.

Convolutional neural network (CNN). We employ two consecutive 1D convolutional layers with different numbers of filters, with a max-pooling layer following the first convolutional layer. The first convolutional layer is composed of 32 filters with a kernel size of 2 × 1 and a stride of 1, followed by a Rectified Linear Unit (ReLU) layer. The second, larger convolutional layer has 64 filters with a kernel size of 2 × 1 and a stride of 1, followed by a ReLU layer. The max-pooling layer has a pooling region of size 2 × 1 with a stride of 2 × 1. At each time step, a sequence of a segmented signal (e.g., ECG, ABP or PPG) of size n is fed to the CNN to extract features of interest. The second CNN layer generates D feature maps of size L × 1 for each sample of the input signal, which are converted to L vectors of dimension D as follows:

$$C_t = [C_{t,1}, C_{t,2}, \ldots, C_{t,L}], \quad C_{t,i} \in \mathbb{R}^{D}$$

Here, we have 64 feature maps of size 5 × 1 (see Fig 2).

Single-modal and multi-modal false arrhythmia alarm reduction using deep learning. PLOS ONE | https://doi.org/10.1371/journal.pone.0226990 January 10, 2020

Attention and long short-term memory (LSTM) units. We use an attention unit to learn the most effective parts of the input signal that are responsible for triggering a specific alarm. The attention mechanism has also been used in previous biomedical signal processing studies, such as [17,22], to improve atrial fibrillation classification performance. In [22], the attention modules are placed after the LSTM units so that attention is applied to each 30 s input segment. In contrast, we place the attention units before the LSTM units to focus on the parts of each segment (each segment is divided into a fixed, predefined number of parts, here 10) rather than on whole input segments of the signal. The attention unit assigns a probability value to each part of the signal to specify its importance in the prediction process (e.g., predicting a true or false alarm). For instance, as depicted in Fig 2, the attention unit assigns a probability value to each vector extracted from the input segment by the CNN. Finally, an expected value of the most effective regions of the input segments is generated using the probability values provided by the attention units (represented by the feature vector $C_t$).
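The CNN feature extractor described above (32 filters, max-pooling, then 64 filters, each with a kernel of 2 and ReLU) can be sketched as a NumPy forward pass. The "valid" (no padding) convolutions, the random weights, and the input length of 13 samples are assumptions: the length is chosen here only so that the output has L = 5 positions and D = 64 channels, and the paper's actual input size and padding may differ.

```python
import numpy as np

def conv1d_relu(x, weights, bias):
    """Valid 1D convolution with stride 1, followed by ReLU.

    x: (length, in_channels); weights: (kernel, in_channels, out_channels).
    """
    kernel = weights.shape[0]
    out_len = x.shape[0] - kernel + 1
    out = np.stack([
        np.tensordot(x[i:i + kernel], weights, axes=([0, 1], [0, 1]))
        for i in range(out_len)
    ]) + bias
    return np.maximum(out, 0.0)

def max_pool1d(x, size=2, stride=2):
    """Max-pooling along the time axis of a (length, channels) array."""
    out_len = (x.shape[0] - size) // stride + 1
    return np.stack([x[i * stride:i * stride + size].max(axis=0)
                     for i in range(out_len)])

def cnn_features(part, params):
    """Map a 1D signal part to L feature vectors of dimension D."""
    x = np.asarray(part, dtype=float).reshape(-1, 1)   # single input channel
    x = conv1d_relu(x, params["w1"], params["b1"])     # 32 filters, kernel 2
    x = max_pool1d(x)                                   # pool 2, stride 2
    x = conv1d_relu(x, params["w2"], params["b2"])     # 64 filters, kernel 2
    return x                                            # shape (L, D)

# Hypothetical usage with random weights.
rng = np.random.default_rng(0)
cnn_params = {
    "w1": rng.normal(scale=0.1, size=(2, 1, 32)), "b1": np.zeros(32),
    "w2": rng.normal(scale=0.1, size=(2, 32, 64)), "b2": np.zeros(64),
}
C_t = cnn_features(rng.random(13), cnn_params)  # rows are C_{t,1}, ..., C_{t,L}
```

Each row of `C_t` plays the role of one feature vector that the attention unit later weights.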
Fig 3 illustrates a schematic diagram of the attention unit used in our proposed model. The attention unit receives two inputs: (1) L feature vectors, $C_{t,1}, C_{t,2}, \ldots, C_{t,L}$, where each $C_{t,i}$ represents a different part of the input segment, and (2) a hidden state $h_{t-1}$, which is the internal state of the RNN at the previous time step, t − 1. It then calculates a vector $c_t$, which is a weighted sum over the feature slices $C_{t,i}$. With respect to the aforementioned assumptions, the attention mechanism is formulated as follows. In this formulation, $\alpha_{t,i}$ is the importance of part i of the input segment, and f(.) is a softmax function that takes a vector of L real numbers as input and normalizes them into probability values. First, a vector consisting of a weighted sum over the $C_{t,i}$ and $h_{t-1}$ values is created and passed to the tanh function. Then, the softmax function normalizes the L values of this vector to produce the $\alpha_{t,i}$. In other words, each $\alpha_{t,i}$ is the importance of the corresponding vector $C_{t,i}$ among the L vectors of the input segment. Finally, the attention unit calculates $c_t$, a weighted sum of all vectors $C_{t,i}$ with respect to the $\alpha_{t,i}$. Through this process, the model learns to put more emphasis, via higher probabilities, on the regions of the input segment that are most likely to trigger an alarm (e.g., a false or true alarm) in ICUs.
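One way to realize the attention step described above is the additive-attention sketch below: a weighted sum of each feature vector and the previous hidden state is passed through tanh, reduced to one score per part, normalized with softmax, and used to average the feature vectors. The weight names (`W_c`, `W_h`, `b`, `v`), the intermediate attention size, and the projection vector `v` that reduces each row to a scalar are assumptions; the paper's exact parameterization may differ.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax for a 1D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_step(C_t, h_prev, W_c, W_h, b, v):
    """Compute attention weights over the L parts and the context vector.

    C_t: (L, D) feature vectors; h_prev: (H,) previous hidden state.
    W_c: (D, A), W_h: (H, A), b: (A,), v: (A,) are assumed parameters.
    """
    # Weighted sum of each C_{t,i} and h_{t-1}, passed through tanh.
    scores = np.tanh(C_t @ W_c + h_prev @ W_h + b) @ v   # shape (L,)
    alpha = softmax(scores)                               # importance of each part
    c_t = alpha @ C_t                                     # weighted sum, shape (D,)
    return c_t, alpha

# Hypothetical usage with random parameters.
rng = np.random.default_rng(0)
L, D, H, A = 5, 64, 256, 32
att = {
    "W_c": rng.normal(scale=0.1, size=(D, A)),
    "W_h": rng.normal(scale=0.1, size=(H, A)),
    "b": np.zeros(A),
    "v": rng.normal(scale=0.1, size=A),
}
features = rng.random((L, D))
h_prev = np.zeros(H)
c_t, alpha = attention_step(features, h_prev, **att)
```

Because the weights in `alpha` are non-negative and sum to one, `c_t` is a convex combination of the L feature vectors.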
In order to extract temporal information and capture long-range dependencies between segments of the input signal, we employ a stack of two long short-term memory (LSTM) units of size 256. The LSTM units follow the attention units and take the $c_{t+i}$ values produced by the attention units, together with the previous hidden states of the LSTM units, as inputs to generate the next hidden states. In other words, the LSTM unit takes $c_t$, the output of the attention unit at time t, and $h_{t-1}$, the previous hidden state, and returns the next hidden state $h_t$. The new hidden states are fed both to the attention units, to produce the attention output at the next step, and to the fully-connected layer with a size of 256 (see Fig 2).
Classification layer. This layer specifies the label of the input signal (i.e., true or false alarm) and consists of a fully-connected layer followed by a softmax layer. The softmax layer assigns probabilities that the given input belongs to each of the class labels (i.e., true or false alarm classes). Note that this layer is removed while the model depicted in

Loss calculation
An important caveat in false alarm reduction research is the class imbalance problem: the number of true alarms is much smaller than the number of false alarms. This imbalance degrades the performance of the applied method on the minor class. To tackle this problem, we examined two loss functions, mean false error (MFE) and mean squared false error (MSFE) [23,24], instead of the Mean Squared Error (MSE) commonly used in deep learning algorithms. These loss functions calculate the training error without considering whether the misclassified sample belongs to the major or the minor class. In other words, the MFE and MSFE methods capture the training error of the classes equally, as opposed to the MSE method, which is biased toward the major class in an imbalanced dataset. The loss functions can be defined as follows:

$$l(g_i) = \frac{1}{G_i} \sum_{j=1}^{G_i} (y_j - \hat{y}_j)^2$$

$$l_{MFE} = \sum_{i=1}^{N} l(g_i)$$

$$l_{MSFE} = \sum_{i=1}^{N} l(g_i)^2$$

In the above equations, $g_i$ is the class label (e.g., true or false alarm), $G_i$ is the number of samples in the class $g_i$, N is the number of available classes (in this study, we have two classes), $l(g_i)$ is the error calculated over the class $g_i$, and $y_j$ and $\hat{y}_j$ are the target and predicted outputs of sample j.
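The difference between MSE and the class-balanced losses can be illustrated with a small NumPy sketch. It assumes binary labels (0 for a false alarm, 1 for a true alarm), a predicted probability of the positive class, and a squared-error per-class term; the function names are hypothetical.

```python
import numpy as np

def class_errors(y_true, y_prob, classes=(0, 1)):
    """Mean squared error computed separately within each class."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    errors = []
    for g in classes:
        mask = y_true == g
        if mask.any():  # skip classes absent from the batch (assumed behavior)
            errors.append(np.mean((y_true[mask] - y_prob[mask]) ** 2))
    return np.array(errors)

def mfe_loss(y_true, y_prob):
    """Mean false error: sum of the per-class errors."""
    return class_errors(y_true, y_prob).sum()

def msfe_loss(y_true, y_prob):
    """Mean squared false error: sum of the squared per-class errors."""
    return (class_errors(y_true, y_prob) ** 2).sum()

def mse_loss(y_true, y_prob):
    """Ordinary mean squared error over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    return np.mean((y_true - np.asarray(y_prob, dtype=float)) ** 2)

# Imbalanced toy batch: four false alarms, one true alarm, and a model
# that predicts a low true-alarm probability for every sample.
labels = [0, 0, 0, 0, 1]
probs = [0.1, 0.1, 0.1, 0.1, 0.1]
mse, mfe, msfe = mse_loss(labels, probs), mfe_loss(labels, probs), msfe_loss(labels, probs)
```

In this toy batch the single missed true alarm dominates MFE and MSFE, whereas in MSE it is averaged together with the four well-classified false alarms.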

Training algorithm
In order to effectively train the proposed model via the back-propagation algorithm, we present a two-step training algorithm, as illustrated in Algorithm 1.
Step 1 (lines 1-9) involves extracting the features of interest for a specific input signal (i.e., for each of the ECG, ABP, and PPG signals separately). The pre-trained networks are then used as feature extractors for their corresponding signals (ECG, ABP, and PPG). To apply the pre-trained networks as feature extractors in this step, only the output of the fully-connected layer in the classification layer is used to represent the given signal, and the softmax layer is discarded (i.e., line 8). In step 2 (lines 10-16), the classification task is accomplished using the three signals, as shown in Fig 1. It must be pointed out that the three pre-trained networks are frozen during this training process and only the second part of the model is trained to generate a label. Training in both steps is performed with the same hyper-parameters.
Algorithm 1 Two-step training algorithm for the proposed model

Dataset
We used the publicly available alarm database for ICUs provided by the PhysioNet computing in cardiology challenge 2015 [3,25]. It includes five types of life-threatening arrhythmia alarms: Asystole (ASY), Extreme Bradycardia (EBR), Extreme Tachycardia (ETC), Ventricular Tachycardia (VTA), and Ventricular Flutter/Fibrillation (VFB). The definition and visualization of each alarm are presented in Table 1 and in Fig 5, respectively. The training set includes 750 recordings and the test set includes 500 recordings. The test set is not publicly available yet; therefore, we use the training set for both training and testing purposes. Each recording is composed of two ECG leads and one or more pulsatile waveforms (i.e., the photoplethysmogram (PPG) and/or arterial blood pressure (ABP) waveform). Fig 4 shows a sample of each of the ECG, ABP and PPG signals. The signals were re-sampled to a resolution of 12 bits and a frequency of 250 Hz, and filtered by a finite impulse response (FIR) bandpass filter [0.05 to 40 Hz] and mains notch filters for denoising. The alarms were labeled by a team of experts as either 'true' or 'false'. Table 2 shows the numbers of true and false alarms of each arrhythmia type in the training set.

Experimental results
The performance of the proposed model was evaluated using the PhysioNet challenge 2015 dataset. Since multi-modal prediction is based on the three signals of ECG, ABP and PPG, only the 220 of the 750 recordings that include all these signals are used, whereas all samples are utilized for the single-modal method. The PhysioNet challenge 2015 [25] considered two main events: (i) a real-time setting, in which the information before the alarm onset can be used, and (ii) a retrospective setting, in which up to 30 seconds of data after the alarm can be used. In this study, we focus on the real-time setting, where only information prior to the alarm is used. As mentioned above, using all signals in the learning process lets the model benefit from all available information and extract the correlation between the different modalities.

Table 1. Alarm type definitions.
Alarm Type | Definition
Asystole (ASY) | There might not be heartbeats for more than 4 s in the signal
Extreme Bradycardia (EBR) | The heart rate is less than 40 beats per minute (bpm)
Extreme Tachycardia (ETC) | The heart rate would be greater than 140 bpm for 17 consecutive beats
Ventricular Tachycardia (VTA) | A sequence of five or more ventricular beats with the heart rate greater than 100 bpm in the signal
Ventricular Flutter/Fibrillation (VFB) | A rapid fibrillatory, flutter, or oscillatory waveform for at least 4 seconds in the signal
HR: Heart rate
https://doi.org/10.1371/journal.pone.0226990.t001

We used a k-fold cross-validation approach with k = 10 to train and test the proposed model unless explicitly stated otherwise. That is, we divided the dataset into k = 10 folds; in each round, one fold is used for evaluating the model and the remaining 9 folds are used to train it. In the end, all evaluation results were concatenated. It is worth noting that the pre-training and fine-tuning steps were performed in every round of the cross-validation. Both the whole model and the three networks (ECG, ABP and PPG Nets) were trained with a maximum of 100 epochs and a mini-batch size of 10. The RMSProp optimizer was applied to minimize the $l_{MFE}$ loss with a learning rate of α = 0.001. Two different regularization techniques were used to prevent overfitting. First, a dropout layer with a drop probability of 0.5 was used (as shown in Fig 1). At every training iteration, the dropout function randomly selects some nodes and removes them along with their connections. Second, an additional L2 regularization term with β = 0.001 was added to the loss function. This kind of regularization penalizes model parameters with large values and, as a result, helps prevent unstable learning (i.e., the exploding gradient problem). The model was implemented in Python using the Google TensorFlow deep learning library.
Furthermore, a machine with 8 CPUs (Intel(R) Xeon(R) CPU @ 3.60 GHz), 32 GB of memory and Ubuntu 16.04 was used to run the k-fold cross-validation. The training time for each epoch was 98 seconds on average and the testing time for each batch of 20 EEG epochs was approximately 0.102 seconds. Different metrics were considered to assess the performance of the proposed model: accuracy (ACC), sensitivity (SEN), specificity (SPE), precision (PRE), F1-score, and area under the ROC curve (AUC). We also report the PhysioNet Challenge 2015 score for our proposed method. It is defined as score = (TP + TN)/(TP + TN + FP + 5 × FN), where TP is the number of true positives, FP of false positives, FN of false negatives, and TN of true negatives. All results are reported as an average over the k folds, where k is set to 5 or 10.
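The count-based metrics listed above, including the challenge score with its five-fold penalty on false negatives, can be computed as in the following sketch. The function names, the label convention (1 for a true alarm, 0 for a false alarm), and the convention of returning 0.0 when a denominator is zero are assumptions; AUC is omitted because it requires continuous scores rather than counts.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 = true alarm."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def alarm_metrics(tp, tn, fp, fn):
    """Compute ACC, SEN, SPE, PRE, F1 and the PhysioNet 2015 score."""
    def ratio(num, den):
        # Return 0.0 for an empty denominator (assumed convention).
        return num / den if den else 0.0

    sen = ratio(tp, tp + fn)
    spe = ratio(tn, tn + fp)
    pre = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sen,
        "specificity": spe,
        "precision": pre,
        "f1": ratio(2 * pre * sen, pre + sen),
        # A false negative (suppressed true alarm) is penalized five times.
        "score": ratio(tp + tn, tp + tn + fp + 5 * fn),
    }

# Hypothetical counts, not results from the paper.
metrics = alarm_metrics(tp=15, tn=100, fp=6, fn=3)
```

Because of the factor of 5 on FN, the challenge score is always at most the accuracy, and the gap grows as more true alarms are missed.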

Results and discussion
Table 3 compares the alarm classification performance (true versus false alarm) of our proposed method with that of other methods in the literature when three signals (i.e., ECG II, ABP and PPG) are considered. It can be seen from the table that our model significantly outperforms the other methods. We also evaluated our single-modal approach (using just one single lead) to highlight how the outcome differs. Table 3 demonstrates that the multi-modal approach leads to better performance than the single-modal one.
The results provided in Table 3 are for the 220 samples of the dataset with three available signals, aggregated over all alarm types. We also evaluated our model on the samples with the Ventricular Tachycardia alarm type only. There were two main reasons for selecting this alarm type: (1) the numbers of samples for the other life-threatening arrhythmia alarm types were too small, namely Asystole (34: 4 true and 30 false alarms), Extreme Bradycardia (30: 21 false and 9 true alarms), Extreme Tachycardia (15: 14 false and 1 true alarms), and Ventricular Flutter/Fibrillation (17: 12 false and 5 true alarms), compared with Ventricular Tachycardia (124: 106 false and 18 true alarms); and (2) Ventricular Tachycardia alarms are more difficult to detect than the other alarm types [25]. Table 4 shows the performance of our proposed model for the Ventricular Tachycardia alarm type using a single-lead signal and multi-lead signals. Our method achieves strong results for both the multi-modal and the single-modal (ECG II) approaches: a sensitivity and a specificity of 93.75% and 93.92% for the single-modal technique, and a sensitivity and a specificity of 93.75% and 95.49% for the multi-modal technique. As shown in the table, our method significantly outperforms the other method. It can also be seen that using all available signals performs better than using the single-lead signal. The reason behind this improvement is that the multi-modal approach integrates information from three input signals, which enables the model to perform better.
We also investigated how our model behaves for all alarm types using single-lead ECG waveforms. Table 5 compares the performance of various algorithms using different signals in terms of true positive rate (TPR, also called sensitivity), true negative rate (TNR, also called specificity), and AUC. As can be seen in Table 5, the proposed method performs better than the methods proposed by Lehman et al. [11] and Li et al. [26] on the Ventricular Tachycardia (VTA) alarm. Furthermore, our method using single-lead ECG (ECG II) detects Extreme Bradycardia (EBR), Extreme Tachycardia (ETC) and Ventricular Flutter/Fibrillation (VFB) alarms significantly better than other methods using two-lead ECG (Lehman et al. [11]) or all available signals, including ECG II, ECG V, ABP and PPG (Ansari et al. [27] and Gajowniczek et al. [10]). Moreover, as shown in Table 5, our proposed single-modal method leads to comparable results (in some cases, even better outcomes) for detecting the Asystole (ASY) and Ventricular Tachycardia (VTA) alarm types compared to the other listed algorithms, which have utilized more than one signal. We note that these results were obtained using a single-lead ECG (ECG II); having more than one modality would lead to a further improvement in performance.

[...] Physionet), in which 562 records contain VTA alarms

In addition, we tested our proposed method without the attention mechanism in the network and with the MSE loss function. Table 6 presents the evaluation results with various metrics. As can be seen from the table, our proposed method, which uses the attention module and the mean false error (MFE) loss, achieved significantly better results than the variants that do not employ the attention mechanism or that use the MSE loss function instead of the MFE loss function. Furthermore, Table 7 reports the evaluation results of our proposed single-modal method with various metrics, including the challenge score defined by the PhysioNet Challenge 2015, using just the ECG II signal. This table can serve as a reference for comparison in future work.

Conclusion
False arrhythmia alarm reduction in ICUs is a challenging classification problem because of the presence of different sources of noise and artifacts in the data (i.e., the collected signals) as well as the large number of false alarms, which results in the class imbalance problem. In this study, we proposed a deep learning-based network composed of CNN layers, an attention mechanism, and LSTM units to reduce false arrhythmia alarms in ICUs. We also utilized a new loss function to alleviate the effect of the class imbalance problem while training the model. Our proposed approach uses a two-step training algorithm that first trains the model for each modality (i.e., ECG, ABP, and PPG) to efficiently extract features, and then uses the combined features of the modalities to classify the three-signal input as a true or false alarm (i.e., in a multi-modal way). Our proposed multi- and single-modal approaches demonstrated high performance in suppressing false alarms without disregarding true alarms, compared to the existing algorithms in the literature.