
Deep VMD-attention network for arrhythmia signal classification based on Hodgkin-Huxley model and multi-objective crayfish optimization algorithm

  • Hang Zhao,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliation School of Physics and Optoelectronic Engineering, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China

  • Xiongfei Yin

    Roles Conceptualization, Funding acquisition, Supervision, Visualization, Writing – review & editing

    yinxiongfei888@163.com

    Affiliation School of Physics and Optoelectronic Engineering, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China

Abstract

Recent research on arrhythmia classification is increasingly based on AI-driven approaches, which are primarily grounded in ECG data but often neglect the mathematical foundations of cardiac electrophysiology. In this work, a finite element model (FEM) of the human heart, grounded in the Hodgkin-Huxley (HH) model, was established to simulate cardiac electrophysiology, and ECG signals from 200 representative points were acquired. Two types of arrhythmia characterized by significant anomalies in the variables of the HH model were simulated, and the corresponding synthetic ECG signals were generated. A multi-objective optimization method based on non-dominated sorting was integrated into the crayfish optimization algorithm (MOCOA). To optimize the key parameters K and α in variational mode decomposition (VMD), a MOCOA-VMD technique specifically tailored for ECG signal processing was developed. The Pareto optimal front was generated by MOCOA with spectral kurtosis and KL divergence as indicators, from which the optimal intrinsic mode functions were obtained. A deep VMD-attention network based on MOCOA was then developed for ECG signal classification. An ablation study evaluated the effectiveness of the proposed signal decomposition method and the deep attention modules. The model based on MOCOA-VMD achieves the highest accuracy of 94.46%, outperforming models constructed using EEMD, VMD, CNN and LSTM modules. Bayesian optimization was employed to fine-tune the hyperparameters and further enhance the performance of the deep model, with the best accuracy of the deep attention model after TPE optimization reaching 96.11%. Moreover, the real-world MIT-BIH arrhythmia database was utilized for further validation to demonstrate the robustness and generalizability of the proposed model. The proposed deep VMD-attention modeling and classification strategy shows significant promise and may offer valuable inspiration for other signal processing fields as well.

Introduction

Cardiovascular disease (CVD) is a leading global cause of morbidity and mortality, responsible for more than 31% of worldwide deaths in 2020 [1]. Cardiac arrhythmias, defined as deviations from the normal heart rate or rhythm that cannot be explained by physiological factors [2], represent a significant subset of CVD. These arrhythmias can present as irregular heartbeats, excessively fast heart rates (tachycardia), or abnormally slow heart rates (bradycardia), which can be benign or indicate serious underlying heart conditions. Timely and precise identification of arrhythmias not only facilitates appropriate therapeutic interventions but also aids in preventing potential complications and improving patient outcomes [3].

The traditional rule-based approach to diagnosis often struggles with the complexity and heterogeneity of large-scale medical data, necessitating intensive analysis and substantial medical expertise to achieve reliable diagnostic outcomes. In recent years, the application of artificial intelligence (AI) in electrocardiogram (ECG) interpretation has become a transformative development within the cardiovascular domain. Leveraging vast repositories of digitized clinical ECG data, researchers have developed sophisticated AI models designed to identify various cardiac conditions, such as left ventricular dysfunction, asymptomatic atrial fibrillation, and hypertrophic cardiomyopathy. Additionally, these models can accurately predict demographic and phenotypic features, including age, gender, race, and other phenotypes [4–6]. The integration of AI into ECG analysis not only improves diagnostic precision but also enables timely and targeted interventions, ultimately leading to better patient care and improved health outcomes.

Recent advancements in artificial intelligence, particularly deep learning models such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks [7–9], have revolutionized the field of ECG interpretation [10]. These AI-driven approaches have demonstrated exceptional accuracy in detecting nuanced signals and patterns that are typically undetectable by human experts, elevating the ECG to a highly effective, non-invasive biomarker for cardiovascular diagnostics. Despite these achievements, the classification of arrhythmias remains a formidable challenge due to the diverse and complex etiologies underlying ECG signal abnormalities. These abnormalities can be attributed to a multitude of factors, including coronary heart disease, heart failure, cardiac pacing, hypertension, cardiomyopathy, heart valve disease, and electrolyte imbalances. The heterogeneity of these contributing factors significantly complicates the task of accurately classifying arrhythmias. Furthermore, the combination of variational mode decomposition (VMD) and machine learning is an emerging and promising approach for analyzing and modeling complex signals, particularly in time-series analysis and predictive modeling [11]. VMD is a signal decomposition technique that decomposes a signal into intrinsic mode functions (IMFs), while machine learning excels at learning complex patterns from data. Integrating VMD and machine learning can leverage the strengths of both methods to improve model performance and interpretability.

The majority of existing deep-learning techniques for arrhythmia classification are primarily based on ECG data, often neglecting the fundamental principles of cardiac electrophysiology [3]. This limitation underscores the need for a more integrated approach that combines the strengths of AI with a solid understanding of the underlying physiological mechanisms. In this context, this paper seeks to bridge this gap by incorporating the mathematical foundations of cardiac electrophysiology into AI-driven ECG analysis, thereby enhancing the accuracy and interpretability of arrhythmia classification.

This paper presents a framework for mechanism-informed arrhythmia signal classification and is organized as follows. The Theoretical foundations for electrocardio signal modeling and processing section elaborates the theoretical foundations for electrocardio signal modeling and processing, including the Hodgkin-Huxley model, the finite element method and variational mode decomposition. The finite element model and the arrhythmia models induced by abnormalities in the PDEs are established in the Modeling of the human heart and arrhythmia electrocardiosignal acquisition section, where the typical synthetic signals of the models are prepared. In the Variational Mode Decomposition based on Multi-objective COA section, the multi-objective crayfish optimization algorithmic variational mode decomposition is proposed and the optimal IMFs are obtained. The deep attention model for classification is constructed in the Learning model establishment and optimization for classification section, where the modules are validated by ablation studies and the hyperparameters are optimized by the Bayesian algorithm. Finally, the Conclusion section offers discussion and concluding remarks.

Related works

As noted in the preceding section, a number of AI-based ECG classification methods have been presented in the literature.

Shu Lih Oh et al. [7] propose an automated system using a combination of CNN and LSTM for the diagnosis of normal sinus rhythm, left bundle branch block (LBBB), right bundle branch block (RBBB), atrial premature beats (APB) and premature ventricular contraction (PVC) on ECG signals. The model achieves an accuracy of 98.10%, sensitivity of 97.50% and specificity of 98.70% using a ten-fold cross validation strategy. A similar architecture of the deep model consisting of CNN and LSTM can be seen in [8], which achieved an average accuracy of 97.15%.

Sumanta Kuila et al. [12, 13] utilize leading algorithms such as K-nearest neighbor, artificial neural network, and support vector machine to classify different features of ECG signals. In their work, a novel classification algorithm is proposed based on ELM (Extreme Learning Machine) with Recurrent Neural Network (RNN) by using morphological filtering. The experimental results with the MIT-BIH database, using hidden neurons of ELM with RNN, show an accuracy of 96.41%, sensitivity of 93.62% and specificity of 92.66%.

Sayli Siddhasanjay Aphale et al. [14] proposed a novel convolutional neural network named ArrhyNet for MIT-BIH arrhythmia classification. A low-pass filter and a baseline wander filter, along with the Synthetic Minority Over-sampling Technique (SMOTE), are utilized to enhance the performance. The results indicate that the top-1 accuracy of the five-class classification system for the database used is 92.73%.

Sadegh Ilbeigipour et al. [15] developed and compared three classifiers: decision trees, random forest, and logistic regression. The performance of the random forest classifier was much higher than that of the decision trees and logistic regression. The model was tested on the MIT-BIH database and achieved 88% accuracy for 3-class classification.

Qi Meng et al. [16] propose a multi-database integration methodology and a heartbeat self-processing method to eliminate data differences. A 5-layer fully connected neural network for 16-class classification was built and verified on Hercules-3, a database integrating three source databases, achieving an accuracy rate of up to 98.67%. Hany El-Ghaish et al. [17] develop a deep learning framework called ECGTransForm by embedding a novel Bidirectional Transformer (BiTrans) mechanism, ensuring robust spatial feature extraction across various granularities.

Currently, most AI-driven models for arrhythmia classification focus primarily on ECG data, ignoring the fundamentals of cardiac electrophysiology. This limitation highlights the need for a more integrated approach that combines AI’s strengths with a solid understanding of physiological mechanisms. If a precise mathematical model of the ECG is developed, the accuracy of predictions made by deep models can also be expected to improve significantly. Using the mathematical foundations of cardiac electrophysiology, this paper aims to bridge the gap between electrophysiology and AI-driven ECG analysis.

Theoretical foundations for electrocardio signal modeling and processing

Modelling theory of cardiac electrophysiology

Biological systems exhibit numerous electrical activities in response to external stimuli, which are crucial for neuron-based information processing applications. However, the impact of these electrical activities on encoding and decoding technologies in neuroscience has not yet been fully understood or well established.

There are various neuron models developed to imitate certain biological neuron dynamics [18–21]. The Hodgkin-Huxley (HH) model is a mathematical model that describes how action potentials in neurons are initiated and propagated. This model was developed by Alan Hodgkin and Andrew Huxley in 1952 based on their experimental work on the giant axon of the squid [22]. The HH model is considered one of the most influential models in neuroscience due to its accuracy in describing the electrical properties of neurons and its ability to explain the underlying mechanisms of neuronal excitability. Subsequently, variants based on the HH model emerged, such as the FitzHugh-Nagumo (FHN) model [18, 23], a simplified version of the Hodgkin-Huxley model, which captures the essential features of excitability and action potential generation in neurons [24]. The FHN model reduces the complexity of the full Hodgkin-Huxley model while preserving the key dynamics of spike generation and propagation, which facilitates the analysis of fundamental action potential and excitability dynamics, broadening its applicability for studying excitable systems in a more accessible mathematical framework.

The Hodgkin-Huxley model is described by a set of four coupled nonlinear ordinary differential equations. The membrane potential equation is expressed by

C_m \frac{dV}{dt} = I - \bar{g}_{Na} m^3 h (V - E_{Na}) - \bar{g}_K n^4 (V - E_K) - \bar{g}_L (V - E_L) (1)

where V is the membrane potential, I is the external applied current, C_m is the membrane capacitance, \bar{g}_{Na}, \bar{g}_K, \bar{g}_L are the maximum conductances for sodium (Na), potassium (K), and leak currents, respectively, and E_{Na}, E_K, E_L are the reversal potentials for sodium, potassium, and leak currents, respectively.

m, h, and n are gating variables representing the sodium activation, sodium inactivation and potassium activation channels, respectively, whose dynamics can be modeled by the gating variable equations

\frac{dx}{dt} = \alpha_x(V)(1 - x) - \beta_x(V)\,x, \quad x \in \{m, h, n\} (2)

where \alpha_x(V) and \beta_x(V) are rate functions that describe the opening and closing rates of the ion channels, which are defined by

\alpha_m = \frac{0.1(V + 40)}{1 - e^{-(V + 40)/10}}, \quad \beta_m = 4 e^{-(V + 65)/18}, \quad
\alpha_h = 0.07 e^{-(V + 65)/20}, \quad \beta_h = \frac{1}{1 + e^{-(V + 35)/10}}, \quad
\alpha_n = \frac{0.01(V + 55)}{1 - e^{-(V + 55)/10}}, \quad \beta_n = 0.125 e^{-(V + 65)/80} (3)
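As an illustration, the four coupled ODEs of the HH model can be integrated with a simple forward-Euler scheme. The sketch below uses the textbook squid-axon rate functions and constants (the paper's own parameter settings may differ), with the gating variables initialized at their steady-state values:

```python
import numpy as np

# Standard Hodgkin-Huxley rate functions (V in mV); textbook squid-axon forms.
def alpha_m(V): return 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
def beta_m(V):  return 4.0 * np.exp(-(V + 65.0) / 18.0)
def alpha_h(V): return 0.07 * np.exp(-(V + 65.0) / 20.0)
def beta_h(V):  return 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
def alpha_n(V): return 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
def beta_n(V):  return 0.125 * np.exp(-(V + 65.0) / 80.0)

def simulate_hh(I=10.0, T=50.0, dt=0.01,
                Cm=1.0, gNa=120.0, gK=36.0, gL=0.3,
                ENa=50.0, EK=-77.0, EL=-54.387):
    """Forward-Euler integration of the HH equations; I in uA/cm^2, time in ms."""
    V = -65.0
    # gating variables start at their steady-state values at the resting potential
    m = alpha_m(V) / (alpha_m(V) + beta_m(V))
    h = alpha_h(V) / (alpha_h(V) + beta_h(V))
    n = alpha_n(V) / (alpha_n(V) + beta_n(V))
    trace = []
    for _ in range(int(T / dt)):
        INa = gNa * m**3 * h * (V - ENa)       # sodium current
        IK = gK * n**4 * (V - EK)              # potassium current
        IL = gL * (V - EL)                     # leak current
        V += dt * (I - INa - IK - IL) / Cm     # membrane potential equation
        m += dt * (alpha_m(V) * (1 - m) - beta_m(V) * m)  # gating dynamics
        h += dt * (alpha_h(V) * (1 - h) - beta_h(V) * h)
        n += dt * (alpha_n(V) * (1 - n) - beta_n(V) * n)
        trace.append(V)
    return np.array(trace)
```

With a sustained current of about 10 uA/cm^2 the model fires a train of action potentials peaking above 0 mV, while with no applied current it rests near −65 mV.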

Finite element method for numeric solving

Developed in the early 1960s, the finite element method (FEM) has become one of the most widely used numerical analysis approaches in engineering. By simulating the physical behavior of objects and systems, engineers are able to optimize designs, ensure safety, and reduce development costs [25]. Consider a multidimensional steady-state heat conduction process as an example, which can be described by the Poisson equation with homogeneous boundary conditions

-\Delta u = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega (4)

with domain \Omega \subset \mathbb{R}^d. Here, u is the unknown function to be solved, and f is the known function (source term). The weak form of Eq 4 can be obtained by choosing a function v from a space U of smooth functions and performing the inner product of both sides with v, i.e.,

\int_\Omega (-\Delta u)\, v \, dx = \int_\Omega f v \, dx, \qquad \forall v \in U (5)

Assume that, in addition to having the necessary smoothness, the functions that are to be the solutions also satisfy the boundary conditions. The space U of test functions is of the form U = \{v : v \text{ sufficiently smooth}, \ v = 0 \text{ on } \partial\Omega\}. More specifically, integrating by parts (the boundary term vanishes since v = 0 on \partial\Omega) and letting d = 2, the weak form of Eq 4 can be written as

\int_\Omega \left( \frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\frac{\partial v}{\partial y} \right) dx\,dy = \int_\Omega f v \, dx\,dy (6)

To obtain a numerical method, it requires U to be finite-dimensional with basis \{\varphi_1, \ldots, \varphi_n\}. Then the approximate solution of Eq 4 can be represented as

u_h = \sum_{j=1}^{n} c_j \varphi_j (7)

where the coefficients c_j are to be determined once a basis has been chosen for the approximation space U. By inserting u_h into the weak form Eq 6, and selecting the basis functions of U as trial functions v, a system of equations is obtained

\sum_{j=1}^{n} c_j \int_\Omega \nabla\varphi_j \cdot \nabla\varphi_i \, dx = \int_\Omega f \varphi_i \, dx, \qquad i = 1, \ldots, n (8)

which is known as the Ritz-Galerkin method and can be written in matrix form Ac = b. The expression of the stiffness matrix A is

A_{ij} = \int_\Omega \nabla\varphi_j \cdot \nabla\varphi_i \, dx (9)
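As a concrete illustration of the Ritz-Galerkin assembly (Eqs 7–9), the sketch below solves the one-dimensional analogue −u″ = f on (0,1) with homogeneous boundary conditions, using piecewise-linear "hat" basis functions on a uniform mesh (this 1D example is ours, not the paper's heart model):

```python
import numpy as np

def solve_poisson_1d(f, n=50):
    """Ritz-Galerkin solution of -u'' = f on (0,1), u(0) = u(1) = 0,
    with piecewise-linear hat basis functions on a uniform mesh."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)        # nodes; interior nodes are x[1..n-1]
    # Stiffness matrix A_ij = integral of phi_j' * phi_i': tridiag(-1, 2, -1) / h
    A = (np.diag(2.0 * np.ones(n - 1))
         - np.diag(np.ones(n - 2), 1)
         - np.diag(np.ones(n - 2), -1)) / h
    # Load vector b_i = integral of f * phi_i, approximated as h * f(x_i)
    b = h * f(x[1:n])
    c = np.linalg.solve(A, b)               # coefficients c_j of the expansion
    return x, np.concatenate(([0.0], c, [0.0]))
```

For f ≡ 1 the exact solution is u(x) = x(1 − x)/2, and the linear-element solution reproduces it exactly at the nodes, a well-known property of 1D piecewise-linear FEM.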

Theoretical basis of variational mode decomposition

Variational Mode Decomposition (VMD) [26] represents a recent advancement in signal processing, designed to decompose a multi-component signal f into a series of K quasi-orthogonal intrinsic mode functions (IMFs) uk. Each IMF exhibits specific sparsity characteristics and limited bandwidth, obtained through a non-recursive decomposition process. The variational formulation and the optimization-based approach make VMD a powerful and flexible tool for multi-component signal analysis and decomposition, with advantages over traditional data-driven methods like Empirical Mode Decomposition (EMD) [27], Ensemble Empirical Mode Decomposition (EEMD) [28], etc., which recursively decompose a non-stationary signal into IMFs and thus face robustness and mode-mixing problems [29].

The essential aspect of the VMD is solving a constrained variational formulation written as

\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_k u_k = f (10)

where \{u_k\} := \{u_1, \ldots, u_K\} and \{\omega_k\} := \{\omega_1, \ldots, \omega_K\} are shorthand designations for the set of all modes and their center frequencies, respectively. \partial_t is the partial derivative of the function at the time t, and \delta(t) is the unit impulse function. \sum_k denotes the summation over all modes. In order to render the problem unconstrained, both a quadratic penalty term and a Lagrangian multiplier \lambda are considered, and the augmented Lagrangian is introduced as

L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_k u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_k u_k(t) \right\rangle (11)

Therefore, the original minimization problem Eq 10 can be solved by finding the saddle point of the augmented Lagrangian in a sequence of iterative sub-optimizations known as the alternating direction method of multipliers (ADMM) [30]. After pre-setting the decomposition mode number K and the quadratic penalty term \alpha, the decomposed mode u_k and its associated center frequency \omega_k, along with the Lagrangian multiplier \lambda, are initialized. Then the updating strategy for the k-th mode u_k(t) is

\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2} (12)

where \hat{u}_k is the Fourier transform of the k-th mode, \hat{f} is the Fourier transform of the original signal, \hat{\lambda} is the Fourier transform of the Lagrangian multiplier, and n is the iteration index. The center frequencies are updated to minimize the bandwidth of each mode by

\omega_k^{n+1} = \frac{\int_0^\infty \omega \left| \hat{u}_k(\omega) \right|^2 d\omega}{\int_0^\infty \left| \hat{u}_k(\omega) \right|^2 d\omega} (13)

The Lagrangian multiplier is updated to enforce the constraint that the sum of the modes equals the original signal by

\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^n(\omega) + \tau \left( \hat{f}(\omega) - \sum_k \hat{u}_k^{n+1}(\omega) \right) (14)

where \tau is the update parameter for the Lagrangian multiplier. The criterion of convergence is

\sum_k \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^n \right\|_2^2}{\left\| \hat{u}_k^n \right\|_2^2} < \epsilon (15)

where \epsilon is typically set to 10^{-7}. The value of K is constrained by the complexity of the signal and computational resources, normally less than 10. Finally, the K modes of the original signal are obtained.
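The ADMM loop of Eqs 12–14 can be prototyped compactly. The sketch below is a simplified variant for real signals that operates on the half-spectrum, with a fixed iteration count and no boundary mirroring, so it illustrates the update scheme rather than reproducing the reference implementation of [26]:

```python
import numpy as np

def vmd(f, K=2, alpha=2000.0, tau=0.1, n_iter=300):
    """Simplified VMD: ADMM mode/frequency/multiplier updates on the half-spectrum."""
    T = len(f)
    f_hat = np.fft.rfft(f)
    omega_grid = np.fft.rfftfreq(T)           # normalized frequencies in [0, 0.5]
    u_hat = np.zeros((K, len(f_hat)), dtype=complex)
    omega = 0.5 * np.arange(K) / K            # initial center frequencies
    lam = np.zeros_like(f_hat)                # Lagrangian multiplier (freq. domain)
    for _ in range(n_iter):
        for k in range(K):
            # residual excluding mode k, then Wiener-filter-like mode update
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (omega_grid - omega[k])**2)
            # center frequency = power-weighted mean frequency of the mode
            power = np.abs(u_hat[k])**2
            omega[k] = np.sum(omega_grid * power) / np.sum(power)
        # dual ascent on the reconstruction constraint
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
    modes = np.fft.irfft(u_hat, n=T)          # back to the time domain
    return modes, omega
```

On a two-tone test signal the recovered center frequencies converge to the two tone frequencies, and the sum of the modes reconstructs the input.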

The selection of the number of intrinsic mode functions K and the penalty term \alpha is paramount when implementing the variational mode decomposition method. These parameters significantly influence the decomposition process and the quality of the resulting intrinsic mode functions: K influences the granularity of the decomposition, while \alpha controls the smoothness of the resulting modes. Therefore, optimizing the number of intrinsic mode functions and the penalty term in variational mode decomposition is crucial for achieving accurate and meaningful signal decomposition, which balances the need to capture all relevant signal components against the risk of introducing noise or artifacts. Various optimization methods, such as grid search, Bayesian optimization, and meta-heuristic optimization, can be used to achieve the best decomposition results [31–33].

Popular meta-heuristic optimization techniques are typically inspired by physical phenomena, species’ behaviors, biological evolution, or laws governing social operations [34, 35]. Many emergent behavioral patterns in organisms have been optimized through the evolutionary process of natural selection, and have been identified separately and used in nature-inspired algorithms. There are plentiful evolutionary algorithms and swarm algorithms built on the foundations of naturally occurring mechanisms, such as Genetic Algorithm (GA) [36], Particle Swarm Optimization (PSO) [37], Gray Wolf Optimization (GWO) [38], Gold Rush Optimizer (GRO) [39], Rime Optimization Algorithm (RIME) [40], Crayfish Optimization Algorithm (COA) [41], Beluga Whale Optimization (BWO) [42] and so on. Meta-heuristic optimization algorithms are playing an increasingly important role in optimization and decision-making problems in various engineering fields and will also be employed in this paper.

Attention scheme for sequencing signal feature extraction

The Transformer model [43] has proven a great success in sequencing signal feature extraction, making a significant contribution to natural language processing (NLP) tasks, and has become the foundation for many state-of-the-art deep network models, such as BERT [44], GPT [45] and so on. Effectively capturing long-range dependencies and relationships in the input sequence is made possible by the attention scheme, which is a key component and a powerful mechanism of the Transformer model. Instead of processing the entire input sequence uniformly, the attention mechanism enables the model to selectively focus on relevant parts of the input when generating the output.

The attention scheme in the Transformer model mainly consists of three matrix components, i.e., Query (Q), Key (K), and Value (V), which are linearly transformed from the input sequences. The dot product between Q and K is calculated and divided by \sqrt{d_k} to prevent the result from being too large, where d_k is the dimension of the Key vectors. After applying the Softmax operation to normalize the results, a probability distribution is obtained, which is finally multiplied by the matrix V to yield the weighted-sum representation as the final output, as shown in Eq 16.

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{QK^T}{\sqrt{d_k}} \right) V (16)

The Transformer model utilizes multiple attention heads, each with its own set of Query, Key, and Value matrices, which facilitates the model capturing different types of relationships and dependencies in the input sequence.

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h) W^O, \quad \mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V) (17)

where the projections are parameter matrices W_i^Q \in \mathbb{R}^{d_{model} \times d_k}, W_i^K \in \mathbb{R}^{d_{model} \times d_k}, W_i^V \in \mathbb{R}^{d_{model} \times d_v} and W^O \in \mathbb{R}^{h d_v \times d_{model}}.

The position-wise feed-forward network performs two linear transformations with a ReLU activation function [46] that can be modeled as

\mathrm{FFN}(x) = \max(0,\, x W_1 + b_1) W_2 + b_2 (18)

where W_1, W_2 are the weights and b_1, b_2 are the biases of the linear transformations, which are the same across different positions but vary from layer to layer. Each layer of the Transformer Encoder module consists of two primary components, multi-head self-attention mechanisms and position-wise feed-forward networks. All the parts work together to interpret the input sequence into a high-level abstraction, making the Transformer model a powerful architecture for dealing with sequential data.
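The attention and feed-forward computations above amount to a few matrix operations. A minimal NumPy sketch (single attention head, illustrative shapes only; the multi-head projections are omitted for brevity):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    dk = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(dk)
    return softmax(scores, axis=-1) @ V

def feed_forward(x, W1, b1, W2, b2):
    """Position-wise feed-forward network with ReLU: max(0, xW1 + b1)W2 + b2."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2
```

Each row of the softmax-normalized score matrix sums to one, so every output position is a convex combination of the Value rows.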

Modeling of the human heart and arrhythmia electrocardiosignal acquisition

Finite element model establishment

The task of modeling the three-dimensional human heart is intricate and requires the integration of data from multiple imaging modalities, anatomical knowledge, and computational techniques. Specifically, high-resolution Magnetic Resonance Imaging (MRI) data are first acquired; cleaning and preprocessing algorithms are then applied to improve image quality and prepare the data for segmentation, from which the heart structures are extracted. Finally, a 3D model is constructed from the segmented 2D slices by stacking and interpolating between the slices, and can be rendered by visualization tools. In this paper, a full-scale, three-dimensional model of the human heart is established from MRI data, as shown in Fig 1(A).

Fig 1. Model establishment of the human heart.

(A) 3D geometry. (B) Finite element model. (C) Initial condition.

https://doi.org/10.1371/journal.pone.0321484.g001

The common values of the parameters in the HH model are examined and determined as follows: C_m = 1 \,\mu F/cm^2, \bar{g}_{Na} = 120 \, mS/cm^2, \bar{g}_K = 36 \, mS/cm^2, \bar{g}_L = 0.3 \, mS/cm^2, E_{Na} = 50 \, mV, E_K = -77 \, mV, E_L = -54.4 \, mV.

For simulations, the typical initial condition of the membrane potential is set to V_0 = -65 \, mV, and the initial gating variables are usually set to their steady-state values at the resting potential, calculated by

x_\infty = \frac{\alpha_x(V_0)}{\alpha_x(V_0) + \beta_x(V_0)}, \quad x \in \{m, h, n\} (19)

The potential distribution V is initialized by setting one quadrant of the heart to a constant, elevated potential, while the rest remains at zero. The logical expression of the boundary condition can be written as

(20)

where TRUE evaluates to 1 and FALSE to 0, and d is set to 10^{-5}, which is included in the expressions to shift the elevated potential slightly off the main axes.
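The quadrant-wise initialization of Eq 20 can be sketched as a simple mask on a coordinate grid. In the sketch below, the grid and the elevated value V_elev are hypothetical placeholders for illustration (the paper's actual amplitude is not reproduced here); the TRUE/FALSE logic maps to 1/0 exactly as described:

```python
import numpy as np

def initial_potential(x, y, V_elev=30.0, d=1e-5):
    """One quadrant at an elevated potential, the rest at zero (cf. Eq 20).
    V_elev is a hypothetical placeholder value; the boolean mask evaluates
    to 1 (TRUE) inside the shifted quadrant and 0 (FALSE) elsewhere."""
    return V_elev * ((x > d) & (y > d))

# Illustrative 2D slice of the domain
x, y = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))
V0 = initial_potential(x, y)
```

The offset d keeps the quadrant boundary slightly off the main axes, matching the role it plays in the boundary-condition expression.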

The simulation platform COMSOL Multiphysics is adopted for partial differential equation (PDE) solving and result visualization, which allows for coupling various physical fields, including cardiac electrophysiology and mechanics. Due to a plethora of irregular and complex surfaces in the 3D model, the free tetrahedron mesh is employed as the partitioning strategy. The finite element mesh model that consists of 30245 elements across all domains is shown in Fig 1(B), and the initial condition of Eq 19 is visualized in Fig 1(C).

Through FEM simulation results, the temporal variation pattern of the action potential throughout the heart model region is revealed. Five representative points within the heart region are extracted for further analysis. As shown in Fig 2, the action potential signals at these representative points exhibit typical and homogeneous electrocardiographic characteristics. The different phases demonstrate the propagation of electrical signals through the heart tissues. For convenience, the original model is named model 0, and the electrocardiographic signals are identified as signal 0.

Various arrhythmia models induced by PDE parameters

According to the Hodgkin-Huxley model, it can be deduced that the parameters in the partial differential Eq 1 and the gating variable Eq 2 play essential roles in the propagation behaviors of electrocardiographic signals. Therefore, the key anomalies characterized by I and h in the mathematical model of HH are examined in detail. An exponentially attenuated sinusoidal function is designed for simulating the HH model’s abnormal dynamics caused by PDE variables, which is defined as

(21)

The function consists of two main components: an exponentially decaying term and a sinusoidal term with a frequency of 2. Fig 3 depicts the varying pattern of the function value over time, which exhibits a sinusoidal oscillation whose amplitude decays exponentially.

The variable I is a critical parameter representing the external current applied to the neuron. The underlying dynamics of action potential generation and propagation in cardiac cells can be understood by adjusting this parameter to simulate and study different excitability conditions. Here, consider multiplying I by the exponentially attenuated sinusoidal function g(t); then the HH model given by Eq 1 transforms to

C_m \frac{dV}{dt} = I\,g(t) - \bar{g}_{Na} m^3 h (V - E_{Na}) - \bar{g}_K n^4 (V - E_K) - \bar{g}_L (V - E_L) (22)

The arrhythmia model described by Eq 22 is denoted as model 1. After mesh generation and FEM calculation, the results of five representative points within the heart region are displayed in Fig 4, which are referred to as signal 1. The curves of action potential changing with time show a homogeneous pattern with different amplitudes and phases, exhibiting more fluctuating characteristics compared with signal 0.

The sodium inactivation gate h plays a critical role in regulating the availability of sodium channels during and after an action potential. By inactivating the channels during depolarization and allowing them to recover during repolarization, it ensures that action potentials are well-spaced and that the neuron can return to its resting state properly. The exponentially attenuated sinusoidal function g(t) is incorporated into the gating Eq 2 of the sodium inactivation variable h, which is regarded as arrhythmia model 2 and can be represented by

\frac{dh}{dt} = g(t) \left[ \alpha_h(V)(1 - h) - \beta_h(V)\,h \right] (23)

Similarly, the arrhythmia model 2 represented by Eq 23 is modeled and calculated using the finite element method. The results for five representative points within the heart region are illustrated in Fig 5. The curves of action potential changing over time demonstrate a homogeneous pattern but show different characteristics compared with signal 0 and signal 1 in terms of amplitude.

In order to construct the final synthetic signals, the prototypical signals derived from the three models are collected and processed through periodization. Moreover, Gaussian white noise, with a range of signal-to-noise ratios (SNR), is introduced to the synthetic signals to enhance their realism and robustness in various analytical contexts. Fig 6 exhibits the comparison of the typical synthetic signals of the three models with a noise standard deviation of 5 and a mean of -65, which are prepared for the next signal decomposition and classification.
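The periodization-plus-noise construction described above can be sketched directly; the beat array and noise level below are placeholders for illustration (the paper uses the FEM-simulated beats and a range of SNRs):

```python
import numpy as np

def make_synthetic(beat, n_periods=10, noise_std=5.0, rng=None):
    """Periodize one simulated beat and add Gaussian white noise,
    as done for the synthetic signals in Fig 6 (noise_std = 5 there)."""
    rng = np.random.default_rng(rng)
    signal = np.tile(beat, n_periods)                 # periodization
    return signal + rng.normal(0.0, noise_std, size=signal.shape)
```

For a beat resting at −65 mV, the resulting record keeps its baseline while the sample standard deviation approaches the chosen noise level.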

Fig 6. Different patterns of electrocardio signals with Gaussian noise.

X-axis represents time, Y-axis represents the magnitude of the membrane potential.

https://doi.org/10.1371/journal.pone.0321484.g006

Variational mode decomposition based on multi-objective COA

Crayfish optimization algorithm and comparison with competitive meta-heuristic algorithms

The Crayfish Optimization Algorithm (COA) represents a novel meta-heuristic optimization method inspired by the foraging behaviors and navigational strategies observed in crayfish [41]. Similar to other nature-inspired algorithms (such as Genetic Algorithms, Particle Swarm Optimization, and Ant Colony Optimization), COA aims to solve complex optimization problems by mimicking the natural processes observed in crayfish. The feeding amount of crayfish is influenced by temperature, with the ideal feeding range for crayfish being between 20°C and 30°C, and 25°C being the best temperature. Thus, COA defines the ambient temperature, ranging from 20 to 35°C, as

temp = rand \times 15 + 20 (24)

where temp denotes the ambient temperature of the crayfish’s location, and rand is a random scalar drawn from the uniform distribution on the interval (0,1). The mathematical model for crayfish intake is characterized by a Gaussian-like distribution

p = C_1 \times \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(temp - \mu)^2}{2\sigma^2} \right) (25)

where \mu denotes the temperature most suitable for crayfish, and \sigma and C_1 are parameters controlling the intake of crayfish at different temperatures.

The goal of the crayfish at the summer resort stage is to get to the cave, which stands for the ideal solution. By doing so, individuals get closer to the ideal solution, improving COA’s exploitation potential. The crayfish will enter the cave for the summer resort by

X_i^{t+1} = X_i^t + C_2 \times rand \times (X_{shade} - X_i^t) (26)

where t and t + 1 represent the current generation number and the next generation iteration number, respectively, and C_2 is a decreasing coefficient. The cave X_{shade} = (X_G + X_L)/2, where X_G denotes the optimal position obtained so far over the iterations, and X_L refers to the optimal position of the current population.

In the Competition stage, crayfish X_i engage in competition with each other and modify their positions in response to the position X_z of another crayfish. Adjusting the position expands the search range of COA. The crayfish compete for the cave through

X_i^{t+1} = X_i^t - X_z^t + X_{shade} (27)

where z represents a random individual of the crayfish population, calculated by z = round(rand \times (N - 1)) + 1, with N denoting the population size.

In the Foraging stage, crayfish use different feeding methods based on the size of their food Q

Q = C_3 \times rand \times \frac{fitness_i}{fitness_{food}} (28)

where C_3 is the food factor, representing the largest food. fitness_i represents the fitness value of the i-th crayfish, and fitness_{food} represents the fitness value of the food location. The crayfish will approach the food Q when it is the right size for eating. When Q is too large (Q > (C_3 + 1)/2), it indicates that there is a significant difference between the crayfish and the optimal solution. At this time, the crayfish will tear the food with the first claw, and the mathematical equation is

X_{food} = e^{-1/Q} \times X_{food} (29)

The food obtained by crayfish is also related to the food intake p, so the equation for foraging is as follows

X_i^{t+1} = X_i^t + X_{food} \times p \times \left( \cos(2\pi \times rand) - \sin(2\pi \times rand) \right) (30)

where the food X_{food} represents the optimal solution, which is shrunk and brought closer to the crayfish. When the size of the food is not too large (Q \le (C_3 + 1)/2), the crayfish can simply move towards the food and eat directly

X_i^{t+1} = (X_i^t - X_{food}) \times p + p \times rand \times X_i^t (31)

Through the foraging stage, COA will approach the optimal solution, improving the algorithm’s exploitation ability and allowing it to have excellent convergence capabilities.
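The staged updates above can be sketched compactly. The following simplified Python implementation follows the temperature/summer-resort/competition/foraging structure as described; the concrete constants (C1 = 0.2, σ = 3, μ = 25, C3 = 3) and the 30°C stage threshold are assumed typical settings from the COA literature [41] rather than values confirmed by this paper:

```python
import numpy as np

def coa_minimize(func, dim=5, n=30, n_iter=200, lb=-10.0, ub=10.0, seed=0):
    """Simplified Crayfish Optimization Algorithm sketch (minimization).
    Constants are assumed typical values, not this paper's settings."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n, dim))
    fit = np.apply_along_axis(func, 1, X)
    g_best = X[fit.argmin()].copy(); g_fit = float(fit.min())
    for t in range(n_iter):
        temp = rng.random() * 15 + 20                               # temperature
        p = 0.2 * (1 / (np.sqrt(2 * np.pi) * 3)) * np.exp(-(temp - 25)**2 / 18)  # intake
        C2 = 2 - t / n_iter                                          # decreasing coefficient
        l_best = X[fit.argmin()]
        shade = (g_best + l_best) / 2                                # the "cave"
        X_food = g_best.copy()
        for i in range(n):
            if temp > 30:
                if rng.random() < 0.5:                               # summer resort stage
                    X[i] = X[i] + C2 * rng.random() * (shade - X[i])
                else:                                                # competition stage
                    z = rng.integers(n)
                    X[i] = X[i] - X[z] + shade
            else:                                                    # foraging stage
                Q = 3 * rng.random() * fit[i] / (g_fit + 1e-12)      # food size
                if Q > 2:                                            # tear the food first
                    food = np.exp(-1 / Q) * X_food
                    X[i] = X[i] + food * p * (np.cos(2 * np.pi * rng.random())
                                              - np.sin(2 * np.pi * rng.random()))
                else:                                                # eat directly
                    X[i] = (X[i] - X_food) * p + p * rng.random() * X[i]
            X[i] = np.clip(X[i], lb, ub)
        fit = np.apply_along_axis(func, 1, X)
        if fit.min() < g_fit:                                        # keep global best
            g_fit = float(fit.min()); g_best = X[fit.argmin()].copy()
    return g_best, g_fit
```

On a simple sphere function the sketch converges well below the random-initialization fitness within a few hundred iterations, illustrating the exploitation behavior of the foraging stage.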

A well-designed test function, based on Ackley’s function, is built and used to verify the performance of COA. Transformations such as shifts and rotations are introduced to increase the complexity of the function landscape. As a result, this function poses significant challenges for global optimization algorithms due to its numerous local minima and a single global minimum. The expression of the test function is

f(\mathbf{x}) = -a \exp\!\left( -b \sqrt{\frac{1}{d} \sum_{i=1}^{d} x_i^2} \right) - \exp\!\left( \frac{1}{d} \sum_{i=1}^{d} \cos(c x_i) \right) + a + e (32)

where \mathbf{x} = (x_1, \ldots, x_d) is a vector of variables and d is the dimension of the search space. a, b, and c are constants; here a = 20, b = 0.2, and c = 2\pi. e is the base of the natural logarithm.
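For reference, the unshifted, unrotated Ackley function can be written directly from the formula above; its global minimum is f(0) = 0:

```python
import numpy as np

def ackley(x, a=20.0, b=0.2, c=2 * np.pi):
    """Ackley's function; global minimum f(0) = 0."""
    x = np.asarray(x, dtype=float)
    d = x.size
    term1 = -a * np.exp(-b * np.sqrt(np.sum(x**2) / d))   # envelope term
    term2 = -np.exp(np.sum(np.cos(c * x)) / d)            # oscillatory term
    return term1 + term2 + a + np.e
```

Away from the origin the cosine term creates a grid of local minima, which is why the function is a standard stress test for global optimizers.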

The function landscape is displayed in Fig 7(A), where the range of x is [–32, 32] in each dimension, and the colormap corresponds to the function value. Several state-of-the-art algorithms mentioned before, i.e., GWO, GRO, RIME, PSO, GA, and BWO, were chosen as competitors; the convergence comparison is shown in Fig 7(B). After 1000 iterations, GA has the best precision, followed by COA, whereas the other algorithms exhibit premature convergence and remain steady. However, it is evident that COA converges faster than GA, which stalls in a local optimum for hundreds of iterations during optimization. In summary, COA shows the greatest potential among these algorithms and is worth further development for multi-objective optimization.

Fig 7. Comparison of different optimization algorithms.

(A) Schematic diagram of test function. (B) Convergence comparison.

https://doi.org/10.1371/journal.pone.0321484.g007

Multi-objective COA based on non-dominated sorting

The original Crayfish Optimization Algorithm (COA) is formulated for scalar objective optimization, where a single objective function is optimized. However, in many practical engineering applications, multi-objective optimization problems are prevalent. These problems involve the simultaneous optimization of multiple, often conflicting, objectives, which adds a layer of complexity not present in single-objective optimization. For example, solutions to a problem with two objectives are not straightforward, as improving one objective may lead to the deterioration of another. Therefore, it is imperative to enhance the COA to broaden its applicability for multi-objective optimization problems. Regarding this point, a Multi-objective COA (MOCOA) based on non-dominated sorting is proposed in this paper.

Non-dominated sorting is an essential concept in multi-objective optimization, particularly in evolutionary algorithms. It is used to classify solutions based on Pareto dominance, helping to identify the set of optimal solutions known as the Pareto front [47].

For a minimization problem with feasible region Ω, a vector u = (u_1, …, u_k) dominates a vector v = (v_1, …, v_k) (denoted by u ≺ v) if and only if u_j ≤ v_j for all j and u_j < v_j for at least one j. That is to say, at least one u_j is strictly smaller than the corresponding v_j, whilst the remaining u's are either smaller than or equal to the corresponding v's. A solution vector x* is considered Pareto optimal (minimal) if no other solution can be found to dominate it according to the definition of Pareto dominance, i.e., there exists no x with F(x) ≺ F(x*). The Pareto set P* is the set in the decision variable space consisting of all Pareto optimal vectors. The Pareto front PF* is the set of objective function vectors generated from the decision vectors in the Pareto set P*.

In non-dominated sorting, solutions are grouped into different fronts based on their dominance relationships, and each front is assigned a rank denoting its level of Pareto dominance. Solutions within the same rank are mutually non-dominated. Lower ranks (e.g., Rank 1) indicate solutions that are closer to the Pareto front, representing better trade-offs among the objectives.
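A minimal sketch of Pareto dominance and front ranking for a minimization problem follows. It uses an O(n²)-per-front sweep rather than the bookkeeping-heavy fast non-dominated sorting of NSGA-II, but the resulting fronts are the same:

```python
import numpy as np

def dominates(u, v):
    """u Pareto-dominates v (minimization): no worse in every objective
    and strictly better in at least one."""
    u, v = np.asarray(u), np.asarray(v)
    return bool(np.all(u <= v) and np.any(u < v))

def non_dominated_sort(F):
    """Return the front index (0-based rank) of each row of the
    objective matrix F."""
    F = np.asarray(F, dtype=float)
    n = len(F)
    ranks = np.full(n, -1)
    remaining = set(range(n))
    front = 0
    while remaining:
        # Individuals not dominated by anyone still unranked form the next front.
        current = [i for i in remaining
                   if not any(dominates(F[j], F[i]) for j in remaining if j != i)]
        for i in current:
            ranks[i] = front
        remaining -= set(current)
        front += 1
    return ranks
```

For example, with objectives [[1, 1], [2, 2], [1, 2], [2, 1]], the point (1, 1) alone forms Rank 0, the mutually non-dominated (1, 2) and (2, 1) form Rank 1, and (2, 2) falls to Rank 2.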

The concept of crowding degree is further introduced to characterize the density of surrounding individuals: a smaller crowding degree corresponds to a higher density of surrounding individuals. Consequently, among individuals of the same rank, those with a greater crowding degree are selected. The crowding degree is computed by

CD_{i,j} = (f_{i+1,j} - f_{i-1,j}) / (f_{max,j} - f_{min,j})    (33)

where f_{i+1,j} and f_{i-1,j} refer to the fitness values of the two neighbors adjacent to the i-th individual in the j-th rank, respectively. f_{max,j} is the maximum fitness value, and f_{min,j} is the minimum fitness value in the j-th rank.
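The crowding degree of Eq 33 can be computed per objective and accumulated, with boundary individuals assigned infinite crowding so they are always retained; summing over objectives is a common convention and an assumption in this sketch:

```python
import numpy as np

def crowding_degree(front_F):
    """Crowding degree for one front: per objective, the normalized gap
    between each individual's two neighbors (Eq 33), summed over
    objectives. Boundary individuals receive infinite crowding."""
    F = np.asarray(front_F, dtype=float)
    n, m = F.shape
    cd = np.zeros(n)
    for j in range(m):
        order = np.argsort(F[:, j])               # sort the front by objective j
        span = F[order[-1], j] - F[order[0], j]
        cd[order[0]] = cd[order[-1]] = np.inf      # boundary points: keep always
        if span == 0:
            continue
        for k in range(1, n - 1):                  # interior points only
            cd[order[k]] += (F[order[k + 1], j] - F[order[k - 1], j]) / span
    return cd
```

On the front [[0, 3], [1, 2], [2, 1], [3, 0]], the two extreme points get infinite crowding and the two interior points each accumulate 2/3 per objective, i.e., 4/3 in total.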

The non-dominated sorting method in this paper can mainly be divided into two steps. First, all individuals are sorted using a fast non-dominated sorting algorithm. Then the crowding degrees of the decision space (CD_{i,x}) and the objective space (CD_{i,f}) of the i-th individual are calculated, respectively. Individuals in the same rank are rated based on the special crowding distance

SCD_i = max(CD_{i,x}, CD_{i,f}), if CD_{i,x} > CD_{avg,x} or CD_{i,f} > CD_{avg,f};  SCD_i = min(CD_{i,x}, CD_{i,f}), otherwise    (34)

where CD_{avg,f} and CD_{avg,x} denote the average crowding degree of the objective space and the decision space, respectively.

Step 1. Define the number of iterations T, the population size N, the dimension of the design space dim, the upper bound ub, and the lower bound lb. For an optimization target F_k with K different objective functions, initialize the population X_{i,j} from the upper and lower bounds

X_{i,j} = lb_j + rand × (ub_j - lb_j)    (35)

X_{i,j} is calculated by Eq 35, where lb_j and ub_j represent the lower bound and the upper bound of the j-th dimension, respectively, and rand is a uniform random number in [0, 1]. Every crayfish is a 1 × dim vector, each element of which is one decision variable, so each crayfish represents one candidate solution to the problem. Moreover, the ambient temperature of the crayfish is determined by Eq 24 to induce the different stages of COA.

Step 2. When temp > 30 and rand < 0.5, COA activates the summer resort stage, and a new position is generated from the cave position X_shade and the crayfish position X_i by Eq 26. When temp > 30 and rand ≥ 0.5, COA enters the competition stage, in which two crayfish compete for the cave; a new position is obtained from the cave position X_shade and the positions (X_i, X_z) of the two crayfish by Eq 27.

Step 3. When temp ≤ 30, COA proceeds to the foraging stage, where the food intake p and the food size Q are specified by Eqs 25 and 28, respectively. If Q > (C3 + 1)/2, shred the food according to Eq 29 and then update the position through Eq 30. Otherwise, the position of the crayfish is updated by Eq 31.

Step 4. Evaluate the fitness of each individual X_i at step t + 1 using the different objective functions, respectively, and merge the populations by non-dominated sorting. Specifically, the population at step t + 1 is merged with that at step t, and non-dominated sorting is then conducted based on the optimization fitness values F(X). The special crowding distance of each individual is further computed as the sorting criterion for individuals of the same rank. Eventually, the population consisting of kN individuals is fully sorted, and only the top N individuals are retained as the updated population for the subsequent step.

Step 5. Determine whether to end the cycle based on whether the maximum number of iterations T has been reached. If not, continue with Step 2. Otherwise, output the Pareto optimal front.
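The environmental selection in Step 4 (lower rank first, larger special crowding distance within a rank) reduces to a single lexicographic sort once the ranks and SCD values of the merged population are available; a minimal sketch, assuming both arrays are precomputed:

```python
import numpy as np

def truncate_population(X, ranks, scd, N):
    """Keep the best N individuals: primary key is the non-dominated
    rank (ascending), tie-break is the special crowding distance
    (descending, i.e., more isolated individuals first)."""
    # np.lexsort sorts by the LAST key first, so ranks is the primary key.
    order = np.lexsort((-np.asarray(scd, dtype=float), np.asarray(ranks)))
    return np.asarray(X)[order[:N]]
```

For instance, with individuals [0, 1, 2, 3], ranks [1, 0, 0, 1], and SCD [1.0, 0.5, 2.0, 3.0], keeping N = 2 selects individual 2 (rank 0, SCD 2.0) and then individual 1 (rank 0, SCD 0.5).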

COA is remarkable in its simulations of the summer resort mechanism, competition mechanism, and foraging mechanism of crayfish, showing significant advantages over other meta-heuristic algorithms. The non-dominated sorting and crowding degree sorting algorithm is innovatively introduced in the MOCOA proposed in this paper, thereby enhancing its applicability and efficacy in resolving multi-objective optimization problems.

Optimal VMD based on MOCOA

MOCOA-VMD (optimized VMD based on MOCOA) is proposed to search for the best number of intrinsic mode functions K and the penalty term α in variational mode decomposition. In order to evaluate the effectiveness of VMD, the kurtosis and KL divergence are introduced as performance indicators. The spectral kurtosis (SK) is a dimensionless time series statistic used to identify and quantify non-stationary changes in a signal that may reflect the random distribution of time series data. For a given frequency bin, SK measures the deviation of the power spectral density (PSD) from that expected for a Gaussian random process. A high SK level corresponds to a high level of non-stationary or non-Gaussian behavior [48, 49]. The Kurtogram is a graphical representation of the spectral kurtosis values of various frequency bands in a signal; it is used to identify the frequency range where the signal has the highest kurtosis, which can indicate the presence of faults or abnormalities in the signal [50].

The spectral kurtosis is calculated by

SK(f) = E{|X_f|⁴} / (E{|X_f|²})² - 2    (36)

where X_f is the Fourier transform of the signal at frequency f, and E{·} is the operator that computes the expectation of a series.
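A simple estimator of Eq 36 averages the second and fourth moments of the short-time Fourier transform over non-overlapping frames. The frame length, the Hann window, and the exclusion of the real-valued DC/Nyquist bins are implementation choices, not prescribed by the text:

```python
import numpy as np

def spectral_kurtosis(x, nperseg=64):
    """Estimate Eq 36 per frequency bin from windowed, non-overlapping
    frames: E{|X_f|^4} / (E{|X_f|^2})^2 - 2, which is near 0 for
    stationary Gaussian noise at the complex-valued bins."""
    x = np.asarray(x, dtype=float)
    nseg = len(x) // nperseg
    frames = x[:nseg * nperseg].reshape(nseg, nperseg)
    X = np.fft.rfft(frames * np.hanning(nperseg), axis=1)
    s2 = np.mean(np.abs(X) ** 2, axis=0)
    s4 = np.mean(np.abs(X) ** 4, axis=0)
    return (s4 / s2 ** 2 - 2.0)[1:-1]   # drop the real-valued DC/Nyquist bins

rng = np.random.default_rng(1)
sk_noise = spectral_kurtosis(rng.standard_normal(8192))     # stationary noise
impulsive = 0.1 * rng.standard_normal(8192)
impulsive[32::1024] += 50.0                                  # periodic transients
sk_impulse = spectral_kurtosis(impulsive)
```

For the stationary Gaussian signal the estimate hovers around 0, while the impulsive signal yields large positive values across all bands, which is exactly why SK flags transient components in ECG signals.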

Fig 8 shows the Kurtograms of different signals, whose optimal window lengths are 32, 4, and 12, respectively; their kurtograms exhibit similar but differentiated characteristics. The frequency bands with the highest kurtosis values occur at around 20 Hz under the sampling frequency of 300 Hz, which indicates the presence of impulses or transients. Kurtosis effectively extracts transient signal changes, making it a suitable performance index.

Fig 8. Kurtograms of different signals.

(A) Kurtogram of signal 0. (B) Kurtogram of signal 1. (C) Kurtogram of signal 2. X-axis represents the frequency bands, Y-axis represents the resolution levels, color intensity represents the kurtosis value (higher intensity indicates higher kurtosis). Kurtograms exhibit similar impulses components but differentiated characteristics of the kurtosis values across three signals.

https://doi.org/10.1371/journal.pone.0321484.g008

Kullback–Leibler (KL) divergence quantifies how one probability distribution diverges from another and is a fundamental concept in the realms of information theory and statistics. The discrete form of KL divergence is determined as the index for measuring the similarity between different IMFs, which is defined as

D_KL(P ‖ Q) = Σ_x P(x) log(P(x) / Q(x))    (37)

where P(x) and Q(x) are the probabilities of x in the distribution of P (the true distribution) and Q (the approximate distribution), respectively.
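Eq 37 in code, with a small epsilon guarding empty bins; the epsilon and the normalization of the inputs are implementation choices:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D_KL(P || Q) of Eq 37. Inputs are
    normalized to sum to 1; eps guards against log(0) on empty bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Note that KL divergence is zero for identical distributions and asymmetric in general, i.e., D_KL(P ‖ Q) ≠ D_KL(Q ‖ P), which is why P is fixed as the true (original-signal) distribution when comparing IMFs.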

A sensitivity analysis of K and α in VMD was conducted to illustrate how parameter variations affect signal decomposition performance. As shown in Fig 9, the influence of K on the spectral kurtosis and KL divergence is detailed under a fixed value of α; the same applies to α under a fixed K = 6. The KL divergence presents contrary trends as K and α rise, and the values of spectral kurtosis fluctuate as K and α increase, which reveals the coupled effect of the VMD parameters on the performance indexes.

Fig 9. Sensitivity analysis of parameters K and α in VMD.

(A) Influence of K on the VMD performance. (B) Influence of α on the VMD performance. X-axis represents the value of K or α, left Y-axis represents spectral kurtosis, right Y-axis represents KL divergence. The parameters K and α have significant and coupled impacts on decomposition performance.

https://doi.org/10.1371/journal.pone.0321484.g009

Sensitivity analysis indicates that K and α have varying and coupled impacts on decomposition performance and may interact with each other; therefore, further optimization is required. The optimization goal for MOCOA is to minimize the spectral kurtosis and KL divergence of the IMFs decomposed by VMD, with the two design variables K and α to be optimized. Their design ranges are an integer range for K and a real-valued range for α, set according to prior knowledge. The optimization problem can be written as

min_{K, α} F(K, α) = [SK(K, α), D_KL(K, α)],  s.t. K ∈ [K_min, K_max] (integer), α ∈ [α_min, α_max]    (38)

Specifically, the MOCOA configuration starts with an initial population of 300 based on the design space, and the maximum number of generations is set to 1000. The spectral kurtosis and KL divergence of each IMF decomposed by VMD are calculated to continuously evaluate each individual's performance at every generation. The crowding degree sorting algorithm is employed to rank the individuals in each generation, and the best individuals are selected based on the Pareto dominance relationship. The optimal solution is determined from the Pareto optimal front, which contains the best trade-off solutions between the two objectives. Fig 10 displays the Pareto optimal front after 1000 generations of evolution in the case of signal 0.

Since all the optimal points in the Pareto optimal front's solution set are mutually non-dominated, any point in the front could be the optimal candidate. A summary of the key parameters and performance indices of the VMD is provided in Table 1. The optimal parameters K = 5 and the corresponding α are determined after careful selection, minimizing the spectral kurtosis and KL divergence of the decomposed IMFs. The optimal VMD results are then fed into the deep attention model for arrhythmia classification.

Table 1. Summary of key parameters and performance indices of VMD.

https://doi.org/10.1371/journal.pone.0321484.t001

Fig 11 shows the decomposition results of MOCOA-VMD compared with the original VMD with parameter K = 6 and an unoptimized penalty term α. It can be observed that the IMFs decomposed by MOCOA-VMD offer a balance between computational efficiency, decomposition stability, and interpretability, especially in IMF1, where the main component or trend of the signal is detected. The outcome validates the effectiveness of the proposed MOCOA-VMD method, which strikes a compromise between the need to capture all pertinent signal components and the risk of introducing noise or artifacts.

Fig 11. Comparison of original VMD and MOCOA-VMD results.

(A) Original VMD result. (B) Optimal VMD result based on MOCOA. The MOCOA-VMD result for IMF1 demonstrates superior performance in capturing the main component of the signal compared with the original VMD.

https://doi.org/10.1371/journal.pone.0321484.g011

Learning model establishment and optimization for classification

Model establishment based on MOCOA-VMD attention scheme and ablation study

The classification of arrhythmias from electrocardiogram signals presents several significant challenges. These challenges arise from the complex and often non-stationary nature of ECG signals, the substantial variability between patients, and the presence of noise and artifacts that can obscure relevant features. To address these difficulties, a deep attention model based on MOCOA-VMD is established for arrhythmia signal classification; its architecture is shown in Fig 12.

Fig 12. Deep attention network architecture for arrhythmia classification.

https://doi.org/10.1371/journal.pone.0321484.g012

During the data acquisition stage, the Hodgkin-Huxley model is simulated and solved using FEM to obtain the ECG signals based on the simulation results. Two different types of arrhythmia signals are artificially generated by introducing abnormalities into the PDEs. In each case, 200 representative points are uniformly and randomly selected throughout the heart region. The signals from these points are coupled with Gaussian noise at different SNRs. As a result, the total data consists of 3000 signals with 3 different ECG patterns, which are divided into 90% for training and 10% for testing. In the signal decomposition process, the signals are processed using the proposed MOCOA-VMD method, which employs non-dominated sorting, by calculating the spectral kurtosis SK and the KL divergence D_KL of the IMFs, together with crowding degree sorting. The best decomposition results with the optimal parameters K and α in VMD are achieved, and the IMF signals are fed to the deep attention model. In the deep attention construction, the multi-head encoder consists of the matrices Q, K, and V computed from the decomposed signals, and the matrices are handled by linear transformation, scaled dot-product attention, and concatenation operations. Then the fully connected layer is employed to fit the features and corresponding categories, and ultimately the softmax layer outputs classification results in the form of a probability distribution.
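The core of the multi-head encoder is scaled dot-product attention, softmax(QKᵀ/√d_k)·V. A single-head NumPy sketch follows; the learned linear projections, the multiple heads, and the concatenation step are omitted, and the toy dimensions are assumptions:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V; returns the output and the
    attention weight matrix."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy self-attention: 4 time steps of a decomposed signal embedded in 8 dims.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out, w = scaled_dot_product_attention(x, x, x)
```

Each row of the weight matrix is a probability distribution over time steps, which is what lets the encoder attend to transient ECG segments regardless of where they occur in the sequence.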

In order to verify the effectiveness of the proposed signal decomposition method and deep attention modules, an ablation study is implemented, which is a critical part of evaluating the performance and importance of different components in a deep learning model or system. Specifically, the various components in the model are systematically removed or altered to understand their contribution to the overall performance and which parts might be redundant or less important.

The ablation study is carried out to help optimize the modules, and the corresponding results are shown in Table 2, where the ensemble empirical mode decomposition (EEMD) and original variational mode decomposition (VMD) are chosen as competitors for the decomposition method, and the convolutional neural network (CNN) and long short-term memory (LSTM) as competitors for feature extraction. Moreover, paired t-tests with 10 samples of each model are implemented, and p-values are calculated to validate whether the performance differences between model variants are statistically significant. The p-values confirm that the performance improvements observed in the ablation study are significant.

Table 2. Comparison of the ablation study results. The deep model consisting of MOCOA-VMD and the attention scheme achieves the best accuracy.

https://doi.org/10.1371/journal.pone.0321484.t002

It can be concluded that the attention scheme shows the best performance, followed by LSTM, which performs slightly worse, while CNN ranks last in terms of feature extraction. The attention scheme and LSTM generally exhibit superior performance in signal processing tasks due to their ability to handle temporal dependencies, their flexibility with sequence length, their use of memory and contextual information, and their adaptability to non-stationary data. Besides, the mode decomposition operation significantly improves model performance. The model based on MOCOA-VMD reaches the best accuracy of 94.46%, followed by VMD, with EEMD last, which verifies the superiority of the deep attention model based on MOCOA-VMD proposed in this paper. Since experimental values were adopted for the hyperparameters of the established deep attention model, Bayesian optimization of the hyperparameters is studied and implemented in the next step to further improve classification performance.

Bayesian optimization for hyperparameters

Bayesian optimization is a powerful technique that efficiently seeks an approximate optimal solution while minimizing the evaluation cost, which is especially advantageous for scenarios where the computation of fitness functions is resource-intensive or time-consuming [51]. The Gaussian Process [52] (GP) is a commonly used non-parametric probabilistic surrogate model, widely applied in regression, classification, and many other domains that require inference over black-box functions. Formally, a GP is specified by a mean function and a positive semi-definite covariance function (or kernel)

f(x) ~ GP(m(x), k(x, x′))    (39)

where m(x) is the mean function and k(x, x′) is the covariance function, which defines the covariance between the values f(x) and f(x′) at two points x and x′.

Given training data D = {(x_i, y_i)}_{i=1..n}, where y_i = f(x_i) + ε_i and ε_i ~ N(0, σ_n²) is Gaussian noise, GP regression tries to predict the function values at new points x*. The joint distribution of the observed values y and the function values f* at the test points is

[y; f*] ~ N(0, [K(X, X) + σ_n² I, K(X, X*); K(X*, X), K(X*, X*)])    (40)

where K(X, X) is the covariance matrix of the training points, K(X*, X) is the covariance matrix between the test points and the training points, and K(X*, X*) is the covariance matrix of the test points. The predictive distribution at the test points x* is Gaussian with mean and covariance

μ* = K(X*, X)[K(X, X) + σ_n² I]⁻¹ y,  Σ* = K(X*, X*) - K(X*, X)[K(X, X) + σ_n² I]⁻¹ K(X, X*)    (41)

where the mean μ* provides the predicted value, while Σ* represents the uncertainty.
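Eq 41 can be verified with a few lines of NumPy. The squared-exponential kernel and its length scale are illustrative choices, and a production implementation would use a Cholesky solve rather than an explicit matrix inverse:

```python
import numpy as np

def rbf_kernel(A, B, length=1.0):
    """Squared-exponential covariance k(x, x') between row vectors."""
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / length ** 2)

def gp_posterior(X, y, X_star, noise=1e-4):
    """Predictive mean and covariance of Eq 41."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_s = rbf_kernel(X_star, X)
    K_ss = rbf_kernel(X_star, X_star)
    K_inv = np.linalg.inv(K)
    mu = K_s @ K_inv @ y                      # predictive mean
    cov = K_ss - K_s @ K_inv @ K_s.T          # predictive covariance
    return mu, cov

X = np.array([[0.0], [1.0], [2.0]])
y = np.sin(X).ravel()
mu, cov = gp_posterior(X, y, X)               # predict at the training points
```

At the training points with a tiny noise term, the posterior mean reproduces the observations almost exactly and the predictive variance collapses towards the noise level, which is the behavior the acquisition function exploits.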

The acquisition function, often denoted as α(·), maps from the input space X, the observation space Y, and the hyperparameter space Θ to the real numbers, i.e., α: X × Y × Θ → R. The function is constructed from the posterior distribution obtained from the observed data set D_{1:t} and guides the selection of the next evaluation point x_{t+1} by maximization at time step t

x_{t+1} = argmax_x α(x; D_{1:t})    (42)

One popular acquisition strategy is the Expected Improvement (EI), which is often used in conjunction with a Gaussian Process model. According to the Bayesian formula [53], EI at an acquisition point can be written as

EI(x) = E[max(f(x) - f⁺, 0)]    (43)

Given the GP’s predictive mean μ_t(x) and variance σ_t²(x) at a candidate point x and step t, the EI acquisition strategy based on the Gaussian process is specified as

EI_t(x) = (μ_t(x) - f⁺) Φ(z) + σ_t(x) φ(z),  z = (μ_t(x) - f⁺) / σ_t(x)    (44)

where f⁺ is the best value observed so far, Φ is the standard normal cumulative distribution function, and φ is the standard normal probability density function.
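Eq 44 in code, written here for maximization; the σ = 0 branch handles noiseless repeated evaluations:

```python
from math import erf, sqrt, pi, exp

def expected_improvement(mu, sigma, f_best):
    """Expected Improvement of Eq 44 for maximization:
    (mu - f_best) * Phi(z) + sigma * phi(z), z = (mu - f_best) / sigma."""
    if sigma <= 0.0:
        return max(mu - f_best, 0.0)          # degenerate case: no uncertainty
    z = (mu - f_best) / sigma
    Phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))    # standard normal CDF
    phi = exp(-0.5 * z * z) / sqrt(2.0 * pi)  # standard normal PDF
    return (mu - f_best) * Phi + sigma * phi
```

At μ = f⁺ the first term vanishes and EI reduces to σ·φ(0) ≈ 0.399·σ, showing directly how larger posterior uncertainty raises the incentive to explore.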

The Tree-structured Parzen estimator [54] (TPE) models the objective function using two probability density functions

p(x | y) = ℓ(x) if y < y*,  g(x) if y ≥ y*    (45)

where ℓ(x) represents the density formed from the observations {x^(i)} whose loss was below the threshold y*, and g(x) is the density generated from the remaining observations.

By constructing γ = p(y < y*), then p(x) = ∫ p(x | y) p(y) dy = γ ℓ(x) + (1 - γ) g(x), so we have

EI_{y*}(x) = ∫_{-∞}^{y*} (y* - y) p(y | x) dy = (γ y* ℓ(x) - ℓ(x) ∫_{-∞}^{y*} y p(y) dy) / (γ ℓ(x) + (1 - γ) g(x))    (46)

finally

EI_{y*}(x) ∝ (γ + (1 - γ) g(x) / ℓ(x))⁻¹    (47)

EI's goal is to balance exploration and exploitation by taking into account both the predicted mean and uncertainty of the model. Consequently, to maximize the expected improvement at a point x, it is essential to maximize ℓ(x) and minimize g(x). During each iteration, the TPE algorithm returns the candidate with the highest expected improvement.
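The selection rule of Eq 47 can be illustrated with two Gaussian kernel density estimates: candidates are scored by the ratio ℓ(x)/g(x), and the highest ratio wins. The bandwidth and the toy observations below are assumptions for illustration only:

```python
import numpy as np

def tpe_suggest(candidates, good_obs, bad_obs, bw=0.3):
    """Score candidates by l(x)/g(x) (Eq 47): l is a Gaussian KDE over
    the 'good' observations (loss below the threshold y*), g over the
    rest. Returns the candidate with the highest ratio."""
    def kde(points, obs):
        d = (points[:, None] - np.asarray(obs, dtype=float)[None, :]) / bw
        return np.mean(np.exp(-0.5 * d ** 2), axis=1) / (bw * np.sqrt(2 * np.pi))
    candidates = np.asarray(candidates, dtype=float)
    ratio = kde(candidates, good_obs) / (kde(candidates, bad_obs) + 1e-12)
    return candidates[int(np.argmax(ratio))]

# Good trials cluster near 0.25; bad trials sit near 0.05 and 0.8-0.9.
best = tpe_suggest(np.linspace(0.0, 1.0, 101),
                   good_obs=[0.20, 0.25, 0.30],
                   bad_obs=[0.05, 0.80, 0.90])
```

The suggestion lands inside the cluster of good observations while staying away from regions where g(x) is large, which is exactly the behavior Eq 47 encodes.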

In Bayesian optimization, batch size, learning rate, number of epochs, and momentum are determined as the key parameters to optimize. Batch size defines the number of training samples used in one iteration of training a neural network and calls for a compromise between computational efficiency and model convergence behavior. The learning rate controls the step size at each iteration while the loss function is being minimized. The number of epochs specifies how many times the learning algorithm works through the entire training dataset, which directly impacts the model's performance and generalization ability. Momentum is a technique used in the optimization process to accelerate the gradient vectors in the right directions, leading to faster convergence, smoothing out the gradients, and helping to avoid getting stuck in local minima.

Referring to the general experience of fine-tuning a deep model, the ranges of values for the key design parameters batch size, learning rate, epochs, and momentum are determined accordingly, and the design space is composed of all combinations of these key parameters. The Gaussian Process surrogate model and the TPE algorithm are put into operation for the deep attention models that demand high evaluation costs. The results of the top 10 performance combinations are shown in Fig 13. The best accuracy of 96.11% occurs with a combination of batch size 32, learning rate 0.00092, epochs 1070, and momentum 0.966, which is 1.65% higher than the model before optimization.

Fig 13. Bayesian optimization results of key hyperparameters.

https://doi.org/10.1371/journal.pone.0321484.g013

The confusion matrix, a standard performance measurement tool, is utilized to evaluate the accuracy of the classification model, as shown in Fig 14. It can be seen that the proposed model achieves excellent success in classifying the different ECG signal patterns, with accuracies of up to 97.06% and 96.47% for signal 0 and signal 1, respectively, and slightly lower at 94.91% for signal 2. We speculate that this is possibly due to the varying characteristics modeled by the PDEs for the arrhythmia ECG patterns.

Model validation on MIT-BIH database

To assess the robustness and generalizability of the proposed model, we utilize the well-known MIT-BIH Arrhythmia Database for real-world ECG signal analysis. This database, provided by PhysioNet, includes annotated ECG recordings from 47 patients, totaling approximately 24 hours of recording time. Each record contains two-lead ECG signals (typically MLII and V1) sampled at 360 Hz with 11-bit resolution. The comprehensive annotations facilitate the evaluation of our algorithm's performance in detecting various arrhythmias.

Three different types of ECG beats are chosen for model validation, i.e., 75052 samples of normal sinus beats (N), 7130 samples of premature ventricular contractions (V), and 2546 samples of premature atrial contractions (A). The dataset was divided into 80% for training and 20% for testing. The results of the performance validation conducted on the MIT-BIH Arrhythmia Database after 1000 epochs are shown in Fig 15. The training state curve in Fig 15(A) shows that the accuracy reaches 95.46% on the training set and 92.87% on the test set. The confusion matrix in Fig 15(B) reveals that the model performs less accurately on normal sinus beats than on premature ventricular contractions and premature atrial contractions, which may necessitate further investigation in the future.

Fig 15. Performance validation on the MIT-BIH database.

(A) Training state. (B) Confusion matrix.

https://doi.org/10.1371/journal.pone.0321484.g015

The experiment results are compared with other related studies based on the MIT-BIH database, as shown in Table 3. It can be seen that the deep model proposed in this paper performs well but needs to be further improved in future studies. Integrating different deep modules, such as CNN, RNN and LSTM for feature extraction may be a promising direction.

In summary, the deep VMD-attention network based on the multi-objective crayfish optimization algorithm in this paper is highly effective for classification tasks and is believed to have the potential to be utilized in other signal processing and classification issues.

Conclusion

This paper proposes a novel deep attention model for arrhythmia signal processing based on variational mode decomposition optimized by a multi-objective crayfish optimization algorithm. The main conclusions are as follows.

1. A finite element model of the human heart based on the Hodgkin-Huxley model was established for cardiac electrophysiology simulation, and the ECG signals were obtained from the FEM results at representative points. Two distinct types of arrhythmia, characterized by major anomalies of the parameters I and h in the HH model, were examined in detail, and the corresponding ECG signals were also obtained through simulation. The typical synthetic signals from the three models, with built-in Gaussian noise, were prepared for signal decomposition and classification.
2. A variational mode decomposition technique for ECG signal processing based on a multi-objective crayfish optimization algorithm (MOCOA-VMD) was proposed. A multi-objective optimization method based on non-dominated sorting was incorporated into the crayfish optimization algorithm to search for the optimal K and α in VMD processing, and the spectral kurtosis and KL divergence were determined as the indicators for decomposition. The Pareto optimal front was generated by MOCOA, and the intrinsic mode functions of VMD with the best combination of K and α were obtained.
  3. A deep VMD-attention model based on MOCOA was constructed for ECG signal classification. To verify the effectiveness of the proposed signal decomposition method and deep attention modules, an ablation study was implemented. The performance of the model based on MOCOA-VMD achieves the best accuracy of 94.46%, much higher than the model constructed by modules of EEMD, VMD, CNN and LSTM. Moreover, Bayesian optimization was carried out to fine-tune the hyperparameters batch size, learning rate, epochs, and momentum. The best accuracy of the deep attention model after TPE optimization reaches 96.11%, which is 1.65% higher than the original model. The real-world ECG signals from the MIT-BIH arrhythmia database were utilized for further validation, and the accuracy reached 95.46% on the training set and 92.87% on the test set, proving the robustness and generalizability of the proposed model.

In conclusion, the deep VMD-attention network based on the HH model and MOCOA has demonstrated significant success in the classification of ECG signals. The method proposed in this paper shows substantial potential for both mathematical modeling and practical applications. Looking forward, on the one hand, the VMD-attention network needs to be further optimized to improve performance, and the mathematical models for different arrhythmias’ electrophysiology are to be established. On the other hand, we anticipate expanding this ECG signal modeling and processing strategy to various other biological time-dependent and non-stationary signal processing domains, such as electroencephalogram (EEG) signals, respiratory signals, blood pressure signals, and electrooculography (EOG).

Acknowledgments

The authors would like to express gratitude to Shenghua Zhu for his professional suggestions about the 3D model establishment.

References

  1. 1. Virani SS, Alonso A, Benjamin EJ, Bittencourt MS, Callaway CW, Carson AP, et al. Heart disease and stroke statistics—2020 update: a report from the American Heart Association. Circulation. 2020;141(9):e139–e596.
  2. 2. Antzelevitch C, Burashnikov A. Overview of basic mechanisms of cardiac arrhythmia. Card Electrophysiol Clin. 2011;3(1):23–45. pmid:21892379
  3. 3. Ebrahimi Z, Loni M, Daneshtalab M, Gharehbaghi A. A review on deep learning methods for ECG arrhythmia classification. Expert Systems with Applications: X. 2020;7:100033.
  4. 4. Feeny AK, Chung MK, Madabhushi A, Attia ZI, Cikes M, Firouznia M, et al. Artificial intelligence and machine learning in arrhythmias and cardiac electrophysiology. Circ Arrhythm Electrophysiol. 2020;13(8):e007952. pmid:32628863
  5. 5. Liu J, Li Z, Jin Y, Liu Y, Liu C, Zhao L, et al. A review of arrhythmia detection based on electrocardiogram with artificial intelligence. Expert Rev Med Devices. 2022;19(7):549–60. pmid:35993248
  6. 6. Singhal S, Kumar M. A systematic review on artificial intelligence-based techniques for diagnosis of cardiovascular arrhythmia diseases: challenges and opportunities. Arch Computat Methods Eng. 2022;30(2):865–88.
  7. 7. Oh SL, Ng EYK, Tan RS, Acharya UR. Automated diagnosis of arrhythmia using combination of CNN and LSTM techniques with variable length heart beats. Comput Biol Med. 2018;102:278–87. pmid:29903630
  8. 8. Chen C, Hua Z, Zhang R, Liu G, Wen W. Automated arrhythmia classification based on a combination network of CNN and LSTM. Biomedical Signal Processing Control. 2020;57:101819.
  9. 9. Ullah A, Rehman SU, Tu S, Mehmood RM, Ehatisham-Ul-Haq M. A hybrid deep CNN model for abnormal arrhythmia detection based on cardiac ECG signal. Sensors (Basel). 2021;21(3):951. pmid:33535397
  10. 10. Siontis KC, Noseworthy PA, Attia ZI, Friedman PA. Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. Nat Rev Cardiol. 2021;18(7):465–78. pmid:33526938
  11. 11. Pachori D, Tripathy RK, Jain TK. Detection of atrial fibrillation from ppg sensor data using variational mode decomposition. IEEE Sens Lett. 2024.
  12. 12. Kuila S, Dhanda N, Joardar S. Feature extraction of electrocardiogram signal using machine learning classification. IJECE. 2020;10(6):6598.
  13. 13. Kuila S, Dhanda N, Joardar S. ECG signal classification and arrhythmia detection using ELM-RNN. Multimed Tools Appl. 2022;81(18):25233–49.
  14. 14. Aphale SS, John E, Banerjee T. ArrhyNet: a high accuracy arrhythmia classification convolutional neural network. In: 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS). 2021. p. 453–7. https://doi.org/10.1109/mwscas47672.2021.9531841
  15. 15. Ilbeigipour S, Albadvi A, Akhondzadeh Noughabi E. Real-time heart arrhythmia detection using apache spark structured streaming. J Healthc Eng. 2021;2021:6624829. pmid:33968352
  16. 16. Qi M, Shao H, Shi N, Wang G, Lv Y. Arrhythmia classification detection based on multiple electrocardiograms databases. PLoS One. 2023;18(9):e0290995. pmid:37756278
  17. 17. El-Ghaish H, Eldele E. ECGTransForm: Empowering adaptive ECG arrhythmia classification framework with bidirectional transformer. Biomed Signal Process Control. 2024;89:105714.
  18. 18. Fitzhugh R. Impulses and physiological states in theoretical models of nerve membrane. Biophys J. 1961;1(6):445–66. pmid:19431309
  19. Morris C, Lecar H. Voltage oscillations in the barnacle giant muscle fiber. Biophys J. 1981;35(1):193–213. pmid:7260316
  20. Chay TR. Chaos in a three-variable model of an excitable cell. Physica D: Nonlinear Phenomena. 1985;16(2):233–42.
  21. Panahi S, Jafari S, Khalaf AJM, Rajagopal K, Pham V, Alsaadi FE. Complete dynamical analysis of a neuron under magnetic flow effect. Chin J Phys. 2018;56(5):2254–64.
  22. Anderson AE. Transistors in switching circuits. Bell Syst Tech J. 1952;31(6):1207–49.
  23. Nagumo J, Arimoto S, Yoshizawa S. An active pulse transmission line simulating nerve axon. Proc IRE. 1962;50(10):2061–70.
  24. Xu Q, Chen X, Chen B, Wu H, Li Z, Bao H. Dynamical analysis of an improved FitzHugh-Nagumo neuron model with multiplier-free implementation. Nonlinear Dyn. 2023;111(9):8737–49.
  25. Reddy JN. Introduction to the finite element method. McGraw-Hill Education; 2019.
  26. Dragomiretskiy K, Zosso D. Variational mode decomposition. IEEE Trans Signal Process. 2014;62(3):531–44.
  27. Huang NE, Shen Z, Long SR, Wu MC, Shih HH, Zheng Q, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc Lond A. 1998;454(1971):903–95.
  28. Wu Z, Huang NE. Ensemble empirical mode decomposition: a noise-assisted data analysis method. Adv Adapt Data Anal. 2009;1(1):1–41.
  29. Zeiler A, Faltermeier R, Keck IR, Tomé AM, Puntonet CG, Lang EW. Empirical mode decomposition - an introduction. In: The 2010 International Joint Conference on Neural Networks (IJCNN). 2010. p. 1–8.
  30. Bertsekas DP. Constrained optimization and Lagrange multiplier methods. Academic Press; 2014.
  31. Shahriari B, Swersky K, Wang Z, Adams RP, de Freitas N. Taking the human out of the loop: a review of Bayesian optimization. Proc IEEE. 2016;104(1):148–75.
  32. Abdel-Basset M, Abdel-Fatah L, Sangaiah A. Metaheuristic algorithms: a comprehensive review. In: Computational intelligence for multimedia big data on the cloud with engineering applications. 2018. p. 185–231.
  33. Wang Z, He G, Du W, Zhou J, Han X, Wang J, et al. Application of parameter optimized variational mode decomposition method in fault diagnosis of gearbox. IEEE Access. 2019;7:44871–82.
  34. Slowik A, Kwasnicka H. Evolutionary algorithms and their applications to engineering problems. Neural Comput Appl. 2020;32(16):12363–79.
  35. Vinod Chandra SS, Anand HS. Nature inspired meta heuristic algorithms for optimization problems. Computing. 2022;104(2):251–69.
  36. Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput. 2002;6(2):182–97.
  37. Kennedy J, Eberhart R. Particle swarm optimization. In: Proceedings of ICNN'95 - International Conference on Neural Networks, vol. 4. IEEE; 1995. p. 1942–8. https://doi.org/10.1109/icnn.1995.488968
  38. Mirjalili S, Mirjalili SM, Lewis A. Grey wolf optimizer. Adv Eng Softw. 2014;69:46–61.
  39. Zolf K. Gold rush optimizer: a new population-based metaheuristic algorithm. Oper Res Decis. 2023;33(1).
  40. Su H, Zhao D, Heidari AA, Liu L, Zhang X, Mafarja M, et al. RIME: a physics-based optimization. Neurocomputing. 2023;532:183–214.
  41. Jia H, Rao H, Wen C, Mirjalili S. Crayfish optimization algorithm. Artif Intell Rev. 2023;56(S2):1919–79.
  42. Zhong C, Li G, Meng Z. Beluga whale optimization: a novel nature-inspired metaheuristic algorithm. Knowl-Based Syst. 2022;251:109215.
  43. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Adv Neural Inf Process Syst. 2017;30.
  44. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long and Short Papers); 2019. p. 4171–86.
  45. Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman F, et al. GPT-4 technical report. arXiv preprint. 2023. https://arxiv.org/abs/2303.08774
  46. Sharma S, Sharma S, Athaiya A. Activation functions in neural networks. Towards Data Sci. 2017;6(12):310–6.
  47. Deng W, Zhang X, Zhou Y, Liu Y, Zhou X, Chen H, et al. An enhanced fast non-dominated solution sorting genetic algorithm for multi-objective problems. Inf Sci. 2022;585:441–53.
  48. Antoni J. The spectral kurtosis: a useful tool for characterising non-stationary signals. Mech Syst Signal Process. 2006;20(2):282–307.
  49. Antoni J. Fast computation of the kurtogram for the detection of transient faults. Mech Syst Signal Process. 2007;21(1):108–24.
  50. Udmale SS, Singh SK. A mechanical data analysis using kurtogram and extreme learning machine. Neural Comput Appl. 2019;32(8):3789–801.
  51. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization: 5th International Conference, LION 5, Rome, Italy, January 17–21, 2011. Selected Papers. Springer; 2011. p. 507–23.
  52. Damianou A, Lawrence N. Deep Gaussian processes. In: Artificial Intelligence and Statistics. PMLR; 2013. p. 207–15.
  53. Garnett R. Bayesian optimization. Cambridge University Press; 2023.
  54. Bergstra J, Bardenet R, Bengio Y, Kégl B. Algorithms for hyper-parameter optimization. Adv Neural Inf Process Syst. 2011;24.