Abstract
Medical image fusion is a critical task in medical diagnosis, in which anatomical and functional information from different imaging modalities, e.g., Magnetic Resonance Imaging (MRI) and Computed Tomography (CT), is integrated. However, edge preservation, texture richness, and structural consistency remain major challenges in complex fusion scenarios. This paper presents a novel multimodal medical image fusion technique based on the Contourlet Transform for multiscale directional decomposition and a mean curvature filter for edge preservation. The proposed approach decomposes the source images into low-frequency and high-frequency components via a three-level Contourlet Transform. The low-frequency layers are fused via weighted averaging for brightness consistency, while the detail layers are processed by the mean curvature filter and then fused via maximum absolute selection to preserve edges and texture. The approach was evaluated on a variety of multimodal medical image datasets and showed consistent improvements over conventional methods such as Guided Filter Fusion (GFF), Laplacian Pyramid (LP), and Discrete Wavelet Transform (DWT). Experimental results showed average improvements of 19.4% in Spatial Frequency (SF), 17.6% in Average Gradient (AG), and 13.2% in Entropy (EN) over the baseline methods. The results demonstrate that the method is useful for medical applications such as brain tumor localization, tissue differentiation, and surgical planning, where high fidelity in fused images is critical.
Citation: Sharma S, Rani S, Dogra A, Shabaz M (2025) Structure-aware medical image fusion via mean curvature enhancement in the contourlet domain. PLoS One 20(9): e0332869. https://doi.org/10.1371/journal.pone.0332869
Editor: Lin Xu, Chengdu University of Traditional Chinese Medicine, Wenjiang Campus, CHINA
Received: May 10, 2025; Accepted: September 6, 2025; Published: September 29, 2025
Copyright: © 2025 Sharma et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are located at Github: https://surl.lt/jcruxh.
Funding: The author(s) received no specific funding for this work.
Competing interests: No authors have competing interests.
1 Introduction
Medical imaging is the foundation of diagnosis, treatment planning, and surgical guidance in modern healthcare. CT, MRI, Positron Emission Tomography (PET), Ultrasound (US), and Single-Photon Emission Computed Tomography (SPECT) are among the technologies providing different perspectives on the anatomical, functional, and molecular behavior of the human body. However, each modality on its own is limited in its ability to capture all clinically significant information [1,2]. CT scans, for instance, visualize bone structure in useful detail but offer poor soft-tissue contrast [3]. Conversely, MRI excels at soft-tissue differentiation but cannot depict bone structures with comparable clarity [4]. Despite their limited anatomical detail, functional modalities like PET and SPECT are essential for monitoring metabolic activity. Consequently, clinicians may need to evaluate several images separately to obtain a comprehensive view, which can be challenging and prone to diagnostic uncertainty [5].
Medical image fusion has developed as a key method for combining complementary features from several modalities into a single, more informative image to overcome these constraints. This approach ensures that both anatomical and functional characteristics are preserved and displayed concurrently, improves tissue contrast, and increases spatial resolution. By delivering richer visual representations, image fusion significantly supports decision-making, especially in scenarios such as tumor detection, neurodegenerative disease monitoring, vascular assessment, and image-guided interventions [2].
Traditionally, image fusion methods have combined data using statistical modeling, multi-scale transformations, and pixel-level operations. Although somewhat effective, these techniques often struggle with noise suppression, detail preservation, and maintaining structural integrity across modalities with varied resolutions [3]. Recent advances in Artificial Intelligence (AI), particularly Deep Learning (DL), have transformed the domain of medical image fusion and helped address these challenges. Data-driven models can now automatically learn spatial and semantic representations from large medical datasets, enabling more intelligent and adaptive fusion processes. Attention mechanisms, deep autoencoders, and Convolutional Neural Networks (CNNs) have all shown remarkable effectiveness in capturing hierarchical features, ensuring the accuracy and consistency of the fused image [4].
Newer developments such as diffusion models and Generative Adversarial Networks (GANs) have created new paradigms for image fusion beyond conventional DL architectures. These approaches offer efficient solutions for noise-aware reconstruction, simultaneous fusion and augmentation, and high-resolution fusion. Such models now make it possible to combine tri-modal data, enhancing the quality of fused outputs in medical practice [5].
Beyond the domain of DL, multimodal fusion is increasingly supported by optimization-driven designs and invertible networks, which allow bidirectional mapping and lossless feature translation. These methods ensure that fusion not only combines information but does so in a reversible and interpretable way, hence preserving the diagnostic value of the source modalities [6]. Moreover, ML approaches are being used to develop disease-specific fusion strategies, where systems are trained to identify and preserve the most diagnostically relevant features for conditions such as cancer, cardiovascular disorders, and neurodegenerative diseases [7].
These advances are also accompanied by the expansion of smart image analysis systems where fusion is combined with downstream activities such as segmentation, classification and anomaly detection. Hierarchical multi-scale fusion networks have further strengthened the granularity of fused features, resulting in improved performance in classification and recognition tasks [8]. As a result, fused images are no longer confined to visual augmentation but are actively integrated into end-to-end diagnostic pipelines [9].
Looking forward, medical image fusion research is moving toward explainability, real-time capability, and personalization. Future systems are expected to be more context-aware, incorporating patient history, contextual information, and feedback to dynamically drive the fusion process [10]. Additionally, fusion approaches will increasingly focus on adapting across imaging protocols, scanner types, and patient demographics, ensuring resilience and generalizability in real-world medical situations [11].
By integrating thorough and diagnostically rich representations, medical image fusion helps to overcome the constraints of single imaging modalities. With the inclusion of sophisticated DL, optimization algorithms, and scalable architectures, medical image fusion is becoming a transformative force in medical diagnostics, enabling more accurate, efficient, and individualized healthcare solutions.
The remainder of this paper is organized as follows: Sect 2 reviews related work, Sect 3 presents preliminaries, Sect 4 describes the preprocessing steps, Sect 5 details the proposed fusion methodology, Sect 6 introduces the evaluation metrics, Sect 7 outlines the experimental setup, Sect 8 reports and analyzes the results, Sect 9 presents the ablation study and Sect 10 concludes the paper with future research directions.
1.1 Motivation
Multimodal medical imaging, by merging structural and functional data from complementary modalities such as CT and MRI, is crucial for diagnosis. However, because of limitations in preserving directional attributes, edge details, and intricate anatomical textures, the effective integration of diverse modalities continues to pose challenges. Traditional wavelet-based methods lack the ability to capture geometrical structures effectively. The Nonsubsampled Contourlet Transform (NSCT), with its multiscale and multidirectional capabilities, has shown significant potential in medical image fusion. As demonstrated in earlier work, NSCT combined with advanced fusion rules such as spatial frequency or neural network models results in improved structural retention and contrast enhancement in fused outputs [12]. Motivated by these strengths, this study proposes a hybrid fusion framework that integrates Contourlet decomposition and Mean Curvature Filtering to generate informative and edge-preserving fused medical images.
1.2 Our contribution
The main contributions of this study are summarized below:
- By combining a multi-resolution Contourlet Transform with mean curvature-based filtering, a new medical image fusion approach is suggested to improve anatomical detail and visual contrast.
- The approach utilizes three-level Contourlet decomposition to obtain low and high-frequency components to best extract intensity and texture information from the input modalities.
- The low-frequency sub-bands are combined using a weighted-averaging strategy, whereas a mean curvature filter is applied to the third-level high-frequency details, which are then fused using the maximum-absolute rule.
- When assessed on several medical imaging datasets, the proposed fusion technique shows superior performance, outperforming current fusion techniques in terms of visual contrast in the fused images.
2 Related work
The following section discusses the work of various authors in the related field.
A DL-based approach for multimodal medical image fusion and categorization was proposed by Veeraiah et al. [13]. The objective was to enhance the accuracy of diagnosis by integrating fusion with disease classification. Although the method used neural networks to extract and combine characteristics from multiple imaging modalities, the study was ultimately withdrawn due to methodological and reproducibility concerns. This underscores the importance of comprehensive validation and open reporting in medical image fusion research. Robustness, dependability, and ethical research practice compliance should be the primary focus of future endeavors in this field.
Liu et al. [14] proposed a medical image fusion method based on convolutional neural networks. This approach learns feature mappings directly from the source modalities and demonstrated promising fusion quality in early deep learning research. However, its performance heavily depends on large datasets and network design parameters.
Agrawal et al. [15] introduced a simplified parameter-adaptive DCPCNN model. Their method balances complexity with performance by reducing the number of trainable parameters, making it more suitable for real-time or embedded fusion tasks.
Lin et al. [16] developed a multibranch, multiscale neural network utilizing semantic perception for the fusion of multimodal medical images. This method divides the fusion process into multiple branches across various scales to capture both fine and coarse features. A semantic module enables the network to focus on diagnostically relevant regions, hence improving the clarity and precision of the integrated image. This approach’s benefit is its preservation of structure and improvement of textural richness. However, the model is complex and may require significant computing resources. Future studies might emphasize optimizing the design while ensuring that the fusion quality is preserved.
Sinha et al. [17] suggested a multi-modal fusion method employing an improved dual-channel pulse-coupled neural network (PCNN). The method uses two parallel PCNNs to extract and synchronize features from the source images, hence improving both spatial and spectral information retention. The method reduces data loss and maintains edge information well. This model’s ability to match the firing activity of neurons across modalities is one of its key benefits. Conversely, PCNN-based methods can be costly in terms of computation and require precise parameter tuning. Future studies can focus on optimizing the dual-channel architecture for time-sensitive applications.
Zhang et al. [18] proposed a unified dictionary joint sparse model for the integration of medical images. This approach ensures the successful extraction and integration of common features by developing a shared dictionary across modalities. It retains the essential information from both images while eliminating redundancy, and it yields distinct textures and contours in the fused result. A disadvantage is that sparse coding may incur significant computational costs. Future advancements may involve the formulation of faster sparse approximation methodologies or the integration of this model into deep learning pipelines.
Jie et al. [19] introduced a multi-modality fusion approach using fuzzy set theory and compensation dictionary learning. This method balances noise suppression with detailed feature preservation and offers flexibility across imaging domains. However, it is sensitive to dictionary construction and may require tuning for medical applications.
Tang et al. [20] introduced MATR, a transformer-based approach for the fusion of multimodal medical images. The method utilizes multiscale adaptive transformers to capture long-range dependencies and cross-modal interactions. The model improves structural alignment and maintains intricate textures by using attention mechanisms at various scales. The primary advantage of this method is its strong contextual modeling, which surpasses most traditional convolutional techniques. Nevertheless, transformer models generally necessitate prolonged training periods and large training datasets. Future study may concentrate on hybrid models or lightweight transformer adaptations to improve efficiency in medical imaging.
Bavirisetti et al. [21] combined MRI and CT images using a guided image filter and image statistics. The technique retains important structural information from both source images by using guided filtering to preserve edges and statistical features to drive the fusion process. The main advantage of this method is that it preserves the clarity and contrast of the output, which is essential for medical analysis. However, its dependence on the accuracy of the statistical measures limits its applicability across imaging situations. Future research may focus on optimizing the statistical selection process so that it generalizes better across datasets.
Bavirisetti et al. [22] suggested a two-scale image fusion technique for combining visible and infrared images using saliency detection. Saliency maps are used to guide the fusion of the base and detail layers generated by the approach, which ensures that key features are adequately represented in the final image. The advantage of this approach is that it prioritizes important image regions, focusing on human visual perception. On the other hand, its performance could suffer in low contrast environments when important components are not well defined. Trial results demonstrated that the method enhances visual targets and eliminates unnecessary information. Adding adaptive saliency models to the system could improve the results of future research.
Liu et al. [23] presented a thorough framework for image fusion that combines sparse representation and multi-scale transformations. This technique uses the sparsity of signal representations in transform domains, including wavelet and contourlet transforms, to enable effective feature extraction and integration. The main benefit of this method is that it can be applied to a wide range of image data formats, including multi-modal and multi-focus images. It reduces data loss and guarantees spatial consistency. However, the approach requires significant computational resources for transform-domain operations and sparse coding. Across several datasets, their experimental results showed excellent fusion quality. Future research may focus on improving computational efficiency and exploring real-time implementations.
Zhu et al. [24] proposed a method for medical image fusion based on phase congruency and local Laplacian energy in the NSCT domain. The approach achieves high visual clarity by enhancing perceptual features and improving contrast but suffers from redundancy introduced by NSCT, which may impact scalability.
Kumar et al. [25] proposed a method based on pixel significance using the cross bilateral filter for image fusion. By combining images utilizing edge-preserving features and calculating the importance of each pixel, this method maintains sharp edges and reduces noise. The main advantage of this method is its ability to enhance structural features, particularly those close to object edges. However, the method may not scale well for high-resolution images or real-time applications because it operates at the pixel level. The results demonstrated that this approach effectively strikes a balance between detail retention and noise reduction. Future research might incorporate hierarchical or multi-resolution methods to improve performance and scalability.
Li et al. [26] devised a fusion technique based on guided filtering to enhance edge information and minimize artifacts in the fused image. Guided filtering offers rapid, edge-aware smoothing with minimal computational demand, making it suitable for high-quality fusion. This method’s notable advantage is its ability to handle many types of visual content with consistent performance. Conversely, it may be less effective in instances where input images exhibit significant modality discrepancies, resulting in the loss of complementary information. Their research confirmed that the method yields structurally coherent fused images. Future directions may prioritize adaptive filtering methods to more effectively address modality variance.
Kurban [27] proposed a lightweight and general-purpose fusion strategy called Gaussian of Differences (GoD). It simplifies the image fusion process by applying statistical Gaussian differences to identify and merge significant features, demonstrating competitive accuracy with high efficiency and low computational demand.
Song et al. [28] introduced D2-LRR, a dual-decomposed MDLatLRR-based fusion method. This technique shows strong performance in preserving local texture and enhancing salient features but can sometimes exaggerate contrast, making it prone to over-enhancement.
Kumar et al. [29] developed a fusion technique for multi-focus and multispectral images by integrating pixel significance with the discrete cosine harmonic wavelet transform. The method emphasizes critical pixel regions and employs frequency-domain decomposition to enhance focus and spectral fidelity. The primary advantage of this technique is its ability to preserve spectral and spatial information while minimizing noise. Its susceptibility to variations in illumination and ambient noise in the source images, however, may constitute a disadvantage. The results indicated improvements in both visual quality and quantitative fusion metrics. Future research may explore hybrid decomposition methodologies or adaptive thresholding to enhance robustness.

Punjabi et al. [30] proposed a deep learning-based framework for the classification of Alzheimer’s disease by combining MRI and PET neuroimaging modalities using convolutional neural networks (CNNs). The study showed that modality fusion significantly improved the classification performance compared to using single modalities. Specifically, the fused features extracted from the CNN enhanced the model’s ability to distinguish between the normal control, mild cognitive impairment, and Alzheimer’s disease groups.

Wu et al. [31] presented a fusion technique for infrared and visible images that combines the nonsubsampled contourlet transform (NSCT) with a pulse-coupled neural network (PCNN). The method utilizes dual-channel NSCT to decompose source images and PCNN for coefficient fusion, achieving better detail preservation and contrast enhancement in the fused image. Although primarily aimed at general imaging, the fusion principles are applicable to medical scenarios involving thermal and visual information.

Ogbuanya et al. [32] developed a hybrid optimization strategy for the fusion of multimodal medical images. The proposed approach integrated a hybrid of particle swarm optimization and artificial bee colony algorithms to optimize the fusion rules. The method demonstrated improved computational efficiency and fusion quality on various pairs of medical images, including MRI, CT, and PET scans.

Guo et al. [33] introduced a trimodality fusion technique to enhance target delineation in brain tumor radiotherapy. The study used MRI, CT, and PET images to create a complete representation of tumor regions. Their fusion method effectively combined anatomical and metabolic information, thus supporting more accurate radiotherapy planning.

Niroshana et al. [34] proposed a fused image-based method to detect obstructive sleep apnea using a single-lead ECG signal. The ECG was transformed into fused images that were then analyzed using a 2D CNN. The model achieved high classification performance, indicating that image-based representation of biosignals can be effective for diagnosing sleep disorders.

Jamil et al. [35] proposed a precancerous change detection method for mammography images that combines mean-ratio and log-ratio features with fuzzy c-means classification, using Gabor filters for texture enhancement. Their approach effectively detects subtle tissue changes indicative of early breast cancer, highlighting the role of hybrid feature extraction and clustering in medical imaging.
Javed et al. [36] presented a rare case study of Bing–Neel Syndrome, a central nervous system manifestation of Waldenström macroglobulinemia, which clinically mimicked giant cell arteritis. The report underscores the importance of accurate imaging interpretation and differential diagnosis in complex pathological presentations.
Abdullah et al. [37] introduced a joint learning framework for fake news detection that integrates multiple feature modalities within a unified model. Although developed for text-based misinformation detection, the architecture demonstrates the broader applicability of multi-feature fusion strategies, which can inspire cross-domain applications in medical image analysis.
Different multi-modal medical image fusion techniques have been explored, ranging from DL and transformer models to traditional filtering and sparse coding. Each has its strengths, e.g., edge preservation, semantic highlighting, or efficiency, and its trade-offs, e.g., high resource demands or sensitivity to image conditions. PCNN, guided filtering, and pixel-significance-based methods are robust in structural definition and visual enhancement.
A concise overview of commonly used fusion methods, including their datasets, key contributions, and limitations, is provided in Table 1.
3 Preliminaries
3.1 Contourlet transform
The Contourlet Transform is an image representation designed to address the limitations of traditional wavelet transforms, which cannot effectively capture directional and geometric structure in images. Unlike wavelets, which offer only a limited number of directions, the Contourlet Transform represents smooth edges and contours more effectively through its capacity to decompose images at multiple directions and scales [38].
3.1.1 Double filter bank structure.
The Contourlet Transform consists of a two-stage filtering process: a multiscale decomposition using a Laplacian Pyramid (LP) followed by a directional decomposition using a Directional Filter Bank (DFB) as shown in Fig 1 [39]. The LP is used to capture the point discontinuities (i.e., edges) in the image and decompose it into low-pass and band-pass subbands. The DFB then links the point discontinuities into linear structures by grouping frequency content in specific directions.
Let f(x,y) be the original grayscale image and set L_0 = f(x,y). The LP decomposition at level i is expressed as:

L_i = \left(h * L_{i-1}\right) \downarrow_2, \qquad H_i = L_{i-1} - \left(L_i\right) \uparrow_2

Here, h is a low-pass analysis filter, * denotes convolution, \downarrow_2 and \uparrow_2 denote downsampling and upsampling by a factor of 2, L_i is the low-frequency approximation at level i, and H_i is the corresponding band-pass detail component.
3.1.2 Directional decomposition.
Each high-pass detail component H_i is further decomposed directionally using the DFB. This stage more precisely represents edge information in several directions by splitting the frequency spectrum into wedge-shaped directional subbands. The directional decomposition can be expressed as:

D_{i,j} = \mathrm{DFB}_j\!\left(H_i\right), \qquad j = 1, 2, \ldots, 2^{k}

where D_{i,j} is the directional subband at scale i and direction j, and 2^{k} is the number of directional channels at that scale.
3.1.3 Reconstruction.
The reconstruction of the original image from Contourlet coefficients is achieved by first applying the inverse DFB on the directional subbands to recover the band-pass components, followed by the inverse LP to synthesize the image across scales. The inverse LP can be written as:

\hat{L}_{i-1} = \left(\hat{L}_i\right) \uparrow_2 + H_i

applied from the coarsest level to the finest. The final reconstruction is given by:

\hat{f}(x,y) = \hat{L}_0(x,y)
The transform guarantees perfect reconstruction under ideal filter conditions, making it suitable for both analysis and synthesis tasks.
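For concreteness, the listing below gives a minimal Python/NumPy sketch of the Laplacian Pyramid stage of this double filter bank, using a Gaussian kernel as the low-pass analysis filter; the directional filter bank stage is omitted, since it requires a dedicated contourlet toolbox. Function names, the kernel choice, and the interpolation order are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy import ndimage

def lp_decompose(image, levels=3, sigma=1.0):
    """Laplacian Pyramid analysis: returns the coarse approximation and details H_1..H_L."""
    low, highs = image.astype(float), []
    for _ in range(levels):
        down = ndimage.gaussian_filter(low, sigma)[::2, ::2]                # L_i = (h * L_{i-1}) downsampled by 2
        up = ndimage.zoom(down, 2, order=1)[:low.shape[0], :low.shape[1]]   # expand L_i back to the previous size
        highs.append(low - up)                                              # H_i = L_{i-1} - expanded(L_i)
        low = down
    return low, highs

def lp_reconstruct(low, highs):
    """Inverse LP: upsample the coarse approximation and add the details, coarse to fine."""
    rec = low
    for h in reversed(highs):
        rec = ndimage.zoom(rec, 2, order=1)[:h.shape[0], :h.shape[1]] + h
    return rec
```

Because each detail layer stores the exact residual of the expansion step, lp_reconstruct recovers the input up to floating-point error, mirroring the perfect-reconstruction property noted above.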
3.2 Curvature filter
Designed to enhance visual structures while reducing noise and variational artifacts, curvature filters are advanced edge-preserving smoothing techniques. These filters operate on the geometric properties of the image surface, regarding intensity values as a two-dimensional manifold lying within a higher-dimensional space. Emphasizing Total Variation (TV), Mean Curvature (MC), and Gaussian Curvature (GC) filters, this part describes the theoretical foundation and mathematical formulations of curvature filters.
3.2.1 Total Variation (TV) filtering.
Total variation filtering minimizes the overall variation in the image, promoting piecewise smoothness and edge preservation [40]. The total variation of an image u(x,y) is given by:

\mathrm{TV}(u) = \int_{\Omega} \left| \nabla u \right| \, dx \, dy

where \nabla u denotes the image gradient and \Omega is the image domain. The minimization of this functional leads to the well-known Rudin–Osher–Fatemi (ROF) model. The corresponding Euler–Lagrange equation governing the gradient descent evolution is:

\frac{\partial u}{\partial t} = \operatorname{div}\!\left( \frac{\nabla u}{\left| \nabla u \right|} \right)
This equation acts as a local curvature-based diffusion, smoothing the image in homogeneous regions while preserving sharp edges [41,42].
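A minimal explicit-scheme sketch of this gradient-descent evolution is shown below; the step size, iteration count, and the small regularization constant used to avoid division by zero are illustrative assumptions.

```python
import numpy as np

def tv_flow(u, iterations=50, dt=0.1, eps=1e-6):
    """Explicit gradient descent for du/dt = div(grad(u) / |grad(u)|)."""
    u = u.astype(float).copy()
    for _ in range(iterations):
        uy, ux = np.gradient(u)                                               # image gradient (rows, cols)
        mag = np.sqrt(ux**2 + uy**2 + eps)                                    # regularized gradient magnitude
        div = np.gradient(uy / mag, axis=0) + np.gradient(ux / mag, axis=1)   # divergence of the unit gradient
        u += dt * div                                                         # curvature-based diffusion step
    return u
```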
3.2.2 Gaussian Curvature (GC) filtering.
Gaussian curvature takes into account the intrinsic geometry of the surface formed by the image intensity. The Gaussian curvature at a point is the product of the principal curvatures k1 and k2, and for a 2D image u(x,y) it is expressed as:

K = \frac{u_{xx}\, u_{yy} - u_{xy}^{2}}{\left(1 + u_{x}^{2} + u_{y}^{2}\right)^{2}}

where u_x and u_y are the first-order partial derivatives, and u_{xx}, u_{yy}, and u_{xy} are the second-order partial derivatives of u. Gaussian curvature filtering introduces anisotropic smoothing by distinguishing between ridge-like and saddle-like features and is particularly effective for preserving fine structures [43].
3.2.3 Weighted mean curvature.
An enhanced version of MC filtering incorporates weights to adjust diffusion strength locally. The weighted mean curvature (WMC) is defined as:

H_w(x,y) = w(x,y)\, H(x,y)

where H(x,y) is the mean curvature and w(x,y) is a spatially varying function that controls the degree of smoothing based on local features [44].
These properties make curvature filters suitable for medical image fusion and other applications requiring structure-preserving enhancement.
3.3 Weighted averaging
Weighted averaging is a fundamental technique employed in pixel-level image fusion, particularly effective in the context of low-frequency component combination. It provides a straightforward yet reliable way to preserve the brightness and intensity consistency of the input images. This method is widely utilized due to its simplicity, computational efficiency, and ability to retain the overall visual structure of the source images.
Let I_1(x,y) and I_2(x,y) denote two spatially registered source images representing different imaging modalities, such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). The fused image F(x,y) at pixel location (x,y) using weighted averaging is defined as:

F(x,y) = \alpha\, I_1(x,y) + \left(1 - \alpha\right) I_2(x,y)

where \alpha \in [0,1] is the weighting coefficient assigned to I_1(x,y), and (1 - \alpha) is the weight assigned to I_2(x,y). A common choice for \alpha is 0.5, which ensures equal contribution from both modalities in the absence of any prior preference. This static or fixed weighting strategy is particularly effective when the goal is to combine the overall intensity features of both input images without introducing bias.
In the context of medical image fusion, particularly in low-frequency subband fusion derived from multiscale transforms like wavelets or contourlets, weighted averaging is used to maintain smooth intensity transitions and global contrast. Although it is effective in preserving average luminance, it may underperform in retaining edge features or fine textures when the source modalities differ substantially in information content. Therefore, it is commonly complemented with more sophisticated rules for high-frequency fusion, such as maximum selection or sparse representations [45].
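The rule above translates directly into code; the sketch below assumes the two inputs are spatially registered arrays of the same size, and the array names are illustrative.

```python
import numpy as np

def weighted_average_fusion(i1, i2, alpha=0.5):
    """Pixel-wise weighted averaging: F = alpha * I1 + (1 - alpha) * I2."""
    return alpha * i1.astype(float) + (1.0 - alpha) * i2.astype(float)

# equal contribution from both modalities, as in the fixed-weight case discussed above:
# fused_low = weighted_average_fusion(mri_low, ct_low, alpha=0.5)
```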
3.4 Maximum-absolute selection rule
The Maximum Absolute Selection Rule (MASR) is a commonly adopted decision-level fusion strategy used for combining high-frequency components of source images in multiscale transform domains. This method is especially suitable in medical image fusion where sharp anatomical details and edges must be preserved from multiple imaging modalities. As introduced by Prakash et al. [46], MASR has been effectively used in wavelet and pyramid-based decomposition frameworks, such as the Steerable Pyramid, to enhance the structural fidelity of fused outputs.
Mathematically, let H_1^{l}(x,y) and H_2^{l}(x,y) represent the high-frequency coefficients at the l-th decomposition level obtained from two source images, say MRI and CT, using a multiscale directional transform. The fused high-frequency coefficient H_F^{l}(x,y) at each pixel location (x,y) is computed using the maximum of the absolute values of the corresponding source coefficients:

H_F^{l}(x,y) =
\begin{cases}
H_1^{l}(x,y), & \text{if } \left| H_1^{l}(x,y) \right| \geq \left| H_2^{l}(x,y) \right| \\
H_2^{l}(x,y), & \text{otherwise}
\end{cases}
This rule ensures that the dominant feature from either modality is selected based on its local intensity variation strength. The rationale is that high-frequency subbands primarily capture edges, contours, and fine structural details. By choosing the coefficient with the highest absolute magnitude, the fused image retains sharper transitions and prominent features, which are crucial for diagnostic interpretation.
Unlike averaging-based fusion rules, which may result in edge blurring or attenuation, the MASR method preserves high-frequency texture without dilution. It is particularly useful in medical contexts where either MRI or CT might better capture specific pathological or anatomical structures. The computational simplicity of MASR also makes it attractive for real-time or embedded imaging systems where fast processing is essential.
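As a minimal sketch, the selection rule can be implemented in a single vectorized operation; the function name is illustrative and the inputs are assumed to be coefficient arrays of equal size from the same subband.

```python
import numpy as np

def max_abs_fusion(h1, h2):
    """Maximum-absolute selection: keep, per pixel, the coefficient with the larger magnitude."""
    return np.where(np.abs(h1) >= np.abs(h2), h1, h2)
```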
When integrated with multiscale frameworks like the Contourlet Transform, MASR enables efficient feature-level fusion across different directional subbands, offering enhanced spatial resolution and texture richness in the final fused image.
4 Preprocessing techniques
Effective preprocessing enhances the reliability of image fusion by preparing source modalities for integration. It ensures consistency in brightness, contrast, noise level, and structural visibility across multi-modal inputs, which is essential in medical imaging where variations in acquisition protocols and modality characteristics are common. The key preprocessing operations used to enhance fusion quality are summarized visually in Fig 2, illustrating their interconnected role in preparing multimodal images for effective integration.
Adaptive thresholding is a dynamic binarization approach where threshold values are computed based on the local characteristics of image regions. This technique is particularly useful in handling non-uniform lighting or varying tissue intensities, allowing more accurate edge definition and object extraction in complex medical images. It enables robust identification of salient features even in low-contrast conditions, and when combined with filtering, it improves spatial clarity and suppresses background interference [47,48].
Illumination correction addresses lighting inconsistencies that often arise from different imaging devices or acquisition parameters. Correcting uneven illumination enhances visual uniformity and prevents bias in fusion outputs. It is especially important when combining modalities like PET or SPECT with MRI, where lighting imbalance can distort integrated features. Preprocessing to normalize lighting conditions improves alignment and texture consistency across modalities [49].
Contrast normalization techniques, such as histogram equalization and intensity stretching, adjust the dynamic range of source images. These operations ensure that both modalities contribute equally to the fusion process by balancing intensity distributions. Without contrast normalization, dominant modalities may overshadow others, leading to information loss. Normalizing contrast across images supports fair feature extraction and fusion weight computation [50].
Noise reduction eliminates random fluctuations caused by electronic interference or low-dose acquisition. Techniques like Gaussian filtering, median filtering, and anisotropic diffusion smooth intensity values while preserving important anatomical edges. Removing noise from both modalities prior to fusion reduces the risk of false edge enhancement and improves metric stability, such as gradient and entropy calculations [51].
Edge enhancement highlights structural boundaries that are critical in medical interpretation. Preprocessing methods such as Laplacian sharpening or unsharp masking enhance high-frequency details, making edges more prominent in the fusion process. Enhanced edges contribute to better gradient computation, spatial frequency, and correlation retention in the final fused image, improving both subjective and quantitative quality [51].
Together, these preprocessing techniques form a robust foundation for high-quality fusion by improving structural consistency, enhancing informative content, and reducing imaging artifacts. Their integration into the fusion pipeline allows the method to generalize better across varied imaging conditions and datasets.
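To make the pipeline concrete, the sketch below chains a few of the preprocessing steps described above (median-filter denoising, percentile-based contrast normalization, and unsharp-mask edge enhancement). The specific filters, kernel sizes, and percentile limits are illustrative assumptions rather than the exact settings used in this work.

```python
import numpy as np
from scipy import ndimage

def preprocess(image):
    """Illustrative preprocessing chain: denoise, normalize contrast, enhance edges."""
    img = image.astype(float)
    img = ndimage.median_filter(img, size=3)                    # noise reduction
    lo, hi = np.percentile(img, (1, 99))                        # robust intensity range
    img = np.clip((img - lo) / (hi - lo + 1e-8), 0.0, 1.0)      # contrast normalization to [0, 1]
    blurred = ndimage.gaussian_filter(img, sigma=1.0)
    img = np.clip(img + 0.5 * (img - blurred), 0.0, 1.0)        # unsharp-mask edge enhancement
    return img
```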
5 Proposed methodology
The proposed framework in Fig 3 combines Contourlet Transform-based multiscale decomposition with mean curvature-guided edge enhancement to generate a high-quality fused medical image. The framework is designed to maintain the soft-tissue contrast of the MRI image and the structural details of the CT image while utilizing multi-resolution analysis and geometric filtering. The overall framework consists of four key phases: multiscale decomposition, low-frequency fusion, curvature-filtered high-frequency fusion, and image reconstruction.
5.1 Multiscale decomposition using contourlet transform
The input images are decomposed into low-frequency and high-frequency components using the Contourlet Transform. The decomposition is conducted up to three levels (L = 3). In this work, a three-level Contourlet decomposition is employed, as it provides an optimal balance between detail preservation and computational efficiency. Excessive decomposition beyond three levels may introduce redundancy or degrade image quality due to over-smoothing, as observed in prior studies [52]. At each level i, the decomposition is performed as:

L_i = \left(G_{\sigma} * L_{i-1}\right) \downarrow_2, \qquad H_i = L_{i-1} - \left(L_i\right) \uparrow_2

Here, G_{\sigma} denotes Gaussian smoothing with a large standard deviation \sigma, and \downarrow_2 represents downsampling by a factor of 2. This process isolates the texture and structural details (high-frequency components) from the general intensity information (low-frequency components).
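A minimal sketch of this per-level decomposition, applied to both registered source images, is given below. It mirrors the Laplacian Pyramid sketch from Sect 3.1.1 but uses Gaussian smoothing as the low-pass operator; the value of σ and the interpolation order are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def decompose(image, levels=3, sigma=2.0):
    """Three-level decomposition: returns the final low-frequency layer and the detail layers H_1..H_3."""
    low, highs = image.astype(float), []
    for _ in range(levels):
        down = ndimage.gaussian_filter(low, sigma)[::2, ::2]                # L_i = (G_sigma * L_{i-1}) downsampled by 2
        up = ndimage.zoom(down, 2, order=1)[:low.shape[0], :low.shape[1]]   # expand L_i to the previous resolution
        highs.append(low - up)                                              # H_i = L_{i-1} - expanded(L_i)
        low = down
    return low, highs

# per-modality decomposition (variable names are illustrative):
# low_mri, highs_mri = decompose(mri);  low_ct, highs_ct = decompose(ct)
```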
5.2 Low-frequency fusion
The low-frequency components L_{MRI} and L_{CT} represent smooth regions and contrast. A weighted averaging strategy is used to combine them:

F_L(x,y) = \alpha\, L_{MRI}(x,y) + \left(1 - \alpha\right) L_{CT}(x,y)

where \alpha = 0.5 denotes equal weighting. This ensures that both modalities contribute uniformly to the intensity structure of the fused image.
5.3 Curvature-based filtering
To improve fine details and suppress noise, the third-level high-frequency sub-bands are filtered with the mean curvature filter. Out of the various curvature-based filters, this work adopts the Mean Curvature (MC) filter due to its desirable trade-off between computational expense and effective edge enhancement. In comparison to Gaussian curvature, which concentrates on surface topology and may result in sudden jumps, MC offers a measure of average surface bending, hence providing smooth and isotropic regularization. The filter enhances complex structures without sacrificing significant edge transitions, making it particularly well-suited to filtering high-frequency subbands. In addition, in comparison to total variation approaches, MC filtering prevents staircase artifacts and promotes improved continuity along anatomical borders. Its stability, coupled with its capacity to denoise while retaining texture-rich details, makes it particularly well-suited to medical image fusion tasks where diagnostic precision is paramount [53].
The mean curvature evolution is described by the partial differential equation:

\frac{\partial u}{\partial t} = H\, \left| \nabla u \right|_{\epsilon}, \qquad \left| \nabla u \right|_{\epsilon} = \sqrt{u_x^2 + u_y^2 + \epsilon^2}

where \nabla u is the spatial gradient of the image, the regularized magnitude \left| \nabla u \right|_{\epsilon} ensures numerical stability (\epsilon is a small constant), and H denotes the mean curvature of the level set, computed as:

H = \operatorname{div}\!\left( \frac{\nabla u}{\left| \nabla u \right|_{\epsilon}} \right) = \frac{\partial n_x}{\partial x} + \frac{\partial n_y}{\partial y}

with n_x and n_y being the components of the normalized gradient (i.e., the unit normal vector). The equation is solved iteratively with a fixed time step \Delta t for 15 iterations.
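The listing below is a minimal explicit-scheme sketch of this evolution; the paper specifies 15 iterations, while the time step, the value of ε, and the use of NumPy finite differences are illustrative assumptions.

```python
import numpy as np

def mean_curvature_filter(u, iterations=15, dt=0.1, eps=1e-6):
    """Explicit mean curvature flow: du/dt = H * |grad(u)|, with a regularized gradient magnitude."""
    u = u.astype(float).copy()
    for _ in range(iterations):
        uy, ux = np.gradient(u)                                   # spatial gradient (rows, cols)
        mag = np.sqrt(ux**2 + uy**2 + eps**2)                     # regularized |grad(u)|
        ny, nx = uy / mag, ux / mag                               # unit normal components
        H = np.gradient(ny, axis=0) + np.gradient(nx, axis=1)     # mean curvature div(n)
        u += dt * H * mag                                         # curvature-driven update
    return u
```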
5.3.1 High-frequency fusion.
After curvature-based enhancement, the high-frequency components from each decomposition level are fused using a pixel-wise maximum absolute selection rule:

F_{H_i}(x,y) =
\begin{cases}
H_i^{MRI}(x,y), & \text{if } \left| H_i^{MRI}(x,y) \right| \geq \left| H_i^{CT}(x,y) \right| \\
H_i^{CT}(x,y), & \text{otherwise}
\end{cases}
This approach ensures that sharper edges and salient structural features are retained from either MRI or CT at every level i of decomposition.
5.4 Reconstruction of the fused image
Once the fused low-frequency component F_L and the fused high-frequency components F_{H_i} (for i = 1, 2, 3) are obtained, the fused image is reconstructed using the inverse operations:

\hat{F}_3 = F_L, \qquad \hat{F}_{i-1} = \left(\hat{F}_i\right) \uparrow_2 + F_{H_i}, \qquad F = \hat{F}_0

Here, \uparrow_2 denotes upsampling by a factor of 2, and the addition integrates details across scales in a coarse-to-fine manner.
The proposed strategy takes advantage of the inherent strengths of multiscale transforms and geometric filtering. By combining features across multiple frequency bands and applying curvature filtering at the final decomposition level, the method produces a maximally sharp and well-contrasted fused image. The experiments reported below verify that the hybrid solution greatly improves diagnostic image quality.
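Putting the four phases together, the sketch below outlines the full fusion pipeline. It reuses the helper functions sketched in the preceding sections (decompose, weighted_average_fusion, mean_curvature_filter, and max_abs_fusion) and is an illustrative reconstruction of the workflow, not the authors' MATLAB implementation; the level count and interpolation settings are assumptions.

```python
import numpy as np
from scipy import ndimage

def reconstruct(low, fused_highs):
    """Coarse-to-fine synthesis: upsample and add the fused detail layers."""
    rec = low
    for h in reversed(fused_highs):
        rec = ndimage.zoom(rec, 2, order=1)[:h.shape[0], :h.shape[1]] + h
    return rec

def fuse(mri, ct, levels=3):
    low_mri, highs_mri = decompose(mri, levels)                  # multiscale decomposition (Sect 5.1)
    low_ct, highs_ct = decompose(ct, levels)
    fused_low = weighted_average_fusion(low_mri, low_ct, 0.5)    # low-frequency fusion (Sect 5.2)
    fused_highs = []
    for i, (hm, hc) in enumerate(zip(highs_mri, highs_ct)):
        if i == levels - 1:                                      # curvature filtering of third-level details (Sect 5.3)
            hm, hc = mean_curvature_filter(hm), mean_curvature_filter(hc)
        fused_highs.append(max_abs_fusion(hm, hc))               # maximum-absolute selection (Sect 5.3.1)
    return reconstruct(fused_low, fused_highs)
```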
6 Evaluation metrics for image fusion
Medical image fusion is designed to merge complementary information from a collection of images into a single composite image. The quality of fusion techniques is evaluated with a variety of objective measures. In this section, the definitions and mathematical expressions of commonly used measures, namely API, SD, AG, Entropy, MIF, FS1, Corr, and SF, are presented.
6.1 Average Pixel Intensity (API)
API computes the mean intensity of the fused image F of size M \times N:

\mathrm{API} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} F(i,j)

Here, M and N represent the dimensions (rows and columns) of the fused image, and F(i,j) represents the intensity of the pixel at position (i,j). The greater the API value, the brighter the image, which is useful in medical imaging for better visibility and contrast.
6.2 Standard Deviation (SD)
SD quantifies the contrast and detail preservation in the fused image:

\mathrm{SD} = \sqrt{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( F(i,j) - \mu_F \right)^2}

Here, \mu_F is the average intensity of the fused image. The greater the value of SD, the greater the contrast and the greater the level of detail preservation, which is crucial in medical image analysis where fine details can be diagnostically significant.
6.3 Average Gradient (AG)
AG evaluates the accuracy and sharpness of the fused image:

\mathrm{AG} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \sqrt{\frac{\left( \partial F / \partial x \right)^2 + \left( \partial F / \partial y \right)^2}{2}}

Here, \partial F / \partial x and \partial F / \partial y represent the horizontal and vertical partial derivatives of the image intensity, respectively. A greater AG value implies a sharper image with better-defined edges, which is necessary for enhancing critical anatomical features in medical imaging.
6.4 Entropy
Entropy evaluates the information content of the fused image:

\mathrm{EN} = -\sum_{k} p_k \log_2 p_k

Here, p_k represents the probability of intensity level k in the image histogram. Entropy is a measure of randomness or information content; a higher entropy value suggests that more details have been preserved from the original images, leading to a richer, more informative fused image.
6.5 Mutual Information-based Fusion (MIF)
MIF assesses how much information the fused image retains from the source images A and B:

\mathrm{MIF} = \mathrm{MI}(F, A) + \mathrm{MI}(F, B)

where mutual information is defined as:

\mathrm{MI}(F, A) = \sum_{i} \sum_{j} p_{F,A}(i,j) \log_2 \frac{p_{F,A}(i,j)}{p_F(i)\, p_A(j)}

In this equation, p_{F,A}(i,j) represents the joint probability distribution of the fused image and source image A, while p_F(i) and p_A(j) are their marginal probability distributions. A higher MIF value indicates that the fusion process has effectively retained more information from the input images.
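A compact histogram-based estimate of the mutual information terms above can be written as follows; the bin count is an illustrative assumption and the images are assumed to have the same dimensions.

```python
import numpy as np

def mutual_information(x, y, bins=64):
    """Mutual information (in bits) estimated from the joint histogram of two images."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    p_xy = joint / joint.sum()                        # joint probability p(i, j)
    p_x = p_xy.sum(axis=1, keepdims=True)             # marginal p(i)
    p_y = p_xy.sum(axis=0, keepdims=True)             # marginal p(j)
    nz = p_xy > 0                                     # skip empty bins to avoid log(0)
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))

# MIF = mutual_information(fused, source_a) + mutual_information(fused, source_b)
```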
6.6 Fusion Symmetry 1 (FS1)
FS1 evaluates the symmetry in information retention between the two source images:

\mathrm{FS1} = \left| \frac{\mathrm{MI}(F, A)}{\mathrm{MI}(F, A) + \mathrm{MI}(F, B)} - 0.5 \right|
A value close to zero indicates balanced fusion, meaning the fusion process does not favor one input image over the other. This balance is essential to ensure that critical information from both images is preserved equally.
6.7 Correlation coefficient (Corr)
The correlation coefficient measures the similarity between the fused image and the source images:

\mathrm{Corr}(F, A) = \frac{\sum_{i} \sum_{j} \left( F(i,j) - \mu_F \right)\left( A(i,j) - \mu_A \right)}{\sqrt{\sum_{i} \sum_{j} \left( F(i,j) - \mu_F \right)^2 \, \sum_{i} \sum_{j} \left( A(i,j) - \mu_A \right)^2}}

In this equation, \mu_F and \mu_A represent the mean intensities of the fused image and source image A, respectively. A higher correlation coefficient suggests better preservation of structural similarity between the fused and original images, which is crucial for maintaining diagnostic integrity.
6.8 Spatial Frequency (SF)
SF measures the amount of detail in an image using row and column frequencies:

\mathrm{SF} = \sqrt{\mathrm{RF}^2 + \mathrm{CF}^2}

where row frequency (RF) and column frequency (CF) are defined as:

\mathrm{RF} = \sqrt{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=2}^{N} \left[ F(i,j) - F(i,j-1) \right]^2}, \qquad \mathrm{CF} = \sqrt{\frac{1}{MN} \sum_{i=2}^{M} \sum_{j=1}^{N} \left[ F(i,j) - F(i-1,j) \right]^2}

Here, RF represents the variations in intensity along rows (horizontal changes), and CF represents the variations along columns (vertical changes). A higher SF value implies greater sharpness and detail in the fused image, making it more useful for medical analysis.
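For reference, the sketch below computes three of the metrics defined above (SF, AG, and Entropy) with NumPy. It follows common formulations of these measures; boundary handling and bin counts are illustrative assumptions and may differ slightly from the implementation used for the reported results.

```python
import numpy as np

def spatial_frequency(f):
    """SF = sqrt(RF^2 + CF^2) from row and column intensity differences."""
    f = f.astype(float)
    rf = np.sqrt(np.mean(np.diff(f, axis=1) ** 2))   # row frequency (horizontal changes)
    cf = np.sqrt(np.mean(np.diff(f, axis=0) ** 2))   # column frequency (vertical changes)
    return np.sqrt(rf**2 + cf**2)

def average_gradient(f):
    """AG from horizontal and vertical partial derivatives."""
    gy, gx = np.gradient(f.astype(float))
    return float(np.mean(np.sqrt((gx**2 + gy**2) / 2.0)))

def entropy(f, bins=256):
    """Shannon entropy (bits) of the intensity histogram."""
    hist, _ = np.histogram(f, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))
```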
These metrics provide a quantitative evaluation of medical image fusion performance, guiding the development and assessment of fusion algorithms. By understanding these metrics, researchers can optimize fusion techniques to maximize information retention, contrast enhancement, and overall image quality.
7 Experimental setup
This section describes the dataset used and implementation details. The experiments are designed to ensure fairness in comparison with state-of-the-art methods.
7.1 Dataset description
The dataset comprises a variety of brain images that have been gathered from various medical imaging modalities, such as MRI and CT, containing several contrast-weighting methods and viewpoints as illustrated in Fig 4. These images record intricate structural and pathological data pertinent to medical analysis and diagnosis. Each set of images represents a different view of brain anatomy, offering distinct emphasis on particular features like gray-white matter distinction, cerebrospinal fluid (CSF) level, vascular structures and potential pathology.
The source image 1(A) is a T2-weighted MRI brain scan, which has high fluid content sensitivity. In this imaging modality, CSF is bright, and there is good gray-white matter differentiation. This image is especially helpful in the detection of pathological changes like edema, demyelination, infarctions, and other neurological disorders. The key anatomical landmarks are well demonstrated, such as the cerebral hemispheres with gray matter that is darker than white matter based on its content of myelin. The ventricular system is well demonstrated by the lateral and third ventricles, containing bright CSF. The thalamus and basal ganglia are well seen, lying closer to the midline of the brain. In addition, the corpus callosum presents as dense white matter on the structure uniting the two hemispheres. The brainstem and cerebellum are revealed partly on the posterior aspect, with optic nerves running from the orbits to the optic chiasm. The scan provides good morphology and pathology assessment of the brain, with a focus on fluid-based contrast.
The second image 1(B) of this series is a CT brain scan, obtained in the axial projection using a bony window setting. CT scans are extremely useful in assessing for dense structures like bone and acute hemorrhage. In this scan, the skull and cranial bones are hyperdense (bright white), and there is good visualization of bony anatomy. The brain tissue is seen in shades of gray, with the distinction between gray and white matter being less pronounced than in MRI. This view is especially helpful in identifying fractures, calcifications, and hemorrhagic occurrences. The ventricular system can be seen, but less clearly than in MRI scans. No overt indication of acute hemorrhage, midline shift, or structural anomalies is seen. The definition of the bony structures renders this scan useful in trauma evaluation and neurosurgical planning.
Image 2(A) is a T1-weighted magnetic resonance imaging (MRI) brain scan, characterized by the dark color of cerebrospinal fluid (CSF) and increased contrast between gray and white matter. In comparison to T2-weighted images, T1-weighted imaging renders white matter brighter than gray matter, thereby offering improved anatomical detail. This imaging modality is especially helpful for the evaluation of brain morphology, assessment of structural integrity, and identification of abnormalities such as tumors or hemorrhages. In this scan, the ocular orbits and globes are well delineated, and the vitreous body is hyperintense (bright). The cortical folding pattern, including gyri and sulci, is well demarcated, and no cortical atrophy is significant. The basal ganglia, such as the caudate nucleus, putamen, and globus pallidus, are well visualized. The brainstem and midbrain are also partially visualized, and the clear demarcation between gray and white matter offers critical information for neurological assessments.
The 2(B) image is a non-contrast CT scan of the brain optimized for the assessment of acute pathological states. CT scans in this mode are very sensitive to the detection of hemorrhages, fractures, and calcification. The skull is hyperdense, giving a good outline of cranial anatomy. The brain tissue shows intermediate density, with gray-white matter differentiation less marked than on MRI scans. The orbits and optic nerves are seen, as well as the frontal and ethmoid sinuses, which are aerated and normal in appearance. Symmetry of brain structures indicates no appreciable midline shift, mass effect, or acute pathology. This scan is especially useful in emergency situations, where quick evaluation of traumatic injury is required.
3(A) is a further T2-weighted MRI scan, focused on fluid contrast and showing possible pathological change. In this scan, high signal intensity (bright) areas correspond to cerebrospinal fluid (CSF) in the ventricles and sulci, and low signal intensity (dark) areas correspond to white matter structures. This scan is particularly focused on fluid accumulation and possible edema of the left temporal lobe and is suggestive of a pathological condition such as an infarct or infection. Important structures visible in this scan are the lateral ventricles, corpus callosum, basal ganglia, thalamus, and the cerebral cortex with its typical gyri and sulci. The good contrast between these structures makes this scan extremely useful for identifying abnormalities due to brain swelling, demyelination, or vascular disease.
The 3(B) is a post-contrast T1-weighted MRI, following administration of a contrast agent to maximize visualization of specific brain structures. In this projection, the CSF signal is dark, suppressed, and the signal of the white matter is brightened, with accentuation of vascular and structural detail. The ventricles are well delineated, and there is possible involvement of the corpus callosum and deep white matter. Post-contrast T1-weighted imaging is especially valuable to identify blood-brain barrier disruptions, tumors, inflammation, and lesions. The contrast enhancement facilitates better visualization of abnormalities that may not be as clear on non-contrast MRI.
Image 4(A) is an MRI axial T1-weighted image that demonstrates a large vascular malformation located in the right parietal lobe. It distinctly visualizes serpiginous signal voids corresponding to abnormal vascular structures, along with surrounding edema.
Image 4(B) is a Tc-99m HMPAO SPECT scan of the same brain region. This image highlights decreased radiotracer uptake specifically in the region of the malformation, offering functional information regarding cerebral perfusion patterns associated with the lesion.
Image 5(A) is an MRI axial T2-weighted image that clearly demonstrates a large tumor located in the left frontal lobe. The image also reveals prominent hyperintense surrounding edema, reflecting perilesional tissue reaction or fluid accumulation.
In contrast, Image 5(B) is an FDG PET image of the same brain region, which illustrates marked hypometabolism within the tumor, signifying reduced glucose uptake characteristic of certain tumor pathologies. The combination of these structural and functional imaging modalities provides complementary diagnostic information, making them suitable for validating the performance of the proposed image fusion technique.
The dataset consists of multiple medical imaging modalities that provide a comprehensive view of brain anatomy and pathology. The combination of T1-weighted MRI, T2-weighted MRI, post-contrast MRI, and CT scans ensures a broad spectrum of diagnostic capabilities, covering fluid differentiation, tissue contrast, bony anatomy, and pathological lesion detection. Each imaging modality contributes uniquely to the analysis of brain structures, facilitating in-depth medical research and assessments.
The dataset is publicly available at: “https://github.com/dawachyophel/medical-fusion/tree/main/MyDataset”, “https://www.med.harvard.edu/aanlib/.”
7.2 Implementation details
The proposed fusion approach has been tested using MATLAB on a standalone GPU-enabled machine with 16 GB RAM and an Intel Core i5/i7 processor. MATLAB provides a robust environment for image processing, multi-resolution analysis, and fusion techniques, thereby ensuring efficient computation and accurate results. The multi-scale transformations used in the compared fusion approaches, such as the Laplacian Pyramid and Curvelet Transform, are implemented through the CurveLab toolbox. Moreover, the system employs fusion rules such as maximum selection, averaging, and weighted fusion to process multi-modal medical images efficiently. This setup ensures high-quality fused images by extracting significant information from the input modalities without sacrificing structural and textural integrity.
8 Experimental results and analysis
For quantitative and qualitative evaluation, the performance of the proposed fusion framework was compared against multiple state-of-the-art techniques. The baselines include traditional multi-scale transform methods such as Laplacian Pyramid (LP), Discrete Cosine Harmonic Wavelet Transform (DCHWT), and Non-Subsampled Contourlet Transform (NSCT). Comparative deep learning-based approaches include a Convolutional Neural Network (CNN)-based fusion model and Guided Filtering Fusion (GFF). Additionally, the Cross Bilateral Filter (CBF) was included as a spatial-domain benchmark. These models were selected due to their frequent adoption in medical image fusion literature and their distinct representation capabilities. All methods were implemented or reproduced under the same dataset conditions to ensure fairness in evaluation.
8.1 Subjective (qualitative) evaluation
The qualitative performance of the proposed fusion method is assessed against representative baseline techniques, including gradient and filter-based approaches (GFF, GFS, CBF), multiscale methods (LP, NSCT, DWT, DCHWT), learning-based models (CNN, D2-LRR), and fuzzy/PCNN variants. Evaluation focuses on structural fidelity of anatomical boundaries (skull, cortex, ventricles), contrast balance between modalities, suppression of artifacts, and preservation of modality-specific features such as PET or SPECT uptake. These visual trends correspond to the quantitative metrics reported in later sections, but the emphasis here is on the interpretability of medical structures.
In Fig 5, the proposed fusion achieves sharp skull contours and well-defined cortical and ventricular boundaries while maintaining the soft-tissue contrast of the T2 MRI. Competing methods such as CBF and GFF exhibit mild blurring at tissue interfaces, LP outputs appear under-contrasted, and DCHWT/CNN variants introduce slight artifacts or excessive bone enhancement that reduces soft-tissue visibility. The proposed method retains cerebrospinal fluid (CSF)–gray/white matter delineation, a feature essential for assessing edema and structural anomalies.
As shown in Fig 6, the fusion output preserves gyri and sulci definition as well as basal ganglia detail, with balanced intensity distribution between T1 MRI and CT. GFF and LP show reduced soft-tissue contrast, and certain learning-based methods introduce halo artifacts. Methods that inadequately combine low-frequency CT content with high-frequency MRI detail tend to compromise skull definition, whereas the proposed approach maintains both cortical folds and clear bone structure, which is critical in stroke and atrophy analysis.
In Fig 7, the proposed method provides clear lesion boundaries, reduced background noise, and uniform CSF preservation. LP and NSCT maintain structural features but lose contrast in deep-gray regions such as the thalamus, while DCHWT and CNN highlight enhancing tumor features yet disrupt intensity uniformity. The proposed result clearly distinguishes contrast-enhanced areas, peritumoral edema, and fine structures such as the corpus callosum, which is important for neuro-oncological assessment.
As depicted in Fig 8, the proposed approach preserves detailed MRI anatomy, including ventricle geometry, midbrain contours, and cortical outlines, while effectively overlaying SPECT functional uptake. CNN and CBF outputs often allow functional information to dominate the anatomical background, whereas GFF and LP under-integrate the SPECT data, leading to visually under-informative fusion results. The balanced integration in the proposed method is suitable for functional localization and perfusion asymmetry evaluation.
Fig 9 shows that the fused result maintains crisp cortical and skull boundaries from CT along with high-fidelity PET activity representation. CBF and DCHWT often produce PET-dominant images with blurred anatomical backgrounds, while CNN-based outputs may over-enhance hot regions and distort skull outlines. LP and GFF show difficulty balancing anatomy with functional signal levels. The proposed fusion clearly displays both tumor boundaries and active regions, facilitating oncological and surgical planning.
Across all datasets, the proposed method consistently delivers sharper edges, balanced contrast, and reduced artifacts compared to competing techniques. Structural information from anatomical modalities is preserved while functional or contrast-enhanced features are faithfully integrated, improving interpretability and diagnostic value.
8.2 Objective evaluation
Evaluating the effectiveness of medical image fusion techniques depends on objective evaluation. Unlike subjective assessments, which rely on human perception, objective measures provide consistent and quantifiable evaluations of the degree to which the fused image retains important features from the input modalities. In this study, eight widely accepted metrics have been used for quantitative analysis: API, SD, AG, EN, MIF, FS1, Corr, and SF. These metrics collectively assess the brightness, contrast, sharpness, information content, symmetry in information preservation, and structural similarity of the fused output.
For SET1 as shown in Table 2, the proposed method achieved an API of 49.75 and SD of 63.64, surpassing conventional methods such as LP and CBF, which recorded lower contrast spreads. The AG value of 10.25 and SF of 24.89 indicated sharp edges and strong texture preservation, outperforming GFF and CNN-based methods which showed softer edges. The Entropy of 4.53 and MIF of 3.65 were competitive, reflecting rich information transfer, while the QAB/F of 0.8203 was higher than LP and DCHWT, confirming better edge retention.
In SET2 as shown in Table 3, the API of 43.39 and SD of 61.39 demonstrated balanced intensity and contrast, with AG of 10.14 exceeding LP and NSCT methods. The Entropy of 5.87 was among the highest across methods, suggesting rich information fusion. The QAB/F of 0.8529 outperformed most conventional methods, especially LP and GFF, but was slightly lower than CNN-based fusion in edge-specific tasks.
For SET3 as shown in Table 5, the proposed approach recorded the highest SD (78.18) and AG (11.98) among all compared methods, indicating superior edge sharpness and contrast distribution. The SF of 30.07 also exceeded competing approaches, particularly LP and GFF. The Entropy (4.43) and MIF (3.94) were competitive, showing strong modality feature integration. The QAB/F of 0.8293 was slightly higher than GFF and LP but close to CNN-based fusion.
In SET4 as shown in Table 4, the fusion result achieved a QAB/F of 0.8341, exceeding CNN, LP, and DCHWT approaches, confirming effective anatomical and functional feature integration. Although the API (20.27) and SD (43.00) were moderate, this was expected due to the mixed nature of SPECT and MRI data. The AG (6.43) and SF (16.38) values outperformed LP and CBF but remained slightly lower than CNN-based fusion, which tends to over-enhance edges.
Finally, for SET5, shown in Table 6, the proposed method recorded the highest SF (38.17) of all methods, confirming exceptional retention of fine detail and structure. The SD of 81.19 was also the highest, indicating a wide intensity spread. Although the QAB/F score (0.7302) was slightly lower than that of the CNN and GFF methods, the proposed approach maintained a strong balance between PET’s high-frequency metabolic detail and CT’s structural information, avoiding the over-smoothing observed in LP and CBF.
Overall, the dataset-wise comparisons show that the proposed method consistently excels in SD, AG, and SF, confirming superior contrast and detail preservation, while maintaining competitive QAB/F values across diverse modality combinations.
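Because QAB/F is reported throughout Tables 2–6 in addition to the eight metrics listed above, a compact sketch of the edge-based Xydeas–Petrović measure is provided below. The Sobel-based gradient estimation, the clipping of the orientation term, and the sigmoid constants are values commonly used in public implementations and are assumptions here; they should be verified against the original formulation before any quantitative comparison.

```python
import numpy as np
from scipy.ndimage import sobel

def _grad(img):
    """Sobel gradient magnitude and orientation of a grayscale image."""
    img = img.astype(np.float64)
    gx = sobel(img, axis=1)
    gy = sobel(img, axis=0)
    return np.hypot(gx, gy), np.arctan(gy / (gx + 1e-12))

def qabf(a, b, f):
    """Simplified Xydeas-Petrovic QAB/F: fraction of edge information
    transferred from sources a, b into the fused image f (higher is better)."""
    # Commonly reported sigmoid parameters (assumed, not taken from this paper).
    Gg, kg, sg = 0.9994, -15.0, 0.5   # edge-strength preservation
    Ga, ka, sa = 0.9879, -22.0, 0.8   # edge-orientation preservation

    gf, af = _grad(f)

    def q(src_g, src_a):
        # Relative edge strength: ratio of the weaker to the stronger gradient.
        g_rel = np.where(src_g > gf, gf / (src_g + 1e-12), src_g / (gf + 1e-12))
        # Relative orientation agreement, clipped to [0, 1] for simplicity.
        a_rel = np.clip(1.0 - np.abs(src_a - af) / (np.pi / 2), 0.0, 1.0)
        qg = Gg / (1.0 + np.exp(kg * (g_rel - sg)))
        qa = Ga / (1.0 + np.exp(ka * (a_rel - sa)))
        return qg * qa

    ga, aa = _grad(a)
    gb, ab = _grad(b)
    # Weight each source's contribution by its own edge strength.
    num = (q(ga, aa) * ga + q(gb, ab) * gb).sum()
    den = (ga + gb).sum() + 1e-12
    return float(num / den)
```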
8.3 Graphical analysis
The graphical analysis comprehensively visualizes the quantitative performance of the proposed fusion approach compared to multiple state-of-the-art techniques across all five datasets (SET1–SET5). Each graph illustrates the variation of key image quality metrics, enabling a multidimensional assessment of brightness, contrast, edge sharpness, informational content, and structural preservation.
In Graph 10, the proposed method performs consistently well across all metrics. It shows a balanced API, indicating moderate brightness that avoids both overexposure and underexposure, which is crucial for maintaining diagnostic clarity in fused outputs. The Standard Deviation (SD) value for the proposed method is among the highest, reflecting its strong ability to retain contrast variations across anatomical structures. The Average Gradient (AG) is notably elevated compared to conventional techniques, confirming enhanced edge preservation and visual sharpness. Furthermore, the Entropy and MIF values suggest that the fused images capture more complementary information from the source images without adding unnecessary noise. While FS1 remains moderately biased, the Correlation (Corr) value is high, indicating that structural similarity with the original inputs is well preserved. Lastly, the Spatial Frequency (SF) for the proposed method significantly surpasses most techniques, validating its efficacy in texture and detail preservation.
In Graph 11, a similar trend is observed where the proposed method delivers competitive or superior results across most metrics. Its API is slightly lower than in SET1, implying controlled brightness which is suitable for T1-weighted and CT fusion scenarios where intensity disparities need balance. The SD value is again high, suggesting effective contrast retention. The AG and SF scores confirm that the method enhances anatomical boundaries and texture richness better than most baseline approaches. The method also achieves favorable Entropy and MIF values, which indicate successful fusion of diverse modality features. The Correlation metric remains stable and high, reinforcing structural coherence, and the FS1 score shows acceptable symmetry in source contribution.
In Graph 12, which involves more complex modality combinations like T2 and post-contrast MRI, the proposed method demonstrates exceptional performance. It achieves one of the highest SD and SF scores in this dataset, confirming its robustness in enhancing contrast and preserving intricate texture features, which are essential for detecting soft-tissue abnormalities. The AG value is also elevated, suggesting strong edge retention despite the complexity of input contrasts. The API remains moderate, which supports consistent exposure levels across tissues. Entropy and MIF remain well-balanced, signifying effective integration of information without introducing randomness. Correlation remains strong, and FS1 indicates a fair contribution from both source modalities.
In Graph 13, which includes fusion between T2 MRI and SPECT, the proposed method maintains competitive performance despite the inherent modality dissimilarity. While the API and SD values are more moderate compared to earlier sets, the method achieves relatively high AG and SF scores, confirming its ability to preserve both edge information and spatial detail even when fusing structurally rich with functionally coarse data. The Entropy and MIF metrics indicate that the proposed method efficiently integrates metabolic and anatomical cues, while maintaining a controlled information density. FS1 is consistent with earlier sets, and the Correlation remains high, suggesting structural fidelity is well-preserved even in mixed modality conditions.
In Graph 14, where CT is fused with PET, the proposed method delivers some of its strongest results. The API and SD are higher here, reflecting enhanced brightness and contrast, which are desirable in highlighting metabolic activity from PET superimposed on anatomical CT. The AG and SF values are among the highest across all methods, demonstrating excellent preservation of structural boundaries and textural details. Entropy and MIF are both well-balanced, suggesting that the fusion process captures informative content from both modalities without redundancy. Correlation values are again strong, underscoring structural reliability, and FS1 maintains the usual moderate asymmetry observed in optimized fusion, where functional content may be subtly emphasized.
Collectively, these graphical plots affirm that the proposed fusion strategy consistently outperforms or closely rivals existing techniques across a wide spectrum of image quality metrics and modality types. Its ability to retain contrast, preserve anatomical structures, maintain visual clarity, and balance modality contributions makes it a robust and versatile solution for multimodal medical image fusion.
The implementation of the proposed fusion framework is publicly available in the repository referenced in the Data Availability statement.
9 Ablation study
To assess the contribution of individual components in the proposed fusion framework, an ablation study was conducted by creating different variants of the method, each obtained by selectively altering a specific module while keeping the rest of the pipeline unchanged. The analysis considered the roles of contourlet decomposition, weighted averaging of low-frequency components, and the max-absolute selection rule for high-frequency fusion. Starting from the complete proposed model, two key variants were generated: (i) replacing the max-absolute rule with simple weighted averaging for high-frequency fusion, and (ii) using the low-frequency subband from only one source image instead of combining both.
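For clarity, the sketch below expresses the fusion rules compared in this ablation as NumPy operations: the weighted-averaging and max-absolute rules used in the full model, and the two simplified variants. The subband arrays and the equal weights are illustrative assumptions rather than the exact configuration used in the experiments.

```python
import numpy as np

def fuse_low_frequency(low_a, low_b, w_a=0.5, w_b=0.5):
    """Full model: weighted averaging of the two low-frequency subbands."""
    return w_a * low_a + w_b * low_b

def fuse_high_frequency_max_abs(high_a, high_b):
    """Full model: keep the detail coefficient with the larger absolute value."""
    return np.where(np.abs(high_a) >= np.abs(high_b), high_a, high_b)

def fuse_high_frequency_averaged(high_a, high_b, w_a=0.5, w_b=0.5):
    """Ablation variant (i): average the detail subbands instead,
    which attenuates edges and lowers SF and QAB/F."""
    return w_a * high_a + w_b * high_b

def low_frequency_single_source(low_a, low_b, use_first=True):
    """Ablation variant (ii): take the low-frequency subband from one source
    only, which weakens contrast (API, SD) and structural fidelity."""
    return low_a if use_first else low_b
```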
The results, presented in Table 7, show that replacing the max-absolute fusion rule with simple weighted averaging leads to a reduction in performance across most metrics, particularly in Spatial Frequency (SF) and QAB/F, indicating a loss of sharpness and detail preservation. Similarly, using the low-frequency subband from only one image results in weaker overall performance, with notable drops in API, SD, and QAB/F, reflecting reduced contrast representation and structural fidelity. The complete proposed method consistently achieves higher QAB/F values (above 0.82 in most cases) and superior scores in the remaining metrics, confirming that combining low-frequency content from both sources and applying the max-absolute rule for high-frequency fusion are both essential for preserving modality-specific and fine structural details.
10 Conclusion and future work
This paper introduced a new hybrid medical image fusion technique that integrates the multiscale, directional representation of the Contourlet Transform with the edge-preserving behavior of the mean curvature filter. The proposed method applies a three-level Contourlet decomposition to separate the source images into low-frequency and high-frequency components. Low-frequency components are fused via weighted averaging to preserve global intensity and contrast, while high-frequency components are enhanced with the curvature filter at level 3 and fused via maximum absolute selection to preserve fine structural detail.
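A minimal sketch of this pipeline is given below, assuming a hypothetical contourlet analysis/synthesis pair (`decompose`, `reconstruct`) and using a simplified curvature-flow smoothing step as a stand-in for the mean curvature filter; it illustrates the data flow rather than the exact implementation.

```python
import numpy as np

def curvature_smooth(band, iterations=5, step=0.1, eps=1e-8):
    """Simplified curvature-flow smoothing (dI/dt = div(grad I / |grad I|)),
    used here as a stand-in for the paper's mean curvature filter: it damps
    noisy oscillations in a detail subband while largely preserving edges."""
    band = band.astype(np.float64)
    for _ in range(iterations):
        gy, gx = np.gradient(band)                  # gradients along rows / columns
        mag = np.sqrt(gx ** 2 + gy ** 2) + eps
        kappa = np.gradient(gx / mag, axis=1) + np.gradient(gy / mag, axis=0)
        band = band + step * kappa
    return band

def fuse_contourlet(img_a, img_b, decompose, reconstruct, levels=3):
    """Pipeline sketch: `decompose`/`reconstruct` are assumed to be a contourlet
    analysis/synthesis pair returning a lowpass band and, per level, a list of
    directional detail subbands (ordering depends on the toolbox convention)."""
    low_a, highs_a = decompose(img_a, levels)
    low_b, highs_b = decompose(img_b, levels)

    # Low-frequency: weighted averaging for brightness and contrast consistency.
    low_f = 0.5 * low_a + 0.5 * low_b

    # High-frequency: curvature-based enhancement at level 3 (as in the paper),
    # then max-absolute selection to retain the stronger detail coefficient.
    highs_f = []
    for level, (subs_a, subs_b) in enumerate(zip(highs_a, highs_b), start=1):
        fused_level = []
        for ha, hb in zip(subs_a, subs_b):
            if level == levels:
                ha, hb = curvature_smooth(ha), curvature_smooth(hb)
            fused_level.append(np.where(np.abs(ha) >= np.abs(hb), ha, hb))
        highs_f.append(fused_level)

    return reconstruct(low_f, highs_f)
```

In practice, `decompose` could wrap any contourlet toolbox that exposes a lowpass approximation plus per-level directional subbands; the fusion rules themselves are independent of the specific implementation.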
Extensive experimental comparisons on multimodal medical datasets confirmed the superiority of the proposed method over several classical and deep learning-based fusion methods. The method consistently recorded higher values for objective measures such as Entropy, Average Gradient, Spatial Frequency, and Mutual Information-based Fusion. For example, on SET3, the proposed method recorded an AG of 18.4231, Entropy of 4.8727, SF of 34.0673, and MIF of 1.7280, reflecting substantial improvements in texture retention, sharpness, and information content.
Although the method performs well, its scalability remains limited. Future work will explore adaptive weighting strategies for fusion, possibly driven by saliency detection or semantic segmentation to emphasize diagnostically relevant regions. The technique’s performance in real diagnostic tasks will also be examined through real-time evaluation and testing on large, modality-diverse datasets.
References
- 1. Azam MA, Khan KB, Salahuddin S, Rehman E, Khan SA, Khan MA, et al. A review on multimodal medical image fusion: compendious analysis of medical modalities, multimodal databases, fusion techniques and quality metrics. Comput Biol Med. 2022;144:105253. pmid:35245696
- 2. Zhou T, Li Q, Lu H, Cheng Q, Zhang X. GAN review: models and medical image fusion applications. Information Fusion. 2023;91:134–48.
- 3. Li Y, Zhao J, Lv Z, Li J. Medical image fusion method by deep learning. International Journal of Cognitive Computing in Engineering. 2021;2:21–9.
- 4. Kaur M, Singh D. Multi-modality medical image fusion technique using multi-objective differential evolution based deep neural networks. J Ambient Intell Humaniz Comput. 2021;12(2):2483–93. pmid:32837596
- 5. Xu Y, Li X, Jie Y, Tan H. Simultaneous tri-modal medical image fusion and super-resolution using conditional diffusion model. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. 2024. p. 635–45.
- 6. He D, Li W, Wang G, Huang Y, Liu S. MMIF-INet: multimodal medical image fusion by invertible network. Information Fusion. 2025;114:102666.
- 7. Peng Y, Deng H. Medical image fusion based on machine learning for health diagnosis and monitoring of colorectal cancer. BMC Med Imaging. 2024;24(1):24. pmid:38267874
- 8. Huo X, Sun G, Tian S, Wang Y, Yu L, Long J. HiFuse: hierarchical multi-scale feature fusion network for medical image classification. Biomedical Signal Processing and Control. 2024;87:105534.
- 9. Liang N. Medical image fusion with deep neural networks. Sci Rep. 2024;14(1):7972. pmid:38575689
- 10. Hu Y, Yang H, Xu T, He S, Yuan J, Deng H. Exploration of multi-scale image fusion systems in intelligent medical image analysis. In: 2024 IEEE 2nd International Conference on Sensors, Electronics and Computer Engineering (ICSECE). IEEE; 2024. p. 1224–9.
- 11. Kumar A. Deep learning for multi-modal medical imaging fusion: enhancing diagnostic accuracy in complex disease detection. Int J Eng Technol Res Manag. 2022;6(11):183.
- 12. Das S, Kundu MK. NSCT-based multimodal medical image fusion using pulse-coupled neural network and modified spatial frequency. Med Biol Eng Comput. 2012;50(10):1105–14. pmid:22825746
- 13. Veeraiah D, Sai Kumar S, Ganiya RK, Rao KS, Nageswara Rao J, Manjith R. RETRACTED: multimodal medical image fusion and classification using deep learning techniques. Journal of Intelligent & Fuzzy Systems. 2025;48(1_suppl):637–51.
- 14. Liu Y, Chen X, Cheng J, Peng H. A medical image fusion method based on convolutional neural networks. In: 2017 20th International Conference on Information Fusion (Fusion). IEEE; 2017. p. 1–7.
- 15. Agrawal C, Yadav SK, Singh SP, Panigrahy C. A simplified parameter adaptive DCPCNN based medical image fusion. In: Proceedings of International Conference on Communication and Artificial Intelligence: ICCAI 2021. 2022. p. 489–501.
- 16. Lin C, Chen Y, Feng S, Huang M. A multibranch and multiscale neural network based on semantic perception for multimodal medical image fusion. Sci Rep. 2024;14(1):17609. pmid:39080442
- 17. Sinha A, Agarwal R, Kumar V, Garg N, Pundir DS, Singh H, et al. Multi-modal medical image fusion using improved dual-channel PCNN. Med Biol Eng Comput. 2024;62(9):2629–51. pmid:38656734
- 18. Zhang C, Zhang Z, Feng Z, Yi L. Joint sparse model with coupled dictionary for medical image fusion. Biomedical Signal Processing and Control. 2023;79:104030.
- 19. Jie Y, Li X, Tan T, Yang L, Wang M. Multi-modality image fusion using fuzzy set theory and compensation dictionary learning. Optics & Laser Technology. 2025;181:112001.
- 20. Tang W, He F, Liu Y, Duan Y. MATR: multimodal medical image fusion via multiscale adaptive transformer. IEEE Trans Image Process. 2022;31:5134–49. pmid:35901003
- 21. Bavirisetti DP, Kollu V, Gang X, Dhuli R. Fusion of MRI and CT images using guided image filter and image statistics. International Journal of Imaging Systems and Technology. 2017;27(3):227–37.
- 22. Bavirisetti DP, Dhuli R. Two-scale image fusion of visible and infrared images using saliency detection. Infrared Physics & Technology. 2016;76:52–64.
- 23. Liu Y, Liu S, Wang Z. A general framework for image fusion based on multi-scale transform and sparse representation. Information fusion. 2015;24:147–64.
- 24. Zhu Z, Zheng M, Qi G, Wang D, Xiang Y. A phase congruency and local Laplacian energy based multi-modality medical image fusion method in NSCT domain. IEEE Access. 2019;7:20811–24.
- 25. Kumar SB. Image fusion based on pixel significance using cross bilateral filter. Signal, Image and Video Processing. 2015;9:1193–204.
- 26. Li S, Kang X, Hu J. Image fusion with guided filtering. IEEE Trans Image Process. 2013;22(7):2864–75. pmid:23372084
- 27. Kurban R. Gaussian of differences: a simple and efficient general image fusion method. Entropy (Basel). 2023;25(8):1215. pmid:37628245
- 28. Song X, Shen T, Li H, Wu XJ. D2-LRR: a dual-decomposed MDLatLRR approach for medical image fusion. In: 2023 International Conference on Machine Vision, Image Processing and Imaging Technology (MVIPIT). 2023. p. 24–30.
- 29. Kumar SB. Multifocus and multispectral image fusion based on pixel significance using discrete cosine harmonic wavelet transform. Signal, Image and Video Processing. 2013;7:1125–43.
- 30. Punjabi A, Martersteck A, Wang Y, Parrish TB, Katsaggelos AK, Alzheimer’s Disease Neuroimaging Initiative. Neuroimaging modality fusion in Alzheimer’s classification using convolutional neural networks. PLoS One. 2019;14(12):e0225759. pmid:31805160
- 31. Wu C, Chen L. Infrared and visible image fusion method of dual NSCT and PCNN. PLoS One. 2020;15(9):e0239535. pmid:32946533
- 32. Ogbuanya CE, Obayi A, Larabi-Marie-Sainte S, Saad AO, Berriche L. A hybrid optimization approach for accelerated multimodal medical image fusion. PLoS One. 2025;20(7):e0324973. pmid:40638635
- 33. Guo L, Shen S, Harris E, Wang Z, Jiang W, Guo Y, et al. A tri-modality image fusion method for target delineation of brain tumors in radiotherapy. PLoS One. 2014;9(11):e112187. pmid:25375123
- 34. Niroshana SMI, Zhu X, Nakamura K, Chen W. A fused-image-based approach to detect obstructive sleep apnea using a single-lead ECG and a 2D convolutional neural network. PLoS One. 2021;16(4):e0250618. pmid:33901251
- 35. Jamil R, Dong M, Bano S, Javed A, Abdullah M. Precancerous change detection technique on mammography breast cancer images based on mean ratio and log ratio using fuzzy c mean classification with gabor filter. Curr Med Imaging. 2024;20:e18749445290351. pmid:38803183
- 36. Javed A, Javed SA, Ostrov B, Qian J, Ngo K. Bing-neel syndrome: an unknown GCA mimicker. Case Rep Rheumatol. 2024;2024:2043012. pmid:39161396
- 37. Abdullah M, Hongying Z, Javed A, Mamyrbayev O, Caraffini F, Eshkiki H. A joint learning framework for fake news detection. Displays. 2025;:103154.
- 38. Do MN, Vetterli M. The contourlet transform: an efficient directional multiresolution image representation. IEEE Trans Image Process. 2005;14(12):2091–106. pmid:16370462
- 39. Po DDY, Do MN. Directional multiscale modeling of images using the contourlet transform. IEEE Trans Image Process. 2006;15(6):1610–20. pmid:16764285
- 40. Selesnick IW, Bayram I. Total variation filtering. 2010.
- 41. Louchet C, Moisan L. Total variation as a local filter. SIAM Journal on Imaging Sciences. 2011;4(2):651–94.
- 42. Gong Y, Sbalzarini IF. Curvature filters efficiently reduce certain variational energies. IEEE Trans Image Process. 2017;26(4):1786–98. pmid:28141519
- 43. Hildebrandt K, Polthier K. Anisotropic filtering of non-linear surface features. Computer Graphics Forum. 2004;23(3):391–400.
- 44. Gong Y, Goksel O. Weighted mean curvature. Signal Processing. 2019;164:329–39.
- 45. Singh S, Singh H, Bueno G, Deniz O, Singh S, Monga H. A review of image fusion: methods, applications and performance metrics. Digital Signal Processing. 2023;137:104020.
- 46. Prakash O, Kumar A, Khare A. Pixel-level image fusion scheme based on steerable pyramid wavelet transform using absolute maximum selection fusion rule. In: 2014 International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT). 2014. p. 765–70.
- 47. Chung KL, Chen WY. Fast adaptive PNN-based thresholding algorithms. Pattern Recognition. 2003;36(12):2793–804.
- 48. Mehendale ND, Shah SA. Image fusion using adaptive thresholding and cross filtering. In: 2015 International Conference on Communications and Signal Processing (ICCSP). IEEE; 2015. p. 144–8.
- 49. Song YS. Analyzing preprocessing for correcting lighting effects in hyperspectral images. Journal of the Korean Society of Industry Convergence. 2023;26(5):785–92.
- 50. Chitradevi B, Srimathi P. An overview on image processing techniques. International Journal of Innovative Research in Computer and Communication Engineering. 2014;2(11):6466–72.
- 51. Sonka M, Hlavac V, Boyle R. Image pre-processing. In: Image Processing, Analysis and Machine Vision. Springer; 1993. p. 56–111.
- 52. Dou J, Li J. Optimal image-fusion method based on nonsubsampled contourlet transform. Optical Engineering. 2012;51(10):107006.
- 53. Pan Y, Liu D, Wang L, Xing S, Benediktsson JA. A multispectral and panchromatic images fusion method based on weighted mean curvature filter decomposition. Applied Sciences. 2022;12(17):8767.