
FKG-MM: A multi-modal fuzzy knowledge graph with data integration in healthcare

  • Nguyen Hong Tan,

    Roles Methodology, Validation, Visualization, Writing – original draft

    Affiliations Graduate University of Science and Technology, Academy of Science and Technology, Hanoi, Vietnam, Faculty of Information Technology, Thai Nguyen University of Information and Communication Technology (ICTU), Thai Nguyen, Vietnam, Institute of Information Technology, Vietnam Academy of Science and Technology, Hanoi, Vietnam, Artificial Intelligence Research Center, VNU Information Technology Institute, Vietnam National University, Hanoi, Vietnam

  • Tran Manh Tuan,

    Roles Software, Writing – review & editing

    Affiliation Faculty of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam

  • Pham Minh Chuan,

    Roles Conceptualization, Formal analysis

    Affiliation Faculty of Information Technology, Hung Yen University of Technology and Education, Hung Yen, Vietnam

  • Nguyen Duc Hoang,

    Roles Data curation, Formal analysis, Visualization, Writing – original draft

    Affiliation School of Dentistry, Hanoi Medical University, Hanoi, Vietnam

  • Le Quang Thanh,

    Roles Data curation, Formal analysis, Investigation, Software

    Affiliation Faculty of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam

  • Le Hoang Son

    Roles Conceptualization, Funding acquisition, Project administration, Supervision, Validation, Writing – review & editing

    sonlh@vnu.edu.vn

    Affiliation Artificial Intelligence Research Center, VNU Information Technology Institute, Vietnam National University, Hanoi, Vietnam

Abstract

Artificial Intelligence (AI) has been widely applied in healthcare to support clinicians in disease diagnosis and prognosis. It is well known that an accurate diagnosis must be drawn from multiple sources of evidence, such as clinical records, X-ray images, and IoT data, collectively referred to as multi-modal data. Despite the existence of various approaches for multi-modal medical data fusion, the development of comprehensive systems capable of integrating data from multiple sources and modalities remains a considerable challenge. In addition, many machine learning models face difficulties in representation and computation due to the uncertainty and diversity of medical data. This study proposes a novel multi-modal fuzzy knowledge graph framework, called FKG-MM, which integrates multi-modal medical data from multiple sources and offers enhanced computational performance compared to unimodal data. The FKG-MM framework is based on the fuzzy knowledge graph model, which represents and computes effectively with medical data in tabular form. Through experimental scenarios on the well-known BRSET multi-modal diabetic retinopathy dataset, it is validated that the feature selection method, when combining image features with tabular medical data features, gives the most reliable results among five fusion methods: Feature Selection, Tensor Product, Hadamard Product, Filter Selection, and Wrapper Selection. The experiments also confirm that the accuracy of FKG-MM increases by 12–14% when image data is combined with tabular medical data, compared with related methods that diagnose using tabular data alone.

Introduction

Recent advances in Artificial Intelligence (AI) technologies have significantly transformed numerous domains, ranging from transportation and finance to healthcare and education. The utilization of AI methods in healthcare and medicine, particularly Machine Learning (ML) and Deep Learning (DL), has become widespread and has produced many models for the diagnosis, prognosis, and prediction of several diseases. The predominant dependence on unimodal data sources, such as X-ray images, intraoral photos, computed tomography (CT) scans, magnetic resonance imaging (MRI), and endoscopy images, poses distinct challenges for contemporary healthcare applications. These models often lack the integration of critical complementary data sources and multiple modalities, thereby constraining their ability to deliver in-depth insights [1]. Despite their effectiveness, most current methods are based on data from only one modality, which restricts their ability to mirror the clinical practice of combining diverse sources of information to enhance decision-making [2,3].

As diverse data sources in healthcare, including medical imaging and electronic health records, become increasingly accessible, the demand for effective integration and fusion of multi-modal information has grown, aiming to enable more comprehensive analysis and informed clinical decision-making [4]. The integration of data from diverse modalities, commonly referred to as multi-modal data fusion, has significantly influenced advancements in the healthcare domain [5]. The fusion of multi-modal data within healthcare is consistent with the foundational concepts of predictive, preventive, and personalized medicine (3PM), as it leverages heterogeneous data sources to enhance clinical decision-making [6]. This integrative methodology supports the development of predictive models, facilitates early intervention strategies, and promotes personalized treatment plans, ultimately enhancing patient outcomes and optimizing the efficiency of healthcare delivery.

In particular, Dongwei Xie et al. [7] introduced a multi-modal fusion framework utilizing a late fusion method to merge RGB and skeletal features, thereby improving the model’s capacity to capture both spatial and temporal dependencies. MDF-Net, a model proposed by Hsieh et al. [8] that derives from and extends the Mask R-CNN framework, is designed to simultaneously integrate clinical data and chest X-ray images, enabling more accurate identification of abnormal regions within chest radiographs. Integrating multi-modal data thus presents a promising approach, facilitating the development of AI-driven healthcare systems that offer more refined diagnostic assessments, accurate prognostic predictions, and tailored therapeutic interventions [4].

Research in the field of multi-modal medical diagnosis remains in its early developmental phase, with ongoing efforts focused on formulating novel methodologies to effectively process and analyze multi-modal medical data [9]. For example, COVID-19 has manifested as a highly lethal global pandemic, leading to the death of more than three million individuals across the world [10]. In response, a multi-modal diagnostic framework named Ai-CovScan was introduced to detect COVID-19 by integrating data from chest X-ray imaging, respiratory sounds, and rapid antigen test outcomes. This framework demonstrated a preliminary accuracy of 80% in analyzing breathing sound and achieved a COVID-19 detection accuracy of 99.66% on the chest X-ray image dataset [10]. AutoPrognosis-M presents a multi-modal methodology that facilitates the integration of structured clinical (tabular) data and medical imaging through automated ML techniques [2]. Similarly, Silva and Rohr [11] developed MultiSurv, a multi-modal approach designed for the diagnosis of over 30 cancer types. Furthermore, a study by Lu [12] introduced a multi-modal DL model that leverages multidimensional and multi-level temporal data to predict multi-drug resistance in patients with pulmonary tuberculosis.

Despite these advancements, prior studies have predominantly concentrated on the fusion of metadata and imaging features, often overlooking the exploration of the intrinsic relationship between these two modalities. Imaging and tabular data have been identified as the most frequently utilized modalities [13]. Imaging modalities serve a critical function in the diagnosis and monitoring of numerous medical conditions, benefiting from their integration into clinical workflows, standardized protocols, and the availability of specialized expertise for interpretation. Tabular data, representing structured clinical information, is equally vital for the holistic evaluation and management of patients, serving as a cornerstone for the implementation of personalized medicine [13]. Metadata offers complementary contextual information that can enhance the interpretation of imaging data, while image features inherently contain distinctive visual cues that may inform a more nuanced understanding of metadata [14]. Therefore, the integration of these modalities holds the potential to more effectively reveal features critical for accurate disease detection and classification.

Multi-modal data fusion enhances the precision and depth of clinical decision support systems by enabling a more holistic understanding of patient information in medical data analysis [15]. Knowledge graphs (KGs) serve as essential frameworks for knowledge representation and reasoning, facilitating more informed clinical inferences [15]. KGs structurally encode factual information through structured triples, composed of head and tail entities connected by a binary relation [16]. With the rapid evolution of KG technologies, they have been widely adopted in various applications, including semantic search [17], recommender systems [18], and question-answering tasks [19,20]. As a fundamental component of AI, KGs offer a powerful framework for knowledge representation; however, they often struggle to address ambiguities inherent in fuzzy semantic contexts [21]. To address this limitation, fuzzy knowledge graphs (FKGs) have been proposed as semantic networks that not only capture entity relationships but also provide a formal mechanism for representing real-world concepts and their relationships [22]. Despite their potential, existing FKG research has largely been confined to experiments using unimodal datasets, typically based on symptom data derived from test indicators, without evaluating performance on multi-modal datasets originating from diverse sources [23–25]. Furthermore, these FKG studies have yet to assess the influence of input features on predictive outcomes. Identifying the key symptoms that significantly impact diagnostic conclusions is essential for informed clinical decision-making and personalized treatment planning [26,27].

For the above reasons, this study develops a new multi-modal data fusion approach based on fuzzy knowledge graphs, addressing the integration of medical imaging data and electronic health records (EHRs) to enhance diagnostic accuracy over uni-modal approaches. It is then applied to support ophthalmological disease diagnosis. The contributions and novelties of this paper are as follows:

  • Proposing a new multi-modal data integration framework: A novel framework for multi-modal data fusion has been introduced, grounded in the principles of fuzzy rule-based systems and fuzzy knowledge graphs (FKG). This framework is specifically engineered to effectively manage the challenges posed by the high dimensionality and heterogeneity inherent in multi-modal data, thereby rendering it highly applicable to complex domains such as healthcare.
  • Proposing a method to fuse tabular data and image data in the medical field: This technique extracts important image features integrated with tabular data features to increase confidence in disease diagnosis.
  • A demonstrative experiment in the case study of ophthalmological disease: Comparative experiments are conducted on the benchmark medical BRSET dataset, which comprises retinal fundus images and patient metadata, to predict demographic characteristics and detect diabetic retinopathy. This demonstrates the applicability of the proposed method.

From an academic perspective, this paper proposes a framework that introduces novel techniques for fusing features from both image and tabular data, two commonly used data types in the healthcare domain. From a practical application standpoint, the paper demonstrates that the proposed model is both suitable and effective in integrating diverse data types to enhance the diagnosis of diabetic retinopathy.

Related works

Data modalities in healthcare

Data modalities in healthcare refer to the different types or forms of data that are collected, generated, and used within the healthcare system to monitor, diagnose, and treat patients. Healthcare encompasses a wide range of data modalities, as illustrated in Fig 1. Through appropriate data processing techniques, such as extracting features from medical images, structuring electronic health records (EHRs), and analyzing data from wearable devices, this raw information is transformed into structured, meaningful insights that can support clinical analysis and decision-making [28].

Medical imaging data.

Medical imaging serves a vital function in healthcare by offering essential diagnostic insights and supporting the management of a broad spectrum of medical conditions [14,2931]. It utilizes sophisticated imaging technologies to produce high-resolution visual representations of internal anatomical structures, thereby enabling doctors and clinicians to identify abnormalities, assess disease progression, and guide therapeutic interventions. Nowadays, medical imaging is progressively being combined with advanced data analytics and digital technologies, contributing to greater diagnostic precision, increased operational efficiency, and expanded access to healthcare services [32].

Electronic Health Records (EHRs).

Digital Medical Records function as a centralized archive of patient medical data, facilitating information access by healthcare providers [14,33]. The widespread adoption of EHR systems has led to a substantial growth in both the volume and complexity of patient-related data [34]. Despite their richness and patient-specific nature, these datasets are frequently fragmented and lack a standardized structure. These datasets involve a wide range of variables, including medication histories, laboratory results, imaging findings, physiological measurements, and clinical notes [35,36], thereby introducing analytical challenges due to their heterogeneity and complexity. Machine learning (ML) techniques offer a promising approach to address these challenges by uncovering complex, non-linear relationships within the diverse variables embedded in EHR datasets [37].

Other data modalities in healthcare.

Wearable devices have gained significant prominence in the healthcare field, offering promising capabilities for continuous monitoring and assessment of various health and wellness indicators. Typically designed for wear on the body or embedded in clothing and accessories, these devices capture real-time data on sleep behavior, physical activity, vital signs, and additional health indicators [38]. This data modality provides crucial understanding of an individual’s comprehensive health status, supporting customized health surveillance and the strategies for the prevention of disease [39]. Furthermore, wearable technologies enable remote patient monitoring, allowing healthcare professionals to track patients’ health conditions from a remote location and take appropriate action when required.

Similarly, sensor data is pivotal in the advancement of smart healthcare, as it allows for the live monitoring of physiological indicators and daily routines. The continuous collection of sensor data aids in the early identification and prompt intervention of potential health issues [40]. This form of data collection provides healthcare providers with objective and accurate information, enhancing clinical decision-making and enabling the development of personalized treatment regimens. For instance, sensor technologies are instrumental in managing chronic diseases such as diabetes or cardiovascular disorders through continuous monitoring of variables like blood glucose concentrations and heart rate variability. Real-time sensor data is also fundamental to telemedicine, digital health services, and remote monitoring of patients, thereby facilitating care delivery for patients with limited mobility or those in geographically isolated regions, reducing the necessity for frequent individual clinical visits [41,42].

Genomic data is crucial to the advancement of healthcare, providing vital insights into a person’s genetic makeup and its influence on health and disease [43]. Recent progress in genomic sequencing technologies has greatly enhanced the accessibility and affordability of acquiring personal genetic information. Genomic data supports a wide range of applications, including the prediction and diagnosis of hereditary conditions, along with the identification of genetic markers linked to disease susceptibility and treatment responsiveness [44]. Moreover, it serves as a cornerstone of personalized medicine by informing therapeutic strategies based on an individual’s specific genetic profile [45].

In smart healthcare, environmental data also holds substantial value for understanding the impact of environmental variables on personal health, such as humidity, ambient temperature, noise levels, pollution levels, air quality, and other environment-specific conditions. Incorporating environmental data into smart healthcare systems enables a more comprehensive understanding of how external conditions may impact patient health outcomes [46]. For instance, air quality monitoring allows for the identification of regions with elevated pollution levels, which is especially beneficial for persons suffering from respiratory diseases such as asthma.

Data fusion techniques

Data fusion techniques involve the combination of information from various sources to extract meaningful and actionable insights. These methods enhance the accuracy and reliability of inferences beyond what can be achieved through the analysis of individual data sources alone. Accordingly, data fusion aims to aggregate information originating from heterogeneous and, at times, complementary or competing sources, thereby capturing the collaborative dynamics within complex systems [47]. By transforming raw data into knowledge-driven insights, data fusion facilitates more informed and dependable decision-making processes, rendering it a valuable approach in the context of Structural Health Monitoring (SHM) systems [48].

Over time, a variety of mathematical frameworks have been employed to develop sophisticated data integration algorithms. Meng et al. [49] provided an extensive introduction and discussion of these methodologies. Their overview highlights the most widely adopted data fusion techniques, detailing their distinct features, associated challenges, advantages, and limitations within SHM applications. As outlined by Pires et al. [50], traditional data fusion approaches can generally be classified into three main categories: (1) probability-based methods, (2) evidence reasoning approaches, and (3) knowledge-based techniques. These classifications, along with corresponding data fusion strategies, are illustrated in Fig 2 [47].

Unimodal and Multi-modal data integration models

Diabetic retinopathy (DR) is a retinal disorder resulting from diabetes, predominantly impacting the retina’s structure and function. It represents one of the primary causes of visual impairment and blindness in developed nations [51]. Prior research on ophthalmic diagnostic models has highlighted the significant potential of image recognition-based artificial intelligence in automating tasks traditionally reliant on clinical expertise [52]. However, individual deep learning models often encounter limitations in effectively extracting critical features from complex retinal images. To address this challenge, Khan et al. propose a robust ensemble-based approach for diabetic retinopathy (DR) diagnosis, structured into four primary phases: image pre-processing, selection of pre-trained backbone models, feature enhancement, and optimization. The process begins with image pre-processing, wherein Contrast Limited Adaptive Histogram Equalization (CLAHE) is employed to enhance image contrast. The proposed model was evaluated using the multiclass APTOS 2019 Kaggle dataset, achieving an accuracy of 88.52% [53]. Meanwhile, many AI systems that support the diagnosis of eye diseases in general and diabetic retinopathy in particular have been recognized by prestigious organizations such as the FDA and are widely deployed [54]. Before being applied in practice, these AI systems must undergo an evaluation and verification process for algorithms and data [55]. This verification can be done by a team of experts in the field of ophthalmology or based on previously certified AI products [56].

The fusion of multi-modal medical data has become a revolutionary approach in the field of medicine, facilitating a more holistic insight into patient health conditions and enabling the development of personalized treatment strategies [28]. The process of constructing a multi-modal deep learning framework in healthcare generally adheres to the conventional machine learning pipeline, encompassing stages such as problem formulation, data preprocessing, model training, and performance evaluation [3]. Healthcare researchers are increasingly leveraging the benefits of multi-modality to improve clinical outcomes. For example, Pingali, L. introduced a multi-modal machine learning approach for predicting the progression of knee osteoarthritis by combining clinical data with plain radiographs [57]. Similarly, the study in [29] proposed a personalized, multi-modal, and cost-efficient Oral Health Advisor, capable of automatically classifying sensor-derived data and delivering interactive oral health guidance. Collectively, these studies underscore the advantages of multi-modal machine learning, particularly its ability to incorporate diverse data sources and enhance predictive accuracy by capturing a broader set of features through data fusion. Ophthalmology, as a field that extensively depends on multi-modal information, necessitates comprehensive patient histories alongside detailed visual assessments. As a result, multi-modal machine learning is gaining growing significance in advancing diagnostic capabilities within ophthalmic practice [58]. Specifically for diabetic retinopathy, there have been a number of review studies demonstrating the effectiveness and benefits of using multi-modal image data in diagnosing this disease, such as color photography, OCTA, or OCT, etc. [59,60]. In addition, the study by Restrepo et al. [5] in 2024 also conducted experiments on a multi-modal dataset combining image and tabular data. Therefore, there exists both a significant need and considerable potential for the continued advancement of multi-modal artificial intelligence models to support the diagnosis and triage of ophthalmic diseases.

Knowledge Graphs (KGs) and Fuzzy Knowledge Graphs (FKGs)

The rapid expansion of AI and big data technologies has underscored the critical need for effective methods of organizing and representing vast volumes of knowledge. Knowledge graphs (KGs) have emerged as a powerful tool for structuring and conveying real-world information through graph-based data models [61]. These characteristics have led to the increasing application of KGs across diverse domains, particularly in healthcare. Nevertheless, traditional KGs face limitations when dealing with datasets that contain ambiguous, incomplete, or uncertain information.

To address these challenges, Fuzzy Knowledge Graphs (FKGs) have recently emerged as a promising extension of KGs, combining fuzzy logic with approximate reasoning to improve inference capabilities in uncertain environments [23]. Introduced in 2020, FKGs integrate fuzzy inference mechanisms to identify implicit relationships and derive novel labels that cannot be inferred through conventional rule-based systems. Although the M-CFIS-FKG model proposed by Lan et al. has addressed several limitations inherent in traditional knowledge graphs, it remains constrained by its reliance on single entity pairs during the inference process. To overcome this limitation, Long et al. [24] introduced an enhanced model known as the pair-form fuzzy knowledge graph (FKG-Pairs) for the diagnosis of preeclampsia symptoms in pregnant women, aiming to facilitate decision-making in clinical settings where input datasets may be partially incomplete [62]. The effectiveness of the FKG-Pairs3 model has been demonstrated through its application in approximate reasoning for disease diagnosis within the domain of traditional medicine [63]. Furthermore, Long et al. [25] developed the FKG-Extreme model to support decision-making in complex or extreme cases, successfully applying it to the diagnosis of chronic kidney disease, demonstrating its utility in handling challenging medical scenarios [23,64].

Although FKG-based models have demonstrated strong capabilities in representing uncertain knowledge, a key limitation lies in their reliance on restricted datasets derived from a single source. To address this constraint, Tan et al. [65] introduced a novel conceptual framework, termed FKG-S, which integrates data from multiple sources. However, that work has several limitations: it was evaluated only on unimodal datasets, its multi-source data integration module was not extended to support diverse input data types, and it did not examine integration strategies that employ multiple data fusion techniques to effectively combine heterogeneous modalities.

The proposed FKG-MM framework

In this section, the proposed FKG-MM framework is introduced for integrating multi-modal data, based on the fuzzy knowledge graph (FKG) model and applied to the medical field. Firstly, the general framework is described step by step, from data collection to model representation and output. Next, the solution to support the diagnosis of diabetic retinopathy based on tabular data and medical image data is described.

The FKG-MM framework

The FKG-MM framework is based on the FKG model to integrate multi-modal data, such as image data, tabular data, text data, and EEG signal data, to support highly effective disease diagnosis. The framework consists of several components. The first collects and stores data from various sources and different data models. Next, the data is preprocessed and the different types of data are fused as a basis for model building. The final component is the predictive and diagnostic model that produces the results. The components of the FKG-MM framework are shown in Fig 3 and are described in detail as follows:

  1. Data Sources: In the era of big data, data comes from many sources and exists in many different models. Data can be structured or unstructured, and commonly takes forms such as images, text, time series, tables, and videos. These types of data are collected during the examination process or from medical devices. Input data sources play an important role in supporting patient diagnosis and treatment.
  2. Data storage: In this step, data collection modules are designed to gather data relevant to specific contexts, identify the data type of each dataset, and deliver it to the corresponding storage space; data is stored automatically in a distributed storage system, with different data types (text, images, videos, and audio) segregated accordingly.
  3. Multi-modal data fusion: Once the data is collected, multi-modal AI systems fuse the data types together. Unlike unimodal systems, multi-modal models use architectures that process multiple data formats simultaneously. For example, a model might analyze both the pixels of an image and its associated text annotations to better understand the visual content. This approach takes advantage of complementary strengths: text provides descriptive context, while images provide spatial and visual detail. To fuse these data types, multi-modal systems can use different strategies such as early fusion, late fusion, intermediate fusion, or feature concatenation (a brief sketch contrasting early and late fusion follows this list). This step is the most crucial step of the proposed model: instead of processing raw data, feature extraction and data fusion are performed. Details of this phase are presented in the next section.
  4. Fuzzy rule generation: This function utilizes common fuzzy logic mechanisms, such as FIS or expert knowledge, to transform the knowledge stored in the data repository into a fuzzy rule base (FRB).
  5. FKG generation: FKG is a type of knowledge graph that adds fuzzy relationships on the edges. The input of this module is the fuzzy rule base. Lan et al. [23] in 2020 published an algorithm to build FKG from FRB. Later, Long et al. [24] improved the FKG representation model into the FKG-Pairs model.
  6. FKGS sampling: Although FKG is suitable and effective when applied to datasets with missing or uncertain properties, its representational and computational complexity makes it difficult to apply to very large datasets. Graph sampling is an effective approach for FKG modeling when the input dataset is large. Within the framework, a purposeful random sampling algorithm is employed on the FKG to generate a sampled FKG, referred to as FKGS. FKGS retains all the basic properties of FKG, and computations are performed on FKGS instead of the full FKG, which significantly reduces the computational cost.
  7. Classification: In the classification module, the FISA algorithm [23] is utilized to compute on FKGS for the purpose of classification or aiding in decision-making.

As shown above, a multi-modal data integration framework based on the FKG fuzzy knowledge graph model has been described to improve the performance and reliability of diagnostic classification. However, in the medical field there are many different types of diseases, each with a different set of symptoms, so effective diagnosis requires different inputs from examination data sources. In addition, the FKG and FKGS models have proven their effectiveness when applied to disease diagnosis with a single data source in tabular form. For that reason, the FKG-MM framework is introduced with the aim of integrating image data with tabular data to diagnose diabetic retinopathy. The next section presents this integration model in detail.

A case study of FKG-MM for diabetic retinopathy disease diagnosis

This section describes the application of FKG-MM to the classification of diabetic retinopathy. The process comprises three main stages: initially, image feature extraction is conducted; subsequently, feature selection from tabular data is performed; and finally, the extracted image features are integrated with the selected tabular data features. Fig 4 depicts an overview of the proposed strategy. However, in real-world scenarios, the process of data collection and preparation often involves potential conflicts between different types of data. In such cases, it is necessary to perform disagreement checks to assess the degree of inconsistency, which helps determine the feasibility of unifying the data before proceeding with integration.

Fig 4. FKG-MM for diagnosis of diabetic retinopathy diseases.

Processing on medical image modality.

  1. Data preprocessing. As a crucial initial stage, preprocessing is applied to retinal fundus images to reduce noise and variability while enhancing image quality and contrast. Beyond contrast enhancement and noise suppression, this step also facilitates image normalization and correction of non-uniform illumination, helping to minimize artifacts and boost the accuracy of subsequent processing stages. Additionally, diabetic eye disease (DED) features are localized, extracted, and segmented from the fundus images to enable more accurate classification using pre-trained models.
    Image Resizing. Initially, a grayscale transformation is performed to convert input RGB images into grayscale images. A grayscale image contains only shades of gray, with no presence of red, green, or blue. Using grayscale images simplifies image processing tasks and reduces complexity.
    Noise Removal. Digital images may be affected by noise during acquisition, transmission, or subsequent processing stages. To mitigate or eliminate such noise, a variety of filtering techniques, such as Gaussian blurring, median blurring, mean blurring, and bilateral filtering, are commonly employed. This process adjusts digital images to improve subsequent analysis and aid in the identification of important features. Image enhancement is then performed through histogram equalization, which improves the contrast of the image.
  2. Segmentation. Segmentation primarily focuses on identifying similar regions within an image and dividing objects into distinct areas based on a threshold value. One of the most effective image segmentation techniques is Otsu’s thresholding, a histogram-based global thresholding method. This technique assumes that the image consists of two pixel categories (foreground and background) based on a bi-modal histogram. It determines the optimal threshold that maximizes the between-class variance or, equivalently, minimizes the within-class variance. Otsu’s thresholding is a nonlinear method for converting a grayscale image into a binary one. The largest segmented object is selected for further analysis, while smaller objects are removed to prevent inaccurate results. The image preprocessing and segmentation operations are described in detail in Algorithm 1.
  3. Image feature extraction. Feature extraction is a fundamental step in the analysis and identification of relationships among objects. Because image prediction, categorization, and recommendation algorithms cannot process images in their raw form, feature extraction is necessary to convert images into an interpretable format. Processing irrelevant features increases classifier complexity and computational load and reduces classification accuracy. To achieve precise image classification, it is necessary to extract sufficient relevant features. Segmenting images and extracting multiple features from different regions is the most effective approach. In this study, texture classification was performed using GLCM features, while various statistical features were employed to analyze color information for disease classification. Algorithm 2 details the feature extraction method.
    GLCM Features [66]. Initially, each image is processed through the Gray Level Co-occurrence Matrix (GLCM) method. The extracted GLCM features, along with their corresponding descriptions, are presented below:
    • Contrast reflects the spatial frequency of an image and is a moment of the GLCM, describing the difference between adjacent pixel values. It measures local variations; low contrast results in GLCM values concentrated near the main diagonal, indicating low spatial frequencies.
    • Homogeneity, or inverse difference moment, measures image uniformity and is higher when gray-level differences between pixel pairs are small. It is sensitive to elements near the GLCM diagonal and peaks when all pixel values are identical. Homogeneity inversely correlates with contrast: as contrast increases, homogeneity decreases, with energy remaining constant.
    • Energy is the square root of the angular second moment and increases when the image shows greater uniformity or structure.
    • Entropy measures the degree of randomness and uniformity among pixels within an image.
    • Correlation quantifies the degree of association between a pixel and its neighboring pixels across the entire image.
    • Angular Second Moment (ASM), which measures textural uniformity through pixel pair repetitions and detects texture irregularities. Its maximum value is 1, with higher values indicating a constant periodic gray-level pattern.

Algorithm 1 Image preprocessing and segmentation.

1: function UNSHARP_MASK(image)      ▷ Sharpen image using unsharp masking technique
2:   blurred ← GaussianBlur(image)
3:   sharpened ← image × (1 + α) − blurred × α
4:   return sharpened
5: end function
6: function APPLY_CLAHE(image)      ▷ Increase local contrast
7:   lab_image ← convert image to LAB color space
8:   l, a, b ← split lab_image
9:   clahe ← CLAHE object (clipLimit = 2.0, tileGridSize = (8,8))
10:   l ← apply clahe to l
11:   merged ← merge l, a, b and convert to BGR
12:   return merged
13: end function
14: function DENOISE_IMAGE(image)      ▷ Denoising using fast Non-local Means Denoising
15:   denoised ← apply fastNlMeansDenoisingColored to image
16:   return denoised
17: end function
18: function NORMALIZE_NUMERIC_FEATURES(dataframe)      ▷ Normalize numeric columns to the range [0, 1] or by Z-score
19:   for column in dataframe do
20:    if column is numeric then
21:     Apply Min-Max Scaling or StandardScaler
22:    end if
23:   end for
24:   return dataframe
25: end function
26: function PREPROCESS_IMAGE(image)      ▷ Image preprocessing pipeline
27:   sharpened ← UNSHARP_MASK(image)
28:   denoised ← DENOISE_IMAGE(sharpened)
29:   clahe_image ← APPLY_CLAHE(denoised)
30:   gray ← convert clahe_image to grayscale
31:   lesion_mask ← apply Otsu thresholding on gray
32:   return (clahe_image, lesion_mask)
33: end function
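For readers who prefer executable code, below is a minimal Python sketch of the pipeline in Algorithm 1 using OpenCV; the sharpening amount, denoising strengths, and CLAHE parameters are illustrative assumptions rather than the exact values used in the experiments.

import cv2
import numpy as np

def preprocess_image(image, amount=1.0):
    """Sharpen, denoise, enhance local contrast, and segment lesions with Otsu thresholding."""
    # Unsharp masking: subtract a Gaussian-blurred copy to emphasize fine detail.
    blurred = cv2.GaussianBlur(image, (0, 0), sigmaX=3)
    sharpened = cv2.addWeighted(image, 1.0 + amount, blurred, -amount, 0)

    # Fast non-local means denoising for color images.
    denoised = cv2.fastNlMeansDenoisingColored(sharpened, None, 10, 10, 7, 21)

    # CLAHE on the L channel in LAB color space to increase local contrast.
    lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

    # Otsu thresholding on the grayscale image yields a binary lesion mask.
    gray = cv2.cvtColor(enhanced, cv2.COLOR_BGR2GRAY)
    _, lesion_mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return enhanced, lesion_mask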

Algorithm 2 Image feature extraction.

1: function EXTRACT_GLCM_FEATURES(image_folder)      ▷ Extract texture features (GLCM) from retinal images
2:   Initialize empty list features_arr
3:   image_paths ← list all image files in image_folder
4:   for each path in image_paths do
5:    image ← read and resize image
6:    (enhanced_image, lesion_mask) ← PREPROCESS_IMAGE(image)
7:    variance_feature ← calculate_variance(data)
8:    std_dev_feature ← calculate_standard_deviation(data)
9:    rms_feature ← calculate_rms(data)
10:    mean_feature ← calculate_mean(data)
11:    Append (variance_feature, std_dev_feature, rms_feature, mean_feature) to features_arr
12:    gray_image ← convert enhanced_image to grayscale
13:    masked_image ← apply lesion_mask on gray_image
14:    normalized_image ← rescale pixel values into 16 gray levels
15:    glcm ← compute GLCM (distances = [1], angles = [0, π/4, π/2, 3π/4])
16:    features ← extract contrast, homogeneity, correlation, etc. from glcm
17:    Append features to features_arr
18:   end for      ▷ Normalize continuous features from images
19:   columns_to_normalize ← [“Variance Feature”, “Standard Deviation Feature”, “RMS Feature”, “Mean Feature”]
20:   for col in columns_to_normalize do
21:    if col ∈ features_arr.columns then
22:     features_arr[col] ← MIN_MAX_SCALING(features_arr[col])
23:    end if
24:   end for
25:   return features_arr
26: end function

Each of the listed GLCM features is computed by Eqs (1) to (6):

\mathrm{Contrast} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} (i-j)^{2}\, P(i,j) \qquad (1)

\mathrm{Homogeneity} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \frac{P(i,j)}{1+(i-j)^{2}} \qquad (2)

\mathrm{Energy} = \sqrt{\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} P(i,j)^{2}} \qquad (3)

\mathrm{Entropy} = -\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} P(i,j)\,\log P(i,j) \qquad (4)

\mathrm{Correlation} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \frac{(i-\mu_i)(j-\mu_j)\, P(i,j)}{\sigma_i \sigma_j} \qquad (5)

\mathrm{ASM} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} P(i,j)^{2} \qquad (6)

where P(i, j) represents the symmetrically normalized GLCM, N denotes the total number of gray levels in the image, \mu_i and \mu_j are the means of the row and column as in Eqs (7) and (8), and \sigma_i^{2}, \sigma_j^{2} correspond to the variances of the row and column as in Eqs (9) and (10):

\mu_i = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} i\, P(i,j) \qquad (7)

\mu_j = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} j\, P(i,j) \qquad (8)

\sigma_i^{2} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} (i-\mu_i)^{2}\, P(i,j) \qquad (9)

\sigma_j^{2} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} (j-\mu_j)^{2}\, P(i,j) \qquad (10)
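A minimal sketch of the GLCM computation with scikit-image is shown below, under the assumptions of 16 gray levels, a distance of 1, and the four standard orientations; entropy is computed here directly from the normalized matrix.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_image, levels=16):
    """Compute the six GLCM descriptors of Eqs (1)-(6) from an 8-bit grayscale image."""
    # Rescale pixel intensities into `levels` gray levels.
    quantized = np.floor(gray_image / 256.0 * levels).astype(np.uint8)
    glcm = graycomatrix(quantized, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=levels, symmetric=True, normed=True)
    feats = {prop: float(graycoprops(glcm, prop).mean())
             for prop in ("contrast", "homogeneity", "energy", "correlation", "ASM")}
    # Entropy (Eq 4), averaged over the four orientations.
    entropy = 0.0
    for k in range(glcm.shape[3]):
        p = glcm[:, :, 0, k]
        p = p[p > 0]
        entropy -= np.sum(p * np.log2(p))
    feats["entropy"] = float(entropy / glcm.shape[3])
    return feats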

Next, statistical features are extracted [67]. In this study, mean, root mean square, standard deviation, and variance are extracted as statistical features. The descriptions and formulas of the extracted features are provided below

  • Mean: The mean represents the average value of the color intensities across all pixels within the image.
  • Variance. The variance of an image measures how spread out the values are around the mean.
  • Standard Deviation. The standard deviation is calculated as the square root of the variance of the distribution.
  • Root Mean Square. The RMS is calculated as the square root of the mean of all squared intensity values.

Each of the listed statistical features is computed by Eqs (11) to (14), where I(i, j) denotes the intensity of pixel (i, j) in an image of M \times N pixels:

\mathrm{Mean:}\; \mu = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} I(i,j) \qquad (11)

\mathrm{Variance:}\; \sigma^{2} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} \bigl(I(i,j)-\mu\bigr)^{2} \qquad (12)

\mathrm{Standard\ deviation:}\; \sigma = \sqrt{\sigma^{2}} \qquad (13)

\mathrm{RMS} = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} I(i,j)^{2}} \qquad (14)
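These four descriptors reduce to a few lines of numpy; the sketch below assumes the input is the vector of (masked) grayscale pixel intensities.

import numpy as np

def statistical_features(pixels):
    """Mean, variance, standard deviation, and RMS of pixel intensities (Eqs 11-14)."""
    x = np.asarray(pixels, dtype=np.float64).ravel()
    mean = x.mean()                    # Eq (11)
    variance = x.var()                 # Eq (12)
    std_dev = x.std()                  # Eq (13)
    rms = np.sqrt(np.mean(x ** 2))     # Eq (14)
    return mean, variance, std_dev, rms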

Processing on medical tabular record modality.

  1. Tabular Data preprocessing. To ensure data quality, basic preprocessing operations for tabular data, such as noise and outlier removal, handling missing values, and eliminating duplicate data, are performed. After preprocessing, relevant attributes in the table that significantly impact the output label are selected. The specific steps to perform tabular data preprocessing are described in Algorithm 3.
  2. Tabular feature selection. First, Pearson correlation was used to remove highly correlated features and those with weak linear relationships by setting a correlation threshold, thus reducing dimensionality and mitigating the curse of dimensionality. Next, mutual information was used on the remaining features to identify those with strong dependencies on the target variable. Mutual information, based on entropy, quantifies the information shared between variables without assuming a specific data distribution. This two-step feature selection process helped retain only the most relevant features for further analysis (a runnable sketch of this two-step selection is given after Algorithm 3).

Algorithm 3 Tabular data preprocessing.

1: function LOAD_SELECTED_COLUMNS(file_path, columns)
2:   df ← ReadCSV(file_path)
3:   df ← df[columns]
4:   return df
5: end function
6: function MAP_CATEGORICAL_VALUES(df, mappings)
7:   for all (col, map_dict) ∈ mappings do
8:    df[col] ← Map(df[col], map_dict)
9:   end for
10:   return df
11: end function
12: function PREPROCESS_NUMERIC_COLUMNS(df, numeric_cols)
13:   for all col ∈ numeric_cols do
14:    df[col] ← ConvertToNumeric(df[col])
15:    df[col] ← FillNAWithMean(df[col])
16:   end for
17:   return df
18: end function
19: function REMAP_TARGET_LABELS(df, target_col, label_map)
20:   df[target_col] ← Map(df[target_col], label_map)
21:   return df
22: end function
23: function BALANCE_DATA(df, target_col, method)
24:   features ← Drop(df, target_col)
25:   labels ← df[target_col]
26:   balancer ← method(random_state = 42)
27:   (Xres, yres) ← FitResample(balancer, features, labels)
28:   return Concat(Xres, yres)
29: end function
30: function PROCESS_TABULAR_DATA(file_path, columns, cat_mappings, num_cols, target_col, label_map, balance_method, corr_threshold)
31:   df ← LOAD_SELECTED_COLUMNS(file_path, columns)
32:   df ← MAP_CATEGORICAL_VALUES(df, cat_mappings)
33:   df ← PREPROCESS_NUMERIC_COLUMNS(df, num_cols)
34:   df ← REMAP_TARGET_LABELS(df, target_col, label_map)
35:   df_balanced ← BALANCE_DATA(df, target_col, balance_method)
36:   return df_balanced
37: end function
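The following is a minimal sketch of the two-step tabular feature selection described above (Pearson correlation filtering followed by mutual information ranking), using pandas and scikit-learn; the correlation threshold and the number of retained features are illustrative assumptions.

import pandas as pd
from sklearn.feature_selection import mutual_info_classif

def select_tabular_features(df, target_col, corr_threshold=0.9, top_k=10):
    """Drop highly correlated columns, then keep the top-k features by mutual information."""
    X = df.drop(columns=[target_col])
    y = df[target_col]

    # Step 1: remove one column from every pair whose |Pearson r| exceeds the threshold.
    corr = X.corr().abs()
    cols = list(corr.columns)
    to_drop = set()
    for i, c1 in enumerate(cols):
        for c2 in cols[i + 1:]:
            if corr.loc[c1, c2] > corr_threshold:
                to_drop.add(c2)
    X = X.drop(columns=list(to_drop))

    # Step 2: rank the remaining features by mutual information with the target.
    mi = pd.Series(mutual_info_classif(X, y, random_state=42), index=X.columns)
    return mi.sort_values(ascending=False).head(top_k).index.tolist()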

Multi-modal feature fusion from medical features.

Current multi-modal fusion methods, such as early or intermediate fusion and various feature combination techniques (e.g., concatenation or learning-based), often depend on deep learning methods, which inflate feature dimensions, especially with image data, making meaningful feature selection difficult. Moreover, feature imbalance between data types (e.g., tabular vs. image) complicates integration. Our approach uses simple mathematical operations and carefully selected features from preprocessed data, reducing complexity and enhancing interpretability compared to deep learning-based methods, which often lack generalizability in diverse medical contexts.

Below are five methods that can be used to combine image features and tabular data features.

  1. Feature Selection Fusion: This method evaluates the importance of each individual feature and retains only the most important features from both sources. It is a simple but effective approach to eliminate redundant data and reduce the dimension of the feature vector (a runnable sketch of this strategy, together with Hadamard product fusion, is given after this list).
    Algorithm:
    (a) Step 1. Calculate importance scores for each feature in Fimg and Ftab (e.g., using mutual information, variance, or feature importance from models like Random Forest).
    (b) Step 2. Sort features by importance scores in descending order.
    (c) Step 3. Select the kimg and ktab most important features from each source.
    (d) Step 4. Combine the selected features into the Ffused vector.

    The feature selection method is described specifically in Algorithm 4.
    Algorithm 4 FeatureSelectionFusion.
    1: function FEATURESELECTIONFUSION(Fimg, Ftab, k_img, k_tab)
    2:   img_scores ← COMPUTEFEATUREIMPORTANCE(Fimg)
    3:   tab_scores ← COMPUTEFEATUREIMPORTANCE(Ftab)
    4:   sorted_img_indices ← SORTINDICESDESCENDING(img_scores)
    5:   sorted_tab_indices ← SORTINDICESDESCENDING(tab_scores)
    6:   selected_img_indices ← sorted_img_indices[1 : k_img]
    7:   selected_tab_indices ← sorted_tab_indices[1 : k_tab]
    8:   Ffused ← [ ]
    9:   for all i ∈ selected_img_indices do
    10:    Append(Ffused, Fimg[i])
    11:   end for
    12:   for all j ∈ selected_tab_indices do
    13:    Append(Ffused, Ftab[j])
    14:   end for
    15:   return Ffused
    16: end function
    Illustrative example 1. Suppose there is an input set consisting of
    By applying the algorithm step by step, the result is obtained:
    Step 1: Suppose the importance of the attributes can be calculated as follows:
    Step 2: Sort the indices in descending order based on scores:
    Step 3: Select the top-k indices:
    Step 4: Initialize an empty fused feature list:
    Add selected image features to the fused list:
    Add selected tabular features to the fused list:
    Output.
    After being calculated, the output result shows the selection of 3 image features and 2 tabular data features with the highest level of influence.
  2. Tensor Product Fusion: This method uses the Tensor Product to model interactions between all feature pairs from the two data sources. This allows capturing complex non-linear relationships that simpler methods cannot detect. To handle the large dimensionality issue, the method uses a low-rank approximation.
    Algorithm:
    (a) Step 1. Normalize the features.
    (b) Step 2. Compute the Tensor Product between the two feature sets.
    (c) Step 3. Use SVD decomposition to reduce the dimensionality of the resulting tensor.
    (d) Step 4. Create the final feature representation from the projection matrices.

    The Tensor Product method is described specifically in Algorithm 5.
    Algorithm 5 Tensor product fusion.
    1: function TENSOR_PRODUCT_FUSION(Fimg, Ftab, rank)
    2:   normalizedFimg ← NORMALIZE(Fimg)
    3:   normalizedFtab ← NORMALIZE(Ftab)
    4:   fullTensorProduct ← TENSORPRODUCT(normalizedFimg, normalizedFtab)
    5:   (U, S, V) ← TRUNCATEDSVD(fullTensorProduct, rank)
    6:   sqrtS ← COMPUTESQRTDIAGONAL(S)
    7:   Pimg ← U · sqrtS
    8:   Ptab ← V · sqrtS
    9:   imgProjection ← normalizedFimg · Pimg
    10:   tabProjection ← normalizedFtab · Ptab
    11:   fusedFeatures ← Concatenate(imgProjection, tabProjection)
    12:   return fusedFeatures
    13: end function
    Illustrative example: It is supposed that there is:
    • An image feature matrix (2 samples, each with 3 features)
    • A text feature matrix (2 samples, each with 2 features)


    Applying the Tensor Product algorithm (see Illustrative example 2 in the Appendix for details), the following result is obtained.
    Result.
    The final fused feature matrix has dimensions , where:
    • The first two columns contain the projected image features.
    • The last two columns contain the projected text features.
  3. Hadamard Product Fusion: This method projects features from both sources into a common space of the same dimension, then applies the Hadamard product (element-wise multiplication) to capture direct interactions between corresponding components. This ensures that the integrated feature exploits correlations between aligned features.
    Algorithm:
    (a) Step 1. Project features from both sources into a common space of the same dimension.
    (b) Step 2. Normalize the projected features.
    (c) Step 3. Compute the Hadamard product (element-wise multiplication).
    (d) Step 4. Apply a non-linear transformation and combine with the original projected features.

    The Hadamard Product method is described specifically in Algorithm 6.
    Algorithm 6 Hadamard product fusion.
    1: function HADAMARD_PRODUCT_FUSION(Fimg, Ftab, commonDim)
    2:   Wimg ← INITIALIZEMATRIX(p, commonDim)
    3:   Wtab ← INITIALIZEMATRIX(q, commonDim)
    4:   imgProjected ← Fimg · Wimg
    5:   tabProjected ← Ftab · Wtab
    6:   imgNormalized ← L2NORMALIZE(imgProjected)
    7:   tabNormalized ← L2NORMALIZE(tabProjected)
    8:   hadamardProduct ← ELEMENTWISEMULTIPLY(imgNormalized, tabNormalized)
    9:   activatedFeatures ← RELU(hadamardProduct)
    10:   fusedFeatures ← CONCATENATE(activatedFeatures, imgNormalized, tabNormalized)
    11:   fusedFeatures ← LINEARPROJECTION(fusedFeatures, commonDim)
    12:   return fusedFeatures
    13: end function
    Illustrative example: It is supposed that there is:
    • Image feature matrix (2 samples, each with 3 features)
    • Tabular feature matrix (2 samples, each with 2 features)
    • Common dimension:
    Matrices are:
    Applying the Hadamard algorithm (see Illustrative example 3 in the Appendix for details), the following result is obtained.
    Result. The final output of the Hadamard Product Fusion algorithm is:
    This fused representation combines information from both image and tabular features, capturing the interactions between corresponding dimensions through the Hadamard product.
  4. Filter-based multi-modal Feature Selection: Filter-based feature selection evaluates each feature independently using statistical measures, without involving the learning algorithm. For multi-modal fusion, it selects the most relevant features from each modality (image and tabular) while considering both intra-modal redundancy and inter-modal correlation.
    Algorithm:
    (a) Step 1. Calculate feature importance scores for each feature in Fimg and Ftab using multiple criteria (mutual information and random forest importance).
    (b) Step 2. Rank features by importance scores within each modality.
    (c) Step 3. Select a larger initial set of candidate features from each modality.
    (d) Step 4. Remove highly correlated features within each modality to reduce redundancy.
    (e) Step 5. Analyze cross-modal correlation to ensure complementary information.
    (f) Step 6. Combine the selected features from both modalities to form Ffused.

    The Filter-based multi-modal Feature Selection method is described specifically in Algorithm 7.
    Algorithm 7 Filter multi-modal selection.
    1: function FILTER_MULTI-MODAL_SELECTION(Fimg, Ftab, target, k_img, k_tab)
    2:   imgScores ← COMPUTEFEATUREIMPORTANCE(Fimg, target)
    3:   tabScores ← COMPUTEFEATUREIMPORTANCE(Ftab, target)
    4:   sortedImgIndices ← SORTINDICESDESCENDING(imgScores)
    5:   sortedTabIndices ← SORTINDICESDESCENDING(tabScores)
    6:   candidateImgIndices ← sortedImgIndices[1:]
    7:   candidateTabIndices ← sortedTabIndices[1:]
    8:   finalImgIndices ← REMOVECORRELATEDFEATURES(Fimg, candidateImgIndices, corrThreshold)
    9:   finalTabIndices ← REMOVECORRELATEDFEATURES(Ftab, candidateTabIndices, corrThreshold)
    10:   finalImgIndices ← finalImgIndices[1 : k_img]
    11:   finalTabIndices ← finalTabIndices[1 : k_tab]
    12:   fusedFeatures ← CONCATENATE(Fimg[:, finalImgIndices], Ftab[:, finalTabIndices])
    13:   return fusedFeatures, {finalImgIndices, finalTabIndices}
    14: end function
    Illustrative example:
    Given:
    Other parameters:
    Applying the Filter algorithm (see Illustrative example 4 in the Appendix for details), the following result is obtained.
    Final Result The final selected feature set is:
    The algorithm has selected:
    • 2 image features: img3 and img1
    • 2 tabular features: tab2 and tab4

    This is the optimal feature set based on importance criteria, with low intra-modal correlation and complementary information across modalities.
  5. Wrapper-based multi-modal Feature Selection: Wrapper methods assess feature subsets by iteratively training and evaluating a specific model, selecting the subset of features that maximizes the model’s performance. For multi-modal fusion, it performs Sequential Forward Selection (SFS) to incrementally build an optimal feature set from both modalities, directly optimizing for the fusion task.
    Algorithm:
    (a) Step 1. Define the evaluation model (e.g., Random Forest) and the performance metric based on the task.
    (b) Step 2. Initialize empty feature sets for both modalities.
    (c) Step 3. Pre-filter features using a simpler filter method to reduce the search space.
    (d) Step 4. Ensure minimum representation from each modality.
    (e) Step 5. Perform Sequential Forward Selection, evaluating all potential feature additions.
    (f) Step 6. Continue until the maximum number of features is selected or no improvement is seen.
    (g) Step 7. Combine the selected features to form Ffused.

    The Wrapper-based multi-modal Feature Selection method is described specifically in Algorithm 8.
    Algorithm 8 Wrapper multi-modal selection.
    1: function WRAPPERMULTI-MODALSELECTION(Fimg, Ftab, target, max_img, max_tab)
    2:   selected_img_indices ← [ ]      ▷ Initialize empty image feature set
    3:   selected_tab_indices ← [ ]      ▷ Initialize empty tabular feature set
    4:   best_score ← −∞
    5:   for i = 1 to min_img_features do
    6:    selected_img_indices ← ADDBESTFEATURE(Fimg, selected_img_indices, target)
    7:   end for
    8:   for i = 1 to min_tab_features do
    9:    selected_tab_indices ← ADDBESTFEATURE(Ftab, selected_tab_indices, target)
    10:   end for
    11:   while len(selected_img_indices) < max_img or len(selected_tab_indices) < max_tab do
    12:    best_new_score ← −∞
    13:    best_new_feature ← NULL
    14:    best_modality ← NULL
    15:    for each feature i in Fimg not in selected_img_indices do
    16:     temp_score ← EVALUATEFEATURESET(Concatenate(Fimg[:, selected_img_indices + [i]], Ftab[:, selected_tab_indices]), target)
    17:     if temp_score > best_new_score then
    18:      best_new_score ← temp_score
    19:      best_new_feature ← i
    20:      best_modality ← ’img’
    21:     end if
    22:    end for
    23:    for each feature j in Ftab not in selected_tab_indices do
    24:     temp_score ← EVALUATEFEATURESET(Concatenate(Fimg[:, selected_img_indices], Ftab[:, selected_tab_indices + [j]]), target)
    25:     if temp_score > best_new_score then
    26:      best_new_score ← temp_score
    27:      best_new_feature ← j
    28:      best_modality ← ’tab’
    29:     end if
    30:    end for
    31:    if best_new_score > best_score then
    32:     if best_modality = ’img’ then
    33:      selected_img_indices.append(best_new_feature)
    34:     else
    35:      selected_tab_indices.append(best_new_feature)
    36:     end if
    37:     best_score ← best_new_score
    38:    else
    39:     Break
    40:    end if
    41:   end while
    42:   Ffused ← CONCATENATE(Fimg[:, selected_img_indices], Ftab[:, selected_tab_indices])
    43:   return Ffused, {selected_img_indices, selected_tab_indices, best_score}
    44: end function
    Illustrative example:
    A step-by-step example illustrating the Wrapper-based multi-modal Feature Selection algorithm is provided below.
    The following data is going to be used:
    • Fimg: Image feature matrix with features [img_1, img_2, img_3, img_4, img_5]
    • Ftab: Tabular feature matrix with features [tab_1, tab_2, tab_3, tab_4]
    • target: Target variable
    • max_img = 3 (Maximum image features to select)
    • max_tab = 2 (Maximum tabular features to select)
    • min_img_features = 1 (Minimum image features required)
    • min_tab_features = 1 (Minimum tabular features required)

    Applying the Wrapper algorithm (see Illustrative example 5 in the Appendix for details), the following result is obtained.
    Result. Final selected features:
    • Image features: img_3, img_1, img_5
    • Tabular features: tab_2, tab_4
    Ffused is formed by concatenating these selected features.
    The final feature set is [img_3, img_1, img_5, tab_2, tab_4] with a performance score of 0.85.
    This example demonstrates how the wrapper-based approach systematically evaluates combinations of features from both modalities to find the optimal subset that maximizes the model’s performance on the specific task.
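To make the first and third strategies concrete, the sketch below implements feature-selection fusion in the spirit of Algorithm 4 and Hadamard product fusion in the spirit of Algorithm 6; the mutual-information scorer, the k values, and the random projection matrices (standing in for learned projections) are assumptions for illustration only.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

def feature_selection_fusion(F_img, F_tab, y, k_img=3, k_tab=2):
    """Keep the k most informative features of each modality and concatenate them."""
    img_scores = mutual_info_classif(F_img, y, random_state=42)
    tab_scores = mutual_info_classif(F_tab, y, random_state=42)
    top_img = np.argsort(img_scores)[::-1][:k_img]   # indices of top-k image features
    top_tab = np.argsort(tab_scores)[::-1][:k_tab]   # indices of top-k tabular features
    return np.concatenate([F_img[:, top_img], F_tab[:, top_tab]], axis=1)

def hadamard_fusion(F_img, F_tab, common_dim=4, seed=0):
    """Project both modalities to a common dimension and multiply element-wise."""
    rng = np.random.default_rng(seed)
    P_img = F_img @ rng.standard_normal((F_img.shape[1], common_dim))
    P_tab = F_tab @ rng.standard_normal((F_tab.shape[1], common_dim))
    # L2-normalize each sample before the element-wise (Hadamard) product.
    P_img /= np.linalg.norm(P_img, axis=1, keepdims=True) + 1e-12
    P_tab /= np.linalg.norm(P_tab, axis=1, keepdims=True) + 1e-12
    interaction = np.maximum(P_img * P_tab, 0.0)      # Hadamard product followed by ReLU
    return np.concatenate([interaction, P_img, P_tab], axis=1)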

FKGS generation from FKG and FRB

Rule-based systems function by processing and interpreting information using pre-established rules or logical statements. Inferences can be drawn, and knowledge can be extracted from the provided data through the application of these rules. Fuzzy logic, a mathematical framework designed to manage uncertainty, is utilized by rule-based systems to model and reason with imprecise or uncertain data. Fuzzy rules assign membership degrees to both antecedents and consequents, facilitating more flexible and adaptive decision-making and effectively representing the inherent uncertainty involved in the integration of medical data.
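As a small illustration of how membership degrees attach to antecedents and consequents, the sketch below evaluates one fuzzy rule with triangular membership functions and min aggregation; the variables, membership parameters, and the min operator are illustrative assumptions, not the exact FKG formulation.

def triangular(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Rule: IF glucose is High AND age is Middle THEN risk is High.
# The firing strength (membership of the consequent) is the minimum of the antecedent degrees.
mu_glucose_high = triangular(160, a=120, b=200, c=280)   # degree = 0.5
mu_age_middle = triangular(45, a=30, b=50, c=70)         # degree = 0.75
firing_strength = min(mu_glucose_high, mu_age_middle)    # 0.5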

This section describes the process of constructing an FKG from an FRB, and an FKGS from an FKG, through specific examples. As defined by Lan et al. [23], a fuzzy knowledge graph FKG is a tuple (V, L, A, B, R), where V is a set of input attributes, each taking a linguistic variable as its value; L is a set of output labels; matrices A and B represent the relationships between the input attribute vertices and the output labels; and R denotes a set of fuzzy rule bases (FRB). For example, consider an FRB with 6 rules, as shown in Table 1. By applying the FKG construction algorithm, which computes the adjacency matrices A and B, the fuzzy knowledge graph shown in Fig 5 is obtained.

FKG remains limited on large datasets because of its computation time. Tan et al. (2025) [65] proposed a sampling algorithm that identifies the structure of the fuzzy knowledge graph (FKGS), an abstract version of FKG that reduces computation time while preserving reliability. FKGS has all the components of FKG but on a smaller scale. The following example clarifies the process of deriving an FKGS from an FKG. Assume that a collection of 14 fuzzy rules, each consisting of 4 attributes and 2 output labels, is given in Table 2 [65]. When the sampling algorithm [65] is applied to the fuzzy knowledge graph built from the rule base in Table 2, with a sample rate of 30% and an error threshold of 0.3, the obtained FKGS consists of 5 rules, as shown in Table 3.
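As a rough illustration of how an FRB can be turned into the matrices A and B, the following sketch assumes a small hypothetical rule base in which each rule assigns membership degrees to attribute values and to one output label; the aggregation used here is a simplification for illustration, not the exact construction of Lan et al. [23] or the sampling algorithm of Tan et al. [65].

# Minimal sketch of building FKG-style matrices from a fuzzy rule base (FRB).
# A[v, l] accumulates antecedent membership degrees of attribute-value vertex v
# over all rules whose output label is l; B[r, l] stores the consequent degree
# of rule r for label l. Rules and degrees below are hypothetical.
import numpy as np

rules = [
    ({"Fever": ("High", 0.8), "Cough": ("Mild", 0.6)}, ("Flu", 0.9)),
    ({"Fever": ("Low", 0.7), "Cough": ("Severe", 0.9)}, ("Bronchitis", 0.8)),
    ({"Fever": ("High", 0.9), "Cough": ("Severe", 0.7)}, ("Flu", 0.7)),
]

vertices = sorted({(a, v) for ants, _ in rules for a, (v, _) in ants.items()})
labels = sorted({lab for _, (lab, _) in rules})
v_idx = {v: i for i, v in enumerate(vertices)}
l_idx = {l: i for i, l in enumerate(labels)}

A = np.zeros((len(vertices), len(labels)))   # attribute-value vertices vs. labels
B = np.zeros((len(rules), len(labels)))      # rules vs. labels

for r, (antecedents, (label, degree)) in enumerate(rules):
    B[r, l_idx[label]] = degree
    for attr, (value, mu) in antecedents.items():
        A[v_idx[(attr, value)], l_idx[label]] += mu

print("Vertices:", vertices)
print("A =\n", A)
print("B =\n", B)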

Advantages and disadvantages of the FKG-MM model

The main characteristic of the proposed framework (FKG-MM) is its ability to integrate data from many different sources and modalities. Specifically, the FKG model of Lan et al. [23], the FKG-Pairs model of Long et al. [24], and the FKGS model of Tan et al. [65] operate on unimodal tabular datasets, while the FKG-MM framework can additionally integrate image data with tabular data to improve the reliability of diagnosis.

The FKG-MM framework has several advantages: (1) it is well suited to integrating multi-component data in the medical field; (2) its multi-modal data integration module is specifically designed to combine tabular medical data with image data, two of the most common data types in medical diagnosis; and (3) its FKG sampling algorithm is well suited to the large, multi-component nature of medical data when assisting decision making in disease diagnosis.

Experimental results

In this section, the experimental results are shown to confirm the effectiveness of the suggested approach. The experiments were carried out to assess how effectively the model performed when combining multi-modal data, which included both images and symptom data, in contrast to using each type of data individually.

Experimental environments

The experiments are conducted on an HP Victus 16-e0175AX with an AMD Ryzen processor (Family 25 Model 80, 3.3 GHz) and 8 GB of RAM, running Python 3.11.5. Currently, vast quantities of multi-modal medical data are generated daily from a variety of medical devices and healthcare events; these data include structured, semi-structured, and unstructured data [68]. The experimental data are taken from the publicly available Brazilian ophthalmological dataset (BRSET) [69,70]. All data records are publicly available in the PhysioNet database. The images were obtained directly from Nikon NF505 and Canon CR-2 devices in JPEG format without any preprocessing. All images were acquired with a focus on the macula and annotated by a retinal specialist ophthalmologist, following labeling criteria defined by the research team. The retinal labeling process was accompanied by metadata including the retinal imaging device used, the patient's nationality, age (in years), sex, clinical history, insulin usage, and duration of diabetes. Demographic and clinical details were derived from electronic medical records and self-reported health information. The BRSET dataset includes 16,266 fundus images from 8,524 patients, with each image accompanied by demographic and clinical metadata. The aim is to validate the capability of the proposed multi-modal FKG-MM approach to handle large, variable, and multi-modal datasets.

Exploratory data analysis and preprocessing

Before training the model, data analysis and preprocessing play a particularly important role. The BRSET multi-modal dataset consists of images and tabular data. Correlation analysis of the tabular features is shown in the correlation matrix in Fig 6. For the image data, GLCM features are extracted and their correlations are evaluated using the heatmap in Fig 7. In addition, to evaluate the impact of both tabular and image-based features on the output labels for diabetic retinopathy diagnosis, feature importance was analyzed using the Random Forest algorithm. The results, illustrated in Fig 8, indicate that macula, duration of diabetes (in years), and diabetes are the three most influential attributes. Notably, all three are derived from tabular data, underscoring the significant role of tabular information in determining the diagnostic outcomes for diabetic retinopathy.
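For reference, the following is a minimal sketch of GLCM texture-feature extraction of the kind applied to the fundus images, assuming an 8-bit grayscale input and a small set of Haralick-style properties; the synthetic image and parameter values are illustrative, not those used in the study.

# Minimal sketch of GLCM feature extraction with scikit-image.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Stand-in for a grayscale fundus image (in practice the JPEG would be loaded
# with skimage.io.imread and converted to 8-bit grayscale).
rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)

# Gray-level co-occurrence matrices over a few offsets and angles.
glcm = graycomatrix(gray, distances=[1, 2], angles=[0, np.pi / 4, np.pi / 2],
                    levels=256, symmetric=True, normed=True)

# Haralick-style summary properties, averaged over offsets and angles.
features = {prop: float(graycoprops(glcm, prop).mean())
            for prop in ["contrast", "dissimilarity", "homogeneity",
                         "energy", "correlation"]}
print(features)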

Experimental scenarios and evaluation metrics

In this study, for later convenience, the FKG model for unimodal data in [65] is called FKG-UM. To evaluate the effectiveness of the proposed method, experimental scenarios are conducted. During the experiment, the datasets were divided into a 70% training set and a 30% testing set.

The following scenarios are used to validate the performance of the proposed model:

(1) In the first scenario, fundus images are combined with medical records to obtain the FKG-MM. The evaluation criteria applied in this context include the accuracy and computational time. The aim of this scenario is to verify the hypothesis: “FKG-MM is more accurate than FKG-UM but it is much more time-consuming overall”.

(2) The second scenario is conducted to compare different fusion methods and find the best among five feature fusion methods: Feature Selection, Tensor Product, Hadamard Product, Filter, and Wrapper. The comparison criteria are the same as in Scenario 1. In addition, ANOVA analysis was performed to assess the variation in the experimental outcomes of Scenario 2 with respect to accuracy.

Considering the nature of the task, classification evaluation metrics such as precision, accuracy, and computational time were utilized to assess the effectiveness of the introduced model and mitigate overfitting. Detailed information about the evaluation metrics is provided in Table 4 (a minimal computation sketch follows the list below), where:

  • True Positive (TP) refers to a positive instance that is correctly classified as positive;
  • True Negative (TN) denotes a negative instance that is accurately identified as negative;
  • False Positive (FP) represents a negative instance that is incorrectly classified as positive;
  • False Negative (FN) corresponds to a positive instance that is incorrectly classified as negative.
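The following is a minimal sketch of the precision and accuracy computations corresponding to the definitions above; the confusion-matrix counts are hypothetical.

# Minimal sketch of the classification metrics computed from TP, TN, FP, FN.
def precision(tp, fp):
    return tp / (tp + fp)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

tp, tn, fp, fn = 420, 430, 75, 72   # hypothetical counts
print(f"precision = {precision(tp, fp):.4f}")
print(f"accuracy  = {accuracy(tp, tn, fp, fn):.4f}")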

The following sections will present the experimental results and evaluate the effectiveness of the proposed framework.

The first scenario: Performance comparison of FKG-MM with FKG-UM

Experimental results.

Tables 5, 6, 7, and 8 report the results of the FKGS model for different sample rates and error thresholds, with the aim of evaluating the performance of FKG-MM against FKG-UM. Across all tables, FKG-MM achieves significantly higher accuracy, approximately 84-85% in all configurations, whereas FKG-UM ranges from 70-72%. Thus, the improvement in accuracy when using both fundus images and medical records (FKG-MM) is approximately 12-14%. In terms of computational time, however, the unimodal FKG-UM always takes less time than FKG-MM, which must integrate an additional data modality.

thumbnail
Table 5. Feature selection method with sampling ratio of 15% and error threshold of 0.2.

https://doi.org/10.1371/journal.pone.0339864.t005

thumbnail
Table 6. Feature selection method with sampling ratio of 15% and error threshold of 0.3.

https://doi.org/10.1371/journal.pone.0339864.t006

thumbnail
Table 7. Feature selection method with sampling ratio 20% and error threshold 0.2.

https://doi.org/10.1371/journal.pone.0339864.t007

thumbnail
Table 8. Feature selection method with sampling ratio 20% and error threshold 0.3.

https://doi.org/10.1371/journal.pone.0339864.t008

The relationship between the accuracy of FKG-UM and that of FKG-MM is shown visually in Fig 9.

ANOVA analysis.

To conduct ANOVA, the null hypothesis H0 is set up as follows:

H0: There is no significant difference in accuracy between FKG-MM and FKG-UM.

As shown in the accuracy results in Table 9, the null hypothesis H0 is rejected: there is a significant difference in accuracy between FKG-MM and FKG-UM, with FKG-MM achieving significantly higher accuracy.

Main findings.

From the above results, some observations are given as follows:

  • In environments requiring real-time processing or with limited computational resources, FKG-UM may be the appropriate choice.
  • In applications demanding high accuracy where longer processing times are acceptable (such as offline analysis), FKG-MM would be a better option.
  • If investment in more powerful hardware is possible, using FKG-MM will provide significant benefits in terms of accuracy.

The second scenario: Finding the most effective multi-modal feature selection methods

Experimental results.

In scenario 2, experiments are conducted to evaluate the model performance on five feature selection methods when integrating multi-modal data to find the most effective method. To ensure objectivity, several parameter sets in the FKGS model are used to provide a basis for statistical analysis and ensure differentiation in the assessment. In the first parameter set, when the sampling rate is 15% and the error threshold is 0.2, the results are as shown in Table 10. The best performance in each column is shown in bold. The table shows that the Feature Selection method gives the highest accuracy while the Hadamard method gives the worst results. Also, in terms of time, the Wrapper method has the lowest total computation time and is much lower than the other methods.

thumbnail
Table 10. FKG-MM results with sampling ratio of 15% and error threshold of 0.2.

https://doi.org/10.1371/journal.pone.0339864.t010

The same holds for the remaining parameter configurations. Specifically, Table 11 shows the experimental results with a sampling rate of 15% and an error threshold of 0.3. The experimental results with a sampling rate of 20% and error thresholds of 0.2 and 0.3 are shown in Tables 12 and 13, respectively. The confusion matrix of the best model is presented in Fig 10. Since this is a binary classification problem, two classes appear in the matrix: 0 and 1. The diagonal elements represent correctly predicted true negatives (TN) and true positives (TP). The matrix does not exhibit any significant bias toward a particular class, indicating consistent predictions across both classes. By accurately classifying 85.22% of the test data, the proposed Feature Selection method demonstrates superior performance.
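A confusion matrix such as the one in Fig 10 can be reproduced from the test-set predictions along the following lines; the label arrays here are hypothetical placeholders.

# Minimal sketch of a binary confusion matrix and accuracy with scikit-learn.
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0])   # hypothetical ground-truth labels
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 1])   # hypothetical predictions

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print("Confusion matrix:\n", cm)              # diagonal holds TN and TP counts
print("Accuracy:", accuracy_score(y_true, y_pred))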

thumbnail
Fig 10. Confusion matrix of Feature Selection method with sample rate 20% and error threshold 0.3.

https://doi.org/10.1371/journal.pone.0339864.g010

thumbnail
Table 11. FKG-MM results with sampling ratio of 15% and error threshold of 0.3.

https://doi.org/10.1371/journal.pone.0339864.t011

thumbnail
Table 12. FKG-MM results with sampling ratio of 20% and error threshold of 0.2.

https://doi.org/10.1371/journal.pone.0339864.t012

thumbnail
Table 13. FKG-MM results with sampling ratio of 20% and error threshold of 0.3.

https://doi.org/10.1371/journal.pone.0339864.t013

ANOVA analysis.

Now, ANOVA analysis of performance of 5 FKG-MM Methods is conducted to determine which method is the most effective.

  • ANOVA results for accuracy: A significance level of 0.05 was selected, and the results of the ANOVA analysis are shown in Table 14. The analysis reveals a statistically significant difference in accuracy across the five methods, with an F-value of 45.86 and a p-value of less than 0.0001 (a minimal sketch of this test appears after this list).
  • ANOVA results for total computational time: As shown in Table 15, with an F-value of 28.17 and p <0.0001, there is a statistically significant difference in total processing time among the 5 methods.
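The one-way ANOVA reported above can be reproduced along the following lines; the per-run accuracy values are hypothetical placeholders rather than the study's measurements.

# Minimal sketch of a one-way ANOVA over the five fusion methods' accuracies.
from scipy.stats import f_oneway

acc = {
    "FeatureSelection": [0.852, 0.848, 0.845, 0.849],
    "Tensor":           [0.812, 0.809, 0.815, 0.811],
    "Hadamard":         [0.781, 0.778, 0.784, 0.780],
    "Filter":           [0.833, 0.829, 0.836, 0.831],
    "Wrapper":          [0.801, 0.797, 0.805, 0.799],
}

f_stat, p_value = f_oneway(*acc.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
if p_value < 0.05:
    print("Reject H0: the methods differ significantly in accuracy.")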

Post-hoc test results.

To determine how the methods differ from one another, the Tukey HSD test was conducted. The results of the post-hoc analysis for accuracy and computational time are presented in Tables 16 and 17. They show that all pairs of methods differ significantly in processing time; a minimal sketch of the test is given below.
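# Minimal sketch of the Tukey HSD post-hoc comparison with statsmodels, reusing
# the hypothetical accuracy values from the ANOVA sketch above.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

acc = {
    "FeatureSelection": [0.852, 0.848, 0.845, 0.849],
    "Tensor":           [0.812, 0.809, 0.815, 0.811],
    "Hadamard":         [0.781, 0.778, 0.784, 0.780],
    "Filter":           [0.833, 0.829, 0.836, 0.831],
    "Wrapper":          [0.801, 0.797, 0.805, 0.799],
}
values = np.concatenate([np.array(v) for v in acc.values()])
groups = np.concatenate([[name] * len(v) for name, v in acc.items()])

result = pairwise_tukeyhsd(endog=values, groups=groups, alpha=0.05)
print(result.summary())   # pairwise mean differences with reject/accept flags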

Main findings.

The main findings from the experiments are as follows:

  • Feature Selection method has the highest accuracy across all four experimental conditions with an average of 84.69%.
  • The Wrapper method has dramatically lower total execution times across all conditions, with an average of 198.78 seconds. This is approximately 7.6 times faster than the next fastest method (Hadamard).
  • Overall ranking by accuracy: Feature Selection > Filter > Tensor > Wrapper > Hadamard
  • Overall ranking by speed: Wrapper > Hadamard > Filter > Feature Selection > Tensor

Conclusion

In this study, a novel multi-modal data integration framework based on fuzzy knowledge graphs, applicable to various types of medical data, was proposed. The proposed framework, FKG-MM, integrates data from different sources and types, which is an important step forward for clinician decision-making that draws on diverse information from multiple sources. Through the experiments, this study contributes a multi-modal data integration model for the image and tabular modalities, applies several multi-modal data fusion methods, and evaluates them on a specific disease dataset in the healthcare domain. The results are very promising, with significantly improved accuracy while processing performance remains at an acceptable level.

However, this study also has some limitations. First, it integrates only two data modalities, images and tabular data, without experimenting with other modalities. In addition, readily available datasets are used, and no experiments are conducted on datasets from different sources with varying quality. Furthermore, for diabetic retinopathy, the study stops at diagnosing and detecting the disease without grading or classifying its severity levels. Finally, the proposed FKG-MM model is only in the initial testing phase; to be applied in clinical practice, it requires expert-based assessment and validation as well as the selection of specific treatment regimens.

From the results achieved, this study opens up further research and development directions for effective application in clinical practice. We will continue to improve the model so that it can integrate data from multiple sources and from many different modalities, such as 3D images, text, and video, to improve reliability and computational performance in supporting disease diagnosis, especially in time-critical or resource-constrained settings.

In addition, in order for the framework to be put into practical applications, it is necessary to carry out standard procedures and external assessments, systematize the diagnostic process, and upgrade the data processing system to improve the quality of data from different sources. Specifically, it is recommended to implement diverse data-fusion strategies and integration mechanisms across many heterogeneous modalities to enhance the accuracy.

Regarding diabetic retinopathy, it is necessary to upgrade the model so that it can support the diagnosis and grading of disease severity and inform treatment decisions at each specific disease stage. In general, the FKG-MM model shows promising potential for other domains such as smart education, smart transportation, and digital banking.

Appendix

A1 Source code and dataset. The source code and dataset of this study can be found here:

https://github.com/thanhst/Fuzzy-Knowledge-Graph

https://drive.google.com/drive/folders/1L5NTkPrJgLF1Z-eds03iV5c0tDIH3s_D.

Illustrative example 2. Below is the detailed calculation of example 2.

It is assumed that there is:

  • An image feature matrix [[4, 2, 8], [6, 1, 3]] (2 samples, each with 3 features)
  • A text feature matrix [[5, 3], [9, 2]] (2 samples, each with 2 features)

Tensor Product Fusion: Step-by-Step

Step 1: Normalize Features. First, each row vector is normalized using L2 normalization.

For the image feature matrix:

Row 1: [4, 2, 8], with L2 norm sqrt(4^2 + 2^2 + 8^2) = sqrt(84) ≈ 9.165

Normalized: [0.436, 0.218, 0.873]

Row 2: [6, 1, 3], with L2 norm sqrt(6^2 + 1^2 + 3^2) = sqrt(46) ≈ 6.782

Normalized: [0.885, 0.147, 0.442]

Thus, the normalized image feature matrix is [[0.436, 0.218, 0.873], [0.885, 0.147, 0.442]].

For the text feature matrix:

Row 1: [5, 3], with L2 norm sqrt(5^2 + 3^2) = sqrt(34) ≈ 5.831

Normalized: [0.857, 0.514]

Row 2: [9, 2], with L2 norm sqrt(9^2 + 2^2) = sqrt(85) ≈ 9.220

Normalized: [0.976, 0.217]

Thus, the normalized text feature matrix is [[0.857, 0.514], [0.976, 0.217]].

Step 2: Compute Tensor Product. The tensor product of the normalized feature matrices is computed. For each sample, the outer product of its normalized image features (3 values) and normalized text features (2 values) is unfolded into a single row, giving an unfolded tensor product of size 2 × 6.

Step 3: Apply SVD Decomposition. A truncated singular value decomposition (SVD) is applied to the unfolded tensor product, factorizing it into the matrices U, S, and V.

Step 4: Create Projection Matrices. Taking the square root of the singular values in S gives the scaling used for the projection matrices.

Project Features and Concatenate. The resulting matrix is reshaped into two separate matrices, one associated with the image features and one with the text features; each modality is projected with its corresponding matrix, and the projections are concatenated.

Result.

The final fused feature matrix has dimensions 2 × 4, where:

  • The first two columns contain the projected image features.
  • The last two columns contain the projected text features.
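The steps above can be summarized in the following minimal sketch; the unfolding order of the outer product, the truncation rank, and the way the right singular vectors are folded back into per-modality projections are illustrative assumptions rather than the exact procedure used in the study.

# Minimal sketch of tensor-product fusion: normalize, take sample-wise outer
# products, apply truncated SVD, and project each modality before concatenation.
import numpy as np

F_img = np.array([[4.0, 2.0, 8.0], [6.0, 1.0, 3.0]])   # 2 samples x 3 image features
F_txt = np.array([[5.0, 3.0], [9.0, 2.0]])             # 2 samples x 2 text features

# Step 1: L2-normalize each row.
img_n = F_img / np.linalg.norm(F_img, axis=1, keepdims=True)
txt_n = F_txt / np.linalg.norm(F_txt, axis=1, keepdims=True)

# Step 2: sample-wise outer product, unfolded to shape (2, 3*2).
tensor = np.einsum("ni,nj->nij", img_n, txt_n).reshape(len(F_img), -1)

# Steps 3-4: truncated SVD and square-root scaling of the singular values.
U, S, Vt = np.linalg.svd(tensor, full_matrices=False)
k = 2                                    # assumed common dimension
scale = np.sqrt(S[:k])

# Step 5: fold the right singular vectors back per modality (here by summing
# over the other modality's axis, one simple choice), then project and fuse.
V_img = Vt[:k].reshape(k, 3, 2).sum(axis=2).T * scale   # (3, k) image projection
V_txt = Vt[:k].reshape(k, 3, 2).sum(axis=1).T * scale   # (2, k) text projection
fused = np.concatenate([img_n @ V_img, txt_n @ V_txt], axis=1)
print(fused.shape)   # (2, 4): projected image features + projected text features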

Illustrative example 3. Below is the detailed calculation of example 3.

It is supposed that:

  • Image feature matrix (2 samples, each with 3 features)
  • Tabular feature matrix (2 samples, each with 2 features)
  • Common dimension: 2

Matrices are:

Hadamard Product Fusion: Step-by-Step.

Step 0: Initialize Projection Matrices. Initialize projection matrices:

Since the image feature matrix has 3 features and the tabular feature matrix has 2 features, a projection to a common dimension of 2 is required:

Assuming:

Step 1: Project Features. Project features:

Given:

Calculate:

Similarly:

Step 2: Normalize the Projected Features. Normalize using L2 normalization:

For :

Thus:

For :

Thus:

Step 3: Compute the Hadamard Product. Compute element-wise (Hadamard) product:

Step 4: Apply Non-linear Transformation. Apply ReLU activation (no change because all elements are positive):

Concatenate the activated features, normalized image, and normalized tabular features:

Project to the common dimension using a projection matrix:

Thus:

Calculating:

Thus:

Result. The final output of the Hadamard Product Fusion algorithm is:

This fused representation combines information from both image and tabular features, capturing the interactions between corresponding dimensions through the Hadamard product.
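A minimal sketch of the Hadamard-product fusion follows; the feature values and the randomly initialized projection matrices are illustrative assumptions, not the projections used in the study.

# Minimal sketch of Hadamard-product fusion: project both modalities to a common
# dimension, normalize, take the element-wise product, apply ReLU, concatenate,
# and project to the output dimension.
import numpy as np

rng = np.random.default_rng(0)
F_img = rng.random((2, 3))      # 2 samples x 3 image features (illustrative values)
F_tab = rng.random((2, 2))      # 2 samples x 2 tabular features
d = 2                           # common dimension

# Step 0-1: project both modalities to the common dimension.
W_img, W_tab = rng.random((3, d)), rng.random((2, d))
P_img, P_tab = F_img @ W_img, F_tab @ W_tab

# Step 2: L2-normalize the projected features row-wise.
P_img /= np.linalg.norm(P_img, axis=1, keepdims=True)
P_tab /= np.linalg.norm(P_tab, axis=1, keepdims=True)

# Step 3: element-wise (Hadamard) product captures per-dimension interactions.
H = P_img * P_tab

# Step 4: ReLU, concatenate with the normalized inputs, and project to dimension d.
H = np.maximum(H, 0)
concat = np.concatenate([H, P_img, P_tab], axis=1)   # shape (2, 3*d)
W_out = rng.random((3 * d, d))
F_fused = concat @ W_out
print(F_fused.shape)   # (2, d)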

Illustrative example 4. Below is the detailed calculation of example 4.

Given:

Other parameters:

Filter-based multi-modal Feature Selection: Step-by-Step Solution. This example of the Filter-based multi-modal Feature Selection algorithm is solved step-by-step with the given data.

Input Data Analysis.

  • Fimg: Image feature matrix with 5 features.
  • Ftab: Tabular feature matrix with 4 features.
  • target: Target variable.
  • kimg = 2: Need to select 2 image features.
  • ktab = 2: Need to select 2 tabular features.

Step 1: Calculate Importance Scores for Each Feature. Two criteria are used: mutual information (MI) and random forest importance.

Mutual Information (MI) between each feature and the target:

  • For image features:
  • For tabular features:

Random forest importance:

  • For image features:
  • For tabular features:

Combine scores (average):

  • For image features:
  • For tabular features:

Step 2: Rank Features by Importance Scores. Image features (from highest to lowest):

  • img3: 0.435
  • img1: 0.28
  • img5: 0.28
  • img4: 0.21
  • img2: 0.16

Tabular features (from highest to lowest):

  • tab2: 0.375
  • tab4: 0.365
  • tab1: 0.265
  • tab3: 0.175

Step 3: Select a Larger Initial Set of Candidate Features.

  • Select 3 image features (more than kimg): img3, img1, img5
  • Select 3 tabular features (more than ktab): tab2, tab4, tab1

Step 4: Remove Highly Correlated Features Within Each Modality. Correlation matrix among selected image features:

Since the correlation between img3 and img5 is high, and img3 has a higher score than img5, img5 is removed.

Remaining image features: img3, img1

Correlation matrix among selected tabular features:

Since the correlation between tab4 and tab1 is high, and tab4 has a higher score than tab1, tab1 is removed.

Remaining tabular features: tab2, tab4

Step 5: Analyze Cross-Modal Correlation. Correlation between selected image and tabular features:

All cross-modal correlations are low, indicating that the selected features provide complementary information.

Step 6: Combine the Selected Features. The final selected feature set is [img3, img1, tab2, tab4].

Final Result. The algorithm has selected:

  • 2 image features: img3 and img1
  • 2 tabular features: tab2 and tab4

This is the optimal feature set based on importance criteria, with low intra-modal correlation and complementary information across modalities.
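The filter-based selection above can be sketched as follows, assuming mutual information and random-forest importances are averaged into a single score and highly correlated candidates are pruned; the data, thresholds, and candidate-pool size are illustrative.

# Minimal sketch of filter-based multi-modal feature selection.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
F_img, F_tab = rng.random((100, 5)), rng.random((100, 4))   # hypothetical features
target = rng.integers(0, 2, size=100)                       # hypothetical labels

def combined_scores(F, y):
    # Average of mutual-information and random-forest importance scores.
    mi = mutual_info_classif(F, y, random_state=0)
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(F, y)
    return (mi + rf.feature_importances_) / 2

def filter_select(F, y, k, corr_thresh=0.9):
    scores = combined_scores(F, y)
    candidates = list(np.argsort(scores)[::-1][:k + 1])     # slightly larger pool
    corr = np.abs(np.corrcoef(F.T))
    kept = []
    for i in candidates:                                     # drop correlated, lower-scored features
        if all(corr[i, j] < corr_thresh for j in kept):
            kept.append(i)
    return kept[:k]

img_sel = filter_select(F_img, target, k=2)
tab_sel = filter_select(F_tab, target, k=2)
F_fused = np.concatenate([F_img[:, img_sel], F_tab[:, tab_sel]], axis=1)
print(img_sel, tab_sel, F_fused.shape)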

Illustrative example 5. Below is the detailed calculation of example 5.

A step-by-step example illustrating the Wrapper-based multi-modal Feature Selection algorithm is provided below.

The following data will be utilized:

  • Fimg: Image feature matrix with features [img_1, img_2, img_3, img_4, img_5]
  • Ftab: Tabular feature matrix with features [tab_1, tab_2, tab_3, tab_4]
  • target: Target variable
  • max_img = 3 (Maximum image features to select)
  • max_tab = 2 (Maximum tabular features to select)
  • min_img_features = 1 (Minimum image features required)
  • min_tab_features = 1 (Minimum tabular features required)

Step 1: Define evaluation model and performance metric.

  • Model: Random Forest classifier
  • Metric: F1-score

Step 2: Initialize empty feature sets.

  • selected_img_indices = []
  • selected_tab_indices = []
  • best_score = −∞

Step 3: Pre-filter features to reduce search space. After applying a simple filter method, we keep:

  • Image features: [img_1, img_2, img_3, img_5]
  • Tabular features: [tab_1, tab_2, tab_4]

Step 4: Ensure minimum representation from each modality. First, select the minimum required features from each modality:

For image features (min_img_features = 1):

  • F1(img_1) = 0.65
  • F1(img_2) = 0.58
  • F1(img_3) = 0.72
  • F1(img_5) = 0.61

img_3 gives the highest score, so selected_img_indices = [3].

For tabular features (min_tab_features = 1):

  • F1(tab_1) = 0.56
  • F1(tab_2) = 0.69
  • F1(tab_4) = 0.64

tab_2 gives the highest score, so selected_tab_indices = [2].

Step 5 & 6: Perform Sequential Forward Selection.

Starting with selected_img_indices = [3] and selected_tab_indices = [2].

Current best_score = 0.69 (from tab_2).

Iteration 1:

  • For remaining image features:
    - F1([img_3, img_1] + [tab_2]) = 0.76
    - F1([img_3, img_2] + [tab_2]) = 0.71
    - F1([img_3, img_5] + [tab_2]) = 0.73
  • For remaining tabular features:
    - F1([img_3] + [tab_2, tab_1]) = 0.72
    - F1([img_3] + [tab_2, tab_4]) = 0.78

The best new score is 0.78 from adding tab_4. Update:

  • selected_tab_indices = [2, 4]
  • best_score = 0.78

Iteration 2:

  • For remaining image features:
    - F1([img_3, img_1] + [tab_2, tab_4]) = 0.83
    - F1([img_3, img_2] + [tab_2, tab_4]) = 0.77
    - F1([img_3, img_5] + [tab_2, tab_4]) = 0.80
  • For remaining tabular features:
    - F1([img_3] + [tab_2, tab_4, tab_1]) = 0.79

The best new score is 0.83 from adding img_1. Update:

  • selected_img_indices = [3, 1]
  • best_score = 0.83

Iteration 3:

  • For remaining image features:
    - F1([img_3, img_1, img_2] + [tab_2, tab_4]) = 0.82
    - F1([img_3, img_1, img_5] + [tab_2, tab_4]) = 0.85
  • For remaining tabular features:
    - F1([img_3, img_1] + [tab_2, tab_4, tab_1]) = 0.84

The best new score is 0.85 from adding img_5. Update:

  • selected_img_indices = [3, 1, 5]
  • best_score = 0.85

Iteration 4:

  • We’ve reached max_img = 3 for image features.
  • For remaining tabular features:
    - F1([img_3, img_1, img_5] + [tab_2, tab_4, tab_1]) = 0.84

No improvement, so the algorithm is stopped.

Step 7: Combine the selected features.

Final selected features:

  • Image features: img_3, img_1, img_5
  • Tabular features: tab_2, tab_4

Ffused is formed by concatenating these selected features.

The final feature set is [img_3, img_1, img_5, tab_2, tab_4] with a performance score of 0.85.

This example demonstrates how the wrapper-based approach systematically evaluates combinations of features from both modalities to find the optimal subset that maximizes the model’s performance on the specific task.

Acknowledgments

The authors would like to acknowledge the comments from Editors and Reviewers that improve the quality of the manuscript.

References

  1. 1. Moshawrab M, Adda M, Bouzouane A, Ibrahim H, Raad A. Reviewing multimodal machine learning and its use in cardiovascular diseases detection. Electronics. 2023;12(7):1558.
  2. 2. Imrie F, et al. Automated ensemble multi-modal machine learning for healthcare. IEEE Journal of Biomedical and Health Informatics. 2025.
  3. 3. Krones F, et al. Review of multi-modal machine learning approaches in healthcare. Information Fusion. 2025;114:102690.
  4. 4. Teoh JR, Dong J, Zuo X, Lai KW, Hasikin K, Wu X. Advancing healthcare through multimodal data fusion: a comprehensive review of techniques and applications. PeerJ Comput Sci. 2024;10:e2298. pmid:39650483
  5. 5. Restrepo D, et al. DF-DM: a foundational process model for multi-modal data fusion in the artificial intelligence era. Research Square. 2024.
  6. 6. Chaabene S, Boudaya A, Bouaziz B, Chaari L. An overview of methods and techniques in multimodal data fusion with application to healthcare. Int J Data Sci Anal. 2025;20(4):3093–117.
  7. 7. Xie D, Zhang X, Gao X, Zhao H, Du D. MAF-Net: a multimodal data fusion approach for human action recognition. PLoS One. 2025;20(4):e0319656. pmid:40203241
  8. 8. Hsieh C, Nobre IB, Sousa SC, Ouyang C, Brereton M, Nascimento JC, et al. MDF-Net for abnormality detection by fusing X-rays with clinical data. Sci Rep. 2023;13(1):15873. pmid:37741833
  9. 9. Kumar S, Sharma S. An improved deep learning framework for multimodal medical data analysis. BDCC. 2024;8(10):125.
  10. 10. Sait U, K V GL, Shivakumar S, Kumar T, Bhaumik R, Prajapati S, et al. A deep-learning based multimodal system for Covid-19 diagnosis using breathing sounds and chest X-ray images. Appl Soft Comput. 2021;109:107522. pmid:34054379
  11. 11. Vale-Silva LA, Rohr K. Long-term cancer survival prediction using multi-modal deep learning. Scientific Reports. 2021;11(1):13505.
  12. 12. Lu Z-H, Yang M, Pan C-H, Zheng P-Y, Zhang S-X. Multi-modal deep learning based on multi-dimensional and multi-level temporal data can enhance the prognostic prediction for multi-drug resistant pulmonary tuberculosis patients. Sci One Health. 2022;1:100004. pmid:39076608
  13. 13. Guarrasi V. A systematic review of intermediate fusion in multi-modal deep learning for biomedical applications. Image and Vision Computing. 2025;:105509.
  14. 14. Ou C, Zhou S, Yang R, Jiang W, He H, Gan W, et al. A deep learning based multimodal fusion model for skin lesion diagnosis using smartphone collected clinical images and metadata. Front Surg. 2022;9:1029991. pmid:36268206
  15. 15. Cai Q, Wang H, Li Z, Liu X. A survey on multimodal data-driven smart healthcare systems: approaches and applications. IEEE Access. 2019;7:133583–99.
  16. 16. An X, et al. Few-shot fuzzy temporal knowledge graph completion via fuzzy semantics and dynamic attention network. IEEE Transactions on Fuzzy Systems. 2024.
  17. 17. Xiong C, Power R, Callan J. Explicit semantic ranking for academic search via knowledge graph embedding. In: Proceedings of the 26th International Conference on World Wide Web. 2017. p. 1271–9. https://doi.org/10.1145/3038912.3052558
  18. 18. Guo Q, Zhuang F, Qin C, Zhu H, Xie X, Xiong H, et al. A survey on knowledge graph-based recommender systems. IEEE Trans Knowl Data Eng. 2022;34(8):3549–68.
  19. 19. Xiong H, Wang S, Tang M, Wang L, Lin X. Knowledge graph question answering with semantic oriented fusion model. Knowledge-Based Systems. 2021;221:106954.
  20. 20. Ma R, Liu Y, Ma Z. f-KGQA: a fuzzy question answering system for knowledge graphs. Fuzzy Sets and Systems. 2025;498:109117.
  21. 21. Ye Y. On the behaviors of fuzzy knowledge graphs. In: CCF National Conference of Computer Applications. Springer Nature Singapore; 2024. p. 3–21.
  22. 22. Sun P, Gu L. Fuzzy knowledge graph system for artificial intelligence-based smart education. IFS. 2021;40(2):2929–40.
  23. 23. Hong Lan LT, Tuan TM, Ngan TT, Son LH, Giang NL, Nhu Ngoc VT, et al. A new complex fuzzy inference system with fuzzy knowledge graph and extensions in decision making. IEEE Access. 2020;8:164899–921.
  24. 24. Long CK, Van Hai P, Tuan TM, Lan LTH, Chuan PM, Son LH. A novel fuzzy knowledge graph pairs approach in decision making. Multimed Tools Appl. 2022;81(18):26505–34.
  25. 25. Long CK, Van Hai P, Tuan TM, Lan LTH, Ngan TT, Chuan PM, et al. A novel Q-learning-based FKG-pairs approach for extreme cases in decision making. Engineering Applications of Artificial Intelligence. 2023;120:105920.
  26. 26. Hendriks HJM, Oostendorp RAB, Bernards ATM, Van Ravensberg CD, Heerkens YF, Nelson RM. The diagnostic process and indication for physiotherapy: a prerequisite for treatment and outcome evaluation. Physical Therapy Reviews. 2000;5(1):29–47.
  27. 27. Lee Y. Validity of feature importance in low-performing machine learning for tabular biomedical data. arXiv preprint 2024.
  28. 28. Shaik T, Tao X, Li L, Xie H, Velásquez JD. A survey of multimodal information fusion for smart healthcare: mapping the journey from data to wisdom. Information Fusion. 2024;102:102040.
  29. 29. Nie D, et al. 3D deep learning for multi-modal imaging-guided survival time prediction of brain tumor patients. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016 : 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II 19. Springer International Publishing; 2016. p. 212–20.
  30. 30. Tuan TM, Duc NT, Van Hai P, Son LH. Dental diagnosis from X-ray images using fuzzy rule-based systems. International Journal of Fuzzy System Applications. 2017;6(1):1–16.
  31. 31. Pacheco AGC, Lima GR, Salomão AS, Krohling B, Biral IP, de Angelo GG, et al. PAD-UFES-20: a skin lesion dataset composed of patient data and clinical images collected from smartphones. Data Brief. 2020;32:106221. pmid:32939378
  32. 32. Tian S, Yang W, Grange JML, Wang P, Huang W, Ye Z. Smart healthcare: making medical care more intelligent. Global Health Journal. 2019;3(3):62–5.
  33. 33. Ding J-E, Thao PNM, Peng W-C, Wang J-Z, Chug C-C, Hsieh M-C, et al. Large language multimodal models for new-onset type 2 diabetes prediction using five-year cohort electronic health records. Sci Rep. 2024;14(1):20774. pmid:39237580
  34. 34. Liang J, Li Y, Zhang Z, Shen D, Xu J, Zheng X, et al. Adoption of Electronic Health Records (EHRs) in China during the past 10 years: consecutive survey data analysis and comparison of sino-American challenges and experiences. J Med Internet Res. 2021;23(2):e24813. pmid:33599615
  35. 35. Zhang Z, Navarese EP, Zheng B, Meng Q, Liu N, Ge H, et al. Analytics with artificial intelligence to advance the treatment of acute respiratory distress syndrome. J Evid Based Med. 2020;13(4):301–12. pmid:33185950
  36. 36. Wang Y, Yin C, Zhang P. Multimodal risk prediction with physiological signals, medical images and clinical notes. Heliyon. 2024;10(5):e26772. pmid:38455585
  37. 37. Chowdhury M, Cervantes EG, Chan W-Y, Seitz DP. Use of machine learning and artificial intelligence methods in geriatric mental health research involving electronic health record or administrative claims data: a systematic review. Front Psychiatry. 2021;12:738466. pmid:34616322
  38. 38. Teixeira E, Fonseca H, Diniz-Sousa F, Veras L, Boppre G, Oliveira J, et al. Wearable devices for physical activity and healthcare monitoring in elderly people: a critical review. Geriatrics (Basel). 2021;6(2):38. pmid:33917104
  39. 39. Chakraborty S, Aich S, Joo M-I, Sain M, Kim H-C. A multichannel convolutional neural network architecture for the detection of the state of mind using physiological signals from wearable devices. J Healthc Eng. 2019;2019:5397814. pmid:31687119
  40. 40. Tao X, Shaik TB, Higgins N, Gururajan R, Zhou X. Remote Patient Monitoring Using Radio Frequency Identification (RFID) technology and machine learning for early detection of suicidal behaviour in mental health facilities. Sensors (Basel). 2021;21(3):776. pmid:33498893
  41. 41. Mohammed KI, Zaidan AA, Zaidan BB, Albahri OS, Alsalem MA, Albahri AS, et al. Real-time remote-health monitoring systems: a review on patients prioritisation for multiple-chronic diseases, taxonomy analysis, concerns and solution procedure. J Med Syst. 2019;43(7):223. pmid:31187288
  42. 42. Durán-Vega LA, Santana-Mancilla PC, Buenrostro-Mariscal R, Contreras-Castillo J, Anido-Rifón LE, García-Ruiz MA, et al. An IoT system for remote health monitoring in elderly adults through a wearable device and mobile application. Geriatrics (Basel). 2019;4(2):34. pmid:31067819
  43. 43. Awotunde JB, et al. Prediction and classification of diabetes mellitus using genomic data. Intelligent IoT systems in personalized health care. Academic Press; 2021. p. 235–92.
  44. 44. Yu H, Yan H, Wang L, Li J, Tan L, Deng W, et al. Five novel loci associated with antipsychotic treatment response in patients with schizophrenia: a genome-wide association study. Lancet Psychiatry. 2018;5(4):327–38. pmid:29503163
  45. 45. Pai S, Bader GD. Patient similarity networks for precision medicine. J Mol Biol. 2018;430(18 Pt A):2924–38. pmid:29860027
  46. 46. Taiwo O, Ezugwu AE. Smart healthcare support for remote patient monitoring during covid-19 quarantine. Inform Med Unlocked. 2020;20:100428. pmid:32953970
  47. 47. Hassani S, Dackermann U, Mousavi M, Li J. A systematic review of data fusion techniques for optimized structural health monitoring. Information Fusion. 2024;103:102136.
  48. 48. Broer A, Galanopoulos G, Benedictus R, Loutas T, Zarouchas D. Fusion-based damage diagnostics for stiffened composite panels. Structural Health Monitoring. 2021;21(2):613–39.
  49. 49. Meng T, Jing X, Yan Z, Pedrycz W. A survey on machine learning for data fusion. Information Fusion. 2020;57:115–29.
  50. 50. Pires IM, Garcia NM, Pombo N, Flórez-Revuelta F. From data acquisition to data fusion: a comprehensive review and a roadmap for the identification of activities of daily living using mobile devices. Sensors (Basel). 2016;16(2):184. pmid:26848664
  51. 51. Kao Y-H, Lin C-L. Enhancing diabetic retinopathy detection using pixel color amplification and EfficientNetV2: a novel approach for early disease identification. Electronics. 2024;13(11):2070.
  52. 52. Wagner SK, Liefers B, Radia M, Zhang G, Struyven R, Faes L, et al. Development and international validation of custom-engineered and code-free deep-learning models for detection of plus disease in retinopathy of prematurity: a retrospective study. Lancet Digit Health. 2023;5(6):e340–9. pmid:37088692
  53. 53. Khan SUR, et al. AI-driven diabetic retinopathy diagnosis enhancement through image processing and salp swarm algorithm-optimized ensemble network. arXiv preprint 2025. https://arxiv.org/abs/2503.14209
  54. 54. Chou Y-B, Kale AU, Lanzetta P, Aslam T, Barratt J, Danese C, et al. Current status and practical considerations of artificial intelligence use in screening and diagnosing retinal diseases: vision academy retinal expert consensus. Curr Opin Ophthalmol. 2023;34(5):403–13. pmid:37326222
  55. 55. Grzybowski A, Singhanetr P, Nanegrungsunk O, Ruamviboonsuk P. Artificial intelligence for diabetic retinopathy screening using color retinal photographs: from development to deployment. Ophthalmol Ther. 2023;12(3):1419–37. pmid:36862308
  56. 56. van der Heijden AA, Abramoff MD, Verbraak F, van Hecke MV, Liem A, Nijpels G. Validation of automated screening for referable diabetic retinopathy with the IDx-DR device in the Hoorn Diabetes Care System. Acta Ophthalmol. 2018;96(1):63–8. pmid:29178249
  57. 57. Pingali L. Personal oral health advisor using multimodal sensing and machine learning with smartphones and cloud computing. In: 2019 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM). 2019. p. 17–24. https://doi.org/10.1109/ccem48484.2019.000-3
  58. 58. Ma R, Cheng Q, Yao J, Peng Z, Yan M, Lu J, et al. Multimodal machine learning enables AI chatbot to diagnose ophthalmic diseases and provide high-quality medical responses. NPJ Digit Med. 2025;8(1):64. pmid:39870855
  59. 59. Kanclerz P, Tuuminen R, Khoramnia R. Imaging modalities employed in diabetic retinopathy screening: a review and meta-analysis. Diagnostics (Basel). 2021;11(10):1802. pmid:34679501
  60. 60. Tran K, Pakzad-Vaezi K. Multimodal imaging of diabetic retinopathy. Curr Opin Ophthalmol. 2018;29(6):566–75. pmid:30281032
  61. 61. Peng C, Xia F, Naseriparsa M, Osborne F. Knowledge graphs: opportunities and challenges. Artif Intell Rev. 2023:1–32. pmid:37362886
  62. 62. Pham HV, Long CK, Khanh PH, Trung HQ. A fuzzy knowledge graph pairs-based application for classification in decision making: case study of preeclampsia signs. Information. 2023;14(2):104.
  63. 63. Cù KL, Trần MT, Lê HS, Lương THL, Phạm MC, Nguyễn TT, et al. Disease diagnosis in traditional medicine: a fuzzy knowledge graph pairs-based approach [in Vietnamese]. Công nghệ CNTT-TT. 2021:59–68.
  64. 64. Chuan PM, et al. Chronic kidney disease diagnosis using fuzzy knowledge graph pairs-based inference in the extreme case. In: RICE. 2022. p. 83–8.
  65. 65. Tan NH, Long CK, Tuan TM, Chuan PM, Hai PV, Khanh PH, et al. A novel fuzzy knowledge graph structure for decision making of multimodal big data. Appl Intell. 2025;55(7).
  66. 66. Ahammed M, Mamun MdA, Uddin MS. A machine learning approach for skin disease detection and classification using image segmentation. Healthcare Analytics. 2022;2:100122.
  67. 67. Löfstedt T, Brynolfsson P, Asklund T, Nyholm T, Garpebring A. Gray-level invariant Haralick texture features. PLoS One. 2019;14(2):e0212110. pmid:30794577
  68. 68. Zhang Y, Sheng M, Liu X, Wang R, Lin W, Ren P, et al. A heterogeneous multi-modal medical data fusion framework supporting hybrid data exploration. Health Inf Sci Syst. 2022;10(1):22. pmid:36039096
  69. 69. Nakayama LF, et al. A Brazilian multilabel ophthalmological dataset (BRSET). 2023.
  70. 70. Nakayama LF, Restrepo D, Matos J, Ribeiro LZ, Malerbi FK, Celi LA, et al. BRSET: a Brazilian multilabel ophthalmological dataset of retina fundus photos. PLOS Digit Health. 2024;3(7):e0000454. pmid:38991014