
Integrating image and gene-data with a semi-supervised attention model for prediction of KRAS gene mutation status in non-small cell lung cancer

  • Yuting Xue,

    Roles Conceptualization, Formal analysis, Methodology, Software, Writing – original draft

    Affiliation College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China

  • Dongxu Zhang,

    Roles Conceptualization, Data curation, Software, Writing – review & editing

    Affiliation College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China

  • Liye Jia ,

    Contributed equally to this work with: Liye Jia, Wanting Yang

    Roles Conceptualization, Formal analysis, Investigation, Validation

    Affiliation College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China

  • Wanting Yang ,

    Contributed equally to this work with: Liye Jia, Wanting Yang

    Roles Formal analysis, Investigation, Validation, Writing – review & editing

    Affiliation College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China

  • Juanjuan Zhao ,

    Roles Funding acquisition, Methodology, Writing – review & editing

    zhaojuanjuan@tyut.edu.cn

    Affiliations School of Software, Taiyuan University of Technology, Taiyuan, Shanxi, China, College of Information, Jinzhong College of Information, Taiyuan, Shanxi, China

  • Yan Qiang,

    Roles Funding acquisition, Project administration, Resources

    Affiliation College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China

  • Long Wang,

    Roles Methodology, Supervision, Validation

    Affiliation College of Information, Jinzhong College of Information, Taiyuan, Shanxi, China

  • Ying Qiao,

    Roles Conceptualization, Data curation, Writing – review & editing

    Affiliation First Hospital of Shanxi Medical University, Taiyuan, Shanxi, China

  • Huajie Yue

    Roles Conceptualization, Data curation, Formal analysis, Writing – review & editing

    Affiliation First Hospital of Shanxi Medical University, Taiyuan, Shanxi, China

Abstract

KRAS is a pathogenic gene frequently implicated in non-small cell lung cancer (NSCLC). However, biopsy as a diagnostic method has practical limitations. Therefore, it is important to accurately determine the mutation status of the KRAS gene non-invasively by combining NSCLC CT images and genetic data for early diagnosis and subsequent targeted therapy of patients. This paper proposes a Semi-supervised Multimodal Multiscale Attention Model (S2MMAM). S2MMAM comprises a Supervised Multilevel Fusion Segmentation Network (SMF-SN) and a Semi-supervised Multimodal Fusion Classification Network (S2MF-CN). S2MMAM facilitates the execution of the classification task by transferring the useful information captured in SMF-SN to the S2MF-CN to improve the model prediction accuracy. In SMF-SN, we propose a Triple Attention-guided Feature Aggregation module for obtaining segmentation features that incorporate high-level semantic abstract features and low-level semantic detail features. Segmentation features provide pre-guidance and key information expansion for S2MF-CN. S2MF-CN shares the encoder and decoder parameters of SMF-SN, which enables S2MF-CN to obtain rich classification features. S2MF-CN uses the proposed Intra and Inter Mutual Guidance Attention Fusion (I2MGAF) module to first guide segmentation and classification feature fusion to extract hidden multi-scale contextual information. I2MGAF then guides the multidimensional fusion of genetic data and CT image data to compensate for the lack of information in single modality data. S2MMAM achieved 83.27% AUC and 81.67% accuracy in predicting KRAS gene mutation status in NSCLC. This method uses medical image CT and genetic data to effectively improve the accuracy of predicting KRAS gene mutation status in NSCLC.

Introduction

Lung cancer is divided into non-small cell lung cancer (NSCLC) and small cell lung cancer. NSCLC accounts for approximately 85% of newly diagnosed lung cancers each year [1]. The emergence of targeted therapy has substantially increased the survival rate of NSCLC patients. Prior to targeted therapy, it should be determined whether important disease-causing genes are mutated. KRAS is a common causative gene in NSCLC, and approximately one-third of patients with NSCLC have KRAS mutations. The usual diagnostic tool is a puncture biopsy. However, this invasive method has many limitations: it is not suitable for all patients and can have unpredictable consequences, such as an increased risk of cancer metastasis [2]. Therefore, there is an urgent need for a non-invasive diagnostic method that can accurately predict KRAS mutations in lung cancer patients. Such a method would not only improve treatment outcomes but also guide prognosis.

In recent years, researchers have used CT images to predict gene mutations based on traditional radiomics and machine learning. Song et al. [3] proposed a machine-learning model for predicting EGFR and KRAS mutation status; they used the model to extract statistical, shape, pathological, and deep learning features from 144 CT scans of tumor regions. Shiri et al. [4] used minimum redundancy maximum relevance (mRMR) feature selection and a random forest classifier to build a multivariate model. The model analyzed radiomic features extracted from tumor images and successfully predicted EGFR and KRAS mutation status in cancer patients.

The radiomics and machine learning methods mentioned above have successfully predicted gene mutations. However, most of these methods rely on hand-crafted features. In recent years, deep learning based on convolutional neural networks has attracted much attention in the field of medical image computing; this data-driven approach can automatically extract complex image features [5–7]. In addition, imaging genomics, which integrates disease imaging data and genomic data, holds more promise for deep learning-based analysis than single-modality data. Imaging genomics is a high-throughput research method correlating imaging features with genomic data. In recent imaging genomics studies, researchers have proposed a series of deep learning algorithms and theoretical models based on image or genetic data. Dong et al. [8] proposed a multi-channel multi-task deep learning (MMDL) model that fuses radiological features of CT images with clinical information of patients to improve the accuracy of predicting KRAS gene mutations. Hou et al. [9] proposed an attention-based multimodal information fusion module that successfully predicted lymph node metastasis by fusing deep learning features of CT images with genetic data. Therefore, machine learning and deep learning-based imaging genomics approaches have great potential for predicting KRAS gene mutation status in NSCLC.

Although the above models achieved considerable performance, there are still some challenges in deep learning methods based on image and genetic data for predicting KRAS mutation status in NSCLC: 1) The majority of deep learning methods [8, 9] that study classification tasks focus only on the classification method itself; they do not use the segmentation features generated by a segmentation task to facilitate the classification task and improve its performance. Lesion segmentation and classification are two highly related tasks, and segmentation can help remove distractions from CT images, which is highly beneficial for improving the accuracy of lesion classification. 2) Most existing fusion methods rely on simple direct concatenation and ignore the correlation and differences between medical images and genetic data. This not only leads to ineffective mining of useful semantic features between multi-scale image features and gene features but also fails to make full use of the complementarity of multimodal information. 3) Many studies used models that overemphasized the deep, abstract lesion features without paying sufficient attention to the importance of detailed shallow features for the prediction results, which limits accuracy improvements.

To overcome these difficulties and achieve non-invasive, accurate prediction of KRAS gene mutations in NSCLC, we propose a Semi-supervised Multimodal Multiscale Attention Model (S2MMAM) for predicting KRAS gene mutation status in NSCLC. The model uses the Mean Teacher [10] framework as the main structure of the network. Mean Teacher makes full use of labeled images to guide the analysis of unlabeled images, reducing the network's dependence on manual annotation. To compensate for the information lost when only single-modal unlabeled image data are available, the Semi-supervised Multimodal Fusion Classification Network (S2MF-CN) shares the parameters of the Supervised Multilevel Fusion Segmentation Network (SMF-SN) to enrich the key lesion information, and S2MMAM additionally fuses each patient's genetic data with the image data to expand the mutation-related knowledge. Specifically, SMF-SN introduces a new Triple Attention-guided Feature Aggregation (TAFA) module, which adaptively fuses high-level semantic features with low-level semantic features using an attention-guided mechanism. TAFA can ignore background noise and localize the extraction of key lesion features. In S2MF-CN, we propose an Intra and Inter Mutual Guidance Attention Fusion (I2MGAF) module to guide the fusion of intra-modality and inter-modality information in a staged manner. I2MGAF can effectively extract complementary information from different modalities at different scales to improve classification efficiency.

In contrast to previous studies for KRAS mutation prediction based on conventional radiomics and machine learning [3, 4], we use a convolutional neural network to extract CT image features. This technique is more efficient, reduces the cost of manual annotation, and enables end-to-end application. Studies [5–9] that address multimodal classification tasks for other diseases have used simple multimodal fusion methods. In contrast, our proposed method focuses on extracting information of different dimensions from different modalities to achieve complementary fusion.

The contributions of this paper are as follows:

  • A Semi-supervised Multimodal Multiscale Attention Model (S2MMAM) based on imaging genomics is proposed, which effectively solves the problem of difficult intermediate fusion of multimodal heterogeneous data. S2MMAM exploits the facilitation of supervised segmentation features for semi-supervised classification tasks to improve the model performance for predicting KRAS gene mutation status in NSCLC.
  • A new Triple Attention-guided Feature Aggregation (TAFA) module is designed. It is based on the attention module to adaptively fuse high-level semantic features with low-level semantic features. TAFA can suppress low-level background noise and retain detailed local semantic information.
  • We use the Intra and Inter Mutual Guidance Attention Fusion (I2MGAF) module to guide segmentation and classification feature fusion, as well as CT image and genetic data fusion, respectively. It can achieve multi-scale multimodal information fusion and improve classification performance.

Related work

Mean Teacher in semi-supervised learning

Semi-supervised learning has been studied in the medical imaging community for a long time [11, 12]. It can reduce the human workload of labeling data, and current research has shown its potential to improve network performance when labels are scarce. There are three semi-supervised models based on the principle of consistency: the Π-Model [13], Temporal Ensembling (TE) [13], and the Mean Teacher model. To present the advantages and disadvantages of these three consistency-based semi-supervised methods succinctly, we summarize them in Table 1.

Table 1. Comparison of three commonly used consistency-based semi-supervised methods.

https://doi.org/10.1371/journal.pone.0297331.t001

In recent years, Mean Teacher has achieved good results as a basic framework in semi-supervised classification tasks. Wang et al. [14] successfully identified diabetic macular edema based on the Mean Teacher model using a small amount of roughly labeled data and a large amount of unlabeled data. Liu et al. [15] used a Mean Teacher-based network model to achieve skin lesion diagnosis on the ISIC 2018 challenge dataset and thorax disease classification on ChestX-ray14. Wang et al. [16] proposed a model that unifies diverse knowledge into a generic knowledge distillation framework for skin disease classification, enabling the student model to acquire richer knowledge from the teacher model. These works demonstrate that Mean Teacher achieves excellent results in semi-supervised classification tasks, so we use it as the basic framework for our S2MMAM.
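
For concreteness, the teacher update at the heart of the Mean Teacher framework can be sketched in PyTorch as an exponential moving average (EMA) of the student weights. This is a minimal illustration rather than the implementation used in this paper; the decay value of 0.99 is an assumption.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def update_teacher_ema(student: nn.Module, teacher: nn.Module, alpha: float = 0.99) -> None:
    """Set each teacher parameter to an exponential moving average of the student parameter."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        # theta_teacher <- alpha * theta_teacher + (1 - alpha) * theta_student
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)

# Usage: after each optimizer step on the student, call
# update_teacher_ema(student_model, teacher_model, alpha=0.99)
```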

Segmentation facilitates classification

Using segmentation tasks to facilitate classification tasks is a basic form of multitask learning [17]. In multitask learning, a segmentation task associated with a classification task can assist the classification task in learning the target, thus improving classification performance [18]. The same idea can be borrowed in a single-task classification model: the information captured by a segmentation branch can be transferred to the classification model to expand the lesion information. The supervised segmentation task is trained on data with mask labels, with the aim of obtaining the most comprehensive high-level semantic features of the target region and reducing the learning of noisy backgrounds. Rich segmentation features can support the classification task in learning more and richer semantic information. Thus, a supervised segmentation network can assist a semi-supervised classification network by suppressing the background noise introduced by missing physician annotations and improving the classification accuracy.

The works compared in Table 2 demonstrate that segmentation has a facilitating effect on classification. However, they share a common limitation: they all study supervised models, which impose high data labeling costs. We believe that combining the segmentation and classification tasks can make the network more informative. Therefore, our research aims to combine the idea of segmentation facilitating classification with semi-supervised models. We combined the two related tasks of NSCLC lesion segmentation and KRAS gene mutation status prediction. Through the strategy of sharing network parameters between SMF-SN and S2MF-CN, S2MMAM allows S2MF-CN to obtain the key features of lesions at initialization. In S2MF-CN, the segmentation features are guided to merge with the classification features to obtain the extracted key features. This strategy enriches the lesion information and improves the classification performance of the network model.

Table 2. Comparison of related works in which segmentation facilitates classification.

https://doi.org/10.1371/journal.pone.0297331.t002

Multiscale features and attention learning

Traditional convolution operations mostly focus on extracting local features. However, because local features contain limited information, the model cannot learn the full content of the region of interest well. Multi-scale features contain local features of multiple regions of interest; fusing the extracted local features with other operations yields comprehensive information about the target, which helps the network model learn. To extract multi-scale features, the Atrous Spatial Pyramid Pooling (ASPP) module [21] captures contextual information by convolving the target region with different dilation rates. In the medical image domain, the PSE module [22] uses a patch-level pyramid design to extend SE operations to multiple scales, allowing the network to adaptively focus on vessels of variable width. The Scale-aware Feature Aggregation (SFA) module [23] effectively extracts hidden multi-scale background information and aggregates multi-scale features to improve the model's ability to handle complex vasculature.

The Convolutional Block Attention Module (CBAM) [24] introduces channel and spatial attention and extracts key feature information from both dimensions to enrich the network representation. In the medical image application domain, the Context-assisted full Attention Network (CAN) [25] combines Non-Local Attention (NLA), Channel Attention (CA), and Dual-pathway Spatial Attention (DSA) to extract lesion information in multiple directions.

Currently, it is widely believed that both multi-scale features and attention mechanisms can help models enhance the recognition of feature maps from different dimensions. However, the above works share a common problem: they do not combine the ideas of multi-scale features and attention mechanisms. Therefore, we combine these two techniques and design the TAFA module. On the one hand, it fuses high- and low-dimensional segmentation features to obtain both abstract and detailed information. On the other hand, it fuses segmentation and classification features at different levels to guide the features to adaptively learn key factors and to enhance the network's ability to capture lesions. Thus, the predictive capability of the model is improved.

Method

Overview

In this paper, we propose a Semi-supervised Multimodal Multiscale Attention Model (S2MMAM). The overall architecture of the model is divided into two parts: Supervised Multilevel Fusion Segmentation Network (SMF-SN) and Semi-supervised Multimodal Fusion Classification Network (S2MF-CN), as shown in Fig 1. In this model, the useful information of CT images is captured by SMF-SN and transferred to S2MF-CN to facilitate the execution of image prediction tasks. The S2MMAM utilizes the fusion of CT images and genetic data to accurately predict whether KRAS is mutated in NSCLC.

Fig 1.

Overview of our S2MMAM, including: (a) Supervised Multilevel Fusion Segmentation Network (SMF-SN). The inputs are CT images and pixel-level mask images, and the outputs are segmented lesion images, (b) Semi-supervised Multimodal Fusion Classification Network (S2MF-CN), and (c) processing of gene data. In the S2MMAM, the useful information of CT images is captured by SMF-SN and transferred to S2MF-CN to facilitate the execution of image prediction tasks. The S2MMAM utilizes the fusion of CT images and genetic data to accurately predict whether KRAS is mutated in NSCLC.

https://doi.org/10.1371/journal.pone.0297331.g001

In the NSCLC dataset, each patient corresponds to a set of CT images and gene data (Section Dataset). Specifically, in our problem setting, we are given a training set containing N labeled samples and M unlabeled samples, where N << M. Let the labeled training datasets be denoted by SL = {(xi, si)} and CL = {(xi, ci)}, i = 1, …, N, where SL is the dataset for segmentation, CL is the dataset for classification, xi is the i-th labeled CT image, si is the pixel-level annotation of xi, and ci ∈ {0, 1} indicates whether the KRAS gene is mutated, with 0 meaning negative and 1 meaning positive. Let the unlabeled training dataset be denoted by DU = {uj}, j = 1, …, M, where uj is the j-th unlabeled image. The entire model pipeline can be summarized as follows. First, we pre-train SMF-SN, initialized on SL, to train the network's ability to capture focal regions; this mitigates problems such as the large amount of noise in CT and promotes the classification ability. Meanwhile, the network body of S2MF-CN shares encoder and decoder parameters with SMF-SN, so the encoder and decoder of S2MF-CN are also initialized in this step and practical segmentation features for different lesion levels are obtained. The classification network in S2MF-CN can capture the key classification features of lesions using these segmentation features. Finally, after S2MF-CN fuses the segmentation, classification, and genetic-data features, the semi-supervised Student model is trained to accurately determine patients' KRAS gene mutation status.
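
The parameter-sharing step of this pipeline can be sketched as follows. The placeholder encoder/decoder modules below merely stand in for the real SMF-SN and S2MF-CN blocks, so the module layout and names are assumptions for illustration only.

```python
import torch.nn as nn

def make_backbone() -> nn.ModuleDict:
    # Placeholder encoder/decoder; the real SMF-SN blocks (SE-ResNeXt, TAFA, ASPP) go here.
    return nn.ModuleDict({
        "encoder": nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU()),
        "decoder": nn.Sequential(nn.Conv2d(16, 1, 3, padding=1)),
    })

segmenter = make_backbone()   # stands in for SMF-SN (stage 1, supervised segmentation)
classifier = make_backbone()  # stands in for the S2MF-CN student (stage 2)

# Stage 1: pre-train the segmenter on the labeled segmentation set SL (training loop omitted).
# Stage 2: initialize the classifier's encoder/decoder from the trained segmenter so that
# the classification branch starts from lesion-aware segmentation features.
classifier["encoder"].load_state_dict(segmenter["encoder"].state_dict())
classifier["decoder"].load_state_dict(segmenter["decoder"].state_dict())
```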

Supervised multilevel fusion segmentation network

The architecture of SMF-SN.

This section introduces a supervised segmentation network based on multidimensional feature fusion. SMF-SN can precisely localize lesion edges and internal regions and greatly reduce the impact of image background noise on network performance. SMF-SN mainly utilizes our proposed SE-ResNeXt and TAFA modules.

We use the enhanced segmentation training dataset SL to train SMF-SN to obtain rich segmentation features. The obtained segmentation features can provide the semi-supervised classification network with a priori information about the lesion location. This improves the classification network’s ability to localize and identify lesions.

As shown in Fig 2, SMF-SN includes a stem block, three encoder blocks, three TAFA blocks, a bridge block, three decoder blocks, and an output block.

Fig 2. Block diagram of the proposed SMF-SN architecture.

We adjust the dilation rates in ASPP in the bridge from 6,12,18 to 3,6,9 to better adapt SMF-SN to our segmentation task.

https://doi.org/10.1371/journal.pone.0297331.g002
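
As a concrete illustration of the bridge, the following is a minimal PyTorch sketch of an ASPP block using the adjusted dilation rates (3, 6, 9) noted in the caption; the 1×1 branch, channel sizes, and projection layer are assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Minimal Atrous Spatial Pyramid Pooling with dilation rates (3, 6, 9)."""
    def __init__(self, in_ch: int, out_ch: int, rates=(3, 6, 9)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size=1)] +                               # 1x1 branch
            [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r) for r in rates]
        )
        self.project = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]   # parallel dilated convolutions
        return self.project(torch.cat(feats, dim=1))      # fuse branches with a 1x1 conv

# Example with an assumed bridge feature map of 256 channels at 64x64 resolution.
y = ASPP(256, 128)(torch.randn(1, 256, 64, 64))
print(y.shape)  # torch.Size([1, 128, 64, 64])
```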

In the encoder, each encoder block is composed of an SE-ResNeXt block and a max-pooling layer with stride 2. As shown in Fig 3, SE-ResNeXt is an improvement of ResNeXt with SENet. ResNeXt aggregates a set of transformations with the same topology by repeating multiple blocks. SENet performs feature learning on the aggregated features in the channel dimension to learn the importance of each channel. SE-ResNeXt can therefore enhance the network in both the channel and spatial dimensions to capture richer segmentation features. The max-pooling layer halves the spatial dimensions of the feature map to reduce the computational cost. The output of the encoder is passed through a bridge consisting of SE-ResNeXt and Atrous Spatial Pyramid Pooling (ASPP). It provides the largest receptive field for TAFA so that a wider range of contextual information is included, facilitating more efficient integration between multiple levels. Between the high-level and low-level semantics, we use the proposed TAFA module, which utilizes multi-scale and attention fusion mechanisms. The module suppresses irrelevant low-level background noise while exploiting complementary contextual information across levels, preserving more detailed local semantic information and better learning lesion information. The TAFA module is described in detail in Section Triple Attention-guided Feature Aggregation.

Fig 3. The architecture of SE-ResNeXt.

SE-ResNeXt is improved from ResNeXt with SENet.

https://doi.org/10.1371/journal.pone.0297331.g003
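
The SE-ResNeXt block described above can be sketched as a ResNeXt bottleneck (grouped 3×3 convolution) followed by a squeeze-and-excitation branch. This is a minimal PyTorch sketch under assumptions: the cardinality (32), reduction ratio (16), and channel sizes are illustrative, not the authors' settings.

```python
import torch
import torch.nn as nn

class SEResNeXtBlock(nn.Module):
    """ResNeXt bottleneck (grouped 3x3 conv) followed by a squeeze-and-excitation branch."""
    def __init__(self, channels: int, cardinality: int = 32, reduction: int = 16):
        super().__init__()
        mid = channels // 2
        self.bottleneck = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, groups=cardinality, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False), nn.BatchNorm2d(channels),
        )
        self.se = nn.Sequential(                       # squeeze-and-excitation: channel weights
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.bottleneck(x)
        out = out * self.se(out)       # reweight channels by learned importance
        return self.relu(out + x)      # residual connection

x = torch.randn(1, 64, 128, 128)
print(SEResNeXtBlock(64)(x).shape)  # torch.Size([1, 64, 128, 128])
```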

Triple attention-guided feature aggregation

CT images of lung nodules may contain a large amount of noise; for example, grayscale overlap between lung tissues and blurred boundaries make regions difficult to distinguish. The high-level features of the decoder and the low-level features of the encoder are both crucial for capturing lesion features. However, most existing UNet-based connection methods directly connect shallow and deep semantic features of different scales. This ignores the fact that high-level features contain rich semantic information that can help low-level features identify semantically important locations. Likewise, low-level features contain rich spatial information that can help high-level features reconstruct accurate details.

Considering the above factors, we design a Triple Attention-guided Feature Aggregation (TAFA) module to guide the fusion between high- and low-dimensional features. TAFA guides different layers to individually extract key feature information and then fuses them after retaining the domain-invariant key information, as shown in Fig 4. In the TAFA module, we first upsample the high-dimensional feature Fh to the same size as the low-dimensional feature Fl. We then concatenate the high- and low-dimensional features along the channel dimension to obtain Fcat.

Fcat = Concat(fup(Fh), Fl)    (1)

Where Concat represents the concatenation operation and fup represents the up-sampling operation. Then, to better mine the most useful feature channels across levels, we introduce a scale-aware channel attention mechanism to automatically select the appropriate receptive field for the feature map and suppress interference from irrelevant background noise. We feed the concatenated feature Fcat into global average pooling (GAP) and global max pooling (GMP), respectively. TAFA uses the GAP branch to excite the feature channel information and the GMP branch to retain the maximum semantic response. The corresponding feature descriptors Wgap = fmlp(fgap(Fcat)) and Wgmp = fmlp(fgmp(Fcat)) are obtained using a multi-layer perceptron (MLP) with shared parameters. Wgap and Wgmp are summed, and the sum is passed through a sigmoid function to generate a global bootstrap feature coefficient Wglobal.

Wglobal = fσ(fmlp(fgap(Fcat)) ⊕ fmlp(fgmp(Fcat)))    (2)

Where fσ represents the sigmoid activation, fmlp the MLP operator, fgap global average pooling, and fgmp global max pooling. In addition, using the high- and low-level semantic binding information Wgap and Wgmp as guidance, they are combined with the high- and low-dimensional features, respectively; the high-level guidance semantic feature F′h and the low-level guidance semantic feature F′l are obtained after the attention operation.

F′h = fσ(Wgap) ⊗ fup(Fh)    (3)

F′l = fσ(Wgmp) ⊗ Fl    (4)

Finally, the weighted features are concatenated, and the concatenated feature maps are multiplied with Wglobal. Domain-invariant information is then captured while reducing the dimensionality through a 1×1 convolutional layer to obtain the final fused feature Fi.

Fi = Conv(Wglobal ⊗ Concat(F′h, F′l))    (5)

Where Conv represents the 1×1 convolution operation, ⊕ represents element-wise sum, and ⊗ represents element-wise multiplication.

Our proposed TAFA transfers features from shallower convolutional layers to deeper convolutional layers. Propagating the shallow features into the deeper convolutional layers prevents them from being forgotten and gives the obtained features a stronger characterization ability. By gradually guiding the fusion between high- and low-level features, SMF-SN can adaptively combine high- and low-dimensional semantic information, reassign feature weights, and better capture critical domain-invariant information. Thus, lung nodules can be separated from the noise.
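
To make the flow of Eqs (1)–(5) concrete, the following is a minimal PyTorch sketch of a TAFA-style block. How the channel-attention descriptors are split between the high- and low-level branches, the MLP reduction ratio, and all channel sizes are assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TAFA(nn.Module):
    """Sketch of Triple Attention-guided Feature Aggregation (Eqs 1-5), under assumptions."""
    def __init__(self, high_ch: int, low_ch: int, out_ch: int, reduction: int = 8):
        super().__init__()
        cat_ch = high_ch + low_ch
        self.mlp = nn.Sequential(                     # shared MLP for the GAP and GMP descriptors
            nn.Conv2d(cat_ch, cat_ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(cat_ch // reduction, cat_ch, 1),
        )
        self.fuse = nn.Conv2d(cat_ch, out_ch, kernel_size=1)       # Eq (5): 1x1 reduction
        self.high_ch = high_ch

    def forward(self, f_high: torch.Tensor, f_low: torch.Tensor) -> torch.Tensor:
        f_high = F.interpolate(f_high, size=f_low.shape[-2:], mode="bilinear",
                               align_corners=False)                # upsample high-level feature
        f_cat = torch.cat([f_high, f_low], dim=1)                  # Eq (1)
        w_gap = self.mlp(F.adaptive_avg_pool2d(f_cat, 1))          # channel excitation (GAP)
        w_gmp = self.mlp(F.adaptive_max_pool2d(f_cat, 1))          # salient response (GMP)
        w_global = torch.sigmoid(w_gap + w_gmp)                    # Eq (2)
        f_high_g = f_high * torch.sigmoid(w_gap)[:, :self.high_ch]   # Eq (3), assumed split
        f_low_g = f_low * torch.sigmoid(w_gmp)[:, self.high_ch:]     # Eq (4), assumed split
        weighted = torch.cat([f_high_g, f_low_g], dim=1) * w_global  # multiply by Wglobal
        return self.fuse(weighted)                                   # Eq (5)

out = TAFA(128, 64, 64)(torch.randn(1, 128, 32, 32), torch.randn(1, 64, 64, 64))
print(out.shape)  # torch.Size([1, 64, 64, 64])
```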

Semi-supervised multimodal fusion classification networks

The architecture of S2MF-CN.

The proposed S2MF-CN structure is shown in Fig 1(B); it adopts the Mean Teacher model as the main framework of the classification network. In Mean Teacher, the Teacher network has the same structure as the Student network. The Student model is the target model to be trained; at each training step, the exponential moving average (EMA) of its weights is assigned to the Teacher model. The predictions of the Teacher model are treated as additional supervision for the learning of the Student model. Our model uses the final Student model to make predictions. The specific Student model is shown in Fig 5(A) and consists of three parts: an encoder, a decoder, and the Intra and Inter Mutual Guidance Attention Fusion (I2MGAF) module. The encoder and decoder have the same structure and parameters as in SMF-SN, which allows the network to focus on the lesion region and capture the necessary segmentation features. I2MGAF performs feature purification using mutual guidance attention modules; it fully extracts multi-scale lung CT image features and genetic features and adaptively fuses them through an attention fusion mechanism for KRAS gene mutation prediction in NSCLC. I2MGAF is described in detail in Section Intra and Inter Mutual Guidance Attention Fusion Module.

Fig 5.

The overview of the Student Module, including (a) the specific implementation details of the Student Model, (b) Intra fusion component (IntraFC) aims to fuse classification and segmentation features at different levels, and (c) Inter fusion component (InterFC) aims to fuse CT image features and genetic features.

https://doi.org/10.1371/journal.pone.0297331.g005

Intra and inter mutual guidance attention fusion module.

In the S2MF-CN network, we propose the I2MGAF module. I2MGAF fully fuses multi-scale image segmentation features, classification features, and genetic features using the IntraFC and InterFC components with a dual attention fusion mechanism. Its aim is to improve the classification capability of the classification network.

  1. Intra Fusion Component (IntraFC)
    We propose the IntraFC based on the MultiRes Block, which can capture multi-scale information [26]. We adopted a strategy of fusing classification features with segmentation features at each level. The information favoring the prediction of KRAS gene mutation status is jointly retained.
    The specific structure of the IntraFC component is shown in Fig 5(B). The final-level segmentation features are subjected to convolutional operations to obtain the initial classification feature FC. Due to the induction bias inherent in the convolution mechanism, it is easy to lose the key features of the lesion after multiple convolutions. It is therefore necessary to fuse the previous segmentation features with the existing classification features to compensate for the bias introduced by the deep network. First, we reshape the segmentation feature to the same size C×H×W as the classification feature FC. Then, a 3×3 convolution is applied to the segmentation features and to the classification features separately. Before each subsequent convolution, we introduce the convolutional features from the previous stage together with the initial fused feature. This effectively models the correlation between segmentation and classification features and ensures that features from the shallow convolutional layers of segmentation and classification are better transferred to the deeper layers. The final fused result is obtained after several feature fusions.
  2. Inter Fusion Component (InterFC)
    We propose the InterFC to find the bidirectional mapping relationship between lung cancer image features and causative genes from the sagittal view (x-axis), coronal view (y-axis), and axial view (z-axis), respectively. InterFC can adaptively enhance the necessary information in different modal features, allowing a more adequate fusion of multimodal features.
    The specific structure of the InterFC component is shown in Fig 5(C). The initial classification feature FC, the fusion result FIntra output by IntraFC, and the processed genetic data G are first concatenated to obtain the multimodal fusion feature MC. MC is then delivered to InterFC to further model the importance of each modality.
    MC = Concat(FC, FIntra, G)    (6)
    Where Concat denotes the concatenation operation. The concatenated multimodal feature is then fed to three convolutional layers with BN and ReLU, with convolution kernels of size 1×3×1, 3×1×1, and 1×1×3, respectively, to produce three feature maps Query ∈ RC×H×W, Key ∈ RC×H×W, and Value ∈ RC×H×W (where C, H, W indicate the channel, height, and width of the input feature, respectively). We first transpose the Query feature. Then, we apply a softmax layer to the matrix multiplication of QueryT and Key to encode the feature relationships in the sagittal and coronal views. Finally, the result is matrix-multiplied with Value to obtain the voxel-level attention-enhanced fusion feature FInter, which is then reshaped back to RC×H×W (a minimal code sketch of this attention follows the list).

    FInter = softmax(QueryT ⊗ Key) ⊗ Value ⊕ MC    (7)

    Where ⊕ denotes element-wise sum and ⊗ denotes matrix multiplication.
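
Below is the minimal code sketch referenced above for the InterFC attention: Query, Key, and Value come from the three axis-oriented convolutions, a softmax over their product models voxel-level relations, and the result is added back to MC. Treating MC as a 3-D volume processed with Conv3d, the residual addition, and all tensor sizes are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class InterFC(nn.Module):
    """Sketch of the inter-modal fusion attention (Eqs 6-7), under assumed tensor shapes."""
    def __init__(self, channels: int):
        super().__init__()
        def branch(kernel):   # Conv3d + BN + ReLU with an axis-specific kernel
            pad = tuple(k // 2 for k in kernel)
            return nn.Sequential(nn.Conv3d(channels, channels, kernel, padding=pad),
                                 nn.BatchNorm3d(channels), nn.ReLU(inplace=True))
        self.query = branch((1, 3, 1))   # sagittal-oriented kernel
        self.key = branch((3, 1, 1))     # coronal-oriented kernel
        self.value = branch((1, 1, 3))   # axial-oriented kernel

    def forward(self, m_c: torch.Tensor) -> torch.Tensor:
        b, c, d, h, w = m_c.shape
        q = self.query(m_c).flatten(2)                        # (B, C, D*H*W)
        k = self.key(m_c).flatten(2)
        v = self.value(m_c).flatten(2)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # (B, DHW, DHW) voxel relations
        out = (v @ attn).view(b, c, d, h, w)                  # re-weight values, reshape back
        return out + m_c                                      # residual fusion, cf. Eq (7)

# Example with a small assumed multimodal volume MC.
m_c = torch.randn(1, 8, 4, 8, 8)
print(InterFC(8)(m_c).shape)  # torch.Size([1, 8, 4, 8, 8])
```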

Data

Dataset

In this study, we used the NSCLC-Radiogenomics dataset [27], directly accessible on the Cancer Imaging Archive (TCIA) website. NSCLC-Radiogenomics is a public dataset; the collection of data from the patients involved received ethical approval, and users can download the relevant data for research and publication free of charge. Our study is based on open-source data and is therefore free from ethical issues and other conflicts of interest. NSCLC-Radiogenomics is a unique radiogenomic dataset of 211 NSCLC subjects. The imaging data mainly include CT, semantic annotation of the tumors observed on CT images using a controlled vocabulary, and segmentation maps of tumor lesions (lung nodules) on the CT scans; the genetic data mainly include RNA sequencing (RNA-seq) data. Patients were excluded from the training and testing datasets for 1) lack of RNA-seq data, 2) lack of CT images, or 3) lack of physician-annotated segmentation maps of CT lesions. After screening, 124 cases had complete image and genetic data. Of the 124 patients, 94 carried wild-type KRAS and 30 carried KRAS mutations. The clinical information of these patients is shown in Table 3. All data were randomly divided into training and test datasets in a 4:1 ratio.

Table 3. Patients’ medical record information in the dataset.

https://doi.org/10.1371/journal.pone.0297331.t003

Data preprocessing

CT image.

In our experiments, inspired by Cubuk et al. [28], we apply the AutoAugment procedure to the 124 sets of CT images to automatically search for improved data augmentation strategies. AutoAugment designs a search space in which a policy consists of many sub-policies, and one sub-policy is randomly selected for each image in each mini-batch. Each sub-policy contains two operations; each operation is an image processing function (such as cropping), together with the probability and magnitude of applying it. In this way, we obtained 6696 images with a fixed size of 512×512.
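
As a hedged illustration of this augmentation step, the snippet below uses torchvision's built-in AutoAugment transform as a stand-in for the searched policy described above; the IMAGENET policy, the conversion of a CT slice to a 3-channel uint8 tensor, and the final resize are assumptions, not the authors' exact pipeline.

```python
import torch
from torchvision import transforms

# AutoAugment randomly picks one sub-policy (two operations, each with a probability and
# magnitude) per image; the IMAGENET policy stands in for the searched policy here.
augment = transforms.Compose([
    transforms.AutoAugment(transforms.AutoAugmentPolicy.IMAGENET),
    transforms.ConvertImageDtype(torch.float32),   # back to float for the network
    transforms.Resize((512, 512)),                 # keep the fixed 512x512 input size
])

# Example on a dummy uint8 "CT slice" replicated to 3 channels (assumed preprocessing).
ct_slice = torch.randint(0, 256, (3, 512, 512), dtype=torch.uint8)
augmented = augment(ct_slice)
print(augmented.shape, augmented.dtype)  # torch.Size([3, 512, 512]) torch.float32
```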

Genes selection.

The gene expression data used in this study are RNA-seq data. Since the gene dataset contains more than 20,000 gene expression values per patient, such a huge amount of data can significantly increase the computational cost and decrease prediction accuracy. Therefore, before training the model, we screened the RNA-seq gene expression data with a feature selection algorithm [29] to retain the genes most relevant to KRAS mutations. A total of 115 relevant genes were finally retained. The selected genes were fed into an MLP to obtain effective gene features, mapping the high-dimensional gene data to a low-dimensional space.
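
The feature-selection algorithm of [29] is not detailed here, so the sketch below uses scikit-learn's mutual-information-based SelectKBest as a stand-in to keep 115 genes, followed by a small MLP that maps them to a low-dimensional embedding. Apart from the 115 selected genes reported above, all dimensions and the dummy data are assumptions.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Dummy RNA-seq matrix: 124 patients x 20000 genes, with binary KRAS labels (assumed data).
rng = np.random.default_rng(0)
expression = rng.random((124, 20000))
kras_status = rng.integers(0, 2, size=124)

# Stand-in for the feature-selection step [29]: keep the 115 most KRAS-relevant genes.
selector = SelectKBest(mutual_info_classif, k=115)
selected = selector.fit_transform(expression, kras_status)     # shape (124, 115)

# MLP that maps the 115 selected genes to a low-dimensional gene feature vector.
gene_mlp = nn.Sequential(nn.Linear(115, 64), nn.ReLU(), nn.Linear(64, 32))
gene_features = gene_mlp(torch.from_numpy(selected).float())
print(gene_features.shape)  # torch.Size([124, 32])
```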

Experiments and results

Implementation details

Our model S2MMAM is divided into SMF-SN and S2MF-CN. The labeled image data applied to SMF-SN is 30% of the total dataset, about 2100 images. The training dataset applied to S2MF-CN consists of 30% labeled data and 70% unlabeled data. Our experiments are mainly done on 2 NVIDIA RTX A5000 GPUs and 64 GB of memory. All models in the experiments are trained using 10-fold cross-validation. The specific initialization network configurations are shown in Table 4.
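
A minimal sketch of the data partitioning described above, using scikit-learn as a stand-in for the authors' splitting code; the stratification, random seeds, and splitting granularity are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=124)    # dummy KRAS status, one entry per case
samples = np.arange(len(labels))

# 10-fold cross-validation, stratified by mutation status.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(samples, labels)):
    # Within each training fold, treat 30% of the data as labeled (for SMF-SN and the
    # supervised part of S2MF-CN) and the remaining 70% as unlabeled.
    labeled_idx, unlabeled_idx = train_test_split(
        train_idx, train_size=0.3, random_state=0, stratify=labels[train_idx])
    print(fold, len(labeled_idx), len(unlabeled_idx), len(test_idx))
```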

Table 4. The initialization network configurations of model.

https://doi.org/10.1371/journal.pone.0297331.t004

Evaluation metrics

To quantitatively analyze the experimental results, we used six performance metrics to evaluate the classification results: Accuracy (AC), Recall, Precision, Specificity (SP), Area Under the receiver operating characteristic Curve (AUC), and F1 score (F1). They are defined as follows:

AC = (TP + TN) / (TP + TN + FP + FN)    (8)

Recall = TP / (TP + FN)    (9)

Precision = TP / (TP + FP)    (10)

SP = TN / (TN + FP)    (11)

AUC = ∫₀¹ tpr d(fpr) = P(X1 > X0)    (12)

F1 = 2 × Precision × Recall / (Precision + Recall)    (13)

Where TP is true positive, TN is true negative, FP is false positive, FN is false negative, tpr is the true positive rate, fpr is the false positive rate, and X1 and X0 are the confidence scores for positive and negative instances, respectively.
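
The six metrics can be computed from predicted probabilities with scikit-learn as sketched below; the 0.5 decision threshold and the dummy labels are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             roc_auc_score, f1_score, confusion_matrix)

y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])          # dummy KRAS labels
y_prob = np.array([0.2, 0.8, 0.6, 0.3, 0.4, 0.1, 0.7, 0.9])
y_pred = (y_prob >= 0.5).astype(int)                  # assumed 0.5 decision threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
metrics = {
    "AC": accuracy_score(y_true, y_pred),             # (TP+TN)/(TP+TN+FP+FN)
    "Recall": recall_score(y_true, y_pred),           # TP/(TP+FN)
    "Precision": precision_score(y_true, y_pred),     # TP/(TP+FP)
    "SP": tn / (tn + fp),                             # TN/(TN+FP)
    "AUC": roc_auc_score(y_true, y_prob),             # area under the ROC curve
    "F1": f1_score(y_true, y_pred),                   # harmonic mean of precision and recall
}
print(metrics)
```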

Ablation studies

In this section, we evaluate the impact of the SE-ResNeXt, TAFA module, and the I2MGAF module on our S2MMAM respectively.

Ablation study of SE-ResNeXt.

Using SE-ResNeXt as the backbone of the network not only enhances the network's ability to extract focal features but also takes advantage of the lightweight nature of ResNeXt to reduce the computational burden and improve efficiency. To verify the performance of our proposed SE-ResNeXt, we replace the backbone network to obtain S2MMAM(UNet), S2MMAM(ResNet), S2MMAM(ResNeXt), and S2MMAM(Inception-V3), respectively. These variants are compared with our proposed SE-ResNeXt on the same dataset. The results are shown in Table 5.

Table 5. Comparison of classification performance of UNet, ResNet, ResNeXt, Inception-v3 and SE-ResNeXt on S2MMAM.

SE-ResNeXt(Ours) achieved the best results in all six comparative metrics.

https://doi.org/10.1371/journal.pone.0297331.t005

As shown in Table 5, it is evident from the results that our S2MMAM(Ours) performed the best in KRAS gene mutation prediction among the five models. S2MMAM(Ours) achieved the best results in all six comparative metrics. The AUC was 83.27%, 5.96% higher than the second-place S2MMAM(ResNeXt). Compared to the more popular S2MMAM (Inception-V3), the AUC was 6.43% higher. SE-ResNeXt has a simpler architecture and lower computational complexity than Inception-v3. SE-ResNeXt effectively eliminates the semantic differences between features by utilizing multi-scale and attention mechanisms. This enables SE-ResNeXt to outperform other traditional networks trained on the data and helps the model to better localize the lesion area.

Ablation study of TAFA module.

Using TAFA as the basic module to build S2MMAM better captures the key and complementary information of high-level and low-level semantic features. It further enhances the feature representation capability, improves the quality of the extracted segmentation features, and promotes classification performance. To validate the performance of our proposed TAFA, we compare our proposed S2MMAM (Ours) with Addition, Concatenation, the Adaptive Enhanced Attention Fusion (AEAF) [34], and the Adaptive Spatiotemporal Semantic Calibration Module (ASSCM) [35] on the test dataset. The results are shown in Table 6.

Table 6. Comparison of classification performance of TAFA on S2MMAM and four models with different fusion blocks.

TAFA(Ours) achieved the best results in all six comparative metrics.

https://doi.org/10.1371/journal.pone.0297331.t006

The results show that the highest performance metrics on the classification task were achieved using our proposed S2MMAM constructed from TAFA. Compared with the other four models, TAFA (Ours) not only obtained the highest AUC value of 83.27% but also achieved the best results on the other five classification performance metrics, with a maximum AC of 81.67% and a maximum SP of 82.66%. The AUC is 4.39% higher than that of the second-place AEAF, showing that TAFA can effectively fuse multi-scale information and that our model S2MMAM can better detect more patients and effectively reduce the underdiagnosis rate. TAFA achieved an F1 score of 82.73%, which is 4.46% higher than AEAF and 4.3% higher than ASSCM. This demonstrates that our TAFA has more stable classification performance and better classification ability.

Ablation study of I2MGAF module.

The I2MGAF module was implemented to guide the fusion of features in the segmentation and classification tasks, as well as the fusion of image features with genetic data. To demonstrate that the I2MGAF module can better guide the fusion of multimodal and multiscale features in the model, we replaced the IntraFC module in I2MGAF with Addition, Concatenation, and the Adaptive Feature Fusion (AFF) block [23], respectively, and the InterFC module with the Group Feature Learning (GFL) block [36] and the Non-Local Attention (NLA) block [25], respectively. The five resulting models are compared with I2MGAF on the classification test dataset. The results are shown in Figs 6 and 7.

Fig 6. Comparison of the classification performance of IntraFC and three models using other fusion methods.

https://doi.org/10.1371/journal.pone.0297331.g006

Fig 7. Comparison of the classification performance of InterFC and two models using other fusion methods.

https://doi.org/10.1371/journal.pone.0297331.g007

Fig 6 shows a visual comparison of the six classification performance metrics after replacing the IntraFC module in I2MGAF with Addition, Concatenation, and the AFF Block, respectively. From Fig 6, we find that the Concatenation fusion method achieves the lowest AUC value, indicating that the simple concatenation used in [5–9] cannot fully exploit the multimodal information. The AFF Block is 5.9% lower than our IntraFC in AUC. This is because the AFF Block only focuses on inter-channel fusion of features at different levels, ignoring the potential loss of information due to network depth. Our IntraFC module not only focuses on channel fusion of segmentation and classification features but also solves the problem of information loss caused by multiple fusions.

Fig 7 shows the comparison of the six classification performance metrics after replacing the InterFC module in I2MGAF with the GFL Block and the NLA Block, respectively. Our InterFC outperforms the second-place NLA Block by 4.11% and 3.6% in AUC and F1 score, respectively. Our InterFC overcomes the limitation that the NLA Block only focuses on the fusion of information in a single dimension. InterFC can fully combine the information in three dimensions to fuse the data of different modalities and improve the model's sensitivity, thus obtaining a better prediction of KRAS mutation.

Comparison experiment

We compare the proposed S2MMAM with classical semi-supervised learning (SSL) methods and with recently published SSL image classification models that report strong results, trained with 100% and 30% labeled data, respectively. The classical SSL methods include the Π-Model [13] and Mean Teacher. The competing methods include the Relation-driven Self-ensembling Model (RSM) [15], SS-TBN [37], and DAB [38]. Note that we reproduce the above methods on the same test set for the sake of fairness.

Table 7 shows that the key evaluation metrics of S2MMAM outperform the other models with both 100% and 30% labeled data. This means that our S2MMAM can be used not only for supervised training but also in semi-supervised applications. We use the fully supervised model with 100% labeled data as the upper bound and the SSL model trained on 30% labeled data as the target model. As can be seen from Table 7, S2MMAM (Ours) achieved an AUC of 83.27% on the 30% labeled dataset, while Mean Teacher only obtains an AUC of 80.04% on the 100% labeled dataset. This shows the superiority of our S2MMAM for the classification task; it even achieves accurate prediction at a lower cost. Compared with the other models, our S2MMAM has the smallest AUC gap between the 30% labeled dataset and the upper bound, only 4.65%. This result indicates that our TAFA and I2MGAF modules effectively fuse the key multi-scale, multi-modality features; they mitigate the problem of features disappearing due to deep convolution and re-establish the fusion of high- and low-dimensional semantic key features. Compared with other SSL models that use only CT images for classification, our model has an AUC 6.9% higher than the second-best SS-TBN model and 7.21% higher than the DAB model. This is due to our design of a new multimodal fusion module, I2MGAF. I2MGAF guides the fusion of features across multiple tasks and the fusion of multimodal data. It utilizes segmentation features to facilitate the classification task and efficiently extracts important features from different modalities. I2MGAF compensates for the specific information that is easily overlooked in a single data modality, achieves the complementary effects of multi-modal data, and finds the pathogenic features of lesions across multiple dimensions, thus enhancing the classification ability. We also plot the AUC curves of our S2MMAM and the other five models in Fig 8 to demonstrate the classification performance of our S2MMAM more visually.

Fig 8. AUC of our S2MMAM and five other medical image classification models on 30% labeled image dataset.

https://doi.org/10.1371/journal.pone.0297331.g008

Table 7. Comparison of the classification performance of S2MMAM and five other semi-supervised medical image classification models.

https://doi.org/10.1371/journal.pone.0297331.t007

Discussion

Superiority of the model

Although ablation studies and comparison experiments have demonstrated the merits of our proposed method, further discussions are needed on 1) the positive effects of segmentation features for the classification task, 2) the superiority of multimodal data over single modal data, and 3) the selection of the proportion of labeled images within the training dataset.

We designed three sets of experiments, empirically using 100%, 40%, and 30% labeled data as the training dataset. Baseline is used as our base architecture; it is constructed only from S2MF-CN and uses CT image data alone for the classification task. Based on this, we conducted a comparative study by gradually adding SMF-SN, genetic data, and both SMF-SN and genetic data. The experimental results are shown in Table 8.

Table 8. Six metrics were achieved on the test set by Baseline, Baseline+SMF-SN, Baseline+Gene, and our S2MMAM when using 30%, 40%, and 100% labeled training images.

https://doi.org/10.1371/journal.pone.0297331.t008

  1) The positive effects of segmentation features for the classification task

As shown in Table 8, better classification results are obtained when the model utilizes segmentation to facilitate classification. Compared to Baseline, Baseline+SMF-SN improves the AUC values by 6.03%, 3.62%, and 4.11% on the 30%, 40%, and 100% labeled datasets, respectively. We also visualize some of our Baseline and Baseline+SMF-SN segmentation results in Fig 9. The results are output in the form of segmentation maps, which visualize the ability of the network to localize the lesion area. As can be seen from Fig 9, the model with the segmentation task can better localize the lesion area and avoid mixing in interfering structures, improving the accuracy of diagnosis.

Fig 9. Comparison of the segmentation results obtained after training on Baseline strategy and Baseline+SMF-SN strategy: Baseline: Only classification task.

Baseline+SMF-SN: classification task and segmentation task. (a) and (b) are the wild type of NSCLC. (c) and (d) are the mutation of NSCLC. The region surrounded by the red line is the ground truth, and the region surrounded by the green line is the segmentation results.

https://doi.org/10.1371/journal.pone.0297331.g009

  2) The superiority of multimodal data over single modal data

As shown in Table 8, when we used genetic data, the AUC improved by 3.94%, 2.41%, and 2.81%, respectively, compared with Baseline. This indicates that, in addition to image data, genotypic features extracted from biological data can express individual differences and reflect disease characteristics at the micro level, further enhancing the information richness of the network and promoting classification performance.

  3) The selection of the proportion of labeled images within the training dataset

As shown in Table 8, when the proportion of labeled data was 30% and 40%, respectively, the difference in the values of the four metrics was small, with a 0.71% difference in AUC and a 0.83% difference in Recall. Compared with the cost of physician labeling, this result indicates that the guidance information contained in 30% labeled training images is sufficient for the network to learn the key information of the lesion. Therefore, we used 30% labeled images and 70% unlabeled images as the training ratio of the model.

To show the classification performance of our S2MMAM more visually, we also plotted the 3D comparison histograms of AUC and F1 score, as shown in Figs 10 and 11.

Fig 10. AUC values achieved on the test set by Baseline, Baseline+SMF-SN, Baseline+Gene, and our S2MMAM when using 30%, 40%, and 100% labeled training images.

https://doi.org/10.1371/journal.pone.0297331.g010

Fig 11. F1 scores achieved on the test set by Baseline, Baseline+SMF-SN, Baseline+Gene, and our S2MMAM when using 30%, 40%, and 100% labeled training images.

https://doi.org/10.1371/journal.pone.0297331.g011

In summary, the strategy of sharing segmentation network parameters with the classification network helps the network better localize the lesion region. The complementary nature of multimodal data allows the network to learn more abstract features and addresses the challenge of limited information in semi-supervised strategies. Therefore, our S2MMAM is better able to preserve the pathogenic regions, ignore irrelevant information, and improve model sensitivity, which leads to better KRAS mutation prediction results for NSCLC.

Performance in supervised learning

To demonstrate the scalability of our model, its application scenarios are not limited to semi-supervised learning but extend to supervised learning. We compare our S2MMAM with current multimodal classification models that report strong results. The competing methods include the Multimodal Feature Fusion Diagnostic Model (MFFDM) [39] and PLNM [9]. Note that we reproduce the above methods on the same test set for the sake of fairness.

As shown in Table 9, our S2MMAM achieved the best AC, SP, and AUC values. This shows that our model has excellent classification performance even in supervised learning applications. The AUC is 1.6% higher than the second-place PLNM and 3.75% higher than MFFDM. MFFDM employs simple concatenation for fusion, which we believe is the reason for its poorer classification performance. Our S2MMAM employs multidimensional fusion, which allows it to adaptively fuse complementary information. Our S2MMAM and PLNM are similar in classification performance, but our method achieves a better AUC value. We believe that SSL models can exploit limited information to achieve accurate prediction, and when trained with more labeled data, our S2MMAM extracts and integrates information even more effectively. In summary, our S2MMAM can be used not only in SSL but also in supervised learning. It offers a non-invasive way to determine whether the KRAS gene is mutated, enabling early treatment decisions and improving patient survival.

Table 9. Comparison of the classification performance of S2MMAM and two other supervised medical image classification models.

https://doi.org/10.1371/journal.pone.0297331.t009

Conclusion

In this paper, we propose S2MMAM, a semi-supervised attention model that integrates image and gene data for the prediction of KRAS gene mutation status in non-small cell lung cancer. The model consists of two components: the Supervised Multilevel Fusion Segmentation Network (SMF-SN) and the Semi-supervised Multimodal Fusion Classification Network (S2MF-CN). The results on the NSCLC-Radiogenomics dataset demonstrate that S2MMAM can achieve more accurate prediction of KRAS gene mutation status.

However, our S2MMAM still has some limitations. First, the model was evaluated on a single dataset and was not tested on multiple different datasets. Second, although CT images have been shown to aid in the prediction of KRAS gene mutations, histopathology images remain the gold standard in the clinical setting. We will try to combine CT images, histopathology images, and genetic data to further improve the accuracy of KRAS gene mutation status prediction in non-small cell lung cancer.

References

  1. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. CA Cancer J Clin. 2019 Jan;69(1):7–34. pmid:30620402
  2. Johannet P, Coudray N, Donnelly DM, Jour G, Illa-Bochaca I, Xia Y, et al. Using Machine Learning Algorithms to Predict Immunotherapy Response in Patients with Advanced Melanoma. Clin Cancer Res. 2021 Jan 1;27(1):131–140. pmid:33208341
  3. Song Y. CT Radio Genomics of Non-Small Cell Lung Cancer Using Machine and Deep Learning. ICCECE. 2021;128–139.
  4. Shiri I, Amini M, Nazari M, Hajianfar G, Haddadi Avval A, Abdollahi H, et al. Impact of feature harmonization on radiogenomics analysis: Prediction of EGFR and KRAS mutations from non-small cell lung cancer PET/CT images. Comput Biol Med. 2022 Mar;142:105230. pmid:35051856
  5. Ma Y, Wang J, Song K, Qiang Y, Jiao X, Zhao J. Spatial-Frequency dual-branch attention model for determining KRAS mutation status in colorectal cancer with T2-weighted MRI. Comput Methods Programs Biomed. 2021 Sep;209:106311. pmid:34352652
  6. Yang W, Dong Y, Du Q, Qiang Y, Wu K, Zhao J, et al. Integrate domain knowledge in training multi-task cascade deep learning model for benign–malignant thyroid nodule classification on ultrasound images. Eng Appl Artif Intell. 2021;98:104064.
  7. Zhao Z, Zhao J, Song K, Hussain A, Du Q, Dong Y, et al. Joint DBN and Fuzzy C-Means unsupervised deep clustering for lung cancer patient stratification. Eng Appl Artif Intell. 2020;91:103571.
  8. Dong Y, Hou L, Yang W, Han J, Wang J, Qiang Y, et al. Multi-channel multi-task deep learning for predicting EGFR and KRAS mutations of non-small cell lung cancer on CT images. Quant Imaging Med Surg. 2021 Jun;11(6):2354–2375. pmid:34079707
  9. Hou G, Jia L, Zhang Y, Wu W, Zhao L, Zhao J, et al. Deep learning approach for predicting lymph node metastasis in non-small cell lung cancer by fusing image–gene data. Eng Appl Artif Intell. 2023;122:106140.
  10. Tarvainen A, Valpola H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Adv Neural Inf Process Syst. 2017;30.
  11. Zhu F, Zhao S, Wang P, Wang H, Yan H, Liu S. Semi-supervised wide-angle portraits correction by multi-scale transformer. CVPR. 2022;19689–19698.
  12. Kwon D, Kwak S. Semi-supervised semantic segmentation with error localization network. CVPR. 2022;9957–9967.
  13. Laine S, Aila T. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242, 2016.
  14. Wang X, Tang F, Chen H, Cheung CY, Heng PA. Deep semi-supervised multiple instance learning with self-correction for DME classification from OCT images. Med Image Anal. 2023;83:102673. pmid:36403310
  15. Liu Q, Yu L, Luo L, Dou Q, Heng PA. Semi-Supervised Medical Image Classification With Relation-Driven Self-Ensembling Model. IEEE Trans Med Imaging. 2020 Nov;39(11):3429–3440. pmid:32746096
  16. Wang Y, Wang Y, Cai J, Lee TK, Miao C, Wang ZJ. Ssd-kd: A self-supervised diverse knowledge distillation method for lightweight skin lesion classification using dermoscopic images. Med Image Anal. 2023;84:102693. pmid:36462373
  17. Ruder S. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098, 2017. Preprint at https://doi.org/10.48550/arXiv.1706.05098.
  18. Xie Y, Zhang J, Xia Y, Shen C. A Mutual Bootstrapping Model for Automated Skin Lesion Segmentation and Classification. IEEE Trans Med Imaging. 2020 Jul;39(7):2482–2493. pmid:32070946
  19. Zhao L, Song K, Ma Y, Cai M, Qiang Y, Sun J, et al. A segmentation-based sequence residual attention model for KRAS gene mutation status prediction in colorectal cancer. Appl Intell. 2022;53:10232–10254.
  20. Song P, Hou J, Xiao N, Zhao J, Zhao J, Qiang Y, et al. MSTS-Net: malignancy evolution prediction of pulmonary nodules from longitudinal CT images via multi-task spatial-temporal self-attention network. Int J Comput Assist Radiol Surg. 2023 Apr;18(4):685–693. pmid:36447076
  21. Papandreou G, Kokkinos I, Savalle PA. Modeling local and global deformations in deep learning: Epitomic convolution, multiple instance learning, and sliding window detection. CVPR. 2015;390–399.
  22. Ye Y, Pan C, Wu Y, Wang S, Xia Y. MFI-Net: Multiscale Feature Interaction Network for Retinal Vessel Segmentation. IEEE J Biomed Health Inform. 2022 Sep;26(9):4551–4562. pmid:35696471
  23. Wu H, Wang W, Zhong J, Lei B, Wen Z, Qin J. Scs-net: A scale and context sensitive network for retinal vessel segmentation. Med Image Anal. 2021;70:102025. pmid:33721692
  24. Woo S, Park J, Lee JY, Kweon IS. Cbam: Convolutional block attention module. ECCV. 2018;3–19.
  25. Li Z, Zhang C, Zhang Y, Wang X, Ma X, Zhang H, et al. CAN: Context-assisted full Attention Network for brain tissue segmentation. Med Image Anal. 2023;85:102710. pmid:36586394
  26. Ibtehaz N, Rahman MS. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 2020 Jan;121:74–87. pmid:31536901
  27. Bakr S, Gevaert O, Echegaray S, Ayers K, Zhou M, Shafiq M, et al. A radiogenomic dataset of non-small cell lung cancer. Sci Data. 2018 Oct 16;5:180202. pmid:30325352
  28. Cubuk ED, Zoph B, Mane D, Vasudevan V, Le QV. Autoaugment: Learning augmentation strategies from data. CVPR. 2019;113–123.
  29. Jia L, Wu W, Hou G, Zhang Y, Zhao J, Qiang Y, et al. DADFN: dynamic adaptive deep fusion network based on imaging genomics for prediction recurrence of lung cancer. Phys Med Biol. 2023 Mar 23;68(7). pmid:36867882
  30. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. MICCAI. 2015;Part III:234–241.
  31. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. CVPR. 2016;770–778.
  32. Xie S, Girshick R, Dollár P, Tu Z, He K. Aggregated residual transformations for deep neural networks. CVPR. 2017;1492–1500.
  33. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. CVPR. 2016;2818–2826.
  34. Cai M, Zhao L, Zhang Y, Wu W, Jia L, Zhao J, et al. A progressive phased attention model fused histopathology image features and gene features for lung cancer staging prediction. Int J Comput Assist Radiol Surg. 2023 Oct;18(10):1857–1865. pmid:36943546
  35. Wu H, Liu J, Xiao F, Wen Z, Cheng L, Qin J. Semi-supervised segmentation of echocardiography videos via noise-resilient spatiotemporal semantic calibration and fusion. Med Image Anal. 2022;78:102397. pmid:35259635
  36. Zhao C, Chen W, Qin J, Yang P, et al. IFT-Net: Interactive Fusion Transformer Network for Quantitative Analysis of Pediatric Echocardiography. Med Image Anal. 2022;82:102648. pmid:36242933
  37. Zeng LL, Gao K, Hu D, Feng Z, Hou C, Rong P, et al. SS-TBN: A Semi-Supervised Tri-Branch Network for COVID-19 Screening and Lesion Segmentation. IEEE Trans Pattern Anal Mach Intell. 2023 Aug;45(8):10427–10442. pmid:37022260
  38. Chen X, Bai Y, Wang P, Luo J. Data augmentation based semi-supervised method to improve COVID-19 CT classification. Math Biosci Eng. 2023 Feb 6;20(4):6838–6852. pmid:37161130
  39. Tu Y, Lin S, Qiao J, Zhuang Y, Zhang P. Alzheimer's disease diagnosis via multimodal feature fusion. Comput Biol Med. 2022;148:105901. pmid:35908497