
Integrating image and gene-data with a semi-supervised attention model for prediction of KRAS gene mutation status in non-small cell lung cancer.

Xue, Y.; Zhang, D.; et al.
In: PLoS ONE, Vol. 19 (2024-03-11), Issue 3, p. e0297331

Abstract

KRAS is a pathogenic gene frequently implicated in non-small cell lung cancer (NSCLC). However, biopsy as a diagnostic method has practical limitations. Therefore, it is important to accurately determine the mutation status of the KRAS gene non-invasively by combining NSCLC CT images and genetic data for early diagnosis and subsequent targeted therapy of patients. This paper proposes a Semi-supervised Multimodal Multiscale Attention Model (S2MMAM). S2MMAM comprises a Supervised Multilevel Fusion Segmentation Network (SMF-SN) and a Semi-supervised Multimodal Fusion Classification Network (S2MF-CN). S2MMAM facilitates the execution of the classification task by transferring the useful information captured in SMF-SN to the S2MF-CN to improve the model prediction accuracy. In SMF-SN, we propose a Triple Attention-guided Feature Aggregation module for obtaining segmentation features that incorporate high-level semantic abstract features and low-level semantic detail features. Segmentation features provide pre-guidance and key information expansion for S2MF-CN. S2MF-CN shares the encoder and decoder parameters of SMF-SN, which enables S2MF-CN to obtain rich classification features. S2MF-CN uses the proposed Intra and Inter Mutual Guidance Attention Fusion (I2MGAF) module to first guide segmentation and classification feature fusion to extract hidden multi-scale contextual information. I2MGAF then guides the multidimensional fusion of genetic data and CT image data to compensate for the lack of information in single modality data. S2MMAM achieved 83.27% AUC and 81.67% accuracy in predicting KRAS gene mutation status in NSCLC. This method uses medical image CT and genetic data to effectively improve the accuracy of predicting KRAS gene mutation status in NSCLC.

Introduction

Lung cancer is divided into non-small cell lung cancer (NSCLC) and small cell lung cancer. NSCLC accounts for approximately 85% of newly diagnosed lung cancers each year [[1]]. The emergence of targeted therapy has substantially increased the survival rate of NSCLC patients. Prior to targeted therapy, it must be determined whether important disease-causing genes are mutated. KRAS is a common causative gene in NSCLC, and approximately one-third of patients with NSCLC carry KRAS mutations. The usual diagnostic tool is a puncture biopsy. However, this invasive method has many limitations: it is not suitable for all patients and can have unpredictable consequences, such as an increased risk of cancer metastasis [[2]]. Therefore, there is an urgent need for a non-invasive diagnostic method that can accurately predict KRAS mutations in lung cancer patients. Such a method would not only improve treatment outcomes but also guide prognosis.

In recent years, researchers have used CT images to predict gene mutations with traditional radiomics and machine learning. Song et al. [[3]] proposed a machine-learning model for predicting EGFR and KRAS mutation status. They used the model to extract statistical, shape, pathological, and deep learning features from 144 CT scans of tumor regions. Shiri et al. [[4]] used minimum-redundancy maximum-relevance feature selection and a random forest classifier to build a multivariate model. The model analyzed radiomic features extracted from tumor images and successfully predicted EGFR and KRAS mutation status in cancer patients.

The radiomics and machine learning methods mentioned above have successfully predicted gene mutations. However, most of these methods rely on hand-crafted features. In recent years, deep learning based on convolutional neural networks has attracted much attention in the field of medical image computing. This data-driven approach can automatically extract complex image features [[5]–[7]]. In addition, imaging genomics, which integrates disease imaging data and genomic data, holds more promise for deep learning-based analysis than single-modality data. Imaging genomics is a high-throughput research method that correlates imaging features with genomic data. In recent imaging genomics studies, researchers have proposed a series of deep learning algorithms and theoretical models based on image or genetic data. Dong et al. [[8]] proposed a multichannel and multitasking deep learning (MMDL) model. They fused radiological features of CT images with clinical information of patients to improve the model's accuracy in predicting KRAS gene mutations. Hou et al. [[9]] proposed an attention-based multimodal information fusion module that successfully predicted lymph node metastasis by fusing deep learning features of CT images with genetic data. Therefore, machine learning and deep learning-based imaging genomics approaches have great potential for predicting KRAS gene mutation status in NSCLC.

Although the above models achieved considerable performance, there are still some challenges in deep learning methods that use image and genetic data to predict KRAS mutation status in NSCLC: 1) The majority of deep learning methods [[8]] that study classification tasks focus only on the classification method itself. These studies do not use the segmentation features generated by a segmentation task to facilitate the classification task and improve its performance. Lesion segmentation and classification are two highly related tasks. Segmentation can help remove distractions from CT images and is therefore highly beneficial for improving the accuracy of lesion classification. 2) Most of the studied fusion methods use simple direct concatenation. They ignore the correlation and difference between medical images and genetic data. This not only leads to ineffective mining of useful semantic features between multi-scale image features and gene features but also fails to make full use of the complementarity of multimodal information. 3) Many studies use models that overemphasize deep, abstract lesion features. They do not pay sufficient attention to the importance of detailed shallow features in prediction. This limits further improvements in accuracy.

To overcome these difficulties and achieve non-invasive and accurate prediction of KRAS gene mutations in NSCLC, we propose a Semi-supervised Multimodal Multiscale Attention Model (S2MMAM) for predicting KRAS gene mutation status in NSCLC. The model uses the Mean Teacher [[10]] framework as the main structure of the network. Mean Teacher can make full use of labeled images to achieve analytical prediction of unlabeled images, thereby diminishing the network's dependence on manual annotation. To compensate for the information loss caused by single-modality unlabeled image data, the Semi-supervised Multimodal Fusion Classification Network (S2MF-CN) shares the parameters of the Supervised Multilevel Fusion Segmentation Network (SMF-SN) to enrich the key information of the lesion. S2MMAM also fuses the patient's genetic data with the image data to expand the mutation-related knowledge. Specifically, SMF-SN introduces a new Triple Attention-guided Feature Aggregation (TAFA) module. It aims to adaptively fuse high-level semantic features with low-level semantic features using an attention-guided mechanism. TAFA can ignore background noise and localize the extraction of key lesion features. In S2MF-CN, we propose an Intra and Inter Mutual Guidance Attention Fusion (I2MGAF) module to guide the fusion of inter-modality and intra-modality information in a staged manner. I2MGAF can effectively extract complementary information from different modalities at different scales to improve classification performance.

In contrast to the conventional radiomics and machine learning [[3]] used in previous studies of KRAS mutation prediction, we use a convolutional neural network to extract CT image features. This technique is more efficient, reduces the cost of manual annotation, and enables end-to-end application. Studies [[5]–[9]] that made predictions for other diseases with multimodal classification have used simple multimodal fusion methods. In contrast, our proposed method focuses on extracting information of different dimensions from different modalities to achieve complementary fusion.

The contributions of this paper are as follows:

  • A Semi-supervised Multimodal Multiscale Attention Model (S2MMAM) based on imaging genomics is proposed, which effectively solves the problem of difficult intermediate fusion of multimodal heterogeneous data. S2MMAM exploits the facilitation of supervised segmentation features for semi-supervised classification tasks to improve the model performance for predicting KRAS gene mutation status in NSCLC.
  • A new Triple Attention-guided Feature Aggregation (TAFA) module is designed. It is based on the attention module to adaptively fuse high-level semantic features with low-level semantic features. TAFA can suppress low-level background noise and retain detailed local semantic information.
  • We use the Intra and Inter Mutual Guidance Attention Fusion (I2MGAF) module to guide segmentation and classification feature fusion, as well as CT image and genetic data fusion, respectively. It can achieve multi-scale multimodal information fusion and improve classification performance.
Related work

Mean Teacher in semi-supervised learning

Semi-supervised learning has been studied in the medical imaging community for a long time [[11]]. It can reduce the human workload of labeling data. Current research has shown its potential to improve network performance when labels are scarce. There are three common semi-supervised models based on the consistency principle: the Π-Model [[13]], Temporal Ensembling (TE) [[13]], and the Mean Teacher model. To show the advantages and disadvantages of these three consistency-based semi-supervised methods more succinctly, we summarize them in Table 1, which allows a more precise comparison of the three approaches.


Table 1 Comparison of three commonly used consistency-based semi-supervised methods.

Methods | Purpose | Limitations
Π-Model | Based on the consistency principle; perturbs the input data | High complexity and noise-prone results
Temporal Ensembling (TE) | Employs an exponential moving average (EMA) prediction for each unlabeled sample as the consistency target | Maintains a huge prediction matrix during training, and the training time complexity is high for large datasets
Mean Teacher | Improves on the high time complexity of the TE method; constructs a teacher model from the EMA weights of the student model | —

In recent years, Mean Teacher has achieved good results as a basic framework for semi-supervised classification tasks. Wang et al. [[14]] successfully identified diabetic macular edema based on the Mean Teacher model using a small amount of roughly labeled data and a large amount of unlabeled data. Liu et al. [[15]] used a Mean Teacher-based network model to achieve skin lesion diagnosis on the ISIC 2018 challenge and thorax disease classification on ChestX-ray14. Wang et al. [[16]] proposed a model that unifies diverse knowledge into a generic knowledge distillation framework for skin disease classification, enabling the student model to acquire richer knowledge from the teacher model. These models demonstrate that Mean Teacher achieves excellent results in semi-supervised classification tasks, so we use it as the basic framework for our S2MMAM.

Segmentation facilitates classification

Using segmentation tasks to facilitate classification tasks is a basic form of multitask learning [[17]]. In multitask learning, a segmentation task associated with the classification task can assist the classification task in learning the target, thus improving classification performance [[18]]. Similarly, a single-task classification model can borrow this idea: the information captured by a segmentation branch can be transferred to the classification model to expand the lesion information. The supervised segmentation task is trained using data with mask labels. The aim is to obtain the most comprehensive high-level semantic features of the target region and reduce the learning of noisy backgrounds. Rich segmentation features can help the classification task learn more and richer semantic information. Thus, a supervised segmentation network can assist a semi-supervised classification network by suppressing the background noise introduced by missing physician annotations, thereby improving classification accuracy.

The works summarized in Table 2 demonstrate that segmentation has a facilitating effect on classification. However, they share a common problem: they are all supervised models, which incur high data labeling costs. We believe that combining segmentation and classification tasks can make the network more informative. Therefore, our research combines the idea of segmentation facilitating classification with a semi-supervised model. We combine two related tasks: NSCLC lesion segmentation and KRAS gene mutation status prediction. S2MMAM allows S2MF-CN to obtain the key features of lesions at initialization through the strategy of sharing network parameters between SMF-SN and S2MF-CN. In S2MF-CN, the segmentation features are guided to merge with the classification features to extract key features. This strategy enriches the lesion information and improves the classification performance of the network.


Table 2 Comparison of representative works in which segmentation facilitates classification.

Methods | Contributions | Limitations
Xie et al. [18] | Proposed the Mutual Bootstrapping Deep Convolutional Neural Networks (MB-DCNN) model for simultaneous segmentation and classification of skin lesions. The rough lesion masks generated by the segmentation network help train the classification network. The segmentation and classification networks transfer knowledge to each other in a bootstrapping manner and facilitate each other. | 1. Non-end-to-end model 2. Professional doctors are needed to manually label each image
Zhao et al. [19] | Proposed a Segmentation-based Sequence Residual Attention Model (SSRAM) for the dual task of colorectal cancer lesion segmentation and KRAS gene mutation status prediction. The SSRAM utilizes the information provided by the segmentation network and the mask to improve the accuracy of the classification task. | 1. Data pre-processing is more complex 2. Professional doctors are needed to manually label each image
Song et al. [20] | Utilized the lung nodule segmentation task to assist the prediction of malignant development of lung nodules. | Professional doctors are needed to manually label each image

Multiscale features and attention learning

Traditional convolution operations mostly extract local features. However, because local features contain limited information, the model cannot fully learn the contents of the region of interest. Multi-scale features contain local features from multiple regions of interest. The extracted local features are fused with other operations to obtain comprehensive information about the target, which helps the network model learn. To extract multi-scale features, the Atrous Spatial Pyramid Pooling (ASPP) module [[21]] captures contextual information by convolving the target region with different dilation rates. In the medical image domain, the PSE module [[22]] uses a patch-level pyramid design to extend SE operations to multiple scales, allowing the network to adaptively focus on vessels of variable width. The Scale-aware Feature Aggregation (SFA) module [[23]] effectively extracts hidden multi-scale background information and aggregates multi-scale features to improve the model's ability to handle complex vasculature.
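To make the multi-scale idea concrete, the following is a minimal sketch of an ASPP-style block in PyTorch; the channel sizes and dilation rates are illustrative assumptions, not the exact configuration of [[21]] or of our network.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Minimal Atrous Spatial Pyramid Pooling: parallel dilated convolutions
    whose outputs are concatenated and projected back to `out_ch` channels."""
    def __init__(self, in_ch=256, out_ch=256, dilations=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3 if d > 1 else 1,
                          padding=d if d > 1 else 0, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.project = nn.Conv2d(out_ch * len(dilations), out_ch, kernel_size=1)

    def forward(self, x):
        # Each branch sees the same input but with a different receptive field.
        feats = [branch(x) for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))

# Example: a feature map of size (B, 256, 32, 32) keeps its spatial size.
y = ASPP()(torch.randn(2, 256, 32, 32))  # -> (2, 256, 32, 32)
```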

The Convolutional Block Attention Module (CBAM) [[24]] introduces channel and spatial attention. It extracts multiple key feature information from both dimensions to enrich the network content. In the medical image application domain, Context-assisted full Attention Network (CAN) [[25]] combines Non-Local Attention (NLA), Channel Attention (CA), and Dual-pathway Spatial Attention (DSA) to extract lesion information in multiple directions.

Currently, it is widely believed that both multi-scale features and attention mechanisms can help models enhance the recognition of feature maps from different dimensions. However, the above works share a common problem: they do not combine the ideas of multi-scale features and attention mechanisms. Therefore, we combine these two techniques and design the TAFA module. On the one hand, TAFA fuses high- and low-dimensional segmentation features to obtain abstract and detailed information. On the other hand, it fuses segmentation and classification features of different levels, adaptively guiding the features to learn key factors and enhancing the network's ability to capture lesions. Thus, the predictive capability of the model is improved.

Method

Overview

In this paper, we propose a Semi-supervised Multimodal Multiscale Attention Model (S2MMAM). The overall architecture of the model is divided into two parts: Supervised Multilevel Fusion Segmentation Network (SMF-SN) and Semi-supervised Multimodal Fusion Classification Network (S2MF-CN), as shown in Fig 1. In this model, the useful information of CT images is captured by SMF-SN and transferred to S2MF-CN to facilitate the execution of image prediction tasks. The S2MMAM utilizes the fusion of CT images and genetic data to accurately predict whether KRAS is mutated in NSCLC.

Fig 1. Overview of our S2MMAM, including: (a) the Supervised Multilevel Fusion Segmentation Network (SMF-SN), whose inputs are CT images and pixel-level mask images and whose outputs are segmented lesion images, (b) the Semi-supervised Multimodal Fusion Classification Network (S2MF-CN), and (c) the processing of gene data. In S2MMAM, the useful information of CT images is captured by SMF-SN and transferred to S2MF-CN to facilitate the execution of image prediction tasks. S2MMAM utilizes the fusion of CT images and genetic data to accurately predict whether KRAS is mutated in NSCLC.

In the NSCLC dataset, each patient corresponds to a set of CT images and gene data (Section Dataset). Specifically, in our problem setting, we are given a training set containing N labeled data and M unlabeled data, where $N < M$. Let the labeled training datasets be denoted by $S_L = \{X_L^i, Y_L^i\}_{i=1}^N$ and $C_L = \{X_L^i, Y_L^i, Z_L^i\}_{i=1}^N$, where $S_L$ is the dataset for segmentation, $C_L$ is the dataset for classification, $X_L^i$ is the i-th labeled CT image, $Y_L^i$ is the pixel-level annotation of $X_L^i$, and $Z_L^i$ indicates whether the KRAS gene is mutated, with $Z_L^i \in \{0,1\}$, where 0 means negative and 1 means positive. Let the unlabeled training dataset be denoted by $C_U = \{X_U^i\}_{i=1}^M$, where $X_U^i$ is the i-th unlabeled image. The entire model pipeline can be summarized as follows. First, we pre-train SMF-SN, which is initialized on $S_L$, to train the network's ability to capture focal regions. This alleviates problems such as heavy noise in CT and promotes the classification ability. Meanwhile, the network body of S2MF-CN shares encoder and decoder parameters with SMF-SN. Therefore, the encoder and decoder of S2MF-CN are also initialized in this step, and practical segmentation features for different levels of lesions are obtained. The classification network in S2MF-CN can capture the key classification features of lesions using these segmentation features. Finally, after S2MF-CN fuses segmentation features, classification features, and genetic data features, the semi-supervised Student Model is trained to accurately determine patients' KRAS gene mutation status.
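The parameter-sharing step of this pipeline can be illustrated with a minimal, self-contained sketch; the tiny encoder/decoder below are placeholders standing in for the much richer SMF-SN and S2MF-CN architectures.

```python
import torch
import torch.nn as nn

# Minimal stand-ins for the shared encoder/decoder; treat these as illustrative only.
def make_encoder():
    return nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))

def make_decoder():
    return nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())

# Step 1: suppose the segmentation network (SMF-SN) has been pre-trained on SL.
seg_encoder, seg_decoder = make_encoder(), make_decoder()

# Step 2: the Student model of S2MF-CN reuses those parameters, so it starts
# from lesion-aware segmentation features instead of random weights.
student_encoder, student_decoder = make_encoder(), make_decoder()
student_encoder.load_state_dict(seg_encoder.state_dict())
student_decoder.load_state_dict(seg_decoder.state_dict())

# Step 3: the Teacher model is an EMA copy of the Student (see the Mean Teacher
# sketch later in the text); only the Student is updated by back-propagation.
x = torch.randn(2, 1, 64, 64)                 # two dummy CT slices
features = student_decoder(student_encoder(x))
print(features.shape)                          # torch.Size([2, 16, 64, 64])
```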

Supervised multilevel fusion segmentation network

The architecture of SMF-SN

This section introduces a supervised segmentation network based on multidimensional feature fusion. SMF-SN can precisely localize lesion edges and internal regions and greatly reduce the impact of image background noise on network performance. SMF-SN mainly utilizes our proposed SE-ResNeXt and TAFA modules.

We use the enhanced segmentation training dataset SL to train SMF-SN to obtain rich segmentation features. The obtained segmentation features can provide the semi-supervised classification network with a priori information about the lesion location. This improves the classification network's ability to localize and identify lesions.

As shown in Fig 2, SMF-SN includes a stem block, three encoder blocks, three TAFA blocks, a bridge block, three decoder blocks, and an output block.

Fig 2. The architecture of SMF-SN. We adjust the dilation rates of the ASPP in the bridge from 6, 12, 18 to 3, 6, 9 to better adapt SMF-SN to our segmentation task.

In the encoder, each encoder block is composed of an SE-ResNeXt block and a max-pooling layer with a stride of 2. As shown in Fig 3, SE-ResNeXt is an improvement of ResNeXt with SENet. ResNeXt aggregates a set of transformations with the same topology by repeating multiple blocks. SENet performs feature learning on the aggregated features in the channel dimension to estimate the importance of each channel. SE-ResNeXt can therefore enhance the network in both the channel and spatial dimensions to capture richer segmentation features. The max-pooling layer halves the spatial dimension of the feature map to reduce the computational cost. The output of the encoder is passed through a bridge consisting of SE-ResNeXt and Atrous Spatial Pyramid Pooling (ASPP). It provides the largest receptive field for TAFA to include a wider range of contextual information, facilitating more efficient integration between multiple levels. Between the high-level and low-level semantics, we use the proposed TAFA module. This module utilizes multi-scale and attention fusion mechanisms. It both suppresses low-level irrelevant background noise and complements contextual difference information, preserving more detailed local semantic information and better learning lesion information. The TAFA module is described in detail in Section Triple Attention-guided Feature Aggregation.

Fig 3. SE-ResNeXt is improved from ResNeXt with SENet.
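The following is a minimal sketch of how a grouped (ResNeXt-style) convolution can be combined with squeeze-and-excitation recalibration; the channel count, cardinality, and reduction ratio are illustrative assumptions, not the exact block used in SMF-SN.

```python
import torch
import torch.nn as nn

class SEResNeXtBlock(nn.Module):
    """Illustrative SE-ResNeXt-style block: a grouped 3x3 convolution
    (ResNeXt 'cardinality') followed by channel-wise SE recalibration
    and a residual connection. Hyper-parameters are assumptions."""
    def __init__(self, channels=64, cardinality=8, reduction=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=cardinality, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Squeeze: global average pooling; Excitation: two-layer bottleneck.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        out = self.conv(x)
        out = out * self.se(out)   # reweight channels by their learned importance
        return out + x             # residual connection

y = SEResNeXtBlock()(torch.randn(2, 64, 32, 32))  # -> (2, 64, 32, 32)
```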

Triple attention-guided feature aggregation

CT images of lung nodules may contain a large amount of noise; for example, grayscale overlap between lung tissues and blurred boundaries make them difficult to distinguish. The high-level features of the decoder and the low-level features of the encoder are both crucial for capturing lesion features. However, most existing UNet-based connection methods directly connect shallow and deep semantic features of different scales. This ignores that high-level features contain rich semantic information that can help low-level features identify semantically important locations. Likewise, low-level features contain rich spatial information that can help high-level features reconstruct accurate details.

Considering the above factors, we design a Triple Attention-guided Feature Aggregation (TAFA) module to guide the fusion between high- and low-dimensional features. TAFA guides different layers to extract key feature information individually and then fuses them after retaining the domain-invariant key information, as shown in Fig 4. In the TAFA module, we first upsample the high-dimensional feature $F_{high}^{i+1}$ to the same size as the low-dimensional feature $F_{low}^{i}$ ($i \in \{1,2,3\}$). After that, we concatenate the high- and low-dimensional features along the channel dimension to obtain $F_C^i$:

$$F_C^i = \mathrm{Concat}\left(F_{low}^i, f_{up}(F_{high}^{i+1})\right) \tag{1}$$

where Concat represents the concatenation operation and $f_{up}$ represents the up-sampling operation. Then, to better mine the most useful feature channels between different levels, we introduce a scale-aware channel attention mechanism to automatically select the appropriate receptive field for the feature map and suppress the interference of irrelevant background noise. We feed the concatenated feature $F_C^i$ into global average pooling (GAP) and global max pooling (GMP), respectively. TAFA uses the GAP branch to excite the feature channel information and the GMP branch to retain the maximal semantic information. Afterward, the corresponding feature maps $F_{AM}^i$ and $F_{MM}^i$ are obtained using a multi-layer perceptron (MLP) with shared parameters. The feature maps $F_{AM}^i$ and $F_{MM}^i$ are summed, and the sum passes through a sigmoid function to generate a global guidance feature coefficient $W_{global}$:

$$W_{global} = f_\sigma\left(f_{mlp}(f_{gmp}(F_C^i)) \oplus f_{mlp}(f_{gap}(F_C^i))\right) \tag{2}$$

where $f_\sigma$ represents the sigmoid activation, $f_{mlp}$ the MLP operator, $f_{gap}$ global average pooling, and $f_{gmp}$ global max pooling. In addition, using the combined high- and low-level semantic information $F_{AM}^i$ and $F_{MM}^i$ as guidance, they are combined with the high- and low-dimensional features, respectively, and the high-level guided semantic features $F_{high\_att}^{i+1}$ and low-level guided semantic features $F_{low\_att}^{i}$ are obtained after the attention operation:

$$W_{high} = f_\sigma\left(F_{high}^{i+1} \otimes F_{AM}^i\right), \qquad W_{low} = f_\sigma\left(F_{low}^{i} \otimes F_{MM}^i\right) \tag{3}$$

$$F_{high\_att}^{i+1} = F_{high}^{i+1} \otimes W_{high} \oplus F_{high}^{i+1}, \qquad F_{low\_att}^{i} = F_{low}^{i} \otimes W_{low} \oplus F_{low}^{i} \tag{4}$$

Fig 4. The TAFA module. info:doi/10.1371/journal.pone.0297331.g004

Finally, the weighted features are concatenated, and the concatenated feature maps are multiplied with $W_{global}$. Domain-invariant information is then captured while reducing the dimensionality through a 1×1 convolutional layer to obtain the final fused feature $F^i$:

$$F^i = \mathrm{Conv}\left(W_{global} \otimes \mathrm{Concat}(F_{low\_att}^{i}, F_{high\_att}^{i+1})\right) \tag{5}$$

where Conv represents the 1×1 convolution operation, ⊕ represents element-wise summation, and ⊗ represents element-wise multiplication.

Our proposed TAFA transfers features from shallower convolutional layers to deeper convolutional layers. Reusing the shallow features in the deeper convolutional layers prevents them from being forgotten and gives the resulting features a stronger representational ability. By gradually guiding the fusion between high- and low-level features, SMF-SN is guided to adaptively combine high- and low-dimensional semantic information, reassign feature weights, and better capture critical domain-invariant information. Thus, lung nodules can be separated from the noise.
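A minimal PyTorch sketch of the TAFA fusion following Eqs (1)–(5) is given below. It assumes that the high- and low-level features share the same channel count and that the shared MLP maps the concatenated descriptor back to that channel count; the exact channel handling and layer sizes in the published model may differ, so treat this as an interpretation rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TAFA(nn.Module):
    """Sketch of Triple Attention-guided Feature Aggregation (Eqs 1-5).
    Assumes both inputs have `ch` channels and the shared MLP maps the
    2*ch pooled descriptor back to ch channels."""
    def __init__(self, ch=64, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(                  # shared MLP for GAP/GMP paths
            nn.Linear(2 * ch, ch // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch),
        )
        self.fuse = nn.Conv2d(2 * ch, ch, kernel_size=1)   # final 1x1 convolution

    def forward(self, f_low, f_high):
        f_high = F.interpolate(f_high, size=f_low.shape[2:], mode="bilinear",
                               align_corners=False)         # upsample high-level feature
        f_c = torch.cat([f_low, f_high], dim=1)             # Eq (1)

        b = f_c.size(0)
        f_am = self.mlp(F.adaptive_avg_pool2d(f_c, 1).view(b, -1))  # GAP branch
        f_mm = self.mlp(F.adaptive_max_pool2d(f_c, 1).view(b, -1))  # GMP branch
        w_global = torch.sigmoid(f_am + f_mm).view(b, -1, 1, 1)     # Eq (2)

        w_high = torch.sigmoid(f_high * f_am.view(b, -1, 1, 1))     # Eq (3)
        w_low = torch.sigmoid(f_low * f_mm.view(b, -1, 1, 1))
        f_high_att = f_high * w_high + f_high                        # Eq (4)
        f_low_att = f_low * w_low + f_low

        fused = torch.cat([f_low_att, f_high_att], dim=1)
        # Apply the same ch-channel global weights to both halves of the concat.
        return self.fuse(fused * w_global.repeat(1, 2, 1, 1))        # Eq (5)

out = TAFA()(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 16, 16))  # -> (2, 64, 32, 32)
```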

Semi-supervised multimodal fusion classification networks

The architecture of S2MF-CN

The proposed S2MF-CN structure is shown in Fig 1(B); it adopts the Mean Teacher model as the main framework of the classification network. In Mean Teacher, the Teacher network has the same structure as the Student network. The Student model is the target model to be trained. At each training step, the exponential moving average (EMA) of the Student's weights is assigned to the Teacher model. The predictions of the Teacher model serve as additional supervision for the learning of the Student model. Our model uses the final Student model to make predictions. The Student model, shown in Fig 5(A), consists of three parts: the encoder, the decoder, and the Intra and Inter Mutual Guidance Attention Fusion (I2MGAF) module. The encoder and decoder have the same structure and parameters as in SMF-SN. This allows them to focus on the lesion region and capture the necessary segmentation features. I2MGAF purifies features using mutual guidance attention modules. It can fully extract multi-scale lung CT image features and genetic features, and it adaptively fuses them through an attentional fusion mechanism for KRAS gene mutation prediction in NSCLC. I2MGAF is described in detail in Section Intra and Inter Mutual Guidance Attention Fusion Module.

Fig 5. Overview of the Student Model, including (a) the specific implementation details of the Student Model, (b) the Intra fusion component (IntraFC), which fuses classification and segmentation features at different levels, and (c) the Inter fusion component (InterFC), which fuses CT image features and genetic features.
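The EMA weight transfer from the Student to the Teacher described above can be written in a few lines; the decay value below is a common choice rather than necessarily the one used in our experiments.

```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, alpha=0.99):
    """Mean Teacher update: teacher weights become an exponential moving
    average (EMA) of the student weights; only the student is trained by SGD."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)

# Usage inside the training loop (conceptually):
#   loss = supervised_loss + consistency_weight * mse(student(x_aug1), teacher(x_aug2))
#   loss.backward(); optimizer.step(); update_teacher(teacher, student)
```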

Intra and inter mutual guidance attention fusion module

In the S2MF-CN network, we propose the I2MGAF module. I2MGAF fully fuses multi-scale image segmentation features, classification features, and genetic features using the IntraFC and InterFC components with a dual attention fusion mechanism. Its aim is to improve the capability of the classification network.

  • Intra Fusion Component (IntraFC)
  • We propose the IntraFC based on the MultiRes Block, which can capture multi-scale information [[26]]. We adopt a strategy of fusing classification features with segmentation features at each level, so that the information favoring the prediction of KRAS gene mutation status is jointly retained.
  • The specific structure of the IntraFC component is shown in Fig 5(B). The final-level segmentation features $F_S^3$ are subjected to convolutional operations to obtain the initial classification features $F_C$. Due to the inductive bias inherent in the convolution mechanism, it is easy to lose the key features of the lesion after multiple convolutions. Therefore, we fuse the earlier segmentation features with the existing classification features to compensate for the information loss caused by the deep network. First, we reshape the segmentation features $F_S^i$ ($i \in \{1,2,3\}$) to the same size $C \times H \times W$ as the classification feature $F_C$. Then the segmentation features and classification features each pass through a 3×3 convolution. We introduce the convolutional features from the previous stage and the initial fusion feature $F_{SC}^i$ before the subsequent convolution. This effectively models the correlation between segmentation and classification features and ensures that the features from the shallow convolutional layers of segmentation and classification are better transferred to the deeper layers. The final fused result $F_{Intra}^i$ ($i \in \{1,2,3\}$) is obtained after several feature fusions.

  • Inter Fusion Component (InterFC)
  • We propose the InterFC to find the bidirectional mapping relationship between lung cancer image features and causative genes from the sagittal (x-axis), coronal (y-axis), and axial (z-axis) views, respectively. InterFC can adaptively enhance the necessary information in different modal features, allowing a more adequate fusion of multimodal features.
  • The specific structure of the InterFC component is shown in Fig 5(C). The initial classification feature $F_C$, the fusion results $F_{Intra}^i$ ($i \in \{1,2,3\}$) output by IntraFC, and the processed genetic data $G$ are first concatenated to obtain the multimodal fusion feature $M_C$. After that, $M_C$ is delivered to InterFC to further model the importance of each modality:

$$M_C = \mathrm{Concat}\left(F_{Intra}^1, F_{Intra}^2, F_{Intra}^3, F_C, G\right) \tag{6}$$

  • Where Concat denotes the concatenation operation. The concatenated multimodal features are then fed to three convolutional layers with BN and ReLU. The convolution kernel sizes are 1×3×1, 3×1×1, and 1×1×3, respectively, producing three feature maps $Query \in \mathbb{R}^{C \times H \times W}$, $Key \in \mathbb{R}^{C \times H \times W}$, and $Value \in \mathbb{R}^{C \times H \times W}$ (where C, H, and W denote the channel, height, and width of the input features, respectively). We first transpose the Query feature. Then we apply a softmax layer to the matrix multiplication of $Query^T$ and $Key$ to encode the feature relationships in the sagittal and coronal views. Finally, the result is multiplied with $Value$ to obtain the voxel-level attention-enhanced fusion feature $F_{Inter}$, which is then reshaped to $\mathbb{R}^{C \times H \times W}$ (a code sketch of this attention fusion follows this list):

$$F_{Inter} = M_C \oplus \mathrm{softmax}\left(Query^T \otimes Key\right) \otimes Value \tag{7}$$

  • Where ⊕ denotes element-wise summation and ⊗ denotes matrix multiplication.
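A minimal sketch of the InterFC-style attention fusion of Eq (7) is shown below. It treats the fused multimodal feature as a small 3D volume and uses the directional 1×3×1 / 3×1×1 / 1×1×3 convolutions described above; the shapes and channel counts are illustrative assumptions, not the published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterFC(nn.Module):
    """Sketch of the InterFC voxel-level attention fusion (Eq 7). The fused
    multimodal feature M_C is treated as a 3D volume (B, C, D, H, W); the
    directional convolutions produce Query, Key and Value, and a non-local
    style attention re-weights M_C before the residual sum."""
    def __init__(self, ch=32):
        super().__init__()
        self.q = nn.Conv3d(ch, ch, kernel_size=(1, 3, 1), padding=(0, 1, 0))
        self.k = nn.Conv3d(ch, ch, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.v = nn.Conv3d(ch, ch, kernel_size=(1, 1, 3), padding=(0, 0, 1))

    def forward(self, m_c):
        b, c, d, h, w = m_c.shape
        q = self.q(m_c).reshape(b, c, -1)                # (B, C, N) with N = D*H*W
        k = self.k(m_c).reshape(b, c, -1)
        v = self.v(m_c).reshape(b, c, -1)
        attn = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)   # (B, N, N)
        out = torch.bmm(v, attn.transpose(1, 2)).reshape(b, c, d, h, w)
        return m_c + out                                  # residual sum, as in Eq (7)

fused = InterFC()(torch.randn(1, 32, 4, 8, 8))            # -> (1, 32, 4, 8, 8)
```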
Data

Dataset

In this study, we used the NSCLC-Radiogenomics dataset [[27]], directly accessible on The Cancer Imaging Archive (TCIA) website. NSCLC-Radiogenomics is a public dataset. Data collection from the patients involved was ethically approved, and users can download the data for research and publication free of charge. Our study is based on open-source data and is therefore free from ethical issues and other conflicts of interest. NSCLC-Radiogenomics provides a unique radiogenomic dataset of 211 NSCLC subjects. The imaging data mainly include CT scans, semantic annotations of tumors observed on CT images using a controlled vocabulary, and segmentation maps of tumor lesions (lung nodules) on CT scans; the genetic data mainly include RNA sequencing (RNA-seq) data. Patients were excluded from the training and testing datasets for 1) lack of RNA-seq data, 2) lack of CT images, or 3) lack of physician-annotated segmentation maps of CT lesions. After screening, 124 cases had complete image and genetic data. Of the 124 patients, 94 were of the wild type, and 30 were of the mutation type. The clinical information of these patients is shown in Table 3. All data were randomly divided into training and test datasets in a 4:1 ratio.


Table 3 Patients' medical record information in the dataset.

Category | Total | Mutation | Wildtype
Amount | 124 | 30 | 94
Gender
Male | 93 | 24 | 69
Female | 31 | 6 | 25
Smoking History
Smoking | 107 | 30 | 77
Non-smoking | 17 | 0 | 17
Pathological type
Adenocarcinoma | 105 | 29 | 76
Squamous Carcinoma | 17 | 0 | 17
Other | 2 | 1 | 1

Data preprocessing

CT image

In our experiments, inspired by Cubuk et al. [[28]], we applied the AutoAugment procedure to the 124 sets of CT images to automatically search for improved data augmentation strategies. AutoAugment designs a search space in which a policy consists of many sub-policies, and one sub-policy is randomly selected for each image in each mini-batch. Each sub-policy contains two operations, each of which is an image processing function (such as cropping) applied with a searched probability and magnitude. In this way, we obtained 6696 images with a fixed size of 512×512.
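The sketch below illustrates the idea of randomly selecting one sub-policy per image, in the spirit of AutoAugment; the listed operations and magnitudes are illustrative assumptions and are not the policies found by the search in our experiments.

```python
import random
from PIL import Image
import torchvision.transforms.functional as TF

# Illustrative sub-policies: each is a list of (operation, magnitude) pairs.
SUB_POLICIES = [
    [("rotate", 10), ("contrast", 1.2)],
    [("translate_x", 20), ("brightness", 1.1)],
    [("shear_x", 8), ("rotate", -5)],
]

def apply_op(img, op, magnitude):
    if op == "rotate":
        return TF.rotate(img, magnitude)
    if op == "translate_x":
        return TF.affine(img, angle=0, translate=(magnitude, 0), scale=1.0, shear=0)
    if op == "shear_x":
        return TF.affine(img, angle=0, translate=(0, 0), scale=1.0, shear=magnitude)
    if op == "contrast":
        return TF.adjust_contrast(img, magnitude)
    if op == "brightness":
        return TF.adjust_brightness(img, magnitude)
    return img

def augment(img):
    # One sub-policy is randomly selected per image; each op is applied with prob 0.5.
    for op, mag in random.choice(SUB_POLICIES):
        if random.random() < 0.5:
            img = apply_op(img, op, mag)
    return img

slice_img = Image.new("L", (512, 512))   # placeholder CT slice
augmented = augment(slice_img)
```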

Genes selection

The gene expression data used in this study are RNA-seq data. Since the dataset contains more than 20,000 gene expression values per patient, this huge amount of data can significantly increase the computational cost and decrease the prediction accuracy. Therefore, before training the model, we screened the RNA-seq gene expression data with a feature selection algorithm [[29]] to retain the genes most relevant to KRAS mutations. A total of 115 relevant genes were finally selected. The selected genes were fed into an MLP to obtain effective gene features, mapping the high-dimensional gene data to a low-dimensional space.
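A minimal sketch of this step is shown below. The univariate F-test selector stands in for the feature selection algorithm of [[29]], which we do not reproduce here; only the number of retained genes (115) comes from the text, and the MLP sizes and data are illustrative.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_selection import SelectKBest, f_classif

# Toy RNA-seq matrix: 124 patients x 20,000 genes, and binary KRAS labels.
rng = np.random.default_rng(0)
expr = rng.random((124, 20000)).astype(np.float32)
kras = rng.integers(0, 2, size=124)

# Stand-in for the feature selection step of [29]: keep the 115 genes most
# associated with KRAS status (here via a univariate F-test).
selector = SelectKBest(score_func=f_classif, k=115)
expr_selected = selector.fit_transform(expr, kras)          # -> (124, 115)

# MLP that maps the 115-dimensional gene vector to a low-dimensional embedding G.
gene_mlp = nn.Sequential(nn.Linear(115, 64), nn.ReLU(), nn.Linear(64, 32))
gene_features = gene_mlp(torch.from_numpy(expr_selected).float())  # -> (124, 32)
```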

Experiments and results

Implementation details

Our model S2MMAM is divided into SMF-SN and S2MF-CN. The labeled image data applied to SMF-SN are 30% of the total dataset, about 2100 images. The training dataset applied to S2MF-CN consists of 30% labeled data and 70% unlabeled data. Our experiments are mainly run on 2 NVIDIA RTX A5000 GPUs with 64 GB of memory. All models in the experiments are trained using 10-fold cross-validation. The specific initialization network configurations are shown in Table 4.


Table 4 The initialization network configurations of model.

Network Configurations | Setting
Epochs per fold | 20
Optimizer | Adam
Initial learning rate | 0.001
Batch size | 16
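A minimal sketch of the configuration in Table 4 (Adam, learning rate 0.001, batch size 16, 20 epochs per fold, 10-fold cross-validation) is given below; the placeholder model and random tensors stand in for S2MMAM and the NSCLC-Radiogenomics data, and the evaluation on each validation fold is omitted.

```python
import torch
import torch.nn as nn
from sklearn.model_selection import KFold

# Dummy stand-ins so the configuration sketch runs on its own.
inputs, labels = torch.randn(160, 10), torch.randint(0, 2, (160,))

for fold, (train_idx, val_idx) in enumerate(
        KFold(n_splits=10, shuffle=True, random_state=0).split(inputs)):
    model = nn.Sequential(nn.Linear(10, 2))                     # placeholder model
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Table 4: Adam, lr 1e-3
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(inputs[train_idx], labels[train_idx]),
        batch_size=16, shuffle=True)                            # Table 4: batch size 16
    for epoch in range(20):                                     # Table 4: 20 epochs per fold
        for x, y in loader:
            optimizer.zero_grad()
            loss = nn.functional.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
    # validation on val_idx would follow here in a full experiment
```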

Evaluation metrics

To quantitatively analyze the experimental results, we used six performance metrics to evaluate the classification results obtained, including Accuracy (AC), Recall, Precision, Specificity (SP), Area Under the receiver operating Curve (AUC) and F1 score (F1). They are defined as follows:

$$AC = \frac{TP+TN}{TP+TN+FN+FP} \tag{8}$$

$$Recall = \frac{TP}{TP+FN} \tag{9}$$

$$Precision = \frac{TP}{TP+FP} \tag{10}$$

$$SP = \frac{TN}{TN+FP} \tag{11}$$

$$AUC = \int_0^1 tpr(fpr)\,d\,fpr = P(X_1 > X_0) \tag{12}$$

$$F1 = \frac{2 \times Recall \times Precision}{Recall + Precision} \tag{13}$$

where TP is true positive, TN is true negative, FP is false positive, FN is false negative, tpr is the true positive rate, fpr is the false positive rate, and $X_1$ and $X_0$ are the confidence scores for positive and negative instances, respectively.
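For reference, the six metrics of Eqs (8)–(13) can be computed with scikit-learn as sketched below, with specificity derived from the confusion matrix; the labels and scores are dummy values for illustration only.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # dummy labels
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3])   # predicted probabilities
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
metrics = {
    "AC": accuracy_score(y_true, y_pred),          # Eq (8)
    "Recall": recall_score(y_true, y_pred),        # Eq (9)
    "Precision": precision_score(y_true, y_pred),  # Eq (10)
    "SP": tn / (tn + fp),                          # Eq (11)
    "AUC": roc_auc_score(y_true, y_score),         # Eq (12)
    "F1": f1_score(y_true, y_pred),                # Eq (13)
}
print(metrics)
```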

Ablation studies

In this section, we evaluate the impact of the SE-ResNeXt, TAFA module, and the I2MGAF module on our S2MMAM respectively.

Ablation study of SE-ResNeXt

Using SE-ResNeXt as the backbone of the network not only enhances the network's ability to extract lesion features, but also takes advantage of the lightweight design of ResNeXt to reduce the computational burden and improve efficiency. To verify the performance of our proposed SE-ResNeXt, we replace the backbone network to obtain S2MMAM(UNet), S2MMAM(ResNet), S2MMAM(ResNeXt), and S2MMAM(Inception-v3), respectively. These variants are compared with our proposed SE-ResNeXt backbone on the same dataset. The results are shown in Table 5.


Table 5 Comparison of classification performance of UNet, ResNet, ResNeXt, Inception-v3 and SE-ResNeXt on S2MMAM. SE-ResNeXt(Ours) achieved the best results in all six comparative metrics.

Methods | AC(%) | Recall(%) | Precision(%) | SP(%) | AUC(%) | F1(%)
UNet [30] | 71.29±0.53 | 71.35±0.11 | 73.24±0.64 | 70.09±0.36 | 70.33±0.38 | 72.28±0.37
ResNet [31] | 73.95±1.05 | 74.01±2.19 | 74.15±0.41 | 76.32±2.46 | 76.48±0.47 | 74.07±1.29
ResNeXt [32] | 77.99±3.16 | 76.92±1.37 | 78.14±3.21 | 75.28±3.14 | 77.31±2.22 | 77.53±2.27
Inception-v3 [33] | 75.29±2.86 | 77.39±3.15 | 75.14±2.21 | 77.20±2.18 | 76.84±1.43 | 76.25±2.66
SE-ResNeXt(Ours) | 81.67±2.67 | 82.31±2.51 | 83.15±1.21 | 82.66±2.07 | 83.27±1.49 | 82.73±1.86

As shown in Table 5, our S2MMAM with SE-ResNeXt performed the best in KRAS gene mutation prediction among the five models, achieving the best results on all six metrics. The AUC was 83.27%, 5.96% higher than the second-place S2MMAM(ResNeXt) and 6.43% higher than the popular S2MMAM(Inception-v3). SE-ResNeXt has a simpler architecture and lower computational complexity than Inception-v3. SE-ResNeXt effectively reduces the semantic differences between features by utilizing multi-scale and attention mechanisms. This enables SE-ResNeXt to outperform the other traditional backbones trained on the same data and helps the model better localize the lesion area.

Ablation study of TAFA module

Using TAFA as the basic module of S2MMAM better captures the key and complementary information of high-level and low-level semantic features. It further enhances the feature representation capability, improves the quality of the extracted segmentation features, and promotes classification performance. To validate the performance of our proposed TAFA, we compare our proposed S2MMAM (Ours) with Addition, Concatenation, Adaptive Enhanced Attention Fusion (AEAF) [[34]], and the Adaptive Spatiotemporal Semantic Calibration Module (ASSCM) [[35]] on the test dataset. The results are shown in Table 6.


Table 6 Comparison of classification performance of TAFA on S2MMAM and four models with different fusion blocks. TAFA(Ours) achieved the best results in all six comparative metrics.

Methods | AC(%) | Recall(%) | Precision(%) | SP(%) | AUC(%) | F1(%)
Addition | 72.56±1.02 | 72.28±1.69 | 71.43±2.14 | 72.47±1.79 | 72.49±1.23 | 71.85±1.97
Concatenation | 73.03±1.52 | 73.63±1.46 | 75.24±1.22 | 73.13±1.86 | 72.15±0.87 | 74.43±1.34
AEAF [34] | 77.25±1.67 | 78.22±1.62 | 77.73±2.44 | 76.56±2.77 | 78.88±2.45 | 78.27±2.72
ASSCM [35] | 78.39±1.42 | 78.37±1.19 | 78.49±0.76 | 78.57±2.06 | 77.89±1.06 | 78.43±0.97
TAFA(Ours) | 81.67±2.67 | 82.31±2.51 | 83.15±1.21 | 82.66±2.07 | 83.27±1.49 | 82.73±1.86

The results show that the highest performance on the classification task was achieved by our proposed S2MMAM built with TAFA. TAFA (Ours) not only obtained the highest AUC value of 83.27% compared to the other four models; it also achieved the best results on the other five classification metrics, with a maximum AC of 81.67% and a maximum SP of 82.66%. The AUC is 4.39% higher than the second-place AEAF, showing that TAFA can effectively fuse multi-scale information, detect more true patients, and reduce the underdiagnosis rate. TAFA achieved an F1 score of 82.73%, which is 4.46% higher than AEAF and 4.3% higher than ASSCM. This demonstrates that TAFA has a more stable classification performance and better classification ability.

Ablation study of I2MGAF module

The I2MGAF module was designed to guide the fusion of features from the segmentation and classification tasks, as well as the fusion of image features with genetic data. To demonstrate that the I2MGAF module can better guide the fusion of multimodal and multiscale features in the model, we replaced the IntraFC module in I2MGAF with Addition, Concatenation, and the Adaptive Feature Fusion (AFF) block [[23]], respectively, and replaced the InterFC module with the Group Feature Learning (GFL) block [[36]] and the Non-Local Attention (NLA) block [[25]], respectively. The five resulting models are compared with I2MGAF on the classification test dataset. The results are shown in Figs 6 and 7.

Fig 6. info:doi/10.1371/journal.pone.0297331.g006

Fig 7. info:doi/10.1371/journal.pone.0297331.g007

Fig 6 shows a visual comparison of the six classification performance metrics after replacing the IntraFC module in I2MGAF with Addition, Concatenation, and the AFF block, respectively. From Fig 6, we find that the Concatenation fusion method achieves the lowest AUC value, indicating that the simple concatenation used in [[5]–[9]] cannot fully exploit the multimodal information. The AFF block is 5.9% lower than our IntraFC in AUC. This is because the AFF block only focuses on inter-channel fusion of features at different levels, ignoring the potential loss of information due to network depth. Our IntraFC module not only focuses on channel fusion of segmentation and classification features but also mitigates the information loss caused by multiple fusions.

Fig 7 shows the comparison of the six classification performance metrics after replacing the InterFC module in I2MGAF with the GFL block and the NLA block, respectively. Our InterFC outperforms the second-place NLA block by 4.11% and 3.6% in AUC and F1 score, respectively. Our InterFC overcomes the limitation that the NLA block only fuses information in a single dimension. InterFC can fully combine information in three dimensions to fuse data from different modalities and improve the model's sensitivity, thus obtaining a better prediction of KRAS mutation status.

Comparison experiment

We compare the proposed S2MMAM with classical Semi-supervised Learning (SSL) methods and recently published SSL image classification models with strong results, trained with 100% and 30% labeled data, respectively. The classical SSL methods include the Π-Model [[13]] and Mean Teacher. The competing methods include the Relation-driven Self-ensembling Model (RSM) [[15]], SS-TBN [[37]], and DAB [[38]]. Note that we reproduce the above methods on the same test set for the sake of fairness.

Table 7 shows that the key evaluation metrics of S2MMAM outperform the other models with both 100% and 30% labeled data. This means that our S2MMAM can be used not only for supervised training but also in semi-supervised applications. We use the fully supervised model with 100% labeled data as the upper bound and the SSL model trained on 30% labeled data as the target model. As can be seen from Table 7, S2MMAM(Ours) achieved an AUC of 83.27% on the 30% labeled dataset, whereas Mean Teacher only obtains an AUC of 80.04% on the 100% labeled dataset. This shows the superiority of our S2MMAM for the classification task; it achieves accurate prediction at a lower annotation cost. Compared with the other models, our S2MMAM has the smallest AUC gap between the 30% labeled dataset and the upper bound, only 4.65%. This result indicates that our TAFA and I2MGAF modules effectively fuse the key multi-scale, multi-modality features. They can alleviate the problem of feature disappearance caused by deep convolution and re-establish the fusion of high- and low-dimensional semantic key features. Compared with other SSL models that use only CT images for classification, our model has an AUC 6.9% higher than the second-best SS-TBN model and 7.21% higher than the DAB model. This is due to our design of the new multimodal fusion module, I2MGAF. I2MGAF guides the fusion of features across tasks and the fusion of multimodal data. It utilizes segmentation features to facilitate the classification task and efficiently extracts important features from different modalities. I2MGAF can compensate for the specific information that is easily overlooked by a single data modality, achieve the complementary effects of multi-modal data, and find the pathogenic characteristics of lesions across multiple dimensions, thus enhancing the classification ability. We also plot the AUC curves of our S2MMAM and the other five models in Fig 8 to demonstrate the classification performance of our S2MMAM more visually.

Fig 8. info:doi/10.1371/journal.pone.0297331.g008


Table 7 Comparison of the classification performance of S2MMAM and five other semi-supervised medical image classification models.

Methods | Labeled | Unlabeled | Data | AC(%) | Recall(%) | Precision(%) | SP(%) | AUC(%) | F1(%)
Π-Model [13] | 100% | 0 | CT | 76.35±2.32 | 78.21±2.36 | 79.32±2.68 | 77.32±2.65 | 76.23±2.31 | 78.76±2.51
Mean Teacher [10] | 100% | 0 | CT | 81.24±2.18 | 80.15±1.81 | 82.34±1.79 | 81.86±2.43 | 80.04±2.78 | 81.23±1.8
RSM [15] | 100% | 0 | CT | 84.21±1.26 | 81.93±2.15 | 84.21±2.03 | 84.72±2.07 | 83.41±2.65 | 83.05±2.09
SS-TBN [37] | 100% | 0 | CT | 80.25±1.71 | 79.38±2.16 | 79.88±2.71 | 78.62±3.89 | 81.23±2.44 | 80.35±2.66
DAB [38] | 100% | 0 | CT | 81.79±2.3 | 80.42±1.51 | 82.11±2.23 | 82.8±2.43 | 83.56±2.36 | 82.37±1.97
S2MMAM(Ours) | 100% | 0 | CT+Gene | 86.94±3.12 | 85.97±2.19 | 84.28±1.73 | 86.11±2.54 | 87.92±1.69 | 85.12±1.96
Π-Model [13] | 30% | 70% | CT | 71.23±2.49 | 72.11±1.65 | 72.56±1.24 | 71.16±2.77 | 70.15±2.17 | 72.33±1.45
Mean Teacher [10] | 30% | 70% | CT | 74.28±2.53 | 74.16±2.73 | 75.29±2.94 | 74.62±2.82 | 74.21±3.48 | 74.72±2.83
RSM [15] | 30% | 70% | CT | 75.91±2.37 | 75.13±3.21 | 76.37±2.86 | 75.49±3.53 | 75.94±2.34 | 75.74±3.04
SS-TBN [37] | 30% | 70% | CT | 76.01±1.54 | 75.17±1.89 | 77.22±1.47 | 77.01±2.15 | 76.37±2.22 | 77.74±1.98
DAB [38] | 30% | 70% | CT | 75.73±2.46 | 77.22±2.49 | 76.16±2.73 | 76.49±2.87 | 76.06±1.87 | 76.58±2.12
S2MMAM(Ours) | 30% | 70% | CT+Gene | 81.67±2.67 | 82.31±2.51 | 83.15±1.21 | 82.66±2.07 | 83.27±1.49 | 82.73±1.86

Discussion

Superiority of the model

Although ablation studies and comparison experiments have demonstrated the merits of our proposed method, further discussions are needed on 1) the positive effects of segmentation features for the classification task, 2) the superiority of multimodal data over single modal data, and 3) the selection of the proportion of labeled images within the training dataset.

We designed three sets of experiments, empirically using 100%, 40%, and 30% labeled data as the training dataset. Baseline is used as our base architecture, where Baseline is constructed only from S2MF-CN and uses CT image data for the classification task. On this basis, we conducted a comparative study by gradually adding SMF-SN, genetic data, and both SMF-SN and genetic data. The experimental results are shown in Table 8.


Table 8 Six metrics were achieved on the test set by Baseline, Baseline+SMF-SN, Baseline+Gene, and our S2MMAM when using 30%, 40%, and 100% labeled training images.

Methods | Labeled | Unlabeled | Data | AC(%) | Recall(%) | Precision(%) | SP(%) | AUC(%) | F1(%)
Baseline | 100% | 0 | CT | 76.29±5.22 | 75.39±2.31 | 74.92±2.64 | 77.57±2.84 | 78.26±3.14 | 75.15±2.39
Baseline+SMF-SN | 100% | 0 | CT | 83.19±2.46 | 79.38±1.37 | 81.31±1.67 | 82.47±3.64 | 84.29±4.36 | 80.33±1.52
Baseline+Gene | 100% | 0 | CT+Gene | 82.37±1.67 | 80.06±3.49 | 78.15±2.17 | 81.66±3.58 | 82.2±2.61 | 79.09±2.81
S2MMAM(Ours) | 100% | 0 | CT+Gene | 86.94±3.12 | 85.97±2.19 | 84.28±1.73 | 86.11±2.54 | 87.92±1.69 | 85.12±1.96
Baseline | 40% | 60% | CT | 74.04±1.21 | 73.72±3.11 | 74.3±2.16 | 73.74±2.94 | 75.61±2.47 | 73.28±2.76
Baseline+SMF-SN | 40% | 60% | CT | 78.37±2.14 | 77.49±2.76 | 78.06±1.98 | 78.34±2.48 | 79.23±3.16 | 79.35±2.54
Baseline+Gene | 40% | 60% | CT+Gene | 78.41±1.43 | 78.16±1.32 | 79.64±2.34 | 78.14±1.79 | 78.02±1.99 | 79.87±1.29
S2MMAM(Ours) | 40% | 60% | CT+Gene | 82.35±1.72 | 83.14±1.48 | 83.78±1.77 | 81.87±1.76 | 83.98±1.01 | 83.62±1.52
Baseline | 30% | 70% | CT | 73.65±2.18 | 72.91±1.03 | 72.11±2.36 | 73.67±2.72 | 73.33±2.31 | 72.51±1.7
Baseline+SMF-SN | 30% | 70% | CT | 77.29±5.22 | 76.43±4.21 | 77.24±3.21 | 77.17±1.2 | 77.44±5.32 | 78.8±3.69
Baseline+Gene | 30% | 70% | CT+Gene | 75.11±2.3 | 78.81±2.92 | 79.35±2.16 | 75.39±3.44 | 75.14±3.26 | 79.08±2.54
S2MMAM(Ours) | 30% | 70% | CT+Gene | 81.67±2.67 | 82.31±2.51 | 83.15±1.21 | 82.66±2.07 | 83.27±1.49 | 82.73±1.86

  • 1) The positive effects of segmentation features for the classification task

As shown in Table 8, better classification results are obtained when the model utilizes the idea of segmentation facilitating classification. Compared to Baseline, Baseline+SMF-SN improves the AUC values by 6.03%, 3.62%, and 4.11% in the 100%, 40%, and 30% labeled datasets, respectively. We also visualize some of our Baseline and Baseline+SMF-SN segmentation results in Fig 9. The results are output as segmentation maps, which visualize the ability of the network to localize the lesion area. As can be seen from Fig 9, the model with the segmentation task localizes the lesion area better. It can avoid mixing in impurities that could interfere with the judgment, thereby improving the accuracy of diagnosis.

Fig 9. Baseline+SMF-SN: classification task and segmentation task. (a) and (b) are the wild type of NSCLC. (c) and (d) are the mutation type of NSCLC. The region surrounded by the red line is the ground truth, and the region surrounded by the green line is the segmentation result.

  • 2) The superiority of multimodal data over single modal data

As shown in Table 8, when we used genetic data, the AUC improved by 3.94%, 2.41%, and 2.81%, respectively, compared with Baseline. This indicates that, in addition to image data, the model can extract genotypic features from biological data that express individual differences and reflect disease characteristics at the micro level, which further enhances the information richness of the network and promotes classification performance.

  • 3) The selection of the proportion of labeled images within the training dataset

As shown in Table 8, when the proportion of labeled data was 30% and 40%, respectively, the differences in metric values were small, with a 0.71% difference in AUC and a 0.83% difference in Recall. Weighed against the cost of physician labeling, this result indicates that the guidance information contained in 30% labeled training images is sufficient for the network to learn the key information of the lesion. Therefore, we used 30% labeled images and 70% unlabeled images as the training ratio of the model.

To show the classification performance of our S2MMAM more visually, we also plotted the 3D comparison histograms of AUC and F1 score, as shown in Figs 10 and 11.

Fig 10. info:doi/10.1371/journal.pone.0297331.g010

Fig 11. info:doi/10.1371/journal.pone.0297331.g011

In summary, the strategy of sharing segmentation network parameters with the classification network helps the network better localize the lesion region. The complementary nature of multimodal data allows the network to learn more abstract features while addressing the challenge of limited information in semi-supervised strategies. Therefore, our S2MMAM better preserves pathogenic regions, ignores irrelevant information, and improves model sensitivity. This leads to better KRAS mutation prediction results for NSCLC.

Performance in supervised learning

To demonstrate the scalability of our model, its application scenarios are not limited to semi-supervised learning but extend to supervised learning. We compare our S2MMAM with current multimodal classification models with strong results. The competing methods include the Multimodal Feature Fusion Diagnostic Model (MFFDM) [[39]] and PLNM [[9]]. Note that we reproduce the above methods on the same test set for the sake of fairness.

As shown in Table 9, our S2MMAM achieved the best AC, SP, and AUC values. This shows that our model has excellent classification performance even in supervised learning applications. The AUC is 1.6% higher than the second-place PLNM and 3.75% higher than MFFDM. MFFDM employs a simple splicing fusion, which we believe is the reason for its poorer classification performance. Our S2MMAM employs multidimensional fusion, which enables it to adaptively fuse complementary information. Our S2MMAM and PLNM are similar in classification performance, but our method achieves a better AUC. We believe that SSL models can utilize limited information to achieve accurate prediction, and when trained with more labeled data, our S2MMAM can extract and integrate information even better. In summary, our S2MMAM can be used not only in SSL but also in supervised learning. It provides a non-invasive way to determine whether the KRAS gene is mutated, to determine treatment for patients early, and to improve patient survival.


Table 9 Comparison of the classification performance of S2MMAM and two other supervised medical image classification models.

Methods | AC(%) | Recall(%) | Precision(%) | SP(%) | AUC(%) | F1(%)
MFFDM [39] | 84.15±1.45 | 84.22±2.04 | 83.98±2.77 | 84.02±1.97 | 84.17±1.03 | 84.16±1.17
PLNM [9] | 86.34±2.11 | 86.21±2.61 | 85.24±1.69 | 85.73±2.65 | 86.32±1.87 | 85.23±2.31
S2MMAM(Ours) | 86.94±3.12 | 85.97±2.19 | 84.28±1.73 | 86.11±2.54 | 87.92±1.69 | 85.12±1.96

Conclusion

In this paper, we propose a semi-supervised attention model that integrates image and gene data for the prediction of KRAS gene mutation status in non-small cell lung cancer. The model consists of two components: the supervised multilevel fusion segmentation network (SMF-SN) and the semi-supervised multimodal fusion classification network (S2MF-CN). The results on the NSCLC-Radiogenomics dataset demonstrate that S2MMAM can achieve a more accurate prediction of KRAS gene mutation status.

However, our S2MMAM still has some limitations. First, the model was evaluated on a single dataset and was not tested on multiple different datasets. Second, although CT images have been shown to aid in the prediction of KRAS gene mutations, histopathology images are the gold standard in the clinical setting. We will try to combine CT images, histopathology images, and genetic data to further improve the accuracy of KRAS gene mutation status prediction in non-small cell lung cancer.


Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

Additional Editor Comments:

AE Comments: Thank you for submitting your manuscript. I appreciate the efforts you have put into this research. I have received feedback from the reviewers, and I would like to share their comments and suggestions with you.

1) Clarity and Comprehension: Reviewer 1 points out a lack of clarity in the explanation of your proposed method. The reviewer found it difficult to understand, making it challenging to reproduce the experiments. Specific feedback has been given regarding figures and their captions (Fig. 1, Fig. 2, and Fig. 5), as well as the use of equations (e.g., Equations 8 and 9).

  • 2) Novelty and Originality: Reviewer 2 has raised concerns about the originality of the work. It's essential to clarify the unique contributions of your research compared to existing literature.
  • 3) Related Work: Both reviewers emphasize the need to improve the section on related works. The current version lists existing works without analyzing their limitations. Consider adding a more detailed analysis and perhaps summarizing existing studies in a tabular form to improve readability.
  • 4) Methodology and Experimental Details: Both reviewers have made suggestions to provide more information on the methodology, hyperparameters, network configurations, and a thorough description of the experimental phases.
  • 5) Source Code: Reviewer 2 suggests providing a GitHub link for the source code to enhance repeatability and verification of the study.
  • 6) Grammar and Typos: Both reviewers have found grammatical errors and typos in the manuscript. It is advised to run the manuscript through a grammar checker and proofread it carefully.
  • 7) Additional Feedback: Reviewer 2 has provided an extensive list of recommendations to enhance the quality and clarity of the manuscript. These include improving the introduction, elaborating on tables, addressing overfitting, revisiting results, and ensuring that the references are up-to-date.

In light of the feedback, I recommend revising your manuscript, addressing the concerns raised by the reviewers. This will not only enhance the clarity and quality of your research but also strengthen its contribution to the field.

I hope this feedback is constructive and assists you in enhancing your manuscript. I look forward to receiving your revised submission.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

***

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I Don't Know

***

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy (http://www.plosone.org/static/policies.action#sharing) requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

***

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

***

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study proposes a deep learning-based methodology for classifying the oncogenic gene KRAS, which frequently involves non-small cell lung cancer (NSCLC), using CT images and genetic information. Their main contributions are the development of the 'Semi-supervised Multimodal Multiscale Attention' mechanism and the novel 'Attention-guided Feature Aggregation' module. The proposed method appears to be novel and innovative, and the data and analysis seem to fully support their claims. However, due to the lack of clarity in the explanation of the proposed method, it is difficult to understand, making it seem impossible to reproduce the experiments. Therefore, 'minor revisions' are suggested to improve the paper.

In Section 3.1, while it seems that the input of the Supervised Multilevel Fusion Segmentation Network (SMF-SN) is X_L and the output is Y_L, in Fig. 1. (a), it is not clearly introduced what the input and output of SMF-SN are, as both X_L and Y_L are shown. This requires modification.

Overall, the introduction of the proposed system is difficult to comprehend. For example, in Fig. 2, there is only one ASPP block, but the caption suggests the presence of multiple ASPPs. Additionally, in Fig. 1 (b), is the input of the Student Model S_L and the input of the Teacher Model C_U? How are the predictions of the Student and Teacher integrated?

In Fig. 5, an explanation is needed for 'Genes Selection'.

The paper requires revisions for grammatical errors.

Equations must be used in the right way (e.g. Equations 8 and 9 should have the first two letters of Recall and Precision in italics.)

Reviewer #2: The experimental study in this paper is interesting. However, the main weakness of the paper lies in its lack of originality and novelty. The following suggestions may be considered to enhance the quality and clarity of the manuscript:

1- The motivation is not clear. Why was this work carried out? What problem does it address that the previous methods cannot?

  • 2- The introduction section could be improved: the similarities and differences between the related work and the proposed method are not clearly described. It is recommended to add a separate subsection with a clear description in this regard.
  • 3- Related work: The paper only lists existing works in the research community without any analysis of existing work's limitations. Therefore, I suggest that the authors mention more summary and limitation analysis so that readers can easily appreciate the contributions made by this paper.
  • 4- In the related works, existing studies can also be summarized in a tabular form to improve readability
  • 5- Elaborate all tables briefly.
  • 6- How to deal with overfitting in your model?
  • 7- Results and illustrations need to be revisited.
  • 8- Background information of this work can be provided more systematically and comprehensively, i.e. logic of this paper should be further enhanced.
  • 9- Hyperparameters of the model: the initialization method is not mentioned.

  • 10- Similarly, the network configurations can be summarized in a table e.g. input size, # of layers, learning rate, optimizers etc.
  • 11- Furthermore, the study's application is not explained in an intelligible manner. You should include an experimentation section to provide readers with a thorough description of all the experimental phases in a straightforward and accessible manner.
  • 12- The theoretical and practical sections of the study are not adequately convincing, and the writing style is absolutely insufficient to highlight the subjective contribution to your research when compared to past research findings.
  • 13- Another important aspect of scientific research is the capacity to repeat the experiment or study in a different setting and reuse or adapt the findings. This is an important point, and you could elaborate on it further in the discussion area to give additional scientific value to this critical study.
  • 14- Please include a link in the research article that allows the complete applied side of this study to be downloaded for verification, validation, and inspection, as well as so that it may be used as a scientific reference.

The code source of this work must be added as a comment to the paper and must be uploaded as a GitHub link to be visible and referenceable.

  • 15- In addition to these specific recommendations, the authors should also run the manuscript through a grammar checker like Grammarly to address any language or grammatical errors. Finally, the authors should ensure that all references cited in the manuscript are up-to-date and relevant to the research topic.
  • 16- Typos/Grammatical Errors:

Subsection Segmentation facilitates classification

Deep Convolutional Nneural Networks --> N should be removed from neural

Section Conclusion:

Mutation Status in Non-Small Cell Lung The model --> period (.) is missing

network (S2 MF-CN). fusion. --> the extra period (.) should be removed

***

6. PLOS authors have the option to publish the peer review history of their article (https://journals.plos.org/plosone/s/editorial-and-peer-review-process#loc-peer-review-history). If published, this will include your full peer review and any attached files.

If you choose "no", your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our https://www.plos.org/privacy-policy.

Reviewer #1: No

Reviewer #2: No

***

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Author response to Decision Letter 0

13 Oct 2023

Responses to Reviewers'

Dear Editors and Reviewers,

Thank you for your letter and for the comments on our manuscript entitled "Integrating Image and Gene-Data with a Semi-Supervised Attention Model for Prediction of KRAS Gene Mutation Status in Non-Small Cell Lung Cancer (ID: PONE-D-23-16921)". We sincerely thank all reviewers for their time and effort. Following the constructive comments of the editors and reviewers, we have carefully revised the whole manuscript and tried to eliminate any grammar or syntax errors. In addition, we asked several colleagues who are experienced in writing English papers to help us thoroughly check the organization and language of the paper. We hope the improved manuscript can be considered for publication in PLOS ONE. We have revised the manuscript point by point. We apologize for not using the Tracked Changes function in Word: because we corrected many grammar and syntax errors and made extensive changes, tracked changes would have made the paper difficult to review. We have instead highlighted the important changes in red. We hope you will be satisfied with our revised manuscript. Responses to the comments, as well as details of the revisions, are given below.

Sincerely yours,

Juanjuan Zhao (on behalf of all the co-authors)

Reviewer #1:

Comment1:

In Section 3.1, while it seems that the input of the Supervised Multilevel Fusion Segmentation Network (SMF-SN) is X_L and the output is Y_L, in Fig. 1. (a), it is not clearly introduced what the input and output of SMF-SN are, as both X_L and Y_L are shown. This requires modification.

Response1:

We sincerely thank the reviewers for asking rigorous questions.

We are very sorry that some explanations were missing here. The input of SMF-SN consists of the CT images together with the pixel-level mask images annotated by physicians (the X_L and Y_L shown in Fig. 1(a)); the output of SMF-SN is the segmented lesion map. Since segmentation performance is not a concern of this study, it is not highlighted in the figure. We have added explanations of the input and output to the caption of Fig. 1(a) in the new manuscript to improve readability.

Comment2:

Overall, the introduction of the proposed system is difficult to comprehend. For example, in Fig. 2, there is only one ASPP block, but the caption suggests the presence of multiple ASPPs. Additionally, in Fig. 1 (b), is the input of the Student Model S_L and the input of the Teacher Model C_U? How are the predictions of the Student and Teacher integrated?

Response2:

We sincerely appreciate these insightful questions and apologize for our lack of rigor.

(1) We apologize for our carelessness. We have changed 'ASPPs' to 'ASPP' in the new manuscript.

(2) and (3)

Fig.1 The Mean Teacher method. The figure depicts a training batch with a single labeled example. Both the student and the teacher model evaluate the input applying noise (η, η') within their computation. The softmax output of the student model is compared with the one-hot label using classification cost and with the teacher output using consistency cost. After the weights of the student model have been updated with gradient descent, the teacher model weights are updated as an exponential moving average of the student weights. Both model outputs can be used for prediction.

We very much apologize for not articulating this clearly. Fig. 1 comes from the Mean Teacher paper [1]. According to that paper, the inputs to the Mean Teacher model are labeled and unlabeled data. The principle of the model is as follows. First, the labeled and unlabeled data are fed into the Student model: the labeled data produce a classification loss L1, and the unlabeled data produce a prediction p1. Then, the unlabeled data are fed into the Teacher model, which produces a prediction p2. A distribution consistency loss L2 is computed as the difference between p1 and p2, and the total loss L of the network is the sum of L1 and L2. Based on L, the parameters θ of the Student model are updated by gradient descent, and the parameters θ' of the Teacher model are then updated from θ using the EMA algorithm. After training, the prediction performance of the model is improved, and it can make accurate predictions on unlabeled data.
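To make the training loop described above concrete, the following is a minimal PyTorch-style sketch of one Mean Teacher update, in the spirit of [1]; the student, teacher, noise level, and consistency weight are generic placeholders, and this is not the released S2MF-CN code.

```python
import torch
import torch.nn.functional as F

def mean_teacher_step(student, teacher, optimizer,
                      x_labeled, y_labeled, x_unlabeled,
                      ema_decay=0.99, consistency_weight=1.0):
    """One update: classification loss L1 on labeled data, consistency loss L2
    between student and teacher predictions on unlabeled data, then an EMA
    update of the teacher weights. Illustrative sketch only."""
    # L1: supervised classification loss on the labeled batch.
    loss_cls = F.cross_entropy(student(x_labeled), y_labeled)

    # Student and teacher see independently perturbed unlabeled inputs (eta, eta').
    p_student = F.softmax(student(x_unlabeled + 0.1 * torch.randn_like(x_unlabeled)), dim=1)
    with torch.no_grad():
        p_teacher = F.softmax(teacher(x_unlabeled + 0.1 * torch.randn_like(x_unlabeled)), dim=1)

    # L2: distribution consistency loss between the two predictions.
    loss_cons = F.mse_loss(p_student, p_teacher)

    # Total loss L = L1 + lambda * L2 updates the student parameters theta.
    loss = loss_cls + consistency_weight * loss_cons
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Teacher parameters theta' follow theta via an exponential moving average.
    with torch.no_grad():
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(ema_decay).add_(p_s, alpha=1.0 - ema_decay)

    return loss.item()
```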

Based on your suggestion, we have redrawn Fig. 1 to clearly show the inputs to the model: the labeled dataset for segmentation, the labeled dataset for classification, and the unlabeled dataset for classification are now marked explicitly. The labeled segmentation dataset is the input of SMF-SN but is not used by the classification network.

Comment3:

In Fig. 5, an explanation is needed for 'Genes Selection'.

Response3:

We sincerely thank the reviewers for the detailed comments.

We provided a detailed description of 'Genes Selection' in Section Data preprocessing. However, in the previous version of the manuscript the corresponding caption was set to 'Gene Data', which caused ambiguity. We are very sorry for this and have changed 'Gene Data' to 'Genes Selection' in the new version of the manuscript to prevent ambiguity.

Comment4:

The paper requires revisions for grammatical errors.

Response4:

We sincerely thank the reviewers for the detailed comments.

According to your suggestion, we have carefully checked our manuscript and corrected the grammatical, stylistic, and typographical errors found. Moreover, we have asked several colleagues who are experienced in writing English papers to help us thoroughly check the organization and language of the paper.

Comment5:

Equations must be used in the right way (e.g. Equations 8 and 9 should have the first two letters of Recall and Precision in italics.)

Response5:

We sincerely thank the reviewers for this insightful question.

We have checked all the Equations carefully for formatting issues and made corrections.
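For completeness, the standard definitions behind Equations 8 and 9 can be typeset in LaTeX along the following lines; this is a generic sketch of the usual formulas rather than the manuscript's exact source, and the italicization of the metric names should follow the journal's convention.

```latex
\begin{equation}
  \mathrm{Recall} = \frac{TP}{TP + FN}
\end{equation}
\begin{equation}
  \mathrm{Precision} = \frac{TP}{TP + FP}
\end{equation}
```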

Reviewer #2:

Comment1:

The motivation is not clear. Why was this work carried out? What problem does it address that the previous methods cannot?

Response1:

We sincerely thank the reviewers for asking rigorous questions.

We have supplemented a detailed description of the motivation for the research in the new manuscript (Section Introduction, P1, L3-L4). The emergence of targeted therapy has substantially increased the survival rate of NSCLC patients, and mutations of essential pathogenic genes should be identified before targeted therapy. KRAS is a gene with a high probability of mutation, so it is necessary to determine whether a patient carries a KRAS gene mutation.

We listed the limitations of the previous methodology in the fourth paragraph of Section Introduction. This study mainly addresses the three problems listed there. 1) The majority of deep learning methods that study classification tasks focus only on the classification itself; they do not use the segmentation features generated by the segmentation task to facilitate the classification task and thereby improve its performance. 2) Most of the studied fusion methods rely on simple direct concatenation, ignoring the correlation and differences between medical images and genetic data. This not only leads to ineffective mining of useful semantic features between multi-scale image features and gene features, but also fails to exploit the complementarity of multimodal information. 3) Many studies use models that overemphasize deep abstract lesion features without paying sufficient attention to the importance of detailed shallow features in the prediction results, which limits the achievable accuracy.

Comment2:

The introduction section could be improved: the similarities and differences between the related work and the proposed method are not clearly described. It is recommended to add a separate subsection with a clear description in this regard.

Response2:

We feel great thanks for your professional review work on our paper.

We have added a new subsection, following your suggestion, in the sixth paragraph of the Introduction section to further compare the similarities and differences between previous work and ours.

Comment3:

Related work: The paper only lists existing works in the research community without any analysis of existing work's limitations. Therefore, I suggest that the authors mention more summary and limitation analysis so that readers can easily appreciate the contributions made by this paper.

Response3:

Thank you very much for the professional review work you have done on our papers.

Following your third and fourth comments, we have summarized and compared past work in a tabular format. We did not tabulate the comparison in the section Multiscale features and attention learning because we consider both approaches classic and valid; the papers listed there each focus on only one aspect, whereas our contribution is to combine both methods to obtain better performance in extracting lesion information.

Comment4:

In the related works, existing studies can also be summarized in a tabular form to improve readability.

Response4:

Thank you very much for the professional review work you have done on our papers.

Following your third and fourth comments, we have summarized and compared past work in a tabular format to improve readability.

Comment5:

Elaborate all tables briefly.

Response5:

We feel great thanks for your professional review work on our paper.

Based on your suggestions, we have further elaborated all the tables for a better understanding of the readers.

Comment6:

How to deal with overfitting in your model?

Response6:

We feel great thanks for your professional review work on our paper.

We mainly used cross-validation to prevent overfitting. We evaluated 5-fold, 10-fold, and 15-fold cross-validation. Table 1 records the AUC values on the test dataset under these settings for different proportions of labeled data (a small selection sketch is given after Table 1). The results show that 10-fold cross-validation yields the highest test AUC in the semi-supervised settings (30% and 40% labeled data), so we adopted it.

Table 1. AUC values of the test dataset under different parameter settings for different scales of labeled data.

Setting | 5-fold | 10-fold | 15-fold
100% Labeled | 84.16 | 87.92 | 88.76
40% Labeled | 78.64 | 83.98 | 82.44
30% Labeled | 70.02 | 83.27 | 80.69
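As an illustration of this selection procedure, the sketch below compares 5-, 10-, and 15-fold stratified cross-validation on synthetic placeholder features; it mirrors the protocol described above but uses a simple classifier rather than the actual S2MMAM pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder data standing in for the extracted image/gene features.
X, y = make_classification(n_samples=300, n_features=32, random_state=0)

# Compare the three fold settings reported in Table 1 of this response.
for k in (5, 10, 15):
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                             cv=cv, scoring="roc_auc")
    print(f"{k:2d}-fold CV: mean AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```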

Comment7:

Results and illustrations need to be revisited.

Response7:

Thank you very much for your professional review of our paper.

We feel sorry for our lack of rigor. We have re-examined the results and illustrations. We found that some of the illustrations' descriptions did not match the illustrations' content, as in Fig 8. We have corrected the incorrect parts and confirmed that the results and illustrations are correct. We will pay more attention to the uploading requirements to ensure readers see the correct results and illustrations.

Comment8:

Background information of this work can be provided more systematically and comprehensively, i.e. logic of this paper should be further enhanced.

Response8:

We feel great thanks for your professional review work on our paper.

Based on your suggestions, we have reorganized the logic of the article and partially rewritten it to make it present the objectives of the study more clearly.

Comment9:

Hyper-parameters of the model: The initialization method is not mentioned.

Response9:

We feel great thanks for your professional review work on our paper.

The initialization of the hyper-parameters is described in Section Implementation details. The settings are as follows: all models in the experiments are trained using 10-fold cross-validation, with the number of epochs per fold set to 20; Adam is used as the optimizer; the initial learning rate is set to 0.001 empirically; and the batch size is set to 16.
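A minimal sketch of how these settings translate into a PyTorch training loop is shown below; the model and data are random placeholders, and only the stated values (Adam, learning rate 0.001, batch size 16, 20 epochs per fold) are taken from the text.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data; the real networks are SMF-SN / S2MF-CN.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 2))
dataset = TensorDataset(torch.randn(128, 1, 64, 64), torch.randint(0, 2, (128,)))

# Hyper-parameters as stated above.
loader = DataLoader(dataset, batch_size=16, shuffle=True)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

for epoch in range(20):                       # 20 epochs per cross-validation fold
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```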

Comment10:

Similarly, the network configurations can be summarized in a table e.g. input size, # of layers, learning rate, optimizers etc.

Response10:

We sincerely thank the reviewers for asking rigorous questions.

Based on your suggestion, we have redrawn Fig. 2. In Fig. 2, we have added the basic parameters of the network, such as input size. Since SMF-SN and S2MF-CN have the same structure, there is no separate structural diagram for S2MF-CN.

According to your suggestion, we have also summarized the hyper-parameter settings in the form of Table 4 to improve readability.

Comment11:

Furthermore, the study's application is not explained in an intelligible manner. You should include an experimentation section to provide readers with a thorough description of all the experimental phases in a straightforward and accessible manner.

Response11:

Thank you very much for your advice.

We have added the visualization of the results of the experimental procedure in the Section Superiority of the model. The visualization shows the focus and significance of our study in an intuitive way. We believe that readers can understand the advantages of our method in this way.

Comment12:

The theoretical and practical sections of the study are not adequately convincing, and the writing style is absolutely insufficient to highlight the subjective contribution to your research when compared to past research findings.

Response12:

We feel great thanks for your professional review work on our paper.

In the theoretical part, we have comprehensively revised the logic of the paper with the above comments to highlight the superiority of our method.

In the experimental part, we have further analyzed the experimental results according to your suggestions. Detailed descriptions are provided in Sections Ablation studies and Comparison experiment to better demonstrate the good classification performance of our model. In the Discussion section, we have reorganized the logic to highlight the advantages of the model in a more coherent form. In addition, in response to Comment 13, we have added extended experiments to the Discussion section to demonstrate the extensibility and reusability of the experiments.

Through the above methods, we hope to make the paper more convincing.

Comment13:

Another important aspect of scientific research is the capacity to repeat the experiment or study in a different setting and reuse or adapt the findings. This is an important point, and you could elaborate on it further in the discussion area to give additional scientific value to this critical study.

Response13:

We feel great thanks for your professional review work on our paper.

Following your suggestion, we have added a new section, Performance in Supervised Learning, to demonstrate the extensibility of our model. The experiments show that our model is applicable not only in semi-supervised learning but also in supervised learning, which indicates that our research can be applied in various scenarios.

Comment14:

Please include a link in the research article that allows the complete applied side of this study to be downloaded for verification, validation, and inspection, as well as so that it may be used as a scientific reference. The code source of this work must be added as a comment to the paper and must be uploaded as a GitHub link to be visible and referenceable.

Response14:

We feel great thanks for your professional review work on our paper.

We've included a link to the code in the Data availability.

The specific modifications are as follows:

The data are available from the website

https://wiki.cancerimagingarchive.net/display/Public/NSCLC+Radiogenomics. The code for S2MMAM is available on a GitHub repository at https://github.com/xyttttboom/SSMMAM.

Comment15:

In addition to these specific recommendations, the authors should also run the manuscript through a grammar checker like Grammarly to address any language or grammatical errors. Finally, the authors should ensure that all references cited in the manuscript are up-to-date and relevant to the research topic.

Response15:

We sincerely appreciate these insightful questions and apologize for our lack of rigor.

According to your suggestion, we have used Grammarly to address all language and grammatical errors. Moreover, we have asked several colleagues who are experienced in writing English papers to help us thoroughly check the organization and language of the paper.

We rechecked the references, deleting papers with little relevance to the topic and adding new papers with vital relevance. In Section Comparison experiment, we also rechecked the literature and compared it with recently published papers with better results.

e.g.

34.Cai M, Zhao L, Zhang Y, Wu W, Jia L, Zhao J, Yang Q, Qiang Y. A progressive phased attention model fused histopathology image features and gene features for lung cancer staging prediction. Int J Comput Assist Radiol Surg. 2023 Oct;18(10):1857-1865.

39.Zeng LL, Gao K, Hu D, Feng Z, Hou C, Rong P, Wang W. SS-TBN: A Semi-Supervised Tri-Branch Network for COVID-19 Screening and Lesion Segmentation. IEEE Trans Pattern Anal Mach Intell. 2023 Aug;45(8):10427-10442.

40.Chen X, Bai Y, Wang P, Luo J. Data augmentation based semi-supervised method to improve COVID-19 CT classification. Math Biosci Eng. 2023 Feb 6;20(4):6838-6852. doi: 10.3934/mbe.2023294.

41.Tu Y, Lin S, Qiao J, et al. Alzheimer's disease diagnosis via multimodal feature fusion. Computers in Biology and Medicine, 2022, 148: 105901.

Comment16:

Typos/Grammatical Errors:

Subsection Segmentation facilitates classification

Deep Convolutional Nneural Networks --> N should be removed from neural

Section Conclusion:

Mutation Status in Non-Small Cell Lung The model --> period (.) is missing

network (S2MF-CN). fusion. --> the extra period (.) should be removed.

Response16:

We sincerely appreciate these insightful questions and apologize for our lack of rigor.

Based on your comments, we have made the corrections to revise the typos and grammatical errors throughout the paper.

References: [1] Tarvainen A, Valpola H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Adv Neural Inf Process Syst. 2017;30. doi: 10.5555/3294771.3294885.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Gwak Jeonghwan Academic Editor

3 Jan 2024

Integrating Image and Gene-Data with a Semi-Supervised Attention Model for Prediction of KRAS Gene Mutation Status in Non-Small Cell Lung Cancer

PONE-D-23-16921R1

Dear Dr. Zhao,

We're pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you'll receive an e-mail detailing the required amendments. When these have been addressed, you'll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jeonghwan Gwak, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

AE: After careful consideration and based on the insightful feedback from our reviewers, I am delighted to announce that your paper is now deemed publishable.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the "Comments to the Author" section, enter your conflict of interest statement in the "Confidential to Editor" section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

***

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

***

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

***

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy (http://www.plosone.org/static/policies.action#sharing) requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

***

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

***

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: (No Response)

***

7. PLOS authors have the option to publish the peer review history of their article (https://journals.plos.org/plosone/s/editorial-and-peer-review-process#loc-peer-review-history). If published, this will include your full peer review and any attached files.

If you choose "no", your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our https://www.plos.org/privacy-policy.

Reviewer #2: Yes: Zahid Ullah

***

Acceptance letter

Gwak Jeonghwan Academic Editor

1 Mar 2024

PONE-D-23-16921R1

PLOS ONE

Dear Dr. Zhao,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jeonghwan Gwak

Academic Editor

PLOS ONE

Footnotes

The authors have declared that no competing interests exist.

References

1. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. CA Cancer J Clin. 2019 Jan;69(1):7–34. doi: 10.3322/caac.21551. PMID: 30620402.
2. Johannet P, Coudray N, Donnelly DM, Jour G, Illa-Bochaca I, Xia Y, et al. Using Machine Learning Algorithms to Predict Immunotherapy Response in Patients with Advanced Melanoma. Clin Cancer Res. 2021 Jan 1;27(1):131–140. doi: 10.1158/1078-0432.CCR-20-2415. PMID: 33208341.
3. Song Y. CT Radio Genomics of Non-Small Cell Lung Cancer Using Machine and Deep Learning. ICCECE. 2021 January;128–139.
4. Shiri I, Amini M, Nazari M, Hajianfar G, Haddadi Avval A, Abdollahi H, et al. Impact of feature harmonization on radiogenomics analysis: Prediction of EGFR and KRAS mutations from non-small cell lung cancer PET/CT images. Comput Biol Med. 2022 Mar;142:105230. doi: 10.1016/j.compbiomed.2022.105230. PMID: 35051856.
5. Ma Y, Wang J, Song K, Qiang Y, Jiao X, Zhao J. Spatial-Frequency dual-branch attention model for determining KRAS mutation status in colorectal cancer with T2-weighted MRI. Comput Methods Programs Biomed. 2021 Sep;209:106311. doi: 10.1016/j.cmpb.2021.106311. PMID: 34352652.
6. Yang W, Dong Y, Du Q, Qiang Y, Wu K, Zhao J, et al. Integrate domain knowledge in training multi-task cascade deep learning model for benign–malignant thyroid nodule classification on ultrasound images. Eng Appl Artif Intell. 2021;98:104064. doi: 10.1016/j.engappai.2020.104064.
7. Zhao Z, Zhao J, Song K, Hussain A, Du Q, Dong Y, et al. Joint DBN and Fuzzy C-Means unsupervised deep clustering for lung cancer patient stratification. Eng Appl Artif Intell. 2020;91:103571. doi: 10.1016/j.engappai.2020.103571.
8. Dong Y, Hou L, Yang W, Han J, Wang J, Qiang Y, et al. Multi-channel multi-task deep learning for predicting EGFR and KRAS mutations of non-small cell lung cancer on CT images. Quant Imaging Med Surg. 2021 Jun;11(6):2354–2375. doi: 10.21037/qims-20-600. PMID: 34079707.
9. Hou G, Jia L, Zhang Y, Wu W, Zhao L, Zhao J, et al. Deep learning approach for predicting lymph node metastasis in non-small cell lung cancer by fusing image–gene data. Eng Appl Artif Intell. 2023;122:106140. doi: 10.1016/j.engappai.2023.106140.
10. Tarvainen A, Valpola H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Adv Neural Inf Process Syst. 2017;30. doi: 10.5555/3294771.3294885.
11. Zhu F, Zhao S, Wang P, Wang H, Yan H, Liu S. Semi-supervised wide-angle portraits correction by multi-scale transformer. IEEE Conf. Comput. Vis. Pattern Recognit. 2022;19689–19698. doi: 10.48550/arXiv.2109.08024.
12. Kwon D, Kwak S. Semi-supervised semantic segmentation with error localization network. CVPR. 2022;9957–9967. doi: 10.48550/arXiv.2204.02078.
13. Laine S, Aila T. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242, 2016. doi: 10.48550/arXiv.1610.02242.
14. Wang X, Tang F, Chen H, Cheung CY, Heng PA. Deep semi-supervised multiple instance learning with self-correction for DME classification from OCT images. Med Image Anal. 2023;83:102673. doi: 10.1016/j.media.2022.102673. PMID: 36403310.
15. Liu Q, Yu L, Luo L, Dou Q, Heng PA. Semi-Supervised Medical Image Classification With Relation-Driven Self-Ensembling Model. IEEE Trans Med Imaging. 2020 Nov;39(11):3429–3440. doi: 10.1109/TMI.2020.2995518. PMID: 32746096.
16. Wang Y, Wang Y, Cai J, Lee TK, Miao C, Wang ZJ. Ssd-kd: A self-supervised diverse knowledge distillation method for lightweight skin lesion classification using dermoscopic images. Med Image Anal. 2023;84:102693. doi: 10.1016/j.media.2022.102693. PMID: 36462373.
17. Ruder S. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098, 2017. doi: 10.48550/arXiv.1706.05098.
18. Xie Y, Zhang J, Xia Y, Shen C. A Mutual Bootstrapping Model for Automated Skin Lesion Segmentation and Classification. IEEE Trans Med Imaging. 2020 Jul;39(7):2482–2493. doi: 10.1109/TMI.2020.2972964. PMID: 32070946.
19. Zhao L, Song K, Ma Y, Cai M, Qiang Y, Sun J, et al. A segmentation-based sequence residual attention model for KRAS gene mutation status prediction in colorectal cancer. Appl Intell. 2022;53:10232–10254. doi: 10.1007/s10489-022-04011-3.
20. Song P, Hou J, Xiao N, Zhao J, Zhao J, Qiang Y, et al. MSTS-Net: malignancy evolution prediction of pulmonary nodules from longitudinal CT images via multi-task spatial-temporal self-attention network. Int J Comput Assist Radiol Surg. 2023 Apr;18(4):685–693. doi: 10.1007/s11548-022-02744-7. PMID: 36447076.
21. Papandreou G, Kokkinos I, Savalle PA. Modeling local and global deformations in deep learning: Epitomic convolution, multiple instance learning, and sliding window detection. CVPR. 2015;390–399. doi: 10.1109/CVPR.2015.7298636.
22. Ye Y, Pan C, Wu Y, Wang S, Xia Y. MFI-Net: Multiscale Feature Interaction Network for Retinal Vessel Segmentation. IEEE J Biomed Health Inform. 2022 Sep;26(9):4551–4562. doi: 10.1109/JBHI.2022.3182471. PMID: 35696471.
23. Wu H, Wang W, Zhong J, Lei B, Wen Z, Qin J. Scs-net: A scale and context sensitive network for retinal vessel segmentation. Med Image Anal. 2021;70:102025. doi: 10.1016/j.media.2021.102025. PMID: 33721692.
24. Woo S, Park J, Lee JY, Kweon IS. Cbam: Convolutional block attention module. ECCV. 2018;3–19. doi: 10.1007/978-3-030-01234-2_1.
25. Li Z, Zhang C, Zhang Y, Wang X, Ma X, Zhang H, et al. CAN: Context-assisted full Attention Network for brain tissue segmentation. Med Image Anal. 2023;85:102710. doi: 10.1016/j.media.2022.102710. PMID: 36586394.
26. Ibtehaz N, Rahman MS. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 2020 Jan;121:74–87. doi: 10.1016/j.neunet.2019.08.025. PMID: 31536901.
27. Bakr S, Gevaert O, Echegaray S, Ayers K, Zhou M, Shafiq M, et al. A radiogenomic dataset of non-small cell lung cancer. Sci Data. 2018 Oct 16;5:180202. doi: 10.1038/sdata.2018.202. PMID: 30325352.
28. Cubuk ED, Zoph B, Mane D, Vasudevan V, Le QV. Autoaugment: Learning augmentation strategies from data. CVPR. 2019;113–123. doi: 10.48550/arXiv.1805.09501.
29. Jia L, Wu W, Hou G, Zhang Y, Zhao J, Qiang Y, et al. DADFN: dynamic adaptive deep fusion network based on imaging genomics for prediction recurrence of lung cancer. Phys Med Biol. 2023 Mar 23;68(7). doi: 10.1088/1361-6560/acc168. PMID: 36867882.
30. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. MICCAI. 2015;Part III:234–241. doi: 10.1007/978-3-319-24574-4_28.
31. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. CVPR. 2016;770–778. doi: 10.1109/CVPR.2016.90.
32. Xie S, Girshick R, Dollár P, Tu Z, He K. Aggregated residual transformations for deep neural networks. CVPR. 2017;1492–1500. doi: 10.1109/CVPR.2017.634.
33. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. CVPR. 2016;2818–2826. doi: 10.1109/CVPR.2016.308.
34. Cai M, Zhao L, Zhang Y, Wu W, Jia L, Zhao J, et al. A progressive phased attention model fused histopathology image features and gene features for lung cancer staging prediction. Int J Comput Assist Radiol Surg. 2023 Oct;18(10):1857–1865. doi: 10.1007/s11548-023-02844-y. PMID: 36943546.
35. Wu H, Liu J, Xiao F, Wen Z, Cheng L, Qin J. Semi-supervised segmentation of echocardiography videos via noise-resilient spatiotemporal semantic calibration and fusion. Med Image Anal. 2022;78:102397. doi: 10.1016/j.media.2022.102397. PMID: 35259635.
36. Zhao C, Chen W, Qin J, Yang P, et al. IFT-Net: Interactive Fusion Transformer Network for Quantitative Analysis of Pediatric Echocardiography. Med Image Anal. 2022;82:102648. doi: 10.1016/j.media.2022.102648. PMID: 36242933.
37. Zeng LL, Gao K, Hu D, Feng Z, Hou C, Rong P, et al. SS-TBN: A Semi-Supervised Tri-Branch Network for COVID-19 Screening and Lesion Segmentation. IEEE Trans Pattern Anal Mach Intell. 2023 Aug;45(8):10427–10442. doi: 10.1109/TPAMI.2023.3240886. PMID: 37022260.
38. Chen X, Bai Y, Wang P, Luo J. Data augmentation based semi-supervised method to improve COVID-19 CT classification. Math Biosci Eng. 2023 Feb 6;20(4):6838–6852. doi: 10.3934/mbe.2023294. PMID: 37161130.
39. Tu Y, Lin S, Qiao J, Zhuang Y, Zhang P. Alzheimer's disease diagnosis via multimodal feature fusion. Comput Biol Med. 2022;148:105901. doi: 10.1016/j.compbiomed.2022.105901. PMID: 35908497.

By Yuting Xue; Dongxu Zhang; Liye Jia; Wanting Yang; Juanjuan Zhao; Yan Qiang; Long Wang; Ying Qiao and Huajie Yue


