BACKGROUND: Developing deep learning networks to classify lung nodules as benign or malignant usually requires many samples, but medical samples are precious and difficult to obtain in large quantities. OBJECTIVE: To investigate and test a DCA-Xception network combined with a new data augmentation method to improve the performance of lung nodule classification. METHODS: First, a Wasserstein Generative Adversarial Network (WGAN) with conditions and five data augmentation methods, such as flipping, rotating, and adding Gaussian noise, are used to expand the samples, addressing class imbalance and the shortage of samples. Then, a DCA-Xception network is designed to classify lung nodules. In this network, an adaptive dual-channel feature extraction module captures information around the target, and a convolutional block attention module helps the network learn features more accurately. The network is trained and validated using 274 lung nodules (154 benign and 120 malignant) and tested using 52 lung nodules (23 benign and 29 malignant). RESULTS: The experiments show that the network achieves an accuracy of 83.46% and an AUC of 0.929. The features extracted by this network achieve an accuracy of 85.24% with the K-nearest neighbor and random forest classifiers. CONCLUSION: This study demonstrates that the DCA-Xception network yields higher performance in classifying lung nodules than classical classification networks and pre-trained networks.
Keywords: Lung nodule classification; Wasserstein Generative Adversarial Networks (WGAN); Xception; convolutional attention module; classifier
Lung cancer is more common and has a higher mortality rate than most cancers, posing a significant threat to people's health and lives [[
For the problem of classifying benign and malignant lung nodules, some researchers have chosen to extract features manually, which are usually size [[
As the research progressed, researchers found that classifying lung nodules using deep neural networks requires many samples. Because medical images are precious and difficult to obtain in large quantities, researchers have proposed the following solutions. Zhao et al. [[
Table 1 Overview of the existing classification methods
Reference | Database | Method | Performance metrics
Liu et al. [ | LIDC-IDRI | CNN+SVM | ACC = 91.94%
Chae et al. [ | Chonbuk National University Hospital | CT-lungNet | AUC = 0.85
Naik et al. [ | LUNA | FractalNet | ACC = 94.7%, TNR = 90.41%, TPR = 96.68%, AUC = 0.98
Zhao et al. [ | LIDC-IDRI | Agile CNN | ACC = 82.2%, AUC = 0.877
Zhao et al. [ | LIDC-IDRI | Transfer learning (ResNet) | AUC = 0.94, TPR = 94%, ACC = 85%
Nobrega et al. [ | LIDC-IDRI | Transfer learning (ResNet50+SVM) | ACC = 88.41%, AUC = 0.932
Xie et al. [ | LIDC-IDRI | SSAC | ACC = 92.53%, AUC = 0.958
Onishi et al. [ | Fujita Health University Hospital | GAN+DCNN | TNR = 66.7%, TPR = 93.9%
Sun et al. [ | LIDC-IDRI | DBN | ACC = 81%
Xie et al. [ | LIDC-IDRI | MV-KBC | ACC = 91.60%, AUC = 0.957
Tran et al. [ | LUNA16 | CNN+Focal loss | ACC = 97.2%, TPR = 96.0%, TNR = 97.3%
Ali et al. [ | LUNGx | Transfer learning | ACC = 90.46±0.25%
Wang et al. [ | JSRT | Transfer learning (InceptionV3) | TPR = 95.41%, TNR = 80.09%
Mastouri et al. [ | LUNA16 | BCNN | ACC = 91.99%, AUC = 0.959
In this paper, to address the problems of insufficient samples and class imbalance, CT slices are first randomly cropped. Then, a Wasserstein Generative Adversarial Network (WGAN) with conditions is used to balance the classes of the samples, and finally five data augmentation methods, such as flipping, rotating, and adding Gaussian noise, are used to extend the dataset. As Table 1 shows, many researchers have used pre-trained networks for lung nodule classification, but pre-trained networks offer poorer scalability, flexibility, and generalization than customized networks. Therefore, a DCA-Xception network is designed, which has higher flexibility and better classification performance than fine-tuned pre-trained networks. The network obtains information around the target through an adaptive dual-channel feature extraction module, and a convolutional block attention module enables it to learn effective features in a targeted manner.
Figure 1 shows the workflow of the DCA-Xception network for lung nodule classification. As seen in the figure, regions of interest are first extracted from the collected data; the data are then expanded using WGAN with conditions and data augmentation, and the patches are input into the network for lung nodule classification. There are two classification ways. The first directly uses the improved network to classify lung nodules, i.e., the fully connected layer acts as the classifier. In the second, the output of the last global average pooling layer of the network is used as the feature representation, which is then fed to six classifiers (logistic, multinomial Bayesian [[
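The second classification way can be sketched with scikit-learn, assuming the GAP-layer features are available as a numpy array. The feature matrix below is random stand-in data (the real features would be 512-dimensional GAP outputs of the trained network); the six classifier choices follow the paper's list.

```python
# Sketch: feeding extracted features into the six classifiers named in the
# paper. X here is random stand-in data, not real DCA-Xception features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((120, 64))        # stand-in features (non-negative for NB)
y = rng.integers(0, 2, 120)      # 0 = benign, 1 = malignant

classifiers = {
    "logistic": LogisticRegression(max_iter=1000),
    "multinomial_nb": MultinomialNB(),
    "knn": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm_rbf": SVC(kernel="rbf"),
    "mlp": MLPClassifier(max_iter=500, random_state=0),
}

predictions = {}
for name, clf in classifiers.items():
    clf.fit(X, y)                       # train on the extracted features
    predictions[name] = clf.predict(X)  # benign/malignant decision
```

In practice the classifiers would be fit on training-set features and evaluated on the held-out test-set features.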
Graph: Fig. 1 Workflow based on DCA-Xception for lung nodule classification.
The dataset used in this paper is the public LIDC-IDRI dataset, in which up to four physicians rated the malignancy of each lung nodule. Malignancy is rated in five categories: categories one and two indicate benign, category three indicates indeterminate, and categories four and five indicate malignant. In this paper, nodules identified jointly by three or more physicians are selected, and their malignancy is determined by the average of the physician-labeled categories. A nodule is considered benign when its mean category score is less than or equal to 2.5 and malignant when the mean is greater than or equal to 3.5; nodules with means between 2.5 and 3.5 are removed. A total of 326 nodules are selected, of which 274 (154 benign and 120 malignant) form the training and validation sets and 52 (23 benign and 29 malignant) form the test set.
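The selection rule above can be written as a small sketch (the function name and return convention are illustrative, not from the paper):

```python
# Sketch of the labeling rule: average the radiologists' malignancy
# scores (1-5) and keep only clearly benign or malignant nodules.
def label_nodule(scores):
    """Return 'benign', 'malignant', or None (excluded) from 1-5 scores."""
    if len(scores) < 3:          # require three or more readers
        return None
    mean = sum(scores) / len(scores)
    if mean <= 2.5:
        return "benign"
    if mean >= 3.5:
        return "malignant"
    return None                  # indeterminate nodules in (2.5, 3.5) removed
```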
To make full use of the data, all CT scan slices covering the nodules are used, and each slice is treated as a sample. The total number of generated slices and their distribution are shown in the Initial columns of Table 2. In addition, 64×64 patches containing the pulmonary nodules are extracted after locating the nodules from the XML annotation files and ground-truth labels. The nodules are not centered in the patches but lie at arbitrary positions. Four patches are extracted from each slice of the 274 lung nodules using this method to obtain lung nodules in different contexts. The resulting patches are divided into training and validation sets at a ratio of 9:1, as shown in the Original columns of Table 2. For the test set, patches are extracted once for malignant nodule slices, while benign nodule slices are sampled in multiple rounds until their number approaches that of the malignant patches, as shown in the Test column of Table 2.
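The off-center patch extraction can be sketched as follows; the function and the way the crop origin is sampled are an assumption about the procedure, constrained only so the annotated nodule center stays inside the 64×64 patch:

```python
# Sketch: crop a 64x64 patch containing the nodule at an arbitrary
# (not centered) position. `center` is the nodule center (row, col)
# taken from the XML annotations.
import numpy as np

def random_patch(slice_2d, center, size=64, rng=None):
    rng = rng or np.random.default_rng()
    h, w = slice_2d.shape
    cy, cx = center
    # sample a top-left corner that keeps the nodule center inside the patch
    y0 = rng.integers(max(0, cy - size + 1), min(h - size, cy) + 1)
    x0 = rng.integers(max(0, cx - size + 1), min(w - size, cx) + 1)
    return slice_2d[y0:y0 + size, x0:x0 + size]
```

Calling this four times per slice yields the nodule in four different surrounding contexts.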
Table 2 Specific settings for each stage of the data set
Class | Initial (Train / Val / Test) | Original (Train / Val) | Balance (Train / Val) | Augmentation (Train / Val / Test)
Benign | 598 / 66 / 81 | 2391 / 265 | 3626 (1235) / 402 (137) | 25382 / 2808 / 196
Malignant | 790 / 88 / 197 | 3162 / 350 | 3632 (470) / 400 (50) | 25424 / 2800 / 197
Total | 1388 / 154 / 278 | 5553 / 615 | 7258 (1705) / 802 (187) | 50806 / 5608 / 393
In CT images, a malignant nodule usually spans many more slices than a benign one, so even though the number of benign nodules is higher, the benign sample size is smaller than the malignant one. Balancing the samples requires either more benign nodules or fewer malignant ones. However, because access to medical images is protected by personal privacy and laws, it is difficult to obtain many samples, and reducing the number of malignant nodules would weaken sample diversity. Using GAN [[
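The Wasserstein objective underlying the conditional WGAN can be illustrated numerically; the toy arrays below stand in for critic scores on real and generated patches (this is a sketch of the standard WGAN losses, not the paper's training code):

```python
# Toy illustration of the WGAN objectives: the critic widens the score gap
# between real and generated samples; the generator raises the critic's
# score on generated samples.
import numpy as np

def critic_loss(real_scores, fake_scores):
    # critic minimizes  E[f(fake)] - E[f(real)]
    return np.mean(fake_scores) - np.mean(real_scores)

def generator_loss(fake_scores):
    # generator minimizes  -E[f(fake)]
    return -np.mean(fake_scores)
```

In the conditional variant, the class label (benign/malignant) is fed to both the generator and the critic so that benign patches can be synthesized on demand to balance the dataset.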
Graph: Fig. 2 Structure of WGAN with conditions.
Graph: Fig. 3 Example of patch images.
To support deeper neural network training and to avoid overfitting, this paper uses five data augmentation methods to expand the dataset again: flipping up and down, flipping left and right, rotating counterclockwise (90°, 180°, 270°), randomly changing brightness, and adding Gaussian noise. The number of patches after data augmentation is shown in the Augmentation columns of Table 2. An example of the five data augmentations is shown in Fig. 3(c).
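The five augmentation methods (with rotation applied at three angles, giving seven variants per patch) can be sketched as below; the brightness range and noise standard deviation are illustrative choices, not values from the paper:

```python
# Sketch of the five augmentations applied to a 2-D patch. Brightness and
# noise parameters are illustrative assumptions.
import numpy as np

def augment(patch, rng=None):
    rng = rng or np.random.default_rng()
    return [
        np.flipud(patch),                            # flip up-down
        np.fliplr(patch),                            # flip left-right
        np.rot90(patch, k=1),                        # rotate 90 deg CCW
        np.rot90(patch, k=2),                        # rotate 180 deg
        np.rot90(patch, k=3),                        # rotate 270 deg
        patch * rng.uniform(0.8, 1.2),               # random brightness
        patch + rng.normal(0.0, 0.01, patch.shape),  # Gaussian noise
    ]
```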
Compared with classification networks of the same depth, the Xception network uses depth-wise separable convolutions, giving it fewer parameters with better classification performance. However, it tends to misclassify lung nodules because it lacks the ability to combine information around the target and to extract effective features. Therefore, this paper improves the Xception network to enhance its lung nodule classification performance. The improved network structure is shown in Fig. 4. The main improvements are the addition of an adaptive dual-channel feature extraction module and a convolutional block attention module in the middle flow of the Xception network.
Graph: Fig. 4 DCA-Xception structure.
Only 3×3 depth-wise separable convolutions are used in the middle flow of the Xception network, which yields single-scale feature information and cannot effectively combine the information around the target. Therefore, an adaptive dual-channel feature extraction module is added to the network; its structure is shown in part A of the middle flow in Fig. 4. The module first performs dual-channel feature extraction on the input features with N×N and 3×3 kernels, and multiplies the features extracted by the N×N kernel by an adaptive coefficient α. The dual-channel features are then combined, their information is integrated, and the network parameters are reduced by a 1×1 convolution. Finally, new features are extracted with a 3×3 depth-wise separable convolution. This paper uses two such modules with N of 1 and 5, respectively, so the network can obtain features with different receptive fields and efficiently combine information around the target. A residual structure is also used to prevent vanishing and exploding gradients.
The network learns intricate features during the training process, but not all these features are valid. Therefore, the convolutional block attention module [[
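A simplified sketch of a CBAM-style attention block is given below. It is illustrative, not the paper's implementation: the shared two-layer MLP weights are passed in explicitly, and a simple average of the pooled maps stands in for CBAM's 7×7 convolution in the spatial branch.

```python
# Simplified CBAM-style block: channel attention from global average/max
# pooling, then spatial attention from channel-wise average/max maps.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """x: (H, W, C); w1: (C, C//r); w2: (C//r, C) shared MLP weights."""
    avg = x.mean(axis=(0, 1))            # global average pool, (C,)
    mx = x.max(axis=(0, 1))              # global max pool, (C,)
    mlp = lambda v: (v @ w1) @ w2        # shared two-layer MLP
    return sigmoid(mlp(avg) + mlp(mx))   # per-channel weights in (0, 1)

def spatial_attention(x):
    avg = x.mean(axis=-1, keepdims=True)  # (H, W, 1)
    mx = x.max(axis=-1, keepdims=True)    # (H, W, 1)
    # CBAM uses a 7x7 conv here; a plain mean of the two maps stands in
    return sigmoid((avg + mx) / 2.0)

def cbam(x, w1, w2):
    x = x * channel_attention(x, w1, w2)  # reweight channels
    return x * spatial_attention(x)       # reweight spatial positions
```

Because both attention maps lie in (0, 1), the block can only suppress features, steering the network toward the most informative channels and locations.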
In addition to the above improvements, to retain the advantage of the Xception network's small parameter count, the parameters of the improved middle flow are adjusted: the number of middle-flow repetitions is reduced from 8 to 4, and the number of channels is reduced from 728 to 512.
The input patch size is 64×64, and the data are normalized before input to speed up network convergence. The model is trained with the Adam optimizer, a batch size of 64, and a learning rate of 0.001 on a GPU (NVIDIA Tesla K80, 16 GB). Adaptive decay of the learning rate is used: the validation loss is monitored, and the learning rate is decayed tenfold when it has not decreased for two consecutive epochs (794 iterations per epoch). To save computational resources and prevent overfitting, an early stopping strategy is used: the network stops training when the validation loss has not decreased for five consecutive epochs.
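The decay-and-stop logic can be sketched as a small loop over per-epoch validation losses; the function name and return convention are illustrative, and the way repeated plateaus trigger repeated decays is an assumption about the schedule:

```python
# Sketch of the schedule: decay the learning rate tenfold after 2
# consecutive non-improving epochs, stop after 5 non-improving epochs.
def schedule(val_losses, lr=0.001, decay_patience=2, stop_patience=5):
    best, bad = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, bad = loss, 0          # validation loss improved
        else:
            bad += 1
            if bad % decay_patience == 0:
                lr /= 10.0               # tenfold decay
            if bad >= stop_patience:
                return epoch, lr         # early stop
    return len(val_losses) - 1, lr
```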
Cross-entropy allows the gradient corresponding to each class to be backpropagated stably and effectively alleviates the vanishing-gradient problem during backpropagation. Therefore, in this paper, cross-entropy is used as the loss function:

L = -(1/N) Σi [gi log(pi) + (1 - gi) log(1 - pi)]

In the formula, gi is the true class of sample i, and pi is the network's prediction for sample i.
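The loss can be checked numerically with a small sketch of binary cross-entropy, assuming pi is the predicted probability of the malignant class:

```python
# Numerical check of the cross-entropy loss: g holds true classes (0 or 1),
# p holds the network's predicted probabilities for class 1.
import numpy as np

def cross_entropy(g, p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(g * np.log(p) + (1 - g) * np.log(1 - p))
```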
The evaluation metrics are accuracy (ACC), true positive rate (TPR), true negative rate (TNR), F-Score, and the area under the ROC curve (AUC), defined as follows:

ACC = (TP + TN) / (TP + TN + FP + FN)

TPR = TP / (TP + FN)

TNR = TN / (TN + FP)

Precision = TP / (TP + FP)

F-Score = 2 × Precision × TPR / (Precision + TPR)

where TP, TN, FP, and FN indicate the numbers of true positives, true negatives, false positives, and false negatives, respectively. The ROC curve plots the true positive rate against the false positive rate at varying classification thresholds, and the AUC is the area under this curve.
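These metric definitions can be computed directly from confusion-matrix counts (the counts in the test are arbitrary illustrative numbers):

```python
# The standard classification metrics computed from confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)                    # sensitivity / recall
    tnr = tn / (tn + fp)                    # specificity
    precision = tp / (tp + fp)
    f_score = 2 * precision * tpr / (precision + tpr)
    return acc, tpr, tnr, precision, f_score
```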
Figure 5 shows the changes in the loss and accuracy of the training and validation sets during training of the proposed model. The figure shows that the accuracy of both the training and validation sets increases smoothly. The training loss decreases smoothly and gradually converges, while the validation loss fluctuates slightly and levels off after the 7th epoch. A lower validation loss indicates better generalization, so the weights from the 5th epoch are chosen to test the model's performance.
Graph: Fig. 5 Curves of loss and ACC.
Table 3 shows the results of the ablation experiments. As can be seen from Table 3, A, the original model on the original dataset, performs poorly in testing, with a TNR below 50%. B uses WGAN with conditions to balance the categories while slightly expanding the sample size. Since the stationary features generated by WGAN make the network easier to train, all metrics in the experimental results improve except TPR; in particular, the TNR improves by 23.47%. C uses the five data augmentation methods, which greatly increase the amount of data, so the network is less likely to overfit and can learn more features; compared with B, the AUC improves by 0.098. D builds on C by cutting the middle flow to 4 repetitions and introducing the adaptive dual-channel feature extraction module.
Table 3 Ablation experiments
Model | Data | ACC (%) | AUC | F-Score (%) | TNR (%) | TPR (%)
A. Xception | Original | 64.38 | 0.722 | 69.43 | 47.96 | 80.71
B. Xception | Balance | 72.01 (+7.63) | 0.790 (+0.068) | 72.22 (+2.79) | 71.43 (+23.47) | 72.59 (-8.12)
C. Xception | Augmentation | 78.88 (+6.87) | 0.888 (+0.098) | 80.56 (+8.34) | 70.41 (-1.02) | 87.31 (+14.72)
D. +Channel | Augmentation | 80.66 (+1.78) | 0.904 (+0.016) | 82.08 (+1.52) | 72.96 (+2.55) | 88.32 (+1.01)
E. +Attention | Augmentation | 83.46 (+2.80) | 0.923 (+0.019) | 84.92 (+2.84) | 73.98 (+1.02) | 92.89 (+4.57)
Thus, the adaptive dual-channel feature extraction module enables the model to combine surrounding information when classifying the target and to classify lung nodules more accurately; D achieves the best value of every metric compared with A, B, and C. E builds on D and introduces the convolutional block attention module, enabling the model to learn features more accurately through the spatial and channel attention modules. This improvement raises the classification performance again; in particular, the TPR improves by 4.57% to 92.89%.
In this paper, the non-pre-trained LeNet [[
Table 4 Comparison with typical classification networks
Model | ACC (%) | AUC | F-Score (%) | TNR (%) | TPR (%)
LeNet | 64.63 | 0.695 | 60.17 | 76.02 | 53.30
AlexNet | 75.06 | 0.837 | 77.00 | 75.06 | 66.84
VGG16 | 75.83 | 0.841 | 77.11 | 70.41 | 81.22
ResNet50 | 78.12 | 0.846 | 78.92 | 74.49 | 81.73
InceptionV3 | 79.39 | 0.890 | 79.70 | — | 80.71
Xception | 78.88 | 0.888 | 80.56 | 70.41 | 87.31
Proposed | 83.46 | 0.923 | 84.92 | 73.98 | 92.89
Graph: Fig. 6 Comparison of the ROC curves of classification way 1 and classical classification network.
Graph: Fig. 7 Confusion matrix.
In this paper, the pre-trained networks VGG19, ResNet101, InceptionV3, MobileNetV2 [[
The experimental results are shown in Table 5, where the best value of each metric across all models is shown in bold. Figure 8 shows the ROC curve for each pre-trained network. Combining Table 5 and Fig. 8 shows that the fine-tuned pre-trained networks achieve better TPR but lower TNR. The proposed method achieves the best ACC, AUC, and F-Score, indicating that it outperforms the above pre-trained models in classification and achieves a better balance between TPR and TNR.
Table 5 Comparison with pre-training network
Model | Classifier | ACC (%) | AUC | F-Score (%) | TNR (%) | TPR (%)
VGG19 | Logistic | 76.34 | 0.834 | 79.47 | 61.22 | 91.37
VGG19 | Multinomial NB | 78.88 | 0.879 | 81.18 | 66.84 | 90.86
VGG19 | KNN | 77.61 | 0.817 | 80.44 | 63.27 | 91.88
VGG19 | Random Forest | 76.34 | 0.859 | 79.37 | 61.73 | 90.86
VGG19 | SVM RBF | 79.13 | 0.883 | 81.61 | 65.31 | 87.31
VGG19 | MLP | 76.08 | 0.776 | 78.92 | 62.76 | 89.34
ResNet101 | Logistic | 79.64 | 0.885 | 81.40 | 70.41 | 88.83
ResNet101 | Multinomial NB | 78.37 | 0.803 | 80.46 | 67.86 | 88.83
ResNet101 | KNN | 81.68 | 0.860 | 81.91 | 80.61 | 82.74
ResNet101 | Random Forest | 80.15 | 0.855 | 81.34 | 73.98 | 86.29
ResNet101 | SVM RBF | 80.15 | 0.877 | 81.52 | 72.96 | 87.31
ResNet101 | MLP | 83.21 | 0.834 | 83.82 | 79.59 | 86.80
InceptionV3 | Logistic | 79.13 | 0.902 | 81.70 | 65.31 | 92.89
InceptionV3 | Multinomial NB | 79.13 | 0.795 | 81.86 | 64.29 | 93.91
InceptionV3 | KNN | 79.64 | 0.815 | 81.90 | 67.35 | 91.88
InceptionV3 | Random Forest | 79.64 | 0.828 | 82.14 | 65.82 | 93.40
InceptionV3 | SVM RBF | 80.15 | 0.873 | 82.74 | 65.31 | 94.92
InceptionV3 | MLP | 79.90 | 0.809 | 81.84 | 69.39 | 90.36
Inception-ResNetV2 | Logistic | 80.15 | 0.895 | 82.66 | 65.82 | 94.42
Inception-ResNetV2 | Multinomial NB | 78.88 | 0.821 | 81.68 | 63.78 | 93.91
Inception-ResNetV2 | KNN | 80.92 | 0.848 | 82.99 | 68.88 | 92.89
Inception-ResNetV2 | Random Forest | 81.42 | 0.886 | 82.28 | 71.43 | 91.37
Inception-ResNetV2 | SVM RBF | 81.17 | 0.868 | 83.37 | 67.35 | 94.92
Inception-ResNetV2 | MLP | 81.93 | 0.816 | 83.75 | 70.92 | 92.89
MobileNetV2 | Logistic | 79.13 | 0.865 | 81.11 | 68.88 | 89.34
MobileNetV2 | Multinomial NB | 79.64 | 0.798 | 81.04 | 72.45 | 86.80
MobileNetV2 | KNN | 79.90 | 0.845 | 81.50 | 71.43 | 88.32
MobileNetV2 | Random Forest | 80.66 | 0.859 | 81.99 | 73.47 | 87.82
MobileNetV2 | SVM RBF | 79.90 | 0.856 | 81.41 | 71.94 | 87.82
MobileNetV2 | MLP | 79.39 | 0.813 | 80.58 | 73.47 | 85.28
Xception | Logistic | 80.15 | 0.903 | 82.02 | 69.90 | 90.36
Xception | Multinomial NB | 80.15 | 0.803 | 82.11 | 69.39 | 90.86
Xception | KNN | 80.66 | 0.889 | 82.73 | 68.88 | 92.39
Xception | Random Forest | 83.21 | 0.861 | 85.07 | 70.92 | —
Xception | SVM RBF | 80.15 | 0.899 | 82.27 | 68.37 | 91.88
Xception | MLP | 80.66 | 0.835 | 82.88 | 67.88 | 93.40
DenseNet121 | Logistic | 80.66 | 0.869 | 81.64 | 75.51 | 85.79
DenseNet121 | Multinomial NB | 78.88 | 0.799 | 80.65 | 69.90 | 87.82
DenseNet121 | KNN | 80.66 | 0.840 | 81.09 | 78.57 | 82.74
DenseNet121 | Random Forest | 81.17 | 0.858 | 82.46 | 73.98 | 88.32
DenseNet121 | SVM RBF | 80.41 | 0.876 | 81.27 | 76.02 | 84.77
DenseNet121 | MLP | 80.92 | 0.852 | 82.52 | 71.94 | 89.85
Proposed | Logistic | 83.72 | — | 85.05 | 75.00 | 92.39
Proposed | Multinomial NB | 83.21 | 0.837 | 84.65 | 73.98 | 92.39
Proposed | KNN | 85.24 | 0.883 | — | 79.59 | 90.86
Proposed | Random Forest | 85.24 | 0.871 | 85.99 | — | 90.35
Proposed | SVM RBF | 84.22 | 0.918 | 85.24 | 77.55 | 90.86
Proposed | MLP | 84.99 | 0.912 | 85.85 | 79.08 | 90.86
Graph: Fig. 8 Comparison of the ROC curves of classification way 2 and pre-trained network.
Table 6 shows the changes in the number of parameters, training speed, and inference speed after adding the adaptive dual-channel feature extraction module and the convolutional block attention module. As the table shows, the improved model raises ACC by 4.58% and AUC by 0.035. Because the number of middle-flow repetitions is reduced, the parameter count and model complexity decrease slightly. However, training and inference become slower owing to the additional element-wise operations. For lung nodule classification, accuracy matters more than inference speed, so trading a small amount of inference speed for higher classification accuracy is acceptable. In addition, this paper explores the effect of varying N in the dual-channel feature extraction module on classification performance. Table 6 shows that when both modules use N = 1 or both use N = 5, the network obtains limited information around the target, which improves classification performance less. When N is 1,7 or 1,5, the network receives more information about the target and classifies lung nodules better. Considering all metrics, the network classifies pulmonary nodules best when N is 1,5.
Table 6 Number of parameters and speed
Model | N | ACC (%) | AUC | Total params | FLOPs | ms/step | FPS
Original | None | 78.88 | 0.888 | 20,890,736 | 20,864,641 | 97 | 18.54
+Channel | 1,5 | 80.66 | 0.904 | 17,857,176 | 17,832,841 | 102 | 14.66
+Attention | 1,5 | 83.46 | 0.923 | 20,258,852 | 20,227,401 | 114 | 13.63
DCA-Xception | 1,1 | 81.93 | 0.915 | 20,209,699 | 20,178,249 | 114 | 13.87
DCA-Xception | 5,5 | 81.93 | 0.911 | 20,308,003 | 20,276,553 | 115 | 13.49
DCA-Xception | 1,7 | 83.72 | 0.911 | 20,308,003 | 20,276,553 | 115 | 13.51
The experiments above show that the proposed method produces satisfactory results on the LIDC-IDRI dataset. Compared with classical classification networks and pre-trained networks on the same dataset, it classifies benign and malignant lung nodules better while keeping a smaller number of parameters. The proposed model also has better structural flexibility: the number and size of the convolutional kernels in the adaptive dual-channel feature extraction module can be changed to suit different datasets. Two classification ways are used in this paper; the main difference is that way 1 uses fully connected layers for classification, while way 2 uses various classifiers. Experiments show that using separate classifiers improves classification accuracy, so optimizing the classifiers could raise accuracy further. One limitation of this paper is the exclusion of easily confused nodules with mean malignancy scores between 2.5 and 3.5, so the classification performance of the proposed method on such nodules needs further investigation. In addition, since pulmonary nodule lesions appear in multiple CT slices, the method targets only the lesion regions in individual slices, ignoring the association between adjacent slices. Using the pixel association between lung nodule lesions in adjacent slices to improve classification accuracy will therefore be the next research focus.
In this paper, to address the problems of insufficient samples and unbalanced categories of lung nodules, five data augmentation methods and WGAN with conditions are used to expand the samples. An adaptive dual-channel feature extraction module and a convolutional block attention module are introduced into the middle flow of Xception to improve the classification performance of the model. The computational complexity, number of parameters, training speed, and inference speed of the DCA-Xception network are studied, and its lung nodule classification performance is compared with that of classical classification networks and pre-trained networks to verify the effectiveness of the improved network. The experimental results show that the network outperforms both the traditional classification networks and the pre-trained networks in classifying lung nodules. In summary, the proposed method performs well on the lung nodule classification task and can provide effective diagnostic decision support for physicians.
This work was supported by the National Natural Science Foundation of China (NO.51975170), Youth Innovation Fund of Heilongjiang Academy of Sciences (NO.CXJQ2020WL01), Basic Applied Technology of Heilongjiang Institutes Research Special Project (NO.ZNJZ2020WL01), Natural Science Foundation of Heilongjiang Province (NO.LH2019F024).
By Dongjie Li; Shanliang Yuan and Gang Yao