The multi-sensor, multi-modal, composite design of medical images merged into a single image contributes to identifying features that are relevant to medical diagnoses and treatments. Although current image fusion technologies, including conventional and deep learning algorithms, can produce superior fused images, they require huge volumes of images of various modalities. This may not be viable in situations where time efficiency is expected or the equipment is inadequate. This paper presents a modified end-to-end Generative Adversarial Network (GAN), termed Loss Minimized Fusion Generative Adversarial Network (LMF-GAN), a triple ConvNet deep learning architecture for the fusion of medical images with a limited sampling rate. In contrast to conventional convolutional networks, the encoding network is combined with a convolutional neural network layer and a dense block. The loss is minimized by training the GAN's discriminator with all the source images, so that it learns more parameters and generates more features in the fused image. The LMF-GAN produces fused images with clear textures through adversarial training of the generator and discriminator. The proposed fusion method achieves state-of-the-art quality in objective and subjective evaluation in comparison with current fusion methods. The model was evaluated with experiments on standard datasets.
Keywords: Medical image fusion; generative adversarial network; generator; discriminator; ADAM optimizer
The growing need for image-based applications in remote sensing, video monitoring, and medical diagnostics has made image fusion a popular trend. The exponential growth of recent imaging expertise and the availability of a wide variety of imaging methods, such as MRI, CT, and Positron Emission Tomography (PET), have illuminated the area of image fusion, in turn appealing to the medical community for its assistance. Besides, the key purpose of image fusion should be recognized with respect to other requirements: the relevant details of each input should not be discarded, artifacts or anomalies should not be introduced, and the fused image must be as robust and accurate as possible. The key obstacles in imagery research are generally the noise of images, efficient features representing each modality, and the similarity across modalities, because the composition of the data can be entirely varied and statistically unrelated. In light of this, fused images support disease analysis [...].
Research into image fusion spans more than three decades, over which multiple image fusion methods have been proposed [...].
A Generative Adversarial Network is a triple CNN model in which one CNN acts as a generator that tries to generate images similar to the actual images from a latent space, while another acts as a discriminator that tries to tell the generated images apart from the real ones [...].
In FusionGAN, the generator generates fused images from the concatenated input images instead of from a latent space [...].
As training continues, the generator extracts more features or parameters from the concatenated image and fuses them into the output image, up to the point where the discriminator fails to discriminate between actual and fused images. This output fused image will carry more parameters of the one image with which the discriminator was trained; the other image, not used to train the discriminator, contributes less to the fused image. Therefore, a high loss is associated with the output fused image.
Considering the above-mentioned facts, this research investigates the best-suited LMF-GAN network for multi-modal medical image fusion. The research aims to achieve the following goals:
– Compare, analyze, and identify the most suitable DL network among the existing pre-trained VGG-11, VGG-13, VGG-16, VGG-19, AlexNet, SqueezeNet, and LMF-GAN for medical image fusion on our datasets.
– Develop a minimum-loss, triple-ConvNet framework, termed LMF-GAN, for medical images to obtain an improved-quality fused image in a single frame.
– Identify the robustness and effectiveness of our LMF-GAN network by evaluating the model quantitatively (13 performance measures) on five different pairs of publicly available medical images.
The rest of this article is structured as follows. Section 2 briefly describes the working of the GAN network. The proposed methodology is addressed in Section 3. Sections 4 and 5 present details about the dataset and the quantitative evaluation measures. Section 6 presents experimental findings and analysis of the results. Section 7 concludes the paper.
GAN targets to learn a probability distribution $p_G$ that approximates the distribution of the real data $p_{data}$:

$$G^{*} = \arg\min_{G} \mathrm{Div}(p_G, p_{data}) \tag{1}$$

where the divergence between the two distributions is indicated by $\mathrm{Div}(\cdot)$. A discriminator $D$ can be applied to calculate the divergence and formulate the objective function as:

$$D^{*} = \arg\max_{D} V(G, D) \tag{2}$$

where,

$$V(G, D) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{x \sim p_G}[\log(1 - D(x))] \tag{3}$$

Therefore, Equation (1) can be transformed into:

$$G^{*} = \arg\min_{G}\max_{D} V(G, D) \tag{4}$$
The adversarial process constitutes a two-player min-max game. Consequently, the samples generated become extremely difficult to distinguish from the actual data.
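Equation (3) is, up to sign, a pair of binary cross-entropy objectives, which is how the min-max game is typically implemented. The following is a minimal sketch of how the two losses map onto Eq. (3); the paper does not name its framework, so TensorFlow/Keras is assumed here.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def discriminator_loss(real_output, fake_output):
    # D maximizes V(G, D): push D(x) -> 1 on real data and
    # D(G(z)) -> 0 on generated samples.
    real_loss = bce(tf.ones_like(real_output), real_output)
    fake_loss = bce(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

def generator_loss(fake_output):
    # G minimizes V(G, D); the common non-saturating variant instead
    # maximizes log D(G(z)), i.e. labels generated samples as real.
    return bce(tf.ones_like(fake_output), fake_output)
```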
The proposed LMF-GAN ConvNet presents a loss-minimizing fusion framework built on GAN networks to attain a better fused resultant image, as illustrated in Fig. 1. The proposed LMF-GAN ConvNet encompasses three stages: (i) preprocessing of the source images, (ii) the training phase, and (iii) the testing phase. The outline of the entire LMF-GAN ConvNet is as follows.
Fig. 1 Proposed block diagram.
The preprocessing stage encompasses data analysis, augmentation, concatenation, and rescaling. The acquired source images are converted to RGB images before augmentation. The augmentation process improves the quality and the degree of variance in the training data, first to obtain stronger generalized models that are invariant to certain forms of image transformation, and second by varying the image quality.
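The paper does not enumerate the specific augmentation transforms; a minimal sketch, assuming typical mild geometric and intensity perturbations in Keras, could look like this.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical augmentation settings (the exact transforms are not
# specified in the paper); small perturbations keep anatomy plausible.
augmenter = ImageDataGenerator(
    rotation_range=10,             # slight rotations
    width_shift_range=0.05,        # small translations
    height_shift_range=0.05,
    horizontal_flip=True,
    brightness_range=(0.9, 1.1),   # "varying the image quality"
)
```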
The augmented images are concatenated, and pixel values are normalized from [0, 255] to [-1, 1] for training purposes. The order of the channels must be chosen to avoid color interference between the last channel of the first source image and the first channel of the second source image, so the layer arrangement of the source images is important in the concatenation. The first layer consists of the R (red) channel of MRI (which provides structural information) followed by the R channel of PET (which provides functional information); the second layer holds the G (green) channels of MRI and PET, and the last layer the B (blue) channels of MRI and PET. Note that in every channel pair, the image that provides functional information is placed behind the image that provides structural information, since the latter contributes more of the structural part, as shown in Fig. 2. The concatenated images are passed to the generator.
Fig. 2 (i) Channel-wise concatenation; (ii) fused image generated due to improper concatenation.
In general, depending on the number of channels in the input and output layers, the generator produces an image from the parameters extracted from the input. The proposed method has 6 input channels, and the modeled output image has 3 channels; while training, the generator fits the extracted parameters into these three channels. To avoid disturbing the color space, the stacking order is specified as RR1GG1BB1 and the data is compressed during training, where R, G, B are the RGB channels of the first source image and R1, G1, B1 are the RGB channels of the second source image.
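A minimal NumPy sketch of the RR1GG1BB1 stacking and the [-1, 1] rescaling described above (the 127.5 scaling constant is an assumption consistent with 8-bit inputs):

```python
import numpy as np

def concat_and_scale(mri_rgb, pet_rgb):
    """Stack two RGB sources channel-wise in RR1GG1BB1 order and
    rescale pixel values from [0, 255] to [-1, 1].

    mri_rgb, pet_rgb: uint8 arrays of shape (H, W, 3); the structural
    source (MRI) precedes the functional one (PET) in each channel pair.
    """
    stacked = np.stack(
        [mri_rgb[..., 0], pet_rgb[..., 0],   # R,  R1
         mri_rgb[..., 1], pet_rgb[..., 1],   # G,  G1
         mri_rgb[..., 2], pet_rgb[..., 2]],  # B,  B1
        axis=-1,
    )
    return stacked.astype(np.float32) / 127.5 - 1.0
```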
The LMF-GAN is trained with the datasets shown in Table 2.
For every epoch, the discriminator learns to discriminate between true and fake images. The true images are the MRI and PET sources, labeled '1'; the fake images are generated by the generator from the concatenated images and labeled '0'. The combined GAN model, made up of the generator followed by the trained discriminator, is then trained: the generator produces a fused image that is passed to the pretrained discriminator, which classifies it as either real or fake. The loss is calculated and fed back, from which the generator learns more and more parameters from the MRI and PET images, since the discriminator has learned the source images.
As the training continues, more parameters are infused into the fused image, until at some point the discriminator fails to discriminate between real and fake images. The training has to be stopped at this point; otherwise the fused image becomes noisy due to overfitting, which spoils the aesthetic view of the image. The trained generator model is saved and is used to generate the fused resultant image.
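The alternating update described above can be sketched as follows; `generator`, `discriminator`, and `gan` (the generator followed by a frozen discriminator) are assumed to be compiled Keras models, and the batch variables are placeholders.

```python
import numpy as np

def train_step(generator, discriminator, gan, real_batch, concat_batch):
    """One adversarial update: real source images are labeled 1,
    generated fused images 0, then the combined model is trained."""
    fused = generator.predict(concat_batch, verbose=0)

    # 1) Discriminator: learn to separate real sources from fused output.
    d_loss_real = discriminator.train_on_batch(
        real_batch, np.ones((len(real_batch), 1)))
    d_loss_fake = discriminator.train_on_batch(
        fused, np.zeros((len(fused), 1)))

    # 2) Combined model (discriminator weights frozen at compile time):
    # the generator tries to make its fused images be classified as real.
    g_loss = gan.train_on_batch(concat_batch,
                                np.ones((len(concat_batch), 1)))
    return d_loss_real, d_loss_fake, g_loss
```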
The discriminator's CNN architecture is illustrated in Fig. 3: the model has 4 convolution layers and a single densely connected output node, which takes its input from a flattening layer placed before it. The process starts with training the discriminator on MRI and PET images labeled '1' (true) and generated images labeled '0' (fake). The discriminator samples the image and extracts parameters to discriminate between true and fake images. All the CNN layers have a filter size of 3 × 3, and the numbers of filters in the four layers are 27, 81, 243, and 347 respectively, found by trial and error. All the layers have strong L2 regularisation to prevent overfitting. Batch normalization, together with valid padding, is used at the end of every layer to stabilize the data distribution. Leaky ReLU is utilized in the convolution layers for better performance.
Fig. 3 Discriminator architecture.
A dropout of 0.45 is used after flattening, which is then fed into the single dense node. The output node uses a sigmoid activation, and a binary cross-entropy loss is calculated and backpropagated. The discriminator model updates its weights with the help of a properly tuned ADAM optimizer; the parameters used in the ADAM optimizer are tabulated in Table 1.
Table 1 ADAM optimizer parameters
Parameter        Value
Learning_rate    0.0002
Beta_1           0.5
Beta_2           0.89
Epsilon          1e-07
Dropout regularisation is used with 40% of nodes dropped. In all the layers, leaky ReLU with alpha = 0.2 is used, and in the output layer sigmoid is used as the activation function. A binary cross-entropy loss function is used to train the discriminator. Valid padding is used in the discriminator, with a stride size of 2 × 2. An L2 regularizer is used in alternate layers with lambda = 0.001.
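Putting the stated discriminator hyper-parameters together, a sketch could look like the following; the input resolution is an assumption, and "alternate layers" for the L2 penalty is interpreted as every other layer.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization, LeakyReLU,
                                     Flatten, Dropout, Dense)
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2

def build_discriminator(input_shape=(256, 256, 3)):  # resolution assumed
    model = Sequential()
    for i, n_filters in enumerate([27, 81, 243, 347]):
        kwargs = {'input_shape': input_shape} if i == 0 else {}
        reg = l2(0.001) if i % 2 == 0 else None  # L2 in alternate layers
        model.add(Conv2D(n_filters, (3, 3), strides=(2, 2),
                         padding='valid', kernel_regularizer=reg, **kwargs))
        model.add(BatchNormalization())
        model.add(LeakyReLU(alpha=0.2))
    model.add(Flatten())
    model.add(Dropout(0.45))  # dropout after flattening, per the text
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer=Adam(learning_rate=2e-4, beta_1=0.5,
                                 beta_2=0.89, epsilon=1e-7),  # Table 1
                  loss='binary_crossentropy')
    return model
```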
The generator in LMF-GAN generates a 3-channel fused output image from a 6-channel concatenated input via five convolution layers with 379, 243, 81, 27, and 3 filters, using 5 × 5 filters in the first two layers and 3 × 3 in the other three. Same padding is maintained to ensure there is no change in the image's height and width. Batch normalisation and L2 regularisation are used to meet the proper CNN build. The generator architecture is shown in Fig. 4. The last convolution layer is activated with TanH, so the fused image is generated in the range [-1, 1], which can then be scaled to [0, 255]. Activating the output with sigmoid would cause vanishing gradients compared to TanH; therefore the output layer, which produces an image in a range matching the input range, is activated with TanH. The tuned ADAM optimizer, a combination of RMSProp and SGD with momentum, is used along with binary cross-entropy loss. The parameters for the ADAM optimizer are tabulated in Table 1.
Fig. 4 Generator architecture.
In all the layers, leaky ReLU with alpha = 0.09 is used, and in the output layer TanH is used as the activation function. Same padding is used in the generator, with a stride size of 1 × 1.
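A corresponding sketch of the generator from the stated hyper-parameters; the spatial resolution and the L2 weight (reused from the discriminator) are assumptions.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization,
                                     LeakyReLU, Activation)
from tensorflow.keras.regularizers import l2

def build_generator(input_shape=(256, 256, 6)):  # 6-channel concatenation
    model = Sequential()
    specs = [(379, 5), (243, 5), (81, 3), (27, 3), (3, 3)]  # (filters, kernel)
    for i, (n_filters, k) in enumerate(specs):
        kwargs = {'input_shape': input_shape} if i == 0 else {}
        model.add(Conv2D(n_filters, (k, k), strides=(1, 1), padding='same',
                         kernel_regularizer=l2(0.001), **kwargs))
        if i < len(specs) - 1:
            model.add(BatchNormalization())
            model.add(LeakyReLU(alpha=0.09))
        else:
            model.add(Activation('tanh'))  # fused output in [-1, 1]
    return model
```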
As discussed above, the discriminator tries to classify between real images and fake images produced by the generator. As the generator learns parameters, it creates images whose features are similar to the input image features, so the discriminator fails to discriminate properly. At this point, the training should be stopped; otherwise the generator will start creating fused images with noise as a result of overfitting.
The ADAM optimizer is used for training, with the parameters tabulated in Table 1. The binary cross-entropy loss function is utilized for training, with a batch size of 16 images per batch, and the GAN is trained for 100 epochs.
As training continues, the generator extracts more features or parameters from the concatenated image and fuses them into the output image, up to the point where the discriminator fails to discriminate between actual and fused images. The proposed method acquires more features or parameters from all the source images while training: the contribution of every source image is utilized when training the discriminator, so more features reach the fused image. Therefore, minimum loss is associated with the output fused image, with maximum parameters.
The proposed LMF-GAN triple-ConvNet network was tested with five pairs of datasets. The particulars of the datasets are illustrated in Table 2.
Table 2 Source dataset details
Dataset  Modality         Evaluation slice no.  Total pairs of images (normal & abnormal)  Organ
Set 1    MR-Gad/PET       12                    18                                          Brain
Set 2    MR-Gad/SPECT-T1  17                    44                                          Brain
Set 3    MR-T1/PET        63                    107                                         Brain
Set 4    MR-T2/PET        13                    19                                          Brain
Set 5    MR-T2/SPECT      33                    78                                          Brain
In general, the loss is calculated by a loss function. In the proposed GAN, the loss is calculated by a pre-trained, powerful ConvNet (the discriminator); this well-trained discriminator reduces the loss effectively.
The proposed model, LMF-GAN, was evaluated with different modality images collected from the website
Fig. 5 Source images utilized for the proposed system.
Objective evaluation metrics are important in estimating to what extent important features from the source images are transferred to the fused image. The performance evaluation of the proposed system is based on quantitative/objective evaluation measures. The objective evaluation utilizes 12 non-reference performance metrics: entropy (E), joint entropy (JE), cross correlation (CC), mutual information (MI), structural similarity index (SSIM), edge strength, Qhnc, image spatial quality evaluation (ISQE), spatial frequency (SF), fusion symmetry (FS), fusion factor (FF), and fusion quality index (FQI), as summarized in Table 3.
Table 3 Performance Metrics
Sl. No  Quality Metric                             Preferred Value
1       Entropy (E)                                High
2       Joint Entropy (JE)                         Low
3       Cross Correlation (CC)                     Near to 1
4       Mutual Information (MI)                    High
5       Structural Similarity Index (SSIM)         Range (0 to 1); nearer to 1
6       Edge Strength                              Range (0 to 1); nearer to 1
7       Qhnc                                       Range (-1 to 1); nearer to 1
8       Image Spatial Quality Evaluation (ISQE)    Low
9       Spatial Frequency (SF)                     High
10      Fusion Symmetry (FS)                       High
11      Fusion Factor (FF)                         High
12      Fusion Quality Index (FQI)                 Range (0 to 1); nearer to 1
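To illustrate how such measures are computed, the sketch below implements entropy, SSIM, and spatial frequency from their standard definitions (scikit-image is assumed; the formula column of the original Table 3 is not reproduced here).

```python
import numpy as np
from skimage.measure import shannon_entropy
from skimage.metrics import structural_similarity

def entropy_and_ssim(fused, source):
    """fused, source: 2-D grayscale arrays of equal shape."""
    e_f = shannon_entropy(fused)                  # higher = more information
    ssim = structural_similarity(
        source, fused,
        data_range=fused.max() - fused.min())     # nearer to 1 preferred
    return e_f, ssim

def spatial_frequency(img):
    """SF = sqrt(RF^2 + CF^2) over horizontal/vertical first differences."""
    img = np.asarray(img, dtype=np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))  # column frequency
    return np.sqrt(rf ** 2 + cf ** 2)                 # higher preferred
```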
The main objective of this experiment is to validate and compare our proposed LMF-GAN system with existing networks using visualization and objective criteria. The proposed work is compared with 6 other DL networks to prove the system's effectiveness. The DL networks used for comparison with our proposed system, LMF-GAN (Algorithm 7, A7), are VGG-11, VGG-13, VGG-16, VGG-19, AlexNet, and SqueezeNet [...].
The details about the dataset are provided in Section 4. The proposed network, LMF-GAN, was trained with 50 pairs of images and tested with 5 pairs. All the fusion algorithms were implemented on a Google Colab GPU.
The determination of the optimum stopping point is critical to the result. The model was trained for 40 epochs to avoid overfitting, as shown in Fig. 8; it can be spotted that the optimum point is reached at epoch 40, based on the performance measures. At epoch 40, the fused image is obtained with maximum parameters and less noise. Training below 40 epochs leads to underfitting, while training beyond 40 epochs causes overfitting and creates a noisier final fused image, as illustrated in Fig. 6. The results obtained for set 2 and set 5 were initially green-tinted and finally rectified, as shown in Fig. 7.
Fig. 6 Images at different epochs of the proposed LMF-GAN network for set 5 PET/MR.
Fig. 7 Initial and final results of the proposed LMF-GAN network for set 2 and set 5.
Fig. 8 Training results at different epochs for the proposed LMF-GAN network.
Fusion images obtained by the six DL networks and the proposed LMF-GAN network are shown in Fig. 9. It can be observed that the fused image produced by the proposed system preserves more detailed data in the yellow window, with less noise and better visualization of the tissue structure.
Fig. 9 Fusion results of existing DL networks and the proposed system for all sets of data.
The objective evaluation is carried out with the 12 non-reference performance measures shown in Table 3, and the results for the five datasets are illustrated in Table 4, with the better results indicated in bold. The JE and MI values are nearly identical across algorithms for each pair of sets. The objective evaluation analyses of each pair of data are as follows:
– It can be noted that the entropy of the fused image, Ef [22], along with SD, FF, SF, FQI, ISQE, and Qhnc, is better for the proposed LMF-GAN network than for the other algorithms for set 1. A higher entropy value indicates higher information content in an image, and a higher FF indicates that reasonably strong detail from the source images is included. CC compares the similarity between the images, with values nearer to 1 preferred. A higher SF indicates good feature information and detail. FQI gives the information transfer rate from the inputs to the fused image. A lower ISQE indicates the highest level of perception, and Qhnc measures the transfer of information from the input images to the fused image.
– In set 2, better results are obtained for the proposed LMF-GAN in Ef, CC, SSIM, FF, FS, and FQI than for the other algorithms. SSIM captures the structural information of the fused image, and FS the dissimilarity of the fused resultant image from the source images.
– Set 3 provides better results for LMF-GAN in Ef, CC, SSIM, FF, SF, and Qhnc than the other algorithms.
– Set 4 provides preferred values for LMF-GAN in Ef, CC, SSIM, FF, SF, and Qhnc compared to the existing algorithms.
– Set 5 shows better values in Ef, CC, FF, FS, SF, ISQE, and Qhnc for the LMF-GAN system compared to the other existing algorithms.
Table 4 Comparison of performance measures of existing DL networks with the proposed LMF-GAN (A7) for five sets of data
Columns (in order, where recoverable): Ef, CC, JE, MI, SSIM, FF, FS, SF, FQI, ISQE, Qhnc. Several cell values, largely the bold best results of the original table, are not recoverable here; each row lists its surviving values in column order.

MR-Gad/PET
A1: 4.8893 0.5831 0.5145 1.7253 0.1119 8.8553 0.7794 0.4198 -0.1317
A2: 4.8788 0.5924 0.5148 1.7600 0.1140 8.6548 0.7798 0.4213 -0.1319
A3: 5.0182 0.5486 0.5181 1.7590 0.1006 7.5927 0.7691 0.4503 -0.1297
A4: 5.0780 0.5638 0.5204 1.8016 6.9470 0.7644 0.4625 -0.1291
A5: 4.7205 0.6127 0.5204 1.7135 0.1123 7.2401 0.7856 0.4246 -0.1343
A6: 4.9814 0.6961 1.7751 6.4800 0.7686 0.4555 -0.1304
A7: 0.5120 0.1012

MR-Gad/SPECT-T1
A1: 4.6514 0.3401 0.4890 0.9504 0.3460 0.4734 -0.1616
A2: 4.6033 0.4002 0.4906 0.9476 0.3428 15.7540 0.4734 0.3985 -0.1629
A3: 4.8054 0.4134 0.4891 0.9424 0.3458 13.3489 0.4756 0.4160 -0.1576
A4: 4.8133 0.4268 0.4915 0.9343 0.3462 11.9222 0.4744 0.4183
A5: 4.5668 0.4469 0.4923 0.9627 0.3348 12.9068 0.4748 0.4028 -0.1641
A6: 4.7105 0.4409 0.4974 0.9344 0.3359 11.1058 0.4740 0.4194 -0.1601
A7: 14.5403 0.4061 -0.1603

MR-T1/PET
A1: 4.0346 0.6826 0.6312 1.4174 0.0394 10.4373 0.2894 -0.1723
A2: 4.0041 0.6725 0.6329 1.4188 0.0408 10.3293 0.2902 0.3323 -0.1731
A3: 4.3225 0.6392 0.6197 1.4362 0.0330 8.9676 0.2829 0.3856 -0.1645
A4: 4.3232 0.6938 0.6265 1.5119 8.2315 0.4023 -0.1655
A5: 3.9895 0.6698 0.6318 1.3565 0.0369 8.7309 0.2770 0.3515 -0.1728
A6: 4.2366 0.7022 0.6271 1.4130 0.0300 7.7037 0.2959 0.3812 -0.1668
A7: 0.0351 0.2903 0.3751

MR-T2/PET
A1: 4.8337 0.6943 0.5259 1.7709 0.1072 8.1032 -0.1346
A2: 4.8264 0.7052 0.5265 1.7479 0.1063 7.9461 0.7921 0.4188 -0.1347
A3: 4.9649 0.7149 0.5272 1.8028 0.1004 6.9327 0.7835 0.4454 -0.1328
A4: 5.0113 0.7275 0.5307 1.8400 0.1028 6.1975 0.7771 0.4566 -0.1323
A5: 4.7317 0.7328 0.5304 1.7212 0.0992 6.9396 0.7952 0.4197 -0.1366
A6: 4.9500 0.7437 0.5299 1.7850 6.1333 0.7822 0.4498 -0.1333
A7: 0.1125 0.7752 0.4407

MR-T2/SPECT
A1: 3.8933 0.5026 0.6172 1.2291 0.1371 9.8538 0.8103 0.4731 -0.1723
A2: 3.8670 0.5502 0.6184 1.2286 0.1365 9.6206 0.4735 -0.1731
A3: 4.0597 0.5219 0.6203 1.2715 0.1314 8.4368 0.8053 0.4801 -0.1683
A4: 4.0759 0.5438 0.6242 1.2950 0.1267 7.4978 0.8031 0.4826 -0.1684
A5: 3.7031 0.5475 0.6238 1.2143 0.1337 7.7944 0.8103 0.4755 -0.1780
A6: 4.0226 0.5491 1.2752 0.1226 7.1131 0.8017 0.4817 -0.1698
A7: 0.6113 0.8013
It can be perceived that the entropy of the fused resultant image is higher than that of the input images for all pairs and algorithms; the fused image provides better detail than each individual source image.
Based on all objective and subjective evaluations, it can be concluded that the LMF-GAN system is robust and provides more information than the existing DL networks for all the sets of images.
Information loss in the fused image is a very serious matter for diagnosis and treatment in the medical field. This paper proposes a new loss-minimizing GAN ConvNet framework for medical image fusion, termed LMF-GAN. The advantage of the proposed network is its need for only limited training data, with an optimum point to terminate the training as low as 40 epochs. The proposed LMF-GAN is an end-to-end model, which avoids manually designing complicated activity-level measurements and fusion rules as in traditional fusion strategies. Experiments are demonstrated on public datasets with 12 non-reference performance measures. The quantitative comparisons with six state-of-the-art methods reveal that our proposed LMF-GAN produces better results.
This work can be further extended to other sets of multi-modal medical image fusion, multi-exposure image fusion, and multi-focus image fusion. It can also be extended by building a more generalized model trained with more datasets.
By Rekha R. Nair; Tripty Singh; Rashmi Sankar; Klement Gunndu; Sabu M. Thampi; El-Sayed M. El-Alfy and Ljiljana Trajkovic