Multi-modal medical image fusion using LMF-GAN - A maximum parameter infusion technique

Gunndu, Klement ; Singh, Tripty ; et al.

In: Journal of Intelligent & Fuzzy Systems, Jg. 41 (2021-11-17), S. 5375-5386

Online unknown

Zugriff:

Volltext (PDF)

The multi-sensor, multi-modal, composite design of medical images merged into a single image, contributes to identifying features that are relevant to medical diagnoses and treatments. Although, current image fusion technologies, including conventional and deep learning algorithms, can produce superior fused images, however, they will require huge volumes of images of various modalities. This solution may not be viable for some situations, where time efficiency is expected or the equipment is inadequate. This paper addressed a modified end-to-end Generative Adversarial Network(GAN), termed Loss Minimized Fusion Generative Adversarial Network (LMF-GAN), a triple ConvNet deep learning architecture for the fusion of medical images with a limited sampling rate. The encoding network is combined with a convolutional neural network layer and a dense block called GAN, in contrast to conventional convolutional networks. The loss is minimized by training GAN’s discriminator with all the source images by learning more parameters to generate more features in the fused image. The LMF-GAN can produce fused images with clear textures through adversarial training of the generator and discriminator. The proposed fusion method has the ability to achieve state-of-the-art quality in objective and subjective evaluation, in comparison with current fusion methods. The model has experimented with standard data sets.

Multi-modal medical image fusion using LMF-GAN - A maximum parameter infusion technique

The multi-sensor, multi-modal, composite design of medical images merged into a single image, contributes to identifying features that are relevant to medical diagnoses and treatments. Although, current image fusion technologies, including conventional and deep learning algorithms, can produce superior fused images, however, they will require huge volumes of images of various modalities. This solution may not be viable for some situations, where time efficiency is expected or the equipment is inadequate. This paper addressed a modified end-to-end Generative Adversarial Network(GAN), termed Loss Minimized Fusion Generative Adversarial Network (LMF-GAN), a triple ConvNet deep learning architecture for the fusion of medical images with a limited sampling rate. The encoding network is combined with a convolutional neural network layer and a dense block called GAN, in contrast to conventional convolutional networks. The loss is minimized by training GAN's discriminator with all the source images by learning more parameters to generate more features in the fused image. The LMF-GAN can produce fused images with clear textures through adversarial training of the generator and discriminator. The proposed fusion method has the ability to achieve state-of-the-art quality in objective and subjective evaluation, in comparison with current fusion methods. The model has experimented with standard data sets.

Keywords: Medical image fusion; generative adversarial network; generator; discriminator; ADAM optimizer

1 Introduction

The growing need for different applications, based on images in remote control, video monitoring, and medical diagnostics has led to image fusion becoming a popular trend. The exponential growth of recent imaging expertise and the availability of a wide variety of various imaging methods, such as MRI, CT, and Positron Emission Tomography(PET) have illuminated the area of fusion of image, intern appealing to the community of medicine for its assistance. Besides, the key purpose of the image fusion should be recognized concerning other requirements: the relevant details should not be discarded in each input, objects or anomalies should be discarded and the image fused as far as possible must be robust and accurate. The key obstacles for imagery research are generally: the noise of images, efficient features representing each model and, the similarity in different ways, because the composition of data can be entirely various and unrelated statistically. In the light of this, fused image disease analysis [[20]] and prediction [[23]] had to be discussed in a real application to help clinicians determine the human perception of medical images and is constrained by its subjective existence. Multimodal medical image fusion aims to integrate extra details from various recorded sources to acquire more reliable information [[8]]. In this context, image fusion is split into three levels: level of the pixel [[12]], level of the feature [[7]], and level of decision [[21]]. Fusion at the pixel level is a low image fusion level that deals with pixels from the sensor output. Features take out from input images are used to conduct the Fusion process at the feature level. Image fusion on the decision level covers the descriptors of images. The most commonly used techniques include Principal Component Analysis (PCA), intensity-hue-saturation (IHS), and multi-resolution analysis among various image fusion techniques.

Research into image fusion, using multiple image fusion methods were proposed over more than three decades [[19]]. The image fusion schemes comprise various methods on the basis of multi-scale decomposition [[13]], transform domain techniques [[11]]; hybrid frequency domain methods [[2]], sparse based methods [[14]], deep convolutional neural network-based methods [[17]], and few more methods [[10]]. Apart from deep learning, most traditional fusion framework systems normally comprise three major factors: the transformation of the image, measurement of the activity level, and design of the fusion rule [[18]]. Manually designed steps are required for the above. Such fusion systems are extremely complex as regards to better fusion results. It is extremely challenging to build an ideal solution manually by considering the difficulty of implementation and computational complexity [[16]]. Deep learning methods are presently used for the substitution of a portion of traditional methods and are not directly applied for end-to-end mapping. CNN was learned to replace the calculation of activity level in conventional methods [[15]]. Moreover, conventional methods for deep learning requires multiple input images.

Generative Adversarial Network is a triple CNN model where one CNN will act as a generator, that tries to generate images, similar to the actual images from the latent space [[26]]. And the other CNN model acts as a discriminator which is trained in such a way to discriminate between actual images and generated images. The third CNN model is a combination of the generator CNN model followed by the trained discriminator model. As the GAN training continues, the discriminator continues to compare between the generator produced fused image and the source image till such a stage where the similarity between the actual image and generated image is achieved. The training is stopped at the point when the discriminator fails to discriminate between the actual or real image and generated image.

In Fusion GAN, the generator generates fused images from the concatenated input images instead of from a latent space [[17]]. The generator generates an n-channel fused output image from a 2n-channel input, which is a concatenated image of two n-channel images. The discriminator on the other side will be trained with one of the input images that contributes to the maximum parameters or features as a true image and generated images from the generator as a fake image.

As training continues the generator extracts more features or parameters from the concatenated image and fuse it in the output image. The training continues to a point where the discriminator fails to discriminate between actual and fused images. This output fused image will have more parameters of one image with which the discriminator gets trained. The contribution of another image that is not used to train the discriminator will get less part in the fused image. Therefore high loss will be associated with the output fused image.

Considering the above-mentioned facts, in this research, investigate the best suited LMF-GAN network for multi-modal medical image fusion. The research aims to achieve the following goals:

– Compare, analyze, and identify the most suitable DL network among the existing pre-trained VGG-11, VGG-13, VGG-16, VGG-19, Alexnet, Squeezenet, and LMF-GAN for our datasets for medical image fusion.

– Develop a minimum loss, triple ConvNet framework, termed LMF-GAN, for medical images to obtain an improved quality fused image in a single frame.

– To identify the robustness and effectiveness of our LMF-GAN network, the modal was evaluated quantitatively(13 performance measures) with five different pairs of publically available medical images.

The rest of this article is accordingly structured. Working on the GAN Network briefs in section 2. The proposed methodology is addressed in section 3. Section 4 and 5 presents details about the dataset and the quantitative evaluation measures. Section 6 presents experimental findings and analysis of the result. In section 7, the conclusion is given.

2 Generative adversarial networks

GAN targets to learn a distribution of probabilities, pG (x), as an estimate pdata (x) from the real distribution. The sample, x = G (z). where z is the noise variable. It solves the problem by training a generator G and a discriminator D simultaneously [[4]] to form an adversarial process. G generates samples from latent space by sampling noise. D determines whether the sample is from Pdata (x) or PG (x). Through the constant adverse process, samples generated by G gradually approximate true or real samples. The formula G for optimization can be defined as:

(1) $G^{*} = \arg \min_{G} Div (P_{G} (x), P_{data} (x))$

where the divergence between the two distributions are indicated by Div (·). D can be applied to calculate the divergence and formulate the objective function as:

(2) $D^{*} = \arg \max_{D} V (G, D)$

where,

(3) $\begin{matrix} V (G, D) = \\ E_{x \sim p_{data}} [logD (x)] + E_{x \sim p_{G}} [\log (1 - D (x))] \end{matrix}$

Therefore, Equation 1 can be transformed into:

(4) $G^{*} = \arg \min_{G} \max_{D} V (G, D)$

The adversarial process constitutes a two-player Min-Max game. Consequently, the samples generated are extremely different from the actual data.

3 Proposed methodology

The proposed LMF-GAN conNet presents a loss minimizing fusion framework with GAN networks to attain a better fused resultant image as illustrated in Fig. 1. The proposed LMF-GAN conNet encompasses three stages, (i) Preprocessing of the source image (ii) Training Phase (iii) Testing phase. The outline of the entire LMF-GAN conNet as follows.

Graph: Fig. 1 Proposed Block Diagram.

3.1 Preprocessing

The preprocessing stage encompasses data analyzation, augmentation, concatenation, and rescale. The acquired source images are remodeled to RGB image before augmentation. The quality and the degree of variance in the training data are improved first by the augmentation process to obtain stronger simplified models, invariant in certain forms of image transformation, and secondly by varying the image quality.

3.1.1 Concatenation of multiple RGB images

The augmented images are concatenated and pixel values are normalized from [0, 256] to [-1, 1] for training purposes. The order of the frames should be selected in a manner to avoid the color interference of the last channel of the first source image and the first channel of the second source image. The order of the layer arrangement of source images is important in the concatenation. The first layer consists of the R(Red) channel of MRI(provides structural information) followed by the R channel of PET(provides functional information). Further in the second layer, the G(Green) channel of MRI and G channel of PET followed by the B(Blue) channel of MRI and B channel of PET in the last layer. It has to be noted that in all the channels, the images that provide functional information is to be kept behind the images that provide structural information as it provides more structural part as shown in Fig. 2. The concatenated images are passed to the generator.

Graph: Fig. 2 (i) Channel-wise concatenation (ii) Fused image generated due to improper concatenation.

In general, depending on the number of channels provided in the input and output layers, the generator generates an image based on the parameters extracted from the image. The proposed method has 6 input channels and the modeled output image has 3 channels. While training, the generator fits the extracted parameters into three mentioned channels. In order to avoid color space getting disturbed, the stacking order is specified as RR1GG1BB1 and compressed the data in training. WhereR, G, B is theRGB channels of first source image and R1G1B1 is the RGB channels of the second source image.

3.2 Training

The LMF-GAN is trained with datasets as shown in Table 4.

3.2.1 LMF-GAN

For every epoch, the discriminator learns to discriminate between true and fake images. The true images constitute the MRI and PET and are labeled as '1', and the fake images are generated by the generator from the concatenated images and are labeled as '0'. The GAN CNN model which is made up of the generator model followed by the trained discriminator gets trained. During this training, the generator generates the fused image that is passed to the pretrained discriminator model which discriminates it as either real or fake. The loss is calculated and feedback, from which the generator starts to learn more and more parameters from the MRI and PET images, as the discriminator learns source images.

As the training continues, more parameters are infused in the fused image, such that after a point the discriminator fails to discriminate between real and fake images. The training has to be stopped at this point, and the fused image gets noisy due to overfitting and spoils the aesthetic view of the image. The trained generator model is saved and is used to generate the fused resultant image.

3.2.2 Discriminator architecture

CNN architecture is illustrated in Fig. 3, and the model has 4 convolution layers, a single densely connected output node which gets its input from a flattening layer used before it. The process starts with training the discriminator with MRI and PET images labeled as '1' (true) and generated images as '0' (fake). The discriminator will sample the image and extract the parameters to discriminate between true and fake images. All the CNN layers have a filter size of 3 × 3 and the number of filters used in the four layers is 27,81,243 and 347 respectively, which is found using trial and error methods. All the layers have a strong L2 regularisation to prevent overfitting. Batch normalization is used to maintain modal of the data distribution at the end of every layer, together with valid padding. Leaky relu is utilized in the convolution layers for better performance.

Graph: Fig. 3 Discriminator Architecture.

A dropout of 0.45 is used after flattening which is then fed into the single dense node. The output node is enabled with sigmoid function and binary cross-entropy loss is calculated and backpropagated. The discriminator model updates the optimum weight with the help of a properly tuned ADAM optimizer. The parameters or features used in the ADAM optimizer is tabulated in Table 1.

Table 1 ADAM optimizer parameters

Parameters	Value
Learning_rate	0.0002
Beta_1	0.5
Beta_2	0.89
Epsilon	1e-07

Dropout regularisation is used with 40% of nodes dropped. In all the layers, leaky relu with alpha = 0.2 is used and in the output layer sigmoid is used as activation functions. A binary cross-entropy loss function is used to train the discriminator. Valid padding is used in discriminator and stride size is 2 × 2. L2 regularizer is used in alternate layer with lambda = 0.001

3.2.3 Generator architecture

The generator in LMF-GAN generates a 3 channel fused output image from a 6-channel concatenated input image with the layer of 5 layer convolution operation with 379, 243, 81, 27, and 3 filters in each layer with 5 × 5 filter in the first two layers and 3 × 3 in other three layers. The padding is maintained to ensure there is no change in the image's height and width. Batch normalisation and L2 regularisation are used to meet the proper CNN build. The generator architecture is shown in Fig. 4. The last convolution layer is activated using TanH. So the fused image will be generated in the range of [-1,1], which can then be scaled to 0 to 255. The rescaling of the input image with sigmoid causes a vanishing gradient compared to TanH. Therefore the output layer, which produces an image (in a range that matches with input range), is activated with TanH. The tuned ADAM optimizer which is a combination of RMS prop and SGD with momentum is used along with binary cross-entropy loss. The parameters for ADAM optimizer is tabulated in Table 1.

Graph: Fig. 4 Generator Architecture.

In all the layers, leaky relu with alpha = 0.09 is used and in the output layer, TanH is used as activation functions. The same padding is used in discriminator and stride size is (1 X 1).

3.2.4 Determining optimum point to stop training

As discussed above, the discriminator tries to classify between real images and fake images that are produced by the generator. When the generator learns parameters, it creates images with features that are similar to the input image features, therefore the discriminator fails to discriminate properly. At this point, the training should be stopped, else the generator will start creating fused images with noise as a result of overfitting.

3.2.5 Hyperparameters used in training the GAN

The optimizer used for training is ADAM optimizer and the parameters utilized are tabulated in Table 1. Binary cross-entropy loss function is utilized for training and the batch size is 16 images per batch. The number of epochs used is 100 for training the GAN.

3.2.6 Significance of the maximum parameter infusion technique in GAN

As training continues the generator extracts more features or parameters from the concatenated image and fuse it in the output image. The training continues to a point where the discriminator fails to discriminate between actual and fused images. The proposed method acquires more features or parameters of all the source image while training. The contribution of all the source images is utilized while training the discriminator to obtain more features of the fused image. Therefore minimum loss is associated with the output fused image with maximum parameters.

3.3 Testing

The proposed LMF-GAN triple ConvNet network was tested with 5 data sets pair. The particulars of the dataset are illustrated in Table 2.

Table 2 Source dataset details

Dataset	Modality	Evaluation Slice no	Total pairs of Images (Normal &Abnormal)	Organ
Set 1	MR-Gad/PET	12	18	Brain
Set 2	MR-Gad/SPECT-T1	17	44	Brain
Set 3	MR-T1/PET	63	107	Brain
Set 4	MR-T2/PET	13	19	Brain
Set 5	MR-T2/SPECT	33	78	Brain

3.4 Loss reduction in the ConNet framework

In general, the loss is calculated by a loss function. In the proposed GAN, the loss is calculated by a pre-trained, powerful ConvNet(discriminator). This well trained, discriminator reduces loss effectively.

4 Dataset utilized for evaluation

The proposed model, LMF-GAN, was evaluated with different modality images collected from the website www.med.harvard.edu and is displayed in Fig. 5. The following Table 2 provides a brief description of source image data set.

Graph: Fig. 5 Source Images utilized for the Proposed System.

5 Quantitative evaluation measures

Objective evaluation metrics are important in estimating in what extent important features from source images transferred to the fused image. The performance evaluation metrics for the proposed system is based on quantitative/objective evaluation measures. The objective evaluation utilizes 12 non-reference performance metrics such as entropy(E), joint entropy (JE) [[19]], cross correlation(CC) [[1]], mutual information(MI), stuctural similarity index metric (SSIM), edge strenghth(Q _ abf), Q _ hnc, image spatial quality evalution(ISQE), spatial frequency (SF), Fusion Symmetry(FS), and fusion factor(FF) are illustrated in Table 3. The paper [[20]] summarizes all relevant non-reference performance metrics used in our research. The ISQE values are measured in 100's.

Table 3 Performance Metrics

Sl. No	Quality Metrics	Formula	Preferred Value
1	Entropy, E	$E = - \sum_{i = 0}^{i = 1} \log_{2} (x_{i}) \Pr (x_{i})$	High
2	Joint Entropy, JE	JE = H (X, Y) = ∑_x=χ∑_y=γP (x, y) logP (x, y)	Low
3	Cross Corelation, CC	$CC (x, y) = \frac{2 \sum_{i = 0}^{N - 1} \sum_{j = 0}^{N - 1} I_{1} (i, j) . I_{f} (i, j)}{\sum_{i = 0}^{N - 1} \sum_{j = 0}^{N - 1} \| I_{1} (i, j) \|^{2} + \sum_{i = 0}^{N - 1} \sum_{j = 0}^{N - 1} \| I_{f} (i, j) \|^{2}}$	Near to 1
4	Mutual Information, MI	MI = MI^XF + MI^YF	High
5	Structural Similarity Index, SSIM	$SSIM = \frac{SSIMVAL 1 + SSIMVAL 2}{2}$	Range: (0 to 1)
			Prefer: Nearer to 1
6	Edge Strength, Q_abf[23]	$Q_{abf} = \frac{\sum_{a = 1}^{A} \sum_{b = 1}^{B} Q^{XF} (a, b) W^{X} (a, b) + Q^{YF} (a, b) W^{Y} (a, b)}{\sum_{a = 1}^{A} \sum_{b = 1}^{B} W^{X} (a, b) + W^{Y} (a, b)}$	(0 - 1)
			Nearer to 1
7	Q_hnc[23]	$Q_{hnc} = 0.2 ((\frac{I_{(a, f)}}{H_{a} + H_{f}}) + (\frac{I_{(b, f)}}{H_{b} + H_{f}}))$ (-1 to 1)
			Nearer to 1
8	Image Spatial Quality Evaluation[23]	ISQE = brisque (imf)	Low
9	Spatial Frequency, SF	$SpatialFrequency, SF = \sqrt{({RF}^{2} + {CF}^{2})}$	High
10	Fusion Symmetry, FS	$FS = abs (\frac{I_{im 1 imf}}{I_{im 1 imf} + I_{im 2 imf}} - 0.5)$	High
11	Fusion factor, FF	FF = I_im1imf + I_im2imf	High
12	Fusion Quality Index, FQI	FQI = ∑ (c (w) (λ (w) QI (I₁, I_f\|w) ) + (1 - λ (w) QI (I₁, I_f\|w) ) )	Range: (0 to 1)
			Prefer: Nearer to 1

6 Results and discussion

The main objective of this experiment is to validate and compare our proposed LMF-GAN system with the existing network with visualization and objective criteria. The proposed work is compared with 6 other DL networks to prove our system effectiveness. The DL networks used for comparison with our proposed system MLIIF (Algorithm-7, A7) are VGG-19 [[3]] (Algorithm-1, A1), VGG-16 [[24]] (Algorithm-2, A2), VGG-13 [[25]] (Algorithm-3, A3), VGG-11 [[6]] (Algorithm-4, A4), AlexNet [[9]] (Algorithm-5, A5),

SqueezeNet [[5]] (Algorithm-6, A6).

6.1 Experimental settings

The details about the dataset are provided in section 4. The proposed network, LMF-GAN, trained with 50 pairs of datasets and tested with 5 pairs of datasets. All the fusion algorithms are implemented in the GPU of Google Colab.

6.2 Training on LMF-GAN network

The determination of the optimum point is critical in determining the result. Trained the dataset with 40 epochs to avoid the overfitting as shown in Fig. 8. It can be spotted that the optimum point reaches at epoch 40 based on the performance measures as shown in Fig. 8. At epoch 40, the fused image obtained with maximum parameters and less noise. If the training is below 40 epochs, it undergoes underfitting. If the training continues after 40 epoch, creates a more noisy fused final image as illustrated in Fig. 6 and causes overfitting. The results obtained for set 2 and set 5 initially as green tinted and finally rectified as shown in Fig. 7.

Graph: Fig. 6 Images with different epoches of the Proposed LMF-GAN network for set 5 PET/MR.

Graph: Fig. 7 Initial and final result of the proposed LMF-GAN network for set 2 and set 5.

Graph: Fig. 8 Training result with different epoches for the Proposed LMF-GAN network,

6.3 Result of proposed network

6.3.1 Subjective evaluation

Fusion images obtained by the six DL networks and the proposed LMF-GAN network are shown in Fig. 9. It is perceived that the fused image produced by the proposed system preserves more detailed data on the yellow window with less noise and better visualization of the tissue structure.

Graph: Fig. 9 Fusion results of existing DL networks and proposed System for all the sets of data.

6.3.2 Objective evaluation

The objective evaluation is carried out with 12 non-reference performance measures as shown in Table 3. The objective evaluation results of five sets of datasets are illustrated in Table 4. The better results in the table are indicated in bold letters. All pairs of JE and MI values are similar in values. There is no change in values for JE and MI for the entire table for each pair of sets. The objective evaluation analyses of each pair of data are as follows:

– It can be noted that entropy of the fused image, Ef [[22]], SD, FF, SF, FQI, ISQE, and Qhnc is better for the proposed LMF-GAN network than compared to other algorithms for set 1. High information content in an image is given by higher entropy value. Higher FF includes reasonably strong detail from the source images. A similarity comparison between the images shows in CC and the values nearer to 1 is preferred. Higher SF provides good feature information and details. The information transfer rate from input to the fused image is provided by FQI. The lower ISQE displays the highest level of perception. Qhnc provides a transfer of information from the input image to a fused image.

– In set 2, better results are provided by the measure Ef, CC, SSIM, FF, FS, and FQI for the proposed system, LMF-GAN, than other algorithms. The SSIM provides structural information of the fused image. FS provides the dissimilarity of the fused resultant image with the source images.

– The set 3 provides better results for LMF-GAN in Ef, CC, SSIM, FF, SF, and Qhnc than other algorithms.

– Set 4 provides preferred values for LMF-GAN in Ef, CC, SSIM, FF, SF, and Qhnc than existing algorithms.

– Set 5 shows better values in Ef, CC, FF, FS, SF, ISQE, and Qhnc for the LMF-GAN system compared to other existing algorithms.

Table 4 Comparison of Performance Measures existing DL networks with proposed MLIIF for five sets of data

Data set	DL N/W	Ef	CC	JE	MI	SSIM	FF	FS	SF	FQI	ISQE	Qhnc
MR-Gad/PET
	A1 [3]	4.8893	0.5831	6.6548	0.7424	0.5145	1.7253	0.1119	8.8553	0.7794	0.4198	-0.1317
	A2 [24]	4.8788	0.5924	6.6548	0.7424	0.5148	1.7600	0.1140	8.6548	0.7798	0.4213	-0.1319
	A3 [25]	5.0182	0.5486	6.6548	0.7424	0.5181	1.7590	0.1006	7.5927	0.7691	0.4503	-0.1297
	A4 [6]	5.0780	0.5638	6.6548	0.7424	0.5204	1.8016	0.1003	6.9470	0.7644	0.4625	-0.1291
	A5 [9]	4.7205	0.6127	6.6548	0.7424	0.5204	1.7135	0.1123	7.2401	0.7856	0.4246	-0.1343
	A6 [5]	4.9814	0.6961	6.6548	0.7424	0.5220	1.7751	0.1003	6.4800	0.7686	0.4555	-0.1304
	A7	5.3612	0.7091	6.6548	0.7424	0.5120	1.9835	0.1012	8.8612	0.9110	0.4190	-0.1205
MR-Gad/SPECT-T1
	A1 [3]	4.6514	0.3401	4.4184	0.1070	0.4890	0.9504	0.3460	16.1083	0.4734	0.3982	-0.1616
	A2 [24]	4.6033	0.4002	4.4184	0.1070	0.4906	0.9476	0.3428	15.7540	0.4734	0.3985	-0.1629
	A3 [25]	4.8054	0.4134	4.4184	0.1070	0.4891	0.9424	0.3458	13.3489	0.4756	0.4160	-0.1576
	A4 [6]	4.8133	0.4268	4.4184	0.1070	0.4915	0.9343	0.3462	11.9222	0.4744	0.4183	-0.1574
	A5 [9]	4.5668	0.4469	4.4184	0.1070	0.4923	0.9627	0.3348	12.9068	0.4748	0.4028	-0.1641
	A6 [5]	4.7105	0.4409	4.4184	0.1070	0.4974	0.9344	0.3359	11.1058	0.4740	0.4194	-0.1601
	A7	4.8647	0.4467	4.4184	0.1070	0.4986	0.9847	0.3018	14.5403	0.4819	0.4061	-0.1603
MRT1/PET
	A1 [3]	4.0346	0.6826	4.5147	0.7916	0.6312	1.4174	0.0394	10.4373	0.2894	0.3316	-0.1723
	A2 [24]	4.0041	0.6725	4.5147	0.7916	0.6329	1.4188	0.0408	10.3293	0.2902	0.3323	-0.1731
	A3 [25]	4.3225	0.6392	4.5147	0.7916	0.6197	1.4362	0.0330	8.9676	0.2829	0.3856	-0.1645
	A4 [6]	4.3232	0.6938	4.5147	0.7916	0.6265	1.5119	0.0275	8.2315	0.3043	0.4023	-0.1655
	A5 [9]	3.9895	0.6698	4.5147	0.7916	0.6318	1.3565	0.0369	8.7309	0.2770	0.3515	-0.1728
	A6 [5]	4.2366	0.7022	4.5147	0.7916	0.6271	1.4130	0.0300	7.7037	0.2959	0.3812	-0.1668
	A7	4.3437	0.7026	4.5147	0.7916	0.6352	1.5641	0.0351	10.4380	0.2903	0.3751	-0.1632
MRT2/PET
	A1 [3]	4.8337	0.6943	6.4378	0.8580	0.5259	1.7709	0.1072	8.1032	0.7929	0.4178	-0.1346
	A2 [24]	4.8264	0.7052	6.4378	0.8580	0.5265	1.7479	0.1063	7.9461	0.7921	0.4188	-0.1347
	A3 [25]	4.9649	0.7149	6.4378	0.8580	0.5272	1.8028	0.1004	6.9327	0.7835	0.4454	-0.1328
	A4 [6]	5.0113	0.7275	6.4378	0.8580	0.5307	1.8400	0.1028	6.1975	0.7771	0.4566	-0.1323
	A5 [9]	4.7317	0.7328	6.4378	0.8580	0.5304	1.7212	0.0992	6.9396	0.7952	0.4197	-0.1366
	A6 [5]	4.9500	0.7437	6.4378	0.8580	0.5299	1.7850	0.0970	6.1333	0.7822	0.4498	-0.1333
	A7	5.1098	0.7473	6.4378	0.8580	0.5310	1.9804	0.1125	8.2063	0.7752	0.4407	-0.1290
MRT2/SPECT
	A1 [3]	3.8933	0.5026	4.9881	0.3111	0.6172	1.2291	0.1371	9.8538	0.8103	0.4731	-0.1723
	A2 [24]	3.8670	0.5502	4.9881	0.3111	0.6184	1.2286	0.1365	9.6206	0.8112	0.4735	-0.1731
	A3 [25]	4.0597	0.5219	4.9881	0.3111	0.6203	1.2715	0.1314	8.4368	0.8053	0.4801	-0.1683
	A4 [6]	4.0759	0.5438	4.9881	0.3111	0.6242	1.2950	0.1267	7.4978	0.8031	0.4826	-0.1684
	A5 [9]	3.7031	0.5475	4.9881	0.3111	0.6238	1.2143	0.1337	7.7944	0.8103	0.4755	-0.1780
	A6 [5]	4.0226	0.5491	4.9881	0.3111	0.6258	1.2752	0.1226	7.1131	0.8017	0.4817	-0.1698
	A7	4.9873	0.5525	4.9881	0.3111	0.6113	1.3679	0.0877	9.8654	0.8013	0.4617	-0.1654

It can be perceived that the fused resultant image entropy is higher than the input images for all the pairs and algorithms. The fused image provides better details than the source individual image.

Based on all objective and subjective evaluations, it can be concluded that the MLIIF system is robust and provides more information than existing DL networks for all the sets of images.

7 Conclusion

The information loss from the fused image is a very serious matter for diagnosis and treatment in the medical field. This paper proposes a new loss minimizing GAN ConNet framework for medical image fusion method, termed as LMF-GAN. The advantage of the proposed network is the need for limited training data sets. The optimum point to terminate the training is as low as 40 epochs. The proposed LMF-GAN is an end-to-end model, which can avoid designing complicated activity level measurement and fusion rule manually as in traditional fusion strategies. Experiments are demonstrated on the public dataset with 12 non-reference performance measures. The quantitative comparisons with six state-of-the-arts reveal that our proposed LMF-GAN produced a better result.

This work can be further extended for other sets of multi-modal medical image fusion, multi-exposure image fusion, and multi-focus image fusion. Also can be extended with building a more generalized model by training with more dataset.

References 1 Chavan S.S. and Talbar S.N., Multimodality image fusion in frequency domain for radiation therapy, In 2014 International Conference on Medical Imaging, m-Health and Emerging Communication Systems (MedCom), pages 174–178. IEEE, (2014). 2 Doke A.R., Singh T., Shantanu K. and Nayar R., Comparative analysis of wavelet transform methods for fusion of ct and pet images, In 2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), pages 2152–2156. IEEE, (2017). 3 Garcia-Gasulla D., Parés F., Vilalta A., Moreno J., Ayguadé E., Labarta J., Cortés U. and Suzumura T., On the behavior of convolutional nets for feature extraction, Journal of Artificial Intelligence Research. 61 (2018), 563-592. 4 Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A. and Bengio Y., Generative adversarialnets, Advances in Neural Information Processing Systems. 27 (2014), 2672-2680. 5 Iandola F.N., Han S., Moskewicz M.W., Ashraf K., Dally W.J. and Keutzer K., Squeezenet: Alexnet-level accuracy with 50x fewer parameters and<0.5 mb model size, arXiv preprint arXiv:1602.07360, (2016). 6 Iglovikov V. and Shvets A., Ternausnet: U-net with vgg11 encoder pre-trained on imagenet for image segmentation, arXiv preprint arXiv:1801.05746, (2018). 7 James A.P. and Dasarathy B., A review of feature and data fusion with medical images, arXiv preprint arXiv:1506.00097, (2015). 8 James A.P. and Dasarathy B.V., Medical image fusion: A survey of the state of the art, Information Fusion. 19 (2014), 4-19. 9 Krizhevsky A., Sutskever I. and Hinton G.E., Imagenet classification with deep convolutional neural networks, In Advances in Neural Information Processing Systems (2012), 1097–1105. Li H., Manjunath B. and Mitra S.K., Multisensor image fusion using the wavelet transform, Graphical Models and Image Processing. 57 (3) (1995), 235-245. Li S., Kwok J.T. and Wang Y., Multifocus image fusion using artificial neural networks, Pattern Recognition Letters. 23 (8) (2002), 985-997. Li S., Kang X., Fang L., Hu J. and Yin H., Pixel-level image fusion: A survey of the state of the art, Information Fusion. 33 (2017), 100-112. Liu Y., Jin J., Wang Q., Shen Y. and Dong X., Region level based multi-focus image fusion using quaternion wavelet and normalized cut, Signal Processing. 97 (2014), 9-30. Liu Y., Chen X., Ward R.K. and Wang Z.J., Image fusion with convolutional sparse representation, IEEE Signal Processing Letters. 23 (12) (2016), 1882-1886. Liu Y., Chen X., Peng H. and Wang Z., Multi-focus image fusion with a deep convolutional neural network, Information Fusion. 36 (2017), 191-207. Liu Y., Chen X., Wang Z., Wang Z.J., Ward R.K. and Wang X., Deep learning for pixel-level image fusion: Recent advances and future prospects, Information Fusion. 42 (2018), 158-173. Ma J., Yu W., Liang P., Li C. and Jiang J., Fusiongan: A generative adversarial network for infrared and visible image fusion, Information Fusion. 48 (2019), 11-26. Maneesha P., Singh T., Nayar R. and Kumar S., Multi modal medical image fusion using convolution neural network, In 2019 Third International Conference on Inventive Systems and Control (ICISC), pages 351–357. IEEE, (2019). Nair R.R. and Singh T., Multi-sensor, multi-modal medical image fusion for color images:Amulti-resolution approach, In 2018 Tenth International Conference on Advanced Computing (ICoAC), pages 249–254. IEEE, (2018). Nair R.R. and Singh T., Multi-sensor medical image fusion using pyramid-based dwt: a multi-resolution approach, IET Image Processing. 13 (9) (2019), 1447-1459. Nair R.R. and Singh T., Multi-modal based msmif using hybrid fusion with 1-d wavelet transform, International Journal of Advanced Science and Technology. 29 (5) (2020), 5353-5368. Nair R.R., Nayar R., Singh T. and Kumar S., Modified level cut liver segmentation from ct images, In 2017 Ninth International Conference on Advanced Computing (ICoAC), pages 186–191. IEEE, (2017). Nie D., Zhang H., Adeli E., Liu L. and Shen D., 3d deep learning for multi-modal imaging-guided survival time prediction of brain tumor patients, In International conference on medical image computing and computer-assisted intervention, pages 212–220. Springer, (2016). Qassim H., Feinzimer D. and Verma A., Residual squeeze vgg16. arXiv preprint arXiv:1705.03004, (2017). Simonyan K. and Zisserman A., Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, (2014). Xu H., Liang P., Yu W., Jiang J. and Ma J., Learning a generative model for fusing infrared and visible images via conditional generative adversarial network with dual discriminators, In IJCAI, pages 3954–3960, (2019).

By Rekha R. Nair; Tripty Singh; Rashmi Sankar; Klement Gunndu; Sabu M. Thampi; El-Sayed M. El-Alfy and Ljiljana Trajkovic

Reported by Author; Author; Author; Author; Author; Author; Author

Titel:	Multi-modal medical image fusion using LMF-GAN - A maximum parameter infusion technique
Autor/in / Beteiligte Person:	Gunndu, Klement ; Singh, Tripty ; Nair, Rekha R. ; Sankar, Rashmi
Link:	Volltext (PDF) View record in OpenAIRE (Volltext) https://doi.org/10.3233/jifs-189860
Zeitschrift:	Journal of Intelligent & Fuzzy Systems, Jg. 41 (2021-11-17), S. 5375-5386
Veröffentlichung:	IOS Press, 2021
Medientyp:	unknown
ISSN:	1875-8967 (print) ; 1064-1246 (print)
DOI:	10.3233/jifs-189860
Schlagwort:	Statistics and Probability Image fusion Computer science business.industry Infusion technique ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION General Engineering 020206 networking & telecommunications 02 engineering and technology Modal Artificial Intelligence 0202 electrical engineering, electronic engineering, information engineering 020201 artificial intelligence & image processing Computer vision Artificial intelligence business
Sonstiges:	Nachgewiesen in: OpenAIRE

Klicken Sie ein Format an und speichern Sie dann die Daten oder geben Sie eine Empfänger-Adresse ein und lassen Sie sich per Email zusenden.

BibTeX Citavi, JabRef, u.a.
(Literaturverwaltung)

PDF kein Volltext!
(Merkzettel, Notizen)

RIS Endnote, Citavi u.a.
(Literaturverwaltung)

MODS
(XML zur Weiterverarbeitung)

oder

Wählen Sie das für Sie passende Zitationsformat und kopieren Sie es dann in die Zwischenablage, lassen es sich per Mail zusenden oder speichern es als PDF-Datei.

Gewünschter Zitations-Stil:

oder

Bitte prüfen Sie, ob die Zitation formal korrekt ist, bevor Sie sie in einer Arbeit verwenden. Benutzen Sie gegebenenfalls den "Exportieren"-Dialog, wenn Sie ein Literaturverwaltungsprogramm verwenden und die Zitat-Angaben selbst formatieren wollen.