To address the difficulty of plant leaf disease detection and classification, this study proposes a novel plant leaf disease detection method called deep block attention SSD (DBA_SSD) for disease identification and disease degree classification of plant leaves. We propose three plant leaf detection methods, namely, squeeze-and-excitation SSD (Se_SSD), deep block SSD (DB_SSD), and DBA_SSD. Se_SSD fuses the SSD feature extraction network with a channel attention mechanism, DB_SSD improves the VGG feature extraction network, and DBA_SSD fuses the improved VGG network with the channel attention mechanism. To reduce the training time and accelerate the training process, the convolutional layers of a VGG model trained on the ImageNet image dataset are transferred to this model, and the collected plant leaf disease image dataset is randomly divided into a training set, validation set, and test set in the ratio of 8:1:1. We chose the PlantVillage dataset after careful consideration because it contains images related to the domain of interest. This dataset consists of images of 14 plants, including apples, tomatoes, strawberries, peppers, and potatoes, as well as the leaves of other plants. In addition, data enhancement methods, such as histogram equalization and horizontal flip, were used to expand the image data. The performance of the three improved algorithms is compared and analyzed in the same environment and against the classical target detection algorithms YOLOv4, YOLOv3, Faster RCNN, and YOLOv4 tiny. Experiments show that DBA_SSD outperforms the two other improved algorithms, and its performance in comparative analysis is superior to that of the other target detection algorithms.
Keywords: disease detection; degree classification of disease; data enhancement; target recognition; SSD
Plants are susceptible to various diseases, which seriously affect their quality and yield. Formulating prevention and control plans as early as possible before an outbreak can maximize the effect of prevention and control and reduce economic losses. Therefore, the identification of plant diseases is an effective way to inhibit the rapid development of diseases and avoid their occurrence. Previously, people relied on subjective judgments of the crop disease category, and disease detection was often expert-based, making it a costly and error-prone process.
Agricultural detection based on artificial intelligence, such as crop yield prediction [[
In recent years, CNNs have been increasingly incorporated into plant phenotyping. They have been very successful in modeling complicated systems, owing to their ability to distinguish patterns and extract regularities from data. Examples further extend to variety identification in seeds [[
Building a fast model with high classification accuracy is necessary to ensure the detection quality of plant disease. The current mainstream target recognition networks include the YOLO series, Faster RCNN, SSD [[
This study focuses on proposing a novel end-to-end plant disease detection algorithm called Deep Block Attention SSD (DBA_SSD) for plant leaves. Our main work and contributions are presented as follows:
- (1) We propose a novel end-to-end detection algorithm for plant disease, DBA_SSD, which combines the attention mechanism with convolution kernels, takes into account the attributes of plant leaf disease images, and pays more attention to disease details when detecting plant disease.
- (2) We grade the health of fruit and vegetable leaves. Based on these results, different measures can be taken according to the severity of the disease, which is of great significance for increasing plant yield.
- (3) We implemented the classic SSD, YOLOv4, YOLOv3, Faster RCNN, and YOLOv4 tiny models and compared them with our proposed DBA_SSD. Our method outperforms these classic baselines on the vegetable and fruit leaf dataset.
The main structure of this article is as follows. The first chapter introduces the related work on the detection of leaf disease and reviews the detection technology of leaf disease. The second chapter introduces the SSD model and related improvement modules and proposes two improved methods for the SSD target detection algorithm. The third chapter introduces the experimental environment, dataset structure, experimental procedure, and evaluation standard. The fourth chapter conducts a comparative analysis of the results of two sets of experiments: one is the ablation experiments on the proposed DBA_SSD, and the other is a comparison of the improved SSD algorithms with other target detection algorithms. Finally, we summarize the research in this article and discuss future prospects.
At present, research on plant disease recognition mainly focuses on two aspects. One is disease recognition based on machine learning, whose general steps include diseased leaf image segmentation, feature extraction, and disease recognition. The other is target recognition technology based on deep learning, in which end-to-end target detection is favored by many researchers because of its fast recognition speed and efficient feature extraction. The end-to-end target detection algorithm is also called the one-stage target detection algorithm. One-stage means that no candidate frames are generated and the target frame localization problem is directly transformed into a regression problem.
In the research on the identification of plant diseases based on machine learning, Literature [[
In deep learning-based research on fruit and vegetable diseases, Salma Samiei [[
The SSD algorithm model is a one-stage real-time target detection model proposed in the same period as the YOLO series. SSD combines the one-stage regression prediction idea of the YOLO series with the Anchor Box mechanism of Faster RCNN, using VGG as the base feature extraction network and extracting six feature layers of different sizes, from the bottom layer to the top layer, as the regression prediction features. The advantage of SSD is that it greatly improves the operation speed of the algorithm while maintaining the detection accuracy. Moreover, the detection of both small and large objects is taken into account. Figure 1 shows the SSD backbone network structure.
The loss function of SSD contains log loss for classification and smooth L1 for regression, and controls the proportion of positive and negative samples, which can improve the speed of optimization and the stability of training results. The total loss function is the sum of the errors of classification and regression.
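Total loss:

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where N is the number of matched (positive) boxes and α weights the regression term.

Classification loss:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_{i}^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_{i}^{0}\right), \qquad \hat{c}_{i}^{p} = \frac{\exp\left(c_{i}^{p}\right)}{\sum_{p} \exp\left(c_{i}^{p}\right)}$$

Of which:

$$x_{ij}^{p} \in \{1, 0\}$$

represents whether the i-th regression box matches the j-th GroundTruth box of type p.

Regression loss:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{p}\, \mathrm{smooth}_{L1}\left(l_{i}^{m} - \hat{g}_{j}^{m}\right)$$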
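The loss described above (log loss for classification plus smooth L1 for regression, summed and normalized) can be illustrated with a minimal pure-Python sketch. It is not the authors' training code: the function names, the input layout, and the default `alpha=1.0` are assumptions made for illustration, and the regression targets are assumed to be already encoded as box offsets.

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic for |x| < 1, linear otherwise."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ssd_loss(pos, neg, alpha=1.0):
    """Sketch of the total SSD loss.

    pos: list of (class_scores, true_class, loc_pred, loc_target) for boxes
         matched to a ground-truth box (hypothetical input layout).
    neg: list of class_scores for the selected negative boxes;
         class index 0 is assumed to be the background class.
    """
    n = len(pos)
    if n == 0:
        return 0.0  # no matched boxes: the loss is defined as 0
    conf = 0.0
    loc = 0.0
    for scores, cls, l, g in pos:
        conf -= math.log(softmax(scores)[cls])                   # log loss on the true class
        loc += sum(smooth_l1(li - gi) for li, gi in zip(l, g))   # cx, cy, w, h offsets
    for scores in neg:
        conf -= math.log(softmax(scores)[0])                     # log loss on background
    return (conf + alpha * loc) / n
```

Restricting `neg` to a selected subset of negative boxes is how the proportion of positive and negative samples is controlled in this sketch.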
SSD adopts full convolution for direct regression prediction and no longer generates candidate frames, which greatly improves the detection speed of SSD network. But there are some cases where the detection accuracy is not as good as we expect. When the surface features of leaves are similar or leaves are occluded from each other, SSD will miss and mis-detect, which often occurs in the actual leaf disease detection. For this reason, SSD needs to be improved to enhance feature recognition.
Se_Block [[
To increase the feature extraction capability of SSD feature extraction model and focus more on the feature layers with higher importance, this paper adds Se_Block attention mechanism module in front of the last six effective feature layers used for regression prediction on the basis of SSD model. The feature layers are rescaled by channel dimension. The structure of Se_SSD network is shown in Figure 3.
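The channel rescaling performed by a squeeze-and-excitation block can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the Se_Block implementation used in this paper: the nested-list feature map layout, the bias-free fully connected layers, and the weight shapes are assumptions.

```python
import math

def se_block(feature_map, w1, w2):
    """Squeeze-and-excitation over a feature map.

    feature_map: list of C channels, each an H x W nested list of floats.
    w1: (C // r) x C weight matrix of the first fully connected layer.
    w2: C x (C // r) weight matrix of the second fully connected layer.
    Biases are omitted for brevity (an assumption of this sketch).
    Returns the rescaled feature map and the per-channel scales.
    """
    # Squeeze: global average pooling gives one value per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    scales = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
              for row in w2]
    # Rescale: multiply every value in a channel by that channel's weight.
    out = [[[v * s for v in row] for row in ch]
           for ch, s in zip(feature_map, scales)]
    return out, scales
```

The output has the same shape as the input, which is why such a module can be placed in front of each prediction feature layer without changing the rest of the network.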
The residual network module, which has been widely applied in recent years, is shown in Figure 4a. X is the input feature map, Wi is the weight of the i-th network layer, F(X, Wi) is the residual mapping learned by the module, and F(X, Wi) + X is the feature output. The residual network module makes very deep networks trainable and avoids the degradation problem in which accuracy saturates as the network is continuously deepened. In addition, directly connecting the input to the output simplifies the learning objective and reduces its difficulty. The 1 × 1 convolution is shown in Figure 4b; it is usually followed by a nonlinear ReLU layer so that more features can be learned. Moreover, a 1 × 1 convolution can change the number of channels of the feature map: by increasing or reducing the number of channels, it reduces the computational effort while achieving cross-channel information interaction and feature integration, which helps improve generalization and reduce overfitting.
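To make the two building blocks concrete, the following pure-Python sketch applies a 1 × 1 convolution as a per-pixel linear map across channels and wraps it in a residual connection F(X, W) + X. The choice of F as "1 × 1 reduce, ReLU, 1 × 1 expand" is an assumption made for illustration; it is not the exact layout of Figure 4 or of the modules proposed in this paper.

```python
def conv1x1(feature_map, weights):
    """1 x 1 convolution: mixes channels independently at every pixel.

    feature_map: C_in channels, each an H x W nested list.
    weights: C_out x C_in matrix. Returns C_out channels of size H x W.
    """
    h, w = len(feature_map[0]), len(feature_map[0][0])
    return [[[sum(wk * feature_map[k][i][j] for k, wk in enumerate(w_row))
              for j in range(w)]
             for i in range(h)]
            for w_row in weights]

def relu(feature_map):
    """Element-wise ReLU nonlinearity."""
    return [[[max(0.0, v) for v in row] for row in ch] for ch in feature_map]

def residual_block(x, w_reduce, w_expand):
    """Computes F(X, W) + X, with F = 1x1 reduce -> ReLU -> 1x1 expand."""
    f = conv1x1(relu(conv1x1(x, w_reduce)), w_expand)
    # The skip connection adds the unchanged input back to the learned residual.
    return [[[a + b for a, b in zip(rf, rx)] for rf, rx in zip(cf, cx)]
            for cf, cx in zip(f, x)]
```

If the learned residual is zero, the block reduces to the identity, which is the property that makes very deep stacks of such blocks easier to train.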
As shown in Figure 5, two kinds of rich feature extraction modules are designed in this paper. Deep_Block (Figure 5a) enhances the network's feature extraction capability by using a 1 × 1 convolution to reduce the number of channels after convolution and fuse multi-channel information, while introducing a residual structure to prevent the loss of feature layer information. Deep_Block_Attention (Figure 5b) adds a channel attention mechanism at the end of the Deep_Block structure for fine-tuning at the channel level. The feature extraction network of SSD is reconstructed with the rich feature extraction module as the basic feature extraction unit, as shown in Figure 6, to deepen the feature extraction of each layer and increase the richness of feature learning.
This experiment builds a deep learning model under the PyTorch deep learning framework, using a dataset of 3000 plant leaves; the final output prediction frame identifies the leaf species and determines the severity of leaf disease. The experiments were conducted on an Asus laptop (Shanghai, China) with an AMD Ryzen 7 4800H processor, an NVIDIA GeForce RTX 2060 graphics card, and 32 GB of RAM.
We chose the PlantVillage dataset [[
To ensure the equalization of the dataset and to increase the richness and quality of the dataset, data enhancement and image preprocessing were performed on the images before the experimental tests [[
To better test the performance of the improved algorithm, four experiments were designed. They compare Se_SSD, which adds a channel attention mechanism at the end of the feature extraction network; DB_SSD (Deep Block SSD), which uses the improved VGG feature extraction network; DBA_SSD, which fuses the improved VGG network with the channel attention mechanism; and the original SSD network. In each case, the convolutional layers of a VGG model trained on the ImageNet image dataset are transferred to the model.
Experiment 1: The Se_SSD network, with the Se_Block channel attention mechanism added, is trained, and the average accuracy of this network for the detection of plant leaves is tested.
Experiment 2: The DB_SSD network, with the Deep_Block module added (the Deep_Block module does not contain the attention mechanism), is trained and tested under the environment and hardware conditions of Experiment 1.
Experiment 3: The DBA_SSD network, with the Deep_Block_Attention module added (the Deep_Block_Attention module contains the attention mechanism), is trained and tested under the environment and hardware conditions of Experiment 1.
Experiment 4: The original SSD network is trained and tested under the environment and hardware conditions of Experiment 1.
All four experiments were trained on a dataset of 15,000 plant leaf images and tested on 1500 randomly selected images. The experiments followed the flow in Figure 9, an experiment-comparison-optimization-experiment pattern, to obtain the mean average precision (mAP) of each model and to compare the mAP values of the different models.
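The random 8:1:1 division of the dataset into training, validation, and test sets can be sketched as follows. This is a minimal illustration rather than the authors' data pipeline: the function name, the fixed seed, and the use of integer indices in place of image files are assumptions.

```python
import random

def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Randomly split items into train/validation/test subsets by ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed so the split is reproducible
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# Hypothetical usage: indices stand in for the 15,000 leaf images.
train, val, test = split_dataset(range(15000))
```

With 15,000 items, this split yields 12,000 training, 1500 validation, and 1500 test samples.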
Precision is a measure of the accuracy of a model's positive predictions; its value is the number of correctly predicted positive samples divided by the total number of samples predicted as positive. Recall is a measure of the model's ability to identify positive samples; its value is the number of correctly predicted positive samples divided by the total number of actual positive samples. The possible prediction outcomes of the model (TP, FP, FN, and TN) are shown in Table 1.
True Positives (TP): the number of positive samples correctly identified as positive; True Negatives (TN): the number of negative samples correctly identified as negative; False Positives (FP): the number of negative samples incorrectly identified as positive; False Negatives (FN): the number of positive samples incorrectly identified as negative.
$$Precision = \frac{TP}{TP + FP}$$

$$Recall = \frac{TP}{TP + FN}$$
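As a small worked example of these two metrics, the sketch below computes Precision and Recall from confusion-matrix counts. The counts in the usage lines are hypothetical, and returning 0.0 for an empty denominator is a convention assumed here, not something specified in this paper.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Fraction of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts: 90 true positives, 10 false positives, 30 false negatives.
p = precision(90, 10)  # 90 / 100
r = recall(90, 30)     # 90 / 120
```

In this example the model is right about most of its positive predictions (high Precision) but misses a quarter of the actual positives (lower Recall).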
The PR curve is drawn with Recall as the horizontal axis and Precision as the vertical axis. Precision and Recall generally trade off against each other: as one increases, the other tends to decrease. AP (Average Precision), the indicator for a single category, is the area under the PR curve:

$$AP = \int_{0}^{1} P(R)\, dR$$

where P(R) denotes Precision as a function of Recall.
The value of mAP (mean average precision), one of the important metrics for evaluating the whole model, is the mean of the APs over all categories:

$$mAP = \frac{1}{N} \sum_{n=1}^{N} AP_{n}$$
where n is the category and N is the total number of categories.
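The two definitions can be checked numerically with a short sketch. The AP function below integrates the PR curve with the trapezoidal rule, which is an assumption: the paper does not state which numerical integration or interpolation scheme was used. The mAP example uses the 15 per-category AP values reported for DBA_SSD in Table 3.

```python
def average_precision(recalls, precisions):
    """Area under the PR curve by the trapezoidal rule.

    Points must be sorted by increasing recall.
    """
    area = 0.0
    for i in range(1, len(recalls)):
        width = recalls[i] - recalls[i - 1]
        area += width * (precisions[i] + precisions[i - 1]) / 2.0
    return area

def mean_average_precision(aps):
    """Mean of the per-category AP values."""
    return sum(aps) / len(aps)

# Per-category AP (%) of DBA_SSD, copied from Table 3.
dba_ssd_ap = [91.83, 99.73, 91.56, 90.65, 99.12, 88.86, 92.72, 100.00,
              82.24, 85.37, 100.00, 95.07, 85.54, 91.67, 88.65]
print(round(mean_average_precision(dba_ssd_ap), 2))  # prints 92.2
```

The mean of the 15 category values reproduces the 92.20% mAP reported for DBA_SSD.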
The first 50 Epochs were trained by freezing some of the network layer weights, and each batch was trained with 8 images. For the last 50 Epochs, the frozen layers were unfrozen and the full network was trained. The learning rate started at 5 × 10
The test results of SSD and its improved algorithms are shown in Table 2. DBA_SSD has the highest accuracy because, on the one hand, Deep Block strengthens the network's feature extraction ability and, on the other hand, the channel attention mechanism accelerates network learning, so that the network focuses on the channels with high information content. The prediction accuracy of SSD and its improved algorithms for different species of fruit and vegetable diseases is shown in Figure 11. The prediction accuracy of DBA_SSD is relatively high in most of the categories, and the mAP value of DBA_SSD is 92.20%, while the mAP values of SSD, Se_SSD, and DB_SSD are 89.96%, 90.77%, and 89.93%, respectively.
Figure 12 further shows the data distribution of the experimental results. The horizontal coordinates indicate the algorithm types, the vertical coordinates show the distribution of predicted AP values for the 15 categories, the triangles indicate the mean, and the thin solid line in the middle of each rectangle indicates the median. From Figure 12, we can see that among the four algorithms SSD, Se_SSD, DB_SSD, and DBA_SSD, the prediction accuracy of DBA_SSD is the most concentrated, and its median and mean are the highest. The DBA_SSD algorithm therefore performs better than the other improved algorithms.
This experiment compares and analyzes the test results of the classical target detection algorithms YOLOv4 [[
The disease degree of each plant leaf in this article is divided into three categories: healthy, general (normal), and severe (Table 3). Figure 14 averages the detection accuracy for each plant species on the basis of Table 3: the prediction accuracy of a species is the mean of the prediction accuracies of its three disease degrees. Its horizontal coordinates indicate the different target detection algorithms, and its vertical coordinates indicate the average prediction accuracy of the different kinds of plant leaves and the total average prediction accuracy (mAP).
Compared with DBA_SSD, YOLOv4 has lower prediction accuracy for Strawberry and Chili, YOLOv4 tiny has weaker prediction ability for Tomato, and YOLOv3 has lower prediction accuracy for Strawberry. These differences arise because the feature extraction networks of the different algorithms focus on different information in the learned images, and DBA_SSD mitigates this deficiency by covering all levels of semantic information. The rightmost column indicates the detection accuracy of the DBA_SSD algorithm in the different categories, with the highest classification accuracy of 100% and the lowest of 82.24%.
Figure 15 shows that YOLOv4 corresponds to the largest box area, and its upper quartile edge is close to 100%, indicating that a certain number of its prediction accuracies are higher than 95%. However, its per-category prediction accuracy is more dispersed. YOLOv3 has a smaller box area, but the top of its box does not reach as high as that of DBA_SSD, indicating that it has fewer high-accuracy categories than DBA_SSD. Although the upper quartile line of SSD touches the 100% line, its box area is larger, indicating that its prediction accuracy varies widely and is unstable. The box area of DBA_SSD is the smallest among all the algorithms and lies closer to the 100% line, indicating that its prediction accuracy is more concentrated, that a large part of its prediction accuracies are high, and that its prediction for each category is more stable. The experiment shows that the DBA_SSD model has a high accuracy rate for the recognition of fruit and vegetable leaves and, being based on SSD, a one-stage target recognition algorithm, it retains the advantage of fast recognition speed. The comprehensive performance of DBA_SSD is improved compared with the original SSD and is also higher than that of the other target detection algorithms. The detection effect is shown in Figure 16.
In the above experiments, we not only compare the performance of different improved algorithms, but also compare the performance of DBA_SSD with other classical target detection algorithms. The following is the performance comparison of each algorithm:
Table 4 shows the FPS, the number of parameters, and the computational complexity of the different algorithms for the same image input size. DBA_SSD has fewer parameters than the other classical target detection algorithms except YOLOv4-tiny, but slightly more parameters than SSD, Se_SSD, and DB_SSD. It is worth mentioning that the FPS of DBA_SSD is not reduced much relative to SSD (40 versus 45). The algorithm can be applied in academic and scientific algorithm research, but it is still far from agricultural application, and its real-time performance still needs to be improved. Another shortcoming is that the algorithm has high accuracy only for the currently trained species; plants not covered in this paper require retraining. On the other hand, the algorithm is more effective if it is applied to disease identification of a single plant species only. At the same time, considering that individual differences occur in the same plant growing in different environments, we add pictures showing individual differences of the same plant in the data enhancement process, so that these differences do not affect the final detection results and the proposed algorithm generalizes better. The proposed algorithm is able to detect plant diseases early in their development so that timely control measures can be taken, which helps to reduce production costs. At the commercial scale, it is clear that capital investment in the adopted method is initially required [[
In this paper, we discuss work related to plant disease detection and expand the size and variety of the dataset by performing spatial transformations as well as pixel processing on the original dataset. To address the low recognition rate and low accuracy of the SSD model, we propose the DBA_SSD network model for plant leaf detection, which incorporates 1 × 1 convolution, the residual network, and the attention mechanism into the SSD algorithm. In our experiments, we compare several classical target detection algorithms and verify the efficacy of the DBA_SSD algorithm in plant disease detection. The experiments show that the DBA_SSD algorithm improves the accuracy to 92.20% and has high robustness and speed. The significance of this algorithm is that it can detect disease at an early stage, so that the disease can be controlled and economic loss reduced in time. The shortcoming of the algorithm is that it is still far from being applied in real production, so future work will focus on optimizing the algorithm and deploying it on embedded devices so that it can be applied to the real-time monitoring of agricultural plant diseases.
Graph: Figure 1 SSD backbone network structure.
Graph: Figure 2 Se_Block Attention Module.
Graph: Figure 3 Se_SSD network structure.
Graph: Figure 4 (a) Residual network module and (b) 1 × 1 convolution.
Graph: Figure 5 Enriched feature extraction module ((a) Deep_Block, a feature extraction module combining residual network and 1 × 1 convolution; (b) Deep_Block_Attention, a feature extraction module adding an attention mechanism to (a)).
Graph: Figure 6 DBA_SSD network structure.
Graph: Figure 7 Data set composition structure.
Graph: Figure 8 Data Enhancement. (a) Positive sample (healthy); (b) Negative sample (Diseases).
Graph: Figure 9 Experimental flow.
Graph: Figure 10 SSD and its improved algorithm loss variation graph.
DIAGRAM: Figure 11 AP diagram of SSD and its improved algorithm for the detection of different kinds of diseases.
DIAGRAM: Figure 12 Box diagram of SSD and its improvement algorithm.
Graph: Figure 13 Target detection algorithm loss diagram.
MAP: Figure 14 Heat map of correlation between different target detection algorithms and vegetable and fruit leaf types.
Graph: Figure 15 Box plot of AP statistics under target detection algorithm.
Graph: Figure 16 DBA_SSD recognition effect.
Table 1 Confusion matrix.
| Predicted class \ True class | Positive | Negative |
|---|---|---|
| Positive | TP | FP |
| Negative | FN | TN |
Table 2 Comparison of accuracy of improved SSD algorithm.
| Target Identification Methods | Inserted Modules | mAP |
|---|---|---|
| SSD | \ | 89.96% |
| Se_SSD | Se_Block | 90.77% |
| DB_SSD | Deep_Block | 89.93% |
| DBA_SSD | Deep_Block_Attention | 92.20% |
Table 3 Comparison of the accuracy of the improved SSD model and other target detection algorithms for the detection of different kinds of diseases.
| Category \ Algorithm | YOLOv4 | YOLOv4 Tiny | YOLOv3 | SSD | Faster RCNN | DBA_SSD |
|---|---|---|---|---|---|---|
| Apple(general) | 94.79% | 78.32% | 88.87% | 83.45% | 74.85% | 91.83% |
| Apple(health) | 100.00% | 99.78% | 94.83% | 100.00% | 100.00% | 99.73% |
| Apple(severe) | 82.27% | 88.01% | 90.71% | 88.93% | 88.20% | 91.56% |
| Chili(general) | 73.99% | 92.32% | 81.60% | 83.89% | 90.54% | 90.65% |
| Chili(health) | 98.75% | 100.00% | 100.00% | 100.00% | 100.00% | 99.12% |
| Chili(severe) | 73.74% | 72.70% | 91.55% | 83.94% | 100.00% | 88.86% |
| Potatoes(general) | 92.71% | 89.41% | 88.37% | 80.35% | 88.65% | 92.72% |
| Potatoes(health) | 94.74% | 100.00% | 100.00% | 100.00% | 99.80% | 100.00% |
| Potatoes(severe) | 98.08% | 91.18% | 89.32% | 87.46% | 82.88% | 82.24% |
| Strawberry(general) | 63.52% | 80.89% | 59.83% | 80.69% | 73.48% | 85.37% |
| Strawberry(health) | 99.52% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
| Strawberry(severe) | 67.46% | 76.44% | 78.95% | 84.11% | 92.64% | 95.07% |
| Tomato(general) | 85.70% | 85.64% | 95.81% | 82.85% | 85.02% | 85.54% |
| Tomato(health) | 100.00% | 94.58% | 100.00% | 100.00% | 89.14% | 91.67% |
| Tomato(severe) | 80.35% | 64.31% | 78.24% | 93.69% | 86.33% | 88.65% |
| mAP | 87.04% | 87.57% | 89.21% | 89.96% | 90.10% | 92.20% |
Table 4 Performance comparison of target detection algorithms.
| Algorithm | Backbone Model | Image Size | Parameters | FPS | Computational Complexity |
|---|---|---|---|---|---|
| YOLOv4 | CSPDarkNet53 | 512 × 512 | 64.62 M | 62 | 45.96 GMac |
| YOLOv4-tiny | CSPDarknet53-tiny | 512 × 512 | 5.91 M | 75 | 5.19 GMac |
| Faster RCNN | VGG16 | 512 × 512 | 136.98 M | 9 | 86.0 GMac |
| YOLOv3 | darknet53 | 512 × 512 | 61.6 M | 34 | 49.7 GMac |
| SSD | VGG16 | 512 × 512 | 25.48 M | 45 | 85.6 GMac |
| SE_SSD | VGG16_SE | 512 × 512 | 25.60 M | 43 | 85.62 GMac |
| DB_SSD | VGG_DB | 512 × 512 | 30.55 M | 40 | 86.6 GMac |
| DBA_SSD | VGG_DBA | 512 × 512 | 30.57 M | 40 | 86.6 GMac |
All authors contributed to this work. J.W. designed the research and processed the corresponding data. J.W. wrote the first draft of the manuscript. J.Y. and H.D. gave some guidance about methods. Writing—review and editing, J.Y. and L.Y. All authors have read and agreed to the published version of the manuscript.
This research was funded by the Higher Education Project of Guizhou Province (No. [2020]005, No. [2020]009); the Science and Technology Project of Guizhou Province (No. [2019]3003).
The data used to support this study's findings are available from the corresponding author upon request.
The authors declare no conflict of interest.
By Jun Wang; Liya Yu; Jing Yang and Hao Dong