Identification of Alfalfa Leaf Diseases Using Image Recognition Technology

Feng Qin; Dongxia Liu; Bingda Sun; Liu Ruan; Zhanhong Ma; Haiguang Wang

doi:10.1371/journal.pone.0168274

Abstract

Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by manual cropping from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model.
For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease.

Citation: Qin F, Liu D, Sun B, Ruan L, Ma Z, Wang H (2016) Identification of Alfalfa Leaf Diseases Using Image Recognition Technology. PLoS ONE 11(12): e0168274. https://doi.org/10.1371/journal.pone.0168274

Editor: Andrea Luvisi, Universita del Salento, ITALY

Received: August 28, 2016; Accepted: November 28, 2016; Published: December 15, 2016

Copyright: © 2016 Qin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: HGW received Special Fund for Agro-scientific Research in the Public Interest (grant number 201303057). The URL of the funder's website is http://www.kjs.moa.gov.cn/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Alfalfa (Medicago sativa) is an important forage grass containing various nutrients. The occurrence of disease in alfalfa plants has an important influence on the yield and quality of alfalfa hay, affecting the healthy development of the alfalfa industry [1]. There are more than ten types of alfalfa leaf diseases [2, 3]. Some of these diseases have similar symptoms, resulting in difficulties in achieving an accurate diagnosis and identifying the disease via naked-eye observations of symptoms or microscopic observations of causal agents. The diagnosis and identification of alfalfa diseases mainly rely on the experience of farmers, agricultural experts or agricultural technicians. The complexity of the disease symptoms and the limitations of personnel experience may lead to errors in judgment. The rapid, accurate identification and diagnosis of diseases will help to reduce yield losses and quality decline of alfalfa hay, resulting from the diseases. With the rapid development of computer technology and information technology, it is possible to utilize image-processing technology to diagnose and identify alfalfa leaf diseases quickly, accurately and automatically.

Image-processing technology has been applied to the recognition of many plant diseases [4–19]. The image-based recognition accuracy for plant diseases depends largely on the segmentation of the lesion images. Threshold-based image segmentation methods have been widely used in the segmentation of lesion images of diseased plants [20, 21]. However, there is usually great variance in color both between lesions of different diseases and between lesions of one disease at different stages. Therefore, it is very difficult to determine an appropriate threshold when threshold-based image segmentation methods are used to solve segmentation problems for plant disease images with complex colors. Image segmentation methods based on a fuzzy C-means clustering algorithm [22] or a K_means clustering algorithm [11, 15, 23] have been used to carry out lesion segmentation of plant disease images. Such segmentation methods must specify the number of clusters in advance. An inappropriate number of clusters may lead to over-segmentation or under-segmentation of lesion images, and a great computational cost is required to determine the appropriate number of clusters, especially when segmenting high-resolution images. Supervised classification is a technique that deduces a classification function from typical samples. Lesion segmentation of plant disease images can be effectively realized using supervised classification methods [24, 25]. However, the features of typical lesion regions and the features of typical healthy regions in a disease image cannot be obtained automatically and in a targeted fashion by using a supervised classification method alone. Automatic segmentation of plant disease images can be effectively achieved by integrating a clustering algorithm and a supervised classification algorithm [26, 27].

There are color, texture and shape differences between lesion images from different plant diseases. Image recognition of plant diseases can be implemented using an appropriate pattern recognition algorithm based on color, texture and shape features of the lesion images [10, 11, 13, 17, 28]. Moreover, to reduce the complexity of the disease identification model and improve the model’s generalization ability, it is necessary to carry out feature selection according to the importance of features.

To the best of our knowledge, systematic studies on image recognition of alfalfa diseases have not yet been reported. In this study, automatic recognition of four common alfalfa leaf diseases, namely alfalfa common leaf spot (caused by Pseudopeziza medicaginis), alfalfa rust (caused by Uromyces striatus), alfalfa Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and alfalfa Cercospora leaf spot (caused by Cercospora medicaginis), was investigated based on acquired digital disease images. Of twelve segmentation methods that integrate clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) with supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree (CART) and linear discriminant analysis), the best image segmentation method was selected for further image processing and image recognition. After extraction of texture, color and shape features from the lesion images, feature selection was conducted using three different methods, i.e., the ReliefF method [29], the 1-rule (1R) method [30] and the correlation-based feature selection (CFS) method [31]. Based on the selected features, disease recognition models were built using three supervised learning methods including random forest, support vector machine (SVM) and K-nearest neighbor (KNN). Moreover, after the features used for building the optimal supervised model were transformed using principal component analysis (PCA), semi-supervised disease recognition models were built using a self-training algorithm based on Naive Bayes classifiers [32, 33]. After comparing the recognition results of each model, the optimal model was determined for disease image recognition. The aim of this study was to provide a solution for rapid and accurate identification of four alfalfa leaf diseases and to provide support for the development of an automatic alfalfa leaf disease diagnosis system.

Materials and Methods

Image Acquisition

Infected alfalfa leaves with typical symptoms used in this study were sampled from the Langfang Forage Experimental Base, Institute of Animal Science, Chinese Academy of Agricultural Sciences and alfalfa fields in Xuanhua District, Zhangjiakou, Hebei Province, China. Sampling at the Langfang Forage Experimental Base was conducted with the permission of Qinghua Yuan from Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China, and sampling in the alfalfa fields in Xuanhua District was conducted with the permission of Dongxia Liu from College of Agriculture and Forestry Science and Technology, Hebei North University, Zhangjiakou, Hebei Province, China. All the diseased alfalfa leaves in the fields resulted from natural infections. Infected alfalfa leaves in the early stage of disease were not sampled. Samples were taken to the laboratory and disease types of the leaves were determined mainly by using conventional diagnostic methods including naked-eye observations of disease symptoms and microscopic observation of morphological characteristics of causal agents. Images were captured with the lesion side of each diseased leaf facing up on a white background. When taking images, the leaves were spread as flat as possible, and the camera lens was parallel with the plane of the leaves.

A total of 899 images with typical disease symptoms were acquired, including 76 images of alfalfa common leaf spot, 136 images of alfalfa rust, 231 images of alfalfa Leptosphaerulina leaf spot and 456 images of alfalfa Cercospora leaf spot. The image size was 4,256×2,832 pixels (JPEG format). To reduce the workload of image analysis and focus on the regions of interest, a sub-image with one typical lesion or multiple typical lesions was obtained from each original disease image by manual cropping. The size of a sub-image depended on the number of typical lesions and the size of each typical lesion. From the sub-images, five image datasets were constructed: an alfalfa common leaf spot dataset comprising 76 sub-images, an alfalfa rust dataset comprising 136 sub-images, an alfalfa Leptosphaerulina leaf spot dataset comprising 231 sub-images, an alfalfa Cercospora leaf spot dataset comprising 456 sub-images and an aggregated dataset comprising all 899 sub-images. These image datasets were used for segmentation of lesion images and evaluation of segmentation methods.

Lesion Image Segmentation

In this study, twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, CART and linear discriminant analysis) were used to segment the sub-images, and then their segmentation effects were evaluated. The main steps for lesion image segmentation are shown in Fig 1.

Fig 1. Work flow diagram of main steps for lesion image segmentation.

https://doi.org/10.1371/journal.pone.0168274.g001

Each obtained sub-image was converted from RGB color space into HSV color space and L*a*b* color space. For each pixel in the sub-image, the a* component value and the b* component value were regarded as the color features of the pixel. All pixels in the image were clustered into ten classes using K_means clustering, fuzzy C-means clustering and K_median clustering. The three clustering algorithms were carried out using the software MATLAB R2013b (MathWorks, Natick, MA, USA). For K_means clustering, the number of repetitions was set to three, and default values were used for the other parameters. For fuzzy C-means clustering, default values were used for all parameters. K_median clustering was implemented using Euclidean distance, with the initial clustering seeds obtained by random selection. The maximum number of iterations was set to 1,000, the number of repetitions was set to three, and minimizing the sum of the intraclass distances was regarded as the clustering criterion.
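The K_median step can be illustrated with a minimal Python sketch. It is not the MATLAB implementation used in the study: the function name `k_median`, the `seed` argument and the rule of keeping the old center when a cluster empties are assumptions made for illustration. It follows the settings given above (Euclidean distance for assignment, random seeding, three repetitions, at most 1,000 iterations, and the repetition with the smallest summed intraclass distance is kept).

```python
import math
import random
from statistics import median


def k_median(points, k, repetitions=3, max_iter=1000, seed=0):
    """Cluster 2-D points, e.g. (a*, b*) pairs, with a basic K-median loop.

    Returns (labels, centers) from the repetition with the smallest summed
    Euclidean distance between points and their assigned centers.
    """
    rng = random.Random(seed)  # hypothetical seed, for reproducible runs only
    best = None
    for _ in range(repetitions):
        # Random seeding: pick k distinct points as the initial centers.
        centers = rng.sample(points, k)
        labels = None
        for _ in range(max_iter):
            new_labels = [
                min(range(k), key=lambda c: math.dist(p, centers[c]))
                for p in points
            ]
            if new_labels == labels:
                break  # assignments stopped changing
            labels = new_labels
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # assumption: keep the old center if a cluster empties
                    centers[c] = tuple(median(dim) for dim in zip(*members))
        cost = sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or cost < best[0]:
            best = (cost, labels, list(centers))
    return best[1], best[2]
```

In the setting described above, `points` would hold one (a*, b*) pair per pixel and `k` would be 10. A pure-Python loop like this is far too slow for full-resolution images and is meant only to make the update rule concrete.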

After all pixels in a sub-image were clustered into ten classes using a clustering algorithm, the mean of the H components of all pixels in each class was calculated. Compared to healthy alfalfa leaves, the H components of the sub-images of the four alfalfa leaf diseases were smaller. Consequently, the pixels in the class with the minimum mean were treated as typical lesion pixels, and the pixels in the seven classes with the largest means were treated as typical healthy pixels. There is a transition region between the lesion region with typical symptoms and the typical healthy region, and its H components usually lie between those of the two regions. The pixels in the two remaining classes were therefore not involved in building the pixel classification models. The typical lesion pixels and typical healthy pixels were labeled positive samples and negative samples, respectively, and these pixels constituted the training set for building pixel classification models. With the a* component value and the b* component value of each pixel in the training set as feature variables, pixel classification models to classify all the pixels in the sub-image were built using logistic regression analysis, Naive Bayes algorithm, CART and linear discriminant analysis, respectively. For each classification model, each pixel classified as lesion was assigned a value of 1 and each remaining pixel in the sub-image was assigned a value of 0. Thus, an initial binary lesion segmentation image was obtained. To avoid the influence of the white background, the pixels with B component values higher than 200 in the original sub-image were identified. A binary background mask was then created in which each pixel with a B component value higher than 200 was assigned a value of 0 and each remaining pixel was assigned a value of 1. A new binary image was obtained by multiplying this mask with the initial binary lesion segmentation image.
The hole filling operation was performed on this new binary image and the areas of all connected regions in the image were calculated. The connected regions with areas less than one-sixteenth of the maximum area were removed, and the final lesion segmentation image was obtained. If there were no pixels with B component values higher than 200, a hole filling operation was carried out on the initial binary lesion segmentation image. Subsequently, the areas of all connected regions in the image were calculated and the connected regions with the areas less than one-sixteenth of the maximum area were removed to give the final lesion segmentation image.
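Two of the steps above can be sketched in Python: choosing training pixels from the cluster means and removing small connected regions. The function names, the list-of-lists mask format and the use of 8-connectivity are assumptions, since the text does not state which connectivity was used. Hole filling, background masking and the pixel classifiers themselves are omitted.

```python
from collections import deque


def assign_cluster_roles(mean_hue, n_healthy=7):
    """Map each cluster id to 'lesion', 'healthy' or 'ignored'.

    `mean_hue` is a dict {cluster id: mean H of its pixels}.
    """
    order = sorted(mean_hue, key=mean_hue.get)  # lowest mean H first
    roles = {cid: "ignored" for cid in order}   # transition-region clusters
    roles[order[0]] = "lesion"                  # class with the minimum mean H
    for cid in order[-n_healthy:]:              # classes with the largest means
        roles[cid] = "healthy"
    return roles


def remove_small_regions(mask, min_fraction=1 / 16):
    """Drop connected regions smaller than min_fraction of the largest region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, region = deque([(r, c)]), []
            while queue:  # breadth-first flood fill, 8-connectivity (assumed)
                y, x = queue.popleft()
                region.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            regions.append(region)
    out = [[0] * cols for _ in range(rows)]
    if regions:
        largest = max(len(region) for region in regions)
        for region in regions:
            if len(region) >= largest * min_fraction:
                for y, x in region:
                    out[y][x] = 1
    return out
```

With ten clusters, `assign_cluster_roles` yields one lesion class, seven healthy classes and two ignored classes, matching the description above.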

In the process of lesion segmentation, all pixels in a sub-image were classified as either lesion pixels or healthy pixels. Therefore, lesion segmentation is similar to a binary classification problem in the field of pattern recognition, and the evaluation of segmentation effects can be carried out using methods for evaluating a binary classification model. Manual segmentation of a sub-image using the Adobe Photoshop CC software was conducted to determine the true class of each pixel. With manual segmentation as the reference, Recall and Precision, two commonly used indices for evaluating classification models in the field of pattern recognition [34], were used to evaluate the twelve segmentation methods integrated with clustering algorithms and supervised classification algorithms. In this study, the two indices were calculated according to the following formulas: Recall = N₁/N₂ and Precision = N₁/N₃, where N₁ was the total number of lesion pixels in a sub-image correctly classified by using a segmentation method integrated with a clustering algorithm and a supervised classification algorithm, N₂ was the total number of lesion pixels in the sub-image classified using the manual segmentation method, and N₃ was the total number of pixels in the sub-image classified as lesion pixels using the integrated segmentation method. Both Recall and Precision range from 0 to 1. Larger values of Recall and Precision indicate a better integrated segmentation method. The index “Score”, combining Recall and Precision, is proposed in this study to evaluate the twelve segmentation methods and is calculated according to the following formula: Score = (Recall+Precision)/2. The Score also ranges from 0 to 1. Larger Score values indicate that the corresponding integrated segmentation method is better. Based on the image datasets described above, the three indices were used to evaluate the twelve integrated segmentation methods and to select the best method for segmenting sub-images for further image recognition in this study.
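As a worked illustration of the three indices, the following Python sketch computes them from two binary masks flattened to sequences of 0 and 1. The function name and the convention of returning 0 when a denominator is empty are assumptions. N₃ is taken here as the number of pixels that the automatic method labels as lesion, which is the usual definition of precision.

```python
def segmentation_scores(predicted, reference):
    """Return (recall, precision, score) for two flat 0/1 pixel sequences.

    `predicted` comes from the automatic segmentation, `reference` from the
    manual segmentation.
    """
    n1 = sum(1 for p, t in zip(predicted, reference) if p and t)  # correct lesion pixels
    n2 = sum(1 for t in reference if t)   # lesion pixels in the manual segmentation
    n3 = sum(1 for p in predicted if p)   # pixels labeled lesion by the method
    recall = n1 / n2 if n2 else 0.0       # assumption: 0 when no reference lesion
    precision = n1 / n3 if n3 else 0.0    # assumption: 0 when nothing is predicted
    return recall, precision, (recall + precision) / 2
```

For example, if the method marks three pixels as lesion, two of which are among the four lesion pixels of the manual segmentation, then Recall = 2/4, Precision = 2/3 and Score is their mean.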

After segmentation, in each final binary lesion segmentation image, each independent white region (i.e., connected component) was labeled a lesion, and the black background region was labeled the healthy region. The location of the smallest rectangle containing each lesion, namely, each independent white region, was determined. After multiplying each of the color channels (R, G and B) of the original sub-image with the corresponding final binary lesion segmentation image, the obtained images were integrated into a new RGB image using the MATLAB system function “cat” to remove the background of the original sub-image and retain only the lesion regions. Based on the location information of each smallest rectangle containing a lesion, each rectangle was cropped from the new RGB image using the MATLAB system function “imcrop” to obtain multiple lesion images. For example, if there were two lesions in an original sub-image, two lesion images were obtained through the above operations. After segmentation using the best segmentation method based on the 899 sub-images of the four types of alfalfa leaf diseases, a total of 1,651 typical lesion images, each of which contained only one lesion, were obtained for further feature extraction, feature selection and disease image recognition. For building disease recognition models, the typical lesion images were divided into a training set and a testing set in a ratio of 2:1. The training set consisted of 1,100 lesion images including 111 lesion images of alfalfa common leaf spot, 267 lesion images of alfalfa rust, 371 lesion images of alfalfa Leptosphaerulina leaf spot and 351 lesion images of alfalfa Cercospora leaf spot. The testing set consisted of 551 lesion images including 56 lesion images of alfalfa common leaf spot, 133 lesion images of alfalfa rust, 185 lesion images of alfalfa Leptosphaerulina leaf spot and 177 lesion images of alfalfa Cercospora leaf spot.

Feature Extraction and Normalization

A total of 129 texture, color and shape features were extracted from the 1,651 typical lesion images of the four alfalfa leaf diseases. The 90 extracted texture features included the seven Hu invariant moments (63 features), contrast (nine features), energy (nine features) and homogeneity (nine features) of the gray images of the nine components in RGB color space, HSV color space and L*a*b* color space. There were 30 color features including the first moments (nine features), the second moments (nine features) and the third moments (nine features) of the gray images of the nine components in RGB, HSV and L*a*b* color spaces, and three color ratios (r, g and b) of R, G and B components. Of the nine extracted shape features, circularity of disease lesion, complexity of disease lesion and the seven Hu invariant moments of the binary lesion image were included.

Hu invariant moments used to depict the texture features of an image are invariant to translation, rotation and scaling. Contrast is applied to measure the gray level of a pixel in comparison with the neighbor pixels in an image, energy is a measure of the consistency of an image, and homogeneity is used to measure the spatial closeness of elements with the diagonal distribution in a co-occurrence matrix [35]. Circularity denotes the degree that a lesion region is circular, and a bigger value indicates that the lesion region is more circular [11]. Complexity refers to the complexity and discrete degree of a lesion region, and a bigger value indicates the lesion region with higher complexity and greater discrete degree [11]. The seven Hu invariant moments were calculated using the calculation formulas as described in [36]. The other extracted features were calculated according to the formulas shown in Table 1.

Table 1. Extracted image features (excluding Hu invariant moments) and calculation formulas.

https://doi.org/10.1371/journal.pone.0168274.t001

Because of the great differences between the ranges of extracted features, which may impact the accuracies of disease recognition models, the values of each extracted feature were normalized to the range of 0–1 using the following formula: $X_i' = (X_i - X_{\min})/(X_{\max} - X_{\min})$, where $X_i'$ was the value of the ith feature after normalization and $X_i$, $X_{\min}$ and $X_{\max}$ were the value of the ith feature, the minimum value and the maximum value of the feature before normalization, respectively.
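A minimal Python sketch of this min-max normalization is given below. The function name, the list-of-lists layout (one row per lesion image, one column per feature) and the handling of constant columns are assumptions. The text does not state whether the minimum and maximum were taken over all lesion images or over the training set only, so the sketch simply uses whatever rows it is given.

```python
def min_max_normalize(rows):
    """Scale each feature column of a list-of-lists table to the range 0 to 1."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (x - lo) / (hi - lo) if hi > lo else 0.0  # assumption: constant column maps to 0
            for x, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]
```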

Feature Selection

To reduce the complexity of image recognition resulting from excessive features and improve the accuracy and applicability of image recognition methods, the extracted features were screened after feature normalization. Based on the training set including 1,100 lesion images described above, feature selection was conducted using the ReliefF method, the 1R method and the CFS method.

For the ReliefF method, a high weight was assigned to a feature that has a high correlation with categories, and a feature with a higher weight indicates that this feature is more important. For the 1R method, the classification accuracy is calculated with each feature as the input of the 1R classifier successively and is used to evaluate the importance of the feature. Higher classification accuracy indicates that the corresponding feature is more important. The CFS method is unlike the ReliefF method and the 1R method, and is aimed to obtain the optimal feature subset. The correlation between the optimal feature subset and dependent variable should be as high as possible. Meanwhile, the correlations among the features in the optimal feature subset should be as small as possible. In this study, the three methods for feature selection, including the ReliefF method, the 1R method and the CFS method, were all implemented using the open source software Weka (Waikato Environment for Knowledge Analysis) 3.7, developed by The University of Waikato in Hamilton, New Zealand. The default values were used for all the parameters involved in the methods. The importance ranking of each feature for classification and recognition could be obtained using the ReliefF method and the 1R method, respectively. A higher ranking for a feature indicates that it is more likely to yield better recognition results if used to build the recognition model. To find the best combination of features, according to the importance ranking of each feature for classification and recognition, the top N (N = 1, 2, 3, …, 129) features were treated as inputs for the disease recognition models based on random forest, SVM and KNN. According to the recognition accuracies of the training set and the testing set, the best top N features were selected as the best feature combination to build the disease recognition models. 
For the CFS method, the best feature combination, namely, the optimal feature subset, was obtained directly for modeling.
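The top-N search over a feature ranking can be sketched as follows in Python. The function name and the `evaluate` callback are hypothetical: the callback stands in for training and testing a random forest, SVM or KNN model on the given features and returning a single accuracy value. How the training-set and testing-set accuracies were combined into one criterion is not stated in the text, so that choice is left to the callback.

```python
def best_top_n(ranked_features, evaluate):
    """Try the top N ranked features for N = 1..len and return the best prefix.

    `ranked_features` is ordered from most to least important, as produced by
    a ranking method such as ReliefF or 1R.
    """
    best_n, best_acc = 0, float("-inf")
    for n in range(1, len(ranked_features) + 1):
        acc = evaluate(ranked_features[:n])
        if acc > best_acc:  # strict '>' keeps the smallest N on ties
            best_n, best_acc = n, acc
    return ranked_features[:best_n], best_acc
```

With 129 ranked features and three classifiers, this loop corresponds to 129 model fits per classifier and per ranking method.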

Building of Disease Recognition Models

After the segmentation, feature extraction, feature normalization and feature selection described above, disease recognition models were built based on the 1,651 typical lesion images of the four alfalfa leaf diseases using three supervised learning methods including random forest, SVM and KNN. All models were built using the MATLAB R2013b software. The recognition accuracies of both the training set and the testing set were calculated and used to evaluate the disease recognition models.

Random forest is a combination model composed of a number of fully grown decision trees [38]. Each decision tree produces a predictive value, and the final prediction result of the model can be determined by voting. To a certain extent, the classification effects of a random forest depend on the number of decision trees that constitute the model. Consequently, it is necessary to determine the optimal number of decision trees by testing a variety of values based on the classification results of the random forests. To build a disease recognition model based on the random forest method, the number of decision trees was assigned as 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100, and the optimal number of decision trees was determined according to the recognition results of the model. The number of features randomly selected by each decision tree was set as the arithmetic square root of the total number of features. If the arithmetic square root was a decimal, the value obtained by rounding up the decimal was treated as the number of features randomly selected by each decision tree.
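The random forest settings described above reduce to a few lines of Python. This sketch covers only the candidate tree counts, the rounded-up square-root rule and the voting step; the names are illustrative and no trees are actually grown.

```python
import math
from collections import Counter

TREE_COUNTS = list(range(10, 101, 10))  # candidate forest sizes 10, 20, ..., 100


def features_per_split(n_features):
    """Square root of the feature count, rounded up when it is not an integer."""
    return math.ceil(math.sqrt(n_features))


def majority_vote(tree_predictions):
    """Return the class predicted by the most trees (first seen wins a tie)."""
    return Counter(tree_predictions).most_common(1)[0][0]
```

For instance, a model using all 129 features would draw 12 features per tree under this rule, and a model using 45 features would draw 7.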

SVM is well suited to high-dimensional data [39, 40] and has been widely used in image recognition of plant diseases [11, 20, 28, 41]. In this study, SVM models for disease image recognition were built with a radial basis function as the kernel function using C-SVM in the LIBSVM package developed by Chih-Jen Lin Group from Taiwan, China [42]. For each SVM model, both the optimal penalty parameter C and the optimal kernel function parameter g were searched using the grid search algorithm in the range of 2⁻¹⁰–2¹⁰ with a searching step of 0.4. The recognition accuracies were calculated at all points within the grid by running three complete cross validations based on the training set. The values of C and g that gave the highest recognition accuracy were selected as the optimal parameters and were recorded as C_best and g_best, respectively.
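The parameter grid can be sketched in Python under one assumption that the text does not make explicit: the step of 0.4 is applied to the exponent, so that C and g each take the values 2⁻¹⁰, 2⁻⁹·⁶, …, 2¹⁰. The `cv_accuracy` callback is hypothetical and stands in for training an RBF C-SVM and returning its cross-validated accuracy on the training set.

```python
def exponent_grid(low=-10, high=10, step_tenths=4):
    """Exponents low, low + 0.4, ..., high, built from integers to avoid drift."""
    return [t / 10 for t in range(low * 10, high * 10 + 1, step_tenths)]


def grid_search(cv_accuracy, exponents):
    """Return (C_best, g_best, best_accuracy) over the 2**exponent grid."""
    best = (None, None, float("-inf"))
    for c_exp in exponents:
        for g_exp in exponents:
            C, g = 2.0 ** c_exp, 2.0 ** g_exp
            acc = cv_accuracy(C, g)
            if acc > best[2]:  # the first grid point wins a tie
                best = (C, g, acc)
    return best
```

Under this reading the grid has 51 values per parameter, or 2,601 (C, g) pairs, each of which requires a full cross-validation run.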

The KNN algorithm treats each sample as a point in a multidimensional space, and a point in the testing set is assigned to a class that most of the K points nearest to that point in the training set belong to [43, 44]. The distance of that point to each of the K points is commonly measured by Euclidean distance. An appropriate value of K is the key to high classification accuracy using the KNN algorithm. In this study, to build the KNN models for image recognition using Euclidean distance, the K values were set as 5, 9 and 13, respectively, and the optimal value of K was determined according to the recognition results of the models.
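The decision rule can be written as a short Python sketch. The function name and the tuple-based sample format are assumptions, and ties in the vote are resolved in favor of the class that appears first among the nearest neighbors.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training samples by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

In the setting above, each sample would be the vector of selected, normalized features of one lesion image, and `k` would be set to each of the candidate values in turn.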

For the supervised learning methods, the true class that each sample in the training set belongs to is known. In other words, all samples in the training set are labeled samples. In some cases, the cost of obtaining training samples is low, but the cost of determining the true class of the training samples is very high, which requires a large amount of manpower and material resources. When a small number of samples in the training set are labeled, a recognition model can be built using a semi-supervised learning method. In practice, when many disease images are acquired with lower costs, the experts in the corresponding field just need to make artificial recognition and classification of a small number of disease images. Disease recognition models can be built using semi-supervised learning methods, which will greatly reduce the costs of building a plant disease automatic recognition system. In this study, the features used to build the optimal supervised model were transformed using PCA and the disease recognition semi-supervised models were built using a self-training algorithm based on Naive Bayes classifiers [32, 33]. In this method, an initial classifier is built based on the given labeled samples and used to predict the unlabeled samples in the training set. The prediction labels with high confidence in the classifier and their corresponding samples are added to a dataset comprising the labeled samples from the training set. Subsequently, based on the new dataset comprising the labeled samples, a new classifier is built. The above process continues until a certain criterion is reached. The criterion may be the number of iterations reaching the maximum number of iterations or the number of labeled samples reaching the set ratio, etc. 
In this study, based on the same training set and testing set as used for building the supervised models described above, semi-supervised disease recognition models were built with ratios of labeled to unlabeled samples in the training set equal to 2:1, 1:1 and 1:2. The first n principal components were successively used to build the semi-supervised disease recognition models, and the corresponding recognition accuracies of the training set and the testing set were obtained. According to the accuracies, the recognition effects of the models were evaluated, and the optimal number of principal components was determined. The semi-supervised models were built using the R 3.1.2 software and the function “SelfTrain” in the package “DMwR”, with default values used for the model parameters.
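The self-training loop can be illustrated with a small Python sketch built around a Gaussian Naive Bayes classifier. It is not the DMwR implementation: the function names, the confidence threshold of 0.9, the limit of 10 iterations and the variance floor are illustrative assumptions, and the PCA transformation is omitted.

```python
import math


def fit_gaussian_nb(xs, ys, var_floor=1e-3):
    """Per-class prior, feature means and variances of a Gaussian Naive Bayes."""
    model = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [
            max(sum((v - m) ** 2 for v in col) / len(rows), var_floor)
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (len(rows) / len(ys), means, variances)
    return model


def predict_proba(model, x):
    """Posterior probability of each class for one sample."""
    log_post = {}
    for label, (prior, means, variances) in model.items():
        log_post[label] = math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    top = max(log_post.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(lp - top) for label, lp in log_post.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.9, max_iter=10):
    """Grow the labeled set with confident predictions and refit the classifier."""
    xs, ys, pool = list(labeled_x), list(labeled_y), list(unlabeled_x)
    model = fit_gaussian_nb(xs, ys)
    for _ in range(max_iter):
        remaining = []
        for x in pool:
            probs = predict_proba(model, x)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                xs.append(x)   # adopt the confident prediction as a label
                ys.append(label)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):
            break  # nothing was confident enough to add
        pool = remaining
        model = fit_gaussian_nb(xs, ys)
        if not pool:
            break  # every unlabeled sample has been absorbed
    return model
```

Samples are tuples of feature values, which in the study would be the leading principal components of the 45 selected features. A wrongly labeled sample that is added with high confidence is never removed again, which is a known weakness of self-training.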

Results

Image Segmentation Results

Based on the image datasets described above, the comparison results of the twelve segmentation methods integrated with the clustering algorithms and the supervised classification algorithms are shown in Table 2.

Table 2. Performance evaluations of the twelve segmentation methods based on the sub-images of four alfalfa leaf diseases.

https://doi.org/10.1371/journal.pone.0168274.t002

For the image dataset of alfalfa common leaf spot, when the segmentation method integrated with K_median clustering algorithm and linear discriminant analysis was used, the highest Scores with a mean of 0.8562 and the median of 0.8810 were obtained, and the highest Recalls with a mean of 0.7905 and the median of 0.8199 were also obtained. When the segmentation method integrating with K_means clustering algorithm and linear discriminant analysis was used based on the image dataset, the highest Precisions with a mean of 0.9235 and the median of 0.9392 were obtained.

For the image dataset of alfalfa rust, when the segmentation method integrated with K_median clustering algorithm and linear discriminant analysis was used, the highest values of Scores, Recalls and Precisions were obtained. The results showed that the mean of Scores was 0.9061 and the median of Scores was 0.9137, that the mean of Recalls was 0.8516 and the median of Recalls was 0.8596 and that the mean of Precisions was 0.9606 and the median of Precisions was 0.9671.

For the image dataset of alfalfa Leptosphaerulina leaf spot, when the segmentation method integrated with K_median clustering algorithm and linear discriminant analysis was used, the highest values of Scores and Recalls were obtained. The results showed that the mean of Scores was 0.9462 and the median of Scores was 0.9583 and that the mean of Recalls was 0.9287 and the median of Recalls was 0.9495. For this image dataset, when the segmentation method integrated with K_means clustering algorithm and linear discriminant analysis was used, the highest mean of Precisions was obtained, and its value was 0.9657. When the segmentation method integrated with fuzzy C-means clustering algorithm and logistic regression analysis was used, the highest median of Precisions was obtained, and its value was 0.9733.

For the image dataset of alfalfa Cercospora leaf spot, when the segmentation method integrated with K_median clustering algorithm and linear discriminant analysis was used, the highest values of Scores, Recalls and Precisions were obtained. The mean and the median of Scores were 0.8369 and 0.8496, respectively. The mean and the median of Recalls were 0.7786 and 0.7938, respectively, and the mean and the median of Precisions were 0.8951 and 0.9109, respectively.

For the aggregated image dataset comprising 899 sub-images of the four alfalfa leaf diseases, when the segmentation method integrated with K_median clustering algorithm and linear discriminant analysis was used, the highest values of Scores, Recalls and Precisions were obtained. The results showed that the mean and the median of Scores were 0.8771 and 0.8997, respectively, that the mean and the median of Recalls were 0.8294 and 0.8514, respectively and that the mean and the median of Precisions were 0.9249 and 0.9424, respectively.
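The per-dataset means and medians reported above summarize per-image metrics. The Python sketch below shows how such a summary could be computed from binary lesion masks. It assumes that each segmented sub-image is compared pixel by pixel with a reference mask and that Score is the harmonic mean of Precision and Recall; the definition actually used in this study is the one given in the Methods, and the masks here are hypothetical.

```python
from statistics import mean, median

def segmentation_metrics(predicted, reference):
    """Pixel-wise precision, recall and harmonic-mean score for flat binary masks (1 = lesion)."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    score = 2 * precision * recall / denom if denom else 0.0  # assumed definition
    return precision, recall, score

# Hypothetical flattened masks for two sub-images: (predicted, reference).
images = [
    ([1, 1, 0, 0, 1, 0], [1, 1, 1, 0, 0, 0]),
    ([1, 0, 0, 0], [1, 1, 0, 0]),
]

results = [segmentation_metrics(pred, ref) for pred, ref in images]
precisions, recalls, scores = zip(*results)
for name, values in (("Precision", precisions), ("Recall", recalls), ("Score", scores)):
    print(f"{name}: mean={mean(values):.4f}, median={median(values):.4f}")
```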

In summary, the segmentation method integrated with K_median clustering algorithm and linear discriminant analysis gave the best segmentation effects for the sub-images of the four alfalfa leaf diseases. The segmentation results obtained with this method are shown in Fig 2. Using this segmentation method, all lesions in the original sub-images were effectively segmented, indicating that the method could effectively implement the automatic segmentation of sub-images of the four alfalfa leaf diseases. Therefore, lesion segmentation was implemented using the segmentation method integrated with K_median clustering algorithm and linear discriminant analysis for further feature extraction, feature normalization, feature selection and building of disease recognition models in this study.

Fig 2. Results of automatic segmentation of sub-images of four alfalfa leaf diseases using the segmentation method integrated with K_median clustering algorithm and linear discriminant analysis.

A: Sub-image of alfalfa common leaf spot. B: Image after segmentation of alfalfa common leaf spot. C: Sub-image of alfalfa rust. D: Image after segmentation of alfalfa rust. E: Sub-image of alfalfa Leptosphaerulina leaf spot. F: Image after segmentation of alfalfa Leptosphaerulina leaf spot. G: Sub-image of alfalfa Cercospora leaf spot. H: Image after segmentation of alfalfa Cercospora leaf spot.

https://doi.org/10.1371/journal.pone.0168274.g002

Feature Selection Results Using the Methods of ReliefF, 1R and CFS

For convenience, each extracted feature was given a name, and the names of the 129 image features extracted are listed in Table 3. φLab_L1 denoted the first Hu invariant moment of the gray image of the L* component in L*a*b* color space, φshape1 denoted the first Hu invariant moment of the binary lesion image, the first moment RGB_R denoted the first moment of the gray image of the R component in RGB color space, Color ratio RGB_R denoted the color ratio r of the R component in RGB color space, Contrast RGB_R denoted the contrast of the gray image of the R component in RGB color space, Energy RGB_R denoted the energy of the gray image of the R component in RGB color space and Homogeneity RGB_R denoted the homogeneity of the gray image of the R component in RGB color space. The remaining feature names can be deduced by analogy.

Table 3. Names of image features extracted and results of feature selection using the ReliefF method, the 1R method and the CFS method.

https://doi.org/10.1371/journal.pone.0168274.t003

The results of feature selection using the ReliefF method, 1R method and CFS method are shown in Table 3. The selection results of both the ReliefF method and the 1R method were the importance ranking of each feature for disease recognition. As shown in Table 3, there were great differences between the importance rankings of features obtained using the ReliefF method and the 1R method. The top 10 features with the highest recognition importance selected using the ReliefF method successively were Energy RGB_B, Circularity, Color ratio RGB_R, second moment RGB_B, Energy Lab_a, Energy HSV_H, first moment HSV_S, first moment HSV_H, Homogeneity HSV_S and Color ratio RGB_G, which included four texture features, one shape feature and five color features. The top 10 features with the highest recognition importance selected using the 1R method in sequence were φLab_a1, Contrast Lab_a, Homogeneity HSV_H, Complexity, Circularity, Contrast HSV_H, Contrast Lab_b, Homogeneity HSV_S, second moment Lab_b and first moment Lab_b, which included six texture features, two shape features and two color features. Only two features, Circularity and Homogeneity HSV_S, were simultaneously selected in the top 10 features with the highest recognition importance using the ReliefF method and the 1R method. The best feature combination (i.e., the optimal feature subset) obtained using the CFS method consisted of 21 features including φLab_a1, φHSV_H1, φHSV_S1, Circularity, Complexity, φshape1, first moment RGB_G, first moment RGB_B, Color ratio RGB_R, Color ratio RGB_G, first moment HSV_H, first moment HSV_V, first moment Lab_b, second moment RGB_G, second moment HSV_H, third moment HSV_S, Energy RGB_B, Energy HSV_S, Homogeneity HSV_S, Homogeneity Lab_L and Contrast Lab_a.
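The overlap between the two top-10 rankings listed above can be checked with a short Python sketch. The feature names are copied from the text; the variable names are illustrative.

```python
# Top 10 features by importance, as listed in the text for each method.
relieff_top10 = [
    "Energy RGB_B", "Circularity", "Color ratio RGB_R", "second moment RGB_B",
    "Energy Lab_a", "Energy HSV_H", "first moment HSV_S", "first moment HSV_H",
    "Homogeneity HSV_S", "Color ratio RGB_G",
]
one_r_top10 = [
    "φLab_a1", "Contrast Lab_a", "Homogeneity HSV_H", "Complexity", "Circularity",
    "Contrast HSV_H", "Contrast Lab_b", "Homogeneity HSV_S", "second moment Lab_b",
    "first moment Lab_b",
]

# Features ranked in the top 10 by both methods, in ReliefF order.
shared = [name for name in relieff_top10 if name in one_r_top10]
print(shared)
```

Only Circularity and Homogeneity HSV_S appear in both lists, which matches the statement in the text.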

Built Disease Recognition Models and Comparison of Recognition Results

Recognition Results of Disease Recognition Models Based on Random Forest.

The recognition results of the random forest models based on the selected features using the ReliefF method, the 1R method or the CFS method are shown in Table 4. The results showed that when the ReliefF method was used to select the features, with the increase of the number of decision trees, the recognition accuracies of the training set and the testing set for the built random forest models fluctuated by 0%-2.18%, and the number of applied features changed in a range of 52–74. The optimal random forest model was built with the number of decision trees equal to 70 based on the top 62 features in the importance ranking for recognition, and this model was recorded as Model 1. For Model 1, the recognition accuracy of the training set was 100% and the recognition accuracy of the testing set was 92.74%. When the 1R method was used for feature selection, with an increase in the number of decision trees, the recognition accuracies of the training set and the testing set for the built random forest models fluctuated by 0%-2.00%, and the number of applied features changed in a range of 76–129. The optimal random forest model was built with the number of decision trees equal to 60 based on the top 128 features in the importance ranking and was recorded as Model 2. For Model 2, the recognition accuracy of the training set was 100% and the recognition accuracy of the testing set was 91.29%. When the CFS method was applied to feature selection, with the increase of the number of decision trees, the recognition accuracies of the training set and the testing set for the built random forest models fluctuated by 0%-2.18%. The optimal random forest model was built with the number of decision trees equal to 60 based on the 21 selected features, and this model was recorded as Model 3. For Model 3, the recognition accuracy of the training set was 100% and the recognition accuracy of the testing set was 90.20%. 
As shown in Table 4, with increasing number of decision trees, the recognition accuracies of the training set and the testing set for the built random forest models fluctuated within a small range, indicating that the number of decision trees had little influence on the recognition results of the random forest models in this study. Considering the recognition accuracies of the training set and the testing set and the number of applied features for modeling, the optimality ranking of the three optimal models was Model 1, Model 3, and Model 2.

Table 4. Recognition results for four alfalfa leaf diseases using random forest models based on selected features using the ReliefF method, the 1R method and the CFS method.

https://doi.org/10.1371/journal.pone.0168274.t004

Recognition Results of Disease Recognition Models Based on SVM.

The recognition results of the SVM models based on the selected features using the ReliefF method, the 1R method and the CFS method are shown in Table 5. The results showed that when the ReliefF method was used to select the features, the optimal SVM model was built based on the top 45 features in the importance ranking for recognition, and this model was recorded as Model 4, with the optimal parameters C_best and g_best equal to 6.964 and 0.435, respectively. For Model 4, the recognition accuracy of the training set was 97.64% and the recognition accuracy of the testing set was 94.74%. When the 1R method was used to conduct feature selection, the optimal SVM model was built based on the top 122 features in the importance ranking for recognition, and this model was recorded as Model 5, with C_best equal to 36.758 and g_best equal to 0.144. For Model 5, the recognition accuracy of the training set was 97.91% and the recognition accuracy of the testing set was 94.37%. When the CFS method was used for feature selection, the SVM model built based on the 21 selected features was recorded as Model 6, with C_best equal to 21.112 and g_best equal to 0.758. For Model 6, the recognition accuracy of the training set was 95.18% and the recognition accuracy of the testing set was 91.83%. Considering the recognition accuracies of the training set and the testing set and the number of applied features for modeling, the optimality ranking of the three models shown in Table 5 was Model 4, Model 6, and Model 5.

Table 5. Recognition results for four alfalfa leaf diseases using SVM models based on selected features using the ReliefF method, the 1R method and the CFS method.

https://doi.org/10.1371/journal.pone.0168274.t005

Recognition Results of Disease Recognition Models Based on KNN.

The recognition results of the KNN models based on the selected features using the ReliefF method, the 1R method and the CFS method are shown in Table 6. The results showed that when the ReliefF method was used to select features, with the increase in the value of K, the recognition accuracies of the training set and the testing set for the built KNN models fluctuated by 0%-3.55%. The optimal KNN model was built with a K value of 5 based on the top 68 features in the importance ranking for recognition. This model was recorded as Model 7. For Model 7, the recognition accuracy of the training set was 93.55% and the recognition accuracy of the testing set was 90.38%. When the 1R method was used to select features, with increasing K value, the recognition accuracies of the training set and the testing set for the built KNN models fluctuated by 0.18%-3.72%. The optimal KNN model was built with a K value of 5 based on the top 71 features in the importance ranking for recognition, and this model was recorded as Model 8. For Model 8, the recognition accuracy of the training set was 92.36% and the recognition accuracy of the testing set was 88.93%. When the CFS method was used to select features, with increasing K value, the recognition accuracies of the training set and the testing set for the built KNN models fluctuated by 0.18%-2.09%. Based on the 21 selected features, the optimal KNN model was built with a K value of 5, and this model was recorded as Model 9. For Model 9, the recognition accuracy of the training set was 92.27% and the recognition accuracy of the testing set was 87.30%. With increasing K value, the recognition accuracies of the training set and the testing set for the built KNN models shown in Table 6 decreased slightly, indicating that the best K value in this study was 5.
Considering the recognition accuracies of the training set and the testing set and the number of applied features for modeling, the optimality ranking of the three models shown in Table 6 was Model 7, Model 9, and Model 8.
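The accuracies reported above for Models 1 to 9 can be collected in one place and compared with a short Python sketch. The values are copied from the text. The comparison uses only the testing-set accuracy and the gap between training and testing accuracy; the rankings given in the text also weigh the number of features, for which no explicit formula is stated, so that part is not reproduced here.

```python
# (name, classifier, feature selection, number of features, training %, testing %)
models = [
    ("Model 1", "random forest", "ReliefF", 62, 100.00, 92.74),
    ("Model 2", "random forest", "1R", 128, 100.00, 91.29),
    ("Model 3", "random forest", "CFS", 21, 100.00, 90.20),
    ("Model 4", "SVM", "ReliefF", 45, 97.64, 94.74),
    ("Model 5", "SVM", "1R", 122, 97.91, 94.37),
    ("Model 6", "SVM", "CFS", 21, 95.18, 91.83),
    ("Model 7", "KNN", "ReliefF", 68, 93.55, 90.38),
    ("Model 8", "KNN", "1R", 71, 92.36, 88.93),
    ("Model 9", "KNN", "CFS", 21, 92.27, 87.30),
]

def gap(model):
    """Difference between training and testing accuracy, in percentage points."""
    return round(model[4] - model[5], 2)

best_by_test = max(models, key=lambda m: m[5])
smallest_gap = min(models, key=gap)

for m in sorted(models, key=lambda m: m[5], reverse=True):
    print(f"{m[0]}: {m[1]} + {m[2]}, {m[3]} features, test {m[5]:.2f}%, gap {gap(m):.2f}")
print("Highest testing accuracy:", best_by_test[0])
print("Smallest train-test gap:", smallest_gap[0])
```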

Table 6. Recognition results for four alfalfa leaf diseases using KNN models based on selected features using the ReliefF method, the 1R method and the CFS method.

https://doi.org/10.1371/journal.pone.0168274.t006

Recognition Results of Disease Recognition Models Based on Semi-supervised Learning.

Considering the recognition accuracies of the training set and the testing set and the number of applied features for modeling, Model 4 was regarded as the optimal model among the nine models described above. The recognition results of each type of alfalfa leaf disease using the optimal model are shown in Table 7. To eliminate the linear correlation between the features, the 45 features used for building Model 4 were transformed using PCA, and the changes in cumulative contribution rates with increasing number of principal components are shown in Fig 3. The results showed that the cumulative contribution rate of the first eight principal components reached 90.77% and that the cumulative contribution rate of the first 12 principal components reached 95.54%.
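The cumulative contribution rate mentioned above is the running sum of the PCA eigenvalues (component variances) divided by their total. A minimal Python sketch follows; the eigenvalues are hypothetical and are not those of the 45 features used for Model 4.

```python
from itertools import accumulate

def cumulative_contribution(eigenvalues):
    """Cumulative share of total variance for components sorted by decreasing eigenvalue."""
    total = sum(eigenvalues)
    return [partial / total for partial in accumulate(eigenvalues)]

def components_needed(eigenvalues, target):
    """Smallest number of leading components whose cumulative rate reaches the target."""
    for n, rate in enumerate(cumulative_contribution(eigenvalues), start=1):
        if rate >= target:
            return n
    return len(eigenvalues)

# Hypothetical eigenvalues, already sorted from largest to smallest.
eigenvalues = [5.0, 2.0, 1.5, 1.0, 0.5]
rates = cumulative_contribution(eigenvalues)
print([f"{r:.2%}" for r in rates])
print("Components needed for 90%:", components_needed(eigenvalues, 0.90))
```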

Fig 3. Changes in cumulative contribution rates with increasing number of principal components based on 45 features used for building Model 4.

https://doi.org/10.1371/journal.pone.0168274.g003

Table 7. Recognition results of each alfalfa leaf disease using the optimal model (Model 4).

https://doi.org/10.1371/journal.pone.0168274.t007

Based on the same training set and testing set as used for building the supervised models described above, the disease recognition semi-supervised models were built using a ratio of labeled to unlabeled samples in the training set equal to 2:1. The corresponding recognition accuracies of the training set and the testing set were obtained using the first n principal components as the inputs. The changes in recognition accuracies of the training set and the testing set with an increasing number of principal components are shown in Fig 4. The results showed that for disease recognition semi-supervised models with a varying number of principal components, there were no obvious differences between the recognition accuracies of the training set and the recognition accuracies of the testing set. Moreover, both the recognition accuracy of the training set and the recognition accuracy of the testing set first increased and then decreased with increasing n. Similarly, the disease recognition semi-supervised models were built with ratios of labeled to unlabeled samples in the training set equal to 1:1 and 1:2. The first n principal components were used as the inputs, and the corresponding recognition accuracies of the training set and the testing set, as shown in Figs 5 and 6, were obtained. The results showed that the recognition accuracies of the training set and the testing set obtained using the semi-supervised models with the different ratios of labeled to unlabeled samples presented similar trends.

Fig 4. Recognition results for four alfalfa leaf diseases using semi-supervised models at a ratio of labeled to unlabeled samples of 2:1.

https://doi.org/10.1371/journal.pone.0168274.g004

Fig 5. Recognition results for four alfalfa leaf diseases using semi-supervised models at a ratio of labeled to unlabeled samples of 1:1.

https://doi.org/10.1371/journal.pone.0168274.g005

Fig 6. Recognition results for four alfalfa leaf diseases using semi-supervised models at a ratio of labeled to unlabeled samples of 1:2.

https://doi.org/10.1371/journal.pone.0168274.g006

The recognition results for the four alfalfa leaf diseases using optimal semi-supervised models with the different ratios of labeled to unlabeled samples are shown in Table 8. The results showed that when the ratio of labeled to unlabeled samples was 2:1, the optimal semi-supervised model for disease recognition was built with the first nine principal components and was recorded as Model 10. For Model 10, the recognition accuracy of the training set was 82.82% and the recognition accuracy of the testing set was 82.76%. When the ratio of the labeled samples to the unlabeled samples was 1:1, the optimal semi-supervised model for disease recognition was built with the first ten principal components and was recorded as Model 11. For Model 11, the recognition accuracy of the training set was 80.36% and the recognition accuracy of the testing set was 80.58%. When the ratio of the labeled samples to the unlabeled samples was 1:2, the optimal semi-supervised model for disease recognition was built with the first ten principal components and was recorded as Model 12. For Model 12, the recognition accuracy of the training set was 79.18% and the recognition accuracy of the testing set was 80.58%. For Model 10, Model 11 and Model 12, the recognition accuracies of the training set and the testing set were all approximately 80%, indicating that, across the three ratios tested, the ratio of labeled to unlabeled samples in the training set had a relatively small effect on the recognition results of the semi-supervised models.
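As a quick check on what the three ratios mean for the training set, the share of labeled samples for a ratio a:b is a/(a+b). The Python sketch below evaluates this for the ratios used above; the function name is illustrative.

```python
from fractions import Fraction

def labeled_share(labeled, unlabeled):
    """Fraction of training samples that carry a label for a labeled:unlabeled ratio."""
    return Fraction(labeled, labeled + unlabeled)

for labeled, unlabeled in [(2, 1), (1, 1), (1, 2)]:
    share = labeled_share(labeled, unlabeled)
    print(f"{labeled}:{unlabeled} -> {float(share):.2%} of the training set is labeled")
```

For the 1:2 ratio the labeled share is one third of the training set, i.e. about 33.33%.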

Table 8. Recognition results of four alfalfa leaf diseases using optimal semi-supervised models with various ratios of labeled to unlabeled samples.

https://doi.org/10.1371/journal.pone.0168274.t008

Discussion

In this study, lesion image segmentation was conducted using the segmentation methods integrated with clustering algorithms and supervised classification algorithms. Compared to image segmentation methods using only clustering algorithms, there was no need to calculate and choose optimal clustering numbers for the clustering algorithms of the segmentation methods used in this study, which reduced computational costs. For the image segmentation methods using only the supervised classification algorithms, typical lesion pixels and typical healthy pixels are usually chosen from a large number of disease images to construct the training set. Based on this training set, a supervised classification model with general applicability is built for the lesion segmentation of all the disease images. There may be a certain degree of variation in the color of the lesion regions and the healthy regions of disease images due to the different causal agents and the different stages of disease development. This may result in difficulties in disease image recognition [45]. In the methods used in this study, a targeted training set containing typical lesion pixels and typical healthy pixels was constructed based on each sub-image, and the supervised classification model based on this training set was more suitable for lesion segmentation of this sub-image. However, these segmentation methods are only suitable for lesion segmentation of disease images in which the H component values of the lesion regions are less than the H component values of the healthy regions in HSV color space. Since there are many alfalfa leaf diseases with great differences in color between the lesions of different diseases, it is necessary to develop a lesion image segmentation method with a wider range of application in future studies.
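The role of the H component in the limitation noted above can be illustrated with a small Python sketch. It is not the segmentation pipeline of this study: it runs a two-cluster K-medians on hypothetical hue values only, labels the lower-H cluster as lesion, and leaves out the supervised classification step (linear discriminant analysis) entirely. The function name, the initialization and the sample values are assumptions.

```python
from statistics import median

def k_medians_1d(values, max_iter=100):
    """Two-cluster K-medians on scalar values; returns (centers, labels)."""
    centers = [min(values), max(values)]  # assumed initialization: the two extremes
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value goes to the nearest center.
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in values]
        # Update step: each center becomes the median of its members.
        new_centers = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(median(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Hypothetical H values (scaled to [0, 1]) for eight pixels of one sub-image.
hue = [0.30, 0.31, 0.10, 0.29, 0.08, 0.33, 0.12, 0.28]

centers, labels = k_medians_1d(hue)
lesion_cluster = 0 if centers[0] <= centers[1] else 1  # lower H is taken as lesion
lesion_mask = [lab == lesion_cluster for lab in labels]
print(centers, lesion_mask)
```

If the lesions of a disease had higher H values than the healthy tissue, the lower-H rule in this sketch would label the wrong cluster, which is the kind of failure the limitation describes.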

In this study, a total of 129 texture, color and shape features were extracted for disease image recognition. Satisfactory recognition results were obtained using the disease recognition models built after feature selection, indicating that the features extracted from the lesion images could be effectively used to recognize and identify the four alfalfa leaf diseases. However, the 129 extracted features are commonly used in the field of image recognition and differ greatly from the disease features used by plant disease experts during disease identification via naked-eye observation, resulting in poor interpretability of the disease recognition models based on these extracted features. In future studies, attempts could be made to construct lesion image features suitable for certain plant diseases, according to the experience of plant disease experts, in combination with image processing techniques.

In this study, the best recognition effects were observed in the SVM model based on the top 45 features in the importance ranking obtained using the ReliefF method. The recognition accuracy of the testing set was highest among all the models built in this study and was very close to the recognition accuracy of the training set, which indicated that this model not only could be used to obtain satisfactory recognition results but also had strong generalization ability. When the ReliefF method was used to conduct feature selection, the possible correlation between the features was not considered. However, the existence of the correlation could lead to the redundancy of features and increase the complexity of the disease recognition models. In further studies, the ReliefF method could be combined with the feature transformation methods such as PCA and independent component analysis to remove the correlation between the features, reduce the dimension of features and decrease the complexity of the disease recognition models.

Semi-supervised learning is a technique to conduct training and classification using a small number of labeled samples and a large number of unlabeled samples. In the field of image recognition, the cost of obtaining image samples is very low in some cases, but the cost of adding class labels to samples is very high. In this case, a semi-supervised learning method can be used to build an image recognition model to obtain satisfactory recognition results and reduce the cost of modeling. In research on plant disease image recognition, determining the true categories of diseases requires specialized agricultural technical personnel to conduct naked-eye observations, microscopic observation of morphological characteristics of causal agents, or pathogen detection using molecular biology techniques [7]. Thus, a large amount of manpower and material resources are usually required. Therefore, attempts were made to use semi-supervised learning methods to build image recognition models of alfalfa leaf diseases. The results showed that the recognition accuracies of the training set and the testing set were all approximately 80% for the optimal semi-supervised model when the proportion of the labeled samples in the training set was only 33.33% (i.e., the ratio of the labeled and unlabeled samples was 1:2). This indicated that it was feasible to build an image recognition model of alfalfa leaf diseases based on semi-supervised learning.

The image recognition of only four alfalfa leaf diseases was investigated in this study. Therefore, it is necessary to build a standard and comprehensive lesion image database to lay the foundation for the application of the automatic disease image recognition technology. In addition, the complex background of plant disease images poses great challenges for image segmentation and image recognition [45]. The images of alfalfa leaf diseases used in this study were taken on a white background in the laboratory. Further studies are needed to determine whether the image recognition methods used in this study are suitable for the automatic identification and diagnosis of alfalfa leaf diseases in nature.

Presently, smart phones have become powerful tools for taking pictures and processing data. Smart phone-based plant disease image recognition systems have been reported [46–49]. A mobile application could be developed using the optimal image recognition model of alfalfa leaf diseases built in this study to realize functions such as disease image acquisition, disease diagnosis and disease information sharing based on smart phone platforms. Such an application could facilitate disease management.

Generally, the diagnosis and identification of alfalfa leaf diseases are performed by agricultural experts or agricultural technicians, mainly using conventional diagnostic methods including naked-eye observations of disease symptoms and microscopic observations of morphological characteristics of causal agents. The accuracy and efficiency of these methods depend mainly on the experience of the experts or technicians, and the process is subjective and time-consuming. When PCR techniques are used to detect the infection of a specific alfalfa leaf disease, professional instruments, reagents and materials are required, and professional personnel are also required to perform operations [7, 50]. In addition, it takes some time to obtain detection results [7, 50]. With increasingly widespread applications of portable cameras and mobile phones with picture-taking features, it is easier to obtain camera equipment than PCR instruments. After image acquisition, the image only needs to be input into a computer with a disease image recognition system, and the results of disease identification can then be obtained. This process does not require professional personnel or any chemical reagents, and it yields identification results faster than PCR techniques. In particular, once an Internet-based computer image recognition system or a smart phone-based App (mobile application) is developed, image recognition of alfalfa leaf diseases will become more convenient. However, PCR techniques can play an important role in disease detection in the early stage of diseases, especially in the detection of latently infected leaves without visible symptoms [50]. The identification and recognition of infected alfalfa leaves in the early stage of diseases using image recognition technology still need more investigation in future studies.
Moreover, because the method for identification of alfalfa leaf diseases in this study was developed based on the images of four types of alfalfa leaf diseases, further research is needed to evaluate this method with other leaf disorders and to assess the risk of false positives.

Conclusions

In this study, lesion image segmentation using methods that integrate clustering algorithms with supervised classification algorithms, feature extraction from lesion images, feature normalization and feature selection were conducted. Disease recognition models were built using pattern recognition methods. Satisfactory recognition results for the four alfalfa leaf diseases were obtained, providing a feasible solution for the diagnosis and identification of alfalfa leaf diseases.

Among the twelve lesion segmentation methods integrating clustering algorithms with supervised classification algorithms, the segmentation effects were best for the method integrating the K_median clustering algorithm (from the clustering algorithms) and linear discriminant analysis (from the supervised classification algorithms), based on an aggregated image dataset comprising 899 sub-images of four types of alfalfa leaf diseases. This segmentation method was thus used to carry out the segmentation of sub-images of four types of alfalfa leaf diseases for further feature extraction, feature normalization, feature selection and modeling.

A total of 129 texture, color and shape features were extracted from the 1,651 typical lesion images, each of which contained only one lesion. Attempts were made to conduct feature selection using three methods including the ReliefF method, the 1R method and the CFS method. The disease recognition models were built using three supervised learning methods, including random forest, SVM and KNN. The results demonstrated that the recognition effects were best in the SVM model based on the top 45 features in the importance ranking for recognition when the ReliefF method was used to conduct feature selection. For this model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. In addition, after the 45 features used for building the model were transformed using PCA, the disease recognition semi-supervised models were constructed using a self-training algorithm based on Naive Bayes classifiers. For the optimal semi-supervised models built with ratios of labeled to unlabeled samples equal to 2:1, 1:1 and 1:2, the recognition accuracies of the training set and the testing set were all approximately 80%. The results indicated that it was feasible to identify and recognize four types of alfalfa leaf diseases using the solution provided in this study.

Supporting Information

S1 Data. Cumulative contribution rates with increase in number of principal components based on 45 features used for building Model 4.

https://doi.org/10.1371/journal.pone.0168274.s001

(XLSX)

S2 Data. Recognition accuracies of the training set and the testing set using semi-supervised models with a different number of principal components (ratio of labeled to unlabeled samples was 2:1).

https://doi.org/10.1371/journal.pone.0168274.s002

(XLSX)

S3 Data. Recognition accuracies of the training set and the testing set using semi-supervised models with a different number of principal components (ratio of labeled to unlabeled samples was 1:1).

https://doi.org/10.1371/journal.pone.0168274.s003

(XLSX)

S4 Data. Recognition accuracies of the training set and the testing set using semi-supervised models with a different number of principal components (ratio of labeled to unlabeled samples was 1:2).

https://doi.org/10.1371/journal.pone.0168274.s004

(XLSX)

Author Contributions

Conceptualization: HGW.
Data curation: FQ HGW.
Formal analysis: FQ HGW ZHM.
Funding acquisition: HGW.
Investigation: FQ DXL BDS LR HGW.
Methodology: FQ HGW.
Project administration: HGW.
Software: FQ HGW.
Supervision: HGW.
Validation: FQ HGW.
Visualization: FQ HGW.
Writing – original draft: FQ HGW.
Writing – review & editing: FQ DXL BDS LR ZHM HGW.

References

1. Li YZ, Nan ZB. The methods of diagnose, investigation and loss evaluation for forage diseases. Nanjing: Phoenix Science Press; 2015.
2. Liu AP, Hou TJ. Pests and their control of grassland plants. Beijing: China Agricultural Science and Technology Press; 2005.
3. Samac DA, Rhodes LH, Lamp WO. Compendium of alfalfa diseases and pests. 3rd ed. St. Paul: APS Press; 2014.
4. Mao HP, Xu GL, Li PP. Diagnosis of nutrient deficiency of tomato based on computer vision. Trans. Chin. Soc. Agric. Mach. 2003; 34: 73–75.
5. Pydipati R, Burks TF, Lee WS. Identification of citrus disease using color texture features and discriminant analysis. Comput. Electron. Agric. 2006; 52: 49–59.
6. Zhao YX, Wang KR, Bai ZY, Li SK, Xie RZ, Gao SJ. Research of maize leaf disease identifying system based image recognition. Sci. Agric. Sin. 2007; 40: 698–703.
7. Sankaran S, Mishra A, Ehsani R, Davis C. A review of advanced techniques for detecting plant diseases. Comput. Electron. Agric. 2010; 72: 1–13.
8. Story D, Kacira M, Kubota C, Akoglu A, An LL. Lettuce calcium deficiency detection with machine vision computed plant features in controlled environments. Comput. Electron. Agric. 2010; 74: 238–243.
9. Patil JK, Kumar R. Advances in image processing for detection of plant diseases. J. Adv. Bioinformatics Appl. Res. 2011; 2: 135–141.
10. Xu GL, Zhang FL, Shah SG, Ye YQ, Mao HP. Use of leaf color images to identify nitrogen and potassium deficient tomatoes. Pattern Recogn. Lett. 2011; 32: 1584–1590.
11. Li GL, Ma ZH, Wang HG. Image recognition of wheat stripe rust and wheat leaf rust based on support vector machine. J. China Agric. Univ. 2012; 17: 72–79.
12. Zhang JH, Qi LJ, Ji RH, Wang H, Huang SK, Wang P. Cotton diseases identification based on rough sets and BP neural network. Trans. Chin. Soc. Agric. Eng. 2012; 28: 161–167.
13. Phadikar S, Sil J, Das AK. Rice diseases classification using feature selection and rule generation techniques. Comput. Electron. Agric. 2013; 90: 76–85.
14. Barbedo JGA. An automatic method to detect and measure leaf disease symptoms using digital image processing. Plant Dis. 2014; 98: 1709–1716.
15. Omrani E, Khoshnevisan B, Shamshirband S, Saboohi H, Anuar NB, Nasir MHNM. Potential of radial basis function-based support vector regression for apple disease detection. Measurement 2014; 55: 512–519.
16. Tan WX, Zhao CJ, Wu HR, Gao RH. A deep learning network for recognizing fruit pathologic images based on flexible momentum. Trans. Chin. Soc. Agric. Mach. 2015; 46: 20–25.
17. Zhou R, Kaneko S, Tanaka F, Kayamori M, Shimizu M. Image-based field monitoring of Cercospora leaf spot in sugar beet by robust template matching and pattern recognition. Comput. Electron. Agric. 2015; 116: 65–79.
18. Atoum Y, Afridi MJ, Liu XM, McGrath JM, Hanson LE. On developing and enhancing plant-level disease rating systems in real fields. Pattern Recogn. 2016; 53: 287–299.
19. Ye HJ, Lang R, Liu CQ, Li MZ. Recognition of cucumber downy mildew disease based on visual saliency map. Trans. Chin. Soc. Agric. Mach. 2016; 47: 270–274.
20. Camargo A, Smith JS. An image-processing based algorithm to automatically identify plant disease visual symptoms. Biosyst. Eng. 2009; 102: 9–21.
21. Patil SB, Bodhe SK. Leaf disease severity measurement using image processing. Int. J. Eng. Tech. 2011; 3: 297–301.
22. Mao HP, Zhang YC, Hu B. Segmentation of crop disease leaf images using fuzzy C-means clustering algorithm. Trans. Chin. Soc. Agric. Eng. 2008; 24: 136–140.
23. Dubey SR, Jalal AS. Fusing color and texture cues to identify the fruit diseases using images. Int. J. Comput. Vis. Image Process. 2014; 4: 52–67.
24. Tian YW, Li TL, Li CH, Piao ZL, Sun GK, Wang B. Method for recognition of grape disease based on support vector machine. Trans. Chin. Soc. Agric. Eng. 2007; 23: 175–180.
25. Zhang M, Meng QG. Automatic citrus canker detection from leaf images captured in field. Pattern Recogn. Lett. 2011; 32: 2036–2046.
26. Liu L, Wang TY, Jiang YX, Zhi JZ. Algorithm of texture segmentation combining FCM and SVM. Comput. Eng. Appl. 2008; 44: 32–33.
27. Wang XY, Zhang XJ, Yang HY, Bu J. A pixel-based color image segmentation using support vector machine and fuzzy C-means. Neural Networks 2012; 33: 148–159. pmid:22647833
28. Camargo A, Smith JS. Image pattern classification for the identification of disease causing agents in plants. Comput. Electron. Agric. 2009; 66: 121–125.
29. Kononenko I. Estimating attributes: analysis and extensions of relief. Lect. Notes Comput. Sci. 1994; 784: 171–182.
30. Witten IH, Frank E, Hall MA. Data mining: Practical machine learning tools and techniques. 3rd ed. Beijing: China Machine Press; 2014.
31. Hall MA. Correlation-based feature selection for machine learning. PhD Thesis, The University of Waikato. 1999.
32. Rosenberg C, Hebert M, Schneiderman H. Semi-supervised self-training of object detection models. In: Proceedings of seventh IEEE workshop on applications of computer vision, 2005. vol. 1, pp. 29–36.
33. Torgo L. Data mining with R: Learning with case studies. Boca Raton: Chapman and Hall/CRC; 2010.
34. Powers DMW. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Tech. 2011; 2: 37–63.
35. Gonzalez RC, Woods RE. Digital image processing. 3rd ed. Upper Saddle River: Prentice Hall; 2007.
36. Gonzalez RC, Woods RE, Eddins SL. Digital image processing using MATLAB. Beijing: Publishing House of Electronics Industry; 2005.
37. Stricker MA, Orengo M. Similarity of color images. Proc. SPIE Int. Soc. Opt. Eng. 1995; 2420: 381–392.
38. Breiman L. Random forests. Mach. Learn. 2001; 45: 5–32.
39. Cortes C, Vapnik V. Support-vector networks. Mach. Learn. 1995; 20: 273–297.
40. Burges CJC. A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. 1998; 2: 121–167.
41. Leiva-Valenzuela GA, Aguilera JM. Automatic detection of orientation and diseases in blueberries using image analysis to improve their postharvest storage quality. Food Control 2013; 33: 166–173.
42. Chang CC, Lin CJ. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011; 2: 1–27.
43. Cover TM, Hart PE. Nearest neighbor pattern classification. IEEE T. Inform. Theory 1967; 13: 21–27.
44. Song Y, Huang J, Zhou D, Zha HY, Giles CL. IKNN: Informative K-nearest neighbor pattern classification. Lect. Notes Comput. Sci. 2007; 4702: 248–264.
45. Barbedo JGA. A review on the main challenges in automatic plant disease identification based on visible range images. Biosyst. Eng. 2016; 144: 52–60.
46. Aji AF, Munajat Q, Pratama AP, Kalamullah H, Aprinaldi, Setiyawan J, et al. Detection of palm oil leaf disease with image processing and neural network classification on mobile device. Int. J. Comput. Theor. Eng. 2013; 5: 528–532.
47. Xia YQ, Wang HM, Zeng S. Plant leaf image disease detection based on Android. J. Zhengzhou Univ. Light Ind. (Nat. Sci. Ed.) 2014; 29: 71–74.
48. Qu Y, Tao B, Wang ZJ, Wang ST. Design of apple leaf disease recognition system based on Android. J. Agric. Univ. Hebei 2015; 38: 102–106.
49. Zheng J, Liu LB. Design and application of rice disease image recognition system based on Android. Comput. Eng. Sci. 2015; 37: 1366–1371.
50. Schaad NW, Frederick RD. Real-time PCR and its application for rapid plant disease diagnostics. Can. J. Plant Pathol. 2002; 24: 250–258.

