Black Sigatoka Classification Using Convolutional Neural Networks

In this paper, we present a methodology for the automatic recognition of black Sigatoka in commercial banana crops. The method uses a LeNet convolutional neural network to detect the progress of the infection in different regions of a leaf image; using this information, we trained a decision tree to classify the severity level of the infection. The methodology was validated with an annotated database, which was built as part of this work and which can be compared with other state-of-the-art alternatives. The results show that the method is robust against atypical values and photometric variations.

techniques that represent new methods of disease detection. Approaches based on color spaces such as HSV, TSL, LAB, and YCbCr commonly apply Gaussian filters to smooth the images; methods such as histogram analysis and Otsu thresholding, among others, are then used to threshold them [7]. However, these developments were validated by laboratory testing under controlled conditions, ignoring the lighting conditions of growing crops, where these algorithms cannot perform properly.
Other, more robust methodologies use classic machine learning techniques to perform feature extraction and classification of the objects of study. The authors of [8] performed thresholding and dilation operations in the YCbCr color space, which is necessary for color feature extraction, and then used ANFIS models and support vector machines to classify images of leaves infected with black Sigatoka. These models report a detection rate of 100%; however, the database has only a small number of samples, which produces biased classifiers.
Other studies addressed different types of diseases. The authors of [9], for example, followed the same preprocessing, feature extraction, and classification steps to train a k-nearest neighbor algorithm to classify bacterial blight, Alternaria alternata, anthracnose, and cercospora. In general, it is common to use wavelet transform, Fourier transform, HOG, SIFT, and SURF features to train support vector machines, k-nearest neighbor classifiers, and neural networks [10], [11]. However, these methodologies are limited to establishing the presence or absence of the disease; field processes require detecting the progress of the infection in order to perform effective control in its early stages.
Convolutional neural networks (CNNs) are powerful tools that enable precision agriculture applications such as leaf and stem counts, leaf size measurements, root localization, and plant recognition. Due to its importance in the farming process, the most frequently used application is disease detection [12]. One study [13] trained an Inception-v3 network to classify infections in cassava plants, achieving 93% accuracy. The authors of [12] used a public domain dataset to train the AlexNet and GoogLeNet architectures to identify 26 diseases that affect 14 types of crops, correctly identifying the disease in 99.35% of cases.
Even though the accuracy rates reported in these studies are high, the tests were carried out under laboratory conditions, without the disturbances caused by image noise. Classification models must cope with all such disturbances in order to yield robust systems that can realistically be deployed in the field.
In the field of Sigatoka detection, some studies have shown that a LeNet architecture trained with RGB and grayscale samples can detect the disease with 98% accuracy [6].
A further study introduced the use of pre-trained networks such as ResNet50 and Inception-v3, which reached accuracies of 99.9% [14]. Although these accuracies are high, these methodologies only detect the presence of the disease and ignore metrics that define different levels of severity, such as the Fouré scale [15], even though each stage calls for a different control action. Disease detection approaches using CNNs classify images with resolutions from 64 × 64 to 150 × 150 [16], whereas current mobile devices have cameras with average resolutions of 4160 × 3120. Therefore, it is necessary to develop a methodology that adapts such high-resolution images to deep learning methods. Nevertheless, these developments show the viability of detecting disease from digital images at low computational cost, which opens the way to mobile applications that are easily accessible to small farmers.
The state-of-the-art research reviewed above focuses only on classifying images as healthy or diseased. This is suitable for laboratory tests; however, black Sigatoka has different levels of severity, and each one requires different agricultural management and control actions. To develop methodologies that have an impact on agricultural processes, it is necessary to train models whose predictions give the farmer more information and thus allow better control decisions.
This article describes a method for detecting different levels of severity of black Sigatoka infection using a methodological variation of the LeNet CNN architecture. The network parameters were trained on a database created under real conditions, covering the cases encountered in field implementation.
The remainder of this article is organized as follows. In Section II, we describe the contents of each stage, including the methodological development of the machine learning models, the construction of the annotated database, and the statistical validation processes. In Section III, we describe our experiments and discuss the results. In Section IV, we present the conclusions of this study and its benefits.
The main contributions of this article are: 1) an annotated database with three levels of severity, suitable for training convolutional neural networks; and 2) the first deep learning methodology that classifies the image of a banana leaf according to the severity level of black Sigatoka disease (high, medium, or low).

II. MATERIALS AND METHODS
This section presents the proposed methodology for training a CNN to classify black Sigatoka infection according to different levels of severity. Fig. 1 shows the classification and training processes.

A. Classification Models
To perform the disease classification, we propose three structured steps that allow us to detect the disease at different levels of severity. As can be seen in Fig. 1, we first carried out an image conditioning process, which generated a new database used to train a CNN with the LeNet topology. We used this classifier to predict the level of severity in sections of the image.
Since the labels generated for the sections must be weighed against each other, we count the results computed by the deep learning model. This arithmetic process generates a set of features that are classified using a decision tree, which produces the overall result for the image.
Each mentioned process is explained in detail below.

B. Image Conditioning for Classification
This step is necessary because classification with CNNs requires images with a resolution of approximately 96 × 96 pixels [16], whereas the images captured for the database are in high resolution. Consequently, we segmented each image into small windows that were classified using deep learning models. Fig. 2 shows the procedure for conditioning an image. This process segments the image into windows of 500 × 500 pixels and then resizes each window to 96 × 96 pixels. This window size was chosen because it preserves the information of the object under study.
This image processing stage, in addition to conditioning the input signal for the CNN, increases the number of samples available for training and validating the models.
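The conditioning step can be illustrated with a minimal pure-Python sketch. It assumes an image stored as a list of pixel rows, discards partial windows at the borders, and uses nearest-neighbour resizing; none of these choices is stated in the text, and the function names are hypothetical. A real pipeline would more likely rely on an imaging library.

```python
def tile_boxes(height, width, window=500):
    """Return (top, left, bottom, right) boxes for full windows only.

    Assumption: partial windows at the right/bottom borders are discarded.
    """
    return [
        (top, left, top + window, left + window)
        for top in range(0, height - window + 1, window)
        for left in range(0, width - window + 1, window)
    ]


def resize_nearest(tile, out_size=96):
    """Nearest-neighbour resize of a square tile given as a list of rows."""
    in_size = len(tile)
    idx = [i * in_size // out_size for i in range(out_size)]
    return [[tile[r][c] for c in idx] for r in idx]


def condition_image(image, window=500, out_size=96):
    """Split an image (list of pixel rows) into windows and resize each one."""
    height, width = len(image), len(image[0])
    patches = []
    for top, left, bottom, right in tile_boxes(height, width, window):
        tile = [row[left:right] for row in image[top:bottom]]
        patches.append(resize_nearest(tile, out_size))
    return patches
```

Under these assumptions, a 3120 × 4160 capture yields 6 × 8 = 48 full windows, which is how a single photograph contributes many training samples.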

C. Convolutional Neural Network Model
The proposed classifier uses the LeNet CNN topology [17], which consists of feature extraction and classification stages. Feature extraction is performed by convolution layers. Each layer applies filters in a sliding-window fashion, computing the dot product between the kernel and the image; the result is a two-dimensional activation map that extracts morphological patterns such as curves and edges [18]. The CNN learns the values of these filters on its own during the training process. This information is important for describing the disease.
The first and third layers are convolution layers with kernel sizes of 5 × 5 and 3 × 3, respectively. In addition, they use ReLU activation functions, which introduce a non-linearity into the network and improve class separability [19]. Equation (1) shows the convolution calculation.
$$Y_j = g\left(b_j + \sum_i K_{ij} \otimes Y_i\right) \tag{1}$$
where $Y_j$ is the output of neuron $j$, obtained from the dot product between the input image $Y_i$ and the convolution kernel $K_{ij}$. The term $b_j$ corresponds to the bias parameter of each neuron, and $g$ is the activation function.
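As an illustration of the sliding-window dot product followed by a ReLU activation, the following sketch processes a single channel with a single kernel. It is a minimal example under stated assumptions ("valid" borders, stride 1, no kernel flipping, as in most deep learning libraries) and not the implementation used in this work; the function names are hypothetical.

```python
def relu(x):
    """Rectified linear unit: pass positive values, clamp the rest to zero."""
    return x if x > 0.0 else 0.0


def conv2d_valid(image, kernel, bias=0.0, activation=relu):
    """Sliding-window dot product of one channel with one kernel.

    Assumptions: stride 1, no padding, kernel not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(activation(acc))
        out.append(row)
    return out


# A vertical dark-to-bright edge and a kernel that responds to it.
edge = [[0, 0, 1, 1] for _ in range(4)]
kernel = [[-1, 1], [-1, 1]]
activation_map = conv2d_valid(edge, kernel)
```

The activation map is non-zero only in the column where the edge lies, which is the sense in which a learned filter extracts patterns such as edges.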
The second and fourth layers perform sub-sampling by max pooling, which consists of taking the maximum activation value in each sub-window [20]. This process reduces the image size and provides some invariance to rotations and translations of the input. Finally, we flattened the image to obtain a feature vector that is classified through fully connected layers, as shown in Fig. 3. These fully connected layers are numbered from 5 to 8 and use tanh activation functions to increase the gap between the neuron weights. The tanh activation function, given in equation 2, normalizes the values between -1 and 1: very high inputs tend asymptotically to 1 and very low inputs to -1.
$$\tanh(X) = \frac{e^{X} - e^{-X}}{e^{X} + e^{-X}} \tag{2}$$
where $X$ is the input of the neuron. The last layer uses a softmax activation function [21], whose input is the feature vector generated by the fully connected layers; from this, we estimated the class membership likelihood ("high," "middle," or "low") by using equation 3.
$$\mathrm{softmax}(Z)_j = \frac{e^{Z_j}}{\sum_{c=1}^{C} e^{Z_c}}, \quad j = 1, \dots, C \tag{3}$$
where $Z$ is the input vector and $C$ is its dimension.
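The max pooling, tanh, and softmax operations described above can be sketched in a few lines of pure Python. This is an illustrative example only: the pooling window of 2 × 2, the class ordering, and the example scores are assumptions, not values reported in this work.

```python
import math


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling over size x size sub-windows."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def tanh(x):
    """Hyperbolic tangent written out from its exponential definition."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def softmax(z):
    """Softmax with the maximum subtracted first for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


CLASSES = ("high", "middle", "low")  # assumed ordering of the output units
scores = [2.0, 0.5, -1.0]            # made-up outputs of the last dense layer
probs = softmax(scores)
predicted = CLASSES[probs.index(max(probs))]
```

Subtracting the maximum before exponentiating does not change the softmax result but avoids overflow for large inputs.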

D. Generation, Count, and Classification of Labels
This stage classified each image segment using the CNN, generating a set of N × M labels that correspond to "high," "middle," and "low" severity levels.
This set encodes each disease state separately and is summarized by counting the labels of each class. These counts are arranged in a 3 × 1 vector, which forms a pattern that can be classified by a decision tree. Fig. 4 shows the algorithm used to obtain the infection severity level of the banana leaf based on the classification counts given by the CNN.
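The counting step can be sketched as follows. The function `count_labels` builds the 3 × 1 count vector from the grid of segment labels; `classify_leaf` is only a hypothetical stand-in for the trained decision tree, and its thresholds are illustrative placeholders rather than values learned in this work.

```python
from collections import Counter

CLASSES = ("high", "middle", "low")


def count_labels(segment_labels):
    """Collapse an N x M grid of segment labels into a 3 x 1 count vector."""
    counts = Counter(label for row in segment_labels for label in row)
    return [counts[c] for c in CLASSES]


def classify_leaf(count_vector, high_frac=0.2, middle_frac=0.3):
    """Hypothetical stand-in for the trained decision tree.

    The thresholds are placeholders chosen for illustration only.
    """
    high, middle, low = count_vector
    total = high + middle + low
    if total == 0:
        raise ValueError("no segment labels to classify")
    if high / total >= high_frac:
        return "high"
    if middle / total >= middle_frac:
        return "middle"
    return "low"


grid = [
    ["low", "low", "middle"],
    ["high", "low", "middle"],
]
features = count_labels(grid)
leaf_label = classify_leaf(features)
```

In practice the rule in `classify_leaf` would be replaced by the tree fitted on the count vectors of the annotated leaves.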

E. Training and Validation Models
As explained above, two learning models are needed to detect the disease. Therefore, a CNN model and a decision tree must be trained. The CNN was trained using the database of reduced images described in the next section. The proposed LeNet network learned its weights and biases using the Adadelta algorithm with the categorical cross-entropy loss function, and we evaluated the model with the accuracy metric, as shown in Table I. To train the decision tree, we used the features generated by the label counting process. The tree was trained with four nodes, using the Gini diversity index as the split criterion and without pruning.
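For reference, the two training criteria named above can be written out directly. The sketch below computes the categorical cross-entropy of one sample and the Gini diversity index of a tree node; the helper names and the clipping constant `eps` are assumptions made for illustration.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one sample: y_true is one-hot, y_pred is a probability vector."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def gini_impurity(class_counts):
    """Gini diversity index of a node: 1 minus the sum of squared proportions."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in class_counts)


def split_impurity(left_counts, right_counts):
    """Size-weighted Gini impurity of a candidate binary split."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    return (n_left * gini_impurity(left_counts)
            + n_right * gini_impurity(right_counts)) / n
```

A tree grown with this criterion chooses, at each node, the split with the lowest weighted impurity; a pure node has an index of 0, and a node with three equally populated classes has an index of 2/3.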
We evaluated our models using a cross-validation strategy that divided the database into k folds. Each fold was divided into two parts: 70% of the data were used for training and 30% for testing. We then generated the confusion matrix for each classifier. This process applied a Monte Carlo analysis whose stop criterion was defined in terms of $M_k$, the confusion matrix at iteration $k$, and $th$, the error threshold.
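The validation loop can be sketched as repeated random 70/30 splits that stop once the confusion matrix stabilizes. The exact stop rule is not reproduced here, so the sketch assumes one plausible form: the largest absolute change in the running mean confusion matrix falls below the threshold. The `evaluate` callback and all function names are hypothetical.

```python
import random


def split_70_30(samples, rng):
    """Shuffle a copy of the samples and split it 70% train / 30% test."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(y_true, y_pred, n_classes=3):
    """Row-normalised confusion matrix (rows: true class, columns: predicted)."""
    m = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1.0
    for row in m:
        total = sum(row)
        if total:
            row[:] = [v / total for v in row]
    return m


def monte_carlo(evaluate, samples, th=1e-3, max_iter=1000, seed=0):
    """Repeat random splits until the running mean confusion matrix settles.

    Assumed stop rule: max |mean_k - mean_{k-1}| < th.
    `evaluate(train, test)` must return a confusion matrix for one split.
    """
    rng = random.Random(seed)
    mean = None
    for k in range(1, max_iter + 1):
        train, test = split_70_30(samples, rng)
        m_k = evaluate(train, test)
        if mean is None:
            mean = [row[:] for row in m_k]
            continue
        # Incremental update of the mean over the first k matrices.
        new_mean = [
            [old + (cur - old) / k for old, cur in zip(mean_row, m_row)]
            for mean_row, m_row in zip(mean, m_k)
        ]
        change = max(
            abs(a - b)
            for new_row, old_row in zip(new_mean, mean)
            for a, b in zip(new_row, old_row)
        )
        mean = new_mean
        if change < th:
            return mean, k
    return mean, max_iter
```

Here `evaluate` would train a classifier on `train`, predict on `test`, and return `confusion_matrix(y_true, y_pred)`; the function returns the averaged matrix together with the number of iterations used.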

F. Database for Severity Level Detection
International Journal of Machine Learning and Computing, Vol. 12, No. 4, July 2022
In the literature, there are databases with many images of Sigatoka infection. Unfortunately, these databases were not created according to a capture and sampling protocol that allowed the classification of leaves into different levels of infection severity.
In addition, these databases have a small number of samples, which prevents them from reaching the statistical significance needed to apply machine learning models. In some cases, access to the databases was not granted. This made it difficult to apply a structured methodology like the one proposed in this article with the purpose of replicating findings and comparing methods.
We sampled images from banana crops in the Risaralda Department (Colombia); these were labeled with three severity levels according to the Fouré scale. The samples were taken using cellphones with camera resolutions of 9.6 MP (4128 × 2322) and 13 MP (3120 × 4160). Fig. 5 shows some examples from the database and the population per class. To train the CNN model, it was necessary to create a second database. This database was constructed from the image segments obtained through the conditioning process explained in Section II-B, which produced 96 × 96 pixel sub-images. These sub-images were then cleaned and labeled according to the Fouré scale. As a result, we obtained 4244 samples. Fig. 6 shows some examples from this database and the population per class. The databases described can be found here: https://sites.google.com/a/utp.edu.co/black-sigatoka-disease-database/.

III. RESULTS
The results of this study are presented in two parts. First, we show the performance of the classification methods in detecting the state of infection in different segments of the leaf. Second, we present the results obtained by validating the decision tree that classifies the stage of infection in the whole banana leaf using the classifiers from the first stage. We used the validation approach described in Section II-E to obtain the confusion matrices of our method and of the other studies.

A. Detection of Infection Levels in Different Segments of the Leaf
To understand the scope of the proposed method, we conducted a comparative analysis of our model and other state-of-the-art methodologies. We used the CNN Inception-v3 [14] and SVM [10] methods, which have been successfully applied in disease detection. In addition, the SVM is widely used in classification problems and has demonstrated high performance in classifying patterns. The SVM was trained with a polynomial kernel on a feature descriptor built from the wavelet transform in the RGB and HSV color spaces.
To detect disease with the CNN Inception-v3 approach, we used this network to extract the feature vector, which was then connected to a final classification layer. Table II shows the traces of the confusion matrices for the detection of black Sigatoka in a banana leaf. The CNN LeNet method provides the best disease detection results, with an average detection rate of approximately 90.03% ± 1.73%, which is higher than those achieved by the other methods. The CNN Inception-v3 method performs relatively poorly, although it still obtains a detection rate of 78.79% ± 1.6%, whereas the SVM reached approximately 86.16% ± 2.1%. The CNN LeNet method shows accuracy rates of 87% ± 2.1% and 99% ± 1.7% in classifying images from the high and low classes, respectively. Although this performance is suitable for the application, the middle class shows more overlap.
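Figures of the form "mean ± deviation" can be derived from the confusion matrices of the individual runs. The sketch below assumes that the detection rate is the trace of a count matrix divided by its total and that the deviation is the sample standard deviation across runs; the matrices are made-up numbers for illustration, not results from this study.

```python
import statistics


def accuracy_from_confusion(matrix):
    """Overall accuracy: trace of the count matrix divided by the total count."""
    trace = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return trace / total


def summarize(accuracies):
    """Mean and sample standard deviation, both in percent."""
    mean = 100.0 * statistics.mean(accuracies)
    std = 100.0 * statistics.stdev(accuracies)
    return mean, std


# Made-up count matrices for two runs (rows: true class, columns: predicted).
runs = [
    [[45, 4, 1], [5, 38, 7], [0, 2, 48]],
    [[44, 5, 1], [6, 36, 8], [0, 1, 49]],
]
accs = [accuracy_from_confusion(m) for m in runs]
mean, std = summarize(accs)
print(f"{mean:.2f}% ± {std:.2f}%")
```

The per-class rates quoted for the high and low classes correspond to the diagonal entries of the row-normalised matrix rather than to the overall trace.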

B. Detection of Infection Levels in the Whole Banana Leaf
In this part, we evaluated the performance of the algorithm in detecting disease in whole banana leaves. It is important to highlight that this classification represents the diagnosis that the system would deliver to a person interested in analyzing the disease. The results are reported in Table III. They demonstrate that our method can accurately classify the high and low classes. However, it was less effective for the medium class. This is due to the high level of overlap in the data, which makes it difficult to find a classification method that can separate the classes properly.

IV. CONCLUSIONS
We developed an automatic method to detect the stage of black Sigatoka in a banana leaf. Our approach uses a LeNet CNN to classify the stage of the disease in different segments of the leaf; the labels of the segments are then passed to a decision tree.
This methodology helps to detect the disease at an early stage, allowing the user to perform corrective practices and avoid economic losses. Our method therefore addresses the problem found in some state-of-the-art techniques, which detect the disease only in its last stage, when the farmer has no chance to act. This demonstrates the importance of pursuing research under this approach.
This work studies the contribution of different classification methods to the recognition task. To perform robust black Sigatoka recognition with CNN classification methods, it is necessary to use an image size of 96 × 96. This in turn requires a local analysis over small leaf segments, because resizing the whole image of each banana leaf sample would discard the captured information and the disease could not be detected.
The proposed approach was tested on an annotated dataset that was created specifically for this work, because there was no publicly available database with different levels of severity labeled by an expert or under a standardized method such as the Fouré scale. We made the dataset publicly available to facilitate comparisons and accelerate research in this area. In the future, the database must be expanded to validate our approach on a wider set of conditions.
The proposed method generates labels at both the local and global levels, which creates a leaf severity map that could be used to support agricultural methodologies for disease detection such as Stover's sampling modified by Gauhl.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS
Cristian Escudero conducted the initial literature review, proposed and implemented the model, collected and preprocessed the datasets, ran the experiments, and drafted the first version of the paper under the supervision of Andrés Calvo and Arley Bejarano. Andrés Calvo and Arley Bejarano edited and extended the manuscript.