Recognizing and counting Dendrocephalus brasiliensis (Crustacea: Anostraca) cysts using deep learning

Dendrocephalus brasiliensis, a species native to South America, is a freshwater crustacean widely used in conservation and production activities. Its main characteristics are its hardiness and its production of resistance cysts, whose hatching requires a period of dehydration. Regardless of how the species is used, its cysts must be handled, for example when they are counted under a microscope. Manual counting is difficult, error-prone, and very time-consuming. In this paper, we propose an automated approach for detecting and counting Dendrocephalus brasiliensis cysts in images captured by a digital microscope. For this purpose, we built the DBrasiliensis dataset, a repository with 246 images containing 5141 cysts of Dendrocephalus brasiliensis. We then trained two state-of-the-art object detection methods, YOLOv3 (You Only Look Once) and Faster R-CNN (Region-based Convolutional Neural Networks), on the DBrasiliensis dataset in order to compare them on both cyst detection and counting. The experiments provide evidence that YOLOv3 is superior to Faster R-CNN, achieving an accuracy of 83.74%, an R² of 0.88, an RMSE (Root Mean Square Error) of 3.49, and an MAE (Mean Absolute Error) of 2.24 on cyst detection and counting. Moreover, we show that it is possible to infer the number of cysts in a substrate of known weight by automatically counting the cysts in some of its samples. In conclusion, the proposed approach using YOLOv3 is adequate for detecting and counting Dendrocephalus brasiliensis cysts. The DBrasiliensis dataset can be accessed at: https://doi.org/10.6084/m9.figshare.13073240.


Introduction
The practice of moving species has been common among humans since the origin of agriculture and cattle raising, becoming more intense with the expansion of trade across different parts of the [...] pests that dominantly attack field crops in order to monitor them in real time. YOLOv3 and Faster R-CNN obtained an average precision in pest detection of 63.54% and 51.72%, respectively. [15] achieved video-based fruit counting performance of up to 93% on three different fruits using Faster R-CNN. [16] presented a two-stage architecture for aphid detection whose detection stage is based on YOLO; in their experiments, the approach achieved an average precision of 76.8%. [17] used an approach based on Faster R-CNN to distinguish maize seedlings from weeds in crop images. The approach obtained an average precision of 97.71% in detecting maize seedlings against soil and weeds. [18] used Faster R-CNN to detect and count banana plants on a farm using aerial images collected by a UAV (Unmanned Aerial Vehicle). The approach achieved accuracies of 97.9%, 91.5%, and 87.2% at altitudes of 40 m, 50 m, and 60 m, respectively. These are some examples of process automation using domain-specific object detection techniques based on computer vision.
In this paper, we compare the state-of-the-art object detection models Faster R-CNN and YOLOv3 in order to propose an automated approach for detecting and counting Dendrocephalus brasiliensis cysts in images obtained by a digital microscope. In addition, we show that it is possible to infer the number of cysts in a substrate of known weight. Finally, we introduce the DBrasiliensis dataset, a repository with 246 images containing 5141 cysts of Dendrocephalus brasiliensis, a species native to South America.
One of the motivations for publishing the DBrasiliensis dataset is the importance and potential of this species for production activities in aquaculture and for conservation efforts. A dataset with cyst examples can help accelerate research that needs to handle Dendrocephalus brasiliensis cysts in an automated way using computer vision, as well as new applications for cyst counting and weight inference.
The contributions of this paper are:
• The publication of a novel annotated image dataset of Dendrocephalus brasiliensis cysts, called DBrasiliensis, composed of 246 images divided into training and testing sets. The training set has 111 images containing 3173 annotated cysts. The testing set has 135 images divided into ten subsets, each labeled with its weight in grams. In all, the testing set has 1968 cysts. To the best of our knowledge, this is the first image dataset of Dendrocephalus brasiliensis cysts intended for deep learning. The DBrasiliensis dataset can be accessed at: https://doi.org/10.6084/m9.figshare.13073240.
• The definition of a baseline for detecting and counting Dendrocephalus brasiliensis cysts using the state-of-the-art YOLOv3 and Faster R-CNN.
• A deep learning-based automated approach to detect and count Dendrocephalus brasiliensis cysts in images obtained by a digital microscope.
The rest of the paper is organized as follows. In the Materials and Methods section, we describe the DBrasiliensis dataset, give an overview of YOLOv3 and Faster R-CNN, and present the experimental setup. The Results and Discussion section analyzes the results, and the Conclusion section presents our conclusions.

DBrasiliensis dataset
Dendrocephalus brasiliensis, whose life stages are presented in Fig 1, lays resistance cysts at the bottom of its culture medium, such as an aquarium or a small lake. These cysts mix with the substrate at the bottom of the culture medium, which is basically composed of organic and inorganic matter. Fig 2 shows a substrate sample in which the cysts are highlighted by red rectangular bounding boxes.
To build the DBrasiliensis dataset, we took portions of substrate from the bottom of an aquarium that we used as an incubator for Dendrocephalus brasiliensis and split them into two parts: one to capture the training images and the other to capture the test images. Both parts were fixed on white coverslips to be observed with a digital microscope. We used an XTRAD USB digital microscope, model XT-2036, with 52x magnification to capture images at a resolution of 640 × 480 pixels.
On the images designated for the training set (see example in Fig 3(a)), we used the LabelImg software to label the cysts in both PASCAL VOC [19] and YOLO formats, as shown in Fig 3(b). In all, the training set has 111 images containing 3173 annotated cysts.
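The two annotation formats encode the same box differently: PASCAL VOC stores absolute pixel corners (xmin, ymin, xmax, ymax), while the YOLO format stores a class index followed by the box center and size, normalized by the image dimensions. A minimal sketch of the conversion follows. The function name and the example box are hypothetical; only the 640 × 480 image size comes from the text.

```python
def voc_to_yolo(box, img_w=640, img_h=480, class_id=0):
    """Convert a PASCAL VOC box (xmin, ymin, xmax, ymax), in pixels,
    to a YOLO label (class_id, x_center, y_center, width, height)
    with coordinates normalized to the range [0, 1]."""
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return class_id, x_center, y_center, width, height


# Hypothetical cyst box on a 640 x 480 image (not taken from the dataset).
label = voc_to_yolo((160, 120, 320, 240))
print(" ".join(str(v) for v in label))
```

Since the dataset has a single class (cyst), the class index is assumed to be 0 for every box.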
We divided the images designated for testing into ten small groups. Each group received a label that indicates the number of cysts in its images and the weight of the substrate used to capture them. To build a given group, we split the substrate reserved for it into small portions on a white coverslip and weighed it using a precision scale (see Fig 4(a)). Then, we captured an image of each portion of the substrate using the digital microscope. The captured images were stored in a folder whose name (label) indicates the number of cysts and the weight of the substrate. Table 1 shows the testing set in detail.

An overview of Faster R-CNN and YOLOv3 architectures
YOLOv3 [10] and Faster R-CNN [11] are state-of-the-art object detection architectures and are employed to solve many problems whose aim is to detect and classify objects [9]. In this section, we provide an overview of both architectures.
YOLOv3 architecture. The YOLOv3 workflow is basically composed of three steps [10]. First, it receives an input image and divides it into a grid. Next, it applies image classification and localization to each grid cell in order to predict class probabilities for objects and their corresponding bounding boxes. For both classification and localization, YOLOv3 uses an open-source CNN (Convolutional Neural Network) called Darknet-53 as its backbone, whose first 53 layers are for classification; another 53 layers are added for detection, resulting in a CNN with a total of 106 layers.
Object detection is done at three different scales, at the 82nd, 94th, and 106th layers, whose inputs are downsampled by factors of 32, 16, and 8, respectively. The 82nd layer is responsible for detecting large objects, the 94th layer for medium objects, and the 106th layer for small objects. Detecting at different layers helps with small objects, since the upsampled layers are concatenated with previous layers in order to preserve the objects' fine-grained features. During detection, multiple bounding boxes can be predicted for each object in a grid cell. To define the right bounding box for the object, the IoU (Intersection over Union) is calculated between the bounding boxes in the grid cell, and the one with the highest IoU is selected. For the selected bounding boxes, the network calculates conditional class probabilities. Finally, the conditional class probabilities and the box confidence predictions jointly provide class-specific confidence scores for each bounding box [10].
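The IoU used to compare bounding boxes is the area of their overlap divided by the area of their union. The sketch below computes it for axis-aligned boxes given as (xmin, ymin, xmax, ymax). The function name and the box layout are illustrative assumptions, and the code does not reproduce the internal box selection of YOLOv3.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes,
    each given as (xmin, ymin, xmax, ymax)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10 x 10 boxes that share half of their area.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

An IoU of 1 means the boxes coincide, and an IoU of 0 means they do not overlap at all.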
The Darknet-53 architecture, used by YOLOv3 as a backbone, is mainly composed of successive 3 × 3 and 1 × 1 convolutional layers. Each convolutional layer is followed by a Batch Normalization layer [20] and Dropout operations [21]. At the end of each convolutional block, residual blocks are added in order to perform the identity mapping, whose purpose is to add the output x of the previous convolutional layer to the output F(x) of the layer ahead. This allows x and F(x) to be combined as input to the next convolutional layer [22]. The final block consists of Global Average Pooling [23] followed by a fully connected layer and a final Softmax layer [24].
Fig 5 shows the general workflow of YOLOv3 applied to cyst detection and counting. After YOLOv3 has been trained on the annotated training images of the DBrasiliensis dataset, a test image captured by the digital microscope is fed into the model to detect the cysts. Next, the image is divided into several grid cells. For each cell, several anchor boxes and confidence scores are predicted. Then, the boxes with the highest scores are selected so that the network calculates conditional class probabilities for each one. In the last step, the conditional class probabilities and box confidence predictions jointly provide cyst class confidence scores for each box, and a bounding box is drawn around each cyst in the image. We designed a post-processing step that counts the cysts detected in the image.

PLOS ONE
Faster R-CNN architecture. Faster R-CNN is composed of two modules [11]: the RPN (Region Proposal Network) and the Fast R-CNN detector. The RPN receives as input an image that is processed by a CNN in order to obtain features and produce a set of rectangular region proposals, at three scales (128 × 128, 256 × 256, and 512 × 512) and three aspect ratios (1:1, 2:1, and 1:2), that possibly contain the candidate objects. The Fast R-CNN detector receives as input RoIs (Regions of Interest) produced from the region proposals generated by the RPN. Each RoI is processed by a pooling layer and pooled into a fixed-size feature map that is mapped to a feature vector. This feature vector is the input for a fully connected layer that classifies the RoI. The output is composed of two vectors per RoI: the class probabilities and the bounding box for each object class considered. Both the RPN and the Fast R-CNN detector share a common set of convolutional layers, which can be provided by a CNN backbone such as VGG16 [25], ResNet-50 [22], or Inception-v2 [26]. In this paper, we chose the Inception-v2 architecture as the backbone for Faster R-CNN.
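Combining the three scales with the three aspect ratios gives nine anchor shapes per location. The sketch below enumerates them under two assumptions that the text does not state: each anchor keeps the area of its scale (for example 128 × 128 pixels), and a ratio is read as width:height. The function name is hypothetical.

```python
import math


def make_anchors(scales=(128, 256, 512), ratios=(1.0, 2.0, 0.5)):
    """Return a (width, height) pair for every scale/aspect-ratio
    combination. Each anchor has area scale * scale, and its
    width / height equals the ratio (assumed to mean width:height)."""
    anchors = []
    for scale in scales:
        area = float(scale * scale)
        for ratio in ratios:
            width = math.sqrt(area * ratio)
            height = area / width
            anchors.append((width, height))
    return anchors


anchors = make_anchors()
for width, height in anchors:
    print(f"{width:7.1f} x {height:7.1f}")
```

The RPN scores and refines anchors of these shapes at each position of the feature map. The sketch only lists the shapes and does not place them on an image.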
The Inception-v2 architecture [26] has three initial convolutional layers with 3 × 3 filters followed by max-pooling. The output of this block is the input for another block with three convolutional layers with 3 × 3 filters. Next, the architecture has three inception modules in sequence. In the first module, convolutions with 1 × 1 and 3 × 3 filters, as well as max-pooling, are applied to the input. The resulting outputs are concatenated and passed to the next inception module, which applies a grid reduction technique to reduce the number of parameters and make the model computationally cheaper. The grid reduction consists of 1 × n and n × 1 convolutions instead of n × n convolutions. As in the first inception module, the outputs are concatenated and passed to the next inception module. The last inception module is similar to the second; however, it is wider instead of deeper. Finally, before the final Softmax layer, an extra classifier acts as a regularizer [26].
Fig 6 shows the general workflow of Faster R-CNN applied to cyst detection and counting. After training, a test image captured by the digital microscope is fed into the model to detect the cysts. The image passes through the convolutional layers to obtain feature maps, which are fed into the RPN to generate rectangular region proposals. The region proposals are transformed into RoIs and passed to the Fast R-CNN detector, which provides a cyst class probability and a bounding box prediction for each one. Finally, a post-processing step counts the cysts detected in the image.
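For both architectures, the counting step reduces to counting the boxes that the detector keeps. A minimal sketch is given below, assuming each detection is a (score, box) pair and that a box is kept when its score reaches 0.3, the detection score used in the experimental setup. The function name, the data layout, and the example detections are hypothetical.

```python
def count_cysts(detections, score_threshold=0.3):
    """Count the detections whose confidence score reaches the threshold.

    `detections` is a list of (score, box) pairs, where box is
    (xmin, ymin, xmax, ymax); this layout is assumed for illustration.
    """
    return sum(1 for score, _box in detections if score >= score_threshold)


# Hypothetical detector output for one image.
detections = [
    (0.92, (34, 50, 61, 78)),
    (0.45, (120, 200, 150, 229)),
    (0.12, (300, 310, 322, 335)),  # below the threshold, not counted
]
print(count_cysts(detections))
```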

Experimental setup
Both YOLOv3 and Faster R-CNN were fine-tuned, with all layers initialized with weights from previous training on the MS-COCO (Microsoft Common Objects in COntext) dataset [27]. In addition, we set the learning rate to 0.001 and the number of iterations to 8,000, and varied the batch size among 2, 4, 8, 16, 32, and 64. We used small batch sizes to consume less memory and train the architectures faster, since a small batch size allows the network weights to be updated more often [28]. We limited the number of iterations to 8,000 because, beyond that number, the loss showed no further improvement. We set the learning rate to 0.001 because this value is recommended by [29] when a small number of samples is used for training.
During training, all images in the batch were augmented using random rotation between -30° and +30° and exposure changes between -10% and +10%. Both architectures were trained using a Tesla P100-PCIE-16GB GPU.
We used the DBrasiliensis dataset to train and test both architectures. Thus, 111 images containing 3173 annotated cysts (61.72% of the cysts) were used for training, and 135 images with 1968 cysts (38.28% of the cysts), arranged into 10 different subsets as presented in Table 1, were used for testing. Each trained model was tested ten times, each time on a single test subset. Each reported metric is the average of the scores achieved on the ten test subsets.
We considered a detection correct (true positive) when the predicted cyst has a detection score ≥ 0.3, and wrong (false positive) when the detected object is not a cyst. A false negative is assigned when a cyst is in the image and is not detected. The evaluation metrics used were Precision, F1-Score, Accuracy, and Recall. In the formulas for these metrics, TP refers to true positives, TN to true negatives, FN to false negatives, and FP to false positives.
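The standard definitions of these four metrics, which are assumed to be the ones intended here, are:

```latex
\begin{align}
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall} &= \frac{TP}{TP + FN} \\
\text{F1-Score} &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}
\end{align}
```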
In addition, we evaluated both architectures in terms of MAE (Mean Absolute Error), RMSE (Root Mean Square Error), and R². Finally, we applied statistical methods to the Accuracy metric to evaluate the differences between the architectures.
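The three counting metrics compare, image by image, the number of cysts counted with the number annotated. The sketch below uses the usual definitions and takes R² to be the coefficient of determination, which is an assumption, since the text does not say how R² was computed. The function names and the example counts are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error between true and predicted counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error between true and predicted counts."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-image cyst counts (not taken from the dataset).
true_counts = [10, 20, 30, 40]
pred_counts = [12, 18, 30, 36]
print(mae(true_counts, pred_counts))
print(rmse(true_counts, pred_counts))
print(r2(true_counts, pred_counts))
```

Lower MAE and RMSE and higher R² indicate closer agreement between the automated and the annotated counts.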
We performed a statistical analysis using the Shapiro-Wilk test [30] to verify the normality of the data, the one-way ANOVA hypothesis test, and Tukey's test [31] to analyze the differences between the architectures in a pairwise way. We adopted a significance level of 5% for all statistical tests (p-value < .05).
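To make the one-way ANOVA step concrete, the sketch below computes its F statistic by hand as the ratio of between-group to within-group variance. The function name and the accuracy values are hypothetical, and the p-value, which requires the F distribution, is left out. In practice a library routine such as `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of samples, one list of values per group
    (for example, per-subset accuracies for each batch size).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the values around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values for three groups (not results from the paper).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom yields a small p-value, which is then compared with the .05 significance level.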

Results and discussion
The classification results in terms of Precision, F1-Score, Accuracy, and Recall for both architectures are presented in Table 2. Table 2 shows that the YOLOv3 architecture exhibits higher and more uniform precision than Faster R-CNN, indicating that its proportion of true positives among all predicted positives did not vary widely. It is important to emphasize that YOLOv3 achieved its best results with the batch size set at 32, except for precision, whose best value was achieved with the batch size set at 4.
Faster R-CNN achieved a precision of 94.44% with the batch size set at 4, showing that most of its detections really were cysts. However, it presented a high false negative rate, which is reflected in its recall of 40.72%. An example of this high false negative rate can be seen in Fig 7(a): of the 20 cysts in the image, Faster R-CNN detected only 1. On the other hand, YOLOv3 with the batch size also set at 4, with which it achieved its best precision and a recall of 67.91%, detected 10 of the 20 cysts in the same image (see Fig 7(b)).
We can observe in Table 2 that both YOLOv3 and Faster R-CNN achieved their best precision with the batch size set at 4. Nevertheless, for the other metrics, YOLOv3 and Faster R-CNN achieved better results with the batch size set at 32 and 2, respectively. The high precision at batch size 4 is related to the low number of false positives produced by the architectures. YOLOv3 with the batch size set at 4 obtained better precision because it produced fewer false positives than with batch size 32: Fig 8 shows only 7 false positives for batch size 4, against 29 for batch size 32. The same holds for Faster R-CNN with batch sizes 2 and 4, whose numbers of false positives were 290 and 42, respectively (see Fig 8). However, the high precision of the architectures at batch size 4 did not translate into detecting more cysts (see Table 3).
Concerning accuracy, Faster R-CNN achieved lower results than YOLOv3 at all batch sizes. The best accuracy achieved by YOLOv3, 83.73%, is relevant given the difficulty of detecting cysts in the substrate: in many instances, only parts of a cyst are visible or the cysts are stuck together, and there is also a considerable quantity of sand and other residues. We counted the number of false positive detections for the different batch sizes (see Fig 8). This count is consistent with the precision achieved by each architecture configuration shown in Table 2, i.e., precision decreases as the number of false positives increases.
We analyzed the false positives and observed that most of them have colors similar to those of cysts and, in some cases, parts similar in shape to cysts. Fig 9(a) and 9(b) show an image in which both architectures, Faster R-CNN and YOLOv3, with the batch size set at 2, produced a high number of false positives. The red rectangular bounding boxes show the false positives, the green ones the true positives, and the blue ones the false negatives.
Table 3 shows the detection and counting results for all batch sizes on the 10 testing subsets. YOLOv3 outperforms Faster R-CNN on all testing subsets and has its highest hit rate with the batch size set at 32, detecting and counting 1666 of 1968 cysts, a hit percentage of 84.66%. Faster R-CNN achieved its highest hit percentage with the batch size set at 2, detecting and counting 1127 of 1968 cysts, or 57.27%.
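As a rough cross-check, detection metrics can be computed from the aggregate counts reported for YOLOv3 with batch size 32. The sketch assumes that the 1666 detected cysts are true positives, that the remaining 1968 − 1666 = 302 cysts are false negatives, that the 29 false positives refer to the whole testing set, and that there are no true negatives. Pooled values computed this way need not match Table 2, whose entries are averages over the ten subsets. The function name is hypothetical.

```python
def detection_metrics(tp, fp, fn, tn=0):
    """Precision, recall, F1-score, and accuracy from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


total_cysts = 1968       # cysts in the testing set
tp = 1666                # cysts detected and counted (assumed true positives)
fp = 29                  # false positives (assumed to cover the whole testing set)
fn = total_cysts - tp    # cysts that were missed

metrics = detection_metrics(tp, fp, fn)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```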
We carried out an analysis of variance (ANOVA) at the .05 significance level, using accuracy as the metric, to determine whether there is a difference between the batch sizes within each architecture, as well as between the average accuracy of the two architectures. We adopted the ANOVA test because, in general, the average accuracy showed normality and homogeneity of variance when tested with the Shapiro-Wilk test.
The test across batch sizes for Faster R-CNN resulted in a p-value of 0.0001, which indicates a statistically significant difference in average accuracy between batch sizes. Tukey's test showed that the batch size of 2 differs from the others.
The test across batch sizes for YOLOv3 also resulted in a p-value of 0.0001, indicating a significant difference in average accuracy between batch sizes, except for batch sizes 32 and 64, which according to Tukey's test did not differ significantly from each other.
The comparison between YOLOv3 and Faster R-CNN using accuracy as the metric resulted in a p-value < .05, indicating a statistically significant difference between the models.
Table 4 shows that YOLOv3 reached an R² of 0.88 for batch sizes 32 and 64. On the other hand, Faster R-CNN achieved its best R² (0.20) using batch size 2. These results indicate that YOLOv3 outperforms Faster R-CNN in detecting and counting the cysts, since the higher the R², the better the agreement between the counts produced by the architectures and the number of cysts in the DBrasiliensis dataset.
In terms of RMSE and MAE, Table 4 shows that YOLOv3 using batch size 32 achieved 3.49 and 2.24, respectively. This result indicates that, with batch size 32, YOLOv3 has the lowest average deviation between the number of cysts detected and counted and the number of cysts in the DBrasiliensis dataset. From this result, we conclude that YOLOv3 with the batch size set at 32 and the learning rate at 0.001 is the configuration best suited to detecting and counting cysts, since it achieved the best results for Accuracy, R², RMSE, and MAE. We also carried out an analysis of variance (ANOVA) at the .05 significance level, using accuracy as the metric, to determine whether YOLOv3 with the batch size set at 32 maintains its average accuracy across the different testing subsets (see Table 1). The test resulted in a p-value of 0.1008; therefore, we have no evidence of a statistically significant difference in YOLOv3 accuracy across the testing subsets of the DBrasiliensis dataset.
Therefore, taking into account that the testing set of the DBrasiliensis dataset consists of 10 subsets, each associated with the weight of the substrate used to capture its images (see Section DBrasiliensis dataset), we can infer the number of cysts in a new portion of substrate of known weight obtained from the same aquarium from which we took the substrate to build the DBrasiliensis dataset. Thus, a producer or researcher who associates weights and counts can use the same technique to estimate their production.
Inferring the number of cysts in a substrate of known weight is a necessary practice for both research and cultivation, because manually counting thousands of cysts is not feasible. Thus, we can use YOLOv3, with the batch size set at 32, to infer the total number of cysts contained in a substrate by counting the cysts in a certain number of samples, where the number of samples varies according to the accuracy required. Adopting this inference, we are able to count cysts with an accuracy of 83.73% (Table 2). It is up to the producer or scientist to determine the number of samples (sets of images associated with a weight) that best suits their needs.
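One simple way to carry out this inference is a proportional estimate: compute the number of cysts per gram in the counted samples and scale it by the total weight of the substrate. The sketch below follows that idea under the assumption that cysts are distributed uniformly by weight. The text does not prescribe a specific estimator, so the function name, the formula, and the numbers are illustrative only.

```python
def estimate_total_cysts(sample_counts, sample_weights_g, total_weight_g):
    """Extrapolate the number of cysts in a substrate of known weight
    from automated counts on weighed samples.

    Assumes cysts are spread uniformly by weight (a simplification).
    """
    if len(sample_counts) != len(sample_weights_g):
        raise ValueError("each sample needs one count and one weight")
    cysts_per_gram = sum(sample_counts) / sum(sample_weights_g)
    return cysts_per_gram * total_weight_g


# Hypothetical samples: automated counts and their weights in grams.
counts = [100, 50, 50]
weights_g = [0.5, 0.25, 0.25]
estimate = estimate_total_cysts(counts, weights_g, total_weight_g=5.0)
print(estimate)
```

Counting more samples reduces the influence of any single unrepresentative portion, at the cost of capturing more images.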
Although the proposed approach has disadvantages, such as the work required to annotate thousands of cysts and the computational cost of training the model, its benefits in accuracy and counting time support the adoption of automated cyst counting. For instance, YOLOv3 takes around 1 minute and 29 seconds to count the cysts in 135 images, with an accuracy of 83.73% and a precision of 98.24%.
We believe that the results obtained by YOLOv3 with the batch size set at 32 are sufficient to build an automatic cyst detection and counting system, since visual counting of cysts by humans using microscopes is a hard, error-prone, and very time-consuming task. Moreover, it is possible to optimize the process of gauging the number of cysts present in a given medium or substrate by inferring the number of cysts without the need for cleaning, drying, and manual counting.

Conclusion
Given the potential of Dendrocephalus brasiliensis in conservation efforts and production activities, we presented a new technology aimed at improving and facilitating the cyst measurement process. We built a novel annotated image dataset of Dendrocephalus brasiliensis cysts, called DBrasiliensis, and used YOLOv3 and Faster R-CNN to provide a baseline for detecting and counting cysts. To promote research on the automation of cyst measurement, we also reported evidence that the performance of YOLOv3 is superior to that of Faster R-CNN. In addition, we showed that it is possible to infer the total number of cysts, with an accuracy of around 83.73%, from a set of substrate images associated with a known weight. The DBrasiliensis dataset can be accessed at: https://doi.org/10.6084/m9.figshare.13073240.