Deep learning approach to bacterial colony classification

In microbiology, it is diagnostically useful to recognize various genera and species of bacteria. This can be achieved using computer-aided methods, which make the recognition process more automatic and thus significantly reduce the time needed for classification. Moreover, in cases of diagnostic uncertainty (a misleading similarity in the shape or structure of bacterial cells), such methods can minimize the risk of incorrect recognition. In this article, we apply a state-of-the-art method for texture analysis to classify genera and species of bacteria. This method uses deep Convolutional Neural Networks to obtain image descriptors, which are then encoded and classified with a Support Vector Machine or Random Forest. To evaluate this approach and to make it comparable with other approaches, we provide a new dataset of images. The DIBaS dataset (Digital Images of Bacteria Species) contains 660 images of 33 different genera and species of bacteria.


Introduction
The recognition of bacterial genera and species is crucial, since biological knowledge of microorganisms is extremely important in medicine, veterinary science, biochemistry, the food industry and farming. Although most microorganisms have a positive impact on various areas of life, they can also cause many diseases (including infectious ones). Therefore, automating the recognition process can find application in medical prevention, diagnosis and treatment.
The recognition of microbiological samples is preceded by the culturing process. This phase requires dedicated equipment and chemical agents for staining the samples, and it follows stringent culturing procedures and safety protocols. As a result, we obtain samples that are analyzed in order to discover characteristic features and to classify the particular genera and species of bacteria. The classical laboratory methods of bacteria recognition require expert knowledge and experience. Recognition is a time-consuming process based on a comparative analysis of the obtained samples with reference ones (American Bank ATCC; https://www.lgcstandards-atcc.org).
One of the most important features that can be recognized in the images is the shape of a bacterial cell. We distinguish three basic shapes: cylindrical, spherical and spiral. However, recognizing bacteria based solely on shape would be difficult, because many bacteria share very similar shapes. The second most differentiating feature is the shape and size of the colonies formed by the bacteria. Some bacteria live solitarily, while others live in colonies that are very characteristic in terms of structure and spatial arrangement (e.g. they resemble longer or shorter chains or the letters V, X or Y). However, colony shapes can be vastly irregular, which also hampers recognition. Moreover, some species of bacteria are morphologically diverse, and their cells may have multiple sizes and forms. Therefore, recognizing bacteria species based on the shape of the bacteria and their colonies is challenging even for an experienced specialist and may require additional analysis of other microbiological characteristics. This can be observed in subfigures 7.1 and 7.2 in Fig 1, which present two different species of very similar bacteria from the same genus. Similarities can also exist between two different genera (see subfigures 2.1 and 4 in Fig 1).
The purpose of this article is to propose a computer-aided recognition system for classifying bacteria genera and species. For this purpose, we apply the state-of-the-art method for texture recognition introduced by Cimpoi et al. [1]. This choice is justified by the fact that not only the shape of individual bacteria but also their frequency and the shape of the colony carry important information for the recognition process. This approach addresses a major disadvantage of the existing methods, which are designed to recognize very few species or strains. The main advantage of this method is that it uses a Convolutional Neural Network to obtain the image descriptor. Such a descriptor outperforms handcrafted image descriptors because it contains features learned automatically from millions of images [2]. An additional purpose of this project was to provide a new dataset of bacterial images. The DIBaS dataset contains 660 images of 33 different genera and species of bacteria. This database is publicly available and can be freely used by other researchers. The experiments presented in this article can be reproduced with code available online (https://github.com/bartoszzielinski/deep-fbanks). This code was forked from the code of Cimpoi (https://github.com/mcimpoi/deep-fbanks) and appropriately modified in order to cover the DIBaS database and new classifiers.
The article is organized as follows. In the next section, we review related work on recognizing genera and species of bacteria. After that, we introduce a new, publicly available database of bacterial images called DIBaS. The Method section briefly describes the state-of-the-art method introduced in [1]. The remaining sections present the experiment setup, results and discussion.

Related works
Image-processing and pattern-recognition techniques combined with various types of classifiers are often used as effective tools for the recognition of laboratory samples that occur in the form of images. Among them, there are many methods for the automatic recognition of bacteria species and strains, including statistical methods [3], artificial neural networks [4][5][6] and other machine learning classifiers [7].
In [6], an algorithm identifies the species of bacteria based on their geometric features: circularity, compactness, eccentricity, tortuosity and length-to-width ratio. Moreover, because the shape of the bacillus is not a discriminant feature (due to the same morphology in different species of bacteria), it also takes their color into consideration. Ferero et al. [8] describe a method for the automated recognition of tuberculosis. It is based not only on geometrical features but also on the average color of the images. In their research, the authors tackle the issue of deceptive similarities in bacterial morphology. They show that the color of microorganisms is the key feature for improving the accuracy of recognition.
Another approach, presented in [9], applies two classifiers to pre-segmented scanner images. It uses various measurement features to extract the size and shape of the microorganisms and to classify them into their appropriate morphotype. The first classifier uses single features to analyze relatively simple communities containing only a few morphotypes (e.g., regular rods, cocci, and filaments). The second classifier is a hierarchical tree that uses an optimized subset of features to analyze significantly more complex communities containing greater morphological diversity. These classifiers automatically categorize each cell into one of 11 predominant bacterial morphotypes: cocci, spirals, curved rods, U-shaped rods, regular straight rods, unbranched filaments, ellipsoids, clubs, rods with extended prostheca, rudimentary branched rods, and branched filaments.
Ahmed et al. [10] propose a method for the identification and classification of foodborne pathogens using colony scatter patterns. In the first step, a large set of features is extracted, and then Fisher's criterion is used for dimensionality reduction. In the final step, a Support Vector Machine (SVM, [11]) classifier is used. A similar approach, with Random Forests instead of SVM, is presented in [12]. Moreover, Ates et al. [13] use similar approaches to count the number of bacteria colonies, using the compactness ratio of the clusters for the particular species of bacteria. An overview of those methods was written by Sommer et al. [14].
Most of the described methods are used to recognize very few species or strains (sometimes only one, e.g. tuberculosis). Moreover, in many cases the algorithms rely on morphological features combined with some classification method. Therefore, their applicability is very limited.
The approach used in this article is much more robust and can be used with any genera and species of bacteria.

DIBaS dataset
Digital Images of Bacteria Species dataset (DIBaS) contains 33 bacteria species with 20 images for each of them. It was collected by the Chair of Microbiology of the Jagiellonian University in Krakow, Poland (http://www.km.cm-uj.krakow.pl/). Table 1 summarizes the genera and species of the bacteria in this dataset while Fig 1 presents fragments of the images.
All of the samples were stained using Gram's method. The images were taken with an Olympus CX31 Upright Biological Microscope equipped with an SC30 camera (Olympus Corporation, Japan). They were evaluated using a 100× objective under oil immersion (Nikon50, Japan).

Method
Visual representations based on orderless aggregation of local features, which were originally developed as texture descriptors, have had a widespread influence on image recognition algorithms. They have been successfully applied to a huge variety of visual domains, including problems closer to "texture understanding" such as material recognition, as well as domains such as object categorization and face identification, despite the fact that at first glance the characteristics of an image of a face share very little with those of a texture.
In this article, we apply the state-of-the-art texture model to the problem of bacteria species classification. The choice of this method is justified by the fact that colonies of different bacteria species reproduce in a particular manner, resulting in different textures (see Fig 1).
We apply the approach proposed by Cimpoi et al. [1], which revisits Fisher Vectors (FV, [15]), a classic texture representation, in the context of deep learning. The successive steps of this approach are as follows: extracting local image descriptors; producing a single feature vector using an orderless pooling encoder; and classifying with SVM or Random Forest. The flowchart of this approach is presented in Fig 2. Below, we describe those three steps in detail.
Local image descriptors: We consider two types of local image descriptors. The first type consists of SIFT descriptors [16] extracted densely from the image (DSIFT, [17]). SIFT is a histogram of the image gradients quantized with respect to their location within a patch as well as to their orientation. DSIFT is obtained by sampling with a step of 2 pixels, resulting in $W/2 \times H/2$ vectors of dimension 128, where $W$ and $H$ are the width and height of the image, respectively. The second type consists of the deep features extracted by truncating a Convolutional Neural Network (CNN) at the level of the last convolutional layer. A CNN can be seen as a composition $\phi_K \circ \cdots \circ \phi_2 \circ \phi_1$ of $K$ functions or layers. The output of each layer $x_k = (\phi_k \circ \cdots \circ \phi_2 \circ \phi_1)(x)$ is a descriptor field $x_k \in \mathbb{R}^{W_k \times H_k \times D_k}$, where $W_k$ and $H_k$ are the width and height of the layer and $D_k$ is the number of kernels. Therefore, applying the CNN to an image $x$ yields $W_k \cdot H_k$ vectors of dimension $D_k$ (assuming that the $k$th layer is the last convolutional layer). We consider three CNN architectures: AlexNet [2], VGG-M [18] and VGG-VD [19], all of them trained on the ImageNet dataset (http://www.image-net.org/) [20].
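To make the descriptor counts above concrete, the following minimal sketch flattens a descriptor field of shape W_k × H_k × D_k into W_k · H_k local descriptors of dimension D_k, and counts dense SIFT descriptors for a given sampling step. The function names are hypothetical and the array is random placeholder data; in practice the field would be the output of the last convolutional layer of a pretrained network, which is not reproduced here.

```python
import numpy as np

def field_to_local_descriptors(field):
    """Flatten a (W_k, H_k, D_k) descriptor field into (W_k * H_k, D_k) local descriptors."""
    w_k, h_k, d_k = field.shape
    # Each spatial position (i, j) becomes one row: a D_k-dimensional local descriptor.
    return field.reshape(w_k * h_k, d_k)

def dsift_descriptor_count(width, height, step=2):
    """Approximate number of dense SIFT descriptors for a given sampling step.

    Border effects are ignored, which is an assumption of this sketch.
    """
    return (width // step) * (height // step)

# Placeholder field standing in for the output of a last convolutional layer.
rng = np.random.default_rng(0)
field = rng.standard_normal((6, 5, 8))
descriptors = field_to_local_descriptors(field)
```

The spatial layout is discarded on purpose: the pooling encoder described next treats the descriptors as an orderless set.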
Pooling encoder: A pooling encoder takes the local descriptors extracted from an image $x$ as input and produces a single feature vector $\phi(x_k)$ as output. Such a vector is suitable for classification, especially with SVM. We use FV, which consists of: fitting a Gaussian Mixture Model (GMM) to the descriptors; softly assigning the descriptors to the GMM components; and computing first and second order statistics for each GMM component.
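The soft assignment and the first and second order statistics can be sketched as follows. This is a minimal sketch under the common formulation with a diagonal-covariance GMM, not the code used in the experiments: the GMM parameters are passed in rather than fitted, the power and L2 normalization that are often applied afterwards are omitted, and the function name and toy data are assumptions.

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Encode local descriptors as a Fisher vector.

    X: (T, D) local descriptors; weights: (K,) mixture weights;
    means, variances: (K, D) parameters of a diagonal-covariance GMM.
    Returns a vector of length 2 * K * D.
    """
    T = X.shape[0]
    diff = X[:, None, :] - means[None, :, :]                  # (T, K, D)
    # Log-density of every descriptor under every diagonal Gaussian, shape (T, K).
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    # Soft assignments (posteriors), computed stably in the log domain.
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    q = np.exp(log_post)
    q /= q.sum(axis=1, keepdims=True)                          # (T, K)
    z = diff / np.sqrt(variances)                              # whitened differences
    # First and second order statistics for each GMM component.
    u = np.einsum("tk,tkd->kd", q, z) / (T * np.sqrt(weights)[:, None])
    v = np.einsum("tk,tkd->kd", q, z ** 2 - 1.0) / (T * np.sqrt(2.0 * weights)[:, None])
    return np.concatenate([u.ravel(), v.ravel()])

# Toy data: 50 descriptors of dimension 4 and a hypothetical 3-component GMM.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
weights = np.array([0.5, 0.3, 0.2])
means = rng.standard_normal((3, 4))
variances = np.ones((3, 4))
fv = fisher_vector(X, weights, means, variances)  # length 2 * K * D = 24
```

The length of the result depends only on the number of components and the descriptor dimension, not on the number of descriptors, which is what allows images of different sizes to be compared by a single classifier.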
Classifier: After extracting the FV representation, a set of classifiers is used. The one-vs-all linear SVM with C = 1 (the classifier originally used by Cimpoi et al. [1]) is called "Original" later in this article. In addition, a one-vs-one SVM is investigated with various kernels and their optimal parameters. These classifiers are called "Linear SVM", "Polynomial SVM" and "RBF SVM", respectively. We use SVM because it has shown stable performance for many important tasks in bioinformatics, such as protein remote homology detection [21], DNA binding protein identification [22], and recombination spot identification [23]. We also use Random Forest (with the AdaBoost algorithm) in order to test an ensemble approach. The parameters of the models are computed with Bayes optimization [24].
The representations analyzed in this article are labeled as pairs X-Y, where X is a pooling encoder and Y a local descriptor. For example, FV-SIFT denotes the Fisher vector encoder applied to densely extracted SIFT descriptors, while FV-CNN denotes the same encoder applied on top of the CNN. The possible values of CNN are AN for AlexNet, M for VGG-M and VD for VGG-VD.
We additionally use the output of the penultimate fully connected layer (we do not use the last layer, because it corresponds to the classification into the 1000 classes of the ImageNet dataset). Although this approach does not use a pooling phase, we label it as FC-CNN for consistency (FC in this case stands for fully connected layer).

Experiment setup
To verify whether the method dedicated to texture recognition [1] can be successfully applied to classify species of bacteria, we perform two experiments based on the DIBaS dataset.
The goal of the first experiment is to verify which of the local image descriptors and pooling encoders result in the best accuracy. For this purpose, the images of each bacterial species are randomly divided into equally sized training and test sets. There are 330 images (10 for each bacteria class) in both the training and the test set. For classification, we use SVM or Random Forest with three types of representations: FV-SIFT, FC-CNN and FV-CNN (see the previous section for details). The experiment is repeated twenty-five times in order to obtain stable results. We compute accuracy not only for the particular representations but also for their combinations, since according to Cimpoi et al. [1] they are complementary (e.g. the concatenation of FC-M and FV-M is called FCFV-M).
In the second experiment, we examine the scalability of the considered method. We analyze this property because there exist many more than 33 bacterial species and we plan to extend the DIBaS dataset in the future. To predict the decrease in accuracy caused by an increased number of species, we analyze the accuracy of the method applied to subsets of our dataset. More precisely, we randomly select n out of 33 species, for n = {3, 6, 9, ..., 31}, and train a linear SVM with C = 1 (the original approach) that classifies only those n species. We repeat this procedure 25 times for each n in order to reduce the variance of the results.
Table 2. The accuracy (%) of the presented approaches. Each row corresponds to a different descriptor. The first part of the label represents the type of pooling encoder, while the second part is the type of local descriptor. Each column corresponds to a different classifier. The Original classifier corresponds to a linear SVM with C = 1 (the classifier originally used by Cimpoi et al. [1]). The other classifiers use Bayes optimization (see the Method section for details). The best results for each classifier are in bold.
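The sampling protocols of the two experiments described in this section can be sketched as follows. This is a minimal sketch that only reproduces the splitting logic on integer placeholder labels (33 classes with 20 images each); the function names and the fixed random seed are assumptions, and feature extraction and classifier training are left out.

```python
import random

def split_per_class(labels, n_train_per_class, rng):
    """Randomly split sample indices so that each class contributes
    n_train_per_class samples to the training set and the rest to the test set."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        shuffled = list(indices)
        rng.shuffle(shuffled)
        train.extend(shuffled[:n_train_per_class])
        test.extend(shuffled[n_train_per_class:])
    return sorted(train), sorted(test)

def sample_species_subset(species, n, rng):
    """Randomly choose n distinct species for the scalability experiment."""
    return sorted(rng.sample(species, n))

# Placeholder labels: 33 species with 20 images each (660 images in total).
labels = [species for species in range(33) for _ in range(20)]
rng = random.Random(0)

# First experiment: 25 random splits with 10 training images per species.
splits = [split_per_class(labels, 10, rng) for _ in range(25)]
train_idx, test_idx = splits[0]

# Second experiment: one random subset of n = 6 species.
subset = sample_species_subset(list(range(33)), 6, rng)
```

Splitting within each class keeps the training and test sets balanced, so overall accuracy is not dominated by any single species.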

Results and discussion
The results of the first experiment are presented in Table 2. They show that either concatenating FC-M, FV-M and FV-SIFT or using FV-M alone gives the highest accuracy (sometimes greater than 97%) for all of the classifiers. One can also observe that the one-vs-one linear SVM with Bayes optimization works slightly better than the one-vs-all linear SVM without optimization; however, the difference is not significant. Moreover, polynomial and RBF kernels do not separate the classes better than the linear kernel, which is probably caused by the small number of training examples (only 10 for each class). Such a small amount of training data is also the reason for underfitting in the case of Random Forest. Based on the results of the first experiment, we select three methods of feature extraction for further analysis.
Per-class accuracy for the Linear SVM and Original classifiers is presented in Fig 3. The lowest accuracy is obtained for the genera and species labeled 7.1, 18.2 and 19.1 in Fig 1. Examples of incorrect recognitions are presented in Fig 4. They are usually caused by similarity in bacterial morphology. In particular, the genera and species labeled 7.1, 18.2 and 19.1 may look very similar in their spatial layout in the microscopic image, since all of them are spherical bacteria that can appear individually or in characteristic post-division arrangements whose shape is related to the plane and number of cell divisions during propagation. This is probably why the program sometimes misclassifies them. Fig 5 presents the confusion matrix for the best method, FCFV-M & FV-SIFT (with the Original classifier), which confirms that classification errors occur mostly within a genus, while errors between different genera are very rare.
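For readers who want to compute the same kind of summaries as those shown in Fig 3 and Fig 5, a minimal sketch of a confusion matrix and per-class accuracy is given below. The function names and the toy labels are illustrative assumptions and are not taken from the experiments.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows correspond to true classes, columns to predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]

def per_class_accuracy(matrix):
    """Fraction of correctly classified samples for each true class."""
    accuracies = []
    for i, row in enumerate(matrix):
        total = sum(row)
        accuracies.append(row[i] / total if total else float("nan"))
    return accuracies

# Toy labels for two hypothetical classes "A" and "B".
classes = ["A", "B"]
true_labels = ["A", "A", "A", "B", "B"]
predicted_labels = ["A", "A", "B", "B", "B"]
matrix = confusion_matrix(true_labels, predicted_labels, classes)
accuracies = per_class_accuracy(matrix)
```

Off-diagonal entries show which pairs of classes are confused, which is how errors within a genus can be told apart from errors between genera.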
The results of the second experiment are presented in Fig 6. We can observe that the accuracy of the considered method decreases almost linearly with the number of classes. We performed a classical linear regression to estimate the accuracy for a database with 100 classes. The estimate confirms that the method can be used with decent accuracy even if the number of classes triples.
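The extrapolation step can be sketched with an ordinary least-squares line fit. The numbers below are synthetic placeholders chosen to lie exactly on a line; they are not the measurements from Fig 6, and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept

# Synthetic placeholder data: number of classes vs. accuracy (%).
n_classes = [3, 6, 9, 12]
accuracy = [99.7, 99.4, 99.1, 98.8]
slope, intercept = fit_line(n_classes, accuracy)
estimate_100 = predict(slope, intercept, 100)
```

Such an estimate extrapolates well beyond the observed range of 33 classes, so it is only as reliable as the assumption that the linear trend continues.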

Conclusion
In this paper, we apply the state-of-the-art texture recognition method to the problem of classifying bacteria genera and species. To evaluate this approach and to make it comparable with other approaches, we provide a new dataset of images called DIBaS, which is available to other researchers.
According to the performed experiments, the best approach can be successfully used by microbiologists in their daily practice, as the recognition accuracy is 97.24 ± 1.07%. Furthermore, this method can still be used when the database size triples, as its accuracy decreases only linearly with the database size.
Future work will concentrate on extending the DIBaS database and on extending the investigated method with information about the color distribution, which should further improve the accuracy of the recognition.