EMDS-5: Environmental Microorganism image dataset Fifth Version for multiple image analysis tasks

The Environmental Microorganism Data Set Fifth Version (EMDS-5) is a microscopic image dataset that includes original Environmental Microorganism (EM) images and two sets of Ground Truth (GT) images: a single-object GT image set and a multi-object GT image set. EMDS-5 covers 21 types of EMs, each of which contains 20 original EM images, 20 single-object GT images and 20 multi-object GT images. EMDS-5 can be used to evaluate image preprocessing, image segmentation, feature extraction, image classification and image retrieval methods. To demonstrate the effectiveness of EMDS-5, for each task we select the most representative algorithms and evaluation indicators for testing and evaluation. Image preprocessing consists of two parts: image denoising and image edge detection. For image denoising, nine kinds of filters are used to remove 13 kinds of noise. For edge detection, six edge detection operators are used to detect the edges of the images, and two evaluation indicators, peak signal-to-noise ratio and mean structural similarity, are used for evaluation. Image segmentation includes single-object image segmentation and multi-object image segmentation. Six methods are used for single-object image segmentation, while k-means and U-Net are used for multi-object segmentation. We extract nine features from the images in EMDS-5 and use the Support Vector Machine (SVM) classifier for testing. For image classification, we use the VGG16 feature to test SVM, k-Nearest Neighbors and Random Forests. We test two types of retrieval approaches: texture feature retrieval and deep learning feature retrieval. For the latter, we select the last-layer features of the VGG16 and ResNet50 networks as feature vectors. We use mean average precision as the evaluation index for retrieval. EMDS-5 is available at https://github.com/NEUZihan/EMDS-5.git.


Environmental Microorganisms
Environmental Microorganisms (EMs) [1] are a constant part of our environment. Some EMs bring us benefits, while others harm our physical health. Many researchers devote themselves to studying these microorganisms in order to improve our lives. Nowadays EMs are usually observed under a microscope. However, scholars sometimes misjudge what they observe. Image analysis is therefore of great significance for EM images: it can help researchers analyze the types and forms of EMs. For example, Rotifera is a common EM that is widely distributed in lakes, ponds, rivers and other brackish water bodies. Because of its extremely fast reproduction rate and high yield, it is of great significance in the study of ecosystem structure, function and biological productivity. Arcella is another common EM. It mainly feeds on plant giardia and single-celled algae, and an oligoplastic water body is its most suitable living environment. Two EM image examples are shown in Fig 1.

Application scenarios of Environmental Microorganisms
Noise can be generated during the acquisition or transmission of digital images [2]. Image denoising reduces the noise in EM images while preserving their details. Image segmentation is the technique and process of dividing an image into a number of specific regions with unique properties and extracting specific targets [3]. It can therefore be used to separate microorganisms from the complex background of EM images. After segmentation, feature extraction is performed. When the input data is too complex, or the amount of data to be processed is large and redundant, the input data is converted into streamlined features (such as the commonly used 'feature vectors'). Feature extraction is the process of transforming redundant input data into the desired streamlined features [4]. For the segmented EMs, we usually extract shape features, color features or deep learning features, and then use these features for image classification and image retrieval. In image classification, the class of an image is determined by a classifier that has been trained on data with category labels: we feed the extracted feature vectors into the classifier, which matches them against the known data and assigns each image to a group of EMs. In image retrieval, a query image is given and similar images are searched for: we extract the feature vector of the query image and calculate its similarity to the feature vectors of the known data.

Contribution
Environmental surveys are always carried out in outdoor environments where conditions such as temperature and salinity are constantly changing. Because EMs are very sensitive to these environmental conditions, the quality of the observed EMs is easily affected, and it is difficult to collect sufficient EM images [5]. As a result, researchers who want to create EM datasets often run short of data. Some EM datasets already exist, but many of them are not open source, which makes it difficult for EM researchers to obtain existing EM data and forces them to spend much time collecting their own. To the best of our knowledge, there are seven dedicated EM datasets: NMCR [6], CECC [7], EMDS-1 [8][9][10], EMDS-2 [8][9][10][11][12], EMDS-3 [1,13,14], EMDS-4 [15][16][17][18] and EMDS-5 [19,20]. For two of them, we only know the types of microorganisms and the number of samples used in user experiments. The remaining five form our EMDS series. The Environmental Microorganism Data Set Fifth Version (EMDS-5) has been made available to other researchers as an open source dataset. In addition, EMDS-5 has several advantages over other datasets. First, EMDS-5 provides the corresponding Ground Truth (GT) images. Because making GT images takes a lot of time and human resources, many datasets do not provide GT images for their own data. GT images play an important role in image analysis and serve as a significant evaluation reference for image segmentation: a segmentation result can be judged by comparing the segmented image with the GT image. Second, EMDS-5 contains a variety of EM images, which provides sufficient data support for image classification and image retrieval. Multi-class classification experiments on multiple EM species can be carried out to obtain good results. At the same time, the many kinds of EM images and the sufficient amount of data provide strong support for image retrieval results.

Dataset information of EMDS-5
EMDS-5 is made up of 1260 images of 21 EM classes. The 420 original EM images were collected with a 400× optical microscope, partly under artificial light sources and partly under natural light sources. In addition, 840 GT images were manually prepared, including 420 single-object GT images and 420 multi-object GT images. Basic information on the 21 EM classes in EMDS-5 is given in Table 1, and an example of the 21 EM classes is shown in Fig 2. Three researchers from the University of Science and Technology Beijing (China) and the University of Heidelberg (Germany) provided the original image data of EMDS-5. The EMDS-5 GT images were jointly prepared by three researchers from Northeastern University (China), Johns Hopkins University (US) and Huazhong University of Science and Technology (China). All of them have research backgrounds in Environmental Engineering or Biological Information Engineering. The EMDS-5 GT images are manually labelled at pixel level according to two rules:
• Rule A: The area where an EM is located is labelled as foreground (1, white). All other areas are labelled as background (0, black).
• Rule B: Because the microscopic images in EMDS-5 are collected under optical microscopes, the imaging process produces interference fringes, which result in unwanted edges in the EM images. Hence, when making GT images, the most complicated step is determining the edges of an EM. First, each researcher labels the edges that she or he considers the clearest. Then, if their labelling results conflict, they hold a collective discussion to decide on a final solution.

Evaluation of image denoising methods
We add a total of 13 kinds of noise to the original images and then denoise the noisy images with different methods. The noise we have chosen falls into four groups: Poisson noise, multiplicative noise, Gaussian noise and salt-and-pepper noise. Gaussian noise is noise whose probability density function follows a normal distribution. Poisson noise is noise that conforms to the Poisson distribution. Multiplicative noise is caused by random variations in channel characteristics and is related to the signal by multiplication. Salt-and-pepper noise, also known as impulse noise, randomly changes some pixel values; it appears as black and white (dark and bright) dots generated by the image sensor, transmission channel, decoding process, etc. We divide multiplicative noise by its variance into multiplicative noise with a variance of 0.2 and multiplicative noise with a variance of 0.04. We classify Gaussian noise by mean and variance: mean 0 and variance 0.01, mean 0.5 and variance 0.01, mean 0 and variance 0.03, and mean 0.5 and variance 0.03, together with Position Gaussian noise and Brightness Gaussian noise. Similarly, we divide salt-and-pepper noise into pepper noise, salt noise, salt-and-pepper noise with a density of 0.01 and salt-and-pepper noise with a density of 0.03. Together these give 13 kinds of noise, each of which is added to the original images separately and then removed with different filters. An example of the noisy EM images is shown in Fig 3. We use nine different denoising methods and choose the similarity between the denoised image and the original image, and the variance between the two, as the evaluation indexes. The similarity index is expressed by Eq (1) [2].
where A is the similarity, i_1 is the denoised image, i is the original image, and N is the number of pixels. The closer the value of A is to 1, the better the denoising effect. Using the original image above as an example, we list in a table the similarity between the original image and the images obtained after removing each kind of noise with different filters. The names of the noise types and filters are abbreviated in the table, e.g. Types of noise (ToN). The results are given in Table 2.
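The four noise families described earlier in this section can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the [0, 1] intensity scaling, the Gaussian form of the multiplicative noise and the `peak` parameter of the Poisson model are all assumptions made for illustration.

```python
import numpy as np

def add_gaussian(img, mean=0.0, var=0.01, rng=None):
    """Additive Gaussian noise on an image scaled to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img + rng.normal(mean, np.sqrt(var), img.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_multiplicative(img, var=0.04, rng=None):
    """Multiplicative noise (assumed form): img + img * n, n ~ N(0, var)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img + img * rng.normal(0.0, np.sqrt(var), img.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_salt_and_pepper(img, density=0.01, rng=None):
    """Set a fraction `density` of pixels to 0 (pepper) or 1 (salt)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img.copy()
    u = rng.random(img.shape)
    noisy[u < density / 2] = 0.0                       # pepper
    noisy[(u >= density / 2) & (u < density)] = 1.0    # salt
    return noisy

def add_poisson(img, peak=255.0, rng=None):
    """Poisson noise: counts drawn with mean img * peak, then rescaled."""
    rng = np.random.default_rng() if rng is None else rng
    return np.clip(rng.poisson(img * peak) / peak, 0.0, 1.0)
```

Passing an explicit `rng` (for example `np.random.default_rng(0)`) makes the noisy images reproducible, which matters when several filters are compared on the same noisy input.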
From the comparison in Table 2, we find that EMDS-5 can support distinguishable evaluation of different denoising methods. For example, the maximum filter does not perform well on Gaussian noise and multiplicative noise, but it still performs very well on salt-and-pepper noise and Poisson noise. In addition, the mean variance between the denoised image and the original image is an indicator of the stability of denoising methods. The mean variance is expressed by Eq (2) [2].
where l(i, j) and B(i, j) represent the corresponding pixels of the original image and the denoised image, and S represents the mean variance. The comparison of variances between the denoised images and the original image using EMDS-5 is shown in Table 3. From the comparison in Table 3, we find that EMDS-5 is useful for testing and evaluating image denoising methods effectively. For example, increasing the mean value of Gaussian noise results in greater variance between the denoised images and the original images, indicating that the denoising results are not very stable.
In addition, we use well-known Image Quality Assessment (IQA) indicators to evaluate the quality of the denoised images: Peak Signal-to-Noise Ratio (PSNR) and Mean Structural Similarity (SSIM). PSNR is a statistical method of assessing image quality that calculates the difference between the grey value of each pixel of the image to be measured and the corresponding pixel of the reference image. We assume that the image to be evaluated is F, the reference image is R, and both are of size M × N. The calculation of PSNR is expressed by Eq (3) [21].
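As an illustration, the widely used definition of PSNR can be sketched as below. This is a sketch under assumptions rather than a transcription of Eq (3): it assumes 8-bit images (peak value 255), and the function name `psnr` and the parameter `peak` are chosen here for illustration.

```python
import numpy as np

def psnr(f, r, peak=255.0):
    """PSNR in dB between evaluated image f and reference r (same M x N size)."""
    f = np.asarray(f, dtype=np.float64)
    r = np.asarray(r, dtype=np.float64)
    mse = np.mean((f - r) ** 2)      # mean squared error over all M*N pixels
    if mse == 0:
        return float("inf")          # identical images: no distortion
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, a uniform error of 25.5 grey levels gives an MSE of 650.25 and therefore a PSNR of 10 log10(100) = 20 dB.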
PSNR measures the image quality by calculating the global size of the pixel error between the image to be evaluated and the reference image. The larger the PSNR value, the less distortion between the image to be evaluated and the reference image, and the image quality is better. SSIM is a commonly used image quality evaluation method originally proposed in [22]. SSIM is composed of three contrast functions. The brightness contrast function is expressed by Eq (4).
The contrast comparison function is expressed by Eq (5).
The structural contrast function is expressed by Eq (6).
We combine the three functions and finally get the SSIM index function expressed by Eq (8).
where μ_x and μ_y are the mean pixel values of the image blocks x and y; σ_x and σ_y are the standard deviations of the pixel values; σ_xy is the covariance of x and y; and C_1, C_2, C_3 are constants that avoid the systematic error caused when the denominator is 0. SSIM is at most 1 and typically lies between 0 and 1. The larger the SSIM, the smaller the difference between the two images. The comparison of PSNR between the denoised images and the original image using EMDS-5 is shown in Table 4.
The comparison of SSIM between the denoised images and the original image using EMDS-5 is shown in Table 5.
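A minimal sketch of a mean SSIM computation is given below for illustration. It relies on several assumptions that are not stated in the text: the common simplification C_3 = C_2 / 2, the usual constants C_1 = (0.01 L)^2 and C_2 = (0.03 L)^2 with L = 255, and non-overlapping 8 × 8 blocks instead of a sliding window. The function names are hypothetical.

```python
import numpy as np

def ssim_block(x, y, peak=255.0):
    """SSIM of two equally sized blocks, using the simplification C3 = C2 / 2."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = x.mean(), y.mean()              # block means
    vx, vy = x.var(), y.var()                # block variances
    cxy = ((x - mx) * (y - my)).mean()       # covariance of x and y
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

def mean_ssim(a, b, block=8, peak=255.0):
    """Average SSIM over non-overlapping blocks of two grey-level images."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    h, w = a.shape
    scores = [
        ssim_block(a[i:i + block, j:j + block], b[i:i + block, j:j + block], peak)
        for i in range(0, h - block + 1, block)
        for j in range(0, w - block + 1, block)
    ]
    return float(np.mean(scores))
```

Library implementations usually use overlapping, Gaussian-weighted windows, so their values will differ somewhat from this block-based sketch.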

Evaluation of edge detection methods
Edge detection is an important component of image preprocessing. To demonstrate the effectiveness of EMDS-5 for edge detection evaluation, seven operators are used to detect edges in the images of the EMDS-5 dataset: Canny, Laplacian of Gaussian (LoG), Prewitt, Roberts, Sobel, Zero cross and CNN. An example of the edge detection results is shown in Fig 4. We choose two evaluation metrics, PSNR and SSIM, to evaluate the edge detection results. The edge detection result obtained by the Sobel operator is taken as the standard, and the results obtained by the other edge detection methods are compared with it.
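To make the comparison against the Sobel reference concrete, the following sketch computes Sobel and Prewitt edge maps with plain NumPy. The threshold value, the unpadded (valid-region) filtering and the function names are assumptions for illustration; the source does not specify how the operators were configured.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float64)

def filter3x3(img, kernel):
    """Correlate a 2-D image with a 3x3 kernel (valid region, no padding)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * img[di:di + h - 2, dj:dj + w - 2]
    return out

def gradient_magnitude(img, kernel_x):
    """Gradient magnitude from a horizontal kernel and its transpose."""
    img = np.asarray(img, dtype=np.float64)
    gx = filter3x3(img, kernel_x)        # horizontal intensity changes
    gy = filter3x3(img, kernel_x.T)      # vertical intensity changes
    return np.hypot(gx, gy)

def edge_map(img, kernel_x, threshold):
    """Binary edge map: True where the gradient magnitude exceeds threshold."""
    return gradient_magnitude(img, kernel_x) > threshold
```

An edge map from any other operator can then be scored against `edge_map(img, SOBEL_X, threshold)` with PSNR or SSIM, after converting the boolean maps to a common intensity range.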
A comparison of edge detection methods using EMDS-5 is shown in Table 6. From Table 6, the PSNR evaluation index shows that the edge detection results obtained by the Prewitt operator are the most similar to the Sobel results. The SSIM evaluation index shows that the differences between the results of the other operators and those of the Sobel operator are also very small. This comparison shows that EMDS-5 images can be used to test and evaluate various edge detection methods.

Image segmentation evaluation using EMDS-5

Single-object image segmentation
To demonstrate the effectiveness of EMDS-5 for image segmentation evaluation, six typical image segmentation methods are used to segment the EMDS-5 original images: GrabCut, Markov Random Field (MRF), Canny edge detection based segmentation, Watershed, Otsu thresholding and Region growing. GrabCut is a common and classic semi-automatic segmentation method. MRF is a classical graph based segmentation method. Otsu thresholding is a threshold based image segmentation method. Region growing and the Watershed algorithm are classical region based segmentation methods. An example of different single-object segmentation results is shown in Fig 5. We compare the images obtained after image segmentation with the corresponding GT images, using the five evaluation indexes in Table 7 to evaluate the segmentation results [23,24].
In Table 7, V_pred represents the foreground that is predicted by the model; V_gt represents the foreground in a ground truth image. We show the evaluation results of the sample images in Table 8.
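The overlap-based comparison between V_pred and V_gt can be sketched as follows. The exact five indexes of Table 7 are not reproduced in this text, so the measures below (Dice, Jaccard, recall, precision and pixel accuracy) are common choices assumed here for illustration, and the function name is hypothetical.

```python
import numpy as np

def segmentation_scores(pred, gt):
    """Overlap scores between a predicted mask (V_pred) and a GT mask (V_gt)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()     # |V_pred ∩ V_gt|
    union = np.logical_or(pred, gt).sum()      # |V_pred ∪ V_gt|
    n_pred, n_gt = pred.sum(), gt.sum()
    return {
        "dice": 2 * inter / (n_pred + n_gt) if (n_pred + n_gt) else 1.0,
        "jaccard": inter / union if union else 1.0,
        "recall": inter / n_gt if n_gt else 1.0,
        "precision": inter / n_pred if n_pred else 1.0,
        "accuracy": float(np.mean(pred == gt)),
    }
```

Empty masks are scored as a perfect match by convention in this sketch; other conventions are possible.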
From Table 8, it can be observed that the GrabCut method yields low evaluation scores when its segmentation of the original images is compared with the GT images. Among the other classic single-object image segmentation methods, the results of Otsu thresholding and MRF segmentation are the closest to the GT images and give the best effect. The other segmentation methods show a certain gap compared with these two. Through the comparison of these image segmentation methods, we can conclude that EMDS-5 is effective in testing and evaluating image segmentation methods.
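Of the methods compared in this section, Otsu thresholding is compact enough to sketch directly. The version below is a generic textbook formulation for 8-bit grey-level images, not the authors' code; it assumes that the EM is brighter than the background, which may need to be inverted for real EMDS-5 images.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the 8-bit threshold that maximises the between-class variance."""
    hist = np.bincount(np.asarray(gray, dtype=np.uint8).ravel(), minlength=256)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of class 0 up to t
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                  # thresholds with an empty class
    return int(np.argmax(sigma_b))

def otsu_segment(gray):
    """Foreground mask: pixels strictly brighter than the Otsu threshold."""
    return np.asarray(gray) > otsu_threshold(gray)
```

The resulting boolean mask has the same foreground/background convention as the GT images and can be scored against them directly.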

Multi-object image segmentation
For multi-object image segmentation, we use two methods, k-means and U-Net, to test EMDS-5. k-means is an unsupervised learning approach (clustering) and U-Net is a supervised learning method (deep convolutional neural network, DCNN). These two methods are representative of the classic methods in their respective fields. The structure of U-Net is shown in Fig 6 [19].
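As an illustration of the unsupervised baseline mentioned above, k-means segmentation can be sketched by clustering grey-level intensities. The choice of intensity as the only feature, the deterministic initialisation and the fixed iteration count are assumptions for this sketch; the source does not describe the k-means configuration.

```python
import numpy as np

def kmeans_segment(gray, k=2, iters=20):
    """Cluster pixel intensities into k groups; return label image and centers."""
    x = np.asarray(gray, dtype=np.float64).ravel()
    centers = np.linspace(x.min(), x.max(), k)     # deterministic initialisation
    for _ in range(iters):
        # Assign every pixel to its nearest center.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of its assigned pixels.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean()
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return labels.reshape(np.shape(gray)), centers
```

With k = 2, one of the two labels can be taken as foreground and compared with the multi-object GT image.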
U-Net was originally proposed as a DCNN for microscopic image segmentation tasks. Heavy use of data augmentation is at the core of U-Net, which allows more efficient use of the existing annotated samples. In addition, U-Net's end-to-end architecture allows the shallow information of the network to be retrieved. The structure of U-Net is symmetrical and contains both downsampling and upsampling [24]. The left half is the contracting path, in which two 3 × 3 unpadded convolutions, each followed by a ReLU activation function, are repeatedly applied and followed by a 2 × 2 max pooling layer for downsampling. The number of feature channels is doubled at each downsampling step, down to a minimum resolution of 32 × 32. The right half is the expansive path. The upsampling part still has a large number of feature channels, which allow the network to propagate contextual information to the high resolution layers, so that the expansive path is more or less symmetrical to the contracting path, resulting in a U-shaped structure. Each layer in this path contains a 2 × 2 transposed convolution (up-convolution) for upsampling, which halves the number of feature channels, followed by a fusion with the cropped feature map of the layer at the same level and two 3 × 3 convolutions with ReLU activation functions [24]. Since unpadded convolution is used, boundary pixels are lost in every convolution, which is why the feature maps are cropped before fusion.

For these two multi-object image segmentation methods, a comparison is shown in Table 9. It can be seen from Table 9 that U-Net performs much better than k-means in multi-object image segmentation, showing the effectiveness of EMDS-5 for the evaluation of multi-object image segmentation methods.
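The resolution bookkeeping of the contracting path described above can be checked with a few lines of arithmetic. The sketch assumes the 572 × 572 input tile of the original U-Net design, which is not stated in this text; the function name is hypothetical.

```python
def contracting_path_sizes(input_size=572, levels=4):
    """Trace the spatial size through unpadded 3x3 convs and 2x2 max pooling."""
    sizes = []
    size = input_size
    for _ in range(levels):
        size -= 4            # two unpadded 3x3 convolutions remove 2 pixels each
        sizes.append(size)   # feature map size before pooling at this level
        size //= 2           # 2x2 max pooling halves the resolution
    sizes.append(size)       # resolution entering the lowest level
    return sizes

print(contracting_path_sizes())  # [568, 280, 136, 64, 32]
```

Under this assumption the lowest level starts at 32 × 32, which agrees with the minimum resolution quoted in the text, and the four pixels lost per level explain why feature maps must be cropped before fusion.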

Feature extraction evaluation using EMDS-5
We use GT images to localize the target EMs in the original images to test feature extraction methods. Since the GT images comprise single-object GT images and multi-object GT images, feature extraction methods are grouped into two types. An example of original images and target EM images extracted using GT images is shown in Fig 8. First, we randomly select ten images from each EM class as the training set and the other ten as the test set. Then, we extract and compare 12 features, including two color features (HSV (Hue, Saturation and Value) and RGB (Red, Green and Blue) features), three texture features (GLCM (Grey-level Co-

Single-object feature extraction
In Table 10, the accuracies of EM image classification using single-object features are compared.

Multi-object feature extraction
In Table 11, the accuracies of EM image classification using multi-object features are compared. From Tables 10 and 11, we find that when the same RBFSVM classifier is used to classify EM images with different features, different classification results are obtained, showing the effectiveness of EMDS-5 for feature extraction evaluation. In particular, because the VGG16 feature achieves the best effect, we choose it in the following section on classification evaluation.
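The idea of using a GT image to localize the target EM before feature extraction can be sketched as below. The mean RGB vector is only a toy stand-in for the color features evaluated in this section, and both function names are hypothetical.

```python
import numpy as np

def extract_target(image, gt_mask):
    """Keep pixels where the GT mask is foreground; set the background to 0."""
    mask = np.asarray(gt_mask) > 0             # works for 0/1 and 0/255 masks
    return np.where(mask[..., None], image, 0)

def mean_rgb_feature(image, gt_mask):
    """Toy color feature: mean R, G, B over the foreground pixels only."""
    mask = np.asarray(gt_mask) > 0
    return np.asarray(image, dtype=np.float64)[mask].mean(axis=0)
```

A mask with no foreground pixels would produce an undefined mean, so real code should check for that case before calling `mean_rgb_feature`.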

Image classification evaluation using EMDS-5
We use the features extracted from the EMDS-5 data to test the classification performance of different classifiers. As mentioned in Sec. 5, we use the extracted VGG16 features for testing in this section. The VGG16 feature vector is taken from the 16th layer and has a dimension of 1 × 1000. First, we randomly select ten images from each EM class as the training set and use the other ten as the test set. Then, we select 14 commonly used classifiers for EM image classification, including four SVMs, three k-Nearest Neighbors (KNNs), three Random Forests (RFs), two VGG16 and two Inception-V3 classifiers. We combine the extracted VGG16 features with four classic classifiers and compare the results. In addition, four deep learning classifiers are directly compared. For VGG16 and Inception-V3, we divide the data into training, validation and test sets. Then we test the classification accuracy for any two types of EM images. We change the ratio of images assigned to these three sets and test the accuracy separately. The parameters of the four SVM classifiers are shown in Table 12.
Furthermore, a comparison of different classifiers for EM image classification using EMDS-5 is shown in Table 13.
Table 13 compares different classifiers tested with the same feature. The classification results of the two deep learning networks are the best. From the comparison of the results of different classifiers, we can see that EMDS-5 images can be effectively applied to the testing and evaluation of various classification algorithms.
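The ten/ten split per EM class and a simple nearest-neighbour classifier on precomputed feature vectors can be sketched as follows. The sketch assumes integer class labels starting at 0 and uses a plain Euclidean KNN; it does not reproduce the SVM, RF or deep learning classifiers of Table 13, and the function names are hypothetical.

```python
import numpy as np

def split_per_class(labels, n_train=10, seed=0):
    """Randomly pick n_train samples per class for training, the rest for testing."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.array(train), np.array(test)

def knn_predict(train_x, train_y, test_x, k=1):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    d = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]     # indices of the k closest samples
    votes = train_y[nearest]                   # their class labels
    return np.array([np.bincount(v).argmax() for v in votes])
```

With 21 classes of 20 images each, `split_per_class` yields 210 training and 210 test indices, matching the protocol described in this section.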

Image retrieval evaluation using EMDS-5
We use EMDS-5 for image retrieval. Because we use different features, we group the image retrieval methods into two categories: texture feature and deep learning feature based image retrieval approaches. We use Average Precision (AP) [18] to evaluate the retrieval results. AP is derived from the field of information retrieval and is primarily used to evaluate ranked lists of retrieved samples. The definition of AP in our article is shown in Eq (9).
Here, M is the number of relevant EM images, P(k) is the precision at cut-off position k in the ranked list, and rel(k) is an indicator: if the EM image ranked in the kth position belongs to the target type, it takes 1; otherwise, it takes 0. AP thus represents the average of the precision values at the positions of the target type EM images. Our experiment is conducted on 21 types of EM images, so we apply the mean AP (mAP), the average of the per-class APs, to summarize them. During retrieval, we match the feature vector of the query image against the feature vectors of all the images in the EMDS-5 dataset by calculating the Euclidean distance between them. We then calculate the mAP value of the retrieval results for the type of the query image. We display the first 20 images in the search results, in which the frames of the correct images are marked with a color.
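The retrieval procedure and the AP measure described above can be sketched in plain Python. The sketch uses the common form AP = (1/M) Σ_k P(k) · rel(k), which is consistent with the definitions of M, P(k) and rel(k) given here; the function names and the toy data in the usage note are assumptions for illustration.

```python
import math

def average_precision(rel, num_relevant=None):
    """AP of a ranked 0/1 relevance list: (1/M) * sum_k P(k) * rel(k)."""
    m = sum(rel) if num_relevant is None else num_relevant
    if m == 0:
        return 0.0
    hits, total = 0, 0.0
    for k, r in enumerate(rel, start=1):
        if r:
            hits += 1
            total += hits / k          # P(k): precision at cut-off position k
    return total / m

def retrieve(query, database, db_labels, query_label):
    """Rank database vectors by Euclidean distance; return the 0/1 relevance list."""
    order = sorted(range(len(database)), key=lambda i: math.dist(query, database[i]))
    return [1 if db_labels[i] == query_label else 0 for i in order]

def mean_average_precision(ap_values):
    """mAP: the average of the per-query (or per-class) AP values."""
    return sum(ap_values) / len(ap_values)
```

For a ranked list with relevance [1, 0, 1, 0], the precisions at the two hits are 1/1 and 2/3, so AP = (1 + 2/3) / 2 = 5/6.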

Texture feature based image retrieval using EMDS-5
We extract a total of four texture features, GLCM, GGCM, HOG and LBP to test the EMDS-5 image retrieval evaluation function. An example of the image retrieval results based on texture features is shown in Fig 9. Furthermore, the retrieval results of four texture features are demonstrated in Fig 10.

Deep learning feature based image retrieval using EMDS-5
We first extract VGG16 features and ResNet50 features. The selected feature vectors are those of the last layer of the respective network, with a dimension of 1 × 1000. An example of retrieval results based on deep learning features is shown in Fig 11. Furthermore, the image retrieval results with the two deep learning features are shown in Fig 12. We calculate the variance of the mAP for texture feature based image retrieval and the variance of the mAP for deep learning feature based image retrieval. The variance of the retrieval results based on deep learning features is smaller, which shows that deep learning feature based image retrieval is more stable. By comparing the results of different retrieval methods, we can see that EMDS-5 images can be effectively applied to the testing and evaluation of various image retrieval methods.

Conclusion and future work
EMDS-5 is a microscopic image dataset containing 21 classes of EMs. It contains the original images and the GT images of each EM class. The GT images include single-object GT images and multi-object GT images, so each original image has two corresponding GT images. Each microorganism class has 20 original images, 20 single-object GT images and 20 multi-object GT images. EMDS-5 can be used to test denoising methods: we add 13 kinds of noise, such as Poisson noise and Gaussian noise, and use nine kinds of filters to remove them. EMDS-5 can also be used to evaluate edge detection methods: we adopt six edge detection methods and use two evaluation indexes to evaluate the detection results, obtaining good results. In terms of image segmentation, EMDS-5 can evaluate segmentation results thanks to its single-object and multi-object GT images, so we test two parts: single-object image segmentation and multi-object image segmentation. In the single-object part, we use six methods, such as GrabCut and MRF, to segment the original images. In the multi-object part, we use k-means and U-Net for segmentation. We extract nine features from the images in EMDS-5, such as RGB, HSV, GLCM and HOG, and use the LIBSVM classifier to evaluate the extracted features. In the test, we randomly select ten images of each type of EM as the training set and ten images as the test set. In terms of classification, we use VGG16 features to test different classifiers such as LIBSVM, KNN and RF. In terms of image retrieval, we distinguish retrieval based on texture features from retrieval based on deep learning features. For texture features, we select four features, GLCM, GGCM, HOG and LBP, and test them separately. For deep learning features, we select VGG16 and ResNet50 features, taking the last-layer features of these two networks as feature vectors. We use mAP as the evaluation index for the quality of retrieval.
In the future, we will expand the types of microorganisms and increase the number of images of each microorganism class. We hope that we can use the EMDS database to achieve more functions in the future.