Complementary performances of convolutional and capsule neural networks on classifying microfluidic images of dividing yeast cells

Microfluidic-based assays have become effective high-throughput approaches to examining replicative aging of budding yeast cells. Deep learning may offer an efficient way to analyze the large number of images collected from microfluidic experiments. Here, we compare three deep learning architectures for classifying microfluidic time-lapse images of dividing yeast cells into categories that represent different stages in the yeast replicative aging process. We found that convolutional neural networks outperformed capsule networks in terms of accuracy, precision, and recall. The capsule networks had the most robust performance in detecting one specific category of cell images. An ensemble of the three best-fitted single-architecture models achieved the highest overall accuracy, precision, and recall, owing to their complementary performance. In addition, extending the classification classes and augmenting the training dataset can improve prediction of the biological categories in our study. This work lays out a useful framework for sophisticated deep-learning processing of microfluidic-based assays of yeast replicative aging.


It would be better if the authors could bring out the medical significance of this study.
Response: Beginning at Line 1, we added the following: -The budding yeast Saccharomyces cerevisiae is an effective model for studying cellular aging [1,2]. The replicative lifespan of a yeast mother cell is defined as the total number of cell divisions accomplished, or the number of daughter cells produced, throughout its lifetime. Microfluidics is a fast-developing technology for the single-cell monitoring and imaging required in this context. In particular, microfluidic devices offer a partially automated way to monitor cell development and classify cells, which can speed up the otherwise manual process of estimating cell lifespans [3]. Microfluidic images typically have relatively low resolution compared to confocal microscopy images, which are often of high resolution [5], posing unique challenges for microfluidic image processing [4]. For instance, microfluidic device materials, device coating, and device volume and area limitations introduce imaging errors such as blurring, focus drift, and trap deformation. Capturing the full progression of cellular replicative lifespans requires identifying both mother cells and daughter cells across full cell cycles [6]. Low image resolution hinders the automation of this process, demanding time-consuming, manual classification of yeast replicative lifespans. Machine learning, specifically deep learning, could simplify this process.
4. The ensemble method proposed by the authors is actually a combination of models, rather than ensemble learning.
Response: Beginning at Line 313, we added the following: -In machine learning, minimizing bias and variance errors is a challenging task. The weighted average ensemble model is one method to overcome this issue; it relies on two properties in machine learning [39]: an ensemble model can be created such that the bias is decreased at the expense of increased variance, and an ensemble model can be created such that the variance is decreased at no expense to the bias [40]. In general, there are two simple ways to combine several machine learning models into an ensemble model with better performance. The first is to train a model (e.g., a classifier) over multiple subsets of the training dataset, which leads to different models; each individual model then makes predictions on the test dataset, and the results are averaged to form the ensemble prediction. This method is useful when only a single model architecture is available. The other is to train various models on the same dataset and average their results on the test dataset. An ensemble model can attain a synergistic improvement in overall performance, including reproducibility and stability.
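To make the second approach concrete, a minimal weighted-average ensemble sketch in Python/NumPy follows. It is illustrative only: the function name, the equal weights, and the assumption that each model outputs per-class probabilities (softmax) on the same test set are ours, not details from the manuscript.

import numpy as np

def weighted_average_ensemble(prob_predictions, weights):
    # prob_predictions: list of (n_samples, n_classes) probability arrays,
    # one per trained model, all computed on the same test set.
    # weights: one non-negative weight per model.
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                  # normalize to sum to 1
    stacked = np.stack(prob_predictions)               # (n_models, n_samples, n_classes)
    averaged = np.tensordot(weights, stacked, axes=1)  # weighted mean over models
    return averaged.argmax(axis=1)                     # predicted class index per sample

# Hypothetical usage with the three models' softmax outputs:
# y_pred = weighted_average_ensemble([probs_cnn2, probs_cnn13, probs_capsnet],
#                                    weights=[1.0, 1.0, 1.0])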
-Additional references added ([40,41]).

Reviewer #2:

1. The authors compared three deep learning neural network methods. Owing to their complementary performance, an ensemble of the three best-fitted single-architecture models can achieve the highest overall accuracy, precision, and recall. However, this is only a combination of models, so the technical novelty is low.
Response: Starting at Line 62, we added the following: -The purpose of the current work is to compare deep-learning classification models of microfluidic images of dividing yeast cells. We compare three deep-learning neural network approaches for classifying microfluidic trap images into 4 biological categories. This comparative study focuses on the performance of three models: two convolutional neural networks and a capsule neural network. The two convolutional neural networks contain 2 and 13 convolutional layers, respectively. We also investigated ensemble models built from these three models. Because of dataset limitations, we investigated the effect of data augmentation on all three models.
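As an illustration of the kind of training-set augmentation discussed here, below is a minimal sketch using Keras's ImageDataGenerator. The transform ranges are assumptions for illustration, not the settings used in the study; vertical flips are deliberately omitted, since daughter-cell orientation (mduC vs. mddC) is class-defining.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation for grayscale trap images; parameter values
# are assumptions, not the study's actual settings.
augmenter = ImageDataGenerator(
    rotation_range=10,       # small random rotations (degrees)
    width_shift_range=0.1,   # horizontal shift, fraction of image width
    height_shift_range=0.1,  # vertical shift, fraction of image height
    zoom_range=0.1,          # mild zoom in/out
    fill_mode="nearest",     # fill pixels exposed by shifts/rotations
)
# No vertical flips: flipping would turn an mduC image into an mddC image.

# x_train: (n_images, height, width, 1); y_train: one-hot labels.
# model.fit(augmenter.flow(x_train, y_train, batch_size=32), epochs=50)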
2. The motivation for combining the three models should be better explained.
Response: Starting at Line 313, we added the same explanation of the weighted average ensemble model quoted in full in our response to comment 4 above.
-Additional references added ([40,41]).

3. A large number of experiments discuss the comparison results before and after data augmentation, which is worthy of praise. However, it is suggested that the advantages and disadvantages of the different models also be discussed from other angles, such as loss and training time.
Response: Starting at Line 182, we added: -The advantages and disadvantages of the individual models are mainly covered and explained in the Results and Discussion sections. Additional information can also be found in S2 Fig.

4. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service.

Response:
-We have revised the manuscript accordingly with the help of a professional editor.

Response to first submission (reviewers' comments)
This part contains the responses to the reviewers' comments on the first submission (see the PONE-D-19-32184_reviewers_comments.pdf file in the attachments). In each section, a line number points to the reviewer's concern, highlighted in yellow; the reviewer's comments are in light blue, and the authors' responses are in black.

Reviewer #1:
Abstract, Line 171: "cell. Based on this observation, we split all images with two cells into two separate classes; in the first class, the daughter cells are on top of the mother cells (upward-oriented, mduC class), and in the second class the daughter cells are below the mother cells (downward-oriented, mddC class), as illustrated in Fig 2(c)."

Comment: Between lines 77 and 91, the reader is told that the 5 biological classes were introduced first and then combined into 4 classes. Here again it is argued that similarities between exC and mdC were considered, which is why mdC was divided into mddC and mduC. Which came first: labels with the 5 classes or labels with the 4 classes? Please structure the paper in a chronologically correct order of class construction.

Response: Modified accordingly: In many cases, a single mother cell appears as two cells (owing to its dynamic shape and the low image resolution), and the daughter cell is above the mother cell.
Line 179: "dataset. However, the results for mddC and mduC classes are averaged"

Comment: What do you mean by averaged? Do you mean renaming, i.e., if (result_label %in% c("mddC", "mduC")) { new_label <- "mdC" } else { new_label <- result_label }?

Response: Modified accordingly: merged (not averaged); see the relabeling sketch at the end of this block.

Line 188: "was improved to 92%. Moreover,"

Comment: Please mention how much it was before the augmentation.

Response: Modified accordingly (from 87% to 92%).

Line 33: "The output is a vector that the size of the output vector depends on the number of classes"

Comment: Please reformulate this specific sentence accordingly!

Response: Reformulated: the output is a vector whose size depends on the number of classes.
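For clarity, "merged" here amounts to relabeling the two orientation classes back to the single biological class before computing metrics; a minimal sketch follows (the helper name and example labels are hypothetical, not code from the manuscript).

def merge_orientation_classes(label):
    # Collapse the two orientation classes into the biological mdC class.
    return "mdC" if label in ("mddC", "mduC") else label

# Example: predicted labels from any of the models
predictions = ["mC", "mduC", "exC", "mddC", "nC"]
merged = [merge_orientation_classes(p) for p in predictions]
# -> ["mC", "mdC", "exC", "mdC", "nC"]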
Line 35: "because they are mainly designed for 2-dimensional (or higher) input tensors"

Comment: Is this really a justification for the successful use of CNNs in the domain of image classification? What about the hierarchical construction characterizing CNNs that enables such a model to slowly but successfully learn relevant representations adapted to the specific task at hand? Please reformulate this specific sentence by adding a better and more pertinent justification.

Response: The CNN-2 and CNN-13 are used for comparison purposes, considering the effect of the number of layers in the model.

Line 43: "in datasets"

Comment: "involving small-sized datasets." Please correct accordingly.

Response: Corrected to "involving small-sized datasets".

Line 23: "A recent study showed that CapsNet could classify fluorescent microscopic images"

Comment: To what extent? Please be more specific.

Response: For example, max pooling layers take only the most prominent values (e.g., pixels) from a previous convolutional kernel as input to the next layer. A recent study showed that CapsNet could classify fluorescent microscopic images [38]. The model showed improved accuracy on datasets such as MNIST, yet it is computationally expensive, as training time increases substantially. In [19], the authors claimed that CapsNet can achieve near state-of-the-art performance on the MNIST dataset using 10% of the whole dataset.
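To make the pooling behavior described above concrete, here is a minimal NumPy sketch of 2x2 max pooling (illustrative only, not code from the manuscript): only the largest value in each window survives, which is why spatial detail is discarded.

import numpy as np

def max_pool_2x2(feature_map):
    # 2x2 max pooling with stride 2: keep only the largest value in each
    # non-overlapping 2x2 window (assumes even height and width).
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [5, 4, 1, 1],
               [0, 2, 6, 8],
               [3, 1, 7, 2]])
print(max_pool_2x2(fm))
# [[5 2]
#  [3 8]]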
Line 54: "of the top three models"

Comment: How many models have been assessed? If there are just three models, please correct the phrase accordingly.

Response: We showed that an ensemble of the top three models performs better.

Line 56: "could be an effective approach for some models based on the type of dataset and model architecture"

Comment: An effective approach to achieve what exactly? Please be specific.

Response: In addition, dataset augmentation and splitting a class into two classes could be effective approaches for improving classification performance for some models, depending on the type of dataset and the model architecture.
Line 64: "S1 Table"

Comment: Missing table.

Response: Added accordingly.

Line 79: "the 5 categories"

Comment: The authors mean "... the following 5 categories ...". Please correct accordingly.

Response: We trained the deep learning methods using the following 5 categories, based on cell numbers and their relative positions: a trap with no cell (nC), a trap with a single mother cell (mC), a trap with a mother and one upward-oriented daughter cell (mduC), a trap with a mother and one downward-oriented daughter cell (mddC), and a trap with more than two cells (exC).

Line 206: "Augmentation of training data also led to more stable CNN-13 models, as seen when changes of the cost functions during training became smoother with augmented datasets."

Comment: Is there a specific plot that shows this specific aspect of data augmentation for the CNN-13 model? Please provide such a comparison plot (with and without data augmentation).

Response: Refer to S2 Fig.

Line 233: "S1 Table"

Comment: Missing table.

Response: Added accordingly.

Line 233: "We picked the best-performing CapsNet model for this study"

Comment: Still, how was the grid search performed? Was it performed using the test set as validation, or a separate validation set? The grid search should be performed on a validation set, since the test set should not be seen during optimization of the model. If the parameter optimization step was done using the test set, all the depicted experiments should be performed anew using a validation set that does not include any samples belonging to the test set. The proportion of data used as the validation set, as well as the sample selection process, should be described thoroughly.

Response: Refer to S1 Table.

Fig 4.1, Fig 4.2, Fig 4: "The table presents the correct predictions and mispredictions for the mddC and mduC classes, without augmentation, for all three models."

Comment: The current depiction of the results is confusing: how many samples belong to the class mdC? Normally, total number of mdC samples = correct predictions + mispredictions, and this number should not vary from one model to another, since it is the same classification task. But the sum (correct predictions + mispredictions) differs across models: CNN-2: 380 != CNN-13: 371 != 495?

Response: Corrected accordingly (see Fig 5).

Fig 4 (legend): "In the bar graphs, the precision and recall are shown for each class based on the biological interpretation classes (four). The mean and total tested-image results are presented for all models."

Comment: These results seem inconsistent: e.g., CNN-2 accuracy = 780/896 = 87.05%, but in the "Average results (all classes)" bar plot, CNN-2 accuracy > 90%?

Response: Corrected accordingly (see Fig 5).

Line 250: S4