Classification of red blood cell shapes in flow using outlier tolerant machine learning

The manual evaluation, classification and counting of biological objects demands for an enormous expenditure of time and subjective human input may be a source of error. Investigating the shape of red blood cells (RBCs) in microcapillary Poiseuille flow, we overcome this drawback by introducing a convolutional neural regression network for an automatic, outlier tolerant shape classification. From our experiments we expect two stable geometries: the so-called ‘slipper’ and ‘croissant’ shapes depending on the prevailing flow conditions and the cell-intrinsic parameters. Whereas croissants mostly occur at low shear rates, slippers evolve at higher flow velocities. With our method, we are able to find the transition point between both ‘phases’ of stable shapes which is of high interest to ensuing theoretical studies and numerical simulations. Using statistically based thresholds, from our data, we obtain so-called phase diagrams which are compared to manual evaluations. Prospectively, our concept allows us to perform objective analyses of measurements for a variety of flow conditions and to receive comparable results. Moreover, the proposed procedure enables unbiased studies on the influence of drugs on flow properties of single RBCs and the resulting macroscopic change of the flow behavior of whole blood.


Introduction
Amongst all human organs, blood is the most delocalized one, delivering oxygen from the respiratory system to the tissues in the body and transporting carbon dioxide back. On a microscopic scale, this is performed by red blood cells (RBCs) which form the largest fraction of cells in whole blood (% 99%). At rest, RBCs are biconcave discocytes with an average diameter of 8 μm and a height of 2 μm. Due to their flexible membrane, RBCs alter their shape under external stress prevalent in the microvascular network [1,2]. This feature is one of the key properties of RBCs, which allows them to squeeze through geometrical constrictions much smaller than their stress-free shape [3], which is partly an intrinsic property of RBC morphology [4,5] and partly an active adaptation process [6,7]. Although data on the mechanical properties of RBC suspensions is widely known from rheological measurements [8], the linkage to individual cell behavior is limited. Further, the comparison between capillary Poiseuille flow and pure shear flow prevalent in rheometers is difficult. Consequently, mimicking flow under physiological conditions in vitro demands for experimental setups such as PDMS-based microchannels, ubiquitous in lab-on-a-chip devices [9]. In this work, we focus on experiments of individual flowing RBCs, providing a holistic insight to individual cell mechanics. Hereby, the experimental data originate from a previous study on RBC shape geometry [10] and the data is reused for the introduction of a fully automated data analysis approach based on a deep learning convolutional neural network (CNN) [11].
As described in the preceding work, two stable RBC shapes are expected from the measurements: The so-called 'croissant' and the 'slipper' shape [12][13][14]. Their frequency of occurrence highly depends on the imposed flow conditions. Whereas axisymmetric croissants mostly appear at lower flow velocities, non-axisymmetric slippers are observed at higher shear rates [15,16]. However, besides these stable geometries, also a large number of indefinite shapes occur, especially in the so called phase transition range. These outliers or 'other' shapes make a considerable amount of the whole statistics. Due to their large shape variance, these cells cannot be assigned to a mutual exclusive class and can therefore not easily be involved in the machine learning processes. Overcoming this drawback, we present a regression based CNN aiming to distinguish between croissants, slippers and others by applying statistically derived thresholds to the net response. As a result, we obtain an automatically generated RBC phase diagram which relinquishes any subjective user input. Prospectively, we will be able to run comparable studies on RBC shape geometries in microchannels of variable size and to generate unbiased phase diagrams. Above all, data evaluation can then be performed in a highly timeeffective manner.

Ethics statement
Human blood withdrawal from healthy donors as well as blood preparation and manipulation were performed according to regulations and protocols that were approved by the ethic commission of the "Ä rztekammer des Saarlandes" (reference No 24/12). We obtained informed consent from the donors after the nature and possible consequences of the studies were explained.
shape characteristics of RBCs in flow and to detect undefined outliers which occur due to channel imperfections, membrane damages, cell-cell-interactions, shape transitions or transients, and optical ambiguities.
Independent of the particular use case, the architectures of CNNs usually follow the same design rules. They consist of an image input layer followed by a certain number of subsequent convolution stages (cp. Fig 1 and Table 1), and provide so-called interconnected layers forming an artificial neural network (ANN) to combine the convolution data in a final stage before the information is fed to the output layer nodes.
The main stages of a CNN usually consist of several sublayers [19] including the actual convolutional layer, a non-linear rectification 'reLU' layer [20], and a pooling layer [21]. Even though more sophisticated designs [22,23] exist, for RBC shape recognition we restrict ourselves to these layer types keeping the system as simple as possible.
Convolutional layers make use of a number of convolutional kernels (feature maps) of particular size and are optimized to find the major characteristics of a set of various input images being subject to a sophisticated training process (cf. section Training). For instance, this can include horizontal or vertical edges but also more detailed characteristics, such as the 'tail' of a cell, see Fig 2. Convolving an image with a set of specially optimized convolution kernels then results in a number of output images showing clear differences in between the RBC shape classes.
In a mathematical sense, convolutions are linear operations. Thus, CNNs could easily be contracted to a simple linear signal processing system if there were no non-linearities involved. , which undergo a non-linear rectification (reLU layer) before getting down-sampled by a max-pooling procedure (2 × 2, stride 2) to a size of 35 × 35px 2 . The combination of convolution, rectification, and max-pooling are repeated twice using different sets of convolution kernels (see Table 1). The output node then intertwines all resulting subimages by a full interconnection of all available pixel values and maps them to a linear output range. Sizes of subimages (blue/grey) as well as the indications of convolution kernels (black) are chosen to scale, illustrating that kernels obey the characteristic features of input and subimages.
Indeed, the usage of non-linearities resolving higher order dependencies in between the characteristics of a set of distinct input images is one of the key ideas behind neural networks. For CNNs, it is common practice to use a reLU layer to set the negative values of the subimages to zero. This renders image transformations non-unitary and allows to introduce further non-linear evalution branches within the tree-like structure of the system.
After the convolution and rectification stage, the amount of data is fairly increased and usually corresponds to a multiple of the input data. Thus, it is reasonable to reduce data with the aid of pooling layers [21,24]. Within such layer, subsets of each intermediate image are pooled according to their mean or maximum values. Results are then assigned to a target image of smaller size. For our CNN, subsets of size 2 × 2px 2 with a stride of 2px are evaluated according to the maximum value aiming to halve the sizes of intermediate images at the end of each convolution stage. A further advantage of such pooling strategy is to obtain certain tolerance with respect to spatial translations of objects.
As special feature, the here employed CNN makes use of a regression output layer providing a linear transfer function. In contrast to standard classification networks offering a purely binary output, realized e.g. by a logistic transfer function or softmax approach [25], the output node of our CNN is able to take on a range of floating point values. Taking the number space of input images into account (8-bit space), we define our output to move in the same range thus the same order of magnitude. Taking values at this scale, we observe a fast convergence of the training process. Consequently, perfect slippers are defined at value −127 whereas croissants are located at 127. Input images leading to an output value which significantly differs from these targets are assumed to be indefinite and therefore discriminated to be an outlier of type other (cf. section Training).

Training of convolutional neural networks
CNNs are not ready to use if neurons are not 'trained' to solve certain regression or classification problems. This means that initially randomly chosen convolution kernels, weights and bias terms (concluded in the vector σ of free parameters) first need to be reasonably updated. For this purpose, we employ a supervised learning approach which requires three major ingredients: A labeled set of training data, a suitable loss function that has to be minimized, as well as an appropriate optimization strategy. As loss function, the root mean square error (RMSE) is chosen (Eq 1), expressing the cumulated differences between all target and actual output values of the training data set.
Here, a i expresses the actual output of a given input image i and t i the respective, predefined target value.
Finding the global minimum of the RMSE, σ is optimized using a stochastic gradient descent solver with momentum (SGDM), see [26] and [27]. The SGDM algorithm updates the vector σ l in discrete steps, where l denotes the actual iteration: E(σ l ) corresponds to the loss function, respectively the RMSE, α = 0.001 to the constant learning rate, and γ = 0.9 to the momentum term. Achieving a faster convergence, a single optimization iteration takes into account a socalled mini-batch [25,28,29] consisting of a subset (128 images) of the randomly shuffled training data. In this context, a full pass of the training data is called an 'epoch'. A typical temporal evolution of training states is shown in Fig 3 where the loss function is plotted with respect to the number of training epochs. We set a maximum of ten training epochs, since a prolongation to more training epochs rather causes overtraining instead of a gain in We set a maximum of ten epochs since a prolongation to more training epochs yields no gain in performance but rather causes overtraining. As training method, a gradient descent solver with momentum (SGDM) is used. The red line indicates the training loss, whereas the black dots represent the loss of the validation data set (validation loss). The progression of the validation loss serves as an indicator whether the CNN is overtrained, since the training and validation losses would diverge. As loss function, a root-mean-square error is chosen, being a standard approach for regression problems. performance. This overtraining of the neural network can be monitored in the validation loss: A termination criterion of the training status is implemented by a consecutive increase in validation loss five times, since a divergence between training and validation loss is an indicator for not being in the global optimum. Further, we choose a validation frequency (update frequency of validation loss) of 50 mini-batches, to not meet the termination criterion by random fluctuations of the training process. In general, the validation data set consists of data disjoint from the ones for training to guarantee an independent validation of the networks' performance. For our purposes, 5% of the training data set was chosen randomly for validation, whereas the rest was the true training data.
For training, besides slippers and croissants, we additionally define an auxiliary class called 'sheared croissant' (Fig 4). This is due to the fact that the characteristic features of pure croissants and sheared croissants show a larger mutual similarity than pure croissants and slippers. Assuming a linear scale of output values, this subtype is biased towards a value of 64. Contrasting the nomenclature, they are not any type of croissants, but most probably resemble a slipper flipped by 90˚perpendicular to the optical axis. However, due to the lack of information of our 2D images, we cannot make a clear statement on the actual shape.
Our training data set consists of 4,000 manually classified cells (1,500 each for slippers, and croissants, resp. and 1,000 sheared croissants), which is augmented to the doubled number (8,000 in total) by mirroring each image along the centerline in flow direction. Each of those 4,000 initial training cell images represents one distinct cell, i.e. no cell was considered more than once in the training data set. We intentionally have fewer sheared croissants in the training data set than croissants and slippers since it is only an auxiliary cell class not reflected in We estimate perfect slippers to be around the peak of the distribution at % −117, whereas croissants occur around % 115. By fitting the whole spectrum by four Gaussians, we are able to separate the respective contributions of each cell shape class and thus can determine a respective confidence interval. In the lower part, typical cell shapes are depicted for different output value ranges. Starting from the leftmost cell image, we undergo a shape change from slippers (image 1-3) to others (image 4-5) and finally to sheared (image 6-7) and pure croissants (image 8-9). the phase diagram but only to increase the precision of croissant classification. The three subsets (one per cell class) were taken at different pressure drops. Slippers were recorded at 700 -1,000mbar, croissants at 100 -200mbar and sheared croissants at 300 -500mbar. Within these ranges, we applied several different pressure drops to ensure a certain variability of our training data set. All cells of a class are then mixed since they should be identified independent from the applied pressure. Further, we intentionally recorded cells for different lighting conditions by varying the power of the light source randomly (within a range of 50 -80%). Together with a bilinear contrast adjustment of the resulting images (cf. experimental setup), this ensures high contrast dynamics of each image and corrects for minor changes in illumination. Due to constant manual observation during the recording process of the input data, we ensure collecting focused cell images. However, to ensure an invariance of our CNN approach with respect to optical misalignment and to reach a robust algorithm in terms of optical imaging, we also train for defocused cells by recording training images at different positions within a slightly tilted channel. The outcome will then be a neural network with a self-contained training set and thus is perfectly suited for a non-biased analysis.

Experimental setup
For our measurements, dilute suspensions of RBCs are required to observe single cells flowing through microcapillaries. Therefore, capillary blood is drawn with consent from a healthy individual and resuspended in a buffer solution (phosphate buffered saline, PBS, gibco by life technologies). After centrifugation at 1,500 g for 5 minutes, RBCs sediment on the bottom of the sample tube, whereas leukocytes and platelets form a buffy coat on top of these (for further details concerning the blood sampling, see [30]). Since the presence of platelets can alter the dynamic properties of RBCs due to shear-induced platelet activation (SIPA, see [31]), this buffy coat is removed together with the supernatant and the residual pellet of RBCs is again resuspended in PBS. This procedure is repeated three times until the final pellet of cells will be adjusted to a hematocrit of Ht 1% in a base solution of both PBS and bovine albumin (BSA, Sigma-Aldrich) at a concentration of 1 mg/ml, to avoid the well-known glass slide effect, turning discocytous RBCs to echinocytes.
With the aid of a high-precision pressure device (Elveflow OB 1 Mk II), this highly diluted suspension of RBCs is then driven through microcapillaries for a discrete set of 12 pressure drops Δp 2 {20, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000}mbar, yielding various flow velocities, and recorded with a high-speed camera at a framerate of 400 Hz. Over the full pressure range, we record in total 3,090 RBCs and obtain experimental data, independent of the training data set.
The microcapillaries are formed in PDMS (polydimethylsiloxane), have a length of L x = 4 cm and a rectangular cross-section with dimensions L y = 11.9 ± 0.3 μm and L z = 9.7 ± 0.3 μm. The y-direction is perpendicular to the camera axis, whereas the z-direction points inward the camera axis. Several channels are branched from a common reservoir to the fluid inlet, which is connected to the pressure device via flexible tubing.
Post-processing the recorded media involves single particle tracking of cells, including distinction and sorting out non-isolated RBCs (i.e. RBCs being too close to each other, such that hydrodynamic interactions are not negligible). Due to the used optical setup (Nikon CFI Plan Fluor 60× oil-immersion objective, NA = 1.25) camera (Fastec HiSpec 2G) and capture settings, typical cell dimensions are in the range of 80 px. However, to allow for minor (physiological) variations, we crop the individual cell images to a format of 90 × 90 px 2 . Since the channel width is smaller than 90 px, we apply a Tukey window w(y) (3) to the images, causing a smooth fade-out towards the channel walls where y denotes the relative position (0 y 1) in vertical direction of an arbitrary cell image. This reduces the influence of markable edges probably influencing the training and output of the CNN (Fig 2). For image preparation, we additionally map the image intensities to the full 8-bit range, such that the bottom 1% and the top 1% of all pixel values are saturated. This transformation yields higher signal dynamics and equal intensity profiles of all cell pictures and renders our approach more stable regarding slight illumination variations.

Results and discussion
Neural networks are usually benchmarked by opposing their output with manually classified data. For our presented analysis, we compared the CNN-classified cell shapes with a manually ascertained phase diagram, presented first in [10]. This manual result serves as reference for all presented system benchmarks.
As can be seen in Fig 4, the positions of the main populations of slippers (−117), croissants (115), and sheared croissants (% 40) differ significantly from the target values of −127, respectively 127 and 64. Deviations mostly occur due to different microscopy settings such as illumination and focusing. However, since thresholds are applied by statistical means, this shifting is not crucial for the final result. The center values for the occurring populations as well as the background are determined by applying four Gaussian fit functions.
Based on the standard deviations of the Gaussian distributions of the main classes, thresholds can be defined to assigning cells at certain position the respective population. Cells which do not fall in a designated range are classified as others rendering our regression based CNN an outlier tolerant classification system.
In Fig 5, both the output values for the whole image data as well as the corresponding phase diagrams for different thresholds are shown. According to three predefined training classes (slippers, sheared croissants, croissants), we again fit the results of the CNN by four independent Gaussians, one of them accounting for the background.
In Fig 5(a), we labelled all cells within a confidence interval of 1σ (1σ croissant , and 1σ slipper , resp.) to assign the cells to the associated distinct class. In Fig 5(b), manually evaluated data from [10] are depicted as areas, whereas the automatically evaluated data are marked as dashdotted lines. The slipper population is represented in light blue, whereas croissant-labelled cells appear in light red. Although the qualitative behavior of the resulting phase diagram can be reproduced, for a threshold of 1σ we significantly underestimate the fraction of slippers and croissants compared to the result from manual classification.
By a change of these confidence bounds from 1σ to 2σ, we perceive a better accordance of the manually classified croissants and the CNN-evaluated ones. Slippers still tend to be underestimated by the neural network, however, the deviation to the manually obtained graph is smaller throughout the whole velocity regime (Fig 5(c) and 5(d)), causative of predicting a wrong phase transition point (intersection of both graphs). In Fig 5(e) and 5(f), we therefore apply an adapted threshold range for each population, proposing a phase diagram in very good accordance to the benchmark of manually classified cell shapes. Even though a slight deviation to the manually generated phase diagram is still noticeable, both for slippers and The underlying adapted confidence interval is derived from an iterative process, finding a minimum of a predefined cost function. As cost function, we calculate the number of false negatives and false positives for each cell shape class compared to the manual selection. By scanning the center values as well as threshold intervals of each population separately, both values (false positives and false negatives) are summarized to receive a cost value. Consequently, by minimizing this cost value, we obtain the optimal threshold range setting for each population. Finally, we yield adapted ranges at −116.6 ± 28.5 for slippers and 116.8 ± 35.5 for croissants. All deviating individuals either for croissants and slippers are shown in Figs 6 and 7. In total, the associated confusion matrix of our classification is depicted in Fig 8. However, Figs 6 and 7 illustrate that either the CNN as well as the manual evaluation of particular cells include many indifferent decisions especially for the false negative cases. Adjusted for these contrary decisions, it can be assumed that the false negative count rate influenced by subjective input is significantly lower. In contrast, with few exceptions, the false positive rate is in good accordance with the indicated value.
Referring to the final RMSE (Fig 3) which corresponds to a value of % 10 regarding a target value of |± 127|, the false detection rates of slippers are in a very good agreement with the final error of the training process (% 8%). For the croissants, we observe much higher deviations due to the occurrence of sheared individuals which are closely neighbored to the pure croissant population. This especially affects the false positive counts, whereas the false negative count contains a noticeably high amount of indifferent cases.

Future perspectives
Using artificial intelligence for classification issues is a well-known, yet rare approach in biological systems. Although a wide variety of accessible, pre-trained, highly sophisticated neural networks (e.g. AlexNet, ResNet) exist, they tend to be over-engineered for most purposes in this field. Due to the complexity of their architecture with respect to the degrees of freedom, they require millions of input images. Furthermore, they are not designed for regression problems as required for the present study, but rather have a binary classification output. The presented CNN is a step forward to a fully automated analysis of recorded microscopy images associated with a big gain in evaluation efficiency. It moreover constitutes an unbiased classification system for RBC shapes in flow. Any restriction to cell classes is purely artificial in the sense that we train the CNN solely with RBC shapes in flow. Similarly, one could tailor a set of training data for other distinct cell classes or even cell types. We aim to conduct further experiments with different channel geometries and flow conditions, amending the existing phase diagram. The phase diagram and especially the phase transition point between croissants and slippers can be altered adding drugs to the RBC suspension. Since certain drugs, e.g. acetylsalicylic acid, show a strong effect on the membrane structure [32], we conjecture to resolve this feature in phase diagrams of future studies, leading to insights towards mechanical properties of individual RBCs in flow. Following this scope, the change in flow behavior of whole blood under drug influence can be predicted due to the influence on its main constituent, the red blood cells. This might play a key role in finding appropriate drugs to avoid pathogenic incidents, e.g. stenosis. Analogously, we clearly see evidence to detect maladies causing an alteration of cell's intrinsic mechanical properties, e.g. in sickle cell disease, thalassemias and numerous rare anaemias. By slight technical adaptations (e.g. the usage of high-performance graphics cards), it is also possible to classify cells on-the-fly, i.e. in real-time rather than with the presented frame-based approach. Prospectively, the here proposed CNN will be a useful quantitative tool in hematology aiming to investigate cell membrane characteristics.

Author Contributions
Conceptualization: Lars Kaestner, Christian Wagner, Stephan Quint.  Classification of cells using machine learning