Segmentation of HE-stained meningioma pathological images based on pseudo-labels

Biomedical research is inseparable from the analysis of histopathological images, and hematoxylin-eosin (HE)-stained images are among the most basic and widely used types. At present, however, machine learning approaches to analyzing such images rely heavily on manual labeling of the training images. Fully automated processing of HE-stained images remains challenging because the color intensity, size and shape of the stained cells are highly variable. To address this problem, we propose a fully automatic pixel-wise semantic segmentation method based on pseudo-labels, which aims to significantly reduce the manual cell sketching and labeling work required before machine learning while maintaining segmentation accuracy. First, we collect reliable training samples in an unsupervised manner based on K-means clustering results; second, we use a full mixup strategy to augment the training images and train a U-Net model that segments the nuclei from the background. Experimental results on a meningioma pathology image dataset show that the proposed method performs well and that the pathological features obtained statistically from the segmentation results can be used to assist in the clinical grading of meningiomas. Compared with other machine learning strategies, it can provide a reliable reference for clinical research more effectively.


Introduction
In recent years, tumor morbidity and mortality have increased rapidly in the global population [1]. Early detection and accurate diagnosis can make treatment more effective and thereby increase the chances of survival, and the analysis of tumor histopathological images is the gold standard for tumor diagnosis [2]. Hematoxylin-eosin (HE) staining is one of the most commonly used techniques for observing pathological paraffin sections, especially in the analysis of microscopic histopathological images of tumor tissue [3], in which the nucleus is stained with hyacinthine by alkaline hematoxylin while the cytoplasm is stained red by acidic eosin. In this way, the differences between the various cell structures in the tissue are effectively magnified. In HE-stained histopathological images, the morphology of the nucleus is the basis for judging the nature of cancer. For example, the size, shape, and density of the nucleus affect the qualitative analysis of the tumor. However, it is difficult for pathologists to analyze massive amounts of HE-stained data, and the results of the analysis are susceptible to subjective factors [4].
Researchers are seeking breakthroughs in the segmentation of nuclei in pathological images because nuclear detection and segmentation are the most basic and critical steps in pathological image analysis. Most morphological methods are based on thresholding, the watershed algorithm, statistical level sets, active contour models, or a combination of these approaches [5][6][7][8][9]. These methods provide excellent segmentation results under certain conditions. However, in the actual analysis of HE-stained tissue images, they face the following challenges. First, because HE staining is strongly affected by external factors, the staining of nuclei varies widely. Second, the lack of a clear nuclear boundary makes these methods extremely prone to over-segmentation. Third, the diversity of nuclear shapes makes it difficult to establish a stable shape model.
With the rapid development of machine learning, image segmentation has been reformulated as the task of labeling every image pixel through learning. The methods fall into two major categories: 1) Classic machine learning segmentation methods: feature sets are extracted using well-designed models and fed into classic machine learning methods, which assign a category to each pixel in the whole image. 2) Deep learning methods for image segmentation: typical image patches are manually selected as the training dataset and fed into a neural network model to segment different cell parts such as the nucleus, cytoplasm and extracellular space (ECS); the trained network is then used to segment all other unmarked images.
Among classic machine learning approaches, Mittal et al. [10] used a super-pixel clustering method to segment cell nuclei and a gravitational search algorithm to optimize the cluster centers. Qu et al. [11] proposed a method based on a pixel-wise support vector machine (SVM) classifier for segmenting tumor nests and the stroma. Meanwhile, following recent breakthroughs in the analysis of natural and pathological images with deep learning, various deep learning methods for nuclear segmentation have emerged. In segmentation tasks, convolutional neural networks (CNNs) were initially used only for feature extraction; with the emergence of fully convolutional networks (FCNs) [12], semantic segmentation gradually became a mainstream segmentation approach, and various improved models appeared. Chen et al. [13] presented a deep contour-aware network (DCAN) that uses a multilevel contextual FCN to generate multiscale feature representations in an end-to-end manner. By modifying the FCN model, Qu et al. [14] obtained a full-resolution convolutional neural network (FullNet) that retains more detailed information. The U-Net proposed by Ronneberger et al. [15] brought new vitality to the segmentation of pathological images, and many optimized versions have been produced. Pan et al. [16] extended U-Net with atrous depthwise separable convolution (AS-UNet) for nuclear segmentation. Study [17] proposed the residual channel attention U-Net (RIC-UNet), a U-Net-based network that applies residual blocks as well as multiscale and channel attention mechanisms to enhance segmentation accuracy.
The above methods all improve the model itself, but one essential problem of deep neural networks remains unsolved: the need for a large amount of manually labeled data. This burden is especially heavy in medical image analysis, because annotating medical images requires a high level of professional expertise, so the efficiency of training dataset preparation needs to be improved further. To this end, we propose a fully automatic nuclear segmentation method for pathological images based on pseudo-labels, as shown in Fig 1. First, K-means clustering based on pixel features is performed on the images, and training sample patches are then grabbed according to both the established rules and the clustering results. Second, a novel training algorithm without a fixed number of epochs is designed, and the obtained training sample patches are fed into the model for training, with the full mixup operation applied according to a probability. Then, the test images are cut into overlapping patches of the same size and passed to the model to obtain the prediction results, and the foreground segmentation results of the nuclei are obtained by stitching. Finally, the hybrid watershed algorithm is used to segment the boundaries between nuclei, which yields the final segmentation result from which the relevant pathological features are counted. Compared with both traditional machine learning and deep learning methods, the use of unsupervised clustering replaces the tedious process of manually annotating the training sets, and the full mixup strategy addresses the poor generalization ability of the automatically acquired training patch sets.

Materials and methods
In this section, we mainly focus on the detailed process of constructing training samples and their pseudo-labels, which automatically grabs reliable patches from complete HE-stained images. To solve the generalization problem, a series of training and testing protocols is proposed.

Acquisition of HE-Stained images and nuclear annotation
In this work, we used meningioma pathological images as the experimental subjects. The World Health Organization classifies meningiomas into three grades: benign meningiomas (WHO I), atypical meningiomas (WHO II), and anaplastic or malignant meningiomas (WHO III). The first grade is regarded as low-grade, and the second and third grades are categorized as high-grade meningiomas [18]. Consequently, there are clear differences between high- and low-grade meningiomas, which makes them highly suitable for testing the generalization ability of the proposed method.
Dataset preparation was performed on tissue samples from two groups, high- and low-grade meningiomas, obtained from different clinical patients at Fujian Medical University Union Hospital. The local ethics committee granted ethical approval for the study (Certificate No. 2019KJTYL024), and informed consent was obtained. Microscopic slices were stained with HE following the standard histological procedure described in [19], and RGB images were then captured and stored as 1536 × 2048 × 24-bit TIFF files. In total, we obtained 60 HE-stained tissue sections, comprising 30 high-grade and 30 low-grade meningiomas. For each slice, images of the five most representative areas, as determined by professionals, were collected to form an HE image set. Therefore, a total of 300 (60 groups × 5) HE-stained images were used in this work.
To include as diverse a range of nuclear appearances as possible, we randomly selected 20 images, each from a different group, from the 60 groups of 300 HE-stained pathological images captured from both high- and low-grade meningioma samples. The selected dataset included 10 high-grade and 10 low-grade meningioma images. A 1536 × 2048 HE-stained pathological image contains approximately 800 nuclei to be annotated, so there are about 16,000 nuclei in the 20 images. We used the software MaZda 4.6 [20] to annotate these nuclei, and the annotators were two doctors from Fujian Medical University Union Hospital.

Training sample selection process based on unsupervised learning
This section introduces a method for the unsupervised selection of training samples, which automatically selects the most reliable patches containing nuclei, together with their labels, based on the staining characteristics of each slice. These samples are then used to train the deep learning model for nuclei segmentation of other HE-stained images.
Feature selection. There are two kinds of quality problems in HE-stained images: uneven distribution of dyes in tissues, and salt and pepper noise caused by small dye particles. According to these problems, the image is filtered by a combination of a 5 × 5 median filter [21] and a 5 × 5 Gaussian filter [22] to eliminate noises. Earlier studies of our group have proven that the selected feature sets are effective for segmentation of the nuclei [23]. The two-color channels R and G provide much more information for the classification result than the B color channel. Therefore, to reduce computational cost and improve the effectiveness of the proposed pipeline, intensities of color channels R and G are selected and mapped into the two-dimensional feature space to form a two-dimensional feature set.
Acquisition of stable color areas of the nuclei. Manhattan distance-based K-means clustering is applied to automatically obtain the stable color areas of the nuclei. There are three different cell structures in the HE-stained images, namely, the nuclear area, the cytoplasmic area and the ECS, and there is no clear boundary between them. To obtain more reliable nuclear areas for building the training set, we first set the number of clusters to five and randomly initialize the clustering centers. Since, in practice, the norms of the feature vectors of the nuclei, cytoplasm and ECS are in ascending order, the mean values of the clustering centers follow the same order. The following five pixel-level categories are thus obtained from clustering: the nuclear stable color area, the nuclear-cytoplasmic fuzzy area, the cytoplasmic stable color area, the cytoplasmic-ECS fuzzy area, and the ECS stable color area.
For the automatic selection of nuclear patches, precision is the decisive criterion when clustering nuclear areas, whereas a higher recall merely gives more options when selecting suitable training samples. We extract several representative HE-stained images and compare the nuclear areas obtained by clustering with the manually sketched nuclear areas. Fig 2 shows the distribution of the stable color pixels of nuclei in different images. The olive areas, which are the deeply stained parts of the stable color areas of the nuclei obtained by clustering, can be used as candidate areas for training samples.
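The clustering step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `manhattan_kmeans`, the iteration limit and the seed are hypothetical, and the centers are updated with the cluster mean, which is an assumption (the text only states that the Manhattan distance is used). Clusters are renumbered by ascending center mean so that index 0 corresponds to the darkest cluster, following the ordering argument in the text.

```python
import numpy as np

def manhattan_kmeans(features, k=5, n_iter=50, seed=0):
    """Cluster (n, 2) R/G feature vectors using L1 assignment and mean update."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    # Random initialisation: pick k distinct pixels as starting centres.
    centres = features[rng.choice(len(features), size=k, replace=False)]

    def assign(c):
        # Manhattan (L1) distance from every pixel to every centre: shape (n, k).
        dist = np.abs(features[:, None, :] - c[None, :, :]).sum(axis=2)
        return dist.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centres)
        new_centres = centres.copy()
        for c in range(k):
            members = features[labels == c]
            if len(members):              # keep the old centre if a cluster empties
                new_centres[c] = members.mean(axis=0)
        converged = np.allclose(new_centres, centres)
        centres = new_centres
        if converged:
            break

    labels = assign(centres)
    # Renumber clusters by ascending centre mean: 0 = darkest (nuclear stable
    # color area), k - 1 = brightest (ECS stable color area).
    order = np.argsort(centres.mean(axis=1))
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)
    return rank[labels], centres[order]
```

For an RGB image array `img` of shape (H, W, 3), the features would be `img[..., :2].reshape(-1, 2)` and the returned labels can be reshaped back to (H, W). For full 1536 × 2048 images the distance matrix is large, so fitting the centers on a pixel subsample may be preferable.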
Collection of training image sets. Since our task is to segment the whole nuclei areas, we further merge the above five categories obtained by clustering. First, pixels belonging to the stable color areas of the nuclei can be used as the foreground of the training samples, so no changes are made. Second, pixels belonging to the nuclear-cytoplasmic fuzzy areas include both the nuclear and cytoplasmic pixels, so they are defined as fuzzy areas. Finally, pixels belonging to the cytoplasmic stable color areas, the cytoplasm-ECS fuzzy areas, and the ECS stable color areas are combined into one category, which is defined as the nonnuclear area. Detailed integration results are demonstrated in Table 1.
The automatic patch collection is shown as pseudocolor images of the different categories in Fig 3. The training image sets are built according to the following screening rules. We slide a 48 × 48 pixel window with a stride of 8 pixels over the entire pseudocolor category label image. When the window contains more than 100 nuclear stable color area pixels and the proportion of fuzzy-area pixels is less than α, the windowed original HE image is captured as a training image patch, and the corresponding region of the pseudocolor category label image is captured as its label. Since the fuzzy areas mostly surround the nuclear stable color areas, they are difficult to avoid completely. To minimize their impact on training, we set α to 0.05. The result is a set of training image patches with corresponding training label patches.
Because the automatically collected training sets contain only the most deeply stained parts, using them directly for model training may cause under-segmentation of lightly stained nuclei. It is usually difficult to distinguish some light blue nuclei from the surrounding cytoplasm. Therefore, accurate segmentation of lightly stained nuclei is the key challenge of nuclear segmentation. We design a series of training and testing protocols for the training sets to improve the generalization ability of the proposed model and to maximize its ability to resolve lightly stained nuclei.
Convolutional neural network model. We use the U-Net [15] network model, one of the most widely used networks in image segmentation, to verify the proposed pseudo-label construction method. The U-Net model is an improvement and extension of the FCN; it uses convolution layers and pooling layers for feature extraction and then uses deconvolution layers to restore the image size. The model is a mature baseline for semantic segmentation, and its effectiveness has been thoroughly verified. The simple structure of U-Net minimizes the influence of the segmentation model itself, so the proposed pseudo-label method can be verified more precisely.
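The sliding-window screening rule used for collecting training patches (48 × 48 window, stride 8, more than 100 nuclear stable color pixels, fuzzy proportion below α = 0.05) can be sketched as follows. This is an illustrative sketch only: the function name `collect_patches` and the integer codes for the three merged categories are assumptions, not part of the original pipeline.

```python
import numpy as np

# Integer codes for the three merged categories (assumed encoding).
NONNUCLEAR, NUCLEAR, FUZZY = 0, 1, 2

def collect_patches(image, label_map, size=48, stride=8,
                    min_nuclear=100, alpha=0.05):
    """Return (image_patch, label_patch) pairs that pass the screening rule."""
    patches = []
    height, width = label_map.shape
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            window = label_map[top:top + size, left:left + size]
            n_nuclear = np.count_nonzero(window == NUCLEAR)
            fuzzy_ratio = np.count_nonzero(window == FUZZY) / window.size
            # Keep the window only if it holds enough reliable nuclear pixels
            # and almost no ambiguous nuclear-cytoplasmic pixels.
            if n_nuclear > min_nuclear and fuzzy_ratio < alpha:
                img_patch = image[top:top + size, left:left + size].copy()
                patches.append((img_patch, window.copy()))
    return patches
```

How the few remaining fuzzy pixels inside an accepted label patch are treated during training is not specified in the text, so the sketch returns the category codes unchanged.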

Training and testing protocols
Training and testing sets. We use the 20 manually annotated images as the testing set of the model. Because the five images in a group are acquired from one patient at the same time, their nuclear morphology and staining are highly similar. Therefore, to ensure that the test accuracy is reasonable and reliable, neither the test images nor the other images in the same groups are involved in producing the training sets. Using the designed unsupervised method to find a reliable training set, more than 200,000 patches of size 48 × 48 that can be used for training are obtained from the remaining 40 groups of 200 images. We allocate the training and validation sets at a ratio of 4 to 1, so the training sets contain more than 160,000 patches and the validation sets more than 40,000 patches.
Training patch full mixup. Because the training sets we constructed are composed of the most well-stained nuclei in different HE-stained images, a network trained on the original training sets cannot segment lightly stained nuclei, which results in a lack of generalization ability. A full mixup strategy is applied to the training samples to compensate for this problem. The mixup operation is performed as follows:
$$\tilde{I} = \lambda I_x + (1 - \lambda) I_y \tag{1}$$
$$\tilde{L} = L_x \lor L_y \tag{2}$$
where $I_x$ and $I_y$ are different input training image patches, $L_x$ and $L_y$ are the corresponding training label patches, and $\tilde{I}$ and $\tilde{L}$ denote the mixed image patch and its label. As Eqs (1) and (2) show, the full mixup operation merges different images with λ as the weight and directly superimposes their labels: an 'or' operation is performed between the binary labels $L_x$ and $L_y$ in Eq (2). Because each HE-stained image is real and valid, all images are equally important, so we set the mixing weight λ to 0.5. As shown in Fig 4, the mixed image contains many more lightly stained nuclei than the original image, because the nuclear areas of one image are mixed with the cytoplasmic or ECS areas of another image. The nuclei generated by image mixing are highly similar to the lightly stained nuclei in the original images, so this operation remedies the problem that only the most deeply stained nuclear areas can be selected in self-supervised learning. In practice, we decide according to a probability P whether each batch of training patches fed into the network is fully mixed. If so, the patches in the batch are randomly shuffled and fully mixed within the batch. P varies as training progresses.
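The full mixup operation on a batch can be sketched as below, assuming float image patches of shape (B, H, W, C) and binary label patches of shape (B, H, W). The function names `full_mixup` and `maybe_full_mixup` are hypothetical, and pairing each patch with a partner via a random permutation of the batch is one plausible reading of "randomly interleaved and fully mixed within the group".

```python
import numpy as np

def full_mixup(images, labels, lam=0.5, rng=None):
    """Blend each patch with a random partner from the same batch.

    Images are combined with weight lam; binary labels are combined with OR.
    """
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(len(images))      # random partner for every patch
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = np.logical_or(labels, labels[perm]).astype(labels.dtype)
    return mixed_images, mixed_labels

def maybe_full_mixup(images, labels, p, rng=None):
    """Apply full mixup to the whole batch with probability p.

    Returns (images, labels, mixed_flag); the flag can be used later,
    for example to choose the loss weights for this batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        mixed_images, mixed_labels = full_mixup(images, labels, rng=rng)
        return mixed_images, mixed_labels, True
    return images, labels, False
```

With a permutation, a patch may occasionally be paired with itself and therefore stay unchanged; a stricter pairing scheme would avoid that at the cost of slightly more code.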

Loss function.
How well the model learns from the fully mixed training images is the key to whether it can distinguish lightly stained nuclear pixels. To increase the penalty for nuclear pixels in the mixed images, we use the weighted binary cross-entropy loss function:
$$l_n = -w_n \left[ y_n \log x_n + (1 - y_n) \log (1 - x_n) \right] \tag{3}$$
$$\ell(x, y) = \frac{1}{N} \sum_{n=1}^{N} l_n \tag{4}$$
where N is the batch size, $y_n$ is the ground truth value of the nuclear image, and $x_n$ is the predicted value (between 0 and 1). The weight $w_n$ is initialized as a matrix whose elements are all 1. If full mixup is performed on a batch, we double the weight on the nuclear pixels in that batch; for the background pixels and for batches without full mixup, we keep the original weight.
Evaluation criterion. Criteria including accuracy (ACC), precision (PC), recall (RC), specificity (SP) and F1-score (F1) are widely used to evaluate the performance of nuclear segmentation in pathological images, and they are calculated as shown in Eqs (5)-(9):
$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$
$$PC = \frac{TP}{TP + FP} \tag{6}$$
$$RC = \frac{TP}{TP + FN} \tag{7}$$
$$SP = \frac{TN}{TN + FP} \tag{8}$$
$$F1 = \frac{2 \cdot PC \cdot RC}{PC + RC} \tag{9}$$
The accuracy (ACC) is the ratio of correctly segmented nuclear and background pixels to the total number of pixels in the image. When the segmentation result (SR) is compared with the ground truth (GT), the precision (PC) is the proportion of correctly segmented nuclear pixels in SR to the total nuclear pixels in SR. The recall (RC), also called the sensitivity, is the counterpart of the specificity (SP): the former is the ratio of correct nuclear pixels in SR to the nuclear pixels in GT, and the latter is the ratio of correct nonnuclear pixels in SR to the nonnuclear pixels in GT. The F1-score (F1) is the harmonic mean of the precision and recall. TP, FP, FN and TN denote the true positives, false positives, false negatives and true negatives, respectively. In addition, we introduce two other indicators from [24], the Jaccard similarity (JS) and the Dice coefficient (DC), which are calculated as shown in Eqs (10) and (11):
$$JS = \frac{|GT \cap SR|}{|GT \cup SR|} \tag{10}$$
$$DC = \frac{2\,|GT \cap SR|}{|GT| + |SR|} \tag{11}$$
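The evaluation criteria described above can be computed from a pair of binary masks as in the following sketch. The function name `segmentation_metrics` is hypothetical, and the sketch assumes that both masks contain nuclear and nonnuclear pixels so that no denominator is zero.

```python
import numpy as np

def segmentation_metrics(sr, gt):
    """Pixel-wise metrics for a binary segmentation result (sr) against ground truth (gt)."""
    sr = np.asarray(sr, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.count_nonzero(sr & gt)      # nuclear pixels found correctly
    fp = np.count_nonzero(sr & ~gt)     # background labelled as nucleus
    fn = np.count_nonzero(~sr & gt)     # nuclear pixels missed
    tn = np.count_nonzero(~sr & ~gt)    # background found correctly
    pc = tp / (tp + fp)
    rc = tp / (tp + fn)
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "PC": pc,
        "RC": rc,
        "SP": tn / (tn + fp),
        "F1": 2 * pc * rc / (pc + rc),
        "JS": tp / (tp + fp + fn),           # |GT ∩ SR| / |GT ∪ SR|
        "DC": 2 * tp / (2 * tp + fp + fn),   # 2|GT ∩ SR| / (|GT| + |SR|)
    }
```

For binary masks the Dice coefficient and the F1-score coincide, which offers a quick consistency check on any implementation.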
[Algorithm 1, steps 4 to 6]
4: if ScoreV > Best_score − 0.005 and |ScoreT − ScoreV| < 0.06 then
5: Save the current model weights;
6: Update the parameters: Best_score ← ScoreV; P ← P + 0.
In Algorithm 1, P is the initial probability of performing full mixup; $JS_i^t$ and $DC_i^t$ are the JS and DC in the training accuracy index obtained from each epoch of training; $JS_i^v$ and $DC_i^v$ are the JS and DC in the validation accuracy index obtained from each round of training. In the 4th step of the algorithm, as training progresses, the P value increases step by step; that is, an increasing number of images are fully mixed, which causes a slight fluctuation in the training accuracy. ScoreV is therefore allowed to be slightly lower than the Best_score obtained with the P value of the previous stage, and the permitted decrease is set to 0.005. Keeping the distance between ScoreT and ScoreV within a controllable range prevents the model from overfitting.
The final P value ranges from 0.1 to 0.9. The upper limit is set to 0.9 because mixing the training sets completely would leave the model insufficiently trained on the original images. This step-by-step training method allows the model to train thoroughly on the original images first and then on the fully mixed images, which amounts to gradually generalizing the model.
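The acceptance rule in steps 4 to 6 of Algorithm 1 can be expressed as a small helper that is called once per epoch. This is a hedged sketch: the function name `update_mixup_probability` is hypothetical, and the increment `p_step` is an assumed placeholder because the exact step size is not legible in the algorithm listing. How ScoreT and ScoreV are derived from JS and DC is likewise not reproduced here, so the scores are taken as inputs.

```python
def update_mixup_probability(score_t, score_v, best_score, p,
                             p_step=0.1, p_max=0.9,
                             tolerance=0.005, max_gap=0.06):
    """Apply the acceptance test of Algorithm 1 after one epoch.

    Returns (save_weights, best_score, p).
    p_step is an assumed value; p_max follows the stated upper limit of 0.9.
    """
    accepted = (score_v > best_score - tolerance
                and abs(score_t - score_v) < max_gap)
    if accepted:
        # Save weights, record the new validation score, and mix more batches.
        return True, score_v, min(p + p_step, p_max)
    return False, best_score, p
```

In a training loop, the caller would save the model weights whenever the first return value is true and pass the returned probability to the batch-level mixup decision in the next epoch.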

Prediction result stitching.
During testing, to keep the input size of the model unchanged, we crop each test image into 48 × 48 patches with a cropping step of 24, so that the boundary part of each patch is predicted multiple times, which improves the stability of the model at patch boundaries. Because the cropping step is 24, overlapping regions are predicted several times when the segmentation results are stitched. We average the predicted probability values over the overlapping regions to obtain the final result. Finally, pixels with a predicted probability greater than or equal to 0.5 are taken as nuclear pixels.
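The overlapping crop, average and threshold procedure can be sketched as follows. The function name `predict_by_stitching` and the callable `predict`, which stands in for the trained network and returns a 48 × 48 probability map for one patch, are assumptions. The sketch also assumes that the image height and width fit the 48/24 grid exactly, which holds for 1536 × 2048 images.

```python
import numpy as np

def predict_by_stitching(image, predict, size=48, step=24, threshold=0.5):
    """Average overlapping patch predictions and threshold the result.

    predict: callable mapping a (size, size[, C]) patch to a (size, size)
             array of nuclear probabilities (placeholder for the model).
    """
    height, width = image.shape[:2]
    prob_sum = np.zeros((height, width), dtype=float)
    hits = np.zeros((height, width), dtype=float)
    for top in range(0, height - size + 1, step):
        for left in range(0, width - size + 1, step):
            patch = image[top:top + size, left:left + size]
            prob_sum[top:top + size, left:left + size] += predict(patch)
            hits[top:top + size, left:left + size] += 1
    # Each pixel's probability is the mean over all patches that covered it.
    mean_prob = prob_sum / np.maximum(hits, 1)
    return mean_prob >= threshold, mean_prob
```

Interior pixels are covered by four patches, edge pixels by two and corner pixels by one, so the averaging mainly stabilizes predictions near patch boundaries.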

Overlapping nuclei segmentation
In clinical research, quantitative analysis of different cell morphologies in histopathological images is often required, so a simple division into nuclear and nonnuclear areas is not adequate. The nuclear areas need to be segmented precisely to determine the boundaries of each nucleus. However, because there are no delineated boundaries between overlapping nuclei, a morphological method is needed in the fully automatic pipeline to further segment the boundaries between nuclei.
The usual steps of the watershed algorithm are graying of the color image, construction of a gradient map, and watershed segmentation based on the gradient map to obtain the edge lines of the segmented image. To design the initial catchment areas (i.e., the regional minima), the nuclear area obtained by foreground segmentation is subjected to a distance transform to extract the morphological center points; these center points and the stable color areas of the nuclei obtained by clustering together constitute the initial catchment areas. Then, based on these areas, the watershed algorithm is used to obtain the segmentation results of the adherent nuclei.

Results and discussions
We first conduct ablation experiments on the main training strategies to verify the effectiveness of each strategy. Second, the proposed method is compared with traditional pixel-based machine learning segmentation methods to demonstrate the advantages of the image patch training set constructed by unsupervised learning. Then, to show the advantages of the unsupervised construction of training samples, the method is compared with fully supervised semantic segmentation methods. Finally, the effect of the hybrid watershed method on the segmentation between nuclei is verified.
The study is implemented in Python 3.7, and the algorithms are developed with the deep learning framework PyTorch, a commonly used machine learning library. The core components of the hardware environment are an Intel(R) Core(TM) i7-8700K CPU, 16 GB RAM and an Nvidia Titan RTX GPU.

Ablation experiment performance
To verify the necessity of the proposed series of training protocols, we use the 20 original HE-stained images with manually annotated nuclei for testing, while the training and validation sets are captured automatically by unsupervised learning. The designed ablation experiments and the resulting segmentation criteria are shown in Table 2.
The operation items in Table 2 are the training protocols described in the previous section. In addition, the general mixup operation, in which both the training images and the labels are mixed with random weights drawn from a beta distribution, is added for comparison with full mixup. A tick in the table indicates that the corresponding operation is performed in that row.
The results show that the ACC values of the different operations are very close. This similarity occurs because, in the meningioma HE-stained images, nonnuclear areas usually occupy a large proportion of the whole image, and it is not difficult for a general model to segment most of the nonnuclear areas correctly, which dilutes the effect of incorrectly segmented nuclear areas. The PC and SP values show downward trends because the original model has insufficient generalization ability: it can only identify the darker stained nuclei represented in the training samples, and these areas are almost always correct nuclear areas, which leads to a low FP. With additional operations, the generalization ability of the model is enhanced and it can distinguish lightly stained nuclei, so the RC and F1 values increase. This indicates that the generalization ability of the model increases gradually with the combined operations, as shown in the last row of Table 2. The nuclear foreground segmentation results based on the final combined operations are shown in Fig 5. Note that the proposed self-supervised learning method does not require color normalization of the image before or after the training patches are collected, which allows the image to keep its original color features and further saves time in training set preparation.

Comparison of the segmentation performance with traditional machine learning methods
Two mainstream traditional machine learning methods are commonly used for segmenting pathological images: pixel-level classification based on a supervised SVM [6] and unsupervised hierarchical K-means clustering [25].
We designed the following experiment to compare the segmentation performance of the proposed pseudo-label method with that of traditional supervised and unsupervised methods. 1) SVM method: since the method described in [11] needs to be trained on the same group of HE-stained images, we randomly sample one percent of the pixels in the image, together with their pixel labels, to train the SVM and then use the trained model to predict every pixel of the entire image; this is repeated until all 20 test images are predicted, and the average accuracy is calculated. 2) Hierarchical K-means clustering method: this is the same as the method described in study [25]. The criteria obtained by the different segmentation methods are listed in Table 3. For traditional machine learning segmentation methods based on pixel classification, correlations between pixels are still lacking even if local area features of each pixel are added, so this kind of method needs some morphological postprocessing to produce relatively complete nuclear areas. By contrast, the proposed method only needs to stitch the patches together and requires no extensive morphological postprocessing. The comparison results are shown in Fig 6.

Comparison of the segmentation performance with supervised semantic segmentation
To construct training datasets for supervised learning, we split the 20 manually annotated HE-stained images 3:1 into 15 images for training and 5 for testing. We randomly grab more than 200,000 training patches from the 15 labeled images so that the number of training samples is the same as in the pseudo-label method. The training datasets are augmented by left-right and up-down shifting, rotating, flipping and rescaling, and the U-Net model and two of its improved variants are then used for comparison. After sufficient training, the test images are fed into all the models for prediction.
The results are shown in Table 4, where AttU-Net denotes Attention U-Net [26] and R2U-Net denotes the Recurrent Residual CNN-based U-Net [24]. The experimental results expose the limitations of supervised learning approaches based on manual labeling, whose generalization ability is limited by the diversity of the manually labeled data. Since we randomly selected the 20 labeled images from different groups, there are obvious differences in the color and nuclear morphology of the images. In addition, the training images are not color-normalized during training, so it is difficult for the supervised learning model to learn the features of the 5 test images directly from the 15 training images, which restricts the accuracies shown in Table 4. The segmentation results of each method are compared in Fig 7. Sample 2 segmented by AttU-Net and sample 1 segmented by R2U-Net in Fig 7 show that, whether an attention mechanism is introduced to help the network learn effective features (AttU-Net) or a recurrent neural network (RNN) structure is integrated with a ResNet structure (R2U-Net), there are still several images that cannot be segmented effectively. This failure may be due to the absence of similar images in the training set, which prevents the model from learning the features of such images. In real HE-stained pathological image segmentation tasks, image quality varies considerably, which makes comprehensive manually annotated data difficult to obtain. One solution is to normalize the color of the HE-stained images so that their colors tend to be the same. However, this can reduce the efficiency of batch image processing, and the choice of the reference image directly affects the quality of the overall normalized result.

Comparison of the number of nuclei
To precisely quantify the distributions and morphologies of single cells in the slice, we use a nuclear count to test the segmentation of adherent cell nuclei, which also provides statistical results for research on clinical diagnosis or classification. We randomly select three images each from the high- and low-grade meningioma test images as the samples to be counted and send them to the pathologist, who estimates the number of nuclei. Because of the high density of cells in the complete image, the pathologist divides each image into 16 (4 × 4) areas, each with a size of 384 × 512 pixels, and then selects four ROIs for counting based on the cell distribution. The number of nuclei in the entire image is estimated under the assumption of uniform cell distribution. The comparison between the proposed hybrid watershed segmentation method and the manual statistics is shown in Table 5. Manual Statistics is the result of manually counting the number of nuclei. Before is the count before the adherent nuclei are segmented, that is, the result of directly counting the nuclear foreground areas obtained by the proposed method. After is the count after the hybrid watershed segmentation. Rate is the rate of increase in the number of nuclei after adding the adherent nuclei segmentation. Error is the error of the final count after watershed segmentation relative to the manual statistics. The hybrid watershed segmentation has a relatively strong impact on the number of nuclei: the lowest increase is greater than 25%, and most of the errors relative to the manual statistics are lower than 10%, which shows that the mixed watershed method

nucleus, and then obtain the accurate boundary of the adhered nucleus. The prerequisite for this method is that the pseudo-label-based segmentation method can extract an accurate foreground nuclear area from the HE-stained image, which provides favorable conditions for the hybrid watershed method.
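The counting statistics described in this subsection can be illustrated with a minimal sketch. It assumes that the whole-image count is extrapolated from the four counted ROIs out of 16 equal areas under uniform cell distribution, that Rate is the relative increase from Before to After, and that Error is the relative deviation of After from Manual Statistics. These formulas and the function names are assumptions for illustration, since the exact definitions are not given here, and the numbers are hypothetical rather than values from Table 5.

```python
def estimate_total_nuclei(roi_counts, total_areas=16):
    """Extrapolate a whole-image count from counted ROIs.

    Assumes uniform cell distribution: every one of the `total_areas`
    equal areas holds the mean number of nuclei seen in the counted ROIs.
    """
    if not roi_counts:
        raise ValueError("at least one ROI count is required")
    return sum(roi_counts) * total_areas / len(roi_counts)


def increase_rate(before, after):
    """Relative increase in nucleus count after splitting adhered nuclei (assumed definition)."""
    return (after - before) / before


def relative_error(automatic, manual):
    """Relative deviation of the automatic count from the manual estimate (assumed definition)."""
    return abs(automatic - manual) / manual


# Hypothetical counts for four ROIs of one image (not taken from Table 5).
manual_total = estimate_total_nuclei([52, 47, 60, 41])
before_count, after_count = 600, 780  # hypothetical automatic counts

print(manual_total)                               # estimated manual count
print(increase_rate(before_count, after_count))   # Rate
print(relative_error(after_count, manual_total))  # Error
```

Extrapolating from a subset of areas keeps the manual effort low, but the estimate is only as good as the uniformity assumption, so the resulting Error should be read as a comparison against an estimate rather than against an exhaustive count.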

Statistical analysis of pathological features of meningioma
Quantifying the segmentation results of HE-stained images to output relevant pathological features is one of the important steps in pathological image analysis. To verify that the pathological features based on the segmentation results can assist meningioma grading, we quantified the segmented images using a series of pathological features defined in a previous study [23], as shown in Table 6, which characterize the state of the tumor tissue and can reflect the progression of the tumor [23]. Since no manual annotations were involved in any training process, there is no need to consider the data leakage problem in deep learning here. The proposed method for segmenting the nuclei of meningioma pathology images was applied to a total of 60 groups of 300 images (30 groups of 150 for high-grade and 30 groups of 150 for low-grade), and the cytoplasmic regions and extracellular interstitial regions were provisionally segmented using K-means clustering. Based on the segmentation results, the feature sets in Table 6 were computed, and some of the results are shown in Tables 7 and 8.
The Wilcoxon rank-sum test was used to test for differences in features between high- and low-grade meningiomas. The null hypothesis is that there is no significant difference between the statistical characteristics of high- and low-grade meningiomas. The test results at a confidence level of 0.95 are shown in Table 9. It can be seen from Table 9 that, except for the ECS area ratio, the p-values of all features are less than 0.05, so the null hypothesis is rejected for them. The ECS area ratio fails the test because the ECS is not part of the cells themselves: abnormal tissue growth is mainly reflected in the cells, and its effect on the ECS is relatively small. The changes in the ECS between pathological images of meningiomas of different grades are therefore relatively insignificant. On the whole, the pathological features based on the segmentation results can effectively reflect the differences in the grade of meningiomas.
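The rank-sum comparison described above can be sketched as follows. This is a minimal two-sided version that uses the normal approximation without tie or continuity correction. Which implementation and corrections were used for Table 9 is not stated, so the function names and the feature values below are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test via the normal approximation.

    Returns (z, p). Ties receive average ranks, but no tie or
    continuity correction is applied to the variance.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2          # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided tail probability
    return z, p


# Hypothetical per-group values of one feature (not taken from Tables 7 to 9).
high_grade = [0.42, 0.47, 0.51, 0.55, 0.58, 0.61]
low_grade = [0.30, 0.33, 0.35, 0.38, 0.40, 0.44]

z, p = rank_sum_test(high_grade, low_grade)
print(f"z = {z:.3f}, p = {p:.4f}, reject H0 at 0.05: {p < 0.05}")
```

For small samples or heavily tied data, an exact test or a tie-corrected variance would be a more careful choice than this approximation.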

Comparison of segmentation results of public datasets
To further test the generalization performance of the proposed method, we perform segmentation experiments on the publicly available MoNuSeg dataset. The MoNuSeg dataset [27] comes from the Medical Image Computing and Computer Assisted Intervention (MICCAI) 2018 Multi-Organ Pathology Image Nucleus Segmentation Challenge and includes 30 training images and 14 test images. These images come from many different hospitals and cover tumor tissue samples from many different organs. The dataset is available at https://monuseg.grand-challenge.org/Data/.
To increase the number of training samples and reduce the bias of the clustering results due to uneven staining, we use a 500 × 500 window to randomly crop image blocks from the original 1000 × 1000 images, and apply the designed sample selection strategy to the image blocks. Table 10 shows the comparison results of different methods. The values in the table show that the proposed unsupervised pseudo-label training method matches or exceeds the classical supervised semantic segmentation models in the accuracy of nucleus foreground segmentation. This further supports the strong generalization ability of the proposed method.
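The random cropping step can be sketched as below. The sketch assumes that windows are drawn uniformly at random and may overlap. The number of blocks per image, the seed, and the function names are illustrative assumptions, since these details are not given here.

```python
import random


def random_windows(image_size=1000, window=500, count=4, seed=None):
    """Draw `count` top-left corners of window x window blocks inside a square image."""
    max_offset = image_size - window
    if max_offset < 0:
        raise ValueError("window must not be larger than the image")
    rng = random.Random(seed)
    return [(rng.randint(0, max_offset), rng.randint(0, max_offset))
            for _ in range(count)]


def crop(image, top, left, window=500):
    """Cut a window x window block out of an image stored as a list of rows."""
    return [row[left:left + window] for row in image[top:top + window]]


# Small stand-in image so the example runs quickly; a real image would be 1000 x 1000.
image = [[r * 10 + c for c in range(10)] for r in range(10)]
for top, left in random_windows(image_size=10, window=5, count=3, seed=0):
    block = crop(image, top, left, window=5)
    print((top, left), len(block), len(block[0]))
```

Each cropped block would then be passed to the sample selection strategy in place of the full image.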

Conclusions
In this paper, we propose a fully automated pseudo-label-based pipeline to locate precise nuclear boundaries in HE-stained pathological images of meningiomas. The pipeline focuses on the automatic generation of pseudo-labels and includes a series of effective deep learning solutions designed for pseudo-labels. As an alternative to manually choosing training samples from HE image patches, an unsupervised selection strategy is proposed that automatically and adaptively captures training samples according to their features, which greatly improves the efficiency of the training process and is compatible with various image quantities and qualities. Then, a deep learning framework is improved with the strategies of full mixup and dynamic epochs in the training process. Through this framework, even incompletely stained nuclear areas can be predicted based on stably stained nuclear areas. The proposed method shows good performance in comparison experiments against supervised semantic segmentation methods and traditional machine learning methods. For supervised semantic segmentation methods, increasing the generalization ability of the model requires labels for as many images with different nucleus shapes and colors as possible. When a new case image needs to be processed, it must first be manually labeled, and a training set must then be constructed to retrain the model so that it can segment the new case. This inevitable manual participation greatly reduces the efficiency of image segmentation. In contrast, the proposed method captures training sample images and their pseudo-labels according to the established strategy. When a new case image is obtained, training samples and labels can be automatically constructed from it for further training of the model, which improves the efficiency of extending the model.
For traditional machine learning methods, constructing features with high representation ability is a difficult problem, and pixel or superpixel classification has insufficient ability to capture local information about the area in which a pixel is located. The proposed method uses image patches and a semantic segmentation network to solve these problems.
In addition to the above advantages, some minor problems of the proposed method could be further improved in the future. First, segmentation of some large nuclei aggregations with regular shapes is relatively difficult. These aggregations can be mistakenly classified as a single nucleus, and training samples with regular aggregations may be lacking. Since such aggregations are often much larger than single nuclei, the feature of nucleus size will be considered in a future training framework to distinguish such cases. Second, some tiny irrelevant dots are segmented as nuclei, which could be erased in postprocessing after segmentation. Third, there are still some under- and over-segmented cases after using the mixed watershed method. Considering these possible improvements, we will further direct the unsupervised selection strategy to choose more reliable nuclear boundaries as training samples, which will improve the segmentation accuracy and provide a reference for more efficient clinical research on pathological image analysis.