A Fuzzy-C-Means-Clustering Approach: Quantifying Chromatin Pattern of Non-Neoplastic Cervical Squamous Cells

Despite the effectiveness of the Pap-smear test in reducing mortality from cervical cancer, the criteria of its reporting standard are mostly qualitative. This study addresses how these criteria can be defined in more quantitative and definite terms. A negative Pap-smear result, i.e. negative for intraepithelial lesion or malignancy (NILM), is qualitatively defined by evenly distributed, finely granular chromatin in the nuclei of cervical squamous cells. To quantify this chromatin pattern, this study employed Fuzzy C-Means clustering as the segmentation technique, enabling chromatin to be segmented at different degrees of sensitivity in sample images of non-neoplastic squamous cells. From the simulation results, a model representing the chromatin distribution of non-neoplastic cervical squamous cells is constructed with the following quantitative characteristics: at sensitivity level 4, identified as the most representative level by statistical analysis and human experts' feedback, a nucleus of a non-neoplastic squamous cell has on average 67 chromatin regions with a total area of 10.827 μm²; the average distance between the nearest chromatin pair is 0.508 μm, and the average eccentricity of the chromatin is 0.47.


Introduction
The Papanicolaou (Pap) smear test is a useful screening test for detecting precancerous stages of cervical cancer, enabling intraepithelial lesions to be removed before they progress to the invasive stage. Since the introduction of Pap-smear screening, mortality from cervical cancer has been dramatically reduced [1,2]. Meanwhile, slide preparation has advanced from conventional smears to liquid-based preparation, which overcomes the limitations of cell loss and overlapping cells by presenting the cells in a single layer, thus improving specimen adequacy and the sensitivity of the test [3,4]. In Pap-smear reporting, pathologists or cytotechnologists examine cervical epithelial cells according to the internationally recognised reporting standard, the Bethesda System for Reporting Cervical Cytology [5]. Changes in the morphology of the cell nucleus observable under the light microscope, termed malignancy-associated changes (MACs), are the prime criteria employed in this reporting standard. Changes in chromatin pattern are generally accepted as one of the MACs [6][7][8][9]. Chromatin is a complex of deoxyribonucleic acid (DNA) and proteins that condenses within the nucleus [10]. Under the light microscope, chromatin appears as dark and bright regions, consisting of strongly stained heterochromatin and weakly stained euchromatin dispersed throughout the nucleus. For a Pap smear to be reported as negative for intraepithelial lesion or malignancy (NILM), the nuclei of the squamous cells are defined as having evenly distributed, finely granular chromatin [5,11,12].
As the definition of "evenly distributed, finely granular chromatin" for a non-neoplastic cervical squamous cell is qualitative, discrepancies between individual pathologists or cytotechnologists inevitably arise from subjective judgement, which partly contributes to differences in their diagnostic accuracy [13]. To reduce this subjectivity, it is desirable to transform these qualitatively defined criteria into more definite quantitative ones. Because of the paramount importance of chromatin pattern as a diagnostic criterion, previous studies have attempted to quantify the texture of nuclear chromatin, generally by multi-level thresholding [14][15][16] or region growing [17][18][19]. However, multi-level thresholding requires user-defined threshold values, which strongly affect the segmentation results, and thresholding-based techniques are known to be sensitive to noise and uneven illumination. Region growing, on the other hand, requires predefined stopping criteria, and its results depend heavily on these criteria as well as on the direction of the growing process. A few less well-known techniques also exist, including the statistical-geometrical-features-based method [20] and the adjacency graph attribute co-occurrence matrix (AGACM) method [21]. Several studies of nuclear chromatin pattern were performed with free or commercially available software [22][23][24]; unfortunately, the details of their segmentation techniques are not known.
To overcome these limitations in the segmentation of nuclear chromatin, this study employs the Fuzzy C-Means (FCM) clustering technique because of its simplicity and its effectiveness in yielding promising results [25,26]. FCM is an unsupervised algorithm that is robust to ambiguity and always converges. By employing a reasonably defined number of clusters, the technique enables chromatin to be segmented from cervical squamous cell nuclei at different sensitivity levels, thus simulating the differing interpretation thresholds of pathologists or cytotechnologists in their subjective judgement of "evenly distributed, finely granular chromatin". To the best of our knowledge, segmentation of the chromatin pattern of cervical squamous cells captured from ThinPrep slides (a liquid-based preparation, as opposed to conventional smear slides) using a clustering technique has yet to be reported. At the end of this study, a model representing the chromatin distribution of a non-neoplastic cervical squamous cell is presented.
were received without any patients' information. The cervical cell images were captured and analysed anonymously. These slides had been previously screened by cytotechnologists and formally reported as "negative for intraepithelial lesion or malignancy" by pathologists. They were reviewed, and cervical squamous cell images were captured by a pathologist using an Olympus BX43F clinical microscope mounted with a video camera. Oil immersion with a 100x objective was used. Each single-cell image was manually cropped to 500 x 500 pixels to obtain its nucleus. A total of 150 cropped test images (S1 File) are used in this study.

Methodology
The processing of the cervical squamous cell images consists of three stages: pre-processing, chromatin segmentation and feature extraction. In the pre-processing stage, the input colour image of 2048 x 1536 pixels is first cropped to 500 x 500 pixels to obtain the nucleus. The colour nucleus image is then converted into a grayscale image, and its contrast is enhanced by stretching its histogram to occupy the entire dynamic intensity range.
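The pre-processing stage can be sketched as follows. This is a minimal illustration, not the authors' code: the crop position is passed in by hand (the study cropped manually), the grayscale conversion assumes ITU-R BT.601 luma weights because the text does not name one, and the contrast enhancement is a plain linear min-max stretch to 0..255. The function name `preprocess` and its parameters are hypothetical.

```python
import numpy as np

# Luma weights (ITU-R BT.601); assumed, since the text does not state the conversion used.
LUMA = np.array([0.2989, 0.5870, 0.1140])


def preprocess(rgb: np.ndarray, top: int, left: int, size: int = 500) -> np.ndarray:
    """Crop a size x size patch, convert it to grayscale and stretch it to 0..255."""
    # Crop the nucleus patch; in the study this crop was chosen manually.
    patch = rgb[top:top + size, left:left + size, :3].astype(np.float64)
    # Weighted sum of the R, G and B channels gives one intensity per pixel.
    gray = patch @ LUMA
    lo, hi = gray.min(), gray.max()
    if hi == lo:  # flat patch: nothing to stretch
        return np.zeros(gray.shape, dtype=np.uint8)
    # Linear min-max stretch: darkest pixel -> 0, brightest pixel -> 255.
    stretched = (gray - lo) / (hi - lo) * 255.0
    return np.round(stretched).astype(np.uint8)


# Usage on a synthetic 1536 x 2048 RGB frame (real images would be loaded from file).
frame = np.random.default_rng(0).integers(0, 256, size=(1536, 2048, 3), dtype=np.uint8)
nucleus = preprocess(frame, top=500, left=700)
print(nucleus.shape, nucleus.min(), nucleus.max())
```

A min-max stretch maps the observed intensity range exactly onto the full dynamic range; percentile-clipped stretching is a common alternative when a few outlier pixels would otherwise dominate.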
With the nucleus cropped, the Fuzzy C-Means (FCM) clustering technique is applied to segment the chromatin. FCM clustering was first proposed by Bezdek [27]. For $N$ data points $x_j$ (here, pixel intensities) and $c$ clusters, the objective function of the FCM algorithm is defined as [28]:

$$J_m = \sum_{i=1}^{c} \sum_{j=1}^{N} \mu_{ij}^{m} \, \lVert x_j - v_i \rVert^{2} \qquad (1)$$

Parameter $\mu_{ij}$ is the degree of membership of $x_j$ in the $i$-th cluster. Parameter $m$, a scalar greater than one, is the weighting exponent that controls the fuzziness of the resulting partition. The operator $\lVert \cdot \rVert$ denotes the Euclidean norm, and $v_i$ is the centroid of the $i$-th cluster. FCM minimises Eq (1) by iteratively updating the memberships $\mu_{ij}$ and the cluster centroids $v_i$ according to Eqs (2) and (3):

$$\mu_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x_j - v_i \rVert}{\lVert x_j - v_k \rVert} \right)^{2/(m-1)} \right]^{-1} \qquad (2)$$

$$v_i = \frac{\sum_{j=1}^{N} \mu_{ij}^{m} \, x_j}{\sum_{j=1}^{N} \mu_{ij}^{m}} \qquad (3)$$

The number of clusters is set to the intensity value at which the histogram of the nucleus peaks. Details on assigning the initial number of clusters are given in S2 File with S1, S2, S3, S4 and S5 Figs. The values of the FCM parameters are listed in Table 1. Parameter testing was performed for $m$, and the computational cost of FCM for chromatin segmentation was assessed through its computation time. Results and analyses of the parameter testing and complexity are presented in S3 File with S1 to S9 Tables and in S4 File with S10 Table, respectively.
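The update rules in Eqs (2) and (3) can be sketched for one-dimensional intensity data as below. This is an illustrative sketch under stated assumptions, not the study's implementation: centroids are initialised evenly over the intensity range (the text does not specify initialisation), iteration stops when centroids move less than `tol`, and `clusters_from_histogram_peak` is a hypothetical helper that reads the sentence on cluster count literally (the full procedure is in S2 File). Memory grows as clusters x pixels, so for a full 500 x 500 nucleus with many clusters, clustering the 256-bin histogram with counts as weights would be a cheaper variant (not shown).

```python
import numpy as np


def fcm_1d(x, c, m=2.0, max_iter=300, tol=1e-4):
    """Fuzzy C-Means on 1-D data (pixel intensities); returns (centroids v, memberships u)."""
    x = np.asarray(x, dtype=np.float64).ravel()
    if c < 2 or m <= 1.0:
        raise ValueError("need c >= 2 and m > 1")
    # Deterministic start: centroids spread evenly over the intensity range (an assumption).
    v = np.linspace(x.min(), x.max(), c)
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # d[i, j] = |x_j - v_i|, floored so a pixel sitting on a centroid does not divide by zero.
        d = np.fmax(np.abs(x[None, :] - v[:, None]), 1e-12)
        # Eq (2), rewritten as u_ij = d_ij^(-p) / sum_k d_kj^(-p) with p = 2/(m-1).
        inv = d ** (-p)
        u = inv / inv.sum(axis=0, keepdims=True)
        # Eq (3): each centroid is the u^m-weighted mean of the data.
        um = u ** m
        v_new = (um @ x) / um.sum(axis=1)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    return v, u


def clusters_from_histogram_peak(gray):
    """Hypothetical helper: c = intensity at which the uint8 histogram peaks (at least 2)."""
    hist = np.bincount(np.asarray(gray, dtype=np.uint8).ravel(), minlength=256)
    return max(2, int(np.argmax(hist)))


def fcm_segment(gray, c, m=2.0):
    """Replace each pixel by the (rounded) centroid of its highest-membership cluster."""
    v, _ = fcm_1d(gray, c, m)
    flat = np.asarray(gray, dtype=np.float64).ravel()
    # Under Eq (2) the highest membership belongs to the nearest centroid.
    labels = np.argmin(np.abs(flat[None, :] - v[:, None]), axis=0)
    seg = np.clip(np.round(v[labels]), 0, 255).astype(np.uint8)
    return seg.reshape(np.shape(gray))
```

For example, `fcm_1d` applied to intensities tightly grouped around 20, 120 and 220 with `c=3` returns centroids close to those three values, and `fcm_segment` then maps every pixel onto one of them. Each column of `u` sums to one, as Eq (2) requires.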
As discussed in the previous section, there is no exact definition of the intensity level of chromatin. Questions therefore arise: to what extent can a dark region in the nucleus be defined as chromatin, and how 'dark' must a region be to count as chromatin? To address this, several levels of intensity threshold, referred to as sensitivity levels in this paper, are proposed. The sensitivity level of chromatin detection can be defined by the user, and it determines the intensity threshold for chromatin segmentation. For this initial study, the distribution of chromatin is observed at five sensitivity levels. The intensities of all pixels in the segmented image obtained from FCM clustering are sorted in ascending order, and the five lowest distinct intensity values are taken as the intensity thresholds, where $t_k$ is the threshold at sensitivity level $k$, $t_1 < t_2 < t_3 < t_4 < t_5$ and $t_k \in [0, 255]$.
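The threshold selection can be sketched as below. It assumes that "the five lowest intensity values" means the five lowest distinct values of the FCM-segmented image (needed for $t_1 < \dots < t_5$), and that chromatin at level $k$ consists of the pixels whose intensity is at most $t_k$; the text does not spell out this comparison, so the rule in `chromatin_mask` is an assumption. Both function names are hypothetical.

```python
import numpy as np


def sensitivity_thresholds(segmented, levels=5):
    """t_1 < ... < t_levels: the lowest distinct intensities of an FCM-segmented image."""
    distinct = np.unique(segmented)  # sorted ascending, duplicates removed
    if distinct.size < levels:
        raise ValueError("segmented image has fewer distinct intensities than levels")
    return distinct[:levels]


def chromatin_mask(segmented, thresholds, k):
    """Chromatin pixels at sensitivity level k (1-based); assumed rule: intensity <= t_k."""
    return np.asarray(segmented) <= thresholds[k - 1]


seg = np.array([[10, 40, 40], [25, 90, 200], [10, 60, 255]], dtype=np.uint8)
t = sensitivity_thresholds(seg)
for k in range(1, 6):
    print(k, int(t[k - 1]), int(chromatin_mask(seg, t, k).sum()))
```

Under this rule each higher level admits every pixel accepted at the lower levels plus the next cluster, which matches the observation that more chromatin is detected as the sensitivity level increases.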

Simulation Results
Simulation results for three randomly selected test images are shown in Fig 1. The number of detected chromatin regions increases with the sensitivity level. The dotted points mark the centres of the detected chromatin regions. To characterise the spread of the data, three measurements are computed: the distance between the nearest chromatin pair, the area of the chromatin, and the eccentricity of the chromatin. For each image, the average value of each measurement is computed. For example, if ten chromatin regions are detected in a test image at sensitivity level 1, the average area of these regions, the average distance between nearest chromatin pairs and the average eccentricity over all ten regions are computed.
Consider an input cervical image $j$ segmented at sensitivity level $k$, in which $n$ chromatin regions are detected. The total area of the chromatin regions can be defined as

$$A_{jk} = \sum_{i=1}^{n} a_{ijk}$$

where $a_{ijk}$ is the area of chromatin region $i$ in image $j$ at sensitivity level $k$.
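The average area of a chromatin region in image $j$ at sensitivity level $k$ can be represented as

$$\bar{a}_{jk} = \frac{1}{n} \sum_{i=1}^{n} a_{ijk}$$

The overall average area of a chromatin region in the 150 test images can be defined as

$$\bar{a}_{k} = \frac{1}{150} \sum_{j=1}^{150} \bar{a}_{jk}$$

For every chromatin region, the centre of the region is computed to find the distance between every chromatin pair, $d_{njk}$:

$$d_{njk} = \begin{bmatrix} d_{(1)\text{-}(1)} & d_{(1)\text{-}(2)} & \cdots & d_{(1)\text{-}(n)} \\ d_{(2)\text{-}(1)} & d_{(2)\text{-}(2)} & \cdots & d_{(2)\text{-}(n)} \\ \vdots & \vdots & \ddots & \vdots \\ d_{(n)\text{-}(1)} & d_{(n)\text{-}(2)} & \cdots & d_{(n)\text{-}(n)} \end{bmatrix}$$

where $d_{(1)\text{-}(1)}$ is the distance between chromatin region 1 and itself (zero) and $d_{(1)\text{-}(n)}$ is the distance between chromatin regions 1 and $n$.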
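The area quantities above can be sketched from a binary chromatin mask as follows. Assumptions: a chromatin region is a connected component of the mask, using 8-connectivity (the text says only "connected in neighbourhood"), and `um_per_px` is a hypothetical pixel size for converting pixel counts to μm², since the calibration is not given here. The helper names are illustrative.

```python
import numpy as np
from scipy import ndimage

# 8-connected neighbourhood; the text does not fix 4- or 8-connectivity.
EIGHT = np.ones((3, 3), dtype=int)


def region_areas(mask, um_per_px=None):
    """Area a_ijk of every chromatin region in a binary mask (pixels, or um^2 if um_per_px is given)."""
    labels, _ = ndimage.label(mask, structure=EIGHT)
    areas = np.bincount(labels.ravel())[1:].astype(np.float64)  # drop background label 0
    if um_per_px is not None:
        areas *= um_per_px ** 2  # hypothetical pixel size; not given in the text
    return areas


def area_summary(mask, um_per_px=None):
    """Returns (n, total area A_jk, average area per region) for one image at one level."""
    a = region_areas(mask, um_per_px)
    n = a.size
    return n, a.sum(), (a.sum() / n if n else 0.0)


def overall_average(per_image_values):
    """Average of per-image averages over the test set (150 images in the study)."""
    return float(np.mean(per_image_values))


mask = np.array([[1, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 1],
                 [1, 0, 0, 0]], dtype=bool)
print(area_summary(mask))
```

In the small mask the three components have 2, 3 and 1 pixels, so $n = 3$, $A_{jk} = 6$ and $\bar{a}_{jk} = 2$ pixels.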
The distance of the nearest chromatin pair for region $i$ is obtained as

$$d^{\min}_{ijk} = \min_{l \neq i} \, d_{(i)\text{-}(l)}$$

The average distance of the nearest chromatin pair for image $j$ at sensitivity level $k$ is defined as

$$\bar{d}_{jk} = \frac{1}{n} \sum_{i=1}^{n} d^{\min}_{ijk}$$

and the overall average distance of the nearest chromatin pair over the 150 test images at sensitivity level $k$ can be represented by

$$\bar{d}_{k} = \frac{1}{150} \sum_{j=1}^{150} \bar{d}_{jk}$$

In addition to the chromatin size and the distance between chromatin pairs, the eccentricity $E$ of each chromatin region is computed. Eccentricity is the ratio of the length of the major axis to the length of the minor axis. It is invariant to translation, rotation and scaling, and can be defined as

$$E = \frac{\text{Length of major axis}}{\text{Length of minor axis}} \qquad (12)$$

The total eccentricity of input image $j$ at sensitivity level $k$ with $n$ chromatin regions can be defined as

$$\varepsilon_{jk} = \sum_{i=1}^{n} \varepsilon_{ijk}$$

where $\varepsilon_{ijk}$ is the eccentricity of chromatin region $i$ in image $j$ at sensitivity level $k$.
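The nearest-pair quantities $d^{\min}_{ijk}$ and $\bar{d}_{jk}$ can be sketched as follows, assuming region centres are the unweighted centroids of 8-connected mask components (the same assumption as for the area sketch) and that `um_per_px` is a hypothetical calibration factor. The self-distances on the diagonal of the distance matrix are excluded before taking the minimum, since otherwise every region would be its own nearest neighbour.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial.distance import cdist

EIGHT = np.ones((3, 3), dtype=int)  # assumed 8-connectivity


def nearest_pair_distances(mask, um_per_px=None):
    """d^min_ijk for each region: distance from its centre to the nearest other region centre."""
    labels, n = ndimage.label(mask, structure=EIGHT)
    if n < 2:
        return np.empty(0)  # no chromatin pair exists
    # Region centres (row, col) as unweighted centroids of each labelled region.
    centres = np.array(ndimage.center_of_mass(mask.astype(float), labels, np.arange(1, n + 1)))
    d = cdist(centres, centres)  # n x n matrix, d[i, l] = distance between centres i and l
    np.fill_diagonal(d, np.inf)  # ignore the zero self-distances d_(i)-(i)
    nearest = d.min(axis=1)
    if um_per_px is not None:
        nearest = nearest * um_per_px  # hypothetical pixel size
    return nearest


mask = np.zeros((5, 5), dtype=bool)
mask[0, 0] = mask[0, 3] = mask[4, 0] = True
nd = nearest_pair_distances(mask)
print(nd, nd.mean())  # per-region d^min and the image average
```

With centres at (0, 0), (0, 3) and (4, 0), the nearest-pair distances are 3, 3 and 4 pixels, giving an image average of 10/3.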
The average eccentricity of the chromatin for image $j$ can thus be defined as

$$\bar{\varepsilon}_{jk} = \frac{1}{n} \sum_{i=1}^{n} \varepsilon_{ijk}$$

and, at sensitivity level $k$, the overall average eccentricity over the 150 test images is defined as

$$\bar{\varepsilon}_{k} = \frac{1}{150} \sum_{j=1}^{150} \bar{\varepsilon}_{jk}$$

For the distance of the nearest chromatin pair, the boxplot for the 150 test images is shown in Fig 3(a), and the mean and standard deviation of the average nearest-pair distance of these images are shown in Fig 3(b). Fig 4 illustrates the changes in the shape of the detected chromatin: Fig 4(a) shows the boxplot of the average eccentricity for the 150 test images, and Fig 4(b) shows the mean and standard deviation of the average eccentricity values of these images.
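A per-region shape sketch based on second-order moments of the pixel coordinates is given below. Note that Eq (12) defines $E$ as a major/minor axis ratio, which is always at least 1, whereas the reported values (mean 0.47, approaching zero for rounder regions) lie between 0 and 1. That range matches the conic-section eccentricity of the moment-equivalent ellipse, $\sqrt{1 - (\text{minor}/\text{major})^2}$, which is what scikit-image's `regionprops` reports as `eccentricity`. Since the exact quantity used is not stated, the sketch computes both; 8-connectivity and treating single-pixel regions as circles are assumptions, and `region_shapes` is a hypothetical name.

```python
import numpy as np
from scipy import ndimage

EIGHT = np.ones((3, 3), dtype=int)  # assumed 8-connectivity


def region_shapes(mask):
    """Per-region (eccentricity, axis_ratio) from second-order moments of pixel coordinates."""
    labels, n = ndimage.label(mask, structure=EIGHT)
    ecc = np.zeros(n)
    ratio = np.ones(n)
    for i in range(n):
        rows, cols = np.nonzero(labels == i + 1)
        coords = np.vstack([rows, cols]).astype(np.float64)
        cov = np.cov(coords, bias=True)  # 2 x 2 population covariance of (row, col)
        lam_minor, lam_major = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # ascending order
        if lam_major == 0.0:
            continue  # single pixel: treated as a circle (ecc 0, ratio 1)
        # Conic-section eccentricity of the equivalent ellipse, in [0, 1]; 0 for a circle.
        ecc[i] = np.sqrt(1.0 - lam_minor / lam_major)
        # Eq (12): major/minor axis length; axis lengths scale with sqrt of the eigenvalues.
        ratio[i] = np.sqrt(lam_major / lam_minor) if lam_minor > 0 else np.inf
    return ecc, ratio


mask = np.zeros((5, 6), dtype=bool)
mask[0:2, 0:2] = True   # 2 x 2 square: round
mask[4, 2:5] = True     # 1 x 3 line: fully elongated
ecc, ratio = region_shapes(mask)
print(ecc, ratio, ecc.mean())  # per-region values and the image average
```

The square gives eccentricity 0 and axis ratio 1, while the one-pixel-wide line gives eccentricity 1 and an unbounded ratio, illustrating why the two definitions produce very different numerical ranges.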
For further analysis of the data, the Friedman test is performed to test for differences between the five sensitivity-level groups. For both the average area and the average distance of the nearest chromatin pair, the tests reveal a statistically significant difference, with p-values less than α, so the null hypothesis that there is no difference between the sensitivity levels is rejected. A post hoc test is then performed to identify which sensitivity levels differ from which others in the average area and the average distance of the nearest chromatin pair. A summary of the post hoc analysis with Holm's and Shaffer's procedures is presented in Table 2, and details of the post hoc test are presented in the supplementary material. Since the comparisons between all five sensitivity levels at four different amounts of fuzziness reported significant differences except for one pair of sensitivity levels, Table 2 presents only the sensitivity levels with no significant difference, for the analyses of both average area and average distance of the nearest chromatin pair. Comparisons not listed in Table 2 demonstrated adjusted p-values less than 0.05.
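The test sequence can be sketched with SciPy as follows, assuming the per-image averages are arranged as an (images x levels) array. This is not the study's post hoc procedure: the pairwise comparisons here use the Wilcoxon signed-rank test as a swapped-in choice, followed by Holm's step-down adjustment, and Shaffer's procedure is not implemented. The data are synthetic, and `friedman_with_posthoc` and `holm_adjust` are hypothetical names.

```python
from itertools import combinations

import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon


def holm_adjust(pvals):
    """Holm step-down adjusted p-values."""
    p = np.asarray(pvals, dtype=np.float64)
    m = p.size
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(np.argsort(p)):
        running_max = max(running_max, (m - rank) * p[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted


def friedman_with_posthoc(data, alpha=0.05):
    """data[j, k]: per-image average (e.g. area) of image j at sensitivity level k + 1."""
    data = np.asarray(data, dtype=np.float64)
    levels = data.shape[1]
    stat, p = friedmanchisquare(*(data[:, k] for k in range(levels)))
    pairs = list(combinations(range(levels), 2))
    raw = [wilcoxon(data[:, a], data[:, b]).pvalue for a, b in pairs]
    adjusted = holm_adjust(raw)
    # Level pairs (1-based) whose difference is not significant after adjustment.
    not_different = [(a + 1, b + 1) for (a, b), q in zip(pairs, adjusted) if q >= alpha]
    return stat, p, dict(zip(((a + 1, b + 1) for a, b in pairs), adjusted)), not_different


# Synthetic example: 30 images, 5 levels whose means rise by one unit per level.
rng = np.random.default_rng(1)
data = rng.normal(0.0, 0.5, size=(30, 5)) + np.arange(5.0)
stat, p, adj, nd = friedman_with_posthoc(data)
print(f"Friedman chi2 = {stat:.1f}, p = {p:.2e}, non-significant pairs: {nd}")
```

The Friedman test is used because the same images are measured at every level (repeated measures) and no normality assumption is needed; the pairwise step then localises which levels differ, as Table 2 summarises for the real data.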
From Table 2, for the average distance of the nearest chromatin pair, only sensitivity levels 3 and 4 showed no statistically significant difference. For all the other analyses, sensitivity levels 4 and 5 were not statistically different. As the amount of fuzziness changes from 2.0 to 4.0, the post hoc results do not change; therefore, the amount of fuzziness employed in this study is 2.0. In addition to the statistical analyses, a survey was conducted to identify the sensitivity level that human experts find most representative. The feedback from pathologists and cytotechnologists captures their 'reality visual perception', which is compared with the findings from the statistical analyses. Ten pathologists and ten cytotechnologists participated in the survey, which consisted of 20 questions, each presenting chromatin detection at the five sensitivity levels. The experts independently selected the image that best matched the chromatin as they perceived it. The mean and standard deviation of the sensitivity levels chosen by pathologists and cytotechnologists are shown in Fig 5(a) and 5(b), respectively. The grand average sensitivity levels for pathologists and cytotechnologists are 3.725 ± 0.380 and 3.575 ± 0.537, respectively. The statistically selected sensitivity level matched the visual perception of pathologists and cytotechnologists: both grand averages, 3.725 and 3.575, are closest to sensitivity level 4, indicating that level 4 is the most representative level for chromatin detection.

Discussion
Changes in the morphology of the cell nucleus are recognised as one of the crucial phenomena associated with neoplastic transformation [29]. These malignancy-associated changes (MACs) in the cell nucleus, particularly changes in chromatin pattern, are employed as diagnostic criteria for the precancerous and cancerous categories in the Bethesda System for Reporting Cervical Cytology [5]. To separate the non-neoplastic category, i.e. negative for intraepithelial lesion or malignancy (NILM), from the neoplastic categories, the chromatin pattern of NILM cervical squamous cells has been described in several publications, as summarised in Table 3. It is apparent from Table 3 that the criteria of the reporting standard are qualitative. Pathologists or cytotechnologists acquire their diagnostic skills by observing numerous slides against these qualitatively defined criteria, so discrepancies between individuals inevitably arise from subjective judgement, which might lead to different diagnostic results. Therefore, there is a need to transform these qualitatively defined criteria into more definite quantitative criteria. A computer-aided tool capable of analysing and quantifying the characteristic changes seen under the light microscope would be useful in this transition from qualitative to quantitative criteria.

Table 3. Description of criteria for chromatin to report a case as NILM.
Reference | Description
 | The pattern is finely granular.
[8] | Usually, the chromatin pattern of the nucleus of a normal cell is fine.
[11] | The chromatin is finely and uniformly granular.
[15] | Mild hyperchromasia may be present, but the chromatin structure and distribution remain uniformly finely granular.

Various attempts have been made to investigate the chromatin pattern in the cell nucleus. Rowinski et al. [30] measured the chromatin area of lymphocytes using the Image Analysing Computer Quantimet, based on multi-level thresholding. Smeulders et al. [31] segmented the chromatin of cervical cells, restricting the size of the segmented region by the lowest gray level and the lowest gray-level gradient. A similar technique that limits region growing to a fixed percentage of the nuclear area was later proposed [32]. Young et al. [33], on the other hand, measured the heterogeneity, granularity, condensation and margination of chromatin by dividing the nucleus image into three categories based on thresholding. Murata et al. [34] employed two-dimensional and higher-order texture analysis to analyse the chromatin pattern of thyroid tumour cells. Jingu et al. [15] measured the gradient of staining intensity from the centre to the border of the nucleus as an index of the chromatin distribution of cervical squamous epithelial cells. These previous works have emphasised the usefulness of chromatin pattern in diagnosis. Although these techniques attempted to refine the descriptive terms used by pathologists or cytotechnologists for chromatin pattern, such as homogeneity, clumping and granularity, they do not take into account the different judgements of individual pathologists or cytotechnologists arising from their different sensitivities in the visual perception of chromatin.
To imitate human diagnostic behaviour, this study proposes different sensitivity levels for chromatin segmentation, each representing the potential view of an individual pathologist. The aim of this study is to quantify the statement "evenly distributed, finely granular chromatin" and hence to build a model of the chromatin distribution of non-neoplastic cervical squamous cells. For the chromatin regions detected at each level, three parameters are computed: the area of the chromatin, the distance between the nearest chromatin pair, and the eccentricity of the chromatin. Firstly, to quantify "finely granular chromatin", the area of a chromatin region is computed as the number of connected neighbouring pixels sharing the same intensity value in the segmented image; this area quantitatively reflects the fineness of the chromatin. Secondly, to quantify "evenly distributed chromatin", the Euclidean distance between the centres of the nearest chromatin pair is computed; the average nearest-pair distance, together with a small standard deviation, quantitatively reflects the evenness of the distribution. Thirdly, the eccentricity of the chromatin is computed to investigate how the shape of the detected chromatin changes as the sensitivity level increases, an aspect of chromatin pattern that has yet to be explored. From Fig 2(a), the average number of detected chromatin regions increases faster than the average total chromatin area. Thus, the average area of a chromatin region decreases as the sensitivity level increases, as shown in Fig 2(b) and 2(c). As the sensitivity level increases, more chromatin is detected, which may generate more overlapping and merged regions.
Eliminating these overlapping regions and replacing them with the regions detected at the lower sensitivity level reduces the rate of increase of the average total area. Details of the issues of region overlapping and merging can be found in the supplementary material. The boxplot of the area for the 150 test images in Fig 2(b) shows a decreasing trend in the median chromatin size as the sensitivity level increases, and the interquartile range becomes smaller. This indicates that when more chromatin regions are detected, their sizes become more similar. The standard deviation also decreases as the sensitivity level increases. These simulation results suggest that at a sufficient sensitivity level, i.e. level 3 and above, the size of the detected chromatin fluctuates less, which would result in a very similar finely granular chromatin pattern in the visual perception of most pathologists and cytotechnologists. From Fig 3(a) and 3(b), the distance between the nearest chromatin pair decreases as the sensitivity level increases: when more chromatin is detected at a higher sensitivity level, the distances between nearest chromatin pairs become shorter. Although the nearest-pair distance decreases, its standard deviation decreases only slowly and varies much less than that of the area. This indicates that the sensitivity level has less effect on the chromatin distribution in terms of inter-chromatin distance; as the sensitivity level increases, the distance between the nearest chromatin pair changes little. Thus, the chromatin in a cervical nucleus image always appears evenly distributed, provided that a sufficient amount of chromatin is detected. One pathologist might perceive a different amount of chromatin from another; however, given the observations on the nearest-pair distance, these pathologists would eventually observe a similar chromatin distribution, because the chromatin appears evenly distributed regardless of the sensitivity level.
The eccentricity values shown in Fig 4(a) and 4(b) indicate that the shape of the chromatin lies between a circle and an ellipse. As the sensitivity level increases, the chromatin becomes rounder, with eccentricity values closer to zero. The interquartile range of the eccentricity is similar across sensitivity levels, while the median decreases, as shown in Fig 4(a). The standard deviation remains approximately constant regardless of the sensitivity level. This shows that as the sensitivity level increases, even though the detected chromatin becomes rounder, the eccentricity values of the chromatin regions within an image differ little from one another.
From the statistical analysis, sensitivity level 4 appears to be the most suitable level to represent the chromatin distribution of non-neoplastic cervical squamous cells. From Table 2, the chromatin patterns at sensitivity levels 4 and 5 show no significant difference in both chromatin size and the distance between the nearest chromatin pair as the amount of fuzziness changes from 2.0 to 4.0. The steady standard deviation at these two sensitivity levels for both the chromatin area and the nearest-pair distance can also be regarded as corresponding to the criterion 'evenly distributed, finely granular chromatin' for classifying non-neoplastic cervical squamous cells. Therefore, statistically, the most representative sensitivity level is 4. As a cross-check of this statistically selected level, the survey of human experts' visual perception returned similar grand average sensitivity levels of 3.725 and 3.575, supporting sensitivity level 4 as the most representative level for constructing the model of chromatin pattern. Using the simulation values shown in Table 4, we develop a model of the distribution of chromatin pattern based on the proposed technique; the model is shown in Fig 6.

Conclusion
In this study, we have quantified the criterion 'evenly distributed, finely granular chromatin' for the chromatin pattern of non-neoplastic cervical squamous cells. The tool, which implements the Fuzzy C-Means clustering technique, segments chromatin from cervical squamous cell nuclei at different sensitivity levels, thereby imitating the different chromatin-detection sensitivities of individual pathologists or cytotechnologists, which depend on their experience and understanding of the subjectively defined criteria. A model representing the distribution of chromatin pattern for non-neoplastic cervical squamous cells is developed with the following quantitative features: a nucleus of a non-neoplastic squamous cell has on average 67 chromatin regions with a total area of 10.827 μm², the average distance between the nearest chromatin pair is 0.508 μm, and the average eccentricity of the chromatin is 0.47. As an initial effort to quantify the criteria in a definite way, the tool could be useful to pathologists, as it can be installed in laboratories and hence reduce diagnostic discrepancies arising from ambiguously defined criteria. In future work, more sample cervical squamous cells could be included for a better representation of chromatin features, and we will extend our work to cases of low-grade and high-grade squamous intraepithelial lesions.
Supporting Information