Bioinformatics analysis of whole slide images reveals significant neighborhood preferences of tumor cells in Hodgkin lymphoma

In pathology, tissue images are evaluated using a light microscope, relying on the expertise and experience of pathologists. There is a great need for computational methods to quantify and standardize histological observations. Computational quantification methods become more and more essential to evaluate tissue images. In particular, the distribution of tumor cells and their microenvironment are of special interest. Here, we systematically investigated tumor cell properties and their spatial neighborhood relations by a new application of statistical analysis to whole slide images of Hodgkin lymphoma, a tumor arising in lymph nodes, and inflammation of lymph nodes called lymphadenitis. We considered properties of more than 400, 000 immunohistochemically stained, CD30-positive cells in 35 whole slide images of tissue sections from subtypes of the classical Hodgkin lymphoma, nodular sclerosis and mixed cellularity, as well as from lymphadenitis. We found that cells of specific morphology exhibited significantly favored and unfavored spatial neighborhood relations of cells in dependence of their morphology. This information is important to evaluate differences between Hodgkin lymph nodes infiltrated by tumor cells (Hodgkin lymphoma) and inflamed lymph nodes, concerning the neighborhood relations of cells and the sizes of cells. The quantification of neighborhood relations revealed new insights of relations of CD30-positive cells in different diagnosis cases. The approach is general and can easily be applied to whole slide image analysis of other tumor types.


Introduction
The lymph node is a structured organ with major compartments, such as the subcapsular sinus, B cell follicles, the T cell zone, trabecular and medullary sinuses, and blood vessels. Many cells of different type enter the lymph node. They migrate from compartment to compartment, interact with each other and with other cells, and show a complex movement in a stromal cell network [1]. This movement is neither systematically investigated nor understood. In particular, the movement of tumor cells would be of great interest to comprehend the progress of the disease.
Hodgkin lymphoma (HL) is a malignant tumor disease of the lymphoid system, which originates from B-lineage cells at various stages of development [2]. The annual incidence of about three cases per 100, 000 persons makes HL to one of the most frequent lymphoma of the Western civilization [3,4]. In the United States, about 185, 000 people were diagnosed with HL in 2011. Currently, more than 9, 000 new cases are estimated per year. The probability to be diagnosed with HL for a single person during the life time is 0.2%. 85.3% of the HL patients survive five years or more, which is a high survival rate compared to other tumor types. Nevertheless, 1, 100 people die of HL per year in the United States, so HL is still a severe disease.
About 11% of all malignant lymphomas are HL. The WHO (World Health Organization) classification differentiates HL into two main groups [5], classical Hodgkin lymphoma (cHL), the most common type with about 95% of all cases, and the less frequent nodular lymphocyte predominant type. Classical Hodgkin lymphoma is further subdivided into four types. The nodular sclerosis type (NScHL) is the most common type with 60 − 70% of all cHL cases. In NScHL, at least one nodule formed by sclerotic bands, which contains malignant tissue, is present. With roughly 25%, mixed cellularity cHL (MCcHL) is the second frequent type of cHL. The two other subtypes named lymphocyte-depleted and lymphocyte-rich cHL are rare. They will not be considered in this study.
On cellular level, HL has some unusual characteristics compared to other tumors. The malignant morphologically huge pleomorphic cells called Hodgkin and Reed-Sternberg (HRS) cells originate mainly from B cell pre-cursor cells of the germinal center. In rare cases, HRS cells originate from T cells. Whereas Hodgkin cells are mononucleated, Reed-Sternberg cells are multinucleated (Fig 1).
HRS cells exhibit a broad morphological spectrum. In contrast to other tumor types, for cHL, only 1 to 2% of the tissue of the lymph node is covered by tumor cells. The majority of the tumor microenvironment consists of reactive cells of the immune system, including lymphocytes, macrophages, eosinophiles, mast cells, plasma cells, and stromal cells. In consequence, no solid tumor is present, but an inflammatory heterogeneous microenvironment. HRS cells actively influence their environment via rearrangement of cytokines and chemokines [6]. Quantified knowledge of the distribution and movement of tumor cells is necessary to understand the progress of the disease.
For diagnosis, tissue sections are cut and immunohistochemically stained. The visualization of HRS cells by specific staining based on CD30 antibody in combination with microscopy is routinely applied for diagnosis of lymphoma, see subsection Image processing in section Materials and Methods for more detail. Computer-aided analysis of tissue sections becomes essential for diagnosis and prediction of the disease's progress. All this led to the emerging field of digital pathology, see [7][8][9]. Nowadays, beside traditional microscopy, digital scanners allow the digitization of whole object slides to whole slide images (WSI). Nevertheless, WSI are rarely used in diagnostic work-flows, because it would require a costly infrastructure to automate the WSI retrieval process. Another aspect is that, the visual evaluation of a pathologist is strongly dependent from the display used. Although a few institutions have already implemented a digital workflow, see for example [10,11], the majority of pathologists still evaluate tissue sections in the microscope.
For WSI analysis, Kothari et al. applied a quantitative analysis of WSI to explore kidney renal clear cell carcinoma cases [12]. In previous work, we have investigated the quantification of CD30-positive pixels in cHL WSI [13]. These tissue sections have shown a high variability regarding the CD30-positive pixel count, which most likely has been an effect of the various progression states of the disease. Therefore, a classification based on pixel quantification has not been able to distinguish between the disease types, MCcHL and NScHL, and the nontumor type, lymphadenitis.
Malignant cells have been reported to develop abnormal, irregularly shaped nuclei [14,15]. To separate and label benign and malignant tissues, morphologic features, such as gray-level texture features, color-based features, Law's Texture Energy-based features, Tamura's features, and wavelet features, have been applied [16][17][18][19][20]. These texture-based methods consider the neighborhood of cells, applying the gray scale co-occurrence matrix, the wavelet transformation, and Fourier transformation, see [21] for a detailed overview. Various imaging approaches have been applied to analyze and classify malignant tissues for Barrett's cancer and prostate cancer, respectively [19,22]. These previous approaches have analyzed color, texture, and object features, but no morphological shape features of tumor cells. Image texture analysis has been also applied to breast tissue images to determine the region of interest to follow up microarray experiments [23,24]. Other work concerns heterogeneity assessment of tissue sections in WSI to label breast cancer histopathological digital images [25].
Currently, more and more deep learning methods are applied in digital pathology, e.g. convolutional neural networks, see, for example, [26][27][28] for more information and use cases. Deep learning methods provide a very effective and versatile way for the analysis and classification of tumor images. The drawbacks of these methods are the necessity of large training sets and the loss of traceability and, thus, the lack of full interpretability of the results.
In the diagnostic processes, a systematic computational analysis is not yet regularly applied [29]. There are no statistical studies available that investigate morphological properties and neighborhood relations of tumor cells in the lymph node. The aim of the present study was to draw conclusions about migration of HRS cells by systematically analyzing the distribution of HRS cells in the tissue as a function of their size and shape. This includes also non-tumor cells, which are CD30-positive and represent activated lymphocytes. An inflammation of the lymph node called lymphadenitis (LA) often contains such activated lymphocytes. It is so far not possible to perform life imaging of human lymph nodes affected by HL, and a corresponding mouse model of HL does not exist. Since histological sections represent snapshots of the tumor development, including migration, we explored features in the WSI, such as the number, distance, and neighborhood relationship of tumor cells, applying statistical methods.
We wanted to answer following questions. Do HRS cells come in close spatial contact to communicate and cooperate with each other? Is the close spatial neighborhood somehow related to the morphology of the cells? Can we draw conclusions regarding the movement of HRS cells? We applied following hypotheses.
• To yield statistically significant neighborhood relations, we can apply a classification of cells to one of eight classes according to empirically defined specific thresholds for the morphological features eccentricity, solidity, and size.
• The morphology of the next neighbor of a CD30-positive cell depends statistically significant on the morphology of the cell itself.
• The cell morphology of a tumor cell is important to understand its role in the spatial organization and the spread of classical Hodgkin lymphoma.
• The size of CD30-positive cells is larger in classical Hodgkin lymphoma than in lymphadenitis.
• The majority of HRS cells in classical Hodgkin lymphoma have large diameters, i.e., diameters larger than 30 μm.
• HRS cells of elongated shape are moving.
• Frayed cells are more communicating than non-frayed cells.
• Next neighbor relations of CD30-positive cells in tissue of patients with lymphadenitis and classical Hodgkin lymphoma are not random.
• CD30-positive cells cluster in classical Hodgkin lymphoma and lymphadenitis.
• The neighborhood relations of CD30-positive cells show differences between lymphadenitis and the subtypes nodular sclerosis and mixed cellularity of classical Hodgkin lymphoma.
To the best of our knowledge, these questions have not been addressed so far, neither for HL nor for other tumor types.

Results and discussion
Here, we use abbreviations we describe in more detail in section Materials and Methods. We define a profile class (PC) for each cell, according to the size and morphology of the cell. We call the PC of the nearest neighbor of a cell as neighbor profile class (NPC). 63%, describing small, round cells followed by PC = 4 with 25.47%, standing for also small, but frayed cells. Only very few cells, 0.75%, were large and elongated (PC = 3). Overall, PC = 1, 3, 5, and 7, describing large cells, were less frequent than the PC = 0, 2, 4, and 6, describing small cells. For disease-specific numbers for the fractions of PC, see S1 Table. For NScHL and MCcHL, the fractions of large cells were 17.36% and 11.06%, respectively, whereas for LA only 8.10%, see Because cHL is commonly characterized by the occurrence of large HRS cells, we expected to find many large cells, at least in images diagnosed as cHL. Surprisingly, the measured difference in the numbers of large PC in cHL and LA were not sufficiently significant to distinguish between cHL and inflammation.

The diameters of cells differ in classical Hodgkin lymphoma and lymphadenitis
CD30-positive cells of the classical HL, NScHL and MCcHL, which are most likely HRS cells, vary in size between 20 and 60 μm [30][31][32], whereas the size of CD30-positive cells of LA, which originate from activated lymphocytes, was between 10 and 30 μm [33][34][35]. Based on the distribution of the cell diameters of CD30-positive cells (Fig 4), we assumed the majority of

Small, round cells favor cells of the same type as neighbors
In Fig 6, the thick red arrow from PC = 6 to NPC = 0 indicates a strong unfavored relation of small, elongated, frayed profiles with small, round cells in 74% of the images. In 26 of the 35 images, small, round cells were significantly underrepresented in the neighborhood of small, elongated, frayed PC. This was mutually, indicated by a reversed red arrow (from PC = 0 to NPC = 6) with a negative score of 69%. Note that, favored and unfavored neighborhood relations have not to be mutual, see S1 Fig  NScHL: Unfavored neighborhood relations of cells to other PC is less noticeable, see Fig 7A).
Only two red arrows indicate a negative score of more than 50%. Small, round cells (PC = 0) show unfavored neighborhood relations to small, frayed, elongated cells (PC = 6) and vice versa. There exists a preferred neighborhood relation between large, frayed, elongated cells (PC = 7) and large, frayed cells (PC = 5).

Small, round cells have large distances to their preferred neighbor
In NScHL, the cells have a smallest mean distance of 37.9 ± 6.1 μm, see Table 1. It increases to 42.7 ± 11.9 μm and 48.3 ± 18.1 μm for MCcHL and LA, respectively. Within the large standard deviations, the differences in the mean distances are not statistically significant. Compared to the theoretically expected distribution for randomly located cells, the distribution of distances to the nearest neighbor was shifted towards small values and had a much more narrow shape, see S3, S4 and S5 Figs. This observation was valid for all 35 images and indicates a significant clustering of the CD30-positive cells independent of the medical diagnosis. The result has been verified by a graph-theoretical analysis of network properties of the corresponding cell graphs [36].
We asked whether the strong preference of small, round cells, PC = 0, to cluster together could be influenced by an attraction between these cells. For the majority of images, the ratio between the mean distances of small, round nearest neighbors and those of two nearest neighbors of arbitrary PC was greater than 1, see S6 Fig. Pairs of small, round cells showed large distances between them, with mean distances up to 50% enlarged, so that an attraction between  The Pearson correlation coefficient of the mean distance between pairs of PC = NPC = 0 cells and the log-odd value of the significance level of a preference for these pairs were positive, over all images 0.4. The positive value of the correlation coefficient indicated that, the preference of a neighborhood of PC = NPC = 0 cells strongly correlated with a large distance between the cells.
We found a significantly high number of pairs of PC = NPC = 0 cells in tissue areas of low cell density, e.g. see Fig 8. A preferential location of small, round cells in sparsely populated tissue regions could explain the large distance between pairs of PC = NPC = 0 cells. Interestingly, a positive correlation between large distances and neighborhood preferences occurred only for these pairs, whereas for all other combinations, the correlations were negative. Only in the cases of pairs of PC = NPC = 0 cells, the preferences of cells of a PC to stay among themselves were strongly correlated with tendency to keep the distances large to the next neighbors.

Images
We analyzed 35 two-dimensional histological WSI of human lymph nodes provided by the Dr. Senckenberg Institute of Pathology, Frankfurt am Main. The anonymized cases were manually preselected based on quality from a set of about 160 images. Because of many artifacts regarding, for example, the staining or folded tissue, the selection had to be done manually. To  In extreme cases, the intensity of the nonspecific staining is very close to the actively stained regions, and a large fraction of the tissue section is covered by nonspecific staining. We chose a similar number of images for the three disease types. Overall, more MCcHL cases than lymphadenitis and NScHL cases were available. Some lymphadenitis cases show an unusual progression, e.g. an inflammation of the lymph node over a long time period. As the clinical examination is not sufficient to make the final diagnosis, the lymph node is biopsied for further histopathological exploration. Then, physicians contacted the Reference-and Consulting Center of Lymph Node and Lymphoma Pathology in Frankfurt am Main.
Additionally, the intensity of the staining was not consistently good in all areas of the image, such that even a low amount of nonspecific staining reduces the quality of the WSI significantly. The selected images cover the three histological diagnoses, NScHL (12 images), MCcHL (12 images), and LA (11 images). The tissue was CD30-immunohistochemically stained to visualize the HRS cells and special forms of activated B and T lymphocytes [37]. Overall, we considered more than 400, 000 CD30-positive cells in images with a resolution of 0.25 μm × 0.25 μm per pixel.

Image processing
The images were doubly stained, one staining to visualize the cell nuclei and cytoplasm and one to label tumor cells. In immunohistologically stained images, the dye is attached to an antibody, binding to the target protein. For visualization of the cell nuclei, we used the H&E (hematoxylin & eosin) staining. Hematoxylin is a standard stain that nonspecifically binds to all negatively charged components of cells and principally colors the nuclei of cells blue or dark-purple, along with a few other tissues. Since the backbones of the DNA and RNA are the main fraction of negatively charged biomolecules within the cell, their nuclei are stained most, but also the cytosol appears in bright blue. Hematoxylin staining increases the contrast of the image and makes the cell density visible. Eosin binds to positively charged proteins and stains the cytoplasm and some other structures including extracellular matrix, such as collagen. For a review on histological staining, we refer to [38].
For identification of HRS cells, CD30 represents a typical marker protein classified by the Cluster of Differentiation (CD) protocol. The list of CD molecules consists of more than 300 unique cell surface proteins that can be targeted by monoclonal antibodies [39]. CD30 stains tumor cells of cHL and some other malignant lymphomas, e.g. anaplastic large cell lymphoma. However, CD30 is not lineage-specific and can stain any kind of activated cells, which means that reactive bystander lymphocytes can become CD30-positive when they get strongly activated. Although, cells of other tumors, which are regularly CD30-negative can become CD30-positive upon activation. The HRS cells of both cHL subtypes are positive for CD30, whereas the abundant bystander cells are CD30-negative. HRS cells secrete soluble factors, which attract bystander cells and, thus, shape their direct microenvironment [40]. This is observed in both cHL subtypes, whereas CD30-positive activated lymphocytes in lymphadenitis do not secrete the respective cytokine and, thus, differ from the malignant HRS cells. In contrast, the differential diagnosis between cHL and lymphadenitis with activated CD30-positive lymphocytes by simple morphological examination can be challenging.

Image digitization
For digitization of the tissue samples, we used an Aperio Scanscope device and created SVS files in the Aperio's proprietary image format that represents an altered large-tiff format, containing multiple layers. The image sizes of our samples vary from 10,000 x 10,000 pixels up to 90,000 x 90,000 pixels, see S2 Table. The image format also includes a JPEG2000 compression, resulting in SVS files with a size between 0.5 and 4 GB for images used in this work.

Cell detection and classification
We identified cell profiles of CD30-positive cells by applying the in-house software pipeline Impro [13,36]. Impro implements image processing methods, such as the detection of a region of interest (ROI) and the color deconvolution to separate the stains. Additional methods, such as the thresholding for the image segmentation and the computation of morphological cell descriptors, were integrated using established imaging software and libraries like CellProfiler and the Java Advanced Imaging API (JAI). S8 Fig illustrates schematically the CD30 pipeline of Impro. CellProfiler [41,42] is an open-source image processing software focused on cell detection and images with biological background. It is designed to perform image processing tasks on high throughput data.
The pipeline neglected small objects of size 109 μm 2 or less. The threshold corresponds to the size of a round cell with a diameter of 11.8 μm. Applying the shape descriptors, eccentricity, solidity, and area provided by CellProfiler [41], we assigned each cell to one of the previously manually defined eight profile classes (PC), see Fig 9. The eccentricity measures the deviation of a fitted ellipse from a circle. Solidity is the fraction of the object pixels in the convex hull of the cell. Together with pathologists, we empirically defined specific thresholds to distinguish between small and large, round and elongated, and frayed and not frayed cell profiles, see S3 Table. Fig 10A)  For each cell profile, we computed the maximal Feret diameter using the CellProfiler module MeasureObjectSizeShape. The Feret diameter describes the distance of two parallel tangents to the cell in a given orientation. The maximal Feret diameter referred in the text as diameter is the largest value of all possible orientations.

Neighborhood analysis
To detect statistically significant correlations between cells, we studied the spatial neighborhood relationships defined by the Euclidean distance between the centers of gravity of the cells. We applied a maximal distance of 175 μm (700 pixels). This threshold corresponds to about ten times the diameter of an average cell, which is assumed to be sufficiently small to enable intercellular communication based on, e.g. chemokines or cytokines [43]. We computed a neighborhood table, which contains for each cell the PC and the PC of the nearest neighbor called neighbor profile class (NPC), see Fig 11 for an exemplary subsection of a histological image (A) and the corresponding neighborhood table (B).

Statistical significance
The probability to randomly choose a neighbor, NPC = j, is proportional to the frequency of the PC, f(PC = j), in the image. In a random choice, the PC of a cell should have no influence on the selection of the nearest neighbor. We expect to measure a value for the conditional probability that is within the statistical fluctuation indistinguishable from the relative frequency of the NPC, i.e. P(NPC = j | PC = i) � P(PC = j). For example, the entry for the conditional probability, P(NPC = 0 | PC = 0) = 0.354, in S4(B) Table has to be compared with the entry for the expected probability, P(PC = 0) = 0.316, in S4(A) Table. Part (A) of S4 Table exemplifies the measured probability P(PC = i) � f(i) to find a cell of a certain PC for a given image (ID = 1721). The table contains the relative frequency, f(i), of the PC i 2 {0, . . ., 7} identified by the row number.
We evaluated whether the deviation of the conditional probability from the expected probability is statistically significant for a rejection of the null hypothesis of a random selection of the nearest neighbor. For a set of n cells of PC = i, the probability to have a subset of k cells with nearest neighbor of NPC = j is given by the binomial distribution where p = P(PC = j) is the probability to find a class, j, by chance, i.e. f(j), the relative frequency of class j. We chose a significance level of α = 1%. A measured value k, has a p-value smaller than α only if k = 2 [k low , k up ]. A k outside the prediction interval has a significance level of α = 1%, which is sufficient to reject the null hypothesis of a random selection of the nearest neighbor.
For example, the image with ID = 1721 of diagnosis MCcHL contains 3, 435 cells of PC = 0 from 10, 860 cells in total. We computed the prediction interval [1047, 1125] for the significance level α = 1%. We measured a number of k = 1217 pairs of PC = 0 and NPC = 0, which is above the upper endpoint of the 1% prediction interval. Thus, we can reject the null hypothesis for the significance level α = 1%. Within a significance smaller than 1%, the small, round cells are enriched in the neighborhood of other small, round cells (PC = 0), i.e. they prefer the neighborhood of cells of their own kind. We call such a favored neighborhood relation to be significantly high (sh).
If the number of measured pairs is smaller than the prediction interval for the significance level, we call the neighborhood relation to be significantly low (sl). A non-significant (ns) neighborhood relation denotes a number of pairs within the prediction interval. We compiled the results for each of the 12 images of diagnosis NScHL, 12 images of diagnosis MCcHL, and 11 images of diagnosis LA in an individual significance matrix, e.g. see Table 2.
The significance matrix in Table 2

Ethics statement
The manuscript is based on geometric cell graphs, which were computationally generated from anonymized whole slide images of human tissues and does not contain any individual person's data in any form.

Conclusions
With our study we demonstrated how a semi-automatic analysis of WSI of tissue can be performed. For the first time, a systematic analysis of a high number of more than 400, 000 CD30-positive cells in WSI of lymph node tissue sections of NScHL, MCcHL, and LA was performed. We provided measured values for the distances between neighbored cells, the cell diameters of HRS cells in complete lymph node sections, and morphological characteristics for each cell. The profiles of CD30-positive cells were identified by the imaging pipeline Impro [13,36]. Based on morphological cell features, we defined eight morphological PC to classify the cells and computed the fractions of each PC. The presented study provides valuable information about the morphological characteristics and neighborhood relations of CD30-positive cells in tumor, cHL, and inflammation, LA.
Nevertheless, there are limitations of the method and many points that have to be improved. Although we considered just two disease types of Hodgkin lymphoma and one non-tumor type in 2D images and although we had not perfect material to investigate more images, we were able to perform a statistically sound analysis of CD30-positive cells. The study produced interesting results, regarding the distribution of CD30-positive cells in tumor and non-tumor tissue, the classification of these cells according to morphological properties, an analysis of the diameters of CD30-positive cells, and an analysis of favored and unfavored neighborhood relations of CD30-positive cells based on their morphological classification, all in dependence of the three considered cases, NScHL, MCcHL, and LA. The development of different cell forms of HRS cells have been experimentally analyzed in cell cultures. Re-fusion of divided cells have been demonstrated [44]. The shape of the lymphoid cells including tumor cells is influenced by different factors. These involve the above mentioned cell proliferation, cell activation, and cell transformation, which are dynamic processes regulated and controlled by intrinsic and extrinsic factors. The phases and cell morphology reflecting cell proliferation are well known, however detailed data confining each dividing step morphologically of CD30 cells are lacking. Nondividing cells should be round whereas before cell division, a polarization indicated by cell elongation and finally resulting in two or more smaller round or irregularly shaped cells will be seen. During cell division, it seems that most cells are stationary. In G0 phase, the diameter and cell elongation as well as formation of cell processes is influenced by cell movement. During movement, cell diameters and shapes are dynamically controlled by intrinsic movement programs. Extrinsic factors of cell movement is controlled and determined by the microenvironment, as collagen fibers, sinuses, and the consistence of surfaces cells are moving on. Last but not least, the process of cell activation influences especially the cell diameter. The stronger the activation, the larger the cell looks like, as a result of pathway activations and enhanced energy consumption. Summing up, all these factors importantly configurate the appearance of the individual cells. In histological section, a snapshot including all stained cells is done freezing cell dynamic in time and space. Nevertheless, in this paper definitions of morphology and localization of CD30-positive cells is the basis for the bioinformatic computation. So we differentiated in small round, large round elongated and other cell types, including hypothetically the above described processes and principles. The results cannot be completely exact, because a dynamic model exactly reflecting these factors should be the basis of computations. This test model has to be established in the near future for further validating the results reported in this investigation. Regarding further developments, the detection of CD30-positive cells in three dimensions by applying confocal scanning microscopy [45] will be an important aspect of experimental work. Multi-staining will allow to visualize various important players beside CD30-positive cells, as e.g. T cells and B cells that are known to interact with malignant cells. Pattern recognition is and will further be a focus of ongoing research.