Development and Validation of a Histological Method to Measure Microvessel Density in Whole-Slide Images of Cancer Tissue

Despite all efforts made to develop predictive biomarkers for antiangiogenic therapies, no unambiguous markers have been identified so far. This is due, among other reasons, to the lack of standardized tests. This study presents an improved microvessel density quantification method in tumor tissue based on stereological principles and using whole-slide images. Vessels in tissue sections of different cancer types were stained for CD31 by an automated and validated immunohistochemical staining method. The stained slides were digitized with a digital slide scanner. Systematic, uniform, random sampling of the regions of interest on the whole-slide images was performed semi-automatically with the previously published applications AutoTag and AutoSnap. Subsequently, an unbiased counting grid was combined with the images generated with these scripts. Up to six independent observers counted microvessels in up to four cancer types: colorectal carcinoma, glioblastoma multiforme, ovarian carcinoma and renal cell carcinoma. At first, inter-observer variability was found to be unacceptable. However, after a series of consensus training sessions and interim statistical analysis, counting rules were modified and inter-observer concordance improved considerably. Every CD31-positive object was counted, with exclusion of suspected CD31-positive monocytes, macrophages and tumor cells. Furthermore, if interconnected, stained objects were considered a single vessel. Ten regions of interest were sufficient for accurate microvessel density measurements. Intra-observer and inter-observer variability were low (intraclass correlation coefficient > 0.7) if the observers were adequately trained.


Introduction
Tumor growth can only be achieved when sufficient blood vessels are present in the tissue. Two major processes can be responsible for the blood supply: sprouting angiogenesis or, alternatively, co-option of existing blood vessels of the host [1]. Angiogenesis is triggered by vascular endothelial growth factor (VEGFA), which is produced in the tumor [2]. The molecular mechanisms of vascular co-option, on the other hand, have not been fully elucidated yet. The most frequently used technique to quantify the result of angiogenesis or vascular co-option in a tumor section is based on measuring the microvessel density, i.e. counting the microvessels at a high magnification (200x-400x) in a predefined number of fields [3][4][5][6]. In several cancer types, microvessel density has been quantified (e.g. in colorectal carcinoma (CRC), glioblastoma multiforme (GBM), ovarian carcinoma (OC) and renal cell carcinoma (RCC) (Table 1)) and found to be prognostic for survival [7]. However, the unit of measurement differed between studies (e.g. microvessels per mm² versus microvessels per 200x field), and the data were highly variable and therefore difficult to compare, as illustrated by the mean microvessel density in CRC, which ranged from 6 to 351 across several studies. In order to visualize microvessels, tumor sections have been stained immunohistochemically for one or multiple pan-endothelial markers, such as CD31 [8][9][10], CD34 [10][11][12], von Willebrand factor [10,12,13], endoglin [10], and/or coagulation factor VIII [10]. Hitherto, microvessel counting has mostly been performed in a fraction of the total tissue area determined by a sampling method such as the vascular hotspot method (Weidner's method). This method involves the selection of one to five areas with the highest density of microvessels (hotspots) at low magnification, and the counting of vessels in these areas at high magnification [14,15] by computerized image analysis systems [16,17] or by applying a Chalkley grid [18].
In a previous study, we used a systematic uniform random sampling (SURS) method to avoid observer-dependent sampling variation and selected a limited number of at least five regions of interest (ROIs) on each whole-slide image (WSI) [19]. In these ROIs, two parameters were measured by using an unbiased array of test points (grid) separated by constant distances: the number of vessel profiles per area (Q_A or microvessel density) and the number of grid points overlapping with vessels per area (A_A) [20]. This sampling and counting method had previously been compared with other schemes, such as the hotspot method [21]. At the intra-observer level, the methods have variations of the same magnitude (coefficient of variation (CV) around 20%) [21]. At the inter-observer level, the SURS estimate of Q_A from the whole tumor section and the Chalkley method had the lowest variation (CV around 21%) with a small contribution by observers (CV 8% to 9%) [21]. Although the SURS estimate appeared the most reliable method to pick up microvessel density differences between study subjects [21,22], a major drawback is that it is labor- and time-intensive, limiting its use [21]. SURS has only been extensively studied in breast cancer [10,21,23], without applying digital tools to automate the process and without the thorough validation that is necessary for clinical use [5,24]. Accordingly, we developed and validated a method to measure microvessel density by using computer-assisted manual SURS of WSIs of cancer tissue, supported by the applications AutoTag and AutoSnap, which reduces workload and guarantees full traceability [19]. In the present study, we investigated the intra- and inter-observer variability of this method in CD31-stained tissue sections of four different cancer types and in samples that have different spatial distributions of blood vessels.

Materials
WSIs were made of existing sections stained for CD31 from our database for inclusion in this study. Samples were coded to protect the privacy of patients. All samples were obtained in Only samples with an area that could fit more than nine assessable ROIs were considered for selection. Finally, a set of samples was selected based on a representation of both low and high microvessel density heterogeneity in the study group. Two observers (KM and VC) trained and experienced in counting microvessels performed initial measurements. Four additional observers (ES, PV, WW, YW), two of whom had no previous experience in vessel counting (YW, WW) and two of whom were pathologists (PV, WW), were trained (30 minutes) and performed follow-up measurements on the same set of samples to assess inter-observer variability. SURS of 15 ROIs in the WSIs was done twice with Pannoramic Viewer (3DHISTECH, Budapest, Hungary) by one observer (KM) using a 20x magnification (Fig 1) assisted by the AutoTag and AutoSnap applications [19]. The second group of ROIs did not overlap with the ROIs from the first group. The first group of ROIs was used for assessing the intra- and inter-observer variability, while the second group was used for the inter-ROI variability. Guided by a pathology report of the closest hematoxylin and eosin-stained section, regions were taken in viable tumor tissue according to the SURS principle, but regions with abundant necrosis, inflammation, or ulceration were discarded. All images were analyzed on identical, color-calibrated displays. To assess intra-observer variability, vessels were counted at two different time points (with an interval of one month) to allow washout of the visual memory [25]. A web-based viewer (Pathomation BVBA, Antwerp, Belgium) was used to guarantee traceability when analyzing the grid-combined images of the ROIs.
Pathomation software allows data forms and WSIs to be combined in the same viewport, ensuring that measurement results and sample IDs stay unequivocally linked.

Stereological point counting
All grid points overlapping with vessels (V) were counted, regardless of whether the microvessels crossed the left or bottom outer grid lines (Fig 1). A grid point, which was designated by two perpendicular cross-lines, was regarded as overlapping a microvessel when it fell on an endothelial cell or a vessel lumen (red arrows versus shaded red arrow in Fig 1). When, exceptionally, only a single endothelial cell of a larger vessel was stained, all other endothelial cells that lined this vessel were nonetheless counted upon intersection. To establish a reference area, all grid points intersecting with tissue (V_ref) were counted. Small necrotic zones within tumor structures or glandular lumens were considered as cancer tissue. An ROI was analyzed only if more than 75% of the grid area (more than 60 out of the 81 grid points) covered tissue. The unbiased estimate of the microvessel areal fraction (A_A) was calculated for each sample according to:

$$A_A = \frac{\sum_{i=1}^{15} V_i}{\sum_{i=1}^{15} V_{i,\mathrm{ref}}} \times 100\%$$

with i running from 1 to 15 ROIs, expressed as a percentage of microvessels per area, with V_i the number of grid points overlapping with vessels in ROI i and V_i,ref the number of grid points hitting tissue in ROI i [19].
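The point-counting estimate described above can be illustrated with a minimal Python sketch. It assumes a ratio-of-sums form (vessel hits summed over ROIs, divided by tissue hits summed over ROIs) and applies the stated tissue rule of more than 60 of 81 grid points. The function names and the example counts are hypothetical and are not taken from the study.

```python
GRID_POINTS = 81         # grid points per ROI, as stated in the text
MIN_TISSUE_POINTS = 60   # an ROI is analyzed only if MORE than 60 points hit tissue


def roi_is_assessable(v_ref):
    """Return True when more than 60 of the 81 grid points cover tissue."""
    return v_ref > MIN_TISSUE_POINTS


def areal_fraction(rois):
    """Ratio-of-sums estimate of A_A in percent (assumed form).

    `rois` is a list of (v, v_ref) pairs: grid points on vessels and grid
    points on tissue for each ROI. ROIs failing the tissue rule are skipped.
    """
    kept = [(v, v_ref) for v, v_ref in rois if roi_is_assessable(v_ref)]
    total_v = sum(v for v, _ in kept)
    total_ref = sum(v_ref for _, v_ref in kept)
    if total_ref == 0:
        raise ValueError("no assessable ROI")
    return 100.0 * total_v / total_ref


# Hypothetical counts for three ROIs; the third has too little tissue
# (50 of 81 points) and is therefore left out of both sums.
example_rois = [(4, 80), (6, 70), (9, 50)]
a_a = areal_fraction(example_rois)
print(round(a_a, 2))
```

Summing before dividing weights each ROI by the amount of tissue it contains, so an ROI with little tissue contributes less than a fully covered one.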

Microvessel counting
Besides the stereological point counting, the microvessel density (Q_A) is captured by our method. The outer borders of the superimposed grid (Fig 1) [19] delineated the counting chamber [20]. Vascular structures crossing the virtually extended left or bottom lines of the grid were not counted. Regardless of staining, the others were counted (shaded green arrows in Fig 1) [20]. The initial counting rules only took into account stained structures with a clear lumen or without a lumen but larger than one tumor cell. Accordingly, very small cross-sectioned capillaries without a clear lumen were not counted. CD31 staining of suspected myofibroblast-like cells or of cells not belonging to a blood vessel was also excluded from counting. Because of high inter-observer variability with these counting rules, adapted counting rules were defined in which every CD31-positive object, no matter how small, should be counted, except suspected CD31-positive monocytes, macrophages and tumor cells. Furthermore, if CD31-positive objects were connected, they were considered a single object, while absence of staining defined two or more separate objects. Microvessel density was calculated for each sample according to:

$$Q_A = \frac{\sum_{i=1}^{15} N_i}{\sum_{i=1}^{15} V_{i,\mathrm{ref}}}$$

with i running from 1 to 15 ROIs, expressed as number of microvessels per area, with N_i the number of counted vessels in ROI i and V_i,ref the number of grid points hitting tissue in ROI i [19].
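The exclusion rule for the counting chamber and the resulting density can be sketched in Python as follows. This is a simplified illustration, not the procedure used by the observers: vessel profiles are reduced to hypothetical axis-aligned bounding boxes, image coordinates are assumed (y grows downward, so the bottom line is the largest y of the frame), and Q_A is assumed to be a ratio of sums over ROIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def is_counted(obj, frame):
    """Count an object that overlaps the frame unless it touches the
    left line (x = frame.x_min) or the bottom line (y = frame.y_max)."""
    overlaps = (obj.x_max >= frame.x_min and obj.x_min <= frame.x_max and
                obj.y_max >= frame.y_min and obj.y_min <= frame.y_max)
    crosses_left = obj.x_min <= frame.x_min <= obj.x_max
    crosses_bottom = obj.y_min <= frame.y_max <= obj.y_max
    return overlaps and not crosses_left and not crosses_bottom


def microvessel_density(rois):
    """Assumed ratio-of-sums Q_A: counted profiles per tissue grid point.

    `rois` is a list of (n, v_ref) pairs, one per ROI.
    """
    total_n = sum(n for n, _ in rois)
    total_ref = sum(v_ref for _, v_ref in rois)
    return total_n / total_ref


# Hypothetical frame and objects.
frame = Box(0, 0, 100, 100)
objects = [
    Box(10, 10, 20, 20),      # fully inside: counted
    Box(-5, 40, 5, 50),       # crosses the left line: excluded
    Box(40, 95, 50, 105),     # crosses the bottom line: excluded
    Box(95, 10, 110, 20),     # crosses the right line: counted
    Box(40, -5, 50, 5),       # crosses the top line: counted
    Box(200, 200, 210, 210),  # outside the frame: not counted
]
n_counted = sum(is_counted(o, frame) for o in objects)
print(n_counted)
```

Excluding profiles on two of the four borders means that a vessel lying on the boundary between adjacent frames would be counted in only one of them, which is what makes the count unbiased with respect to vessel size.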

Statistical analysis
Heterogeneity of microvessel distribution was determined for every cancer type by calculating the difference between the minimum and maximum number of counted microvessels per ROI (N) in one sample. Heterogeneity was considered low or high when this difference was below or above, respectively, the median of the calculated differences for all the samples in the database of that cancer type. The average of two repeated measurements for each of the two observers (KM, VC) was used for the calculation of inter-observer variability. A script was written in the statistical package R (version 3.2) to perform the calculations and plotting [26]. The intraclass correlation coefficient (ICC) was calculated by using the icc(ratings, . . .) function from the irr package. A two-way model with type 'agreement' was chosen. The unit of analysis for N and V was 'single', whereas for Q_A and A_A it was 'average'. The Kruskal-Wallis Rank Sum Test was carried out with the kruskal.test(formula, data, . . .) function from the stats package. XY plots of the counts with prediction intervals, Bland-Altman plots and Tukey boxplots were also constructed. Two-way ANOVA and paired Student t-test were performed in R. The minimum number of ROIs required for analysis of microvessel density was calculated using random sampling with replacement, also known as bootstrapping [19].
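To make the ICC settings concrete, the sketch below computes a two-way, absolute-agreement ICC from the ANOVA mean squares (the McGraw and Wong form) in plain Python. It is an independent illustration, not the R script used in the study; the function name and the example ratings are hypothetical.

```python
def icc_agreement(ratings, unit="single"):
    """Two-way, absolute-agreement ICC.

    `ratings` is a list of rows, one per subject (sample), each row holding
    one value per rater (observer or counting round).
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way layout without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    if unit == "single":
        return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    if unit == "average":
        return (msr - mse) / (msr + (msc - mse) / n)
    raise ValueError("unit must be 'single' or 'average'")


# Hypothetical counts: the second rater is always exactly one count higher.
offset_ratings = [[1, 2], [2, 3], [3, 4]]
print(icc_agreement(offset_ratings, unit="single"))
print(icc_agreement(offset_ratings, unit="average"))
```

In this example the two raters rank the samples identically, yet the agreement-type ICC stays well below 1 because the constant offset between raters enters the denominator. This is why a systematic bias between observers lowers an agreement ICC even when their counts are perfectly correlated.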

Inter-ROI variability
Comparing Q_A and A_A for two groups of non-overlapping ROIs (n = 15) in the same sample (n = 6) revealed that the choice of locations of the ROIs only affected A_A. The calculated ICCs for Q_A were always above or equal to 0.8 (CRC: 0.9, GBM: 1.0, OC: 1.0, and RCC: 0.8), whereas the ICCs for A_A were significantly lower (CRC: 0.6, GBM: 0.6, OC: 0.7, and RCC: 0.7) (p = 0.01; paired Student t-test). The point-counting (A_A) was sensitive to the choice of locations of the ROIs, whereas the profile-counting (Q_A) was more robust with regard to the choice of locations.

Minimum number of ROIs for accurate microvessel density measurements
A plot from a bootstrap analysis showed higher variation at lower numbers of ROIs than at higher numbers of ROIs (Fig 3) [19]. Based on these graphs for 19 colorectal carcinomas, 22 renal cell carcinomas, 21 glioblastomas, and 21 ovarian carcinomas, counting ten ROIs appears to be sufficient for accurate microvessel density measurements [19]. Highly heterogeneous samples require more ROIs than samples with low heterogeneity (Fig 4). A relationship with heterogeneity was established by two-way ANOVA (CRC: p < 0.05; GBM: p < 0.001; OC: p < 0.01; RCC: p < 0.001). Moreover, there was a relationship with the observer (CRC: p < 0.001; GBM: p < 0.05; OC: p > 0.1; RCC: p < 0.05), implying that the required number of ROIs can differ between observers. On average, the minimum number of ROIs required was 5 for OC, 7 for CRC and GBM, and 9 for RCC.
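The bootstrap idea can be sketched in Python as follows. The text does not state the criterion used to decide when the number of ROIs is sufficient, so the threshold on the coefficient of variation (`max_cv`) is purely an assumption for illustration, as are the function names, the ratio-of-sums density and the example counts.

```python
import random
import statistics


def bootstrap_cv(rois, n_rois, n_boot=2000, seed=0):
    """CV of the density estimate over bootstrap resamples.

    Each resample draws `n_rois` ROIs with replacement from `rois`, a list
    of (n, v_ref) pairs, and computes a ratio-of-sums density.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(rois, k=n_rois)
        total_n = sum(n for n, _ in sample)
        total_ref = sum(v_ref for _, v_ref in sample)
        estimates.append(total_n / total_ref)
    return statistics.stdev(estimates) / statistics.mean(estimates)


def minimum_rois(rois, max_cv=0.15, **kwargs):
    """Smallest ROI count whose bootstrap CV is at or below `max_cv`.

    The CV threshold is an assumed stopping rule, not the study's criterion.
    Returns None when no ROI count up to len(rois) meets it.
    """
    for n_rois in range(2, len(rois) + 1):
        if bootstrap_cv(rois, n_rois, **kwargs) <= max_cv:
            return n_rois
    return None


# Hypothetical (vessel count, tissue grid points) pairs for 15 ROIs.
heterogeneous = [(2, 81), (10, 81), (4, 78), (15, 80), (1, 75),
                 (8, 81), (6, 79), (12, 81), (3, 77), (9, 80),
                 (5, 81), (7, 76), (11, 81), (2, 79), (14, 80)]
homogeneous = [(5, 80)] * 15

print(bootstrap_cv(heterogeneous, 2), bootstrap_cv(heterogeneous, 15))
print(minimum_rois(homogeneous))
```

With identical ROIs every resample gives the same density, so the smallest ROI count tested already satisfies the threshold. With heterogeneous ROIs the spread of the estimate shrinks as more ROIs are drawn, which mirrors the observation that heterogeneous samples need more ROIs.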

Intra-observer variability
All ICC-values for the four parameters (N and Q_A, V and A_A), in the four cancer types and for both observers (KM, VC), were higher than 0.7 (Table 2), which is generally considered the minimal acceptable reliability [27]. Importantly, 81% of ICC-values were higher than 0.9, which is considered excellent concordance [27]. The ICC-values for CRC were lowest, those for GBM highest. The parameters Q_A and A_A showed lower intra-observer variability than N and V (Fig 5).

Inter-observer variability
Inter-observer variability ICC-values for the four parameters (N and Q_A, V and A_A) did not consistently exceed 0.7 across the four cancer types (Fig 5, Table 3). The variability of N, Q_A, and A_A was large in the CRC samples (Fig 6), which might be due to a large systematic bias between the observers. Therefore, a third trained and experienced observer (ES) quantified the samples. The ICC-values for the variability between observers two and three were better (0.9, 0.8, 0.9, and 0.8 for N, Q_A, V, and A_A, respectively), but this was not the case between observers one and two, or between observers one and three. Therefore, a series of consensus training sessions (two hours in total) was held in which the most discrepant cases were discussed. Accordingly, the following new counting rules were proposed: every CD31-positive object, no matter how small, should be counted, except suspected CD31-positive monocytes, macrophages and tumor cells. Identification of these cell types is not straightforward. For example, very dense inflammation prohibits accurate counting. Furthermore, if CD31-positive objects were connected, they were considered a single object, while absence of staining defined two or more separate objects. Using these new counting rules, three different observers (KM, PV, YW) recounted the OC samples and the CRC samples that had shown strongly discrepant inter-observer counts under the initial counting rules. A fourth observer (WW) counted only the OC samples. Both pathologists (PV, WW) recommended the exclusion of one sample from each set (OC1 and CRC7) as there were too many CD31-positive inflammatory cells. The inter-observer variability ICC-values resulting from the new counting rules were all greater than 0.7. Importantly, more than half of the ICC-values exceeded 0.8 (Table 4). The parameters Q_A and A_A showed lower inter-observer variability than N and V.

Discussion
Manual ROI sampling and blood vessel counting is a time-consuming and labor-intensive process, partly due to the amount of effort required to find valid ROIs under the microscope. Although there are different methods for sampling and counting, the validation is usually limited to one specific cancer type. It is important to note that the sampling method chosen depends on the research question, as the hotspot method (Weidner's method) will quantify the strongest angiogenic areas of a tumor, whereas the SURS method provides global information. We developed a novel ROI sampling and microvessel counting method that combines parts from existing methods and adds stereological techniques to improve the validity of the results. Furthermore, we used WSIs of CD31-stained tissue sections, allowing traceability and higher throughput by providing ROI annotations on the images [19]. The level of consensus within and between observers was evaluated by calculating the ICCs, which were higher than the generally accepted minimal reliability of 0.7 [27], and more than half of the results even exceeded 0.8. These results were only possible after consensus training with all observers and with the new counting rules. Therefore, we strongly advise simplified counting rules and extensive consensus training sessions with all observers involved. Most attention needs to be paid to the minimum size of a staining pattern that can be considered a vessel. Our newly proposed counting rules include every CD31-positive object, no matter how small, and therefore also have the advantage of including single endothelial venules, indicative of active angiogenesis [28].
We evaluated the effect of the location of ROIs on the variability of the microvessel density and areal fraction of the blood vessels. No major effect was present when SURS was performed for profile counting. However, for the areal fraction, ICC-values lower than 0.7 were obtained. This is not unexpected, as the number of structures overlapping with the grid will depend heavily on these locations because only a small area is sampled by the grid intersections. In addition, these ICCs can be regarded as too optimistic because only one observer assessed the ROIs. Therefore, additional intra-observer variability could be taken into account. We limited our investigation to the above-mentioned parameters, but it would be interesting to study the effect of magnification, grid type or grid size in future research.
A limitation to our study is the relatively small sample size and therefore a follow-up study with more samples is advised.
Several challenges are inherent to vessel counting independently of the method used: firstly, tumors develop in different tissue types, which all have their own characteristic vessel network architecture [29][30][31][32]. The distribution of microvessel sizes and growth patterns vary between, but also within, a cancer type [29,32,33]. Since the first published milestone study by Weidner N et al. [15], many studies have been conducted regarding the significance of microvessel density in breast cancer patients. It is of great interest to evaluate our method in a follow-up study with breast cancer samples. Secondly, the different cell types of which microvessels are composed are another source of bias [34][35][36][37][38]. The proportions of the different cell types in a vessel define the vessel type. The most important cell type is the endothelial cell [39]. Peri-endothelial cells, such as pericytes and smooth muscle cells, strengthen the vessel and expand its functionalities [34,40]. The importance of these different types of vessels present in the growing tumor tissue seems to be prognostic [41,42] and even predictive of survival after therapy [24,[43][44][45]. Developing an assay for the detection of these pericytes in blood vessels would be of great interest, but is challenging. For example, alpha-smooth muscle actin, which stains pericytes, also stains myofibroblasts, hindering quantification. Future studies will be needed to address the challenges imposed by such a strategy.
Table 2. Intra-observer variability for the old counting rules. This was calculated by the intraclass correlation coefficients (ICC) between the counting of round one and two of observers 1 and 2 (ICC1 and ICC2) for the four different cancer types and the four different parameters.
Because the choice of vascular cell type that is stained can lead to the selection of a specific type of vessel in terms of functionality, this choice will also affect the number of the microvessels counted. Most microvessels are stained in the tumor sections using pan-endothelial markers such as CD31. However, these proteins are not only expressed by endothelial cells, but also by other cells, such as macrophages and platelets [12], which may result in an overestimation of the number of microvessels. Thirdly, the counting method has sources of bias as well, because a decision has to be made whether stained cells resemble endothelial cells in shape and size. Depending on the orientation of the vessel and the direction of the sectioning, one vessel can appear as separate shapes in the two-dimensional section. Finally, there are several challenges presented by manual counting per se, such as searching and finding microvessels, counting and memorizing the number of counts [46,47].
Nonetheless, the present investigation shows that it is possible to obtain an unbiased result by our method. Moreover, the validity of the method described in the present study was shown in the GOG-0218 trial [48], in which patients with epithelial ovarian cancer were treated with carboplatin-paclitaxel with or without bevacizumab (manuscript in preparation). Higher microvessel density values in the CD31-stained samples that were measured by our method showed prognostic and potential predictive value for progression-free survival [48]. In conclusion, the present microvessel counting method is reliable if observers are extensively trained. Although the number of ROIs needed depends on the cancer type, on average ten ROIs are sufficient for accurate microvessel density measurements.
Table 3. Inter-observer variability for the old counting rules. This was calculated by the intraclass correlation coefficients (ICC) between the averaged counting of observer 1 (KM) and 2 (VC).
Analyzed the data: EF YW KM.

Supporting Information
Contributed reagents/materials/analysis tools: MMK.
Wrote the paper: KM YW PBV GRYDM.
Table 4. Inter-observer variability for the new counting rules. This was calculated by the intraclass correlation coefficients (ICC) between the counting of observer 1 (KM), 4 (PV), 5 (YW) and 6 (WW).
