Cancer stem cell subpopulations in primary colon adenocarcinoma

Aims The cancer stem cell concept proposes that tumor growth and recurrence are driven by a small population of cancer stem cells (CSCs). In this study we investigated the expression of induced-pluripotent stem cell (iPSC) markers and their localization in primary low-grade adenocarcinoma (LGCA) and high-grade adenocarcinoma (HGCA) and their patient-matched normal colon samples. Materials and methods Transcription and translation of iPSC markers OCT4, SOX2, NANOG, KLF4 and c-MYC were investigated using immunohistochemical (IHC) staining, RT-qPCR and in-situ hybridization (ISH). Results All five iPSC markers were detected at the transcriptional and translational levels. Protein abundance was found to be correlated with tumor grade. Based on their protein expression within the tumors, two sub-populations of cells were identified: a NANOG+/OCT4- epithelial subpopulation and an OCT4+/NANOG- stromal subpopulation. All cases were accurately graded based on four pieces of iPSC marker-related data. Conclusions This study suggests the presence of two putative sub-populations of CSCs: a NANOG+/OCT4- epithelial subpopulation and an OCT4+/NANOG- stromal subpopulation. Normal colon, LGCA and HGCA could be accurately distinguished from one another using iPSC marker expression. Once validated, novel combinations of iPSC markers may provide diagnostic and prognostic value to help guide patient management.


Tissue samples
Snap-frozen and formalin-fixed paraffin-embedded (FFPE) tissue samples of LGCA from ten patients and HGCA from eight patients, with patient-matched normal colon (NC) tissue samples from 17 of the 18 patients, were provided by the Gillies McIndoe Research Institute Tissue Bank for this study, which was approved by the Central Health and Disability Ethics Committee (Ref. 15/CEN/106).

DAB IHC staining
DAB IHC staining was performed on the entire cohort. 4 μm-thick FFPE sections of NC, LGCA and HGCA tissue samples were stained for iPSC markers OCT4, SOX2, NANOG, KLF4 and c-MYC. Positive human control tissues included in each run to validate the staining were seminoma for OCT4 and NANOG, normal skin for SOX2, normal breast tissue for KLF4 and prostatic tissue for c-MYC. Each IHC staining procedure also included a matched isotype antibody as a negative control. Protocols were performed as previously described [29].
Staining was carried out on the Leica BOND™ RX Auto-stainer using primary antibodies for OCT4 (1:30

IF IHC staining
Protein localization was performed on three LGCA and three HGCA and their patient-matched normal colon samples by dual IF IHC staining, carried out on the Leica BOND™ RX Auto-stainer. Secondary antibodies used were Vectafluor Excel goat anti-mouse 488 (ready-to-use; cat#DK2488, Vector Laboratories, Burlingame, CA, USA) and Alexa Fluor donkey anti-rabbit 594 (1:500; cat#ab150076, Life Technologies, Carlsbad, CA, USA). All stained slides were mounted as previously described [29]. Negative controls were performed using matched isotype controls for both mouse (ready-to-use; cat#IR750, Dako, Copenhagen, Denmark) and rabbit (ready-to-use; cat#IR600, Dako).

ISH
ISH staining was performed on 4 μm-thick FFPE sections of six LGCA and six HGCA tissue samples and their patient-matched NC tissue samples. This was carried out on the Leica BOND™ RX Auto-stainer using probes for OCT4 (NM_002701), SOX2 (NM_003106), NANOG (NM_024865), KLF4 (NM_004235) and c-MYC (NM_002467), using the ViewRNA eZ Detection Kit to detect the presence of mRNA (Affymetrix, Santa Clara, CA, USA). Positive controls were human seminoma for OCT4, NANOG and KLF4, normal skin for SOX2, and normal colon tissue for c-MYC. To determine specificity of the probes, negative controls were run using a probe for Bacillus (NM_L38424).
were used for manual cell counting using ImageJ software (National Institutes of Health, Bethesda, MD, USA). Six images were captured per stained slide. When capturing images, areas with muscle and blood vessels were avoided. All positively and negatively stained cells within these images were counted and identified as either epithelial (crypt) or stromal cells. Cells were counted as positive if they had any level of staining (weak, moderate or strong) in the nucleus and/or cytoplasm. IF IHC-stained slides were visualized and imaged using an Olympus FV1200 biological confocal laser-scanning microscope (Olympus) and processed using CellSens 2.0 software (Olympus).
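The counting scheme described above can be illustrated with a minimal Python sketch. It assumes that positive and negative counts are pooled across the images from one slide before a percentage is calculated for each compartment; the text does not state whether counts were pooled or averaged per image, and the function name and the counts shown are hypothetical.

```python
def percent_positive(image_counts):
    """Pool counts across images and return % positive cells per compartment.

    image_counts: list of dicts mapping compartment -> (positive, negative).
    Pooling across images is an assumption, not a stated part of the protocol.
    """
    totals = {}
    for counts in image_counts:
        for compartment, (positive, negative) in counts.items():
            pos, neg = totals.get(compartment, (0, 0))
            totals[compartment] = (pos + positive, neg + negative)
    # Skip compartments with no counted cells to avoid division by zero.
    return {
        compartment: 100.0 * pos / (pos + neg)
        for compartment, (pos, neg) in totals.items()
        if pos + neg > 0
    }


# Hypothetical counts for two of the six images from one stained slide.
images = [
    {"epithelial": (5, 95), "stromal": (20, 60)},
    {"epithelial": (15, 85), "stromal": (30, 90)},
]
result = percent_positive(images)
print(result)
```

Any cell with weak, moderate or strong staining would be entered as positive, in line with the criterion given above.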

RNA extraction
RNA was extracted from the same cohorts of six LGCA and six HGCA tissue samples and their patient-matched NC tissues, using a QIAcube (Qiagen) as previously described [30].
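The RT-qPCR results are reported later as ΔCT and fold-change values. As an illustration only, the sketch below assumes the widely used 2^-ΔΔCT calculation with a single reference gene and 100% amplification efficiency; neither the calculation method nor the reference gene is named in this section, and all CT values shown are hypothetical.

```python
def fold_change(ct_target_tumor, ct_ref_tumor, ct_target_normal, ct_ref_normal):
    """Relative expression of a target gene in tumor versus matched normal tissue.

    Assumes the 2^-ddCT calculation; the reference gene is hypothetical.
    """
    # Normalize the target gene to the reference gene within each sample.
    delta_ct_tumor = ct_target_tumor - ct_ref_tumor
    delta_ct_normal = ct_target_normal - ct_ref_normal
    # Compare the tumor sample with its patient-matched normal sample.
    delta_delta_ct = delta_ct_tumor - delta_ct_normal
    return 2.0 ** (-delta_delta_ct)


# Hypothetical CT values: the target amplifies two cycles earlier in the tumor.
example = fold_change(24.0, 18.0, 26.0, 18.0)
print(example)
```

A lower CT value corresponds to more starting template, so a negative ΔΔCT gives a fold change above 1.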

Statistical analysis
Statistical analysis was carried out using SPSS V22. Protein expression of iPSC markers in the stroma and the crypt was compared using a t-test for both normal and tumor samples. Statistical significance was defined as p < 0.05.
The differences between LGCA and HGCA tissue samples were assessed using analysis of variance (ANOVA). A discriminant function analysis was also performed using the four sets of data that had the highest correlation with either LGCA or HGCA tumors. This produced a canonical correlation value and a Wilks' Lambda value, representing the level of confidence with which these four pieces of data, taken from any given specimen, can be used to predict the grade of the tumor. mRNA levels were compared between normal stroma and tumor stroma, and between normal epithelium and tumor epithelium, using a t-test to determine statistical significance.
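The t-test comparisons described above can be sketched as follows. This is a minimal illustration that assumes an unpaired Student's t-test with pooled variance; the text does not state whether the tests were paired or whether equal variances were assumed, and the data values are hypothetical. SPSS additionally converts the statistic to a p-value, which is omitted here.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for an unpaired Student's t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, n_a + n_b - 2


# Hypothetical percentages of marker-positive cells, one value per case.
stroma = [24.0, 26.0, 22.0, 28.0]
crypt = [1.0, 0.5, 1.5, 1.0]
t_value, dof = pooled_t_statistic(stroma, crypt)
print(round(t_value, 2), dof)
```

The resulting statistic would be compared against the t distribution with the returned degrees of freedom to decide whether p < 0.05.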

DAB IHC staining
EPCAM was used to distinguish between epithelial cells and stromal cells (S1 Fig). It was found that EPCAM expression was restricted to epithelial cells in all NC, LGCA and HGCA tissues.
OCT4 (Fig 1A-1C, brown) was detected in the nucleus of 2.5% of NC epithelial cells, likely to be the normal intestinal stem cells (A). However, it was found in the cytoplasm of 25% of stromal cells in LGCA (B) and 30% of stromal cells in HGCA (C) with little or no presence in the epithelium (0.7%). SOX2 (Fig 1D-1F, brown) was expressed in the cytoplasm of epithelial and stromal cells in NC, LGCA and HGCA tissue samples. Overall, NC (D) samples stained more strongly than LGCA (E) and HGCA (F) tissue samples. SOX2 was abundant in the nuclei of NC epithelium. NANOG (Fig 1G-1I, brown) was not detected in NC (G) but was present in HGCA (75% of cases, weak-to-moderate; H) and LGCA (40% of cases, weak; I) tissue samples. c-MYC (Fig 1J-1L
Two CSC subpopulations were identified by DAB IHC staining (Fig 2): one within the CA epithelium, with 9.7% of LGCA and 52.4% of HGCA epithelial cells expressing NANOG (Fig 2A); and the other within the CA stroma, with OCT4 being expressed by 24.3% of LGCA and 30.8% of HGCA stromal cells (Fig 2B). Discriminant function analysis revealed that all LGCA and HGCA tissue samples could be graded with 100% accuracy based on stromal expression of KLF4 in NC (p = 0.020) and the tumors (p = 0.034), and OCT4 (p = 0.001) and NANOG (p = 0.026) in CA epithelium (canonical correlation = 0.981; Wilks' Lambda = 0.037).

IF IHC staining
IF IHC staining expanded on DAB IHC data by revealing localization of two iPSC markers simultaneously, as well as being a more sensitive detection method.
OCT4 (Fig 3A-3I (Fig 3G-3I, red) was widely expressed in the epithelial cells and some stromal cells in NC (Fig 3A), LGCA (Fig 3H) and HGCA (Fig 3I) tissue samples. SOX2 and OCT4 were co-expressed in epithelial cells in NC (Fig 3G) and stromal cells in LGCA (Fig 3H) and HGCA (Fig 3I). Cytoplasmic and nuclear staining of c-MYC (Fig 3J-3L, green) was weak in NC (Fig 3J), LGCA (Fig 3K) and HGCA (Fig 3L) epithelial and stromal cells. Stromal cells co-expressing OCT4 and SOX2 and those that stained positively for c-MYC had the same morphology and were assumed to be the same cell type. c-MYC and NANOG were co-expressed in HGCA epithelial cells (Fig 3L). From the above data we inferred that there was an epithelial subpopulation co-expressing NANOG, SOX2 and KLF4, and a stromal subpopulation co-expressing OCT4, SOX2 and c-MYC. Split images for

RT-qPCR
RT-qPCR demonstrated mRNA expression of all five iPSC markers in NC, LGCA and HGCA tissue samples (Fig 4). SOX2 mRNA was below the detection threshold in three NC tissue samples, and OCT4 was not detected in one NC tissue sample. NANOG, KLF4 and c-MYC mRNA was detected in all 12 NC tissue samples. All LGCA and HGCA tissue samples expressed mRNA for OCT4, NANOG, KLF4 and c-MYC. One LGCA and one HGCA tissue sample did not reach the detection threshold for SOX2. ΔCT and fold-change data are displayed in S2 Table.

ISH
ISH demonstrated the presence of mRNA for OCT4 (Fig 5A-5C, brown), SOX2 (Fig 5D-5F, brown), NANOG (Fig 5G-5I, brown), KLF4 (Fig 5J-5L, brown) and c-MYC (Fig 5M-5O, brown) in NC (Fig 5A, 5D, 5G, 5J and 5M), LGCA (Fig 5B, 5E, 5H, 5K and 5N) and HGCA (Fig 5C, 5F, 5I, 5L and 5O) tissue samples. Positive and negative controls are shown in S4 Fig, and cell counting data are displayed in S3 Table. ISH cell counting demonstrated that OCT4, SOX2, NANOG and c-MYC had higher mRNA levels in CA epithelium and stroma when compared to NC (Fig 6). Conversely, KLF4 was more

Discussion
This study investigated the transcriptional and translational expression of OCT4, SOX2, NANOG, KLF4 and c-MYC to identify their presence in CSC subpopulations within CA.
RT-qPCR and ISH data for SOX2 corroborated each other, with RT-qPCR failing to detect SOX2 in three NC samples and two CA samples, and ISH showing SOX2 to be the least abundant in terms of the number of cells containing mRNA. However, SOX2 was one of the most abundant markers at the protein level. Other studies have also shown an abundance of SOX2 protein in both the nuclei and cytoplasm of CRC tumor cells [11,19]. Furthermore, ISH showed an abundance of c-MYC mRNA but DAB IHC staining showed weak protein staining, which may be due to the concentration of primary antibody used for DAB IHC staining.
KLF4 has been previously studied in CRC and shown to be associated with epithelial-to-mesenchymal transition (EMT), cell migration and metastasis [16]. However, studies on the role of KLF4 in cancer often yield conflicting results [31]. In NC, KLF4 helps direct epithelial progenitor cells down the goblet cell lineage, the most abundant epithelial cell type in colonic crypts [26]. As the grade of CA increases, the tumors are less differentiated, and this may explain the observation of decreased KLF4 in both LGCA and HGCA tumors, relative to NC. Our finding of higher KLF4 and OCT4 protein expression in the stroma of HGCA may reflect the migration of cancer cells by EMT, away from the epithelium, which has been postulated as a major factor in CRC progression [32]. Furthermore, when applying the concept of a stem cell hierarchy in cancer [6-8], we propose that cells at different levels of this hierarchy will express different combinations of these markers. For instance, OCT4 is known to be expressed by primitive stem cells such as ESCs [16,33], whereas KLF4 is associated with a more differentiated phenotype [26,27].
In this study, IF IHC staining identified two distinct CSC subpopulations: a NANOG+/OCT4- subpopulation localized to the epithelium, and an OCT4+/NANOG- subpopulation within the stroma. The stromal OCT4+ subpopulation did not co-express NANOG or KLF4, but some stromal cells co-expressed OCT4 and SOX2. Similarly, in the epithelium, SOX2 and KLF4 staining was widespread but comparatively few of these cells were NANOG+. Based on the staining patterns of these markers, we infer the presence of two predominant CSC subpopulations in CA: an OCT4+/SOX2+/c-MYC+ subpopulation within the stroma and a NANOG+/SOX2+/KLF4+ subpopulation within the epithelium.
The literature correlating OCT4 and SOX2 expression with EMT and metastasis provides evidence supporting a stromal subpopulation expressing these markers that migrates away from the tumor [20,31,34]. Furthermore, NANOG is associated with maintenance of the stem-like phenotype of CSCs within the tumor, consistent with our observation of NANOG expression by some epithelial cells within the tumor [35]. KLF4 and c-MYC are associated with proliferation and differentiation and it is therefore not unexpected that these two markers were co-expressed by cells within CA [24,27,36].
Some stromal cells within CA that stained positively for OCT4 and c-MYC did not express KLF4 and SOX2. Some of these OCT4+ stromal cells may be cancer-associated fibroblasts recruited by the tumor and induced to express OCT4 [37]. Alternatively, they may represent a CSC subpopulation that expresses stem cell markers other than the iPSC-related genes investigated in this study.
The patient-matched 'normal colon' samples used as a control may not represent true normal colon. Other limitations of this study include the lack of functional in vitro and in vivo investigations which will be the focus of future work.
The significance of OCT4, NANOG and KLF4 expression was seen in the discriminant function analysis. By considering the expression of stromal KLF4 and epithelial OCT4 and NANOG, all 18 CA cases could be accurately graded. This separation appears robust, with a canonical correlation of 0.981 indicating a strong association between the discriminant function and tumor grade, and a Wilks' Lambda value of 0.037 showing that 96.3% of the variance between cases can be explained by these data.
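The 96.3% figure follows directly from the Wilks' Lambda value, and it can be cross-checked against the canonical correlation. The short sketch below assumes a two-group analysis (LGCA versus HGCA) with a single discriminant function, in which case Wilks' Lambda equals one minus the squared canonical correlation; the variable names are illustrative.

```python
# Values reported for the discriminant function analysis.
canonical_correlation = 0.981
wilks_lambda = 0.037

# Wilks' Lambda is the proportion of variance NOT explained by group membership.
explained_by_lambda = 1.0 - wilks_lambda

# With one discriminant function, the squared canonical correlation gives
# the explained proportion directly (assumption: two groups, one function).
explained_by_correlation = canonical_correlation ** 2

print(round(explained_by_lambda, 3), round(explained_by_correlation, 3))
```

The two estimates agree to within rounding of the reported values, which is what would be expected if both statistics come from the same single-function analysis.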
Once validated, the localization and expression levels of novel combinations of iPSC markers may provide a valuable tool to help guide patient management by further stratifying tumor grade, identifying cases with higher potential for metastasis or relapse, or tracking response to therapy.
LGCA, low-grade colon adenocarcinoma tissue samples; HGCA, high-grade colon adenocarcinoma tissue samples. NCLG, normal colon tissue from patients with LGCA; NCHG, normal colon tissue from patients with HGCA. Significance values for comparisons between LGCA and HGCA tissue samples and their patient-matched normal colon tissues, for cells in the epithelium and those in the stroma: a p-value between 0.05 and 0.01 is shown by *, and <0.01 is represented by **. (PDF)
S2 Table. LGCA, low-grade colon adenocarcinoma tissue samples; HGCA, high-grade colon adenocarcinoma tissue samples. NCLG, normal colon tissue from patients with LGCA; NCHG, normal colon tissue from patients with HGCA. Significance values for comparisons between LGCA and HGCA tissue samples and their patient-matched normal colon tissues, for cells in the epithelium and those in the stroma: a p-value between 0.05 and 0.01 is shown by *, and <0.01 is represented by **. (PDF)
S1 Datasets. Raw data. Raw data collected and analyzed during this study are provided, including anonymous patient data, DAB IHC cell counting data, discriminant function analysis output, in-situ hybridization cell counting data, raw RT-qPCR CT values, and cell counting statistics. (ZIP)