Cell Specific CD44 Expression in Breast Cancer Requires the Interaction of AP-1 and NFκB with a Novel cis-Element

Breast cancers contain a heterogeneous population of cells, a small percentage of which possess properties similar to those of stem cells. One of the widely accepted markers of breast cancer stem cells (BCSCs) is the cell surface marker CD44. As a glycoprotein, CD44 is involved in many cellular processes including cell adhesion, migration and proliferation, making it pro-oncogenic by nature. CD44 expression is highly up-regulated in BCSCs and has been implicated in tumorigenesis and metastasis. However, the genetic mechanism that leads to a high level of CD44 expression in breast cancer cells and BCSCs is not well understood. Here, we identify a novel cis-element of the CD44 gene that directs gene expression in breast cancer cells in a cell type-specific manner. We have further identified key trans-acting factor binding sites and the nuclear factors AP-1 and NFκB that are involved in the regulation of cell-specific CD44 expression. These findings provide new insight into the complex regulatory mechanism of CD44 expression, which may help identify more effective therapeutic targets against breast cancer stem cells and metastatic tumors.


Introduction
Breast cancer remains the most common form of cancer among women and the second leading cause of cancer-related deaths [1]. Recently, a small subset of cancer cells was identified by their cell surface markers (e.g., up-regulation of CD44 and down-regulation of CD24) as cancer stem cells (CSCs) [2]. This CD44+/CD24−/low signature is also observed in other CSCs, including prostate, pancreatic, brain and leukemia stem cells [3][4][5]. In addition to stem cell characteristics (i.e., the ability to self-renew and to differentiate into all cell types of the mammary gland), CSCs are resistant to chemotherapy and radiation treatment [6] and have an increased ability to metastasize and develop new tumors throughout the body [7].
Overexpression of CD44 has been correlated with a number of transcription factors, including Egr1, AP-1, NFκB and C/EBPβ [8]. Most notably, AP-1 and NFκB have been shown to correlate directly with CD44 by binding the CD44 promoter [16]. AP-1, a leucine zipper transcription factor, consists of two families, JUN (c-JUN, JUNB and JUND) and Fos (c-Fos, FosB, Fra1 and Fra2). The JUN proteins can form homodimers with one another or heterodimers with the Fos proteins. Together, these proteins bind core sequences in the genome to regulate expression of target genes. Like CD44, AP-1 is involved in a number of cellular processes, including differentiation, proliferation and apoptosis [17,18]. AP-1 activity is induced by growth factors, cytokines and oncoproteins, which are implicated in cell proliferation and survival. Whether AP-1 activity in a cell is pro-apoptotic or pro-oncogenic is determined by the composition of the homodimer or heterodimer formed, as well as by the tumor type and differentiation state of the cell [18,19].
Like AP-1, NFκB has been linked to the up-regulation of CD44, but no direct evidence of this has been shown. Increased HGF has been shown to enhance expression of CD44v6 through a complex of NFκB, C/EBPβ and EGR1 [20]. NFκB proteins have also been shown to be up-regulated in breast cancer stem cells (BCSCs), and their expression has been correlated with increased expression of tumor stem cell markers, including CD44. Interestingly, reduction of NFκB in the murine cell line Met-1 reduced the number of CD44+/CD24−/low cells [21].
Despite intense research on CD44, the mechanism by which the protein is up-regulated in cancer and BCSCs is not well understood. Gene regulatory elements, e.g., promoters and enhancers, recruit transcription factors and chromatin modifying proteins, and allow transcription of the target genes to occur [22][23][24][25][26][27][28]. Enhancers are required for both temporal and tissue/cell specific gene expression [22][23][24][25][26][27][28]. Therefore, it is an important task to identify and understand their role in gene expression of both normal and pathological conditions.
In this study, we report the identification of a novel cis-element of CD44 containing 717 bp (in human) and 715 bp (in mouse) of evolutionarily conserved noncoding DNA, located approximately 95 kb upstream of the CD44 transcription start site. We show that this cis-element has the ability to direct reporter gene expression in breast cancer cells in a cell type specific manner. These data suggest that this cis-element and its interacting transcription factors play an important role in regulating CD44 expression in breast cancer and BCSCs.

Computational Prediction of CD44 cis-regulatory Elements
Multiple sequence alignment methods were used to identify evolutionarily conserved noncoding DNA sequences as putative gene regulatory elements. The sequences and annotations of the analyzed genes, along with their homologs from various genomes, were retrieved using the noncoding sequence retrieval system NCSRS [29]. These sequences were then aligned using multi-LAGAN [30] to identify elements with >70% identity over a 100 bp span, ensuring significant sequence conservation. The percent identity and length of each conserved region (CR) were used to calculate a score for that CR: score = percent identity + (length/60).
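The conservation criterion and scoring rule above can be illustrated with a short Python sketch. It assumes a pairwise alignment already given as two equal-length gapped strings; it does not reproduce multi-LAGAN or NCSRS, and the function names, the one-column window step and the region-merging rule are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConservedRegion:
    start: int        # 0-based alignment column where the region begins
    end: int          # exclusive end column
    identity: float   # percent identity over the merged region

    @property
    def length(self) -> int:
        return self.end - self.start

    @property
    def score(self) -> float:
        # Scoring rule stated in the text: percent identity + (length / 60)
        return self.identity + self.length / 60


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns where both sequences carry the same base; gaps never match."""
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def find_conserved_regions(seq_a: str, seq_b: str, window: int = 100,
                           min_identity: float = 70.0) -> list[ConservedRegion]:
    """Slide a fixed window along a pairwise alignment and merge windows above the cutoff."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be rows of the same alignment")
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    spans: list[list[int]] = []
    for i in range(len(seq_a) - window + 1):
        if percent_identity(seq_a[i:i + window], seq_b[i:i + window]) > min_identity:
            if spans and i <= spans[-1][1]:
                spans[-1][1] = i + window          # overlapping or abutting window: extend region
            else:
                spans.append([i, i + window])      # start a new region
    # Identity is recomputed over each merged span, so it can differ from any single window.
    return [ConservedRegion(s, e, percent_identity(seq_a[s:e], seq_b[s:e])) for s, e in spans]
```

With the defaults `window=100` and `min_identity=70.0`, the scan mirrors the ">70% identity over a 100 bp span" criterion; because merged spans are rescored as a whole, a long region built from overlapping qualifying windows can end up with an overall identity somewhat below the per-window cutoff.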

Cell Culture
The breast cancer cell lines SUM159, MDA-MB-231 and MCF7 were described previously [4]. SUM159 cells (Asterand Inc., Detroit, MI), MDA-MB-231 cells (ATCC) and MCF7 cells (gift from Dr. Nanjoo Suh at Rutgers) were cultured according to the suppliers' guidelines. All cell lines were maintained at 37 °C in a humidified incubator with 5% CO2.

Reporter Plasmids
Conserved regions were amplified by PCR from mouse genomic DNA (Table S1), subcloned into a GFP reporter plasmid with a basal beta-globin promoter (bGP-GFP) and verified by sequencing.

Transfection
For transfections, cells were seeded onto poly-L-lysine (PLL)-treated coverslips in 24-well plates. Cells were transfected with Lipofectamine LTX (Invitrogen) according to the manufacturer's recommendations. After a 24 hr incubation period, nuclei were stained with Hoechst 33342 (Sigma). Cells were then fixed with 4% paraformaldehyde in PBS for 12 minutes at room temperature. Coverslips were mounted on slides with Fluoro-Gel (Electron Microscopy Sciences). GFP-expressing cells were visualized with a Zeiss AxioImager A1 fluorescence microscope.

qRT-PCR
RNA was isolated from cells using Tri Reagent (Ambion). cDNA was prepared by reverse transcription using qScript cDNA SuperMix (Quanta) and used as the template for quantitative PCR with PerfeCTa SYBR Green FastMix (Quanta). Reactions were run on a Roche LightCycler using the primer sequences listed in Table S2. Threshold cycles were normalized to GAPDH expression. Error bars represent the standard deviation of the mean.
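The text states only that threshold cycles were normalized to GAPDH. One widely used way to do this is the comparative Ct (2^-ΔΔCt) method, sketched below in Python. The choice of this method, the assumption of roughly 100% amplification efficiency, and the function names are ours for illustration, not details given in the original.

```python
from statistics import mean


def delta_ct(target_ct: float, gapdh_ct: float) -> float:
    """Normalize a target threshold cycle to the GAPDH reference from the same sample."""
    return target_ct - gapdh_ct


def relative_expression(target_cts, gapdh_cts, calib_target_cts, calib_gapdh_cts) -> float:
    """Fold change of a target gene in a sample vs. a calibrator sample (2^-ddCt method).

    Assumes ~100% amplification efficiency for both target and GAPDH, so one
    cycle corresponds to a two-fold difference in starting template.
    Replicate Ct values are averaged before normalization.
    """
    d_sample = delta_ct(mean(target_cts), mean(gapdh_cts))
    d_calib = delta_ct(mean(calib_target_cts), mean(calib_gapdh_cts))
    return 2 ** -(d_sample - d_calib)
```

For example, a hypothetical comparison of CD44 in SUM159 against MCF7 as the calibrator would pass the SUM159 CD44 and GAPDH replicate Ct values as the first two arguments and the MCF7 values as the last two; a result above 1 means higher relative expression in SUM159.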

Data Quantification
In all experiments, percentages represent averages calculated from at least three independent samples. All values are shown as the mean ± standard error of the mean (SEM), and error bars represent the SEM. Where results were tested for statistical significance, Student's t-test was applied.
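The summary statistics described above can be sketched in Python. This assumes that the percentage of GFP-expressing cells in each sample is the number of GFP-positive cells divided by the number of Hoechst-stained nuclei (the original does not state the denominator) and uses the pooled-variance form of Student's t-test; the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def percent_positive(gfp_cells: int, total_nuclei: int) -> float:
    """Percentage of Hoechst-stained nuclei that belong to GFP-positive cells in one sample."""
    return 100.0 * gfp_cells / total_nuclei


def mean_sem(values) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def student_t(a, b) -> tuple[float, int]:
    """Two-sample Student's t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2
```

The `statistics` functions need at least two values per group, which the stated minimum of three independent samples satisfies. To obtain a p-value as well, `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` corresponds to this pooled-variance test, returns the same statistic.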

Immunocytochemistry
For immunocytochemistry, cells were plated on PLL-treated coverslips, incubated for 24 hours, fixed with 4% paraformaldehyde, blocked with 10% donkey serum (Jackson Immunology) and then incubated with the primary antibody for 2 hours at room temperature. The following antibodies were used: CD44 (Chemicon), CD24 (Santa Cruz), NFκB-c-Rel (Chemicon), NFκB-p50 (Upstate), NFκB-p65 (Abcam), JUNB (Santa Cruz) and FosB (Santa Cruz). Following incubation with the primary antibody, cells were incubated with a fluorescent secondary antibody (Jackson Immunology) for 30 minutes at room temperature. Nuclei were stained with Hoechst 33342.

Genomic DNA Sequencing
Genomic DNA was isolated from the human cell lines using the Promega Genomic DNA kit according to the manufacturer's recommendations. Genomic DNA from each cell line was sequenced using primers specific for the conserved regions (Table S1). The resulting sequences were aligned using the online program ClustalW [31].
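Once the cell-line sequences are aligned, locating positions that differ among them is a simple column scan. The Python sketch below assumes the ClustalW output has already been parsed into equal-length gapped strings keyed by cell line; the function names are hypothetical, and gap characters are treated as ordinary symbols so that indels also appear as variable columns.

```python
def variable_columns(aligned: dict[str, str]) -> list[int]:
    """Return 0-based alignment columns where the sequences do not all agree."""
    seqs = [s.upper() for s in aligned.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must come from the same alignment (equal length)")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def group_runs(columns: list[int]) -> list[tuple[int, int]]:
    """Collapse sorted column indices into inclusive (start, end) runs of adjacent columns."""
    runs: list[tuple[int, int]] = []
    for c in columns:
        if runs and c == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], c)
        else:
            runs.append((c, c))
    return runs


# Toy sequences for illustration only, not CR1 data.
aligned = {"line_A": "AAAAAAAAAA", "line_B": "AAACCCCCAA", "line_C": "AAAAAAAAAA"}
print(group_runs(variable_columns(aligned)))  # prints [(3, 7)]
```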

Electrophoretic Mobility Shift Assay and Supershift
Single-stranded DNA probes were designed from mouse CR1 and labeled with the 3' Biotin End Labeling Kit (Thermo Scientific) according to the manufacturer's suggestions. Nuclear extracts were collected from each breast cancer cell line using NE-PER nuclear and cytoplasmic extraction reagents (Thermo Scientific). Binding reactions were performed and detected using the LightShift Chemiluminescent EMSA kit (Thermo Scientific) according to the manufacturer's recommendations. DNA-protein complexes were resolved on 10% non-denaturing polyacrylamide gels and transferred onto Biodyne Plus membrane (Pall). Membranes were cross-linked in a UV imager for 15 minutes. EMSA probe sequences are listed in Table S3. Supershift assays were performed in a similar fashion, except that antibodies were added to selected reactions 15 minutes prior to the addition of labeled probes.

Site Directed Mutagenesis
Site-directed mutagenesis was performed as previously described [32] using the primer sequences listed in Table S4. Treated DNA was transformed into NEB 5-alpha cells (NEB) and plated onto LB-ampicillin plates. Constructs were purified by Qiagen midi-prep and sequenced to verify the resulting mutation. Mutated constructs were transfected into cells and tested for GFP expression.

shRNA-based Gene Knockdown
Short hairpin RNA (shRNA) sequences (leading strand) used for AP1-JUNB knockdown were CCTTCTACCACGACGACTCATACACAGCT and CACGACTACAAACTCCTGAAACCGAGCCT. shRNA sequences for NFκB-p50 knockdown were GCAGCTCTTCTCAAAGCAGCAGGAGCAGA and GAGAACTTTGAGCCTCTCTATGACCTGGA (OriGene Technologies, Inc., Rockville, MD). Control constructs were an empty vector and a scrambled shRNA construct. Constructs were transfected into cell lines using Lipofectamine LTX (Life Technologies). Transfected cells were cultured for 72 hours before being fixed and stained as described above.
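The four target sequences above, written as continuous strings, can be sanity-checked before ordering or re-synthesis. The Python sketch below is a hypothetical helper, not part of the described protocol: it confirms that each sequence contains only A/C/G/T, has the expected 29-nt length, and reports its GC content.

```python
# Target sequences from this section, written as continuous 29-mers.
SHRNA_TARGETS = {
    "shJUNB-1": "CCTTCTACCACGACGACTCATACACAGCT",
    "shJUNB-2": "CACGACTACAAACTCCTGAAACCGAGCCT",
    "shp50-1": "GCAGCTCTTCTCAAAGCAGCAGGAGCAGA",
    "shp50-2": "GAGAACTTTGAGCCTCTCTATGACCTGGA",
}


def gc_content(seq: str) -> float:
    """Percent of bases that are G or C."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def check_targets(targets: dict[str, str], expected_len: int = 29) -> dict[str, float]:
    """Raise if a target has non-ACGT characters or the wrong length; otherwise return GC% per target."""
    report = {}
    for name, seq in targets.items():
        if set(seq.upper()) - set("ACGT"):
            raise ValueError(f"{name}: non-ACGT characters present")
        if len(seq) != expected_len:
            raise ValueError(f"{name}: expected {expected_len} nt, got {len(seq)}")
        report[name] = gc_content(seq)
    return report


for name, gc in check_targets(SHRNA_TARGETS).items():
    print(f"{name}: {gc:.1f}% GC")
```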

Prediction of cis-regulatory Elements for CD44 Expression using Sequence Alignment Analysis
To understand the molecular mechanism of CD44 expression in breast cancer cells, highly conserved regions of noncoding DNA were computationally predicted as cis-regulators of CD44 expression. Multiple sequence alignment using the human CD44 genomic region as the baseline revealed homologous regions in mouse, dog (Fig. 1A) and other mammalian species. A total of 14 conserved regions (CRs; >100 consecutive base pairs with >70% sequence identity) were identified. The three most highly conserved regions (CR1-3, Fig. 1B) were chosen for further experimental verification, because many studies have shown that highly evolutionarily conserved noncoding DNA sequences have a high potential to regulate gene expression [35,36]. CD44CR1 (CR1) contains 715 bp with 78% conservation and is located 95 kbp upstream of CD44. CR2 contains 611 bp with 76% conservation and is located 55 kbp upstream of CD44. CR3 contains 604 bp with 79% conservation and is located in the first intron of the CD44 gene.
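Applying the score defined in the Methods to the lengths and conservation values reported for CR1-3 gives the ranking below. This Python sketch treats the reported percent conservation as the percent identity term of the score, which is an assumption; it also ranks only these three regions, since values for the other 11 CRs are not given here.

```python
# Length (bp) and percent conservation of CR1-3 as reported in the text.
CANDIDATES = {
    "CR1": (715, 78.0),
    "CR2": (611, 76.0),
    "CR3": (604, 79.0),
}


def cr_score(length_bp: int, percent_identity: float) -> float:
    """Score = percent identity + length / 60, as defined in the Methods."""
    return percent_identity + length_bp / 60


ranked = sorted(CANDIDATES, key=lambda name: cr_score(*CANDIDATES[name]), reverse=True)
for name in ranked:
    print(f"{name}: {cr_score(*CANDIDATES[name]):.2f}")
# CR1: 89.92
# CR3: 89.07
# CR2: 86.18
```

The length term contributes roughly 10 to 12 points for regions of this size, so in this range a 1% difference in identity can outweigh a length difference of up to about 60 bp.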

Conserved Regions have the Ability to Direct Reporter GFP Expression in Breast Cancer Cells
To test the CRs for their ability to direct gene expression, the CRs were PCR-amplified from mouse genomic DNA and subcloned into an expression vector containing a β-globin minimal promoter (bGP) and green fluorescent protein (GFP) as the reporter gene (Fig. 1C). Mouse DNA was used to validate that evolutionarily conserved elements can function in different species.
The ability of the conserved regions to direct gene expression was tested in three previously characterized human breast cancer cell lines, MDA-MB-231, SUM159 and MCF7, each with a different CD44/CD24 expression profile (Table 1) [4,37]. Both MDA-MB-231 and SUM159 cells express high levels of CD44. In addition, SUM159 cells have been characterized as having cancer stem cell-like features, including the ability to self-renew, reconstitute the parental cell line, survive chemotherapy and form tumors from as few as 100 cells [4,37]. Thus, the three cell lines provide complementary contexts in which to validate the CRs.
First, immunofluorescence staining was performed to verify CD44 and CD24 expression levels. Consistent with the genome-wide expression profiling study [4], MDA-MB-231 and SUM159 cells showed very high CD44 staining and low CD24 staining, while MCF7 cells showed low CD44 and high CD24 staining (Fig. S1A-C).
CD44 and CD24 expression levels in the three cell lines were then quantified using quantitative PCR (qPCR). MDA-MB-231 and SUM159 cells had high CD44 and low CD24 expression, while MCF7 cells had the opposite profile, i.e., higher CD24 and lower CD44 expression (Fig. S1D).
Next, each reporter construct containing one of the top three conserved regions of CD44 was tested individually by transfection into the three cell lines. Transfection of the positive control construct, CAG-GFP, resulted in reporter GFP expression (Fig. 2A-C), demonstrating that each cell line could be transfected. As negative controls, a construct containing a highly conserved region from the Neurod1 locus with bGP, and bGP alone (data not shown), produced no visible GFP expression (Fig. 2D-F), indicating that neither every highly conserved region of genomic DNA nor bGP alone can direct gene expression. GFP expression was observed in MDA-MB-231 and SUM159 cells after transfection with the CR1-GFP construct (Fig. 2G-H). More GFP-expressing cells were observed in SUM159 cells than in MDA-MB-231 cells, while no GFP-expressing cells were observed in MCF7 cells (Fig. 2I). Transfection of constructs containing CD44CR2 and CD44CR3 also resulted in GFP-expressing cells (data not shown; under further investigation).

Analysis of Trans-acting Factor Binding Sites on the Conserved Regions of CD44
The ability of CR1 to direct different levels of reporter GFP expression among the three cell lines is most likely attributable to its interactions with trans-acting factors. Therefore, CR1 from both mouse and human was examined for trans-acting factor binding sites (TFBSs) and for mutations in these sites. Genomic DNA of CR1 from each of the three cell lines was collected and sequenced to determine whether the region carries mutations that disrupt TFBSs. Sequencing revealed only a 5 bp span that differed among the three human cell lines in CR1 (Fig. S2). This 5 bp difference, found in SUM159 cells, lies in an unconserved part of CR1 and does not disrupt any key TFBS. This indicates that the difference in GFP expression among these cells may not be associated with the DNA sequence itself. We thus speculate that the difference in GFP expression results from differential trans-acting factor binding in the cell lines. CR1 sequences from mouse (Table S6) and human (Table S7) each contained over 150 putative TFBSs, as predicted by MatInspector [38]. These TFBSs were further examined for conservation between the mouse and human sequences (Table 2). Most of the conserved TFBSs are for factors involved in breast cancer (e.g., AP-1, NFκB and STAT5) or in stem cells and embryonic development (e.g., OCT1, PAX6 and GATA1), and these therefore had the highest potential to regulate CD44 and to be involved in breast cancer. Our further analysis accordingly focused on the activity of CR1 in regulating gene expression in breast cancer cells.
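MatInspector predicts TFBSs with position weight matrices, which are not reproduced here. As a much simpler stand-in, the Python sketch below scans both strands of a sequence for IUPAC consensus strings and reports which factors have at least one site in both the mouse and the human CR1 sequence. The consensus strings (TGASTCA for the AP-1 TRE and GGGRNWYYCC for a κB site) are commonly cited approximations chosen for illustration, and all names are hypothetical.

```python
import re

# IUPAC codes needed by the consensus strings below (others would raise KeyError).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "N": "[ACGT]"}

# Simplified consensus sites (illustrative assumptions, not MatInspector matrices).
CONSENSUS = {"AP-1 (TRE)": "TGASTCA", "NFkB": "GGGRNWYYCC"}


def to_regex(consensus: str) -> str:
    return "".join(IUPAC[c] for c in consensus)


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def sites(seq: str, consensus: str) -> list[int]:
    """0-based start positions (forward-strand coordinates) of matches on either strand."""
    seq = seq.upper()
    k = len(consensus)
    # Lookahead lets overlapping matches be reported.
    pattern = re.compile(f"(?=({to_regex(consensus)}))")
    forward = {m.start() for m in pattern.finditer(seq)}
    # A hit at rc[j:j+k] covers seq[len(seq)-j-k : len(seq)-j].
    reverse = {len(seq) - m.start() - k for m in pattern.finditer(revcomp(seq))}
    return sorted(forward | reverse)


def shared_factors(mouse_seq: str, human_seq: str) -> list[str]:
    """Factors with at least one predicted site in both orthologous sequences."""
    return sorted(name for name, cons in CONSENSUS.items()
                  if sites(mouse_seq, cons) and sites(human_seq, cons))
```

Because the TRE consensus is palindromic, a site matches on both strands at the same position; collecting positions in a set reports it once.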

Sequence Specific Trans-acting Factor Binding with CR1
Electrophoretic mobility shift assays (EMSAs) were performed to determine whether differences in GFP expression resulted from differences in trans-acting factor binding in the cells. Double-stranded, biotin-labeled oligonucleotides corresponding to subregions of CR1 were assayed for trans-acting factor binding using nuclear extract from each of the three cell lines (Fig. 3A). Shifted bands were observed for three large probes spanning the length of the conserved region in all three cell types (Fig. 3B-D), indicating protein-DNA binding activity. Probe 1 showed strong shifted bands with nuclear extracts from MDA-MB-231 and MCF7 cells only (Fig. 3B), while probe 2 showed a shifted band of equal strength with all three cell lines (Fig. 3C). Probe 3 showed a number of bands that could be competed away with an unlabeled probe (Fig. 3D). Although the probe 3 bands were similar in all three cell lines, one band seen with SUM159 cells was absent from the other two cell lines.
Smaller probes were then used to narrow down the binding regions and to identify specific TFBSs. A probe designed to mimic the first AP-1 site (AP-1-1) showed no band shift (Fig. 3E), while the probe for the second AP-1 site (AP-1-2) showed several band shifts (Fig. 3F). Although these bands were not completely competed away, band intensity was markedly reduced by the addition of the competitor probe. A probe for the NFκB binding region also revealed band shifts, whose intensity differed among cell lines, with SUM159 showing the strongest shift (Fig. 3G).

Mutation of AP-1 and NFκB Binding Sites Results in a Loss of CR1 Expression
EMSA identified regions of CR1 that were able to bind nuclear factors in each of the three cell lines. However, these in vitro assays cannot determine whether these factors direct gene expression. To determine whether the specific TFBSs are involved in regulating reporter GFP expression, site-directed mutagenesis (SDM) was performed: the core binding sites for the two AP-1 TFBSs and the NFκB binding site were deleted from the CR1 reporter construct. Mutant constructs were transfected into each of the cell lines, with wild-type CR1 and a random mutation as control transfections. The control transfections showed no significant difference in the percentage of GFP-expressing cells (Fig. 4A-B), whereas single-site mutations at each AP-1 site and at the NFκB binding site (Fig. 4C-E) resulted in a statistically significant decrease in the percentage of GFP-expressing cells in the SUM159 cell line compared with unmutated CR1 and the control mutation (Fig. 4A-B).
Since deletion of a single TFBS did not completely abolish GFP expression in SUM159 cells, we mutated combinations of TFBSs (Fig. 4F-H). Transfections with these combinatorial mutants again showed a statistically significant decrease in the percentage of GFP-expressing cells (Fig. 4F-H). However, the percentage of GFP-expressing cells with the double-mutation constructs did not differ significantly from that with the single-mutation constructs. To determine whether all three sites are needed for CR1 to direct GFP expression, all three binding sites were mutated together (Fig. 4I). Transfection of this construct resulted in the largest decrease in the percentage of GFP-expressing cells. Interestingly, transfection of the mutant constructs into MDA-MB-231 cells resulted in no GFP-expressing cells (Fig. S3), suggesting that regulation of CD44 in MDA-MB-231 cells differs from that in SUM159 cells.
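The mutant series described above (single-site deletions, combined deletions and the triple mutant, Fig. 4C-I) appears to correspond to every non-empty subset of the three sites. The Python sketch below enumerates those subsets and removes the chosen core sites from a sequence string. The site coordinates are hypothetical inputs, and in the study the deletions were made by site-directed mutagenesis of the reporter plasmid rather than in silico.

```python
from itertools import combinations

SITES = ("AP-1-1", "AP-1-2", "NFkB")


def mutant_panel(sites=SITES) -> list[tuple[str, ...]]:
    """All non-empty combinations of core-site deletions: singles, pairs, then the full set."""
    return [combo for r in range(1, len(sites) + 1) for combo in combinations(sites, r)]


def delete_sites(seq: str, coords: dict[str, tuple[int, int]], to_delete) -> str:
    """Remove the named core sites from `seq`.

    `coords` maps a site name to a half-open (start, end) interval on `seq`.
    Sites are removed right to left so earlier coordinates stay valid;
    intervals are assumed not to overlap.
    """
    for start, end in sorted((coords[s] for s in to_delete), reverse=True):
        seq = seq[:start] + seq[end:]
    return seq


for combo in mutant_panel():
    print(" + ".join(combo))
```

With three sites this gives 3 single, 3 double and 1 triple mutant, that is, seven constructs in addition to the wild-type and random-mutation controls.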

Trans-acting Factor Binding Assays Identify Components of AP-1 and NFκB Binding to CR1 in SUM159 Cells
To determine whether the difference in reporter GFP expression among the three breast cancer cell lines is due to trans-acting factors binding to CR1, chromatin immunoprecipitation (ChIP) assays were performed using antibodies against individual components of AP-1 and NFκB. ChIP results showed that in SUM159 cells JUNB bound strongly to CR1, while in MCF7 cells only JUND bound to CR1 (Fig. 5). When ChIP assays were performed with antibodies against NFκB components (i.e., c-Rel, p50 and p65), SUM159 cells showed weak binding with all three antibodies (Fig. 6A), whereas MCF7 cells showed no significant binding compared with background. These results are supported by an EMSA supershift assay, performed to verify specific protein binding using antibodies against the NFκB proteins c-Rel, p50 and p65 (Fig. 6B). The antibody against NFκB-p50 produced a significant shift of the labeled probe. NFκB-p65 produced a weaker shift similar to that of NFκB-p50, as well as a downshifted band. Together, these results support the notion that the different cell lines regulate CD44 by different mechanisms.
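The original does not describe how ChIP signal was quantified or compared with background. If the immunoprecipitated DNA were measured by qPCR, two common summaries are percent input and fold enrichment over a nonspecific IgG control. The Python sketch below assumes that readout, roughly 100% amplification efficiency, and hypothetical function names.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent of input chromatin recovered by an IP, from qPCR threshold cycles.

    `input_fraction` is the fraction of chromatin saved as input (e.g. 0.01 for 1%);
    the input Ct is adjusted to 100% by subtracting log2(1 / input_fraction).
    """
    adjusted_input_ct = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (adjusted_input_ct - ct_ip)


def fold_over_igg(ct_ip: float, ct_igg: float) -> float:
    """Enrichment of a specific-antibody IP over the nonspecific IgG (background) IP."""
    return 2 ** (ct_igg - ct_ip)
```

Under these assumptions, "no significant binding compared with background" would correspond to fold-over-IgG values near 1 across replicates, which could then be tested with the same t-test used elsewhere.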

JUNB and NFκB-p50 Knockdown Represses CD44 Expression
To determine the effects of AP-1-JUNB and NFκB-p50 on CD44 expression, we performed shRNA knockdown experiments in SUM159 cells. Control transfections with a scrambled control shRNA (Fig. 7A-E) or an empty vector (Fig. S3A-E) showed no change in JUNB or CD44 expression in transfected cells. Transfection of shJUNB constructs decreased JUNB expression, as shown by immunocytochemistry (Fig. 7F-J and Fig. S3F-J). Cells transfected with the shJUNB construct also showed decreased CD44 expression compared with untransfected cells (Fig. 7). Similar results were seen with knockdown of NFκB-p50, for which a scrambled shRNA was likewise used as the control transfection. These results support the notion that JUNB and NFκB-p50 interact with CR1 and regulate CD44 expression.

Discussion
In breast cancer, up-regulation of CD44, a cell surface glycoprotein involved in cell-cell and cell-extracellular matrix adhesion, migration, differentiation and survival, is associated with cancer stem cells [39,40]. However, the mechanism of this up-regulation is not well understood. In this study, we identified the novel cis-element CR1, which can direct reporter gene expression in a cell-specific manner (Fig. 2), and the trans-acting factors AP-1 and NFκB as key factors in the regulation of CR1 (Fig. 3).
Genomic sequencing of CR1 from breast cancer cell lines did not reveal any major mutations that cause changes in key TFBSs (Fig. S2), which suggests that variations in reporter gene expression among these cells may be attributed to the difference in trans-acting factor binding to CR1.
Consistent with the notion that trans-acting factor binding to CR1 differs among the cell lines, mutations of the TFBSs for AP-1 and NFκB resulted in a significant reduction in GFP expression in two breast cancer cell lines (Fig. 4). Deletion of each site individually completely eliminated reporter gene expression in MDA-MB-231 cells (Fig. S4). However, deletion of the three TFBSs, individually or in combination, did not completely eliminate reporter gene expression in SUM159 cells (Fig. 4). These results indicate that AP-1 and NFκB are important trans-regulators of gene expression in breast cancer, and that they function in a cell type-specific manner via different binding patterns to CR1 in different breast cancer cell lines. The inability to completely eliminate CR1-driven expression implies that other TFs and/or co-factors may be involved in regulating CD44 expression in breast cancer stem-like SUM159 cells.
Our ChIP results showed binding of AP-1 to CR1 in both SUM159 and MCF7 cells; however, the two cell lines showed different patterns of TF binding to CR1, i.e., JUNB in SUM159 and JUND in MCF7 (Fig. 5). ChIP results also showed that the NFκB factors c-Rel, p50 and p65 bind to CR1 in SUM159 cells but not in MCF7 cells. This result was confirmed by an EMSA supershift with SUM159 nuclear extract, which showed shifts with both NFκB-p50 and p65 (Fig. 6).
The observation that knockdown of AP-1-JUNB and NFκB-p50 reduced CD44 expression suggests a role for JUNB and p50 in regulating CD44 expression via their interaction with CR1. The lack of a complete loss of CD44 expression may be attributed to 1) reduced, rather than completely abolished, JUNB and p50 expression; 2) other factors that interact with JUNB and/or p50 in regulating CD44 expression; and 3) other regulatory regions that allow basal expression of CD44.
Studies have shown that deletion of CD44 can reduce cancer recurrence [5] and metastasis [41]. By targeting the factors that drive CD44 overexpression, we may be able to treat breast cancer and metastatic tumors more effectively.
Previous studies have shown that AP-1 regulates CD44 expression [18][42][43][44]. AP-1 has increased activity in small cell and non-small cell lung carcinomas, which leads to increased CD44 expression. In addition, a TPA-responsive element (TRE) bound by Fra-1 has been identified in the CD44 promoter [45,46]. These studies established that AP-1 regulates CD44 expression via its interaction with the CD44 promoter. Our findings suggest that the cis-element CR1 functions via the common factors AP-1 and/or NFκB and interacts with the promoter to regulate CD44 expression, which provides new insight into the complex regulation of CD44 expression.
Together, our findings suggest that CR1 has the potential to regulate CD44 expression in breast cancer and BCSCs via its interaction with AP-1 and NFκB factors. Further studies will focus on how CR1 interacts with the promoter to regulate CD44 expression. CD44 is known to have a complex expression pattern, with ubiquitous expression and variant forms, and has been implicated in the aggressiveness and metastasis of a number of cancer types [9,11,37,47]. The regulation of such a molecule could therefore be equally complex, and a full understanding of CD44 regulation will require investigation of its other cis- and trans-regulators.