Genomic Insights into Triple-Negative and HER2-Positive Breast Cancers Using Isogenic Model Systems

Introduction: In general, genomic signatures of breast cancer subtypes have little or no overlap owing to the heterogeneous genetic backgrounds of study samples. Obtaining a reliable signature in an isogenic context has therefore been challenging, and the precise contribution of isogenic triple-negative breast cancer (TNBC) versus non-TNBC remains poorly defined.
Methods: We established isogenic stable cell lines representing TNBC and human epidermal growth factor receptor 2 positive (HER2+) breast cancers by introducing HER2 into the TNBC cell lines MDA-MB-231 and MDA-MB-468. We examined protein-level expression and functionality of the transfected receptor by treatment with an antagonist of HER2. Using microarray profiling, we obtained a comprehensive list of genes differentially expressed between TNBC and HER2+ clones. We identified and validated underlying isogenic components using qPCR and also compared the results with expression data from patients with similar breast cancer subtypes.
Results: We identified 544 and 1087 statistically significant differentially expressed genes between isogenic TNBC and HER2+ samples in the MDA-MB-231 and MDA-MB-468 backgrounds, respectively, and a shared signature of 49 genes. By comparing results from the MDA-MB-231 and MDA-MB-468 backgrounds with two patient microarray datasets, we identified 17 and 22 common genes with the same expression trend, respectively. Additionally, we identified 56 and 78 genes from the MDA-MB-231 and MDA-MB-468 comparisons, respectively, that were present in our published RNA-seq data.
Conclusions: Using our unique model system, we have identified an isogenic gene expression signature between TNBC and HER2+ breast cancer. A portion of our results was also verified in patient samples, indicating the existence of an isogenic element associated with HER2 status among genetically heterogeneous breast cancer samples. These findings may contribute to the development of a molecular platform that would be valuable for diagnostic and therapeutic decisions for TNBC and for distinguishing it from the HER2+ subtype.


Introduction
Breast cancer is the most commonly diagnosed cancer among women worldwide [1]. In the United States, one out of every three cancers diagnosed in women is breast cancer, and it is the second leading cause of cancer deaths [2]. Although breast cancer is reported to have a higher prevalence among women in the developed world, this statistic is rapidly changing. The incidence of the disease is on the rise even in developing countries, where the cumulative risk for women below 75 years of age and the mortality rate are almost equivalent to those found in developed countries [1]. The phenotypic and clinical manifestations of the disease vary widely among women, and the various cancer subtypes show a wide range of responses to different treatment modalities. The stage, grade and status of three therapeutically relevant receptors, estrogen receptor alpha (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2), are the main determinants of tumor response to most current treatments and are therefore the major factors in planning optimal therapy [3].
In the past decade, we have witnessed an active investigation of heterogeneity of breast cancer at the molecular level through various high throughput approaches. Derived from a large collection of tumors, these studies have classified breast cancer into five major subtypes based on expression pattern of an 'intrinsic gene set' signature [4,5]. These subtypes include luminal A and B, basal-like, HER2 overexpressing and normal breast like and are named according to the markers expressed by the corresponding cell types. These molecular classes not only differ in the expression levels of ER, PR and HER2 but also in disease prognosis [6]. The luminal subtypes show higher expression of ER and have a favorable prognosis and basal-like tumors have absence or low levels of the three receptors and in general, exhibit poor prognosis. These studies point to the likelihood that different breast cancer subclasses might stem from different cellular types based on origin. Over the years, a number of studies have validated and refined such signatures [7,8,9,10,11,12,13,14]. It is generally believed that different breast tumor subtypes represent distinct disease entities and may require personalized treatment modalities for an effective outcome. Despite multitude of studies on breast cancer expression signatures and their proclaimed robustness, the biological relevance of these signatures remains to be firmly established and this is an area for further improvement.
The status of ER, PR and HER2 is routinely assessed prior to deciding treatment options in general for breast cancer. Two of the most common treatment regimens include anti-estrogens like Tamoxifen or Fulvestrant or aromatase inhibitors for ER+ tumors and monoclonal antibody Herceptin (Trastuzumab) for HER2+ tumors. However, for TNBC that lacks both ER and HER2, there is no targeted therapy so far and the only option is non-specific and highly toxic chemotherapy or radiation therapy [15]. Unlike other subtypes of breast cancer, TNBC commonly affects younger (< 50 years) pre-menopausal women. It is a very aggressive form of breast cancer with majority of the deaths occurring within the first five years of diagnosis [16]. The relapse rates and the prognoses for these patients are very poor even after treatment [17]. Therefore there is a pressing need and growing research interest to understand how TNBC, which comprises approximately 15% of all breast cancers, differs from other subtypes [18].
Most comparative studies of breast cancer subtypes consider clinical or biological variables for classifying samples. However, these studies fail to account for the heterogeneity of samples within subtypes as well as the clonal origin of most tumors. For example, although cultured TNBC cell lines routinely used in the laboratory are similar in receptor status, they are distinct in terms of their genotypes. Human breast tumors similarly show significant genetic heterogeneity. Thus, studies involving samples from TNBC and non-TNBC cancer subtypes with diverse genetic backgrounds [19] cannot be directly compared, especially when most breast cancers are clonal in the initial stage of tumor formation. To mitigate this issue and to eliminate variability due to different genetic backgrounds, we established an isogenic cell line model system representing two common breast cancer subtypes. We stably transfected empty vector or HER2 into two TNBC cell lines, MDA-MB-231 and MDA-MB-468, and created isogenic TNBC and non-TNBC lines differing in HER2 status. After initial characterization of these isogenic cell lines, we performed microarray-based gene expression profiling to deduce the signatures of TNBC compared to non-TNBC isogenic cells, and searched for the resulting signatures in compatible publicly available data sets.

Generation of Stable Clones
Triple-negative breast cancer MDA-MB-231 and MDA-MB-468 cells (ATCC) were chosen for stable clone generation. The origin of the cell lines has been described previously [19]. Transient transfections with 2.5-10 µg of plasmid per reaction with Fugene transfection reagent (Roche Ltd.) were used to optimize transfections. Using optimal conditions, the two cell lines were transfected with each of the two plasmids: pcDNA 3.1a and HER2. Cells were cultured in Dulbecco's Modified Eagle Medium/Ham's F12 50:50 (DMEM/F-12) mix (Mediatech) supplemented with 10% FBS (Atlanta Biologics) and 1% antibiotics (Gibco) and kept in an incubator maintained at 37°C and 5% CO2. The transfected cells were then treated with 0.5µg/ml of G418 over several weeks to select for cells containing the plasmids. Multiple plates were then pooled and selected for 2-3 more weeks to generate multiple stable clones. Proteins from these plates were harvested using RIPA buffer, and 50µg of protein was loaded on an 8% SDS-PAGE gel and transferred onto a nitrocellulose membrane (Biorad). TNBC and HER2+ clones generated from one cell line were included in one gel along with negative and positive controls. Protein from the parental (untransfected) cell line and from SKBR3 (HER2-positive) cells was used as negative and positive control, respectively. The membrane was then blotted with HER2 antibody (Bethyl) and reprobed with vinculin antibody (Sigma) as a protein loading control. Protein on the membrane was then detected using ECL reagent (GE Healthcare) and exposed onto autoradiography film (Hyblot CL). Clones showing the highest expression of HER2 protein compared to the negative control were selected for further experiments. Our experimental studies involved established in vitro immortalized human breast cancer cell lines and secondary data from an in-house RNA-sequencing study and a public microarray repository of human patient samples. Therefore, ethical approval was not needed.

Flow Cytometry
Cells were plated in duplicates in 60 cm dishes with complete media and allowed to grow at 37°C and 5% CO2 until the cells reached desired confluency. After one wash with PBS, the cells were treated with 0.5M EDTA to detach cells, followed by centrifugation at 500 x g for 5 minutes. They were then washed thrice with PBS buffer containing 0.5% BSA and resuspended in the same buffer to get approximately 4X10^6 cells/ml. From this, about 10^5 cells in a reaction volume of 25µl were taken and added to a tube containing 10µl of Phycoerythrin (PE) conjugated anti-human HER2 antibody (R&D Systems). For isotype control, 10 µl of PE-conjugated mouse IgG2B reagent (R&D Systems) was added to 10^6 cells. The mixture was incubated for about 45 minutes at 4°C. The cells were washed twice with PBS buffer containing 0.5% BSA and resuspended in 200-400µl of PBS for flow cytometry analysis. For experiments involving Herceptin treatment, two HER2 and a TNBC clone were plated in 60 cm dishes. Starting the next day the cells were serum starved for 24 hours. After starvation, cells were treated with 10nM Herceptin and incubated at 37°C and 5% CO2 for 16 hours. Cells were then collected and prepared for flow cytometric analysis as mentioned above.
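The aliquot size quoted above follows from simple arithmetic: 25µl taken from a suspension at roughly 4 × 10^6 cells/ml contains about 10^5 cells. A minimal sketch of that check is given below; the function name is hypothetical and is used only for illustration.

```python
def cells_in_aliquot(conc_cells_per_ml: float, volume_ul: float) -> float:
    """Number of cells contained in an aliquot drawn from a suspension."""
    # 1 ml = 1000 µl, so convert the aliquot volume before scaling.
    return conc_cells_per_ml * volume_ul / 1000.0


# Values quoted in the staining protocol: ~4 x 10^6 cells/ml, 25 µl per reaction.
n_cells = cells_in_aliquot(4e6, 25)
print(f"cells per staining reaction: {n_cells:.0f}")
```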

Confocal Microscopy
Cells were plated in 60 cm dishes with complete media and, starting the next day, serum starved for 24 hours. After starvation, cells were treated with 10nM Herceptin and incubated at 37°C and 5% CO2 for 16 more hours. After one wash with PBS, the cells were trypsinized and plated over glass cover slips placed on culture plates. The cells were then fixed in 4% paraformaldehyde for 20 minutes at room temperature and permeabilized for 5-15 min with 0.1% Triton X-100. Indirect immunofluorescence was used to examine the cells. The cells were blocked with 5% normal goat serum for half an hour and then incubated with HER2 antibody (1:50 dilution) for 2 hours at room temperature, washed three times with PBS, and incubated with Alexa Fluor 546-labeled secondary antibody (Molecular Probes). We used DAPI (Molecular Probes) to stain DNA. Confocal microscopy was performed using a Zeiss laser-scanning confocal microscope.

Gene Expression Profiling Using Microarray
Triplicates of one TNBC and one HER2+ isogenic clone in each cell line were plated and grown to 60-70% confluency in complete media containing G418. RNA was extracted using TRIZOL reagent (Invitrogen) according to the manufacturer's recommendations and quantified using a Nanodrop. RNA was purified using the RNeasy Mini Kit (Qiagen, Valencia, CA), and its integrity was tested using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). After RNA cleanup and labeling, the samples were hybridized onto an Affymetrix Human Exon 1.0 ST array chip and washed according to the manufacturer's protocol. The chips were then scanned to measure signal intensities. The resulting raw files were preprocessed using the Robust Multi-array Average (RMA) algorithm, filtered and quantile normalized using GeneSpring GX 10.0 (Agilent Technologies Inc.). An unpaired t-test was used to identify statistically significant differentially expressed genes between TNBC and HER2+ (p-value ≤0.05 and fold-change ≥1.5) in each cell line background. The Benjamini and Hochberg method was used for multiple testing correction.
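The selection step described above can be sketched in a few lines. The analysis in this study was run in GeneSpring, so the code below is only an illustration of the Benjamini and Hochberg adjustment followed by the two cutoffs, not a reproduction of that software. It assumes that the 0.05 cutoff is applied to the adjusted p-value and that fold change is expressed as a linear TNBC/HER2+ ratio; the function names and gene labels are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(results, alpha=0.05, min_fold=1.5):
    """Keep genes passing both cutoffs.

    results maps gene -> (raw p-value, linear fold change TNBC/HER2+, > 0).
    Returns gene -> "up" or "down" (direction in TNBC).
    """
    genes = list(results)
    adjusted = benjamini_hochberg([results[g][0] for g in genes])
    selected = {}
    for gene, q in zip(genes, adjusted):
        fold = results[gene][1]
        magnitude = fold if fold >= 1 else 1 / fold  # symmetric fold change
        if q <= alpha and magnitude >= min_fold:
            selected[gene] = "up" if fold > 1 else "down"
    return selected


# Hypothetical per-gene t-test results (not data from this study).
example = {
    "GENE_A": (0.010, 2.0),
    "GENE_B": (0.040, 1.2),
    "GENE_C": (0.030, 0.5),
    "GENE_D": (0.005, 0.9),
}
print(select_genes(example))
```

Treating a ratio of 0.5 as a two-fold decrease keeps the fold-change filter symmetric for up- and downregulated genes.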

Validation by quantitative Real Time PCR
RNA was extracted from the cells using TRIZOL reagent (Invitrogen) and 1µg was used for cDNA synthesis using Superscript III reverse transcriptase kit (Invitrogen) with oligodT method. Using SYBR Green RT PCR mastermix (Biorad), the qPCR reaction was set up in duplicates using 1µl of the cDNA as a template. 18S was used as a housekeeping control. The fluorescence detection and measurements were taken using Applied Biosystems thermal cycler. The relative expression levels of candidate genes for each cell line were calculated after normalization with control. The resulting values were then averaged and plotted as bar plot. Standard error (S.E.) was included in the graph. Two tailed unpaired student's t-test was used for statistical analysis of the difference in expression between TNBC and HER2 clones.
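The normalization to the 18S control corresponds to the ΔΔCt calculation named in the qPCR figure legend. A minimal sketch follows, assuming roughly 100% amplification efficiency for both amplicons and using the HER2+ clone as the calibrator; the function name and the Ct values are hypothetical and are not taken from the study.

```python
from statistics import mean


def relative_expression(target_sample, ref_sample, target_calibrator, ref_calibrator):
    """Fold change of a target gene relative to a calibrator (2^-ddCt).

    Each argument is a list of replicate Ct values. Assumes about 100%
    amplification efficiency for target and reference amplicons.
    """
    d_ct_sample = mean(target_sample) - mean(ref_sample)
    d_ct_calibrator = mean(target_calibrator) - mean(ref_calibrator)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical duplicate Ct values.
fold = relative_expression(
    target_sample=[24.0, 24.0],      # candidate gene, TNBC clone
    ref_sample=[10.0, 10.0],         # 18S, TNBC clone
    target_calibrator=[26.0, 26.0],  # candidate gene, HER2+ clone
    ref_calibrator=[10.0, 10.0],     # 18S, HER2+ clone
)
print(f"fold change (TNBC vs HER2+): {fold:.2f}")
```

A lower Ct in the sample than in the calibrator, after subtracting the 18S Ct, gives a fold change above 1.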

Comparison with Microarray from GEO
Among the breast cancer microarray datasets with patient samples in Gene Expression Omnibus (GEO) [20], studies employing GPL96 (Affymetrix Human Genome U133A Array) and GPL570 platforms (Affymetrix Human Genome U133 Plus 2.0 Array) were searched. Datasets containing samples from patients with untreated tumors or tumors prior to treatment were chosen. Samples representing the subtypes included in our study were selected based on the clinical annotation and information provided by Lehmann et al in identifying tumor subtypes of the samples [21]. Two different datasets representing each of the two microarray platforms (GPL96 and GPL570) were compiled for comparison with our results. Datasets we included in our study were GSE7390, GSE2603, GSE3494, GSE2990, GSE2034, GSE11121, GSE1561 and GSE20194 from GPL96 platform. Similarly datasets from GPL570 platform were GSE7904, GSE2109, GSE19615 and GSE12276. Tables S1A and B provide information about the GEO datasets that were selected and number of samples for each subtype included from each dataset.

Gene Ontology and Pathway Analysis
Gene ontology (GO) and pathway analysis of the genes with deregulated expression and splicing was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) [22,23]. Enrichment of GO terms in the biological process and molecular function categories was identified. A p-value of 0.05 was used as the significance threshold.
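The idea behind term enrichment can be illustrated with a plain hypergeometric upper-tail test. This is a simplified stand-in: DAVID reports an EASE score, which is a more conservative modification of the Fisher exact test, so the values below would not match DAVID output exactly. The function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_list, n_overlap):
    """P(X >= n_overlap) for a GO term under the hypergeometric model.

    n_background: genes in the background set
    n_term:       background genes annotated with the term
    n_list:       genes in the submitted list
    n_overlap:    submitted genes annotated with the term
    """
    total = comb(n_background, n_list)
    upper = min(n_term, n_list)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 of 50 listed genes carry a term that annotates
# 200 of 10,000 background genes.
p = hypergeom_enrichment_p(10_000, 200, 50, 10)
print(f"enrichment p-value: {p:.3g}")
```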

Establishing Isogenic Stable Cell line Models
We created an isogenic model system for the comparative study of two major breast cancer subtypes by stably transfecting empty vector or HER2 plasmids into the TNBC cell lines MDA-MB-231 and MDA-MB-468. Protein expression levels of the reconstituted receptor in the pooled stable clones were measured by western blot. Multiple clones showed higher levels of HER2 than the corresponding TNBC clones. A representative immunoblot of selected clones in each background, showing high levels of HER2 in comparison to parental cell lines and TNBC clones, is shown in Figure 1A. Surface expression of HER2 in TNBC and HER2+ clones was examined using flow cytometry. As shown in Figure 1B, HER2+ clones showed a larger population of cells with high expression of HER2 compared to TNBC clones.

Biological Characterization of TNBC and HER2+ Isogenic Clones
Next, we examined whether the transfected HER2 receptor is responsive to the anti-HER2 monoclonal antibody Herceptin. The levels of the receptor tyrosine kinase after treatment of HER2+ clones with Herceptin were similar to the levels seen in TNBC clones. Congruent results were obtained using confocal microscopy, as shown in Figure S1A-B.

Gene Expression Analysis
Following biological characterization, we carried out gene expression profiling of TNBC and HER2+ clones in the MDA-MB-231 and MDA-MB-468 backgrounds using an Affymetrix human exon array. A schematic diagram of the differential expression analysis is shown in Figure 3A. In brief, pairwise differential expression analyses between the TNBC and HER2+ clones were performed individually for each cell line background, and genes upregulated and downregulated in TNBC were identified. Gene lists from both backgrounds were then compared to delineate common genes with a similar expression pattern. Results from our studies were also overlaid with data from patient studies with similar breast cancer subtypes. A heatmap of the statistically significant differentially expressed genes (p-value ≤0.05 and fold change ≥1.5) in both backgrounds is shown in Figure 3B (Table S2). Between the two TNBC versus HER2+ comparisons, 49 genes were common and followed the same trend of regulation, with 18 upregulated and 31 downregulated genes (Figure 3C, Table 1).
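The comparison of gene lists from the two backgrounds amounts to intersecting the two signatures while requiring the same direction of change. A minimal sketch is given below; the function name and the gene labels are hypothetical placeholders rather than entries from Table 1.

```python
from collections import Counter


def shared_signature(signature_a, signature_b):
    """Genes present in both signatures with the same direction of change.

    Each signature maps gene -> "up" or "down" (direction in TNBC).
    """
    return {
        gene: direction
        for gene, direction in signature_a.items()
        if signature_b.get(gene) == direction
    }


# Hypothetical signatures for the two cell line backgrounds.
sig_231 = {"GENE_1": "up", "GENE_2": "down", "GENE_3": "up"}
sig_468 = {"GENE_1": "up", "GENE_2": "up", "GENE_3": "up", "GENE_4": "down"}

common = shared_signature(sig_231, sig_468)
print(common)
print(Counter(common.values()))
```

GENE_2 is dropped because it changes in opposite directions in the two backgrounds, which is the behavior implied by counting only genes that follow the same trend of regulation.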
Based on biological significance and association with breast cancer, 34 candidates were selected from the differential expression gene lists for validation using qPCR. [...] found in breast cancer stroma as compared to the normal tissue and was found to be associated with low levels of estrogen receptor and higher tumor grade [24]. LIPG is a member of the triglyceride lipase family and is thought to be associated with metabolism of lipoproteins in endothelial cells. It was found to be one of the lipid metabolizing enzymes whose expression correlated with HER2 overexpression in a breast cancer cell line [25].
A member of the lysyl oxidase family, LOXL2, is of paramount importance in the extracellular matrix remodeling by crosslinking collagen with elastin. It is also found to play an important role in the development, tumor progression, epithelial to mesenchymal transition (EMT) and senescence. It is associated with distant metastasis and poor survival rates [26].
CTSB is a member of the lysosomal cysteine proteinases and is involved in protein degradation. It is found to play an important role in tumor invasion and metastasis and also considered a prognostic marker in an aggressive form of breast cancer known as the inflammatory breast cancer [27,28]. CTSB along with CTSL was found to be overexpressed in HER2 positive cancers and is an important mediator of tumor invasion in this subtype of breast cancer [29].
Next, we sought to place the differentially expressed gene list obtained from the microarray analysis in biological context. We performed a gene ontology analysis using DAVID to investigate whether any functional categories were enriched in our data. The top ten biological processes and molecular functions of differentially expressed genes between the TNBC and HER2+ clones in each background are listed in Table S3A-D, respectively. Biological functions such as cell signaling, adhesion, and regulation of apoptosis and proliferation were common themes for genes upregulated in TNBC in both cell lines. Similarly, response to wounding and response to organic substance were shared categories among genes downregulated in TNBC.

Comparison with Data from Patient Samples
We next compared the differential gene expression signature derived from the isogenic studies with data obtained from patients with breast cancer of similar subtypes, i.e. TNBC vs. HER2+. We initially aligned our data to individual datasets from independent studies. However, due to variability in the overlaps between our results and the datasets being compared, and the small sample size of many of these datasets, we switched to a different approach. We sought to create a super-dataset from various microarray studies by different groups of investigators. We identified several studies from the Gene Expression Omnibus (GEO) repository that contained microarray data from breast cancer patients who had not undergone any treatment. During curation of datasets, we only included samples from studies that used the two most common Affymetrix platforms in GEO (GPL96 and GPL570). Although the dataset compiled from the GPL96 platform contained more samples, that platform includes only about half as many probes as the GPL570 platform. Therefore, we created two independent super-datasets that included samples corresponding to our subtypes from the two platforms. A schematic diagram for curation of samples from GEO is shown in Figure 5A. Pairwise comparisons of deregulated genes between breast cancer subtypes from the two GEO super-datasets are shown in Figure S2 (Tables S4 and S5).
Figure legend (qPCR validation). The expression values for qPCR were calculated using the ΔΔCt method with 18S for normalization. Microarray values represent normalized and preprocessed data that have been log transformed. The plotted data represent mean ± S.E. A two-tailed Student's t-test was used for statistical analysis of qPCR data. Statistically significant differences in expression are indicated with *. A similar trend of regulation was observed for data from both techniques for these four genes. *, p ≤0.05; **, p ≤0.01.
Comparison of GEO datasets with data from MDA-MB-231 resulted in 4 up and 13 downregulated genes in TNBC vs. HER2+ ( Figure 5B, Table S6). Analogously, 11 genes each were found to be up and downregulated in comparison with the MDA-MB-468 data ( Figure 5C, Table S7). Overlap of these two comparisons provided two downregulated genes, GDF15 and GPRC5A; however, there were no upregulated genes.
Additionally, we also compared our results with a published mRNA sequencing based study of different breast cancer subtypes from our lab [30]. Comparison of MDA-MB-231 cells results with the sequencing data sets resulted in 56 genes that showed the same deregulation pattern in TNBC as compared to HER2+ samples ( Figure 6A, Table S8a, b). Parallel comparison with data from MDA-MB-468 cells resulted in 72 genes with same trend of regulation ( Figure 6B, Table S8 c, d). From these two comparisons, we found 10 common genes that followed the same regulation pattern.

Discussion
We have established an isogenic model system for the comparative study of TNBC and non-TNBC (HER2+) subtypes. The reengineered non-TNBC cell lines express HER2 receptors at levels comparable to receptor-positive cell lines. As a proof of functionality of transfected receptor, we observed an effective downregulation of overexpressed HER2 in the stable clones after treatment with Herceptin. Furthermore, isogenic background of the stable cell lines made comparison between different subtypes feasible without any generally noticed variability of genetic background. To the best of our knowledge, this is the first isogenic cell line model for comparing two major breast cancer subtypes. Using our model system, we have identified gene expression signatures that differentiate TNBC and HER2+ breast cancer subtypes.
Our goal was to identify a genomic signature associated with the status of HER2 in breast cancer and to determine how its loss in TNBC affects the expression of other genes. Using microarray technology, we interrogated the expression levels of genes that changed as a result of expression of HER2 alone in an isogenic setting using two different cell lines. We have characterized a signature of TNBC in comparison to the HER2+, non-TNBC subtype. In addition, we have identified a comprehensive list of all statistically significant deregulated genes between TNBC and HER2+ cell lines in an isogenic background. A survey of the literature showed that several candidates from our study are in line with published studies, supporting the merit of our approach. We found that fibroblast growth factor receptor 2 (FGFR2) and acyl-CoA dehydrogenase, short/branched chain (ACADSB), which were found to be upregulated in TNBC in a study by Turner et al., have higher expression in TNBC clones compared to two other subtypes [31]. Similarly, cysteine-rich angiogenic inducer 61 (CYR61), which we found to be upregulated in TNBC compared to the HER2+ subtype, is significantly overexpressed in TNBC. CYR61 is also significantly upregulated in invasive breast cancer and is considered an important therapeutic target for breast cancer [32]. From the list of molecules that are positively correlated with HER2 status in breast tumors and cell lines in a study by Bertucci et al., we found two candidates, lysyl oxidase (LOX) and fatty acid desaturase 2 (FADS2), with a similar correlation in our data sets [33]. Candidates reported in this study could include molecules that are affected by the downregulation of HER2 in TNBC, including novel targets of HER2, and that are important players in the development of TNBC and its invasive phenotype.
One theory as to how TNBC might evolve is that in early stages, breast tumor starts out as a hormone receptor positive benign lesion that depends on hormones (e.g. estrogen) for its growth and proliferation. During the course of its malignancy, the tumor develops hormonal independence, gradually loses the expression of estrogen receptor and becomes more aggressive. However, the mechanism of downregulation of the estrogen receptor is poorly studied due to lack of a suitable model system [34]. Clark et al. studied down regulation of estrogen receptor using wild type MCF-7 and its sublines that lose their receptor expression and hormone dependence [35]. Differentially expressed genes between TNBC and HER2+ samples in our model system potentially constitute the gene signature that changes as a result of tumor progression from being receptor, HER2 in this case, positive to receptor null. Similarly, since TNBC shows absence or low levels of HER2, our data could point to negative regulators of the receptor tyrosine kinase in TNBC. However, additional studies are needed to validate these tentative conclusions and to gain a mechanistic insight into the downregulation of the receptor. We found several molecules with repressor functions like ERBB receptor feedback inhibitor 1 (ERRFI1), grainyhead-like 1 (Drosophila) (GRHL1) and E3 ubiquitin-protein ligase RING2 (RNF2) upregulated in TNBC clones compared to other two subtypes. It would be interesting to study if any of these are responsible for the downregulation of HER2 receptors in TNBC.
The overlap of gene expression signature from our cellular model with microarray data sets from breast cancer patient samples of corresponding subtypes increases the confidence of our finding. Additionally it also provides an essential receptor status related gene dataset with physiological significance for further studies. This could be of high value as diagnostic and therapeutic targets for TNBC.

Conclusion
Using an isogenic model system, we have shown that the underlying molecular differences between breast cancer subtypes are evident at the level of gene expression. Our findings point to key molecules and events that are potentially linked to the biology of TNBC and may explain how it differs from the HER2+ subtype. The deregulated genes potentially represent the signature that changes as breast cancer progresses into a more aggressive TNBC phenotype. Our findings also show how upregulation of a single gene can lead to a whole range of molecular changes in isogenic cells. Importantly, a portion of the alterations in expression levels in the model system also holds true in human TNBC and non-TNBC HER2 tumors, suggesting the presence of an element of isogenic signature within the generally noted heterogeneous expression patterns. We provide an important dataset and a model system for further exploration and the generation of testable hypotheses. Further in-depth studies are needed to confirm our findings and to use them for distinguishing TNBC from non-TNBC cases, which would aid in tailoring subtype-specific therapies.
Figure S1. Surface expression of transfected HER2 is effectively downregulated upon 4D5 (Herceptin) treatment. Confocal microscopy was used to determine the surface expression of HER2 in isogenic clones in A) MDA-MB-231 and B) MDA-MB-468 backgrounds. One TNBC and two HER2 clones in each cell line were treated with 10nM 4D5 after 24hr starvation. Immunofluorescence staining was used to examine HER2 expression (red) in the control and treated cells. HER2 was expressed at a higher level in HER2 clones in comparison to TNBC (pcDNA) clones. Treatment of the HER2 clones with 4D5 reduced the expression of HER2 to levels similar to those seen in the TNBC clones.