Integrated Genomic and Epigenomic Analysis of Breast Cancer Brain Metastasis

The brain is a common site of metastatic disease in patients with breast cancer, which has few therapeutic options and dismal outcomes. The purpose of our study was to identify common and rare events that underlie breast cancer brain metastasis. We performed deep genomic profiling, which integrated gene copy number, gene expression and DNA methylation datasets on a collection of breast brain metastases. We identified frequent large chromosomal gains in 1q, 5p, 8q, 11q, and 20q and frequent broad-level deletions involving 8p, 17p, 21p and Xq. Frequently amplified and overexpressed genes included ATAD2, BRAF, DERL1, DNMTRB and NEK2A. The ATM, CRYAB and HSPB2 genes were commonly deleted and underexpressed. Knowledge mining revealed enrichment in cell cycle and G2/M transition pathways, which contained AURKA, AURKB and FOXM1. Using the PAM50 breast cancer intrinsic classifier, Luminal B, Her2+/ER negative, and basal-like tumors were identified as the most commonly represented breast cancer subtypes in our brain metastasis cohort. While overall methylation levels were increased in breast cancer brain metastasis, basal-like brain metastases were associated with significantly lower levels of methylation. Integrating DNA methylation data with gene expression revealed defects in cell migration and adhesion due to hypermethylation and downregulation of PENK, EDN3, and ITGAM. Hypomethylation and upregulation of KRT8 likely affects adhesion and permeability. Genomic and epigenomic profiling of breast brain metastasis has provided insight into the somatic events underlying this disease, which have potential in forming the basis of future therapeutic strategies.


Introduction
Brain metastasis is the most common intracranial tumor, occurring in 15-40% of all cancer patients with metastatic disease [1,2,3]. The incidence of brain metastasis has increased in recent years, possibly due to prolonged survival of cancer patients receiving aggressive treatments for their primary or systemic disease [1,2,3]. Given their overall frequency in the population, lung and breast cancer are by far the most common tumors to develop brain metastases [1,2,3].
Epidemiological studies suggest that brain metastases occur with a frequency of approximately 10-16% in patients with breast cancer, although large autopsy studies indicate that frequencies may be as high as 18-30% [2,3,4,5]. Brain metastases occur rapidly, usually within 2-3 years following diagnosis of systemic metastatic disease, and the median survival once there is brain involvement is a stifling 13 months with fewer than 2% of patients surviving greater than 2 years. Breast cancer involving the brain (parenchyma or leptomeninges) is considered a feature of latestage progressive disease for which few effective treatments exist. Due to limitations imposed by the blood brain barrier (BBB), chemotherapy has not generally been used to treat most epithelial cancers that metastasize to the brain. Whole brain radiation can provide a survival benefit of 4-5 months, which can be further extended with stereotactic radiosurgery (SRS). Surgery can also lead to dramatic improvements in survival if fewer than three metastases exist and all are treated aggressively with surgery or SRS.
Currently there are few predictive measures for identification of patients at risk for developing brain metastasis from their primary cancer. In general, the development of brain metastases from breast cancer depends on several prognostic factors, including younger age, ethnicity, hormone receptor negative status, presence of BRCA1 germ-line mutations, and the expression of the epidermal growth factor receptor 2 (Her2/neu) proto-oncogene, all of which contribute to an increased rate of brain metastasis [2].
The overall goal of our study was to utilize array-based technologies to assemble a compendium of genomic and epigenomic events in a series of breast cancer brain metastases to understand the landscape of breast cancer brain metastatic lesions. The compendium would be interrogated for common and uncommon abnormalities in order to identify potential new therapeutic targets to control this fatal manifestation of breast cancer.

Sample Acquisition
Retrospective fresh-frozen samples of breast brain metastases (BBM) were obtained from The University of Toronto Nervous System Tissue Bank, University Health Network, Toronto, Canada (n = 23) and from The Department of Neurosurgery's brain tumor bank, University of Iowa Medical Center, Iowa City, Iowa (n = 12). Non-neoplastic brain samples were also obtained from the University of Toronto Tissue Bank (n = 2) and from The Department of Neurosurgery, University of Iowa Medical Center (n = 8). Ten non-neoplastic breast tissue specimens were purchased from Asterand (Detroit, MI). A series of 50 early-stage (grade 1 and 2) breast cancer specimens were obtained from the Manitoba Tumor Bank, Winnipeg, Manitoba. All samples were obtained under appropriate ethical procedures and informed patient consent at the respective institutions.
All human biospecimens used in this study were pre-existing and de-identified before shipment to the Translational Genomics Research Institute (TGen) for genomic analysis. TGen investigators did not have access to patient identifiers at any time before or after completion of the study. TGen investigators and the holder of patient identifiers entered into an agreement prohibiting the release of this information to TGen investigators under any circumstances. Therefore, the biospecimens do not qualify as human subjects and the study is exempt from Institutional Review Board, in accordance with the Office of Human Research Protections (OHRPs) Guidance on Research Involving Coded Private Information or Biological Specimens.

gDNA and RNA Isolation
Genomic DNA (gDNA) was isolated from fresh-frozen tissue using the DNeasy Blood and Tissue Kit (Qiagen, Valencia, CA) with the following modifications. Approximately 25 mg frozen tissue was pulverized after a brief incubation in liquid nitrogen, then lysed in 180 mL ATL buffer. The sample was further disrupted using a hand-held tissue homogenizer (VWR, Radnor, PA) before adding 20 mL proteinase K solution. Lysates were incubated at 56uC for 72 hours. Following proteinase K treatment, lysates were centrifuged at 17,0006 g to pellet particulate material. Genomic DNA was eluted in 100 mL T low E buffer (Teknova, Hollister, CA) and stored at 4uC. Total RNA, including small RNA, was isolated using the mirVana miRNA Isolation Kit (Ambion, Austin, TX) following the manufacturer's protocol and stored at 280uC. Genomic DNA and total RNA yields and purity were assessed using a NanoDrop 2000c (Thermo Scientific, Waltham, MA). Genomic DNA integrity was confirmed by agarose gel electrophoresis. Total RNA samples were evaluated for integrity using the Bioanalyzer RNA 6000 Nano LabChip Kit (Agilent Technologies, Santa Clara, CA) on a Bioanalyzer 2400 (Agilent Technologies). Only total RNA samples with RNA integrity number values of at least 7 (RIN$7) were profiled. A total of 10 samples were dropped due to RIN values lower than 7.

Copy Number Analysis
Array-based comparative genomic hybridization (aCGH) was performed on 19 BBM samples using the Agilent SurePrint G3 Human CGH Microarray Kit, 161M, which have an average probe spacing of 2.1 Kb (Agilent Technologies). Briefly, 800 ng of experimental and normal female reference (Promega, Madison, WI) gDNA were independently digested with Bovine DNAse I (Ambion) and directly labeled with Cy5-dUTP and Cy3-dUTP, respectively, using the BioPrime Array CGH Genomic Labeling Module (Invitrogen, Carlsbad, CA). Labeled DNA was purified using Vivaspin 500 columns (Satorius Stedim Biotech, Goettingen, Germany). Equal amounts of labeled, purified experimental and reference DNA were hybridized to the microarray in a rotary oven at 65uC for 40 hr at a rotation speed of 20 rpm. The slides were washed according to manufacturer's protocol and images were captured using an Agilent DNA microarray scanner set at default settings for array-based comparative genomic hybridization. Scanner images were extracted using Feature Extraction software v.10.5.1.1 (Agilent Technologies). Log 2 data was imported into Agilent DNA Analytics 4.0.81 software for visualization and quality assessment. The aCGH data for 15 of 19 BBM samples, which passed quality control metrics, were segmented using the circular binary segmentation (CBS) algorithm [6,7]. Genomic Identification of Significant Targets in Cancer (GISTIC) was then used to identify regions of the genome that were significantly amplified or deleted across the 15 breast brain metastasis samples. GISTIC calculated a statistic (G-score) for the frequency of occurrence and the amplitude of the aberration. The statistical significance of each aberration was computed by comparing the observed G-score to the results expected by chance. Regions with false-discovery rate (FDR) q-values less than 0.25 were considered statistically significant. In addition, copy number variation analysis was performed using Agilent's Genomics Workbench 6.5 software. The Aberration Detection Method 2 (ADM-2) algorithm was used to flag altered chromosomal regions and breakpoints (ADM-2 threshold of 5.5 within a 5.0 Mb window size containing at least 3 probes and with minimum 0.58 absolute average log ratio for the region).

mRNA Expression Profiling
RNA from 35 BBM, 10 non-neoplastic brain (NBn) and 10 nonneoplastic breast (NBr) tissues were profiled using Agilent whole human genome 4644K mRNA expression microarrays. A quickamplification kit (Agilent Technologies) was used to amplify and label 500 ng target mRNA species into complementary RNA (cRNA) for oligo microarrays according to the manufacturer's protocol. For each two-color array, a commercial universal reference RNA (Stratagene, La Jolla, CA) was labeled with cyanine 5-CTP and cyanine-3-CTP (Perkin Elmer, Boston, MA). Complimentary RNA concentration and labeling efficiency were measured spectrophotometrically. Approximately 800 ng of both Cy5-labeled experimental cRNA and Cy3-labeled universal reference RNA were hybridized to each microarray (adjusting for labeling efficiency). Images were captured using an Agilent DNA microarray scanner set at default settings for gene expression. Scanned images were processed using Feature Extractor v. 10.5.1.1software by applying a LOWESS (locally weighted linear regression) correction for dye bias and background noise was subtracted from spot intensities. To filter the preprocessed data, genes with a background signal higher than feature signal were removed.

Intrinsic Subtype Classification of Breast Brain Metastasis
The PAM50 gene expression classifier is a supervised, centroidbased prediction method to classify breast cancers into intrinsic molecular subtypes (Luminal A, Luminal B, HER2-enriched, basal-like, and normal-like) using a 50-gene signature. We applied this classifier to samples analyzed on the Agilent 4644K mRNA expression platform. Normal samples were used as controls. The log ratio values of the probes were collapsed to gene level by taking the median of all probes matching to same gene.

DNA Methylation Analysis
A total of 1 mg of DNA from 32 BBM, 12 NBr, 15 NBn samples and 48 early-stage primary breast cancer samples was bisulfite converted with the EZ DNA methylation kit (Zymo Research, Irvine, CA) and subsequently processed for hybridization onto the Infinium HumanMethylation27 BeadArray (Illumina, San Diego, CA) according to manufacturers' protocols. This array interrogates 27,578 CpG dinucleotides encompassing 14,495 genes. Bisulfite-treated DNA was subsequently amplified, fragmented and hybridized to locus-specific oligonucleotides on the BeadArray. Image processing and intensity data extraction were performed using an Illumina BeadArray Reader. The GenomeStudio Methylation software from Illumina was used for data assembly and acquisition. Methylated (M) and unmethylated (U) alleles were detected by fluorescence signal from single-nucleotide extension of the DNA fragments. Results were interpreted as a methylation ratio (b-value) of methylated signal (M) to the sum of methylated and unmethylated signal (M+U) for each locus. The average b value reports a methylation signal ranging from 0 to 1 spanning completely unmethylated to completely methylated, respectively. A differentially methylated locus was defined by having a statistically significant (p-value#0.05 after computing a Mann Whitney non-parametric test) average b difference of at least |0.2| between groups.

Data Deposition in Public Portals
The raw Agilent gene expression array data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE52604 (http://www.ncbi.nlm.nih.gov/geo/query/ acc.cgi?acc = GSE52604

Pathway Analysis
Gene lists of interest were uploaded into IPA (IngenuityH Systems, Redwood City, CA) and the Core Analysis workflow was run with default parameters. The Core Analysis provides an assessment of significantly altered pathways, molecular networks and biological processes represented in the samples' gene list.

Quantitative Reverse-Transcriptase (RT)-PCR and Copy Number PCR Assays
Complimentary (cDNA) was synthesized using 100 ng of total RNA in a 20 ml reaction volume. The SuperscriptH III First Strand synthesis system (Life Technologies, Carlsbad, CA) was used with the following conditions: 10 minutes at 25uC, 30 min-utes at 50uC, 5 minutes at 85uC and 20 minutes at 37uC with RNase H. SYBR green fluorescence was used for the detection of amplification after each cycle using the LightCycler 480 SYBR Green I Mastermix (Roche Applied Science, Indianapolis, IN). Quantitative PCR (qPCR) was subsequently performed on cDNA in a final volume of 25 ml using the LightCycler 480 instrument (Roche Applied Science). The qPCR cycling conditions were as follows: 5 minutes at 95uC for activation of PlatinumH Taq DNA polymerase, 10 seconds at 95uC, 20 seconds at 59uC, and 30 seconds at 72uC for 45 cycles. Quantification was based on the number of cycles necessary to produce a detectable amount of product above background. The following primer pairs were used: AURKB For each sample, the delta C t value was calculated as the difference between the target gene C t value and the C t value of the geometric mean of the internal reference controls. The quantity of expression was calculated relative to the average of expression obtained from NBr and NBn samples (n = 6). The equation used for relative fold-change was 2 2DDCT .
Copy number validation was completed with the qBiomarker Copy Number PCR Assays (Qiagen) for ATAD2 (Assay ID 28855976), and cMYC (Assay ID 28877687). Samples were analyzed with the qBiomarker SYBR ROX Mastermix (Qiagen). A multi-copy reference assay, the qBiomarker Multicopy Reference Copy Number PCR Assay (MRef, Assay ID 30773761) was performed for each sample and served as the internal reference control. Data were analyzed with qBiomarker Copy Number PCR Assay Data Analysis Software.
All PCR reactions were run in triplicate, and melting curve analysis was performed to ensure specificity of the PCR product. Negative (no template) controls were run in parallel to confirm the absence of nonspecific fluorescence in samples.

mRNA Expression
Gene expression profiling was performed using Agilent whole human genome 4644K mRNA expression microarrays to identify differentially expressed genes (DEG) in BBM samples and an independent set of non-neoplastic hyperplastic breast samples (NBr) and non-neoplastic brain samples (NBn). DEG were selected if differential expression was evident between tumor samples and NBn; between tumor samples and NBr but not between NBn and NBr. In addition, DEG were selected based on p-values#0.05 and a fold-change $2 or#22. This comparison identified 863 differentially expressed genes ( Table S2, File S1). A heatmap was generated in GeneSpring v12.1 to visualize the DEG list, which also shows a clear separation of tumor and non-neoplastic samples ( Figure 2A). In order to identify biological concepts altered in the differentially regulated BBM gene list we submitted the gene list to IPA (IngenuityH Systems) and applied the Core Analysis workflow. The functional analysis portion of the workflow identified biological functions and/or diseases most significantly Deep Profiling Analysis of Breast CNS Metastasis PLOS ONE | www.plosone.org altered in our DEG list. Significant categories were sorted based on activation/inhibition z-scores to identify the most significant distinguishing categories with respect to up-regulated and downregulated genes. Fourteen specific functions had increased activation states as evidenced by z-scores $2 ( Table S3a, File S1). Five of the fourteen functional annotations mapped to the 'Cell Cycle' category. The genes defining the five categories and their relationships were visualized as a network in ( Figure 2B). 'DNA repair' was another noteworthy enriched category. The genes that mapped to this category were also visualized as a network ( Figure 2C). Enrichment analysis was performed to annotate further the genes in this category. Two DNA repair processes, 'double-stranded DNA break repair' (CDCA5, FGF2, NR4A1, PRKDC, RAD54L, KPNA2) and 'homologous recombination repair' (HUS1, PRKDC, RAD54L and RAD54B) were identified and highlighted in Figure 2C. The functional annotation categories for down-regulated genes converged mostly on categories associated with 'tissue morphology' and 'development' ( Table S3b, File S1).
To identify potential transcriptional regulators among the DEG we used the results of the upstream regulator analysis. This part of the Core Analysis workflow connects transcriptional regulators in the IPA database to the differentially expressed genes. Overconnected regulators are scored with a p-value, for gene enrichment, and z-score for degree of activation based on direction of regulation in database and concordance with direction of regulation in DEG. The list of upstream regulators was filtered for 'transcription regulators' upregulated (z-score $2) and downregulated (z-score#2). Five specific transcription factors were identified as active and six were identified as inhibited ( Table S4a, File S1). One of the inhibited transcription factors was TP53, implying that TP53 signaling is defective in BBM. Among the activated transcription factors, IRF1 and IRF7 seemed to be connected preferentially to genes involved in immune response such as 'antiviral response' and 'antimicrobial response.' This may indicate a possible infiltration of immune cells in the samples or could reflect an immunogenic response by the tumor cells. Two interesting transcriptional regulators highly scored were FOXM1 and TBX2 that had three commonly regulated genes. We constructed a combined network illustrating the downstream transcriptional functional targets for both transcription factors ( Figure S1, File S1). A functional enrichment was performed on the resultant network indicating that the these two transcription factors control the expression of genes enriched for processes such as 'cell cycle progression' (p-value 4.1E-15), mitosis (p-value 4.97E-13) and 'cytokinesis' (p-value 1.98E-10, Table S4b, File S1). This observation, coupled with the above functional enrichment on the whole gene list suggests that breast cancer brain metastasis gene expression is associated with cell cycle/mitosis and may be driven by FOXM1 and TBX2.
Next, we used gene set enrichment analysis (GSEA) to identify sets of genes that are coordinately regulated in the BBM samples. The GSEA algorithm identifies gene sets enriched at the top (breast brain metastases) or bottom (non-neoplastic samples) of the ranked list of DEG. We conducted the GSEA analysis using only the c5 gene set library (GSEA | MSigDB), which contains only gene ontology gene sets. There were 109 gene sets significant at a FDR,50% and p-value of #0.05 that were upregulated in the breast brain metastases and only one gene set downregulated (full GSEA results can be found in Table S5a-b, File S1). We visualized the above-referenced gene sets using the Enrichment Map plug-in for cytoscape [8]. The enrichment map portrays the GSEA results as a network of gene sets (nodes) connected by edges representing overlapping genes. The enrichment map improves interpretation of GSEA results by allowing for the identification of functional groupings of the enriched gene sets. We manually inspected the resultant clusters and assigned summary labels to individual subnetworks of interest. Some interesting subnetworks enriched in the BBM samples include 'Cell Cycle/Mitosis', 'DNA repair', 'Vesicle Processes', 'Protein Localization' and 'RNA processing'. The only category associated with the non-neoplastic samples was extracellular matrix ( Figure 2D). We experimentally validated the expression of FOXM1 and AURKB in 42 breast brain metastasis samples (which included those used in the expression arrays) and in a series of 50 primary breast cancer samples by qRT-PCR. Both FOXM1 and AURKB were significantly upregulated in brain metastasis samples compared to primary breast cancer samples and non-neoplastic samples ( Figure 2E).

Breast Cancer Intrinsic Subtype Analysis of Breast Brain Metastasis
We used the PAM50 gene expression classifier to divide the breast brain metastatic samples into the common intrinsic subtypes known for breast cancer. From this analysis we identified the following subtypes in our sample cohort: 2 (5.7%) Luminal A, 12 (34.2%) Luminal B, 8 (22.8%) Her2+/ER2, 11(31.4%) basal-like, and 2 (5.7%) Normal-like tumors. An unsupervised clustering analysis of the 863 DEG failed to discriminate between the different subtypes ( Figure 2A). Therefore, we performed an analysis of variance (ANOVA) to identify DEG between Luminal B, Her2+/ER2 and basal-like tumors. A Post-hoc Tukey test and a Benjamini Hochberg multiple correction test were applied to the data. Differences with fold-change ,2 were excluded. We excluded tumors classified as Luminal A and normal-like due to a very small sample size. There were 733 DEG between Luminal B and basal-like tumors; 492 DEG between Her2+/ER2 and basal-like and 223 DEG between Luminal B and Her2+/ER2 ( Table S6a-b, File S1). The union of the differentially expressed genes between groups consisted of 774 unique genes (Table S7, File S1) or 886 probesets. Hierarchical clustering using this gene list was able to clearly distinguish the subtypes and six distinct gene clusters were identified ( Figure 3, Table S8a-f, File S1).
The genes in each cluster were analyzed using ToppGene suite and data are presented in tables and as summary word clouds for enriched processes and signatures (Table S9a-f, Figure S2, File S1). Cluster 1, 2 and 6 contained genes upregulated in both the Her2+/ER2 and Luminal B subtypes. However, expression was generally highest in the Luminal B brain metastases. Enrichment analysis of genes in clusters 1, 2 and 6 using the ToppFun module of the ToppGene suite reveals genes previously associated with breast cancer signatures [9]. Specifically, those signatures were associated with luminal subtypes and estrogen receptor(ER)positive breast cancer. Additionally, cluster 1 has overexpression of RET, and ERBB3, which represent possible actionable therapeutic targets. Cluster 2, similar to clusters 1 and 6, contained genes also upregulated across the Her2+/ER2 and Luminal B samples. This cluster contained GATA3, which is an ESR1 target gene, but was only expressed in the Luminal B tumors. Of note, in cluster 6, ESR1 was most highly expressed in Luminal B tumors, coincidentally with AR. Basal-like tumors had a negative expression value for ESR1 and AR. We also note highest expression of FLT3 and FOXA1 genes in Luminal B tumors. Expressed highly in both Her2+/ER2 and Luminal tumors were the TFF3 and LRRC6. Genes in cluster 3 and 5 were, in general, were preferentially upregulated in the basal-like samples. The gene lists were consistent with known basal-like breast cancer genes and are also known to be lowly expressed in Luminal breast cancer.
For example, Keratin 5, 6 and 14, as well as, CDH3 were present in cluster 5. In addition, FOXC1 and CHST3 in cluster 3 and UGT8, and CHODL genes in cluster 5 were among the highly expressed genes in basal-like tumors. Cluster 4 is unique in that it has a number of genes downregulated across the three subtypes, however, there was a more pronounced downregulation in the Her2+/ER2 and Luminal B subtypes. Interestingly, this cluster contained a number of proliferation-associated genes, but they were relatively underexpressed. The basal cell marker gene, Keratin 17, is overexpressed in the basal samples across this cluster. WNT pathway members, FZD7 (C3), WNT6 (C4), WNT11 (C5) and FZD9 (4) were all preferentially expressed in basal-like tumors suggesting novel therapeutic opportunities for basal-like breast cancer brain metastases. Her2+/ER2 tumors were associated with overexpression of TMEM45b, very reminiscent of a Her2 subtype. While these samples had some commonly expressed genes with the basal-like tumors, such as CLDN8, they were most similar to the Luminal B tumors. They also had the highest expression of TML5, CYP4F8, and PAX9 when compared to the other subtypes.

DNA Methylation Analysis
In order to identify alterations in DNA methylation we used the HumanMethylation27 BeadChip array. We compared BBM to NBn and NBr tissue and identified 425 differentially methylated loci (DML, Figure 4A, Table S10, File S1). The median methylation values were 0.4, 0.2 and 0.15 for breast brain metastasis, NBn and NBr respectively, indicating that breast brain metastasis was associated with hypermethylation ( Figure 4B). Of the 425 loci, 117 were hypomethylated compared with nonneoplastic tissue and 308 were hypermethylated (Table S10). Only 23 of the hypo-methylated loci were associated with CpG islands, compared to 294 hyper-methylated loci, which occurred in CpG islands ( Table S10). The 425 DML failed to discriminate between the molecular subtypes ( Figure 4A), so we performed an ANOVA analysis to identify differentially methylated loci between the subtypes. A Post-hoc Tukey test and Benjamini Hochberg multiple correction were applied to the data. Differences with b values less than |0.2| were excluded. Due to small numbers of samples, we excluded tumors classified as Luminal A and normallike. There were 95 DML between Luminal B and basal-like tumors; 71 DML between Her2+/ER2 and basal-like tumors; and 13 DML between Luminal B and Her2+/ER2 (Table S11a-c, File S1). The union of these loci resulted in 90 unique DML that discriminated between the subtypes ( Figure 4C, Table S11d, File S1). When the union of these loci were examined, basal-like tumors had the lowest median methylation (0.32) compared with Her2+/ER2 and Luminal B subtypes (0.61 and 0.68 respectively, Figure 4D), including non-neoplastic tissue Figure 2. Analysis of Differentially Expressed Genes in Breast Brain Metastasis. A) Hierarchical clustering of 863 genes distinguishing breast brain metastases (BBM, orange ticks) from non-neoplastic breast (NBr, red ticks) and non-neoplastic brain tissue (NBn, blue ticks); B) Cell Cycle Gene Network. Genes that mapped to the 'Cell Cycle' categories were used to construct a direct interaction network; C) DNA Repair Network. Genes that mapped to the 'DNA Repair' category were used to construct a direct interaction network. Two DNA repair processes are highlighted and connected to genes annotated to those processes. For Figure 2B and 2C, the gene nodes are shaded red in proportion to the degree of upregulation. Gene nodes shaded in green are downregulated. Log 2 ratios are listed under the individual nodes. Direct physical interaction relationships are represented by solid lines. Dotted lines represent indirect physical interactions; D) GSEA Enrichment Map. The results from the GSEA analysis comparing 35 breast cancer brain metastases to 10 non-neoplastic brain and 10 non-neoplastic breast were visualized using Cytoscape and Enrichment Map plug-in. The significant gene sets from the C5 gene ontology library are represented with a p-value of #0.05 and false discovery rate (Q) of ,0.5. Each individual node represents one gene set with the size of node proportional to number of genes in the set and color intensity relates to degree of enrichment (red = up in tumor; blue = down in tumor). The relative overlap of the number of genes shared by individual nodes is represented by the thickness of the connecting edges. Interesting subgroups in the network are circled and manually annotated; E) Vertical scatter plots showing FOXM1 and AURKB overexpression by qRT-PCR in breast brain metastasis (BBM, n = 42) compared to early-stage primary breast tumors (n = 50). Fold expression was relative to expression in non-neoplastic breast or brain samples (n = 10). doi:10.1371/journal.pone.0085448.g002 examined. From these DML we identified a subset of 15 DML that were most hypomethylated in basal-like tumors compared to the other subtypes ( Figure 4E-F, Table S12, File S1). This signature represents a potential CpG island hypomethylator phenotype (CIHMP) for basal-like breast brain metastasis and includes ALDH1A3, FANCG, TRIM29 and HOXA11 (Table S12, File S1).

Combined aCGH and Gene Expression Analysis on Single Tumors
Next, we undertook a single-tumor-level analysis to identify a tumor specific, n = 1, investigation into altered biological concepts specific to single samples. A combined copy number and gene expression based analysis was conducted on 11 individual BBM samples. In brief, the ADM2 gene-level data were matched to gene-level mRNA expression data. Each sample was compared against non-neoplastic tissue as described above. Genes were filtered to include only those with a log 2 ratio $2 or #22. For copy number data, a filter was applied to the ADM2 log 2 ratio to include only those genes with values $0.5 or #20.8. The remaining data were then filtered for congruency to ensure consistency in direction of combined data, i.e., genes needed to be amplified/overexpressed or deleted/underexpressed. The combined aCGH and mRNA expression lists for each sample were uploaded into MetaCore software (Thomson Reuters, New York, New York) for functional ontology enrichment, pathway mapping and knowledge mining. Each sample was interrogated with this workflow to identify biological concepts and observations which include single-gene alterations as well as pathway-based alterations. Expert review of data was conducted to identify and prioritize important biology and concepts for each sample (Table S13, File S1). Below we describe the top concepts identified in our samples.
We were able to identify at least one specific pathway/concept aberrantly operative in each sample. There were three samples that had alterations, which would predict interference with the 'autophagy' pathway. One sample had amplifications in two genes, eiF2AK3 and ATF6, which are crucial members of the 'endoplasmic stress' pathway. Multiple alterations were observed in one sample in the 'WNT signaling' pathway. Additionally, two samples had multiple alterations in the 'chromosome condensation' pathway. Lastly, six samples had amplifications and coupled overexpression of a histone gene cluster involving the genes HF3A and HF3B.
We also note a number of interesting genes (number of samples in parentheses) that were amplified and overexpressed: AKT1 (2), ATAD2 (7) We selected the ATAD2 gene for copy number and gene expression validation by qRT-PCR and copy number qPCR assays due to its high frequency of alteration shown by aCGH and gene expression arrays. In addition, its position on 8q24, a known hotspot locus in breast cancer, further suggests the potential importance of this gene. Here we examined 42 breast brain metastases and 50 primary breast cancer samples. The BBM samples included all samples analyzed by aCGH and the gene expression array platform as well as additional samples. These data demonstrate that both metastatic and primary samples had a comparable increase in both copy number and expression of ATAD2 compared to non-neoplastic samples ( Figure 5). The copy number data for the cMYC gene, which is also positioned on 8q24, demonstrate gene amplification in 10/15 samples by aCGH, but this was not accompanied by an increase in gene expression. Quantitative PCR for copy number determination in conjunction with qRT-PCR for expression analysis yielded similar findings where there was a noticeable, but similar cMYC amplification in both brain metastasis and primary breast cancer samples without any evidence of gene expression ( Figure 5).

Combined Gene Expression and DNA Methylation Analysis on Single Tumors
Similarly, a combined gene expression and methylation analysis was conducted on a sample-by-sample basis for 11 samples in our cohort. Gene expression and methylation data for each sample were compared against non-neoplastic tissue. Differentially expressed genes with log 2 fold-changes $2 or #2 and which had a corresponding methylation change with delta beta values .|0.2| were used for further analysis. The resultant gene lists were uploaded into IPA and the Core Analysis workflow was run with default parameters. Analysis of Molecular Functions demonstrated (sample number in parentheses) defects in 'Cellular Growth and Proliferation' (7), 'Cellular Development' (7), 'Cellular Movement' (7) and 'Cell-Cell Signaling Interactions' (8). Three samples had predicted decreased activity of cell movement and invasion of tumor cells. Two samples had predicted decrease in the motility of hematological cells such as leukocytes and granulocytes and one sample had predicted decreased activity in cell chemotaxis. Hyper-methylated and down-regulated genes most frequently contributing to cell motility and adhesion included: VAV1 (2), PENK (6), EDN3 (6), EDNRB (4), RELN (5) and ITGAM (4). Other genes affecting cell growth and proliferation which were frequently hypermethylated and downregulated included: CDKN1C (6), CDKN2B (3), CCND2 (4) and BANK1 (7). Other genes of interest include USP44 (6), and CRYAB (4), HSPB2 (1). In eight samples, KRT8 (affecting adhesion and permeability of tight junctions) was found to be hypomethylated and upregulated. We used receiver operator characteristic (ROC) analysis to compare the methylation status of BANK1 and CDKN1C in breast brain metastasis samples to a series of 48 early-stage primary breast cancer samples. The areas under the ROC curves were statistically significant for BANK1 (2/2 HumanMethylation27 array probes) and 6/8 CDKN1C methylation probes representing different CpG loci (p,0.01, Mann-Whitney U test). The data demonstrate differential methylation between primary breast tumors and brain metastases, where the metastatic samples were significantly more hypermethylated than primary tumors. The ROC curves for BANK1 and CDKN1C are shown in Figure 6. These data highlight the importance of the epigenetic silencing of BANK1 and CDKN1C in the development of BBM.
There was also hypomethylation and upregulation of six Xlinked MAGE genes (1 Basal sample), histone gene cluster (7), DNMT3B (4), and IL20 (5). CHODL was upregulated in basal like tumors and downregulated in Her2 and Luminal B tumors, but we also noted hypermethylation and downregulation of CHODL in four Luminal B tumors. Similarly, TFF3 is highly expressed in Luminal B tumors and TFF1 is hypomethylated and overexpressed in fourLuminal B tumors.
We further examined each sample for enrichment in canonical pathways and identified 'IL8 signaling', 'hepatic fibrosis/hepatic stellate cell activation signaling' and 'thyroid hormone metabolism signaling' to be among the most frequently enriched pathways (Table S14, File S1). In particular, enrichment of 'IL8 signaling' was due mainly to the hypermethylation and downregulation of several key genes such as ANGPT1, ANGPT2, KDR, ITGAM, ITGB2, PIK3CG, PIK3CD and TEK. BRAF and BCL2 were among the hypomethylated and overexpressed genes in this pathway.

Discussion
The brain is a common sanctuary site of metastatic disease in patients with breast cancer and brain metastases are becoming increasingly prevalent as greater control over systemic disease is achieved. Given the poor clinical outcomes of patients with breast brain metastases, there is urgency to better understand the mechanisms underlying the pathogenesis of brain metastasis as well as to identify novel targeted therapies. Accordingly, we performed a comprehensive genomics and epigenomics analysis using microarray technology to measure alterations at the level of mRNA expression, DNA copy number and DNA methylation.
Copy number analysis identified a number of focal and broad regions of amplifications and deletions. Among the most notable regions of broad gains in our samples were 1q, 5p, 8q, 11q and 20q. Broad-level deletions were identified in 8p, 17p, 21p and Xq. Previous studies have shown that ductal carcinoma in situ (DCIS) were associated with chromosomal gains in 1q, 8q and 17q [10]. Most commonly, deletions in DCIS have been shown to occur in 8p, 11q, 13q, 14q and 16q [11,12]. In invasive breast cancer, gains of 1q, 6p, 8q, 11q, 16p, 17q and 20q are most common and . Breast brain metastases (BBM, blue ticks) are distinguished from non-neoplastic breast (NBr) and non-neoplastic brain tissue (NBn, red ticks). B) Box plot demonstrating higher overall median methylation levels for 425 DML in BBM compared with NBr and NBn. C) Hierarchical clustering of 90 DML obtained by performing an ANOVA analysis and analyzing the union of genes between the different subtypes was able to distinguish BBM intrinsic subtype; D) Box plot representing the median methylation levels of 90 DML described above. NBr and NBn values are also shown. Basal-like breast brain metastases have overall lower methylation compared with the other groups. E) CpG Island Hypomethylator Phenotype (CIHMP) in basal-like brain metastases (red ticks) representing the 15 most hypo-methylated CpG Island loci when compared to Luminal B (maroon ticks) and Her2+/ER2 tumors (blue ticks); F) Box plot graphing the methylation values of the 15 CpG loci most hypomethylated in basal-like breast brain metastases. doi:10.1371/journal.pone.0085448.g004  chromosomal losses have been identified in 1p, 8p, 11q, 16q, 18q and 22 [11,12]. These data suggest some clear overlap of regions involved in primary breast cancer, but also point towards the emergence of other chromosomal alterations that may be unique to breast cancer brain metastasis. Certainly, concomitant gain of 8q with loss of 8p has been previously described for breast cancer, as well as prostate cancer, and has been associated with disease progression and poor patient prognosis [13]. Known breast cancer oncogenes such as MYC (8q) and ERRB2 (17q), while among the common regions of amplifications, were not among the highly expressed genes in our samples.
ATAD2 (8q24), and DERL1 (8q24) were among the frequently amplified and over-expressed genes suggesting they could play an important role in breast brain metastasis. ATAD2 may be a transcriptional coactivator of ESR1 required to induce the expression of estradiol target genes such as CCND1, MYC and E2F1 and may be required for histone hyperacetylation [14]. It has also been identified as a MYC cofactor and correlates with poor breast cancer outcomes [15]. The protein encoded by this gene contains two AAA domains and a bromodomain. AAA family proteins often perform chaperone-like functions that assist in the assembly, operation or disassembly of protein complexes. DERL1 encodes a member of the derlin family of proteins and is thought to participate in an endoplasmic reticulum (ER)-associated degradation response and retrotranslocate misfolded/unfolded proteins into the cytosol for proteosomal degradation. Data in breast cancer cells show that DERL1 expression is increased by ER stress while DERL1 knockdown resulted in decreased development of cancer cells [16]. The NEK2A gene on 1q32 was another frequently amplified and over-expressed gene in the samples we analyzed. The gene encodes a protein serine/ threonine kinase that is involved in mitotic regulation. It has recently been described to contribute to the growth potential of DCIS and IDC and expression correlated to higher histological grade and lymph node metastasis [17].
Several samples had co-deletion and downregulation of CRYAB and HSPB2 (both on 11q23) due to deletion and/or hypermethylation. CRYAB (B crystallin) and HSPB2 are two members of the multi-gene small heat shock proteins (sHSPs) family that are typically coexpressed in the mammalian heart, but the biological roles remain poorly defined [18,19]. CRYAB has been implicated in stress-inducible translocation, antiapoptosis, remodeling of the cytoskeleton, cardioprotection and inheritable cardiomyopathy in humans [19]. It has been reported that 11q22-23 is a frequent target for deletion during the development of many solid tumor types, including breast, ovarian, cervix, stomach, bladder carcinomas and melanoma [20], suggesting tumor suppressor functions in solid tumors including breast brain metastases. Integrated copy number and gene expression analysis also reveal BRAF, AKT1, and IGF1R amplifications and deletion/downregulation of ATM, all of which belong to pathways that lend themselves to therapeutic targeting.
Differential expression analysis reveals significant ontologic profiles associated with G2-M checkpoint and proliferation. A central player in the G2-M cascade is FOXM1, which was overexpressed in a large percentage of breast brain metastases. FOXM1 is a transcriptional activator involved in proliferation, cell-cycle control, and mitosis, through the regulation of many genes involved in the mitotic checkpoint, such as AURKA, AURKB, PLK1, and CENPF. The FOXM1 gene was also recently highlighted as significantly deregulated in serous ovarian tumors and metastatic triple-negative breast cancer [21]. Our DNA methylation analysis demonstrates an overall increase in methylation compared to non-neoplastic tissue. This is interesting since much more of the cancer genome is generally subject to lower methylation levels rather than higher levels of methylation [22]. Nevertheless, this is consistent with our findings demonstrating upregulation due to amplification and/or hypomethylation of DNMT3B and MAT1A. We did however demonstrate lower overall methylation in basal-like tumors, a finding also consistent with basal-like primary breast cancer (TCGA) [23].
Functional annotation of our epigenetically-regulated genes demonstrates a strong relationship to inflammatory and immunological responses and disorders. We identified a propensity of genes related to both tumor and immune cell migration and adhesion to be epigenetically silenced in a high percentage of samples. This phenomenon could be the result of immune cell infiltration into the brain and/or could be also be explained by the fact that progression of metastatic cells from the blood stream into the perivascular space and then to brain parenchyma share similar mechanisms as those employed by cells of the systemic immune system [24]. The latter scenario can be explained by considering that once cells have colonized the brain, migratory-promoting genes of the tumor cells are repressed, as cells have likely reached their 'final destination'. This would seem to be accompanied by activation of proliferative genes. However, cancer progression does involve the activation of stromal cells including pericytes, fibroblasts and leukocytes [24]. Extravasation of metastatic cells may cause damage to components of the BBB, which may facilitate entry of systemic immune cells in the perivascular space and are known to have both tumor preventing and promoting roles [24].
Over the last decade, there has been a marked improvement in the understanding of the molecular profile of breast cancer, which has suggested that breast cancer may behave as a multiplicity of diseases [25,26,27]. Gene expression studies using DNA microarrays have identified at least four distinct subtypes of breast cancer, including Luminal A, Luminal B, HER2+/ER2, and the basal-like subtype [25,26,27]. Recent work has shown that compared to the Luminal subtypes, Her2+/ER2 and basal-like subtypes have a greater predilection for seeding the brain, a much shorter latency period for doing so and worst overall survival [5,28,29]. Applying the PAM50 classifier to our breast cancer brain metastasis series identified a relatively high number of Luminal B tumors compared to Her2+/ ER2 and basal-like subtypes. Certainly, the availability of samples at the time of accrual could have impacted the frequency of the subtypes in our series. Additionally, since we were not able to obtain the matched primary breast tumors, we were not able to confirm the subtype of the primary tumor to determine if receptor conversion has occurred. However, previous studies have analyzed gene expression signatures known in primary tumors in metastatic tumors [30,31].
In summary, the breast brain metastases in our series generally appeared to retain molecular features consistent mainly with breast cancer and with their respective subtypes. Identifying changes unique to metastatic tumors will ultimately require study of primary/metastatic pairs. However, whether intrinsic subtype switching has occurred or not, it is clear from this study that molecular signatures resembling those known for primary breast tumors and its subtypes can be identified. Ensuring optimal success for developing novel therapies for breast cancer brain metastases therefore requires consideration of the tumor intrinsic subtype status. Due to the BBB and the unique environment of the brain, novel therapeutic approaches for brain metastases warrant intensive research efforts. Our study has highlighted a number of fundamental genetic and epigenetic aberrations occurring in brain metastases from primary breast tumor with strong implications for future targeted therapies that are aimed at alleviating the burden of this clinically unmet need.

Supporting Information
File S1 Supporting figures and tables. Figure S1: Combined Network for Upstream Analysis of FOXM1 and TBX2. The downstream genes connected to FOXM1 and TBX2 were illustrated as a network in IPA. The mRNA expression ratios are listed below the gene nodes. The legend within figure describes the node and edge color keys. Figure S2: Word Cloud Analysis of Cluster Enrichments. We have used word clouds to visually summarize the textual results from the enrichment analysis of each gene cluster as observed in Figure 3. The results were generated using www.wordle.net web resource. The larger the word, the more times it is mentioned in the enrichment categories. Supplementary Tables in File S1. Table S1a. Table S1b. Table S2. Table S3a. Table S3b. Table S4a. Figure S1.