An Integrated Genome-Wide Systems Genetics Screen for Breast Cancer Metastasis Susceptibility Genes

Metastasis remains the primary cause of patient morbidity and mortality in solid tumors and is due to the action of a large number of tumor-autonomous and non-autonomous factors. Here we report the results of a genome-wide integrated strategy to identify novel metastasis susceptibility candidate genes and molecular pathways in breast cancer metastasis. This analysis implicates a number of transcriptional regulators and suggests cell-mediated immunity is an important determinant. Moreover, the analysis identified novel or FDA-approved drugs as potentially useful for anti-metastatic therapy. Further explorations implementing this strategy may therefore provide a variety of information for clinical applications in the control and treatment of advanced neoplastic disease.


Introduction
Metastasis is an extremely complex process that involves not only tumor-autonomous events but also interactions with local microenvironment and distant tissues. Hundreds or thousands of genes are thought to be associated with metastatic progression [1]; however, the proportion of genes that contribute etiologically to tumor progression is unknown. Identification of genes that contribute mechanistically to metastasis will deepen our understanding of tumor progression and potentially provide novel targets for prevention or improving patient outcome. Because metastatic disease is the major cause of patient mortality and morbidity for patients with solid tumors [2], the ability to prevent or successfully intervene would be expected to demonstrate significant clinical benefit. Hence strategies to accelerate metastasis-associated gene discovery are particularly valuable for understanding terminal stages of neoplastic disease.
Our laboratory previously demonstrated that different inbred mice lineages (S1 Fig) possess different propensities for metastatic disease. The highly metastatic FVB/NJ-TgN (MMTV-PyMT) 634mul model [3] (MMTV-PyMT) was bred to 27 inbred strains and significant suppression of metastasis was observed in the progeny of 12 inbred strains. Since all of the tumors were induced by the same transgene, this suggests that polymorphisms in the genetic background can influence metastatic progression [4] (S1 Fig). Interestingly, projecting the metastatic capacity of each strain onto the mouse phylogenetic tree demonstrated closely related strains can have distinct metastatic capacities (Fig 1A). This observation suggests that single nucleotide polymorphisms (SNPs) which distinguish these closely related inbred strains are enriched for factors causally associated with metastatic progression. As a consequence, genome-wide identification of these SNPs and their associated genes provide a rapid method for capturing significantly more metastasis-associated genes than previous single candidate gene genetic strategies (e.g. Ref. [5]). Herein we describe a novel integrative strategy to identify candidate metastasis-associated genes based on an integrated mouse-human genome-wide inherited susceptibility systems genetics screen.

An integrated subtractive approach to identify metastasis-associated genes
To test this strategy, whole genome sequencing of NZB/B1NJ was performed. This strain was selected because of the availability of pre-existing linkage, expression, and validated metastasis susceptibility gene data, enabling internal validation of the screen [4,5,7,8]. Comparison of metastatic capacity demonstrated a significant suppression of metastasis by the NZB/B1NJ genome compared to the original FVB/NJ background of the MMTV-PyMT animal ( Fig 1B). (Raw data: ERP000927; polymorphism data available at http://www.sanger.ac.uk/resources/ mouse/genomes/). After alignment of the reads to the C57BL/6J mouse reference genome (GRCm38), approximately 5 million SNPs that distinguish NZB/B1NJ were identified, including approximately 54,000 unique SNPs.
A number of observations suggest that a subtractive strategy integrated with epigenetic and transcriptional filters could identify SNPs potentially driving metastatic disease. First, analysis of SNPs associated with phenotypes in human GWAS demonstrated that 88% of these SNPs are intronic or intergenic [9] and 71% fall within DNAse hypersensitivity sites (DHS) [10]  Identifying metastasis-associated genes using an integrated subtractive approach. A) Phylogenetic tree showing the inbred strains used in the original metastasis susceptibility screen and their phylogenetic relationship. Black labels indicate strains that were not significantly different from the original FVB/NJ MMTV-PyMT background. Red labels were strains that show significant reductions in pulmonary metastases. Underlined strains indicate strains sequenced by the Welcome Trust. The phylogenetic tree is adapted from Genetics (2010) v185 pgs 1081-1095 [6]. B) Comparison of the metastatic capacity the MMTV-PyMT FVB/NJ, and NZB/B1NJ genotypes. The p value for the comparison between FVB/NJ and NZB/B1NJ metastatic capacity was calculated by ANOVA analysis corrected for multiple testing across the entire set of strains depicted in panel A. C) Schematic representation of the low stringency subtractive strategy to enrich for genes associated with metastatic progression. Grey peaks represent DNAse hypersensitivity sites. which delineate cis-regulatory elements (promoters, silencers, insulators, enhancers, locus control regions) [11]. Together these data suggest that more than half of GWAS-associated SNPs are associated with polymorphic DHS (pDHS). Consequently, DHS represent less than 1% of the mouse genome, and only 10% of pDHS are associated with nearby transcriptional variation [12].
As most inherited variation is thought to result from changes in gene expression rather than structural changes in proteins, we restricted our analysis to those SNPs within the DHS sites [13] in the well-defined mouse mammary adenocarcinoma 3134 cell line [14,15]. This reduced the number of SNPs for consideration from more than 5 million to approximately 120,000. Further, the SNPs shared between NZB/B1NJ and FVB/NJ [16] (the host genome for the MMTV-PyMT transgene) were subtracted from the overall list to enrich for SNPs likely to be causally associated with the reduced metastatic susceptibility observed in the NZB/B1NJ strain. The total number of NZB/B1NJ SNPs that remained after the subtraction was 54,659 (Fig 1C, S1 Table). The SNPs were then mapped to genes using the Genomic Regions Enrichment of Annotations Tool (GREAT) [17] using the default association rules. A total of 7902 genes were associated with pDHS following this analysis.
Next, the genes were screened across two panels of mouse mammary tumors to enrich for those genes with pDHS that alter gene expression. These panels were generated from crossing the MMTV-PyMT transgenic mammary tumor model and the highly genetically diverse mouse genetic mapping panel, the Diversity Outbred (DO) population [18]. This population is a randomly bred population generated from 8 progenitor inbred strains, including wildderived representatives of the major mouse subspecies, Mus musculus, Mus domesticus, and Mus castaneous. As a consequence, the DO population closely resembles natural populations like humans [19] and is therefore likely to capture much of the heritable expression level variation within the Mus genus.
The two populations of tumors were generated from crosses between the MMTV-PyMT and different subsets of animals from the 5 th generation (G5 N = 131) [20] or 7 th generation (G7 N = 159) of the DO population. Approximately 25% of the total DO population was used to breed with MMTV-PyMT for each generation (45 females out of 175 DO breeding cages). As a result, these two populations of DO mice were expected to carry distinct, yet overlapping combinations of SNPs. Consistent with this, the tumor phenotypes were significantly different between the two mouse populations (ex. G5: 74/129 mice with metastatic disease; G7: 132/161 mice with metastatic disease; p = 4.4x10 -8 ; S2 Fig). We therefore chose to screen the populations separately for metastasis-associated genes.
To investigate the role of polymorphism on transcription, total RNA from the mammary tumors was assayed on Affymetrix ST v1.0 chips. The 7902 genes with polymorphic DHS were tested for significant expression variation across each of the DO populations. Genes that exhibited significant variation (p<0.05) within each DO x PyMT population were assumed to have polymorphisms that functionally affected transcription and were included for further analysis. Genes without significant expression variation across the DO populations were assumed to have SNPs that did not affect gene transcription and were excluded from further analysis. This filter reduced the metastasis-associated candidate gene list to 2810 genes for the DO G5 tumors and 3223 genes for the DO G7 tumors. These differentially expressed genes were then subjected to analysis using BRB ArrayTools survival or quantitative trait tools to identify genes from each of the DO data sets associated with metastatic disease. The resulting screen yielded 4 lists of potential metastasis susceptibility genes ranging from 358-1518 members (See Table 1 and S2 Table). Examination of the signatures indicated that only a minority of the genes were common between the DMFS (distant metastasis-free survival) and metastasis-correlated signatures (S3 Fig), consistent with the two DO populations comprising different combinations of metastasis-associated factors.

Metastasis-associated candidate genes predict outcome in human breast cancer
We next evaluated the performance of this strategy on a genome-wide basis. The prognostic ability of each of the gene signatures was tested on human breast cancer datasets. The signature hazard ratio (DMFS signature) or the correlation coefficient (metastasis-correlated signature) from the mouse data provided weight and direction to each gene to require identical functionality between the two species. The weighted gene signatures were then screened for their ability to discriminate outcome in human breast cancer using the Gene expression-based Outcome for Breast cancer Online (GOBO) tool [21]. Of interest, all four signatures were prognostic in estrogen receptor-positive (ER + ) but not estrogen receptor-negative (ER-) tumors (Fig 2A) in the GOBO data set, suggesting that a significant fraction of genes in the signatures contribute in the same manner to metastatic progression in both species. For a second independent validation, the signatures were then tested on the METABRIC gene expression dataset [22]. The weighted metastasis-correlated signatures were also prognostic in the ER+ subset in this patient dataset ( Fig 2B and S4 Fig). The DMFS signatures did not discriminate outcome in this dataset.
Of note, outcome in the GOBO dataset is distant metastasis-free survival (DMFS) and a large fraction of the data is from adjuvant treatment-naïve patients, similar to the treatmentnaïve mouse populations. In contrast, the METABRIC outcome data is overall survival and includes response to neo-adjuvant, adjuvant, and salvage therapy, all of which can alter the clinical course of disease for a significant fraction of patients [23], and may subsequently change the relative weights of genes within the signature. The weight for the G5 DMFS signature gene was therefore recalculated using half of the METABRIC data (discovery set) and validated on the remaining samples (test set) ( Fig 2C). This analysis indicates that the genes identified by the DMFS screen are associated with tumor progression in mouse and humans, although the relative weight of each contributing gene may not be preserved. Results showed both gene signatures can better discriminate outcome in ER+ versus ER-breast tumors, which is consistent with the MMTV-PyMT being a model for human luminal breast cancers [24,25].
To confirm that these gene signatures were more prognostic then thos generated by chance, three different analyses were performed using the METABRIC data set. In the first set of analyses, the weights were held constant and random genes were assigned for the 656 genes of the metastasis-correlated signature and the 383 genes of the recalculated METABRIC DMFS signature. The process was permuted 1000 times and the performance of the randomly assigned genes was compared to the experimentally derived signatures. Both of the experimentally performed signatures were significantly better at discriminating outcome in the METABRIC data than the weighted signatures for randomly assigned genes (for ER+ tumors p = 0.035 for the metastasis-correlated signature, p <0.001 for the DMFS signature). Next, the identity of genes within the metastasis-correlated and DMFS signatures were held constant, but the weights were randomly generated by the pseudo random generator following normal distribution. This process was also permuted 1000 times and compared to the performance of experimentally derived signatures. The experimentally derived signatures also outperformed the signatures with random weights (for ER+ tumors p = 0.048 for the metastasis- correlated signature, p < 0.001 for the DMFS signature). Finally, we generated random weights with normal distribution and randomly selected gene sets of size 656 and 383 respectively and tested the random signatures 1000 times. Once again, the experimentally derived signatures were significantly better than the permuted data (for ER+ tumors p = 0.028 for the metastasiscorrelated signature, p < 0.001 for the DMFS signature). Of note, the experimentally recalculated DMFS signature was also significantly better than the permuted signatures under all three conditions for patients with ER-as well as ER+ tumors (S3 Table). These data suggest that the genes identified by the integrated mouse subtractive strategy were unlikely to have been implicated with metastatic disease by chance.
Validation of the pDHS screen for metastasis QTL candidate genes To evaluate the subtraction strategy, the gene lists were compared with existing data. Linkage analysis previously identified the presence of a metastasis modifier locus on NZB/B1NJ chromosome 9 [7,8], located 16 to 67 megabases distal to the centromere [5] (Fig 3A), containing approximately 1300 genes:~800 annotated genes and~500 predicted genes. Limiting the subtracted gene lists to this interval further reduced the number of candidates to 6 to 25 genes (Table 2). Encouragingly, one of the DMFS genes was Cadm1, a previously identified metastasis susceptibility gene [5]. For further validation, the gene closest to the modifier peak, Pvrl1, was tested. shRNA knockdowns of Pvrl1 in two independent mouse mammary tumor cell lines was performed. Knockdown of Pvrl1 had inconsistent effects on tumor growth and in vitro cell proliferation, but consistently reduced metastatic disease (S5 Fig). Normalization of metastatic burden by tumor weight to account for differences in tumor weight in vivo still resulted in significant differences between control and knockdown cells (Fig 3B), consistent with Pvrl1 being a tumor progression gene.
To evaluate a candidate gene identified only in the G7 population screen, the gene Zbtb16 was selected (Fig 3A). Zbtb16 was previously implicated as a candidate due to its membership in a proliferation-associated gene network that is predictive of metastatic disease [26]. However, orthotopic implantation of cells with Zbtb16 overexpression did not show any differences in metastatic disease ( Fig 3C) suggesting Zbtb16 might have a tumor cell non-autonomous effect. Zbtb16 knockout animals [27] were therefore bred to MMTV-PyMT animals to generate PyMT + /Zbtb16 +/or PyMT + /Zbtb16 +/+ animals. Zbtb16 heterozygotes showed increased incidence of metastasis (p = 0.04; Fig 3D), but did not display a statistically significant change in metastatic burden ( To further validate this method, the genome-wide gene lists were examined to try and identifying metastasis-associated genes outside of the metastasis-associated susceptibility peaks. Analysis discovered a number of genes previously identified as playing important roles in metastatic disease (e.g., Tpx2 [28], Angptl4 [29], Ezr [30], Txnip [31]), indicating this strategy is capable of identifying putative metastasis genes on a genome-wide scale.

Metastasis candidates implicate molecular functions in metastatic etiology
To identify the cellular and molecular pathways contributing to metastatic disease, Ingenuity Pathway Analysis (IPA) was performed. The most significant canonical pathways for genes associated with metastasis in the G5 DO population was antigen presentation for the DMFS signature (p = 4.7x10 -8 ) and mitotic pathways for the metastasis-correlated signature (p = 1.28x10 -4 -1.07x10 -6 ; S3 Table). These results are consistent with our [5] and other laboratories' [32][33][34] findings showing a significant role for immunity or cellular proliferation in breast cancer progression. In contrast, significant pathways associated with metastatic progression in the G7 DO population included IL-17 pathways for the DMFS gene signature and metabolic and rheumatoid arthritis pathways for the metastasis-correlated genes, including diabetes signaling (p = 7.23x10 -4 -2.3x10 -4 ; S4 Table).
IPA analysis was also performed to identify potential upstream regulatory genes that contribute to prognostic signatures and tumor progression. All four gene sets implicated TGFβ1 as an important upstream regulator that suppresses metastatic disease (p = 3.7x10 -3 -9.3x10 -18 ), consistent with the role of TGFβ in early tumor progression [35]. Estrogen receptor-α and -β, and the progesterone receptor were also identified as significant upstream regulators in both analyses (S5 Table). These results further support the utility of the pDHS strategy in identifying metastasis-relevant pathways and implicate additional pathways for investigation.

The pDHS screen implicates novel drugs for anti-metastatic therapy
Ingenuity Pathway Analysis of the gene lists was then carried out to identify potential clinically actionable targets or drugs for metastatic therapy. Tamoxifen, a current standard of care therapeutic for breast cancer treatment, was associated with suppression of metastatic disease in both the DMFS and metastasis-correlated analyses (S1 Table), suggesting this strategy can identify clinically relevant therapeutics. In addition, the cannabinoid receptor 1 gene, CNR1, was found to be associated with metastatic suppression (S5 Table). Consistent with this possibility, an independent study showed that synthetic cannabinoids suppress tumor growth and metastasis in the MMTV-PyMT model [36].
Moreover, the data suggest that 8-bromo-cAMP, a cell permeable cAMP analog, is associated with metastatic suppression (S5 Table). Increased cAMP levels are a downstream consequence of caffeine (a non-specific phosphodiesterase inhibitor [37]) metabolism and interestingly cAMP was previously shown in our laboratory to be a metastasis-suppressing agent [38]. Treatment of a highly metastatic mouse mammary tumor cell line with 8-bromo-cAMP induced a gene signature that was an independent predictor of both DMFS and overall survival in GOBO datasets (S7 Fig), indicating this pathway may be a useful clinical target.
Finally, the IPA analysis not only implicated the diabetes signaling pathway as significantly associated with metastatic disease, but also a number of agents used to treat diabetes that might suppress metastatic disease. To test this prospect within our experimental systems, metastatic mouse mammary tumor cells were implanted into immunocompetent mice. The mammary tumors were permitted to grow until established (10 days post-injection) and then the mice were treated with rosiglitazone until euthanasia (28 days post-injection). As predicted from the integrated analysis, the rosiglitazone-treated group had 25% fewer pulmonary metastases compared to the control group (Fig 4; p = 0.028), consistent with previous reports for the LMM3 mammary tumor cell line [39].

Discussion
We have integrated quantitative trait genetics, transcriptional epigenetics, gene expression, and computational biology tools in a mouse model of metastatic mammary cancer to identify factors that contribute to inherited susceptibility to breast cancer metastasis. This approach significantly enriched for genes associated with metastatic disease as measured by the generation of gene signatures with prognostic value in human patients. Interestingly, our analyses has resulted in signatures that are prognostic in both early relapse and late relapse clinical scenarios (Fig 2A). The majority of gene expression signatures currently available for human breast cancer patients are limited because they are only associated with early relapse, hence this type of integrated mouse-human analysis could complement existing data as it provides additional insights into the later stages of metastatic disease. Importantly, all of the gene signatures described herein are weighted and must contain genes that transcribe in the same direction with equal relative potency in both mouse and human. The ability of the mouse signatures to maintain prognostic ability in human samples supports the assumption that the underlying biology of metastasis is very similar between these two species. If this were not the case, one would not expect a weighted signature to preserve its prognostic ability across species. These results therefore provide additional support for the application of metastatic mouse models to characterize and identify metastasis genes.
Our results also illustrate an important caveat to this hypothesis, described as such: because genetic and molecular analysis was performed using a treatment-naïve mouse model, the results should represent the natural course of the disease. In contrast, patients are commonly treated using surgical resection of the primary tumor combined with subsequent radiation and adjuvant therapies to reduce the risk of recurrence. Since adjuvant therapy reduces the number of patients who develop metastatic disease by 20-30% [40] and salvage therapy prolongs survival for patients who develop metastasis, the inability of the DMFS signature to predict overall survival in treated METABRIC patients without re-optimization is not surprising. Furthermore, the gene signature difference between treatment-naïve and treated patients may also contribute to the conflicting association results for SIPA1, which is associated with treatmentnaïve DMFS samples [41,42], but not overall survival [43]. Hence accounting for the two different phenotypes (treatment-naïve DMFS vs. overall survival) is an important consideration when translating mouse model data into clinical useful human observations. Furthermore, the gene signatures and associated gene lists derived from this analysis should be enriched for metastasis-associated genes. Prognostic gene signatures are usually derived by correlating gene expression with metastatic disease without distinguishing whether the associated gene expression is etiological or a secondary effect. The requirement for candidate genes to have polymorphisms within putative enhancers and promoters should significantly enrich for genes with primary effects. The ability of this strategy to both re-identify known metastasis susceptibility genes (e.g., Cadm1) and validate novel tumor progression genes (e.g., Pvrl1, Zbtb16) is consistent with this hypothesis. The resulting gene lists will therefore provide greater utility for identifying molecular and cellular pathways associated with the metastatic process.
Finally, the integrated strategy described herein allows for the identification of novel agents with anti-metastatic therapy potential. In addition to re-identifying agents that are the current standard of care (tamoxifen) and validating experimental drugs (cannabinoids [36]), this technique also identified FDA approved anti-diabetic drugs as potentially useful therapeutics. Of interest, orthotopic implantation assays confirmed the ability of one of these drugs, rosiglitazone, to reduce metastatic burden [39]. Investigations of other agents used to treat type II diabetes is also supported by the results obtained in this metastatic mammary tumor model. Increased survival in ER+/HER2+ breast cancer patients [44,45] and triple-negative breast cancer [46] treated with the anti-diabetic drug metformin have been reported, as well as investigations regarding its ability to increase survival in other forms of malignancy [47]. These data support the ability of this integrated analysis to identify potentially useful tools for clinical use and further support the complementary use of animal models systems to understand the complex biology of metastatic disease.

Ethics statement
The research described in this study was performed under the Animal Study Protocols LPG-002 and LCBG-004, approved by the NCI Bethesda Animal Use and Care Committee. Animal euthanasia was performed by cervical dislocation after anesthesia by Avertin.
Whole genome sequencing NZB/B1NJ and NZW/LacJ DNA was obtained from The Jackson Laboratory DNA Repository. Library preparation and sequencing on Illumina HiSeq instruments was performed following the manufacturers recommended protocols. Four lanes of 102 base paired end sequence per strain was performed to achieve approximately 40x coverage. Reads were aligned to the reference genome (GRCm38) using BWA version 0.7.5a-r406 [19451168]. All lanes from the same library were then merged into a single BAM file using Picard tools and PCR duplicates were marked using Picard 'MarkDuplicates' [19505943]. Local realignment was carried out using GATK-v3.0 using default parameters to generate the set of intervals for realignment [20644199]. SNP and indel discovery was performed with the Samtools v1.1 (samtools mpileup -t DP,DV,DP4,SP,DPR,INFO/DPR -E -Q 0 -pm3 -F0.25) and calling was performed with Bcftools call v1.1 (bcftools call -mv -f GQ,GP -p 0.99). Raw sequencing data is available under accession ERP000927.
DNAse hypersensitivity site and whole genome sequence datasets DNase hypersensitive site (DHS) data for the mouse 3134 mammary adenocarcinoma cells was downloaded from the Encode project websites.

Computing environment
All computations were performed on NIH helix/biowulf system, documentation of which is available at https://helix.nih.gov. We used R computing environment, perl scripts, bedtools, and ucsc liftOver for most of the analyses.

Identification of polymorphic DHS sites
The workflow consisted of the following. The UCSC liftOver tool was used to convert between mm9 and mm10 as necessary when using bedtools intersectBed to intersect two bed files. 1) The Encode DHS data were filtered for the regions overlapping with polymorphic sites. Since the DHS data were generated in Genome Build mm9, we used UCSC mm9 snp128 data to restrict the DHS sites. 2) The mice NZB/B1NJ genotype data in vcf was filtered to retain the SNPs that overlap with the DHS present in the 3134 cells. 3) We then removed SNPs in the DHS that are present in the mouse FVB/NJ strain.

Genomic Regions Enrichment of Annotations Tool (GREAT) analysis
BED files containing the subtracted NZB/B1NJ specific, DHS associated SNPs were loaded into the GREAT tool website using the default settings for gene assignment. GREAT calculates statistics by associating genomic regions with nearby genes and applying the gene annotations to the regions. Association is a two-step process. First, every gene is assigned a regulatory domain. Then, each genomic region is associated with all genes whose regulatory domain it overlaps. The default association settings included assignment of basal regulatory elements 5 kb upstream and 1 kb downstream of transcriptional start sites (regardless of other nearby genes). In addition, the gene regulatory domain was extended up to 1 megabase in both directions to the nearest gene's basal domain but no more than the maximum extension in one direction [16].

DO ST array analysis
The Diversity Outcross x MMTV-PyMT G5 CEL files have been deposited in the Gene Expression Omnibus (GEO) database (www.ncbi.nlm.nih.gov/geo) under the accession number GSE48565. The G7 CEL files are deposited under the accession number GSE64522. Data was imported into BRB ArrayTools version 4.3.2 (http://brb.nci.nih.gov/BRB-ArrayTools/) and normalized using the median array as a reference. Batch correction was not performed before BRB ArrayTool analysis. Only genes with a Log Intensity Variation of p< 0.05 were considered for further analysis. Distant metastasis free survival analysis of these data was performed using the Find Genes Associated with Survival tool (DMFS genes). Genes were considered associated with DMFS if p< 0.05. Genes correlated with metastatic disease was performed using the Spearman Correlation Test option of the Find Genes Associated with Quantitative Trait tool (metastasis correlated genes). Genes with a p< 0.05 were considered to be significantly associated with metastatic disease.
Gene signature analysis using the GOBO webtool. To provide direction and relative importance to each gene, weights for each gene in the signature was determined from the DO mouse crosses. For the DMFS signature, the hazard ratios associated with DMFS in the DO G5 cross was log2 transformed to provide direction and relative strength for each gene in the signature. For the metastasis-correlated signature, the correlation coefficient for each gene with metastasis number in the DO G7 data was used. The weighted gene was then uploaded to the GOBO tool and analyzed using the default settings for the ER+ and ER-subsets of breast cancers. P values reported are those generated by the GOBO analytical tool. [48] mammary tumor cells were plated in 6 well dishes. 24 hours later the cells were treated with 500 uM 8-Br-cAMP (Sigma) dissolved in 0.1% DMSO for 24 hours, then RNA harvested using Trizol following the manufacturers recommended protocol. Vehicle control or 8-Br-cAMP samples were performed in triplicate. Transcriptome analysis performed by the NCI Laboratory of Molecular Technology using Affymetrix MOE430 v2 chips. The data was analyzed using the Class Comparison tool of BRB ArrayTools. For the gene signature analysis, the differentially expressed genes were filtered for those with greater than 10-fold change in expression, either up-or down-regulated, compared to the vehicle alone. The fold change in gene expression was used to weight the individual genes, and the ability of the weighted gene signature to discriminate breast cancer patient outcome assessed using the webtool GOBO (http://co.bmc.lu.se/gobo/).

Cell lines
The 6DT1 and Mvt1 cell lines [48] were obtained from the laboratory of Robert Dickson, George Washington University. Microsatellite genotyping validated that these cell lines originated from an FVB/NJ animal. Both cell lines have been demonstrated to be mycoplasma-free.

Pvrl1 knockdown experiment
shRNA lentiviral vectors were purchased from Sigma-Aldrich (cat. # SHCLNG-NM_021424). Stable Pvrl1 knockdown cell lines were generated by lentiviral transduction into Mvt1 cells and knockdown validated by qRT-PCR. 8x10 5 Mvt1 or 10 5 6DT1 cells were inoculated into the four mammary fat pad of 6-8 week old FVB/NJ female mice, 10 animals per group. Animals were euthanized at 5 weeks (experiment 1) or 4 weeks (experiment 2) after implantation. Oneway Anova with Dunnett's correction for multiple testing was performed for each experiment using GraphPad Prism. The results of the replicate experiments were then combined using Fisher's combined probability test. All procedures were performed as approved by the NCI--Bethesda Animal Care and Use Committee.

Zbtb16 validation experiment
Epitope-tagged Zbtb16 over-expression was performed by lentiviral transduction into the cell line Mvt1. Orthotopic implantation was carried out as described above and analyzed using the Mann-Whitney test in GraphPad Prism. Autochthonous tumor assays were performed by breeding MMTV-PyMT male animals to Zbtb16 +/female animals to generate PyMT + / Zbtb16 +/female animals. 20 PyMT + /Zbtb16 +/animals was selected for analysis based on previous experience which suggested this as an appropriate group size to achieve statistical significance. Statistical differences were assessed using the Mann-Whitney test in GraphPad Prism. All procedures were performed as approved by the NCI-Bethesda Animal Care and Use Committee.
Effect of Rosiglitazone treatment on pulmonary metastases 100,000 cells/mouse of the mammary tumor cell line 6DT1 were orthotopically implanted into the fourth mammary fat pad of female FVB/NJ mice (6-8 weeks old). 10 days post-implantation all of the animals were combined into a single cage then randomly assigned to treatment or control group by alternating assignment to new cages. Roziglitazone (100μM) or vehicle (DMSO, 0.17%) was added to drinking water 10 days post tumor implantation and available to mice ad libitum. Drinking water with Roziglitazone or DMSO was refreshed every week. Tumor growth was monitored and animals were euthanized 28 days post implantation. Tumors and lungs were evaluated for weight and surface metastases respectively. The experiment was performed twice and the results of the Mann-Whitney test combined using Fisher combined probability test. All procedures were performed as approved by the NCI-Bethesda Animal Care and Use Committee.

Data access
NZB/B1NJ sequence polymorphism data available data is available at http://www.sanger.ac.uk/ resources/mouse/genomes/ Raw data is available at http://sra.dnanexus.com/studies/ ERP000927. The Diversity Outcross x MMTV-PyMT G5 CEL files have been deposited in the Gene Expression Omnibus (GEO) database (www.ncbi.nlm.nih.gov/geo) under the accession number GSE48565. The G7 CEL files are deposited under the accession number GSE64522.