Transposon mutagenesis identifies cooperating genetic drivers during keratinocyte transformation and cutaneous squamous cell carcinoma progression

The systematic identification of genetic events driving cellular transformation and tumor progression in the absence of a highly recurrent oncogenic driver mutation is a challenge in cutaneous oncology. In cutaneous squamous cell carcinoma (cuSCC), the high UV-induced mutational burden poses a hurdle to achieve a complete molecular landscape of this disease. Here, we utilized the Sleeping Beauty transposon mutagenesis system to statistically define drivers of keratinocyte transformation and cuSCC progression in vivo in the absence of UV-IR, and identified both known tumor suppressor genes and novel oncogenic drivers of cuSCC. Functional analysis confirms an oncogenic role for the ZMIZ genes, and tumor suppressive roles for KMT2C, CREBBP and NCOA2, in the initiation or progression of human cuSCC. Taken together, our in vivo screen demonstrates an extremely heterogeneous genetic landscape of cuSCC initiation and progression, which can be harnessed to better understand skin oncogenic etiology and prioritize therapeutic candidates.


Introduction
Cutaneous squamous cell carcinoma (cuSCC) is the second most common cancer in man, with approximately one million cases diagnosed annually in the United States. Although the majority of cuSCC are considered a low-risk neoplasm, up to 5% of high-risk cuSCCs are locally or distantly invasive and carry a poor prognosis due to a lack of biomarkers, therapeutic targets, or FDA-approved molecularly targeted therapies. This represents a substantial unmet need for approximately 50,000 patients per year with high-risk cuSCC, and an opportunity to identify new therapeutic modalities that could improve disease outcomes. All non-viral associated skin cancers are thought to require multiple cooperating mutations that deregulate distinct signaling pathways to initiate and progress the multi-step transformation of normal cells into a clinically significant neoplasm. Indeed, identifying cooperating mutations that drive malignant transformation is a prerequisite for developing better combinatorial therapies for managing and treating skin cancers. Most skin cancers, including cuSCC [1,2], have the highest mutation rates among human cancers due to ultraviolet irradiation (UV-IR) induced damage from chronic, intermittent sun exposure. Thus, using human cancer sequencing data alone, with some of the highest mutational burdens of any cancer, poses challenges to identify cooperating, low-penetrant mutations that lead to cancer progression. This presents a need to develop in vivo model systems to help identify and prioritize novel cooperating candidate cancer drivers for keratinocyte transformation and subsequent progression to invasive cuSCC.
Sleeping Beauty (SB) insertional mutagenesis [3] is a powerful tool used to perform genome-wide forward genetic screens in laboratory mice for cancer gene discovery [4][5][6][7][8][9][10][11][12][13][14] in animal models of both hematopoietic and solid tumors [12,15]. The SB system employs a DNA transposon that is mobilized throughout the genome by a conditional Cre-induced Datasets Generated: S1-S20 Tables provided in the Supplementary data package submitted with this manuscript. S1 Text, S1-S12 Figs, S1-S20 Tables and S1-S5 Data for this Article are published in the Dryad Digital Repository: Supplementary Data for manuscript "Transposon mutagenesis identifies cooperating genetic drivers during keratinocyte transformation and cutaneous squamous cell carcinoma progression", Mann MB, Aiderus A, Newberg JY, transposase expressed in trans that binds the inverted repeats at the end of the transposon, initiates DNA double-strand breaks and reintegrates the transposon randomly at TA-dinucleotides (reviewed in [5]). The transposon itself contains a minimal internal promoter and splicedonor (SD) and bidirectional splice-acceptor (SA) sequences, such that the SB transposons can either activate proto-oncogenes or inactivate tumor suppressor genes, and identify early cancer progression drivers that cooperate to initiate tumors [11,14] and potentially drive metastasis [16,17]. Importantly, SB insertions induce changes in gene expression, thus providing epigenetic information not easily obtained from carcinogenesis mouse models using chemical [18] or chronic UV irradiation [19,20] or from limited patient samples. We demonstrate that SB mobilization of a low-copy T2/Onc3 transposon allele is sufficient to induce and progress a variety of cancers in vivo. Here, we report our efforts for cancer gene discovery in skin tumors. Using high-throughput sequencing approaches [9,11] and our SB Driver Analysis [21] statistical framework, we profiled genome-wide SB mutations from cuKA and cuSCC and defined recurrently mutated, statistically significant candidate cancer drivers (CCDs) from bulk cuSCC tumors and from normal keratinocytes and early stage tumors, identifying both known tumor suppressor genes and novel oncogenic drivers. We further prioritized oncogene and tumor suppressor candidates and provide in vitro and in vivo functional evidence for the roles of these genes in the initiation and progression of cuSCC. Taken together, our efforts provide a systematic evolutionary landscape of cuSCC genesis, and highlights key pathways promoting disease that are potential therapeutic targets.
The resulting progeny, referred to as SB|Onc3 mice, exhibited whole-body mutagenesis and succumbed to a wide variety of solid and hematopoietic tumor types with variable penetrance (S1 Table). The presence of a Trp53 mutant allele accelerated tumor progression and significantly reduced tumor-free survival relative to Trp53 wild-type littermate controls (P<0.0001, Fig 1A). The SB|Onc3 mice with wildtype Trp53 exhibited a disease penetrance and latency similar to T2/Onc3 cohorts produced with constitutively active SBase [7]. SB|Trp53 KO/+ mice were the earliest to reach endpoint, but had the lowest overall tumor burden, presumably because these mice had the highest proportion of hematopoietic disease. SB|Trp53 R172H/+ mice reached endpoint earlier than SB|Trp53 +/+ mice and had the widest spectrum of solid tumors. The most prevalent of the more than 20 solid tumor types observed were early-stage cutaneous keratoacanthoma (cuKA) (Fig 1B) and well-differentiated cutaneous squamous cell carcinoma (cuSCC) (Fig 1C and 1F). Importantly, these cuSCC/cuKAs were absent in mice with mutant Trp53 lacking SB, strongly suggesting SB mutagenesis drives initiation of cellular transformation and subsequent cancer progression of cuSCC in vivo.

SBCapSeq defines recurrent transposon events in normal, premalignant, and tumor skin cell genomes
To investigate the genetic events underlying the SB-driven keratinocyte transformation and cuSCC progression in wildtype and Trp53 mutant mice, we sequenced SB insertions sites from 60 histologically confirmed cuSCCs, 11 cuKA and 32 normal skin genomes (S2 Table) using an enhanced version of SB Capture Sequencing (SBCapSeq) from our previously published workflow [9] that is optimized for liquid-phase capture of transposons from both normal and tumor specimen genomes. Using the SBCapSeq bioinformatics workflow [9] and gene-guided SB Driver Analysis statistical framework [21], we identified 11,113,694 reads at 59,337 nonredundant transposon insertion sites from cuSCCs which were used to statistically define 1,333 candidate cancer driver genes (S3 Table). Nearly all CCDs were identified across all cohorts, suggesting that their role in driving cuSCC is independent of Trp53 mutation. Stringent filtering of the SB insertions to include only events with the highest sequencing read depths defines trunk driver alterations predicted to be under positive-selection within the clonally expanding tumors, where they participate in either initiation or early progression of cellular transformation [9,11]. To identify CCDs that drive keratinocyte transformation, we selected the SB insertion sites with the highest number of sequencing reads to statistically define trunk drivers as previously described [21,25], reasoning that tumor initiating insertions would show the highest frequency in tumor cells (see Methods). We identified 86 trunk drivers (S4 Table). The top two most significant trunk drivers in cuSCC genomes were the paralogous transcriptional co-factors Zmiz1 and Zmiz2, which are collectively and mutually exclusively mutated in 71% of end-stage SB|cuSCC genomes (S2 Fig). Using the same methodology, we next analyzed cuKA specimens harvested either from mice with cuSCC or from an independent age-and genotype-matched cohort and identified 2,085,280 reads at 6,727 non-redundant transposon insertion sites, defining 192 candidate cancer driver genes (S5 Table) and 10 trunk drivers (S6 Table) transposon insertion sites from normal skin specimens with statistically defined enrichment in 547 genes (S7 Table). No trunk drivers were identified. The histologically normal skin genomes from mice with SB mobilization had a mean of 1,320 insertion events (range 1--3,857). Notably, fewer than 1% of the insertions had read depths >300 reads per site (range 1--2,297), which is one order of magnitude less than 23.7% of cuKA genomes or 8.5% of cuSCC genomes with detected insertion events at >300 reads per specimen (cuKA range 1-19,798 and cuSCC range 809). An oncoprint illustrates the read-depth and insertion patterns that define the trunk drivers in cuSCC and cuKA (Fig 2), comparing the incidence of selected insertions in these same drivers in normal skin. Together, these data strongly suggest that Landscape of trunk drivers mutated during SB-induced keratinocyte transformation and cuSCC progression using SBCapSeq. SB Trunk Driver Analysis, with read depth cutoff of 300, was performed using SBCapSeq insertion data from the SB|cuSCC and SB|cuKA tumor cohorts (the normal skin cohort did not identify any significant trunk drivers). Drivers with significant family-wise error rate (FWER) adjusted P-values were merged into a single list and plotted from each of the cuSCC, cuKA and normal skin specimens by occurrence within the cuSCC cohort. Vertical colored bars represent genomes (columns) with SB insertion events that occur within the same (sense, blue) or opposite (anti-sense, red) DNA strand relative to transcription of a driver gene (row). High read depth sites (>299 reads supporting an SB insertion event, blue or red) are distinguished from low read depth sites (<300 reads supporting an SB insertion event, light blue or pink) to denote clonal and sub-clonal SB insertion events, respectively (S1, S2, and S3 Data). Driver gene classifications (indeterminate, activating or Inactivating) are shown as a bar at the end of a gene on a cohort specific manner where calculation was possible. NIB, normalized insertion burden from highest (black) to lowest (light gray). clonal selection of recurrent, high-read depth SB insertion sites is featured prominently within the cuSCC and cuKA genomes and almost entirely missing within the normal skin genomes. Further evidence for clonal selection associated with disease progression was provided by technical replication of SBCapSeq libraries, demonstrating excellent biological reproducibility (sequencing of two independent library preparations from single isolation of genomic DNA) of cuSCC genomes (S3A and S3B Fig) and no reproducibility in normal skin genomes (S3C and  S3D Fig), illustrating the absence of clonal expansion in the normal samples despite the high insertion burden. Importantly, replicate sequencing of a single library preparation from two normal skin samples demonstrated excellent technical reproducibility (S3E and S3F Fig), further supporting the robustness of our methodology to detect clonal selection. These observations are consistent with the large numbers of mutations observed during chronic UV-IR exposure in human eyelid [26] and SKH hairless mouse [19,20] skin and, together with our SB data, indicates that keratinocytes within phenotypically normal skin can tolerate high background mutagenic insults that may act as a mutational reservoir prior to clonal selection of cooperating driver mutations that permit frank outgrown of cuSCC in vivo. This is likely the reason that no trunk drivers could be identified from the normal skin specimen genomes. The quantitative, stage-dependent insertional mutation burdens derived from SBCapSeq in cutaneous specimens reveal incredible levels of tumor heterogeneity in SB-driven keratinocyte transformation and progression to cuSCC.

cuSCC demonstrate clonal and sub-clonal intratumor heterogeneity
We identified four skin masses with histologically continuous but distinct regions of cuKA and cuSCC, suggesting for the first time that cuKA may progress directly to invasive cuSCC in vivo. Examples of gross and histological sections are shown in Fig 3A, 3B, 3C, and 3D. We performed SBCapSeq on representative, histologically distinct regions to investigate intratumor heterogeneity and to gain insight into the drivers likely to mediate cuKA to cuSCC progression. Regional sequencing gave an overall lower sequencing depth than our bulk tumor sequencing. Therefore, we adjusted the minimum read-depth threshold to 200 reads for defining clonal insertions. For several loci, we identified insertions at the same nucleotide address across tumor regions from both cuKA and cuSCC in the same mouse, indicating clonal origins of these lesions (S8 Table). Hierarchical two-dimensional clustering of recurrent genic SBCap-Seq insertion data from four skin masses containing distinct cuKA and cuSCC regions is shown in Fig 3E. Invariably, clonal insertions into Zmiz1 were identified across each section of all 4 masses at three different TA-dinucleotide addresses, suggesting that Zmiz1 was under positive selection within the nascent transformed cells that gave rise to the distinct cuKA and cuSCC features. We also found evidence for sub-clonal insertion events identified at the same nucleotide address in some but not all regions within the same tumor, demonstrating the evolution of the tumor. Some genes with sub-clonal insertion events with read depths greater than 200 (see Methods) were defined as drivers in the bulk population, including Tcf4 and Pak2, while other genes, such as Egr2 and Rab3gap1, were not (Figs 3F and S4). These data provide evidence for a common keratinocyte progenitor that acquires SB insertions in genes that promote cuKA and are maintained in cuSCC. This analysis also highlights the intra-and intertumor heterogeneity of insertions in SB cuSCC tumors and importantly demonstrates that SB insertion profiles can be used to trace the evolution of cuKA to cuSCC using both the insertion nucleotide address and read depth to define clonal relationships.

Comparative oncogenomic meta-analysis
One hundred-seven of the SB candidate Trunk driver genes are direct orthologs of human cancer genes found in the Cancer Gene Census (GRCh38 COSMICv86) [27], a growing catalogue of mutations causally implicated in cancer, which is an enrichment greater than expected by chance (χ 2 = 60.75 with Yates correction, P<0.0001; S1 Text). Another 64 SB candidate driver genes, including ZMIZ1, ZMIZ2, and KMT2C, have human orthologs with recurrent nonsilent mutations identified by exome sequencing [1,2,28] of 68 cuSCC genomes (χ 2 = 23.70 with Yates correction, P<0.0001; S1 Text). To investigate whether the human homologs of the mouse SB events are gained or lost in human tumors, we interrogated the limited data for CNAs in cuSCC tumors published for 40 tumors [29] which identified new and confirmed known recurrent chromosomal gains and losses in cuSCC. We found that KDM4C lies within the region of Chr9p frequently lost, while NOTCH1 (Chr9q), RASA1 (Chr5q) and KMT2C (Chr7q) lie within gained chromosomal regions. KMT2C is a known tumor suppressor in Multi-region SBCapSeq analysis of dual histology keratinocyte-derived skin masses containing distinct cuKA and cuSCC histological regions. Two individual mice displayed lesions with both cuKA and cuSCC histologies suggestive of clonal evolution of cuSCC differentiation within cuKA masses. Aperio slide scans of H&E stained FFPE specimens and the adjacent flash frozen tissue for specimen 1 (A-B) and specimen 2 (C-D) prior to sectioning for genomic DNA isolation. Yellow dotted lines demarcate the distinct lesions within each section that were sampled for sequencing SB insertions. Hierarchical two-dimensional clustering (Hamming distance with the Ward method) of recurrent genic SBCapSeq insertion data for specimen 1 (E) and specimen 2 (F) demonstrates trunk (common to all samples) and sub-clonal, lesion-specific SB insertions (S4 Data). Genomic DNA was isolated from adjacent serial sections and included a "bulk" reference consisting of a cross-section of entire mass, and three or four regions dissected from histologically distinct regions.
https://doi.org/10.1371/journal.pgen.1009094.g003 several solid tumors. Interestingly, copy-number analysis of cuSCC lymph node metastases [30] showed that NOTCH2 (Chr1p12) and CREBBP (Chr16p13) lie within chromosomal regions lost, while the chromosomal region containing TGFBR2 (Chr3q22) can be either lost or gained, and genomic regions containing NOTCH1 (Chr9q), TCF7L2 (Chr10q25) and AKT3 (Chr1q) are gained. 13 genes, including PTEN and TGFBR2, two well-documented tumor suppressor genes, contain mutations in cuSCC lymph node metastases, while PTEN is also lost in cuSCC [29]. It is worth pointing out that missense and nonsense mutations in NOTCH1 are frequent, early events in cuSCC [2], suggesting that amplified genomic regions may contain genes that are deregulated through other mechanisms. Further, NOTCH2 is frequently mutated in cuSCC. From these data, we conclude that, while the human homologs of the SB events can be deleted in cuSCC, it appears that missense and nonsense mutations are much more frequent events. We conclude from these findings that SB identifies the same genes mutated by the diversity of mutational processes operating within end-stage human cuSCC genomes. Our insertion data implies that some of these mutations may be inactivating in human SCCs and warrant further investigation.
To gain insight into the biological functions of the candidate discovery drivers, we used a curated subset of the Enrichr analysis platform [31,32] gene-set libraries for pathways and gene ontologies (see Methods) to perform biological pathway and process enrichment analysis from the cutaneous specimen cohorts. For this analysis, we first derived an integrated cuSCC driver list from the 60 SB|Onc3 cuSCCs sequenced by SBCapSeq with drivers defined from an additional 23 SB|Onc3 cuSCCs sequenced using Roche 454 and analyzed with our SBDriver Analysis pipeline (S1 Text). We also derived a composite list of cuKA drivers from two different cuKA datasets, one from cuKAs derived from SB|Onc3 mice sequenced by SBCapSeq and the second from cuKAs isolated from a previously published mouse model with K5-Cre; Pten CKO/+ and SB|Onc2 or SB|Onc3 mice [13,33] sequenced using Roche 454. Using these integrated driver lists, we discovered statistically significant enrichment of recurrent SB candidate drivers in many signaling pathways and biological processes previously associated with human SCC, including EGFR, MAPK, NOTCH, and WNT-TGFβ (S5 Fig and S9 Table). Notably, we also observed that all SB|cuSCC tumor genomes have at least one defined driver involved in chromatin modification, consisting of chromatin and histone modification enzymes, as well as Hippo pathway genes. Further, the SB|cuSCC Trunk drivers encode proteins with significantly more known protein-protein interactions (PPI) than expected by chance (P = 8.55 × 10 −6 , STRING enrichment analysis; number of nodes: 144, number of edges: 124, expected number of edges: 82) as do the SB|cuKA trunk drivers (P = 6.93 × 10 −5 , STRING enrichment analysis; number of nodes: 37, number of edges: 14, expected number of edges: 4; S6 Fig). PPIs among drivers identified in SB-driven cuKAs suggests that perturbation of NOTCH signaling may be an initiating or acquired event in cuKA.
Several of the top 30 most frequent drivers in SB cuSCC are key players in multiple pathways, suggesting that SB perturbation of multi-functional genes drives cuSCC. An oncoprint aligning the conserved drivers between SB cuSCC, SB cuKA and mutated human homologs in cuSCC that map to enriched pathways is shown in Fig 4 Comparative genomic integration of the SB defined cuSCC and cuKA drivers with human cuSCC genomes reveals that while differences in the incidence of individual genes across cohorts can vary substantially, the biological pathways are conserved (S5 Fig and S9 Table). These differences may be due in part to the types of alterations produced by SB transposition (gene activation/inactivation) versus UV-IR (silent/non-silent mutations) or, alternatively, may reflect the complexity of the systems of genetic networks [34] that exists between driver genes and keratinocyte transformation phenotypes across species.
We also sought to investigate the oncogenic potential of ZMIZ1 and ZMIZ2 in SCCs using gene expression analysis from patient samples. Because there are no publicly available expression sets for cuSCC, we utilized human TCGA head and neck SCC (hnSCC) datasets, for which gene expression (RNA-seq) and patient survival outcomes are available, as a proxy for SCCs. We performed a metagene expression analysis of ZMIZ1, identifying a 23-gene signature that links high expression with poor outcomes in human hnSCC patients with a correlation threshold of 0.65 (Cox Proportional Hazards Regression, P = 0.0195; Log-rank Test, P = 0.028 at 50% quintile; S7 Fig).
From our SB-driven cuSCC model, we observed that all Zmiz-mutated cuSCC genomes contained one or more inactivating SB mutations in chromatin-remodeling genes (Fig 4), suggesting this class of mutated tumor suppressor genes may contribute significantly in driving cuSCC progression. Four of the top-ranked chromatin remodeling trunk driver orthologs were also found to be significantly mutated in >15% of human cuSCC genomes [1,2]. Comparative oncogenomic landscape of candidate driver mutations and common altered pathways in mouse and human cuSCC genomes. Integrated Oncoprints sorted by biological pathway or process discovery significance within the SB|cuSCC dataset and displaying the mutation burden across SB-induced mouse cuSCC (n = 84) or cuKA (n = 62) and human UV-IR-induced cuSCC genomes (S1, S2, and S3 Data and S1 Text). https://doi.org/10.1371/journal.pgen.1009094.g004 Importantly, as observed in mouse tumors (Fig 2), mutations in the orthologs KMT2C (MLL3), ARID1B (BAF250B), NCOA2 (KAT13C), and CREBBP (CBP/KAT3A) were also found in the same cuSCC genomes harboring ZMIZ mutations (Fig 4), suggesting they might cooperate during human keratinocyte transformation or tumor progression. Previously, KMT2C (MLL3) mutations were found to be associated with poor outcomes in aggressive cuSCC [1], suggesting mutations in this gene may be a biomarker for advanced stage disease.

Gene expression alterations associated with Zmiz1 ΔN185 in cuSCC genomes
SB activating trunk insertions in Zmiz1 or Zmiz2 are unique to SB-induced cuSCC tumors and are predicted to activate gene expression by driving expression of specific N-terminal truncated forms of Zmiz1 ΔN185 [7] and Zmiz2 ΔN184 , a closely related paralog (S2A and S2B  Fig). The likelihood of achieving unique, sense-strand oriented SB insertion events into two related genes at the same protein coding region on two different chromosomes in dozens of tumors from independent mice by chance is incredibly small (P<0.0001), suggesting both trunk driver ZMIZ oncoproteins contribute directly to keratinocyte initiation and tumor progression in vivo. The ZMIZ1 ΔN185 and ZMIZ2 ΔN184 onco-proteins share 62% sequence identity and retain both the zinc finger PIAS, proline-rich transactivation and AR binding domains. Importantly, both oncoproteins lack the N-terminus in the full-length protein, which is thought to inhibit the intrinsic transcriptional activity of a C-terminal proline-rich transactivation domain [42].
To analyze the expression of SB insertion-driven transcripts for Zmiz1, Zmiz2 and other drivers (SBfusion reads [9]) in cuSCC genomes identified by high-insertion read depth from our genomic sequencing results, we sequenced ribo-depleted, whole-transcriptome RNA-seq (wtRNA-seq) libraries created from 7 cuSCC tumors with activating SB insertions into either Zmiz1 (n = 6) or Zmiz2 (n = 1) (S10 Table). The wtRNA-seq reads contained spliced chimeric transposon splice-donor or splice-acceptor sequences fused to adjacent known exons in the mouse genome. In total, we identified 2,906 SBfusion wtRNA-seq reads from 889 unique RefSeq genes (S11 Table), including 154 progression and 27 trunk cuSCC drivers, respectively (S12 and S13 Tables), which is substantially more than expected by chance (Chi-square test with Yates correction, χ 2 = 159, P<0.0001). Thirty-three genes with SBfusion events were supported by �5 SBfusion reads, including our top trunk drivers Zmiz1 and Zmiz2 and several keratins (Fig 5A). We observed a positive correlation between clonal activating insertions with high read-depth SB insertions (Fig 5B) and high expression of SBfusion reads in Zmiz1 ( Fig  5C). Importantly, in each tumor, SBfusion reads containing either SB-Zmiz1 or SB-Zmiz2 spliced exons were represented in the top 2% of SBfusion transcripts and were supported by multiple independent SBfusion reads (S14 and S15 Tables). Further, Zmiz1 or Zmiz2 gene expression was significantly increased within each of the 7 cuSCC genomes (≳100 and up to 300 times normal expression levels) (Figs 5Dand S9A, S9B, S9C, S9D, S9E, and S9F). This strongly suggests that the SB insertion events driving either Zmiz1 or Zmiz2 have been clonally selected to increase oncogenic expression of the N-terminal truncated forms during keratinocyte transformation and cuSCC progression in vivo. We interrogated the expression of SBfusion reads from 8 additional genes in this tumor subset with high read depth insertion sites (Fig 5C). Epha1 and Wnt7b had activating SB insertions with high SBCapSeq read depths and wtRNA-seq SBfusion reads resulting in significantly higher gene expression (≳100 times) compared to basal expression levels observed in tumors without insertions or SBfusion reads (Figs 5D, S8E, and S8F). Since SB insertions in Epha1 and Wnt7b were not recurrent at the population level, these genes were not identified as statistically significant drivers in cuSCC. However, the integrated genomic and transcriptomic analysis strongly suggests that each of these genes may have a 'private' role in driving tumor progression in the genomes in which they occur with elevated expression levels. Both genes encode proteins that are linked to cancer hallmarks-Epha1 is an ephrin receptor subfamily of the protein-tyrosine kinase family [45] and Wnt7b is a WNT family member [46]. For the 6 predicted inactivated genes, Nf1, Adam10, Oxsr1, Map2k4, Cul5, and Cep350 we observed reduced gene expression that correlated with high SBCapSeq read depth of inactivating SB insertions and truncated SBfusion reads (Figs 5E, and S9A, S9B, S9C, S9D, S9E, and S9F). Adam10, Cul5, and Nf1 are Trunk Drivers and Map2k4 is a Progression Driver in cuSCC cohorts. Oxsr1 and Cep350 were private selected events in individual tumors that may operate in cuSCC progression within the tumors that harbor clonal insertion events. Taken together, these data demonstrate that integrated genomic and transcriptome analyses may provide a means to discover significant cooperating drivers in cuSCC genomes from bulk specimens.
Next, to investigate the transcriptional signatures associated with SB-driven Zmiz1 ΔN185 in end-stage cuSCC, we selected 13 cuSCCs with activating Zmiz1 insertions and 9 end-stage cuSCCs lacking insertions in Zmiz1 to perform microarray analysis. We confirmed that SB insertions drive increased expression of truncated Zmiz1 by RT-PCR and by normalized microarray data analysis, where cuSCC tumors with Zmiz1 insertions were found to have significantly higher mRNA abundance than tumors without Zmiz1 insertions (Fig 6A and S16 and S17 Tables). We also observed a positive correlation between high-read-depth SB insertions among the 13 tumors harboring trunk Zmiz1 insertions and increased normalized Zmiz1 gene expression (Fig 6B). Differential expression analysis of the microarray data using limma from tumors with and without SB insertions in Zmiz1 identified 432 probe sets from 355 genes significantly differentially expressed genes with FDR adjusted P<0.2 (S18 Table). A heat map of normalized expression plots for the top 25 most differentially expressed genes is shown in Fig 6C. Differential expression plots for the top 8 most differentially expressed genes (P FDR <0.05) is shown in Fig 6D. Mtdh1 encodes metadherin, also known as AEG-1, a downstream target of Ras and c-Myc, and MTHD1 is upregulated in many cancers [47]. Set encodes the SET nuclear proto-oncogene, which interacts with several histone modifying proteins, including CREBBP, a driver identified in our SB cuSCC tumors and KMT2A, a family member of KMT2C, also identified as a driver in our screen. Taken together, our microarray gene expression analysis demonstrates that the ZMIZ1 ΔN185 isoform may be a critical mediator of gene expression changes essential for early keratinocyte transformation and sustained during cuSCC progression to end-stage tumors.

ZMIZ1 ΔN185 expression transforms human cutaneous keratinocytes in vitro
The selection of SB insertions promoting up-regulation of a truncated Zmiz1 transcript suggests that ZMIZ1 ΔN185 may be an early transforming event during cuSCC development. To test this hypothesis, we first generated stable overexpression of either ZMIZ1 or the truncated ZMIZ1 ΔN185 in the spontaneously immortalized human cutaneous keratinocyte HaCaT cell line Individual gene plots may be found in (S14 and S15 Tables). (E) Reduced gene expression in cuSCC masses with (+) high read depth inactivating SB insertion events among 6 candidate tumor suppressor drivers compared with normal gene expression levels in cuSCC tumors without (-) SB insertions. Individual gene plots may be found in (S14 and S15 Tables).
https://doi.org/10.1371/journal.pgen.1009094.g005 [48]. Interestingly, we noted that ZMIZ1 ΔN185 was more highly expressed at the protein level compared to full-length ZMIZ1, despite similar transduction efficiencies (Fig 7A). This observation is in line with published data by Rogers et al. 2013, where they expressed ZMIZ1 ΔN185 with greater stability than full-length ZMIZ1 in various cancer cell lines. Next, we assessed anchorageindependent growth as an in vitro surrogate for the earliest stages of tumor development in HaCaT cells stably transduced with empty vector, full-length ZMIZ1 or ZMIZ1 ΔN185 . Soft agar assays showed that ZMIZ1 ΔN185 expression resulted in significantly increased colony formation compared to empty vector or full-length ZMIZ1 (P = 0.0004, one-factor ANOVA, Figs 7B and 7C). These data suggest that ZMIZ1 ΔN185 has a distinct function from full-length ZMIZ1 and that it promotes transformation in an oncogenic manner. Next, we investigated whether ZMIZ1 functions as an oncogene in established cuSCC. We stably knocked-down total ZMIZ1 and showed a significant decrease in cell proliferation with ZMIZ1 depletion compared to a controlled scrambled shRNA at 96 hours for cuSCC cell lines A431 (Fig 7D and 7E, 2-factor ANOVA overall P = 0.0045; FDR-corrected q-values for multiple testing q<0.0001) and COLO16 (Fig 7F and 7G, 2-factor ANOVA overall P = 0.0005; FDR-corrected q-values for multiple testing q<0.0001). We showed similar results when we knocked down the ZMIZ1 paralog ZMIZ2 in each cell line [49]. Taken together, these data suggest that ZMIZ1 ΔN185 is an oncogene that drives transformation of human cutaneous keratinocytes in vitro, and that ZMIZ1 functions as an oncogene in established cuSCCs. Further experiments are needed to address the direct role of ZMIZ1 ΔN185 in promoting cuSCC initiation.

Knockdown of chromatin modifiers drive human cutaneous keratinocyte transformation
Chromatin remodeling is one of the most frequently inactivated biological processes in SCC, and has been demonstrated to be involved in either tumor initiation or progression in multiple cancer types [50][51][52][53][54][55]. In our SB cuSCC screen, chromatin remodeling was the top biological process that was significantly enriched (S5 Fig and S9 Table). However, it remains unclear how expression alteration of chromatin remodeling genes function in the cuSCC disease trajectory. We prioritized three tumor suppressor genes with high concordance between SB-cuSCC and human cuSCC involved in histone modification (KMT2C and CREBBP) or transcriptional regulation (NCOA2) to study their roles in keratinocyte transformation using immortalized and minimally transformed HaCaT keratinocyte cell line [48]. First, we confirmed expression of each gene within HaCaT cells [49]. Next, we used shRNAs to assess whether loss of these tumor suppressors can promote cellular transformation in vitro as a surrogate for tumor initiation. Forced depletion of CREBBP significantly increased cellular proliferation relative to control (Fig 8A, 1-factor ANOVA, P<0.05) and a similar trend was observed with KMT2C knockdown. No significant difference in proliferation was observed between NCOA2 knockdown and control (Fig 8A). To more stringently investigate transformation, we assessed anchorage-independent growth, a hallmark of cellular transformation, using a soft agar colony formation assay (Fig 8B). Knockdown of CREBBP (Fig 8C) or KMT2C (Fig 8D) expression resulted in larger and significantly more colonies relative to control while no significant difference was observed between NCOA2 knockdown and control (Fig 8E). Taken together, these data suggest that knockdown of CREBBP or KMT2C can transform the immortalized HaCaT cell system in vitro.

Knockdown of chromatin modifiers promote cuSCC tumor growth in vivo
Next, we investigated whether down-regulation of CREBBP, KMT2C or NCOA2 could promote tumor progression in cuSCCs. First, we used shRNAs to stably knockdown these genes in established cuSCC cell lines A431, COLO16 and cuSCC13. We showed down regulation of KMT2C and NCOA2 for A431 (Fig 9A), COLO16 (Fig 9B) and cuSCC13 (Fig 9C) by qPCR relative to cells transduced with a non-targeting control shRNA. Stable knockdown of CREBBP was achieved in A431 and COLO16 cells (S10A and S10B Fig). Next, we performed proliferation assays and demonstrated a significant increase in A431 cell proliferation with down regulation of KMT2C and NCOA2 at 96 hours compared to control (Fig 9D, 2-factor ANOVA overall P = 0.0005; FDR-corrected q-values for multiple testing q<0.0001 for KMT2C; q = 0.0019 for NCOA2). Knockdown of NCOA2 in COLO16 cells trended towards increased proliferation (Fig 9E) but it was not significant. However, proliferation was significantly increased with KMT2C knockdown at 96 hours compared to control (Fig 9F, FDR-corrected q-values for multiple testing q = 0.01). No change in proliferation was observed in A431 or COLO16 cells with CREBBP knockdown (S10C and S10D Fig). This is in contrast to the increased colony formation with CREBBP depletion in HaCaT cells. Finally, proliferation in SCC13 cells was significantly increased with KMT2C or NCOA2 depletion compared to control (Fig 9F, 2-factor ANOVA overall P = 0.008; FDR-corrected q-values for multiple testing q = 0.06 for KMT2C; q<0.0001 for NCOA2). We then determined whether CREBBP, KMT2C and NCOA2 function as cuSCC TSGs in vivo and performed xenograft assays by subcutaneous injection of cells for each of the three cuSCC cell lines into NSG immunodeficient mice. CREBBP depletion in A431 cells did not result in a change in tumor volume (S10E Fig) and this gene was not assayed in vivo using the other two cuSCC cell lines. All three cuSCC cell lines demonstrated enhanced in vivo tumor growth with KMT2C or NCOA2 depletion compared to cells with a non-targeting control shRNA (Fig 9G and 9H and 9I, 1-factor ANOVA, q<0.05). Pictures of cuSCCs harvested at necropsy at defined time points of 25 days for A431 (S11A Fig), COLO16 at 18 days (S11B Fig) and SCC13 at 49 days (S11C Fig) show the individual tumor volumes for the control and targeted knockdown cohorts. Together, these data provide functional evidence that KMT2C and NCOA2, but not CREBBP, are human TSGs that promote in vivo cuSCC progression.

Discussion
In this study, we sought to define the cooperating genetic events required for keratinocyte transformation and progression to frank cuSCC using the SB transposon mutagenesis system in vivo. Using an Actb-Cre transgene to activate systemic expression of the transposon, we observed tumor lesions in multiple organs. The kinetics of tumorigenesis were further accelerated in mice harboring either null (Trp53 KO/+ ) [22] or recurrent point mutant (Trp53 R172H/+ ) [23] alleles. This observation is consistent with previous studies showing that missense mutations in Trp53 promote malignant transformation by abrogating both the tumor suppressive and promoting the oncogenic (gain-of-function and/or dominant negative) functions of Trp53 in vivo [23,[56][57][58][59]. We and others have also documented that in wild type and heterozygous sensitizing mutant mice that present with solid tumors, a much greater proportion of inactivating tumor suppressor SB insertion patterns predominate [21,25]. While the SB|Onc3 mice developed a spectrum of tumor types, including of the hematopoietic lineage, we observed an enrichment for well-differentiated cuSCC.
Despite the high curative rate for cuSCC, there is a significant population of patients who develop advanced disease for which there are no defined therapeutic targets. Due to the high UV-induced mutation burden, combined with the scarcity of clinical samples, it is challenging to define the recurrent and druggable targets within these tumors. To address this need, we isolated early and advanced lesions from a SB mouse model of cuSCC to define recurrent drivers and potential new avenues for therapeutic intervention. From our in-depth sequencing analysis of SB insertions in advanced cuSCC genomes, we identified clonal and sub clonal drivers involved in the progression and maintenance of cuSCC. 64 and 17 drivers have human orthologs with non-silent mutations or copy number loss in human cuSCC genomes, respectively. Intriguingly, the majority of drivers in SB-driven cuSCC lesions were predicted tumor suppressor genes, suggesting that cumulative loss of these genes is a central feature of cuSCC. We showed that a few key TSGs, including KMT2C, are differentially expressed in cuSCC with prognostic implications for patients with hnSCC.
We subsequently focused on understanding the genetic events required for keratinocyte transformation. Our SBCapSeq and SB Driver analysis identified 349 CCDs, regardless of Trp53 status, of which 107 have been implicated in human cancers. Intriguingly, we observed that the SB-driven cuSCC lesions were driven almost exclusively by SB inactivating insertions in tumor suppressor genes, suggesting that cumulative loss of these TSG drivers is a central feature of cuSCC. Indeed, recent analysis from our lab, using a complimentary promoterless SB transposon forward genetic screen, revealed that tumor suppressor drivers alone can lead to keratinocyte initiation and cuSCC progression in vivo [49]. We defined two mutually-exclusive paralogous oncogenic drivers, Zmiz1 and Zmiz2 among the most recurrent drivers and our report is the first to identify Zmiz2 as a significant candidate cancer gene in any SB study and the first to discover mutual exclusivity among activated Zmiz1, Zmiz2, and Mamld1 in early keratinocyte transformation and cuSCC progression. Insertions into Zmiz1 [60][61][62] and Mamld1 [63] have been previously observed in skin tumors induced by transposon insertional mutagenesis. The possible redundancy of ZMIZ oncoproteins in the keratinocyte transformation program, by mutual exclusivity of trunk insertions within cuSCC genomes, strongly suggests that the Zmiz1 ΔN185 and Zmiz2 ΔN184 oncoproteins may have common neomorphic properties that contribute to the hallmarks of skin cancer. The work of Rogers et al. [62] and more recently Mathios et al. [60] on the epigenetic regulation of ZMIZ1 expression in human cancer, together with our finding that ZMIZ1 and ZMIZ2 are mutated with at least one nonsilent alteration in at least one third of published human cuSCC genomes [1,2] suggest that these genes may be an important initiating trunk mutation in human keratinocyte initiation and cuSCC progression. We confirmed this hypothesis, showing transformation of the immortalized keratinocyte HaCaT cells stably transduced with the N-terminally truncated ZMIZ1 ΔN185 isoform and decreased cellular proliferation in established cuSCC lines with ZMIZ1 or ZMIZ2 depletion.
Intriguingly, all SB cuSCC tumors with Zmiz1/2 insertions had inactivating insertions in at least one gene involved in chromatin remodeling, which suggests that changes in epigenetic regulation are necessary to progress a benign lesion to frank cuSCC. Our functional data suggests that CREBBP is involved in the earlier stages of keratinocyte transformation, while decreased NCOA2 expression can drive disease progression. COMPASS complex members Kmt2c and Kdm6a and the switch/sucrose non-fermenting (SWI/SNF) genes Arid1b, Arid1a and Pbrm1 were recurrently altered in our SB-cuSCC tumors. Although the COMPASS and SWI/SNF pathway members have been reported to be mutated in various cancers [64][65][66][67], whether alterations in one or cooperation of both pathways is required for the etiology of the disease remains unclear. We found that knockdown of KMT2C, a key molecule in the COM-PASS complex, in human cuSCC cell lines accelerated in vitro proliferation and in vivo xenograft growth, supporting a role for this gene as a tumor suppressor in advanced disease. Of clinical significance, several recent studies have highlighted potential dependencies amenable to pharmacologic inhibition in cancer cells with aberrations in COMPASS members [67].
Our experimental approach for driving high mutational burdens by SB mutagenesis recapitulates the sporadic, stepwise evolutionary selection of cooperating drivers in non-melanoma skin cancers in humans exposed to solar UV. Despite these divergent mutagens, we observed remarkable overlap of driver genes, pathways, and networks between the mouse and human cuSCCs, suggesting a deep mutual biology between these shared drivers and cutaneous oncogenesis. We hope that by defining the cooperative oncogenic and tumor suppressor networks that operate during keratinocyte transformation and subsequent cuSCC progression, the results of our screen will provide a foundation for exploring new therapeutic strategies.

Ethics statement
Mice were bred and maintained in accordance with approved procedures granted by the respective Institutional Animal Care and Use Committees (IACUC) at the National Cancer Institute Frederick National Lab, A � STAR Biological Resource Centre, Houston Methodist Research Institute, and Moffitt Cancer Center.

Histological analysis
Histological analysis of spleen was performed on 5-μm sections of formalin-fixed, paraffinembedded (FFPE) specimens stained with hematoxylin and eosin. As expected, robust nuclear staining of SB transposase (SBase) was confirmed by immunohistochemistry on FFPE tissues after antigen retrieval (pH 9) and endogenous peroxidase inhibition followed by overnight incubation with mouse antibody to SBase (anti-SBase; R&D Systems; pH 9; 1:200 dilution). After incubation with primary antibody, chromogen detection (with HRP polymer, anti-rabbit or anti-mouse, with Envision System from Dako) and hematoxylin counterstaining were performed per manufacturer's instructions. Genomic DNA (gDNA) was isolated from flash frozen necropsy specimens using Qiagen Gentra Puregene DNA isolation kit protocol for tissue.

Mapping transposon insertion sites using the splink_454 method
SB insertion reads were generated by 454 GS Titanium sequencing (Roche) of pooled splinkerette PCR reactions with nested, barcoded primers was performed [10,11,69]. Pre-and postprocessing of 454 reads to assign sample DNA barcodes, filter out local hopping events from donor chromosomes, and map and orient the SB insertion sites across the entire nuclear genome of the mouse was performed. All SB insertions from donor chromosomes were filtered out prior to identification of common insertion sites using the Gaussian kernel convolution (GKC) [69,70] and SB driver analysis [21] methods.

Mapping transposon insertion sites using the SBCapSeq method
Full details for the SBCapSeq protocol (9) optimized for sequencing from solid tumors, will be published elsewhere (Mann et al., in review; for general protocol and concept, see [9,71,72]). Briefly, for selective SB insertion site sequencing by liquid hybridization capture, gDNA (0.5 μg per sample) of either bulk tumor specimens or single cell WGA genomes was used for library construction using the AB Library Builder System, including random fragmentation and ligation of barcoded Ion Xpress sequencing adapters. Adapter-ligated templates were purified by Agencourt AMPure beads and fragments with insert size of 200±30bp were excised, purified by Agencourt AMPure beads, amplified by 8 cycles of adapter-ligation-mediated polymerase chain reaction (aLM-PCR), and purified by Agencourt AMPure beads with elution in 50 μl of TE (1X Tris-EDTA [Ethylenediaminetetraacetic acid], pH8). Capture hybridization of single or multiplexed up to 12 barcoded libraries (60 ng per sample) was performed using custom xGen Lockdown Probes (IDT; full details available at https://doi.org/10.35092/yhjc. 11441001.v1 [71]). All 120-mer capture and blocking oligonucleotide probes were purchased from IDT as Ultramer DNA Oligos with Standard Desalting. Bar-coded single and multiplex captured library fragments were further amplified by 12 cycles of LM-PCR and run using an Agilent 2100 Bioanalyzer or TapeStation to estimate enrichment. Bar-coded single and multiplex captured libraries were quantified by Qubit Fluorometer and quantitative Real Time-PCR (qRT-PCR) were used to dilute libraries for template preparation and Ion Sphere Particle (ISP) loading using Ion Chef System and sequencing on the Ion Proton platform with PI v3 semiconductor wafer chips per manufacturers recommended instructions. High-throughput sequencing of up to 39 multiplex captured libraries was carried out per PI v3 chip to achieve at least 1.5 million reads per barcode. Reads containing the transposon IRDR element were processed using the SBCapSeq bioinformatic workflow as described [9].

SB driver analysis
BED formatted files containing SB insertions from each of the histologically verified SB-cuSCC, SB-cuKA, and SB-cuSK cohort specimens (S1 and S2 and S3 Data) were used to perform SB Driver Analysis to identify statistically significant discovery, progression, and trunk driver genes that contain more SB insertions than expected by chance and were recurrently altered in three or more tumors [21]. Discovery and progression SB Driver Analysis considered all SB insertion events; trunk SB Driver Analysis considered only insertions represented by 5 or more reads. Statistically significant discovery drivers were defined as adjusted p-values using false discovery rate (FDR) multiple testing correction (q-value<0.5). Statistically significant progression and trunk drivers were defined as adjusted p-values using family-wise error rate (FWER) multiple testing correction (FWER adjusted-P<0.05). Statistically significant drivers on donor and non-donor chromosomes analyses were performed separately and combined into a single driver lists (S1 and S2 and S3 Data). Due to a local-hopping phenomenon known to occur with SB, where transposition events are biased to occur in cis along the donor chromosome more frequently than in trans to non-donor chromosomes in the genome, insertions from the donor chromosome are typically filtered away computationally before candidate cancer genes are identified. However, doing so in cohorts derived from mice with only a single SB donor allele (like our cuSCC cohorts) means that genome-wide driver analysis is not possible, because all genes on the donor chromosome are censored and are not reported even though they may contribute to the overall tumor burden. Since we obtained quantitative SBCapSeq datasets in this study, we evaluated whether all genes on the donor chromosome should be censored or if perhaps, as observed in PiggyBac transposon genomes, only a relatively small portion of the donor chromosome locus, occurring in close proximity to the donor concatemer insertion site, demonstrate significant bias compared with the frequency of SB insertions into non-donor chromosomes. In the cuSCC cohorts sequenced with SBCapSeq, we found that donor chromosome SB insertion site frequencies are significantly higher between chr9:79,000,000 and chr9:95,000,000 and drop to the genome wide average for non-donor chromosomes over the rest of chromosome 9 (S12 Fig), suggesting that filtering all SB donor chromosome insertion events may not be warranted. To confirm and extend this observation, we used all tumors within the SBCDDB with chromosome 9 donor alleles and identified that the same region between chr9:79,000,000 and chr9:95,000,000 had a higher frequency of SB insertions than the rest of chromosome 9 loci, which again matched non-donor chromosome insertion frequencies (S12 Fig). Thus, we ran SB Driver Analysis on tumors derived from mouse cohorts harboring an SB T2/Onc3 TG.12740 donor allele chromosome 9 (cuSCC83_454, cuSCC60_SBC, and cuKA11_SBC), by excluding the 81 genes that map between Filip1 at chr9:79663368-79825689 and Tfdp2 at 96096693-96224065 from the driver gene output files. To define trunk drivers, we used a normalized number as a percent of the total SB insertions, 300 reads is the cut off for the top 5% of reads, based on the assumption that the distribution of SB insertions from normal skin represented a normal distribution of background events not exceeding 5% of reads. We selected a driver cut-off as a whole number that was within the 5% of the highest insertion sites from the normal skin samples recognizing that a random number of insertions may have higher read depths from technical (e.g., amplicon amplification bias or known limitations of the Ion Torrent sequencing and alignment platform of homopolymer regions) and/or biological (e.g., clonal keratinocyte distributions within the sampled histologically normal skin or insertion events linked to loci overrepresented within the sampled histologically normal skin genomic DNA) sources of variation that may permit insertion overabundance compared with the majority (~95%) of reads within the range of sequencing depths represented within the specimen pools. In sequencing the histologically distinct regions from cuSCC masses, we used a normalized number as a percent of the total SB insertions, 200 reads representing read depths in the top 30%, so as to provide more power to detect recurrent gene insertions, given that our sequencing depth was lower for this experiment. Despite lowering this threshold, we note that Zmiz1 was the only gene that consistently had activating transposon insertions in all four independent lesions sequenced, suggesting that Zmiz1 proto-oncogenic activation is selected early and maintained throughout clonal evolution of these histologically heterogeneous masses.

Microarray gene expression analysis
Gene expression profiling of histologically confirmed SB-cuSCC masses selected by presence or absence of Zmiz1 insertion by 454Splink sequencing was performed using Affymetrix microarrays, as described previously [9]. Briefly, 100ng of total RNA for each sample was extracted using a NORGEN Biotek Animal Tissue RNA Purification kit (Cat #25700) followed by labeled with an Affymetrix 3' IVT Express kit (Cat # 901229) using the manufacturer's instructions. Labeled samples were hybridized to Affymetrix GeneChip Mouse Genome 430 2.0 Arrays, and scanned at the University of Otago Genomics & Bioinformatics Facility. Raw data processing used R (version 3.6.1) [73] with the "rma" function of the "affy" package [74], including quantile normalization but no background correction. Quality assessment of the microarray data was performed in R using the "affyQCReport" package [75]. Differential expression was performed using limma [76], and gene set analysis of Reactome pathways was performed using ReactomePA [77]. All additional data analysis and visualization were performed using Python and R.

Transcriptome sequencing
Total RNA was isolated from flash frozen necropsy specimens using mirVana miRNA Isolation Kit (Ambion by Life Technologies, AM1560) from a shaved portion of each pathology verified cuSCC mass from 7 tumors for which sufficient tissues were available and the presence of a high read depth Zmiz1 or Zmiz2 trunk driver insertion was identified by SBCapSeq. Whole transcriptome RNA-Seq (wtRNA-Seq) libraries were prepared from total RNA (5 μg per sample) plus ERCC RNA Spike-In (Ambion by Life Technologies, 4456739) followed by selective ribosomal RNA (rRNA) depletion using the RiboMinus Eukaryote System v2 (Ambion by Life Technologies, A15026). rRNA-depleted total RNA (500 ng per sample) was used for whole transcriptome library construction according to the Ion Total RNA-Seq Kit for the AB Library Builder System (Life Technologies, 4482416) protocol for barcoded libraries, Ion Sphere Particle (ISP) loading using Ion Chef System and sequencing on the Ion Proton platform with PI v3 semiconductor wafer chips per manufacturers recommended instructions. Up to 4 RNA-Seq libraries were multiplex sequenced on a PI v3 chip twice to achieve >3 GB (~40 million reads) per specimen. The Life Technologies Torrent Suite software was used to perform checking of the raw sequence data before the generation of sequencing read files in FASTQ format and minimally processed mRNA-Seq and wtRNA-Seq reads were processed using the Bowtie2 [78] and Tophat [79] algorithms to align reads to a custom version of the mouse mm9+pT2/Onc genome, as described [9]. The detection of SB fusion events, defining novel SB fusion transcripts, and measuring transcript abundance in SB cuSCC tumor specimens from RNA-Seq data were performed as described [9].

Biological pathway and process enrichment analysis
Enrichr [31,32], an online analysis tool for human and mouse gene-set enrichment, was used to identify specific signaling pathways and processes enrichment using the various cohort candidate cancer driver genes and/or their human orthologs (S9 Table and S1 Text). The online STRING network enrichment analysis tool [80] was also used to determine functional connections between trunk drivers from the cuSCC and cuKA cohorts.

Generation of stable shRNA-expressing cell lines
High titer lentiviral particles for pGIPZ-shRNA constructs targeting each of the 12 cutaneous candidate driver genes, and one non-targeting control, were purchased from Thermo Scientific Open Biosystems (S19 Table). Cells were plated at a density of 5 × 10 4 cells per well in a 24-well plate in complete media 24 hours prior to infection. The following day, cells were plated with serum-free culture medium containing 8 μg/mL polybrene (Millipore), and transduced as pools of three independent shRNAs to different exons of each target gene, or individually for the control, at multiplicity of infection of 6. Puromycin selection was added the following day at concentrations of 1.0 to 3.0 μg/mL puromycin (Thermo Fisher Scientific) in complete media, and replaced every three until stable lines were achieved.

Proliferation assay
To assess cuSCC progression, cuSCC cell lines were seeded in 6 well plates at a density of 5 × 10 4 cells per well in complete media. Cell numbers for each condition were counted after 48-and 96-hour post-seeding. To assess transformation by proliferation, HaCaT cells were seeded at a density of 5 × 10 3 cells per well and cell counts performed after seven days.

Soft agar assay
To assess anchorage-independent growth, HaCaT cells were seeded at a density of 1 × 10 4 cells per well in a six well plate in 0.3% agar, over a layer 0.6% agar. Complete media was added to each well the following day, and replaced every two days for four weeks. At end point, cells were fixed in 20% methanol and 0.0025% crystal violet. Washes were done with milliQ water until low background was achieved, and plates were scanned, and colonies counted for each condi-

qRT-PCR
Total RNA was purified and DNase treated using the RNeasy Mini Kit (Qiagen). Synthesis of cDNA was performed using SuperScript VILO Master Mix (Life Technologies). Quantitative PCR analysis was performed on the QuantStudio 12K Flex System (Life Technologies) or 7900HT Sequence Detection System (Applied Biosystem). All signals were normalized to the levels of GAPDH TaqMan probes. TaqMan probes were obtained from Life Technologies (S20 Table).

Xenografts
One million cells were prepared for injection into the left flank of randomized selected male and female immunodeficient NSG (NOD.Cg-Prkdcscid;Il2rgtm1Wjl/SzJ; JAX, 005557; 6-15 weeks old) mice. Allocation to study groups was random. Xenograft measurements were taken twice weekly using digital calipers while mice were conscious but restrained by an experimenter familiar with collecting caliper measurements of xenografts and blinded to the experimental group designations. GFP fluorescence was visualized at the time of caliper measurement. Ellipsoid tumor volumes were calculated as volume (mm3) = 0.52(length (mm) × width2 (mm2)), where the two longest axes, length and width, were the major and minor diameter measurements, respectively; width2 represents an assumption that the xenograft depth was equivalent to the diameter of the minor axis. Statistical significance of tumor volumes at defined time points was determined by either one-way t test for all cohorts relative to shNTC or by two-way, repeated-measures ANOVA with Bonferroni correction for multiple comparisons, as indicated.

Software
Unless otherwise noted, bioinformatic analysis pipelines, report generation, and figure visualization were performed using bash, R (version 3.6.1), Python scripts, and GraphPad Prism 8 software (Version 8.1.1). Hierarchical clustering was performed in Python 2.7.10 with the scipy 0.13.0b1 toolbox using the Hamming distance metric with Ward's linkage method. The R packages ggplot2 [84] and Gviz [85] were used to generate the graphics in various figure panels.   1190002N15Rik, 1700034K08Rik, 1700057G04Rik, 1700065D16Rik Table. Normalized microarray values per gene from RNA isolated from cuSCC genomes with and without Zmiz1 insertions. (XLSX) S17 Table. Normalized microarray values per probe from RNA isolated from cuSCC genomes with and without Zmiz1 insertions. (XLSX) S18 Table. All 289 genes with differential expression analysis from microarray data from RNA isolated from cuSCC genomes with and without Zmiz1 insertions with P<0.0001 and q<0.05. (XLSX) S19  teams for 454 sequencing; the Elsa Flores Lab for use of the LiCOR imaging system; Sean Yoder (Moffitt Cancer Center) and Dr. Chaomei Zhang (Moffitt Cancer Center) for providing sequencing services; Pearlyn Cheok, Nicole Lim, Dorothy Chen and Cherylin Wee for assistance with tumor monitoring and animal husbandry at IMCB (Singapore, Republic of Singapore), Hubert Lee and Erik Freiter for assistance with animal husbandry at HMRI (Houston, TX, USA), and Crystal Reed and Bethanie Gore for assistance with animal husbandry at MCC/ USF (Tampa, FL, USA). Necropsy and histology were performed by the Advanced Molecular Pathology Laboratory, Institute of Molecular and Cell Biology (A � STAR), Singapore. Necropsy and histology were also performed by the Tissue Core Facility and Ion Torrent sequencing was performed by the Molecular Genomics Core at the H. Lee Moffitt Cancer Center & Research Institute.