Transcriptome Analysis Identifies the Dysregulation of Ultraviolet Target Genes in Human Skin Cancers

Exposure to ultraviolet radiation (UVR) is a major risk factor for both melanoma and non-melanoma skin cancers. In addition to its mutagenic effect, UVR can also induce substantial transcriptional instability in skin cells affecting thousands of genes, including many cancer genes, suggesting that transcriptional instability may be another important etiological factor in skin photocarcinogenesis. In this study, we performed detailed transcriptomic profiling studies to characterize the kinetic changes in global gene expression in human keratinocytes exposed to different UVR conditions. We identified a subset of UV-responsive genes as UV signature genes (UVSGs) based on 1) conserved UV-responsiveness of this subset of genes among different keratinocyte lines; and 2) UV-induced persistent changes in their mRNA levels long after exposure. Interestingly, 11 of the UVSGs were shown to be critical to skin cancer cell proliferation and survival. Through computational Gene Set Enrichment Analysis, we demonstrated that a significant portion of the UVSGs were dysregulated in human skin squamous cell carcinomas, but not in other human malignancies. This highlights the potential and specificity of the UVSGs in clinical diagnosis of UV damage and stratification of skin cancer risk.


Introduction
Skin cancer is the most common cancer in the US and has become a major public health problem due to rising incidence and treatment costs [1,2]. Every year nearly 5 million cases of skin cancers are treated, at an estimated cost of $8.1 billion [3]. There is compelling scientific and epidemiological evidence that skin cancer is caused mainly by an interplay between genetic factors and exposure to solar UV radiation (UVR) [4][5][6]. Most skin cancer cases are preventable through proper protection against excessive UV exposure. Sunscreen is one of the commonly used sun protection strategies, especially in skin cancer susceptible populations [7]. However, there are controversies surrounding the efficacy and safety of sunscreen products [8][9][10][11]. Currently, the sun-protection efficacy of sunscreens is measured by Minimal Erythema Dose (MED), which refers to the lowest time interval or dosage of UV sufficient to produce a minimal, perceptible erythema on unprotected skin. As an indicator of UV damage, MED is both insensitive and inadequate because significant molecular and cellular damage occurs at sub-erythema UV doses that may contribute to cumulative photo damage [12,13]. MED also varies widely among different skin types. To reduce skin cancer incidence, sensitive biomarkers and methods are needed for accurate assessment of UV-induced damage and evaluation of the protective efficacy of sunscreen products to enhance skin cancer prevention.
Presently, no consensus panel of UV biomarkers is available for quantifying UV damage and stratifying skin cancer risk. Previous studies have attempted to identify UV-responsive genes as UV biomarkers [14][15][16][17][18][19]. However, there are significant variations in the experimental designs including the choices of cell types, UV sources and doses, time points of analysis, and expression profiling methods with different genomic coverage. Such variances make crosscomparison and validation of previous findings challenging. To define a consensus UV biomarker panel with broad applicability and accuracy, we performed RNA-Seq studies to generate a transcriptomic cohort containing UV-responsive genes in human skin cells exposed to different UVR conditions. We then performed rigorous bioinformatics analysis to define a UV gene expression signature that is conserved among cells from different donors. We further demonstrated that the UV signature genes (UVSGs) are significantly enriched in genes dysregulated in human skin squamous cell carcinomas (SCCs), highlighting the potential clinical utility of the UVSGs in risk prediction and stratification of skin cancer.

Material and Methods
Human keratinocyte cultures, human SCC and normal skin tissues Primary human keratinocytes from neonatal foreskins were provided by the Columbia University Medical Center Skin Disease Research Center's Tissue Culture and Histology Core facility. The protocol was exempt by the Institutional Review Board at Columbia University. Keratinocytes from 4 individual donors (N0, N1, N2, and N6) were cultured in 154CF medium supplemented with human keratinocyte growth supplement (Life Technologies, Grand Island, NY). Cells from each donor were maintained and analyzed separately. Human SCC tumor tissues and adjacent normal skin tissues were obtained from the Molecular Pathology Shared Resource/Tissue Bank of the Herbert Irving Comprehensive Cancer Center of Columbia University under CUMC IRB protocol AAAB2667.

UV radiation
Primary keratinocytes were propagated in culture for 2 to 3 passages in 10 cm culture dishes. UVR was delivered using four FS20T12 UV tubes, which emit UV rays between 290 and 340 nm, with an emission peak at 310 nm [20]. Before UVR, cells were rinsed once with PBS and irradiated in approximately 0.5 ml PBS in each 10 cm dish with 3 different doses: 10, 20, or 30 mJ/cm 2 , which was determined by an IL1400 radiometer connected to a SEL240/UVB-1/TD detector [21]. Cells were collected at 4 h, 1 day or 3 days after UVR. The seeding density for each UVR condition was adjusted based on a pre-determined LD50 UV dose (35 mJ/cm 2 at 24 h after UVR) as follows [22]: 0.75x10 6 /dish for control cells or cells irradiated with 10 mJ/cm 2 UVR; 1x10 6 /dish for cells irradiated with 20 mJ/cm 2 UVR; 1.5x10 6 /dish for cells irradiated with 30 mJ/cm 2 UVR (collected 1 day after UVR) or 3x10 6 /dish for cells irradiated with 30 mJ/ cm 2 UV (collected 3 days after UVR). Cells were allowed to recover for 24 h after subculture. UVR was applied in the order of 3 days, 1 day, and 4 h prior to the collection of all cells on the same day to ensure that all cells were cultured for the same amount of time within each experiment.

RNA isolation and RNA-Seq analysis
Total RNA was isolated from cultured keratinocytes, human SCC tissues, and adjacent normal skin tissues using the RNeasy Kit (QIAGEN, Gaithersburg, MD). All RNA samples were subsequently analyzed using an RNA 6000 nano chip (Agilent Technologies, Wilmington, DE) to confirm that the RNA integrity index was 8.0 or above. Total RNA (500 ng) from each sample was subjected to poly-A pull-down to enrich mRNAs for library preparation by using Illumina TruSeq RNA prep kit (Illumina, San Diego, CA). The resulting libraries were sequenced using Illumina HiSeq2000 at Columbia Genome Center. Sequencing reads were mapped to the human reference genome (NCBI/build37.2) using Tophat (version 2.0.4). Differentially expressed genes (DEGs) between each UVR-treated group and the control group were determined using the DESeq software package [23], which is an R Bioconductor package that defines DEGs based on sequence read counts data. DESeq uses a negative binomial model to approximate the distribution of read counts and estimate the mean and variance of the dispersion. We selected DEGs using fold change (FC) cutoffs set at >2 or <0.5 between irradiated and non-irradiated keratinocytes. A False Discovery rate (FDR) of <0.05 was used to control for false discoveries.

Bioinformatics and statistical analyses
DEG lists were used in principal component analysis (PCA) to test the variations in transcriptomic response to different UVR conditions among the keratinocyte lines. PCA is a statistical method that performs orthogonal linear transformation to convert a gene expression matrix into a linearly uncorrelated variable called the principle component (PC) [24]. Each PC is a linear combination of the original gene expression values. The first two PCs contain the most variance in the data, and could be visualized in 2-dimentional space. DAVID is a free online bioinformatics resource for systematic analyses of gene function classification, annotation and clustering based on various databases including KEGG, BioCarta, and Gene Ontology [25]. We used DAVID to identify the biological pathways in which the UV-induced DEGs were enriched. To determine the overlap between UVSGs and SCC signature genes, we performed gene set enrichment analysis (GSEA), which computes the enrichment of a pre-defined gene set on a gene signature generated by fold-change, t-test or other method using the Kolmogorov-Smirnov statistic [26].
We used Paired t-test to identify genes displaying time-dependent UVR responses following exposure. To identify genes manifesting UV dose-dependent changes, we constructed a linear regression model using UV dose as an independent variable and gene expression as a dependent variable for each gene in the same keratinocyte line and at the same time point. To evaluate the overall effects of the various UVR doses on the expression of a specific gene, we integrated the multiple p-values from every regression analysis for that gene using Fisher's Method.
To obtain cancer-specific gene signatures for various human malignancies, we retrieved the RNA-Seq raw counts from The Cancer Genome Atlas (TCGA) database, which contains genomic and clinical information for tumors and their matching normal tissues across over 30 cancer types from more than 11,000 patients (publically available at cancergenome.nih.gov). We selected RNA-Seq data sets with both primary tumors and matched normal tissues for each tumor type. We generated DEG sets for each tumor type using the DESeq software package. To identify genes critical to skin cancer cell proliferation or survival, we queried the Achilles database with 67 of the UVSGs that were upregulated by UVR [27]. Genes were considered essential to skin cancer cell survival if their corresponding shRNAs became depleted after 40 days or 16 population doublings following shRNA infection. Normalized shRNA depletion scores were downloaded from the "cBOTv8_sbsv3_allreps_log.gct2" file in the Achilles database. All statistical analyses were performed using the R software package.

Transcriptomic responses to different UVR conditions
In addition to its mutagenic effect, UVR can cause genome-wide transcriptional instability affecting thousands of genes [14][15][16]. To fully characterize UVR-induced transcriptomic changes, we conducted RNA-Seq to profile mRNA expression changes in keratinocytes isolated from 4 individual donor foreskins in response to different UVR doses at varied time points after exposure. The resulting 22 DEG lists were subjected to principle component analysis (PCA) to visualize the similarities of the DEG profiles in response to different UVR conditions. As shown in Fig 1A, DEG profiles from Day 1 and 3 groups, but not the 4 h group, demonstrated great similarities in the first principle component (PC1). Along the second principle component (PC2) axis, the range of differences among the Day 3 DEG lists appeared smaller than that of the Day 1 DEG group, demonstrating a time-dependent increase in the similarity of UVR-induced transcriptomic changes among different keratinocyte lines and different UVR doses.
To uncover the biological pathways that were most responsive to UVR, we calculated the average of the FC of each DEG from Day 1 and Day 3 DEG lists (S1 Table). Using an FC cutoff of 2, we obtained a total of 531 genes that were upregulated (FC>2) and 610 genes that were down-regulated (FC<0.5) in response to different UVR conditions. We performed DAVID analysis to identify top biological pathways in which the upregulated genes and down-regulated genes were enriched. DAVID revealed multiple pathways that were significantly modulated by UVR. The down-regulated genes were significantly enriched in the following top biological pathways: cell cycle regulation (83 genes), chromosome structure (19 genes), DNA damage response (DDR) (59 genes), and microtubule organization (23 genes); whereas the upregulated genes were enriched in pathways such as apoptosis (33 genes), defense inflammatory response (43 genes), ectoderm epithelium development (36 genes), cell adhesion (4 genes) and leukocyte activation (9 genes) ( Fig 1B).

Time-dependent transcriptomic changes in response to UVR
The PCA analysis in Fig 1A revealed time-dependent changes in the transcriptome following UVR. To identify which genes exhibit time-dependent UVR responses, we performed paired ttests to compare the FCs of Day 3 DEGs with those of Day 1 DEGs for each keratinocyte cell line (N0, N1 and N2) under the same UVR dose. We found that 164 out of the 531 upregulated genes had higher expressions at Day 3 than at Day 1 (FDR-corrected p-value<0.05); while 239 out of the 610 down-regulated genes were more repressed at Day 3 than at Day 1 at the same p-value threshold (S2 Table). Two examples of time-dependent upregulation include ADAMTSL4, encoding a disintegrin and metalloproteinase; and CST6, encoding a cystatin superfamily protein. Examples of time-dependent down-regulation include UHRF1, encoding a member of a subfamily of RING-finger type E3 ubiquitin ligases; and TRIP13, which encodes a protein that interacts with thyroid hormone receptors (Fig 2).
Dose-dependent transcriptomic changes in response to UVR Next, we tested which genes displayed dose-dependent mRNA expression changes in response to UVR. To do so, we fitted linear regression models for each of the differentially expressed genes using UVR doses (10, 20 and 30 mJ/cm 2 ) as independent variables and gene expression as the dependent variable for each keratinocyte cell line (N0, N1, N2) at the same time point (Day 1 or 3). For each gene, we constructed 6 models representing the following 6 conditions: N0-1d, N0-3d, N1-1d, N1-3d, N2-1d and N2-3d. We then integrated the 6 coefficient p-values from the 6 models using Fisher's method. We found that 285 out of the 531 upregulated genes showed dose-dependent up-regulation with FDR-corrected p-value<0.05, and 452 out of the 610 down-regulated genes demonstrated significant dose-dependent decreases in gene expression at the same FDR threshold (S3 Table). Dose-dependent changes in 8 representative genes with the lowest p-values from each group are illustrated in Fig 3.

Identification of conserved UVR transcriptomic signature genes
The time-dependent kinetic changes illustrated in Fig 2 suggested that UVR might exert persistent effects on a subset of genes, which may serve as UV gene expression signature with a biomarker potential for assessing UVR-induced skin damages. To identify UV-induced DEGs with biomarker potential, we performed bioinformatics and statistical analyses on DEGs derived from 30 mJ/cm 2 UVR exposure to isolate DEGs that were conserved among different donors (N0, N1, N2, and N6). We identified 401 conserved UV signature DEGs, which we designated as the putative UV biomarker panel (S4 Table). To test protein-protein interactions (PPIs) among the protein products of these UVSGs, we performed STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) network analysis using the Pajek software (version 3.1) [28] based on the known and predicted protein interactions available in the STRING database (version 10) [29]. A STRING cutoff score of 0.7 was used to select PPIs with high confidence. Altogether, we found 54 vertices (genes) and 106 edges (interactions) among the UVSGs (Fig 4A). Clustering analysis using the VOS algorithm [30] to maximize modularity within each cluster revealed 11 modules that were connected with each other except the histone protein cluster (Fig 4A). Among the UVSGs, 13 of them showed more than 5 interacting  neighbors (degree), also known as the hubs on the PPI network, including IL6 (19), PTGS2 (15), and IL1B (12), highlighting the potentially central roles of these genes in mediating UVR responses.

Dysregulation of UVSGs in human SCCs
Compelling evidence suggests that UVR is the main etiological factor in SCC pathogenesis [4,31]. To test whether UVSGs were dysregulated in human SCCs, we performed similar RNA-Seq analyses to identify DEGs in human SCCs by comparing the transcriptome of the SCC tumor tissue with that of the adjacent normal skin from the same patient. We then performed GSEA to determine the enrichment of the UVSGs in the SCC DEGs, or vice versa. As shown in Fig 4B and 4C, GSEA analyses revealed a significant mutual enrichment between the UVSG and the SCC signature (p = 0.006 and 0.02, respectively). When we used an SCC signature discovered by microarray-based analyses [32], we observed a significant enrichment between the SCC signature and the UVSG as well (p = 5.19e-05 by Fisher exact test analysis), confirming the molecular similarities between the UVSG and SCC signature.
To test whether the identified UVSG is specific for skin cancer, we performed additional GSEA analyses to compare the UVSG with gene sets dysregulated in 14 other human cancer types (obtained from the TCGA RNA-Seq database). Each cancer type contained at least 6 pairs of primary tumors and adjacent normal tissues (Table 1). We generated DEG sets specific for each cancer type using paired t-test. Each resulting cancer DEG set was then used in GSEA analyses to assess the mutual enrichment between the UVSG and each cancer signature. As summarized in Table 1, there was no significant enrichment between the UVSG and other cancer signatures (p>0.05) except for thyroid cancer (p = 0.0222, Table 1). The similarity between the UVSG and thyroid cancer signature might be related to the fact that ionizing radiation is a significant risk factor for thyroid cancer [33], and that UVR and other radiations may share common target genes involved in pathways such as DNA damage and inflammation. A recent prospective study also found a non-linear association between UVR and thyroid cancer [34]. Further studies are needed to determine whether UVR may indeed increase thyroid cancer risk.
To test the stability of the UV gene expression signature following exposure, we performed RNA-Seq on keratinocytes exposed to 30 mJ/cm 2 of UV to generate a UV-induced DEG list at Day 21 after exposure. Cross-comparison of the UVSG with the Day 21 DEG list revealed an overlap of 144 genes (Fig 4D and S5 Table) (p<2.2e-16 per Fisher's exact test), suggesting that a significant portion of the UVSGs maintained their initial UV responsiveness long after exposure. Furthermore, cross-comparison of the UVSG derived in keratinocytes and the SCC signature (consisting of DEGs between the SCC and adjacent normal skin tissues) revealed that the UVSGs were significantly enriched in the SCC signatures (p<2.2e-16 per Fisher's exact test) (Fig 4E and 4F), underscoring the potential of the UVSGs as biomarkers for assessing UVR damage and predicting skin cancer risk.

Role of UVSGs in skin cancer cell proliferation and viability
Project Achilles leverages both biological and computational analyses to identify genes that affect cancer cell survival and/or proliferation using a genome-wide shRNA library screening in over 200 cancer cell lines [27]. Based on the degree of depletion of a specific shRNA-treated cancer cell population following infection, a depletion score is assigned to each shRNA. The depletion score is therefore inversely correlated with the role of its target gene in cancer cell survival, based on the assumption that loss of a key cancer survival gene (due to RNAi effect) is detrimental to the infected cells [27]. Given that Achilles data were derived from loss-of-function analysis, we focused on 67 UVSGs that were upregulated in both SCCs and by UVR (FC>2). We have validated 25 of the 67 genes in the Achilles database in multiple cancer cells lines. We queried the Achilles database with these 25 genes to determine which are critical to skin cancer cell proliferation and/or survival. Using the Wilcoxon test, we determined that 11 out of the 25 genes had significantly lower depletion scores in skin cancer cell lines compared to other non-skin cancer lines (p<0.05, Table 2), indicating that this subset of UVSGs may play key roles in skin carcinogenesis. The depletion scores of the shRNAs targeting these 11 genes in 5 skin cancer cell lines, together with the median depletion scores of the same shRNAs in non-skin cancer lines, and the p-values from Wilcoxon tests are summarized in Table 2. These analyses suggest that a subset of UVSGs may be targeted in future skin cancer prevention and therapeutic development.

Discussion
Despite decades of research, no consensus panel of molecular biomarkers is available for accurate assessment of UVR damage and stratification of skin cancer risk. mRNA transcripts have been successfully used as molecular biomarkers to enable early detection and diagnosis of disease and to track disease progression. To facilitate the development of a biomarker-based sensitive method to assess UVR damage and stratify skin cancer risk, we employed RNA-Seq to identify UV-induced gene expression signatures to establish a UV biomarker panel. Utilizing bioinformatics and statistical analyses, we compiled a UV biomarker panel consisting of 401 genes that are consistently altered by UVR and conserved among keratinocytes from different donors. We further demonstrate that alterations in the mRNA expression of the UVSGs persisted 21 days after exposure, illustrating the stability and reliability of the identified UV biomarker genes in future diagnostic applications. The persistent gene expression changes may be due to UVR-induced epigenetic changes in DNA methylation and/or histone modifications at target gene loci following UVR. The dose-dependent response among some of the UVSGs further suggests that this novel UV biomarker panel may offer quantitative assessments of UVR skin damage and thus help identify individuals at increased levels of risk for developing skin cancer. While some of the UVR target genes uncovered in this study have been reported in previous studies [14][15][16][17], we have identified many new UVR target genes due to the near-complete coverage of the transcriptome by RNA-Seq compared to the limits of previous microarray-based analyses. Our experimental design also allows comprehensive characterizations of the UVresponsive kinetics in the keratinocyte transcriptome (Figs 2 and 3). We would like to point out, however, that we performed only one RNA-Seq for each cell line at each UVR dose or time point. Furthermore, our findings are based on cultured cells, which do not involve the complex interactions between keratinocytes and other cell types in vivo. Therefore, additional validations of the identified UVSGs are warranted in future studies. Among the DDR genes downregulated by UVR (Fig 1B), many of them encode DNA helicases, DNA-dependent ATPases, and pyrophosphatases that participate in DNA replication and repair (S6 Table). DDR plays pivotal roles not only in maintaining genome integrity and stability, but it also has a broad impact on cell cycle regulation, chromatin dynamics, and apoptosis [35]. This may explain the observations that these related pathways all appear as the top UV-responsive pathways ( Fig  1B). Among the genes upregulated by UVR exposure in the inflammatory response pathway, HDAC5 and HADC9 are two epigenetic regulators belonging to the class IIA family of histone deacetylases that are known to regulate adaptive immunity [36], underscoring the importance of epigenetic regulation of UVR-induced skin immune response.
The UV biomarker panel identified in this study consists of more genes (401) than other biomarker panels currently used in clinical diagnosis of various diseases [37][38][39]. A large biomarker panel can offer better coverage and accuracy in assessing UV impact and skin cancer risk. New molecular tests can be developed based on the UV biomarker genes to replace the current MED-based method in testing the UV-protective efficacy of sunscreen products. The significant similarity between the UV signature and SCC signature suggests that such biomarker-based tests may also enable clinical diagnosis and skin cancer risk prediction in susceptible individuals following severe sunburns. Due to the steep decreases in the run times and costs of next-generation sequencing technologies, profiling an individual's transcriptome has become a feasible clinical undertaking. We anticipate that the UV biomarker panel, together with ever-improving RNA-Seq and bioinformatics tools, will greatly facilitate the development of next-generation diagnostic tests to enhance skin cancer prevention and early detection to reduce the incidence of this prevalent human malignancy.
Supporting Information S1