Molecular signatures for inflammation vary across cancer types and correlate significantly with tumor stage, sex and vital status of patients

Cancer affects millions of individuals worldwide. One shortcoming of traditional cancer classification systems is that, even for tumors affecting a single organ, there is significant molecular heterogeneity. Precise molecular classification of tumors could be beneficial in personalizing patients’ therapy and predicting prognosis. To this end, here we propose to use molecular signatures to further refine cancer classification. Molecular signatures are collections of genes characterizing particular cell types, tissues or diseases. Signatures can be used to interpret expression profiles from heterogeneous samples. Large collections of gene signatures have previously been cataloged in the MSigDB database. We have developed a web-based Signature Visualization Tool (SaVanT) to display signature scores in user-generated expression data. Here we have undertaken a systematic analysis of correlations between inflammatory signatures and cancer samples, to test whether inflammation can differentiate cancer types. Inflammatory response signatures were obtained from MSigDB and SaVanT, and a signature score was computed for samples associated with 7 different cancer types. We first identified types of cancers that had high inflammation levels as measured by these signatures. The correlation between signature scores and metadata of these patients (sex, age at initial cancer diagnosis, cancer stage, and vital status) was then computed. We identified four cancer types in which inflammation had a statistically significant association (p-value < 0.05) with at least one clinical characteristic: pancreas adenocarcinoma (PAAD), cholangiocarcinoma (CHOL), kidney chromophobe (KICH), and uveal melanoma (UVM). These results may allow future studies to use these approaches to further refine cancer subtyping and ultimately treatment.


Introduction
Many cancers are found when there is already local invasion or even distant metastatic disease. Among the issues complicating treatment is the fact that there are many tumor types, whose response to therapy may differ depending on site of origin and cellular composition [1]. Even within the same organ, there are heterogeneous tumor types with different responses to therapies.
As a result, precise tumor classification is crucial; depending on the categorization of a tumor, the clinical course, prognosis, and treatment can vary dramatically [2]. The traditional histology-based method to classify cancer is based on observing the site of origin, degree of spread and cellular morphology [3][4][5]. However, because tumors are heterogeneous and frequently contain abundant somatic mutations, traditional approaches for classifying tumor subtypes are often insufficient.
By contrast, molecular classification is based on the analysis of tumor genomes as well as gene expression [6]. Successful molecular subdivision of tumors originating from the same tissue may result in different treatments targeting a specific tumor type, as is found in the case of ERBB2-amplified breast cancer and EGFR mutant lung carcinoma [7,8]. Furthermore, molecular signatures may also be utilized to inform biological interpretations. Molecular signatures are collections of genes with associated biological processes that can identify genes upregulated in specific sample subsets when compared to broader groups [9]. Signatures can be composed of genes associated with specific diseases; for instance, breast cancer molecular signatures have identified subphenotypes indistinguishable by traditional histologic analysis [9,10]. Molecular signatures may be further curated to develop a 'hallmark' gene set conveying a specific biological state. One such example is the hallmark inflammatory response gene set that includes 200 genes commonly expressed in the setting of inflammation [11]. Inflammation is of importance in the setting of cancer because chronic inflammation has been shown to increase cancer risk [12,13] by causing tumor initiation, promotion, and metastatic progression [14]. Many environmental causes of cancer are related to chronic inflammation. As many as 20% of cancers are associated with chronic infection, 30% with tobacco smoking and inhaled pollutants such as asbestos, and 35% with dietary factors [15][16][17][18]. Chronic diseases that expose patients to inflammation are also associated with increased cancer risk: inflammatory bowel disease (i.e., ulcerative colitis and Crohn's disease) is associated with an increased risk of colon adenocarcinoma [19], chronic pancreatitis is a significant risk factor for pancreatic cancer [20], and chronic gastritis secondary to Helicobacter pylori infection is associated with the majority of gastric cancer cases [21].
Several large consortia, such as The Cancer Genome Atlas (TCGA), provide tools and data to study the molecular basis of cancer [6,22]. The purpose of our study is to understand molecular patterns related to inflammation. Although TCGA started out by collecting only three cancer types (glioblastoma multiforme, lung, and ovarian cancers), it expanded rapidly; by 2014, genomic characterization and sequence analysis had been completed for 33 cancer types with data for over 12,000 individuals [22].
Signature visualization of individual samples allows identification of patient subcategories a priori on the basis of well-defined molecular signatures [9]. As such, data from TCGA could potentially be utilized to obtain and evaluate molecular signatures. To overcome limitations of existing tools to evaluate molecular signatures, the Signature Visualization Tool (SaVanT) was previously developed as a web-based tool to visualize signatures in user-generated expression profiles [9]. SaVanT has been utilized to distinguish signature scores in patients with various conditions such as infections and leukemia, providing insight into immune response of various skin diseases [9]. By visualizing molecular signatures, SaVanT allows users to efficiently leverage pre-existing biological knowledge (such as from TCGA) to interpret transcriptomic experiments [9]. To our knowledge, no systematic study utilizing gene signatures to evaluate tumor inflammation in TCGA has been carried out. Therefore, in this study, we aimed to use SaVanT to assess molecular signatures obtained from TCGA and thereby evaluate the relationship between clinical status and inflammatory responses across multiple cancer types.

Gilead Pharmaceuticals provided support in the form of salary for author DL, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific role of this author is articulated in the 'author contributions' section.

Competing interests:
We have the following interests: David Lopez is employed by Gilead Pharmaceuticals. There are no patents, products in development or marketed products to declare. This does not alter our adherence to all the PLOS ONE policies on sharing data and materials.

Data collection
In order to examine the role of inflammation in different cancer types, we sought to utilize a large-scale, systematically processed dataset. We chose to analyze data from TCGA. All of the data have been processed with a uniform analysis pipeline, allowing for robust comparison across samples and tumor types. Gene expression data were retrieved from TCGA using their web-accessible data portal. In order to ensure that the data were normalized, we used their Harmonized Data Portal to access the data and did not include any datasets processed independently from the harmonized data. For all TCGA projects, we downloaded RNAseq data as normalized counts for all patients. Individual files (one per patient) were combined into a single matrix per primary site. In order to focus the evaluation of our methods, seven different tumor primary types were chosen for analysis with clinical metadata: pancreatic adenocarcinoma (PAAD), glioblastoma multiforme (GBM), cholangiocarcinoma (CHOL), kidney renal papillary cell carcinoma (KIRP), kidney chromophobe (KICH), adrenocortical carcinoma (ACC), and uveal melanoma (UVM). We chose these seven out of the thirty-four tumor primary sites on TCGA to obtain a range of inflammatory states, as estimated by our analyses described below. Four types of metadata were retrieved for each sample: sex, age at initial cancer diagnosis, cancer stage, and vital status.

Quality control
All data retrieved from TCGA was inspected for consistency by making sure that all profiles contained the same number of genes and that patient data was not redundant or duplicated. Furthermore, the distribution of normalized counts was analyzed at both the patient level as well as primary tumor site to identify any outliers or issues with normalization. Patient-level data was averaged to determine a single value for all genes per primary site.
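The consistency checks and per-site averaging described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the in-memory layout (a dictionary mapping each patient ID to a gene-to-count dictionary), the function names, and the rule that "duplicated" means two patients with identical profiles are all assumptions made for the example.

```python
from statistics import mean


def check_same_genes(profiles):
    """Return the shared, sorted gene list; raise if any profile differs.

    profiles: dict of patient_id -> dict of gene -> normalized count
    (a hypothetical layout chosen for this sketch).
    """
    patient_ids = list(profiles)
    reference = set(profiles[patient_ids[0]])
    for pid in patient_ids[1:]:
        if set(profiles[pid]) != reference:
            raise ValueError(f"{pid} does not share the reference gene set")
    return sorted(reference)


def find_identical_profiles(profiles):
    """Return (first_id, later_id) pairs whose expression profiles are identical."""
    seen = {}
    duplicates = []
    for pid, profile in profiles.items():
        key = tuple(sorted(profile.items()))
        if key in seen:
            duplicates.append((seen[key], pid))
        else:
            seen[key] = pid
    return duplicates


def average_by_site(profiles):
    """Average each gene across all patients of one primary site."""
    genes = check_same_genes(profiles)
    return {gene: mean(p[gene] for p in profiles.values()) for gene in genes}
```

In practice, duplicates flagged by `find_identical_profiles` would be resolved before calling `average_by_site`, so that no patient is counted twice in the site-level average.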

Comparison to molecular signatures
Molecular signatures were taken from the repository MSigDB [11] and SaVanT [9]. Described in more detail in a prior publication [9], SaVanT is a web-based tool that facilitates the sample-level visualization of molecular signatures in gene expression profiles. SaVanT combines scripts implemented in Python and R. Python scripts process the user-submitted expression matrix and compute signature scores, while R scripts perform ANOVA analyses and cluster the signature-sample matrix. After computing the signature-sample matrix and clustering, Python scripts generate the HTML output and render an interactive heatmap [9]. Several studies and efforts have sought to identify genes involved in inflammatory pathways [11,23-25]. Many of these projects have produced inflammatory signatures, which catalog the genes most important in several inflammatory states. In order to determine the role of inflammation across the 7 cancer types in our analysis, we utilized the 'hallmark inflammation' signature from MSigDB [11], a repository of molecular signatures. This signature includes 200 genes associated with acute and chronic inflammation responses, as well as elements of the TGF-β signaling cascade [26].
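To make the idea of a signature-sample matrix concrete, the sketch below scores each sample by the mean expression of the signature genes it contains. This scoring rule is an assumption chosen for simplicity; the scoring actually implemented in SaVanT is described in [9] and may differ. The function names and the placeholder gene names are hypothetical and do not come from the hallmark gene set.

```python
from statistics import mean


def signature_score(expression, signature_genes):
    """Mean expression of the signature genes present in one sample.

    expression: dict of gene -> normalized expression value.
    Genes in the signature that are missing from the sample are skipped.
    """
    values = [expression[g] for g in signature_genes if g in expression]
    if not values:
        raise ValueError("no signature genes found in this expression profile")
    return mean(values)


def signature_sample_matrix(samples, signatures):
    """Build a nested dict: signature name -> sample id -> score.

    samples: dict of sample_id -> expression dict
    signatures: dict of signature name -> list of gene names
    """
    return {
        name: {sid: signature_score(expr, genes) for sid, expr in samples.items()}
        for name, genes in signatures.items()
    }
```

Skipping missing genes rather than treating them as zero keeps a score comparable across samples with the same gene set, which the quality-control step is meant to guarantee.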

Data and statistical analysis
Averaged gene expression data from TCGA for each primary site was correlated with the hallmark inflammatory response signature through SaVanT. Seven primary tumor sites were chosen as described above. For each primary tumor site, each patient's clinical metadata was compared with the corresponding p-value of correlation between their gene expression and hallmark inflammatory response. Metadata and the corresponding statistical tests were as follows: Age (Pearson correlation), Cancer Stage (Anova Single Value), Sex (Anova Single Value), and Patient Vital Status (Anova Single Value). Statistical significance was set at p < 0.05. Using these data, box-and-whisker plots were created for the metadata and inflammatory response correlations that had significant p-values. The box-and-whisker plots show the distribution of p-values from the correlation of individual patient data with the hallmark inflammatory response.
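The two kinds of test named above can be illustrated with a short standard-library sketch: a Pearson correlation coefficient for a continuous variable such as age, and a one-way ANOVA F statistic for a categorical variable such as stage, sex, or vital status. It is an illustration under assumptions, not the code used in the study: the function names are hypothetical, and the sketch stops at the test statistic. Converting F to a p-value requires the F distribution with the returned degrees of freedom, which a statistics library such as SciPy (`scipy.stats.f_oneway`) provides.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def one_way_anova_f(groups):
    """One-way ANOVA on a list of groups of scores.

    Returns (F, df_between, df_within). Each group holds the signature
    scores of patients sharing one category (for example, one tumor stage).
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F indicates that the between-category variation in signature score is large relative to the variation among patients within a category.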

Tumor types
Hierarchical clustering was performed to group cancer subtypes by inflammatory signature scores, and the three subgroups were determined by the dendrogram structure resulting from the hierarchical clustering (Fig 1). Of the tumor types evaluated in our study, we found that tumors in areas exposed to airways or gastrointestinal tracts, including pancreatic, lung and esophageal cancer, tended to be more inflammatory. Based on these results we selected seven tumor types with varying levels of inflammation for further analysis (Fig 2). For the 7 tumor types chosen for analysis, the levels of inflammation are summarized in descending order in Table 1, along with the number of individuals whose data were analyzed.
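As an illustration of how a hierarchical clustering yields three subgroups, the sketch below merges the closest pair of clusters repeatedly until three remain. The linkage method and distance metric are not stated in the text, so average linkage with Euclidean distance is an assumption here, as are the function name and the input layout (a dictionary mapping each tumor-type label to a vector of signature scores).

```python
def agglomerative_clusters(scores, n_clusters=3):
    """Average-linkage agglomerative clustering, stopped at n_clusters.

    scores: dict of label -> list of signature scores (equal-length vectors).
    Returns a list of clusters, each a sorted list of labels.
    """
    clusters = [[label] for label in scores]

    def distance(a, b):
        # Euclidean distance between the score vectors of two labels
        return sum((x - y) ** 2 for x, y in zip(scores[a], scores[b])) ** 0.5

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters
        total = sum(distance(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]
```

Stopping the merges at a fixed number of clusters corresponds to cutting the dendrogram at the height where three branches remain.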

Correlation of metadata with hallmark inflammatory response signature correlation
The full list of the 200 genes composing the Hallmark Inflammatory Response Signature, along with the hallmark annotations, is published in full online [26]. Furthermore, the website provides the original gene sets utilized to generate this hallmark [26]. Based on a statistical analysis of the metadata and its association with the hallmark inflammatory signature score, we were able to determine significant associations between inflammation and other clinical characteristics across specific primary cancers. Out of the 7 primary cancers, 4 had a statistically significant association (p-value < 0.05) with at least one of the clinical values that we tested (sex, vital status, age at initial diagnosis, and tumor stage): pancreatic adenocarcinoma (PAAD), cholangiocarcinoma (CHOL), kidney chromophobe (KICH), and uveal melanoma (UVM).

Pancreas adenocarcinoma (PAAD)
After associating the metadata of 181 patients with the hallmark inflammatory response signature in pancreatic adenocarcinoma samples, we found a significant association of the hallmark inflammatory response signature with sex (p = 0.0313) and tumor stage (p = 0.0054) (Fig 3). There was a slightly higher level of inflammation in females than males. Of all tumor stages, stage II showed the highest level of the Hallmark Inflammatory Response, and stage I the lowest.

Cholangiocarcinoma (CHOL)
We found a significant p-value of 0.0496 (Fig 4) for the association of cholangiocarcinoma inflammatory score in 45 patients with the vital status of patients, a clinical data element categorized under diagnosis on TCGA. Patients who were alive at the time of last update in data collection had higher levels of inflammation than patients whose biopsy was collected post mortem.

Kidney chromophobe (KICH)
Using the signatures and metadata of 89 patients, we found a significant association (p = 0.0172; Fig 5) between tumor stage and the hallmark inflammatory response. Stage IV tumors showed the highest levels of inflammation, compared to the other 3 stages.

Uveal melanoma (UVM)
After associating the vital status with the hallmark inflammatory response signature correlation of 80 patients, we found a significant p-value of 0.0033 (Fig 6). Overall, samples from patients who were listed as dead at the time of last data collection had higher levels of inflammation compared to those collected from patients who were alive at time of last data collection.

Discussion
Utilizing the TCGA database allows us to leverage the systematic profiling of thousands of tumors from individuals with different types of cancer. The first aim in our study was to evaluate levels of inflammation across tumor types utilizing appropriate signature scores. We used inflammation signatures to analyze the gene expression data ("hallmark inflammation") [11,23-25]. These signatures allow us to classify the cancer subtypes at an immunological level, which is not possible with traditional classification schemes relying on histological data. Such a classification technique allows us to examine individual pathways and signaling cascades, particularly those important in inflammatory responses.
Once we compared the patient data to the inflammatory signatures, we found three distinct groups of cancers: (1) those with high inflammation, (2) those with low inflammation, and (3) those with both high and low levels of inflammation. We found PAAD to be a member of the high inflammation group. This grouping is supported by multiple studies associating pancreatic inflammation (pancreatitis) with the development of pancreatic cancer [20,27]. One of the cancers we found to be in the low inflammation group was UVM. Melanomas are associated with environmental insults, such as exposure to ultraviolet light. As such, we expect that inflammation is not necessarily involved in the mechanism responsible for the development of melanoma. We believe that the gene expression data for these tumor types is heterogeneous across individuals, with multiple subgroups of patients per type. Because the inflammatory signature presented in this first analysis is averaged across individuals, within these broad categories there may be subgroups of patients with high inflammation and others with low inflammation. This suggests that patients with these cancers could potentially benefit from further molecular subclassification.
In addition to correlating levels of inflammation with specific cancer types, we compared clinical metadata from individuals with 7 different types of distinct tumors with molecular signatures derived from the web-based tool SaVanT. Molecular signatures are gene collections with associated biological interpretations that can identify genes upregulated in specific sample subsets compared to broader groups [9]. Signatures can be composed of genes associated with specific diseases. By performing a comparison of metadata with molecular signatures, we sought to evaluate if there was significant correlation between these values.
We found that four of the cancer types we evaluated (PAAD, CHOL, KICH, and UVM) had statistically significant associations between hallmark inflammatory response and at least one clinical variable. PAAD and KICH had a significant association with the patients' cancer stage at time of diagnosis, and CHOL and UVM had an association with vital status. Additionally, PAAD was significantly associated with sex. On average, females and individuals with stage II PAAD had the highest correlation between the clinical variable and hallmark inflammatory response. For KICH, the highest average correlation was for individuals with stage IV cancer. Within each cancer type, living individuals (at time of last data collection) with CHOL and dead patients (at time of last data collection) with UVM had the highest average correlation with hallmark inflammatory response. However, the correlation for both living and dead individuals was higher for CHOL than for UVM.
By correlating inflammatory response hallmarks with patient metadata, we identified the statistically significant associations detailed above. The results of this preliminary investigation indicate a potential linkage between certain molecular subtypes of cancer and patient sex. Furthermore, the statistically significant linkage between hallmark inflammatory response and cancer stage for PAAD and KICH is suggestive of a linkage between certain tumor subtypes and aggressiveness of disease. The linkage between hallmark inflammatory response and vital status for CHOL and UVM was statistically significant. However, it is possible that certain subtypes of cancers could be associated with mortality rates of different time frames. Thus, at a minimum, our preliminary results indicate a potential linkage between the molecular composition of tumors and the clinical characteristics of the corresponding patients.
The signatures provided by SaVanT supplement MSigDB and utilize the depth and specificity of large expression studies to describe the biology pertaining to various cancers and cell types. The availability of this information for patients with cancer diagnoses could potentially facilitate a deeper understanding of a patient's clinical status. Furthermore, as there is marked heterogeneity even amongst specific organ-based tumors (Fig 2), molecular signatures could provide valuable information regarding the patient's specific subtype of tumor. While our results are interesting and potentially provide further insight into the behavior of these tumors, we believe that subsequent, more comprehensive analyses are required to draw more conclusive results as well as to enhance the clinical utility of these analyses. This information could be of particular clinical relevance in assisting the selection of a potential targeted therapy (while avoiding treatments that may have less efficacy for a patient's tumor subtype).
In order to expand upon our preliminary results, in future studies we aim to further evaluate the role of inflammation in cancer. Future studies should evaluate the relationship between additional tumor types and an expanded set of clinical variables. While we have shown three distinct groups of cancer types relative to inflammation levels, we also believe these results can be further honed and expanded. For example, limiting the number of genes in the signatures or creating a cancer-specific inflammatory response panel of genes could produce a more cost-effective diagnostic test to ultimately translate to the clinical setting.
In addition, although many cancer types fall into the high or low inflammation classifications, there are others with a mixed inflammation signal. The ambiguity in this group of cancer subtypes could arise from several sources, and correcting for these sources may allow us to place these cancers into either the high- or low-inflammation group. For example, the mixed signal could be due to the need to subclassify patients even further for a particular cancer type. It is possible that some primary sites contain several populations of samples, such as those from a different biopsy type (e.g., blood or tumor sample). Determining these subgroups within the primary types would allow them to be treated independently.
In summary, our study evaluated inflammation signatures across various tumor types. We found associations between levels of inflammation and tumor types, and also found statistically significant relationships between patient metadata and inflammation for four tumor types. We believe our results demonstrate the potential clinical utility in the continued establishment of personalized medicine and care for cancer patients, while further establishing the utility of SaVanT as a clinical tool.