Transcriptional and immunohistological assessment of immune infiltration in pancreatic cancer

Pancreatic adenocarcinoma is characterized by a complex tumor environment with a wide diversity of infiltrating stromal and immune cell types that impact the tumor response to conventional treatments. However, even in this poorly responsive tumor the extent of T cell infiltration as determined by quantitative immunohistology is a candidate prognostic factor for patient outcome. As such, even more comprehensive immunophenotyping of the tumor environment, such as immune cell type deconvolution via inference models based on gene expression profiling, holds significant promise. We hypothesized that RNA-Seq can provide a comprehensive alternative to quantitative immunohistology for immunophenotyping pancreatic cancer. We performed RNA-Seq on a prospective cohort of pancreatic tumor specimens and compared multiple approaches for gene expression-based immunophenotyping analysis compared to quantitative immunohistology. Our analyses demonstrated that while gene expression analyses provide additional information on the complexity of the tumor immune environment, they are limited in sensitivity by the low overall immune infiltrate in pancreatic cancer. As an alternative approach, we identified a set of genes that were enriched in highly T cell infiltrated pancreatic tumors, and demonstrate that these can identify patients with improved outcome in a reference population. These data demonstrate that the poor immune infiltrate in pancreatic cancer can present problems for analyses that use gene expression-based tools; however, there remains enormous potential in using these approaches to understand the relationships between diverse patterns of infiltrating cells and their impact on patient treatment outcomes.


Introduction
Pancreatic cancer is commonly characterized by extensive desmoplastic stroma and an environment that is poorly supportive of adaptive immune responses, yet like many other cancers, the degree of T cell infiltrate in pancreatic tumors is correlated with patient outcome [1][2][3][4]. T cells in pancreatic tumors face an array of suppressive mechanisms that can limit their ability to control tumors, and it would be beneficial to understand the relationship between T cell infiltration and the presence of other immune populations that positively or negatively regulate immune responses. For this reason, there is significant effort in the field to understand and manipulate the complex immune environment of tumors.
Quantitative immunohistochemistry (IHC) has long represented the gold standard by which tumor infiltrating immune populations can be assessed, and recent advances in multispectral IHC combined with automated image analysis have made possible an unprecedented ability to map out the immune environment of tumors. However, these approaches are limited by the availability and quality of antibodies, and complex multispectral panels require extensive validation to confirm the specificity and selectivity of binding. Recently, multiple groups have shown that the quantity of a diverse array of infiltrating immune cell types in a specimen can be inferred based on characteristic gene expression patterns unique to or enriched in specific cell types [5,6]. In theory, a single RNA sequencing (RNA-Seq) analysis of preserved tissue can provide an assessment of immune cell infiltration as well as other information such as the cytokine and chemokine balance that may be regulating cell entry and retention in the tissue, together with candidate features of the cancer cells that orchestrate this environment. The addition of simultaneous whole exome sequencing can permit comprehensive profiling of cancer driver mutations, immune targetable mutations, as well as a personalized understanding of the patient's immune profile [7,8]. However, it remains unclear whether IHC and gene expression-based immune assessment approaches are highly concordant. For example, the utility of RNA-Seq in tumor profiling can be limited by a range of unique factors including degradation of transcripts in excised human tissues and by common tumor preservatives (e.g. formalin) and the ability to detect low-abundant transcripts. The latter is of particular concern in pancreatic cancer, which can have a relatively low infiltration of critical cell types such as CD8 T cells.
Thus far we are not aware of any studies that have directly compared IHC quantification of immune infiltration to RNA-Seq-based analyses in pancreatic cancer. In this study we aim to directly compare conventional IHC and gene expression-based approaches to characterize the immune environment of pancreatic cancer. We hypothesize that RNA-Seq analysis can provide a comprehensive alternative to quantitative IHC for immunophenotyping pancreatic cancer. We performed RNA-Seq on a prospective cohort of 39 pancreatic adenocarcinoma patient tumors with matched quantitative IHC, and evaluated approaches to quantify infiltrating immune cells using gene expression data. We found limited agreement between IHC and RNA-Seq analysis of infiltrating cells, however concordance was greatest when multiple cell types were aggregated to identify a mixed population, such as CD3 + T cells from a combination of CD4 + , CD8 + , and other cell types that express CD3. This aggregation may overcome the limitation of low T cell-derived RNA transcripts in poorly-immune infiltrated tumors. As an alternative, we identified gene signatures that were enriched in highly T cell infiltrated tumors that are associated with increased disease-free survival in other patient cohorts. These data demonstrate that immune infiltration remains an important predictor of outcome in pancreatic cancer patients, and that RNA analysis can provide an important addition to IHC data to understand the complexities of the immune environment that influence patient outcome.

Quantitative immunohistology for infiltrating immune cells in pancreatic cancer
We conducted a prospective cohort study of resectable pancreatic masses to determine which immunologic parameters have prognostic value. All procedures were approved under Providence Portland Medical Center Institutional Review Board, with approval number IRB 10-037, and patients provided written informed consent. We restricted our analysis to adult patients who underwent surgical resection for pancreatic masses. Patients were recruited from June 2010 to November 2014 at Providence Portland Medical Center in Portland, OR, where the research was conducted. Inclusion criteria included patients 18 years or older who had a diagnosis of a pancreatic or ampullary mass who were scheduled for surgical resection. Patients had to be able to give informed consent and could not have a diagnosis of a prior malignancy unless they were disease free for 10 years. We included patients who were subsequently determined to have other histologies. Demographics and survival of these patients are outlined in Table 1. Prior studies on this cohort had identified a positive correlation between CD3+ T cell infiltrate and overall survival by Multivariate Cox modeling and univariate analysis [1], so this sample set was applied for additional genomic analysis. Additional power calculations were not performed. Tumor infiltrating immune cells were quantified by immunohistochemistry and quantitative digital image analysis for CD3 + , CD68 + , and CD8 + cells as previously described [1]. Infiltrating cells were quantified from whole slide digital Table 1. Demographics of patients on the study.

Characteristic Data
Pancreatic Adenocarcinoma, n 75 images scanned at 20x resolution (Leica SCN400). Regions of interest were defined with Pathologist guidance using Definiens Tissue Studio (Definiens Inc), and the automated algorithm used the immunohistology staining combined with nuclear counterstain to count total cells and positive cells to report a marker positive cell density per mm 2 tissue for each patient. Primary outcome was overall survival.

RNASeq analysis of pancreatic cancer
Of the patients above with pancreatic masses, we randomly selected 39 patients with pathologically diagnosed pancreatic adenocarcinoma with no neoadjuvant treatment and with matched quantitative IHC for RNASeq. All subsequent analyses of IHC and RNASeq were performed on unpretreated patients. A representative Hematoxylin and Eosin (H&E) stained slide for each formalin-fixed paraffin-embedded (FFPE) tissue block specimen was reviewed by a board-certified pathologist for tumor content and tumor-rich regions were identified for microdissection. Blocks were matched but not in series with IHC sections. 5 μm thick unstained sections on glass slides were processed for DNA and RNA purification by the Providence Molecular Genomics Laboratory. The FFPE tissue sections were deparaffinized using Envirene (Hardy Diagnostics) followed by RNA extraction and purification using the Qiagen AllPrep DNA/RNA FFPE kit. 85ng of input RNA was used to prepare sequencing libraries using the Illumina TruSeq RNA Exome kit. Sequencing of the RNA Exome libraries was performed on the Illumina HiSeq 4000 instrument at 2 x 75 read paired end configuration. Transcripts were quantified using salmon-v.0.11.2 [9]. A matrix of gene expression values for all patients analyzed in this study along with matched quantitative IHC are provided as a S1 Table.

Computational analysis of infiltrating cells and comparison of techniques
RNA-Seq-based cell type deconvolution was performed using xCELL (5) and CIBERSORT (6), with TPM gene expression levels as input. xCELL was used to perform cell type enrichment analysis from gene expression data for 64 immune and stroma cell types, whereas CIBERSORT provides absolute and relative abundance of different immune cell types depending on the specified gene set. In the present analysis, the LM22 signature provided by CIBERSORT was applied. We also applied EPIC (10) and MCPcounter (11) to estimate the abundance of immune cell population. EPIC estimates the proportions of Immune and Cancer cells by using RNA-Seq-based gene expression reference profiles from immune cells and other nonmalignant cell types found in tumors. MCPcounter quantifies abundance of tissue-infiltrating eight immune and two stromal cell populations based on transcriptome profile.
Initial clustering of patients based on infiltrating cell types was performed using ClusterVis [10]. Principal components are calculated as described in (6) using ClusterVis (7). Missing data is assigned using Singular Value Decomposition with imputation iteratively until estimates of missing values converge. Statistical significance of the resulting hierarchical clusters were assessed using the sigclust2 R package [11]. Correlation between infiltrating cell types calculated using RNA-Seq versus quantitative IHC was performed by generating a composite of T cell and macrophage cell types determined by RNA-Seq for direct comparison to IHC populations (Table 2). Correlations between xCELL immune cell type enrichment and quantitative IHC, as well as between CIBERSORT cell type abundances and quantitative IHC, were determined by Spearman Rank correlation.

Identification of genes associated with high and low T cell infiltration
Patients were categorized into high and low infiltration of CD3 + and CD8 + T cells according to quantitative IHC and sub-categorized into those with high infiltration of both populations (CD3 HI CD8 HI ) versus low infiltration of both populations (CD3 LO CD8 LO ). Using RNA-Seq data from these patients, gene expression analyses were performed using a univariate two-sample T-test with a stringent false-positive threshold to identify genes significantly differentially expressed (p<0.001) in our patients [12,13]. These genes were mapped to known pathways using the Reactome Functional Interaction network tool [14]. These genes were tested on the TCGA Pancreatic Adenocarcinoma PanCancer Atlas [15] as a validation cohort using cBioportal [16,17] where mRNA expression z-scores are compared to the expression distribution of each gene in tumors that are diploid for this gene. To be classified as enriched for the gene score the sample must have at least one gene that is 2 log fold overexpressed. Survival and expression data exported to Graphpad Prism for survival comparison using log-rank tests. Pre-calculated xCELL analysis of patients in the TCGA database were obtained from http:// xcell.ucsf.edu.

Statistical methods
Survival data were analyzed using Prism (Version 8.4.2, GraphPad Software, La Jolla, CA). Overall survival of groups was compared using a log rank test for differences in Kaplan-Meier survival curves. All cutoffs for high/low infiltration by RNA analysis use median values. Gene expression analyses were performed using a univariate two-sample T-test with a stringent false-positive threshold to identify genes significantly differentially expressed (p<0.001) [12,13]. Correlations between immune cell type enrichment and quantitative IHC were determined by Spearman Rank correlation. To assess the statistical significance of correlation between RNA-Seq-based cell type deconvolution and CD3, CD8 or CD68 immune cell types determined by quantitative IHC, the composition of underlying cell types specified in Table 2 were randomly permuted. The number of cells that constitute a cell type were kept the same between the true set and permuted set. Owing to the number of cell types estimated by xCELL (64 different cell type populations) and CIBERSORT (22 different cell type populations), we conducted 100 and 20 different random assignments of cell types, respectively, attributed to the CD3, CD8 and CD68 IHC populations in Table 2. For each permutation, Spearman rank correlation was computed between the random cell type assignments and quantitative IHC, allowing for a level of statistical significance to be estimated for the true set with respect to all permuted sets, thereby indicating whether the RNA-Seq-based cell type deconvolution methods are statistically significantly correlated with quantitative IHC. Additional correlations between multiple variables were analyzed using Prism (Version 8.4.2) to calculate a Pearson correlation coefficient. Statistical significance of hierarchical clusters were assessed using the sigclust2 R package [11]. Differential gene expression analysis between CD3 Hi CD8 HI vs CD3 Lo CD8 Lo RNA-Seq samples were carried out using DESeq2 [18]. We identified differentially expressed genes, ranked them based on fold-change and p-value. This gene signature was further used for gene set enrichment.

Results
We previously demonstrated that increased CD3 + T cell infiltrates in surgically resected pancreatic adenocarcinoma correlate with improved outcomes using a Cox proportional hazards model, and remained prognostic by multivariable analysis [1]. By contrast, CD8 + T cell infiltrates and CD68 + macrophage infiltrates did not correlate with outcome [1]. However, using categorical variables of high or low CD3 + , CD8 + and CD68 + cell numbers in tumors, we were able to identify cutoffs that demonstrated that patients with high CD3 + T cells or high CD8 + T cells exhibited improved survival, but again CD68 + macrophages did not correlate with survival (Fig 1a). To evaluate the complexity of the infiltrate in patient tumors, we performed hierarchical cluster analysis. Initially, we used our broader dataset that included pancreatic ductal adenocarcinoma (PDA) with and without neoadjuvant treatment, as well as benign pancreatic masses, pre-malignant disease, neuroendocrine tumors, as well as a small number of duodenal adenocarcinoma and gallbladder adenocarcinoma. Principal component analysis was not able to distinguish these pathologies based on CD3 + , CD8 + and CD68 + cell infiltrate (Fig 1bi), and while cluster analysis tended to group benign and premalignant disease in poorly infiltrated groups, there was not a clear classifier to distinguish PDA from other related pathologies (Fig 1bii). Prior studies have demonstrated that patients with the highest macrophage proportions and lowest CD8 + T cell proportions exhibit worse outcome than those with lowest macrophage proportions and highest T cell proportions [19]. We calculated the correlation between CD3 + , CD8 + and CD68 + cell infiltrate in patients with all pathologies and those with PDA and found good correlation between CD3 + and CD8 + infiltrates, and poor, but not negative, correlation with T cells and CD68 + cell infiltrate (Fig 1c). To determine whether the degree of CD68 + cell infiltrate impacted outcome for patients with high or low T cell infiltrates, we tested the effect of CD68 + cell infiltrate with each T cell cutoff. For patients with PDA we did not find an effect of macrophage infiltration on the survival benefit of CD3 + T cells (Fig  1di and 1dii) or CD8 + T cells (Fig 1diii and 1div). These data demonstrate that quantitative immunohistology was able to identify good and poor outcome groups based on CD3 + and CD8 + T cell infiltrate, but analyzing the degree of CD68 + macrophage infiltrate alone or in combination with T cell infiltrates did not help refine outcome groups.
There are significant limitations in the use of CD68 as a sole marker of macrophages in tumors [20], particularly in view of the diverging phenotypes macrophages can generate. There is not a well-defined set of markers that is unique to distinct macrophage polarization states that do not overlap with other cell types. Recently, a number of algorithms have been developed that can analyze gene expression data to estimate the prevalence of the broad range of cell types in a mixed tissue sample [5,6,21]. To evaluate whether gene expression analysis could refine our understanding of the immune environment of pancreatic cancer we tested two different approaches, CIBERSORT [6] and xCELL [5]. CIBERSORT has been most widely applied, and provides an assessment of 22 of the most common immune cell types and some information on the differentiation of CD4 + T cells and macrophages [6]. We performed RNA-Seq on a subset of our PDA patients with quantitative IHC, and performed CIBERSORT analysis of immune infiltration using the RNA-Seq data. We used a correlation analysis to determine whether some cell types were co-regulated in the tumor, but there was little evidence of correlation between the infiltration of different immune cell types in the tumor Cutoffs used were approximately CD3 -75 th percentile; CD8 -median; CD68 -no significant cutoff found, 75 th percentile shown. b) Infiltrating CD8 + , CD3 + , and CD68 + cells across a range of pathologies was used to evaluate principal component analysis and clustering. i) Unit variance scaling is applied to rows; SVD with imputation is used to calculate principal components. X and Y axis show principal component 1 and principal component 2 that explain 56.7% and 30.5% of the total variance, respectively. Prediction ellipses are such that with probability 0.95, a new observation from the same group will fall inside the ellipse. N = 123 data points. ii) Clustering of patients according to infiltrates. Imputation is used for missing value estimation. Both rows and columns are clustered using Manhattan distance and complete linkage. 3 rows, 123 columns. c) Pearson correlation coefficients for CD8, CD3 and CD68 infiltrating cells in i) all (S1 Fig). We performed cluster analysis on patients based on their immune infiltrates calculated by CIBERSORT (Fig 2ai), which appeared to identify a diffuse cluster of patients with higher numbers of CD8 T cells and dendritic cells, but there was no strong statistical association between these cell types and this analysis was not able to identify patient groupings with significant differences in overall survival (not shown). To directly compare these CIBERSORT calculated infiltrating cell types to the quantified immunohistology from the same samples, we made three combined groups ( Table 2): 1. CD3 + equivalent based on cell types that express CD3 (Treg+CD4 populations+CD8+gd T cells); CD8 + equivalent (CD8); and CD68 + macrophage equivalent (MO+M1+M2). We then evaluated the correlation between infiltration determined by CIBERSORT analysis of RNA-Seq versus quantified immunohistology. Both CD3 + and CD8 + T cells showed a weakly positive correlation between the IHC and CIBER-SORT assessments, but CD68 did not correlate well between histology and CIBERSORT ( Fig  2b). Spearman's rho computed between IHC and CIBERSORT relative cell type abundance were 0.355, 0.308, 0.144 for CD3, CD8 and CD68 respectively. Notably, we see a number of patients with no detectable infiltrating T cells or macrophages by CIBERSORT, who did have detectable cells by histology. Since CIBERSORT is dependent on key RNA transcripts being present amongst the RNA sequenced, we believe that at low cell infiltration this approach can struggle to identify the RNA signature of rare infiltrating cells. To determine whether CIBER-SORT infiltration of these key cell types predicted outcome, we similarly stratified patients into high or low infiltration groups. We found that patients with high combined CD3 + equivalent scores exhibited improved overall survival (p<0.05), but the CD8 and macrophage scores were not able to discriminate patients with significantly different overall survival (Fig 2c). These suggest that while CIBERSORT analysis can provide additional information on the diversity of immune cells in tumors, it does not improve our ability to predict outcome over quantitative histology. Since there is a general agreement between the assessment of total CD3 infiltrate by histology and CIBERSORT and each are associated with improved outcome in patients, aggregating CIBERSORT T cell infiltration could be further tested as a prognostic factor in pancreatic cancer patients.
As an alternative approach xCELL can identify 64 different cell populations and composite infiltration scores from RNA-Seq data [5]. We performed xCELL analysis of immune infiltration in pancreatic cancer and generated a correlation matrix to examine associations between different cell types. Clear patterns emerged, with some tight clusters based around epithelial cells or Th2 cells, and more broad groupings of co-regulated cells including Th1 cells, DC, M1 macrophages and CD8 T cells (S2 Fig). To determine whether these co-resident cells identified unique patient populations, we clustered patients based on their immune infiltrate, identifying patients with higher levels of CD8 T cells, DC, and M1 macrophages (Cluster A), and those with higher levels of fibroblasts and endothelial cells to form a distinct cluster (Cluster B) (Fig  3a). Comparing the overall survival of patients in each cluster demonstrated there were no significant differences between groups (Fig 3b). In view of the high correlation between specific cell types resulting in apparent clusters, we investigated whether this was due to closely related cells having overlapping genes that are included in the underlying gene signature. Using the gene signatures that determine cell types in xCELL, we computed the percent match between genes across all cell types (S3 Fig). These data demonstrated that most cells were defined using a unique gene set. We did identify overlapping gene usage between related cell types such as pathologies; ii) PDA. d) Overall survival of PDA patients with i) high or ii) low CD3 + infiltrates, and iii) high or iv) low CD8 + infiltrates subdivided according to high or low CD68 infiltrates as determined in a). Number of patients on each arm of survival curves are shown in grey.
https://doi.org/10.1371/journal.pone.0238380.g001  (Table 2) determined by CIBERSORT compared to quantitative IHC from the same patient. Each symbol represents one patient. c) Overall survival of patients with high versus low infiltrates of i) CD3 + , ii) CD8 + , and iii) CD68 + equivalent cell populations (   (Table 2) determined by xCELL compared to quantitative IHC from the same patient. Each symbol represents one patient. c) Overall survival of patients with high versus low infiltrates of i) CD3 + , ii) CD8 + , and iii) CD68 + equivalent cell populations (Table 2) determined by median xCELL infiltration. NS = not significant. d) Overall survival of patients with high versus low i) immunescore, ii) stromascore, and iii) environment score using median values as cutoffs. https://doi.org/10.1371/journal.pone.0238380.g003

PLOS ONE
DC subtypes or T cell subtypes; however, there were no significant gene overlaps between DC and T cells, for example, that would explain their correlation. These suggest that the positive correlation between the number of T cells and DCs in patient tumors is likely a result of the presence of both cell types in the analyzed sample. To compare these calculated infiltrating cell types to quantified immunohistology, we again made three combined groups: 1. CD3 equivalent (Treg + all CD4 populations + all CD8 populations +gd T cells); CD8 equivalent (all CD8 populations); and CD68 macrophage equivalent (MO + M1 + M2) ( Table 2). We evaluated the correlation between cell infiltration assayed by xCELL analysis of RNA-Seq versus quantified immunohistology. Each population showed a positive correlation between the two approaches; however, many of the samples were calculated to have no CD8 T cells by xCELL, even in patients with relatively abundant CD8 T cells as determined by immunohistology (Fig 3b). As with CIBERSORT analysis, despite the positive correlation, the R 2 value was not strong for each cell type. Spearman's rho computed between IHC and xCELL were 0.354, 0.307 and 0.144 for CD3, CD8 and CD68 respectively. We also evaluated whether infiltration of these cell types impacted outcome, and we could not detect a cutoff that impacted overall survival (Fig 3c).
xCELL analysis does calculate three additional fields that integrate many of the infiltrating cell features to generate an "immunescore", a "microenvironment score" and a "stromascore". Such combined fields may have an advantage over individual cell types, particularly where the cells are of low abundance. Broadly, the immunescore was correlated with T cell infiltration, while the stromascore was correlated with fibroblast and endothelial cell infiltration (S3 Fig). Patients with a high immunescore exhibited improved overall survival, but the stromascore and the microenvironment score were not able to distinguish patient groups with improved outcome. These data suggest that xCELL can provide a more complex understanding of the immune cell diversity in tumors; however, there remain significant issues identifying the small numbers of T cells infiltrating pancreatic adenocarcinoma. There may be a benefit in integrating multiple immune cell types through features such as the immunescore to identify tumor environments indicative of improved outcome.
There are increasing numbers of methods to analyze cell infiltrates from RNA data. We compared the additional methods EPIC [22] and MCPcounter [23] with the same dataset. The MCPcounter assessment of T cell infiltration correlated well with the IHC CD3 infiltrate, but all other analyses had poor correlation to IHC data (S4 Fig). To understand whether there was agreement amongst the various RNA-based approaches, we analyzed the correlation between the various infiltrating T cell populations assessed by CIBERSORT, xCELL, MCPcounter, and EPIC. The correlation between the different approaches using the same RNA dataset was moderate, but the closest correlations were found among CD4 T cell populations and relatively poor correlations between CD8 T cell populations (S4 Fig). These data suggest that there are significant differences between the RNA-based approaches and each has difficulty consistently identifying CD8 T cell infiltrates in T cell poor tumors like pancreatic cancer.
To determine whether there is an alternative RNA signature of high T cell infiltration that can be used in pancreatic tumors to infer T cell infiltration and assess outcome using RNA-Seq samples, we identified genes that were enriched in tumors with both high CD3 and high CD8 infiltration or both low CD3 and low CD8 infiltration by quantitative IHC. Class comparison of gene expression identified a subset of genes that were statistically associated with highly T cell infiltrated tumors (Table 3, Fig 4a). As would be predicted, these included genes encoding for CD3 as well as genes involved in T cell signaling such as FYN and LAT. Interestingly, the gene set includes the immunotherapy target CTLA4 [24], as well as SLAMF6, which is a marker of progenitor exhausted T cells [25]. To determine whether increased expression of these genes were useful predictors of outcome in pancreatic cancer patients, we examined expression of these genes in pancreatic adenocarcinoma patients in the TCGA database [15],

PLOS ONE
and their effect on patient outcome. Initial analysis indicated that patients with increased expression of the genes positively associated with T cell infiltration in our cohort had significantly increased overall survival and disease-free survival. However, following curation of the TCGA dataset according to Peran et al. [26] to remove mischaracterized tumors from the cohort, overall survival was no longer significantly different, but disease-free survival was significantly improved (Fig 4b). These data suggest that the geneset was identifying patients with neuroendocrine tumors and the improved prognosis of these patients was influencing the overall survival results in the uncurated dataset. To understand whether the geneset was associated with increased T cell infiltration, we obtained pre-calculated xCELL analysis of infiltrating cells in these pancreatic cancer TCGA specimens (https://xcell.ucsf.edu), and examined the correlation of each cell type with the expression of genes in our panel. To find potential patterns of biological significance, we correlated the expression of the gene set with the infiltration of immune cells across the TCGA cohort and performed clustering to gather co-regulated genes and cells together. We discovered that as with our cohort, there was a strong correlation between infiltrating T cells and dendritic cells in pancreatic tumors, and these correlated closely with the expression of genes in our panel (Fig 4c). Macrophage and endothelial cell infiltration did not correlate well with any of the genes in our panel, and smooth muscle cells, keratinocytes, and epithelial cell infiltration correlated best with some of the genes that were negatively associated with T cell infiltration, such as NEBL and HSP90AB1. These data indicate that the gene signature associated with high T cell infiltration in our pancreatic cancer cohort can similarly identify high T cell infiltration in other pancreatic cancer cohorts represented in the TCGA database. Further studies are needed to understand whether these genes have functional roles and are potential therapeutic targets.

Discussion
Despite the poor overall prognosis of pancreatic cancer, patients with high numbers of infiltrating T cells as determined by quantitative immunohistology have improved outcome. To understand the complexity of the tumor immune environment, we performed RNA-Seq and evaluated gene expression-based analyses of tumor-infiltrating cells. We found that there were limitations in current gene expression analyses of infiltrating immune cells, particularly where overall infiltration was low. Both CIBERSORT and xCELL showed poor concordance with IHC, and individual immune cell types identified by gene expression analysis had limited prognostic value. This could be somewhat overcome by aggregating molecularly-identified immune populations, for example into total T cell infiltrates, which showed some correlation to quantitative immunohistology and could be predictive of outcome. To determine if we could identify alternative molecular signatures of highly infiltrated tumors, we performed class comparison of gene expression and identified a transcriptional pattern in pancreatic adenocarcinoma that had predictive value for disease-free survival in the TCGA cohort. Immunohistology with validated antibodies is the gold standard for quantitative assessment of infiltrating immune cells in cancer [27]. However, the diversity of cell types and limited number of cell-type specific markers makes it difficult to accurately assess many infiltrating cell types using histology. Flow cytometry is more able to address this diversity, using a series of gates to distinguish cell subtypes expressing multiple overlapping markers; however, this cannot be performed with archived tissues. The recent improvement in image analysis and technological improvements in multiplex staining has permitted much more complex assessment of tumors using immunohistology [28]. This is critical since many of our single markers have limitations. For example, our study used CD68 as a well-validated marker for tumor associated macrophages. However, there are significant limitations in the use of CD68 as a sole marker of macrophages in tumors [20], particularly in view of the diverging phenotypes macrophages can generate. In particular, while there may be a spectrum of macrophage phenotypes [29], the polarized M1 (classical) versus M2 (alternative) macrophage phenotype has proven useful in discriminating macrophages that support versus suppress adaptive immunity to tumors [30][31][32][33]. Therefore, the presence of macrophages does not necessarily indicate that they generate immune suppression in the tumor.
The recent development of algorithms that can analyze gene expression data to estimate the prevalence of the broad range of cell types in a mixed tissue samples, combined with the increasing affordability of comprehensive genomic profiling of patient tumors, has opened new avenues of research [21]. While gene expression data can provide a great deal of information from small quantities of tissue, there are potential issues in estimating immune cell numbers in pancreatic tumors, since the abundance of some of these populations can be very low even when they have prognostic significance. For example, xCELL gene signatures were identified using purified populations and validated on peripheral blood samples [5], which have a very different immune cell abundance when compared to tumors. CIBERSORT was shown to have superior performance to other approaches available at the time to assess immune infiltrating cells from genomic data [6]; however, performance was limited when the target immune populations represented fewer than 1% of the spiked mixture. Each approach provides valuable information on immune infiltration and may be sufficient to assess the environment of more abundantly infiltrated tumors. However, there are limitations based on the number of RNA reads provided by each infiltrating cell in a bulk population. Immunohistology has its own limitations, particularly those relating to epitope preservation through tissue processing, and the difficulties in standardizing staining over time and between institutions. In this study we set quantitative immunohistology as the gold standard for comparison; however, standard sampling issues such as selection of an appropriate archived tissue block and the relevance of a single 5μM section to the tissue as a whole can lead to inaccuracies that apply to each approach. Novel technologies are emerging that incorporate the geographic information of histology with comprehensive gene profiling, and have the potential to change how we assess the immune complexity of tumors [34]. Further analysis of patients giving discordant RNA and IHC data would be valuable to understand the impact of sampling versus other complicating factors that could explain the variations.
The strength of genomic and other omic profiling is the wealth of data that can be extracted simultaneously. Along with infiltrating cells, omic analyses can inform as to the mutational status of the tumor [8] to identify immunotherapeutic targets [35], and identify expression of inflammatory and chemokine markers that may dictate immune cell recruitment [36,37]. Such analyses would be best combined with genomic analysis approaches that subdivide patients according to novel molecular subtypes, some of which include tumor subtypes associated with higher immune infiltrates [38,39]. However, further subdivision of patients will require much larger cohorts to generate meaningful results. All of this can be performed on very small quantities of patient material, at decreasing cost and with increasing speed. While IHC-based multiplexing continues to increase the number of analytes that can be assessed on a single tissue sample, this approach depends on the availability of high-quality validated antibodies for each target. Unbiased sequencing-based approaches are to some degree futureproofed against genes that may be of interest yet do not currently have reagents for IHC or other analyses.
Further refinement of gene expression profiling for pancreatic adenocarcinoma and similar poorly infiltrated tumors could have significant benefits in personalizing immunotherapy for these recalcitrant tumors. For example, we show that CTLA4 is enriched in patients with high T cell infiltration and is part of the gene set that is associated with improved disease free survival. Antibodies targeting CTLA4 are an effective immunotherapy in some tumors [40], but single agent anti-CTLA4 is not effective in patients with locally advanced and metastatic pancreatic adenocarcinoma [41]. In preclinical models of pancreatic adenocarcinoma, we similarly found that anti-CTLA4 is ineffective as a single agent [42]. The combination of anti-CTLA4 and radiation therapy is curative, but only where the host has good pre-existing immunity to the tumor. Thus, gene expression profiling may help identify patient subsets with adequate T cell infiltration that may benefit from immunotherapy combinations, and direct other patients to novel interventions to improve their tumor immune environment prior to further treatment.

Conclusions
While immunotherapy options are currently limited for patients with pancreatic adenocarcinoma, these and other data showing an impact of immune infiltrates on patient outcomes suggests we should continue to refine our understanding of the immune environment and pursue immune therapies that are appropriate to the particular tumor environments of pancreatic cancer. At present RNASeq-based analyses must take into account the poor overall infiltrate in some tumor types to provide accurate assessments of the tumor environment. IHC CD3 + , ii) MCPcounter CD8 T cell vs IHC CD8 + , and iii) MCPcounter monocytic lineage vs IHC CD68 + cell populations from the same patient. Each symbol represents one patient. b) analysis as in a) for i) EPIC Bref CD8 T cells and ii) EPIC Tref macrophages. c) Pearson correlation matrix of infiltrating T cell subsets calculated from RNASeq data using xCELL, CIBERSORT, MCPcounter, EPIC Bref, and EPIC Tref. Rows are centered; no scaling is applied to rows. Both rows and columns are clustered using Manhattan distance and average linkage. (PDF) S1 Table. Gene-based expression levels for all patients with RNASeq analysis in the study, along with quantitative IHC. (TXT)