Investigation of Radiosensitivity Gene Signatures in Cancer Cell Lines

Intrinsic radiosensitivity is an important factor underlying radiotherapy response, but there is no method for its routine assessment in human tumours. Gene signatures are currently being derived and some were previously generated by expression profiling the NCI-60 cell line panel. It was hypothesised that focusing on more homogeneous tumour types would be a better approach. Two cell line cohorts were used, derived from cervix (n = 16) and head and neck (n = 11) cancers. Radiosensitivity was measured by clonogenic assay as the surviving fraction following irradiation with 2 Gy (SF2). Differential gene expression between radiosensitive and radioresistant cell lines (split at the median SF2) was investigated using Affymetrix GeneChip Exon 1.0ST (cervix) or U133A Plus2 (head and neck) arrays. There were differences within cell line cohorts relating to tissue of origin, reflected by expression of the stratified epithelial marker p63. Of 138 genes identified as being associated with SF2, only 2 (1.4%) were congruent between the cervix and head and neck carcinoma cell lines (MGST1 and TFPI), and these did not partition the published NCI-60 cell lines based on SF2. There was variable success in applying three published radiosensitivity signatures to our cohorts. One gene signature, originally trained on the NCI-60 cell lines, did partially separate sensitive and resistant cell lines in all three cell line datasets. The findings do not confirm our hypothesis but suggest that a common transcriptional signature can reflect the radiosensitivity of tumours of heterogeneous origins.


Introduction
Intrinsic radiosensitivity is an important factor underlying radiotherapy response [1]. Radiosensitivity can be measured as the fraction of cells surviving a single 2 Gy dose of radiation (SF2), with high values indicating radioresistance. While other methods are available to measure cellular radiosensitivity in cell lines, SF2 is considered to be the gold standard and is supported by strong clinical evidence. In vitro measurements of SF2 correlate with in vivo radioresponse in mouse models [2]. Measurement of SF2 in primary human tumours was an independent prognostic factor in patients with carcinoma of the cervix [3] and head and neck [4] following potentially curative radiotherapy. Despite the evidence for its importance, no method is available for its routine assessment in patients, due to the impracticalities of measuring tumour radiosensitivity. The ability to measure a tumour's radiosensitivity would be a major advance and allow individualised treatment to reduce dose and/or omit chemotherapy in patients with sensitive tumours or conversely to intensify treatment against resistant tumours. Treatment individualisation should increase survival and reduce morbidity. Estimates suggest a biologically individualised approach to treatment based on radiosensitivity testing could increase survival rates by >10% [5].
Consequently there is interest in deriving a gene signature that reflects radiosensitivity. Several methods have been explored: identifying genes induced following irradiation in cell lines [6]; identifying differential expression between induced radioresistant and parental radiosensitive cancer cell lines [7] and profiling the in vitro response of cervix tumours to irradiation [8]. Most published studies were small and have not been independently validated. The most comprehensive studies used the NCI-60 panel of cell lines [9]. One study identified 22 genes that together discriminated between low and high SF2 values in 63 cell lines, based on a threshold of 0.2 (i.e. cell lines with less than 20% colony survival following 2 Gy defined as radiosensitive) [10]. Another series of studies developed a predictive classifier of radiosensitivity based on SF2 associated gene expression profiles in the NCI-60 lines [11,12,13,14]. The endpoint of these studies was a regression model of 10-hub genes, which had prognostic significance when applied to three clinical datasets (rectal, oesophageal and head & neck cancers) [13] and was also predictive of benefit from radiotherapy in breast cancer [15]. Additionally a meta-analysis of published data from four microarray platforms for NCI-60 cells identified a 31 gene radiosensitivity signature [16].
The NCI-60 panel is the most extensively characterised set of cancer cell lines and a public resource that is frequently used as a screening tool for drug discovery [9]. The panel contains cell lines from multiple tissues of origin but few radiobiologically relevant tumour types such as cervix (n = 0) or head and neck (n = 0), i.e., cancers where radiotherapy is an important part of treatment. It is well known that tumours derived from different tissues vary in radiosensitivity, with haematological malignancies being sensitive, and glioblastoma and melanomas the most radioresistant [17]. Studies show that basal gene expression levels correlate strongly with tissue of origin, particularly between haematological and solid tumours [10]. As such, considerable variation and noise are present in the NCI-60 'basal' gene expression data, potentially hampering the identification of genes associated with SF2. The transcription factor p63 is a marker of squamous cell origin and regulates many genes associated with epidermoid/squamous cell fate. Loss of p63 is associated with the up-regulation of genes associated with a more mesenchymal/migratory cell fate [18].
It was hypothesised that deriving a radiosensitivity signature using a more homogeneous group of cell lines would be a better approach. We obtained 16 cervical carcinoma cell lines, a tumour type where radiotherapy is important but that is not represented in the NCI-60 panel. The cells were characterised in tightly controlled basal conditions; parameters measured included SF2, protein expression by reverse-phase protein array (ZeptoMARK) and gene expression by Affymetrix Exon 1.0ST array. We attempted to identify genes that were differentially expressed between high and low SF2 cell lines in a single homogeneous tumour type. We had access to a second, independent, radiobiologically relevant head and neck squamous cell carcinoma (HNSCC) cell line cohort (n = 11) to validate our findings and those derived from the publicly available NCI-60 data.

Cell Lines
Fourteen commercially available cervical carcinoma cell lines were obtained from the American Type Culture Collection (ATCC) or the Japanese Collection of Research Bioresources (JCRB). Two other cell lines (778 and 808) were derived in house [19]. All cervix cell lines were cultured in identical conditions: 4.5 g/l glucose DMEM plus Glutamax (Life Technologies, Paisley, UK), supplemented with 10% foetal calf serum (FCS) (Lot: A04305-0160, PAA Laboratories (Yeovil, UK)) and kept in a humidified incubator. Eleven head and neck cell lines were cultured as described in Table S1. All cell lines underwent STR authentication and were mycoplasma free.

Clonogenic Assays
The method is described elsewhere [20]. Briefly, exponentially growing cells were trypsinised and irradiated with 0-10 Gy at room temperature using an X-ray unit at a dose-rate of 1.37 Gy/min. Following plating and 2-3 weeks' growth, the colonies formed were stained with crystal violet and those with >50 cells scored. Each experiment involved a minimum of three but usually six technical replicates and experiments were repeated two (n = 4) or

HPV Genotyping
The HPV genotyping of these cervical carcinoma cell lines was described previously [21]. For head and neck carcinoma cell lines qRT-PCR for E2, E6 and E7 for HPV16 and HPV18 was performed as described previously [22].

MTT Assay
Doubling time was estimated for each cell line using the CellTiter 96 Aqueous Non-radioactive cell proliferation assay (Promega, Madison, WI, USA) as per the manufacturer's 'overnight' protocol. A standard 7-day growth curve was performed in 96-well plates. Colorimetric readings were taken at 570 nm and compared, by exponential regression, to a standard curve of known cell density. An average of three independent replicates at different densities was used to calculate the mean doubling time.
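The doubling-time estimate above can be illustrated with a minimal sketch, assuming exponential growth and cell numbers already read off the standard curve. The function name and the example counts are hypothetical and are not taken from the study.

```python
import math

def doubling_time(times_h, cell_counts):
    """Fit ln(count) = a + k * t by least squares and return ln(2) / k.

    Assumes the cells are in exponential growth over the whole time course.
    """
    logs = [math.log(c) for c in cell_counts]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    growth_rate = sxy / sxx  # k, per hour
    return math.log(2) / growth_rate

# Hypothetical growth curve: cell number doubles every 24 hours.
example_times = [0, 24, 48, 72]
example_counts = [1000, 2000, 4000, 8000]
example_td = doubling_time(example_times, example_counts)
```

In practice the fit would be restricted to the exponential part of the 7-day curve, since plateau-phase readings would inflate the estimate.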

RNA Extraction
Cells were washed in PBS and snap-frozen in liquid nitrogen. RNA was extracted and DNase treated using the Qiagen RNeasy Kit (Qiagen, UK), as per manufacturer's instructions. RNA integrity (RIN) and quantification were measured using a Bioanalyser (Agilent Technologies Ltd, Santa Clara, CA, USA). 260/230 and 260/280 ratios were assessed using a Nanodrop 1000 Spectrophotometer (Thermo Scientific, Wilmington, DE, USA).

Western Blotting
The p63 protein status of the cervix carcinoma cell lines was described previously [21]. Using the same methods, Western blotting was performed on the head and neck cell lines, using the following antibodies: p63 mouse monoclonal (BC4A4) (Abcam, Cambridge, UK) and anti-β-actin mouse monoclonal (Clone AC-15) (Sigma-Aldrich, Dorset, UK).

ZeptoMARK Reverse-phase Protein Arrays
Exponentially growing cells were washed with PBS, lysed in 75 µl of CLB1 lysis buffer (Zeptosens: a Division of Bayer (Schweiz) AG, Switzerland), scraped into microfuge tubes, vortexed and incubated at room temperature for 30 minutes. Samples were centrifuged at 15,000 rpm at room temperature, supernatants collected and concentrations determined by Bradford assay. The spotting procedure has been described before [23]. Briefly, cervix carcinoma protein lysates were standardised to 2 mg/ml, from which four concentrations (0.20, 0.15, 0.10 and 0.05 mg/ml) were spotted, in duplicate, onto a ZeptoMARK hydrophobic chip (Zeptosens). Each cell line was independently grown and harvested on two occasions; consequently two biological replicates were spotted onto the array. Chips were blocked with CeLyA buffer (Zeptosens), before incubation with primary antibodies for 22 hours at 20°C. Twenty-four antibodies (Zeptosens) were selected based on their role in cancer or therapy resistance [24]. After incubation, excess primary antibody was removed and a fluorescently-labelled species-specific antibody hybridised for 2.5 hours at 20°C. After washing, arrays were read on a ZeptoREADER (λex/λem = 635/670 nm). The resulting relative fluorescent intensity (RFI) was calculated from a standard curve constructed from the four concentrations (in duplicate). This is a quantitative protein measurement. Values displayed are the mean of two biological replicates (i.e. 4 standard curves).

Exon Array Data Analysis
Microarray data were normalised using RMA [25]. The R/Bioconductor package annmap and the annmap database [26] were used to remove non-exonic and multi-targeting probesets. Array performance was measured as the percentage of probesets flagged as 'present' with a conservative cut-off (%Detection Above BackGround [%DABG] P < 0.01) and only those probesets flagged 'present' in at least three samples were retained. This filtering reduced the number of probesets considered from 1,411,399 to 353,981 exonic probesets, of which 243,301 passed DABG filtering. Gene level summaries were calculated by taking the median signal of filtered probesets that mapped to unique gene symbols. When summarised this resulted in 31,345 genes considered. Unsupervised hierarchical clustering was performed on the 1000 most variant genes (ranked on coefficient of variation) to show the separation of samples based on the most variable genes in the data, while minimising computational requirements.
Signature Generation: A gene signature was determined to be the set of genes or probesets that were significantly differentially expressed between two groups of cell lines according to either LIMMA or Rank Product Analysis. The cut-off for significance was a false discovery corrected p value of 0.01.
Packages: R: 3.0.2, Annmap: v1.2.1 using human database build 66, LIMMA: 3.17.26, RankProd: 2.32.0, Pheatmap: 0.7.7.
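The filtering and summarisation steps described above were carried out in R; the following is only a minimal Python sketch of the same logic on toy data. All function names, probeset IDs, gene symbols and values are hypothetical, and the population standard deviation used for the coefficient of variation is an assumption.

```python
from statistics import mean, median, pstdev

def filter_present(dabg_p, cutoff=0.01, min_samples=3):
    """Keep probesets with a DABG p value below cutoff in at least min_samples samples."""
    return [ps for ps, pvals in dabg_p.items()
            if sum(p < cutoff for p in pvals) >= min_samples]

def gene_summaries(expr, probeset_to_gene, kept):
    """Per-sample median signal across the retained probesets of each gene."""
    by_gene = {}
    for ps in kept:
        by_gene.setdefault(probeset_to_gene[ps], []).append(expr[ps])
    return {g: [median(col) for col in zip(*rows)] for g, rows in by_gene.items()}

def top_variant(gene_expr, n=1000):
    """Rank genes by coefficient of variation (sd / mean), most variable first."""
    cv = {g: pstdev(v) / mean(v) for g, v in gene_expr.items()}
    return sorted(cv, key=cv.get, reverse=True)[:n]

# Toy data: four probesets measured in four samples (hypothetical values).
toy_dabg = {"ps1": [0.001, 0.002, 0.003, 0.5], "ps2": [0.001, 0.2, 0.3, 0.5],
            "ps3": [0.0, 0.0, 0.0, 0.0], "ps4": [0.001, 0.001, 0.001, 0.001]}
toy_expr = {"ps1": [5.0, 6.0, 7.0, 8.0], "ps2": [9.0, 9.0, 9.0, 9.0],
            "ps3": [7.0, 6.0, 9.0, 10.0], "ps4": [4.0, 4.0, 4.0, 4.1]}
toy_map = {"ps1": "GENEA", "ps2": "GENEA", "ps3": "GENEA", "ps4": "GENEB"}

toy_kept = filter_present(toy_dabg)
toy_genes = gene_summaries(toy_expr, toy_map, toy_kept)
toy_top = top_variant(toy_genes, n=1)
```

Here ps2 is dropped because it is 'present' in only one sample, so it does not contribute to the GENEA summary.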

Validation Cohorts, Array Mapping and Data Analysis
Head and neck cell line Affymetrix U133A Plus2 array data were RMA normalised using the affy package in R. Affymetrix control probesets ('AFFX' annotated) were removed. For variance analysis, _x_, _a_ and _s_ annotated probesets were also removed. NCI-60 Affymetrix Plus2 CEL files were downloaded from CellMiner (http://discover.nci.nih.gov/cellminer/) and RMA normalised as before. After normalisation, replicate arrays for each cell line were averaged. For comparison to the gene-level summarised exon array data, Plus2 probesets were mapped to gene symbols using annmap.

Radiosensitivity Signature Mapping
All signatures were applied to the gene-level summaries of the cervix data using gene symbol mapping. For application of signatures to the HNSCC and NCI-60 Affymetrix Plus2 datasets, the following protocols were used:
1. Probeset IDs for the Eschrich et al [13] ten hub genes were taken from Table 3 from the group's first paper [13]. NCI-60 test set cell lines were taken from Table 4 from the group's second paper [12]. Twelve cell lines were listed but there was no corresponding Plus2 array for the breast cell line MDN.
2. The top four ranking genes from Torres-Roca et al [14] (RPIA, RBBP4, RGS19, ZNF208) were mapped to Affymetrix Plus2 probesets using annmap. The corresponding expression data for the probesets were extracted and plotted on a linear scale (antilog).
3. Gene symbols for the Amundson et al gene signature were taken from the second table of the original article [10]. One gene could not be mapped (Unigene ID Hs.494347) as there was no corresponding gene symbol in the table. The remaining 21 gene symbols were mapped to Plus2 probesets using annmap. Multi-mapping probesets were removed.
4. The Tewari et al signature was taken from the second table of the original article [8]. Forty-nine of the 60 probesets with a unique gene symbol were extracted and mapped to Plus2 probesets using annmap. Multi-mapping probesets were removed.
Unsupervised analyses (clustering, PCA) of gene expression data, signature analysis and differential expression analysis (LIMMA [27], RankProd) were carried out using R. The threshold for differential expression using Rank Product Analysis (RankProd) was a Percent False Positive (PFP) rate of <0.01.
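To give a feel for the rank product statistic, the sketch below computes a simplified two-class version on toy log2 data: genes are ranked by fold change in every resistant-versus-sensitive pair of samples and the geometric mean of those ranks is taken. This is an illustration only and not the RankProd package's implementation; in particular, the permutation step that yields the PFP is omitted, and all gene and sample names are hypothetical.

```python
import math

def rank_products(expr, group_a, group_b):
    """Rank product per gene for up-regulation in group_b relative to group_a.

    expr maps gene -> {sample: log2 expression}. A rank product close to 1
    means the gene is consistently among the most up-regulated in group_b.
    """
    genes = list(expr)
    log_rank_sum = {g: 0.0 for g in genes}
    n_pairs = 0
    for a in group_a:
        for b in group_b:
            # log2 fold change for this pair of samples
            diffs = {g: expr[g][b] - expr[g][a] for g in genes}
            ordered = sorted(genes, key=lambda g: diffs[g], reverse=True)
            for rank, g in enumerate(ordered, start=1):
                log_rank_sum[g] += math.log(rank)
            n_pairs += 1
    # geometric mean of the ranks across all pairwise comparisons
    return {g: math.exp(log_rank_sum[g] / n_pairs) for g in genes}

# Toy data: two radiosensitive (s1, s2) and two radioresistant (r1, r2) lines.
toy = {
    "G1": {"s1": 5.0, "s2": 5.2, "r1": 8.0, "r2": 8.1},
    "G2": {"s1": 6.0, "s2": 6.1, "r1": 6.0, "r2": 6.2},
    "G3": {"s1": 7.0, "s2": 7.1, "r1": 5.0, "r2": 5.1},
}
toy_rp = rank_products(toy, ["s1", "s2"], ["r1", "r2"])
```

Because the statistic depends only on ranks, it makes few distributional assumptions, which is one reason it is often applied to small cohorts such as these.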

Graphing and Statistics
Results show the mean of biological replicates and precision measurements are the standard error of mean unless otherwise stated. R values indicate Pearson's product moment coefficient. Boxplots were generated in GraphPad Prism (v6.0): box-whisker parameters: horizontal bar indicates median expression, the box indicates interquartile range; whiskers represent the range. For visualisation of radiation survival curves a linear quadratic equation was fitted in R, with radiobiological parameters derived from DRFIT [28]. The R package LIMMA was used to calculate differential expression values for protein profiling data. Where appropriate, p-values are Benjamini and Hochberg false-discovery rate (FDR) corrected [29]. Principal component analysis (PCA) reduces multi-dimensional data (i.e. thousands of genes) into datapoints in 2-D space. The closer two data-points (samples), the more similar the samples. PC1 (x-axis) accounts for the largest share of variance in an experiment and PC2 (y-axis) for the second largest.
Table 1 summarises the cervical carcinoma cell lines. Two cell lines did not form colonies and SF2 values for the remaining 14 lines ranged from 0.25 to 0.75 (Figure 1). SF2 values for six of the cell lines were published by another group [30], and the ranking was identical in both studies. In the 14 cell lines, there was no correlation of SF2 with plating efficiency (R² = 0.005, p = 0.82), doubling time (R² < 0.0001, p = 0.99) or the RNA expression of TP63, a marker of squamous cell differentiation (p = 0.90).
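The linear quadratic model used for the survival curves relates surviving fraction to dose as S(D) = exp(-(αD + βD²)), so that SF2 is simply S(2 Gy). A minimal sketch follows; the α and β values are hypothetical and are not parameters fitted in this study.

```python
import math

def surviving_fraction(dose_gy, alpha, beta):
    """Linear quadratic model: S(D) = exp(-(alpha * D + beta * D^2))."""
    return math.exp(-(alpha * dose_gy + beta * dose_gy ** 2))

# Hypothetical radiobiological parameters (alpha in 1/Gy, beta in 1/Gy^2).
example_alpha, example_beta = 0.3, 0.03
example_sf2 = surviving_fraction(2.0, example_alpha, example_beta)
```

With these illustrative parameters the predicted SF2 is exp(-0.72), i.e. just under 0.5, which falls within the 0.25 to 0.75 range reported for the cervix lines.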

Molecular Characterisation of Seemingly Homogeneous Cervical Carcinoma Cell Lines Shows Significant Disparity
p63 expression (protein and mRNA) was measured because it discriminates between squamous (p63+) and non-squamous (p63−) histological types of cervix cancer [21]. Following transcriptional profiling, unsupervised clustering of the most variant 1,000 genes (ranked by coefficient of variation) separated the lines into three clusters (Figure 2A) with cluster 1 (C33a and HCSC1) being outliers. The other 14 cell lines partitioned as p63− and p63+ clusters with the exception of SKG1, which had the lowest TP63 transcript level of the p63 positive lines. HCS2 and 778, which did not form colonies in our conditions, did not cluster together, suggesting no common transcriptional expression associated with ability to form colonies. These results suggest that the major basal transcriptional differences between the cell lines relate to p63 expression. Interestingly, while HeLa cells were the only adenocarcinoma (AC) according to provenance information, several cervix cell lines had similar global transcriptional profiles. HCSC1 is 'small cell carcinoma' derived; consequently we explored whether the clustering of C33a and HCSC1 was due to a shared histological origin. Principal component analysis (PCA) using the combined gene expression from two gene signatures, trained on (i) AC and SCC [21] and (ii) small cell carcinoma [31], showed that HCSC1 and C33a had very similar histological gene expression (Figure 2B). Figure 2C shows that C33a and HCSC1 had low levels of SCC genes and higher than average levels of small cell carcinoma genes. It is interesting to note that the AC gene expression was low in all cell lines, including HeLa, suggesting that this signature, derived in primary tumour material, may have limited applicability in cell lines. These data suggest that C33a is histologically a small cell carcinoma derived cell line and highlight the transcriptional differences associated with histological type found in a relatively homogeneous single tissue of origin cohort.
Protein Profiling of 'Cancer Associated Genes' shows Key Pathway Differences between Cell Lines, but not between High and Low SF2 Groups
A panel of 24 proteins was selected from a catalogue of prevalidated antibodies of proteins implicated in cancer, or resistance to therapy [24]. Few DNA damage response antibodies were available and so selection was limited to well-validated proteins associated with cancer, such as p53, Rb, EGFR etc. As p63 is essential for the proliferative potential of stem cells in stratified epithelia [32], we postulated that p63+ cells would express higher levels of the epithelial marker protein E-cadherin, compared with p63− cells and this was confirmed by the protein array (p < 0.0001) (Figure 3A). We also compared the mRNA expression level of E-cadherin (Exon-array derived) with the protein abundance measured by the array (relative fluorescence intensity [RFI]; Figure 3B). There was a strong correlation (R = 0.95, p < 0.001) demonstrating that protein levels reflect transcript levels for E-cadherin. We also detected high levels of p53 protein in C33a cells compared with all other cell lines (Figure 3C), due to a known mutation in the TP53 gene [33] resulting in protein stabilisation. These data gave us high confidence in the protein profiling data. Unsupervised clustering of the protein data showed no relationships with known characteristics (Figure S1). Ranking the cell lines by SF2 showed no clear visual structure to the data (Figure 3C). The 14 cell lines were split into high and low radiosensitivity groups using the median SF2 value, as previously used with clinical specimens [3,4]. Four proteins were differentially expressed (p < 0.05) between the two groups: mTOR, PTEN, IkB alpha, and NFkB, but none were significant after false discovery rate (FDR) correction (Figure 3D, Table S2). mTOR was borderline significant (FDR p = 0.09) and there was a trend for a moderate correlation between mTOR and SF2 (R = 0.48, p = 0.08, Figure 3D).
These data reveal that while there were considerable differences between the cells in terms of protein expression and pathway activation, none of the proteins/pathways were robustly associated with SF2 in this cell line cohort.

Head and Neck Cancer Cell Lines Show Similarities in Global Gene Expression
Table 2 summarises the 11 HNSCC cell lines, which were all HPV negative (Figure S2). Although reported to be squamous cell carcinoma, three lacked p63 protein expression by Western blot (Figure S3), and had low transcript levels detected by microarray. The SF2 range (0.3-0.8) was similar to that for the cervix lines (Figure 4A), but the HNSCC cell lines were more radioresistant compared with the cervix (p = 0.003). The median SF2, used to partition the cell lines, was 0.36 for cervix and 0.61 for HNSCC cell lines. As with cervix cell lines, there was no difference in SF2 between cell lines expressing high versus low levels of TP63 (Figure 4B) and unsupervised hierarchical clustering partitioned the HNSCC cell lines into three groups reflecting TP63 expression (Figure 4C). The most outlying cluster had the lowest TP63 expression while the remaining two clusters divided the cell lines into those with TP63 expression below or above 6.0 (log2). These data show that cervical carcinoma and HNSCC cell lines have similar ranges of radiosensitivity and similar global transcriptional profiles, with the majority of differences relating to the transcription factor p63. As such, the HNSCC cohort is a tissue-type distinct from cervix, but should be a good comparator for SF2 associated genes derived in cervical cell lines and vice versa.

Genes Differentially Expressed between High and Low SF2 Groups are Primarily Cell Type Specific and Cannot Stratify the NCI-60 Cell Lines
Differences between the cell lines partitioned using median SF2 were explored using genome-wide expression profiling. No differentially expressed transcripts were found by LIMMA following multiple-testing correction. This was also the case for linear models incorporating HPV and p63 expression as covariates, or in a 3-way ANOVA. While genes were identified that were differentially expressed (raw p < 0.05), none passed false-discovery correction. An alternative method, Rank Product Analysis, applied to the cervix cell lines identified 96 differentially-expressed genes (pfp < 0.01) (Table S3). These genes separated the cervix samples on the first principal component, accounting for 36% of the variation (Figure 4D), but could not separate the HNSCC cell lines based on SF2 (Figure S4A). A reciprocal analysis on the HNSCC lines identified a similar number of probesets (n = 97, mapping to 42 unique gene symbols, pfp < 0.01) differentially expressed between high and low SF2 (Table S4). These genes performed well in separating the HNSCC cell lines (Figure 4D), but failed to separate the cervix lines (Figure S4B). This shows that the majority of the genes identified are cohort/tumour type specific. Only four (2.9%) of the 138 differentially expressed gene symbols were in both gene lists: MGST1, IFITM2, TFPI and TGFB2. Of these only two were congruent in being associated with radiosensitivity or radioresistance in both cohorts (MGST1, TFPI). Expression of these two genes did not separate the NCI-60 cell lines based on SF2 (Figure 4D). Similar results were achieved taking the convergence (n = 134) of the cervix and head and neck gene lists.

Figure 3. ZeptoMARK protein profiling of the cervix cancer cell lines. A) Histogram displaying the ZeptoMARK protein-array derived abundance for the 16 cervix cancer cell lines. The y-axis displays E-cadherin protein level (relative fluorescent intensity, RFI) for each of the cell lines (x-axis). Cell lines are ranked based on TP63 expression. Grouping into p63 negative and p63 positive cell lines confirms the association of E-cadherin with p63. The p value is T-test derived comparing the difference in E-cadherin expression between the p63 positive and negative groups; error bars display standard deviation of two biological replicates. B) x-y scatterplot showing E-cadherin gene expression (Exon array) on the y-axis against E-cadherin protein expression on the x-axis. Dashed line represents perfect correlation. Exon array data-points represent the average of multiple exonic probesets (n = 19) from a single Exon expression array, where protein data are the mean of two biological replicates. C) Heatmap showing clustering of proteins with similar expression (y-axis) in the ZeptoMARK protein profiling data. Cell lines ranked by SF2. Heatmap colouring is based on row Z-score. D) xy-scatter plot showing the expression (y-axis) of the top 5 proteins from LIMMA against SF2 (x-axis). Table summarises the results of LIMMA differential protein expression analysis between high and low SF2 groups and Pearson correlation of protein expression (RFI) against SF2. p values denote those proteins with differential expression (* p < 0.05 or ** p < 0.01) between SF2 low and high groups according to LIMMA analysis. However, these fail to pass false discovery rate correction. doi:10.1371/journal.pone.0086329.g003

Identification and Independent Validation of a Signature Associated with p63 Protein Expression
To test our signature generation approach, we applied the same methods (i.e. Rank product, mapping Exon 1.0ST gene-level data to U133 plus 2.0 array) to a more obvious biological phenotype: p63 protein expression. Rank product analysis identified genes differentially expressed between p63 positive and negative cell lines in both cervix (n = 395) and HNSCC (n = 335) cell lines at pfp < 0.01 (Figure S5A&B). Of these genes, 62 were differentially expressed in both cell types and associated with p63 expression (Figure S5C). These common genes represent genes previously associated with squamous histology (e.g. KRT5, DSC3, CTA-55I10.1) [21]. Reassuringly, when this gene signature was applied to an independent dataset it could discriminate between adenocarcinoma and squamous cell carcinoma of the lung (non-small cell lung cancer) (Figure S6) [34]. There was little overlap between the p63 negative component of this signature and the adenocarcinoma signature applied previously (Figure 2). Given HeLa is the only adenocarcinoma cell line, this suggests that losing p63 expression is not the same transcriptionally as being 'adenocarcinoma'. In terms of classification, it is predominantly the p63 positive component of the signature that facilitates separation in both cell lines and tumours. That said, as our methods could derive a signature capable of independent validation, SF2 appears to be a difficult phenotype to describe at the transcriptional level.

Figure 4 (continued). Graph showing that there is no difference in TP63 expression between the SF2 high and low groups. Bar shows the median expression. C) Unsupervised hierarchical clustering of the top 1000 genes ranked by coefficient of variation (from U133 array data). Heatmap colouring is by log2 expression value. Rows represent genes and columns are cell lines. x-axis dendrogram (clusters) indicates the similarity of the cell lines and y-axis dendrogram the similarity of genes. Cluster 1 represents two samples with the lowest TP63 values (p63 negative). Cluster 2 shows the grouping of the other p63− cell line, along with low TP63 expressing lines. Cluster 3 groups together all HNSCC lines with >6.0 (log2 expression) TP63 expression. D) Diagram to represent the integrated SF2 analysis of the cervix and HNSCC cell lines. Rank product analysis (FDR < 0.05) identified 96 genes in the cervix cohort differentially expressed between SF2 low and high cell lines. An identical analysis in the HNSCC cell lines identifies 97 probesets (42 genes) differentially expressed between SF2 low and high cell lines. PCA of the cervix genes shows that they are capable of separating the cell lines by SF2. PCA of the HNSCC genes is equally capable of separating the samples based on SF2. The Venn diagram shows that only 4/138 genes are common between the two cohorts and of these only 2/138 are 'congruent' and associated with the same directionality (high SF2/low SF2 in both HNSCC and cervix). PCA shows probeset expression of these two 'common' and 'congruent' genes (MGST1 and TFPI) in the NCI-60 dataset. The NCI-60 upper PCA shows data-points coloured for median SF2 and lower PCA coloured for 0.2, used previously to partition radiosensitive and radioresistant cell lines in this cohort. doi:10.1371/journal.pone.0086329.g004

Given our suspicion that there are only small transcriptional differences associated with the SF2 phenotype, we calculated the sample size required to detect transcriptional difference reflecting SF2 from the cervix cell line data. Using gene CTC-359D24.3, which had the largest standardised difference between group means of 2.48 (log2) combined with the smallest within group standard deviation (1.54), provides an optimistic estimate of required sample size for microarray classifiers [35]. This suggests that 27 samples (13 SF2 low and 14 SF2 high) would be required given the current spread of the data.
The cohorts were combined to improve statistical power (n = 25). The two cohorts were split independently based on median SF2, as splitting on SF2 alone would create a bias between the cervix and HNSCC samples (Figure S7A). Samples below the median, whether cervix or head and neck, were defined as radiosensitive and those above the median were classified as radioresistant (Figure S7B). Twenty-two genes were differentially expressed between the SF2 high and low cohorts. These genes competently separated the cervix and HNSCC cell lines, with only one misclassification (Figure S7A), but did not separate the NCI-60 samples, whether separated on the median or 0.2 (Figure S8) [10]. This suggests that the data are potentially over-fitted and cannot generalise to the NCI-60 dataset. Interestingly, 3/22 genes (KRT5, CSTA, FGFBP1) were identified as being associated with p63 previously, which suggests an imbalance between histologies within the two SF2 groups. Repeating the analyses using the overall median for the combined cervix and HNSCC cell line cohort or the lowest quartile as a cutoff did not improve the discriminatory power in the three cell line cohorts. Similarly, pooling the cervix, HNSCC and NCI-60 cohorts did not improve discrimination.
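The within-cohort median split described at the start of this section can be sketched as follows. The cell line names and SF2 values are hypothetical, chosen only so that the cohort medians match the reported 0.36 and 0.61, and assigning a line that sits exactly on the median to the radiosensitive group is an assumption.

```python
from statistics import median

def split_by_cohort_median(sf2_by_cohort):
    """Label each cell line relative to the median SF2 of its own cohort."""
    labels = {}
    for cohort, sf2 in sf2_by_cohort.items():
        cut = median(sf2.values())
        for line, value in sf2.items():
            # Assumption: a value equal to the median is called radiosensitive.
            labels[line] = "radioresistant" if value > cut else "radiosensitive"
    return labels

# Hypothetical SF2 values for two cohorts.
toy_sf2 = {
    "cervix": {"cx_a": 0.25, "cx_b": 0.30, "cx_c": 0.42, "cx_d": 0.75},
    "hnscc": {"hn_a": 0.30, "hn_b": 0.55, "hn_c": 0.67, "hn_d": 0.80},
}
toy_labels = split_by_cohort_median(toy_sf2)
```

In this toy example a cervix line with SF2 = 0.42 is labelled radioresistant while an HNSCC line with SF2 = 0.55 is labelled radiosensitive, which is the effect of splitting each cohort on its own median rather than on a pooled cut-off.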

Published Radiosensitivity Gene Signatures have a Varying Ability to Classify Cell Lines based on SF2
We also investigated published radiosensitivity gene signatures. Given that principal component analysis (PCA) gives an unsupervised/unbiased view of the major variation between different samples, we used this method to assess how well a gene signature could separate samples based on SF2. First, we considered the Tewari signature derived by assessing cell viability in in vitro irradiated cervix tumour samples [8]. A 54 transcript signature mapped to 49 unique gene symbols partly separated the cervix (Figure 5A) but did not separate the HNSCC or NCI-60 cell lines into SF2 groupings (Figure 5A).
The Torres-Roca signature was trained on a historical microarray platform and when applied to the authors' NCI-60 test subset [12] on a current array version (U133 plus 2.0) with standard normalisation (RMA), did not predict SF2 in the NCI-60 test subset (Figure 5B), the cervix lines or the HNSCC cells (Figure 5C). There was no statistically significant difference between the calculated radiosensitivity index (RSI) and the SF2 groupings in either the cervix (p = 0.74) or HNSCC (p = 0.32) cell lines (Figure S9) and no grouping by PCA when considering the gene expression values for the ten 'hub' genes (Figure S10). In a subset of four published and described genes [12] there was a weak, but significant correlation between expression of RPIA (R = −0.3, p < 0.01), RBBP4 (R = −0.36, p < 0.01), RGS19 (R = −0.4, p < 0.01) and SF2, in the NCI-60 (training) data. These genes however showed no association with SF2 in the cervix or HNSCC cohorts (Figure S11). Taken together our findings suggest that this gene signature, while capable of prognostication in clinical datasets [11] and successfully classifying cell lines based on SF2 in 5/12 (41.7%) cases, was not sufficiently robust to predict SF2 on the same cell lines on a different microarray platform or in independent cell line cohorts.
The last signature assessed (Amundson) was also derived using the NCI-60 panel and the basal expression of genes associated with SF2 [10]. Twenty-one genes partitioned the samples based on an SF2 threshold of 0.2. This 21-gene signature separated the NCI-60 cell lines according to SF2 along the first principal component (Figure 5D). These genes also partly separated the HNSCC cell lines along the second principal component. Similarly, the cervix cell lines showed grouping based on SF2 across the first two principal components. These groupings are not ideal and could be optimised further, but they demonstrate for the first time that a gene signature trained on basal SF2 can be successfully applied to two independent cell line cohorts. This suggests that some genes associated with SF2 may generalise across cell types, rather than being specific to a cohort or cell type.

Discussion
Cell line-derived gene signatures have been successfully translated to clinical biomarkers that are both prognostic and predictive [36] and are particularly relevant in situations where measurements in primary tumours are difficult, as with radiosensitivity. Radiotherapy plays an important part in the management of cervix cancer and HNSCC, and measurements of radiosensitivity have been shown to correlate with clinical radioresponse [3,4]. These tissue types are, however, currently underrepresented in the cell line panels used to train radiosensitivity signatures [10,12].
As expected [30], the cervix cell line SF2 values varied but were independent of proliferation and plating efficiency. We also found no association between SF2 and the expression of key cancer-associated proteins. However, mTOR was >20-fold higher in radioresistant compared with radiosensitive cells and was moderately correlated with SF2 (R = 0.48, p = 0.08). High expression of mTOR protein was associated with a poor prognosis in cervical cancer treated with radiotherapy [37]. Therefore, mTOR may have a role in intrinsic radiosensitivity and clinical radioresponse and should be investigated further.
Figure 5. [...] the radioresistant lines (above the median SF2). B) Implementation of the Eschrich radiosensitivity model [12], applied to a training set of 16 samples from the NCI-60 [13]. The xy-scatterplot has the x-axis showing reported SF2 values, generated with these cell lines on an earlier array type (U95), against values generated by implementing the model in the current U133 plus 2.0 dataset (y-axis). The line indicates perfect correlation. C) The same model applied to the HNSCC and cervix cancer cell line cohorts. The y-axis indicates the predicted SF2 determined from the radiosensitivity model. The x-axis shows the empirically derived SF2 values. D) Principal component analysis of the Amundson radiosensitivity gene signature [10]. The original signature consists of 22 genes (33 Plus2 probesets), with mapping to the NCI-60 (33 Plus2 probesets), HNSCC (33 Plus2 probesets) and cervix cell line (21/22 genes) datasets. The x-axis shows the first principal component (PC1), accounting for the largest amount of variation in the experiment, and the y-axis shows the second principal component (PC2). In the NCI-60 data, colouring is based on an SF2 threshold of 0.2 (previously defined [21]), whereas the HNSCC and cervix cell line datasets are coloured by median SF2. In all cases, blue data points are radiosensitive cell lines (below the median SF2) and red data points are radioresistant lines (above the median SF2). doi:10.1371/journal.pone.0086329.g005
A rationale of this study was that radiosensitivity signatures might be improved if derived from homogeneous rather than heterogeneous cell line cohorts involving multiple tissues of origin and culture conditions. Despite the greater homogeneity of our cohort, there were key differences between the cell lines. Transcriptome analysis showed that C33a, commonly used as a model for HPV-negative cervix cancer [38], is likely to be derived from small cell carcinoma and therefore may not be a good model, given that most cervix cancers are squamous cell in origin [39].
The epithelial marker p63, which squamous cells can lose in culture [18], was the most significant source of transcriptional variation between cervix cell lines. We also show that p63 has no association with SF2 in these cell lines. The same result was seen in the HNSCC lines, showing that SF2 in two independent cohorts of cell lines is not associated with epithelial character.
HNSCC cell lines were similar to cervix cancer cell lines in their SF2 range, basal gene expression and partitioning based on p63 status. However, radiosensitivity signatures did not transfer between the tumour types. Only two genes (MGST1 and TFPI) were differentially expressed between low and high SF2 groups in both tumour types. There are a number of potential reasons for this finding. First, cell lines from different origins may use different mechanisms, with correspondingly different gene expression, to deal with radiation-induced damage. This is supported by different tissues and their derivative cell lines having varying radiocurability and radiosensitivity [17]. However, cervix and head and neck cancers have broadly similar radiosensitivities and radiocurabilities. Second, the simple dichotomisation strategy applied (i.e. median partitioning of the cell lines) might not be the best approach. Previous work in clinical samples, however, showed that median SF2 informed clinical radioresponse [3,4], and repeating the analyses using a lower cut-off was no more successful. With a larger cohort, perhaps including the extremes of SF2, a different partitioning strategy might be more successful. Third, technical variation in measuring SF2 might be a problem, particularly with borderline samples. Fourth, differences in radiosensitivity (SF2) may arise at a post-transcriptional level, in which case protein-profiling methods may be more fruitful in deriving a radiosensitivity signature, although these rely on the availability and selection of appropriate antibodies [40]. Although there is interest in the expression of DNA damage response proteins, the literature is conflicting, with high expression associated with both good [41,42] and poor [40] outcomes following radiotherapy. Another technical issue that might account for the lack of transferability of the signatures is the use of different platforms to measure gene expression. For example, the filtering methods applied to the Exon 1.0ST arrays used to generate the cervix signature cannot be applied to the U133 plus 2.0 arrays. However, we show that this technical issue can be overcome in the generation of a p63 signature using the same platforms (Figures S5 & S6).
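The dichotomisation and borderline-sample concerns discussed above can be made concrete with a short sketch. It partitions cell lines at the median SF2 or at a user-supplied cut-off, and lists the lines whose SF2 lies within a chosen tolerance of that cut-off, since these are the lines most likely to change group under measurement error. The cell line names, SF2 values and tolerance are hypothetical and serve only as illustration.

```python
from statistics import median


def dichotomise(sf2_by_line, cutoff=None):
    """Split cell lines into SF2 groups; defaults to a median cut-off.

    Lines with SF2 strictly above the cut-off are labelled resistant.
    """
    if cutoff is None:
        cutoff = median(sf2_by_line.values())
    groups = {
        name: "resistant" if value > cutoff else "sensitive"
        for name, value in sf2_by_line.items()
    }
    return cutoff, groups


def borderline(sf2_by_line, cutoff, tolerance):
    """Names of lines whose SF2 is within `tolerance` of the cut-off."""
    return sorted(
        name for name, value in sf2_by_line.items()
        if abs(value - cutoff) <= tolerance
    )


if __name__ == "__main__":
    # Toy cohort (hypothetical names and values).
    cohort = {"line_A": 0.30, "line_B": 0.45, "line_C": 0.50,
              "line_D": 0.55, "line_E": 0.80}
    cutoff, groups = dichotomise(cohort)
    print("median cut-off:", cutoff, groups)
    print("borderline:", borderline(cohort, cutoff, tolerance=0.06))
    # A lower cut-off moves lines between groups.
    print(dichotomise(cohort, cutoff=0.40))
```

In a small cohort, a few borderline lines can make up a large share of each group, which is one reason a cut-off based on the extremes of SF2 in a larger cohort might behave differently.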
Validation in an independent cohort is required to avoid overtraining of gene signatures but has rarely been applied to radiosensitivity signatures. We tested three published signatures trained on either SF2 or viability 48 h following 3 Gy irradiation. The most developed signature, trained on SF2 in the NCI-60 panel, did not validate. However, the normalisation (MAS 5.0) and array type (HU6800) used in the original derivation and testing of the signature were different, and this may account for the lack of reproducibility. This signature had been shown to be prognostic for radioresponse (locoregional control) and predictive of benefit from adjuvant radiotherapy in breast cancer patients [11]. Nevertheless, neither the raw expression values nor PCA transformation separated the SF2 groups in any of the three datasets, showing that the signature is not sufficiently robust to transfer to other cell line datasets.
A signature derived in cervix tumours based on viability 48 h following 3 Gy irradiation [8] partly stratified the cervix lines, but did not separate the HNSCC or NCI-60 lines. This observation is consistent with our original hypothesis that radiosensitivity signatures might be more robust if trained on more homogeneous (and radiobiologically relevant) cell line cohorts. However, the NCI-60-trained Amundson signature separated the cervix and head and neck lines into high and low SF2 groups, albeit imperfectly. This finding does not support our original hypothesis but does suggest that further development of radiosensitivity signatures is worthwhile. The research area will benefit from expanding the number of cell line cohorts that have been well characterised and for which gene expression data are available. It is hoped that making our data publicly available will aid further development of radiosensitivity signatures.
In summary, our attempt to identify common transcripts associated with low and high SF2 measurements was not fruitful in a homogeneous single tumour type cell line cohort. We applied a relatively naïve approach to identify the genes associated with SF2. While it is likely that more advanced modelling will give a better understanding of the data and potentially reveal interesting candidate transcripts, this is beyond the scope of this paper. What is clear from these analyses is that intrinsic radiosensitivity, as measured by SF2, is a relatively subtle phenotype. The datasets generated in this study should benefit future work aimed at deriving a robust radiosensitivity signature. Our work suggests that a common transcriptional signature can reflect the radiosensitivity of tumours of heterogeneous origins, although much larger cohorts are required to overcome background noise.

Table S1 Culture conditions for the cancer cell lines. List of requirements for culture of the cervix and head and neck cell lines. (DOCX)
Table S2 Differentially expressed proteins from protein profiling array. Results from LIMMA differential expression analysis of ZeptoMark protein-profiling arrays, comparing radiosensitive and radioresistant lines. (DOCX)
Table S3 Differentially expressed genes from Affymetrix expression profiling (Exon 1.0ST arrays). Results from Ranked Product differential expression analysis of cervix carcinoma cell lines profiled using Affymetrix Exon 1.0ST arrays, comparing radiosensitive and radioresistant groups. (DOCX)