Patterns of gene expression characterize T1 and T3 clear cell renal cell carcinoma subtypes

  • Agnieszka M. Borys,

    Roles Investigation, Methodology, Writing – original draft

    Affiliation Center for Medical Genomics OMICRON, Medical Faculty, Jagiellonian University Medical College, Krakow, Poland

  • Michał Seweryn,

    Roles Data curation, Formal analysis, Writing – original draft

    Affiliation Center for Medical Genomics OMICRON, Medical Faculty, Jagiellonian University Medical College, Krakow, Poland

  • Tomasz Gołąbek,

    Roles Investigation

    Affiliation Chair and Department of Urology, Medical Faculty, Jagiellonian University Medical College, Krakow, Poland

  • Łukasz Bełch,

    Roles Investigation

    Affiliation Chair and Department of Urology, Medical Faculty, Jagiellonian University Medical College, Krakow, Poland

  • Agnieszka Klimkowska,

    Roles Investigation, Methodology

    Affiliation Chair of Pathomorphology, Medical Faculty, Jagiellonian University Medical College, Krakow, Poland

  • Justyna Totoń-Żurańska,

    Roles Conceptualization, Investigation, Methodology

    Affiliation Center for Medical Genomics OMICRON, Medical Faculty, Jagiellonian University Medical College, Krakow, Poland

  • Julita Machlowska,

    Roles Investigation

    Affiliation Center for Medical Genomics OMICRON, Medical Faculty, Jagiellonian University Medical College, Krakow, Poland

  • Piotr Chłosta,

    Roles Investigation, Supervision, Writing – review & editing

    Affiliation Chair and Department of Urology, Medical Faculty, Jagiellonian University Medical College, Krakow, Poland

  • Krzysztof Okoń,

    Roles Conceptualization, Supervision, Writing – review & editing

    Affiliation Chair of Pathomorphology, Medical Faculty, Jagiellonian University Medical College, Krakow, Poland

  • Paweł P. Wołkow

    Roles Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review & editing

    Affiliation Center for Medical Genomics OMICRON, Medical Faculty, Jagiellonian University Medical College, Krakow, Poland


Abstract

Renal carcinoma is the 20th most common cancer worldwide. Clear cell renal cell carcinoma (ccRCC) is the most frequent type of renal cancer. Even in patients diagnosed at an early stage, characteristics of disease progression remain heterogeneous. Up-to-date molecular classifications stratify ccRCC samples into two clusters. We analyzed gene expression in 23 T1 or T3 ccRCC samples. Unsupervised clustering divided this group into three clusters: two contained only T1 or only T3 samples, while the third contained a mixed group. We defined a group of 36 genes that discriminate the mixed cluster. This gene set could be associated with tumor classification into a higher stage, and it contained a significant number of genes coding for molecular transporters, channel proteins and transmembrane proteins. External data from TCGA, used to test our findings, confirmed that the expression levels of those 36 genes varied significantly between T1 and T3 tumors. In conclusion, we found a clustering pattern of gene expression that is informative about heterogeneity among T1 and T3 tumors of clear cell renal cell carcinoma.


Introduction

Renal tumors rank as the 20th most common malignancy worldwide, based on both incidence and death rates [1]. Clear cell renal cell carcinoma (ccRCC) is the most frequent renal tumor (80–90% of cases) [2,3]. Multiple morphotypes have been described within RCC [4,5] and a growing body of evidence suggests that those morphotypes represent different molecular entities [6–8].

There are several classification systems used to describe renal tumors. Grading is performed using the Fuhrman system, which is based on nuclear and nucleolar features and was recently modernized by the International Society of Urologic Pathology [4]. The most important factor for prognosis is the stage of the tumor, which is evaluated by the American Joint Committee on Cancer / The Union for International Cancer Control (AJCC/UICC) TNM system [9]. Although ccRCC cases are usually diagnosed at early stages (in the TCGA database, T1 stage represents 48% of all ccRCC cases), clinical outcomes remain heterogeneous within each staging group, suggesting the existence of molecular features unaccounted for by pathology assessment [6,7]. A significant challenge is that metastatic potential and clinical outcome are not well correlated with tumor size and stage [6,7].

In current molecular classifications, ccRCC samples are classified into two groups [10]. The authors annotate those clusters ccA and ccB and state that ccA tumors have markedly improved disease-specific survival compared to ccB. Their analysis suggests that the proposed classification was independently associated with survival. However, the heterogeneity within the described clusters is significant. An important step in the progression of cancer is extension of the tumor beyond the natural limits of the affected organ. In the current classification, T1 and T2 tumors differ by their size only, and both are confined to the kidney, while both T3 and T4 tumors extend beyond this organ. Therefore, we decided to select T1 and T3 samples for our study. We aim to verify whether gene expression patterns reflect the stage of the disease and to investigate the heterogeneity of gene expression within the current classification systems in T1 and T3 tumors. Gene expression in ccRCC has been studied extensively in the past (exemplary papers: [11–13]). Our study provides additional information on heterogeneity within samples from different tumor stages and points towards potential mechanisms of transition between these stages.

Materials and methods

Sample collection

Twenty-three ccRCC tumor samples were collected during radical nephrectomy at the Department of Urology, Jagiellonian University Medical College (JUMC). Samples were fixed with formalin and embedded in paraffin at the Department of Pathology for microscopic evaluation and transferred to the Center for Medical Genomics OMICRON for gene expression studies. The study was approved by the Bioethics Committee of the Jagiellonian University.

All patients signed written informed consent forms. Experiments conform to the provisions of the Declaration of Helsinki of 1995 (as revised in Edinburgh 2000). Patient tumors were classified into T1 (13) or T3 (10) stages by a pathologist and independently re-evaluated. Selecting T1 and T3 tumors as the basis of sample collection made it possible to investigate the clinically most frequent specimens. In addition, each study group remains homogeneous, and the sample selection parallels the restriction of tumors to the kidney in the T1 group and their extension beyond the kidney in the T3 group. Additional clinical data were collected, along with immunohistochemical information, and are summarized in S1 Table.

RNA isolation

RNA was isolated from 10 x 5 μm slides from a Formalin-Fixed Paraffin-Embedded (FFPE) block, using the Maxwell 16 FFPE Tissue LEV DNA Purification Kit (Promega). Briefly, 300 μl of Mineral Oil and 250 μl of lysis master mix were added per sample and incubated at 56°C for 15 min and subsequently at 80°C for 1 hour. DNA was degraded by DNase I treatment (15 min, RT). The aqueous phase was transferred to a Maxwell FFPE Cartridge and RNA was isolated according to the Promega RNA FFPE protocol. 50 μl of Nuclease-Free Water was used for RNA elution. The RNA quantity was measured using a NanoDrop 1000 (Thermo Scientific) device and quality was assessed on a 2200 TapeStation System (Agilent, RNA ScreenTape), according to the manufacturer's instructions. The DV200 parameter, describing the percentage of RNA fragments longer than 200 bases, was used for sample classification (S1 Table). Samples with DV200 > 30% were classified as suitable for further analysis.
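The DV200 criterion above amounts to a simple threshold check. The following is a minimal Python sketch of that rule only; the sample names and DV200 values are hypothetical, and the study itself read these values from the TapeStation output rather than from code like this.

```python
# Minimal sketch of the DV200 quality gate described above.
# Sample names and DV200 values are hypothetical, for illustration only.
DV200_THRESHOLD = 30.0  # percent; threshold stated in the text


def passes_dv200(dv200_percent, threshold=DV200_THRESHOLD):
    """Return True when the DV200 value is strictly above the threshold."""
    return dv200_percent > threshold


dv200_by_sample = {"sample_A": 45.2, "sample_B": 28.9, "sample_C": 30.0}
suitable = sorted(name for name, dv in dv200_by_sample.items() if passes_dv200(dv))
print(suitable)
```

The comparison is strict, so a sample at exactly 30% is not retained under this reading of "DV200 > 30%".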

Whole genome DASL assay

The Illumina Whole Genome-DASL assay was performed using 200 ng of RNA following the manufacturer's instructions. Briefly, RNA was reverse transcribed to cDNA using biotinylated primers, followed by immobilization to streptavidin-conjugated paramagnetic particles. Biotinylated cDNAs were then simultaneously annealed to a set of assay-specific oligonucleotides. Extension and ligation of the annealed oligonucleotides generated PCR templates that were amplified using Titanium Taq DNA Polymerase (Clontech). Labeled PCR products were washed and denatured to yield single-stranded fluorescent molecules, which were hybridized to the HumanHT12 v4.0 Whole Genome Gene Expression BeadChips for 16 h at 58°C. The Illumina HiScan was used to scan the arrays.

Data analysis

Microarray data in *.IDAT format were loaded and pre-processed in the R environment. The ‘beadarray’ package was used to load the data and ‘lumi’ was used for normalization and filtering.

Differential expression analysis

The differentially expressed probes were detected via the Generalized Linear Model framework implemented in the package 'limma'.

For the comparison between T1 and T3 groups as well as the groups defined via hierarchical clustering (A1, A2, A3) the functions '' and 'eBayes' were used. For analysis of differential expression in the TCGA cohort the framework implemented in the package 'edgeR' was utilized. Gene counts were normalized with the default options and subsequently filtered to reduce the number of hypotheses tested. After estimating the dispersion parameter, the Generalized Linear Model was fitted and tests for coefficients were performed. Since this was used as a replication cohort, we have only recorded the number of genes differentially expressed between the two study groups with the standard level of statistical significance 0.05.

Hierarchical clustering

The 23 T1 and T3 samples were clustered based on expression of 543 probes. To this aim the function 'hclust' with complete linkage as implemented in the 'heatmap.2' procedure was used. The noticeable pattern where the dendrogram is divided into three main groups was further confirmed with the use of the 'cutreeDynamic' function in the 'dynamicTreeCut' package. The faithfulness of clustering was evaluated using the cophenetic correlation coefficient.
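To make the two ingredients above concrete (complete linkage and the cophenetic correlation coefficient), here is a small pure-Python sketch. It is not the authors' R code: the study used 'hclust' through 'heatmap.2', and the Euclidean distance, the toy points and all function names below are assumptions made for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points):
    """Agglomerative clustering with complete linkage.

    Returns (dist, coph): the original pairwise distances and the cophenetic
    distances (height at which each pair is first merged), keyed by (i, j), i < j.
    """
    n = len(points)
    dist = {(i, j): euclidean(points[i], points[j])
            for i, j in combinations(range(n), 2)}
    clusters = {i: [i] for i in range(n)}
    coph = {}

    def linkage(a, b):
        # Complete linkage: the largest distance between members of a and b.
        return max(dist[tuple(sorted((i, j)))]
                   for i in clusters[a] for j in clusters[b])

    next_id = n
    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: linkage(*p))
        height = linkage(a, b)
        for i in clusters[a]:
            for j in clusters[b]:
                coph[tuple(sorted((i, j)))] = height
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return dist, coph


def pearson(xs, ys):
    # Assumes neither input is constant (non-zero variance).
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cophenetic_correlation(points):
    """Correlation between original and cophenetic pairwise distances."""
    dist, coph = complete_linkage(points)
    keys = sorted(dist)
    return pearson([dist[k] for k in keys], [coph[k] for k in keys])


# Hypothetical toy data: two well-separated groups of two points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
c = cophenetic_correlation(toy_points)
print(round(c, 3))
```

A coefficient close to 1 indicates that the dendrogram preserves the original pairwise distances well, which is the sense in which the text uses it as a measure of clustering faithfulness.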

Both the T1 and the T3 samples were clustered based on normalized gene expression values (pseudocounts) generated with the 'edgeR' package. To overcome the issue of the Euclidean metric being driven by highly expressed genes, the Renyi divergence function was used as the measure of similarity. Renyi divergence was previously used by the authors of [14] in the context of liver cancer. Once the similarity matrix was estimated, hierarchical clustering was performed as implemented in the function 'hclust'. The optimal number of clusters on each dendrogram was established via analysis of the gap statistic as implemented in the function 'clusGap'.
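For readers unfamiliar with the measure, the Renyi divergence of order α between two discrete distributions P and Q is D_α(P‖Q) = (1/(α − 1)) · log Σ_i p_i^α q_i^(1−α). The sketch below computes it in Python for two expression profiles rescaled to sum to one. The order α = 0.5, the rescaling step and the example values are assumptions for illustration; the text does not state which order or normalization was used.

```python
import math


def renyi_divergence(p, q, alpha=0.5):
    """Renyi divergence of order alpha (0 < alpha < 1) between two
    non-negative profiles, each rescaled to sum to one."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("this sketch only covers 0 < alpha < 1")
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # Terms with a zero in either profile contribute nothing when alpha < 1.
    s = sum((pi ** alpha) * (qi ** (1.0 - alpha))
            for pi, qi in zip(p, q) if pi > 0 and qi > 0)
    if s == 0.0:
        return math.inf  # profiles with disjoint support
    return math.log(s) / (alpha - 1.0)


# Hypothetical pseudocount profiles for two samples over four genes.
sample_x = [120.0, 30.0, 5.0, 45.0]
sample_y = [100.0, 60.0, 10.0, 30.0]
print(renyi_divergence(sample_x, sample_y))
```

Because each profile is rescaled, the divergence compares relative expression composition rather than absolute magnitude, which is one way to limit the influence of highly expressed genes.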

Dimension reduction by the t-SNE algorithm

The t-SNE algorithm was used as implemented in the R package 'Rtsne', with all default parameters except for 'perplexity', where 7 was chosen as the value that is expected to produce the least number of 'groupings' for the sample size of 23.

The ROC analysis was based on logistic models with the indicator of the event that the sample is T3 used as the dependent variable. For each of the probes used for hierarchical clustering, 300 random training and testing sets were selected (each time the testing set was of size 7) and the ROC curve as well as the AUC were calculated as implemented in the packages 'ROCR' and 'OptimalCutpoints'. Subsequently, for each probe the median AUC was calculated for each sample (taken as the median AUC over all testing sets which contained a given sample). For each sample, the 'goodness' of classification was quantified as the median of these median AUC values over all probes.
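The resampling scheme above can be sketched in a few lines of Python. This is an illustration under stated simplifications, not the authors' 'ROCR' pipeline: instead of fitting a logistic model, the direction of each probe is taken from the training-set group means (for a single predictor this matches the sign of the logistic slope, and the test-set AUC depends only on that sign), and the function names, seed and toy data are hypothetical.

```python
import random
from statistics import mean, median


def auc(scores, labels):
    """Rank-based AUC: chance that a T3 sample (label 1) scores higher than a
    T1 sample (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_sample_median_auc(values, labels, n_splits=300, test_size=7, seed=0):
    """For one probe: median test-set AUC over the splits containing each sample."""
    rng = random.Random(seed)
    n = len(values)
    aucs = {i: [] for i in range(n)}
    for _ in range(n_splits):
        test = rng.sample(range(n), test_size)
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        # Skip splits where either part lacks one of the two classes.
        if len({labels[i] for i in test}) < 2 or len({labels[i] for i in train}) < 2:
            continue
        m1 = mean(values[i] for i in train if labels[i] == 1)
        m0 = mean(values[i] for i in train if labels[i] == 0)
        sign = 1.0 if m1 >= m0 else -1.0  # direction learned on training data only
        a = auc([sign * values[i] for i in test], [labels[i] for i in test])
        for i in test:
            aucs[i].append(a)
    return {i: median(v) for i, v in aucs.items() if v}


def sample_goodness(probes, labels, **kwargs):
    """Median over probes of the per-sample median AUC."""
    per_probe = [per_sample_median_auc(v, labels, **kwargs) for v in probes]
    out = {}
    for i in range(len(labels)):
        vals = [p[i] for p in per_probe if i in p]
        if vals:
            out[i] = median(vals)
    return out


# Hypothetical toy data: 13 T1 (label 0) and 10 T3 (label 1) samples, with
# one probe higher in T3 and one probe lower in T3, both perfectly separating.
labels = [0] * 13 + [1] * 10
probe_up = [0.1 * i for i in range(13)] + [5.0 + 0.1 * i for i in range(10)]
probe_down = [-v for v in probe_up]
goodness = sample_goodness([probe_up, probe_down], labels)
print(min(goodness.values()), max(goodness.values()))
```

With real data, samples whose 'goodness' is systematically lower are those that tend to appear in test sets that the single-probe models classify poorly.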

Pathway enrichment

Pathway Enrichment analysis was performed using the ‘ClueGO’ plugin for Cytoscape 3.3.0 [15]. For all analyses, unless otherwise specified, default Advanced Term/Pathway Selection options were used with Benjamini-Hochberg p-value correction.


Results

We analyzed 23 ccRCC samples on a microarray platform. Our samples belonged to the T1 and T3 stages, as the T2 and T4 stages are rarely diagnosed (only 69 (13%) T2 and 11 (2%) T4 samples in the TCGA database). Our main interests were (1) to test whether gene expression reflects the histological classification of the JUMC samples (in particular, whether the gene expression pattern allows discrimination between T1 and T3 cases via unsupervised clustering) and (2) to determine whether we could find molecular features that reflect the observed diversity of disease progression.

Differential gene expression

Differential gene expression analysis comparing T3 vs. T1 samples resulted in 481 genes (543 probes, S2 Table) with adjusted p-value < 0.1 and 181 probes with adjusted p-value < 0.05. The most deregulated genes (36 genes, 41 probes), with |logFoldChange| > 1.5 and adjusted p-value < 0.05, comprise 24 under- and 12 over-expressed genes and are presented in Table 1 (a heatmap representing those genes is presented in S1 Fig). Two probes each were detected for GBA3, HAO2, SLC22A2, SLC5A10 (all downregulated) and STEAP3 (upregulated).

Table 1. Differentially expressed genes between T3 and T1 groups.

Positive and negative FC values correspond to the genes with higher or lower expression in T3 samples, respectively.
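The selection rule behind Table 1 (adjusted p-value < 0.05 and |logFoldChange| > 1.5) can be illustrated with a short Python sketch. It assumes that the adjusted p-values are Benjamini-Hochberg values, which the text states explicitly only for the pathway analysis; the probe names, p-values and logFC values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical probe-level results.
probes = ["probe_1", "probe_2", "probe_3", "probe_4"]
raw_p = [0.01, 0.04, 0.03, 0.005]
log_fc = [-2.1, 1.8, 0.9, 1.6]

adj_p = benjamini_hochberg(raw_p)
selected = [name for name, p, fc in zip(probes, adj_p, log_fc)
            if p < 0.05 and abs(fc) > 1.5]
print(selected)
```

In this toy example every probe passes the significance filter, and the fold-change filter alone removes the third probe.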

Pathway Enrichment performed on the differentially expressed gene set (adjusted p-value < 0.1, 481 genes) was narrowed down to terms in level 3 of the Gene Ontology (GO) hierarchy. This returned a list of enriched terms, presented in S2 Fig. Further narrowing the results with the ‘Use GO Term Fusion’ option reduced the list to 9 GO biological process terms (Fig 1A), including ‘kidney development’ with corrected p-value 1.31 × 10⁻³ (18 associated genes). Interestingly, genes associated with this term were downregulated in T3 samples vs. T1 samples. Analysis of genes with higher log-fold-change values and a more stringent adjusted p-value cut-off (0.05) (Table 1) revealed one enriched pathway (Fig 1B), ‘response to copper ion’, with three downregulated genes: aquaporin 1 (Colton blood group, AQP1), amine oxidase, copper containing 1 (AOC1) and aldolase B, Fructose-Bisphosphate (ALDOB).

Fig 1. Results of differential expression analysis performed on T3 vs. T1 samples.

A. Pathway enrichment comparison performed in the ClueGO plugin for Cytoscape software on 481 differentially expressed genes from the T3 vs. T1 comparison. Green–downregulated, pink–upregulated genes; the size of the node is inversely proportional to the term p-value. B. Pathway enrichment performed in the ClueGO plugin for Cytoscape on the gene set with |logFoldChange| > 1.5 and adjusted p-value < 0.05. C. Heatmap of differentially expressed genes in the T3 vs T1 comparison. Based on the expression pattern the samples were divided into three clusters. The color bar indicates which cluster the sample was assigned to: red–A1 (pure T1), green–A3 (pure T3), blue–A2 (mixed).

Sample clustering

Unsupervised hierarchical clustering, based on expression of 481 genes, divided the 23 T1 and T3 samples into three distinct clusters: A1, A2, and A3. Two of the clusters contain only T1 (A1) or only T3 (A3) samples, whereas the third cluster includes samples from both groups (A2). This three-cluster pattern (two 'pure' and one mixed) is not present when all (~34K) probes are used for analysis. Therefore, it is unlikely that it is due to batch effects.

As the distances between the clusters suggest that the A2 cluster is more closely related to A3, despite containing samples from both T1 and T3, we aimed to investigate which expression profiles characterize the A2 group. A heatmap presenting relative gene expression is shown in Fig 1C.

Dimension reduction by t-SNE algorithm in the context of sample clustering

To further test whether the pre-selection of features (based on differential expression) allows for faithful sample classification between T1 and T3, an additional machine-learning approach was adopted. Three sets of probes were used in this analysis: (1) the probes used for hierarchical clustering (aligned to 481 genes); (2) the top 40 differentially expressed probes, and (3) all 34476 probes. Subsequently, using these sets of features, samples were projected, using the t-SNE algorithm (see Methods section), onto a 3-dimensional space. For the unbiased case (all probes) no association between tumor size and the three components is present. Interestingly, for the two sets of pre-selected features, not only do we see a separation between T1 and T3 samples in the 3D space, but also a separation between the three clusters defined in the previous section. The results are presented in S3 Fig. Additionally, to further test the three-cluster pattern, we applied the UMAP algorithm [16] to project the entire dataset (~33000 probes) onto a 10-dimensional space. Subsequently, we selected the three dimensions for which the projection has the strongest association with the clinical diagnosis (T1 vs T3) and visualized the projected data. Interestingly, even in this agnostic approach (with features not being pre-selected) we see further support for the existence of the ‘intermediate cluster’ (see S4 Fig).

ROC-based classification of T1 and T3 samples in the context of sample clustering

To further test whether there are indeed samples that are more difficult to classify correctly as T1 or T3 (i.e. samples in the 'mixed cluster'), a ROC-based analysis was performed. For each of the probes aligned to 481 genes, the AUC was calculated for 300 random test subsets of size 7 for a (logistic) model fitted on the remaining 16 training samples. Subsequently, the median for each sample/probe was calculated and the median of these 500 numbers was assigned to each sample as a measure of 'goodness' of classification. Fig 2 includes violin plots for the 23 samples divided according to the three-cluster pattern. It is clear that the AUC in the 'mixed cluster' is lower than for the two remaining 'pure' clusters.

Fig 2. Violin plots of the median AUC based on 300 randomly selected training and testing sets for probes used for hierarchical clustering.

The boxplots present median and quartiles. The leftmost violin corresponds to the 'pure T1' cluster, the center corresponds to 'pure T3' and the rightmost to the 'mixed' cluster.

Differential variability and clustering faithfulness

In the current study, we use a relaxed threshold (p<0.1) when selecting probes for sample clustering. We wish to support this choice by demonstrating that probes which are differentially variable between the study groups are more informative about the clustering of samples than those with similar variances. To this aim, we first compare the variances between T1 and T3 tumors using Levene's test and detect six probes (ILMN_1762410 (SLC22A2), ILMN_1716246 (FRZB), ILMN_1677851 (RARRES1), ILMN_1746128 (ACSM2B), ILMN_3311035 (miR-1251) and ILMN_1793309 (BEND4)) with FDR below the standard 0.05 significance threshold. Secondly, we compare the cophenetic correlation coefficient for two different clusterings: (1) based on differentially expressed probes (p<0.1) with p-value in the Levene's test above the median, and (2) based on differentially expressed probes (p<0.1) with p-value in the Levene's test below the median. We note that the coefficient equals 0.72 for the first set of probes and 0.87 for the second. Note that both clusterings use the same number of probes.

Characterization of Intermediate Cluster

A2 vs A1.

First the A2 cluster was compared to A1. In total 13 genes with adj. p-value < 0.05 were found, with the largest log-fold-change = -1.98 achieved by interleukin 6.

A2 vs A3.

Secondly, A2 and A3 clusters were compared. 22 differentially expressed genes (adj. p-value < 0.05) were identified and the top 15 (with |log-fold-change|>1.5) of them are presented in Table 2. In ClueGO analysis no enriched pathways with at least three genes were found.

Table 2. List of differentially expressed genes in A2 vs A3 comparison.

A list of all differentially expressed probes between A2 and A3 is presented in S3 Table. Main groups/families of genes represented in the results are (trans)membrane proteins, ion-channel proteins or carrier proteins, suggesting a role of regulatory genes and modulation of signal transduction in the observed outcome heterogeneity.

A3 vs A1.

Differential expression analysis of the A3 and A1 clusters revealed a larger set of differentially expressed genes than A1 vs A2 and A3 vs A2. A list of 58 down-regulated and 101 up-regulated probes with |logFoldChange| > 1.5 is presented in S4 Table. ClueGO-based analysis resulted in the network depicted in Fig 3A and 3B. Interestingly, genes with lower expression in A3 are associated with GO terms related to morphogenesis and stress response, and those that are overexpressed are associated with metabolic processes.

Fig 3. Results of pathway enrichment analysis performed in ClueGO (plugin for Cytoscape software) for A3 vs A1 comparison.

A. GO interaction pathway with genes from A3 and A1 initial clusters. B. Indication whether the genes associated with given biological process were up- or down- regulated. Green—GO's associated with down-regulated genes, pink—GO's associated with up-regulated genes.

Validation of results with TCGA data.

Our sample size was relatively small; therefore, we used TCGA RNA-seq data as a larger replication cohort. Of the 481 genes differentially expressed between the T1 and T3 groups, 394 had expression levels available in the TCGA database. Illumina Probe IDs were converted to ENSG# using BioMart. A gene was considered for further analysis if it was expressed in at least 80% of samples and the median read count exceeded 10.
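The gene-level filter described above can be written as a short predicate. The Python sketch below assumes that "expressed" means a read count above zero, which the text does not define, and uses hypothetical count vectors.

```python
from statistics import median


def keep_gene(counts, min_fraction_expressed=0.8, min_median=10):
    """Keep a gene if it is expressed (count > 0, an assumption) in at least
    80% of samples and its median read count exceeds 10."""
    fraction_expressed = sum(1 for c in counts if c > 0) / len(counts)
    return fraction_expressed >= min_fraction_expressed and median(counts) > min_median


# Hypothetical read counts for three genes across five samples.
counts_by_gene = {
    "gene_kept": [0, 12, 15, 20, 30],           # 80% expressed, median 15
    "gene_sparse": [0, 0, 15, 20, 30],          # only 60% expressed
    "gene_low": [5, 6, 7, 8, 50],               # median 7
}
kept = [gene for gene, counts in counts_by_gene.items() if keep_gene(counts)]
print(kept)
```

Filtering of this kind reduces the number of hypotheses tested, in line with the filtering step mentioned in the Methods for the 'edgeR' analysis.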

T3 vs T1 comparison

Validation of our primary analysis revealed that almost 67% of the differentially expressed genes (264; non-adj. p-value < 0.05) were also differentially expressed in the TCGA RNA-seq data. We additionally note that the correlation coefficient for the logFC values between the two cohorts equals 0.78, as presented in Fig 4.

Fig 4. Correlation coefficient plot for the logFoldChanges between the two cohorts–UJ CM and TCGA.
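The two replication summaries above are simple to reproduce in principle. The Python sketch below recomputes the replicated fraction from the reported counts (264 of 394) and shows how a correlation between per-gene logFC values from two cohorts could be calculated. The Pearson coefficient is an assumption, since the text does not name the type of correlation, and the logFC vectors are hypothetical and do not reproduce the reported value of 0.78.

```python
import math

# Counts reported in the text.
replicated, testable = 264, 394
print(f"replicated fraction: {replicated / testable:.3f}")


def pearson(xs, ys):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-gene logFC values (T3 vs T1) in two cohorts.
logfc_cohort_a = [-2.0, -1.2, 0.4, 1.5, 2.1]
logfc_cohort_b = [-1.6, -1.0, 0.1, 1.1, 2.4]
r = pearson(logfc_cohort_a, logfc_cohort_b)
print(f"logFC correlation: {r:.2f}")
```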

Hierarchical clustering with Renyi divergence

The TCGA cohort was further used to test the observations made with the use of unsupervised classification. T1 and T3 samples from TCGA were clustered based on each of the 394 differentially expressed genes in the UJ CM cohort. These 394 clusters were then evaluated, using Renyi divergence measures, for heterogeneity, with the expectation that the genes driving the clustering observed in the UJ CM cohort would show evidence of heterogeneity. To this aim, differential expression analysis was performed (between samples in a given cluster versus the largest, reference cluster). The results of this analysis were compared to the set of 36 genes which discriminate A2 from A1 and A2 from A3.


Using the procedure described above, T1 samples were divided into 8 clusters (C1-C8), where C1 was the largest and was further used as the reference. Results of this analysis are presented in Table 3. Clusters 7 and 8 were excluded from further analysis due to sample size (i.e. the disproportion in the sample size in the case-control design versus the largest cluster).

Table 3. Results of differential expression analysis of T1 clusters with sufficient number of samples (Clusters 2 to 6) with cluster 1 (n = 103) as a reference and results of differential expression analysis of T3 clusters with sufficient number of members (Cluster 2 through 4), using Cluster 1 (n = 66) as a reference group.


T3 samples were also divided into 8 clusters. Clusters 5 to 8 were excluded from further analysis based on sample size. Results of the analysis are presented in Table 3.


Discussion

Clear cell renal cell carcinoma is the most frequent kidney neoplasm in adults, comprising 80–90% of cases of renal tumors [2]. A characteristic feature of ccRCC is the large heterogeneity of individual survival times and disease outcomes, even within the same TNM classification groups. Existing pathological classifications do not reflect the molecular basis of the disease [10]. The inability to predict treatment outcome and metastasis in ccRCC could be attributed to the molecular heterogeneity of tumor cells [6,7,17]. Since the high molecular heterogeneity within staging groups could implicitly account for treatment outcome and disease recurrence, we investigated the molecular landscape of ccRCC. We characterized differences in gene expression patterns between T1 and T3 stages in search of genes associated with the molecular heterogeneity of tumors. This approach aimed to identify genes which would be altered between the pure and mixed groups (A1 vs A2 and A3 vs A2). The detected genes are involved in regulatory processes and signal transduction. Therefore, we hypothesize that the sample heterogeneity can be accounted for by the accumulation of subtle deviations in metabolic processes caused by changes in gene expression. We repeated this analysis on the TCGA ccRCC cohort and confirmed 67% of our results. We verified the usability of the gene set to depict the molecular heterogeneity of ccRCC samples.

Differential gene expression analysis

Among the 36 differentially expressed genes identified between T3 and T1, several have known associations with ccRCC: TRPM3, AQP1, FBP1, ITPKA, LOX, TUBB3, IGFBP1, ALDOB [18–25], other cancer types: FLRT3, ACE2, OGDHL, EYA1, STEAP3, GPRC5A, COMP [26–32] or other renal diseases: MIOX, TINAG, ANGPTL3 [33–35].

One of the main goals of the study was to emphasize heterogeneity of expression patterns in the context of discrimination between study groups. Therefore, as noted in the Results section (see Differential variability and clustering faithfulness), we chose to relax the statistical significance threshold (from 0.05 to 0.1) to include in further analysis genes which have more heterogeneous expression profiles in our cohort and thus a higher chance of falling above the standard significance level.

Pathway enrichment analysis emphasized the role of copper metabolism, which is an important process in renal tissue in general and has a role in cancer development. However, the genes presented here do not take a direct part in the pathways concerned.

Other differentially expressed genes include molecular transporters (SLC22A12, SLC22A6, SLC22A2, SLC5A10), (trans)membrane proteins (AOC1, TMEM27, FLRT3, STEAP3, GPRC5A, TMEM145) and other channel proteins (TRPM3, AQP1) involved in regulation and signal transduction in cell metabolism and response to external stimuli. This suggests that dysregulation of signal transduction may be important in defining the observed diversity of ccRCC outcomes.

Sample clustering

Clustering of 23 samples, based on all significantly differentially expressed genes, revealed a partition of T1 and T3 samples into 3 distinct clusters (Fig 1C). Two of them (A1 and A3) contained only T1 and T3 samples respectively, whereas A2 contained samples from both T1 and T3. Interestingly, the gene expression profiles in A2 show no clear pattern of up- or downregulation, in contrast to the other two clusters. Therefore, we aimed to identify genes involved in molecular heterogeneity, i.e. differentially expressed between A1 vs A2 or A3 vs A2.

Comparison of the A2 cluster with the A1 cluster revealed a role for IL-6. Overexpression of IL-6 is associated with enhanced invasiveness and epithelial–mesenchymal transition (EMT), and IL-6 is involved in the JAK/STAT signaling pathway [36]. Although a lack of correlation between expression of this protein and tumor size or grade has been reported [37], our analysis provides further evidence of a regulatory role of IL-6 in clear cell renal cell carcinoma.

Comparison of the mixed cluster with the pure T3 cluster resulted in 15 genes with |logFoldChange| > 1.5. The 13 overexpressed genes were reported to play a role in RCC: PAX2, NAT8, GBA3, SLC22A2 [38–43], other cancer types: AOC1, HAO2, TMEM27 [44–47], cell death (NPR3 [48]) or kidney metabolism: TMEM171, CYS1 [49,50]. One of the two down-regulated genes, IGF2BP3, is not expressed in normal adult tissues and is known to promote tumor invasion and metastasis [45,51,52]. Some of these genes were also identified as differentially expressed in the T3 vs. T1 comparison: HAO2, AOC1, SLC22A2, GBA3, TMEM27, SLC5A10, NPR3, PAX2, IGF2BP3.

The differences shown here lead us to postulate that the isolated intermediate cluster reflects tumors that are less prone to metastasis and less aggressive. Several significantly deregulated genes (IL6, GBA3, TMEM27) show expression changes in the direction opposite to the changes described in the literature as associated with tumor progression and metastasis [32,38,43].

The 36 genes obtained from the A2 vs A1 and A2 vs A3 comparisons code for proteins associated with intracellular signaling and metabolic processes. They do not include driver genes or commonly known cancer master regulators, yet these modulators account for the observed sample heterogeneity. This is in line with the results of the T3 vs T1 comparison and underlines the significance of regulatory/modulatory genes in the progression of the disease.

Validation of results in TCGA ccRCC cohort

Use of the TCGA ccRCC cohort confirmed almost 70% of our results. We tested whether genes differentially expressed between A1/A3 and A2 can be used to measure heterogeneity in a larger dataset. For that purpose, we clustered TCGA ccRCC T1 and T3 samples separately and used the A2-specific genes as an input for differential expression analysis. We found that expression changes in a gradual fashion across T3 clusters (Fig 5), suggesting growing dysregulation. Interestingly, the largest T1 clusters (clusters 2 and 5) show contradictory changes in expression, suggesting opposite directions of regulatory processes in these samples.

Fig 5. Results of Renyi divergence analysis performed on T1 and T3 data from TCGA database.

The gene set resulting from comparisons of the three presented clusters was used as an input for the clustering method. Next, differential expression was calculated in comparison to the largest cluster obtained. Heatmaps show the logFoldChange of the selected gene set. A. Clusters of T1 samples, B. clusters of T3 samples.

In conclusion, we propose that expression of certain RNAs can be used to study the molecular basis of the heterogeneity of ccRCC.

We have found a clustering pattern reflecting heterogeneity of samples. Furthermore, we detected genes associated with diversity of ccRCC samples. We postulate that genes associated with regulatory or signal transduction modulation roles are related to diverse representations of ccRCC occurring regardless of the histological classifications. Further functional research is needed to test these observations.

Supporting information

S1 Table. Clinical parameters of analyzed samples.

T, N, M–classification of samples, T_—expanded T classification, diameter–measured in the widest dimension, Grade–ISUP modified Fuhrman grade, survival time–calculated as the number of days between collection date and date of death (calculated when applicable), mdm2 –result of histochemical staining of mdm2 protein, p53—result of histochemical staining of p53 protein, procedure–name of the procedure at which the sample was obtained, necrosis–was the tumor tissue necrotic, DV 200 –Illumina proposed parameter for description of quality of FFPE derived RNA samples (over 30% qualifies sample as sufficient for further analysis).


S2 Table. List of all differentially expressed probes in the T3 vs T1 comparison with adjusted p value under 0.01.

ILMN ID–Illumina Probe ID, logFC–logFoldChange of probe expression, AveExpr–average expression of the given probe, P.Value–p value, adj.P.Val–p value adjusted for multiple testing.


S3 Table. List of differentially expressed genes in A2 vs A1 and A2 vs A3 comparisons.

All probes that passed the adjusted p value < 0.05 cut-off. ILMN ID–Illumina probe ID, logFC–log Fold Change, AveExpr–average probe expression value, P.Value–p value, adj.P.Val–p value adjusted for multiple testing.


S4 Table. List of differentially expressed genes in A3 vs A1 comparison.

All probes that passed both the adjusted p value < 0.05 and logFC > 1.5 cut-offs. ILMN ID–Illumina probe ID, logFC–log Fold Change, AveExpr–average probe expression value, P.Value–p value, adj.P.Val–p value adjusted for multiple testing.
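The cut-offs used for S3 and S4 Tables can be expressed as a simple row filter. The sketch below assumes each result row is a dictionary with `logFC` and `adj.P.Val` keys, as in the column descriptions above. The probe identifiers and values are hypothetical, and the `absolute` switch is an assumption added for illustration, since the caption does not state whether the logFC cut-off was applied to absolute values.

```python
def filter_probes(rows, max_adj_p=0.05, min_logfc=None, absolute=False):
    """Keep rows whose adjusted p value and (optionally) logFC pass the cut-offs.

    rows      -- list of dicts with "logFC" and "adj.P.Val" keys
    max_adj_p -- rows are kept only if adj.P.Val is strictly below this value
    min_logfc -- if given, rows are kept only if logFC is strictly above it
    absolute  -- if True, compare |logFC| instead of logFC (assumption)
    """
    kept = []
    for row in rows:
        if row["adj.P.Val"] >= max_adj_p:
            continue
        if min_logfc is not None:
            effect = abs(row["logFC"]) if absolute else row["logFC"]
            if effect <= min_logfc:
                continue
        kept.append(row)
    return kept


# Hypothetical result rows; these are not taken from the supporting tables.
example_rows = [
    {"id": "probe_a", "logFC": 2.1, "adj.P.Val": 0.001},
    {"id": "probe_b", "logFC": 0.4, "adj.P.Val": 0.010},
    {"id": "probe_c", "logFC": 3.0, "adj.P.Val": 0.200},
    {"id": "probe_d", "logFC": -2.5, "adj.P.Val": 0.003},
]

# S3-style filter: adjusted p value only.
print([r["id"] for r in filter_probes(example_rows)])
# S4-style filter: adjusted p value and logFC.
print([r["id"] for r in filter_probes(example_rows, min_logfc=1.5)])
```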


S1 Fig. Heatmap of differentially expressed genes (24 under- and 12 over-expressed) in T3 vs T1 comparison.

Cut-off p-value 0.05. Blue–underexpressed, red–overexpressed genes. Based on the expression pattern the samples were divided into three clusters. Colour bar indicates what cluster the sample was assigned to: red–A1 (pure T1), green–A3 (pure T3), blue–A2 (mixed).


S2 Fig. Pathway Enrichment performed on the differentially expressed gene set.

Adjusted p-value < 0.1, 481 genes. Narrowed down to genes in level 3 of the Gene Ontology (GO) hierarchy.


S3 Fig. Results of analysis with the tSNE algorithm.

Three sets of probes were used in this analysis: (1) the probes used for hierarchical clustering (aligned to 481 genes); (2) the top 40 differentially expressed probes; and (3) all 34476 probes. Samples were projected onto a 3-dimensional space. For the unbiased case (all probes), no association between tumor size and the three components is present. Interestingly, for the two sets of pre-selected features, we see not only a separation between T1 and T3 samples in the 3D space, but also a separation between the three clusters defined in the previous section.


S4 Fig. Results of analysis using the UMAP algorithm.

The entire dataset (~33000 probes) was projected onto a 10-dimensional space. The three dimensions for which the projection has the strongest association with the clinical diagnosis (T1 vs T3) were selected, and the projected data were visualized. Interestingly, even in this agnostic approach (with features not pre-selected), we see further support for the appearance of the ‘intermediate cluster’.
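The selection of the three most informative dimensions can be sketched as follows. The caption does not state how association with the diagnosis was measured, so this example assumes the absolute Pearson correlation between each embedding dimension and a binary label (0 for T1, 1 for T3). The embedding is supplied as a plain list of coordinate rows, and all names and toy values are hypothetical.

```python
import math
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / math.sqrt(var_x * var_y)


def top_dimensions(embedding, labels, k=3):
    """Indices of the k embedding dimensions most associated with the labels.

    embedding -- list of rows, one per sample, each a list of coordinates
    labels    -- binary diagnosis per sample (assumed coding: 0 = T1, 1 = T3)
    """
    n_dims = len(embedding[0])
    scores = [
        abs_correlation([row[d] for row in embedding], labels)
        for d in range(n_dims)
    ]
    ranked = sorted(range(n_dims), key=lambda d: scores[d], reverse=True)
    return ranked[:k]


# Toy 4-dimensional embedding of four samples (hypothetical values).
toy_embedding = [
    [0.0, 1.0, 0.0, 5.0],
    [0.0, 2.0, 1.0, 5.0],
    [1.0, 1.0, 2.0, 5.0],
    [1.0, 2.0, 3.0, 5.0],
]
toy_labels = [0, 0, 1, 1]
print(top_dimensions(toy_embedding, toy_labels, k=2))
```

Other association measures, such as a t statistic or a rank-based test, would fit the same structure by replacing `abs_correlation`.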



This work used the HPC resources provided by the Ohio Supercomputer Center (GRANT #: PAS0885-2). This research was supported in part by PLGrid Infrastructure.

The results shown here are in part based upon data generated by the TCGA Research Network.


  1. 1. Ferlay J, Soerjomataram I, Dikshit R, Eser S, Mathers C, Rebelo M, et al. Cancer incidence and mortality worldwide: Sources, methods and major patterns in GLOBOCAN 2012. Int J Cancer. 2015;136: E359–E386. pmid:25220842
  2. 2. Ridge CA, Pua BB, Madoff DC. Epidemiology and staging of renal cell carcinoma. Semin Intervent Radiol. 2014;31: 3–8. pmid:24596434
  3. 3. Riazalhosseini Y, Lathrop M. Precision medicine from the renal cancer genome. Nat Rev Nephrol. Nature Publishing Group; 2016;12: 655–666. pmid:27694978
  4. 4. Delahunt B, Cheville JC, Martignoni G, Humphrey PA, Magi-Galluzzi C, McKenney J, et al. The International Society of Urological Pathology (ISUP) grading system for renal cell carcinoma and other prognostic parameters. Am J Surg Pathol. 2013;37: 1490–1504. pmid:24025520
  5. 5. Sircar K, Rao P, Jonasch E, Monzon FA, Tamboli P. Contemporary approach to diagnosis and classification of renal cell carcinoma with mixed histologic features. Chinese Journal of Cancer. 2013. pp. 303–311. pmid:23237216
  6. 6. Steffens S, Junker K, Roos FC, Janssen M, Becker F, Henn D, et al. Small renal cell carcinomas—how dangerous are they really? Results of a large multicenter study. Eur J Cancer. 2014;50: 739–745. pmid:24321262
  7. 7. Klatte T, Patard J-J, de Martino M, Bensalah K, Verhoest G, de la Taille A, et al. Tumor size does not predict risk of metastatic disease or prognosis of small renal cell carcinomas. J Urol. 2008;179: 1719–1726. pmid:18343437
  8. 8. Williamson SR, Cheng L. Clear cell renal cell tumors: Not all that is “clear” is cancer. Urol Oncol Semin Orig Investig. Elsevier; 2016;34: 292.e17–292.e22. pmid:26988177
  9. 9. Amin MB, Edge SB, American Joint Committee on Cancer. AJCC cancer staging manual. 7th ed. Springer International Publishing; 2010.
  10. 10. Brannon A, Reddy A, Seiler M, Arreola A, Moore DT, Pruthi RS, et al. Molecular Stratification of Clear Cell Renal Cell Carcinoma by Consensus Clustering Reveals Distinct Subtypes and Survival Patterns. Genes Cancer. 2010;1: 152–163. pmid:20871783
  11. 11. Higgins JPT, Shinghal R, Gill H, Reese JH, Terris M, Cohen RJ, et al. Gene expression patterns in renal cell carcinoma assessed by complementary DNA microarray. Am J Pathol. 2003;162: 925–32. pmid:12598325
  12. 12. Takahashi M, Yang XJ, Sugimura J, Backdahl J, Tretiakova M, Qian C-N, et al. Molecular subclassification of kidney tumors and the discovery of new diagnostic markers. Oncogene. 2003;22: 6810–6818. pmid:14555994
  13. 13. Zhao H, Ljungberg B, Grankvist K, Rasmuson T, Tibshirani R, Brooks JD. Gene expression profiling predicts survival in conventional renal cell carcinoma. Marincola F, editor. PLoS Med. 2006;3: e13. pmid:16318415
  14. 14. Pietrzak M, Rempała GA, Seweryn MM, Wesołowski J. Limit theorems for empirical Rényi entropy and divergence with applications to molecular diversity analysis. Test. Springer Berlin Heidelberg; 2016;25: 654–673.
  15. 15. Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A, et al. ClueGO: A Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. Oxford University Press; 2009;25: 1091–1093. pmid:19237447
  16. 16. McInnes L, Healy J, Melville J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. 2018; Available:
  17. 17. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature. 2013;499: 43–49. pmid:23792563
  18. 18. Hall DP, Cost NG, Hegde S, Kellner E, Mikhaylova O, Stratton Y, et al. TRPM3 and miR-204 establish a regulatory circuit that controls oncogenic autophagy in clear cell renal cell carcinoma. Cancer Cell. 2014;26: 738–753. pmid:25517751
  19. 19. Morrissey JJ, Kharasch ED. The Specificity of Urinary Aquaporin 1 and Perilipin 2 to Screen for Renal Cell Carcinoma. J Urol. 2013;189: 1913–1920. pmid:23154208
  20. 20. Ning X-H, Li T, Gong Y-Q, He Q, Shen QI, Peng S-H, et al. Association between FBP1 and hypoxia-related gene expression in clear cell renal cell carcinoma. Oncol Lett. 2016;11: 4095–4098. pmid:27313747
  21. 21. Liu Q, Zhao S, Su P-F, Yu S. Gene and isoform expression signatures associated with tumor stage in kidney renal clear cell carcinoma. BMC Syst Biol. BioMed Central Ltd; 2013;7: S7. pmid:24564989
  22. 22. Klatt MG, Kowalewski DJ, Schuster H, Di Marco M, Hennenlotter J, Stenzl A, et al. Carcinogenesis of renal cell carcinoma reflected in HLA ligands: A novel approach for synergistic peptide vaccination design. Oncoimmunology. 2016;5: e1204504. pmid:27622074
  23. 23. Quaas A, Rahvar A-H, Burdelski C, Koop C, Eichelberg C, Rink M, et al. βIII-tubulin overexpression is linked to aggressive tumor features and shortened survival in clear cell renal cell carcinoma. World J Urol. 2015;33: 1561–1569. pmid:25527909
  24. 24. Ibanez de Caceres I, Dulaimi E, Hoffman AM, Al-Saleem T, Uzzo RG, Cairns P. Identification of novel target genes by an epigenetic reactivation screen of renal cancer. Cancer Res. 2006;66: 5021–5028. pmid:16707423
  25. 25. Sanders E, Diehl S. Analysis and interpretation of transcriptomic data obtained from extended Warburg effect genes in patients with clear cell renal cell carcinoma. Oncoscience. 2015;2: 151–186. pmid:25859558
  26. 26. Packer LM, Pavey SJ, Boyle GM, Stark MS, Ayub AL, Rizos H, et al. Gene expression profiling in melanoma identifies novel downstream effectors of p14ARF. Int J Cancer. 2007;121: 784–790. pmid:17450523
  27. 27. Larrinaga G, Pérez I, Sanz B, Blanco L, López JI, Cándenas ML, et al. Angiotensin-converting enzymes (ACE and ACE2) are downregulated in renal tumors. Regul Pept. Elsevier B.V.; 2010;165: 218–223. pmid:20692300
  28. 28. Sen T, Sen N, Noordhuis MG, Ravi R, Wu T-C, Ha PK, et al. OGDHL is a modifier of AKT-dependent signaling and NF-κB function. Li J, editor. PLoS One. 2012;7: e48770. pmid:23152800
  29. 29. Sun Y, Li X. The canonical wnt signal restricts the glycogen synthase kinase 3/fbw7-dependent ubiquitination and degradation of eya1 phosphatase. Mol Cell Biol. 2014;34: 2409–2417. pmid:24752894
  30. 30. Grunewald TGP, Bach H, Cossarizza A, Matsumoto I. The STEAP protein family: Versatile oxidoreductases and targets for cancer immunotherapy with overlapping and distinct cellular functions. Biol Cell. 2012;104: 641–657. pmid:22804687
  31. 31. Tao Q, Fujimoto J, Men T, Ye X, Deng J, Lacroix L, et al. Identification of the retinoic acid-inducible Gprc5a as a new lung tumor suppressor gene. J Natl Cancer Inst. 2007;99: 1668–1682. pmid:18000218
  32. 32. Wang L, Diao H, Zhou H, Li X, Chen Q, Jiang Z, et al. Cartilage oligomeric matrix protein (COMP)-mediated cell differentiation to proteolysis mechanism networks from human normal adjacent tissues to lung adenocarcinoma. Anal Cell Pathol. 2013;36: 93–105. pmid:24064399
  33. 33. Croze ML, Soulage CO. Potential role and therapeutic interests of myo-inositol in metabolic diseases. Biochimie. Elsevier Masson SAS; 2013;95: 1811–1827. pmid:23764390
  34. 34. Takemura Y, Koshimichi M, Sugimoto K, Yanagida H, Fujita S, Miyazawa T, et al. A tubulointerstitial nephritis antigen gene defect causes childhood-onset chronic renal failure. Pediatr Nephrol. 2010;25: 1349–1353. pmid:20157734
  35. 35. Shoji T, Hatsuda S, Tsuchikura S, Kimoto E, Kakiya R, Tahara H, et al. Plasma angiopoietin-like protein 3 (ANGPTL3) concentration is associated with uremic dyslipidemia. Atherosclerosis. 2009;207: 579–584. pmid:19540497
  36. 36. Kamińska K, Czarnecka AM, Escudier B, Lian F, Szczylik C. Interleukin-6 as an emerging regulator of renal cell cancer. Urol Oncol Semin Orig Investig. 2015;33: 476–485. pmid:26296264
  37. 37. Paule B, Belot J, Rudant C, Coulombel C, Abbou CC. The importance of IL-6 protein expression in primary human renal cell carcinoma: an immunohistochemical study. J Clin Pathol. 2000;53: 388–390. pmid:10889822
  38. 38. Papadopoulos EI, Petraki C, Gregorakis A, Chra E, Fragoulis EG, Scorilas A. L-DOPA decarboxylase mRNA levels provide high diagnostic accuracy and discrimination between clear cell and non-clear cell subtypes in renal cell carcinoma. Clin Biochem. 2015;48: 590–595. pmid:25721989
  39. 39. Doberstein K, Pfeilschifter J, Gutwein P. The transcription factor PAX2 regulates ADAM10 expression in renal cell carcinoma. Carcinogenesis. 2011;32: 1713–1723. pmid:21880579
  40. 40. Hueber PA, Iglesias D, Chu LL, Eccles M, Goodyer P. In vivo validation of PAX2 as a target for renal cancer therapy. Cancer Lett. 2008;265: 148–155. pmid:18439754
  41. 41. Zand B, Previs RA, Zacharias NM, Rupaimoole R, Mitamura T, Nagaraja AS, et al. Role of Increased n-acetylaspartate Levels in Cancer. J Natl Cancer Inst. 2016;108: djv426. pmid:26819345
  42. 42. Astudillo L, Therville N, Colacios C, Ségui B, Andrieu-Abadie N, Levade T. Glucosylceramidases and malignancies in mammals. Biochimie. 2016;125: 267–280. pmid:26582417
  43. 43. Yang J, Kalogerou M, Gallacher J, Sampson JR, Shen MH. Renal tumours in a Tsc1+/- mouse model show epigenetic suppression of organic cation transporters Slc22a1, Slc22a2 and Slc22a3, and do not respond to metformin. Eur J Cancer. 2013;49: 1479–1490. pmid:23228442
  44. 44. Kirschner KM, Braun JFW, Jacobi CL, Rudigier LJ, Persson AB, Scholz H. Amine Oxidase Copper-containing 1 (AOC1) is a downstream target gene of the Wilms tumor protein, WT1, during kidney development. J Biol Chem. 2014;289: 24452–24462. pmid:25037221
  45. 45. Taniuchi K, Furihata M, Hanazaki K, Saito M, Saibara T, Taniuchi K, et al. IGF2BP3-mediated translation in cell protrusions promotes cell invasiveness and metastasis of pancreatic cancer. Oncotarget. 2014;5: 6832–6845. 2257 [pii] pmid:25216519
  46. 46. Mattu S, Fornari F, Quagliata L, Perra A, Angioni MM, Petrelli A, et al. The metabolic gene HAO2 is downregulated in hepatocellular carcinoma and predicts metastasis and poor survival. J Hepatol. 2016;64: 891–898. pmid:26658681
  47. 47. Javorhazy A, Farkas N, Beothe T, Pusztai C, Szanto A, Kovacs G. Lack of TMEM27 expression is associated with postoperative progression of clinically localized conventional renal cell carcinoma. J Cancer Res Clin Oncol. 2016;142: 1947–1953. pmid:27417314
  48. 48. Lin D, Chai Y, Izadpanah R, Braun SE, Alt E. NPR3 protects cardiomyocytes from apoptosis through inhibition of cytosolic BRCA1 and TNF-α. Cell Cycle. Taylor & Francis; 2016;15: 2414–2419. pmid:27494651
  49. 49. Köttgen A, Albrecht E, Teumer A, Vitart V, Krumsiek J, Hundertmark C, et al. Genome-wide association analyses identify 18 new loci associated with serum urate concentrations. Nat Genet. 2012;45: 145–154. pmid:23263486
  50. 50. Hou X, Mrug M, Yoder BK, Lefkowitz EJ, Kremmidiotis G, D’Eustachio P, et al. Cystin, a novel cilia-associated protein, is disrupted in the cpk mouse model of polycystic kidney disease. J Clin Invest. 2002;109: 533–540. pmid:11854326
  51. 51. Shantha Kumara H, Kirchoff D, Caballero OL, Su T, Ahmed A, Herath SA, et al. Expression of the cancer testis antigen IGF2BP3 in colorectal cancers; IGF2BP3 holds promise as a specific immunotherapy target. Oncoscience. 2015;2: 607–614. pmid:26244168
  52. 52. Lochhead P, Imamura Y, Morikawa T, Kuchiba A, Yamauchi M, Liao X, et al. Insulin-like growth factor 2 messenger RNA binding protein 3 (IGF2BP3) is a marker of unfavourable prognosis in colorectal cancer. Eur J Cancer. 2012;48: 3405–3413. pmid:22840368