Identification of a 4-microRNA Signature for Clear Cell Renal Cell Carcinoma Metastasis and Prognosis

Renal cell carcinoma (RCC) metastasis portends a poor prognosis and cannot be reliably predicted. Early determination of the metastatic potential of RCC may help guide proper treatment. We analyzed microRNA (miRNA) expression in clear cell RCC (ccRCC) to develop a miRNA expression signature that determines the risk of metastasis and prognosis. We used microarray technology to profile miRNA expression in 78 benign kidney and ccRCC samples. Using 28 localized and metastatic ccRCC specimens as the training cohort, together with univariate logistic regression and risk score methods, we developed a miRNA signature model in which the expression levels of miR-10b, miR-139-5p, miR-130b and miR-199b-5p were used to determine the status of ccRCC metastasis. We validated the signature on microarray data from an independent 40-sample testing cohort of primary ccRCCs of different stages. Among the testing cohort patients who had at least 5 years of follow-up if no metastasis developed, the signature showed a high sensitivity and specificity. The risk status was associated with cancer-specific survival. Using the miRNA most stably expressed across benign and tumorous kidney tissue as the internal reference for normalization, we successfully converted this signature to a quantitative PCR (qPCR)-based assay, which showed similarly high sensitivity and specificity. The 4-miRNA signature is associated with ccRCC metastasis and prognosis. The signature is ready for and will benefit from further large clinical cohort validation and has the potential for clinical application.


Introduction
Renal cell carcinoma (RCC) accounts for about 3% of all malignant tumors in adults. Its worldwide incidence and mortality are approximately 209,000 and 102,000 per year respectively, including approximately 39,000 new cases and 13,000 deaths in the United States. [1] Clear cell RCC (ccRCC) is the most common renal cancer histology, comprising 70-80% of all RCC cases. [2] About 30% of patients with newly diagnosed disease have evidence of metastases at presentation. [3] In the setting of metastasis, few patients achieve a durable remission with currently available therapies, with a response rate of about 15-25% and an overall median survival of less than one year. [1] RCC metastasis cannot be reliably predicted from patients' clinical manifestations, pathologic findings or other currently available laboratory tests. Although several algorithms have been used to predict clinical outcome for patients with metastatic RCC (mRCC) on the basis of clinical and pathologic features, these do not incorporate the more complex biological features of individual patients. [2,4] Recent studies have shown that the metastatic capability of cancer is conferred by genetic changes occurring relatively early in tumorigenesis and that metastatic dissemination may occur continually throughout the course of primary tumor development. [5][6][7] In light of this, it is scientifically and clinically relevant to identify metastasis-specific molecular biomarkers at the time of nephrectomy to predict ccRCC metastasis. Early identification of ccRCC metastatic potential may allow a more precise prediction of clinical outcomes and may ultimately be used to identify subsets of patients who may benefit from specific targeted therapies. [1]
MicroRNAs (miRNAs) are a group of small non-coding RNAs that regulate gene expression during development and differentiation. [8] Alteration of miRNA expression has been shown in malignancies [9][10][11] and plays a critical role in tumorigenesis and cancer progression. [12][13] Studies have specifically shown that certain miRNAs play important roles in various steps of the metastatic cascade, such as the epithelial-mesenchymal transition (EMT), adhesion, migration, invasion, apoptosis and angiogenesis. [14][15] Since one miRNA can regulate the expression of multiple genes, miRNA expression profiles can be more accurate in cancer subtyping than RNA profiles of protein-coding genes. [16][17] Molecular signatures based on miRNA expression have been shown to aid in the diagnosis and prognostication of cancer. [18][19]
In this study, we used microarray technology to profile miRNA expression in benign kidney and ccRCC specimens. We analyzed the miRNA expression associated with metastasis in a training cohort to develop a 4-miRNA expression signature model that can determine the metastatic status and predict the cancer-specific survival of ccRCC patients. More importantly, this molecular signature has been validated in an independent testing cohort and has also been converted to a quantitative PCR (qPCR)-based assay. The signature is ready for and will benefit from further large clinical cohort validation and has the potential to be applied in a routine clinical setting.

Clinical Characterization of Patients' Specimens in the Training and Testing Cohorts
A set of benign kidney specimens (n = 10) and a 28-sample ccRCC training cohort including localized (pT1, n = 13) and metastatic (M1, n = 15) tumor samples were used to profile miRNA expression in ccRCC and to develop a signature associated with metastasis. In addition, an independent testing cohort of primary tumors from 40 ccRCC patients was used to validate the signature. At the time of nephrectomy, these patients had stage I (pT1, n = 6), II (pT2, n = 5), III (pT3, n = 13) and IV (N2 or M1, n = 16) diseases. In the testing cohort patient group, 35 (35/40) patients had been followed for at least 5 years if no metastasis developed. At presentation, 16 (16/35) had concurrent metastasis and 13 (13/35) developed metastasis in the follow-up period, while 6 (6/35) did not have metastatic disease during the follow-up period. The clinical characteristics of the specimens are summarized in Table 1 and Table S1.

Profiling of miRNA Expression in ccRCCs
Using the Agilent microarray technology, the miRNA expression of all of the benign kidney samples (n = 10) and the training cohort specimens (n = 28) was profiled. An unsupervised hierarchical clustering of these miRNA expression data separated the benign and tumor samples (Figure 1). With a cutoff of a 2-fold change and FDR ≤ 0.05, 56 miRNAs were found to be aberrantly expressed in ccRCCs; 29 were up-regulated and 27 were down-regulated (Table 2). Within the tumor group, 21 miRNAs were found to be differentially expressed between localized and metastatic specimens; 7 were up-regulated and 14 were down-regulated in the metastatic tumors (Table 3).
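The selection rule above (FDR ≤ 0.05 together with at least a 2-fold change) can be sketched as follows. This is only an illustration: it assumes the Benjamini-Hochberg procedure for the FDR adjustment, which the text does not name, it takes per-miRNA p values as already computed, and the miRNA names and numbers are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_mirnas(names, p_values, log2_fold_changes,
                  fdr_cutoff=0.05, min_fold=2.0):
    """Keep miRNAs passing both the FDR and the fold-change thresholds."""
    adjusted = benjamini_hochberg(p_values)
    min_log2 = math.log2(min_fold)
    return [name
            for name, q, lfc in zip(names, adjusted, log2_fold_changes)
            if q <= fdr_cutoff and abs(lfc) >= min_log2]


# Hypothetical inputs, not data from the study.
names = ["miR-a", "miR-b", "miR-c", "miR-d"]
p_values = [0.001, 0.02, 0.04, 0.5]
log2_fold_changes = [2.0, 0.5, -1.5, 3.0]
adjusted = benjamini_hochberg(p_values)
selected = select_mirnas(names, p_values, log2_fold_changes)
print(selected)
```

Working on the log2 scale makes the fold-change test symmetric, so up- and down-regulated miRNAs are handled by the same absolute-value comparison.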

Developing a 4-miRNA Signature Model for the Determination of the Status of ccRCC Metastasis
Patients with stage I (T1) ccRCC usually have a favorable clinical outcome, and their 5-year survival reaches 95% post nephrectomy. [20] In this study, T1 tumors were considered to be "good" tumors and served as the control samples for comparison with the metastatic ccRCCs. Using a univariate logistic regression test and Leave-One-Out cross validation (LOOCV) within the training set, the optimal p value cut-off for selecting the miRNAs associated with metastasis was determined. A range of p value cutoffs was tested in this LOOCV procedure, and p < 0.01 was chosen because it performed best among the cutoffs tested. Additionally, a difference of at least 2-fold between miRNA expression in metastatic and localized tumors was required, to retain the miRNAs that differed most between the two groups. Four miRNAs, miR-10b, miR-139-5p, miR-130b and miR-199b-5p, satisfied the above criteria and hence were selected to build a metastatic tumor signature. miR-199b-5p and miR-130b were over-expressed in metastatic tumors, while miR-10b and miR-139-5p were down-regulated (Figure 2A).
We used a risk score method to construct a signature model for ccRCC metastasis. [18] Specifically, the risk score formula is a linear combination of the expression levels of the 4 miRNAs, weighted by the regression coefficients derived from the univariate logistic regression analysis, as follows: $\text{Risk score} = -1.275564 \times X_{\text{miR-10b}} + 2.106701 \times X_{\text{miR-130b}} - 2.278192 \times X_{\text{miR-139-5p}} + 1.101139 \times X_{\text{miR-199b-5p}}$.
The next step was to determine a cut-off point for the risk score to stratify patients into groups at high or low risk for metastasis. The risk score of each patient in the training set was calculated using the signature model, and the false positive rate (FPR) and true positive rate (TPR) were computed over a range of cut-off scores. The cut-off point of -8.12 was selected since it gave the best FPR and TPR (Figure 3). Therefore, a 4-miRNA signature model was developed to determine the risk status of tumor metastasis, in which a score ≥ -8.12 indicates high risk.
Table 1. Clinical characteristics of patients and tumor specimens (n = 68) in the training and testing cohorts.
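The microarray-based risk score and its cutoff can be illustrated with a short sketch. The coefficients and the cutoff of -8.12 are those reported above, read with the signs implied by the direction of change of each miRNA in metastatic tumors; the expression values of the example specimen are hypothetical, and the function names are introduced here for illustration only.

```python
# Coefficients of the microarray-based model as reported in the text.
MICROARRAY_COEFFICIENTS = {
    "miR-10b": -1.275564,
    "miR-130b": 2.106701,
    "miR-139-5p": -2.278192,
    "miR-199b-5p": 1.101139,
}
HIGH_RISK_CUTOFF = -8.12


def risk_score(expression, coefficients=MICROARRAY_COEFFICIENTS):
    """Linear combination of expression levels weighted by the coefficients."""
    return sum(coefficients[name] * expression[name] for name in coefficients)


def risk_group(score, cutoff=HIGH_RISK_CUTOFF):
    """A score at or above the cutoff is classified as high risk."""
    return "high" if score >= cutoff else "low"


# Hypothetical normalized log2 expression values for one specimen.
example = {"miR-10b": 9.0, "miR-130b": 5.0,
           "miR-139-5p": 6.0, "miR-199b-5p": 7.0}
score = risk_score(example)
print(round(score, 2), risk_group(score))
```

Because miR-10b and miR-139-5p carry negative weights, lower expression of either one raises the score, in line with their down-regulation in metastatic tumors.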

Validation of the 4-miRNA Signature in an Additional Independent Testing Cohort
To validate the signature, we used the independent testing cohort of primary ccRCCs. Each specimen was predicted to be either high or low risk based on its risk score calculated with the signature. The predicted risk status for each patient was then compared to the clinical outcome. Of the 35 patients (35/40) who had at least 5 years of follow-up if no metastasis developed, 22 of 29 (22/29) with metastatic disease had primary tumors predicted to be high risk, while 6 of 6 (6/6) with no metastasis had tumors predicted to be low risk. This gave a sensitivity of 76% and a specificity of 100%. Specifically, 13 of 16 patients (81%) with concurrent metastasis were predicted to be of high risk; 9 of 13 (69%) with subsequent metastasis, including 2 of 5 (2/5, 40%) with T1/2 tumors and 8 of 9 (8/9, 89%) with T3 tumors, were predicted to be of high risk; and 6 of 6 (100%) without metastasis were predicted to be of low risk (Figure 4A). For all 40 patients with or without 5-year follow-up, the signature showed a sensitivity of 76% (22/29) and a specificity of 64% (7/11). If patients with concurrent metastasis (stage IV) were not included, the sensitivity was 69% (9/13) and the specificity remained the same (7/11, 64%). Of the additional 5 primary ccRCC specimens, 1 (1/5) was predicted to be of low risk and 4 (4/5) were predicted to be of high risk. However, these specimens were collected within the last two years, and whether these patients will develop metastasis is not known.
Interestingly, all 4 patients predicted to have high risk had stage III diseases and the 1 predicted to have low risk had stage I disease.
The risk score of each ccRCC specimen determined by the 4-miRNA signature model is associated with the status of metastasis (OR = 5.50, 95% CI = 1.23-24.51, p < 0.05). Other variables, such as a patient's sex, age, tumor grade and stage, did not reliably predict metastasis (Table 4).
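The sensitivity and specificity figures quoted in this section follow directly from the reported counts. The sketch below reproduces that arithmetic; the helper function names are introduced here for illustration only.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of metastatic cases predicted to be high risk."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of non-metastatic cases predicted to be low risk."""
    return true_negatives / (true_negatives + false_positives)


# 35 patients with adequate follow-up: 22 of 29 metastatic cases were
# predicted high risk and 6 of 6 non-metastatic cases low risk.
sens_followed = sensitivity(22, 7)
spec_followed = specificity(6, 0)

# All 40 patients: 7 of 11 cases without known metastasis were low risk.
spec_all = specificity(7, 4)

# Excluding concurrent (stage IV) metastasis: 9 of 13 predicted high risk.
sens_subsequent = sensitivity(9, 4)

print(f"{sens_followed:.0%} {spec_followed:.0%} "
      f"{spec_all:.0%} {sens_subsequent:.0%}")
```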

The 4-miRNA Signature Correlates with Overall Cancer-specific Survival
We were also interested in examining whether this signature model could be independently associated with cancer-specific survival. With patients in the combined training and testing cohorts (n = 68), a univariate Cox regression analysis showed that the predicted risk status was a significant prognostic factor for the patient's cancer-specific survival (Table 5). The relative risk for patients predicted to be of high risk was 12.68 compared to patients of low risk (HR = 12.68, 95% CI = 2.97-54.13, p < 0.0001). The stage of disease was the only other significant prognostic factor, while age, sex, tumor grade and size were not correlated with survival. Patients predicted to be of high risk had a 5-year survival rate of only 32%, whereas those of low risk had a 5-year survival rate of 84% (Figure 5A).

Converting the Microarray-based Signature to a RT-PCR Based Assay
The greatest challenge in RT-PCR based analysis of tissue miRNA expression is finding a reliable reference miRNA or small RNA for normalization. To further develop a 4-miRNA signature assay on an RT-PCR platform, the microarray data on miRNA expression in all of the benign and tumor kidney samples (n = 78) were analyzed. miR-24 was found to be the most stably expressed miRNA across all of these samples (Table 6). Therefore, miR-24 was selected as a reference miRNA for normalization.
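One simple way to rank candidate reference miRNAs by stability is sketched below. The text does not state which stability measure was used, so the standard deviation of log2 expression across samples is an assumption here, and the candidate names and values are hypothetical rather than data from the study.

```python
from statistics import pstdev


def most_stable_mirna(expression_by_mirna):
    """Return the miRNA whose log2 expression varies least across samples."""
    return min(expression_by_mirna,
               key=lambda name: pstdev(expression_by_mirna[name]))


# Hypothetical log2 expression values across five samples.
toy_expression = {
    "miR-24": [10.1, 10.0, 10.2, 10.1, 10.0],
    "candidate-B": [9.0, 9.8, 8.5, 9.4, 10.1],
    "candidate-C": [8.2, 8.9, 7.9, 8.6, 8.0],
}
reference = most_stable_mirna(toy_expression)
print(reference)
```

In practice the ranking would be run over benign and tumor samples together, since a reference that is stable only within one group would bias the comparison between groups.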
The expression of each of the 4 signature miRNAs in each specimen of the training and testing cohorts, normalized to that of miR-24, was analyzed using the ABI TaqMan MicroRNA Assay. As in the microarray study, a PCR-based risk score formula ($\text{Risk score} = 1.431559 \times X_{\text{miR-10b}} - 1.530509 \times X_{\text{miR-130b}} + 1.888144 \times X_{\text{miR-139-5p}} - 2.569280 \times X_{\text{miR-199b-5p}}$) was constructed using the training cohort, and the corresponding high risk score cut-off (-18.11) was determined (Figure 2B). The signature was then validated using the testing cohort. For the 35 primary tumor cases with follow-up, the sensitivity and specificity of the qPCR-based signature were 72% (21/29) and 100% (6/6). Among these cases, 12 of 16 (75%) with concurrent metastasis were predicted to be of high risk; 9 of 13 (69%) with subsequent metastasis, including 2/5 (40%) T1/2 and 8/9 (89%) T3 tumors, were predicted to be of high risk; and 6 of 6 (100%) without metastasis were predicted to be of low risk (Figure 4B). For all 40 cases with or without follow-up, the overall sensitivity and specificity were 72% (21/29) and 64% (7/11). The signature was also found to be significantly associated with cancer-specific survival (HR = 8.8, 95% CI = 2.62-29.58, p < 0.0001) (Figure 5B, Table 5).

Discussion
Generally, mRCC has an extremely poor prognosis. [1] Early identification of patients with high risk for cancer metastasis can enhance disease outcome prediction, stratify patients for suitable treatment and potential clinical trials and, ultimately, decrease cancer-specific mortality.
miRNAs play important roles in tumorigenesis and progression. Many miRNAs reported to be dysregulated in RCC were also seen in our current study. [21][22][23][24] Studies of cancer metastasis have shown that certain miRNAs, termed "metastamir", are specifically involved in the critical steps of the metastatic cascade and appear to be either pro-metastatic or anti-metastatic by regulating their target genes. [15] In the current study, we identified and used the altered expression of miR-10b, miR-130b, miR-139-5p and miR-199b-5p to generate a metastasis-specific signature. miR-139-5p is down-regulated in endometrial serous and gastric adenocarcinoma. [19,25] Overexpression of miR-130b is involved in the growth control of breast epithelial cells via the modulation of the cell cycle inhibitor p21 Waf1/Cip1. [26] Altered expression of miR-199b-5p is associated with HES-1 gene regulation and metastatic spread of medulloblastoma. [27] Dysregulation of miR-10b has been observed in malignant glial tumors, esophageal cancer cell lines and primary breast cancer, though whether it is involved in breast cancer metastasis has been debated. [28][29][30][31][32] In our study, miR-10b was found to be down-regulated in ccRCCs. Its expression appears to be even lower in metastatic ccRCCs than in localized non-metastatic tumors. Our preliminary data revealed that overexpression and knockdown of miR-10b in a cell line derived from a metastatic ccRCC caused a decrease and an increase, respectively, in the proliferation and invasion of tumor cells (data not shown), which might involve the regulation of CDK6 and other target genes (www.miRBase.com and www.oncomine.org).
The 4-miRNA signature is associated with ccRCC metastasis. Though the validation test has shown that the signature appeared to be more powerful in identifying concurrent metastases (81%), the signature is also significantly associated with subsequent/future metastasis of the primary tumors (69%), including patients with T3 (89%) and T1/2 (40%) diseases. Patients with stage I or II ccRCCs usually have 5-year survival of 95% and 88% and are often less likely to develop metastasis compared to those with late stage diseases. [10] Clinically, it is extremely helpful if the metastatic potential of T3 tumors can be predicted early, ideally at the time of nephrectomy. Due to the limitation of sample size, the signature model warrants and will benefit from further large cohort validations.
Currently, there is no clinically available molecular assay to predict ccRCC metastasis. A retrospective study reported that IMP3 expression analysis by immunohistochemistry could predict RCC metastasis and prognosis. [33] The study identified IMP3-positive tumors in 59/95 metastatic RCCs, 60/119 primary RCCs with metastasis and 11/287 primary RCCs with no metastasis, rendering an overall sensitivity of 56%, specificity of 96% and a hazard ratio of 5.66. Our 4-miRNA signature achieves a higher sensitivity (76% overall and 69% for predicting tumors with future metastasis), specificity (100%) and hazard ratio (12.68) than those reported in the IMP3 study. However, our current study tested fewer cases and is limited to the clear cell type of RCC. We are planning to evaluate our signature model using much larger cohorts and to test the effectiveness of the current model for other types of RCC.
Our signature has also shown an association with disease prognosis. Currently, the UCLA Integrated Staging System (UISS) is a widely used prognostic tool for RCC patients' outcomes. It classifies cases into high, intermediate and low risk groups, based on tumor stage, histological grade and Eastern Cooperative Oncology Group (ECOG) performance status (PS). [34] As reported in international multi-center studies [34][35], the overall 5-year cancer-specific survival rates estimated by the UISS were 92-94%, 65-78% and 30-48% for the low, intermediate and high risk group patients, respectively. To compare the UISS with our signature, we assigned a UISS risk score to the 35 of 40 testing cohort cases for which ECOG performance status was available. The predicted 5-year cancer-specific survival rates were 0%, 63% and 52% for the high, intermediate and low risk group patients, respectively, by UISS, compared to 32% and 84% for the high and low risk patients, respectively, by the 4-miRNA signature (Figure 6). The UISS score does not appear to be a significant prognostic factor for the cases in our testing cohort. However, because only 35 cases were tested, this finding might not be representative. Our risk scores based on both microarray and RT-PCR are statistically significant (Table 7). The hazard ratio of our high versus low risk status is 6.81 (95% CI = 1.52-30.53, p < 0.01) by microarray and 4.88 (95% CI = 1.37-17.38, p < 0.01) by qPCR.
In this study, we found that the clinical stage was the only other significant prognostic factor. Patients' age, sex, tumor grade and size were not found to be significantly correlated with survival in our data. As mentioned previously, the sample size of our study is relatively small, and further large cohort validation is definitely needed for more accurate analysis.
The miRNA signature developed in the current study has the potential to be applied in a routine clinical setting. Converting a microarray-derived signature to a PCR-based test makes the signature assay more practical for use in a clinical laboratory. Performing qPCR-based miRNA expression studies in clinical tissues is always very challenging, mainly because there have been no reliable, conventionally known or commercially available reference miRNAs or other small RNAs that can serve the role that house-keeping genes play in mRNA expression studies. This probably explains why many published PCR-based clinical tissue studies of miRNA expression are not reproducible. The use of miR-191 and miR-103 for tissue miRNA normalization has been suggested. [36] However, miRNA expression is very tissue-specific. [8] Some miRNAs stably expressed in certain tissue types might be expressed differently in other tissue types. Having carefully analyzed our own microarray data, we found that miR-24 is the most constantly and stably expressed miRNA among all the benign and malignant kidney specimens. The 4-miRNA signature based on qPCR data also showed a high sensitivity (72%) and specificity (100%), as well as a similar association with cancer-specific death, which further validated our microarray results and provided the technological basis for a possible larger scale qPCR-based validation. Our recent study has shown that formalin-fixed paraffin-embedded (FFPE) samples can be reliably used for miRNA expression profiling studies. [24] We are planning to validate the signature developed in the current study using a larger FFPE tissue cohort and to evaluate it in the context of specific therapies.

Tissue Sample Preparation and Total RNA Extraction
A total of 78 frozen benign kidney and ccRCC specimens were used for the study. All the samples were selected from the frozen tissue specimens stored at the City of Hope National Medical Center (COH) Tumor Bank. All available frozen specimens collected from ccRCC patients at COH between 1986 and 2008 were first tested on the 2100 Bioanalyzer (Agilent Technologies, Inc., Santa Clara CA) for quality control; only specimens whose total RNA had a quality index > 5.0 were included in the study.
Specifically, the benign kidney tissue and primary tumors were sampled from the nephrectomy specimens. All the qualified benign samples (n = 10) were randomly selected from the available specimen collection. All the localized ccRCC samples (n = 13) for the training cohort were pT1 (stage I) primary tumors with no reported subsequent metastasis and were randomly selected from the same available collection. The metastatic ccRCC samples (n = 15) used for the training cohort included all the available frozen specimens taken from the resection/biopsy specimens of distant metastases or lymph nodes during the time period. All the remaining available primary ccRCC specimens were used for the testing cohort (n = 40). The patients' age, sex, race and clinical tumor stage are listed in Table S1.
The samples were snap-frozen shortly after operation and had been stored at -80 °C at the COH Tumor Bank. The protocol for using these samples was approved by the COH Cancer Protocol Review and Monitoring Committee (CPRMC) and Institutional Review Board (IRB). A waiver of informed consent and HIPAA authorization was also approved by the COH IRB. Total RNA was extracted from up to 10 sections (10 μm in thickness) of each sample as described previously. [24]

Microarray Analysis for miRNA
Microarray testing of miRNA expression was performed at the COH Microarray Core using the Agilent human miRNA microarray V2 (Agilent Technologies, Inc., Santa Clara CA), which contains probes for 723 human miRNAs from Sanger miRBase 10.1, as described previously. [24] Briefly, 1 μg of total RNA was labeled with Cy3 using T4 RNA ligase and hybridized to the array for 20 hours at 55 °C. The arrays were then washed and scanned using an Agilent scanner with default settings. Scanned images were processed with Agilent Feature Extraction Software v. 10.5 to obtain the raw data.

Statistical Analysis and the Method of miRNA Signature Construction
The analysis was performed using the R statistical language. Raw data from the Agilent miRNA arrays were processed by quantile normalization, followed by log2 transformation with an offset of 1. [37] Differentially expressed miRNAs between tumor (training cohort) and benign samples were selected using a t-test with an FDR ≤ 0.05 and a fold change of 2.
The miRNA signature was developed in three steps: 1) univariate logistic regression analysis was used to identify miRNAs that were associated with metastasis; 2) a mathematical formula based on the expression levels of the identified miRNAs was developed to assign a risk score to each patient; 3) a risk score cut-off was determined to classify each patient into a high or low risk group.
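The preprocessing order described above (quantile normalization, then log2 transformation with an offset of 1) can be sketched as follows. The study used R; this Python version is only an illustration on hypothetical intensities, and it handles tied values by arbitrary rank order rather than by averaging, which is a simplification.

```python
import math


def quantile_normalize(columns):
    """Give every sample (column) the same distribution of values.

    `columns` is a list of equal-length lists, one list per array/sample.
    """
    n_probes = len(columns[0])
    sorted_columns = [sorted(col) for col in columns]
    # Mean intensity at each rank across all samples.
    rank_means = [sum(col[i] for col in sorted_columns) / len(columns)
                  for i in range(n_probes)]
    normalized = []
    for col in columns:
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


def log2_with_offset(columns, offset=1.0):
    """Apply log2(value + offset) to every entry."""
    return [[math.log2(value + offset) for value in col] for col in columns]


# Hypothetical raw intensities: three samples, four probes each.
raw = [[5.0, 2.0, 3.0, 4.0],
       [4.0, 1.0, 6.0, 2.0],
       [3.0, 4.0, 6.0, 8.0]]
normalized = quantile_normalize(raw)
processed = log2_with_offset(normalized)
print(normalized[0])
```

The offset keeps the logarithm defined for probes whose normalized intensity is zero.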
Step 1 is a feature selection step, and steps 2 and 3 are model building steps. In step 1, a range of p value cutoffs (0.05, 0.02, 0.01, 0.005, 0.002 and 0.001) was tested with LOOCV, and the best-performing cutoff was found to be 0.01. Specifically, at each iteration of the cross validation, one sample was held out (the test sample) while the others remained in the training group (n = 28 - 1). The feature selection and formula development were repeated within each iteration, and the resulting signature model was then applied to the test sample [38]; the feature selection and signature model building steps were therefore entirely independent of the test sample. This is critical to ensure that the performance of the signature model can be estimated without bias. Using LOOCV, we achieved a minimal error rate with a p value < 0.01. We also arbitrarily required a fold change of ≥ 2 between metastatic and localized specimens, which could help in developing a PCR-based assay for the signature. These criteria yielded 4 miRNAs that were significantly associated with metastasis.
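The essential point of the cross validation above, that feature selection and model building are repeated on the n - 1 training samples at every iteration, can be sketched as follows. The model builder here is a deliberately simple stand-in (selection by difference in class means, scoring by a weighted sum) and is not the univariate logistic regression used in the study; all names and data are hypothetical.

```python
def loocv_error(samples, labels, build_model):
    """Leave-one-out error rate; the held-out sample never reaches build_model."""
    errors = 0
    for held_out in range(len(samples)):
        train_x = [s for i, s in enumerate(samples) if i != held_out]
        train_y = [y for i, y in enumerate(labels) if i != held_out]
        # Feature selection and fitting both happen inside this call.
        predict = build_model(train_x, train_y)
        if predict(samples[held_out]) != labels[held_out]:
            errors += 1
    return errors / len(samples)


def mean(values):
    return sum(values) / len(values)


def build_stand_in_model(train_x, train_y, min_difference=1.0):
    """Stand-in builder: select features, weight them, pick a score cutoff."""
    positives = [x for x, y in zip(train_x, train_y) if y == 1]
    negatives = [x for x, y in zip(train_x, train_y) if y == 0]
    weights = {}
    for j in range(len(train_x[0])):
        diff = mean([x[j] for x in positives]) - mean([x[j] for x in negatives])
        if abs(diff) >= min_difference:  # feature selection step
            weights[j] = diff

    def score(x):
        return sum(w * x[j] for j, w in weights.items())

    cutoff = (mean([score(x) for x in positives]) +
              mean([score(x) for x in negatives])) / 2
    return lambda x: 1 if score(x) >= cutoff else 0


# Hypothetical data: feature 0 is higher and feature 1 lower in class 1.
samples = [[8.0, 2.0, 5.0], [7.5, 2.5, 5.2], [8.2, 1.8, 4.9], [7.8, 2.2, 5.1],
           [5.0, 5.0, 5.0], [5.5, 4.5, 5.1], [4.8, 5.2, 4.9], [5.2, 4.8, 5.0]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
error_rate = loocv_error(samples, labels, build_stand_in_model)
print(error_rate)
```

Selecting features once on all samples and cross-validating only the model fit would let the held-out sample influence the selection and would understate the error rate.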
To investigate the effectiveness of these four miRNAs as a signature to determine the status of metastasis, a mathematical formula was constructed, taking into account both the strength and the direction (positive or negative) of the association of each miRNA with metastasis. More specifically, a risk score was calculated for each patient in the training cohort using the formula, which was a linear combination of the expression levels of the miRNAs, weighted by the regression coefficients derived from the aforementioned univariate logistic regression analysis. To choose the optimal risk score cutoff, a range of scores was tested to stratify these patients into high and low risk groups. The false positive rate (FPR) and true positive rate (TPR) of these cutoffs were calculated, and a risk score cutoff point was selected based on the lowest FPR and highest TPR (FPR = 8%, TPR = 100%) (Figure 3). Therefore, a miRNA signature model, which consists of a risk score formula and a high risk score cutoff, was developed to classify patients into high and low risk groups for developing metastasis.
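The cutoff scan described above can be sketched as follows. The text selects the cutoff with the lowest FPR and highest TPR without stating how the two are traded off; maximizing TPR minus FPR is an assumption made here for illustration, and the scores and labels are hypothetical.

```python
def rates_at_cutoff(scores, labels, cutoff):
    """Return (FPR, TPR) when scores at or above the cutoff are called high risk."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    false_pos = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return false_pos / negatives, true_pos / positives


def best_cutoff(scores, labels):
    """Scan every observed score and keep the one maximizing TPR - FPR."""
    def youden(cutoff):
        fpr, tpr = rates_at_cutoff(scores, labels, cutoff)
        return tpr - fpr
    return max(sorted(set(scores)), key=youden)


# Hypothetical training risk scores; label 1 = metastatic, 0 = localized.
scores = [-12.0, -10.5, -9.0, -8.5, -8.0, -6.0, -4.0, -2.5]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
cutoff = best_cutoff(scores, labels)
fpr, tpr = rates_at_cutoff(scores, labels, cutoff)
print(cutoff, fpr, tpr)
```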
The performance of the signature was further validated using the additional independent testing cohort data set (n = 40), in which each patient's risk for developing metastasis was determined based on the calculated risk score and then compared to the clinical follow-up information.
To investigate whether the 4-miRNA signature was also a prognostic factor for cancer-specific survival, univariate Cox regression analysis was used to examine the patients' risk status based on the signature, patient age and gender, tumor histologic grade and size, clinical stage and available UISS score (see Discussion). A p value < 0.05 was considered significant.

RT-PCR Testing
In each sample, the expression of hsa-miR-10b, 130b, 139-5p and 199b-5p was analyzed using RT-PCR TaqMan MicroRNA Assays and the 7900 HT Fast Real-time PCR System (Applied Biosystems, Carlsbad, CA). Briefly, 10 ng of total RNA from each sample was reverse-transcribed into first-strand cDNA with stem-loop primers specific for the mature miRNAs, followed by real-time PCR with TaqMan probes. PCR reactions for each sample were carried out in triplicate. The expression of each miRNA, normalized to hsa-miR-24, was quantified using the formula $X = 2^{-\Delta CT}$, where $\Delta CT = CT_{\text{miR-X}} - CT_{\text{miR-24}}$.
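The normalization formula above can be illustrated with a short calculation. Averaging the triplicate CT values before taking the difference is an assumption made here, since the text does not say how the replicates were combined, and the CT values are hypothetical.

```python
def mean_ct(replicates):
    """Average CT over replicate PCR reactions (assumed handling of triplicates)."""
    return sum(replicates) / len(replicates)


def relative_expression(ct_target, ct_reference):
    """X = 2 ** (-delta_CT), with delta_CT = CT(target) - CT(reference)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical triplicate CT values for one sample.
ct_mir_10b = mean_ct([27.0, 27.0, 27.0])
ct_mir_24 = mean_ct([24.0, 24.0, 24.0])
x_mir_10b = relative_expression(ct_mir_10b, ct_mir_24)
print(x_mir_10b)
```

A target that crosses the threshold three cycles later than miR-24 is therefore scored at one eighth of the reference level.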

Supporting Information
Table S1 Patients' Information.