Pre-Micro RNA Signatures Delineate Stages of Endothelial Cell Transformation in Kaposi Sarcoma

MicroRNAs (miRNAs) have emerged as key regulators of cell lineage differentiation and cancer. We used precursor miRNA profiling by a novel real-time QPCR method (i) to define progressive stages of endothelial cell transformation culminating in Kaposi sarcoma (KS) and (ii) to identify specific miRNAs that serve as biomarkers for tumor progression. We were able to compare primary patient biopsies to well-established culture and mouse tumor models. Loss of mir-221 and gain of mir-15 expression marked the transition from merely immortalized to fully tumorigenic endothelial cells. Mir-140 and Kaposi sarcoma-associated herpesvirus viral miRNAs increased linearly with the degree of transformation. Mir-24 emerged as a biomarker specific for KS.


Introduction
Kaposi sarcoma (KS) is one of the few human cancers of endothelial origin. KS remains the most frequent AIDS-associated malignancy even in populations with ready access to highly active anti-retroviral therapy (HAART) [1,2]. Today, approximately one third of AIDS-KS tumors develop in patients on successful long-term HAART, i.e. with near-normal T lymphocyte counts and undetectable HIV viral loads [3]. In sub-Saharan Africa, KS ranks among the most common cancers overall since HIV turned the endemic form of this disease into an epidemic. Compared with epithelial cancers, endothelial-lineage cancers are rare, and most investigators study endothelial cells because of their ancillary role in tumor angiogenesis rather than their role as the driving force of tumor formation. In KS, endothelial lineage cells drive tumor growth. Recent data suggest that tumor-associated stromal cells, including endothelial cells, can acquire epigenetic or perhaps even genetic features of transformation, which in turn support tumor growth [4,5,6]. KS offers the opportunity to study endothelial cell transformation and tumorigenesis in detail, and miRNAs provide one possible means of large-scale, stable epigenetic reprogramming. To test the hypothesis that miRNA signatures delineate progressive stages of endothelial cell transformation resulting in metastatic KS, we used high-throughput, quantitative real-time PCR-based pre-miRNA profiling.
KS is tightly associated with Kaposi sarcoma-associated herpesvirus (KSHV) [7,8,9]. Every tumor cell carries the virus and expresses at least the viral latent proteins [10,11]. Here, we show for the first time, using primary patient biopsies, that every KS tumor transcribes the viral miRNAs as well.
KSHV is also the etiological agent of the B cell lineage tumor primary effusion lymphoma (PEL), as well as of the B cell lineage hyperplasia known as the plasmablastic variant of multicentric Castleman disease (MCD) [12,13]. This dichotomous phenotype (lymphoid/endothelial) allows testing of the hypotheses that the miRNA profiles of these two cancers reflect (a) their tissue of origin, (b) progressive cancer signatures, (c) a signature induced by latent viral infection or (d) a combination of all three.
The miRNAs have emerged as master regulators of cell lineage differentiation and key modulators of cancer (reviewed in [14]). They are small, 22 nucleotide non-coding RNA molecules that, upon incorporation into the cytoplasmic RNA-induced silencing complex (RISC), can inhibit translation of target messenger RNAs and ultimately target them for degradation. At present, the Sanger database has recorded 678 human miRNAs [15] each capable of targeting up to several hundred different mRNAs. KSHV encodes multiple viral miRNAs [16,17,18,19], including a viral ortholog to miR-155 [20,21]. Even though some targets for these viral miRNAs have been identified [22], exactly how they function in KS tumorigenesis is unresolved.
MiRNA profiling has provided invaluable insights into tissue development and cancer. Many tumor-specific and cell lineage-specific signatures have been compiled (e.g. [23,24,25] and many others). We previously established the miRNA signature for PEL [26] using real-time quantitative PCR. Pre-miRNA profiling has also been used successfully to stratify human tumors. It often correlates well with mature miRNA levels [27,28,29], but we also found that pre-miRNA profiling provides non-redundant information with utility for tumor classification. Pre-miRNAs are an intermediate product for mature miRNAs, analogous (in the widest sense) to mRNAs being an intermediate product for proteins. They are generated by Drosha and DGCR8 from the nascent pri-miRNAs and eventually exported by Exportin-5 to the cytoplasm. For the purpose of profiling they offer advantages because they are longer (~70 nt), with each nucleotide contributing additional specificity in a diagnostic assay. Mature miRNAs, by contrast, have a target region of only ~22 nt, which imposes limitations due to cross-hybridization and variable primer/probe annealing efficiency for different miRNAs. Here, we use real-time QPCR-based profiling to discern pre-miRNAs that identify KS, KSHV infection and distinct, progressive stages of endothelial cell transformation.
KSHV transforms primary human endothelial cells in culture [30], though this is a rare event. More consistently, KSHV infection leads only to morphological alterations ("spindling") and reduced growth factor dependence [31,32,33,34]. KSHV infection of immortalized human endothelial cells leads to extended survival and growth factor independence [35,36,37] but not to complete transformation, as defined by the ability to form tumors in nude mice. Importantly, during latent viral infection, lymphatic endothelial cell differentiation markers remain expressed, and KSHV is capable of inducing this phenotype in human endothelial cell preparations that are not already of lymphatic endothelial origin, such as those derived from the vasculature [35,38,39]. KSHV infection per se does not induce dedifferentiation of lineage-committed lymphatic endothelial cells. A study by An et al. succeeded in deriving two fully tumorigenic clones of lymphatic endothelial cells (TIVE E1 and TIVE L1) that maintain KSHV in the absence of selection [40]. Introduction of KSHV into murine endothelial progenitor cell preparations also resulted in the clonal outgrowth of at least one fully transformed cell line [41]. By contrast, attempts to culture tumor cells directly from KS lesions largely failed. Today, we have only a single KS tumor-derived cell line, SLK, which is fully transformed but has lost the KSHV genome [42]. Together with primary KS biopsies, these culture systems exemplify multiple stages of endothelial cell cancer progression (Table 1). In this study, we used high-throughput profiling to identify cell lineage-specific and cancer progression stage-specific pre-miRNAs for this representative set of KSHV-infected and uninfected immortalized endothelial cells, KS biopsies and PEL lymphoma cell lines.

Author Summary
MicroRNAs are key regulators of cancer and development. We can use their pattern of expression to classify different cancers or, in our case, different stages in cancer development. We used a novel method to define progressive stages for Kaposi sarcoma (KS), which is a cancer of the endothelial cells. We identified specific precursor miRNAs that accurately identify stages of KS tumor progression. For the first time, we were able to profile KS patient material. This is difficult to come by, but it is more closely related to the human cancer than even the best cell culture models. Our work statistically defined clusters of pre-miRNAs, each signifying one step in cancer progression. This is the first time that precursor miRNA profiling was used to define cancer stages. It is also the first time that we have defined markers of any kind that allow us to distinguish between different types of the endothelial cancer KS. Loss of mir-221 precursor miRNA and gain of mir-15 precursor miRNA expression marked the transition from merely immortal to fully tumorigenic cells. Mir-140 and Kaposi sarcoma-associated herpesvirus viral microRNAs increased progressively with the degree of transformation, i.e. more aggressive stages expressed higher levels of these biomarkers. High levels of the precursor microRNA mir-24 emerged as a biomarker only in patient-derived KS samples, not in any of the culture models.

Experimental Approach
A total of 47 samples were selected for profiling at the DNA and pre-miRNA level (Table 1 and Table S1). These include the largest number of PEL cell lines to date (n = 14). Four KSHV-negative Burkitt lymphoma cell lines were included as controls, as they are expected to transcribe miRNAs that are common to B lineage lymphomas. The pre-miRNA difference between these samples and PEL defines part of the PEL signature [26]. For this study, we added 9 tonsil tissues as a normal tissue control. These serve to determine which pre-miRNAs are highly abundant in B cells, as tonsils consist of over 50% B cells, including germinal center (GC) B cells, which many assume to be the normal precursor of PEL [43,44,45]. Two T-cell lymphoma cell lines were included to differentiate T cell pre-miRNAs.
For the first time, KS primary biopsies were also assessed for pre-miRNA transcription. We collected 9 AIDS-KS skin biopsies from the Americas. The biopsies were collected by individuals with experience in KS clinical trials. They are considered representative lesions for the purpose of tumor and response staging. The majority of cells in each biopsy are KSHV-infected endothelial cells [11,46,47]. All biopsies were from male subjects with a median age of 44 years. Patients had biopsy-confirmed KS and were on HAART as well as concurrent chemotherapy. Their median CD4 count was 78 cells/microliter (range 7-402). CD4 counts were not available for one subject. All patients had extensive cutaneous KS and, with the exception of one, are alive at present. Tumor samples were obtained within the last three years. Hence, these patients represent the current post-HAART AIDS epidemic.
Two immortalized virus-negative endothelial cell lines were included in the arrays, as well as isogenic controls carrying latent KSHV [36]. A similar model exists in the E1 TIVE and L1 TIVE cell lines [40]. These currently represent the best human cell culture tumor model for KS, as these two cell lines induce KS-like tumors in nude mice with 100% efficiency. Also included is the only known KS-derived cell line, SLK [42], which has lost the KSHV genome, but is tumorigenic in mice.
As a positive control, we used DNA as the input for real-time QPCR with primers directed against the pre-miRNA, as described [26]. We were able to independently verify KSHV infection status for each sample. Likewise, the EBV miRNA genes were detectable only in the EBV-positive PEL and BL cell lines, but not in KS or any other samples. EBV miRNA genes were not detectable in normal tonsil tissue. The relative copy number for the KSHV miRNA genes was significantly lower in KSHV-carrying endothelial cell lines compared to PEL (see Figure S4). This is consistent with earlier reports that PEL carry more viral plasmids (50-100 copies/cell) than KS (~10 copies/cell) and KSHV-infected endothelial cell cultures [46,48,49,50]. Among the KSHV-infected endothelial cell models, the HMVEC carried the highest KSHV genome number, suggesting that they are most capable of maintaining high levels of the KSHV plasmid. This is consistent with earlier studies showing that not all endothelial cells are equally permissive for KSHV infection, which drives reprogramming towards lymphatic endothelial cells [35,38,39].

Unsupervised clustering reveals the pre-miRNA profile of KS
Our pre-miRNA data set comprised 160 primer pairs (representing 145 cellular miRNAs, 9 viral miRNAs, 2 viral mRNAs and 4 cellular RNAs (U6)) and 47 samples, and consisted of >20,000 individual data points. QPCR measures target abundance on a log2 scale, with higher CT numbers reflecting lower abundance. For this analysis, the average of the triplicate CT values was taken. These averages were normalized to U6 levels to give dCT. Note that dCT values represent the underlying pre-miRNA levels on a log2 scale, thus facilitating robust clustering [51,52].
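The normalization step can be illustrated with a minimal sketch. It assumes that dCT is the mean of the replicate CT values for a target minus the mean CT of the U6 reference, and that amplification is close to 100% efficient, so that 2^(-dCT) approximates abundance relative to U6. The function names and CT values are hypothetical and are not taken from the study data.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Return dCT: mean target CT minus mean reference (e.g. U6) CT."""
    return mean(target_cts) - mean(reference_cts)


def relative_abundance(dct):
    """Convert dCT to abundance relative to the reference.

    Assumes roughly 100% PCR efficiency (one doubling per cycle).
    """
    return 2.0 ** (-dct)


# Made-up triplicate CT values, for illustration only.
pre_mir_cts = [27.0, 27.5, 26.5]  # hypothetical pre-miRNA target
u6_cts = [18.0, 18.0, 18.0]       # hypothetical U6 reference

dct = delta_ct(pre_mir_cts, u6_cts)
print(dct)                      # 9.0
print(relative_abundance(dct))  # 0.001953125
```

Because a higher dCT means lower abundance, any downstream colour scale or ranking has to account for the inverted sign.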
Following normalization, each sample set was Z-standardized to remove variation between samples [53,54]. Figure 1 shows the heatmap representation after hierarchical clustering for the full panel of samples, with red indicating a higher level of expression and blue indicating a lower level of expression compared to the median of all data (white). Six distinct groups were identified. These represent the minimal number of non-overlapping clusters based on principal component analysis (PCA) (data not shown). The first two groups represent the pre-miRNAs that are unchanged across all samples: those with low levels of expression (I in blue) and those with high levels of expression across all samples (II in red). The KSHV pre-miRNAs all cluster in group III. Group IV represents the pre-miRNAs that are downregulated in KSHV-positive cells. Twenty miRNAs are contained in this group. Group V represents 11 cellular miRNAs that are highly expressed in immortalized HUVEC and HMVEC cells, both uninfected and KSHV-infected, but not in any of the tumor cell lines and biopsies. They do not appear to be significantly enriched in any of the other endothelial cell types (KS or TIVE). Finally, group VI contains cellular miRNAs that are downregulated in all B-cell lymphomas, including PEL, vis-à-vis tonsil and KS.
To remove the impact of lineage-specific determinants [B cell (PEL and tonsil) vs. endothelial cell] from the analysis, we analyzed the two KSHV-associated cell types separately. Our analysis of PEL-specific miRNAs was previously published [26] and analysis of the extended data set confirmed this observation (data not shown). When the endothelial-derived subset of samples was analyzed alone, a clearer picture emerged that highlights similarities and disparities between different stages of endothelial cell transformation (Figure 2A). The groups represent the minimal number of non-overlapping clusters based on PCA (data not shown). The first two groups (I and II) represent miRNAs with minimal discernible patterns across all samples, at least at the power of our analysis. Blue indicates low levels while red indicates comparably high levels of miRNAs, vis-à-vis the median of all data in this set. This is not to say that pre-miRNAs within these two clusters did not exhibit any change between sample classes, only that these changes were smaller compared to others and therefore less interesting from a biomarker perspective. For example, mir-222 clusters in group II because it was more highly transcribed in all samples relative to 50% of all other pre-miRNAs. Nevertheless, mir-222 was downregulated in KSHV-infected, tumorigenic samples compared to EC. The pattern of mir-222 parallels that of mir-221, which is expected because of their known coregulation [55,56]. However, the range of change was much larger for mir-221, as seen in group IV.
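Z-standardization of a single sample can be sketched as follows. This is an illustrative assumption about the procedure, not the authors' implementation: each sample's dCT values are centred on that sample's mean and scaled by its standard deviation. The population standard deviation is used here; a sample standard deviation would be an equally plausible choice. All values are made up.

```python
from statistics import mean, pstdev


def z_standardize(values):
    """Centre a sample's dCT values on 0 and scale them to unit variance."""
    centre = mean(values)
    spread = pstdev(values)  # population SD; an assumption of this sketch
    if spread == 0:
        raise ValueError("cannot standardize a sample with zero variance")
    return [(v - centre) / spread for v in values]


# Hypothetical dCT values for one sample across five pre-miRNAs.
sample_dct = [4.0, 6.0, 8.0, 10.0, 12.0]
z_scores = z_standardize(sample_dct)
print([round(z, 3) for z in z_scores])
```

After this step every sample has mean 0 and unit variance, so differences in overall RNA input or assay efficiency between samples no longer dominate the distance measure used for clustering.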
To mark the degree of viral latent transcription, LANA mRNA levels are shown (Figure 2B). LANA is transcribed in all KSHV-positive samples but not in the KSHV-negative SLK, HUVEC or HMVEC cell lines. KSHV latent RNA levels correlated positively with increasing tumor-forming capability of the infected cells (p ≤ 10⁻¹³ by ANOVA of linear model). They were undetectable in uninfected cells, lowest in KSHV-infected HUVEC and E1/L1 cells in culture, higher in E1 mouse tumors and KS lesions and highest in PEL (data not shown). This was mirrored by KSHV pre-mir-K12-2 (Figure 2C). KSHV pre-miRNA transcription levels correlated with KSHV plasmid copy number (DNA) as measured by real-time QPCR using the same primer sets with DNA as input (data not shown). The positive correlation between the level of viral miRNA and the relative tumorigenicity of the sample class supports a causal role for viral miRNAs in KS tumorigenesis. It suggests that KSHV miRNAs are required to maintain the KS tumor phenotype. Group IV contains a set of 8 cellular miRNAs that are most highly expressed in KS tumors only, compared to cell lines. These include mir-24-2, mir-30c-2, mir-125a, mir-130a, mir-196, mir-215, mir-218-2, and mir-367.
The bar graph of mir-24-2 levels in Figure 2E serves as an example of the pre-miRNA expression pattern of this group, for which miRNA levels were highest in KS tumors and significantly lower in other samples, whether KSHV-infected or not. As expected for all primary tumor samples, we observed more heterogeneity in the KS biopsies compared to clonal cell lines. This necessitated the use of 9 independent biopsies, which is a larger number than used in prior KS mRNA array analyses. With this number of biopsies, PCA validated the significance of cluster membership for all pre-miRNAs, including those that group in cluster IV. Group V comprises 13 cellular pre-miRNAs with highest levels in the E1 and L1 TIVE cell lines. These pre-miRNAs were present at higher levels in E1/L1 cells even compared to KS biopsies. These are mir-17, mir-22, mir-28, mir-32, mir-128b, mir-135b, mir-143, mir-151, mir-181b-2, mir-205, mir-213, mir-216 and mir-372. The bar graph of mir-32 expression in Figure 2F is an example of the pre-miRNA expression pattern for this group.
Group VI consists of 13 pre-miRNAs with highest levels in the non-tumorigenic endothelial HUVEC and HMVEC cell lines, whether KSHV-infected or not. These are mir-26b, mir-29a, mir-34b, mir-92-1, mir-93, mir-133a-1, mir-133a-2, mir-193, mir-221, mir-223, mir-301, mir-323 and mir-346. Eleven of these miRNAs were also contained in the HUVEC/HMVEC upregulated cluster from the larger data set (Figure 1). Additionally, mir-34b and mir-92-1 fell into this group upon clustering of only the endothelial cell data. The bar graph of mir-29a expression in Figure 2G is an example of the pre-miRNA transcription pattern for this group, with highest levels in both infected and uninfected HUVEC/HMVEC cells and significantly lower levels in all other samples.
In sum, unsupervised clustering as a discovery tool identified (i) distinct stages of endothelial cell transformation and (ii) specific pre-miRNAs that serve as biomarkers for each of them.
One of the concerns in profiling cell lines in culture is that the transcription signature may reflect a particular proliferation state rather than a general characteristic of the tumor subtype. Proliferation dependence is well documented for mRNA levels in fibroblasts [57]. For several miRNAs, too, proliferation and miRNA transcription rates are linked [58,59,60,61,62,63,64]. To guard against this confounder, we only used RNA derived from log-phase cells for our profiling analysis. Nevertheless, to test the hypothesis that some miRNA levels were proliferation state-dependent, we conducted a time course experiment for the E1 and L1 TIVE cell lines (see Figures S1 and S2). This revealed a very limited number of pre-miRNAs that were enriched in log-phase cells compared to stationary-phase cells and vice versa. They were at the lower limit of detection, and additional experiments are needed to validate the biological significance of this observation.

miRNAs as endothelial cell tumor stage biomarkers
Unsupervised comparisons represent the first level of large-scale profiling studies. Here, they revealed (i) the existence of multiple distinct steps of endothelial cell transformation and (ii) pre-miRNAs that were selectively transcribed in one or more stages and that therefore serve as biomarkers. The latter were further validated by supervised class prediction methods. Based upon pre-miRNA clustering (Figures 1 and 2) and published phenotype (Table 1), we first conducted pair-wise comparisons between classes using the median dCT(U6) for each class (Figure S2). The two TERT-immortalized EC cell lines HUVEC and HMVEC exhibited a highly similar pre-miRNA transcription pattern (r² = 0.7238). Infection with KSHV of these immortalized cell lines did result in changes (r² = 0.6798). Of note, this comparison is between median levels for the two EC cell lines (HUVEC and HMVEC) and three independent clones of tightly latently infected TERT-HUVEC cells. Thus, it exhibited more variability than a pair-wise comparison of just two cell lines. The most drastic change in overall pre-miRNA transcription emerged when comparing KSHV-infected, non-tumorigenic EC cell lines to the two KSHV-infected, highly tumorigenic E1/L1 cell lines. Here, we failed to detect any linear correlation. The two TIVE cell lines E1 and L1, of course, exhibited a strikingly similar pattern of pre-miRNA transcription, as shown in detail in Figure S4 and Figure S1. The pair-wise comparison between E1/L1 cells in culture and E1 xenograft tumors showed a reasonable linear correlation, but less than that between different culture models (r² = 0.5684). Analysis of residuals identified all KSHV pre-miRNAs as well as mir-223 to be significantly upregulated in the tumor graft (data not shown). Since there are no human infiltrating lymphocytes in the SCID mouse model, and since the tumor vasculature is made of murine endothelial cells, any changes in pre-miRNA composition reflect the grafted human tumor cells.
Importantly, the comparison between E1 xenograft tumor biopsies and patient KS biopsies yielded a better correlation (r² = 0.5846) than that between E1/L1 cells in culture and E1 tumor grafts. This reinforces the results of the phenotypic characterization of E1/L1 cells [40] and demonstrates that the E1/L1 xenograft model adequately mimics primary KS patient biopsies.
Next, we identified and validated a set of diagnostic pre-miRNA biomarkers that signify the different steps of endothelial cell transformation. To do so, we used the miRNAs identified by hierarchical clustering (Figure 2), extended the dataset to include mouse xenograft tumor samples and used visual inspection followed by ANOVA and appropriate pair-wise t-tests to identify pre-miRNAs with distinct distributions among the different steps of endothelial cell transformation. To give a better impression of within-class variability, Figure 3A-C plots individual dCT(U6) values for cellular pre-miRNAs, including technical replicates for each class.
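The pair-wise class comparisons can be sketched as a two-step calculation: take the median dCT of each pre-miRNA within a class, then compute the squared Pearson correlation (r²) between the two class profiles. The sketch below is an illustration under those assumptions; the pre-miRNA names ("mir-a", "mir-b", "mir-c"), class contents and dCT values are placeholders, not study data.

```python
from statistics import median


def class_median_profile(samples):
    """Median dCT per pre-miRNA across the samples of one class.

    `samples` is a list of dicts mapping pre-miRNA name -> dCT.
    """
    names = sorted(samples[0])
    return {name: median(s[name] for s in samples) for name in names}


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Placeholder data: two hypothetical sample classes.
class_1 = [
    {"mir-a": 5.0, "mir-b": 9.0, "mir-c": 12.0},
    {"mir-a": 6.0, "mir-b": 10.0, "mir-c": 13.0},
    {"mir-a": 7.0, "mir-b": 11.0, "mir-c": 14.0},
]
class_2 = [
    {"mir-a": 6.5, "mir-b": 12.5, "mir-c": 12.0},
    {"mir-a": 5.5, "mir-b": 11.5, "mir-c": 13.0},
]

profile_1 = class_median_profile(class_1)
profile_2 = class_median_profile(class_2)
names = sorted(profile_1)
print(r_squared([profile_1[n] for n in names], [profile_2[n] for n in names]))
```

Pre-miRNAs that fall far from the fitted line (large residuals) are the candidates that differ between the two classes, which is the logic behind the residual analysis mentioned above.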
The mir-221 pre-miRNA emerged as a biomarker for the transition from immortalized to tumorigenic endothelial cells independent of KSHV infection status ( Figure 4A). Mir-222 was co-regulated with mir-221, but did not change as dramatically (data not shown). Since mir-221/222 exhibit tumor suppressor activity in endothelial and other cancer models [55,56,65,66], this suggests that the down-regulation of the mir-221 biomarker is of biological significance.
The mir-15 pre-miRNA is an example of miRNAs that exhibit the opposite pattern of transcription to mir-221. Therefore it did not contribute additional information that would have improved tumor classification. It was high in tumorigenic KSHV-infected endothelial cells, KS and PEL (data not shown). There was one significant difference between mir-15 and mir-221 expression: the KSHV-negative SLK cells transcribed significantly lower levels of mir-15. In a separate analysis of only the endothelial/KS samples and excluding SLK cells (data not shown), mir-15 levels correlated closely with KSHV latent mRNA and miRNA transcription, and mir-15 can thus be considered KSHV-regulated.
The mir-140 pre-miRNA levels correlated linearly with tumor status. It was present at appreciable levels only in the xenograft tumors and KS biopsies, but not in KSHV-infected cells grown in culture (Figure 3B, classes ETM and KS). Pre-mir-140 levels did not distinguish tonsil and PEL, since 50% of PEL lines as well as all KSHV-negative lymphoma lines had only very low levels of mir-140. Hence, the utility of mir-140 as a biomarker is limited to endothelial lineage cancers and does not extend to lymphoid lineage cancers.
The mir-24-2 pre-miRNA levels were strikingly elevated only in KS biopsies, not E1 xenograft tumors or PEL ( Figure 3C). It therefore serves as a KS-specific biomarker and not as a marker for KSHV-associated transformation. This may have utility for clinical diagnosis, but more importantly it represents at least one molecular difference between clinical KS lesions and all available tissue culture models. In other words, any of the KS-specific mir-24-2 dependent reprogramming of target mRNA and protein levels is not captured in our current, laboratory-based understanding of KS and KSHV biology.
To establish the utility of these four biomarkers for endothelial cell tumorigenesis, we calculated cumulative distribution functions (cdf) (Figure 3E-G) and a decision tree (Figure 3D). Pre-mir-221 and pre-mir-24-2 showed steep cdfs, which allowed for binary classification into positive and negative classes. Pre-mir-140 (Figure 3F) showed an almost linear cdf, consistent with gradual changes among multiple sample classes. This is reflected in the minimal decision tree (Figure 3D) that computes cut-off values for each miRNA to yield the most parsimonious and accurate classification schema. Similar decision trees could be derived using other representative miRNAs from each of the clusters identified in Figure 2. We also built decision trees based on just viral pre-miRNA levels (data not shown). These were comparable to ANOVA for individual pre-miRNAs, since KSHV genome copy number (Figure S3), latent RNA levels and latent pre-miRNA levels were all correlated: they clustered together by unsupervised clustering (Figures 1 and 2) and increased progressively with increasing tumorigenicity.
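The link between a steep cdf and a clean binary cut-off can be illustrated with a small sketch. It computes an empirical cdf and then searches exhaustively for the single threshold that best separates two classes, which corresponds to one node of a decision tree. This is a simplified stand-in, not the tree-building procedure used for Figure 3D; the dCT values and class labels are invented for illustration.

```python
def empirical_cdf(values):
    """Return (value, cumulative fraction <= value) pairs in sorted order."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


def best_cutoff(values, labels):
    """Find the threshold that best splits `values` into two classes.

    Candidate thresholds are midpoints between adjacent distinct values.
    Either orientation of the split is allowed. Returns (cutoff, accuracy),
    or None if all values are identical.
    """
    distinct = sorted(set(values))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        cut = (low + high) / 2
        correct = sum((v > cut) == lab for v, lab in zip(values, labels))
        accuracy = max(correct, len(values) - correct) / len(values)
        if best is None or accuracy > best[1]:
            best = (cut, accuracy)
    return best


# Invented dCT values for one pre-miRNA in six samples.
dct_values = [3.0, 3.5, 4.0, 9.0, 9.5, 10.0]
is_tumorigenic = [False, False, False, True, True, True]

print(empirical_cdf(dct_values))
print(best_cutoff(dct_values, is_tumorigenic))  # (6.5, 1.0)
```

A marker with a bimodal distribution, as in this toy data, has a cdf that jumps sharply and yields a cut-off with high accuracy. A marker that changes gradually across classes has a near-linear cdf and no single threshold that separates the classes cleanly.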
In sum, supervised classification established (i) the presence of molecularly distinct, progressive steps of endothelial cell transformation and (ii) a set of biomarkers that distinguishes between these steps.

Pre-miRNA profiling
Mature miRNA profiling has previously been used to stratify lineage types and disease progression stages. Pre-miRNA profiling has also been used successfully to stratify human tumors [27,28,29]. We previously profiled pre- and mature miRNAs for PEL [26] in order to establish a PEL cancer signature, and found that pre-miRNA profiling offered technical advantages and provided additional, non-redundant information to mature miRNA-based PEL classification. Here, using 9 primary patient biopsies and validated pre-clinical cell culture models, we have ascertained the first pre-miRNA profile of KS.
At the genomic level, we found a variety of changes between different cell lines and tissue types, but no deletions or amplifications common to all KS biopsies or all KSHV-positive samples. At the pre-miRNA level, we identified groups of cellular miRNAs that define distinctive tissue types. QPCR has been shown to be an effective form of miRNA profiling. Northern blotting has limitations including low throughput and poor sensitivity. Alternative high-throughput profiling methods, like microarrays, require high concentrations of target input, show poor sensitivity for rare targets, have a limited linear range and need post-array validation by real-time QPCR. Therefore, QPCR appears to be a better method for a limited set of targets such as the ~650 human miRNAs, and it can be applied easily at the pre-miRNA level as well.
The miRNA genes are named according to the 60-80 bp sequence of the pre-miRNA segment [67]. Each miRNA gene locus produces one pre-miRNA, which in turn can produce one or two mature miRNAs depending on whether both strands of the mature product are inserted into the functional RISC complex. While all miRNA genes, and therefore all pre-miRNAs, are made of unique sequence, different pre-miRNAs can be processed to yield an identical mature 22-nucleotide miRNA. For instance, there are 3 different let-7a genes: let-7a-1, let-7a-2 and let-7a-3, each located on a different chromosome (9, 11, and 22, respectively) and subject to different regulatory controls. Pre-miRNA profiling, but not mature miRNA profiling, distinguishes between these transcripts.
How well do pre-miRNA levels correlate with mature miRNA levels? This seemingly simple question has a non-trivial answer. (i) We and others have shown that pre-miRNA levels generally correlate with mature miRNA levels [26,28,29,68,69,70], but we also found that some pre-miRNAs were present in a slightly different pattern of expression from the mature miRNAs. The obvious examples are the aforementioned pre-miRNA paralogs, which encode the same mature miRNA but reside at different genomic locations. Furthermore, there are well-documented instances where SNPs affect Dicer processing [71,72]. These exceptions are informative in their own right, and only simultaneous quantification of pre- and mature miRNA levels can identify them. In the present case, mature mir-221 levels were also downregulated in KS and PEL compared to non-tumorigenic controls, but for the two others (mir-140, mir-24-2) we could not establish a statistically significant pattern based on mature miRNA levels (O'Hara et al., in press). (ii) The two assays (mature miRNA and pre-miRNA) measure two different events and thus provide non-redundant information. The pre-miRNA pool represents an intermediate step and thus responds without delay to changes in cellular transcription. Pre-miRNAs are co-transcriptionally processed [73,74]. They have a short half-life, much like mRNAs, and thus provide a sensitive read-out for the purpose of tumor profiling. By contrast, mature miRNAs are part of the relatively stable RISC complex and thus provide a time-delayed read-out of the state of the cell. (iii) The two assays (mature miRNA and pre-miRNA) have different performance characteristics. Unfortunately, these are different for each miRNA (data not shown). Even if relative levels of pre- and mature miRNAs correlate, the different assay formats for pre- and mature miRNAs have different sensitivities, different response characteristics and a different lower limit of detection (much of which depends on the miRNA-specific primer sequences), and thus they vary in their ability to distinguish between the presence and absence of a miRNA sequence.

Pre-miRNA profiling defines progressive stages of endothelial cell transformation
In the case of KS and its related cell culture and animal models, each class in our collection (E, EK, ET, ETM, KS, PEL) exhibited a distinctive cellular miRNA profile (Figure 4). Even though we found some differences in the transcription pattern between individual KSHV miRNAs (unpublished), the KSHV miRNA levels as a group correlated with an increasing tumor-forming capability of infected cells (p ≤ 10⁻¹⁰ by ANOVA of linear model). They were present in KSHV-infected HUVEC clones, high in E1/L1 cells in culture, higher in E1 mouse tumors and KS lesions and highest in PEL. Of note, the non-tumorigenic EC clones were made with JSC-1-derived KSHV [75], whereas TIVE E1/L1 clones were made from BCBL-1-derived KSHV [40], which may lead to a difference in miRNA regulation. KSHV gene copy number also increased with increasing tumor-forming ability in this set of samples. At present we cannot discern whether high KSHV pre-miRNA levels are a driver for or a consequence of increased gene copy number. There are also sequence differences between other KSHV isolates that may contribute to variability among the individual samples [76]. Sequence variation is more pronounced for pre-miRNAs because of their length (70 vs 22 nt) and lower selective pressure on non-essential positions.
Our data support a stepwise progression towards KS based on cellular pre-miRNA patterns alone ( Figure 3D) or after integration of the KSHV miRNA data (Figure 2). This model is exemplified in Figure 4. Initially, normal endothelial cells (E) are infected with KSHV to yield stage EK. Both the uninfected and infected endothelial cells share a common endothelial lineage pre-miRNA signature (Figure 4, group I). In addition, all KSHV-infected cells express low levels of KSHV miRNAs as well as a distinct group of cellular miRNAs (Figure 4, group IV). These are able to grow in reduced serum, indicating that KSHV is a transforming virus. However, these cell lines do not form tumors in mice and are therefore not oncogenic. The viral life cycle in these cells remains tightly latent. The E1 and L1 TIVE cells are infected cells that have undergone a second transformation event. As a result cells progress to the ET stage. These cells are capable of forming tumors in mice and express intermediate levels of KSHV miRNAs and a distinctive group of pre-miRNAs ( Figure 4, group II). While these cells are highly transformed, similar to KS, the life cycle of the virus is still tightly latent. We were able, for the first time, to also profile primary KS biopsies. KS lesions exhibited the highest levels of KSHV miRNAs, as compared to other infected endothelial cell samples. PEL exhibited still higher levels due to a higher genome copy number. In addition, unsupervised clustering identified a group of pre-miRNAs that are highly upregulated only in primary KS lesions (Figure 4, group III). Unlike cell culture models, which are tightly latent, KS lesions are known to undergo spontaneous lytic reactivation to varying degrees [77]. In sum, each sample class profiled had a unique set of highly transcribed cellular pre-miRNAs, independent of the presence of virus, and a second set of pre-miRNAs that were dependent on the presence of KSHV.
There exists an important distinction between tumorigenicity, which is a phenotype of cell culture models, and tumor take, which is a phenotype of primary tumor explants. In experimental transformation models such as NIH3T3 cells, tumorigenicity in immune-deficient mice is conferred by adding one or two oncogenes. In tumor explant models, tumor take is defined as how many mice will form transplantable tumors after injection of a given dose of primary tumor cells. Tumor take is highly variable among cancer types and even individuals. It does not necessarily correlate with clinical aggressiveness and does not easily correlate with a single gene. KS and EBV+ nasopharyngeal carcinoma are examples of highly aggressive, angiogenic tumors that almost never yield stable cell lines in culture or transplantable xenografts in nude mice.

Individual miRNAs emerge as novel tumor-stage specific biomarkers with important biological functions
Mir-221 is a tumor suppressor for endothelial cell lineage cancers independent of KSHV infection. We found the highest levels of pre-mir-221 in uninfected and KSHV latently infected tert-HUVEC and tert-HMVEC cell lines. This corroborates prior reports of high mir-221 levels in endothelial cell lines [66,78,79]. High levels of mir-221 exert anti-angiogenic effects in HUVEC cells, resulting in inhibited tube formation, migration and wound healing [66]. This anti-angiogenic effect correlated with downregulated expression of the mir-221 target protein c-kit [66]. In Dicer siRNA-transfected cells, mir-221 expression has also been shown to indirectly downregulate expression of endothelial nitric oxide synthase (eNOS) [78]. Nitric oxide is a key regulator of endothelial cell growth, migration, vascular remodeling and angiogenesis. The picture is more complicated, though, since depletion of mir-221 in HUVEC cells causes secondary changes in other miRNAs [66,80]; these included many that are predicted to also target c-kit. C-kit expression was also reduced by mir-221 in hematopoietic progenitor cells. In this system, mir-221 also inhibited proliferation [81]. Additional targets for mir-221 include CDKN1B/p27 and CDKN1C/p57, which are cell cycle regulators [56,59,65,82]. Dysregulation of mir-221 has been found in melanomas due to silencing of the promyelocytic leukemia zinc finger (PLZF) transcription factor [83]. In summary, mir-221 seems to possess endothelial cell lineage-specific differentiation functions as well as general tumor/proliferation suppressor functions.
Pre-mir-34a and c were found at detectable levels in endothelial cells, PEL and KS, as these tumors retain wild-type p53 (Figure 1). The miR-34 promoter is p53-responsive [61,64,84,85,86,87]. Of all three p53-responsive miRNAs, mir-34a appears to be the most responsive in terms of fold change [61]. High levels of miR-34 are consistent with the biology of PEL and KS, which are unusual among human cancers because they almost universally retain fully functional, wild-type p53 [88,89]. Three different miR-34 genes are present in the human genome. Mir-34a is located within the second exon of a non-coding gene, which contains a predicted p53-binding site. Genes mir-34b and mir-34c are located within a single non-coding precursor with a transcriptional start site adjacent to a predicted p53-binding site [86]. All three genes produce mature miRNAs with an identical seed sequence. It will be interesting to determine whether mir-34a, similar to p53-responsive mRNAs [88], can be even further induced upon chemotherapy in PEL and KS.
Pre-mir-140 levels were tightly correlated with KSHV latent mRNA (LANA) and latent miRNA levels in KS and KS tumor models. Currently, little is known regarding mir-140 expression profiles or possible mRNA targets [90]. TargetScan miRNA target software indicates a number of possible targets for mir-140, including E2F3, a member of the E2F family of transcription factors essential for cell cycle regulation. This prediction, however, awaits experimental verification.
Pre-mir-24 emerged as a highly specific KS biomarker compared to all other pre-miRNAs in our array. Mir-24 has been shown to be important in cell-cycle regulation, cell growth and differentiation in a variety of cell types [91,92]. However, these studies, as well as functional studies on mir-24 and its targets, are still in the early stages. Tantalizing data predict p16 and dihydrofolate reductase (DHFR), among others, as mir-24 targets [93,94,95,96,97].
In summary, the first pre-miRNA profiling of primary KS tumor biopsies and the subsequent comparison to well-studied culture and mouse xenograft models of KS yielded a progression model for endothelial lineage cancer and KS (Figure 4) akin to the now classical model for colorectal cancer progression [98]. We hope that this will benefit basic and translational studies of KS, which remains the most frequent cancer in people living with HIV/AIDS today. We also identified specific KSHV and KS-associated pre-miRNAs, foremost among them mir-221, mir-140, mir-15a and mir-24. Based upon their strength of association with specific stages of endothelial cell tumor progression, we speculate that these are also functionally involved in KS tumorigenesis.

Cell culture and clinical biopsies
Cells were grown in continuous culture on a 3T3-like schedule, i.e. passaged at subconfluency, and RNA collected in log phase, typically 24-48 hrs after reseeding. All B and T cells were cultured in RPMI containing 25 mM HEPES, 10% fetal bovine serum (AP2, AP3, AP5 in 20%), 0.05 mM 2-mercaptoethanol, 1 mM sodium pyruvate, 2 mM L-glutamine, 0.05 µg penicillin/mL, and 20 U streptomycin/mL at 37°C and in 5% CO2. TIVE cells were cultured in DMEM containing 10% fetal bovine serum, 2 mM L-glutamine, 0.05 µg penicillin/mL, and 20 U streptomycin/mL at 37°C and in 5% CO2. HUVEC-hTERT cells were cultured in EGM-2 containing 10% fetal bovine serum, hydrocortisone, hRGF, VEGF, R3-IGF, ascorbic acid, hEGF, GA-1000 and heparin at 37°C and in 5% CO2. HUVEC+KSHV cells were cultured in the same media also containing 0.5 pM/µl puromycin to maintain selection. De-identified frozen tonsil and melanoma tissue biopsies were obtained from the Cooperative Human Tissue Network (CHTN). KS frozen tissue biopsies were obtained after informed consent at University of Miami, Prof. Edgard Santos University Hospital, Salvador, Brazil and Beth Israel Deaconess Medical Center. All cell lines and references are described in Table S1.

DNA and RNA isolation
DNA was isolated from cell lines and samples using the Wizard SV Genomic kit (Promega, Madison, WI). Total RNA was isolated using Triazol (Sigma-Aldrich, St Louis, MO) as previously described [78]. Total RNA was quantitated on a Nanodrop and equal amounts of RNA were subjected to DNase I treatment (Ambion, Austin, TX). RNA was reverse transcribed using the cDNA Archive Kit (Applied Biosystems, Foster City, CA), with the addition of RNase Inhibitor and, in pre-miRNA screening, the addition of T4 gene 32 protein. RNA integrity was evaluated using a 2100 Bioanalyzer Series C (Agilent, Santa Clara, CA). Total RNA was measured using the RNA 6000 Series II Nano kit and small RNA was measured using the Small RNA kit, according to the manufacturer's recommendations. All chips were analyzed using 2100 Expert software version B.02.04. The average RNA integrity value for total RNA among all samples profiled was 8.00 ± 2.60.

Real-time QPCR
For DNA and pre-miRNA expression profiling, two 96-well plates containing 372 different primers were used. These primers represent 168 cellular and 12 viral pre-miRNA targets, as well as 6 cellular and viral control miRNA targets. All primers conform to universal real-time PCR conditions with a predicted Tm of 60°C and an amplicon length of 100 bp or less. Real-time QPCR was conducted under universal cycling conditions of 40 cycles with SYBR Green as the method of detection, following our previously validated methods. A 36 µl reaction mix was made using a CAS-1200 robot that uses filtered carbon-graphite pipette tips (Tecan Inc., Durham, NC) for liquid level sensing, allowing for a pipetting accuracy of 0.1 µl. The reaction mix was then distributed in triplicate into a 384-well plate using a Matrix repeat pipettor (Thermo Inc.). The final primer concentration was 250 nM in a total reaction volume of 9 µl. Because pre-miRNA-specific primers also detect the corresponding gene, these primers were used for DNA gene profiling as well. For DNA QPCR, each reaction contained 1.67 ng DNA/µl. For pre-miRNA QPCR, 40 µl of the 100 µl RT reaction was used for each 384-well plate, yielding a final amount of 0.1 µl cDNA per 9 µl reaction. Real-time QPCR primers against 93 mature miRNAs and 1 cellular mRNA were used according to the manufacturer's protocol (Applied Biosystems Inc.). The combined pipetting and instrument error for all of the QPCR reactions was less than 6% (data not shown). All reactions were done in technical triplicates. QPCR was performed on a 384-well LC480 (Roche Inc.) platform.
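The quantities in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function and variable names are hypothetical, and it assumes that the 372 primers form forward/reverse pairs and that the 40 µl of RT reaction is spread evenly over all 384 wells.

```python
# Illustrative arithmetic only; names are hypothetical, numbers are from the text.

def cdna_per_well_ul(rt_volume_ul: float, wells: int) -> float:
    """Volume of RT reaction (cDNA) that ends up in each well of the plate."""
    return rt_volume_ul / wells

# 168 cellular + 12 viral pre-miRNA targets + 6 control targets
n_targets = 168 + 12 + 6
# Assumption: one forward and one reverse primer per target
n_primers = 2 * n_targets

# 40 ul of RT reaction distributed over a 384-well plate
per_well = cdna_per_well_ul(40.0, 384)

# DNA input per 9 ul reaction at 1.67 ng/ul
dna_ng_per_reaction = 1.67 * 9

print(n_primers, round(per_well, 3), round(dna_ng_per_reaction, 2))
```

Under these assumptions the primer count matches the 372 primers stated above, and the cDNA volume per well rounds to the quoted 0.1 µl.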

Statistical analysis
Data were collected in triplicate for each RT reaction. Since averaging these replicates would mask individual reaction failures, we clustered all replicates individually after masking outliers. Each array of 160 primers contained four separate reactions for U6, yielding four $CT_{U6}$ values. We calculated the mean and median of these four reactions to yield $\widehat{CT}_{U6}$. The maximal difference between mean and median was 0.30 CT units. All other CT data were normalized to $\widehat{CT}_{U6}$. This yielded dCT for each primer/sample combination. The dCT values were normalized to the median for each array and subjected to unsupervised clustering using a correlation metric [51] and the program ArrayMiner™.
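The normalization described in this paragraph can be sketched as follows. This is a minimal illustration, not the code used for the study: the function names are hypothetical, the U6 reference is taken as the mean of the four U6 reactions, and dCT is assumed to follow the common convention CT(primer) minus CT(U6).

```python
from statistics import mean, median

def delta_ct(ct_by_primer, u6_cts):
    """Normalize raw CT values to the U6 reference of the same array.

    Assumption: the reference is the mean of the U6 reactions and
    dCT = CT(primer) - CT(U6).
    """
    reference = mean(u6_cts)
    return {primer: ct - reference for primer, ct in ct_by_primer.items()}

def median_center(dct_by_primer):
    """Subtract the array-wide median so that arrays become comparable."""
    array_median = median(dct_by_primer.values())
    return {primer: v - array_median for primer, v in dct_by_primer.items()}

# Made-up CT values for one array (not data from the study)
raw = {"mir-221": 25.0, "mir-24": 30.0, "mir-140": 28.0}
u6 = [20.0, 21.0, 21.0, 22.0]

dct = delta_ct(raw, u6)
centered = median_center(dct)
print(dct)
print(centered)
```

The median-centered values are what would then be passed to the clustering step.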
Our exploratory cluster analysis included all primers and all samples. However, for statistical analysis we excluded primers that yielded CT < 38 for the non-template and reverse transcriptase negative controls. We also excluded as uninformative those primers that did not yield a signal (CT < 38) in at least one of the samples. We used QQ plots for each sample and Kolmogorov-Smirnov statistics to test for normal distribution across arrays (data not shown).
For miRNA gene copy number analysis, data were collected in triplicate (or duplicate) for each sample. Since averaging these replicates would mask individual reaction failures, we clustered all replicates individually after masking outliers. Each set of 160 primer pairs contained four separate reactions for U6, yielding four $CT_{U6}$ values. We calculated the mean of these four reactions to yield $\widehat{CT}_{U6}$. In only two samples did the median deviate significantly from the mean based on an analysis of residuals (data not shown). This suggested individual reaction or pipetting failures. We imputed modified $\widehat{CT}_{U6}$ for those singular cases based on the following rule: if < 50% of replicates differed by > 1 CT from the mean of the remainder, they were replaced by the mean of the remaining data points. After imputation, the mean $\widehat{CT}_{U6}$ across all samples and all technical replicates (n = 348) was 20.94 ± 1.97 with a median of 21.26. For individual quadruplicate $CT_{U6}$ measurements, the SDs ranged from 0.03 to 1.00 CT. 71.26% of SDs for technical replicates were ≤ 0.32 CT. Hence, this array identified 2-fold changes in copy number. All other CT data were normalized to $\widehat{CT}_{U6}$. This yielded dCT for each primer/sample combination. The dCT values were subjected to unsupervised clustering using a Euclidean metric and visualized on a log2 scale [51].
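The imputation rule for the U6 replicates leaves some room for interpretation. The sketch below shows one possible reading, under stated assumptions: a replicate is flagged when it lies more than 1 CT from the median of all replicates (a robust stand-in for "the mean of the remainder"), and flagged values are replaced by the mean of the unflagged ones only if fewer than half of the replicates are flagged. The function name and example values are hypothetical.

```python
from statistics import mean, median

def impute_u6(cts, max_dev=1.0):
    """Replace outlying U6 replicates by the mean of the remaining ones.

    Assumption: outliers are values more than `max_dev` CT away from the
    median of all replicates. If half or more of the replicates would be
    flagged, the data are returned unchanged.
    """
    center = median(cts)
    flagged = [abs(ct - center) > max_dev for ct in cts]
    n_flagged = sum(flagged)
    if n_flagged == 0 or n_flagged >= len(cts) / 2:
        return list(cts)
    kept_mean = mean(ct for ct, bad in zip(cts, flagged) if not bad)
    return [kept_mean if bad else ct for ct, bad in zip(cts, flagged)]

# Made-up quadruplicates: one failed reaction, then an ambiguous 2-vs-2 split
one_failure = impute_u6([20.0, 20.5, 19.5, 24.0])
ambiguous = impute_u6([20.0, 20.0, 24.0, 24.0])
print(one_failure)
print(ambiguous)
```

In the first example the single deviating replicate is replaced; in the second, no replicate can be singled out, so nothing is imputed.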
For supervised comparisons between two classes, we used the Welch-modified t-test as implemented in the R statistics program [99]. This yielded unadjusted, univariate p values for each individual miRNA gene. This particular variant of the t-test allows for unequal variances between the two classes. An analysis of variances showed that most miRNA genes had identical variances between the KSHV-infected (n = 53) and normal tissue (n = 18) data sets. We used q-value computation [100] to assess the false discovery rate. The statistical methods for supervised comparisons between two classes of pre-miRNAs were as described above for miRNA gene loci. We only report the minimal set of miRNA genes for which we do not expect any false positives. The Bonferroni-adjusted p-value was ≤ 0.05 for each of the hits. Decision trees were computed as implemented in R [99] using 10-fold cross-validation.
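To make the two-class comparison concrete, the sketch below computes the Welch t statistic with its Welch-Satterthwaite degrees of freedom, plus a Bonferroni adjustment. It is a standard-library Python stand-in for the R routines named above, not the original analysis code; function names and input values are illustrative. Converting t and the degrees of freedom to a p value requires the t distribution, which is left to a statistics package (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Made-up dCT values for one miRNA gene in two classes
t_stat, dof = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
adjusted = bonferroni([0.01, 0.5, 0.0001])
print(t_stat, dof, adjusted)
```

Because the degrees of freedom are estimated from both sample variances, the test remains valid when the two classes differ in size and spread, as with the 53 infected and 18 normal samples here.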

Quality control
To monitor RNA integrity and yield, we used the Agilent bioanalyzer (Figure 5A-D). The Agilent bioanalyzer provides two chips with different size resolution. The small RNA chip resolves RNA species from 4-150 nt (Figure 5C,D), and the RNA nano chip resolves RNA species from 25-6000 nt (Figure 5A,B). It allows size determination as well as quantitation. We compared two RNA preparations: total RNA isolated with Triazol and total RNA isolated with Triazol followed by high molecular weight depletion (HMWD). The total RNA isolation with Triazol™ retained the highest concentration of miRNAs (20.5 pg/ml) while preserving overall RNA integrity as measured by RIN value (9.7), which is a proprietary estimate based on the ratio of the 28S to 18S peaks (Figure 5A). Subsequent HMWD, as required for other mature miRNA profiling approaches (e.g. [101]), depleted mRNA and rRNA, as expected. It did not change the relative abundance of miRNA (~22 nt) to pre-miRNA (~70 nt), but decreased overall small RNA yield by half (10.5 pg/ml). Hence, we used total RNA for all further studies.
We used real-time QPCR to determine individual pre-miRNA levels as per our published procedures [26]. Individual miRNA CT readings were normalized to the CT of U6 rRNA ($dCT_{U6}$) to account for variation in sample input RNA or DNA. However, we generally obtained more consistent results if we used very similar amounts of RNA (as determined by Nanodrop™-based quantitation) for the reverse transcriptase (RT) reaction. The reason for this pre-RT normalization step is that the RT reaction, like the real-time QPCR reaction, has a linear range; outside of it, the RT reaction may be saturated or, in the case of dilute samples, less efficient than expected. Figure 5E shows average raw CT values for U6 for each DNA sample, aggregated by sample class. Figure 5F shows average CT values for U6 for each post-RT cDNA sample, aggregated by sample class. Again, variation is minimal except for two outliers (HMVEC and KS 101), for which we had only small amounts of RNA available. Even those two samples, though, had U6 CT values of ≤ 27 cycles. Since we ran a 40-cycle QPCR reaction, this gave us an assay range of $2^{40-27} = 2^{13}$, i.e. 0 to 8192-fold above the level of detection.
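The assay range quoted at the end of this paragraph follows from the number of cycles remaining between the reference CT and the last cycle. A minimal sketch, assuming perfect doubling of product in every cycle (100% amplification efficiency) and using a hypothetical function name:

```python
def fold_range(total_cycles: int, reference_ct: float) -> float:
    """Fold range between a reference CT and the last cycle of the run.

    Assumption: product doubles in each cycle (100% PCR efficiency).
    """
    return 2.0 ** (total_cycles - reference_ct)

# Worst-case U6 CT of 27 in a 40-cycle run
assay_range = fold_range(40, 27)
print(assay_range)
```

With lower amplification efficiency the base would be smaller than 2 and the usable range correspondingly narrower.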
All experimental samples were run in triplicate for RT-positive and DNA reactions. Pooled samples were run in triplicate for RT-negative control reactions. Non-template control (NTC) reactions were run to assure that the primers were free of contamination and did not yield non-specific products or primer dimers at a significant rate. A reaction was considered positive if the corresponding CT was less than 38.00 cycles. To remove primers with substantial capacity for primer dimer formation, any primer pair with mean CT < 38.00 in the RT-negative or NTC reactions was omitted from further analysis. This removed 8 cellular pre-miRNA primer pairs and 1 viral miRNA primer pair (KSHV-mir-k12-10a) from our original set [26]. Conversely, any primer pair that failed to efficiently amplify the corresponding DNA target (mean CT for DNA > 38.00) was also omitted from further analysis. This filtering removed 15 cellular pre-miRNA primer pairs and 1 viral miRNA primer pair (KSHV-mir-k12-3). It also removed all no-primer controls from the data set. In all, the final array included 160 primer pairs, representing 145 cellular miRNAs, 6 KSHV miRNAs, 3 EBV miRNAs, 2 KSHV mRNAs and 4 cellular rRNAs (U6). These were run in parallel for each sample. The distribution of all data used in the analysis was as follows: for the NTC (Figure 5J), 203 of 5120 reactions (3.96%) had a CT < 38. These were randomly distributed across all samples and all primers. For the RT-negative reactions (Figure 5G), 286 of 5920 reactions (4.83%) had a CT < 38. Pair-wise comparison of the RT-negative and NTC results indicated perfect overlap between the positive samples, indicating that these were incidences of shared positivity and should therefore be counted only once (data not shown). Therefore, the false positive rate was 4.83%. In the DNA samples (Figure 5I), 12,567 of 13,920 reactions (90.28%) had a CT < 38.
Given that less than 5% of the total number of reactions represent viral targets, the presence of which is variable from sample to sample, the remaining negative values most likely indicate deletions of miRNAs in certain samples. The distribution was unimodal and followed a normal distribution (data not shown). Finally, the RT-positive set (Figure 5H) contained 8,299 of 20,000 (41.50%) positive reactions (CT < 38) in total. This is less than half the number of positive reactions recorded for the corresponding DNA results. This means that although we could detect 86% of all miRNA genes (DNA) in all samples, only 45.54% were actively transcribed in our set of samples. Thus, roughly half of all miRNAs in our array were transcribed in the endothelial cell lineage. This is consistent with the known tissue specificity of miRNAs and underscores their value as differentially expressed biomarkers. It is likely due to a combination of factors, including tissue type and developmental stage. For instance, we would not expect liver- or brain-specific miRNAs to be present in any of our samples.
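The primer filtering rules and the positivity rates reported in this section can be summarized in a short sketch. It is illustrative only: the function names are hypothetical, and reactions without amplification are assumed to be recorded as CT = 40.0.

```python
from statistics import mean

CT_CUTOFF = 38.0  # a reaction counts as positive if CT < 38.00

def keep_primer_pair(ntc_cts, rt_negative_cts, dna_cts, cutoff=CT_CUTOFF):
    """Apply the two exclusion rules described in the text.

    A primer pair is dropped if it gives background signal (mean CT below
    the cutoff in NTC or RT-negative reactions) or if it fails to amplify
    its DNA target (mean CT above the cutoff).
    """
    background = mean(ntc_cts) < cutoff or mean(rt_negative_cts) < cutoff
    fails_on_dna = mean(dna_cts) > cutoff
    return not background and not fails_on_dna

def percent_positive(n_positive: int, n_total: int) -> float:
    """Share of reactions with CT below the cutoff, in percent."""
    return round(100.0 * n_positive / n_total, 2)

# Made-up primer pairs; undetermined reactions are assumed to be stored as 40.0
clean = keep_primer_pair([40.0, 40.0, 40.0], [40.0, 39.5, 40.0], [24.1, 24.3, 24.0])
dimer = keep_primer_pair([35.2, 36.0, 34.8], [40.0, 40.0, 40.0], [24.1, 24.3, 24.0])
dead = keep_primer_pair([40.0, 40.0, 40.0], [40.0, 40.0, 40.0], [40.0, 39.0, 40.0])

# Rates reported in the text
print(percent_positive(203, 5120), percent_positive(286, 5920),
      percent_positive(12567, 13920))
```

The three printed rates reproduce the NTC, RT-negative and DNA percentages given above.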