5′ UTR Control of Native ERG and of Tmprss2:ERG Variants Activity in Prostate Cancer

ERG, a member of the ETS transcription factor family, is frequently overexpressed in prostate cancer as a result of its fusion to the androgen-responsive Tmprss2 gene. Different genomic rearrangements and alternative splicing events around the junction region lead to multiple combination of Tmprss2:ERG fusion transcripts that correlate with different tumor aggressiveness, but their specific functions and biological activities are still unclear. The complexity of ERG expression pattern is compounded by the use of alternative promoters, splice sites, polyadenylation sites and translation initiation sites in both the native and fusion contexts. Our systematic characterization of native ERG and Tmprss2:ERG variants reveals that their different oncogenic potential is impacted by the status of the Ets domain and the configuration of the 5′ UTR region. In particular, expression and activity of functional ERG and Tmprss2:ERG variants are influenced both by translation initiation signals within the different isoforms and by inhibitory upstream Open Reading Frames (uORF) in their 5′ UTRs. Stable expression of ERG and Tmprss2:ERG variants promoted cell migration/invasion, induced a block of proliferation and induced a senescence-like state, suggesting a role for these variants in the prostate tumorigenesis process. In addition to Tmprss2:ERG fusion products, a group of related native ERG isoforms is also highly over-expressed in fusion-carrying prostate cancers, and share the same translation initiation site (in ERG exon 4) with the commonly observed Tmprss2 exon1 joined to ERG exon 4 (T1:E4) fusion-derived variant. Usage of this ATG can be preferentially down-regulated by directed antisense-based compounds, possibly representing the basis of a targeted approach that distinguishes between tumor–associated and normal ERG.


Introduction
Chromosomal translocations are well established key events in the development of hematological malignancies and sarcomas [1]. The identification of recurrent translocations between the androgen-responsive Tmprss2 gene and members of the Ets family of transcription factors in prostate cancer (PCa) has changed the panorama of PCa biology [2].
More than 80% of PCa samples harbor oncogenic fusions, most commonly involving the 59 region of the Tmprss2 locus (with its promoter) joined to Open Reading Frame (ORF) of various Ets genes [3], although other combinations have been described [4,5,6]. The single most common rearrangement, Tmprss2 exon1 joined to ERG exon 4 (T1:E4, or variant III) is present in ,50% of PCa cases [3].
Ets gene fusions play a role in the transition from prostatic intraepithelial neoplasia (PIN) to adenocarcinoma and are associated to aggressive lesions and poor prognosis [7,8,9,10,11]. Overexpression of ERG in prostate cell lines activates cell invasion programs and results in the development of PINs in mice, but cannot drive carcinogenesis [5,8,10,12], as cooperation with separate genetic lesions, for example pTEN loss, is needed to trigger progression to advanced disease [7,13,14].
Several variants of the normal ERG gene product have been described, arising from a combination of alternative splicing, polyadenylation and transcriptional initiation [15,16,17,18]. The specifics of the genomic rearrangements also introduce considerable structural heterogeneity in the 59 region. In addition to the common T1:E4 transcript, Tmprss2 exons 1, 2 and 3 can be combined with ERG exons 2, 3, 4, 5 and 6 in various alternative splicing patterns that can generate at least 17 distinct Tmprss2:ERG transcripts [2,19,20,21]. These can serve as markers for disease progression and correlate to the aggressiveness of the tumors [8,9,19]. In particular, the T2:E4 variant (Tmprss2-exon2:ERG-exon4 or variant VI), where the native ATG in Tmprss2 exon 2 is in frame with the ERG ORF, is associated with pathological and clinical aspects of aggressive disease [19].
The mechanistic basis for the different oncogenic potential of the fusion isoforms remains to be elucidated, and it could be related to intrinsic differences in the N-terminal regions or to the effect(s) that variation in the 59 region of the mRNA can have on RNA stability or expression.
To test this hypothesis, we assessed the use of 3 alternative promoters, 2 common alternative splicing events and 3 polyadenylation sites (PAS) in normal tissues relative to Tmprss2:ERGexpressing prostate tumors. These independently regulated events combine to generate 30 'main' native isoforms, some of which we found to be also highly overexpressed in tumors. The characterization of the translation initiation sites used by the most common native ERG and fusion variants reveals that the specific organization of the 59 UTR region is one of the principal determinants of their biological activity and identifies an ATG in exon 4 as a putative target for antisense-based translation inhibition in prostate cancer.

Structure of the ERG Gene
Multiple native ERG isoforms can arise due to a combination of alternative promoters, splicing and polyadenylation. These isoforms can in turn combine with Tmprss2 and other 59 partners to produce numerous ERG-derived variants, with variable prognostic values [8,9,19]. To better understand the activities of ERG-derived oncogenic products, we sought to initially clarify the ERG gene structure and to characterize its expression pattern.
There are considerable inconsistencies in the published ERG nomenclature, with at least four different classification schemes, with the obvious consequence of generating confusion in understanding and comparing data from different studies (e.g.: the large terminal exon that contains the Ets and transactivation domains is variably referred to as exon 11 [2,22], 12 [3], 16 [18,23], 17 [24]).
We report in Figure 1A an up-to-date view of the exon-intron structure of the ,300 Kb ERG locus (ENSG000000157554) and propose a unified, rational nomenclature of RNA and protein variants that aims at incorporating the most established conventions. The structure is mainly based on the 11 exons of Refseq entry NM_004449.4 (Uniprot entry P11308 for ERG2), and incorporates the exon numbering used in the seminal Tomlins paper [2,22] first describing the Tmprss2:ERG fusion. In short, we maintained as ''exon 4'' the 218nt exon that is the main partner of Tmprss2 and as ''exon 11'' the large 3897nt exon encoding the Ets domain; we named Exon 1a, 1b, 1c the three mutually exclusive 'first' exons following the three validated promoters P A , P B and P C . To ensure the rational inclusion of additional exons that might be identified in the future, we distinguished by letters the alternative exons not part of the 11 reference ones. For example, the 72 nt alternative exon included between exons 7 and 8 is called 'exon 7b'.
We then identified the main alternative regulatory events that generate the majority of ERG variability: 3 alternative promoters (P A , P B and P C ); two common alternative splicing events (inclusion/skipping of exon 4 or 7b); three separate polyA signals (PAS 7bpA, 11LpA and 12pA). Combinatorial usage of the alternative events generates 30 possible 'major' ERG transcript variants ( Figure 1B), which can encode 15 different predicted ERG-related polypeptides, with 3 different N-terminals, 3 different C-terminals and 1 possible internal variation (inclusion or skipping of 24 aa in the Alternative Domain encoded by Exon 7b).
In addition to the 30 'major' variants described in Figure 1, a plethora of 'minor' isoforms have been reported in the literature or are present in databases. The list includes variants showing skipping of exons 2, 5, 7 and 8; usage of a proximal (Short) polyA site in exon 11 (11SpA) or of an additional intronic one downstream of exon 8 (8pA); and inclusion of supplementary alternative exons 7c, 10b and of multiple alternative exons derived from intron 3 (exons 3b-h), indicated in gray in figure 1A. Isoforms ERG4, ERG6 and ERG9 from the previous Owczarek study [18] are minor variants and have been reclassified in this group.
Since any of these additional minor events could combine independently with all the structurally compatible major isoforms, hundreds of variants could potentially be generated. However, these events appear to occur sporadically, and their physiological relevance is unknown.

ERG Expression: a Cancer-associated Switch in Promoter Usage
We set out to investigate usage of promoters, PAS or splicing signals by qPCR analysis of ERG expression in different normal tissues, tumors and cell lines ( Figure 2). While some degree of tissue-to-tissue variability is observed (Fig. S1A), in general promoter P C (mean Dc(t) = 10.2) appears to be the most active in normal tissue (including in prostate), ,25-fold and ,10-fold more active on average than promoters P A (Dc(t) = 14.9) and P B (Dc(t) = 13.4), respectively ( Figure 2A). On the other hand, in a panel of 8 primary PCa samples expressing Tmprss2-ERG fusions, promoter P B (mean Dc(t) = 5.4) accounts for the majority of native ERG transcript and is present at levels comparable to those of the fusion itself (Dc(t) = 6), over 100-fold more abundant than the same transcript in normal prostate (Dc(t) = 12.8) ( Figure 2B and Fig.  S1A). Indeed, in several Tmprss2:ERG-carrying samples the native P B promoter, rather than the fusion gene, is the principal source of ERG transcript (Fig. S1B). Compared to normal prostate, promoters P A (Dc(t) = 9) and P C (Dc(t) = 8) are also activated, but not to the degree of P B . In PCa cell lines, P B is the only active native ERG promoter, with undetectable signals from P A or P C ( Figure 2C). Two of the tested PCa cell lines, VCap and NCI-H660, express the Tmprss2:ERG fusion ( Figure 2C, full symbols), which is instead absent in PCa cell lines not containing the Tmprss2:ERG fusion (LNCap, DU145, C4-2, open symbols and Fig. S1C). As expected, no expression from any of the native promoters was observed in the NCI-H660 cell line, which carries the fusion on both alleles and lost all endogenous promoters [25].
From this first set of experiments we conclude that a promoter switch occurs in PCa tumors and cell lines and that increased usage of native promoter P B is associate with (and possibly contributes to) the tumor phenotype.
Overexpression of ERG in PCa was previously described [23], but following the discovery of the Tmprss2:ERG fusion, it has been typically ascribed to this event. However, our finding that in addition to the fusion-derived transcripts, some native ERG variants can also be highly overexpressed in PCa, suggests a bigger role for native ERG in PCa development. This is supported by the observation that endogenous mouse Erg transcripts are overexpressed in tumors from prostate conditional Pten 2/2 ;Trp53 2/2 mice, compared to Pten 2/2 ;Trp53 +/+ mice [7]. Importantly, while the latter model results in an indolent form of PCa, the former produces an aggressive phenotype [7].
The differential activation of the three native promoters in the fusion-carrying PCa samples suggests that this aberrant expression is transcriptionally regulated. An intriguing possibility is that activation of the native P B promoter may be driven directly or indirectly by ERG itself as part of a positive regulatory loop. For example, several putative c-Myc responsive elements were identified immediately upstream of the ERG P B promoter [18]. Since c-Myc is a key downstream target of ERG [26], the androgen-dependent activation of ERG from the fusion, or a separate PTEN-dependent Myc activation [27] could trigger a self-sustaining ERG/Myc oncogenic loop, which could eventually become androgen-independent. Indeed, consistently with this model, a feed-forward mechanism where expression of endogenous ERG is controlled by overexpression of the fusion product has recently been described [28].

ERG Expression: Alternative Polyadenylation and Splicing
Analogous to what we did to study the promoter's usage, we then explored the role of alternative polyadenylation and splicing in the expression of the ERG variants. Of the three principal PAS described (Figure 1), the 11L-pA is needed for a fully functional exons. Red circles indicate polyA sites. Bottom: alignment of the exons forming the main ORF (ERG-1b) with the protein's domains. Numbers indicate size in amino acids. PNT = pointed domain, AD = alternative domain, Ets = Ets domain, TAD = transactivation domain. Asterisk and circle indicate position of the first and second ATG. (B) Human ERG main variants. Alignment of exons forming the 30 main RNA variants of human ERG. Blue indicates the ORF, light blue the additional region from the ATG in exon 3. For each variant, the proposed name is indicated next to previous nomenclature (if available). The proposed protein name is reported along the predicted size in aa and KDa. Variants derived from the alternative usage of promoter 1a and 1b are paired as they lead to related mRNAs and identical proteins. doi:10.1371/journal.pone.0049721.g001  ERG protein, whereas the PAS in intron 7b (7b-pA) and exon 12 (12-pA) generate C-terminally truncated ERG isoforms lacking the Ets DNA-binding domain and the transactivation domain (TAD).
The 11L-pA PAS was the most commonly used in normal tissues (mean, DC(t) = 10.8), about 8-fold more than 7b-pA (DC(t) = 13.8) and 50-fold more than 12-pA (DC(t) = 16.6) (Fig. 2D). However, in Tmprss2:ERG-expressing PCa samples (DC(t) = 5.2), usage of 7b-pA is strongly activated, and becomes about as common as the 11L-pA (DC(t) = 5.8) ( Figure 2E). The same is true in PCa cell lines expressing the fusion ( Figure 2F) suggesting that switching to this pattern of expression could correlate with tumor progression. It is important to note that, while this variant lacks the Ets and TAD domains, it retains the dimerization determinants, and thus its expression might influence the activity of other full-length variants present. A proximal PAS within exon 11 (11S-pA) [15,18] was indirectly assessed by comparing qPCR of regions on either side of it (E11 and 11L-pA), and similar levels of expression were observed, suggesting that usage of 11S-pA is marginal under most normal and pathological conditions ( Figure 2D-F and Figure S1D-F).
Since alternative splicing variants can influence ERG or Tmprss2:ERG fusion functions [24], we analyzed regions around exon 7b and exon 4 in normal tissues, PCa samples and cell lines by semi-quantitative PCR. Both alternative splicing events were readily detectable, with a clear prevalence of the exon 4 and exon 7b inclusion variants. However, we didn't observe any significant enrichment in the relative amount of either splicing variants in PCa samples, suggesting that the modulation of these two specific splicing events may not play a determinant role in PCa.
Altogether, these data indicate that the full-length products, containing exon 7b and the Ets and TAD domains, are the most abundant ERG variants regardless of the status of the upstream regions, both in normal tissues and tumors. The strong increase of the truncated TE:7bpA isoform relative to the full-length variant could simply reflect changes in the splicing/polyadenylation machinery of cancer cells, but it could also indicate its (yet unexplored) role in ERG biology in tumors.

N-terminal Heterogeneity of ERG Isoforms
To evaluate whether the described heterogeneity in the 59 region affect ERG expression and activity, we sub-cloned the native variants ERG-1a, ERG-1b, ERG-1c, ERG-1b.D4, ERG-1c.D4, and the common fusion variant T1:E4, with their complete 59 UTRs and with exon 7b included (Fig. 3A). Upon transient transfection in HeLa cells, ERG-1c cDNA efficiently expressed the expected 54 KDa product (Fig. 3B, lane 4). On the contrary, expression of the native ERG-1a and ERG-1b variants was inefficient, resulting in peptides smaller than the ,55 KDa expected if the first in-frame ATG in exon 3 was used (Fig. 3B, lanes 2-3). Interestingly, this peptide co-migrated with that generated by the transient over-expression of the T1:E4 fusion variant, at ,50 KDa (lane 5). Additional variation in the Nterminal region derives from exon 4 skipping, which alters ERG's main ORF so that the predicted starting ATGs in exon 1c or exon 3 would not be able to generate an ERG-related peptide ( Figure 3A). Transient overexpression of ERG-1b.D4 or ERG-1c.D4 both resulted in ERG-related peptides migrating at ,44 KDa, consistent with the usage of an in-frame M5 ATG in exon 5 ( Figure 3B, lanes 7,9). This is most evident when using ERG antibody C-17, which readily recognizes recombinant ERG, but not the endogenous proteins. A different ERG antibody (C-20) preferentially recognizes an endogenous band corresponding in size to ERGD4 (lanes 1-5), making the switch to M5 usage harder to detect.
Since T1:E4 lacks the predicted initiation codons from both Tmprss2 and ERG transcripts, an alternative internal ATG from within ERG's open reading frame must be used to express an ERG-related product from the fusion mRNA, likely from exon 4. To identify the T1:E4 initiation codon, a series of methionine-toalanine point mutations were generated for the three in-frame ATG codons present in ERG exon 4 (Fig. 3A). Mutation at nucleotides 79 (M4a), but not 121 (M4b) and 184 (M4c) of exon 4 abolished T1:E4 expression ( Figure 3C), indicating that the fusion transcript uses the first in-frame ATG for expression of the ERG peptide.
To map the starting ATGs from the native ERG isoforms similar mutations where also introduced alone or in combination in the first in-frame ATGs in exon 3, 1c and 4 ( Figure 3D). Mutation of the ATG in exon 3 (M3) didn't have any effect on the expression of the ,50 KDa product (Figure 3d, lanes 2 vs. 4), while mutation of the next ATG in exon 4 (M4a) abrogated it both in the wt and M3 context ( Figure 3D, lanes 3 and 5), indicating that ERG-1b (and ERG-1a) do not use their first in-frame ATG, as would be predicted by a 59cap-dependent scanning model of translation initiation [29]. Instead, usage of the following ATG in exon 4 generates a peptide identical to that encoded by the T1:E4 fusion. On the contrary, when the start codon in exon 1c is mutated (M1c), ERG-1c mobility is reduced from ,54 KDa to ,50 KDa ( Figure 3D, lanes 6 vs. 8), like T1:E4 and ERG-1b (lane 1 and 2), consistent with a switch to the M4a ATG. Indeed, mutation of this ATG results in the abrogation of the ,50 KDa product in favor of a still smaller product.

Biological Characterization of the ERG Isoforms
To assess whether the structural N-terminal differences between the cancer-associated ERG-1b/T1:E4 and the normal ERG-1c affect intrinsically their biological activity, we initially stably expressed their corresponding cDNAs in NIH-3T3 cells ( Figure 4A) and selected clones with robust expression. In agreement with previous reports, we observed promotion of cell invasion and migration by both ERG variants, as assayed by transwell migration through matrigel ( Figure 4B) and scratch wound assay ( Figure 4C-D). The extent of the effect on migration/ invasion was comparable in ERG-1b and ERG-1c indicating that they are similarly active, at least in some basic aspects of their biology.
Surprisingly, expression of ERG variants in this context resulted in decreased cellular proliferation ( Figure 4E). This was not associated with cell death. On the contrary, expression of ERG-1b and especially ERG-1c resulted in protection from cell death following serum starvation ( Figure 4F). This behavior could reflect the activation of an oncogene-dependent senescence-like state by ERG expression as previously described for other oncogenes [30]. Indeed, ERG-expressing cells, but not controls, stained positive for senescence biomarker (beta)-Galactosidase ( Figure 4G). Intrigued by these results we subsequently stably overexpressed the ERG-1b/T1:E4 isoform and the corresponding empty vector in the normal human fibroblast cell line IMR90, a well-characterized cell model system for cellular senescence. As with the NIH-3T3 cells, over-expression of the ERG-1b/T1:E4 isoform resulted in increased cell migration and cell invasion, ( Figure S2). Moreover, IMR90 cells over-expressing the ERG-1b/T1:E4 isoform showed senescence-like phenotypes such as a inhibition of cellular proliferation ( Figure 4H), accumulation of bi-nucleated cells and elevated SA-b-gal activity, a classical biomarker of senescence ( Figure 4I).
This result is particularly exciting in light of the fact that the senescence program is activated once a cell has sensed a critical level of damage or dysfunction, pointing to a critical role for ERG-1b/T1:E4 overexpression as an initial event leading to PCa. Moreover it is important to note that cellular senescence can be detected in early-stage human PCa specimens and can be triggered by acute loss of PTEN in a p53-dependent fashion in mouse PCa models [31], suggesting that ERG activation might result in an initial senescent phenotype which requires subsequent environmental or genetic changes to progress to malignancy.

ATG Context and uORFs Affect ERG Translation Efficiency and Functions
Efficiently translated eukaryotic mRNAs require an optimal context around the start codon, to aid in its recognition by the ribosome (the Kozak sequence: GCCRCCATGG) [32]. However, this is not always sufficient to explain translation efficiency. For example, although ERG-1a/b and T1:E4 share the same ATG in  Figure 3B lanes 2,3 and 5) suggesting a role for their different 59 UTRs. Indeed, substitution of the natural 59 UTRs with one containing a consensus Kozak ( Figure 5A-B) led to activation of expression from the M3 ATG in the ERG-1b variant, with synthesis of the originally predicted 55 KDa peptide ( Figure 5C, lanes 1-2). This confirms that elements within the 59 UTR of ERG-1b inhibit its translation and it rules out that the low ERG-1b (M3) abundance is due to an intrinsic instability of its Nterminal domain.
Upstream Open Reading Frames (uORFs) are typically short ORFs that start within the 59 UTR, are out-of frame with the main downstream coding sequence and can reduce its expression [33,34]. While no uORFs are present in ERG-1c or in T1:E4, three uORFs exist in the ERG-1b 59UTR ( Figure 5D, top), which could interfere with recognition of the predicted ATG of ERG-1b (Table S1). Abrogation of the uORF starting at uM2a by mutation of the ATG to GCG led to increased ERG-1b expression ( Figure 5E, lane 2), indicating that the engagement of the scanning ribosome by the first encountered ATG, to generate a short 29 aa uM2a peptide, is a limiting factor in ERG-1b expression from the downstream ATG.
The presence of uORFs could also similarly affect the levels of expression of the various androgen-driven Tmprss2:ERG fusion proteins, with prognostic implication, since the 59 heterogeneity in fusion is associated with different clinical outcomes [19,20]. The different pathological profiles could be due to changes in biological activity depending on the primary protein sequence or to differences in its abundance, via modulation of translation. Such variations might explain how, for example, the T2:E4 fusion is associated with a more aggressive phenotype [19,20].
Expression of the 54 KDa peptide from the T2:E4 isoform, which is associated with an aggressive phenotype, is robust ( Figure 5F, lane7), as expected from a transcript that contains a strong native translation initiation site (MT2, Figure 5B). The presence of a weak overlapping uORF (uMT2a) is not sufficient to reduce T2:E4 expression. Isoforms T1:E2 and T1:E3 include the native ERG ATG on exon 3 (M3), and in principle should be efficiently expressed. However, as for the native ERG variants ERG-1a and ERG-1b, the M3 ATG is not used. Instead, usage of the M4 ATG in exon 4 leads to generation of the same ,50 KDa product described for ERG-1b. Similarly to ERG-1b, T1:E2 low expression levels might be due to the presence of the 3 uORF depicted in Figure 5D. Indeed T1:E3, which lacks exon 2 and the uORFs it carries, has a slight but reproducibly higher level of expression than T1:E2 ( Figure 5F, lanes 4 vs. 3). Isoform T1:E5 uses the good Kozak in exon 5 and is expressed efficiently, despite the presence of two uORFs (5a and 5b). The same 44 KDa protein is also generated by T2:E5, but less efficiently, due to the presence of two additional uORFs (T2a and MT2). A similar scenario plays out for the T3:E4 variant tested, which uses the weak ATG in exon 4. In this case, the presence of 4 inhibitory uORFs almost completely abrogates translation of the ,50 KDa product. Skipping of exon 5 generates the rare isoform T1:E6, a putative protein lacking the PNT domain but containing the Ets and transactivation domain. However, its expression is suppressed due to the weakness of the Kozak associated to the in-frame ATG in exon 6, and the presence of 4 inhibitory uORFs.
From these studies we can conclude that overall, the levels of transient expression of Tmprss2:ERG isoforms appear to correlate with their 59 UTR context. The strength of the Kozak sequence and the number and relative strength of the uORFs both contribute to translation efficiency of the ERG and Tmprss2:ERG variants, and should be taken into consideration as an additional layer of complexity when assessing the oncogenic potential of the fusion variants observed in clinical samples.
Indeed, in invasion and wound healing assays in NIH-3T3 cells stably expressing multiple fusion isoforms, we noticed that cell mobility correlated to the expression levels of the various isoforms ( Figure 6A-D). The same type of correlation was apparent in transient co-transfection experiments ( Figure 6E), where we measured the ability of the different isoforms to trans-activate a luciferase reporter under the control of either the VE-cadherin or the matrix metalloproteinase-1 (MMP-1) promoters, which are well established ERG targets [35,36].
Two additional isoforms, TE:7bpA and TE:12pA, which lack Ets and TAD functional domains, where included in the above analysis because they could potentially display modifying or even dominant-negative properties on ERG activity. Additionally, TE:7bpA RNA is overrepresented in tumors. However, TE:7bpA is expressed at very low levels both in transient and stable experiments ( Figure 5F, lane 10 and Figure 6A, lane 10), possibly because of intrinsic instability, making it impossible to reach any conclusion on its biological role. On the other hand, the robust expression of the structurally similar TE:12pA does not affect the rate of migration ( Figure 6). In addition, it does not drive expression of luciferase from the VE-Cadherin or the MMP1 promoters, nor it can interfere with the activity of full-length variants in promoting such expression ( Figure S3). Thus, at least under these conditions, this isoform behaves like a null mutant rather than a dominant-negative, and the Ets and/or TAD domains encoded by exon 11 are essential for the activity of ERG variants.

Specific Inhibition of Oncogenic ERG Variants
We have shown that both the common PCa fusion variant T1:E4 and the native PCa-overexpressed variant ERG-1b use the same ATG in exon 4, whereas the variant most abundant in normal tissues uses one in exon 1c. This raises the possibility to specifically target the cancer-associated ATG, without interference with the normally expressed ERG isoforms. To this end, we decide to use phosphorodiamidate morpholino oligomers (morpholinos), a class of antisense compounds that can be used in vitro and in vivo to modulate gene expression by interfering with splicing patterns, miRNA maturation or translation [37,38]. Morpholinos targeted directly at the translation initiation site can inhibit translation by hindering recognition by the ribosome [37]. On the other hand, compounds targeted downstream of the ATG are easily displaced by the translating ribosome and are thus ineffective (Fig. 7A). In the case of ERG, a morpholino targeted to the exon 4 ATG should thus prevent its usage but not that of an upstream ATG, such as for example that on exon 1c. Indeed, a compound (AS1) designed to bind to the region containing the ATG on exon 4 reduced expression of the T1:E4 variant in transient transfection experiments, whereas a similar compound (AS2) targeted to a region upstream did not ( Figure 7B). Most importantly, neither had any effect on the transient expression of ERG-1c, the most commonly expressed variant in non-tumor tissues ( Figure 7B).  Table S1. (E) Effect of independently mutating uORFs' ATG in ERG-1b: mutation of the first ATG in exon 2 releases suppression of translation and increases levels of ERG from the ATG in exon 4. (F) Expression of ERG and Tmprss2:ERG variants. Full-length cDNAs for the variants, including their entire 59 UTRs, were expressed and lysates from transiently transfected HeLa cells were analyzed by western blot. doi:10.1371/journal.pone.0049721.g005 Figure 6. Activity of ERG and Tmprss2:ERG variants. (A) Stable expression of ERG variants in NIH-3T3 cells. Expression was analyzed as in Fig 3. (B-C) Cells from (A) were assayed for their invasion potential using a matrigel invasion assay (B) and for their motility using a wound healing assay (C). Representative images are shown. (D) Quantification of (C), using ImageJ software ($4 independent experiments). Bars indicate standard deviations. (E) Transient transcriptional activation of ERG-dependent promoters. Firefly luciferase cDNA was cloned downstream of the VE-Cadherin or MMP-1 promoters as depicted. The VE-Cadherin (red) and the MMP-1 (blue) promoters contain respectively 4 and 11 Ets binding sites (blue boxes, not to scale). Luciferase reporters were transiently co-expressed in HeLa cells with Tmprss2:ERG variants. A renilla luciferase construct was used to normalize for variation in transfection efficiency. Dual-luciferase assay was performed and activity is represented as fold-activity over that of co-transfected empty vector. Data represent averages of at least 3 independent experiments, with bars indicating standard deviations. doi:10.1371/journal.pone.0049721.g006 Although further testing of these compounds are beyond the scope of the present study, they suggest that development of specific translation-blocking compounds that effectively and selectively reduce the levels of aberrantly expressed ERG isoforms without affecting normally expressed ones is feasible. Such compounds will serve as an important toolset to further our understanding of a pathway that is improperly activated in the majority of prostate cancers, and could form the basis of a future therapeutic approach.
In summary, the work presented here helps elucidating the complex scenario of ERG and tmprss2:ERG fusion variants by rationally organizing and characterize ERG variants. Furthermore, the observation that a specific subset of native (non-fusion) ERG transcripts is strongly activated in tumors, shifts the focus back to what contribution may come from the endogenous gene and underscores the possibility of a positive-feedback loop involving ERG and Tmprss2-ERG, which could conceivably lead to androgen-independence.
The identification of controlling 59UTR elements, in particular the translation initiation context and the presence and characteristics of uORFs, furthers our comprehension of the correlation between expressed variants and pathological characteristics. Besides, a correct understanding of what polypeptides are actually expressed from the different mRNAs would inform both in vitro and in vivo experiments with model animal systems.
Finally, we present proof-of-principle evidence that the cancerassociated oncogenic ERG variants could be pharmacologically inhibited without disturbing the functions of the 'normal' variants expressed in non-tumor tissues. An approach that could be in principle applied to other such oncogenic fusions.

Ethics Statement
All patients provided written informed consent and samples were procured and the study was conducted under Memorial Sloan-Kettering Cancer Center Institutional Review Board approval.

Samples
Total RNAs from prostate tissues expressing Tmprss2:ERG fusion transcripts were a generous gift from the late Dr. W. Gerald and were collected with the written consent from all participants, in accordance with MSKCC approved IRB protocols. Total RNAs from the normal panel of tissues were from Clontech (Human RNA Master Panel II).

PCR
Total RNA was extracted from cancer cell lines using Trizol (Invitrogen). Contaminating genomic DNA was removed by TURBO DNA-free (Ambion). cDNAs were generated using SuperScript III First-Strand Synthesis System (Invitrogen). RT-PCRs were performed for 35 cycles (30s@95uC, 30s@60uC, 90s@ 72uC), using cDNA derived from 45 ng of retrotranscribed RNA (RNA-equivalents), 0.2 mM primers and Platinum Taq Pol (Invitrogen). qPCR were performed using ,24 ng RNA-equivalents from the normal tissues or PCa cell lines, or ,2.4 ng RNAequivalents from the PCA samples and Platinum SYBR Green qPCR Supermix (Invitrogen). The mean cycle threshold (C t ) for each sample was normalized to the expression of GAPDH. Primers from IDT.

Plasmids and Site-directed Mutagenesis/Exon Deletion Constructs
cDNAs of ERG-1a, -1b, -1bD4, -1c, -1cD4, T1:E4 and TED7B were amplified from the VCap cell line and subcloned into pcDNA3.1(+) or pLPC retroviral vector. cDNAs include the entire 59UTR region and the ORF until the stop codon in exon 11 of ERG. cDNAs of the fusion isoforms were all generated by RT-PCR from the prostate tumors (except T1:E5 and T1:E6, generated by exon deletion of T1:E4). Overlap extension strategy was employed for exon deletions and point mutations.

Invasion Assay
Cell culture transwell inserts (24-well format, 8 mM pores, Falcon) were coated with 2 mg Matrigel Basement Membrane Matrix (BD Biosciences). NIH-3T3 and IMR90 stable clones were plated at 1610 5 cells/well in serum-free media. Medium containing 10% serum was added to the lower chamber as chemoattractant, and cells were incubated for 6 hs at 37uC, 5%CO 2 . Invading cells on the lower surface of the transwell were fixed with 20% methanol and stained with 0.5% crystal violet.
Wound Healing Assay 5610 5 NIH-3T3 or IMR90 cells were seeded in 6-well plates and starved 16 h in media containing 2% serum. The cell layer was disrupted by introducing a scratch (wound), rinsed with PBS and then fresh media (2%FBS) was added. Cell migration was monitored every 6/8 hours. Quantification of the area covered by migrating cells was assessed with ImageJ software.

Growth Curve
NIH-3T3 and IMR90 cells were infected with the indicated retroviruses. After drug selection, cells were seeded in 48-well plates. Cells were fixed and stained with 1% crystal violet for 30 minutes at room temperature. The plates were then washed extensively and rigorously to remove excess dye and dried. The dye taken up by cells was eluted in 0.1 N HCl for 60 min at 37uC and the A 590 of the acid-extracted stain was measured on a plate reader (Molecular Devices). Each point of the growth curve experiments was calculated from 4-6 wells.

Cell Death Experiment
NIH-3T3 and IMR90 cells were infected with the indicated retroviruses. After drug selection, cells were allowed to recover for 24h and seeded on 12-well plates. After 24h, cells were washed with PBS and grown for 24 h in medium containing 0.1% (v/v) FBS. Cells were then trypsinized and dead cells were stained with trypan blue.

Luciferase Reporter Gene Assay
The VE-cadherin and MMP-1 promoters were PCR amplified from human genomic DNA, and cloned into the pGL3 Luciferase reporter vector (Promega). HeLa cells were seeded at 1.5610 4 cells/well in 48-well plates and transfected with 0.5 ml Fugene6, 100ng of the fusion isoform plasmid, 390 ng Firefly reporter construct and 10 ng of Renilla-TK as a co-reporter. 24 h posttransfection, luciferase activity was assessed with the Dual-Luciferase Reporter Assay System (Promega) in a LMaxII reader (Molecular Devices). 50 ml of LARII reagent were added, followed 10 sec later by 30 sec reading time. Stop&Glo (50 ml) reagent was then added to quench the firefly signal for 5 seconds, followed by a 2 sec reading time of the renilla activity.

Morpholino Treatment
Morpholinos (Gene Tools) were dissolved at 1 mM in 100 mM Tris-HCl (pH 8). HeLa cells were transfected for 3 hs (1610 6 cells, 100 mm plate, 6 mg plasmid DNA), trypsinized, seeded in a 24well plate at 3610 4 cells/well and left to adhere for ,2 hours. 20 mM morpholinos (in complete DMEM) were added first, followed by 6 mM Endoporter (Gene Tools) to facilitate uptake, and incubated for 24 hours. AS1 59-CGGTCATCTCTGTCT-TAGCCAG-39; AS2 59-TCCGTAGGCACACTCAAACAAC-39. Figure S1 This figure represents an alternative depiction of the data in Figure 2, to emphasize comparison of data points within samples. (A-C) Quantification by qPCR of promoter usage. Primer sets specific for the 3 different ERG promoters and for the Tmprss2:ERG fusion where used, along with a primer set spanning exons 5-7 to quantify total ERG. Normal tissues do not express the fusion product. Expression of variants 1a and 1c is virtually undetectable in the PCa cell lines analyzed. For panel C, open symbols indicate LnCap, DU145 and C4-2 (not expressing Tmprss2:ERG fusion), full symbols indicate VCap and NCI-H660 (expressing Tmprss2:ERG fusion). Promoter P C is the most active in normal tissues, while promoter P B is the most active in cancer tissue and the only one active in prostate cancer cell lines. NCI-H660 cells only express ERG from the Tmprss2:ERG fusion because the fusion is present on both alleles and therefore the natural ERG promoters are completely absent. (D-F) Quantification by qPCR of alternative polyadenylation usage. Primer sets specific for the 3 different polyA sites where used, along with a primer set to quantify total ERG and one set to quantify total exon 11 levels in order to infer 11SpA usage. The distal PolyA site on exon 11 (11LpA) is the most active in normal tissues, while PolyA site 7b is strongly activated in tumors and in prostate cancer cell lines that carry the fusion. The proximal site on exon 11 (11SpA) is barely used under any circumstance, although the evidence is indirect. (G) Approximate location of the primers used for the amplifications. In all cases, each indicated values represent averages of $3 independent experiments and is presented as DC(t) normalized to the housekeeping gene GAPDH, therefore a ''high'' DC(t) value means low levels of expression and a ''low'' value means high level of expression. Horizontal bars indicate the mean. Asterisks indicate that the product was not detected in the sample and would therefore be equivalent to an experimental point at the top of the table, but it is not reflected by the mean. (TIF) Figure S2 Effect of ERG 1b/T1:E4 ERG fusion variant on migration and invasion of IMR90 cells. Following drug selection, ERG 1b/T1:E4 -overexpressing clones or empty-vector control (pLPC) were assayed for their migration and invasion potential using a transwell migration (A) or matrigel invasion assay (B), as detailed in Material and Methods. (TIF) Figure S3 Activity of truncated fusion isoforms lacking Ets and TAD domains. Transient transcriptional activation of ERG-dependent VE-Cadherin promoter. The luciferase reporter was transiently co-expressed in HeLa cells with full-length T1:E4 or/and truncated TE:7bpA and TE:12pA variants (plus a Renilla luciferase vector to normalize for variation in transfection efficiency). Dual-luciferase assay was performed and activity is represented as fold-activity over that of co-transfected empty vector. Averages of at least 3 independent experiments, with standard deviations, are represented. The truncated isoforms alone cannot induce expression of luciferase, and when coexpressed together with the full-length protein they fail to inhibit its activity, ruling out for them a dominant-negative role. Further experiments would be required to reach more definitive conclusions.

(TIF)
Table S1 Context and characteristics of uORF and inframe ATG in ERG variants. ATGs from the 59 region of various ERG variants are listed. ATG in frame with ERG are indicated in red and preceded by a 'M' ( = Met), ATG resulting in uORF are indicated in black, with the number indicating the exon that harbors them. The predicted length of the ORF/uORF (in amino acids and KDa) is reported, along with the distance (in nucleotides) from the translated ATG, whether the uORF overlaps with the translated ORF, and the ATG context. The context is considered 'strong', if both the determinant positions at 23 (G/A) and +4 (G) are conserved, and 'weak' if not. (TIF)