Identification of New Splice Sites Used for Generation of rev Transcripts in Human Immunodeficiency Virus Type 1 Subtype C Primary Isolates

The HIV-1 primary transcript undergoes a complex splicing process by which more than 40 different spliced RNAs are generated. One of the factors contributing to HIV-1 splicing complexity is the multiplicity of 3′ splice sites (3'ss) used for generation of rev RNAs, with two 3'ss, A4a and A4b, being most commonly used, a third site, A4c, used less frequently, and two additional sites, A4d and A4e, reported in only two and one isolates, respectively. HIV-1 splicing has been analyzed mostly in subtype B isolates, and data on other group M clades are lacking. Here we examine splice site usage in three primary isolates of subtype C, the most prevalent clade in the HIV-1 pandemic, by using an in vitro infection assay of peripheral blood mononuclear cells. Viral spliced RNAs were identified by RT-PCR amplification using a fluorescently-labeled primer and software analyses and by cloning and sequencing the amplified products. The results revealed that splice site usage for generation of rev transcripts in subtype C differs from that reported for subtype B, with most rev RNAs using two previously unreported 3'ss, one located 7 nucleotides upstream of 3'ss A4a, designated A4f, preferentially used by two isolates, and another located 14 nucleotides upstream of 3'ss A4c, designated A4g, preferentially used by the third isolate. A new 5′ splice site, designated D2a, was also identified in one virus. Usage of the newly identified splice sites is consistent with sequence features commonly found in subtype C viruses. These results show that splice site usage may differ between HIV-1 subtypes.

Previous studies on HIV-1 splicing have been done almost exclusively using subtype B viruses, usually T-cell line-adapted isolates. To our knowledge, non-subtype B viruses reported to be analyzed for splicing patterns are limited to two group O viruses [8,17]. Here we analyze splice site usage by primary isolates of subtype C, the most prevalent clade in the HIV-1 pandemic [18], using an in vitro infection assay of peripheral blood mononuclear cells (PBMCs).

Materials and Methods
Three subtype C primary isolates, X1702-3, X1936, and X2363-2 [19,20], were used for infection of PBMCs obtained from healthy donors, who gave their written informed consent. For each isolate, infection assays were done in triplicate using PBMCs from three different donors. The subtype B isolate NL4-3 was used as a control in one of the assays. PBMCs were prestimulated with phytohemagglutinin and interleukin-2 for three days and exposed to virus at a multiplicity of infection of 0.1 50% tissue culture infectious doses (TCID50) per cell for 2 h, followed by two washes with phosphate-buffered saline. Cells were collected on days 1, 2, 3, 4, and 7 postinfection and total RNA was extracted. HIV-1 splicing patterns were analyzed through RT-PCR followed by nested PCR, using primers recognizing sequences in the outermost exons common to either all DS or all SS HIV-1 RNAs, yielding amplified products of different sizes according to the splice sites used for generation of the transcripts. Reagents and PCR conditions were similar to those described previously [10], except that in the nested PCR 15 cycles were used, the sense primer was US22 [CTCGACGCAGGACTCGGCTTGC, HXB2 nucleotides (nt) 685-706], and for DS RNAs the antisense primer was TRN-AS (CGGTGGTAGCTGAARAGGCACAG, HXB2 nt 8511-8533). US22 was 5′-labeled with the VIC fluorophore, which allowed analysis of the amplified products after electrophoresis in an automated sequencer using the GeneMapper software program (Applied Biosystems, Carlsbad, CA). GeneMapper accurately determines the sizes of PCR products by running a size standard labeled with a different fluorophore in the same capillary, and quantifies them by measuring peak areas. PCR products with sizes different from those expected from the use of known splice sites were identified through TA cloning and sequencing of the amplified products.
Since most peaks with unexpected sizes were close to those predicted for known rev transcripts, and those corresponding to RNAs using 3'ss A4a and A4b were either undetected or relatively weak, we suspected that the unidentified peaks corresponded to rev transcripts using previously unreported splice sites. To examine this possibility, nested PCRs were done on RT-PCR products derived from DS transcripts from PBMCs collected on day 2 postinfection, using the antisense primer TatRev-AS (GCTTCTTCCTGCCATAGGAGATGC, HXB2 nt 5961-5984), which recognizes a sequence downstream of A4b and upstream of A5 and is able to amplify all known rev transcripts, in addition to tat and vpr (but not nef) RNAs. In all three subtype C viruses, sequence analysis of the cloned products revealed preferential usage of previously unreported 3'ss for generation of rev RNAs, located at positions in the HIV-1 genome consistent with the peaks detected with GeneMapper (Fig. 3, Table 1). In X1702-3 and X1936, rev RNAs preferentially used a 3'ss at HXB2 position 5948, 7 nt upstream of A4a, which was designated A4f (named consecutively after A4d, identified in one isolate of subtype B and one of group O, and A4e, identified in a group O virus [8]). A4f was used in 20 (90.9%) of 22 rev clones in X1702-3 and in 18 (94.7%) of 19 rev clones in X1936, with the remaining rev transcripts using A4c. In X2363-2, all 12 analyzed rev clones used a 3'ss at HXB2 position 5923, 14 nt upstream of A4c, which was designated A4g (splicing at this site does not create a new open reading frame, since there is no AUG between it and the Rev initiation codon). One clone of X2363-2 contained three noncoding exons upstream of A4g, corresponding to exon 1, a second exon 91 nt long using 3'ss A1 and a newly identified 5'ss at HXB2 position 5003, 41 nt downstream of 5'ss D2 (which was designated D2a), and exon 3.
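The positional relationships stated above can be made explicit with a little arithmetic. The following Python sketch is illustrative only: the dictionary layout and function name are ours, the reference positions it prints are simply those implied by the offsets given in the text, and we assume that "N nt upstream" corresponds to a coordinate N lower in HXB2 numbering.

```python
# Positions and offsets exactly as stated in the text (HXB2 numbering).
NEW_SITES = {
    "A4f": {"position": 5948, "reference": "A4a", "offset_nt": 7, "direction": "upstream"},
    "A4g": {"position": 5923, "reference": "A4c", "offset_nt": 14, "direction": "upstream"},
    "D2a": {"position": 5003, "reference": "D2", "offset_nt": 41, "direction": "downstream"},
}


def implied_reference_position(site):
    """Return the position of the reference site implied by the stated offset.

    Assumption: a site N nt upstream of its reference has a coordinate
    N lower than the reference; a downstream site has a coordinate N higher.
    """
    if site["direction"] == "upstream":
        return site["position"] + site["offset_nt"]
    return site["position"] - site["offset_nt"]


implied = {s["reference"]: implied_reference_position(s) for s in NEW_SITES.values()}

# Distance between the two new 3'ss, from their stated positions.
spacing_a4g_to_a4f = NEW_SITES["A4f"]["position"] - NEW_SITES["A4g"]["position"]

# The 91 nt noncoding exon ends at D2a, 41 nt downstream of D2, so the same
# exon ending at D2 (same 3'ss A1) would be 41 nt shorter.
exon_length_with_d2a = 91
exon_length_with_d2 = exon_length_with_d2a - NEW_SITES["D2a"]["offset_nt"]

if __name__ == "__main__":
    print(implied)
    print(spacing_a4g_to_a4f, exon_length_with_d2)
```

Under these assumptions, A4g lies 25 nt upstream of A4f, and the exon bounded by A1 and D2 would be 50 nt long.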
The proportion of rev transcripts using A4f in X1702-3 and X1936 and A4g in X2363-2, as determined by clone sequencing, was generally consistent with quantification of peak areas in GeneMapper analyses (Table 2). Sequencing of clones of PCR products derived from SS RNAs also revealed the usage of A4f in X1702-3 and X1936 and of A4g in X2363-2 (results not shown).
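The two quantification approaches compared above reduce to simple proportions. The sketch below, in Python, recomputes the clone percentages from the counts given in the text and shows how relative peak areas could be normalized to the sum of all rev RNA-derived peaks. The function names are ours, and the peak areas in the second example are hypothetical placeholders, not measured values.

```python
def percent(part, total):
    """Percentage of part relative to total, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Clone counts as reported in the text: clones using the preferred new 3'ss
# out of all rev clones analyzed for each isolate.
CLONE_COUNTS = {
    "X1702-3": {"site": "A4f", "using_site": 20, "rev_clones": 22},
    "X1936": {"site": "A4f", "using_site": 18, "rev_clones": 19},
    "X2363-2": {"site": "A4g", "using_site": 12, "rev_clones": 12},
}

clone_percentages = {
    isolate: percent(c["using_site"], c["rev_clones"])
    for isolate, c in CLONE_COUNTS.items()
}


def relative_peak_areas(areas):
    """Express each peak area as a percentage of the summed rev peak areas."""
    total = sum(areas.values())
    return {name: percent(area, total) for name, area in areas.items()}


# Hypothetical peak areas (arbitrary units), for illustration only.
example_areas = {"1.4g.7": 7500.0, "1.2.4g.7": 2500.0}

if __name__ == "__main__":
    print(clone_percentages)
    print(relative_peak_areas(example_areas))
```

The first dictionary reproduces the 90.9% and 94.7% figures quoted for X1702-3 and X1936 and gives 100% for X2363-2.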
We examined sequence features surrounding the newly identified splice sites that could explain different splice site usage by the subtype C isolates, compared to subtype B (Fig. 4). The usual elements of the metazoan 3'ss include an AG at the 3′ end of the intron, a branch point site (BPS), usually 18-40 nt upstream of the AG, whose sequence is weakly conserved among mammals (in humans, the consensus sequence is simply yUnAy, where the A denotes the branch point, and lowercase pyrimidines are less conserved than the uppercase U and A [21,22]), and a polypyrimidine tract (PPT) downstream of the BPS. All three subtype C isolates have the AG and, upstream of A4f, a PPT with 8 pyrimidines (UUUGUUUUC) interrupted by a purine, similarly to all HIV-1 3'ss, which are suboptimal due to interspersed purines [11][12][13][14]. All also have an AG 5′-adjacent to A4g, but only X2363-2 has a sequence with 5 pyrimidines (UCUUGC) just upstream of this AG and one with 7 pyrimidines (CUCCUUGU) 34 to 27 nt upstream of A4g, which may contribute to preferential usage of this site in X2363-2 but not in X1702-3 and X1936. Among full-length HIV-1 genomes [23], sequence features consistent with potential usage of A4f and A4g are common in subtype C viruses, but are rare in other subtypes. Thus, among subtype C viruses, the AG adjacent to A4f is found in 86%, and an upstream PPT with 8 pyrimidines in 97% of viruses, while the AG adjacent to A4g is found in 87% of sequences, with a PPT of 5 pyrimidines just upstream of this AG in 3%, and one of 5 or 6 consecutive pyrimidines within 40 nt upstream of A4g in 60%. In a previous study, four branch points used for generation of rev transcripts were identified in the subtype B isolate NL4-3, two for splicing at 3'ss A4a and A4b and two for splicing at 3'ss A4c [14] (Fig. 4a). Three of these branch points were also shown to be used by the subtype B isolate SF2 [8].
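The pyrimidine counts quoted for the three tracts can be checked mechanically. The following Python sketch is illustrative only: the function names and dictionary keys are ours, and it simply counts C and U residues in the sequences exactly as printed in the text.

```python
PYRIMIDINES = frozenset("CU")


def normalize(seq):
    """Upper-case the sequence and treat DNA-style T as RNA U."""
    return seq.upper().replace("T", "U")


def count_pyrimidines(seq):
    """Number of C or U residues in the sequence."""
    return sum(1 for nt in normalize(seq) if nt in PYRIMIDINES)


def longest_pyrimidine_run(seq):
    """Length of the longest uninterrupted stretch of C/U residues."""
    best = current = 0
    for nt in normalize(seq):
        current = current + 1 if nt in PYRIMIDINES else 0
        best = max(best, current)
    return best


# Tract sequences as quoted in the text.
TRACTS = {
    "upstream of A4f": "UUUGUUUUC",
    "just upstream of the AG at A4g": "UCUUGC",
    "34 to 27 nt upstream of A4g": "CUCCUUGU",
}

if __name__ == "__main__":
    for label, seq in TRACTS.items():
        purines = len(seq) - count_pyrimidines(seq)
        print(label, seq, count_pyrimidines(seq), purines, longest_pyrimidine_run(seq))
```

Each of the three tracts contains a single interrupting purine, in line with the description of HIV-1 PPTs as suboptimal.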
One of these BPS, located 20 nt upstream of A4f, could potentially be used for splicing in X1702-3 and X1936, which have the conserved BPS motif UnA at this site [21,22]. By contrast, in X2363-2 a C is found at position -2 from the potential branch point, which may explain the infrequent use of A4f in this isolate. With regard to A4g, potential BPS are those identified in NL4-3 and SF2 [8,14], used for splicing at A4c, located 10 and 16 nt, respectively, upstream of A4g. At both sites, the sequence in X2363-2 contains the UnA motif, whereas X1702-3 and X1936 have Cs at position -2 from the branch sites identified in subtype B viruses. If the PPT located 34-27 nt upstream of A4g is the one used for splicing at this site, there is one possible BPS just upstream of this PPT with sequence ACCUAAA, which has 4 consecutive nt complementary to U2 snRNP (underlined) (Fig. 4a), whose base-pairing to the BPS is an important step in mRNA splicing [24,25]. The sequence analyses therefore may explain differential 3'ss usage for rev RNA generation between subtype B and subtype C viruses, and, within subtype C, between different isolates, and suggest the locations of potential BPS used for newly identified 3'ss in subtype C viruses.
However, BPS locations need to be experimentally determined, as multiple factors in addition to the weakly conserved BPS sequence, including PPT sequence, length, and proximity to the BPS [26-29], and the presence of nearby splice enhancer and suppressor elements [16,30], may influence BPS selection. With regard to D2a, occasionally used in X2363-2, the sequence is AAG|GUAGUA (the vertical line indicates the exon-intron border), which has 5 potential base-pairings with U1 snRNA (underlined) (Fig. 4b). Previous studies have shown that the strength of a 5'ss correlates with the stability of its interaction with U1 snRNA [31,32], which for D2a may be similar to D2, which also has 5 potential base-pairings with U1 snRNA. The D2a sequence in X2363-2 coincides with the consensus of most subtypes, except B and H. The subtype B consensus is AAA|GUAGUA, whose predictably weak interaction with U1 snRNA, with only 4 potential discontinuous base-pairings (underlined), may preclude its usage as 5'ss.
Figure 2 (partial caption). Green peaks represent PCR products and orange peaks represent size standards. Size of PCR product, encoded gene, and exon composition (named as in previous studies [1,2]) predicted according to the size of the PCR product are shown on top or on the side of each peak. Peaks whose sizes do not match HIV-1 transcripts using previously reported splice sites are marked with question marks. For each subtype C virus, three GeneMapper analyses are shown, corresponding to infections using PBMCs from three different donors. doi:10.1371/journal.pone.0030574.g002
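The U1 snRNA complementarity argument for D2a can be illustrated by comparing each 5'ss with the consensus 5'ss CAG|GUAAGU, which is the sequence fully complementary to the 5′ end of U1 snRNA. The Python sketch below is illustrative only: the function names are ours, and counting only positions identical to the consensus (Watson-Crick pairs, no G-U wobble) is our assumption about how the potential base-pairings were tallied.

```python
# Consensus 5'ss, positions -3..-1 (exon) and +1..+6 (intron), written
# without the exon-intron border. Matching this consensus is used here as
# a proxy for Watson-Crick pairing with the 5' end of U1 snRNA.
CONSENSUS_5SS = "CAGGUAAGU"


def u1_pairing_indices(site):
    """Return 0-based indices (0 = position -3) that match the consensus.

    `site` is written as exon|intron, e.g. "AAG|GUAGUA", with at least
    3 exonic and 6 intronic nucleotides around the border.
    """
    exon, intron = site.upper().split("|")
    if len(exon) < 3 or len(intron) < 6:
        raise ValueError("need 3 exonic and 6 intronic nucleotides")
    window = exon[-3:] + intron[:6]
    return [i for i, (nt, ref) in enumerate(zip(window, CONSENSUS_5SS)) if nt == ref]


def is_contiguous(indices):
    """True if the matching positions form one uninterrupted stretch."""
    return bool(indices) and indices[-1] - indices[0] + 1 == len(indices)


d2a_x2363 = u1_pairing_indices("AAG|GUAGUA")      # D2a sequence in X2363-2
d2a_subtype_b = u1_pairing_indices("AAA|GUAGUA")  # subtype B consensus

if __name__ == "__main__":
    print(len(d2a_x2363), is_contiguous(d2a_x2363))
    print(len(d2a_subtype_b), is_contiguous(d2a_subtype_b))
```

Under this counting scheme the X2363-2 sequence gives 5 contiguous matches (positions -2 to +3), whereas the subtype B consensus gives 4 matches interrupted at position -1, in agreement with the counts stated in the text.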

Discussion
This study is the first to analyze splice site usage by viruses of HIV-1 subtype C, which is the most prevalent clade in the HIV-1 pandemic, estimated to represent around 48% of global infections [18]. The most notable finding is that subtype C primary isolates, in contrast to subtype B viruses, rarely use 3'ss A4a and A4b for generation of rev transcripts, and, instead, they preferentially use two previously unreported 3'ss, designated A4f and A4g, located, respectively, 7 nt upstream of A4a and 14 nt upstream of A4c.
Usage of these splice sites is consistent with sequence features commonly found in viruses of subtype C, which frequently contain an AG dinucleotide at the intron's end adjacent to the newly identified splice sites, as well as upstream PPT and sequences with potential to be used as branch points. The infrequent usage of A4a and A4b in subtype C viruses may derive from the linear scanning mechanism for 3'ss recognition [33], whereby the nt after the first AG downstream of the BPS is preferentially selected as splice site. Although the mammalian BPS sequence is highly variable [21,22,25], it contains two conserved positions, corresponding to the A at the branch site and the U two nt upstream of it [21,22,34]. In two isolates, X1702-3 and X1936, a potential BPS would be one previously identified in the subtype B isolates NL4-3 and SF2 [8,14], used for splicing at A4a and A4b, located 20 nt upstream of A4f (Fig. 4a). Although the sequence in X1702-3 and X1936 at this BPS differs from that of NL4-3 in two nt, the conserved UnA motif is maintained, and at position +1 from the branch site there is one additional potential G-C base-pairing with U2 snRNP, whose complementarity to the BPS has been shown to correlate positively with splicing efficiency [24,35]. The first AG encountered downstream of this BPS in X1702-3 and X1936 is that immediately upstream of A4f, and this would explain the preferential usage of this splice site over A4a and A4b in these isolates. In the third subtype C isolate, X2363-2, failure to use A4f may derive from sequence changes at the previously mentioned BPS, with C substituting for U at position -2 from the branch site identified in subtype B viruses. The sequence at a second BPS previously identified in NL4-3 for splicing at A4a and A4b, located 6 nt downstream of the previous one, also may fail to function as BPS in X2363-2, because the A used as BPS is replaced by U (Fig. 4a). Although the sequence at a potential branch site may determine its use by the splicing machinery, it is important to note, as stated above, that it is only one factor among others, which also include the PPT sequence, length and proximity to the BPS [26][27][28][29] and the presence of nearby splice enhancer and suppressor elements [16,30], contributing to the selection of the BPS, whose actual location needs to be determined experimentally.
The reason A4c is not used more frequently in the analyzed subtype C viruses may derive from weak PPTs, which contain 3 or 4 purines interspersed among 8 or 9 pyrimidines. These sequences, in spite of lacking runs of pyrimidines longer than 3 nt, could still act as functional PPTs, in accordance with a previous study showing that a stretch of alternating purines and pyrimidines can promote branch point selection [29]. The close proximity of this PPT to the downstream AG [29] and the presence of an exonic splice enhancer (GAR ESE) at exon 5 [36] could also contribute to render this weak PPT functional. In X2363-2, the scanning mechanism selecting A4g as 3'ss would also explain the infrequent usage of A4c and other downstream 3'ss.
Table 2. Relative expression of rev RNAs in subtype C viruses according to peak areas in GeneMapper analyses.
Columns: Isolate | PCR product size | Exon composition | % total rev RNA peak area | % rev clones
Results correspond to peaks shown in Fig. 2, and are shown as % of individual peak areas relative to the sum of peak areas of all rev RNA-derived products. Percentages in the column on the right correspond to the cloned and sequenced rev RNA-derived amplicons (Table 1).
1 A small 331 nt peak, coincident with that of 1.4g.7 rev RNA, was seen in X1702-3 and X1936 (Fig. 2). However, nested PCR using an antisense primer specific for rev, tat and vpr RNAs failed to detect 1.4g.7 rev RNA in these isolates.
2 Nested PCR with primers recognizing exons 2 and 3 confirmed that these products, only 1 nt longer in X1702-3 and X1936 than in X2363-2, correspond to 1.3.4f.7 in the first two viruses and to 1.2.4g.7 in the third one.
3 The 367 nt peak seen in X2363-2 may correspond to both 1.
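The linear scanning mechanism and the UnA branch point motif invoked in the Discussion above can be expressed as two small checks. The Python sketch below is a deliberate simplification: the function names are ours, the sequences are hypothetical toy examples rather than the isolates' sequences, and the model ignores PPT strength, distance constraints, and enhancer or suppressor elements.

```python
def has_una_motif(rna, branch_index):
    """True if rna has an A at branch_index and a U two nt upstream of it."""
    return (
        2 <= branch_index < len(rna)
        and rna[branch_index] == "A"
        and rna[branch_index - 2] == "U"
    )


def first_ag_downstream(rna, branch_index):
    """Index of the nt following the first AG downstream of the branch point.

    Returns None if no AG is found. In the scanning model this nt is the
    preferentially selected 3' splice site (first exonic nucleotide).
    """
    ag_start = rna.find("AG", branch_index + 1)
    return None if ag_start == -1 else ag_start + 2


# Hypothetical toy sequence: candidate branch point A at index 4, followed
# by a purine-interrupted pyrimidine tract and two AG dinucleotides.
toy = "GCUAACUUUGUUUUCAGGAUCAGCC"
branch = 4

# Same sequence with C replacing the U at position -2 from the branch point.
toy_mutant = "GCCAACUUUGUUUUCAGGAUCAGCC"

if __name__ == "__main__":
    print(has_una_motif(toy, branch), first_ag_downstream(toy, branch))
    print(has_una_motif(toy_mutant, branch))
```

In the toy sequence the scanning model selects the site after the first AG and skips the second AG further downstream, which is the logic proposed for the preference of A4f over A4a and A4b. The mutant illustrates how a C at position -2 removes the UnA motif.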
Occasional use of a new 5'ss, designated D2a, located 41 nt downstream of D2, was also observed in one subtype C isolate, X2363-2. Usage of D2a is also consistent with sequences present in this isolate and in most subtype C viruses, which have greater complementarity with U1 snRNA at this site relative to subtype B viruses. In addition, the usage of D2a as an alternative to D2 in subtype C may be favored by the fact that D2 is a suboptimal 5'ss [15]. Its less frequent usage relative to D2 may derive from the scanning mechanism proposed for recognition of the 5'ss, whereby among several consecutive potential sites, the 5′-most site is usually selected [37].
With the newly identified sites, seven 3'ss have been reported to be used in HIV-1 for rev RNA generation, which, in addition to the commonly used A4a, A4b, and A4c, also include A4d, located 5 nt upstream of A4a, reported in the subtype B isolate SF2 and the group O virus ANT70C [8] [and also preferentially used by one additional subtype B primary isolate studied by us (unpublished data)], and A4e, located 1 nt upstream of A4a, reported in ANT70C [8] (and, according to the presence of an intronic AG dinucleotide adjacent to the A4e site, also predicted to be used by most subtype F and CRF02_AG viruses). Such multiplicity of 3'ss used for rev RNA generation may derive from the facts that rev 3′ splice sites are located in the first coding exon of Tat, which is one of the most variable HIV-1 proteins [38], and that HIV-1 replication is absolutely dependent on Rev, whose absence cannot be compensated by viruses from other infected cells, as occurs with Tat, which can be secreted extracellularly and activate HIV-1 transcription in neighboring cells [39].
Previously reported in vitro biological features which may differ between HIV-1 subtypes include the response of the transcriptional promoter to tumor necrosis factor-alpha [40][41][42][43][44], replicative capacity [45,46], use of coreceptors [47][48][49][50][51], and activity of reverse transcriptase [52]. The results here reported add one more biological feature in which HIV-1 subtypes may differ, which is the usage of RNA splice sites.
Figure 4. Intronic and exonic sequences surrounding newly identified splice sites in three subtype C isolates. Sequences are aligned with consensuses of subtypes B and C. (a) Sequences surrounding 3'ss A4f and A4g. AG dinucleotides in the intron ends adjacent to splice sites are in bold type. Polypyrimidine tracts potentially used for splicing at A4f and A4g are boxed. The sequences of subtype B NL4-3 and SF2 isolates are on bottom with branch sites previously identified for rev RNA splicing [8,14] underlined. Nucleotides in the subtype C isolates and in the consensus subtype C sequence potentially used as branch points for splicing at A4f and A4g (see main text) are indicated with arrows. (b) Sequences surrounding 5'ss D2 and D2a. Exon-intron borders are signaled with vertical lines. Highly conserved GU dinucleotides at intron ends adjacent to the 5'ss are in bold type. Nucleotides at splice sites potentially pairing with U1 snRNA are underlined. doi:10.1371/journal.pone.0030574.g004