Analysis of the Trypanosoma brucei EATRO 164 Bloodstream Guide RNA Transcriptome

The mitochondrial genome of Trypanosoma brucei contains many cryptogenes that must be extensively edited following transcription. The RNA editing process is directed by guide RNAs (gRNAs) that encode the information for the specific insertion and deletion of uridylates required to generate translatable mRNAs. We have deep sequenced the gRNA transcriptome from the bloodstream form of the EATRO 164 cell line. Using conventionally accepted fully edited mRNA sequences, ~1 million gRNAs were identified. In contrast, over 3 million reads were identified in our insect stage gRNA transcriptome. A comparison of the two life cycle transcriptomes shows an overall ratio of procyclic to bloodstream gRNA reads of 3.5:1. This ratio varies significantly by gene and by gRNA populations within genes. The variation in the abundance of the initiating gRNAs for each gene, however, displays a trend that correlates with the developmental pattern of edited gene expression. A comparison of related major classes from each transcriptome revealed a median value of ten single nucleotide variations per gRNA. Nucleotide variations were much less likely to occur in the consecutive Watson-Crick anchor region, indicating a very strong bias against G:U base pairs in this region. This work indicates that gRNAs are expressed during both life cycle stages, and that differential editing patterns observed for the different mitochondrial mRNA transcripts are not due to the presence or absence of gRNAs. However, the abundance of certain gRNAs may be important in the developmental regulation of RNA editing.


Introduction
The life cycle of Trypanosoma brucei involves two distinct environments, the animal host and the insect vector. These environments differ in temperature and nutrient composition, providing a unique challenge to T. brucei as it cycles between hosts. In the bloodstream, trypanosomes exist in two forms, the actively dividing slender form and the non-dividing stumpy form. The slender form is optimized to utilize its glucose-rich environment, using glycolysis to generate energy [1]. The stumpy form appears to be transitional, activating mitochondrial genes in preparation for uptake in a blood meal by its tsetse fly vector and subsequent transfer to a harsher environment [1]. Once inside the tsetse fly, the parasite utilizes proline to drive oxidative phosphorylation and ATP production in the mitochondrion [2]. While the activity of the mitochondrion is relatively low during the bloodstream stage (BS), expression of the mitochondrial genome is still essential [3,4]. In T. brucei, the mitochondrial genome consists of two types of DNA molecules, maxicircles and minicircles. Maxicircles are 22 kb circular DNAs that contain the genes for two ribosomal RNAs, 12S and 9S, and eighteen mRNA genes [5]. While some of the protein-coding genes do not require RNA editing prior to translation, most require extensive editing before they can be translated [for review see 6,7]. This process involves the insertion of hundreds of uridylates (Us) and, less frequently, the deletion of Us, often doubling the size of the transcript. The sequence changes are guided by small complementary RNA molecules (the guide RNAs) that are encoded on the minicircles [8]. Minicircles make up the bulk of the kinetoplastid network (anywhere from 5,000-10,000 present in each network) with each minicircle encoding 3-5 gRNAs. In T. brucei, there are more than 200 different minicircle sequence classes (~1200 gRNAs) [8].
Distinct differences in mitochondrial transcript abundance, polyadenylation and the extent of RNA editing are observed during the complex life cycle (Table 1). The pattern of differential RNA editing observed is especially interesting. For example, the cytochrome b (CYb) and cytochrome oxidase II (COII) mRNAs are edited during the insect stage, but are primarily unedited in bloodstream forms [9,10]. In contrast, editing of the NADH dehydrogenase subunit transcripts (ND3, ND7, ND8 and ND9) and editing of the ribosomal protein subunit 12 transcript (RPS12) appears to occur preferentially in bloodstream forms [5,11-15]. Other transcripts, cytochrome oxidase III (COIII) and ATPase subunit 6 (A6) are edited in both life cycle stages [16,17]. Early studies using both Northern blot and primer extension analyses on a limited number of gRNAs indicate that gRNAs are present in both insect and bloodstream forms, suggesting that the regulation of RNA editing is not at the level of gRNA availability [13,18,19]. Our lab has previously published deep sequencing results of the gRNA transcriptome of the T. brucei EATRO 164 procyclic form [20]. Here we present the deep sequencing data for the gRNA transcriptome of a bloodstream form of EATRO 164. A total of 211 populations of gRNAs were identified. We define a population as a group of gRNAs that may vary in sequence, but direct the editing of the same or near same region of the mRNA. Because kinetoplastid RNA editing allows G:U base pairing, most populations contain multiple sequence classes that can guide the generation of the same mRNA sequence. While the number of populations identified was similar to the number identified in the procyclic gRNA transcriptome (214 populations), the total number of gRNAs identified was much reduced and the coverage was less complete; full complements of gRNAs were only identified for COIII and CYb.
In spite of the reduced number of gRNAs, an interesting correlation was found that suggests a relationship between the relative abundance of initiating gRNAs between stages and the developmental pattern of mRNA editing.

Materials and Methods
Parasites, isolation of mitochondria and RNA extraction
T. brucei brucei clone IsTar from stock EATRO 164 were grown in rats and isolated as previously described [37]. Bloodstream forms were virtually all long-slender forms isolated after 4 days of infection. Parasites were used immediately for isolation of mitochondria using differential centrifugation as previously described or stored frozen at -80°C until RNA extraction [20]. Both total RNA from whole parasites and mitochondrial RNA (mtRNA) from purified mitochondria were isolated by the acid guanidinium-phenol-chloroform method [38].

Ethics statement
Rats were raised according to the animal husbandry guidelines established by Michigan State University. All vertebrate animal use procedures were approved by MSU's Institutional Animal Care and Use Committee (Application 03/11-051-00). MSU has filed with the Office of Laboratory Animal Welfare (OLAW) an assurance document that commits the university to compliance with NIH policy and the Guide for the Care and Use of Laboratory Animals.

Library preparation and Illumina sequencing
Samples of mtRNA and total RNA were both treated with RQ1 DNase and size-fractionated on a polyacrylamide gel as previously described [20]. Guide RNAs were extracted from the gel and prepped for sequencing using the Illumina 'Small RNA' protocol as previously described [20]. Libraries from both mtRNA and total RNA samples were deep sequenced on an Illumina GAIIx. Reads were then processed and trimmed as previously described [20]. Reads with two or more Ns, reads shorter than 20 nts after trimming, and reads with an overall mean Q-score < 25 were discarded. Redundant reads were then collapsed while retaining the count of each redundant read, and reads containing fewer than 4 consecutive Ts were removed.
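The read filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in this study: the function names (`passes_quality`, `collapse_reads`) and the input format (a list of `(sequence, per-base Q-scores)` pairs for already trimmed reads) are hypothetical, and the order in which the filters are applied is assumed.

```python
import re
from collections import Counter

MIN_LENGTH = 20                # reads shorter than 20 nts after trimming are discarded
MAX_N = 1                      # reads with two or more Ns are discarded
MIN_MEAN_Q = 25                # reads with an overall mean Q-score < 25 are discarded
T_RUN = re.compile(r"T{4,}")   # at least four consecutive Ts (Us appear as Ts in the reads)


def passes_quality(seq, quals):
    """Return True if a trimmed read survives the N, length and mean-Q filters."""
    if seq.count("N") > MAX_N:
        return False
    if len(seq) < MIN_LENGTH:
        return False
    if not quals or sum(quals) / len(quals) < MIN_MEAN_Q:
        return False
    return True


def collapse_reads(reads):
    """Collapse redundant reads into {sequence: count}, then drop reads lacking a T-run.

    `reads` is an iterable of (sequence, list_of_Q_scores) pairs (assumed format).
    """
    counts = Counter(seq for seq, quals in reads if passes_quality(seq, quals))
    return {seq: n for seq, n in counts.items() if T_RUN.search(seq)}
```

Keeping the counts in a dictionary keyed by sequence preserves the abundance of each redundant read, which is what the later comparisons of read numbers rely on.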

Identification of gRNAs
To identify gRNAs, each transcript read was aligned to the conventionally edited mRNAs based on known base pairing rules (canonical Watson-Crick base pairs and the G:U base pair). In the initial screen, no gaps were allowed in the alignment, allowing the formulation of the gRNA-mRNA alignment as an extended longest common substring (LCS) problem as previously described [10]. Matched gRNAs were then scored (two points for G:C and A:U base pairs and one point for G:U base pairs). gRNAs with scores >45 were identified as guiding a specific region based on the identified mRNA fully edited sequence. Additional searches with reduced stringency (scores >30) were performed on regions with low gRNA coverage. The matched gRNAs were sorted into populations based on their guiding positions, and the populations were analyzed and sorted into major sequence classes.
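To make the scoring scheme concrete, the sketch below scans every gapless register of a gRNA against an edited mRNA and reports the best uninterrupted duplex. It is a naive diagonal scan written for illustration, not the extended LCS implementation referenced above; the function name `best_gapless_match` and the constants are hypothetical, and sequences are assumed to be written 5' to 3' in the RNA alphabet.

```python
# Two points for Watson-Crick pairs, one point for G:U pairs (as described in the text).
PAIR_SCORE = {
    ("G", "C"): 2, ("C", "G"): 2,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}

MIN_SCORE = 45          # initial screen: scores > 45
MIN_SCORE_RELAXED = 30  # reduced-stringency searches: scores > 30


def best_gapless_match(grna, mrna):
    """Return (score, mrna_start, length) of the best uninterrupted duplex.

    The gRNA is reversed so that it pairs antiparallel to the mRNA. Any
    non-pairing position ends the current run, so no gaps or mismatches
    are allowed inside a reported duplex.
    """
    g = grna[::-1]
    best = (0, -1, 0)
    # Each offset is one gapless register of the gRNA along the mRNA.
    for offset in range(-(len(g) - 1), len(mrna)):
        score = 0
        run_len = 0
        for i, base in enumerate(g):
            j = offset + i
            if j < 0 or j >= len(mrna):
                score = run_len = 0
                continue
            s = PAIR_SCORE.get((base, mrna[j]), 0)
            if s == 0:
                score = run_len = 0
                continue
            score += s
            run_len += 1
            if score > best[0]:
                best = (score, j - run_len + 1, run_len)
    return best
```

A read would then be kept as a candidate gRNA when the returned score exceeds `MIN_SCORE` (or `MIN_SCORE_RELAXED` in the low-coverage searches). Note that replacing an A with a G opposite a U lowers the score by one point without breaking the duplex.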
Results
Early studies reported that the developmental regulation of RNA editing was not controlled by gRNA availability, as gRNAs were found in both life cycle stages [13,18,19]. In these early studies, however, only a small number of gRNAs were investigated. In this study, we used deep sequencing to compare the gRNA transcriptomes of a bloodstream form to a procyclic form of T. brucei EATRO 164. The EATRO 164 strain was isolated in 1960 from Alcelaphus lichtensteini and maintained in the lab of Dr. K. Vickerman until being obtained by Dr. Stuart in 1966 [39]. Dr. Stuart derived the procyclic form from the bloodstream culture in 1979 [39]. Both cell lines have been maintained in separate culture since that time.
Trypanosomes from the EATRO 164 strain were grown in Wistar rats to a parasitemia of 1-2 x 10^9 trypanosomes per mL and isolated using DEAE cellulose columns. Mitochondria and gRNAs were purified as previously described [20]. Libraries were generated using gRNAs isolated from whole cell RNA and gRNAs isolated from mitochondrial RNA. Both bloodstream gRNA libraries were searched using conventionally accepted fully edited mRNA sequences, and a total of 1,024,604 gRNA reads were identified. Surprisingly, the library generated using gRNAs isolated from whole cell RNA had more than twice as many identified gRNA reads as the library generated using gRNAs isolated from mitochondrial RNA. To ensure sufficient abundance and gRNA coverage, the two data sets were combined for the analyses presented here. In contrast, over 3 million gRNA reads were identified in our procyclic gRNA transcriptome generated from gRNAs isolated from mitochondrial RNA. Of the 1,024,604 reads identified from the bloodstream transcriptomes, 982,450 reads were sorted into major sequence classes.
The overall ratio of identified procyclic gRNA reads to BS gRNA reads was 3.5:1. This ratio varies significantly by gene (Table 2) and by populations within genes (S1 Fig). Except for the initiating gRNA, no apparent trend relating gRNA abundance and developmental editing pattern was observed. Interestingly, for the initiating gRNA, mRNAs that are fully edited in the procyclic stage only, or are fully edited in both life cycle stages, had initiating gRNAs with more reads in the procyclic data set (Fig 1) [16,17,27,33]. In contrast, mRNAs that are only fully edited, or are more abundant, in the BS had more initiating gRNA reads in the BS data set (Fig 1) [11-15].
Fig 1 (caption fragment). ... [16,17,27,33]. mRNAs to the right of the dashed line are only fully edited or more abundant fully edited in the bloodstream stage [11-15]. doi:10.1371/journal.pntd.0004793.g001
Because the identified gRNAs from the BS cells were less abundant, the rule used to identify major gRNA sequence classes was relaxed. Instead of using a strict cut off for the minimum number of reads required, the cut off was assessed on a case-by-case basis. For example, if the total population only had 100 reads, a sequence class with only 10 reads would still be identified as a major sequence class. Once all major classes were identified, 657 sequence classes were identified that could be sorted into 211 populations (Table 3). Although the overall gRNA numbers were down in comparison to the procyclic data set, most of the populations found in that stage (214 gRNA populations) were also identified in the BS transcriptome. However, there were a number of populations that were unique to either the procyclic or BS stage. Surprisingly, when the bloodstream and procyclic data sets were compared, only 37 identical major sequence classes were found in both. However, distinctly related sequence classes could be identified when comparing the BS and procyclic populations. Comparing the related major classes from each transcriptome (BS vs procyclic) revealed a median value of ten single nucleotide variations per gRNA. Interestingly, nt variations were much less likely to occur in the consecutive Watson-Crick anchor region of the gRNA than in the rest of the gRNA, indicating a very strong bias against G:U base pairs in this region (Fig 2). The Watson-Crick anchors (defined as the number of consecutive nts in the 5' region with only G:C and A:U base pairs) had a median length of eleven nucleotides and anchor length did not vary between the two forms. The vast majority of major classes of gRNAs had consecutive Watson-Crick anchors greater than seven nts long (92.5%). In addition, most gRNAs with Watson-Crick anchors shorter than eight nts were not an abundant major class for their respective populations. Consistent with observations made from the procyclic data set, most gRNAs had zero non-base pairing nucleotides 5' to the poly-uridine tail and 4 to 6 non-base pairing nucleotides 5' to the anchor region (Fig 3). Also consistent with procyclic data, most of the gRNAs (59%) had 38 to 48 nts of complementarity (including anchor regions) with their respective mRNAs (Fig 4). Transcription start sites also did not vary, as preference for an RYAYA start site was observed (Table 4).
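The Watson-Crick anchor definition given in the text (consecutive nts at the 5' end of the paired region with only G:C and A:U base pairs) can be illustrated with a short sketch. The function names and the input convention are assumptions made for this example: the gRNA duplex region is written 5' to 3', and the mRNA segment is written 3' to 5' so that position i of one string pairs with position i of the other.

```python
from statistics import median

WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def anchor_length(grna_duplex, mrna_duplex_3to5):
    """Count consecutive Watson-Crick pairs from the 5' end of the gRNA duplex.

    Counting stops at the first G:U pair (or any non-Watson-Crick position).
    """
    n = 0
    for g, m in zip(grna_duplex, mrna_duplex_3to5):
        if (g, m) not in WATSON_CRICK:
            break
        n += 1
    return n


def fraction_longer_than(lengths, cutoff):
    """Fraction of anchors strictly longer than `cutoff` nts."""
    return sum(1 for x in lengths if x > cutoff) / len(lengths)
```

For a list of anchor lengths, `median(lengths)` and `fraction_longer_than(lengths, 7)` would give the two summary statistics quoted above; the values used in the test are invented and carry no biological meaning.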

Coverage and gaps
In order to determine if the BS gRNA transcriptome contained a full complement of guide RNAs, the gRNA populations were aligned to the fully edited mRNAs (S2 Fig). We note that for an mRNA to be fully edited, not only must all editing sites on the mRNA be covered by a gRNA, but the downstream gRNA must also generate the anchor binding site for the subsequent gRNA. Therefore, adjacent gRNAs must overlap. Overall, there was an average of 17 nts of overlap between adjacent gRNAs, with the average overlap varying slightly by gene (Table 3). As the median Watson-Crick anchor is 11 nts, in most cases the overlap extends beyond the Watson-Crick anchor of the subsequent gRNA. However, we did observe a number of regions where the overlap is minimal. Currently, there are no data that stipulate the minimum anchor needed for efficient editing. However, we postulate that, similar to microRNAs, for an anchoring sequence to be sufficiently specific, it should be at least six nucleotides [40]. Indeed, when examining the overlaps between most gRNAs, there are only ten (four procyclic and six BS) that are less than six nucleotides (Fig 5).
We therefore used six nucleotides as a cut off to identify regions with potential missing guide RNAs for both life cycle stage transcriptomes. In contrast to the procyclic data, where full complements of gRNAs were identified for five of the mRNA transcripts (A6, COIII, CR4, CYb, and RPS12), in the BS transcriptome, a full complement of gRNAs was only identified for COIII and CYb. Overall, there are 12 edited regions where no gRNAs were identified, and five regions with weak gRNA overlaps in the BS data (Table 5). Of these 17 regions, seven belong to ND7 alone. Interestingly, nine of the 17 missing populations are in very low abundance in the procyclic data, having 100 or fewer reads. Because the number of reads in the BS data is ~3.5-fold lower, this could account for some of these regions of poor coverage. There are six regions that lack gRNA coverage in both data sets. These are found in CR3, MurfII, ND3 and ND7 (Table 5). Interestingly, three of these regions are close to the 3' end of their respective genes. Regions of weak overlap (ND9(238-242), ND9(609-612)) and regions without gRNA coverage (CR3(278-292), ND8(541-553)) that are unique to the procyclic transcriptome were also observed. Interestingly, the regions of poor procyclic coverage are found in CR3, ND8 and ND9, all transcripts that are preferentially edited in the BS form [5,13,14,34].
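The six-nucleotide cut off can be applied mechanically once each population is reduced to the mRNA coordinates it covers. The sketch below is a minimal illustration, assuming inclusive `(start, end)` coordinates and that editing proceeds from the 3' end of the mRNA toward the 5' end; the function name `classify_junctions` is hypothetical. The first two coordinate pairs in the usage example are the A6 populations gA6(773-822) and gA6(745-789) named in this paper; the remaining pairs are invented to show a weak overlap and a gap.

```python
MIN_OVERLAP = 6  # cut off used in the text for a sufficiently specific anchor


def classify_junctions(populations, min_overlap=MIN_OVERLAP):
    """Classify the junction between each pair of adjacent gRNA populations.

    `populations` holds inclusive (start, end) mRNA coordinates. Populations are
    ordered from the 3' end of the mRNA, so `first` is used before `nxt`.
    Returns a list of (first, nxt, overlap, status); a non-positive overlap
    means the two populations leave uncovered sequence between them.
    """
    ordered = sorted(populations, key=lambda p: p[1], reverse=True)
    report = []
    for first, nxt in zip(ordered, ordered[1:]):
        overlap = nxt[1] - first[0] + 1
        if overlap <= 0:
            status = "gap"
        elif overlap < min_overlap:
            status = "weak"
        else:
            status = "ok"
        report.append((first, nxt, overlap, status))
    return report


if __name__ == "__main__":
    # (773, 822) and (745, 789) are from the text; (700, 748) and (650, 690) are hypothetical.
    for row in classify_junctions([(745, 789), (773, 822), (700, 748), (650, 690)]):
        print(row)
```

This simple version assumes no population is fully nested inside another; handling nested or redundant populations would need an additional merging step.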
While the number of reads in the BS data is generally lower than in the procyclic data, there are 87 gRNA populations with more identified reads than in the procyclic data set (Table 6). These populations are found in every gene except CYb and MurfII, but most of the populations belong to one of the NADH dehydrogenase subunits, particularly ND7 or ND9. Another trend worth noting is that all of the genes with fully edited transcripts that are more abundant in the BS stage (CR3, CR4, ND3, ND7, ND8, and ND9), with the exception of RPS12, have a higher percentage of classes that are more abundant in the BS (Table 6).
Gene specific gRNA characteristics
ATPase 6. In the BS gRNA transcriptome, a total of 29 gRNA populations containing 86 different major sequence classes were identified that could guide the editing of A6 (Table 3; S1A Table). One population was identified that was unique to the BS transcriptome (gA6(281-329)). The gRNAs bordering this population share extensive overlap, so its absence in the procyclic transcriptome would not impact the editing process (S1A Table). We note that two of the gRNAs identified have single nucleotide mismatches. The bloodstream gA6(640-668) has an identified mismatch (C:U) that disrupts the complementarity of the gA6(640-668) population (S2A Fig). The second mismatched gRNA (gA6(520-533)) would introduce a frameshift. Excluding these two mismatched regions, there is complete coverage of ATPase 6. In contrast to the procyclic data, where the conventional initiating gRNA and the gRNA immediately following it were extremely rare, both of these gRNAs, gA6(773-822) (previously identified as gA6-14) and gA6(745-789), were fairly abundant, each having hundreds of reads. The alternative initiating gRNA identified in the procyclic data set was not found. This finding is similar to that found in the T. brucei Lister strain 427, where the authors identified alternative initiating gRNAs not found in the EATRO 164 procyclic gRNA transcriptome [41].
Another disparity between the two life cycle data sets was found when comparing the abundance of gRNAs implicated in a potential alternative edit. In the procyclic gRNA transcriptome, a gRNA was identified that would guide the insertion of 11 U-residues instead of the needed 12 between G555 and A568 [20]. This gRNA (pA6(557-593)) was 25-fold more abundant than the conventional gRNA (pA6(549-593)). In the BS data set, however, more than 400 reads of the 12U gRNA were identified and only one read was found that would encode the alternative 11U edit. Surprisingly, while G555-A568 would be correctly edited (insertion of 12 Us), the next editing site (A549-G555) is edited by bsA6(520-553), the gRNA that introduces the 1 nt frameshift. This frameshift would generate a predicted protein with nearly the same amino acid sequence as the procyclic 11U frameshift edit (two amino acid changes) (Fig 6).
Cytochrome oxidase subunit III. Forty-two gRNA populations guiding the editing of COIII were identified in the BS transcriptome, three more than in the procyclic data set (S1B Table). This disparity is caused by the presence of several unique populations. While the procyclic data set contained one unique population, the BS data contained four gRNA populations not previously identified. Of these four unique populations, three are required for full overlapping coverage in the bloodstream. They are not, however, required for full coverage in the procyclic stage. These three unique gRNA populations all span relatively small regions of weak overlap (Fig 7, S2B Fig).
An alternative edit of COIII has been described, involving distinct edits at two adjacent sites that links the open reading frame of the edited 3' end to an ORF found in the 5' pre-edited sequence [42]. The previously identified alternative gRNA that can generate the needed editing events was not found in either the BS or procyclic transcriptomes.
C-rich regions 3 and 4. In the BS data set, nine populations and 34 major sequence classes were identified that direct the editing of the CR3 transcript. The coverage of edited CR3 is nearly complete in the bloodstream data set with only a one nt gap in coverage (editing site 233) (S2C Fig). This is in contrast to the procyclic transcriptome, where gRNAs that matched the published sequence downstream of nt 196 were very rare (<10 copies) and no gRNAs were identified that could direct editing near the 3' end (nucleotides 275-292). A full consensus sequence for edited CR4 has only been found in BS T. brucei [35]. Using this sequence, 16 gRNA populations, containing 62 major sequence classes, were identified in the BS transcriptome (S1D Table). In contrast to the procyclic data, where a full complement of gRNAs was identified, there are two gaps in the BS coverage (Table 5).
Table 6. Summary of populations found in both data sets that have more reads in the bloodstream data set than in the procyclic data set.
Cytochrome b and maxicircle unidentified reading frame II. RNA editing in the Cytochrome b (CYb) transcript is limited to the 5' end and two gRNA populations are sufficient to guide the small number of edits needed to render the CYb transcript functional. Both populations were observed in both data sets, with a total of 6 major classes. Interestingly, in both data sets, the initiating gRNA is significantly more abundant than the second gRNA, being approximately 30-fold more abundant in the procyclic data set and approximately 200-fold more abundant in the bloodstream data set (S1E Table). This is in contrast to most of the other transcripts, where the initiating gRNAs are not very abundant. In addition, almost all of the CYb gRNA major classes have an A-run transcription start site, deviating from the common RYAYA initiation site pattern.
Editing in MurfII is also limited to the 5' end and requires only two gRNAs. One of these gRNAs (gMurfII) is encoded on the maxicircle [43]. While this gRNA was observed in both data sets, the gRNAs identified were not identical. A purine-purine transition near the 3' end of the gRNA differentiates the procyclic and BS forms (S1F Table and S2F Fig). An initiating gRNA is needed to generate the 3' most edits that create the anchor sequence for gMurfII. This gRNA was not found in either data set, despite additional searches with reduced search stringency.
NADH dehydrogenase subunits 3, 7, 8, and 9. In the initial characterization of RNA editing in T. brucei EATRO 164, fully edited ND subunit transcripts were only found in RNA isolated from the BS stage. We were therefore surprised to find that fewer ND gRNA populations were identified in the BS transcriptome and a full complement of gRNAs was not identified for any of the ND subunits. The most complete coverage was found for ND3 and ND9. For ND3, the BS data set contained twelve populations and 41 major classes of gRNAs. One gap in coverage was observed, from 389-401. This region overlaps a region that has no clear consensus sequence, 375-395 [11]. ND9 is the only gene in this study whose bloodstream gRNA reads outnumber the procyclic gRNA reads identified (Table 2). Twenty-four bloodstream gRNA populations were identified with all edited nucleotides covered if gRNAs with a single base pair mismatch are taken into account (S2J Fig). While 45 gRNA populations were identified for ND7 in the BS data set, the gRNA coverage was significantly worse when compared to the identified procyclic gRNAs (Table 5). Despite the poor coverage, two unique gRNA populations (bs gRNA (772-816) and (1128-1182)) were identified (S2H Fig and S1H Table). ND8 also had poor gRNA coverage (Table 5). Interestingly, there are several populations in ND8 that contain highly abundant gRNA sequence classes with mismatches that shorten the complementarity of the gRNA. These usually have a single mismatch in the gRNA that would otherwise guide conventional editing (S1I Table).
Ribosomal protein S12. The BS data set contained 11 populations and 26 major sequence classes that direct editing of RPS12 (Table 3). While the procyclic transcriptome contained a full complement of gRNAs, the BS RPS12 data contain one gap in coverage and one region of poor overlap (Table 5). This was surprising, as RPS12 has been shown to be essential in both life cycle stages [44,45]. The region of the mRNA with poor coverage has a high percentage of C residues and gRNAs covering this region may utilize C:A base pairs. If this is the case, some classes of gRNAs may not have been detected, as the program used to search for gRNAs does not allow for C:A base pairs (S2K Fig).

Discussion
This is the first comprehensive characterization of the mitochondrial gRNA transcriptome from the bloodstream stage of Trypanosoma brucei brucei. As we have previously characterized the insect stage gRNA transcriptome, these data allow the comparison of gRNA characteristics across the two main life cycle stages [20]. In the EATRO 164 BS gRNA transcriptome, gRNAs for every edited gene were identified. Interestingly, while the number of populations identified in this data set was only slightly lower than that reported in the procyclic data set, the total number of gRNA transcript reads identified was considerably lower despite the fact that multiple transcriptome libraries were combined. While this may be a reflection of the down regulation of mitochondrial transcription in the bloodstream stage (see Table 1), it is impossible to rule out technical problems in the generation and sequencing of the libraries. It has been previously reported that gRNA presence did not correlate with developmental RNA editing patterns in T. brucei and our data do not challenge this [18,19]. The data did, however, show an interesting trend in the abundance of the initiating gRNAs as it relates to their developmental editing patterns (Fig 1). It may be that the abundance of the initiating gRNAs is regulated in order to control editing of their target mRNAs. However, we cannot rule out the possibility that not all of the populations of initiating gRNAs were identified. For the pan-edited mRNAs, the initiating gRNAs direct sequence changes that are often downstream of the stop codon. Sequence changes in this region would be tolerated, as long as the anchor sequence for the next gRNA is maintained. This type of mutation was observed in the 3' end of ATPase 6 [2]. In addition, characterization of the initiating gRNAs in the Lister 427 T. brucei cell line identified several gRNAs that would direct an alternative editing pattern, suggesting a high tolerance for sequence changes near the mRNA 3' ends [41].
As expected, general gRNA characteristics are conserved across the two life cycle stages. Populations retain the general location of their anchors, there is relatively little shift in the location of populations, and the lengths of complementarity are very similar. We did observe considerable nucleotide variation in the guiding regions of the gRNAs from the different life cycle strains of the EATRO 164 cells. This particular cell line dates back to 1960 when the BS form was originally acquired [39]. Procyclic cells were derived from the BS stock in 1979 and the two cell lines have been maintained separately since that date [39]. Mixed trypanosome genotypes are detected frequently in field isolates from both tsetse flies and mammals and it may be that separation into different culture conditions allowed different genotypes to predominate in each life cycle strain [46-48]. Because gRNAs utilize both canonical (Watson-Crick) as well as G:U base-pairing to direct the change in sequence, most transition mutations in the gRNA would not lead to changes in the mRNA sequence and would not be selected against [49]. We do note, however, that a very strong bias against A to G transitions is observed in the anchor regions of the gRNAs. This suggests that transition mutations in this region are not tolerated, and that the editing machinery recognizes and selects for a conventional base-paired double helix in the initial gRNA/mRNA pairing. The ability to discriminate against G:U base-pairs in the initial interaction would greatly increase the accuracy of the gRNA targeting event. Considering the sequential nature of the overall editing process, this would be very advantageous.
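The argument that many gRNA transitions are silent can be checked by enumerating the allowed base pairs. The sketch below is purely illustrative (the function name `transition_outcome` is hypothetical): for each gRNA base in an existing pair it applies the transition (A to G, G to A, C to U, U to C) and reports whether the mutated gRNA base still pairs with the same mRNA base. It treats all pair types equally and says nothing about how often each type occurs in real gRNAs.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
TRANSITIONS = {"A": "G", "G": "A", "C": "U", "U": "C"}


def transition_outcome(grna_base, mrna_base):
    """Classify the pair left after a transition in the gRNA base.

    Returns "watson-crick", "G:U" or "mismatch". The input must be an
    allowed gRNA:mRNA pair (Watson-Crick or G:U).
    """
    if (grna_base, mrna_base) not in WATSON_CRICK | WOBBLE:
        raise ValueError("input is not an allowed gRNA:mRNA base pair")
    mutated = (TRANSITIONS[grna_base], mrna_base)
    if mutated in WATSON_CRICK:
        return "watson-crick"
    if mutated in WOBBLE:
        return "G:U"
    return "mismatch"


if __name__ == "__main__":
    for pair in sorted(WATSON_CRICK | WOBBLE):
        print(pair, "->", transition_outcome(*pair))
```

In this enumeration an A to G transition opposite a U converts a Watson-Crick pair into a G:U pair without changing the guided mRNA base, which is the kind of change that is observed to be depleted in the anchor region.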
Coverage
Surprisingly, complete gRNA coverage was observed only for the pan-edited COIII and for CYb, where editing is limited to the 5' end. The identification of the CYb gRNAs was expected, as it has been previously reported that the gRNAs are present in both life cycle stages even though editing of CYb is limited to the procyclic stage [8,9]. The full coverage of COIII was also not surprising, as COIII was shown to be fully edited and equally abundant in both stages [17]. However, we expected to see complete coverage of ATPase 6 and RPS12 as both of these transcripts have been shown to be essential in both life cycle stages [3,44,45,50]. For ATPase 6, we did identify a total of 29 gRNA populations that do cover all of the editing sites. However, one of the gRNAs (bsA6(643-667)) has a single nucleotide mismatch (C:U) and one would introduce a frameshift (bsA6(520-553)). The C:U mismatch occurs near the middle of the gRNA, in a region that is unusually high in Gs and Cs (S2A Fig). It may be that the G:C base pairs immediately upstream of the mismatch stabilize the gRNA/mRNA interaction, allowing it to be tolerated. The frameshift gRNA is also interesting, as it occurs just upstream (1 editing site) of another site where we had previously observed a frameshift sequence anomaly. Both frameshifts (the BS 4U and the procyclic 11U) generate a predicted protein with nearly the same amino acid sequence. As the frameshifts occur downstream of the highly conserved amino acid region involved in proton translocation [16], it may be that this different carboxyl terminus is tolerated.
Near full coverage is also observed for RPS12. For this transcript, one BS identified gRNA (bsRPS12(96-121)) has an A-nt insertion that disrupts the gRNA complementarity. Surprisingly, the other mRNA transcript found with near complete coverage was ND9 (one gRNA has a single nt mismatch). All of the other mitochondrially encoded Complex I members did have substantial gaps in coverage. Currently, there is considerable debate on the necessity of Complex I subunits for either stage of the trypanosome life cycle. Studies using RNAi and knockout cell lines of nuclear-encoded members of Complex I have shown that the complex is unnecessary for survival in either life cycle stage [51,52]. However, the nuclear-encoded Complex I member genes are maintained [29], and while we did not identify full coverage for the ND transcripts, a vast majority of the gRNAs were found in both life cycle stages.
This study used high-throughput sequencing to characterize the gRNA transcriptome during the bloodstream stage of the trypanosome life cycle. This work suggests that gRNAs are expressed during both life cycle stages, and that differential editing patterns observed for the different mitochondrial mRNA transcripts are not due to the presence or absence of gRNAs.