A Small RNA Encoded in the Rv2660c Locus of Mycobacterium tuberculosis Is Induced during Starvation and Infection

Enhanced transcription of the Rv2660c locus in response to starvation of Mycobacterium tuberculosis H37Rv encouraged addition of the predicted Rv2660c protein to an improved vaccine formulation. Using strand-specific RNA sequencing, we show that the up-regulated transcript is in fact a small RNA encoded on the opposite strand to the annotated Rv2660c. The transcript originates within a prophage and is expressed only in strains that carry PhiRv2. The small RNA contains both host and phage sequences and provides a useful biomarker to monitor bacterial starvation during infection and/or non-replicating persistence. Using different approaches we do not find any evidence of Rv2660c at the level of mRNA or protein. Further efforts to understand the mechanism by which Rv2660c improves efficacy of the H56 vaccine are likely to provide insights into the pathology and immunology of tuberculosis.


Introduction
The pathogenesis of Mycobacterium tuberculosis depends on the ability of the bacteria to adapt to a range of environmental conditions in the infected host, and characterisation of relevant in vivo phenotypes is crucial for the rational design of improved therapies and selection of antigens for use in vaccines and immunodiagnostics [1]. Inclusion of antigens that are preferentially expressed by bacterial populations that persist in a nonreplicating state may enhance the ability of vaccines to prevent reactivation disease, and effective protection of non-human primates by the recently described H56 vaccine provides an encouraging proof-of-concept for this strategy [2,3]. The H56 vaccine combines two dominant antigens expressed in exponential culture with a novel antigen, Rv2660c, originally identified on the basis of enhanced transcription in a starvation model of M. tuberculosis growth arrest [4]. In addition, strong induction of Rv2660c has been associated with hypoxia-induced non-replicating persistence and the enduring hypoxic response [5,6]. The recent application of strand-specific RNA sequencing (RNAseq) has uncovered an extensive repertoire of non-coding RNA in M. tuberculosis, including 39 and 59 untranslated regions, antisense transcripts and intergenic small RNAs (sRNAs) [7][8][9]. One of the novel sRNAs detected by RNAseq -ncRv12659 (originally referred to as MTS2048) [7,10] -overlaps with the locus annotated as encoding Rv2660c, and the aim of the present study was to characterise ncRv12659 and to determine its relationship to the Rv2660c vaccine antigen. Our findings strongly suggest that the starvation-induced transcriptional signal is due to increased expression of ncRv12659 arising from the plus strand and not, as previously thought increased expression of the hypothetical Rv2660c mRNA encoded on the minus strand.

Transcriptional mapping of Rv2660c locus
The Rv2660c mRNA was originally described as the most highly up-regulated transcript according to microarray analysis of M. tuberculosis H37Rv starved for 24 or 96 hours in PBS [4]. It was annotated as a questionable open reading frame located adjacent to the PhiRv2 prophage, which is integrated into the valU tRNA gene in many strains of M. tuberculosis [11,12]. Several of the phage genes, including Rv2659c encoding the phage integrase, were also up-regulated in the microarray study [4]. Recently we identified a number of sRNAs including MTS2048/ncRv12659, which is convergent and overlapping with the Rv2660c open reading frame [7]. In order to investigate the relationship between the two genes further we compared transcriptomes from M. tuberculosis H37Rv from exponential growth and after 24 hours of starvation in PBS. Figure 1 illustrates RNAseq profiling (visualized in the Artemis Genome Browser, [13]) of the two transcriptomes and confirms the prominent induction of the Rv2660c locus but demonstrates that transcription occurs on the forward rather than the reverse strand ( Figure 1A). In agreement with the recent annotation of non-coding RNA in M. tuberculosis we refer to this transcript as ncRv12659 [10]. In order to precisely determine the termini of the ncRv12659 transcript we mapped the 59 and 39 ends by RACE. A transcription start site (TSS) located at position 2980911, between the TSS of the PhiRv2 integrase (Rv2659c) and the right hand phage boundary marked by the attR duplication, was identified by 59 RLM-RACE [14] of ncRv12659. An identical start site was found by RNAseq-based TSS mapping [15], with a marked increase in peak height in starved cultures ( Figure 1B) [16]. The dominant 39 end of ncRv12659 was identified by 39 RACE to be at position 2981083, with additional 39 ends detected at positions 2981010, 2981011, 2981026, 2981047, 2981055 and 2981077, but we did not identify any canonical intrinsic terminators (i.e. a stem-loop followed by a poly-U stretch) in this region. The total length of 173 nucleotides established by RACE is in good agreement with the largest prominent transcript identified by Northern blot ( Figure 1C). The blot also revealed a series of smaller transcripts with a particularly prominent one around 120 nucleotides in stationary phase. Since we identified only one TSS and no apparent terminators for ncRv12659, we assume that the smaller transcripts and possibly also the main transcript are a result of post-transcriptional processing of a longer primary transcript.
The ncRv12659 transcript comprises 60 nucleotides of PhiRv2 sequence followed by 28 nucleotides duplicated from the 39 end of valU and a further 87 nucleotides of host sequence ( Figure 2). The forward sequence after the sRNA contains a stretch of 61 nucleotides that have an almost exact duplication at the attL end of PhiRv2 annotated as part of Rv2645. Alternative annotations have been proposed for this region. For example in Mycobacterium canettii it is annotated as an open reading frame on the forward strand with homology to a partial protein sequence annotated in Salmonella typhimurium (VBIMycCan278382_3386). We did not detect transcripts on the reverse strand corresponding to the proposed Rv2660c and the upstream Rv2661c mRNAs; in both cases, the transcription profile is dominated by reads mapping to the opposite (i.e. plus) strand.
To screen for expression of Rv2660c protein we performed mass spectrometric analysis of protein extracts from exponential and starved cultures of M. tuberculosis. For this purpose we used the targeting approach of selected reaction monitoring (SRM), which is the method of choice for the sensitive detection of predetermined low abundance proteins in complex samples by mass spectrometry [17]. Using SRM, more than 70% of annotated M. tuberculosis proteins, spanning the whole dynamic range of the proteome are detectable in unfractionated cell lysates from liquid cultures and the lower limit of detection has been estimated to be in general below 10 protein copies per cell [18,19]. In the present study, during a starvation time course over 96 hours we were unable to detect any of the three mass spectrometry-compatible tryptic peptides predicted for Rv2660c in spite of corresponding synthetic reference peptides being detectable ( Figure S1).
Combined transcript and protein data from exponential and starved cultures lead us to conclude that the dramatic upregulation observed in starved cultures of M. tuberculosis H37Rv reflects expression of an sRNA (ncRv12659) with little or no contribution from the putative Rv2660c mRNA.

Expression of ncRv12659 in clinical isolates
Localisation of the TSS of ncRv12659 within PhiRv2 suggested that the sRNA should only be expressed in strains that carry the phage, and raised the possibility that in the absence of a competing antisense RNA we might detect Rv2660c expression in PhiRv2negative strains. To test these hypotheses, we screened the RNAseq profiles of five clinical isolates [20] and of the PhiRv2negative Mycobacterium bovis BCG vaccine strain. N0072 and N0153 belong to M. tuberculosis Lineage 1 and lack PhiRv2. N0031, N0052 and N0145 all belong to Lineage 2 and are PhiRv2-positive (Table 1).
RNAseq profiles of the two PhiRv2-positive N0052 and N0145, shown in Fig. 3A and clearly demonstrate similar patterns, although the levels of ncRv12569 differ slightly between strains, but more importantly no detectable signal from the reverse strand Rv2660c ( Figure 3A). In addition, TSS mapping for M. tuberculosis N0145 identified the same primary transcript as in H37Rv, with a significant increase in stationary phase. The RNAseq profile of the PhiRv2-negative N0153 suggests that a minimal level of reads can be mapped to the region beyond the 39 of the mature valU ( Figure 3B). Northern blot analysis confirmed the presence of ncRv12659 only in PhiRv2-positive strains, and also confirmed the finding from the RNAseq data that ncRv12659 is more abundant in N0052 ( Figure 3C). The blots revealed a faint signal of approximately 300 nucleotides specific for the PhiRv2-negative strains, which we presume corresponds to the 39 end of the unprocessed tRNA transcript ( Figure 3B). More importantly however, we found no evidence of a reverse strand Rv2660c mRNA in any of the strains regardless of whether ncRv12659 was expressed or not.

Expression of ncRv12659 during infection
To characterise expression of ncRv12659 during intracellular growth we employed the mouse model used by Andersen and co-workers in development of the H56 vaccine [2]. Mice were infected with PhiRv2-positive M. tuberculosis H37Rv or PhiRv2negative M. tuberculosis N0072 for 31 days, and expression of M. tuberculosis RNA recovered from tissues was measured by qRT-PCR. In order to distinguish expression of the PhiRv2-derived ncRv12659 RNA from the 39 tRNA signal in the absence of PhiRv2, we used two amplicons. One (59 amplicon) had a forward primer within the PhiRv2-derived sequence, and hence this amplicon is present only in H37Rv (and other PhiRv2 positive strains). The second (39 amplicon) was located entirely downstream of the tRNA repeat and hence this amplicon represents an M. tuberculosis core sequence, present in both H37Rv and N0072 ( Figure 4). The maximum value obtained for the 59 amplicon in N0072 samples (3610 25 ) was set as the baseline, since this region is not present in PhiRv2-negative strains and any 59 signal from PhiRv2-negative strains should therefore be considered as noise. This method confirmed a 50fold starvation-induced increase in RNA levels for the 59 amplicon in M. tuberculosis H37Rv compared to exponential phase levels, and a six-fold increase during infection ( Figure 4). The 39 amplicon was detected at a level approximately two-fold lower than the 59 amplicon in H37Rv during exponential growth, with a marked reduction in the relative level in starvation and decreasing below baseline during infection ( Figure 4). Finally, we observed a very low level of the 39 amplicon from N0072 exponential phase samples and even less after starvation, implying that transcriptional readthrough from the tRNA operon is minimal and only the PhiRv2-derived promoter is induced by starvation. Moreover, since our qRT-PCR was not strand specific, the results for the 39 amplicon, which corresponds to the middle of the proposed Rv2660c sequence, also demonstrate that there is no detectable expression of Rv2660c mRNA in M. tuberculosis N0072 during mouse infection. A. Sequence analysis of total RNA in the Rv2660c locus. RNAseq data visualised with the Artemis genome browser [13] show mapping of reads from exponential (upper panel) and 24-hour starved cultures (lower panel). Reads mapping to the forward strand are shown in blue, and to the reverse strand in red. The valU insertion representing the boundary between PhiRv2 and host sequences is marked in yellow. The profile is dominated by ncRv12659, a forward strand transcript originating within PhiRv2. A proportion of reads continue beyond the 39 end of ncRv12659, as determined by 39 RACE. Reads have been normalised to total number of reads and adjusted to the same scale. B. Transcription start site (TSS) mapping. The forward ncRv12659 TSS maps to a location 62 nucleotides downstream of the reverse strand TSS for Rv2659c (PhiRv2 integrase). Promoter activity assessed by normalised read counts increases at both TSSs in response to starvation (lower panel). C. Northern blot analysis of ncRv21659. Expression of ncRv12659 in extracts from exponential (e) and stationary (s) phase cultures of M. tuberculosis H37Rv. 20 mg of total RNA was separated on a denaturing gel, transferred to a membrane and hybridised to a probe specific for the 39 end of ncRv12659. The blot shows two transcripts of approximately 175 nucleotides in size from exponential phase RNA (indicated by arrows), corresponding to the size determined by RACE (see text). In stationary phase a slightly larger transcript as well as a smaller of around 125 nucleotides is also visible (arrow). doi:10.1371/journal.pone.0080047.g001

Origin and function of ncRv12659
Comparison with other temperate bacteriophages suggests that the original function of the ncRv12659 promoter was to drive expression of genes in the circular chromosome of the lytic phage ( Figure 5). In the case of PhiRv2, this includes Rv2645 (a fusion of duplicated host sequence and a flap endonuclease-like domain), Rv2646 encoding a second integrase, and KorB-like repressor Rv2647. These genes are separated from the remaining inversely oriented PhiRv2 genes by an IS6110 insertion. PhiRv1, a second prophage present in M. tuberculosis H37Rv, has a similar organisation, with a single integrase and two repressor-like proteins; again with inward and outward promoters at the right hand end of the integrated lysogen ( Figure S2). The presence of host as well as phage sequences in the sRNA transcript driven by activation of the ncRv12659 promoter, and the fact that the sequence is highly conserved in the lysogen suggests that it could have a functional impact on the physiology of M. tuberculosis. To explore this possibility we made a construct in which expression of ncRv12659 was driven by its native promoter on a replicating plasmid (pMSC12659). The construct contains 131 basepairs upstream of the ncRv12659 TSS, including the promoter and the first 13 codons of Rv2659c as well as 170 basepairs downstream of the mapped 39 end of ncRv12659 to ensure that any putative intrinsic termination signals were included (basepairs 1-471 in Figure 2).
The plasmid was transformed into PhiRv2-positive M. tuberculosis H37Rv as well as the PhiRv2-negative M. tuberculosis N0072. Analysis by qRT-PCR revealed that expression of ncRv12659 in M. tuberculosis H37Rv increased ,1000-fold reaching approximately 10% of 16S rRNA levels, and furthermore that the level of ncRv12659 expression was similar in both strains. ( Figure S3). The overexpression of ncRv12659 resulted in a small but detectable reduction in the growth rate of both strains ( Figure S4). Microarray analysis of H37Rv 6 pMSC12659 was performed using M. tuberculosis gene expression arrays from Agilent Technol-   ogies. Using a minimum fold-change of 2 and a p-value,0.05 we observed differential expression of more than 100 probes, corresponding to 68 phage as well as host genes (Table S1a). We decided to focus on probes with a minimum fold change of four, which left 27 probes (Table S1b). Six of these probes corresponded to sequences contained in the overexpression construct and were therefore disregarded, leaving 21 probes representing 11 genes ( Table 2). Thirteen of the probes (fold change 6-113) mapped to a set of genomic loci containing partial duplications of a common core sequence ( Figure S5). The repeat loci have been linked to diverse hypothetical protein products in different genome annotations, though the frameshifts present in the nucleotide alignment suggest that coding capacity is not conserved between the different copies. Five probes associated with PhiRv2 indicated a 4 to 5-fold up-regulation of Rv2659c and Rv2658c, and these changes were verified by qRT-PCR to be 3.9 and 3.2, respectively ( Figure 6A). In spite of the reduced growth rate, only one gene, desA3 (Rv3229c) represented by three probes, was down-regulated more than the four-fold cut-off, with a probe average of 7-fold; an 8.2-fold down-regulation was verified by qRT-PCR ( Figure 6A). A parallel analysis of desA3 expression in M. tuberculosis N0072 6 pMSC12659 and M. bovis BCG 6 pMSC12659 revealed only a slight reduction (1.5-fold and 2-fold respectively), suggesting that the more pronounced effect in H37Rv may be influenced by the presence of PhiRv2 genes other than ncRv12659 ( Figure 6B).

Discussion
We have demonstrated by RNAseq and Northern blot that the starvation-induced transcriptional signal ascribed to Rv2660c mRNA is in fact associated with an sRNA encoded on the opposite strand of the DNA rather than the mRNA. A similar profile was recently reported by Sala and co-workers [9]. Examination of a panel of M. tuberculosis clinical isolates under a variety of culture conditions failed to show any evidence of an Rv2660c mRNA, and we were unable to detect Rv2660c peptides by highly sensitive targeted mass spectrometry.
Rosenkrands and co-workers investigated the M. tuberculosis proteome during long-term starvation and failed to identify Rv2660c [21], and neither Kelkar et al. nor Schubert et al. identified Rv2660c in their recent high coverage proteome mapping data [19,22]. Zheng et al. did report Rv2660c identification in M. bovis BCG but with only two out of tens of thousands of mass spectra [23]. Hence, it remains questionable if the Rv2660c protein exists.
The sequence of the proposed hypothetical Rv2660c protein has several unusual features. Most of the sequence is present in all strains of M. tuberculosis and in the precursor-like mycobacterial strains represented by M. canettii [24], but lacks primary sequence or structural homology with any annotated proteins outside of this group. The C-terminal region of Rv2660c corresponds to antisense translation of the 39 portion of a valine tRNA molecule. In PhiRv2-positive strains, part of the Rv2660c nucleotide sequence has an imperfect duplication within the bacteriophage protein Rv2645. In some genome annotations the N-terminal region of Rv2660c is extended to include part of the Rv2661c sequence; other annotations include an alternative open reading frame on the opposite strand.  The starvation-induced sRNA transcript, ncRv12659, originates within the PhiRv2 prophage, and ncRv12659 is found only in strains that have retained PhiRv2. It is likely that PhiRv2 infected an early ancestor of the M. tuberculosis complex, but has undergone progressive erosion with loss of tail genes and deletion from many extant clinical isolates. Our results are consistent with a model in which the ncRv12659 promoter would have had a primary role in gene expression in the free circular virion and is relatively repressed in the stable lysogen. Activation of the ncRv12659 promoter together with the Rv2659c integrase promoter in response to starvation may reflect the release of repression as part of a programme to initiate the lytic cycle of the phage. The decision between lysis and lysogeny has previously been shown to be tightly linked to the nutrient availability of the host in many phages e.g. [25][26][27], and may explain why the ncRv12659 promoter responds so dramatically to starvation.
As described for other sRNAs [7,10], ncRv12659 accumulates to high levels during M. tuberculosis infection and provides a potential biomarker for detection of cells that are starved for nutrients as well as oxygen, and which may represent future persisters [4][5][6]; in this case specifically for PhiRv2-positive strains. The observation that only the 59 portion of ncRv12659 is detected during infection is intriguing and suggests premature termination of transcription or internal processing followed by degradation of a less stable 39 portion of the transcript. It may be that the inclusion of part of the tRNA has a stabilising effect on the RNA.
The presence of host as well as phage sequences raised the possibility that expression of the sRNA transcript could affect the physiology of M. tuberculosis, and overexpression of ncRv12659 did result in impaired growth as well as perturbations in the transcription profile of M. tuberculosis H37Rv. More than fifty genes showed significant changes, the most highly up-regulated genes being PhiRv2 genes and a novel repeat locus. The observed induction of Rv2659c and massive increase in ncRv12659 level after transformation with pMSC12659 could result from sequestration of a repressor protein by provision of multiple copies of its binding site. We did not identify significant sequence similarities in the regions upstream of the TSSs of the up-regulated repeat loci. The mechanism by which over-expression of ncRv12659 leads to enhanced transcription of these sequences, and their potential coding (or non-coding) function remains to be determined. We conclude that the 59 end of ncRv12659 provides a useful marker for phenotypic analysis of M. tuberculosis during infection with PhiRv2-positive strains, but that the distinctive transcriptional upregulation of this locus in the starvation model is unlikely to play a direct role in the enhanced efficacy of the H56 vaccine against reactivation disease.
If Rv2660c represents a misannotation, how can we account for the beneficial effect of its addition to the H56 vaccine? There may be a fortuitous cross-reaction between the T cell response elicited by the recombinant Rv2660c protein and some as yet undefined antigen of M. tuberculosis. This would be consistent with the observed detection of T cell responses to Rv2660c in a panel of infected individuals [28]. Alternatively, but somewhat unlikely,  Table S1), repeat loci see Figure S5. doi:10.1371/journal.pone.0080047.t002 there may be some highly specific environmental but yet unidentified condition that the bacteria encounter in vivo under which expression of Rv2660c is in fact induced. In light of the successful H56 vaccination data, identification of such a condition would provide an important insight into the physiology of latent infection. Finally, it's possible that fusion of the Rv2660c peptide to the remaining vaccine constituents has some beneficial effect on the way in which are processed and presented to T cells. Further efforts to understand the mechanism by which Rv2660c improves efficacy of the H56 vaccine are likely to provide insights into the pathology and immunology of tuberculosis and possibly also provide a further understanding of why a promising vaccine candidate such as MV85A has failed [29,30].

Materials and Methods
Bacterial strains and plasmids E. coli DH5a was used for plasmid constructions and grown in LB broth or on LB agar. Mycobacterial strains included the vaccine strain Mycobacterium bovis BCG, the laboratory strain Mycobacterium tuberculosis H37Rv and the clinical isolates N0031, N0052, N0072, N0145, N0153 [31].

Growth of M. tuberculosis and M. bovis BCG
Cultures were grown in the standard Middlebrook 7H9 medium supplemented with glycerol (0.5%), Middlebrook ADC (10%) and Tween-80 (0.05%) in roller bottles at 2 rpm. Where required, kanamycin was added at 20 mg/ml. Exponential phase cultures were harvested at an OD 600 0.6 to 0.8; stationary phase cultures were harvested one week after OD 600 had reached 1.0.

Plasmid construction
The overexpression plasmid pMSC12659 was made by replacing an XbaI-HindIII promoter fragment of pKA303 [7] with a 483 basepair XbaI-HindIII fragment of the region around ncRv12659 including 131 basepairs upstream of the ncRv12659 TSS ( Figure 2).

RNA isolation
RNA isolation was done as described previously [32]. Briefly, cultures were harvested with rapid cooling by the addition of ice and pelleted at 10,000 rpm for 10 minutes. RNA was then isolated from the pellet using the FastRNA Pro Blue Kit from MP Biomedicals following the manufacturer's instructions. To isolate RNA from bacteria grown in mice, the lung homogenates were spun at 13,000 rpm for 5 minutes to collect the bacteria. These were resuspended in 1 ml Trizol (Invitrogen) with 150 micron glass beads and the samples disrupted in a fast Prep (MPBio) at a setting of 6.0 for 40 secs. The RNA was extracted according to manufacturer's guidelines. RNA concentration was measured by nanodrop and RNA integrity measured by the 2100 Bioanalyzer using a Nano chip.

Preparation of starvation samples for RNAseq and proteomics
M. tuberculosis H37Rv was grown in Middlebrook 7H9 supplemented with 0.4% glycerol, 0.085% NaCl, 0.5% BSA and 0.05% Tyloxapol in roller bottle culture (2 rpm at 37uC). For nutrient-starvation experiments, exponentially growing bacteria were harvested as previously described [33] but using PBS supplemented with 0.025% Tyloxapol. RNA and protein was isolated from triplicate PBS-washed and 24 hours starved cultures.

Northern blotting
Northern blotting was performed as previously described [32] using RNA marker low (Abnova) and 20 mg of RNA for each sample. Membranes were incubated with a probe constructed from the oligo template 2048nrtLONG 59-gacctgcggtttcatcca-caaaatctgggctgcgtgaactaaatgtCCTGTCTC-39 in Ultrahyb (Ambion) at 68uC.

Quantitative RT-PCR
Total RNA was treated with Turbo DNase (Ambion) until DNA free. cDNA was synthesized using Superscript III (Invitrogen) and random hexamers. Primers were designed using the Applied Biosystems software Primer Express, and sequences are listed in Table S2. Each 20 ml qRT-PCR reaction, contained 16 SYBRgreen (Applied Biosystems), 900 nm each primer and 5 ml of template cDNA. Absolute quantitation was perfomed and all genes were normalised to 16S expression.

Whole transcriptome RNA sequencing
Isolation of RNA was performed as described above. All RNA samples were treated with Turbo DNase free (Ambion) until residue DNA contamination removed. Concentration and quality control of RNA samples was measured by Nanodrop (ND-1000, Labtech) and Agilent RNA chip (2100 Bioanalyser). Construction of strand-specific cDNA libraries from 2-3 mg total RNA was generated using the Illumina directional mRNASeq protocol (Part # 15018460 Rev. A); to capture all RNA species polyA-tail and size selection was omitted. Single-end read sequencing was performed on Illumina Genome Analyser and HiSeq platforms, using a single flow cell lane per library.

Transcription Start Site (TSS) RNA sequencing
Strand-specific cDNA libraries for TSS mapping were made by Vertis Technologies AG, Germany (http://www.vertisbiotech. com/). In order to enrich for 59 ends of primary transcripts, RNA was fragmented with ultrasound (4 pulses of 30 s at 4uC), treated with polynucleotide kinase (PNK) and then incubated with Terminator exonuclease (TEX, Epicentre), which specifically degrades RNA species which carry a 59 monophosphate. The exonuclease-resistant RNA species (primary transcripts with 59 PPP) were used for the construction of strand-specific cDNA libraries suitable for Illumina sequencing.

RNAseq mapping
Raw reads were first filtered to discard low quality reads. Poor quality read bases were trimmed using the SolexaQA package [34]; default parameters were used, trimming bases with confidences p.0.05, and removing reads,25 bases. Reference based assembly using the reference genome H37Rv [EM-BL:AL123456] was performed with BWA. Full data sets and accession numbers for transcriptomes are described elsewhere [16,20].

TSS mapping
Custom perl scripts were written for TSS calling. Briefly, for detecting candidate TSS, the increment in reads from one genome position to the next consecutive base was calculated for all genomic positions, selecting all genomic positions with an increment significantly above the average background coverage as candidates. Automated annotation of the putative TSS detected according to genomic distribution was performed as described by [15]. Full data sets and accession numbers for TSS mappings are described elsewhere [16,20].

Microarray Analysis
Whole genome M. tuberculosis microarray slides were purchased from Agilent Technologies through the Bacterial Microarray Group at St. George's (BmG@S), University of London. For cDNA synthesis 2 mg of vector control and over expression RNA, isolated from exponential cultures at OD 600 nm of 0.6 was used. The cDNA was labelled individually with both Cy-3 and Cy-5 dyes (GE Healthcare) using Superscript III reverse transcriptase (Invitrogen). Dye swaps were performed and the cDNA hybridized to an 8 Chamber Agilent slide at 65uC for 16 hours before washing the slide with Oligo aCGH Wash Buffer 1 (Agilent) for 5 minutes at room temperature and Oligo aCGH Wash Buffer 2 (Agilent) for 1 minute at 37uC. Slides were stabilized using Agilent's Stabilisation and Drying Solution according to manufacturer's instructions.
Slides were scanned at 5 microns using an Agilent Technologies Microarray Scanner at BmG@S. Txt files created by the Agilent scanner were analysed using Genespring 12.0 filtering on flags and expression. T-test against zero was performed and p-value selected as p,0.05, correcting for multiple comparisons using Benjamini-Hochberg. The array design is available in BmG@Sbase (Accession No. A-BUGS-41; http://bugs.sgul.ac.uk/A-BUGS-41) and also ArrayExpress (Accession No. A-BUGS-41). Fully annotated microarray data have been deposited in BmG@Sbase (accession number E-BUGS-156; http://bugs.sgul.ac.uk/E-BUGS-156) and also ArrayExpress (accession number E-BUGS-156).

Proteomics
Bacterial cell pellets were dissolved in lysis buffer containing 8 M urea and 0.1% RapiGest (Waters) in 0.1 M ammonium bicarbonate buffer. The cell suspension was thoroughly vortexed and incubated at room temperature for 10 min. Subsequently, cells were disrupted by ribolysing the samples at a setting of 6.5 for 30 secs at 4uC using 150 mm glass beads (SIGMA). Lysates were then centrifuged at 13,000 rpm for 5 minutes. The clarified lysate was filtered using 0.2 mm Millipore tubes (UFC30GV25). Protein concentration was determined using a BCA assay according to manufacturer's protocol (Thermo Fisher Scientific). Protein disulfide bonds were reduced by adding 5 mM tris(2-carboxyethyl)phosphine (TCEP) and incubating for 30 min at 37uC. Next, the free cysteine residues were alkylated by adding 10 mM iodoacetamide and incubating for 30 min in the dark at room temperature. Excessive iodoacteamide was captured by addition of 12.5 M N-acetyl cysteine and incubation for 10 min at room temperature. Extracted protein samples were diluted at a ratio of 1:5 with 0.05 M ammonium bicarbonate buffer to reach a urea concentration of ,2 M. Sequencing-grade modified trypsin (Promega) was added at a ratio of 1:100 enzyme:substrate (weight/weight) and incubated for over night at 37uC with gentle shaking at 300 rpm. To stop the tryptic digest and to precipitate RapiGest the pH was lowered to 2 using 50% trifluoro acetic acid (TFA) followed by an incubation for 30 min at 37uC with shaking at 500 rpm. Water-immiscible degradation products of RapiGest were pelleted by centrifugation at 16,000 g for 10 min. The cleared peptide solution was desalted with C18 reversed-phase columns (Waters). Prior to use, the C18 columns were activated with 100% acetonitrile (ACN), followed by equilibration with 2% ACN/0.1% TFA. After loading the sample, the columns were washed four times with 2% ACN/0.1% TFA. Finally, peptides were eluted with 50% ACN/0.1% TFA, dried under vacuum, and re-solubilised in 2% ACN/0.1% FA to a final concentration of 1.0 mg/ml.
For each of the 3 mass spectrometry-suitable tryptic peptides of Rv2660c a synthetic peptide was purchased in unpurified form (JPT Peptide Technologies), re-solubilised in 180 ml of 20% ACN/ 0.1% FA and spiked into the samples with a dilution of 1:100 (v/v) as positive control. Additionally, 11 retention time peptides (Biognosys) were added to each sample. Peptides were separated by liquid chromatography on a fused silica microcapillary column (15 cm675 mm) packed in-house with C18 resin (Magic C18 AQ 5 mm diameter, 200 Å pore size, Michrom BioResources) with a linear gradient from 98% solvent A (2% ACN/0.1% FA) and 2% solvent B (98% ACN/0.1% FA) to 35% solvent B over 35 min at a flow rate of 300 nl/min. Rv2660c peptides were measured in unscheduled selected reaction monitoring (SRM) acquisition mode on a 5500 QTRAP mass spectrometer (AB Sciex) equipped with a nanoelectrospray ion source. The optimal 5 SRM transitions per peptide precursor, as well as the chromatographic retention time, were obtained from the Mtb Proteome Library [19]. The mass spectrometer was operated in positive mode using electrospray ionisation. The SRM transitions were acquired with a mass window of 0.7 half-maximum peak width (unit resolution) in Q1 and Q3, a cycle time of ,2 s and a dwell time of 20 ms. Collision energies were calculated as follows: CE = 0.044 * (m/z)+5.5 and CE = 0.051 * (m/z)+0.5 for 2+ and 3+ charged precursor ions, respectively. Data were analysed manually using the software Skyline. The SRM data can be viewed in and downloaded from Panorama: https://daily.panoramaweb.org/labkey/project/ Aebersold/schubert/2013_Houghton_Rv2660c/begin.view?

Supporting Information
Figure S1 SRM analysis of tryptic peptides from Rv2660c. SRM traces over 96 hours of a starvation experiment are shown for three tryptic peptides from Rv2660c and a peptide derived from Rv3457c serving as a positive control. The first column shows the SRM signals of the synthetic reference peptides spiked into the first time point. The other three columns show the SRM signals in samples without reference peptides spiked in. No signal for the targeted peptides can be detected, neither by zooming into the expected regions (not shown). The positive control peptide in the last row shows that the sensitivity as well as chromatographic retention times are highly reproducible within the different samples. (TIF) Figure S2 PhiRv1 prophage. The PhiRv1 genome has a structural organisation similar to PhiRv2, with adjacent inward and outward TSSs (shown as an Artemis trace). PhiRv1 encodes a single integrase (Rv1586c) and two predicted proteins with structural homology to transcriptional repressors (Rv1574, Rv1575) as well as a conserved putative repressor binding site, shown in red. (TIF) Figure S3 Over-expression of ncRv12659. The diagram shows the level of (over)expression of ncRv12659 measured by qRT-PCR and normalised to 16S levels in all three backgrounds used. Each bar represents the mean and standard deviation of three biological replicates. (TIF) Figure S4 Growth of M. tuberculosis upon ncRv12659 overexpression. The curves illustrate that both strains of M. tuberculosis had a significant growth defect when expressing high amounts of ncRv12659. (TIF) Figure S5 Sequences of M. tuberculosis repeat loci. Sequence alignments of the repeat loci with homology to Rv0257. The diagram illustrates the sequences of the individual repeats, their location and the genes or gene regions they are associated with. Cyan highlights probes mapping in forward orientation with respect to Rv0257 orientation and magenta highlights probes mapping in the antisense orientation. (TIF) Table S1 Table shows Genespring output of probes with a minimum fold-change in expression of at least two, and their corresponding gene annotations (A). T-test against zero was performed and p-value selected as p,0.05, correcting for multiple comparisons using Benjamini-Hochberg. (B) Table shows same data as Table S1 but with a cut-off value of four. (XLSX)