Figure 1.
Outline of the bioinformatic pipeline.
(a) Construction of the effective target genome (ETG) in terms of both sequences and coordinates. (b) Construction of the two strand-specific reads maps. (c) Construction of the two strand-specific conservation maps. (d) Combination of the reads and conservation maps to identify putative sRNA-encoding regions. (e) Annotation of putative sRNAs to assess their reliability.
Figure 2.
The reads map (blue curve) is obtained by assembling all reads (sequences in red) that map uniquely and completely within the same IGR or AS region (sequence in black). The implemented BioPerl procedure merges the NGS mapper output with the T_IGRAScoord files.
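As an illustration of how such a reads map could be computed, the following minimal Python sketch builds a per-base coverage profile for a single region. It is not the BioPerl procedure referred to in the caption: the function name `reads_map`, the 1-based inclusive coordinates, and the assumption that the reads have already been filtered to unique mappers are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def reads_map(region: Tuple[int, int], reads: List[Tuple[int, int]]) -> List[int]:
    """Per-base coverage over one IGR or AS region.

    `region` and each read are (start, end) pairs in 1-based inclusive
    coordinates (an assumption of this sketch). Reads are assumed to be
    uniquely mapped already; only reads lying completely inside the
    region contribute to the map.
    """
    start, end = region
    length = end - start + 1
    # Difference array: +1 where a read begins, -1 just after it ends.
    diff = [0] * (length + 1)
    for r_start, r_end in reads:
        if r_start < start or r_end > end:
            continue  # read extends beyond the region, so it is skipped
        diff[r_start - start] += 1
        diff[r_end - start + 1] -= 1
    # The running sum of the difference array gives the depth at each base.
    coverage = []
    depth = 0
    for i in range(length):
        depth += diff[i]
        coverage.append(depth)
    return coverage


# Two reads fall inside the region 100..109; the other two overlap its borders.
example = reads_map((100, 109), [(100, 104), (102, 106), (98, 103), (108, 112)])
print(example)
```

The difference-array form keeps the cost linear in the number of reads plus the region length, which matters when many short regions are processed in turn.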
Figure 3.
For each IGR (sequence in black), the reads (blue curve) and conservation (green curve) maps are superimposed. First, Type A candidates (highlighted in blue) are identified and extracted by testing the length constraints (conditions I and II) and reads coverage above ExprT1 (dotted blue line). On the remaining portions of the IGRs, Type B candidates (highlighted in yellow) are identified and extracted by testing the length constraints (conditions I and II) and, simultaneously, both reads coverage above ExprT2 and conservation depth above ConsT2 (dotted and dashed yellow lines). Finally, Type C candidates (highlighted in green) are identified in the remaining portions of the IGRs on the basis of high sequence conservation (above the ConsT1 threshold, shown as a dotted green line).
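The sequential extraction described in this caption can be sketched in Python as follows. This is an illustrative reading only, under several assumptions that the caption does not state: conditions I and II are modelled as a minimum and a maximum run length (`min_len`, `max_len`), the same length bounds are also applied to Type C, positions are 0-based, and a run that fails the length test stays available to the later candidate types. The function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def find_runs(mask: Sequence[bool], min_len: int, max_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) of maximal True runs whose length is within bounds."""
    runs = []
    start = None
    # A trailing False closes any run that reaches the end of the region.
    for i, flag in enumerate(list(mask) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if min_len <= i - start <= max_len:
                runs.append((start, i - 1))
            start = None
    return runs


def classify_candidates(reads, cons, expr_t1, expr_t2, cons_t1, cons_t2,
                        min_len, max_len) -> List[Tuple[str, int, int]]:
    """Extract Type A, then B, then C candidates from one IGR."""
    n = len(reads)
    free = [True] * n  # positions not yet assigned to a candidate
    criteria: List[Tuple[str, Callable[[int], bool]]] = [
        ("A", lambda i: reads[i] > expr_t1),
        ("B", lambda i: reads[i] > expr_t2 and cons[i] > cons_t2),
        ("C", lambda i: cons[i] > cons_t1),
    ]
    candidates = []
    for label, passes in criteria:
        mask = [free[i] and passes(i) for i in range(n)]
        for s, e in find_runs(mask, min_len, max_len):
            candidates.append((label, s, e))
            for i in range(s, e + 1):
                free[i] = False  # extracted positions are removed from later passes
    return candidates


reads = [0, 0, 9, 9, 9, 0, 3, 3, 3, 0, 0, 0, 0, 0]
cons = [0, 0, 0, 0, 0, 0, 5, 5, 5, 0, 8, 8, 8, 0]
result = classify_candidates(reads, cons, expr_t1=5, expr_t2=2,
                             cons_t1=6, cons_t2=4, min_len=3, max_len=10)
print(result)
```

Processing the three types in a fixed order with a shared `free` mask mirrors the "remaining portions" wording: each position can belong to at most one candidate, and the stricter expression-only criterion takes precedence.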
Table 1.
Complete list of weights wj for conservation map calculation.
Table 2.
Summary of the empirical distributions of reads coverage and conservation depth.
Table 3.
Candidate classification based on the candidate definitions provided in Section 2.4.
Table 4.
Comparison with the sRNAs annotated by Arnvig et al. [10].
Table 5.
Comparison with the sRNAs annotated in Mycobacterium bovis BCG by DiChiara et al. [11].