Comparative Analysis of Functional Metagenomic Annotation and the Mappability of Short Reads
(a) Simulated sets of reads of length L are generated from curated, annotated reference genomes using a sliding-window approach. The origin of each read is recorded, and each read is labeled according to whether it originated from a gene associated with a KO (purple), from a gene not associated with a KO (green), or from an intergenic region (gray). Each read is then annotated through a translated BLAST search against the KEGG database, and the resulting annotation is compared with the annotation of the genome region from which the read was derived to evaluate whether the correct gene and/or the correct KO was recovered. (b) Evaluating the annotation of the S. pneumoniae genome. The inner ring shows the proportion of the genome annotated as KO genes, non-KO genes, and intergenic regions. The outer ring shows how reads originating from each category were annotated, indicating the accuracy of the annotations obtained by a BLAST-based search.
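
The read-simulation and labeling steps of panel (a) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The names Gene, Read, simulate_reads, and score_annotation are hypothetical, as are the step size and the rule that a read is assigned to a gene only when that gene covers at least half of the read; the caption specifies neither. The translated BLAST search itself is not run here, so the hit passed to score_annotation stands in for whatever the search would return.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    start: int            # 0-based, inclusive
    end: int              # 0-based, exclusive
    gene_id: str
    ko: Optional[str]     # None when the gene has no KO assignment


class Read(NamedTuple):
    start: int
    seq: str
    label: str            # "KO", "non-KO", or "intergenic"
    gene_id: Optional[str]
    ko: Optional[str]


def simulate_reads(genome: str, genes: List[Gene],
                   read_len: int, step: int = 1) -> List[Read]:
    """Slide a window of read_len over the genome and label each read."""
    reads = []
    for start in range(0, len(genome) - read_len + 1, step):
        end = start + read_len
        # Find the gene with the largest positive overlap (first one wins ties).
        best, best_overlap = None, 0
        for g in genes:
            overlap = min(end, g.end) - max(start, g.start)
            if overlap > best_overlap:
                best, best_overlap = g, overlap
        # Assumed rule: the gene must cover at least half of the read.
        if best is None or best_overlap * 2 < read_len:
            label, gene_id, ko = "intergenic", None, None
        elif best.ko is not None:
            label, gene_id, ko = "KO", best.gene_id, best.ko
        else:
            label, gene_id, ko = "non-KO", best.gene_id, None
        reads.append(Read(start, genome[start:end], label, gene_id, ko))
    return reads


def score_annotation(read: Read, hit_gene: Optional[str],
                     hit_ko: Optional[str]) -> Tuple[bool, bool]:
    """Return (gene_correct, ko_correct) for one read and its search hit."""
    gene_correct = read.gene_id is not None and hit_gene == read.gene_id
    ko_correct = read.ko is not None and hit_ko == read.ko
    return gene_correct, ko_correct


if __name__ == "__main__":
    # Toy genome and annotation; "KO_A" is a placeholder, not a real KO.
    genome = "ACGT" * 10
    genes = [Gene(0, 12, "g1", "KO_A"), Gene(20, 32, "g2", None)]
    for read in simulate_reads(genome, genes, read_len=8, step=4):
        print(read.start, read.label)
```

Tallying the (gene_correct, ko_correct) pairs separately for each read label would give the kind of per-category breakdown that the outer ring of panel (b) displays.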