INDIGO – INtegrated Data Warehouse of MIcrobial GenOmes with Examples from the Red Sea Extremophiles

doi:10.1371/journal.pone.0082210

Figure 1.

Workflow of annotation process and data warehousing.

Here, the section marked (A) shows steps in the annotation process. Section (B) shows a PERL based conversion of annotations into an XML schema - validated using the class attributes and data types defined in the genomic model, and finally, section (C) shows the process of data warehouse development steps.

More »

Expand

Figure 2.

Annotation comparison for E. coli O104 (TY2482) among AAMG pipeline, BG7 and reference annotation set from Broad Institute.

Regarding the CDS annotation AAMG ranks second (with only 2 CDS region less annotated than BG7), while in annotation of orphan (hypothetical) CDS products (the less the better) and in annotation of functional (non-hypothetical) CDS products (the more the better) AAMG performs the best.

More »

Expand

Figure 3.

A) Keyword and B) Query builder search interface to INDIGO.

The keyword search interface shows an example of the search for “benzoate degradation”. Results are categorized on the left side of the resulting page, showing the number of hits found for genes, domains, pathways, etc. These results are further categorized into hits per genome for different organisms. Clicking on any of these categories shows filtered results. The query builder interface has an option to include or constrains an annotation class attribute, e.g. pathway name is constrained for “benzoate degradation”, while the organism attribute ‘short name’ is constrained to “SSPSH”. The annotation feature class attributes to be included in the result list here are gene db identifier, symbol, organism’s short name and pathway name. User can select any of the available annotation class attributes making it possible to integrate annotation from several different sources. Results of constrained query builder search are shown as a list. There are summary and filter options on the list page that allow a user to further analyze these results.

More »

Expand

Figure 4.

Region search interface.

This figure shows features (genes) for a region using coordinates (Contig3:198625-229704) from organism Haloplasma contractile (HLPCO). This region shows the cell Division and Cell Wall (DCW) biosynthesis gene cluster. An integrated genome browser view available via Region search results page, shows here the arrangement of genes in this region of the contig from HLPCO . The table below this section shows genome region, data export options, basic details of the feature (genes), type of features and their location on the genome. The create list by feature link saves this gene list in the data warehouse for further analysis. This list stays permanently if the user is logged in.

More »

Expand

Figure 5.

A) Gene Ontology, B) Protein Domain and C) Pathway enrichment analysis.

The figure shows a snapshot obtained in case when a term “cell cycle” was searched through the keyword search option and resulting genes were saved in a list that shows enrichment of GO, protein domain and pathways in comparison to the rest of the data in INDIGO. The number of hits shown for reach category can be saved as lists for further analysis.

More »

Expand

Figure 6.

Benzoate degradation in Salinisphaera shabanensis.

The genes from Salinisphaera shabanesis associated with Benzoate degradation pathway by INDIGO are shown in Red. INDIGO developed a functionality, available for all pathways present in INDIGO, that generates a specific URL to automatically display KEGG Orthologs from INDIGO on to pathway diagrams at KEGG webserver.

More »

Expand