Table 1.
Number of classes per branch in the OntoBiotope ontology.
Table 2.
Omnicrobe sources, data volumes, types, and extraction dates in the May 2022 version.
Fig 1.
Text-mining process.
Fig 2.
Information system flowchart.
Fig 3.
Republished from http://omnicrobe.migale.inrae.fr under a CC BY license, with permission from INRAE MaIAGE, original copyright 2022.
Table 3.
Number of distinct relationships extracted per source in the May 2022 version.
Fig 4.
Distribution of taxa ranks in Lives_In relations in Omnicrobe.
“Strain” ranks comprise strains and isolates. “Species and subspecies” ranks include species and ranks below species and above strain (e.g., subspecies, varieties, morphs). “Genus and subgenus” ranks include genus and ranks below genus and above species (e.g., subgenus, section, series). “Family and subfamily” ranks consist of family and ranks below family and above genus (e.g., subfamily, tribe). “Higher ranks” include all ranks above family (e.g., order, class, phylum, kingdom). The height of each bar is proportional to the number of Lives_In relations in Omnicrobe.
Fig 5.
Distribution of microbe taxa in Lives_In relations extracted from PubMed in Omnicrobe.
The taxa represented in this chart are taxon roots selected as microorganisms in Omnicrobe (see section Ontologies and taxonomies). Each arc is proportional to the number of Lives_In relations that involve the taxon or any of its descendants. “Others” includes taxa that each account for less than 1% of relations: Archaea, Chlamydomonadales, Chlorella, Choanoflagellida, Cryptophyta, Desmidiales, Diplomonadida, Glaucocystophyceae, Haptophyta, Ichthyosporea, Oxymonadida, Parabasalia, Prototheca, Retortamonadidae, Rhizaria.
Fig 6.
Distribution of habitats in Lives_In relations extracted from PubMed.
This chart represents the habitats at the four highest levels in the OntoBiotope ontology. Each arc is proportional to the number of Lives_In relations extracted from PubMed that involve the habitat or any of its descendants in OntoBiotope. Only habitats with more than 20,000 occurrences are shown for readability.
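As an illustration of how a count "for the habitat or any descendant" can be obtained, the following minimal sketch rolls direct relation counts up a hierarchy. It assumes, for simplicity, that each class has a single parent; the function name, the habitat labels, and the counts are hypothetical and do not come from Omnicrobe or OntoBiotope.

```python
from collections import defaultdict


def rollup_counts(parent, direct_counts):
    """Add each class's direct count to itself and to all of its ancestors.

    parent: dict mapping a class to its parent (None or missing for a root).
    direct_counts: dict mapping a class to the number of relations that
        mention that class directly.
    Assumes a tree (one parent per class); a class with several parents
    would need de-duplication of relations instead of simple addition.
    """
    totals = defaultdict(int)
    for node, count in direct_counts.items():
        current = node
        while current is not None:
            totals[current] += count
            current = parent.get(current)
    return dict(totals)


# Hypothetical three-level hierarchy and counts, for illustration only.
example_parent = {"forest soil": "soil", "soil": "environment", "environment": None}
example_direct = {"forest soil": 3, "soil": 2, "environment": 1}
example_totals = rollup_counts(example_parent, example_direct)
```

The arc for a class would then be drawn in proportion to its rolled-up total rather than its direct count.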
Fig 7.
Proportion of taxon-habitat relations in each source that were also extracted from PubMed.
The height of the bar represents the proportion of relations per source that were also extracted from PubMed. For instance, only 10% of relations in GenBank were also extracted from PubMed (the same taxon-habitat pair), leaving 90% of relations exclusive to GenBank.
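The proportion described above can be sketched as a set intersection over taxon-habitat pairs. This is a minimal illustration, assuming relations are compared as exact (taxon, habitat) pairs; the function name and the identifiers in the example are hypothetical.

```python
def shared_proportion(source_pairs, pubmed_pairs):
    """Return the fraction of distinct pairs in a source also found in PubMed.

    Both arguments are iterables of (taxon, habitat) tuples.
    Returns 0.0 for an empty source to avoid division by zero.
    """
    source = set(source_pairs)
    if not source:
        return 0.0
    return len(source & set(pubmed_pairs)) / len(source)


# Hypothetical data: 10 pairs in a source, 1 of which also occurs in PubMed.
example_source = [(f"taxon{i}", "habitatA") for i in range(10)]
example_pubmed = [("taxon0", "habitatA"), ("taxon99", "habitatB")]
example_ratio = shared_proportion(example_source, example_pubmed)
```

With these example values, one pair in ten is shared, which mirrors the 10% versus 90% split given for GenBank.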
Fig 8.
Frequency of habitats and number of different taxa to which they are linked.
The green line (left scale) represents the number of Lives_In relations extracted from PubMed that involve each of the 100 most frequent habitats. The brown line (right scale) represents the number of distinct taxa to which each habitat is linked by Lives_In relations extracted from PubMed.
Fig 9.
Correlation between temperature tropism phenotypes in Omnicrobe.
Each box represents the intersection between the sets of taxa to which the two phenotypes are linked with Exhibits relations in Omnicrobe. The color intensity indicates the Jaccard index between the sets of taxa.
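For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below computes it for every pair of phenotypes; the function names, phenotype labels, and taxon identifiers are hypothetical and only serve to illustrate the calculation.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(taxa_by_phenotype):
    """Return {(phenotype1, phenotype2): Jaccard index} for each unordered pair."""
    return {
        (p, q): jaccard(taxa_by_phenotype[p], taxa_by_phenotype[q])
        for p, q in combinations(sorted(taxa_by_phenotype), 2)
    }


# Hypothetical taxon sets linked to each phenotype by Exhibits relations.
example_sets = {
    "mesophilic": {"t1", "t2", "t3"},
    "thermophilic": {"t2", "t3", "t4"},
    "psychrophilic": {"t5"},
}
example_matrix = pairwise_jaccard(example_sets)
```

A value of 0 means the two phenotypes share no taxa, and a value of 1 means they are linked to exactly the same taxa.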
Fig 10.
An example of complex embedded queries.
These queries retrieve mesophilic or thermophilic bacteria that are present in soy milk, are capable of acidification, and have a qualified presumption of safety.