A k-mer-based method for the identification of phenotype-associated genomic biomarkers and predicting phenotypes of sequenced bacteria

doi:10.1371/journal.pcbi.1006434

Fig 1.

Schematic representation of the PhenotypeSeeker workflow.

Panel A shows the 'PhenotypeSeeker modeling' steps, which generate the phenotype prediction model based on the input genomes and their phenotype values. Panel B shows the 'PhenotypeSeeker prediction' steps, which use the previously generated model to predict the phenotypes for input genomes.

Fig 2.

The influence of k-mer length on the CPU time and total RAM usage of PhenotypeSeeker (bars, left axis) and on the number of different k-mers present in the genomes (line, right axis).
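
As a rough illustration of the quantity plotted on the right axis, the following minimal sketch counts the distinct k-mers in a set of sequences for several values of k. The function name `distinct_kmers` and the toy sequences are hypothetical; this is not PhenotypeSeeker's own k-mer counting code, and it ignores details such as reverse complements.

```python
def distinct_kmers(sequences, k):
    """Return the set of distinct k-mers found across all sequences."""
    kmers = set()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k over the sequence.
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    return kmers


# Toy "genomes" (hypothetical, for illustration only).
genomes = ["ACGTACGTAC", "ACGTTCGTAC"]

for k in (2, 3, 4, 5):
    print(k, len(distinct_kmers(genomes, k)))
```

On real genomes the set of distinct k-mers is what drives memory use, which is why the figure reports it alongside CPU time and RAM.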

Fig 3.

The positions of ciprofloxacin-resistant P. aeruginosa strains on the cladogram.

The MIC values (mg/l) are shown at the external nodes alongside the corresponding strain names. Strains with MIC > 0.5 mg/l are highlighted in yellow to denote ciprofloxacin resistance according to EUCAST breakpoints [16]. Strains with detected mutations in the QRDR of gyrA and parC are marked with a color code on the perimeter of the cladogram.
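
The resistance call described in this caption is a simple threshold on the MIC value. A minimal sketch follows, assuming the 0.5 mg/l breakpoint quoted above; the strain names and MIC values are made up for illustration and are not taken from the study.

```python
# Breakpoint quoted in the caption (mg/l); strains strictly above it are resistant.
CIPROFLOXACIN_BREAKPOINT_MG_L = 0.5


def is_resistant(mic_mg_l, threshold=CIPROFLOXACIN_BREAKPOINT_MG_L):
    """Return True if the MIC exceeds the breakpoint (MIC > threshold)."""
    return mic_mg_l > threshold


# Hypothetical strains and MIC values, for illustration only.
mic_by_strain = {"strain_A": 0.125, "strain_B": 0.5, "strain_C": 4.0}

labels = {strain: is_resistant(mic) for strain, mic in mic_by_strain.items()}
print(labels)
```

Note that a strain with MIC exactly equal to 0.5 mg/l is not flagged, because the caption specifies a strict inequality.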

Fig 4.

Virulence genes in corresponding clusters and wzi included in the PhenotypeSeeker prediction model in K. pneumoniae strains (13-mers, weighted, max. 10 000 k-mers for the regression model).

Each row is one strain, and each column represents one protein-coding gene. Blue cells mark strains in which 13-mers from the corresponding gene are included in the model. Genes in the colibactin, aerobactin and yersiniabactin clusters show the most differentiating pattern between carrier and invasive/infectious strains.

Table 1.

Model’s F1-measure and running time.

The results with 13-mers and weighting are shown. The maximum number of 13-mers selected for the regression model was 1000. In cases where sequencing reads were used as the input, a minimum frequency of 5 for a 13-mer was required to reduce the influence of sequencing errors.
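
The minimum-frequency requirement for read input can be pictured as a filter on k-mer counts: a 13-mer seen fewer than 5 times is treated as a likely sequencing error and dropped. The sketch below assumes that reading of the caption; `count_kmers`, `filter_by_min_frequency`, and the toy reads are hypothetical and do not reproduce PhenotypeSeeker's implementation.

```python
from collections import Counter


def count_kmers(reads, k=13):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def filter_by_min_frequency(counts, min_freq=5):
    """Keep only k-mers observed at least min_freq times."""
    return {kmer: n for kmer, n in counts.items() if n >= min_freq}


# Toy reads (hypothetical): one 13-mer seen five times, plus a single
# read whose last base differs, mimicking a sequencing error.
reads = ["ACGTACGTACGTA"] * 5 + ["ACGTACGTACGTT"]

kept = filter_by_min_frequency(count_kmers(reads, k=13), min_freq=5)
print(kept)
```

The singleton 13-mer from the erroneous read is discarded, while the 13-mer supported by five reads is retained.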

Table 2.

Comparison of PhenotypeSeeker with Kover and SEER using P. aeruginosa and C. difficile data.

PhenotypeSeeker was run with the weighting option and a maximum of 1000 k-mers for the regression model.
