Learning a Weighted Sequence Model of the Nucleosome Core and Linker Yields More Accurate Predictions in Saccharomyces cerevisiae and Homo sapiens

doi:10.1371/journal.pcbi.1000834

Figure 8

Classification performance of individual k-mers and subsets of k-mers.

Area under the ROC curve (AUC) obtained using features associated with individual k-mers and with selected subsets of k-mers. "All" denotes the set of all k-mers of length 1, 2, and 3; "Tri" the set of all trinucleotides; "Di" the set of all dinucleotides; and "Mono" the set of all mononucleotides. Features are ordered in the graph by their average performance on H. sapiens and S. cerevisiae. All subsets perform better than any individual k-mer. The most discriminative individual k-mers are the mononucleotides A/T and G/C, followed by the dinucleotide AA/TT and the trinucleotide AAA/TTT. This analysis is based on the top-scoring 12,698 S. cerevisiae positions and the top-scoring 209,101 H. sapiens positions.
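As a rough illustration of how a single k-mer feature can be scored by area under the ROC curve, the sketch below computes the frequency of a k-mer in a sequence and then a rank-based AUC between two classes. It is not the paper's pipeline. It assumes that the slash notation (A/T, AA/TT, AAA/TTT) pairs each k-mer with its reverse complement, and the function names (`kmer_feature`, `auc`) and the toy sequences are hypothetical, chosen only for the example.

```python
# Minimal sketch: per-k-mer feature and rank-based AUC.
# Assumption: a feature such as "AA/TT" counts a k-mer together with
# its reverse complement. Sequences below are invented toy data.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Represent a k-mer and its reverse complement by one shared key."""
    return min(kmer, reverse_complement(kmer))


def kmer_feature(seq, kmer):
    """Fraction of length-k windows in seq equal to kmer or its reverse complement."""
    k = len(kmer)
    windows = len(seq) - k + 1
    if windows <= 0:
        return 0.0
    target = canonical(kmer)
    hits = sum(1 for i in range(windows) if canonical(seq[i:i + k]) == target)
    return hits / windows


def auc(pos_scores, neg_scores):
    """Probability that a random positive scores above a random negative.

    Ties count as one half. This equals the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy example: AT-rich "linker-like" versus GC-rich "core-like" sequences.
linker_like = ["AAAATTTA", "TTTTAAAA"]
core_like = ["GCGCGGCC", "GGCCGCGA"]

linker_scores = [kmer_feature(s, "A") for s in linker_like]
core_scores = [kmer_feature(s, "A") for s in core_like]
a_t_auc = auc(linker_scores, core_scores)
```

An AUC far from 0.5 in either direction indicates a discriminative feature, so when comparing features one might rank them by `max(a, 1 - a)`. Scoring a subset such as "Di" would require training a classifier on several such features at once, which this sketch does not attempt.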

doi: https://doi.org/10.1371/journal.pcbi.1000834.g008