Figure 1.
The blue highlight underneath marks the region that satisfies the criteria set by the parameters; this region is considered a primer candidate.
Figure 2.
Classification results with varied parameters.
A) The KNN classifiers were tested with the number of neighbors, k, varied from 1 to 7. The plot shows the average accuracy for each k; k = 1 and k = 2 gave the best performance. B) PCA-LDA classification results with a varied number of eigenvectors. The PCA-LDA classifiers were tested with dimensionality reduction to between one and seven eigenvectors. The plot shows that accuracy was highest with six eigenvectors.
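The sweep over k in panel A can be illustrated with a minimal sketch. This is not the authors' code: the leave-one-out evaluation, the Euclidean distance, the helper names `knn_predict` and `loo_accuracy`, and the toy 2-D points are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy for a given k (an assumed evaluation scheme)."""
    hits = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out sample i
        hits += knn_predict(train, features, k) == label
    return hits / len(data)


# Made-up 2-D points in two well-separated classes; real inputs would be melt curves.
toy = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"), ((0.3, 0.3), "a"),
    ((1.0, 1.0), "b"), ((1.1, 0.9), "b"), ((0.9, 1.2), "b"), ((1.2, 1.1), "b"),
]

# Sweep k from 1 to 7, as in the caption.
accuracy_by_k = {k: loo_accuracy(toy, k) for k in range(1, 8)}
```

With so few samples per class, large k pulls in neighbors from the other class, which is one reason small k can perform best on small training sets.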
Figure 3.
Illustration of the ensemble binary classifiers.
Each binary classifier differentiates two classes, and a score is tallied for each serotype. In an SVM classifier, each class consists of 9 melt curves from 9 different conditions. The result is the serotype with the highest score.
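The scoring scheme described in this caption resembles one-vs-one voting, which can be sketched as follows. The function `one_vs_one_predict`, the callable `pairwise_decide` (a stand-in for a trained binary classifier), the tie-breaking rule, and the toy melting temperatures are hypothetical and not taken from the source.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(sample, classes, pairwise_decide):
    """Tally one vote per class pair and return (winner, scores).

    pairwise_decide(sample, a, b) is a hypothetical stand-in for a trained
    binary classifier; it must return either a or b.
    """
    scores = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        scores[pairwise_decide(sample, a, b)] += 1
    # Assumption: ties go to the class listed first in `classes`.
    best = max(classes, key=lambda c: scores[c])
    return best, dict(scores)


# Toy stand-in: each class has a made-up reference melting temperature,
# and the binary decision picks whichever reference is closer.
TOY_TM = {"A": 80.0, "B": 82.0, "C": 85.0}


def toy_decide(sample_tm, a, b):
    return a if abs(sample_tm - TOY_TM[a]) <= abs(sample_tm - TOY_TM[b]) else b


label, scores = one_vs_one_predict(82.4, list(TOY_TM), toy_decide)
```

With n classes this scheme evaluates n(n-1)/2 binary classifiers per sample, so the number of pairwise decisions grows quadratically with the number of serotypes.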
Table 1.
List of target DNA sequences.
Table 2.
List of 7 primer pairs used to differentiate 92 serotypes of S. pneumoniae.
Figure 4.
Predicted melt curves of serotype 1 with the first primer set across 9 different conditions.
The predicted melt curves were generated using uMelt under 9 different conditions, which are all combinations of [Na+ K+]: 47 mM, 50 mM, and 53 mM and [Mg2+]: 1.4 mM, 1.5 mM, and 1.6 mM.
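The 9 conditions are the Cartesian product of the two concentration lists, which can be enumerated as in the sketch below. The concentrations come from the caption; the function name `condition_grid` and the dictionary keys are illustrative choices, and the sketch does not call uMelt.

```python
from itertools import product

# Concentrations from the caption, in mM.
MONOVALENT_MM = [47, 50, 53]   # [Na+ K+]
MG_MM = [1.4, 1.5, 1.6]        # [Mg2+]


def condition_grid(monovalent=MONOVALENT_MM, mg=MG_MM):
    """Return every (monovalent, Mg2+) combination as a list of dicts."""
    return [
        {"monovalent_mM": m, "mg_mM": g}
        for m, g in product(monovalent, mg)
    ]


conditions = condition_grid()
for c in conditions:
    print(c)
```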
Figure 5.
Accuracy of different classifiers under different conditions.
The horizontal axis shows the Na+, K+ and Mg2+ concentrations, respectively, that were used to generate the predicted curves. The vertical axis shows accuracy in percent. Each curve, labeled in the legend, represents the performance of a different classifier.
Table 3.
Average accuracy of the classifier under different Na+, K+ and Mg2+ concentrations.
Figure 6.
Experimental melt curves of six DNA sequences with different numbers of ‘CG’ sites.
Melt curves of six synthetic DNA sequences from two duplicate experiments performed on different days. Different colors represent different sequences, as indicated in the legend. The fully methylated sequence, with 10 ‘CG’ sites, is shown in dark blue. Two ‘CG’ sites were then changed to ‘TG’ to give the next target with 8 ‘CG’ sites, and so on until all ‘CG’ sites were changed to ‘TG’, giving the sequence with 0 ‘CG’ sites (non-methylated), shown in light blue.
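The stepwise construction of the six sequences can be sketched as below. The caption does not say which ‘CG’ sites were changed at each step, so converting them left to right is an assumption; the function name `cg_to_tg_series` and the toy sequence are made up for illustration and are not the actual target DNA.

```python
def cg_to_tg_series(seq, step=2):
    """Return sequences with progressively fewer 'CG' sites.

    Assumption: sites are converted left to right, `step` at a time.
    """
    series = [seq]
    current = seq
    while "CG" in current:
        # str.replace with a count converts only the first `step` occurrences.
        current = current.replace("CG", "TG", step)
        series.append(current)
    return series


toy = "A" + "CGA" * 10          # made-up sequence with 10 'CG' sites
variants = cg_to_tg_series(toy)
counts = [v.count("CG") for v in variants]
```

Starting from 10 sites and removing two per step yields six sequences, matching the six curves described in the caption.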