Table 1.
Experimental details for collecting plant electrophysiological responses to three different stimuli.
Fig 1.
Schematic diagram of the plant signal data analysis pipeline with three basic steps.
Table 2.
Data blocks (consisting of 1024 samples) belonging to each chemical stimulus.
Fig 2.
Outlier detection and removal effects on the percentage of the data retained.
An outlier fraction of n = 6 was selected to retain data without significant outliers.
Fig 3.
Heatmap of correlations amongst the 15 features after outlier detection.
The heatmap shows strong positive and negative correlations amongst the statistical features.
Fig 4.
Scree plot showing the cumulative explained variance in PCA to select the number of PCs for the multiclass classification.
Table 3.
Hyperparameters used with the scikit-learn package in Python [56], including both the default and customized values, yielding robust classification in both the 15D and 7D feature spaces.
Fig 5.
Average normalized confusion matrices over 1000 Monte Carlo draws from the majority class for various classifiers in the original 15D feature space.
Mean confusion matrices are calculated from the held-out 50% test data under each of the 1000 Monte Carlo under-sampling draws with different classifiers: (a) decision tree, (b) k-NN, (c) QDA, (d) random forest (entropy), (e) random forest (gini), (f) SVM (linear kernel), (g) SVM (RBF kernel), (h) MLP classifier, (i) Gaussian Naïve Bayes, (j) adaptive boosting algorithm.
Fig 6.
Average normalized confusion matrices over 1000 Monte Carlo draws from the majority class for various classifiers in the 7D feature space reduced using PCA.
Mean confusion matrices are calculated from the held-out 50% test data under each of the 1000 Monte Carlo under-sampling draws with different classifiers: (a) decision tree, (b) k-NN, (c) QDA, (d) random forest (entropy), (e) random forest (gini), (f) SVM (linear kernel), (g) SVM (RBF kernel), (h) MLP classifier, (i) Gaussian Naïve Bayes, (j) adaptive boosting algorithm.
Fig 7.
Comparison of the distributions of the classification accuracies derived from the ensemble of confusion matrices over 1000 Monte Carlo under-sampling runs of the majority class in the original 15D and reduced 7D feature space.
Fig 8.
Normalized histograms of the three different chemical stimuli (sulfuric acid, ozone, and salt) for all 15 statistical features used.
Fig 9.
Normalized histograms of the two different plant species (tomato and cabbage) for all 15 statistical features used.
Fig 10.
Reduced dimensional t-SNE plot of the three different chemical stimuli (sulfuric acid, ozone, and salt).
Fig 11.
Reduced dimensional t-SNE plot of the two plant species tomato and cabbage.
Fig 12.
Comparison of the distributions of the balanced accuracies derived from the ensemble of confusion matrices over 1000 Monte Carlo under-sampling runs of the majority class in the original 15D and reduced 7D feature space with 1D KDE fit and normal fit to compare the mean difference.
Fig 13.
Comparison of the distributions of the F1 score derived from the ensemble of confusion matrices over 1000 Monte Carlo under-sampling runs of the majority class in the original 15D and reduced 7D feature space with 1D KDE fit and normal fit to compare the mean difference.
Fig 14.
Comparison of the distributions of the Matthews correlation coefficient derived from the ensemble of confusion matrices over 1000 Monte Carlo under-sampling runs of the majority class in the original 15D and reduced 7D feature space with 1D KDE fit and normal fit to compare the mean difference.
Fig 15.
Scatterplots of two performance measures, F1 score vs. MCC, for classification in the original 15D and reduced 7D feature spaces with different classifier families.
Fig 16.
Scatterplots of two performance measures, MCC vs. balanced accuracy, for classification in the original 15D and reduced 7D feature spaces with different classifier families.
Fig 17.
Scatterplots of two performance measures, F1 score vs. balanced accuracy, for classification in the original 15D and reduced 7D feature spaces with different classifier families.
Table 4.
Repeated-measures MANOVA table combining all three performance measures of all 10 classifiers.
Fig 18.
Groupwise scatter diagrams (off diagonals) and the 1D marginal distributions of the classification performance measures (along the principal diagonal), combining 1000 Monte Carlo under-sampling runs for all 10 classifiers.
Blue dots represent samples from the original 15D classification, and the red crosses represent the reduced 7D classification results.
Table 5.
Groupwise repeated-measures MANOVA table combining all three performance measures of all 10 classifiers in the original 15D feature space vs. the 7D reduced feature space.
Table 6.
Comparison of contemporary chemical stress classification methods from plant electrophysiological data.