Table 1.
Demographics of the patients from this study.
Fig 1.
The general workflow of classifying FAIMS data into diseased or non-diseased classes.
The steps that were explored are shown as dark blue boxes, and variations or specifications of some steps are displayed at the sides. The order in which the steps and approaches were investigated differs from the order shown in the diagram. Briefly, we compared the pipeline using data from different sample “runs”, either individually or in ensembles. We considered different forms of discrete wavelet transform (DWT), as well as a feature exclusion step based on feature variance. Within the cross-validation cycle, we evaluated three different feature selection methods (filter, wrapper and embedded), a principal component analysis (PCA) step after filter selection, and the inclusion of the demographic data as features. Finally, we explored ensembling at the level of the classifier model probabilities. See the main text for details and for the order in which the pipeline was explored.
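A central point of the workflow in Fig 1 is that feature selection happens inside the cross-validation cycle, so features are chosen from the training folds only. The plain-Python sketch below illustrates that idea for the filter method under stated assumptions: the absolute Welch t statistic used as the filter score, the nearest-centroid classifier, the stratified 5-fold split and the accuracy metric are hypothetical stand-ins chosen for brevity, not the study's filter statistic, its five classifiers or its AUC scoring. The parameter `n_keep` plays the role of nKeep.

```python
import math
import random
from statistics import mean, stdev

def filter_scores(X, y):
    """Score each feature by |Welch t| between classes (hypothetical filter statistic)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        diff = abs(mean(a) - mean(b))
        se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
        # Zero spread within both classes: perfect separation if the means differ.
        scores.append(diff / se if se > 0 else (math.inf if diff > 0 else 0.0))
    return scores

def select_top(scores, n_keep):
    """Indices of the n_keep highest-scoring features (the nKeep idea)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:n_keep]

def stratified_folds(y, k, seed):
    """Split sample indices into k folds with a similar class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for lab in sorted(set(y)):
        members = [i for i, l in enumerate(y) if l == lab]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def nearest_centroid(X_train, y_train, keep):
    """Hypothetical stand-in classifier: per-class mean of the kept features."""
    cents = {}
    for lab in sorted(set(y_train)):
        rows = [[row[j] for j in keep] for row, l in zip(X_train, y_train) if l == lab]
        cents[lab] = [mean(col) for col in zip(*rows)]
    # Predict the class whose centroid is closest in the kept-feature space.
    return lambda x: min(cents, key=lambda c: math.dist([x[j] for j in keep], cents[c]))

def cross_validate(X, y, n_keep=2, k=5, seed=0):
    """Accuracy over k folds, with feature selection refitted inside every fold."""
    correct = 0
    for test in stratified_folds(y, k, seed):
        test_set = set(test)
        train = [i for i in range(len(X)) if i not in test_set]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        # Selection sees only the training fold, so the test fold cannot leak into it.
        keep = select_top(filter_scores(X_tr, y_tr), n_keep)
        model = nearest_centroid(X_tr, y_tr, keep)
        correct += sum(model(X[i]) == y[i] for i in test)
    return correct / len(X)
```

Selecting features once on the full data set and only then cross-validating would let the test folds influence which features are kept and inflate the score; refitting the selection per fold avoids this. Sweeping `n_keep` over a range gives the kind of comparison reported in Fig 4.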
Fig 2.
The recommended pipeline for classifying FAIMS data into diseased or non-diseased classes resulting from this study.
We found that “run” 2 data with a 2D wavelet transform performed best among the steps prior to feature selection. The filter method with an nKeep parameter value of 2 performed best, with minimal algorithm run time. Adding the demographic data as features to the wavelet-transformed FAIMS data resulted in a higher AUC score, although this improvement was not statistically significant. However, these data might prove informative in a larger-scale pilot analysis. Overall, no classifier model outperformed the others; we therefore suggest using all five until further research identifies a “clear winner”. See the main text for details and discussion of our findings.
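Fig 2 recommends a 2D wavelet transform of the FAIMS heat map before feature selection. As an illustration only, the sketch below applies one level of a separable 2D Haar transform in plain Python and flattens the four sub-bands into a feature vector. The Haar wavelet, the single decomposition level, the sub-band order and all function names are assumptions: Table 4 compares several 2D wavelet transforms and the caption does not name the one retained. The resulting vector is what a filter selection with nKeep = 2 would then operate on.

```python
import math

S = 1 / math.sqrt(2)  # orthonormal Haar scaling factor

def haar_step(values):
    """One Haar level on an even-length sequence: (approximation, detail)."""
    if len(values) % 2:
        raise ValueError("length must be even")
    pairs = list(zip(values[0::2], values[1::2]))
    return [(a + b) * S for a, b in pairs], [(a - b) * S for a, b in pairs]

def transform_columns(block):
    """Apply haar_step down each column of a row-major block."""
    lo, hi = zip(*(haar_step(list(col)) for col in zip(*block)))
    # lo[j] holds column j; zip(*...) turns the result back into rows.
    return [list(r) for r in zip(*lo)], [list(r) for r in zip(*hi)]

def haar_2d(matrix):
    """One level of a separable 2D Haar DWT: rows first, then columns.

    Returns (ll, lh, hl, hh): first letter = filter along rows, second =
    filter along columns. Naming conventions differ between libraries.
    """
    row_lo, row_hi = zip(*(haar_step(list(r)) for r in matrix))
    ll, lh = transform_columns(row_lo)
    hl, hh = transform_columns(row_hi)
    return ll, lh, hl, hh

def wavelet_features(matrix):
    """Concatenate the four sub-bands, row by row, into one feature vector."""
    return [v for band in haar_2d(matrix) for row in band for v in row]
```

For example, `haar_2d([[1, 2], [3, 4]])` gives sub-bands of approximately 5, -2, -1 and 0. Because the Haar basis is orthonormal, the sum of squares of the features equals that of the original heat map, so no signal energy is lost before selection.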
Fig 3.
(a) Heat map of FAIMS data for a diabetic patient. (b) Linearised data without wavelet transform. (c) Data with one-dimensional (1D) discrete wavelet transform (DWT). (d-f) show the equivalent plots for a member of the control group (volunteer).
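Fig 3 contrasts the linearised heat map with its 1D DWT. A minimal sketch of those two steps follows, assuming row-by-row (row-major) linearisation and a multi-level Haar wavelet; both the ordering and the wavelet family are illustrative assumptions rather than the study's documented settings, and the function names are hypothetical. Coefficients are returned coarsest first, as [cA_n, cD_n, ..., cD_1].

```python
import math

def linearise(heatmap):
    """Flatten a 2D heat map (list of rows) into one vector, row by row (assumed order)."""
    return [v for row in heatmap for v in row]

def haar_dwt_1d(signal, levels):
    """Multi-level orthonormal Haar DWT; returns [cA_n, cD_n, ..., cD_1]."""
    s = 1 / math.sqrt(2)
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) * s for a, b in pairs])  # high-pass: local differences
        approx = [(a + b) * s for a, b in pairs]         # low-pass: local averages
    return [approx] + details[::-1]
```

For instance, `haar_dwt_1d(linearise([[1, 2], [3, 4]]), 2)` returns approximately `[[5], [-2], [-0.707, -0.707]]`: each level halves the approximation and keeps the differences as detail coefficients.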
Table 2.
Model performance comparison using different runs.
Table 3.
Model performance comparison using raw FAIMS data and wavelet-transformed FAIMS data.
Table 4.
Model performance comparison using different 2D wavelet transforms.
Fig 4.
Classification model performance for each model across a range of nKeep values.
Error bars show the 95% confidence intervals. The Neural Network model cannot be used with more than 11 features.
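The caption does not state how the 95% confidence intervals were obtained. One common option, sketched here purely as an assumption, is a percentile bootstrap of the AUC, where the AUC is the probability that a randomly chosen diseased sample receives a higher classifier score than a randomly chosen control (ties count one half). The function names, the 2000 resamples and the fixed seed are hypothetical choices.

```python
import random

def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed CI method)."""
    n = len(scores)
    if not 0 < sum(labels) < n:
        raise ValueError("need both classes (labels 0 and 1)")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # the AUC is undefined unless both classes appear
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]
```

For example, `auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` is 0.75, since three of the four positive/negative pairs are ranked correctly. Resamples lacking one class are redrawn rather than discarded silently, so every interval is based on exactly `n_boot` AUC values.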
Table 5.
Model performance comparison of PCA implementation.
Table 6.
Feature selection method comparisons.
Table 7.
Feature selection method comparison.
Table 8.
Model performance comparison for run subtraction.
Table 9.
Model performance comparison of noise reduction approaches.
Table 10.
Model performance comparison when using the demographic (demo) variables as features or when using these in addition to the two FAIMS features selected by the filter method.