Table 1.
Patient demographic information.
Figure 1.
Classification modeling process.
A three-layer nested cross-validation approach was applied, using both PLS-DA and SVM modeling methods, to identify significant features capable of distinguishing children with ASD from TD children. The 179 features of the training set were analyzed using a leave-one-group-out cross-validation loop as described. The results of this cross-validation process were used to estimate model performance and to build a robust feature VIP score index that ranks each of the 179 features by its importance for ASD vs. TD classification. These feature ranks were then used to evaluate the performance of the molecular signature on an independent validation set.
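The leave-one-group-out loop and the rank aggregation described in this caption can be illustrated with a minimal sketch. It is not the study's implementation: the function names (`leave_one_group_out`, `rank_features`, `mean_ranks`), the group labels, and the toy importance scores are all hypothetical, and a real pipeline would obtain the per-fold scores from fitted PLS-DA or SVM models.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices), once per distinct group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def rank_features(scores):
    """Map feature -> rank, where rank 1 is the highest importance score.

    Ties are broken alphabetically by feature name (an assumption of this sketch).
    """
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: rank for rank, feature in enumerate(ordered, start=1)}


def mean_ranks(per_fold_scores):
    """Average each feature's rank over all cross-validation folds."""
    totals = defaultdict(float)
    for scores in per_fold_scores:
        for feature, rank in rank_features(scores).items():
            totals[feature] += rank
    n_folds = len(per_fold_scores)
    return {feature: total / n_folds for feature, total in totals.items()}


# Hypothetical example: five samples in three groups, three features.
groups = ["g1", "g1", "g2", "g3", "g3"]
for held_out, train_idx, test_idx in leave_one_group_out(groups):
    print(held_out, train_idx, test_idx)

# Toy per-fold importance scores (made up; not values from the study).
fold_scores = [
    {"f1": 0.9, "f2": 0.2, "f3": 0.5},
    {"f1": 0.4, "f2": 0.1, "f3": 0.8},
]
print(mean_ranks(fold_scores))
```

Averaging ranks rather than raw scores keeps folds with differently scaled importance values comparable, which is one plausible way to build a stable rank index across resamples.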
Figure 2.
The ranks of the top 179 features were compared between the SVM and PLS modeling methods. The lowest rank scores indicate the most important features.
Table 2.
A breakdown of the number of features remaining after the filtering and annotation processes, based on molecular formula.
Figure 3.
Performance of the SVM and PLS models.
Average AUC and accuracy of the (a) SVM and (b) PLS models containing different numbers of features. The bar graphs show, for each indicated number of features, the number of optimal models derived from the recursive feature elimination process, which was included in the resampling process.
Figure 4.
ROC curve performance of the classification models from the training and validation sets.
The black and grey lines show the average of 100 iterations of the classifier for the best-performing feature sets following recursive feature elimination, comparing ASD vs. TD samples. The blue (PLS) and red (SVM) lines are ROC curves of the best-performing validation feature subsets. Vertical bars represent the standard error of the mean.
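As an illustration of how an averaged ROC curve with standard-error bars can be computed, the sketch below assumes that each iteration's true positive rates have already been evaluated on a shared grid of false positive rates. The function names (`mean_and_sem`, `average_roc`) and the toy values are hypothetical and are not taken from the study.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # SEM = sample standard deviation / sqrt(n); undefined for n < 2, so use 0.0.
    sem = statistics.stdev(values) / math.sqrt(n) if n > 1 else 0.0
    return mean, sem


def average_roc(tpr_runs):
    """Average per-iteration TPR curves point by point.

    tpr_runs: list of TPR lists, one per iteration, all on the same FPR grid.
    Returns one (mean_tpr, sem_tpr) pair per grid point.
    """
    return [mean_and_sem(list(column)) for column in zip(*tpr_runs)]


# Toy example: three iterations evaluated at FPR = 0.0, 0.5, 1.0 (made-up values).
tpr_runs = [
    [0.0, 0.5, 1.0],
    [0.0, 0.7, 1.0],
    [0.0, 0.6, 1.0],
]
for mean_tpr, sem_tpr in average_roc(tpr_runs):
    print(f"{mean_tpr:.3f} +/- {sem_tpr:.3f}")
```

With 100 iterations, as in the figure, the same calculation would yield one mean and one vertical error bar per grid point.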
Table 3.
Classifier performance metrics for predictions on the independent 21-sample validation set, shown for the feature sets with the highest accuracy.
Table 4.
Confirmed metabolites.