Machine learning algorithm validation with a limited sample size

Fig 8

Illustrative examples of why models overfit.

A: SVM-RBF decision boundary. Red and blue circles and crosses show data points from the two classes; red and blue areas show the decision regions learned by the SVM-RBF classifier. Left: classifier trained on both the train data points (circles) and the validation data points (crosses). Right: classifier trained only on the train data points (circles). B: Two-sample t-test feature selection performed on pooled train and validation data and on independent train and validation data. The y-axis shows the mean t-statistic of the 10 selected features; the size of the feature pool they are selected from ranges from 20 to 100.
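The effect in panel B can be reproduced in spirit with a small simulation. The sketch below is not the authors' code. It assumes pure-noise Gaussian features, 20 samples per split (10 per class), 50 repetitions, and a pooled-variance t-statistic evaluated on the validation data; the function names and all of these settings are hypothetical choices made for illustration. It compares selecting the top 10 features on pooled train and validation data with selecting them on train data alone.

```python
import math
import random


def t_stat(a, b):
    """Two-sample t-statistic with pooled variance for two lists of floats."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def feature_abs_t(X, y, j):
    """Absolute t-statistic of feature j between class 0 and class 1."""
    a = [row[j] for row, label in zip(X, y) if label == 0]
    b = [row[j] for row, label in zip(X, y) if label == 1]
    return abs(t_stat(a, b))


def top_k(X, y, k):
    """Indices of the k features with the largest absolute t-statistic."""
    scores = [feature_abs_t(X, y, j) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def simulate(n_features, n_per_split=20, k=10, n_repeats=50, seed=0):
    """Mean validation |t| of the selected features under both schemes.

    Returns (pooled, independent). The data are pure noise, so any
    apparent class separation is an artifact of the selection step.
    """
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_per_split)]  # balanced classes
    pooled_total = indep_total = 0.0
    for _ in range(n_repeats):
        X_train = [[rng.gauss(0, 1) for _ in range(n_features)]
                   for _ in range(n_per_split)]
        X_val = [[rng.gauss(0, 1) for _ in range(n_features)]
                 for _ in range(n_per_split)]
        # Selection that has seen the validation data (information leak).
        sel_pooled = top_k(X_train + X_val, labels + labels, k)
        # Selection restricted to the train data.
        sel_indep = top_k(X_train, labels, k)
        pooled_total += sum(feature_abs_t(X_val, labels, j) for j in sel_pooled) / k
        indep_total += sum(feature_abs_t(X_val, labels, j) for j in sel_indep) / k
    return pooled_total / n_repeats, indep_total / n_repeats


if __name__ == "__main__":
    for n_features in (20, 40, 60, 80, 100):
        pooled, indep = simulate(n_features)
        print(f"{n_features:3d} features: pooled={pooled:.2f}, independent={indep:.2f}")
```

Under these assumptions, features selected on pooled data tend to show a larger mean t-statistic on the validation data than features selected on train data alone, even though no feature carries any real class information.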

