Hands-on training about overfitting
Fig. 14
Exploring class-randomized data.
Scatterplot in Orange can search for the feature combinations that best separate the classes. For the yeast gene expression data, diauxic shift (diau f) and sporulation at a five-hour timepoint (spo-mid) give the best combination. When the data are class-randomized, the class labels are shuffled among the data instances. Each data point keeps its position, but the class-separation pattern disappears. The effect of randomization is also visible when comparing the two Data Tables.
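To illustrate why the points stay put while the class pattern vanishes, here is a minimal Python sketch that uses only the standard library, not Orange's widgets. The toy two-cluster data, the `class_randomize` helper, and the use of nearest-centroid training accuracy as a stand-in for "class separation" are all hypothetical illustrations. They are not the yeast data and not Orange's implementation.

```python
import random

def make_toy_data():
    """Hypothetical stand-in for two features: two well-separated grids of 25 points each."""
    points, labels = [], []
    for i in range(25):
        points.append((i % 5, i // 5))            # class 0: 5x5 grid near the origin
        labels.append(0)
        points.append((10 + i % 5, 10 + i // 5))  # class 1: same grid shifted by (10, 10)
        labels.append(1)
    return points, labels

def class_randomize(labels, seed=42):
    """Hypothetical helper: return a shuffled copy of the class labels.
    The feature values (point coordinates) are never touched."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return shuffled

def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def nearest_centroid_accuracy(points, labels):
    """Fraction of points lying closer to their own class centroid than to any other
    (used here as a simple, assumed measure of how well the classes separate)."""
    centroids = {c: centroid([p for p, l in zip(points, labels) if l == c])
                 for c in set(labels)}
    correct = 0
    for p, l in zip(points, labels):
        predicted = min(centroids, key=lambda c: (p[0] - centroids[c][0]) ** 2
                                                + (p[1] - centroids[c][1]) ** 2)
        correct += predicted == l
    return correct / len(points)

points, labels = make_toy_data()
random_labels = class_randomize(labels)

print("original accuracy:  ", nearest_centroid_accuracy(points, labels))  # 1.0
print("randomized accuracy:", nearest_centroid_accuracy(points, random_labels))
print("same label counts:  ", sorted(labels) == sorted(random_labels))    # True
```

Only the label list is permuted. The class counts and every point's coordinates stay the same. What is destroyed is the association between feature values and classes, which is why the separating pattern in the scatterplot disappears after randomization.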