Influence of Feature Encoding and Choice of Classifier on Disease Risk Prediction in Genome-Wide Association Studies

doi:10.1371/journal.pone.0135832

Table 1.

Illustration of the three different encoding schemes for SNP data.
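
The additive scheme, which is the encoding used for Fig 3, represents each SNP by how many copies of one allele an individual carries. The sketch below assumes that genotypes arrive as two-letter strings (for example "AG") and that the counted allele is the minor allele of the sample. The helper names `minor_allele` and `additive_encode` are hypothetical. This is one common way to build the 0/1/2 coding and is not necessarily the authors' preprocessing.

```python
from collections import Counter

def minor_allele(genotypes):
    """Return the less frequent allele across a list of two-letter genotypes."""
    counts = Counter(allele for g in genotypes for allele in g)
    # Ties are broken alphabetically, an arbitrary choice made for this sketch.
    return min(counts, key=lambda a: (counts[a], a))

def additive_encode(genotypes):
    """Map each genotype to its number of minor-allele copies (0, 1 or 2)."""
    m = minor_allele(genotypes)
    return [g.count(m) for g in genotypes]

# Usage: G is the minor allele here (3 copies versus 5 for A).
print(additive_encode(["AA", "AG", "GG", "AA"]))
```

The two other schemes in Table 1 would replace this single integer column with a different representation of the same three genotypes.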

Table 2.

Algorithm implementations.

Table 3.

Parameter optimization values.

Table 4.

Average number of SNPs reaching the specified p-value threshold per data set.

Table 5.

Average ranks and p-values of the Friedman test for the three encoding schemes.
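
The Friedman test compares the methods by their ranks within each data set and not by their raw scores. The sketch below assumes a hypothetical score table `table[d][j]` that holds the AUC of method `j` on data set `d`. It ranks the methods per data set (rank 1 for the best AUC, tied methods share the mean rank), averages these ranks, and computes the standard Friedman statistic. The function names are illustrative only.

```python
def rank_row(scores):
    """Rank scores within one data set: highest score gets rank 1, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of tied scores that starts at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks

def friedman(table):
    """table[d][j] = score of method j on data set d.
    Returns (average rank per method, Friedman chi-square statistic)."""
    n, k = len(table), len(table[0])
    rows = [rank_row(r) for r in table]
    avg = [sum(r[j] for r in rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(a * a for a in avg) - k * (k + 1) ** 2 / 4)
    return avg, chi2

# Usage with three methods on three data sets (made-up AUCs).
avg, chi2 = friedman([[0.8, 0.7, 0.6], [0.9, 0.6, 0.7], [0.75, 0.7, 0.65]])
print(avg, chi2)
```

Under the null hypothesis that all methods perform equally, the statistic is approximately chi-square distributed with k − 1 degrees of freedom, so a p-value can be read from that distribution (for instance with `scipy.stats.chi2.sf`). Whether the authors used this form or a corrected variant is not stated in these captions.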

Table 6.

Rank differences and p-values for pair-wise comparison of encodings.

Fig 1.

Comparison of encodings per classifier.

The three encodings compared by their rank distance over all data sets and classifiers (a) and grouped by classifier. A line connecting two encodings indicates that the null hypothesis of equal performance could not be rejected. Only algorithms for which the Friedman test rejected the null hypothesis are shown (α = 0.001).
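
Whether two encodings are joined by a line depends on a post-hoc comparison of their average ranks. The sketch below shows one standard way to do this and assumes it applies here: the pairwise z-test on average ranks described by Demšar (2006), with standard error sqrt(k(k + 1)/(6N)) for k methods on N data sets. The function name is hypothetical. The published figures may instead use a Nemenyi critical difference or a multiplicity-adjusted p-value.

```python
import math

def pairwise_rank_test(avg_rank_a, avg_rank_b, k, n):
    """Unadjusted two-sided p-value for the difference between two average ranks,
    for k methods compared on n data sets (normal approximation)."""
    se = math.sqrt(k * (k + 1) / (6 * n))
    z = abs(avg_rank_a - avg_rank_b) / se
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)), the two-sided normal tail.
    return math.erfc(z / math.sqrt(2))

# Usage: average ranks 1.0 and 8/3 for k = 3 methods on n = 3 data sets.
print(pairwise_rank_test(1.0, 8 / 3, 3, 3))
```

With many pairs (three for the encodings, 21 for seven classifiers), these raw p-values would normally be corrected, for example by Holm's procedure, before being compared against α = 0.001.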

Fig 2.

Comparison of encodings per disease data set.

The three encodings compared by their rank distance over all data sets and classifiers (a) and grouped by disease data set. A line connecting two encodings indicates that the null hypothesis of equal performance could not be rejected. Only data sets for which the Friedman test rejected the null hypothesis are shown (α = 0.001).

Table 7.

Maximum and average AUCs for different encodings grouped by data set.
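
The AUC reported in these tables can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes it directly from that pairwise definition (the Mann-Whitney formulation), counting ties as one half. It assumes labels coded 1 for cases and 0 for controls, and the function name is hypothetical. In practice a library routine such as scikit-learn's `roc_auc_score` gives the same value more efficiently.

```python
def auc(scores, labels):
    """Area under the ROC curve: fraction of (case, control) pairs in which the case
    scores higher than the control, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Usage: one case (0.4) is ranked below one control (0.8), so 3 of 4 pairs are correct.
print(auc([0.9, 0.4, 0.8, 0.3], [1, 1, 0, 0]))
```

The double loop is quadratic in the sample size. That cost is acceptable for illustration, but a rank-based computation is preferable for GWAS cohorts with thousands of individuals.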

Table 8.

Average ranks of the seven classification algorithms.

Table 9.

Rank differences and p-values for pair-wise comparison of classification algorithms.

Fig 3.

Comparison of classification algorithms.

The seven classification algorithms compared by their rank distance over all disease data sets using the additive encoding. A line connecting two algorithms indicates that the null hypothesis of equal performance could not be rejected (α = 0.001).

Table 10.

Average AUC for each data set and algorithm over all p-value thresholds.
