Fig 1.
Network architecture for the two types of AE used in the analysis.
(a) Shallow autoencoder architecture with a single hidden layer and (b) deeper autoencoder architecture with two hidden layers. In both cases, x_i represents elements of the input as well as the output vector, and z_i represents elements of the learned lower-dimensional representation.
Fig 2.
Flow diagram of the MCA data analysis pipeline.
MCA on all features after numerical feature categorisation.
Fig 3.
Flow diagram of the MCA/PCA data analysis pipeline.
MCA on categorical and PCA on numerical features.
Fig 4.
Flow diagram of the MCA/PCA/PCA data analysis pipeline.
MCA on categorical and PCA on numerical features, followed by PCA on the resulting components of both methods.
Fig 5.
Flow diagram of the AE data analysis pipeline.
Feature scaling followed by application of a multiple-layer autoencoder neural network.
Table 1.
Characteristics of the COPD cohort (overall, training and test sets) used for measuring patient similarity.
Mean value and standard deviation are presented for continuous features.
Table 2.
Features ranked by degree of influence on the resulting similarity according to each pipeline.
Bolded entries indicate numerical features.
Table 3.
Comparison of clustering results for the COPD cohort. Different k values are presented above and below the diagonal, as indicated by the backslash “\”.
The values on the diagonal represent averaged results obtained from 10% bootstrapped sampling and re-clustering.
Fig 6.
Patient similarity rankings as assigned by two clinician raters for each pipeline.
Table 4.
Summary of evaluation results, including importance of features, cluster tendency and clinical expert evaluation for all four data processing pipelines.
Table 5.
Raw (%) agreement and kappa coefficient calculated on the basis of 1–2 as well as binary (best/worst) rankings.
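
As an illustration of the two statistics named in the Table 5 caption, the sketch below computes raw agreement and a kappa coefficient for two raters. It assumes Cohen's kappa for two raters, which the caption does not state explicitly, and the function names and the binary (best/worst) rankings in it are hypothetical, not data from the study.

```python
from collections import Counter


def raw_agreement(r1, r2):
    """Fraction of items on which the two raters gave the same label."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1, r2):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement; p_e is the agreement expected by
    chance, computed from each rater's marginal label frequencies.
    """
    n = len(r1)
    p_o = raw_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[label] * c2[label] for label in set(c1) | set(c2)) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical binary (best/worst) rankings from two raters.
rater_a = ["best", "worst", "best", "best", "worst", "worst", "best", "worst"]
rater_b = ["best", "worst", "worst", "best", "worst", "best", "best", "worst"]

print(f"raw agreement: {100 * raw_agreement(rater_a, rater_b):.1f}%")  # 75.0%
print(f"kappa: {cohen_kappa(rater_a, rater_b):.2f}")  # 0.50
```

In this made-up example the raters agree on 6 of 8 items (75%), while chance agreement is 50%, giving a kappa of 0.5. The gap between the two numbers shows why raw agreement alone can overstate concordance.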