Table 1.
NCBI GEO series selected for this study.
Series were selected to achieve a relative balance among the different categories, including all available samples from the least frequent diseases. The technology and the total number of samples and outliers are included for each series.
Table 2.
Taxonomic classifications for the three skin cancer scenarios: 2, 3 and 7 classes.
Fig 1.
Microarray gene expression analysis pipeline.
The process has been developed sequentially in different phases. This pipeline summarizes the decisions made throughout the study.
Table 3.
Bioconductor R AnnotationData packages and available symbols for the selected series integration.
Fig 2.
Expression values of each series after independent normalization.
The aggregated high-quality samples show that expression ranges vary among the different datasets.
Fig 3.
Expression values of each series after joint cross-platform normalization.
After the integration tool is applied to the high-quality samples, the series show a homogeneous expression range.
Table 4.
Total number of DEGs obtained under the restrictions imposed by each evaluated configuration of the virtualArray tool.
The batch effect removal method and the union method were the factors considered. The thresholds |LFC| ≥ 4 and PV ≤ 0.001 were applied.
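The thresholding step named in this caption can be sketched as follows. This is a minimal illustration, not the study's code: the gene names and values are hypothetical, and the way LFC is aggregated across classes in the multiclass setting is not shown here.

```python
# Hypothetical per-gene statistics: (gene, LFC, PV). Values are illustrative only.
stats = [
    ("GENE_A", 4.6, 0.0002),
    ("GENE_B", -5.1, 0.0009),
    ("GENE_C", 3.2, 0.0001),   # fails the |LFC| threshold
    ("GENE_D", -4.3, 0.02),    # fails the PV threshold
]

LFC_MIN = 4.0    # |LFC| >= 4, as stated in the caption
PV_MAX = 0.001   # PV <= 0.001, as stated in the caption


def select_degs(rows, lfc_min=LFC_MIN, pv_max=PV_MAX):
    """Return the names of genes that satisfy both thresholds."""
    return [gene for gene, lfc, pv in rows
            if abs(lfc) >= lfc_min and pv <= pv_max]


degs = select_degs(stats)
```

Both conditions must hold at once, so a gene with a very small PV but a modest fold change is still excluded.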
Fig 4.
Final common DEGs obtained by intersecting the QD and MRS results.
Seventeen DEGs were common to the QD and MRS batch effect removal methods after the results of the union methods were also intersected.
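The intersection across configurations can be sketched as a plain set operation. The configuration keys and gene identifiers below are hypothetical placeholders (in particular, "alt" stands in for a second union method that is not named here), so the result is illustrative and unrelated to the 17 DEGs reported.

```python
from functools import reduce

# Hypothetical DEG sets, keyed by (batch effect removal method, union method).
deg_sets = {
    ("QD", "mean"): {"G1", "G2", "G3", "G5"},
    ("QD", "alt"): {"G1", "G2", "G3", "G4"},
    ("MRS", "mean"): {"G1", "G2", "G6"},
    ("MRS", "alt"): {"G1", "G2", "G3"},
}


def common_degs(sets_by_config):
    """Genes that appear in the DEG set of every configuration."""
    return reduce(set.intersection, sets_by_config.values())


common = sorted(common_degs(deg_sets))
```

A gene survives only if every configuration reports it, which is what makes the final list independent of the batch effect removal and union choices.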
Table 5.
List of the 17 DEGs that are independent of the union method, the batch effect removal method and the multiclass problem.
One of the virtualArray configurations (mean union method, MRS batch effect removal and 7-class taxonomy) was selected to show these DEGs, which are listed in order of |μLFC|.
Table 6.
The statistical analysis includes the main factors assessed together with the relevant statistical parameters, most notably the associated PV.
Fig 5.
Hierarchical clustering of healthy and skin cancer samples by using the 17 DEGs.
A perfect differentiation among the 7 cancer-related skin states was obtained after applying clustering and dendrogram reordering. Five samples from each skin state were used. Different colors are used for each skin sample type: NSK (light green), NEV (dark green), PRIMEL (dark purple), METMEL (light purple), BCC (chocolate), SCC (orange) and MCC (salmon).
Fig 6.
Classification accuracy achieved for each of the considered taxonomies: (A) 7 classes, (B) 3 classes and (C) 2 classes.
The confusion matrix for taxonomy A was constructed with 10-fold cross-validation (10-CV) and the 17 DEGs. The other confusion matrices were derived from it by summing the sub-matrices associated with each skin super-state.
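Collapsing a fine-grained confusion matrix into a coarser one by summing sub-matrices can be sketched as below. The grouping into three super-states and the matrix counts are assumptions made for illustration only; the actual taxonomies are those defined in Table 2.

```python
CLASSES = ["NSK", "NEV", "PRIMEL", "METMEL", "BCC", "SCC", "MCC"]

# Assumed grouping for illustration only; see Table 2 for the real taxonomies.
GROUP_OF = {
    "NSK": "benign", "NEV": "benign",
    "PRIMEL": "melanoma", "METMEL": "melanoma",
    "BCC": "non_melanoma", "SCC": "non_melanoma", "MCC": "non_melanoma",
}


def collapse_confusion(cm, labels, group_of):
    """Sum the cells of cm (rows = true, columns = predicted) by super-state."""
    groups = []
    for lab in labels:  # keep first-seen order of the super-states
        if group_of[lab] not in groups:
            groups.append(group_of[lab])
    idx = {g: i for i, g in enumerate(groups)}
    out = [[0] * len(groups) for _ in groups]
    for i, true_lab in enumerate(labels):
        for j, pred_lab in enumerate(labels):
            out[idx[group_of[true_lab]]][idx[group_of[pred_lab]]] += cm[i][j]
    return groups, out


# Toy 7x7 matrix: 5 correct samples per class plus two made-up errors.
cm = [[5 if i == j else 0 for j in range(7)] for i in range(7)]
cm[1][2] = 1  # one NEV sample predicted as PRIMEL
cm[2][3] = 1  # one PRIMEL sample predicted as METMEL

groups, collapsed = collapse_confusion(cm, CLASSES, GROUP_OF)
```

Errors between classes in the same super-state (here PRIMEL predicted as METMEL) move onto the diagonal of the collapsed matrix, so accuracy cannot decrease when classes are merged.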
Fig 7.
Expression levels of the selected genes, ordered by the ranking returned by the mRMR algorithm.
Different colors are used for each cancer-related skin state: NSK (Normal Skin), NEV (Nevus), PRIMEL (Primary Melanoma), METMEL (Metastatic Melanoma), BCC (Basal Cell Carcinoma), SCC (Squamous Cell Carcinoma) and MCC (Merkel Cell Carcinoma).
Fig 8.
Evolution of the classification accuracy for each subset of genes considered, and for each taxonomy.
Similar trends can be observed for both LOO-CV and 10-CV.
Fig 9.
Evolution of the classification accuracy for each cancer-related skin state according to the number of genes from the mRMR ranking considered in the classifier.
Different colors are used for each skin sample type: NSK (Normal Skin), NEV (Nevus), PRIMEL (Primary Melanoma), METMEL (Metastatic Melanoma), BCC (Basal Cell Carcinoma), SCC (Squamous Cell Carcinoma) and MCC (Merkel Cell Carcinoma). SVM with 10-CV was used.
Table 7.
Relation between DEGs in this study and different skin diseases or disorders.
A minimum of two disease-related citations per gene was required, as well as at least one gene significantly associated with a disease. A maximum False Discovery Rate (FDR) of 0.05 was imposed.