Multiclass classification for skin cancer profiling based on the integration of heterogeneous gene expression series

doi:10.1371/journal.pone.0196836

Table 1.

NCBI GEO series selected for this study.

Series were selected to keep the categories relatively balanced while including every available sample from the least frequent diseases. The technology and the total number of samples and outliers are listed for each series.

Table 2.

Taxonomic classifications for the three skin cancer scenarios: 2, 3 and 7 classes.

Fig 1.

Microarray gene expression analysis pipeline.

The process was carried out sequentially in distinct phases. The pipeline summarizes the decisions made throughout the study.

Table 3.

Bioconductor R AnnotationData packages and available symbols for the selected series integration.

Fig 2.

Expression values of each series after independent normalization.

Aggregating the high-quality samples shows that the expression range varies among the different datasets.

Fig 3.

Expression values of each series after joint cross-platform normalization.

After the integration tool is applied to the high-quality samples, the expression range is homogeneous across series.

Table 4.

Total number of differentially expressed genes (DEGs) obtained under the restrictions imposed by each evaluated configuration of the virtualArray tool.

Two factors were considered: the batch effect removal method and the union method. The thresholds |LFC| ≥ 4 (log fold change) and PV ≤ 0.001 (p-value) were applied.
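
The thresholding described in this caption can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the caption does not say how LFC and PV were computed, so the sketch assumes both are already available per gene. The `GeneStat` type, the `select_degs` function, the gene names and all values are hypothetical.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    symbol: str  # gene symbol (placeholder names are used below)
    lfc: float   # log fold change, assumed precomputed
    pv: float    # p-value, assumed precomputed


def select_degs(stats, min_abs_lfc=4.0, max_pv=0.001):
    """Return symbols of genes with |LFC| >= min_abs_lfc and PV <= max_pv."""
    return [
        g.symbol
        for g in stats
        if abs(g.lfc) >= min_abs_lfc and g.pv <= max_pv
    ]


# Made-up values for illustration only.
stats = [
    GeneStat("GENE_A", 4.6, 0.0002),   # passes both thresholds
    GeneStat("GENE_B", -5.1, 0.0009),  # passes: the sign of LFC is ignored
    GeneStat("GENE_C", 3.9, 0.0001),   # fails the |LFC| threshold
    GeneStat("GENE_D", 4.2, 0.01),     # fails the PV threshold
]
degs = select_degs(stats)
print(degs)
```

Both comparisons are inclusive, matching the "≥" and "≤" in the caption.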

Fig 4.

Final common DEGs obtained by intersecting the QD and MRS results.

Seventeen DEGs were common to the QD and MRS batch effect removal methods after the results were also intersected across union methods.
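
The intersection described here amounts to keeping only the genes that appear in every configuration's DEG list. The sketch below assumes each configuration yields a set of gene symbols. The configuration keys, the `"union_x"` placeholder for a second union method, and the gene names are hypothetical.

```python
def common_degs(deg_sets):
    """Return the genes present in every DEG set (empty set if none given)."""
    sets = [set(s) for s in deg_sets]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical DEG lists keyed by (batch effect removal method, union method).
# "union_x" stands in for a second union method not named in this caption.
degs_by_config = {
    ("QD", "mean"): {"GENE_A", "GENE_B", "GENE_C"},
    ("QD", "union_x"): {"GENE_A", "GENE_C", "GENE_D"},
    ("MRS", "mean"): {"GENE_A", "GENE_C", "GENE_E"},
    ("MRS", "union_x"): {"GENE_A", "GENE_B", "GENE_C"},
}
common = common_degs(degs_by_config.values())
print(sorted(common))
```

A gene survives only if no configuration drops it, which is why the resulting list does not depend on the batch effect removal or union method chosen.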

Table 5.

List of the 17 DEGs that are independent of the union method, the batch effect removal method and the multiclass problem.

One virtualArray configuration (union by mean, MRS batch effect removal and the 7-class taxonomy) was selected to show these DEGs. All of them are listed in order of |μ_LFC|.

Table 6.

Results of the ANOVA test.

The statistical analysis covers the main factors assessed and reports the relevant statistical parameters, most notably the associated PV.

Fig 5.

Hierarchical clustering of healthy and skin cancer samples by using the 17 DEGs.

A perfect differentiation among the 7 cancer-related skin states was obtained after applying clustering and reordering the dendrogram. Five samples from each skin state were used. Each skin sample type has its own color: NSK (light green), NEV (dark green), PRIMEL (dark purple), METMEL (light purple), BCC (chocolate), SCC (orange) and MCC (salmon).

Fig 6.

Classification accuracy achieved for each of the considered taxonomies: (A) 7 classes, (B) 3 classes and (C) 2 classes.

The confusion matrix for taxonomy A was constructed with 10-fold cross-validation (10-CV) and the 17 DEGs. The other two confusion matrices were derived from it by summing the sub-matrices associated with each skin super-state.
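
Summing sub-matrices to move from a fine taxonomy to a coarser one can be sketched as follows. The grouping of states into super-states is defined in Table 2 and is not reproduced here, so the sketch uses a three-state toy matrix with a hypothetical grouping (`G1`, `G2`) and made-up counts. The function name `collapse_confusion` is also an assumption.

```python
def collapse_confusion(matrix, labels, groups):
    """Collapse a confusion matrix by summing cells that share super-states.

    matrix[i][j] counts samples of true class labels[i] predicted as labels[j].
    groups maps each fine-grained label to its super-state.
    """
    # Keep super-states in order of first appearance.
    supers = []
    for label in labels:
        if groups[label] not in supers:
            supers.append(groups[label])
    index = {name: k for k, name in enumerate(supers)}

    collapsed = [[0] * len(supers) for _ in supers]
    for i, true_label in enumerate(labels):
        for j, pred_label in enumerate(labels):
            row = index[groups[true_label]]
            col = index[groups[pred_label]]
            collapsed[row][col] += matrix[i][j]
    return supers, collapsed


# Toy example: made-up counts and a hypothetical grouping.
labels = ["NSK", "NEV", "BCC"]
matrix = [
    [9, 1, 0],
    [2, 7, 1],
    [0, 1, 9],
]
groups = {"NSK": "G1", "NEV": "G1", "BCC": "G2"}
supers, collapsed = collapse_confusion(matrix, labels, groups)
print(supers, collapsed)
```

Confusions between two states of the same super-state fall on the diagonal of the collapsed matrix, so accuracy can only stay equal or rise as the taxonomy becomes coarser.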

Fig 7.

Expression level of the selected genes, ordered by the ranking returned by the mRMR algorithm.

Different colors are used for each cancer-related skin state: NSK (Normal Skin), NEV (Nevus), PRIMEL (Primary Melanoma), METMEL (Metastatic Melanoma), BCC (Basal Cell Carcinoma), SCC (Squamous Cell Carcinoma) and MCC (Merkel Cell Carcinoma).

Fig 8.

Evolution of the classification accuracy for each subset of genes considered, and for each taxonomy.

Similar trends can be observed for both LOO-CV and 10-CV.
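
For readers unfamiliar with the two validation schemes, leave-one-out cross-validation (LOO-CV) is the special case of k-fold cross-validation in which k equals the number of samples. The sketch below generates the index splits only. It assumes contiguous, unshuffled and unstratified folds, which may differ from the procedure used in the study, and the function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


n = 23  # arbitrary sample count for illustration
ten_cv = list(k_fold_splits(n, 10))  # 10-CV: ten folds of 2 or 3 samples
loo_cv = list(k_fold_splits(n, n))   # LOO-CV: one sample held out per fold
print(len(ten_cv), len(loo_cv))
```

In both schemes every sample is used for testing exactly once, which is consistent with the two curves following similar trends.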

Fig 9.

Evolution of the classification accuracy for each cancer-related skin state according to the number of genes from the mRMR ranking considered in the classifier.

Different colors are used for each skin sample type: NSK (Normal Skin), NEV (Nevus), PRIMEL (Primary Melanoma), METMEL (Metastatic Melanoma), BCC (Basal Cell Carcinoma), SCC (Squamous Cell Carcinoma) and MCC (Merkel Cell Carcinoma). SVM with 10-CV was used.

Table 7.

Relation between DEGs in this study and different skin diseases or disorders.

A minimum of two disease-related citations per gene was required, together with at least one gene significantly associated with a disease. A maximum false discovery rate (FDR) of 0.05 was imposed.
