Fig 1.
Schematic diagram of this study.
Table 1.
Performance evaluation of different machine-learning models by ten-fold cross-validation.
Fig 2.
Performance of different semi-supervised models based on the independent testing gene set.
(A) The area under the receiver operating characteristic curve (AUROC) for four semi-supervised models. (B) The area under the precision-recall curve (AUPRC) for four semi-supervised models. (C) The AUROC for four deep semi-supervised-learning models. (D) The AUPRC for four deep semi-supervised-learning models. The FlexMatch model fails to capture patterns in the omic data and produces predictions no better than random chance, which may reflect a failure of its pseudo-labeling mechanism on these data.
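As a reading aid for the two metrics in this figure, the following is a minimal sketch of how AUROC and AUPRC can be computed from labels and prediction scores. It is not the study's evaluation code: the function names `auroc` and `average_precision` and the toy labels and scores are illustrative assumptions, and the AUPRC is approximated here by the step-wise average precision.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative (tied scores count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve: the sum of
    precision * (increase in recall) over distinct score thresholds."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp, ap, prev_recall, i = 0, 0.0, 0.0, 0
    while i < len(ranked):
        j = i
        # Treat all items sharing one score as a single threshold.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        recall = tp / n_pos
        precision = tp / j  # j items have been called positive so far
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


# Toy data (hypothetical, not from the study): 1 = positive gene, 0 = negative gene.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)
toy_auprc = average_precision(toy_labels, toy_scores)

# A model that gives every gene the same score carries no ranking information.
flat_auroc = auroc(toy_labels, [0.5] * 6)
flat_auprc = average_precision(toy_labels, [0.5] * 6)
```

In this sketch an uninformative model scores an AUROC of 0.5 and an AUPRC equal to the fraction of positives in the test set, which is the sense of "no better than random chance" for these two metrics.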
Fig 3.
UpSet plot showing the overlap between the identified cER genes and the cancer genes from different databases.
The UpSet plot shows the intersections of the six cancer gene sets and the identified cER genes. Vertical bars represent the number of genes in each combination of gene sets, with each combination indicated by the black dots joined by black lines. Horizontal bars represent the number of genes in each gene set.
Fig 4.
The Circos plot displays the predicted cancer-specific cER genes across 18 cancer types.
The predicted cER genes are highlighted in red, and the outermost ring contains the corresponding gene symbols.
Fig 5.
Evaluation of CASER-predicted cERs by gene set enrichment analysis.
(A) Network visualization of the enriched pathways based on the gene set enrichment analysis results of GOBP pathways. Each node represents a specific GOBP pathway, and each edge represents the number of genes shared between two pathways. Node size corresponds to the number of genes in the pathway, and the color gradient represents the -log10 adjusted P-value, with darker red indicating stronger pathway enrichment. (B) KEGG pathway and (C) DisGeNET gene set enrichment results for the predicted cERs (including known and predicted novel cERs), unpredicted cERs, and NGs. Terms with adjusted P-values < 0.01 are shown.
Fig 6.
Evaluation of the identified cERs by a published ER CRISPR-screen dataset and the DepMap gene essentiality screen dataset.
The robust rank aggregation (RRA) scores of negative selection are shown for different gene sets in (A) LNCaP and (B) MDA-MB-231 cells. The Chronos dependency scores are shown for different gene sets in (C) LNCaP and (D) TNBC cells. The RRA scores of negative selection were calculated using MAGeCK. Higher -log10(RRA) values indicate a greater effect on cell survival after gene knockdown. Lower DEMETER and Chronos scores indicate a greater effect on cell survival after gene knockdown. The genes in this analysis are both cancer genes and ERGs. Non-cancer ERs are ER genes that are also neutral genes. P-values were calculated using a one-tailed Wilcoxon rank-sum test.
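For readers unfamiliar with the test named in this caption, the following is a minimal sketch of a one-tailed Wilcoxon rank-sum test using the normal approximation with a tie correction. It is not the study's analysis code: the function name `rank_sum_lower_tail` and the toy dependency scores are illustrative assumptions, and for small groups an exact test from a statistics package would normally be preferred.

```python
import math


def rank_sum_lower_tail(x, y):
    """One-tailed Wilcoxon rank-sum (Mann-Whitney U) test of whether values
    in x tend to be lower than values in y.

    Returns (U statistic for x, lower-tail p-value). Uses the normal
    approximation with tie correction and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both groups, remembering group membership (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        t = j - i                    # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # every value is identical, so there is no evidence
    z = (u - mean_u) / math.sqrt(var_u)
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2.0))  # P(Z <= z)
    return u, p_lower


# Toy dependency scores (hypothetical, not from the study); lower = more essential.
group_a = [-1.2, -0.9, -0.8, -0.5]
group_b = [-0.1, 0.0, 0.1, 0.2, 0.3]
u_stat, p_value = rank_sum_lower_tail(group_a, group_b)
_, p_reversed = rank_sum_lower_tail(group_b, group_a)
```

The direction of the tail has to match the score: for Chronos or DEMETER scores the alternative is that the gene set of interest is lower, whereas for -log10(RRA) it would be higher, which can be tested by swapping the two arguments.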
Fig 7.
Effects of six candidate cER genes on the proliferation of cancer cell lines.
(A) Six predicted cER genes were investigated in the SK-MEL-2 cell line (n = 4). (B) Six predicted cER genes were investigated in the Caki-1 cell line (n = 4). Statistical analysis was performed between si-NC and si-Genes. P-values were calculated by one-way analysis of variance followed by Dunnett’s correction and are indicated by star symbols: *, P < 0.05; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001; ns, P > 0.05.