Analysis of Gene Expression Profiles of Soft Tissue Sarcoma Using a Combination of Knowledge-Based Filtering with Integration of Multiple Statistics

The diagnosis and treatment of soft tissue sarcomas (STS) have been difficult. Of the diverse histological subtypes, undifferentiated pleomorphic sarcoma (UPS) is particularly difficult to diagnose accurately, and its classification per se is still controversial. Recent advances in genomic technologies provide an excellent way to address such problems. However, it is often difficult, if not impossible, to identify definitive disease-associated genes using genome-wide analysis alone, primarily because of multiple testing problems. In the present study, we analyzed microarray data from 88 STS patients using a combination method that used knowledge-based filtering and a simulation based on the integration of multiple statistics to reduce multiple testing problems. We identified 25 genes, including hypoxia-related genes (e.g., MIF, SCD1, P4HA1, ENO1, and STAT1) and cell cycle- and DNA repair-related genes (e.g., TACC3, PRDX1, PRKDC, and H2AFY). These genes showed significant differential expression among histological subtypes, including UPS, and showed associations with overall survival. STAT1 showed a strong association with overall survival in UPS patients (logrank p = 1.84×10−6 and adjusted p value 2.99×10−3 after the permutation test). According to the literature, the 25 genes selected are useful not only as markers of differential diagnosis but also as prognostic/predictive markers and/or therapeutic targets for STS. Our combination method can identify genes that are potential prognostic/predictive factors and/or therapeutic targets in STS and possibly in other cancers. These disease-associated genes deserve further preclinical and clinical validation.


Introduction
Recent advances in genomic technologies offer an excellent opportunity to determine the complete biological characteristics of neoplastic tissues, resulting in improved diagnosis, treatment selection, rational classification based on molecular carcinogenesis, and identification of therapeutic targets. The diagnosis and treatment of soft tissue sarcomas (STS) have been difficult because STSs comprise a group of highly heterogeneous tumors in terms of histopathology, molecular signature, histological grade, and primary site. These tumors have generally been classified into subtypes according to their histological resemblance to normal tissue. The Fédération Nationale des Centres de Lutte Contre le Cancer (FNCLCC) grading system was defined more than 20 years ago and is still the most commonly used grading system for STS [1,2]. Treatment of STS is based on both histological subtype and histological grade. The understanding gained regarding the molecular pathology of cancer in recent decades suggests that some tumor types exhibit stand-alone recurrent genetic aberrations, such as chromosomal translocations that result in gene fusions, e.g., SYT-SSX in synovial sarcoma (SS) [3], TLS-CHOP in myxoid/round cell liposarcoma (MLS) [4], and KIF5B-RET in lung adenocarcinoma [5], or somatic mutations, e.g., KIT in gastrointestinal stromal tumors (GIST) [6] and 26 mutated genes (TP53, KRAS, EGFR, and 23 other genes) in lung adenocarcinoma [7]. The molecular markers specific to each tumor type are useful for tumor classification [8]. In contrast, several malignant tumors, such as malignant fibrous histiocytoma (MFH), are characterized by numerous nonrecurrent, complex chromosomal aberrations, and they frequently show overlapping histological features and immunophenotypes that are difficult for pathologists to interpret [9]. In particular, the diagnosis of MFH has been a controversial issue [10][11][12][13]. MFH is the most common soft tissue sarcoma in adults.
It has a wide range of histological subtypes [13]. For this reason, discrimination between MFH and other STSs is difficult, but this discrimination is necessary because there are significant differences in the 5-year survival rates of the STS subtypes [14]: 100% for well-differentiated liposarcoma (WLS), 71% for synovial sarcoma (SS), 46% for pleomorphic MFH, and 92% for myxofibrosarcoma (MFS). MFH was renamed undifferentiated pleomorphic sarcoma (UPS) in 2002 by the World Health Organization (WHO) [15]. MFS was considered a subtype of MFH before this classification; WHO reclassified MFS as another subtype of STS [15]. Discrimination between UPS and MFS is particularly difficult [14] because of their histological similarities and because of the considerable heterogeneity of UPS [13]. UPS was previously characterized by global gene expression analysis using analysis of variance (ANOVA) and clustering analysis [13]. Although some possible prognostic factors were identified, the list of factors was not complete because the study was conducted without information on patient outcomes. In the present study, we hypothesized that some genes can serve both as diagnostic markers for histological subtyping and as prognostic markers of overall survival in STS. We used a combination of statistical and bioinformatic methods to identify those genes.
Many statistical and bioinformatic methods have been proposed for global biological information analysis in the past 3 decades. For example, basic local alignment search tool (BLAST) [16], ClustalW [17], BLAST-based algorithm for the identification of upstream ORFs with conserved amino acid sequences (BAIUCAS) [18], and G4 DNA motif region finder by R (G4MR-FindeR) [19] have been used for sequence analysis; hierarchical clustering [20], fuzzy k-means [21], and fuzzy adaptive resonance theory (FuzzyART) [22,23] have been used for gene cluster analysis; gene set enrichment analysis (GSEA) [24], modified signal-to-noise (S2N′) [25], and projective adaptive resonance theory (PART) [26,27] have been used for gene selection; fuzzy neural network (FNN) [28,29] and boosted fuzzy classifier with a SWEEP operator (BFCS) [30][31][32] have been used for the construction of prediction models; and IntPath [33] and Stringent DDI-based Prediction [34] have been used for analysis of pathways and protein-protein interactions. The use of statistical or bioinformatic analysis is practical and useful for clinical diagnosis [35][36][37] and the identification of marker genes [38][39][40][41][42][43]. In the present study, we focused on microarray data analysis; however, the analysis of data obtained using next-generation sequencing technologies [44] is the subject of an upcoming project.
Global analysis of gene expression is a powerful method for the identification of prognostic/predictive factors and/or therapeutic targets. However, it is often difficult, if not impossible, to identify definitive disease-associated genes using genome-wide analysis alone, primarily because of multiple testing problems. In this situation, knowledge-based approaches, such as knowledge-based fuzzy adaptive resonance theory (KB-FuzzyART) [45] and knowledge-based single nucleotide polymorphism (KB-SNP) [46,47], are effective and interpretable [48][49][50]. Online Mendelian Inheritance in Man (OMIM) is a continuously updated catalog of human genes and genetic disorders and traits. In the present study, we used OMIM as a knowledge source for narrowing the list of candidate genes and applied the OMIM-based method to gene expression data from STS patients. Thus, we identified 25 genes that showed significant differential expression among histological subtypes, including UPS, and showed associations with overall survival. According to the literature, these genes are useful not only as diagnostic markers for the discrimination of molecular pathway-based subtypes but also as prognostic/predictive markers and/or therapeutic targets for STS. Moreover, these genes are useful for understanding the mechanisms underlying tumor progression or metastasis and for the rational design of anticancer therapeutics. Therefore, our combination method of knowledge-based filtering and simulation based on the integration of multiple statistics can identify potential prognostic/predictive factors and/or therapeutic targets in STS and possibly in other cancers.

Ethics statement
The study was conducted according to the principles expressed in the Declaration of Helsinki. The ethics committee of the National Cancer Center approved the study protocol. All patients provided written informed consent.  [51], as shown in Table S1. Tumor samples were obtained at the time of excision and were cryopreserved in liquid nitrogen.

Microarray analysis
For RNA extraction, trained pathologists carefully excised the tissue samples from the main tumor, leaving a margin free from the surrounding nontumorous tissue. The elimination of nontumorous stromal cells is necessary for gene expression analysis of carcinomas because tumor tissues contain a significant number of nontumorous stromal cells, including fibroblasts, endothelial cells, and inflammation-associated cells. STS contains nontumorous stromal cells that are difficult to exclude because STS originates from mesenchymal cells. However, in STS, the tumor tissue contains very few nontumorous stromal cells, which are therefore unlikely to confound the analysis. Hence, laser microdissection was not performed in this study. Total RNA samples extracted from the bulk tissue specimens were labeled with biotin and hybridized to high-density oligonucleotide microarrays (Human Genome U133A 2.0 Array; Affymetrix, Santa Clara, CA, USA) comprising 22,283 probe sets representing 18,400 transcripts, according to the manufacturer's instructions. The scanned array data were processed using the Affymetrix Microarray Suite v.5.1 software (MAS5), which scaled the average intensity of all the genes on each array to the target signal of 1000. The microarray data from the present study are available in the Genome Medicine Database of Japan (GeMDBJ) [52] (https://gemdbj.nibio.go.jp/dgdb/) under the accession number EXPR058P.
Figure 1. (A) The list of OMIM numbers related to cancer (e.g., cancer, carcinoma, sarcoma, tumor, and neoplasm) was selected and converted into Affymetrix probe IDs in Ensembl. (B) Prefiltering of probe sets. This procedure was based on the number of absent calls and the range of signals. A signal range (95th percentile to 5th percentile) of >2000 was used as a percentile filter. Furthermore, we excluded probe sets for which the number of absent calls was >50% (44/88). Probe sets related to cancer were selected using the OMIM-based method. (C) Integration of survival analysis and discriminant analysis. (D) Clinical data from all patients were permuted. Permuted data for 72 STS patients (20 UPS, 15 MFS, 20 MLS, and 17 SS patients) were extracted from the permuted data of all patients. For these data, p values ($p_1$) were calculated by applying ANOVA to the log-transformed gene expression data to discriminate among UPS, MFS, MLS, and SS. In addition, permuted data from 88 patients were used for survival analysis. For these data, p values ($p_2$) were calculated by applying the logrank test to the binarized gene expression data to analyze the outcomes in the STS group. The integrated statistic $p'$ was defined as $p' = p_1 \times p_2$. The lowest $p'$ value was selected for each repetition. This procedure was repeated 100,000 times, and an empirical null distribution was constructed. Using the distribution, the actual $p'$ value obtained from the real data was converted to the adjusted p value (corrected for multiple testing). doi:10.1371/journal.pone.0106801.g001

Data preprocessing
We excluded 68 control probe sets and 2343 genes that were subject to cross-hybridization according to NetAffx Annotation (www.affymetrix.com). Furthermore, we excluded those genes for which more than 50% (44/88) of the samples showed an absent call (i.e., the detection call determined by MAS5 based on the p value of the one-sided Wilcoxon signed-rank test; an absent call corresponds to p ≥ 0.065, which is the default threshold in MAS5). An absent call indicates that the expression signal was undetectable. Genes showing low variance, i.e., a signal range value (95th percentile to 5th percentile) of less than 2000, were excluded [40]. Furthermore, we conducted an OMIM-based reduction of the number of candidate genes. In total, 1412 genes were selected, to which we applied log-transformation or binarization using the median value as a threshold for each gene, as shown in Fig. 1. The 2 types of datasets, log-transformed and binarized, were used for ANOVA and the logrank test, respectively.
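The prefiltering and transformation steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names, the array layout (genes in rows, samples in columns), the use of base-2 logarithms, and the strict "greater than the median" rule for binarization are all assumptions, since the text specifies only the thresholds.

```python
import numpy as np

def prefilter(signal, absent, max_absent_frac=0.5, min_range=2000.0):
    """Return a boolean mask of genes that pass both filters.

    signal: (genes x samples) array of MAS5 signals.
    absent: (genes x samples) boolean array, True where MAS5 made an absent call.
    """
    absent_frac = absent.mean(axis=1)
    p95, p5 = np.percentile(signal, [95, 5], axis=1)
    # Exclude genes with more than 50% absent calls or a signal range below 2000.
    return (absent_frac <= max_absent_frac) & ((p95 - p5) >= min_range)

def transform(signal):
    """Return the log-transformed and the median-binarized versions of the data."""
    log_data = np.log2(signal)  # assumption: the base of the logarithm is not stated
    median = np.median(signal, axis=1, keepdims=True)
    binary = (signal > median).astype(int)  # assumption: ties fall in the low group
    return log_data, binary
```

The log-transformed matrix would then feed ANOVA and the binarized matrix the logrank test, as described in the text.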

Simulation based on the combination of a permutation test and the integration of multiple statistics
We previously proposed a statistical simulation based on a permutation test and the integration of multiple statistics [51]. This method was used in the present study. We first calculated p values using ANOVA to discriminate among histological subtypes, including UPS, MFS, SS, and MLS. We also calculated p values by means of the logrank test in the survival analysis of all STS patients in relation to the 1412 filtered genes. We defined the integrated statistic $p'$ as $p' = p_1 \times p_2$, where $p_1$ is the p value from ANOVA and $p_2$ is the p value from the logrank test. The same STS patients (n = 72; 20 UPS, 15 MFS, 17 SS, and 20 MLS patients) were used in both of these tests. Because the 2 tests share these 72 samples, $p_1$ and $p_2$ are not independent, and the integrated statistic $p'$ could be underestimated. Therefore, to cancel this influence, we conducted a simulation based on the permutation test, as shown in Fig. 1, to obtain adjusted $p'$ values that also account for multiple testing.
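The simulation can be sketched as a minimum-statistic permutation procedure. The code below is an illustrative reconstruction under stated assumptions rather than the published implementation: the function names are hypothetical, the two-group logrank test is written out by hand, clinical labels are permuted jointly across all patients, and the adjusted p value uses a (count + 1)/(permutations + 1) estimator, which the text does not specify.

```python
import numpy as np
from scipy.stats import f_oneway, chi2

def logrank_p(time, event, group):
    """Two-group logrank p value; event is 1 for death, 0 for censored; group is 0/1."""
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & (group == 1)).sum()
        died = (time == t) & (event == 1)
        d, d1 = died.sum(), (died & (group == 1)).sum()
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return 1.0 if var == 0 else chi2.sf(obs_minus_exp ** 2 / var, df=1)

def integrated_p(log_expr, binary, subtype, time, event, anova_subtypes):
    """Integrated statistic p' = p1 * p2 for every gene (rows of the matrices)."""
    out = np.empty(log_expr.shape[0])
    for g in range(log_expr.shape[0]):
        groups = [log_expr[g, subtype == s] for s in anova_subtypes]
        p1 = f_oneway(*groups).pvalue                 # ANOVA on the selected subtypes
        p2 = logrank_p(time, event, binary[g])        # logrank on all patients
        out[g] = p1 * p2
    return out

def adjusted_p(log_expr, binary, subtype, time, event,
               anova_subtypes=("UPS", "MFS", "MLS", "SS"), n_perm=1000, seed=0):
    """Adjusted p values from the null distribution of the smallest p' per permutation."""
    rng = np.random.default_rng(seed)
    observed = integrated_p(log_expr, binary, subtype, time, event, anova_subtypes)
    null_min = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(len(time))             # permute clinical data jointly
        null_min[i] = integrated_p(log_expr, binary, subtype[perm],
                                   time[perm], event[perm], anova_subtypes).min()
    # Assumption: conservative estimator with +1 in numerator and denominator.
    return (1 + (null_min[:, None] <= observed).sum(axis=0)) / (1 + n_perm)
```

Taking the smallest $p'$ in each permutation is what makes the adjusted value account for all tested genes at once; the study used 100,000 repetitions, whereas the default here is kept small for illustration.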

Statistical analysis
The median value of the gene expression signals for each gene was calculated, and the patients were distributed into 2 groups using the median value as a threshold for each gene. Logrank tests [53] were performed for overall survival of STS patients for each gene. We also calculated Spearman's rank correlation coefficients to assess the relationships between gene expression signals and histological grades [54] or incidence of tumor metastases. We considered data obtained after 50 months of follow-up as censored data in the analysis of the logrank test, similar to the procedure followed in our previous study [51]. Kaplan-Meier curves [55] based on histological subtype were constructed for all STS patients.
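As an illustration of the censoring rule and the Kaplan-Meier estimate mentioned above, the following sketch truncates follow-up at 50 months and computes the product-limit survival curve. The function names and the convention that event = 1 denotes death are assumptions introduced for the example.

```python
import numpy as np

def censor_at(time, event, horizon=50.0):
    """Treat any follow-up beyond `horizon` months as censored at the horizon."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    late = time > horizon
    return np.where(late, horizon, time), np.where(late, 0, event)

def kaplan_meier(time, event):
    """Return event times and the Kaplan-Meier survival estimate at each of them."""
    event_times = np.unique(time[event == 1])
    survival, s = [], 1.0
    for t in event_times:
        n_at_risk = (time >= t).sum()
        deaths = ((time == t) & (event == 1)).sum()
        s *= 1.0 - deaths / n_at_risk
        survival.append(s)
    return event_times, np.array(survival)
```

For example, a patient who died at 60 months would be counted as alive and censored at 50 months, so only events within the first 50 months influence the curves and the logrank test.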

OMIM
OMIM is a continuously updated catalog of human genes and genetic disorders and traits, with a focus on the molecular relationship between genetic variation and phenotypic expression. The list of MIM gene accession numbers associated with keywords related to cancer was obtained from the OMIM website (http://www.omim.org/). We used several keywords related to cancer, including "cancer," "carcinoma," "sarcoma," "tumor," and "neoplasm," to create the MIM gene accession number list. There were 4394 MIM gene accession numbers, as shown in Table S2. The final MIM gene accession number list was obtained on January 10, 2014.

Ensembl
Ensembl is a joint project between EMBL-EBI and the Sanger Centre to develop software that produces and maintains automatic annotation of eukaryotic genomes [56]. We converted MIM numbers to the Affymetrix probe set IDs of the Human Genome  Table S3.

Principal component analysis (PCA)
We used PCA to reduce the gene expression profile data to a two-dimensional dataset. PCA was first proposed in 1901 by Pearson [57]. This method is a statistical procedure that uses orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components (PCs). The number of PCs is less than or equal to the number of original variables. This transformation is defined in such a way that the first PC has the greatest possible variance.
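A compact way to see this definition at work is PCA via the singular value decomposition. The sketch below is a generic illustration, not the software used in the study; it assumes samples in rows and variables (probe sets) in columns and centers, but does not scale, each variable.

```python
import numpy as np

def pca(X, n_components=2):
    """Return PC scores, loadings (eigenvectors), and the proportion of variance."""
    Xc = X - X.mean(axis=0)                       # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    variance = S ** 2 / (X.shape[0] - 1)          # variance explained by each PC
    return scores, Vt[:n_components], variance / variance.sum()
```

The singular values come out in decreasing order, so the first PC carries the largest variance, and the score columns are mutually uncorrelated.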

Multiple testing correction
The Bonferroni correction is a method used to address the problem of multiple comparisons (also known as the multiple testing problem). It is considered the simplest and most conservative method for control of the family-wise error rate (FWER). False discovery rate (FDR) controlling procedures, such as the Benjamini-Hochberg (BH) method [58], are more powerful (i.e., less conservative) than the FWER procedures, but their use increases the likelihood of false positives among the rejected hypotheses. In the present study, the BH method was used to calculate the q value. The q value is defined as an FDR analog of the p value.
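For concreteness, BH-adjusted values can be computed as in the sketch below. The function name is hypothetical, and the code implements the standard step-up adjustment (each sorted p value multiplied by m/rank, then made monotone), which we take to be what is meant by the q value here.

```python
import numpy as np

def bh_qvalues(p):
    """Benjamini-Hochberg adjusted p values (q values) in the original order."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q
```

With p = (0.01, 0.04, 0.03, 0.005), the q values are (0.02, 0.04, 0.04, 0.02), whereas Bonferroni would give (0.04, 0.16, 0.12, 0.02).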

Heatmap and hierarchical clustering analyses
A heatmap was created using the R program (function heatmap.2 in Package gplots) for the log-transformed and scaled gene expression data of selected genes. Hierarchical clustering was also conducted using the Euclidean distance and complete linkage (default parameters of function heatmap.2).
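The clustering settings named above (Euclidean distance, complete linkage) can be reproduced outside R. The following Python sketch uses SciPy as an analog of the clustering step only; it does not draw the heatmap, and the row-wise z-score scaling and the cut into a fixed number of clusters are assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

def cluster_rows(log_expr, n_clusters=4):
    """Cluster genes (rows) with Euclidean distance and complete linkage."""
    scaled = zscore(log_expr, axis=1)             # assumption: scaling is per gene
    tree = linkage(scaled, method="complete", metric="euclidean")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```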

Kaplan-Meier curves for 4 histological subtypes
Kaplan-Meier curves based on histological subtype were constructed for all STS patients, as shown in Fig. 2. This figure shows that MFS had a good prognosis, MLS and SS had intermediate prognoses, and UPS had a poor prognosis. Although the logrank test showed statistically significant differences in survival among the histological subtypes (p < 0.05), we conducted gene expression analysis to select molecular markers for more accurate diagnosis.

Extraction of genes that are both diagnostic and prognostic markers, by means of a simulation using the permutation test
To extract genes that are both diagnostic markers (for discrimination of histological subtypes) and prognostic markers (of overall survival in STS), we applied a simulation based on the combination of a permutation test and the integration of multiple statistics to the 1412 prefiltered probe sets of microarray data obtained from STS patients. As shown in Table 2, 29 probe sets, representing 25 genes, were extracted (adjusted p value < 0.05).

Association analysis of the histological grade (or metastasis status) and gene expression data for the 25 selected genes
We next used Spearman's rank correlation analysis to analyze the association between the gene expression level in STS patients and the histological grade (or metastasis status), as shown in Table 3. Table 3 shows that genes with positive r were upregulated in highly malignant tumors, whereas genes with negative r were downregulated in highly malignant tumors. The expression levels of almost all of the 25 genes were associated with either the histological grade or metastasis. However, stearoyl-CoA desaturase 1 (SCD1) and signal transducer and activator of transcription 1 (STAT1) were not associated with either the histological grade (SCD1: r = −0.0191, p = 0.860; STAT1: r = −0.146, p = 0.173) or metastasis (SCD1: r = 0.0237, p = 0.826; STAT1: r = −0.177, p = 0.0995). This result indicates that SCD1 and STAT1 expression levels may be related to the overall survival of STS patients but not to histological grade or metastasis. Therefore, these data suggest that SCD1 and STAT1 expression levels can be used in combination with the histological grade to predict the survival of STS patients.
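The per-gene correlation analysis can be sketched with SciPy as follows. The function name and the data layout (probe sets in rows, patients in columns, grade coded as an ordinal number per patient) are assumptions for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

def grade_association(expr, grade):
    """Spearman's rank correlation and p value between each probe set and the grade."""
    rho = np.empty(expr.shape[0])
    p = np.empty(expr.shape[0])
    for g, row in enumerate(expr):
        rho[g], p[g] = spearmanr(row, grade)
    return rho, p
```

The same function applies to metastasis status if it is coded as 0/1, because Spearman's coefficient handles tied ranks.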

Hierarchical clustering based on the gene expression pattern of the 25 selected genes
We performed hierarchical clustering for the 29 selected probe sets, representing 25 genes and 4 histological subtypes (UPS, MFS, MLS, and SS), as shown in Fig. 3. The genes were roughly classified into 4 clusters (clusters A, B, C, and D). Almost all genes were upregulated in both UPS and MFS. In addition, genes in cluster A were upregulated in SS, and genes in cluster D were upregulated in MLS.

Analysis of the distribution of histological subtypes based on gene expression levels
We performed PCA to calculate the first and the second PCs using the 29 probe sets. Detailed information on PCA, including eigenvector, standard deviation, proportion of variance, and cumulative proportion, is provided in Tables S4 and S5. The distribution of the 4 histological subtypes of STS on the 2 axes is shown in Fig. 4. The 4 histological subtypes were clearly classified into 3 clusters (SS, MLS, and UPS+MFS). This result indicated that UPS and MFS, which are histologically similar, also had similar gene expression patterns. Therefore, to discriminate between UPS and MFS, we applied Welch's t test and the BH method to the gene expression data of the 29 probe sets, as shown in Table 4. We extracted 9 probe sets, representing 8 genes (q value < 0.05): enolase 1 (ENO1)/c-myc-promoter binding protein-1 (MBP1); prolyl 4-hydroxylase subunit alpha-1 (P4HA1); peroxiredoxin 1 (PRDX1); CD34; family with sequence similarity 162, member A (FAM162A)/human growth and transformation-dependent protein (HGTD-P); protein tyrosine kinase 7 (PTK7); macrophage migration inhibitory factor (MIF); and SCD1. We performed PCA to calculate the first and the second PCs from these 9 probe sets. Detailed information on PCA, including eigenvector, standard deviation, proportion of variance, and cumulative proportion, is shown in Table S5. The distribution of the 2 histological subtypes, UPS and MFS, on the 2 axes is shown in Fig. 5. UPS and MFS were classified into approximately 2 clusters. With regard to the contribution to this classification, MIF, ENO1/MBP1, and CD34 had the 3 largest coefficients for PC1; PTK7, PRDX1, and ENO1/MBP1 had the 3 largest coefficients for PC2; and SCD1 had the largest coefficient for PC3, as shown in Table S5. MIF, ENO1/MBP1, and SCD1 were extracted in our previous study [51].
We also applied Welch's t test and the BH method to the gene expression data from the 29 probe sets to discriminate UPS from SS and UPS from MLS, as shown in Table 4.
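A pairwise comparison of this kind (Welch's t test per probe set followed by the BH method) might look like the sketch below. The function names are hypothetical, and the inputs are assumed to be log-transformed expression matrices with probe sets in rows and the patients of each subtype in columns.

```python
import numpy as np
from scipy.stats import ttest_ind

def bh_adjust(p):
    """Benjamini-Hochberg adjusted p values (q values)."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * p.size / np.arange(1, p.size + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(p.size)
    q[order] = np.minimum(ranked, 1.0)
    return q

def compare_subtypes(expr_a, expr_b):
    """Welch's t test for each probe set, then BH q values across probe sets."""
    result = ttest_ind(expr_a, expr_b, axis=1, equal_var=False)  # unequal variances
    return result.pvalue, bh_adjust(result.pvalue)
```

Probe sets with q < 0.05 would be reported as differentially expressed between the 2 subtypes.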

Classification of the 25 genes based on pairwise comparison of histological subtypes
We classified the 25 genes into 7 groups on the basis of 3 comparisons (UPS vs. MFS, UPS vs. SS, and UPS vs. MLS), as shown in Fig. 6. Only 3 genes, ENO1/MBP1, P4HA1, and PRDX1, were selected in common, that is, selected in the UPS vs. MFS comparison and also in the UPS vs. SS or UPS vs. MLS comparison. Furthermore, we compared the 25 genes selected in our study with the genes involved in the complexity index in sarcomas (CINSARC) [59] because the use of CINSARC (composed of 67 genes) instead of the FNCLCC grading system [1,2] was recently proposed for predicting metastasis in STS [59]. In this comparison, only 4 common genes, that is, pituitary tumor-transforming 1 (PTTG1), abnormal spindle-like microcephaly-associated protein (ASPM), cell-division cycle protein 20 (CDC20), and kinesin family member 20A (KIF20A)/mitotic kinesin-like protein 2 (MKlp2), were extracted. The differential expression of these 4 genes was statistically significant (q < 0.05) for UPS vs. SS and for UPS vs. MLS, but not for UPS vs. MFS. These 4 genes belonged to cluster B, as shown in Fig. 3. Consequently, the 25 genes were classified into 7 groups on
Figure 6. Genes inside the red circle were statistically significant (q < 0.05 calculated using Welch's t test and the BH method) in the comparison of UPS with SS. Genes inside the green oval were statistically significant (q < 0.05) in the comparison of UPS with MLS. Genes inside the blue oval were statistically significant (q < 0.05) in the comparison of UPS and MFS. Genes inside the pink oval are common to CINSARC and our 25-gene set. For PCA of the 9-probe set, MIF and CD34 highlighted in red had the first and third largest coefficients for PC1, respectively. PTK7 and PRDX1 highlighted in blue had the first and second largest coefficients for PC2, respectively. ENO1/MBP1 highlighted in purple had the second largest coefficient for PC1 and the third largest coefficient for PC2. SCD1 highlighted in green had the largest coefficient for PC3. doi:10.1371/journal.pone.0106801.g006

Survival analysis in UPS patients
We used the logrank test to analyze the survival of UPS patients. We selected the best p value for various thresholds (30th, 40th, 50th, 60th, 70th, and 80th percentiles) of gene expression signals in UPS patients for each probe set when the gene expression signals were binarized. Adjusted p values were obtained by adjusting the data for the multiple testing problem (6 thresholds × 29 probe sets) based on the permutation test, as shown in Table S6. Only STAT1 showed a statistically significant association with survival in UPS (logrank p value 1.84×10−6 and adjusted p value 2.99×10−3 after the permutation test). Fig. 7 shows that STAT1-positive and STAT1-negative groups had clearly different survival curves based on the Kaplan-Meier method.
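Choosing the best of several thresholds inflates significance, which is why the permutation adjustment spans all thresholds and probe sets. The sketch below illustrates the idea under stated assumptions: the function names are hypothetical, the logrank test is hand-written for 2 groups, survival data are permuted jointly across patients, and the adjusted p value uses a (count + 1)/(permutations + 1) estimator that the text does not specify.

```python
import numpy as np
from scipy.stats import chi2

def logrank_p(time, event, group):
    """Two-group logrank p value; event is 1 for death, 0 for censored; group is 0/1."""
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & (group == 1)).sum()
        died = (time == t) & (event == 1)
        d, d1 = died.sum(), (died & (group == 1)).sum()
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return 1.0 if var == 0 else chi2.sf(obs_minus_exp ** 2 / var, df=1)

def best_threshold_p(expr, time, event, percentiles=(30, 40, 50, 60, 70, 80)):
    """Smallest logrank p value over the candidate binarization thresholds."""
    best = 1.0
    for q in percentiles:
        group = (expr > np.percentile(expr, q)).astype(int)
        if 0 < group.sum() < group.size:          # skip degenerate splits
            best = min(best, logrank_p(time, event, group))
    return best

def adjusted_best_p(expr_matrix, time, event, n_perm=1000, seed=0):
    """Observed best p per probe set and its permutation-adjusted value."""
    rng = np.random.default_rng(seed)
    observed = np.array([best_threshold_p(e, time, event) for e in expr_matrix])
    null_min = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(len(time))         # shuffle outcomes across patients
        null_min[i] = min(best_threshold_p(e, time[perm], event[perm])
                          for e in expr_matrix)
    adjusted = (1 + (null_min[:, None] <= observed).sum(axis=0)) / (1 + n_perm)
    return observed, adjusted
```

Because each permutation records the minimum over every threshold and every probe set, the adjusted value corrects for all 6 × 29 tests simultaneously.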

Discussion
In the present study, we conducted a simulation based on a permutation test to extract genes that are both diagnostic markers (for discrimination of histological subtypes) and prognostic markers (for overall survival in STS). As shown in Table 2, 25 genes were extracted, and their adjusted p values were statistically significant (adjusted p < 0.05). We reviewed the literature on these 25 genes and found many reports suggesting that they are effective prognostic/predictive factors or therapeutic targets, as shown in Table S7 (see Supplementary Discussion).
Although we did not try to identify the molecular mechanisms behind the 25 selected genes, several published studies have examined pathways related to these 25 genes, as shown in Table S7 and Fig. 8. These 25 genes are roughly classified into 4 types, namely, hypoxia-related genes (MIF, SCD1, P4HA1, ENO1/MBP1, FAM162A/HGTD-P, SLC16A1/MCT1, FN1, and STAT1), cell cycle- and DNA repair-related genes (ASPM, CDK1/CDC2, CDC20, KIF20A/MKlp2, PTTG1, TACC3, PRDX1, PRKDC/DNA-PKcs, and H2AFY/H2AX), growth factor signal transduction-related genes, and other genes. Cell cycle- and DNA repair-related genes, hypoxia-induced genes, and growth factor signal transduction-related genes are key players in tumor growth, angiogenesis, metabolism, invasion, and metastasis in various types of cancer. In fact, these processes are attenuated by the inhibition or silencing of many of these 25 genes, as shown in Table S7. These genes are therefore possible prognostic/predictive markers and/or therapeutic targets.
STAT1 expression was found to be strongly associated with survival in UPS patients. STAT1 interacts directly with p53 and induces cell growth arrest and apoptosis, as shown in Fig. 8. Although STAT1 is repressed by HIF-1, the STAT1-positive group among the UPS patients had a better prognosis, even when hypoxia-related genes were upregulated. Therefore, STAT1 is a possible novel, independent prognostic/predictive factor of STS, particularly UPS.
In the diagnosis of STS, classification of UPS is the most controversial topic. The 25 genes selected in this study include hypoxia-related genes (MIF, SCD1, P4HA1, ENO1/MBP1, FAM162A/HGTD-P, SLC16A1/MCT1, FN1, and STAT1). In particular, the genes MIF, SCD1, P4HA1, ENO1/MBP1, and FAM162A/HGTD-P are differentially expressed between UPS and MFS, as shown in Fig. 6 and Table 4. Furthermore, STAT1 is a prognostic marker in UPS patients, as shown in Fig. 7. Therefore, these hypoxia-related genes are promising prognostic markers and therapeutic targets and, if validated, may improve the diagnosis and treatment of this type of cancer. Further research is needed regarding the hypoxia-related pathways in highly malignant STS.
We manually constructed a hypothetical regulation model (Figure 8) of metabolic and signaling control in highly malignant STS. Nevertheless, according to the literature, a part of these networks could be automatically predicted by pathway and interaction analyses. For example, pathways of the cell cycle and the DNA damage response were identified by IntPath [33,60,61] with statistical significance (q value ,0.05), as shown in Table S8. Interaction networks of the cell cycle (ASPM, CDK1, CDC20, KIF20A, PTTG1, PRKDC, and TACC3) and HIF-1 (MIF, ENO1, and PRDX1) were identified by means of STRING [62], as shown in Fig. S1. Nonetheless, these tools should be used with appropriate parameters [34,60,61]. Such tools are more effective methods when large numbers of candidate genes are extracted.
In summary, we analyzed microarray gene expression data from 88 STS patients using a combination method involving knowledge-based filtering and a simulation based on the integration of multiple statistics to reduce multiple testing problems. Our combination method automatically identified 25 genes in the gene expression data from STS. These genes showed significant differential expression between different histological subtypes, including UPS, and showed associations with survival in STS. Furthermore, we conducted a bibliographic survey in terms of cancer progression for the 25 identified genes, and substantial evidence was uncovered in the literature. These genes were roughly classified into 4 types, namely, hypoxia-related genes, cell cycle- and DNA repair-related genes, growth factor signal transduction-related genes, and other genes. STAT1 showed a statistically significant association with the survival of UPS patients (logrank adjusted p = 0.00299). Although only a few studies have investigated the association of these genes with survival in STS, many recent studies have reported that these genes are prognostic factors and/or therapeutic targets in other types of cancers. Therefore, these results suggest that our combination method is capable of identifying genes that are potential prognostic/predictive factors and/or therapeutic targets in STS and possibly in other cancers. These disease-associated genes deserve further preclinical and clinical validation.
Figure S1. The pathways predicted by STRING from the 25 selected genes.
Table S8. Pathway analysis in IntPath. k: genes from the overlap between genes in the list and genes in the pathway, n: the number of genes in the input gene list, m: the number of genes in the identified pathways, N: the total number of genes. The p values were calculated using the hypergeometric test; the q values were calculated from the p values using the Benjamini-Hochberg (BH) method.
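The hypergeometric test named in the Table S8 legend can be illustrated as follows. This is a generic sketch, not IntPath's code: the function name is hypothetical, k is assumed to be the count of overlapping genes, and the p value is assumed to be the upper-tail probability of observing at least k overlapping genes.

```python
from scipy.stats import hypergeom

def enrichment_p(k, n, m, N):
    """P(overlap >= k) when n genes are drawn from N genes, m of which are in the pathway."""
    # SciPy's argument order is (x, population size, successes in population, draws).
    return hypergeom.sf(k - 1, N, m, n)
```

For instance, drawing 1 gene from a universe of 2 genes, 1 of which belongs to the pathway, gives a probability of 0.5 of an overlap of at least 1.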