Construction and analysis of mRNA, miRNA, lncRNA, and TF regulatory networks reveal the key genes associated with prostate cancer

Purpose Prostate cancer (PCa) causes a common male urinary system malignant tumour, and the molecular mechanisms of PCa are related to the abnormal regulation of various signalling pathways. An increasing number of studies have suggested that mRNAs, miRNAs, lncRNAs, and TFs could play important roles in various biological processes that are associated with cancer pathogenesis. This study aims to reveal functional genes and investigate the underlying molecular mechanisms of PCa with bioinformatics. Methods Original gene expression profiles were obtained from the GSE64318 and GSE46602 datasets in the Gene Expression Omnibus (GEO). We conducted differential screens of the expression of genes (DEGs) between two groups using the online tool GEO2R based on the R software limma package. Interactions between differentially expressed miRNAs, mRNAs and lncRNAs were predicted and merged with the target genes. Co-expression of miRNAs, lncRNAs and mRNAs was selected to construct mRNA-miRNA-lncRNA interaction networks. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed for the DEGs. Protein-protein interaction (PPI) networks were constructed, and transcription factors were annotated. Expression of hub genes in the TCGA datasets was verified to improve the reliability of our analysis. Results The results demonstrate that 60 miRNAs, 1578 mRNAs and 61 lncRNAs were differentially expressed in PCa. The mRNA-miRNA-lncRNA networks were composed of 5 miRNA nodes, 13 lncRNA nodes, and 45 mRNA nodes. The DEGs were mainly enriched in the nuclei and cytoplasm and were involved in the regulation of transcription, related to sequence-specific DNA binding, and participated in the regulation of the PI3K-Akt signalling pathway. These pathways are related to cancer and focal adhesion signalling pathways. Furthermore, we found that 5 miRNAs, 6 lncRNAs, 6 mRNAs and 2 TFs play important regulatory roles in the interaction network. The expression levels of EGFR, VEGFA, PIK3R1, DLG4, TGFBR1 and KIT were significantly different between PCa and normal prostate tissue. Conclusion Based on the current study, large-scale effects of interrelated mRNAs, miRNAs, lncRNAs, and TFs established a new prostate cancer network. In addition, we conducted functional module analysis within the network. In conclusion, this study provides new insight for exploration of the molecular mechanisms of PCa and valuable clues for further research into the process of tumourigenesis and its development in PCa.


Methods
Original gene expression profiles were obtained from the GSE64318 and GSE46602 datasets in the Gene Expression Omnibus (GEO). We conducted differential screens of the expression of genes (DEGs) between two groups using the online tool GEO2R based on the R software limma package. Interactions between differentially expressed miRNAs, mRNAs and lncRNAs were predicted and merged with the target genes. Co-expression of miRNAs, lncRNAs and mRNAs was selected to construct mRNA-miRNA-lncRNA interaction networks. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed for the DEGs. Protein-protein interaction (PPI) networks were constructed, and transcription factors were annotated. Expression of hub genes in the TCGA datasets was verified to improve the reliability of our analysis.

Results
The results demonstrate that 60 miRNAs, 1578 mRNAs and 61 lncRNAs were differentially expressed in PCa. The mRNA-miRNA-lncRNA networks were composed of 5 miRNA nodes, 13 lncRNA nodes, and 45 mRNA nodes. The DEGs were mainly enriched in the nuclei and cytoplasm and were involved in the regulation of transcription, related to sequence-specific DNA binding, and participated in the regulation of the PI3K-Akt signalling PLOS

Introduction
Prostate cancer (PCa) involves a common male urinary system malignant tumour that has the highest incidence among European and American populations [1,2]. This disease not only seriously affects the quality of life of patients but is also associated with financial burdens for society and the family [3,4]. PCa is closely regulated by various cytokines, related genes and intercellular signalling networks in the course of its development. However, the specific molecular mechanisms remain unclear. Therefore, it is important to study the molecular mechanisms of prostate cancer and to prevent the disease and develop effective treatments. miRNAs are a class of non-coding small RNAs that can be combined with the 3-'UTR of target mRNA to regulate gene expression, which leads to abnormal expression of target genes. A large number of studies have reported abnormal expression of miRNAs in prostate cancer involving multiple miRNAs, such as miR-16, 221, 375, 1290 and 141 [5][6][7]. Abnormal miRNA expression has been confirmed to be closely related with long non-coding RNAs (lncRNAs) and transcription factors (TFs). lncRNAs regulate miRNA expression by binding and sequestering target miRNAs and participating in mRNA expression regulation [8,9]. lncRNAs are emerging as novel diagnostic markers of PCa, and Chang et al. found that HOTAIR may participate in PCa progression [10]. Tian et al. revealed that RNCR3 functions as a tumour-promoting lncRNA in prostate cancer [11]. Li et al. found long noncoding RNA BDNF-AS is associated with clinical outcomes and plays a functional role in human prostate cancer [12]. As important factors in gene transcription and post-transcriptional regulation, TFs are also involved in controlling signaling pathways with miRNAs [13,14]. A previous study found that CREB1 and FoxA1 associated with advanced prostate cancer, and CREB1/FoxA1 target gene panels can predict prostate cancer recurrence [15]. However, further research is needed to determine the regulatory mechanisms of miRNAs, lncRNAs, TFs and mRNAs in PCa.
Microarray analysis can quickly identify all genes that are expressed at the same time-point [16]. Future research could benefit from integration and analysis of these data [17]. In this work, we identified differentially expressed genes (DEGs) in prostate cancer from the GSE64318 [18] and GSE46602 [19] datasets. We performed Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis for significant DEGs. Furthermore, we analysed mRNA-miRNA-lncRNA and protein-protein interaction (PPI) networks to reveal interactions and identified some factors that may be associated with PCa regulatory mechanisms. Finally, we analysed hub genes based on the PPI network and TCGA datasets. This study will contribute to understanding the molecular mechanism of prostate cancer, providing valuable clues for further research.

Raw data
The datasets used in the present study were downloaded from the National Center of Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih. gov/geo/) [20]. The experimental articles that were used to compare the RNAs in prostate cancer tissue and noncancerous prostate tissue from patients were included. The original gene expression profiles were obtained from the GSE64318 [18] and GSE46602 datasets [19] (S1 and S2 Tables). The GSE64318 dataset includes microRNA expression profiles of prostate biopsy samples from 54 prostate biopsy specimens (27 tumours and 27 normal tissues). The platform used for these data is the GPL8227 Agilent-019118 Human miRNA Microarray 2.0 G4470B (miRNA ID version). The GSE46602 dataset includes genome expression profiling of tumour tissue specimens from 36 patients with prostate cancer and normal prostate biopsies from 14 patients. The platform used to analyse these data was the GPL570 Affymetrix Human Genome U133 Plus 2.0 Array.

Identification of differentially expressed genes
GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/) is an interactive web tool based on the R language limma package [21] that can be used to compare two or more groups of samples to identify differential expression in a GEO series. In the present study, GEO2R was used to filter differentially expressed mRNAs and miRNAs between normal and tumour samples separately in each of the datasets. The false discovery rate (FDR) is a method of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. GEO2R calculates the FDR automatically. The multiple t test was used to detect statistically significant genes at the same time with FDR correction. Fold change (FC) >2 and P-value <0.05 were set as the cut-off criteria. Probes without a corresponding gene symbol were then filtered.

Prediction of mRNA-miRNA-lncRNA interactions
Interactions between differentially expressed miRNAs and differentially expressed mRNAs were predicted using miRWalk 3.0 (http://mirwalk.umm.uni-heidelberg.de/), which integrated the prediction results of both TargetScan [22] and miRDB [23], and a score !0.95 was considered as the cutoff criterion for the prediction analysis in miRWalk. Only the target mRNAs included in all of these databases were selected for the further analysis. The interaction between miRNA and lncRNA was predicted by using DIANA-LncBase v2.0 [24] and the score !0.4 was considered as cutoff criterion for the prediction analysis in the experimental module of LncBase. After the predicted targets were intersected with DEGs, miRNAs, lncRNAs and mRNAs were selected for further analysis. Cytoscape software (version 3.40) was used to visualize the regulatory network.

Gene function analysis
The DAVID database (http://david.abcc.ncifcrf.gov/) was used to conduct gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis on significant DEGs and target genes of miRNAs with differential expression. The species was limited to Homo sapiens, and the "adjusted P-value (from the Benjamini-Hochberg method), 0.05" was considered statistically significant [25]. The GO terms included these three criteria: molecular function (MF), cellular component (CC), and biological process (BP).

Protein-protein interaction (PPI) network construction and analysis
The protein-protein interaction network was constructed using the STRING online database. PPI pairs with a combined score !0.4 were used to construct the PPI network. Transcription factors were annotated using a TF checkpoint [26]. Then, the regulatory relationship between genes was visualized using Cytoscape software (version 3.4.0) and analysed through a topological property of the computing network including the degree distribution of the network using the CentiScaPe app [27]. Furthermore, the gene with a degree >5 was defined as a hub gene in the regulatory network according to a previous study [28].

TCGA dataset analysis
TCGA is a platform for researchers to download and assess free public datasets (https:// cancergenome.nih.gov/) [29]. In the present study, we verified the expression of hub genes in TCGA datasets to improve the reliability of our analysis using an online tool, UALCAN (http://ualcan.path.uab.edu/) [30]. UALCAN used TCGA level 3 RNA-seq and clinical data from 31 cancer types. As described by Li and Dewey [31], the "scaled_estimate" was multiplied by 10 6 to obtain transcripts per million (TPM) expression value using an in-house PERL (Practical Extraction and Report Language) program. As TPM has been suggested to be more comparable across samples than fragments per kilobase of transcript per million mapped reads (FPKM) and reads per kilobase of transcript per million mapped reads (RPKM) [32], TPM was used as the measure of expression here. Fifty-two normal samples and 497 primary tumour samples of prostate adenocarcinoma (PRAD) in TCGA datasets were included in the present study. Box and whisker plots were used to show hub gene expression levels between normal samples and primary tumour samples in PRAD.

Identification of differentially expressed mRNAs, miRNAs and lncRNAs
The results show that 60 miRNAs, 1578 mRNAs and 128 lncRNAs were differentially expressed in prostate cancer (S3, S4 and S5 Tables). Analysis of the GSE64318 dataset led to the identification of 60 miRNAs in PCa tumour tissues compared with normal prostate tissues, including 35 up-regulated miRNAs and 25 down-regulated miRNAs. Analysis of the GSE46602 dataset led to the identification of 1578 mRNAs, including 583 up-regulated and 995 down-regulated mRNAs, and 128 lncRNAs, including 49 up-regulated and 79 down-regulated lncRNAs. We selected all significantly up-regulated and down-regulated mRNAs, miRNAs and lncRNAs to plot their expression on heat-maps and volcano plots (Fig 1). The top 10 significantly up-regulated and down-regulated miRNAs, mRNAs and lncRNAs are shown in Tables 1, 2 and 3, respectively.

Function enrichment analysis of DEGs
GO enrichment analyses for DEGs based on the mRNA-miRNA-lncRNA network were performed. The top 10 most significant GO terms of each group were shown. On the MF level, the DEGs were mainly enriched in sequence-specific DNA binding. On the CC level, the DEGs were mainly enriched in the nucleus, cytoplasm and plasma membrane. On the BP level, the DEGs were mainly enriched in positive regulation of transcription from the RNA polymerase II promoter, cell proliferation and cell migration. The top 20 most significant KEGG pathway terms are shown. The genes were mainly enriched in the PI3K-Akt signalling pathway, pathways in cancer and the focal adhesion signalling pathway (Fig 4, S1 File).

Protein-protein interaction network (PPI) analysis
Using the STRING online database and Cytoscape software, a total of 128 of the 1578 DEGs were mapped into the PPI network complex. In this network, with a degree >5, 32 nodes were chosen as hub nodes, including 2 TFs, 14 miRNAs, and 16 mRNAs; the results are presented in Table 4, Fig 5 and S2 File. Moreover, we found that 6 mRNAs (i.e., EGFR, VEGFA, PIK3R1, DLG4, TGFBR1 and KIT) exhibited higher degrees (>10) and miRNA-mRNA pairs. These findings suggest that these nodes may play important roles in the development of PCa.

TCGA dataset analysis
TCGA dataset analyses were performed to demonstrate that aberrant expression of the hub genes, including EGFR, VEGFA, PIK3R1, DLG4, TGFBR1 and KIT, were significantly different between PCa and normal prostate tissues (Fig 6).

Discussion
Prostate cancer (PCa) has become a public health issue of great concern worldwide [33]. However, the molecular mechanisms involved in PCa progression are still unclear. Therefore, it is crucial to study the mechanism and identify molecular targets for diagnosis and treatment. A number of studies have found that the development of PCa is related to a variety of signalling pathways, and mRNAs, miRNAs, lncRNAs and TFs play important regulatory roles. In this study, we performed a comprehensive bioinformatics analysis and retrieved mRNAs, miRNAs, lncRNAs and TFs in the interaction network, revealing key genes associated with prostate cancer. miRNA expression in tumour tissues often differs from that of normal tissue, and this differential expression affects the occurrence, development and prognosis of tumours [34,35]. Studies have confirmed that miRNA expression profiles can be used as biomarkers for the early detection, classification and prognosis of tumours [36,37]. lncRNAs act as miRNA sponges and can regulate miRNA abundance and compete with mRNA for miRNA binding [38]. By constructing an mRNA-miRNA-lncRNA network, we found that the aberrant expression of lncRNAs led to abnormal expression of 5 miRNAs (i.e., has-miR-20a, has-miR-20b, has-miR-23b, has-let-7a and has-let-7d) in PCa and thus regulated the expression of target mRNAs. A previous study demonstrated that 5 miRNAs regulate the development of prostate cancer by targeting specific mRNAs and play an important role in the regulation of prostate cancer [39][40][41]. The involvement of key lncRNAs including HYMAI, MEG3, IPO5P1, MAG12-AS3, RMST and TRG-AS1 are important. Among them, MEG3 is an important tumour suppressor gene that inhibits cell proliferation and induces apoptosis in PCa [42]. The discovery of these lncRNAs suggests potential diagnostic and therapeutic targets for PCa.
Next, we conducted a functional enrichment analysis of the DEGs based on the mRNA-miRNA-lncRNA network. We found that the DEGs were mainly enriched in the nucleus and  Analysis of interaction networks in PCa cytoplasm, were involved in transcriptional regulation, were related to sequence-specific DNA binding and participated in regulation of the PI3K-Akt signalling pathway, which is related to cancer, and the focal adhesion signalling pathway. To further analyse the key genes related to PCa, we constructed a PPI network. More significantly, we found that the transcription factor NR3C1 was a hub gene that participates in regulation of the expression of multiple miRNAs (has-miR-20a, has-miR-20b, has-miR-23b). Puhr M et al. assessed NR3C1 expression and the functional significance in tissues from PCa and found that it is a key factor for the development of PCa [43]. Moreover, we found that its regulatory function is similar to MEG3 and its down-regulation leads to increased expression of miRNAs (has-miR-20a, has-miR-20b, and has-miR-23b). Based on these results, we speculate that some regulatory relationship may exist between NR3C1 and MEG3. Finally, through PPI and TCGA dataset analyses, we found that 6 mRNAs (i.e., EGFR, VEGFA, PIK3R1, DLG4, TGFBR1 and KIT) had higher degrees and miRNA-mRNA pairs. This expression was lower and significantly different between PCa and normal prostate tissues. EGFR belongs to a family of cell membrane receptor tyrosine kinases and is a key factor in tumour cell growth and invasion [44,45]. Previous studies have demonstrated that abnormal expression of EGFR and its downstream signalling contributes to disease progression in PCa [46]. EGFR downregulation is associated with enhanced signalling [47], which can lead to the development of cancer [48]. VEGFA is a mitogen with high endothelial cell specificity that plays a major regulatory role in the development of PCa [49]. A previous study found that PIK3R1, TGFBR1 and KIT might have clinical utility in distinguishing PCa [50][51][52]. EGFR can regulate VEGF expression by influencing MAPK and PI3K signalling pathways [53]. TGFBR1, DLG4 and KIT are also related to the regulation of MAPK and PI3K signalling pathways [54]. Our results further confirmed the potential biological relevance of 6 mRNAs in PCa. We also found that 6 mRNAs (EGFR, VEGFA, PIK3R1, DLG4, TGFBR1 and KIT) were associated with abnormal expression of 5 miRNAs (has-miR-20a, has-miR-20b, has-miR-23b, haslet-7a and has-let-7d) in PCa. In conclusion, the hub genes that we identified may play crucial roles in PCa.
Supporting information S1