Automated Identification of Core Regulatory Genes in Human Gene Regulatory Networks

doi:10.1371/journal.pcbi.1004504

Fig 1.

Integrated human transcriptional and post-transcriptional GRN.

The network contains a total of 1867 miRNAs and 21,940 genes, including 1374 TFs and 20,556 non-TF genes (or mRNAs for simplicity). The numbers on the edges denote the total number of interactions between different types of nodes.

Fig 2.

Degree distribution of the general human TF-miRNA-mRNA network over (a) all nodes, (b) mRNAs, (c,d) TFs (in, out degrees) and (e,f) miRNAs (in, out degrees), (g,h,i) regulation of TFs, mRNAs and miRNAs by TFs, (j,k) regulation of TFs and mRNAs by miRNAs.
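The in- and out-degree distributions in the panels above can be tabulated from a directed edge list. The following is a minimal sketch, not the authors' code: the function name `degree_distributions` and the toy edges are hypothetical, and each edge is assumed to be a (regulator, target) pair.

```python
from collections import Counter

def degree_distributions(edges):
    """Tabulate in/out degrees and their histograms for a directed edge list.

    `edges` is an iterable of (regulator, target) pairs; duplicates are ignored.
    """
    out_deg = Counter()
    in_deg = Counter()
    for src, dst in set(edges):
        out_deg[src] += 1   # regulator gains one outgoing edge
        in_deg[dst] += 1    # target gains one incoming edge
    # Histogram: degree value -> number of nodes with that degree
    out_hist = Counter(out_deg.values())
    in_hist = Counter(in_deg.values())
    return in_deg, out_deg, in_hist, out_hist

# Toy example (hypothetical node names)
toy_edges = [("TF1", "g1"), ("TF1", "g2"), ("miR1", "g1"), ("TF1", "miR1")]
in_deg, out_deg, in_hist, out_hist = degree_distributions(toy_edges)
```

Restricting `edges` to one interaction type (for example TF to mRNA only) before calling the function would give the per-interaction distributions.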

Table 1.

Statistics of degree distribution for various interactions in the integrated human GRN.

Fig 3.

Relationship between a node’s in- or out-degree in the integrated human TF-miRNA-mRNA network and its expression characteristics in BioGPS: (a) absolute expression level, (b) tissue-specific gene expression.

Fig 4.

Evolution of the integrated human TF-miRNA-mRNA network with increasing number of ChIP’ed transcription factors and cell types in which transcription factors have been ChIP’ed.

As more TFs are ChIP’ed, the shape of the TF out-degree distribution remains the same (a), while a proportionate number of edges is added to the network (b). The addition of these new edges leads to a linear increase in the in-degrees of mRNA nodes, for both average in-degree and high in-degree mRNAs (c,d). This is similar to percolation dynamics, where the frequency of both average-size and large clusters increases as more lattice sites are filled (e). ChIP of the same TFs in more cell types adds fewer new edges to the network (f) and to the TF nodes (g), with a plateau reached beyond 10 cell types. Shown in (h) is an extrapolation of mRNA in-degree if all known TFs in the human genome (~1400) were to be ChIP’ed.

Table 2.

Public MCF-7 gene expression datasets downloaded from NCBI’s Gene Expression Omnibus (GEO) database.

Fig 5.

MCF-7 estrogen response gene network.

(a) Differentially expressed genes with FDR < 0.001 were selected from two high-quality datasets, GSE11324 (microarray) and GSE51403 (RNA-seq). A significant number of genes were common to the two datasets, roughly 70% at an FDR cutoff of 0.05, as shown by the dotted ellipses. The union of the GSE11324 and GSE51403 gene lists was selected for network construction. The union list also overlapped well with another dataset, GSE11352. (b) The final MCF-7 network consisted of 5736 nodes, including 462 TFs, 58 miRNAs and 5216 mRNA genes. The numbers on the edges denote the total number of interconnections between the various types of nodes.

Table 3.

Top 20 regulatory molecules identified by various ranking strategies in the MCF-7 estrogen response gene regulatory network.

Fig 6.

Randomization test to determine whether the core TFs and microRNAs identified in MCF-7 estrogen response GRN were obtained by chance.

The scatter plot compares the core number of a TF in the MCF-7 estrogen response GRN (y-axis) with its average core number over 10,000 randomly sampled networks (x-axis). For comparability, the randomly sampled networks contained the same number of TFs and miRNAs as the MCF-7 estrogen response GRN.

Fig 7.

Randomization test to determine whether the core TFs and microRNAs identified in MCF-7 estrogen response GRN were obtained by chance.

The scatter plot compares the core number of a microRNA in the MCF-7 estrogen response GRN (y-axis) with its average core number over 10,000 randomly sampled networks (x-axis). For comparability, the randomly sampled networks contained the same number of TFs and miRNAs as the MCF-7 estrogen response GRN.
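The comparison behind Figs 6 and 7 can be illustrated with a small sketch. It is not the authors' implementation, and it rests on several assumptions: core numbers are computed with the standard K-core peeling procedure on the undirected version of the network (where a larger value means a deeper core, which may differ from the numbering used in the paper's figures), a random network is drawn by sampling regulators from a background network and keeping their outgoing edges, and the names `core_numbers`, `mean_random_core` and the toy edges are hypothetical.

```python
import random
from collections import defaultdict

def core_numbers(edges):
    """Standard core number per node, ignoring edge direction and self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nb) for n, nb in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        # Peel the node with the smallest remaining degree
        node = min(remaining, key=degree.get)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nb in adj[node]:
            if nb in remaining:
                degree[nb] -= 1
    return core

def mean_random_core(background_edges, regulators, n_sample, n_iter=100, seed=0):
    """Average core number of each regulator over randomly sampled sub-networks.

    Assumed sampling scheme: pick `n_sample` regulators at random and keep
    the background edges that originate from them.
    """
    rng = random.Random(seed)
    regs = sorted(regulators)
    totals, counts = defaultdict(float), defaultdict(int)
    for _ in range(n_iter):
        chosen = set(rng.sample(regs, n_sample))
        sub = [(u, v) for u, v in background_edges if u in chosen]
        for node, c in core_numbers(sub).items():
            if node in chosen:
                totals[node] += c
                counts[node] += 1
    return {n: totals[n] / counts[n] for n in totals}

# Toy background network (hypothetical node names)
toy_edges = [("TF1", "g1"), ("TF1", "g2"), ("TF2", "g1"),
             ("TF2", "g2"), ("TF1", "TF2"), ("miR1", "g1")]
observed = core_numbers(toy_edges)
expected = mean_random_core(toy_edges, {"TF1", "TF2", "miR1"}, n_sample=3, n_iter=5)
```

Plotting `observed` against `expected` for each regulator would give a scatter of the kind described in the captions. A regulator whose observed core number departs clearly from its random-network average is unlikely to owe its position to chance.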

Table 4.

Literature validation scores for various ranking strategies (lower scores are better).

Fig 8.

Comparison of gene expression measurements in two repeats of the same biological experiment.

The scatter plot shows the measured fold changes of gene expression in E2 vs. Control treated MCF-7 cells in the GSE11324 (x-axis) and GSE51403 (y-axis) experiments. Only genes that passed an FDR cutoff of 0.001 (a) or 0.05 (b) are shown. Although both datasets are of high quality, the absolute values of fold change are only moderately correlated. However, the direction of fold change is consistent for most of the genes.

Table 5.

Performance of explaining gene expression in E2 vs. control treated MCF-7 cells using core regulators identified by various ranking strategies.

Three different mathematical or AI models were used for modeling gene expression: linear regression (LR), support vector machines (classification, SVC, and regression, SVR) and principal component analysis (PCA). Performance was measured in 5-fold cross-validation as the area under the ROC curve (AUROC) for real-valued estimators and as the Matthews correlation coefficient (MCC) for binary classifiers.
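For reference, the two performance measures can be computed as in the sketch below. It assumes binary labels coded as 0/1 (for instance down- vs. up-regulated genes, which is an assumption about the label encoding, not a detail stated here). The function names are hypothetical, and the pairwise AUROC formulation is chosen for clarity rather than speed.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels coded as 0/1; returns 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: four genes, true direction vs. model output
labels = [1, 1, 0, 0]
mcc = matthews_corrcoef(labels, [1, 0, 0, 0])
auc = auroc(labels, [0.9, 0.4, 0.5, 0.1])
```

AUROC is threshold-free and therefore suits the real-valued outputs of LR, SVR and PCA, whereas MCC summarizes the confusion matrix of a hard classifier such as SVC.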

Fig 9.

Hierarchical organization of all regulatory molecules, including 106 TFs and 58 miRNAs, in the MCF-7 estrogen response GRN using the K-core algorithm.

TF and miRNA nodes are represented by rectangles and diamonds, respectively. Nodes are colored red or green depending on whether the molecule’s expression is up- or down-regulated in E2 vs. Control cells. The hierarchy is based on the principle of network centrality: nodes that are more important for the flow of regulatory information lie closer to the core. Nodes in core 1 (Myc) are the most central, followed by nodes in cores 2, 3, and so on in decreasing order of centrality. Some cores have been grouped together for ease of visualization.

Table 6.

Gene expression classification in the MCF-7 estrogen response GRN using various selections of regulatory nodes based on their core numbers, K, in K-core hierarchy.

In the top half of the table, the innermost core regulators (K ≤ 2) are always included and the cumulative effect of adding further core regulators is measured. In the bottom half of the table, the innermost core regulators (K ≤ 2) are excluded in order to measure the individual contributions of regulators at various core levels. Classification accuracy is reported as the area under the ROC curve (AUROC) for real-valued classifiers (LR, SVR and PCA) and as the Matthews correlation coefficient (MCC) for binary classifiers (SVC).

Table 7.

Literature validation in terms of the average rank of genes in various cores of the K-core hierarchical organization of MCF-7 estrogen response GRN (lower scores are better).

Table 8.

Performance of gene expression classification in the MCF-7 estrogen response GRN with and without the inclusion of miRNAs in the list of regulators.

Each row of the table represents a different selection of regulatory nodes based on their core number, K, in the hierarchy produced by K-core. Classification accuracy is reported in terms of the area under the ROC curve (AUROC) for LR and SVR.

Fig 10.

Overlap of miRNA-mRNA interactions predicted in silico by various tools.
