CBNA: A control theory based method for identifying coding and non-coding cancer drivers

A key task in cancer genomics research is to identify cancer driver genes. Because these genes initiate cancer and drive its progression, understanding them is critical to designing effective cancer interventions. Although several methods have been developed to discover cancer drivers, most of them identify only coding drivers. However, non-coding RNAs can also regulate driver mutations and so contribute to cancer development. Hence, novel methods are required to reveal both coding and non-coding cancer drivers. In this paper, we develop a novel framework named Controllability based Biological Network Analysis (CBNA) to uncover coding and non-coding cancer drivers (i.e. miRNA cancer drivers). CBNA integrates different genomic data types, including gene expression, gene network, and mutation data, and consists of a two-stage process: (1) building a network for a condition (e.g. the cancer condition) and (2) identifying drivers. The application of CBNA to the BRCA dataset demonstrates that it is more effective than the existing methods in detecting coding cancer drivers. In addition, CBNA predicts 17 miRNA drivers for breast cancer. Some of these predicted miRNA drivers have been validated by the literature, and the rest are good candidates for wet-lab validation. We further use CBNA to detect subtype-specific cancer drivers, and several predicted drivers have been confirmed to be related to breast cancer subtypes. Another application of CBNA is to discover epithelial-mesenchymal transition (EMT) drivers. Of the predicted EMT drivers, 7 coding and 6 miRNA drivers are in the known EMT gene lists.

The first approach to rank candidate coding cancer drivers is based on mutation count. The second approach is based on (i) the mutation frequency and spectrum of patients, and (ii) the mutation rates of genes incorporating expression level and replication time, as the MutSigCV method does [1]. Another approach to rank cancer drivers predicted by CBNA is based on the functional impact of mutations, as OncodriveFM does [2]. We compare these approaches and the result is illustrated in Fig. A. It can be seen that the performance of CBNA using mutation count is better than that of the other methods in all cases.

2 Breakdown of known breast cancer driver genes

There are 80 known breast cancer genes in the well-curated set, which are obtained from [3], [4], [5], and [6]. Of these 80 genes, 18 are filtered out by the PPI and miRNA databases. Of the remaining 62 genes, 19 are critical nodes, 8 are redundant nodes, and 35 are ordinary nodes in the network created by CBNA.

The miRNA-TF-mRNA network of the normal state obtained by CBNA consists of 7,726 nodes and 31,053 directed edges. We apply network control [8] to evaluate the controllability of the network by identifying its MDNS. We find that the size of the MDNS, denoted as N_D, is 2,680, accounting for 45.5% of nodes. We then classify nodes as critical, ordinary, or redundant based on the change of N_D upon their removal. In the miRNA-TF-mRNA network of the normal state, 11.2% of nodes are critical, 34% are ordinary, and the remaining 54.8% are redundant (Fig. C(A)). We find that critical nodes have higher in-degrees/out-degrees than ordinary and redundant nodes, which can be seen in the average in-degree/out-degree and the cumulative in-degree/out-degree distribution of nodes in Fig. C(B) and Fig. C(C).

5 Epithelial-mesenchymal transition drivers

We apply CBNA to the BRCA dataset to identify epithelial-mesenchymal transition (EMT) drivers. These drivers are expected to drive the transition from the epithelial state to the mesenchymal state in breast cancer patients.
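The driver search relies on the node classification described earlier for the normal-state network: compute N_D, remove one node, and see how N_D changes. The sketch below illustrates that idea on a toy directed graph. It is not CBNA's implementation. It assumes the standard structural-controllability result that N_D equals the number of nodes minus the size of a maximum matching (with a floor of 1), and it assumes that "critical" means N_D increases on removal, "redundant" means it decreases, and "ordinary" means it is unchanged. The function names and the toy graph are hypothetical.

```python
def max_matching_size(nodes, edges):
    """Maximum matching size in the bipartite (out-copy -> in-copy) view
    of a directed graph, found with simple augmenting paths."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        if u in succ and v in succ:
            succ[u].append(v)
    match_in = {}  # in-copy v -> out-copy u currently matched to it

    def augment(u, seen):
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current partner can be re-routed.
            if v not in match_in or augment(match_in[v], seen):
                match_in[v] = u
                return True
        return False

    return sum(augment(u, set()) for u in nodes)


def n_driver(nodes, edges):
    """Assumed formula: N_D = max(N - |maximum matching|, 1)."""
    nodes = list(nodes)
    if not nodes:
        return 0
    return max(len(nodes) - max_matching_size(nodes, edges), 1)


def classify_nodes(nodes, edges):
    """Label each node by how N_D changes when that node is removed
    (label semantics are an assumption, see the lead-in)."""
    nodes = list(nodes)
    base = n_driver(nodes, edges)
    labels = {}
    for x in nodes:
        rest = [n for n in nodes if n != x]
        kept = [(u, v) for u, v in edges if x not in (u, v)]
        nd = n_driver(rest, kept)
        if nd > base:
            labels[x] = "critical"
        elif nd < base:
            labels[x] = "redundant"
        else:
            labels[x] = "ordinary"
    return labels


# Toy graph with hypothetical node names: a -> b, b -> c, b -> d
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("b", "c"), ("b", "d")]
toy_labels = classify_nodes(toy_nodes, toy_edges)
```

On this toy graph N_D is 2; removing b leaves three isolated nodes (N_D = 3), so b is labelled critical, while removing c or d lowers N_D to 1. The recursive augmenting-path search is adequate for small examples only; a network with thousands of nodes would call for an iterative algorithm such as Hopcroft-Karp.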

To validate the result, we rank the predicted EMT drivers by their node degree (i.e. the number of edges of a node) in the network of the mesenchymal condition, which is built by our method. The higher a driver's degree, the higher it ranks in the driver list. We then validate the top 100 predicted coding EMT drivers against mesenchymal genes in EMT signatures [9] and the 17 predicted miRNA EMT drivers against pro-mesenchymal miRNAs in EMT miRNAs [10]. There are 7 predicted coding EMT drivers and 6 predicted miRNA EMT drivers in these known EMT gene lists.
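The ranking and validation step above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the network is available as a list of directed edges, counts degree as in-degree plus out-degree, and breaks ties alphabetically (the text does not say how ties are handled). The function names and the toy identifiers `g1` to `g4` are hypothetical.

```python
from collections import Counter


def rank_by_degree(candidates, edges):
    """Sort candidate drivers by node degree, highest first.
    Degree counts every edge touching the node (in plus out)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Ties are broken alphabetically; this tie rule is an assumption.
    return sorted(candidates, key=lambda g: (-degree[g], g))


def validated_drivers(ranked, known, top_k):
    """Return the top_k ranked candidates that appear in a known list."""
    known = set(known)
    return [g for g in ranked[:top_k] if g in known]


# Toy data with hypothetical identifiers
toy_edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g4", "g1")]
ranked = rank_by_degree(["g4", "g3", "g1", "g2"], toy_edges)
hits = validated_drivers(ranked, known={"g2", "g4"}, top_k=2)
```

Here `g1` has degree 3, `g2` and `g3` have degree 2, and `g4` has degree 1, so the top two candidates are `g1` and `g2`, of which only `g2` is in the known list.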

Fig. D. Overlaps between candidate EMT drivers and known EMT genes.
The chart shows the significant overlaps of (A) the top 100 candidate coding EMT drivers and mesenchymal genes, and (B) candidate miRNA EMT drivers and pro-mesenchymal miRNAs.
In assessing the significance of these overlaps, N denotes the number of all genes of interest, K is the number of confirmed drivers (i.e. genes in the known EMT gene lists), M is the number of estimated drivers, and n is the number of drivers validated.
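The formula that uses N, K, M, and n does not survive in this text. These four quantities are exactly the parameters of a hypergeometric test, so the sketch below assumes that test was intended and computes the probability of seeing at least n validated drivers when M genes are drawn at random from N genes of which K are confirmed. The function name and the example numbers are hypothetical and are not taken from the paper.

```python
from math import comb


def overlap_p_value(N, K, M, n):
    """Upper-tail hypergeometric probability P(X >= n), where X is the
    number of confirmed drivers among M genes drawn without replacement
    from N genes, K of which are confirmed.
    Assumes 0 <= K <= N and 0 <= M <= N."""
    upper = min(K, M)  # the overlap cannot exceed either set size
    tail = sum(comb(K, i) * comb(N - K, M - i) for i in range(n, upper + 1))
    return tail / comb(N, M)


# Hypothetical numbers for illustration only
p_example = overlap_p_value(N=10, K=3, M=4, n=1)
```

With these toy values the result is 1 - C(7,4)/C(10,4) = 1 - 35/210, about 0.833. A small p-value on real data would indicate that the overlap is larger than expected by chance.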

The result is illustrated in Fig. D.