m6A-Driver: Identifying Context-Specific mRNA m6A Methylation-Driven Gene Interaction Networks

As the most prevalent mammalian mRNA epigenetic modification, N6-methyladenosine (m6A) has been shown to possess important post-transcriptional regulatory functions. However, the regulatory mechanisms and functional circuits of m6A are still largely elusive. To help unveil the regulatory circuitry mediated by mRNA m6A methylation, we develop here m6A-Driver, an algorithm for predicting m6A-driven genes and associated networks, whose functional interactions are likely to be actively modulated by m6A methylation under a specific condition. Specifically, m6A-Driver integrates the PPI network and the predicted differential m6A methylation sites from methylated RNA immunoprecipitation sequencing (MeRIP-Seq) data using a Random Walk with Restart (RWR) algorithm and then builds a consensus m6A-driven network of m6A-driven genes. To evaluate the performance, we applied m6A-Driver to build the context-specific m6A-driven networks for 4 known m6A (de)methylases, i.e., FTO, METTL3, METTL14 and WTAP. Our results suggest that m6A-Driver can robustly and efficiently identify m6A-driven genes that are functionally more enriched and associated with higher degree of differential expression than differential m6A methylated genes. Pathway analysis of the constructed context-specific m6A-driven gene networks further revealed the regulatory circuitry underlying the dynamic interplays between the methyltransferases and demethylase at the epitranscriptomic layer of gene regulation.

Introduction Methylation, as a significant epigenetic modification of nucleic acids, regulates gene expression, influences grows and development of plants and animals, and is closely related to the occurrence and development of disease. The epigenetic regulatory mechanisms and physiological functions of DNA methylation have been well established through intensive studies in simple model organisms to human in the past decade [1][2][3]. However, RNA methylation, even though prevalent in many organisms, has long been considered to have little functional relevance. The discovery of obesity-associated FTO as a demethylase [4] of mRNA N6-methyladenosine (m 6 A) revealed that mRNA m 6 A methylation can be reversed and is thus a highly dynamic phenomenon. This discovery sparked the surged interests in study the prevalence of m 6 A in different cells and the functions of m 6 A. Subsequently, using methylated RNA immunoprecipitation sequencing (MeRIP-seq) technique [5][6][7], transcriptome-wide distribution of m 6 A in mammalian cells was profiled [6,7], revealing for the first time a widespread occurrence of m 6 A in >25% transcripts. m 6 A was also shown to be enriched around the stop codon of RNA transcripts and conserved between people and mouse [6,7], implicating a potential role played by m 6 A in post-transcriptional regulation [6,8,9]. Since then, m 6 A has been shown to have a number of important biological functions, including promoting RNA degradation [10], regulating RNA stability by modulating binding of RNA binding proteins [6,11,12], and controlling translation efficiency [13][14][15][16][17]. Meanwhile, the identification of m 6 A methyltransferases and demethylases [4,[18][19][20] further revealed the regulators of epitranscriptome. We now know that the m 6 A methyltransferase complex consists of METTL3, METTL14, and WTAP and functions as m 6 A "writers" in eukaryotes [9,18,21]. In contrast, FTO and ALKBH5 are identified to be de-methyltransferase, or m 6 A "erasers" [4,9,22], indicating that mRNA m 6 A methylation is a dynamic process [4] and directly regulated by a number of methylases and demethylases [23]. Knockdown studies of these (de)methylases further revealed their involvement in many significant physiological processes including obesity [24][25][26], synaptic signaling [27], cancer [28,29], sperm development [22], stem cell differentiation [30], circadian periods [31], yeast meiosis [32,33], and stem cell pluripotency [34][35][36]. Although these studies together greatly improve our understanding of the reversible mRNA m 6 A methylation, the regulatory mechanisms and functional circuitry of m 6 A are still largely elusive.
Currently, MeRIP-Seq is the most widely adopted high throughput approach for measuring transcriptome-wide m 6 A methylation [6,7,37]. To obtain a transcriptome-wide m 6 A profile, MeRIP-Seq produces two sets of samples, i.e., IP and input samples. While IP samples include sequencing reads from m 6 A methylated RNA fragments pulled down with anti-m 6 A antibody, input samples measure the basal abundance of all transcripts, which are used as background for assessing the enrichment of methylated fragment. Detecting m 6 A methylation site or "peak detection" from MeRIP-Seq data can be achieved by comparing the enrichment of reads in the IP samples vs. those in the input samples. Several algorithms including exomePeak have been developed for m 6 A peak detection [38][39][40][41]. After the methylation sites are identified, differential m 6 A methylation (DmM) analysis can be also performed in a case-control study to further identify the dynamic m 6 A sites whose methylation status is significantly different under two experimental conditions. Algorithms such as exomePeak [42] and MeTDiff [43] have also been developed for this purpose. While peak detection and DmM analysis are essential steps for m 6 A bioinformatics analysis, they do not yet provide direct information about the functional relevance of m 6 A.
We focus in this paper on predicting m 6 A-driven genes (mDrGenes) and the m 6 A-driven gene interaction network (mDrNet). Specifically, we refer mDrGenes as genes whose mRNAs harbor DmM sites or differential m 6 A methylation genes (DmMGs), and thus may be under dynamic epitranscriptomic regulation and be functionally significant to the biological context of interest. Conceivably, when data is available, mDrGenes can be conveniently identified by first predicting the DmMGs and then assessing their functional significance by using functional networks such as Protein-Protein Interactions (PPI) network or biological pathways. However, identifying functionally significant DmMGs when there are replicates can be nontrivial. The challenge arises as a result of technical and biological bias, where significant DmMGs identified in some replicates might not be significant in other replicates. Existing algorithms for DmM analysis such as exomePeak and MeTDiff all devise different methods ranging from taking consensus DmM sites [42] to statistically modeling of replicate samples [43] to mitigate this bias. While they can help detect robust DmM sites, these DmM sites might not be functionally significant DmMGs. As our goal emphasizes on detecting functional significance, an approach that can address this bias in assessing functional significance is more desirable and likely to better identify the m 6 A-driven genes and network.
To address the aforementioned issue, we propose in this paper m 6 A-Driver, an algorithm that predicts mDrGenes by evaluating the consistency of RNA differential methylation from a functional network perspective. Specifically, rather than predicting DmMGs directly, m 6 A-Driver first performs DmM analysis on every possible replicate set (RS) independently, where each RS includes two IP-input pairs, one from the treated/case condition and the other from the untreated/control condition. Then, a DmM functional network is constructed for each RS by searching the significant interactions with DmMGs in PPI network using a Random Walk with Restart (RWR) algorithm. We adopt PPI network here to model functional interactions of m 6 A mediating genes because m 6 A has been shown to regulate the process of translation [13][14][15][16][17], in addition to its influence on gene expression. Finally, a consensus m 6 A-driven gene network is built by taking all the significant reoccurring interactions. By assessing the consensus among RS networks as opposed to RS DmMGs, m 6 A-Driver effectively addresses the sample bias that impacts functional prediction. m 6 A-Driver was applied to four case-control studies that investigate the functions of the component of methyltransferase complex (METTL3, METTL14, and WTAP) and demethylase (FTO). In the end, m 6 A-driven gene networks were constructed for each (de)methylase together with an integrated network for the complete m 6 A methyltransferase complex. We showed that the predicted m 6 A-driven genes have higher degree of differential expression and more explicit functional relevance than DmMGs identified directly by previous approaches. These results demonstrate the effectiveness of m 6 A-Driver in prioritizing functional significant m 6 A-driven genes from m 6 A sequencing data.

An overview of the m 6 A-Driver algorithm
The algorithm of m 6 A-Driver consists of four steps, depicted in Fig 1, with the first three steps implemented in each RS and step 4 performed to combine the results from all RSs. In step 1, Each DmMG is mapped to the PPI network and then serves as the starting node to initiate the RWR and its top accessible nodes are held as candidate genes. (C 1 ) For a DmMG with specific degree, we retrieve the normalized degree distribution of its m 6 A-Driver: Identifying mDrNet exomePeak [42] is applied to detect DmMGs in each RS. In step 2, for each RS, the Random Walk with Restart (RWR) is performed using every DmMG as the seed node separately to search for their closely interacting genes in the PPI network. In step 3, the topological and biological significance of these DmM interacting genes are assessed and the genes that are determined to be insignificant are filtered out. The topological significance is estimated by their occurrence as top nodes prioritized by the same RWR algorithm in 100 random networks generated with the same topological structures. Meanwhile, the biological significance is evaluated by the length of their shortest path to the initial node (the seed DmMG). RS-specific DmM interacting networks, each consisting of significant interacting genes, are constructed at the end of step 3. Finally, in step 4, an mDrNet is constructed by assessing the interaction recurrence across all RSs. The genes that make up the nodes of the mDrNet are predicted mDrGenes determined by exomePeak. In this way, we extract a set of mDrGenes, or functionally relevant genes driven by m 6 A and a network that depicts the functional relationship of mDrGenes.

An overview of the data
We applied m 6 A-Driver on 4 MeRIP-seq datasets, i.e., FTO knockdown dataset (KD-FTO) [23], METTL3 knockdown dataset (KD-METTL3), METTL14 knockdown dataset (KD-METTL4), and WTAP knockdown dataset (KD-WTAP) [44]. KD-FTO dataset is obtained from [27] that profiles m 6 A in FTO gene knockdown mice and their wild-type littermate. There are 12 samples (3 IP replicates paired with 3 input replicates for FTO knockdown mice and 3 IP replicates paired with 3 input replicates under wild-type littermate). It was divided into 9 sets of biological replicates and each biological replicate set (RS) contains two IP samples respectively from a FTO knockdown mouse and a wild-type (WT) littermate and two corresponding input samples from the two mice.
KD-METTL3, KD-METTL14 and KD-WTAP datasets are from a recent study, which shows that m 6 A regulates mRNA stability [44]. Each dataset contains 8 samples, 2 IP replicates paired with 2 input replicates from the knockdown HeLa cells and 2 IP replicates paired with 2 input replicates from untreated HeLa cells. Similar to KD-FTO, samples in each of the three datasets are then divided into 4 RSs, each of which contains two IP samples from the knockdown HeLa cells and untreated HeLa cells respectively and two corresponding input samples.
We first predicted the DmM sites in each dataset using exomePeak. As the technical limitation of MeRIP-Seq can lead to high sample bias, making the prediction results less reliable, we then set out to check the quality of the prediction results. First of all, the specificity of the predictions by exomePeak and MeTDiff on these datasets has been evaluated in a previously published paper [43], which shows that the false positive rates for all these datasets can be controlled and there are high specific DmM sites predicted in all these datasets. Next, we further examined the predictions of the three m 6 A methylase knockdown datasets, where we created a set of pseudo control and pseudo knockdown sequencing samples by scrambling the samples of a dataset so that the pseudo control and knockdown samples are both made up by a real control replicate and a real knockdown replicate. We then performed exomePeak on both the real dataset and the pseudo dataset and examined the prediction specificity at different top accessible nodes in each of the 100 random networks, which are generated with the same degree distribution with the original network. The normal occurrence frequency of certain degree indicates the probability that a node with this degree to be selected by chance. (C 2 ) The candidate genes are assessed by their topological and biological significance. Candidate genes that are not observed by random chance, i.e., p<0.05, or are biological significant, i.e., L = 1, are retained as significant candidate genes. (D) A consensus m 6 A-driven gene network is constructed by interactions recurring across all RSs.
thresholds by comparing the ratio of predicted DmM sites (or reported true positive rate, RTPR) in the real dataset and those in the pseudo dataset (or false positive rate, FPR) using a ROC-like curve. As is shown in S1 Fig, the percentage of DmM sites in the real datasets is much higher than that in the pseudo dataset at different thresholds in all the 3 datasets. Taken together, these results demonstrate that the false positive rates in these datasets can be controlled and the exomePeak prediction results are of good specificity for subsequent analysis.
The reference network, PPI network, is built from the most recent version of PPI data from BioGRID (release 3.4.128, compiled on August 25th, 2015) [45]. Based on the binary interactions, we removed the isolated proteins and self-interaction proteins to establish a PPI network with a total of 16,062 proteins and 152,676 interactions. m 6 A-Driver filters candidate DmMGs in a more robust and efficient way Jia et al. have proposed the VarWalker algorithm [46] to combine PPI network and mutation data identified by next-generation sequencing (NGS) to build consensus networks for identifying cancer driver genes. While VarWalker was proposed for predicting driver mutations, it provides a general framework for prioritizing target genes from high-throughput sequencing data assisted by PPI network. VarWalker evaluates candidate target genes (i.e., mutation gene in cancer or DmMG in this work) by assessing their topological significance using random networks which hold the same degree distribution with the PPI network. However, utilizing only the topological characteristics may remove functionally significant candidate target genes. Also, the filtering result is not steady because it will remove different candidate genes for the same target gene when using different random networks. That is, VarWalker is not robust enough. We propose in this paper an improved strategy to evaluate both the topological and functional significance of candidate DmMGs in a more robust and efficient way, and the approach is detailed in the Materials and Methods section.
To compare the robustness and efficiency of m 6 A-Driver and VarWalker in filtering candidate genes, we applied the two methods on 100 genes randomly selected from the DmMGs in KD-METTL3 dataset to filter their candidate genes using two different sets of random networks. Each set contains 100 networks which hold the same topological property of PPI network. A more robust algorithm should remove a consistent set of genes in two random network sets. As is shown in Table 1, m 6 A-Driver only removed 1 different candidate gene when using different random network sets, whereas VarWalker removed 40 different candidate genes. This result demonstrates that m 6 A-Driver is more robust in filtering candidate genes. It is not surprising to also notice that some of the removed genes by VarWalker have significant biological functional connections with the seed (the DmMG) in the PPI network. Moreover, m 6 A can filter candidate genes in a more efficient way. VarWalker needs to perform RWR for each DmMG in each of the 100 random networks to compute the reoccurrence frequency of the candidate genes for calculating the p-value. In contract, m 6 A-Driver only needs the degree of a candidate gene and the degree of the seed gene to calculate its empirical p-value. The predicted mDrGenes closely interact with each other in the m 6

Characteristics of the predicted mDrGenes
We next examined the characteristics of the predicted mDrGenes. We first investigated the differential methylation of the mDrGenes. An mDrGenes is defined as a hyper mDrGenes if its most differentially methylated site is hyper-methylated, but otherwise defined as a hypo mDrGenes if its most differentially methylated site is hypo-methylated. We counted the number of hyper and hypo mDrGenes (Fig 2). As expected, mDrGenes in KD-FTO are mostly hyper-methylated, whereas those in three other methylase knockdown datasets are more hypo-methylated. This result is consistent with the fact that FTO is an m 6 A demethylase, but METTL3, METTL14, and WTP are elements of m 6 A methyltransferase complex. We also calculated the average number of DmM sites for per gene and found that on average, an mDrGene harbors more than one DmM sites ( Table 2). It is interesting that mDrGenes in KD-FTO harbor more DmM sites than the other 3 datasets and KD-METTL3 mDrGenes harbor the least number of DmM sites on average. We then investigated the DmM site distribution using the Guitar R/Bioconductor package [47] in an mDrGene transcript (Fig 3). Overall, the distributions for the 4 datasets are very similar, where DmM sites are mostly enriched around the stop codon and are distributed more in 3'UTR and CDS, which is consistent with the reported results in the literature [6,7]. Furthermore, we obtained the sequence motifs of DmM sites in mDrGenes for each of the four datasets using MEME-ChIP webserver [48] (Fig 4). The reported RRACH m 6 A motifs [6,7] was top ranked in KD-FTO and KD-METTL3, whereas the most enriched motifs in KD-METTL14 and KD-WTAP are similar to the binding motifs of SRSF1 and SRSF9. Interestingly, SRSF1 and SRSF9 are components of the SRSF protein that is involved in splice site selection in alternative splicing [49].

mDrGenes are more functionally significant
We asked if mDrGenes are more functional relevant. To test this, we examined the functionally significance between mDrGenes and DmMGs predicted by exomePeak. We performed GO [50] enrichment analysis using DAVID (Database of Annotation, Visualization and Integrated Discovery) [51] and then compared the enrichment degrees of the top enriched biological processes (BP, Fig 5). Since a larger testing gene set tends to lead to a smaller enriched p-value when performing DAVID, to make the comparison fair, we balanced the scale of mDrGenes and DmMGs before enrichment analysis. For KD-FTO dataset, the scale of mDrGenes is larger (Fig 6), so we randomly removed some mDrGenes to make the scales the same and then performed the enrichment analyses for 10 times to calculate an average p Bonferroni for each enriched term. After also performing enrichment analysis on DmMGs, we compared the p Bonferroni of top 20 enriched terms for DmMGs and scaled mDrGenes. Since, DmMGs have a larger scale for KD-METTL3, KD-METTL14 and KD-WTAP datasets, to balance the scale, we selected DmMGs that harbor top differently methylated DmM sites. Then, we performed enrichment analyses and compared the p Bonferroni of top 20 enriched terms for the mDrGenes and scaled DmMGs. The result shows that mDrGenes are more significantly enriched than DmMGs in all the top enriched biological processes, demonstrating that mDrGenes are more functional relevant than DmMGs.
To further investigate the biological significance of mDrGenes, we evaluated the differential expression (DE) of mDrGenes and DmMGs. Conceivably, a gene set is likely to be more  functionally important if it has more differentially expressed genes (DEGs) and/or its DEGs are more differentially expressed. To this end, we applied DESeq2 [52] to the input replicates of treated and untreated samples and determined a gene to be DEG if the adjusted p-value is less than 0.05. We first examined the percentage of DEGs in mDrGenes and DmMGs in the four datasets. We found that there are very few DEGs in mDrGenes and DmMGs for both KD-FTO and KD-METTL14 dataset, and thus the percentages of DEGs in mDrGenes and DmMGs are very low for these two datasets. Not surprisingly, no significant differences between the percentages of DEGs in mDrGenes and DmMGs can be observed (Fisher's test, see Table 3 for details). In contrast, much more mDrGenes and DmMGs are differential expressed in KD-METTL3 and KD-WTAP, and the percentages in mDrGenes are significantly higher than those in DmMGs (Fisher's test, see Table 3 for details). We next compared the degree of DE, which is represented by the negative log10 (FDR) calculated by DESeq2 (Fig 7).
The result shows that the DE degrees of mDrGenes are also higher than those of DmMGs. The only exception is the FTO KD experiment, in which there are nearly no differential expression genes. Taken together, we can conclude that mDrGenes are likely to include more DEGs than DmMGs and their degree of DEs are also likely to be higher.

mDrGenes participate in several important biological processes
Functional enrichment analyses were carried out on the 4 mDrNets to help reveal the biological processes regulated by the 4 enzymes at epitranscriptomic layer of gene regulation. The results obtained using DAVID reveal a significant enrichment of multiple m 6 A-related pathways annotated by either the Kyoto Encyclopedia of Genes and Genomes (KEGG) [53] or Gene Ontology (GO) biological process (BP) domains (Figs 8-11).
For KD-FTO, since FTO is a demethylation enzyme, we expected to observe mainly hypermethylation. However, m 6 A-Driver did report several hypo-methylation mDrGenes, suggesting a potentially direct or indirect mode of FTO regulation to also enhance m 6 A. We next examined the functional relationship between the hyper-and hypo-methylated genes and found that there was little overlapping between their enriched functions (Fig 8 and Fig 9, see  S6 Fig and S7 Fig for detail). To keep consistency with paper [27], in which the KD-FTO data is published, we adopt the whole reference genome as the control data set of enrichment analysis. The hyper-methylated mDrGenes are clearly linked closely to neurological processes and neuro signaling pathways. Several significantly enriched terms annotated by GO BP are synapse and neuron signaling transmission (132 hyper mDrGenes, p Bonferroni = 1.79×10 −18 ), synaptic transmission (43 hyper mDrGenes, p Bonferroni = 1.77×10 −13 ), transmission of nerve impulse (50 genes in hyper mDrGenes, p Bonferroni = 5.58×10 −14 ).They are also likely associated with neuron differentiation (65 hyper mDrGenes, p Bonferroni = 1.81×10 −11 ) and neuron development (53 hyper mDrGenes, p Bonferroni = 3.00×10 −11 ) as well as embryonic development (54 , which may be another proof of RNA methylation involved in steering stem cell pluripotency [34][35][36]. In contrast, the hypo-methylated mDrGenes are more related to metabolic processes (79 hypo mDrGenes in protein catabolic process, p Bonferroni = 2.41×10 −18 and 85 hypo mDrGenes in macromolecule catabolic process, p Bonferroni = 1.88×10 −17 ) and cell cycle (65 hypo mDrGenes, p Bonferroni = 9.29×10 −12 ). In addition, the hypo mDrGenes are enriched in Spliceosome (17 hypo mDrGenes, p Bonferroni = 8.25×10 −4 ), which is also a KEGG term enriched in KD-METTL3, KD-METTL14 and KD-WTAP data (Fig 11), implicating a potential role of m 6 A in mRNA splicing. Note that WTAP itself is also splicing factor. However, this result suggests that WTAP might also regulate splicing in an m 6 A dependent fashion. Taken together, our predicted mDrGenes confirm the demethylation role of FTO but may suggest a direct or indirect role of FTO in promoting m 6 A. Functional enrichment suggests that these two modes of FTO function are involved in distinct biological processes and pathways.
Another interesting finding is that both hyper and hypo mDrGenes are enriched in cancer related pathways including Chronic myeloid leukemia (20 hyper mDrGenes, p Bonferroni = 3.27×10 −6 ; 12 hypo mDrGenes, p Bonferroni = 2. As FTO-KD data is extracted from mouse brain, so we also do enrichment analysis using the brain tissue specific expressed genes as control data to check whether the pathways we find above are really influenced by m 6 A methylation or by tissue specific expression. Brain tissue  We can see that exomePeak predicts more genes in the three methylase knockdown datasets but m 6 A-Driver can find genes missed by exomePeak due to biological variance.  To further study the dynamics of m 6 A methylation, we applied the enrichment analysis to mDrGenes predicted in KD-METTL3, KD-METTL14 and KD-WTAP and compared the similarity and difference of their enriched GO biological processes (Fig 10, and also see S10 Fig for more details) and KEGG pathways (Fig 11, and also see S11 Fig for more information). In this case, we chose to perform enrichment on all mDrGenes in these three datasets instead of analyzing hyper-and hypo-methylated mDrGenes separately, because there are little hyper mDrGenes in these 3 datasets. There are significant overlapping biological processes among these 3 sets of mDrGenes, but also exist enzyme specific functions (Fig 10).    What is interesting is that the overlapping functions between METTL3 associated mDrGenes and WTAP associated mDrGenes are positive regulations of metabolism and gene expression. In contrast, the overlapping functions between METTL3 targeted mDrGenes genes and METTL14 mDrGenes are mainly negative regulation of metabolism and gene  There are significant overlapping pathways between the three enzymes targeted m 6 A-driven genes and also enzyme specific functions. Two important consistent pathways are cancers and splicing which indicates m 6 A may regulate these pathways through m 6 A-driven genes.
Comparison of top KEGG pathways enriched in the three mDrGenes sets also revealed common as well as methylase specific functions (see Fig 11). Particularly, the consensus func- , suggesting that these mDrGenes may be used as biomarkers of glioma. We also notice that METTL3 mDrGenes are specifically enriched in Melanoma (18 mDrGenes, p Bonferroni = 1.32×10 −4 ). A recent study have demonstrated that mutations within intron 8 of FTO leads to increased melanoma risk [29], suggesting a link between m 6 A and melanoma.
To help reveal the pathways potentially relevant to different modes of m 6 A functions, we checked the overlaps between the enriched pathways in hyper and hypo mDrGenes in KD-FTO and mDrGenes in KD-METTL3, KD-METTL14 and KD-WTAP (S12 and S13 Figs). All 5 groups of mDrGenes are enriched in cell cycle and pathways in cancer, including especially Chronic myeloid leukemia. This further suggests that m 6 A is related to cancer. The overlapping pathways between the hyper-mDrGenes in KD-FTO and those in METTL3/ METTL14 are mainly related to transcription including regulation of transcription and regulation of gene expression. Indeed, it has been shown that m 6 A recruits YTHDF2 protein to regulate mRNA stability [54]. In contrast, the overlapping pathways between the hypo-mDrGenes in KD-FTO and those in METTL3/METTL14 are related RNA splicing. Interestingly, nuclear m 6 A-binding protein YTHDC1 is shown to promote exon inclusion of targeted mRNAs through facilitating mRNA binding of splicing factor SRSF3 [55,56].
We further examined the mDrNets and their subnetworks associated with the enriched biological processes. Several sub-mDrNets for KD-FTO including intracellular signaling cascade, synaptic transmission BP category, and transmission of nerve impulse, are shown in Fig 12. They are consistent with our hypothesis that mDrGenes in the same biological process are interacted with each other closely. That is also the reason why m 6 A-Driver can identify mDrGenes that might not be consistently identified in most RSs due to the biological variance, but have significant biological functions. This underscores the advantage of m 6 A-Driver in addressing variance among different replicates for predicting mDrGenes. For KD-METTL3, KD-METTL14 and KD-WTAP, since they form the m 6 A methyltransferase complex, we integrated the 3 mDrNets and examined the subnetworks associated with the enriched pathways (Fig 13). Similar to KD-FTO, the mDrGenes enriched in the same pathway are closely connected and many mDrGenes undetected as differential m 6 A genes in all RSs are also identified. What is also interesting to notice is that the enriched pathways common in three datasets could be resulted from very different mDrGenes for each dataset, suggesting the 3 m 6 A methylases collaboratively regulate the same pathway through different mechanisms.
Because many m 6 A sites are detected in 3'UTRs that also contain microRNA binding sites. It will be helpful to further examine if the mDrGenes are also enriched in certain microRNA with significant functions. To test this, we download the microRNA-target information from miRTarBase, a database curates experimentally validated microRNA-target interactions [57]. Then we performed Fisher's test to test whether the m 6 Table 2). The information of all targeted mDrGenes by micro-RNA is included in supplementary material (S1-S4 Texts).

Discussion
The MeRIP-seq technology significantly advances the study of m 6 A, enabling profiling m 6 A methytranscriptome for specific cell conditions. However, existing algorithms focus mostly on predicting m 6 A sites from MeRIP-seq data. Although they are powerful tools for MeRIP-seq data analysis, they cannot directly assess the functional importance of these sites and associated genes. To address this shortcoming, we proposed in this paper m 6 A-Driver, a novel algorithm for detecting m 6 A-driven genes and their interaction network. m 6 A-Driver utilizes proteinprotein interaction networks to identify functional meaningful differentially m 6 A methylated genes and overcomes the biases in predicting functional enrichment of sites due to different sample replicates. The comparison on the p-values of top enriched biological processes in the prediction results of m 6 A-Driver and exomePeak demonstrates that m 6 A-Driver could identify mDrGenes that are more functional relevant. In terms of the algorithm, comparison with VarWalker, an algorithm for predicting mutation driver genes, shows that m 6 A-Driver is computationally more efficient and can produce topological and biologically more robust predictions. Furthermore, m 6 A-Driver generates a condition-specific m 6 A-driven network that reveals the detailed functional circuitry underlying the biological condition.
The results on the FTO, METTL3, METTL14 and WTAP knockdown data demonstrated that m 6 A-Driver can address the sample bias in MeRIP-Seq data and identify functional relevant mDrGenes in a robust and efficient fashion. m 6 A-Driver predicted several significant biological progresses and pathways associated with each knockdown dataset and constructed four mDrNet separately regulated by the four m 6 A (de)methylases. Functional enrichment analysis across the four networks showed the involvement of m 6 A in a diverse processes and pathways nodes denote hypo mDrGenes. The node size and edge width are corresponding to the frequency that they appear in different replicates sets. The number labeled on the edge means the times that it reappears in all RS sets. mDrGenes in the same biological process interact with each other closely, which is consistent with our hypothesis. For UNC13B, STX1A and GRID2, although they are not predicted as differential m 6 A genes in all RS, m 6 A-Driver still identify them as mDrGenes, because they have significant biological functions.
doi:10.1371/journal.pcbi.1005287.g012 m 6 A-Driver: Identifying mDrNet including regulating cancer, transcription, and slicing, many of which have been reported in the literature. The presented results in this paper demonstrate that m 6 A-Driver is an effective The orange nodes denote mDrGenes reappearing in at last two consensus networks. The yellow nodes denote mDrGenes specifically from KD-METTL3 mDrNet. The cyan nodes denote KD-METTL14 specific mDrGenes and the green ones denote WTAP specific mDrGenes. The node size and the edge width represent the frequency that they reappear in different replicates sets (RSs). Though enriched in the same pathways but the target mDrGenes are not all the same for the 3 methylation enzymes. Mean that, the 3 methylation enzymes may drive different genes to influence the same pathway. and reliable approach to identify functionally relevant m 6 A-driven genes and networks from MeRIP-Seq data.
We want to point out that the inherent technical limitations of MeRIP-Seq can lead to increased false positives in the predicted mDrGenes and their functional networks. As pointed out earlier in the paper, the sample bias and the impact of library "size factor" from different conditions can negatively impact the quality of the data. How to normalize MeRIP-Seq samples from different conditions is still an open topic that requires additional research. Many useful normalization methods such as the "geometric mean" approach proposed in DESeq can provide valuable guidance. Nevertheless, the results from this and other papers showed that with careful processing and modeling of the current MeRIP-Seq data, many important functions of m 6 A can be predicted. With the continuing improvement of the MeRIP-Seq and related technologies, we expect that m 6 A-Driver should produce increasingly more accurate predictions. In the current m 6 A-Driver algorithm, identification of mDrGenes relies on a reference network and a threshold to determine the candidate genes. So far, only PPI network is considered, and no reference networks for noncoding RNAs are included. As m 6 A is also prevalent in long noncoding RNAs (lncRNAs), constructing a noncoding RNA interaction network and extending m 6 A-Driver algorithm to lncRNAs will enable the study of m 6 A associated functions in lncRNAs. As a future work, we can use the score in step 4 to help construct the consensus network. For a DmMGs network of a biological replicate sets built by step 3, we can use peak calling score to transfer the network to a weighted network. The weight of an edge in the network could be calculated as w ¼ À lgðxÞÀ lgðyÞ 2 , where x and y denote the peak calling FDR (adjusted p-value) of the two gene nodes. Then all edges of the DmMGs networks are pooled together. For each edge, we can calculate the consensus score as s ¼

Materials and Methods
Datasets FTO knockdown (KD-FTO) dataset is a MeRIP-Seq data [27] from the wild-type littermate as well as FTO knockdown mice samples. There are 9 sets of biological replicates in this cohort, and each biological replicate set (RS) contains two IP samples from a FTO knockdown mouse and the wild-type littermate, respectively, as well as two corresponding input samples from the two mice samples. In the original work [27], Hess et al. demonstrated that FTO-knockdown mice have impaired dopamine release, reduced dopaminergic receptor responses and an altered locomotor response to cocaine, which are related to specific m 6 A mRNA demethylation regulated by FTO.
METTL3, METTL14 and WTAP knockdown MeRIP-Seq datasets are from a recent study which reveals that m 6 A regulates mRNA stability [44]. Each of the three cohorts consists of 4 RSs, each of which contains two IP samples from gene knockdown (treated) HeLa cells and untreated HeLa cells, respectively, and two corresponding input samples. The study revealed that knockdown of METTL3 led to decrease the binding of YTHDF2 to its targets, and increase the stability of its target RNAs similar to that of YTHDF2 knockdown.
The reference network, PPI network, is built from the most recent version of PPI data from BioGRID (release 3.4.128, compiled on August 25th, 2015) [45]. Based on the binary interactions, we removed the isolated and self-interactions proteins to establish a PPI network with a total of 16,062 proteins and 152,676 interactions.
Prediction of m 6 A-driven genes and the network using the m 6 A-Driver algorithm m 6 A-Driver predicts mDrGenes and mDrNet from MeRIP-Seq data, where mDrGene is a gene whose mRNA harbors at least one DmM site in a biological context of interest and whose function is relevant to the context. m 6 A-Driver first divides the MeRIP-Seq data into several RSs, each containing 2 paired samples, an IP sample paired with an input sample under the treated condition and another pair under the untreated condition. Then, m 6 A-Driver predicts mDrGenes and mDrNet by the following four steps. The workflow is shown in Fig 1. Step 1. RS-specific prediction of DmMG using exomePeak. This step is designed to predict DmMGs for each RS. A gene is identified as a DmMG if its mRNA harbors at least one hyper-or hypo-methylated m 6 A site in treated samples compared with untreated samples. The DmM sites of a RS are detected by the exomePeak R package [38], which predicts the transcriptome-wide m 6 A hyper-or hypo-methylated sites from the treated and untreated MeRIP-Seq datasets. At the end of this step, a set of DmMGs are obtained for each RS.
Step 2. Prediction of RS-specific candidate DmMGs with the Random Walk with Restart algorithm. The goal of this step is to predict potential DmMGs that not detected in a RS due to the biological variance. RWR simulates a random walker from either a seed node or a set of seed nodes, and moves to its immediate neighbors randomly at each step until it reaches a stable status. In the end, all the nodes in the graph are ranked by the probability of the random walker reaching this node, which denotes the degree of closeness to the seed node(s). Given a connected graph G with N nodes, RWR can be formally described as follows: where p t is an N×1 vector whose ith element represents the probability of the walker traversing to node i at step t, p 0 is the N×1 initial probability vector, M is an N×N transition matrix of the graph, which is the column-normalized adjacency matrix for G, and λ is a fixed parameter, which denotes the restarting probability at a given time step (λ = 0.5 in this study). Generally, if assuming that there are k initial genes from which the walker would start with an equal probability, the initial nodes will have a probability of 1 / k and the remaining nodes will have a probability 0 in p 0 , meaning that p 0 ¼ fp 0 m g ¼ & 1=k; start node 0; otherwise , where μ = 1,2,Á Á Á,n. Then, p t is updated according to (1) iteratively until the difference between p t and p t+1 is below a predefined threshold (10 −6 in this work). For each RS, we iteratively take each DmMG as the starting node, performing RWR to prioritize candidate DmMGs. According to the probability that the walker would reach the node after the RWR enters the stable state, nodes ranked top 10 are held as candidate DmMGs. Previous studies suggested various ways to select candidate nodes including using the most accessible node (i.e., top 1) [58,59], top 5 [60], top 10 [61,62], top 20 [62], and top 100 nodes [63], but no consensus rules have been proposed. In this work, we retained the top 10 accessible nodes to keep a balanced tradeoff between choosing too few informative genes (e.g. top 1) and too many irrelevant genes [46]. The threshold, top 10, is only an empirical primary filter of nodes to select only the most highly accessible nodes from the starting node (DmMG); the candidate DmMGs will be further assessed by their topological and biological significance in the next step.
Step 3. Topological and functional significance-based evaluation of the candidate DmMGs. The goal of this step is to extract functionally and topologically significant candidate DmMGs from the PPI network. To evaluate the topological significance of the candidate nodes in the PPI network, Jia et al. proposed a strategy that utilizes randomized PPI networks that maintain the topological characteristics of the original network (e.g., degree of each node) to estimate whether the candidate target genes defined as gene interact with target gene (i.e., DmMG identified in step 1 in this paper) are identified by chance [46]. To this end, 100 random PPI networks, each of which preserves the degree distribution of the original PPI network, are first generated through the switching algorithm proposed by Moli et al. [64]. Then, RWR is performed, using each of DmMG as a starting node, in each of the 100 random networks and the top 10 nodes with the highest probabilities are extracted. Finally, for each DmMG, the 10 candidate nodes in the original network, g 1 ,g 2 ,Á Á Á,g 10 , are assessed by computing an empirical p-value: p ¼ #fpðg n Þg 100 , where π(g ν ) is a random network in which g ν (ν = 1,2,Á Á Á,10), is found as the top 10 candidate genes after the restart random walk starting from the target DmMG. The empirical p-value is calculated as the probability of a candidate DmMG to be randomly selected. According to the study of Jia et al., the candidate DmMGs with p < 0.05 should be retained as the significant candidate DmMGs.
Note that the random PPI network maintains only the degree distribution of the original network, indicating that the degree is the only property of nodes in the random network. Since the rank of a gene in a random network is determined by its degree and the initial node's degree, it is possible that this degree-based filter of candidate DmMGs may remove functional significant genes by mistake. Meanwhile, the randomization-based strategy may remove different candidate genes for the same DmMG when adopting different sets of random PPI networks. In addition, performing RWR for each DmMG of each sample in each of the 100 networks will take a long time.
To address these issues, we combine the degree distributions of top 10 genes prioritized by starting nodes with different degree in random networks and use the shortest path between candidate DmMG and DmMG to assess candidate DmMGs. Although PPI network consists of 16,062 nodes, these nodes only hold 312 kinds of degree d i (d i 2 [1,1975], i = 1,2,Á Á Á,312). For each kind of degree, we randomly selected a node with degree d i as the starting node, denoted as Seed i (i = 1,2,Á Á Á,312), to initiate the random walk in each random network and retained the top 10 nodes with the highest probabilities. After assessing all the 100 random networks, a candidate DmMGs set S j (j = 1,2,Á Á Á,312) can be formed. The normal occurrence of gene with degree d i in S j is calculated as: freq ij ¼ # random occurrence # PPI occurrence where "random occurrence" is the counts of genes with degree d i selected as top 10 across the 100 random networks, that is, the counts of genes with degree d i in S j , and "#PPI occurrence" is the counts of genes with degree d i in PPI network. Finally, we compute an empirical p-value: p ij ¼ freq ij 100 for the gene with degree d i in S j , which indicates the probability of the gene to be randomly selected [46]. Moreover, to make the filter more robust, we formulate a function p ij = f j (d i ) to estimate the empirical p-value of gene with degree d i in S j , and then solve it using the locfit algorithm [65]. Fig 14 shows the empirical p-value as a function of the degree, which is used to select top 10 genes in the random network initiated by the starting nodes with different degree. As is shown that the empirical p-value monotonically increases along with the increase of top 10 genes' degree and the rate of increase is higher if the initial node's degree is larger. This implies that filtering candidate genes based only on the empirical p-values could remove candidate genes with biological significance.
To filter the candidate genes of a DmMG by both their topological and functional significance, we first compute the empirical p-value of candidate DmMGs. For a candidate DmMG with degree d i , the p-values calculated by the regression function, p ij = f j (d i ). Then, we calculate the shortest path L between candidate DmMGs and the corresponding DmMG in PPI network, which assesses the functional significance of the candidate DmMG. Finally, we retain candidate DmMGs with p < 0.05 or L = 1 as significant candidate DmMGs. The procedure is iteratively performed to candidate genes of each DmMG in each RS.
Step 4. Construction of a consensus m 6 A-driven gene network. The goal of this step is to extract mDrGenes from DmMGs identified in each RS. After detecting DmMGs and their candidate genes in each RS, all significant interactions from different RS are pooled together, forming a universal candidate edges pool. Furthermore, we required both proteins involved in an interaction to be encoded by DmMGs, meaning that a pair of DmMGs and its candidate genes in the pool could be either identified in the same RS or in different RS set. We extracted edges that consistently occur in every RS, and the DmMGs involved in the edges were identified as m 6 A-driven genes. Then all the eligible edges constitute the consensus mDrNet. The selected mDrGenes are DmMGs identified in at least one RS, which means they contain the information from original data, but they are not required to reappear in every RS to avoid sample bias. Also, mDrGenes are identified by consensus reappearance across all RS to enhance their reliability. Moreover, mDrGenes are closely interacting in the PPI network, implying significant functions.   [46]. The top 10 genes of a random network denote the gene that rank top 10 in the random network initiated by the starting node with a specific degree. We can see that the p-value monotonically increases along with the increase of top 10 genes' degree and the growth rate is enhanced when initial node's degree is larger. We show here a heat map depicting the GO biological process (BP) categories most enriched in m 6 A-driven genes identified in KD-FTO dataset. The enrichment analysis is conducted for the hyper and hypo m 6 A-driven genes respectively using DAVID. The FTO targeted hyper m 6 A-driven genes closely link to synaptic transmission and cell-cell signaling. And we also find several other significant biological processes and genes regulated by m 6 A such as embryonic development and neuron differentiation. Thus demonstrates m 6 A-Driver could identify biological functionally significant m 6 Adriven genes. (TIFF)

S7 Fig. Pathways regulated by FTO.
We show here a heat map depicting the KEGG categories most enriched in m 6 A-driven genes identified in KD-FTO dataset. The enrichment analysis is done to the hyperand hypo m 6 A-driven genes respectively using DAVID. The m 6 A-driven genes are significantly enriched in cancer related pathway and some specific cancer such as chronic myeloid leukemia and Glioma which suggest RNA methylation may play a role in cancer. (TIFF) S8 Fig. Binary biological processes regulated by FTO using the brain tissue specific expressed genes as control data. We show here a binary map depicting the GO biological process (BP) categories enriched in m 6 A-driven genes identified in KD-FTO experiment. The enrichment analysis is conducted for the hyper and hypo m 6 A-driven genes respectively using DAVID and adopting the brain tissue specific expressed genes as control data.Brain tissue specific expressed genes are genes who have a RPKM value over 1 in at least half of the input samples, including treated and untreated ones. (TIFF) S9 Fig. Binary pathways regulated by FTO using the brain tissue specific expressed genes as control data. We show here a binary map depicting the KEGG categories most enriched in mDrGenes for KD-FTO dataset. The enrichment analysis is conducted for the hyper and hypo m 6 A-driven genes respectively using DAVID and adopting the brain tissue specific expressed genes as control data. Brain tissue specific expressed genes are genes who have a RPKM value over 1 in at least half of the input samples, including treated and untreated ones. (TIFF) S10 Fig. Biological processes regulated by methyltransferase complex. We show here a heat map depicting the GO biological process (BP) categories most enriched in m 6 A-diven genes identified in KD-METTL3, KD-METTL14 and KD-WTAP using DAVID. There are significant overlapping biological processes between the three enzymes targeted m 6 A-driven genes and also enzyme specific functions, suggesting that different methylation enzymes may influence different biological processes via driving different genes. (TIFF) S11 Fig. Pathways regulated by methyltransferase complex. We show here a heat map depicting the KEGG categories most enriched in m 6 A-driven genes identified in KD-METTL3, KD-METTL14 and KD-WTAP using DAVID. There are significant overlapping pathways between the three enzymes targeted m 6 A-driven genes and also enzyme specific functions. Two important consistent pathways are cancers and splicing which indicates m 6 A may regulate these pathways through m 6