A novel biclustering algorithm for mining m6A co-methylation patterns based on beta-binomial distribution and data screening strategy

Zhaoyang Liu; Yuteng Xiao; Dao Xiang; Hao Shi; Kaijian Xia

doi:10.1371/journal.pcbi.1014430

Abstract

Studies have shown that m⁶A plays a key role in diverse life processes, including RNA metabolism, physiology, and pathology. However, owing to the complexity of these processes, its specific regulatory details remain unclear. Computational approaches that mine co-methylation patterns from m⁶A sequencing data can help reveal its mechanism while saving time and cost, but current algorithms are not robust to low signal-to-noise data and perform unreliably. To address this, this paper proposes an enhanced beta-binomial distribution biclustering algorithm (EBBM) based on a data screening strategy. The algorithm is built on a Bayesian framework, uses Gibbs sampling for parameter inference, and introduces the data screening strategy into the inference process, which effectively prevents the low signal-to-noise data in the original m⁶A sequencing data from degrading the reliability of the clustering results. Simulation experiments show that the algorithm handles interference from low signal-to-noise data and accurately mines the co-methylation patterns planted in the data, performing significantly better than current mainstream biclustering algorithms. On real human m⁶A sequencing data with 32 samples, the algorithm mined two effective co-methylation patterns, which were enriched in different biological processes, such as negative regulation of phosphorylation and peptidyl-lysine methylation. The GEO_Score results indicate that the patterns found by this algorithm are more biologically meaningful than the clustering results of current mainstream m⁶A co-methylation pattern mining algorithms.

Author summary

Methylation of RNA molecules—specifically a modification known as m⁶A—plays a crucial role in how our cells function, influencing everything from normal development to diseases like cancer. However, studying these modifications is challenging because the sequencing technology used to detect them produces noisy data, making it difficult to distinguish genuine biological signals from technical errors. We developed a new computational approach called EBBM that tackles this problem head-on. Our method works like a smart filter, simultaneously analyzing two complementary datasets generated by sequencing experiments. By incorporating a statistical model that accounts for the unique characteristics of this data, EBBM can identify patterns of co-methylation—groups of RNA sites that are modified together under specific conditions—while effectively discarding sequencing noise. When we tested EBBM on both simulated and real human data, it significantly outperformed existing methods. It successfully uncovered biologically meaningful co-methylation patterns that were linked to processes like cell differentiation and gene regulation. Our work provides researchers with a more reliable tool for studying RNA modifications, potentially accelerating discoveries about how these modifications contribute to health and disease, and opening new avenues for therapeutic development.

Citation: Liu Z, Xiao Y, Xiang D, Shi H, Xia K (2026) A novel biclustering algorithm for mining m⁶A co-methylation patterns based on beta-binomial distribution and data screening strategy. PLoS Comput Biol 22(6): e1014430. https://doi.org/10.1371/journal.pcbi.1014430

Editor: Arli Aditya Parikesit, Indonesia International Institute for Life Sciences, INDONESIA

Received: January 8, 2026; Accepted: June 10, 2026; Published: June 23, 2026

Copyright: © 2026 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the Xuzhou Science and Technology Program, Jiangsu Province, China (No. KC23257 to ZYL); the Suzhou Key Medical Disciplines (Health Information Management) (No. SZXK202528 to KJX); the Shandong Provincial Natural Science Foundation (No. ZR2024MF134 to KJX, No. ZR2024QF053 to YTX); and the Changshu Key Laboratory of Medical Artificial Intelligence and Big Data (Nos CYZ202301 and CS202314 to KJX). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

N⁶-methyladenosine (m⁶A) methylation is the most abundant epitranscriptomic modification in eukaryotic RNA. It dynamically regulates gene expression through the “write-erase-recognize” mechanism and plays a key role in different life processes, including RNA metabolism, physiology and pathology. For instance, m⁶A is involved in RNA splicing, export, stability, translation and localization [1,2]. It also regulates circadian rhythms, adipogenesis, spermatogenesis, embryonic stem cell self-renewal and differentiation, neurodevelopment, and neuronal diseases [3–5]. Aberrant m⁶A regulation has also been associated with various cancers including acute myelogenous leukemia [6,7], breast cancer [8–10], pancreatic cancer [11], gastric cancer [12,13], prostate cancer [14], renal cancer [15], mesothelioma [16], sarcoma glioblastoma [17,18], mesothelioma [19,20], lung cancer [21], and hepatocellular carcinoma [22–31]. Depletion of METTL3 is known to lead to apoptosis and reduce cancer cell invasiveness [32,33], while hypoxia-activated ALKBH5 leads to cancer stem cell enrichment [34]. As FTO is a key regulatory gene for energy metabolism and obesity, studies of single nucleotide polymorphisms in FTO have shown an association with body mass index as well as with the development of obesity and diabetes mellitus in humans [35–39]. In addition, it has also been suggested that FTO affects preadipocyte differentiation [40–42].

However, due to the complexity of life processes, their specific regulatory details remain unrevealed. Currently, with the development of Methylated RNA Immunoprecipitation with Next Generation Sequencing (MeRIP-Seq) [43] and High-Throughput Sequencing [44], more and more wet-lab methods are being used to detect m⁶A methylation modifications in tissues or tumors and to annotate the changes in m⁶A modification under different conditions, gradually revealing its functional mechanism. MeRIP-Seq is based on the principle of immunoprecipitation (IP): specific antibodies enrich RNA fragments carrying m⁶A modifications, and high-throughput sequencing is then used to locate the methylation sites. Total RNA is first extracted from excised tissue or tumor cells and divided equally into two parts. One part is randomly fragmented into small fragments of 100–200 nt using chemical or enzymatic methods. The fragmented RNA is then incubated with an anti-m⁶A antibody that specifically binds to methylation sites, and the methylated RNA fragments are enriched by capturing the antibody-RNA complexes with magnetic beads (e.g., Protein A/G) while unbound RNA is washed away. Finally, the immunoprecipitated RNA (IP) sample and the other portion of unenriched control RNA (input) sample are reverse transcribed, built into libraries, and sequenced separately. MeRIP-Seq therefore describes the m⁶A modification through paired IP and input samples, which differs significantly from traditional RNA sequencing. After quality control and alignment of the sequencing data, the m⁶A modification peaks are identified by comparing the signal difference between the IP and input samples, and the distribution of the methylation sites is then analyzed in conjunction with gene annotation. 
Traditional wet-lab methods can therefore be effective for methylation profiling, yet they tend to have high economic and time costs [45]. In recent years, a large amount of sequencing data has been accumulated from wet-lab experiments under different conditions. Computational methods developed on these data, combined with artificial intelligence technology, play an increasingly important role in revealing the regulatory laws of m⁶A; they not only save economic costs but also greatly accelerate the study of its functional mechanism [46]. Many important computational works on m⁶A regulation have been proposed at different levels. Firstly, a series of databases were constructed based on existing wet-lab experiments, which laid the necessary foundation for the study of computational methods, such as RMBase [47,48], MeT-DB V2.0 [49], m6Avar [50], m6A2Target [51], and m6A-Driver [52]. These databases cover multi-dimensional research needs, from basic mechanisms to clinical translation and from common transcripts to non-coding RNAs, and provide powerful data support for precisely analyzing the biological functions of m⁶A. On the basis of these data, a series of tools and prediction algorithms have been proposed; the following are some representative examples. sRAMP [53], proposed by Cui et al., uses sequence features (K-mer frequency, RNA structure) to accurately predict m⁶A sites on human and mouse mRNAs. Meng et al. proposed WHISTLE [54], a random forest model that integrates MeRIP-seq, RNA-seq, and CLIP-seq data to predict tissue-specific m⁶A sites. Cui et al. recently proposed another combined framework, deepSRAMP [55], based on the transformer architecture and recurrent neural networks, whose prediction accuracy is greatly superior to that of other state-of-the-art prediction tools such as WHISTLE.

The site prediction algorithms described above try to decompose the spectrum of m⁶A modification under various conditions at the computational level, and then analyze the correlation between diseases and the genes where the sites are located, so as to construct the interaction network between m⁶A and diseases. However, the dynamic modification of m⁶A methylation, like the occurrence and development of life processes such as diseases, is a complex process comprehensively regulated by various factors. Its potential laws must be explored along multiple dimensions, such as time and space, in order to reveal the mechanism by which m⁶A modification drives disease progression through spatio-temporally specific regulation of the gene expression network, and to provide a theoretical basis for precision intervention.

In recent years, m⁶A co-methylation pattern mining has played an increasingly important role in the study of its temporal and spatial synergistic and specific regulatory functions. A co-methylation pattern refers to the phenomenon that multiple m⁶A sites are densely distributed on mRNA molecules at the same time point and may synergistically regulate RNA stability and translational efficiency, or that some non-coding RNAs (e.g., lncRNA MALAT1, circRNA CDR1as) share methylation regulatory elements with mRNAs, which may affect the activity of methylase-reading proteins through a “competitive binding” mechanism and form a trans-RNA regulatory network [56–60]. Sites within a co-methylation pattern can show elevated or reduced m⁶A modification levels at the same time. Therefore, these co-methylation patterns can be mathematically modeled as clusters: the sites contained in a cluster show the same or similar methylation modification trends under certain conditions, while different clusters differ from each other greatly.

In the study of mining m⁶A co-methylation patterns, Liu et al. first proposed the concept of co-methylation patterns based on MeRIP-Seq data [56]. Building on this work, Cui et al. proposed a hierarchical graph model for clustering peaks in MeRIP-Seq data based on the distributional characteristics of m⁶A high-throughput sequencing data [61]. Subsequently, Chen et al. proposed another hierarchical clustering algorithm for m⁶A-Seq co-methylation patterns, based on threshold weighting, to address the noise introduced when calculating methylation levels [62]. The model reasonably assumes that the modification intensity of an m⁶A site is expressed by dividing the reads count of the immunoprecipitated sample (IP_s), which characterizes m⁶A modification, by the sum of IP_s and the reads count of the input control sample (input_s); this quantity is known as the methylation level, i.e., the methylation level of site s is IP_s/(IP_s+input_s). A problem arises because a site with 100 IP reads and 1 input read and a site with 1 IP read and 0 input reads both have a methylation level of approximately 1, so both are treated as hypermethylated during clustering even though the second measurement is far less reliable. To address this, weights are designed to integrate the effect of gene expression into the calculation. In this framework, unreliable measurements with small reads counts are given smaller weights, while reliable measurements with larger reads counts are given larger weights, which reduces the effect of such noise on the reliability of the results. Zhang et al. also recently proposed a nonparametric beta-binomial mixture model, DPBBM, based on the distributional characteristics of m⁶A data [63]; this model can automatically determine the number of clusters. Liu et al. constructed a clustering algorithm, MBMM, based on the beta mixture model, using the EM algorithm framework with moment estimation for parameter inference [64], and mined seven effective co-methylation patterns from human m⁶A modification data. The above algorithms discovered co-methylation patterns hidden in MeRIP-Seq data to a certain extent. However, all of them are traditional single clustering methods that define co-methylation patterns in a relatively simple way: they assume that each co-methylation pattern must contain all sample conditions, and that each site must be assigned to one and only one co-methylation pattern. These assumptions are biologically unrealistic [65]. Owing to enzyme-specific regulation and condition-specific effects, some sites may be co-methylated only locally, under some conditions. Not all sites have to be assigned to a co-methylation pattern, and co-methylation patterns may overlap in sites or conditions. Traditional single clustering algorithms therefore cannot uncover these local co-methylation patterns, whereas the biclustering algorithms that have emerged in recent years can.
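The read-depth problem described above can be made concrete with a small sketch. The `methylation_level` function follows the IP_s/(IP_s+input_s) definition from the text; `read_depth_weight` and its `saturation` parameter are hypothetical illustrations of the idea that low-count measurements should receive smaller weights, and are not the weighting scheme of Chen et al.

```python
def methylation_level(ip_reads: int, input_reads: int) -> float:
    """Methylation level of one site under one condition: IP / (IP + input)."""
    total = ip_reads + input_reads
    if total == 0:
        raise ValueError("no reads observed for this site")
    return ip_reads / total


def read_depth_weight(ip_reads: int, input_reads: int, saturation: float = 50.0) -> float:
    """Hypothetical weight in [0, 1) that grows with the total reads count.

    Illustration only: any monotone function of read depth would convey
    the same idea.
    """
    total = ip_reads + input_reads
    return total / (total + saturation)


# Two measurements with nearly identical methylation levels ...
deep = methylation_level(100, 1)      # 100 / 101
shallow = methylation_level(1, 0)     # 1 / 1

# ... but very different reliability once read depth is considered.
deep_weight = read_depth_weight(100, 1)
shallow_weight = read_depth_weight(1, 0)
```

Both levels are close to 1, yet the weight of the shallow measurement is far smaller, which is the behaviour a weighting scheme of this kind is meant to produce.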

In the study of m⁶A local co-methylation pattern mining by biclustering, Zhang et al. worked with m⁶A methylation level data and proposed two algorithms: REW-ISA [60], an ISA biclustering algorithm weighted by RNA expression level, and FBCwPlaid [66], a weighted Plaid biclustering algorithm based on the Lagrange multiplier method. Built on the ISA algorithm and the plaid model, respectively, they were the first to achieve m⁶A local co-methylation pattern mining and to discover potential functional patterns of m⁶A. On the basis of this work, Liu et al. modeled the distributional features of MeRIP-Seq data and proposed a biclustering algorithm based on the beta mixture distribution, BDBB [57], which mined two effective local co-methylation patterns from human m⁶A modification data.

The above single clustering and biclustering methods have mined effective co-methylation patterns from different perspectives. However, in terms of data processing, they all transform the original MeRIP-Seq sequencing data: the reads matrices of the IP and input samples are converted into a single methylation level matrix that the algorithm can handle, and such a transformation inevitably introduces noise. Although each algorithm adopts noise reduction methods to some extent, the accuracy of the results is still affected. To address this problem, Liu et al. proposed BBM [58], a biclustering algorithm based on the beta-binomial distribution that operates on the IP and input matrices simultaneously to mine co-methylation patterns. BBM defines the biclustering model under the Bayesian framework and adopts Gibbs sampling for parameter inference. By reasonably assuming that the reads count of the IP sample follows a beta-binomial distribution, it performs biclustering directly on the IP and input reads data, which avoids, to some extent, the noise that biclustering algorithms such as BDBB introduce by calculating methylation levels. However, owing to the inherent errors in sequencing technology, measurements with a low IP/input ratio are often considered noise: the sites may not be methylated in such cases, and it is inappropriate to describe their methylation modification using these measurements. BBM does not eliminate the effect of such noise. To obtain reliable measurements, the IP/Input ratio is typically used to distinguish methylation signals from background noise, an approach with an established methodological basis in the m⁶A MeRIP-seq field. 
This method assumes that after immunoprecipitation, the RNA fragment containing the true methylation site should be significantly enriched in the IP sample; therefore, the number of reads in the IP sample should be higher than in the Input sample. By calculating the ratio (or derived indicators such as the signal-to-noise (S/N) ratio), the signal can be distinguished from background noise. In the pioneering paper, m⁶A peaks were identified by setting an enrichment threshold of more than fourfold [67]. In [68], while optimizing the refined RIP-seq protocol, Zeng et al. explicitly used the S/N ratio (S/N = (positive region IP ÷ Input) ÷ (negative region IP ÷ Input)) to assess data quality, achieving a signal-to-noise ratio of approximately 100-fold under optimized conditions. The MeTPeak algorithm includes a built-in condition for the IP/Input ratio: ‘mean ratio = IP/(IP+Input) > 0.5’, which is equivalent to IP/Input > 1 [69]. The MeTDiff R package uses ‘FOLD_ENRICHMENT = 1’ by default, requiring that the IP/Input enrichment fold be ≥ 1 [70]. In summary, “IP/Input > 1” is the minimum consensus standard for identifying methylation sites in this field. Typically, different laboratories use different parameters, with 2X and 1.5X enrichment being the most common. However, there is currently no complete standardization. Typically, “1.5X” can be regarded as an empirical conservative value based on this standard. It serves as an industry convention or the default parameter for specific analytical workflows. Furthermore, the sensitivity analysis of threshold data selection in this study also shows that 1.5X is a relatively ideal choice, as it can effectively remove noise while retaining most of the biologically meaningful patterns.
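The relationship between the fold-enrichment threshold and the methylation level can be sketched as follows. Since IP/Input > f is algebraically equivalent to IP/(IP+Input) > f/(1+f), a 1.5X threshold corresponds to a methylation level above 0.6, and a 1X threshold to a level above 0.5. The function names below are illustrative assumptions, not part of EBBM, MeTPeak, or MeTDiff.

```python
import numpy as np


def level_threshold(fold: float) -> float:
    """Methylation-level cutoff equivalent to IP/input > fold."""
    return fold / (1.0 + fold)


def screen_mask(ip, inp, fold: float = 1.5) -> np.ndarray:
    """Boolean mask of measurements whose IP/input enrichment exceeds `fold`.

    The comparison is written as ip > fold * inp so that entries with
    zero input reads do not cause a division by zero.
    """
    ip = np.asarray(ip, dtype=float)
    inp = np.asarray(inp, dtype=float)
    return ip > fold * inp


# Four hypothetical measurements (IP reads, input reads).
ip_reads = [30, 10, 0, 16]
input_reads = [10, 10, 0, 10]
mask = screen_mask(ip_reads, input_reads)   # keeps the 1st and 4th entries
cutoff = level_threshold(1.5)
```

Note that a ratio test alone still accepts very shallow measurements such as 1 IP read against 0 input reads, so in practice it would be combined with some consideration of read depth.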

To address the above problem, this study selects a threshold of 1.5X to remove noise and constructs an enhanced beta-binomial distribution biclustering algorithm (EBBM) based on a data screening strategy to mine local co-methylation patterns of m⁶A. EBBM guides the data flow by introducing an algebraic approach into the construction of the statistical model, thereby removing noise effectively. Simulation studies show that EBBM can effectively identify the sequencing noise hidden in simulated MeRIP-Seq data. The recovery and relevance scores show that EBBM is significantly better than the current mainstream co-methylation pattern mining algorithms. On real IP and input reads count data, EBBM found two effective m⁶A local co-methylation patterns, and the data heatmap shows that their average methylation levels are both greater than 0.6, indicating that EBBM effectively removes the influence of MeRIP-Seq sequencing noise. The GOE_Score results show that EBBM’s scores are significantly higher than those of the current mainstream algorithms, indicating that the patterns found by EBBM contain less noise and are more biologically meaningful than the current mainstream clustering results.

This study makes three significant contributions:

1) Methodological innovation: We develop a novel algebraic-statistical hybrid model under a Bayesian framework, implementing an efficient Gibbs sampling algorithm that establishes a new computational paradigm for MeRIP-Seq data analysis.
2) Technical advancement: Our approach demonstrates superior noise-reduction capabilities, effectively addressing the critical challenge of signal-to-noise ratio improvement in epitranscriptomic profiling.
3) Biological insight: Through rigorous validation, we identify two co-methylation patterns that show higher biological consistency than predictions from existing methods.

2. Results

2.1. Simulation data experimental analysis

In the experiment, we first simulated data for 200 sites under 30 experimental conditions, comprising IP sample reads counts and input sample reads counts. Three biclusters were embedded in the data. To better simulate real m⁶A methylation modification, the binomial distribution parameters of the three biclusters and the size of the generated data were repeatedly adjusted so that the distribution of the simulated data was as similar as possible to that of actual MeRIP-Seq data. The final binomial distribution parameters of the three biclusters were 0.65, 0.94 and 0.75, respectively, and the binomial distribution parameter of the background was 0.5. The three biclusters contained 80, 8, and 80 sites and 13, 8, and 12 sample conditions, respectively. The first and second biclusters shared five conditions, and the second and third biclusters shared three conditions, as shown in Fig 1a. The overall distribution characteristics of the simulated data are shown in Fig 1b, and those of the real data are shown in Fig 1c.
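A minimal sketch of how such a dataset could be generated is given below. The matrix size, bicluster sizes, condition overlaps, and binomial parameters are taken from the text; everything else is an assumption made for illustration: total read depth is drawn from a Poisson distribution with mean 50, the IP count is drawn from a plain binomial, the three biclusters occupy disjoint site ranges, and the function name `simulate_counts` is hypothetical. The authors' actual generator may differ.

```python
import numpy as np


def simulate_counts(biclusters, n_sites=200, n_conds=30,
                    background_p=0.5, mean_depth=50, seed=0):
    """Simulate IP and input reads count matrices with planted biclusters.

    biclusters: iterable of (site_indices, condition_indices, p) triples.
    Assumption: total depth ~ Poisson(mean_depth) + 1, IP ~ Binomial(total, p).
    """
    rng = np.random.default_rng(seed)
    total = rng.poisson(mean_depth, size=(n_sites, n_conds)) + 1  # avoid zero depth
    p = np.full((n_sites, n_conds), background_p)
    for sites, conds, p_k in biclusters:
        p[np.ix_(sites, conds)] = p_k          # plant the bicluster block
    ip = rng.binomial(total, p)
    inp = total - ip
    return ip, inp


# Sizes 80x13, 8x8, 80x12; biclusters 1-2 share 5 conditions, 2-3 share 3.
layout = [
    (np.arange(0, 80),   np.arange(0, 13),  0.65),
    (np.arange(80, 88),  np.arange(8, 16),  0.94),
    (np.arange(88, 168), np.arange(13, 25), 0.75),
]
ip, inp = simulate_counts(layout)

# Pooled methylation level inside the second planted bicluster.
block_ip = ip[80:88, 8:16].sum()
block_total = (ip + inp)[80:88, 8:16].sum()
block_level = block_ip / block_total
```

With this layout the pooled methylation level inside each planted block should sit close to its binomial parameter, while the background stays near 0.5.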

Fig 1. Comparison of Statistical characteristics between simulated data and real data.

a Heatmap of simulated methylation level data. b Histogram of simulated methylation level data. c Histogram of real methylation level data.

https://doi.org/10.1371/journal.pcbi.1014430.g001

Subsequently, the EBBM model was applied to the aforementioned simulated dataset, with the initial iteration count set to 1000, the burn-in count set to 500, the predefined number of biclusters set to 10, and the intra-chain variance threshold set to 0.1. Finally, the EBBM model output three biclusters. When examining the sites and conditions contained in the three biclusters, it was found that the predefined three biclusters were accurately reproduced, and the overlaps in conditions between them were also accurately reproduced. When checking the values of the binomial distribution parameter output by them, it was found that they also approach the ground-truth, specifically 0.6469045, 0.9419758, and 0.7525028.

To further validate the clustering performance of EBBM on simulated data, a convergence check was performed, as shown in Fig 2. The first row of Fig 2 shows the historical trace plots of the binomial distribution parameters for the three biclusters, while the second row shows the historical trace plots of their likelihood values. As can be seen from the figures, the parameter values and likelihood values of the binomial distribution for the three biclusters identified by EBBM on simulated data fluctuate around their means as the number of iterations increases, without showing any obvious trends or periodicity. Additionally, for each bicluster, its corresponding parameter values and likelihood values converge almost simultaneously, indicating that the algorithm has indeed reached a converged state.

Fig 2. Convergence check trace-plot of simulation experiment results.

a and d represent the convergence of bicluster1. b and e represent the convergence of bicluster2. c and f represent the convergence of bicluster3.

https://doi.org/10.1371/journal.pcbi.1014430.g002

Additionally, the posterior distribution of the binomial distribution parameter of each bicluster is a beta distribution. According to the data screening strategy, as the number of iterations increases, its shape parameters should gradually converge toward the property α/β > 1.5, meaning that its two shape parameters α and β should gradually satisfy α > 1.5β during the iteration process. The mean of the beta distribution at each iteration, α/(α+β), should correspondingly move toward a value greater than 0.6. The distribution of this parameter was therefore examined as the number of iterations increased. Considering the large total number of iterations, the first six iterations were selected for each of the three biclusters discovered by EBBM, and then 10 further iterations were randomly selected to observe the distribution of the corresponding parameters. As shown in Fig 3, the means of the parameter distributions of the three biclusters all move rapidly toward values greater than 0.6 and quickly converge to their target distributions. As the number of iterations increases, the distributions almost completely overlap with their target distributions. This indicates that the data screening strategy can indeed guide the binomial distribution parameters toward the desired property, thereby mitigating the impact of sequencing errors introduced by MeRIP-Seq.
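The link between the beta shape parameters and the 0.6 mean can be illustrated with the standard beta-binomial conjugate update. This is a simplified sketch, not the EBBM sampler: it assumes a single shared binomial parameter per bicluster with a Beta(a0, b0) prior, so the posterior shape parameters are the prior values plus the pooled IP and input counts. The function names and the uniform prior are assumptions.

```python
def beta_posterior(ip_counts, input_counts, a0=1.0, b0=1.0):
    """Posterior Beta(alpha, beta) of a binomial parameter under a Beta(a0, b0) prior."""
    alpha = a0 + sum(ip_counts)       # prior pseudo-count + observed IP reads
    beta = b0 + sum(input_counts)     # prior pseudo-count + observed input reads
    return alpha, beta


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def passes_screen(alpha, beta, fold=1.5):
    """True when alpha > fold * beta, i.e. the mean exceeds fold / (1 + fold)."""
    return alpha > fold * beta


# Enriched block: pooled IP reads clearly exceed 1.5x the pooled input reads.
a_hi, b_hi = beta_posterior([40, 35, 50], [20, 15, 22])   # (126.0, 58.0)

# Noise-like block: IP and input reads are roughly balanced.
a_lo, b_lo = beta_posterior([10, 12], [10, 11])           # (23.0, 22.0)
```

For the enriched block the posterior mean is 126/184, above 0.6, and the shape ratio test passes; for the balanced block the mean is 23/45, below 0.6, and the test fails.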

Fig 3. The beta distributions followed by the binomial distribution parameters in the simulation experiment shift as the number of iterations increases.

a, b, and c represent the movement of the distribution of the parameter corresponding to bicluster1 as the number of iterations increases. d, e, and f represent the movement of the distribution of the parameter corresponding to bicluster2 as the number of iterations increases. g, h, and i represent the movement of the distribution of the parameter corresponding to bicluster3 as the number of iterations increases.

https://doi.org/10.1371/journal.pcbi.1014430.g003

Subsequently, the experiment selected commonly used mainstream biclustering algorithms and applied them to the simulated data, comparing their clustering results with those of EBBM to evaluate the clustering performance of EBBM on the simulated data.

The scoring criteria in the experiment were selected based on commonly used evaluation indicators for biclustering algorithms, namely recovery and relevance scores, as shown in Eq. (1) and (2).

(1)

(2)

Here, A denotes a true bicluster and B a predicted bicluster. The scores are computed from the numbers of methylation sites in A and B, the numbers of conditions in A and B, and the numbers of elements in A and B, taken over the set of true biclusters and the set of predicted biclusters. Eq. (1) indicates the model’s ability to recover biclusters, i.e., the recovery score. Eq. (2) indicates the correlation between the model’s predicted biclusters and the true biclusters, i.e., the relevance score. The closer the recovery and relevance scores are to 1, the better the clustering. Since mainstream biclustering algorithms all require a single data matrix as input, the traditional computational method was first used to convert the simulated IP reads count matrix and input reads count matrix into a single matrix approximating methylation levels. This matrix was then fed into the various clustering algorithms for cluster analysis. The results of EBBM and the commonly used mainstream biclustering algorithms are shown in Fig 4. As can be seen from Fig 4, when no simulated data with low methylation levels are added, the recovery and relevance scores of EBBM and BBM are comparable, and both are significantly better than those of the other five biclustering algorithms. This is understandable. First, from a theoretical perspective, the process of generating the simulated data is consistent with the modeling principles of EBBM and BBM. In the absence of noise, the data screening strategy of EBBM has nothing to remove, so the scores of the two are comparable, while the other biclustering algorithms use different algorithmic principles. ISA adopts a threshold optimization strategy, so its clustering performance is relatively better than that of the remaining algorithms. 
Secondly, the five biclustering algorithms used in the experiment all took methylation level data as input. Methylation level data calculated by the traditional method contain noise, but the modeling process of these five methods does not include noise reduction, so their clustering performance was relatively poor on this simulated data. In addition, the experimental results indicate that EBBM is more suitable than current mainstream biclustering algorithms for mining m⁶A epitranscriptomic data.
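One common way to compute recovery and relevance scores for biclusterings is sketched below. It is an assumption, not a transcription of Eq. (1) and (2): each bicluster is treated as a set of (site, condition) elements, similarity is the Jaccard index of two such sets, recovery averages the best match over the true biclusters, and relevance averages the best match over the predicted biclusters. The authors' exact formulas may differ in detail.

```python
def n_elements(bicluster):
    """Number of (site, condition) elements in a bicluster."""
    sites, conds = bicluster
    return len(sites) * len(conds)


def jaccard(a, b):
    """Jaccard index of two biclusters given as (site_set, condition_set)."""
    inter = len(a[0] & b[0]) * len(a[1] & b[1])   # shared sites x shared conditions
    union = n_elements(a) + n_elements(b) - inter
    return inter / union if union else 0.0


def recovery(true_set, pred_set):
    """Average, over true biclusters, of the best-matching predicted bicluster."""
    if not true_set or not pred_set:
        return 0.0
    return sum(max(jaccard(a, b) for b in pred_set) for a in true_set) / len(true_set)


def relevance(true_set, pred_set):
    """Average, over predicted biclusters, of the best-matching true bicluster."""
    return recovery(pred_set, true_set)


# Two planted biclusters; the prediction finds only the first one, exactly.
true_bics = [
    (frozenset(range(0, 80)), frozenset(range(0, 13))),
    (frozenset(range(80, 88)), frozenset(range(8, 16))),
]
pred_bics = [true_bics[0]]

rec = recovery(true_bics, pred_bics)    # one of two true biclusters recovered
rel = relevance(true_bics, pred_bics)   # every prediction matches a true bicluster
```

In this example the recovery score is 0.5 because one planted bicluster is missed, while the relevance score is 1.0 because the single prediction is exact, which shows why the two scores are reported together.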

Fig 4. Comparison of clustering results between EBBM and five mainstream biclustering algorithms on simulated data.

https://doi.org/10.1371/journal.pcbi.1014430.g004

To further evaluate how well the data screening strategy eliminates noise introduced by MeRIP-Seq sequencing errors, noise was added to the simulated data by embedding a low-methylation bicluster while keeping the overall distribution characteristics similar to those of real MeRIP-Seq data.

To this end, the experiment simulated the reads count data of IP samples and input samples at 200 sites across 30 experimental conditions. Three biclusters are embedded in the data, and the remaining part is background. The binomial distribution parameters of the three biclusters and the background are 0.3, 0.98, 0.66, and 0.5, respectively. Among them, the bicluster with parameter 0.3 is the embedded noise data. The three biclusters do not overlap in sites. In terms of conditions, the first two biclusters share five conditions, while the second and third biclusters share three conditions. The heatmap of the converted methylation level data, the histogram of the simulated data, and the histogram of the real data are compared in Fig 5.

Fig 5. Comparison of Statistical characteristics between simulated data and real data.

a Heatmap of simulated methylation level data. b Histogram of simulated methylation level data. c Histogram of real methylation level data.

https://doi.org/10.1371/journal.pcbi.1014430.g005

Subsequently, the above simulated data was fed into EBBM, with the initial iteration count set to 1000, the burn-in count set to 500, the predefined number of biclusters set to 10, and the intra-chain variance threshold set to 0.1. EBBM output two biclusters. The first bicluster contained 10 sites and 8 conditions, and the second contained 80 sites and 12 conditions. The sites and conditions of both biclusters were completely consistent with the ground truth, and the overlapping conditions between bicluster 1 and bicluster 2 were also accurately reproduced.

Examining the parameters of the two biclusters and the elements they contain showed that the mean value of the binomial parameter of the first bicluster is 0.9749736 and that of the second bicluster is 0.6580733. Both values are close to the ground-truth values of 0.98 and 0.66. Their evolution over the iterations is shown in Fig 6. As the iteration count increases, the two parameter values fluctuate steadily within a narrow range, with no obvious trend or periodicity, which suggests that the algorithm has converged.
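The kind of summary behind such a trace plot can be sketched as follows. This is a minimal illustration, not the EBBM implementation: the function name `summarize_chain`, the synthetic chain, and the reading of the "intra-chain variance threshold" as a bound on the post-burn-in sample variance are all assumptions.

```python
import random
import statistics

def summarize_chain(samples, burn_in):
    """Return (mean, variance) of the post-burn-in part of one MCMC chain."""
    kept = samples[burn_in:]
    if len(kept) < 2:
        raise ValueError("need at least two post-burn-in samples")
    return statistics.fmean(kept), statistics.variance(kept)

# Synthetic stand-in for the trace of a binomial parameter (made-up data):
# 1000 draws that drift towards 0.98 and then fluctuate around it.
rng = random.Random(0)
chain = [0.5 + 0.48 * min(1.0, t / 300) + rng.gauss(0.0, 0.005)
         for t in range(1000)]

mean, var = summarize_chain(chain, burn_in=500)
# Assumed interpretation: the chain is accepted when its post-burn-in
# variance stays below the threshold (0.1 in the simulation above).
converged = var < 0.1
```

Discarding the first 500 draws mirrors the burn-in setting quoted for the simulation; the posterior mean is then an average over the remaining draws.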

Fig 6. Trace plot of the binomial parameter of the EBBM clustering result on simulated data.

https://doi.org/10.1371/journal.pcbi.1014430.g006

Additionally, the recovery and relevance scores were examined in the experiment, as shown in Fig 7. Fig 7 shows that when noise is added, the recovery and relevance scores of EBBM are 0.998 and 0.997, respectively, with performance nearly equivalent to that on simulated data without noise. However, BBM’s recovery and relevance scores dropped significantly on noisy data, reaching 0.66 and 0.67, respectively, which are 0.338 and 0.327 lower than EBBM’s scores. This indicates that BBM’s performance in noise removal is far inferior to EBBM’s, and EBBM’s data screening strategy is indeed effective in removing noise on simulated data.

Fig 7. Comparison of clustering results of EBBM, BBM and other 5 mainstream biclustering algorithms on simulated data with noise.

https://doi.org/10.1371/journal.pcbi.1014430.g007

2.2. Real data experimental analysis

High-precision datasets are a prerequisite for reliable results. In our previous research, we constructed a data set of 69,446 sites under 32 conditions, based on 6 single-base resolution miCLIP and m6A-CLIP experiments and 32 MeRIP-Seq experiments [64]. The majority (about 80.66%) of these sites are supported by at least one of two high-confidence reference databases, GLORI [71] and m6A‑Atlas v2.0 [72]. This data set is also used in this study. Details of the data are shown in Table 1.

Table 1. Data Sets Used In This Study.

https://doi.org/10.1371/journal.pcbi.1014430.t001

All the original MeRIP-Seq data were downloaded from the GEO database. In this study, we followed the method of [80] to quantify the raw data. All the raw data were aligned to the human reference genome hg19 with Tophat2 [81] to generate BAM files; a custom R script was then run to obtain the number of reads at each site, normalized to Fragments Per Kilobase of Transcript Per Million (FPKM).

Sites with small variance across experimental conditions have stable methylation under these conditions, meaning that they are essentially unaffected by regulatory factors and cannot be considered candidate sites for co-methylation patterns. Therefore, variance screening must be performed on the m⁶A methylation level data in experiments using real data. Visualizing the variance distribution of methylation levels across experimental conditions showed that the site variances ranged from 0 to 0.15, with the majority concentrated between 0 and 0.05. To remove low-variance noise sites while preserving as many sites as possible, different thresholds were tested. A threshold of 0.03 was ultimately chosen, which removed low-variance noise sites while retaining most of the site information. After screening, the indices of 30,838 sites were obtained, and the IP read count matrix and input read count matrix for these 30,838 sites were extracted. The heatmap of the methylation levels of the 30,838 sites across the 32 conditions is shown in Fig 8.
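The variance screening step can be illustrated with a short sketch. It assumes a row-per-site layout, a strict "greater than" comparison, and the population variance estimator, none of which is specified in the text; the function name `variance_screen` and the toy values are hypothetical.

```python
import statistics

def variance_screen(levels, threshold=0.03):
    """Return indices of sites whose methylation-level variance across
    conditions exceeds `threshold`.

    `levels` holds one row per site and one value per condition.
    """
    kept = []
    for idx, row in enumerate(levels):
        # Population variance across conditions; the source does not say
        # whether the sample or population estimator was used.
        if statistics.pvariance(row) > threshold:
            kept.append(idx)
    return kept

# Toy methylation-level matrix: 3 sites x 4 conditions (made-up values).
levels = [
    [0.50, 0.51, 0.49, 0.50],  # nearly constant across conditions
    [0.10, 0.90, 0.20, 0.80],  # strongly condition-dependent
    [0.60, 0.62, 0.58, 0.61],  # nearly constant across conditions
]
kept = variance_screen(levels)
```

The returned indices would then be used to subset both the IP and the input read count matrices, so that the two matrices stay aligned.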

Fig 8. Heatmap of the retained sites across the 32 conditions after variance screening.

https://doi.org/10.1371/journal.pcbi.1014430.g008

The above two matrices were then fed into the EBBM model, with the number of initial iterations set to 1,500, the number of burn-in iterations to 500, the intra-chain variance threshold to 1e-06, and the number of initial biclusters to 15. The model output two patterns containing 22,942 and 2,141 sites, with 27 and 6 conditions, respectively. The heatmap of the methylation levels of these two patterns is shown in Fig 9.

Fig 9. Heatmap of methylation level data of EBBM clustering results on real data.

https://doi.org/10.1371/journal.pcbi.1014430.g009

Fig 9 shows that the average methylation levels of the two clusters identified by EBBM in the real data are 0.79 and 0.82, respectively. These values are significantly higher than the methylation levels of the clustering results without data screening strategies. This indicates that this method can effectively remove noise introduced by errors in the MeRIP-Seq sequencing technology. However, Fig 9 also shows that the methylation levels of the sites contained in columns 6, 9, 16, and 27 are significantly low. This is because, during the modeling process, only the sites were screened, while no data screening strategy was applied to the conditions. This approach was chosen primarily because there were relatively few conditions, and secondly, if both sites and conditions were subjected to data screening strategies, the model would easily reach a local optimum, resulting in the loss of most sites. Experimental results showed that when both sites and conditions were subjected to data screening strategies on simulated data, the results were also poor.

1) Pathway specificity analysis

In order to further verify whether the clustering results of the EBBM model on real data are valid local co-methylation patterns, six pathways known to be significantly associated with m⁶A were first selected for pathway correlation analysis. The Fisher’s exact test and multiple hypothesis testing methods were used to analyze whether the sites retained in the clustering results were significantly associated with known pathways. The significance level was set at 0.01. The enrichment results are shown in Table 2.

Table 2. Pathway-specific analysis of EBBM clustering results on real data.

https://doi.org/10.1371/journal.pcbi.1014430.t002

Table 2 shows that, after multiple hypothesis testing, Pattern1 is significantly correlated with fatty acid metabolism and UV response down. In fatty acid metabolism, silencing the METTL3 gene can decrease m⁶A methylation and the total mRNA levels related to fatty acid synthesis. Likewise, when METTL3 is silenced and METTL3 catalytic activity is absent, cells show delayed repair of UV-induced cyclobutane pyrimidine adducts and increased sensitivity to UV radiation [82]. Table 2 therefore indicates that Pattern1 may be significantly associated with pathways related to the reduced m⁶A modification caused by METTL3 gene silencing. Pattern2 was not enriched in the aforementioned known m⁶A-associated pathways. Therefore, we searched for other biological explanations to explore the biological significance of Pattern2 at other levels. Existing studies have shown that m⁶A methylation is regulated by enzymes, so we next analyzed enzyme substrate specificity.

2) Analysis of enzyme substrate specificity

Firstly, 12,643 METTL3, 7,689 METTL14, 13,124 WTAP, 399 KIAA1429, and 10,030 FTO enzyme target sites were obtained from an independent public study [75]. These target sites were obtained by knocking out the relevant enzyme genes, acquiring raw data, and then mapping them to the human hg19 genome using exomePeak, followed by extracting RNA enzyme target methylation sites with significantly low expression (p-value < 0.05). Secondly, the gene symbols and Entrez Gene IDs corresponding to the sites included in the two patterns discovered by EBBM were annotated. Subsequently, Fisher’s exact test and multiple hypothesis testing methods were used to explore the enrichment relationship between the two biclusters discovered by EBBM and the target sites of enzymes. In the enrichment analysis, the significance level was set to 0.05, and the BH method [83] was used for multiple hypothesis testing. The final enrichment results are shown in Table 3.
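The enrichment test described above can be sketched with the standard library alone. The sketch assumes a one-sided (over-representation) Fisher's exact test on a 2x2 table and the usual step-up form of the BH adjustment; whether the original analysis used a one-sided or two-sided test is not stated, and the function names and counts below are hypothetical.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a = pattern sites that are enzyme targets, b = pattern sites that are
    not, c = other sites that are targets, d = other sites that are not.
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(total - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / denom

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up counts for one pattern against three enzymes.
tables = [(30, 70, 100, 800), (12, 88, 110, 790), (3, 1, 1, 3)]
raw = [fisher_exact_greater(*t) for t in tables]
adj = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adj]
```

Using exact integer binomial coefficients avoids floating-point underflow for the large site counts involved; the final division is the only floating-point step.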

Table 3. Enzyme-specificity analysis of EBBM clustering results on real data.

https://doi.org/10.1371/journal.pcbi.1014430.t003

Table 3 shows that, after multiple hypothesis testing, both Pattern1 and Pattern2 were significantly enriched in the specific target sites of four components of the methyltransferase complex, METTL3, METTL14, WTAP, and KIAA1429, but the degree of enrichment was significantly different. This indicates that the m⁶A sites retained in Pattern1 and Pattern2 are regulated to varying degrees by methyltransferases. In the pathway analysis, Pattern2 was not significantly associated with known pathways regulated by m⁶A, but it was significantly enriched in specific target sites of methyltransferases, indicating that this pattern is also an effective local co-methylation pattern.

The above analysis only demonstrates the significant correlation between the two patterns discovered by EBBM in real data and the known research results related to m⁶A methylation modification. In order to further explore the other biological significance of the two patterns discovered by EBBM, GO enrichment analysis was subsequently performed.

3) GO enrichment analysis

In the GO enrichment analysis, the gene symbols and Entrez Gene IDs corresponding to the sites retained in Pattern1 and Pattern2 were first annotated, and Fisher’s exact test was then used to identify the gene ontology biological process (BP) terms enriched among these sites. The top 10 most enriched BP terms for Pattern1 and Pattern2 are shown in Figs 10 and 11. Fig 10 indicates that Pattern1 is associated with biological processes such as negative regulation of phosphorylation, differentiation of bone marrow cells, and embryonic stem cell development [84], which is consistent with previous studies. Fig 11 shows that Pattern2 is associated with biological processes such as peptidyl-lysine methylation and chromosome separation [85], which is consistent with previous wet-lab studies.

Fig 10. Enrichment results of the sites retained in Pattern1 on BP term.

https://doi.org/10.1371/journal.pcbi.1014430.g010

Fig 11. Enrichment results of the sites retained in Pattern2 on BP term.

https://doi.org/10.1371/journal.pcbi.1014430.g011

In addition, Figs 10 and 11 show that pattern1 and pattern2 are both related to biological processes such as histone modification, covalent chromatin modification, and peptidyl lysine modification. Therefore, the overlap of the top 20 BP terms most enriched in pattern1 and pattern2 was examined, as shown in Fig 12. Fig 12 shows that although there are three overlaps among the top 20 most enriched BP terms retained in the BP GO enrichment results of pattern1 and pattern2, most of the enriched BP terms are still different for each pattern. For example, pattern1 is enriched in intracellular receptor signaling pathways, protein acylation, gland development, kidney development, etc., while pattern2 is enriched in protein methylation, protein alkylation, mitosis and meiosis, RNA splicing, etc. The above analysis shows that the clustering results of EBBM on real data are condition-specific to a certain extent.

Fig 12. Overlap of the top 20 enriched BP terms of the sites retained in Pattern1 and Pattern2.

https://doi.org/10.1371/journal.pcbi.1014430.g012

To further illustrate the clustering performance of EBBM on real data, the GOE_Score scoring standard was finally selected, as shown in Eq. 3 [58].

(3)

In Eq. (3), the score is computed from three quantities: the number of genes contained in each term t that is significantly enriched in the local co-methylation pattern, the total number T of terms enriched in this pattern, and the total number of genes contained in the pattern. The higher the GOE_Score, the more biologically significant the discovered local co-methylation pattern.

In the comparative experiments, the real data consisted of IP sample read counts and input sample read counts, whereas mainstream biclustering algorithms require a single data matrix as input. Therefore, to compare EBBM with the other mainstream algorithms, the lossless read count data were first normalized into an FPKM matrix and then converted into a methylation level matrix using the traditional method for calculating methylation levels. The methylation level matrix was fed into each mainstream biclustering algorithm, and the GOE_Score of each clustering result was calculated according to Eq. (3). For the BBM and EBBM models, the IP and input read count data were input directly. The Xmotifs, ISA, Bimax, and Plaid algorithms all used the parameters recommended in the relevant literature [86]. The GO enrichment analysis term was set to ALL, which includes CC, BP, and MF. The comparison results are shown in Fig 13.

Fig 13 shows that EBBM’s clustering performance on real data is significantly better than that of the other biclustering algorithms, and the average GOE_Score of the local co-methylation patterns it discovers is significantly higher than that of the BBM model. This suggests that the biclusters discovered by EBBM in real MeRIP-Seq data are more biologically meaningful. The other biclustering algorithms scored lower primarily because the methylation level data were distorted and noisy, and secondly because these methods take no measures to penalize noisy data. Because BBM is based on distortion-free IP and input data, it overcomes some of the noise introduced during data preprocessing, such as normalizing read counts to FPKM values. However, it does not address the sequencing noise inherent in MeRIP-Seq technology. Therefore, in theory, the GOE_Score of its clustering results should be lower than that of EBBM. The results in Fig 13 are consistent with this analysis.

Fig 13. Comparison of GOE_Score between EBBM and current mainstream biclustering algorithm on real data.

https://doi.org/10.1371/journal.pcbi.1014430.g013

4) Data sensitivity analysis on threshold selection

To demonstrate the stability of the results with respect to threshold selection, we performed a data sensitivity analysis on threshold selection.

Key results were re-evaluated at different IP/input ratio thresholds (1.0, 1.2, 1.4, 1.5, 1.6, 1.8, 2.0, and 2.5): the number of retained sites after screening, the average bicluster size, the overlap of the identified co-methylation patterns (Jaccard similarity), and the biological consistency of the top enriched pathways (GO/KEGG) for the biclusters obtained under each threshold. Experimental results (S1 Fig) showed that the number of retained sites decreased as the ratio increased, but the core co-methylation patterns (i.e., biclusters with high stability) remained largely unchanged when the ratio ranged from 1.2 to 2.0. The Jaccard similarity between biclusters obtained at 1.5 and those at 1.8 was > 0.85. Importantly, the top enriched biological processes (e.g., “Histone modification”, “peptidyl-lysine modification”) were consistently reproduced across thresholds from 1.2 to 2.0. When the ratio was set to 1.0 (i.e., no strict filtering) or 2.5 (overly stringent), the biclustering results became unstable or lost meaningful patterns.

The above indicates that the 1.5 threshold is a reasonable default choice, and our algorithm’s performance is robust to moderate variations (±0.3) around this value.
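The overlap measure used in this sensitivity analysis can be sketched as follows. The sketch assumes that a bicluster is compared as its set of (site, condition) cells; the text does not say whether the Jaccard similarity was computed over cells or over sites only, and the helper names and toy biclusters are hypothetical.

```python
def bicluster_cells(sites, conditions):
    """Represent a bicluster as the set of (site, condition) cells it covers."""
    return {(s, c) for s in sites for c in conditions}

def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)

# Toy biclusters obtained under two thresholds (made-up indices):
# the stricter threshold drops ten of the hundred sites.
at_1_5 = bicluster_cells(range(100), range(10))
at_1_8 = bicluster_cells(range(90), range(10))
similarity = jaccard(at_1_5, at_1_8)
```

In this toy case the stricter bicluster is a subset of the looser one, so the similarity is simply the ratio of their sizes.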

3. Discussion

The exploration of co-methylation patterns based on m⁶A sequencing data can assist biological experiments in studying their functional mechanisms, saving time and economic costs, and thereby exploring the possibility of treating complex diseases such as cancer from the perspective of the epigenetic transcriptome. However, due to the inherent complexity of MeRIP-Seq sequencing data, particularly the inherent sequencing errors associated with this sequencing technology, a certain number of false-positive data points are present in the sequencing data. For example, data where the count of reads for the IP sample at a given site is less than 1.5 times that of the input sample is typically attributed to sequencing errors. Such data has a low signal-to-noise ratio. Currently, mainstream co-methylation detection algorithms in this field generally suffer from issues such as insufficient robustness, low accuracy, and unreliable clustering results.

To address this issue, this paper proposes EBBM, a beta-binomial distribution biclustering algorithm based on a data screening strategy. The algorithm improves the robustness of the model to low signal-to-noise ratio (SNR) data by introducing the data screening strategy into the beta-binomial modeling process, thereby improving the reliability of the clustering results. Inspired by the retrieval-augmented generation technique, the algorithm introduces data screening while the model is constructed with Bayesian methods, guiding the data flow toward regions with high SNR so that the discovered patterns mostly contain sites that are actually methylated. Simulation experiments show that EBBM can effectively mine local co-methylation patterns pre-set in the data. On data without implanted low-SNR data, the F1 evaluation index shows that EBBM and the current state-of-the-art algorithm BBM are basically equivalent. However, after low-SNR data are implanted, EBBM scores significantly higher than BBM. In real m⁶A sequencing data, EBBM discovered two effective co-methylation patterns, which were enriched in different biological processes such as negative regulation of phosphorylation and peptidyl-lysine methylation. Analysis of m⁶A pathway specificity and enzyme substrate specificity indicated that these patterns are effective co-methylation patterns. In the GO enrichment analysis, the two patterns were enriched in different biological terms, indicating a certain condition specificity in their functional patterns. At the same time, there is a small overlap between the two patterns in GO terms. Combined with their pathway analysis, this suggests that the two patterns may be regulated by the same regulatory factors to a certain extent, but with significantly different regulatory intensity. Finally, the GOE_Score results indicate that the co-methylation patterns mined by EBBM are more biologically meaningful than the results of current mainstream algorithms.

It is worth noting that more recent probabilistic or deep learning-based biclustering methods were not included as baselines in this study. A review of the literature indicates that such methods, developed primarily for other omics data, are not suitable for direct comparison with our approach. For instance, among probabilistic biclustering algorithms, BGB [87] is designed for a single data matrix in which rows represent independent features. While this makes BGB applicable to conventional omics data, it cannot accommodate the paired-matrix structure of MeRIP-seq data, which consists of two matched matrices (IP and input) measuring the same 69,446 modification sites across 32 conditions. The biological quantity of interest is the enrichment contrast between IP and input, which BGB cannot explicitly model. Any artificial data transformation to force compatibility would yield biclustering results that are not interpretable as differences between IP-enriched and non-enriched sample groups, rendering BGB scientifically untenable as a baseline.

Among deep learning-based biclustering algorithms, autoencoder-driven and GNN-based methods have been proposed. A representative autoencoder-driven method, scDBic [88], presents three major issues when applied to MeRIP-seq data. First, it selects “key genes” with low within-cluster expression variance, whereas MeRIP-seq requires sites exhibiting high variance in enrichment contrast. Second, its 128-dimensional bottleneck is invalid for our dataset comprising only 32 samples. Third, it cannot handle two paired measurements (IP and input): using only IP omits essential background correction, while using the IP/input ratio violates its non‑negative integer count assumption. Consequently, scDBic is excluded due to a mismatch in biological targets, scalability constraints, and input format incompatibility. As a representative GNN-based biclustering method, Gaebic [89] requires a precomputed single-matrix feature–sample correlation graph as input. However, our MeRIP-seq data consist of two matched matrices per sample, and no public implementation supports this paired layout. Moreover, Gaebic is specifically designed for binary (0/1) matrices representing miRNA–gene targeting relationships, where only elements equal to 1 can be modeled. Owing to both dual-input incompatibility and binary-data constraints, Gaebic is also unsuitable as a baseline for our study. For similar reasons, several recently proposed clustering algorithms [90–92] are likewise inappropriate as baselines for comparison.

The algorithm proposed in this paper represents an innovative modeling approach, providing a reference for constructing models for similar data or application scenarios. Additionally, the data screening strategy employed effectively eliminates noise issues caused by MeRIP-Seq sequencing errors, offering a powerful computational tool for identifying m⁶A co-methylation patterns and studying their functional mechanisms. Furthermore, EBBM identified two effective local co-methylation patterns in real m⁶A sequencing data. These patterns exhibit greater biological significance than the clustering results of current mainstream algorithms, and this finding can serve as a reference for biological experiments investigating m⁶A mechanisms. Experimental analysis shows that EBBM can effectively remove the influence of low signal-to-noise ratio in sequencing data, thereby improving the reliability of clustering results. However, this study also has certain limitations. First, the dataset is not large enough. A dataset that accommodates more experimental samples can further improve the reliability of the clustering results. In addition, it is also found in the experiment that due to the limitations of the experimental sample conditions, the data screening strategy was only performed based on the site during the modeling process. Performing the data screening strategy in multiple dimensions may yield more reliable clustering results. Therefore, in future work, we will further expand the m⁶A dataset to include more experimental samples. We will also attempt to implement data screening strategies simultaneously in two dimensions—site and experimental conditions, to further improve the reliability of the clustering results.

4. Methods

4.1. Probabilistic graphical model of EBBM

MeRIP-Seq sequencing uses IP samples and input samples to describe the distribution of m⁶A modifications. In this paper, we use $X = (x_{ij})_{n \times m}$ to denote the read count data of the IP samples and $Y = (y_{ij})_{n \times m}$ to denote the read count data of the input samples, where n denotes the total number of sites and m denotes the total number of sample conditions. $x_{ij}$ denotes the read count of the IP samples for site i under condition j, and $y_{ij}$ denotes the read count of the input samples for site i under condition j.

To represent the degree of m⁶A methylation under different conditions, the traditional method calculates the methylation level as shown in Eq. (4).

$$p_{ij} = \frac{x_{ij}}{x_{ij} + y_{ij}} \tag{4}$$

$p_{ij}$ denotes the methylation level of site i under condition j. Rearranging Eq. (4) leads to Eq. (5).

$$x_{ij} = p_{ij}\,(x_{ij} + y_{ij}) \tag{5}$$

According to Eq. (4), $0 \le p_{ij} \le 1$, and hence, by the expectation of the binomial distribution, Eq. (5) can be regarded as the expectation of the binomial distribution $B(x_{ij} + y_{ij},\, p_{ij})$.

Here $p_{ij}$ denotes the probability of success in each trial and $x_{ij} + y_{ij}$ denotes the number of Bernoulli trials performed in each binomial experiment, so that $x_{ij}$ can be viewed as a random sample from a binomial distribution with parameters $(x_{ij} + y_{ij},\, p_{ij})$.

By the expectation of the binomial distribution, a sample of $x_{ij}$ lies close to its expected value with high probability, that is, $x_{ij} \approx p_{ij}\,(x_{ij} + y_{ij})$.

Therefore, it is reasonable to assume that the IP read count of site i under condition j follows a binomial distribution with parameter $p$, as shown in Eq. (6).

$$x_{ij} \sim \mathrm{Binomial}(x_{ij} + y_{ij},\; p) \tag{6}$$

$x_{ij} + y_{ij}$ denotes the number of Bernoulli trials performed in each binomial experiment, and $p$ denotes the probability of a positive case in each trial. $x_{ij}$ can be viewed as a random sample from the binomial distribution with parameters $(x_{ij} + y_{ij},\, p)$.

Therefore, the whole IP data set can be viewed as consisting of several local co-methylation patterns and one background component: each co-methylation pattern follows its own relatively sharp binomial distribution, and the background follows another, relatively flat binomial distribution. This beta-binomial mixture (BBM) model can be described by a probabilistic graphical model, as shown in Fig 14.

Fig 14. Probabilistic graphical model of BBM.

https://doi.org/10.1371/journal.pcbi.1014430.g014

In Fig 14, the two indicator vectors correspond to sites (rows) and conditions (columns), respectively. An indicator value of 1 for site i means that the site belongs to the bicluster; otherwise, it belongs to the background. Likewise, an indicator value of 1 for condition j means that the condition belongs to the bicluster; otherwise, it belongs to the background. The site and condition indicators are samples from Bernoulli distributions, and the bicluster and the background each have their own binomial parameter. The hyperparameters are the shape parameters of the beta prior distributions placed on the Bernoulli and binomial parameters.

Therefore, the data generation process can be viewed as follows: first, the two Bernoulli parameters are generated from their respective beta distributions; then the Bernoulli distributions generate a label for each site and each condition; these labels determine whether the data they index belong to the bicluster or the background; and the corresponding binomial distribution is chosen to generate the observations.
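This generative process can be sketched for a single bicluster as follows. It is an illustration under stated assumptions rather than the authors' code: an element is taken to belong to the bicluster only when both its site label and its condition label are 1, the total read count per element is supplied as the number of trials, all priors are set to Beta(1, 1), and the names `generate_bbm` and `sample_binomial` are hypothetical.

```python
import random

def sample_binomial(rng, n, p):
    """Draw one Binomial(n, p) value as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

def generate_bbm(n_sites, n_conds, totals, rng,
                 site_prior=(1.0, 1.0), cond_prior=(1.0, 1.0),
                 p_prior=(1.0, 1.0)):
    """Simulate IP read counts for one bicluster plus background.

    totals[i][j] is the total read count (IP + input) of site i under
    condition j and acts as the number of binomial trials.
    """
    # 1) Draw the Bernoulli and binomial parameters from beta priors.
    pi_site = rng.betavariate(*site_prior)
    pi_cond = rng.betavariate(*cond_prior)
    p_bic = rng.betavariate(*p_prior)
    p_bg = rng.betavariate(*p_prior)
    # 2) Draw a 0/1 label for every site and every condition.
    site_label = [int(rng.random() < pi_site) for _ in range(n_sites)]
    cond_label = [int(rng.random() < pi_cond) for _ in range(n_conds)]
    # 3) Draw each observation from the bicluster or background binomial.
    ip = [[sample_binomial(rng, totals[i][j],
                           p_bic if site_label[i] and cond_label[j] else p_bg)
           for j in range(n_conds)]
          for i in range(n_sites)]
    return site_label, cond_label, p_bic, p_bg, ip

rng = random.Random(1)
totals = [[20] * 4 for _ in range(5)]  # made-up total read counts
site_label, cond_label, p_bic, p_bg, ip = generate_bbm(5, 4, totals, rng)
```

The input read count of each element follows as the total minus the simulated IP count, which keeps the two matrices paired.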

The above model can mine the co-methylation patterns hidden in the data. However, because of the sequencing errors inherent in MeRIP-Seq technology, values for which the IP read count at a site is less than 1.5 times the input read count at that site are usually regarded as sequencing noise, and BBM cannot distinguish such noise efficiently. The model therefore needs to be adapted to improve its robustness to these low signal-to-noise data and to enhance the reliability of the clustering results.

Retrieval-augmented generation improves the reliability of a large model by supplying it with information retrieved from specific data sources to correct and supplement the generated answers. Inspired by this, this paper introduces a data screening strategy into the BBM model to remove the effect of MeRIP-Seq sequencing errors and improve the reliability of the clustering results. Specifically, after the initial labels have determined which data belong to the bicluster and the corresponding binomial distribution has been selected to generate the observations, a data intervention is carried out: the bicluster elements determined in the previous step are filtered further, and only the elements whose IP read count is greater than 1.5 times the input read count are retained and used to estimate the bicluster in the next step. This improves the model’s robustness to low signal-to-noise data, removes the interference of noise, and enhances the discriminative ability of the model. Through this operation, elements with an IP read count greater than 1.5 times the input read count are eventually induced to aggregate into the corresponding bicluster, so that most of the elements finally retained in the bicluster have this property and an average methylation level greater than 0.6. In other words, the discovered pattern contains few sites at which no methylation occurred, which achieves the purpose of removing MeRIP-Seq sequencing noise. The probabilistic graphical model is shown in Fig 15. As shown in Fig 15, among the sites retained in the bicluster, only the sites that have been methylated are selected by filtering to infer the binomial distribution they follow, which further guides the data flow and thus removes sequencing noise.
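The screening rule and its link to the 0.6 methylation level can be sketched as follows. The sketch takes the methylation level to be IP / (IP + input), which is consistent with the equivalence between "IP greater than 1.5 times input" and "level greater than 0.6" stated above (1.5 / 2.5 = 0.6). The function names, the handling of zero totals, and the toy counts are assumptions.

```python
def methylation_level(ip, inp):
    """Methylation level IP / (IP + input); 0.0 for an empty cell (assumed)."""
    total = ip + inp
    return ip / total if total > 0 else 0.0

def passes_screen(ip, inp, ratio=1.5):
    """True when the IP read count exceeds `ratio` times the input count."""
    return ip > ratio * inp

def screen_bicluster(ip, inp, sites, conds, ratio=1.5):
    """Keep only the bicluster cells (i, j) that pass the screening rule."""
    return [(i, j) for i in sites for j in conds
            if passes_screen(ip[i][j], inp[i][j], ratio)]

# Toy read counts for a 2 x 2 candidate bicluster (made-up values).
ip_counts = [[30, 10], [16, 15]]
input_counts = [[10, 10], [10, 10]]
kept_cells = screen_bicluster(ip_counts, input_counts, [0, 1], [0, 1])
kept_levels = [methylation_level(ip_counts[i][j], input_counts[i][j])
               for i, j in kept_cells]
```

A cell with IP = 15 and input = 10 sits exactly on the boundary (level 0.6) and is rejected by the strict inequality used here.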

Fig 15. Probabilistic graphical model of EBBM.

https://doi.org/10.1371/journal.pcbi.1014430.g015

Data generation in EBBM can be viewed as follows. First, the Bernoulli parameters of the site and condition labels are generated from their beta prior distributions, and each site label and condition label is then generated from the corresponding Bernoulli distribution. Based on these labels, each element is assigned to the bicluster or the background, and the corresponding binomial distribution is selected to generate the observation. Finally, the assignment is adjusted according to the filtering condition: observations whose IP read count is greater than 1.5 times the input read count are assigned to the bicluster, and the rest are treated as background.

4.2. Parameter inference

As shown in Fig 15, the model contains unobserved hidden variables, so it is difficult to estimate the parameters by maximum likelihood or maximum a posteriori methods. Therefore, Gibbs sampling, an approximate inference method, is used for parameter estimation. The target variables to be sampled are: 1) the hidden indicator vectors for sites and conditions, and 2) the binomial parameters of the bicluster and the background. The full conditional probability distribution of each target is then derived, which completes the inference of the parameters.

1) Full-conditional probabilities of the indicator variables for site i and condition j.

Definition denotes the vector of indicator variables for the site other than site i. Since follows the Bernoulli distribution with a probability , is the probability of

Therefore, the full-conditional probability distribution of belonging to the bicluster for site i is shown in Eq. (7).

(7)

Where, represents the set of parameters of the Bernoulli distribution that the labels of all sites follow. denotes the set of parameters of the Bernoulli distribution that the other sites follow, except for the i-th site. denotes the set of parameters of the Bernoulli distribution that the label vectors of all experimental conditions follow. In Eq. (8),

(8)

represents the number of residual sites where the indicator variable is 1, excluding site i. and are the two shape parameters of the prior distribution.
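For orientation, the textbook form of such a prior term is sketched below. The notation is assumed, because the original symbols did not survive extraction: $a_i$ is the indicator of site i, $a_{-i}$ the indicators of all other sites, $n_{-i}$ the number of other sites whose indicator is 1, $n$ the total number of sites, and $\alpha_a$, $\beta_a$ the two shape parameters of the beta prior on the Bernoulli parameter $\pi$. This is the standard result obtained by integrating out $\pi$ and may differ in presentation from Eq. (8).

```latex
P(a_i = 1 \mid a_{-i})
  = \int_0^1 \pi \, p(\pi \mid a_{-i}) \, d\pi
  = \frac{n_{-i} + \alpha_a}{(n - 1) + \alpha_a + \beta_a},
\qquad
P(a_i = 0 \mid a_{-i})
  = \frac{(n - 1 - n_{-i}) + \beta_a}{(n - 1) + \alpha_a + \beta_a}.
```

The two probabilities sum to one, and the same form applies to the condition indicators with the number of conditions in place of $n$.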

(9)

Where,

Similarly, the full-conditional probability that site i belongs to the background is given by Eq. (10).

(10)

The two factors in Eq. (10) are calculated as follows.

(11)

(12)

Where,, denotes , where, , denotes , where, .

For convenience, we directly compute Eq. (13).

(13)

Substituting Eqs. (7)–(12) into Eq. (13) gives the following result.

(14)

From Eq. (13), we can solve for,

(15)
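A small sketch of this last step, assuming that Eq. (13) is the logarithm of the ratio between the bicluster and background full-conditional probabilities (the equation itself is not reproduced here, so this reading is an assumption). Under that reading, the label probability is the logistic function of the log-ratio; the function names are hypothetical.

```python
import numpy as np

def prob_from_log_odds(log_odds):
    """Numerically stable logistic function: 1 / (1 + exp(-log_odds))."""
    z = np.asarray(log_odds, dtype=float)
    e = np.exp(-np.abs(z))  # always in (0, 1], so it cannot overflow
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def sample_label(log_odds, rng):
    """Draw a 0/1 label with P(label = 1) given by the logistic of log_odds."""
    return int(rng.random() < prob_from_log_odds(log_odds))
```

Working with the log-ratio avoids multiplying many small likelihood terms, which would otherwise underflow for large read counts.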

Similarly, for condition j, the calculation is as follows.

(16)

represents the number of remaining conditions where the indicator variable is 1, except for condition j. and are the two shape parameters of the prior distribution of .

From Eq. (16), we can solve for,

(17)

2) Full-conditional probability distribution of and of bicluster and background based on data screening strategy.

First, the full-conditional probability distribution of the bicluster parameter is calculated without data filtering, and data filtering is then applied. The full-conditional probability without filtering is derived as follows using Bayes’ formula.

(18)

I and J denote the sets of sites and conditions included in the bicluster.

According to the screening strategy, the next step is to select the sites and conditions at which the IP sample reads count is greater than 1.5 times the corresponding input reads count, i.e., I’ and J’, which are subsets of I and J, respectively. All elements contained in I’ and J’ satisfy this condition, that is, their methylation levels are all greater than 0.6 (taking the methylation level as IP/(IP + input), a ratio above 1.5 corresponds to a level above 1.5/2.5 = 0.6). The full-conditional probability distribution of the bicluster parameter after screening is shown in Eq. (19).

(19)
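The effect of screening on the posterior can be sketched as follows. This is an illustration under stated assumptions, not the paper's Eq. (19): it assumes a Beta prior with shape parameters `a` and `b` on the bicluster's binomial parameter, treats IP reads as successes and input reads as failures, and applies the ratio test element by element. The function name and defaults are hypothetical.

```python
import numpy as np

def bicluster_beta_posterior(ip, inp, d_r, d_c, a=1.0, b=1.0, ratio=1.5, screen=True):
    """Shape parameters of the Beta posterior for the bicluster's binomial parameter."""
    ip = np.asarray(ip, dtype=float)
    inp = np.asarray(inp, dtype=float)
    # Elements whose site label and condition label are both 1.
    member = (np.asarray(d_r)[:, None] == 1) & (np.asarray(d_c)[None, :] == 1)
    if screen:
        # Keep only elements with IP reads > ratio * input reads.
        member = member & (ip > ratio * inp)
    # Conjugate update: add IP reads to the first shape, input reads to the second.
    return a + ip[member].sum(), b + inp[member].sum()
```

For a 2 x 2 block with IP reads [[9, 2], [8, 1]] and input reads [[1, 8], [4, 9]], the unscreened update gives shapes (21, 23), while the screened update keeps only the two elements that pass the ratio test and gives (18, 6), whose mean 18/24 = 0.75 lies above 0.6.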

Similarly, the full-conditional probability of the unscreened background parameter is derived as follows.

(20)

represents the set of sites and conditions included in the background.

The full-conditional probability distribution of the background parameter after screening is shown in Eq. (21).

(21)

4.3. Algorithm implementation

1) Implementation of a probabilistic model for a single bicluster

The implementation of the probability model for a single bicluster consists of three parts: the design of the Gibbs sampler, convergence judgment, and final pattern determination. The specific implementation is as follows.

(1) Design of Gibbs sampler

Based on the above, the algorithmic framework for a single bicluster probability model can be designed as shown in Fig 16.

Fig 16. Algorithmic framework for a single bicluster probability model based on data screening strategies.

https://doi.org/10.1371/journal.pcbi.1014430.g016

Compared with the BBM model, the EBBM model adds a data screening strategy in step 2 for noise reduction. First, based on D_R and D_C, the sets of sites and conditions belonging to the bicluster, I and J, are identified. Then, these sets are screened further to retain the sites and conditions at which the IP sample reads count is greater than 1.5 times the corresponding input reads count, i.e., I’ and J’. Finally, I’ and J’ are substituted into Eq. (19) to calculate the binomial distribution parameter of the bicluster. Comparing Eq. (19) with Eq. (18) shows that the increments added to the two shape parameters of the beta distribution in Eq. (19) are sums over subsets of the elements that contribute to the corresponding increments in Eq. (18). The expectation of the beta distribution represented by Eq. (19) can be expressed as Eq. (22).

(22)

In Eq. (22), the two shape hyperparameters of the prior are negligible compared with the accumulated reads counts. Therefore, the value of Eq. (22) can be approximated by Eq. (23).

(23)

In Eq. (23), since every retained element satisfies the screening condition, the expectation estimated by this formula is greater than 0.6, and it is larger than the value obtained from Eq. (18). In other words, the sites retained after screening the temporarily obtained biclusters in the iterative process using Eq. (19) are, according to the MeRIP-Seq sequencing principle, sites that have theoretically undergone methylation.
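The bound of 0.6 can be checked with a short derivation. The notation is assumed for this sketch: x_{ij} and y_{ij} are the IP and input reads counts of a retained element, S is the non-empty set of retained elements, and X and Y are the corresponding sums.

```latex
\begin{align}
x_{ij} > 1.5\,y_{ij}\ \ \forall (i,j)\in S
  &\;\Longrightarrow\; X := \sum_{(i,j)\in S} x_{ij} \;>\; 1.5 \sum_{(i,j)\in S} y_{ij} =: 1.5\,Y, \\
\frac{X}{X+Y} \;=\; 1 - \frac{Y}{X+Y}
  &\;>\; 1 - \frac{Y}{2.5\,Y} \;=\; 0.6 \qquad (Y > 0).
\end{align}
```

When Y = 0 the ratio equals 1, so the bound holds in that case as well.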

Consequently, in the subsequent calculation steps, Eq. (19) replaces Eq. (18) when estimating the binomial distribution parameter of the bicluster.

Next, we estimate the condition label. When the value of is larger, the probability that condition j belongs to the bicluster is greater. According to Eq. (17),

(24)

In Eq. (24), since is a monotonically decreasing function, the larger the value of , the smaller the value of , thereby causing the value of Eq. (24) to increase.

From the above, we can conclude that the larger the value of , the larger the value of will be, and and are directly proportional to each other.

Again, by Eq. (16)

(25)

In step 3 of the Gibbs sampler, for the first experimental condition at the start, since D_R and D_Cj are fixed in advance, the value in Eq. (25) does not change with the value of . Therefore, the value of formula (25) is determined by . After the previous data screening, the value of is relatively large. When the value of is large, the probability density function of the binomial distribution shows that when the value of is large, the value of is large, while the value of is small. In other words, tends to come from the binomial distribution with larger value. Then the value of will be larger. Conversely, the smaller is, the smaller the value of will be. Since the log function is a monotonically increasing function, in this step, when is large, Eq. (25) will estimate the label for the first experimental condition based on the relationship between and in the already determined bicluster. If most of the data satisfy , then Eq. (25) will give a larger value. From the results derived above, is directly proportional to , so the larger the value of , the greater the probability of sampling the corresponding label as 1, meaning that it is more likely to belong to the bicluster.

Then, for the next , will increase, and the value of will be determined according to whether the elements added to the bicluster based on the first experimental condition added still satisfy the condition that in most cases. If most elements still have the property , assign a larger value; otherwise, assign it a smaller value. Therefore, when the first experimental condition has been determined to be a bicluster, Eq. (25) will attempt to increase the number of elements to expand the size of the bicluster, and then determine whether the expanded bicluster still retains the property that most elements satisfy . If so, continue attempting to expand the size of the bicluster; otherwise, reduce the size of the bicluster. The determination of subsequent experimental condition labels follows the same principle.

Conversely, if, when initially evaluating the first experimental condition, most of the elements in the bicluster that has been identified do not exhibit the property , then will be assigned a smaller value. The likelihood of the first experimental condition being assigned to the bicluster is low. In other words, the algorithm determines that the first experimental condition does not belong to the bicluster, thereby removing some elements and reducing the size of the bicluster. This allows the algorithm to statistically analyze whether most elements in the reduced-size bicluster exhibit the property when evaluating the second experimental condition. If the property is present, the second experimental condition is judged to belong to the bicluster, and thus, some elements under the second experimental condition are added to the bicluster, thereby increasing the size of the bicluster when judging the third experimental condition. If the property is not satisfied, the size of the bicluster is further reduced, and the properties of the contained elements are counted to judge the third experimental condition. Subsequent experimental condition judgments follow the same principle.

The same applies to the labeling of sites. Therefore, the data screening strategy will induce elements that satisfy the filtering strategy to cluster together.

In summary, by improving the Gibbs sampler, we can eliminate the noise problem introduced by sequencing errors to a certain extent.

The Gibbs sampler for the single bicluster probability model is designed as Algorithm 1.

Algorithm 1: Gibbs sampler for a single bicluster probability model

Input: randomly initialized label vectors D_R and D_C for the sites and sample conditions, the preset number of iterations K and the burn-in number L, the IP sample reads count matrix, the input sample reads count matrix, and the four sets of hyperparameters of the prior distributions.

Output: determined biclusters and background

1. For each site , fix D_C, , , and the labels D_RT for other sites.

1) Calculate the Bernoulli distribution of the site label according to (14) and (15).

2) Sample the label D_Ri for the site based on the distribution.

2. Fix D_R and D_C

1) Based on D_R and D_C, select the sites and conditions belonging to the bicluster, i.e., I and J.

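2) Further screen the selected sites I and conditions J to identify sites I’ and conditions J’ that satisfy the screening condition, i.e., an IP reads count greater than 1.5 times the corresponding input reads count.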

3) Calculate the binomial distribution of the bicluster and the background according to (19) and (21).

4) Sample the binomial parameters of the bicluster and the background according to their distributions.

3. For each sample condition , fix D_R, , , and the labels D_Cj of other sample conditions.

1) Calculate the Bernoulli distribution of the sample condition label according to (16) and (17).

2) Sample the label for the sample conditions based on this distribution.

4. Fix D_R and D_C

1) Calculate the binomial distribution of the bicluster and the background according to (18) and (20).

2) Sample the binomial parameters of the bicluster and the background according to their distributions.

5. Repeat steps 1, 2, 3, and 4 until the iteration ends.
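The four steps of Algorithm 1 can be sketched compactly. This is a simplified illustration under several assumptions, not the authors' code: the label prior is integrated out in the standard Beta-Bernoulli way, IP reads are treated as binomial successes out of IP + input reads, elements of the labeled block that fail the ratio test are counted as background in step 2, and the starting values 0.7 and 0.3 are arbitrary. All function and parameter names are hypothetical.

```python
import math
import numpy as np

def _sigmoid(z):
    # Numerically stable logistic function.
    e = math.exp(-abs(z))
    return 1.0 / (1.0 + e) if z >= 0 else e / (1.0 + e)

def gibbs_single_bicluster(ip, inp, n_iter=200, burn_in=100, ratio=1.5,
                           label_prior=(1.0, 1.0), rate_prior=(1.0, 1.0), seed=0):
    """Return per-site and per-condition frequencies of label 1 after burn-in."""
    rng = np.random.default_rng(seed)
    ip, inp = np.asarray(ip, dtype=float), np.asarray(inp, dtype=float)
    n, m = ip.shape
    a_l, b_l = label_prior
    a_p, b_p = rate_prior
    d_r = rng.integers(0, 2, size=n)  # site labels
    d_c = rng.integers(0, 2, size=m)  # condition labels
    p, q = 0.7, 0.3                   # assumed starting bicluster/background rates

    def sample_rates(screen):
        member = (d_r[:, None] == 1) & (d_c[None, :] == 1)
        bic = member & (ip > ratio * inp) if screen else member
        new_p = rng.beta(a_p + ip[bic].sum(), b_p + inp[bic].sum())
        new_q = rng.beta(a_p + ip[~bic].sum(), b_p + inp[~bic].sum())
        return np.clip([new_p, new_q], 1e-9, 1 - 1e-9)

    def sample_labels(labels, other, x, y):
        # x, y: reads arranged so that row k corresponds to labels[k].
        lp, lq = math.log(p / q), math.log((1 - p) / (1 - q))
        size = labels.size
        for k in range(size):
            ones = labels.sum() - labels[k]
            prior = math.log((ones + a_l) / (size - 1 - ones + b_l))
            # Binomial coefficients cancel in the bicluster/background ratio.
            lik = float(((x[k] * lp + y[k] * lq) * other).sum())
            labels[k] = int(rng.random() < _sigmoid(prior + lik))

    site_freq, cond_freq = np.zeros(n), np.zeros(m)
    for it in range(n_iter):
        sample_labels(d_r, d_c, ip, inp)      # step 1: site labels
        p, q = sample_rates(screen=True)      # step 2: screened rates
        sample_labels(d_c, d_r, ip.T, inp.T)  # step 3: condition labels
        p, q = sample_rates(screen=False)     # step 4: unscreened rates
        if it >= burn_in:
            site_freq += d_r
            cond_freq += d_c
    kept = n_iter - burn_in
    return site_freq / kept, cond_freq / kept
```

The function returns label frequencies rather than a final partition, so that the choice of cut-off can be made separately once convergence has been checked. It requires `burn_in` to be smaller than `n_iter`.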

(2) Convergence diagnosis

There are many methods for diagnosing convergence, and in practice, convergence is usually assessed from multiple perspectives. This paper follows the convergence diagnosis method proposed by Liu et al. [58], which involves checking the intra-chain variance after Algorithm 1 has completed the predefined number of iterations. If the convergence condition is met, the algorithm stops. Otherwise, the iteration count increases until convergence is achieved. Once convergence has been achieved, the historical trace plots of the parameters and the log-likelihood are examined to confirm the convergence of the algorithm. Eq. (26) shows the method for calculating the log-likelihood function in the model.

(26)
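For trace plots, a data log-likelihood of the following kind can be tracked at each iteration. This is a sketch of the binomial data term only, assuming IP reads are successes out of IP + input reads with rate `p` inside the bicluster and `q` in the background; the paper's Eq. (26) may contain additional terms. The function name is hypothetical.

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma, otypes=[float])

def binomial_log_likelihood(ip, inp, d_r, d_c, p, q):
    """Sum of binomial log-probabilities of the IP reads given the labels."""
    ip = np.asarray(ip, dtype=float)
    inp = np.asarray(inp, dtype=float)
    member = (np.asarray(d_r)[:, None] == 1) & (np.asarray(d_c)[None, :] == 1)
    rate = np.where(member, p, q)
    # log of the binomial coefficient C(ip + inp, ip)
    log_coef = _lgamma(ip + inp + 1) - _lgamma(ip + 1) - _lgamma(inp + 1)
    return float((log_coef + ip * np.log(rate) + inp * np.log1p(-rate)).sum())
```

A trace of this value that rises and then fluctuates around a stable level is the usual visual indication that the chain has left its initial transient.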

(3) Determination of the final pattern

To determine the final pattern, after the algorithm converges, the elements of the bicluster are determined using the Monte Carlo integration method. Specifically, the sites whose label values are higher than the third quartile (75th percentile) of all site label values are selected as the sites of the bicluster. Similarly, the conditions whose label values are higher than the third quartile of all condition label values are selected as the conditions of the bicluster. The remaining sites and conditions are treated as background elements.
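The selection rule can be sketched as below, assuming that the "label value" of a site or condition is its label averaged over the post-burn-in samples (a Monte Carlo estimate of the probability of belonging to the bicluster). The function name and the trace layout, with one row per iteration, are assumptions of this example.

```python
import numpy as np

def select_bicluster(site_trace, cond_trace, burn_in, q=0.75):
    """Pick sites and conditions whose mean label exceeds the q-quantile."""
    # Average the sampled 0/1 labels over the post-burn-in iterations.
    site_mean = np.asarray(site_trace, dtype=float)[burn_in:].mean(axis=0)
    cond_mean = np.asarray(cond_trace, dtype=float)[burn_in:].mean(axis=0)
    # Strictly above the third quartile, as described in the text.
    sites = np.flatnonzero(site_mean > np.quantile(site_mean, q))
    conds = np.flatnonzero(cond_mean > np.quantile(cond_mean, q))
    return sites, conds
```

Because the comparison is strict, the rule returns an empty selection when all averaged labels are equal, which a caller can treat as "no bicluster found".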

2) Probabilistic model of multiple biclusters

This model uses a site non-overlap method to discover multiple biclusters. This involves masking the sites of the discovered biclusters and repeatedly executing the sampling programme on the remaining data until the algorithm terminates and determines the number of biclusters discovered. Liu et al. used this method to determine the number of biclusters. The implementation process is shown in Algorithm 2.

Algorithm 2

Input: Output results of Algorithm 1

Output: Rows belonging to the discovered bicluster, and a “yes” or “no” flag indicating whether any rows or conditions remain to be assigned to a new bicluster.

1. Mask rows that have been assigned to the discovered bicluster.

2. Run Algorithm 1

3. Determine whether there are any remaining rows or conditions to be assigned to the new bicluster.

3) Implementation of the entire EBBM algorithm.

In summary, the entire EBBM algorithm design is shown in Algorithm 3.

Algorithm 3

Input: IP sample reads count matrix , input sample reads count matrix , maximum number of biclusters N, and other relevant parameter initial values required by the algorithm.

Output: All biclusters found: Bic₁, Bic₂, … Bic_k.

1. For k in 1: N

2. Feed the IP and input data matrices into Algorithm 1 to obtain Bic_k.

3. Feed the output results of Algorithm 1 into Algorithm 2.

4. If: The output result of Algorithm 2 contains “yes”:

5. k = k + 1

6. else: break
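The masking loop of Algorithms 2 and 3 can be sketched as a small driver. The single-bicluster search is passed in as a callable, `find_one`, which is a hypothetical stand-in for Algorithm 1 that returns the row and column indices of one bicluster within the data it is given; treating an empty result as the "no" answer is also an assumption of this example.

```python
import numpy as np

def find_biclusters(ip, inp, find_one, max_biclusters):
    """Repeatedly find one bicluster, mask its sites, and search the rest."""
    ip, inp = np.asarray(ip), np.asarray(inp)
    remaining = np.arange(ip.shape[0])  # original indices of unmasked sites
    found = []
    for _ in range(max_biclusters):
        if remaining.size == 0:
            break
        sites, conds = find_one(ip[remaining], inp[remaining])
        sites = np.asarray(sites, dtype=int)
        conds = np.asarray(conds, dtype=int)
        if sites.size == 0 or conds.size == 0:
            break  # nothing left to assign: the "no" branch
        # Map local row indices back to the original site indices.
        found.append((remaining[sites], conds))
        # Mask the sites of the discovered bicluster (site non-overlap).
        remaining = np.delete(remaining, sites)
    return found
```

Only sites are masked between rounds, so a condition may appear in more than one bicluster while each site appears in at most one.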

Supporting information

S1 Fig. Sensitivity analysis of IP/Input ratio threshold.

The graph evaluates the impact of varying the IP/Input ratio threshold on retained sites and the stability of the results. The “Retained sites” column shows the number of sites retained at each threshold. The “Stable range (1.2-2.0)” highlights the threshold interval where the core co-methylation patterns (i.e., biclusters with high stability) remained largely unchanged. The “Jaccard similarity (vs ratio=1.5)” column quantifies the overlap between the site sets obtained at each threshold and the reference set obtained at ratio = 1.5. A Jaccard similarity value above 0.8 indicates high reproducibility. Importantly, the top enriched biological processes (e.g., “Histone modification”, “peptidyl-lysine modification”) were consistently reproduced across thresholds from 1.2 to 2.0. When the ratio was set to 1.0 (i.e., no strict filtering) or 2.5 (overly stringent), the biclustering results became unstable or lost meaningful patterns. The above indicates that the 1.5 threshold is a reasonable default choice, and our algorithm’s performance is robust to moderate variations (±0.3) around this value. The analysis suggests that thresholds between 1.2 and 2.0 yield stable and consistent results. Abbreviations: IP, immunoprecipitation.

https://doi.org/10.1371/journal.pcbi.1014430.s001

(TIF)
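For reference, the Jaccard similarity used in this comparison is the size of the intersection of two site sets divided by the size of their union. A minimal sketch follows; the function name is hypothetical, and returning 1.0 for two empty sets is a convention assumed here.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of site IDs."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention for two empty sets
    return len(a & b) / len(a | b)
```

For example, the site sets {1, 2, 3} and {2, 3, 4} share two of four distinct sites, giving a similarity of 0.5.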

References

1. Yang Y, Hsu PJ, Chen YS, Yang YG. Dynamic transcriptomic m6A decoration: Writers, erasers, readers and functions in RNA metabolism. Cell Research. 2018;28(6):616.
2. Roundtree IA, Evans ME, Pan T, He C. Dynamic RNA modifications in gene expression regulation. Cell. 2017;169(7):1187–200. pmid:28622506
3. Li J, Yang X, Qi Z, Sang Y, Liu Y, Xu B, et al. The role of mRNA m6A methylation in the nervous system. Cell Biosci. 2019;9:66. pmid:31452869
4. Fu Y, Dominissini D, Rechavi G, He C. Gene expression regulation mediated through reversible m(6)A RNA methylation. Nature Reviews Genetics. 2014;15(5):293–306.
5. Angelova MT, Dimitrova DG, Dinges N, Lence T, Worpenberg L, Carre C, et al. The Emerging field of epitranscriptomics in neurodevelopmental and neuronal disorders. Frontiers in Bioengineering and Biotechnology. 2018;6.
6. Weisberg E, Boulton C, Kelly LM, Manley P, Fabbro D, Meyer T, et al. Inhibition of mutant FLT3 receptors in leukemia cells by the small molecule tyrosine kinase inhibitor PKC412. Cancer Cell. 2002;1(5):433–43. pmid:12124173
7. Spiekermann K, Bagrintseva K, Schwab R, Schmieja K, Hiddemann W. Overexpression and constitutive activation of FLT3 induces STAT5 activation in primary acute myeloid leukemia blast cells. Clin Cancer Res. 2003;9(6):2140–50. pmid:12796379
8. Koutras AK, Fountzilas G, Kalogeras KT, Starakis I, Iconomou G, Kalofonos HP. The upgraded role of HER3 and HER4 receptors in breast cancer. Critical Reviews in Oncology/Hematology. 2010;74(2):73–8.
9. Salassidis K, Bruch J, Zitzelsberger H, Lengfelder E, Kellerer AM, Bauchinger M. Translocation t(10;14)(q11.2;q22.1) fusing the kinectin to the RET gene creates a novel rearranged form (PTC8) of the RET proto-oncogene in radiation-induced childhood papillary thyroid carcinoma. Cancer Research. 2000;60(11):2786–9.
10. Khaidakov M, Mitra S, Kang B-Y, Wang X, Kadlubar S, Novelli G, et al. Oxidized LDL receptor 1 (OLR1) as a possible link between obesity, dyslipidemia and cancer. PLoS One. 2011;6(5):e20277. pmid:21637860
11. Shintani S, Funayama T, Yoshihama Y, Alcalde RE, Matsumura T. Prognostic significance of ERBB3 overexpression in oral squamous cell carcinoma. Cancer Lett. 1995;95(1–2):79–83. pmid:7656248
12. Li L, Sawamura T, Renier G. Glucose enhances human macrophage LOX-1 expression: role for LOX-1 in glucose-induced macrophage foam cell formation. Circ Res. 2004;94(7):892–901. pmid:15001526
13. Dubois-Dauphin M, Poitry-Yamate C, de Bilbao F, Julliard AK, Jourdan F, Donati G. Early postnatal Müller cell death leads to retinal but not optic nerve degeneration in NSE-Hu-Bcl-2 transgenic mice. Neuroscience. 2000;95(1):9–21. pmid:10619458
14. Tsuda H, Birrer MJ, Ito YM, Ohashi Y, Lin M, Lee C, et al. Identification of DNA copy number changes in microdissected serous ovarian cancer tissue using a cDNA microarray platform. Cancer genetics and cytogenetics. 2004;155(2):97–107.
15. López-Terrada D, Gunaratne PH, Adesina AM, Pulliam J, Hoang DM, Nguyen Y, et al. Histologic subtypes of hepatoblastoma are characterized by differential canonical Wnt and Notch pathway activation in DLK+ precursors. Hum Pathol. 2009;40(6):783–94. pmid:19200579
16. Sullivan KE, Piliero LM, Dharia T, Goldman D, Petri MA. 3’ polymorphisms of ETS1 are associated with different clinical phenotypes in SLE. Hum Mutat. 2000;16(1):49–53. pmid:10874305
17. Mustafa DA, Sieuwerts AM, Zheng PP, Kros JM. Overexpression of colligin 2 in glioma vasculature is associated with overexpression of heat shock factor 2. Gene regulation and systems biology. 2010;4:GRSB. S4546.
18. Frank NY, Schatton T, Kim S, Zhan Q, Wilson BJ, Ma J, et al. VEGFR-1 expressed by malignant melanoma-initiating cells is required for tumor growth. Cancer Res. 2011;71(4):1474–85. pmid:21212411
19. Nordberg J, Mpindi JP, Iljin K, Pulliainen AT, Kallajoki M, Kallioniemi O, et al. Systemic analysis of gene expression profiles identifies ErbB3 as a potential drug target in pediatric alveolar rhabdomyosarcoma. PLoS One. 2012;7(12):e50819. pmid:23227212
20. Beji A, Horst D, Engel J, Kirchner T, Ullrich A. Toward the prognostic significance and therapeutic potential of HER3 receptor tyrosine kinase in human colon cancer. Clinical Cancer Research. 2012;18(4):956–68.
21. Choi H-R, Kim WK, Kim EY, Han BS, Min J-K, Chi S-W, et al. Dual-specificity phosphatase 10 controls brown adipocyte differentiation by modulating the phosphorylation of p38 mitogen-activated protein kinase. PLoS One. 2013;8(8):e72340. pmid:23977283
22. Luo J, Liu H, Luan S, He C, Li Z. Aberrant Regulation of mRNA m⁶A Modification in Cancer Development. Int J Mol Sci. 2018;19(9):2515. pmid:30149601
23. Hong K. Emerging function of N6-methyladenosine in cancer. Oncol Lett. 2018;16(5):5519–24. pmid:30344705
24. Lin Y, Ueda J, Yagyu K, Ishii H, Ueno M, Egawa N, et al. Association between variations in the fat mass and obesity-associated gene and pancreatic cancer risk: A case–control study in Japan. BMC Cancer. 2013;13(1):1–6.
25. Machiela MJ, Lindström S, Allen NE, Haiman CA, Albanes D, Barricarte A, et al. Association of type 2 diabetes susceptibility variants with advanced prostate cancer risk in the Breast and Prostate Cancer Cohort Consortium. Am J Epidemiol. 2012;176(12):1121–9. pmid:23193118
26. Ortega A, Niksic M, Bachi A, Wilm M, Sánchez L, Hastie N, et al. Biochemical function of female-lethal (2)D/Wilms’ tumor suppressor-1-associated proteins in alternative pre-mRNA splicing. J Biol Chem. 2003;278(5):3040–7. pmid:12444081
27. Reddy SM, Sadim M, Li J, Yi N, Agarwal S, Mantzoros CS, et al. Clinical and genetic predictors of weight gain in patients diagnosed with breast cancer. Br J Cancer. 2013;109(4):872–81. pmid:23922112
28. Jin DI, Lee SW, Han ME, Kim HJ, Seo SA, Hur GY, et al. Expression and roles of Wilms’ tumor 1‐associating protein in glioblastoma. Cancer science. 2012;103(12):2102–9.
29. Akilzhanova A, Nurkina Z, Momynaliev K, Ramanculov E, Zhumadilov Z, Rakhypbekov T, et al. Genetic profile and determinants of homocysteine levels in Kazakhstan patients with breast cancer. Anticancer Res. 2013;33(9):4049–59. pmid:24023349
30. Heiliger K-J, Hess J, Vitagliano D, Salerno P, Braselmann H, Salvatore G, et al. Novel candidate genes of thyroid tumourigenesis identified in Trk-T1 transgenic mice. Endocr Relat Cancer. 2012;19(3):409–21. pmid:22454401
31. Casalegno-Garduno R, Schmitt A, Wang X, Xu X, Schmitt M. Wilms’ tumor 1 as a novel target for immunotherapy of leukemia. :3309–11.
32. Bokar JA. The biosynthesis and functional roles of methylated nucleosides in eukaryotic mRNA. Fine-tuning of RNA functions by modification and editing. Springer. 2005. 141–77.
33. Lin S, Choe J, Du P, Triboulet R, Gregory RI. The m(6)A Methyltransferase METTL3 Promotes translation in human cancer cells. Mol Cell. 2016;62(3):335–45. pmid:27117702
34. Zhang C, Samanta D, Lu H, Bullen JW, Zhang H, Chen I, et al. Hypoxia induces the breast cancer stem cell phenotype by HIF-dependent and ALKBH5-mediated m⁶A-demethylation of NANOG mRNA. Proc Natl Acad Sci U S A. 2016;113(14):E2047–56. pmid:27001847
35. Loos RJ, Yeo GD. The bigger picture of FTO—the first GWAS-identified obesity gene. Nature Reviews Endocrinology. 2014;10(1):51–61.
36. Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, Lindgren CM, et al. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science. 2007;316(5826):889–94. pmid:17434869
37. Karra E, O’Daly OG, Choudhury AI, Yousseif A, Millership S, Neary MT, et al. A link between FTO, ghrelin, and impaired brain food-cue responsivity. J Clin Invest. 2013;123(8):3539–51. pmid:23867619
38. Kalnina I, Zaharenko L, Vaivade I, Rovite V, Nikitina-Zake L, Peculis R, et al. Polymorphisms in FTO and near TMEM18 associate with type 2 diabetes and predispose to younger age at diagnosis of diabetes. Gene. 2013;527(2):462–8. pmid:23860325
39. Wang L, Yu Q, Xiong Y, Liu L, Zhang X, Zhang Z, et al. Variant rs1421085 in the FTO gene contribute childhood obesity in Chinese children aged 3–6 years. Obesity research & clinical practice. 2013;7(1):e14–22.
40. Zhang M, Zhang Y, Ma J, Guo F, Cao Q, Zhang Y, et al. The demethylase activity of FTO (fat mass and obesity associated protein) is required for preadipocyte differentiation. PLoS One. 2015;10(7):e0133788. pmid:26218273
41. Merkestein M, Laber S, McMurray F, Andrew D, Sachse G, Sanderson J, et al. FTO influences adipogenesis by regulating mitotic clonal expansion. Nature Communications. 2015;6(1):1–9.
42. Zhao X, Yang Y, Sun B-F, Shi Y, Yang X, Xiao W, et al. FTO-dependent demethylation of N6-methyladenosine regulates mRNA splicing and is required for adipogenesis. Cell Res. 2014;24(12):1403–19. pmid:25412662
43. Meyer KD, Saletore Y, Zumbo P, Elemento O, Mason CE, Jaffrey SR. Comprehensive analysis of mRNA methylation reveals enrichment in 3’ UTRs and near stop codons. Cell. 2012;149(7):1635–46. pmid:22608085
44. De Simone M, Hoover J, Lau J, Bennett HM, Wu B, Chen C, et al. A comprehensive analysis framework for evaluating commercial single-cell RNA sequencing technologies. Nucleic Acids Res. 2025;53(2):gkae1186. pmid:39675380
45. Liu C, Liang H, Wan AH, Xiao M, Sun L, Yu Y, et al. Decoding the m6A epitranscriptomic landscape for biotechnological applications using a direct RNA sequencing approach. Nat Commun. 2025;16(1):798. pmid:39824841
46. Asim MN, Ibrahim MA, Asif T, Dengel A. RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models. Heliyon. 2025.
47. Sun W-J, Li J-H, Liu S, Wu J, Zhou H, Qu L-H, et al. RMBase: A resource for decoding the landscape of RNA modifications from high-throughput sequencing data. Nucleic Acids Res. 2016;44(D1):D259-65. pmid:26464443
48. Xuan J, Chen L, Chen Z, Pang J, Huang J, Lin J, et al. RMBase v3.0: decode the landscape, mechanisms and functions of RNA modifications. Nucleic Acids Res. 2024;52(D1):D273–84. pmid:37956310
49. Liu H, Ma J, Meng J, Zhang L. MeT-DB V2.0: Elucidating Context-Specific Functions of N6-Methyl-Adenosine Methyltranscriptome. Methods Mol Biol. 2021;2284:507–18. pmid:33835460
50. Zheng Y, Nie P, Peng D, He Z, Liu M, Xie Y, et al. m6AVar: A database of functional variants involved in m6A modification. Nucleic Acids Res. 2018;46(D1):D139–45. pmid:29036329
51. Deng S, Zhang H, Zhu K, Li X, Ye Y, Li R, et al. M6A2Target: A comprehensive database for targets of m6A writers, erasers and readers. Briefings in Bioinformatics. 2021;22(3):bbaa055.
52. Zhang SY, Zhang SW, Liu L, Meng J, Huang Y. m6A-driver: Identifying context-specific mRNA m6A methylation-driven gene interaction networks. PLoS Computational Biology. 2016;12(12):e1005287.
53. Zhou Y, Zeng P, Li Y-H, Zhang Z, Cui Q. SRAMP: Prediction of mammalian N6-methyladenosine (m6A) sites based on sequence-derived features. Nucleic Acids Res. 2016;44(10):e91. pmid:26896799
54. Chen K, Wei Z, Zhang Q, Wu X, Rong R, Lu Z, et al. WHISTLE: A high-accuracy map of the human N6-methyladenosine (m6A) epitranscriptome predicted using a machine learning approach. Nucleic Acids Res. 2019;47(7):e41. pmid:30993345
55. Fan R, Cui C, Kang B, Chang Z, Wang G, Cui Q. A combined deep learning framework for mammalian m6A site prediction. Cell Genom. 2024;4(12):100697. pmid:39571573
56. Liu L, Zhang S-W, Zhang Y-C, Liu H, Zhang L, Chen R, et al. Decomposition of RNA methylome reveals co-methylation patterns induced by latent enzymatic regulators of the epitranscriptome. Mol Biosyst. 2015;11(1):262–74. pmid:25370990
57. Liu Z, Xiao Y, Yin H, Li X, Chen S, Xia K, et al. BDBB: A novel beta-distribution-based biclustering algorithm for revealing local co-methylation patterns in epi-transcriptome profiling data. IEEE J Biomed Health Inform. 2022;26(6):2405–16. pmid:33764880
58. Liu Z, Xiao Y, Wang H, Li C, Yin H. BBM: A novel beta-binomial-distribution-based biclustering algorithm for mining m6A co-methylation patterns. Expert Systems with Applications. 2024;258:125121.
59. Chen S, Zhang L, Lu L, Meng J, Liu H. FBCwPlaid: A Functional Biclustering Analysis of Epi-Transcriptome Profiling Data Via a Weighted Plaid Model. IEEE/ACM Trans Comput Biol Bioinform. 2022;19(3):1640–50. pmid:33400655
60. Zhang L, Chen S, Zhu J, Meng J, Liu H. REW-ISA: Unveiling local functional blocks in epi-transcriptome profiling data via an RNA expression-weighted iterative signature algorithm. BMC Bioinformatics. 2020;21(1):447. pmid:33036550
61. Cui X, Meng J, Zhang S, Rao MK, Chen Y, Huang Y. A hierarchical model for clustering m6A methylation peaks in MeRIP-seq data. BMC Genomics. 2016;17(7):317–25.
62. Chen K, Wei Z, Liu H, de Magalhães JP, Rong R, Lu Z, et al. Enhancing epitranscriptome module detection from m6A-seq data using threshold-based measurement weighting strategy. BioMed Research International. 2018;2018.
63. Zhang L, He Y, Wang H, Liu H, Huang Y, Wang X, et al. Clustering count-based RNA methylation data using a nonparametric generative model. Current Bioinformatics. 2019;14(1):11–23.
64. Liu Z, Yin H, Chen S, Liu H, Meng J, Wang H, et al. MBMM: Moment estimating beta mixture model-based clustering algorithm for m6A Co-methylation module mining. Current Bioinformatics. 2021;16(10):1244–56.
65. Padilha VA, Campello RJ. A systematic comparative evaluation of biclustering techniques. BMC Bioinformatics. 2017;18(1):55.
66. Chen S, Zhang L, Lu L, Meng J, Liu H. FBCwPlaid: A functional biclustering analysis of epi-transcriptome profiling data via a weighted plaid model. IEEE/ACM Trans Comput Biol Bioinform. 2022;19(3):1640–50. pmid:33400655
67. Dominissini D, Moshitch-Moshkovitz S, Schwartz S, Salmon-Divon M, Ungar L, Osenberg S, et al. Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature. 2012;485(7397):201–6. pmid:22575960
68. Zeng Y, Wang S, Gao S, Soares F, Ahmed M, Guo H, et al. Refined RIP-seq protocol for epitranscriptome analysis with low input materials. PLoS Biol. 2018;16(9):e2006092. pmid:30212448
69. Cui X, Meng J, Zhang S, Chen Y, Huang Y. A novel algorithm for calling mRNA m6A peaks by modeling biological variances in MeRIP-seq data. Bioinformatics. 2016;32(12):i378–85. pmid:27307641
70. Cui X, Zhang L, Meng J, Rao MK, Chen Y, Huang Y. MeTDiff: A novel differential RNA methylation analysis for MeRIP-Seq data. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2015;15(2):526–34.
71. Shen W, Sun H, Liu C, Yi Y, Hou Y, Xiao Y, et al. GLORI for absolute quantification of transcriptome-wide m6A at single-base resolution. Nature Protocols. 2024;19(4):1252–87.
72. Liang Z, Ye H, Ma J, Wei Z, Wang Y, Zhang Y, et al. m6A-Atlas v2.0: Updated resources for unraveling the N6-methyladenosine (m6A) epitranscriptome among multiple species. Nucleic Acids Res. 2024;52(D1):D194–202. pmid:37587690
73. Dominissini D, Moshitch-Moshkovitz S, Schwartz S, Salmon-Divon M, Ungar L, Osenberg S, et al. Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature. 2012;485(7397):201–6. pmid:22575960
74. Fustin J-M, Doi M, Yamaguchi Y, Hida H, Nishimura S, Yoshida M, et al. RNA-methylation-dependent RNA processing controls the speed of the circadian clock. Cell. 2013;155(4):793–806. pmid:24209618
75. Liu J, Yue Y, Han D, Wang X, Fu Y, Zhang L, et al. A METTL3-METTL14 complex mediates mammalian nuclear RNA N-6-adenosine methylation. Nature Chemical Biology. 2014;10(2):93–5.
76. Schwartz S, Mumbach MR, Jovanovic M, Wang T, Maciag K, Bushkin GG, et al. Perturbation of m6A Writers Reveals Two Distinct Classes of mRNA Methylation at Internal and 5’ Sites. Cell Reports. 2014;8(1):284–96.
77. Li Z, Weng H, Su R, Weng X, Zuo Z, Li C, et al. FTO plays an oncogenic role in acute myeloid leukemia as a N6-methyladenosine RNA demethylase. Cancer Cell. 2017;31(1):127–41. pmid:28017614
78. Barbieri I, Tzelepis K, Pandolfini L, Shi J, Millán-Zambrano G, Robson SC, et al. Promoter-bound METTL3 maintains myeloid leukaemia by m6A-dependent translation control. Nature. 2017;552(7683):126–31. pmid:29186125
79. Batista PJ, Molinie B, Wang J, Qu K, Zhang J, Li L, et al. m(6)A RNA modification controls cell fate transition in mammalian embryonic stem cells. Cell Stem Cell. 2014;15(6):707–19. pmid:25456834
80. Wu X, Wei Z, Chen K, Zhang Q, Su J, Liu H, et al. m6Acomet: Large-scale functional prediction of individual m6A RNA methylation sites from an RNA co-methylation network. BMC Bioinformatics. 2019;20(1):223.
81. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36. pmid:23618408
82. Xiang Y, Laurent B, Hsu C-H, Nachtergaele S, Lu Z, Sheng W, et al. RNA m6A methylation regulates the ultraviolet-induced DNA damage response. Nature. 2017;543(7646):573–6. pmid:28297716
83. Benjamini Y, Hochberg Y. Controlling the false discovery rate - A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. 1995;57(1):289–300.
84. Vu LP, Pickering BF, Cheng Y, Zaccara S, Nguyen D, Minuesa G, et al. The N6-methyladenosine (m6A)-forming enzyme METTL3 controls myeloid differentiation of normal hematopoietic and leukemia cells. Nature Medicine. 2017;23(11):1369.
85. Williams BC, Gatti M, Goldberg ML. Bipolar spindle attachments affect redistributions of ZW10, a Drosophila centromere/kinetochore component required for accurate chromosome segregation. J Cell Biol. 1996;134(5):1127–40. pmid:8794856
86. Yu G, Wang LG, Han Y, He QY. clusterProfiler: An R package for comparing biological themes among gene clusters. Omics: a journal of integrative biology. 2012;16(5):284–7.
87. Zhang Q, Chang C, Long Q. Robust knowledge-guided biclustering for multi-omics data. Briefings in Bioinformatics. 2024;25(1):bbad446.
88. Tang X, Liu C, Lan C. scDBic: A novel deep learning-based biclustering algorithm for analyzing scRNA-seq data. Bioinformatics. 2026;42(3):btag095. pmid:41746287
89. Wang L, Zhang H, Chang H-W, Qin Q-M, Zhang B-R, Li X-Q, et al. GAEBic: A novel biclustering analysis method for miRNA-targeted gene data based on graph autoencoder. J Comput Sci Technol. 2021;36(2):299–309.
90. Marchello G, Corneli M, Bouveyron C. A deep dynamic latent block model for co-clustering of zero-inflated data matrices. Journal of Computational and Graphical Statistics. 2024;33(4):1224–39.
91. Charfaoui Y, Houari A, Boufera F. AMoDeBic: An adaptive Multi-objective Differential Evolution biclustering algorithm of microarray data using a biclustering binary mutation operator. Expert Systems with Applications. 2024;238:121863.
92. Xie J, Ma A, Zhang Y, Liu B, Cao S, Wang C, et al. QUBIC2: A novel and robust biclustering algorithm for analyses and interpretation of large-scale RNA-Seq data. Bioinformatics. 2020;36(4):1143–9. pmid:31503285



This is an uncorrected proof.

