iPiDA-LTR: Identifying piwi-interacting RNA-disease associations based on Learning to Rank

Wenxiang Zhang; Jialu Hou; Bin Liu

doi:10.1371/journal.pcbi.1010404

Abstract

Piwi-interacting RNAs (piRNAs) are regarded as drug targets and biomarkers for the diagnosis and therapy of diseases. However, biological experiments cost substantial time and resources, and the existing computational methods focus only on identifying missing associations between known piRNAs and diseases. With the fast development of biological experiments, more and more piRNAs are being detected. Therefore, identifying piRNA-disease associations for newly detected piRNAs has both theoretical value and practical significance for understanding the pathogenesis of diseases. In this study, the iPiDA-LTR predictor is proposed to identify associations between piRNAs and diseases based on Learning to Rank. The iPiDA-LTR predictor not only identifies the missing associations between known piRNAs and diseases, but also detects diseases associated with newly detected piRNAs. Experimental results demonstrate that iPiDA-LTR effectively predicts piRNA-disease associations, outperforming the other related methods.

Author summary

Accumulating evidence indicates that dysfunction and abnormal expression of piRNAs are closely associated with the emergence and development of diseases. Currently, piRNA-disease associations are identified mainly by biological experimental methods and computational methods. However, biological experimental methods take substantial time and resources, and computational methods have mainly focused on identifying diseases associated with known piRNAs. With the development of biological technology, more and more new piRNAs are being detected. Therefore, identifying diseases associated with newly detected piRNAs is more important than identifying diseases associated with known piRNAs. The goal of information retrieval (IR) is to rank documents based on their relevance to certain topics. This task is closely analogous to the identification of piRNA-disease associations. Specifically, ranking documents related to previous topics corresponds to identifying diseases associated with known piRNAs, and ranking documents related to novel topics corresponds to identifying diseases associated with newly detected piRNAs. Therefore, we propose a new predictor called iPiDA-LTR to predict associations between piRNAs and diseases based on information retrieval technology. Experimental results indicate that iPiDA-LTR is promising for identifying diseases associated with both known piRNAs and newly detected piRNAs.

Citation: Zhang W, Hou J, Liu B (2022) iPiDA-LTR: Identifying piwi-interacting RNA-disease associations based on Learning to Rank. PLoS Comput Biol 18(8): e1010404. https://doi.org/10.1371/journal.pcbi.1010404

Editor: Quan Zou, University of Electronic Science and Technology, CHINA

Received: March 27, 2022; Accepted: July 18, 2022; Published: August 15, 2022

Copyright: © 2022 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: A user-friendly web server of iPiDA-LTR predictor is freely available at http://bliulab.net/iPiDA-LTR/.

Funding: This work was supported by the National Key R&D Program of China (No. 2018AAA0100100 to BL) and the Beijing Natural Science Foundation (No. JQ19019 to BL). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

This is a PLOS Computational Biology Methods paper.

Introduction

Piwi-interacting RNAs (piRNAs), 24–31 nucleotides in length, are a class of small RNAs that interact with Piwi-subfamily Argonaute proteins [1–3]. Early studies found that piRNAs are mainly located in the germ stem cells of Drosophila and mouse, where they regulate germ stem cell proliferation [4–6]. With the fast development of biotechnology and computing techniques [7, 8], more and more piRNAs have been discovered, and their functions have also been characterized, including roles in stem cell proliferation, gene expression, and heterochromatin formation [9–12].

As more and more piRNA functions were characterized, growing evidence indicated that dysfunction and abnormal expression of piRNAs are closely associated with the emergence and development of diseases [13–17]. Therefore, the identification of associations between piRNAs and diseases is important for the diagnosis and treatment of diseases [18, 19]. Currently, such identification relies mainly on biological experimental methods and computational methods. Among biological experimental studies, Cabral et al. indicated that piRNAs play a role in translational research on gastric cancer as potential biomarkers [20]. Krishnan et al. identified eight non-redundant piRNAs as breast cancer markers [21]. Roy et al. studied the reciprocal expression between piRNAs and their corresponding targets, and provided a novel insight into the role of piRNAs in Alzheimer’s disease [22]. Although biological experimental methods are highly reliable, they take substantial time and resources. Computational methods have been proposed for identifying associations between non-coding RNAs and diseases, such as miRNA-disease associations [23] and circRNA-disease associations [24]. In the same spirit, computational methods have been proposed to predict piRNA-disease associations, and they can serve as powerful auxiliary tools that save time and cost compared with biological experiments. For example, Wei et al. proposed the first computational predictor for identifying piRNA-disease associations based on the positive unlabelled learning algorithm, and established the first web server [25]. In another approach, a convolutional neural network was utilized to extract association features between piRNAs and diseases, and a Support Vector Machine was then employed to construct the predictor [26]. However, these computational methods mainly target the application scenario of identifying missing associations between known piRNAs and diseases, while more and more new piRNAs are being detected [27–29]. Therefore, the application scenario of identifying piRNA-disease associations for newly detected piRNAs is very important for investigating piRNA functions and disease pathogenesis.

In recent years, information retrieval (IR) has become a widely used technology, whose ultimate goal is to rank documents based on their relevance to certain topics [30, 31]. As a successful algorithm in information retrieval, Learning to Rank (LTR) [32, 33] has been applied to web page retrieval by Google [34], Yahoo [35], Microsoft [36], etc. Compared with traditional IR methods, the advantage of LTR is that it integrates component methods so as to automatically rank documents associated with a query from multiple perspectives [30]. LTR has been applied to identifying circRNA-disease associations [24], detecting protein remote homology [37], predicting protein-phenotype associations [38], predicting drug–target binding affinity [39], etc. The core concept of LTR is to calculate the relevance score f(q, d) between query q and document d. This task is closely analogous to the identification of piRNA-disease associations (see Fig 1): piRNAs and diseases can be treated as queries and documents, respectively. Learning to Rank not only identifies associations between known piRNAs and diseases, but also ranks diseases associated with newly detected piRNAs.

Fig 1. The similarities between the prediction of documents related with query topics and the identification of diseases associated with query piRNAs, where piRNA and disease can be treated as query and document, respectively.

https://doi.org/10.1371/journal.pcbi.1010404.g001

In this study, we propose a new predictor, named iPiDA-LTR, to predict associations between piRNAs and diseases. The iPiDA-LTR predictor combines component methods with Learning to Rank, so that it can not only identify missing associations between known piRNAs and diseases, but also identify diseases associated with newly detected piRNAs. Experimental results indicate that iPiDA-LTR is promising for identifying piRNA-disease associations. A web server of iPiDA-LTR has been constructed to identify diseases associated with query piRNAs, which can be accessed at http://bliulab.net/iPiDA-LTR.

Materials and methods

Materials

To imitate the two application scenarios, we construct two types of datasets based on the piRDisease v1.0 database [40], which collects 7939 piRNA-disease associations involving 4796 piRNAs and 28 diseases. First, a standard dataset is constructed following [25], as represented in Eq 1. It is built from the 5002 piRNA-disease associations reported in [25], which involve 4350 piRNAs and 21 diseases, and it comprises a set of known piRNA-disease associations and a set of unknown piRNA-disease associations. Associations contained in the known set are labelled as 1, and all others are labelled as 0. To avoid overfitting, the standard dataset is further divided into a benchmark dataset and an independent dataset. The benchmark dataset is used to adjust parameters and train models via cross-validation, and the independent dataset is employed to evaluate the performance of different methods.

For the first application scenario: predicting associations between known piRNAs and known diseases

The benchmark dataset and the independent dataset are constructed as in Eq 2: we randomly select 20% of the associations from the known-association set and from the unknown-association set to construct the positive and negative parts of the independent dataset, respectively, and the remaining associations in the two sets are used to construct the positive and negative parts of the benchmark dataset, respectively. The benchmark dataset is used to optimize parameters and train models, and the trained models are then used to identify unknown associations in the independent dataset.

For the second application scenario: predicting the associations between newly detected piRNAs and known diseases

To imitate the second application scenario, we randomly select 80% and 20% of the piRNAs in the standard dataset as the known piRNA set and the newly detected piRNA set, respectively, based on which the benchmark dataset and the independent dataset are constructed as in Eq 3. PiRNAs contained in the benchmark dataset belong to the known piRNA set, and piRNAs contained in the independent dataset belong to the newly detected piRNA set. Detailed information on the benchmark and independent datasets of both application scenarios is shown in Table 1. The datasets can be obtained at http://bliulab.net/iPiDA-LTR/dataset/.
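The difference between the two splitting strategies can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes associations are stored as (piRNA, disease) pairs, and the function names, toy identifiers and fixed random seed are all hypothetical. For the first scenario, `split_by_association` would be applied separately to the known and the unknown association sets.

```python
import random

def split_by_association(pairs, test_fraction=0.2, seed=0):
    """Scenario 1: hold out a random fraction of (piRNA, disease) pairs."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(round(test_fraction * len(shuffled)))
    # First n_test shuffled pairs form the independent part, the rest the benchmark part
    return shuffled[n_test:], shuffled[:n_test]

def split_by_pirna(pairs, test_fraction=0.2, seed=0):
    """Scenario 2: hold out every pair that belongs to a random fraction of piRNAs."""
    rng = random.Random(seed)
    pirnas = sorted({p for p, _ in pairs})
    rng.shuffle(pirnas)
    n_new = int(round(test_fraction * len(pirnas)))
    new_pirnas = set(pirnas[:n_new])  # treated as "newly detected"
    benchmark = [(p, d) for p, d in pairs if p not in new_pirnas]
    independent = [(p, d) for p, d in pairs if p in new_pirnas]
    return benchmark, independent, new_pirnas

# Toy data: 10 hypothetical piRNAs x 3 hypothetical diseases
toy_pairs = [(f"piR-{i}", f"disease-{j}") for i in range(10) for j in range(3)]
bench1, indep1 = split_by_association(toy_pairs)
bench2, indep2, new_set = split_by_pirna(toy_pairs)
```

In the second split, no piRNA of the independent part ever appears in the benchmark part, which is what makes it a stand-in for newly detected piRNAs.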

Table 1. The detailed statistical information of the benchmark and independent datasets for the two application scenarios.

https://doi.org/10.1371/journal.pcbi.1010404.t001

Method overview

In this study, a novel ranking framework, named iPiDA-LTR, is proposed to address the two application scenarios. The workflow of iPiDA-LTR is shown in Fig 2 and consists of three steps: (i) association feature extraction; (ii) component methods; (iii) ranking diseases associated with query piRNAs.

Fig 2. The workflow of iPiDA-LTR predictor.

(i) Association feature extraction: piRNA sequences and the disease ontology are used to calculate piRNA sequence similarities and disease semantic similarities, which are combined with piRNA-disease associations to construct association features and labels; (ii) Component methods: four methods are used to train models on the benchmark dataset, and the trained models are then utilized to calculate association scores of query piRNAs; (iii) Ranking diseases associated with query piRNAs: association scores of samples in the benchmark dataset are used to train the LambdaMART model, and the trained LambdaMART model is then employed to rank diseases associated with query piRNAs.

https://doi.org/10.1371/journal.pcbi.1010404.g002

Association feature extraction

PiRNA sequence similarities.

Similarities between piRNAs play a vital role in RNA-disease association identification [24–26], and piRNA sequence similarities have been applied to piRNA-disease association identification [25, 26]. Many methods have been proposed to calculate sequence similarities [41–43]. For example, the Smith-Waterman algorithm has been successfully applied to multiple sequence analysis tasks, including RNA sequence similarity analysis [25, 26, 44], protein sequence analysis [45, 46], etc. In this study, we employ the Smith-Waterman algorithm [41, 44] to calculate piRNA sequence similarities: (4) where S_P(p_i, p_j) is the similarity between piRNA p_i and piRNA p_j, and SW(p_i, p_j) represents the local alignment score between piRNA p_i and piRNA p_j based on the Smith-Waterman algorithm.
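A sequence similarity of this kind can be sketched as follows. This is a hedged illustration only: the scoring scheme (match 2, mismatch -1, linear gap -1) is an arbitrary choice, and the normalization of SW(p_i, p_j) by the geometric mean of the two self-alignment scores is a commonly used form that is assumed here, not confirmed by the text. Sequences are assumed to be non-empty.

```python
import math

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score with a linear gap penalty (illustrative scores)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def sequence_similarity(a, b):
    """Assumed normalization: SW(a, b) / sqrt(SW(a, a) * SW(b, b))."""
    return smith_waterman(a, b) / math.sqrt(smith_waterman(a, a) * smith_waterman(b, b))

# Two short hypothetical RNA fragments sharing the prefix "ACG"
similarity = sequence_similarity("ACGU", "ACGA")
```

With this normalization a sequence has similarity 1 with itself, and the measure is symmetric in its two arguments.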

Disease semantic similarities.

Disease semantic similarity calculation is a key component of RNA-disease association identification. The disease ontology [47] has been applied to RNA-disease association identification in order to calculate disease semantic similarities [48–53]. The disease ontology is organized as a directed acyclic graph (DAG), which provides a hierarchical structure linking each disease to its parent nodes [47]. Similar diseases share similar hierarchical structures in the DAG of the disease ontology, so the DAG helps to measure the similarity between two diseases. In this study, we use the DAG of the disease ontology to calculate disease semantic similarities [54, 55]: (5) (6) where S_D(m, n) is the similarity between disease m and disease n, T_k represents the node set containing node k and its ancestor nodes, and S_n(i) is the semantic value of node i to node n.
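For reference, a widely used DAG-based formulation that is consistent with the symbols S_D(m, n), T_k and S_n(i) is sketched below. It rests on assumptions: the semantic contribution factor Δ (commonly set to 0.5) and the notation children(i) do not appear in the text, and the exact form used by iPiDA-LTR may differ.

```latex
S_D(m, n) = \frac{\sum_{i \in T_m \cap T_n} \left( S_m(i) + S_n(i) \right)}
                 {\sum_{i \in T_m} S_m(i) + \sum_{i \in T_n} S_n(i)}

S_n(i) =
\begin{cases}
1, & i = n \\
\max \left\{ \Delta \cdot S_n(i') \mid i' \in \mathrm{children}(i) \right\}, & i \neq n
\end{cases}
```

Under this form, two diseases are similar to the extent that their shared ancestors carry a large share of the total semantic value of both DAGs.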

Association features and labels.

The association feature between disease d and the query piRNA p is given by Eq 7, which is constructed from S_P(p,:) and S_D(d,:), the pth row of S_P and the dth row of S_D, respectively. If piRNA p is associated with disease d, the label of the association feature is 1, otherwise 0.
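One plausible reading of this construction is a plain concatenation of the two similarity rows, sketched below. The concatenation itself is an assumption, since the body of Eq 7 is not reproduced here, and the toy matrices and the helper name `association_feature` are hypothetical.

```python
def association_feature(S_P, S_D, p, d):
    """Assumed form of the feature: row p of S_P followed by row d of S_D."""
    return list(S_P[p]) + list(S_D[d])

# Toy similarity matrices for 3 piRNAs and 2 diseases (made-up values)
S_P = [[1.0, 0.4, 0.1],
       [0.4, 1.0, 0.3],
       [0.1, 0.3, 1.0]]
S_D = [[1.0, 0.2],
       [0.2, 1.0]]
# Toy association matrix: A[p][d] == 1 means piRNA p is associated with disease d
A = [[1, 0],
     [0, 1],
     [0, 0]]

features = {(p, d): association_feature(S_P, S_D, p, d)
            for p in range(3) for d in range(2)}
labels = {(p, d): A[p][d] for p in range(3) for d in range(2)}
```

Each piRNA-disease pair then has a feature vector whose length is the number of piRNAs plus the number of diseases, together with a binary label.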

Component methods

In this study, we select two types of component methods to calculate association scores: machine learning methods and collaborative filtering (CF). For the machine learning methods, Random Forest (RF) [56–60], Logistic Regression (LR) [61], and Support Vector Machine (SVM) [62–64] are employed, treating the identification of piRNA-disease associations as a classification problem. CF is a recommendation algorithm [65, 66], which utilizes the guilt-by-association assumption to identify piRNA-disease associations, focusing on local information. The association features of the benchmark dataset (see Eq 7) are used to train the machine learning models, and the trained models are then used to calculate association scores for the samples in the dataset. Finally, the association features between piRNA p and disease d can be represented as Eq 8, which collects the association scores between piRNA p and disease d calculated by CF, LR, RF and SVM, respectively.
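The following sketch shows how such a four-score vector could be assembled with scikit-learn. The LR, RF and SVM hyperparameters are the ones reported in the section "Complementary analysis for component methods"; everything else is an assumption. In particular, `cf_scores` is a hypothetical neighbourhood-based collaborative filter (a similarity-weighted average of the associations of other piRNAs), not necessarily the CF variant used by iPiDA-LTR, and the random toy data carry no biological meaning.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def train_ml_components(X_train, y_train, seed=0):
    """Fit the three classifiers with the hyperparameters reported in the paper."""
    models = {
        "LR": LogisticRegression(max_iter=300, solver="liblinear"),
        "RF": RandomForestClassifier(n_estimators=80, max_leaf_nodes=10,
                                     n_jobs=-1, max_features=0.2,
                                     random_state=seed),
        "SVM": SVC(kernel="linear", probability=True, random_state=seed),
    }
    for model in models.values():
        model.fit(X_train, y_train)
    return models

def cf_scores(S_P, A_train):
    """Hypothetical CF: similarity-weighted average of other piRNAs' associations."""
    W = np.array(S_P, dtype=float)
    np.fill_diagonal(W, 0.0)            # a piRNA does not vote for itself
    denom = W.sum(axis=1, keepdims=True)
    denom[denom == 0] = 1.0             # avoid division by zero
    return (W @ np.asarray(A_train)) / denom

def score_vector(models, x, cf_score):
    """Scores in the order CF, LR, RF, SVM for one piRNA-disease pair."""
    x = np.asarray(x).reshape(1, -1)
    ml = [models[name].predict_proba(x)[0, 1] for name in ("LR", "RF", "SVM")]
    return np.array([cf_score] + ml)

# Toy data only: 40 random feature vectors with a synthetic label
rng = np.random.default_rng(0)
X = rng.random((40, 5))
y = (X[:, 0] > 0.5).astype(int)
models = train_ml_components(X, y)

cf = cf_scores([[1.0, 0.5], [0.5, 1.0]], [[1, 0], [0, 0]])
vec = score_vector(models, X[0], cf[1, 0])
```

The resulting vectors, one per piRNA-disease pair, are the kind of input a ranking model such as LambdaMART would be trained on.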

Ranking diseases associated with query piRNAs

In this study, we employ Learning to Rank (LTR) to solve the problem of identifying potential piRNA-disease associations, motivated by information retrieval [24, 37, 38, 67]. LTR methods are generally classified into three categories: ListWise, PairWise and PointWise [68]. In this study, the ListWise method LambdaMART [32] is selected to obtain high-quality top-ranked diseases; it has been applied to identifying circRNA-disease associations [24], detecting protein remote homology [37], predicting protein-phenotype associations [38] and predicting drug–target binding affinity [39]. The four main parameters of LambdaMART are the number of trees, the truncation level k, the shrinkage and the number of leaves. The truncation level k influences the quality of the top-ranked results through the Normalized Discounted Cumulative Gain (NDCG), which can be formulated as [32]: (9) where k represents the truncation level and IDCG@k is the value of DCG@k for the ideal ranking. If the disease located at position i is associated with the query piRNA, rel_i is equal to 1, otherwise 0. To obtain the final ranking results, the association scores calculated by Eq 8 for the training set are used to train the LambdaMART model, and the trained LambdaMART model is then employed to rank diseases associated with query piRNAs based on the association scores of the query piRNAs.
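Written out in its standard form, with the binary relevance rel_i defined above, NDCG at truncation level k is commonly expressed as follows. This is the usual textbook definition and is given as an assumption; the exact gain and discount used by a particular ranking library may differ.

```latex
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k},
\qquad
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2 (i + 1)}
```

Because the discount grows with the position i, an associated disease contributes most when it is placed near the top of the list, which is why this criterion emphasizes top-ranked results.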

Results and discussion

Evaluation criteria

In this study, the benchmark dataset is employed to optimize the parameters of the models, and the independent dataset is used to evaluate the performance of predictors. How to evaluate the ranking quality and prediction performance is crucial for identifying piRNA-disease associations. Because iPiDA-LTR predictor treats the identification of piRNA-disease associations as an information retrieval ranking task, we employ three important ranking criteria to evaluate the rank quality of different predictors: Normalized Discounted Cumulative Gain (NDCG), Mean Average Precision (MAP) and ROCk. Besides, Area Under the ROC Curve (AUC) and Area Under the Precision-Recall Curve (AUPR) are also used to measure comprehensive performance [69–72]. The average values of these criteria for all query piRNAs are calculated to evaluate performance of predictors.
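Per-query ranking criteria and their averaging over query piRNAs can be sketched as below. The sketch covers NDCG@k and average precision only, uses the standard definitions with binary relevance labels, and assumes that each query is represented by the list of 0/1 labels of its diseases in ranked order; the function names and the toy rankings are hypothetical.

```python
import math

def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top-k ranked labels."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """DCG@k divided by the DCG@k of the ideal ordering (0.0 if no positives)."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

def average_precision(rels):
    """Mean of the precision values at the positions of the associated diseases."""
    hits, total = 0, 0.0
    for position, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0

def mean_over_queries(metric, ranked_labels_per_query):
    """Average a per-query criterion over all query piRNAs."""
    values = [metric(rels) for rels in ranked_labels_per_query]
    return sum(values) / len(values)

# Two toy queries: 1 marks a disease associated with the query piRNA
toy_rankings = [[1, 0, 1, 0], [0, 1, 0, 0]]
mean_ap = mean_over_queries(average_precision, toy_rankings)
mean_ndcg2 = mean_over_queries(lambda rels: ndcg_at_k(rels, 2), toy_rankings)
```

Averaging per query, as in `mean_over_queries`, gives every query piRNA the same weight regardless of how many diseases it is associated with.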

The effect of parameters for identifying piRNA-disease associations

The iPiDA-LTR predictor mainly contains the following four parameters: the number of trees, the truncation level k, the shrinkage and the number of leaves. Because the number of combinations of the four parameters is large, we fix three parameters in turn and find the locally optimal value of the remaining parameter according to AUPR. The influence of different parameter values on iPiDA-LTR for the datasets of the first and second application scenarios is shown in Figs 3 and 4, respectively, from which we can see that the final optimized combinations of the four parameters (number of trees, truncation level k, shrinkage, number of leaves) are (120, 14, 0.22, 3) and (30, 15, 0.10, 29), respectively.
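The one-parameter-at-a-time search described above can be sketched generically. In this sketch `evaluate` stands in for a cross-validated AUPR computation; the toy objective, its optimum, the candidate grids and the starting values are made up for illustration and are not results from the paper.

```python
def coordinate_search(evaluate, grids, start, order):
    """Tune one parameter at a time while keeping the others fixed."""
    best = dict(start)
    for name in order:
        scored = [(evaluate({**best, name: value}), value)
                  for value in grids[name]]
        # Keep the value with the highest score before moving to the next parameter
        best[name] = max(scored, key=lambda pair: pair[0])[1]
    return best

# Made-up optimum of the toy objective (NOT values from the paper)
TOY_OPTIMUM = {"trees": 50, "k": 8, "shrinkage": 0.3, "leaves": 5}

def toy_score(params):
    """Stand-in for cross-validated AUPR: highest at TOY_OPTIMUM."""
    return -sum(((params[n] - TOY_OPTIMUM[n]) / TOY_OPTIMUM[n]) ** 2
                for n in TOY_OPTIMUM)

grids = {
    "trees": [10, 50, 100],
    "k": [5, 8, 10],
    "shrinkage": [0.1, 0.3],
    "leaves": [5, 10],
}
start = {"trees": 10, "k": 10, "shrinkage": 0.10, "leaves": 10}
tuned = coordinate_search(toy_score, grids, start,
                          order=["trees", "k", "shrinkage", "leaves"])
```

This strategy evaluates the sum of the grid sizes rather than their product, at the cost of finding only a local optimum when the parameters interact.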

Fig 3. The predictive results of iPiDA-LTR predictor on the benchmark dataset of the first application scenario via five-fold cross-validation.

(a) The truncation level k, shrinkage and the number of leaves are assigned as 10, 0.10, and 10 respectively, which are RankLib’s default values (https://sourceforge.net/p/lemur/wiki/RankLib/), and the optimal value of the number of trees is 120; (b) The number of trees, shrinkage and the number of leaves are fixed as 120, 0.10 and 10 respectively, and the truncation level k is optimized as 14; (c) The number of trees, the truncation level k and the number of leaves are fixed as 120, 14 and 10 respectively, and the shrinkage is optimized as 0.22; (d) The number of trees, the truncation level k and shrinkage are 120, 14 and 0.22 respectively, and the number of leaves is set as 3.

https://doi.org/10.1371/journal.pcbi.1010404.g003

Fig 4. The predictive results of iPiDA-LTR predictor on the benchmark dataset of the second application scenario via five-fold cross-validation.

(a) The truncation level k, shrinkage and the number of leaves are assigned as 10, 0.10, and 10 respectively, which are the RankLib’s default values (https://sourceforge.net/p/lemur/wiki/RankLib/), and the optimal value of the number of trees is 30; (b) The number of trees, shrinkage and the number of leaves are fixed as 30, 0.10 and 10 respectively, and the truncation level k is optimized as 15; (c) The number of trees, the truncation level k and the number of leaves are fixed as 30, 15 and 10 respectively, and the shrinkage is optimized as 0.10; (d) The number of trees, the truncation level k and shrinkage are 30, 15 and 0.1 respectively, and the number of leaves is optimized as 29.

https://doi.org/10.1371/journal.pcbi.1010404.g004

Complementary analysis for component methods

In this study, iPiDA-LTR incorporates two types of component methods: machine learning methods (LR, RF and SVM) and collaborative filtering (CF). LR, RF and SVM are implemented with the Python package scikit-learn [73]. For LR, max_iter and solver are set to 300 and liblinear, respectively. For RF, n_estimators, max_leaf_nodes, n_jobs and max_features are set to 80, 10, -1 and 0.2, respectively. For SVM, kernel and probability are set to linear and True, respectively. We analyze the impact of different types of component methods on identifying associations between piRNAs and diseases, and the results are shown in Tables 2 and 3, from which we can see the following: (i) the iPiDA-LTR predictor outperforms the iPiDA-LTR-ML predictor on the datasets of both application scenarios; (ii) iPiDA-LTR clearly outperforms iPiDA-LTR-ML in terms of ranking criteria (NDCG@5 and ROC1), especially for the second application scenario (see Table 3). Machine learning methods based on classification algorithms focus on global predictive performance, whereas collaborative filtering can identify specific piRNA-related diseases by focusing on local predictive performance. Therefore, machine learning methods and collaborative filtering are complementary. It is not surprising that iPiDA-LTR performs better than iPiDA-LTR-ML, because iPiDA-LTR shares the advantages of these two types of methods.

Table 2. The comparison results of predictors based on Learning to Rank integrating different component methods via five-fold cross-validation on the benchmark dataset of the first application scenario.

https://doi.org/10.1371/journal.pcbi.1010404.t002

Table 3. The comparison results of predictors based on Learning to Rank integrating different component methods via five-fold cross-validation on the benchmark dataset of the second application scenario.

https://doi.org/10.1371/journal.pcbi.1010404.t003

The usage frequencies of component methods measure the contribution of component methods for iPiDA-LTR. Fig 5 shows the usage frequencies of component methods on iPiDA-LTR, from which we can see that each component method is frequently used, indicating that they are important for iPiDA-LTR. Tables 2 and 3 and Fig 5 show that component methods are complementary, and iPiDA-LTR combines them leading to better performance for identifying piRNA-disease associations.

Fig 5. The usage frequencies of component methods, calculated by Apache commons-math3 library and RankLib library.

(a) shows the usage frequencies of component methods on the dataset of the first application scenario via five-fold cross-validation; (b) shows the usage frequencies of component methods on the dataset of the second application scenario via five-fold cross-validation.

https://doi.org/10.1371/journal.pcbi.1010404.g005

Comparison with related methods

In this section, two state-of-the-art predictors, iPiDi-PUL [25] and iPiDA-sHN [26], are compared with the iPiDA-LTR predictor. The results are shown in Tables 4 and 5, from which we can see that iPiDA-LTR performs better than the other methods, indicating that iPiDA-LTR is more suitable for identifying piRNA-disease associations. In practical application scenarios, researchers tend to focus on the top-ranked predicted associations. Therefore, we analyze the quality of the top-ranked predicted results (see Fig 6), from which we can see that iPiDA-LTR outperforms the other predictors in terms of ROC1-ROC10. This is not surprising, because the NDCG-based loss function of LambdaMART mainly focuses on ranking known associations at the top of the predicted list (see Eq 9).

Fig 6. The comparison results of different methods.

(a) and (b) are based on the datasets of the first and second application scenarios, respectively.

https://doi.org/10.1371/journal.pcbi.1010404.g006

Table 4. The comparison results between iPiDA-LTR and two state-of-the-art predictors on the dataset of the first application scenario.

https://doi.org/10.1371/journal.pcbi.1010404.t004

Table 5. The comparison results between iPiDA-LTR and two state-of-the-art predictors on the dataset of the second application scenario.

https://doi.org/10.1371/journal.pcbi.1010404.t005

Case study

To illustrate the predictive performance of the iPiDA-LTR predictor for identifying associations between new piRNAs and diseases, two piRNAs, piR-hsa-23210 and piR-hsa-15023, are selected as query piRNAs from the dataset. The remaining piRNAs in the dataset are used to train the iPiDA-LTR model, and the trained model is then employed to predict diseases associated with piR-hsa-23210 and piR-hsa-15023.

The predicted results for piR-hsa-23210 and piR-hsa-15023 are shown in Tables 6 and 7, respectively, from which we can see the following: (i) The top five predicted piR-hsa-23210-associated diseases are supported by evidence in PubMed (https://pubmed.ncbi.nlm.nih.gov/). For example, the target gene of piR-hsa-23210 is SMC5, which plays crucial roles in human spermatogenesis, such as on the synaptonemal complex between synapsed chromosomes and in the development of spermatogonial cells [74]. Roy et al. found that piR-33044 (piR-hsa-23210) is significantly abnormally expressed in Alzheimer’s disease [22]. (ii) Four diseases in Table 7 have been shown to be associated with piR-hsa-15023. For example, Busch et al. found that piR-hsa-15023 is down-regulated in renal cell carcinoma [75], and piR-hsa-15023 was significantly differentially expressed between gastric adenocarcinoma and non-malignant stomach tissue [76]. Therefore, these results demonstrate that the iPiDA-LTR predictor is an effective approach for identifying diseases associated with newly detected query piRNAs.

Table 6. The top five piR-hsa-23210 associated diseases and relevant evidences.

https://doi.org/10.1371/journal.pcbi.1010404.t006

Table 7. The top five piR-hsa-15023 associated diseases and relevant evidences.

https://doi.org/10.1371/journal.pcbi.1010404.t007

Conclusion

In this study, we treat the identification of piRNA-disease associations as a search task based on Learning to Rank [32, 68], where piRNAs and diseases are regarded as queries and documents, respectively. The following conclusions can be drawn: (i) iPiDA-LTR can effectively handle both application scenarios compared with the other state-of-the-art methods, especially the identification of diseases associated with newly detected piRNAs, which is important for studying the pathogenesis of diseases and the functions of piRNAs; (ii) iPiDA-LTR incorporates component methods into Learning to Rank so as to improve the predictive performance; (iii) the corresponding web server of iPiDA-LTR is freely accessible at http://bliulab.net/iPiDA-LTR/. Although iPiDA-LTR effectively predicts piRNA-disease associations, it only integrates basic machine learning methods and collaborative filtering. In future studies, we will integrate other state-of-the-art methods and features to improve the prediction of piRNA-disease associations. The LTR-based framework discussed in this study is a general framework that could have many other applications in bioinformatics, such as protein function prediction and remote homology detection.

References

1. Seto AG, Kingston RE, Lau NC. The coming of age for Piwi proteins. Molecular cell. 2007;26(5):603–9. Epub 2007/06/15. pmid:17560367.
2. Kirino Y, Mourelatos Z. Mouse Piwi-interacting RNAs are 2’-O-methylated at their 3’ termini. Nature structural & molecular biology. 2007;14(4):347–8. Epub 2007/03/27. pmid:17384647.
3. Ohara T, Sakaguchi Y, Suzuki T, Ueda H, Miyauchi K, Suzuki T. The 3’ termini of mouse Piwi-interacting RNAs are 2’-O-methylated. Nature structural & molecular biology. 2007;14(4):349–50. Epub 2007/03/27. pmid:17384646.
4. Aravin A, Gaidatzis D, Pfeffer S, Lagos-Quintana M, Landgraf P, Iovino N, et al. A novel class of small RNAs bind to MILI protein in mouse testes. Nature. 2006;442(7099):203–7. Epub 2006/06/06. pmid:16751777.
5. Lau NC, Seto AG, Kim J, Kuramochi-Miyagawa S, Nakano T, Bartel DP, et al. Characterization of the piRNA complex from rat testes. Science. 2006;313(5785):363–7. Epub 2006/06/17. pmid:16778019.
6. Brennecke J, Aravin AA, Stark A, Dus M, Kellis M, Sachidanandam R, et al. Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell. 2007;128(6):1089–103. Epub 2007/03/10. pmid:17346786.
7. Yu L, Su Y, Liu Y, Zeng X. Review of unsupervised pretraining strategies for molecules representation. Briefings in Functional Genomics. 2021;20(5):323–32. pmid:34342611
8. Zeng X, Tu X, Liu Y, Fu X, Su Y. Toward better drug discovery with knowledge graph. Current Opinion in Structural Biology. 2022;72:114–26. pmid:34649044
9. Vagin VV, Sigova A, Li C, Seitz H, Gvozdev V, Zamore PD. A distinct small RNA pathway silences selfish genetic elements in the germline. Science. 2006;313(5785):320–4. Epub 2006/07/01. pmid:16809489.
10. Teixeira FK, Okuniewska M, Malone CD, Coux R-X, Rio DC, Lehmann R. piRNA-mediated regulation of transposon alternative splicing in the soma and germ line. Nature. 2017;552(7684):268–72. Epub 2017/12/07. pmid:29211718; PubMed Central PMCID: PMC5933846.
11. Lim AK, Tao L, Kai T. piRNAs mediate posttranscriptional retroelement silencing and localization to pi-bodies in the Drosophila germline. Journal of cell biology. 2009;186(3):333–42. Epub 2009/08/05. pmid:19651888; PubMed Central PMCID: PMC2728408.
12. Singh G, Swain AC, Mallick B. Delineating Characteristic Sequence and Structural Features of Precursor and Mature Piwi-interacting RNAs of Epithelial Ovarian Cancer. Current Bioinformatics. 2021;16(4):541–52. WOS:000669437400006.
13. Qiu W, Guo X, Lin X, Yang Q, Zhang W, Zhang Y, et al. Transcriptome-wide piRNA profiling in human brains of Alzheimer’s disease. Neurobiology of aging. 2017;57:170–7. Epub 2017/06/28. pmid:28654860; PubMed Central PMCID: PMC5542056.
14. Cheng J, Guo J-M, Xiao B-X, Miao Y, Jiang Z, Zhou H, et al. piRNA, the new non-coding RNA, is aberrantly expressed in human cancer cells. Clinica Chimica Acta. 2011;412(17–18):1621–5. Epub 2011/05/28. pmid:21616063.
15. Liu Y, Dou M, Song X, Dong Y, Liu S, Liu H, et al. The emerging role of the piRNA/piwi complex in cancer. Molecular cancer. 2019;18(1):123. Epub 2019/08/11. pmid:31399034; PubMed Central PMCID: PMC6688334.
16. Liu Y, Li A, Xie G, Liu G, Hei X. Computational Methods and Online Resources for Identification of piRNA-Related Molecules. Interdisciplinary Sciences-Computational Life Sciences. 2021;13(2):176–91. Epub 2021/04/23. pmid:33886096.
17. Ding X, Li Y, Lü J, Zhao Q, Guo Y, Lu Z, et al. piRNA-823 Is Involved in Cancer Stem Cell Regulation Through Altering DNA Methylation in Association With Luminal Breast Cancer. Frontiers in cell and developmental biology. 2021;9:641052. Epub 2021/04/02. pmid:33791297; PubMed Central PMCID: PMC8005588.
18. Cheng Y, Gong Y, Liu Y, Song B, Zou Q. Molecular design in drug discovery: a comprehensive review of deep generative models. Briefings in Bioinformatics. 2021;22(6):bbab344. pmid:34415297
19. Zeng X, Song X, Ma T, Pan X, Zhou Y, Hou Y, et al. Repurpose open data to discover therapeutics for COVID-19 using deep learning. Journal of proteome research. 2020;19(11):4624–36. pmid:32654489
20. Cabral GF, Pinheiro JADS, Vidal AF, Santos S, Ribeiro-Dos-Santos Â. piRNAs in Gastric Cancer: A New Approach Towards Translational Research. International journal of molecular sciences. 2020;21(6):2126. Epub 2020/03/25. pmid:32204558; PubMed Central PMCID: PMC7139476.
21. Krishnan P, Ghosh S, Graham K, Mackey JR, Kovalchuk O, Damaraju S. Piwi-interacting RNAs and PIWI genes as novel prognostic markers for breast cancer. Oncotarget. 2016;7(25):37944–56. Epub 2016/10/23. pmid:27177224; PubMed Central PMCID: PMC5122362.
22. Roy J, Sarkar A, Parida S, Ghosh Z, Mallick B. Small RNA sequencing revealed dysregulated piRNAs in Alzheimer’s disease and their probable role in pathogenesis. Molecular bioSystems. 2017;13(3):565–76. Epub 2017/01/28. pmid:28127595.
23. Zhang W, Wei H, Liu B. idenMD-NRF: a ranking framework for miRNA-disease association identification. Briefings in Bioinformatics. pmid:35679537
24. Wei H, Xu Y, Liu B. iCircDA-LTR: identification of circRNA-disease associations based on Learning to Rank. Bioinformatics. 2021;37(19):3302–10. Epub 2021/05/09. pmid:33963827.
25. Wei H, Xu Y, Liu B. iPiDi-PUL: identifying Piwi-interacting RNA-disease associations based on Positive Unlabeled Learning. Briefings in Bioinformatics. 2021;22(3):bbaa058. pmid:32393982
26. Wei H, Ding Y, Liu B. iPiDA-sHN: Identification of Piwi-interacting RNA-disease associations by selecting high quality negative samples. Computational Biology and Chemistry. 2020;88:107361. Epub 2020/09/12. pmid:32916452.
27. Zhang P, Si X, Skogerbø G, Wang J, Cui D, Li Y, et al. piRBase: a web resource assisting piRNA functional study. Database. 2014;2014:bau110. Epub 2014/11/27. pmid:25425034; PubMed Central PMCID: PMC4243270.
28. Rosenkranz D. piRNA cluster database: a web resource for piRNA producing loci. Nucleic acids research. 2016;44(D1):D223–30. Epub 2015/11/20. pmid:26582915; PubMed Central PMCID: PMC4702893.
29. Wang J, Zhang P, Lu Y, Li Y, Zheng Y, Kan Y, et al. piRBase: a comprehensive database of piRNA sequences. Nucleic acids research. 2019;47(D1):D175–D80. Epub 2018/10/30. pmid:30371818; PubMed Central PMCID: PMC6323959.
30. Li H. A Short Introduction to Learning to Rank. IEICE Transactions on Information and Systems. 2011;94-D(10):1854–62.
31. Song B, Li F, Liu Y, Zeng X. Deep learning methods for biomedical named entity recognition: a survey and qualitative comparison. Briefings in Bioinformatics. 2021;22(6):bbab282. pmid:34308472
32. Burges CJC. From RankNet to LambdaRank to LambdaMART: An overview. Microsoft Research Technical Report MSR-TR-2010-82. 2010.
33. He S, Guo F, Zou Q, Ding H. MRMD2.0: A Python Tool for Machine Learning with Feature Ranking and Reduction. Current Bioinformatics. 2020;15(10):1213–21. WOS:000617680800012.
34. Wang X, Li C, Golbandi N, Bendersky M, Najork M, editors. The lambdaloss framework for ranking metric optimization. Proceedings of the 27th ACM International Conference on Information and Knowledge Management; 2018.
35. Figueroa A, Neumann G, editors. Learning to Rank Effective Paraphrases from Query Logs for Community Question Answering. Twenty-Seventh AAAI Conference on Artificial Intelligence; 2013.
36. Liu TY, Xu J, Qin T, Xiong W, Li H. LETOR: Benchmark Dataset for Research on Learning to Rank for Information Retrieval. Proceedings of the Workshop on Learning to Rank for Information Retrieval; 2007. p. 137–45.
37. Jin X, Liao Q, Wei H, Zhang J, Liu B. SMI-BLAST: a novel supervised search framework based on PSI-BLAST for protein remote homology detection. Bioinformatics. 2021;37(7):913–20. MEDLINE: pmid:32898222.
38. Liu L, Huang X, Mamitsuka H, Zhu S. HPOLabeler: improving prediction of human protein-phenotype associations by learning to rank. Bioinformatics. 2020;36(14):4180–8. Epub 2020/05/08. pmid:32379868.
39. Liu B, Chen J, Wang X. Application of learning to rank to protein remote homology detection. Bioinformatics. 2015;31(21):3492–8. WOS:000365134400013. pmid:26163693
40. Muhammad A, Waheed R, Khan NA, Jiang H, Song X. piRDisease v1.0: a manually curated database for piRNA associated diseases. Database. 2019;2019:baz052. Epub 2019/07/04. pmid:31267133; PubMed Central PMCID: PMC6606758.
41. Smith TF, Waterman MS. Identification of common molecular subsequences. Journal of molecular biology. 1981;147(1):195–7. Epub 1981/03/25. pmid:7265238.
42. Li H-L, Pang Y-H, Liu B. BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models. Nucleic Acids Res. 2021;49(22):e129. Epub 2021/09/29. pmid:34581805; PubMed Central PMCID: PMC8682797.
43. Liu B, Gao X, Zhang H. BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches. Nucleic Acids Res. 2019;47(20):e127. Epub 2019/09/11. pmid:31504851; PubMed Central PMCID: PMC6847461.
44. Wei H, Liao Q, Liu B. iLncRNAdis-FB: identify lncRNA-disease associations by fusing biological feature blocks through deep neural network. IEEE/ACM transactions on computational biology and bioinformatics. 2021;18(5):1946–57. MEDLINE: pmid:31905146.
45. Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017;35(11):1026–8. Epub 2017/10/17. pmid:29035372.
46. Buchfink B, Reuter K, Drost H-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods. 2021;18(4):366–8. Epub 2021/04/09. pmid:33828273; PubMed Central PMCID: PMC8026399.
47. Schriml LM, Mitraka E, Munro J, Tauber B, Schor M, Nickle L, et al. Human Disease Ontology 2018 update: classification, content and workflow expansion. Nucleic Acids Res. 2019;47(D1):D955–D62. Epub 2018/11/09. pmid:30407550; PubMed Central PMCID: PMC6323977.
48. Chen X, Yan G-Y. Semi-supervised learning for potential human microRNA-disease associations inference. Scientific reports. 2014;4:5501. Epub 2014/07/01. pmid:24975600; PubMed Central PMCID: PMC4074792.
49. Chen X, Wang L, Qu J, Guan N-N, Li J-Q. Predicting miRNA-disease association based on inductive matrix completion. Bioinformatics. 2018;34(24):4256–65. Epub 2018/06/26. pmid:29939227.
50. Wang L, Xuan Z, Zhou S, Kuang L, Pei T. A Novel Model for Predicting LncRNA-disease Associations Based on the LncRNA-MiRNA-disease Interactive Network. Current Bioinformatics. 2019;14(3):269–78. WOS:000460522300011.
51. Luo J, Xiao Q. A novel approach for predicting microRNA-disease associations by unbalanced bi-random walk on heterogeneous network. Journal of Biomedical Informatics. 2017;66:194–203. WOS:000409293100018. pmid:28104458
52. Yan C, Wang J, Ni P, Lan W, Wu F-X, Pan Y. DNRLMF-MDA: Predicting microRNA-Disease Associations Based on Similarities of microRNAs and Diseases. IEEE/ACM transactions on computational biology and bioinformatics. 2019;16(1):233–43. pmid:29990253
53. Zhu Q, Fan Y, Pan X. Fusing Multiple Biological Networks to Effectively Predict miRNA-disease Associations. Current Bioinformatics. 2021;16(3):371–84. WOS:000636235900003.
54. Wang JZ, Du Z, Payattakool R, Yu PS, Chen C-F. A new method to measure the semantic similarity of GO terms. Bioinformatics. 2007;23(10):1274–81. WOS:000247348300013. pmid:17344234
55. Liu Y, Zeng X, He Z, Zou Q. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM transactions on computational biology and bioinformatics. 2016;14(4):905–15. pmid:27076459
56. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.
57. Wu C, Lin B, Shi K, Zhang Q, Gao R, Yu Z, et al. PEPRF: Identification of Essential Proteins by Integrating Topological Features of PPI Network and Sequence-Based Features via Random Forest. Current Bioinformatics. 2021;16(9):1161–8. WOS:000711656200006.
58. Ao C, Zou Q, Yu L. NmRF: identification of multispecies RNA 2’-O-methylation modification sites from RNA sequences. Briefings in bioinformatics. 2021;23(1):bbab480. MEDLINE: pmid:34850821.
59. Zeng X, Zhong Y, Lin W, Zou Q. Predicting disease-associated circular RNAs using deep forests combined with positive-unlabeled learning methods. Briefings in bioinformatics. 2020;21(4):1425–36. pmid:31612203
60. Zeng X, Zhu S, Hou Y, Zhang P, Li L, Li J, et al. Network-based prediction of drug–target interactions using an arbitrary-order proximity embedded deep forest. Bioinformatics. 2020;36(9):2805–12. pmid:31971579
61. Landwehr N, Hall M, Frank E. Logistic Model Trees. Machine Learning. 2005;59(1):161–205.
62. Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intelligent Systems and their applications. 1998;13(4):18–28.
63. Muflikhah L, Widodo N, Mahmudy WF, Solimun. Detection of Hepatitis B Virus-associated Hepatocellular Carcinoma Disease Using Hybrid Hierarchical k-Means Clustering and SVM Algorithm. Current Bioinformatics. 2021;16(7):1004–12. WOS:000726375800013.
64. Basith S, Hasan MM, Lee G, Wei L, Manavalan B. Integrative machine learning framework for the identification of cell-specific enhancers from the human genome. Brief Bioinform. 2021;22(6). Epub 2021/07/07. pmid:34226917.
65. Yue W, Wang Z, Zhang J, Liu X. An Overview of Recommendation Techniques and Their Applications in Healthcare. IEEE/CAA Journal of Automatica Sinica. 2021;8(4):701–17. WOS:000628913100001.
66. Bayrak T, Ogul H. A New Approach for Predicting the Value of Gene Expression: Two-way Collaborative Filtering. Current Bioinformatics. 2019;14(6):480–90. WOS:000475702400002.
67. Yuan Q, Gao J, Wu D, Zhang S, Mamitsuka H, Zhu S. DrugE-Rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank. Bioinformatics. 2016;32(12):i18–i27. Epub 2016/06/17. pmid:27307615; PubMed Central PMCID: PMC4908328.
68. Li H. A Short Introduction to Learning to Rank. IEICE Transactions on Information and Systems. 2011;94(10):1854–62.
69. Zhu L, Duan G, Yan C, Wang J. Prediction of Microbe-drug Associations Based on Chemical Structures and the KATZ Measure. Current Bioinformatics. 2021;16(6):807–19. WOS:000684207300006.
70. Dao F-Y, Lv H, Zhang D, Zhang Z-M, Liu L, Lin H. DeepYY1: a deep learning approach to identify YY1-mediated chromatin loops. Briefings in bioinformatics. 2021;22(4):bbaa356. pmid:33279983.
71. Zhang D, Chen H-D, Zulfiqar H, Yuan S-S, Huang Q-L, Zhang Z-Y, et al. iBLP: An XGBoost-Based Predictor for Identifying Bioluminescent Proteins. Computational and Mathematical Methods in Medicine. 2021;2021:6664362. pmid:33505515
72. Li J, Liu L, Cui Q, Zhou Y. Comparisons of MicroRNA Set Enrichment Analysis Tools on Cancer De-regulated miRNAs from TCGA Expression Datasets. Current Bioinformatics. 2020;15(10):1104–12. WOS:000617680800002.
73. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 2011;12:2825–30.
74. Verver DE, Langedijk NSM, Jordan PW, Repping S, Hamer G. The SMC5/6 complex is involved in crucial processes during human spermatogenesis. Biology of reproduction. 2014;91(1):22. Epub 2014/05/24. pmid:24855106; PubMed Central PMCID: PMC6058740.
75. Busch J, Ralla B, Jung M, Wotschofsky Z, Trujillo-Arribas E, Schwabe P, et al. Piwi-interacting RNAs as novel prognostic markers in clear cell renal cell carcinomas. Journal of Experimental & Clinical Cancer Research. 2015;34(1):61. Epub 2015/06/14. pmid:26071182; PubMed Central PMCID: PMC4467205.
76. Martinez VD, Enfield KSS, Rowbotham DA, Lam WL. An atlas of gastric PIWI-interacting RNA transcriptomes and their utility for identifying signatures of gastric cancer recurrence. Gastric Cancer. 2016;19(2):660–5. Epub 2015/03/18. pmid:25779424; PubMed Central PMCID: PMC4573768.

[ref14] 14. Cheng J, Guo J-M, Xiao B-X, Miao Y, Jiang Z, Zhou H, et al. piRNA, the new non-coding RNA, is aberrantly expressed in human cancer cells. Clinica Chimica Acta. 2011;412(17–18):1621–5. Epub 2011/05/28. pmid:21616063.
View Article
PubMed/NCBI
Google Scholar

[53] View Article

[54] PubMed/NCBI

[55] Google Scholar

[ref15] 15. Liu Y, Dou M, Song X, Dong Y, Liu S, Liu H, et al. The emerging role of the piRNA/piwi complex in cancer. Molecular cancer. 2019;18(1):123. Epub 2019/08/11. pmid:31399034; PubMed Central PMCID: PMC6688334.
View Article
PubMed/NCBI
Google Scholar

[57] View Article

[58] PubMed/NCBI

[59] Google Scholar

[ref16] 16. Liu Y, Li A, Xie G, Liu G, Hei X. Computational Methods and Online Resources for Identification of piRNA-Related Molecules. Interdisciplinary Sciences-Computational Life Sciences. 2021;13(2):176–91. Epub 2021/04/23. pmid:33886096.
View Article
PubMed/NCBI
Google Scholar

[61] View Article

[62] PubMed/NCBI

[63] Google Scholar

[ref17] 17. Ding X, Li Y, Lü J, Zhao Q, Guo Y, Lu Z, et al. piRNA-823 Is Involved in Cancer Stem Cell Regulation Through Altering DNA Methylation in Association With Luminal Breast Cancer. Frontiers in cell and developmental biology. 2021;9:641052. Epub 2021/04/02. pmid:33791297; PubMed Central PMCID: PMC8005588.
View Article
PubMed/NCBI
Google Scholar

[65] View Article

[66] PubMed/NCBI

[67] Google Scholar

[ref18] 18. Cheng Y, Gong Y, Liu Y, Song B, Zou Q. Molecular design in drug discovery: a comprehensive review of deep generative models. Briefings in Bioinformatics. 2021;22(6):bbab344. pmid:34415297
View Article
PubMed/NCBI
Google Scholar

[69] View Article

[70] PubMed/NCBI

[71] Google Scholar

[ref19] 19. Zeng X, Song X, Ma T, Pan X, Zhou Y, Hou Y, et al. Repurpose open data to discover therapeutics for COVID-19 using deep learning. Journal of proteome research. 2020;19(11):4624–36. pmid:32654489
View Article
PubMed/NCBI
Google Scholar

[73] View Article

[74] PubMed/NCBI

[75] Google Scholar

[ref20] 20. Cabral GF, Pinheiro JADS, Vidal AF, Santos S, Ribeiro-Dos-Santos Â. piRNAs in Gastric Cancer: A New Approach Towards Translational Research. International journal of molecular sciences. 2020;21(6):2126. Epub 2020/03/25. pmid:32204558; PubMed Central PMCID: PMC7139476.
View Article
PubMed/NCBI
Google Scholar

[77] View Article

[78] PubMed/NCBI

[79] Google Scholar

[ref21] 21. Krishnan P, Ghosh S, Graham K, Mackey JR, Kovalchuk O, Damaraju S. Piwi-interacting RNAs and PIWI genes as novel prognostic markers for breast cancer. Oncotarget. 2016;7(25):37944–56. Epub 2016/10/23. pmid:27177224; PubMed Central PMCID: PMC5122362.
View Article
PubMed/NCBI
Google Scholar

[81] View Article

[82] PubMed/NCBI

[83] Google Scholar

[ref22] 22. Roy J, Sarkar A, Parida S, Ghosh Z, Mallick B. Small RNA sequencing revealed dysregulated piRNAs in Alzheimer’s disease and their probable role in pathogenesis. Molecular bioSystems. 2017;13(3):565–76. Epub 2017/01/28. pmid:28127595.
View Article
PubMed/NCBI
Google Scholar

[85] View Article

[86] PubMed/NCBI

[87] Google Scholar

[ref23] 23. Zhang W, Wei H, Liu B. idenMD-NRF: a ranking framework for miRNA-disease association identification. Briefings in Bioinformatics. pmid:35679537
View Article
PubMed/NCBI
Google Scholar

[89] View Article

[90] PubMed/NCBI

[91] Google Scholar

[ref24] 24. Wei H, Xu Y, Liu B. iCircDA-LTR: identification of circRNA-disease associations based on Learning to Rank. Bioinformatics. 2021;37(19):3302–10. Epub 2021/05/09. pmid:33963827.
View Article
PubMed/NCBI
Google Scholar

[93] View Article

[94] PubMed/NCBI

[95] Google Scholar

[ref25] 25. Wei H, Xu Y, Liu B. iPiDi-PUL: identifying Piwi-interacting RNA-disease associations based on Positive Unlabeled Learning. Briefings in Bioinformatics. 2021;22(3):bbaa058. pmid:32393982
View Article
PubMed/NCBI
Google Scholar

[97] View Article

[98] PubMed/NCBI

[99] Google Scholar

[ref26] 26. Wei H, Ding Y, Liu B. iPiDA-sHN: Identification of Piwi-interacting RNA-disease associations by selecting high quality negative samples. Computational Biology and Chemistry. 2020;88:107361. Epub 2020/09/12. pmid:32916452.
View Article
PubMed/NCBI
Google Scholar

[101] View Article

[102] PubMed/NCBI

[103] Google Scholar

[ref27] 27. Zhang P, Si X, Skogerbø G, Wang J, Cui D, Li Y, et al. piRBase: a web resource assisting piRNA functional study. Database. 2014;2014:bau110. Epub 2014/11/27. pmid:25425034; PubMed Central PMCID: PMC4243270.
View Article
PubMed/NCBI
Google Scholar

[105] View Article

[106] PubMed/NCBI

[107] Google Scholar

[ref28] 28. Rosenkranz D. piRNA cluster database: a web resource for piRNA producing loci. Nucleic acids research. 2016;44(D1):D223–30. Epub 2015/11/20. pmid:26582915; PubMed Central PMCID: PMC4702893.
View Article
PubMed/NCBI
Google Scholar

[109] View Article

[110] PubMed/NCBI

[111] Google Scholar

[ref29] 29. Wang J, Zhang P, Lu Y, Li Y, Zheng Y, Kan Y, et al. piRBase: a comprehensive database of piRNA sequences. Nucleic acids research. 2019;47(D1):D175–D80. Epub 2018/10/30. pmid:30371818; PubMed Central PMCID: PMC6323959.
View Article
PubMed/NCBI
Google Scholar

[113] View Article

[114] PubMed/NCBI

[115] Google Scholar

[ref30] 30. Hang LJIToI Systems. A Short Introduction to Learning to Rank. 2011;94-D(10):1854–62.
View Article
Google Scholar

[117] View Article

[118] Google Scholar

[ref31] 31. Song B, Li F, Liu Y, Zeng X. Deep learning methods for biomedical named entity recognition: a survey and qualitative comparison. Briefings in Bioinformatics. 2021;22(6):bbab282. pmid:34308472
View Article
PubMed/NCBI
Google Scholar

[120] View Article

[121] PubMed/NCBI

[122] Google Scholar

[ref32] 32. Burges CJC. From ranknet to lambdarank to lambdamart: An overview. Learning. 2010;11(23–581):81.
View Article
Google Scholar

[124] View Article

[125] Google Scholar

[ref33] 33. He S, Guo F, Zou Q, Ding H. MRMD2.0: A Python Tool for Machine Learning with Feature Ranking and Reduction. Current Bioinformatics. 2020;15(10):1213–21. WOS:000617680800012.
View Article
Google Scholar

[127] View Article

[128] Google Scholar

[ref34] 34. Wang X, Li C, Golbandi N, Bendersky M, Najork M, editors. The lambdaloss framework for ranking metric optimization. Proceedings of the 27th ACM International Conference on Information and Knowledge Management; 2018.
View Article
Google Scholar

[130] View Article

[131] Google Scholar

[ref35] 35. Figueroa A, Neumann G, editors. Learning to Rank Effective Paraphrases from Query Logs for Community Question Answering. Twenty-seventh Aaai Conference on Artificial Intelligence; 2013.
View Article
Google Scholar

[133] View Article

[134] Google Scholar

[ref36] 36. Liu TY, Xu J, Qin T, Xiong W, Li H. LETOR: Benchmark Dataset for Research on Learning to Rank for Information Retrieval. Proceedings of the Workshop on Learning to Rank for Information Retrieval2007. p. 137–45.
View Article
Google Scholar

[136] View Article

[137] Google Scholar

[ref37] 37. Jin X, Liao Q, Wei H, Zhang J, Liu B. SMI-BLAST: a novel supervised search framework based on PSI-BLAST for protein remote homology detection. Bioinformatics. 2021;37(7):913–20. MEDLINE: pmid:32898222.
View Article
PubMed/NCBI
Google Scholar

[139] View Article

[140] PubMed/NCBI

[141] Google Scholar

[ref38] 38. Liu L, Huang X, Mamitsuka H, Zhu S. HPOLabeler: improving prediction of human protein-phenotype associations by learning to rank. Bioinformatics. 2020;36(14):4180–8. Epub 2020/05/08. pmid:32379868.
View Article
PubMed/NCBI
Google Scholar

[143] View Article

[144] PubMed/NCBI

[145] Google Scholar

[ref39] 39. Liu B, Chen J, Wang X. Application of learning to rank to protein remote homology detection. Bioinformatics. 2015;31(21):3492–8. WOS:000365134400013. pmid:26163693
View Article
PubMed/NCBI
Google Scholar

[147] View Article

[148] PubMed/NCBI

[149] Google Scholar

[ref40] 40. Muhammad A, Waheed R, Khan NA, Jiang H, Song X. piRDisease v1.0: a manually curated database for piRNA associated diseases. Database. 2019;2019:baz052. Epub 2019/07/04. pmid:31267133; PubMed Central PMCID: PMC6606758.
View Article
PubMed/NCBI
Google Scholar

[151] View Article

[152] PubMed/NCBI

[153] Google Scholar

[ref41] 41. Smith TF, Waterman MS. Identification of common molecular subsequences. Journal of molecular biology. 1981;147(1):195–7. Epub 1981/03/25. pmid:7265238.
View Article
PubMed/NCBI
Google Scholar

[155] View Article

[156] PubMed/NCBI

[157] Google Scholar

[ref42] 42. Li H-L, Pang Y-H, Liu B. BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models. Nucleic Acids Res. 2021;49(22):e129. Epub 2021/09/29. pmid:34581805; PubMed Central PMCID: PMC8682797.
View Article
PubMed/NCBI
Google Scholar

[159] View Article

[160] PubMed/NCBI

[161] Google Scholar

[ref43] 43. Liu B, Gao X, Zhang H. BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches. Nucleic Acids Res. 2019;47(20):e127. Epub 2019/09/11. pmid:31504851; PubMed Central PMCID: PMC6847461.
View Article
PubMed/NCBI
Google Scholar

[163] View Article

[164] PubMed/NCBI

[165] Google Scholar

[ref44] 44. Wei H, Liao Q, Liu B. iLncRNAdis-FB: identify lncRNA-disease associations by fusing biological feature blocks through deep neural network. IEEE/ACM transactions on computational biology and bioinformatics. 2021;18(5):1946–57. MEDLINE: pmid:31905146.
View Article
PubMed/NCBI
Google Scholar

[167] View Article

[168] PubMed/NCBI

[169] Google Scholar

[ref45] 45. Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017;35(11):1026–8. Epub 2017/10/17. pmid:29035372.
View Article
PubMed/NCBI
Google Scholar

[171] View Article

[172] PubMed/NCBI

[173] Google Scholar

[ref46] 46. Buchfink B, Reuter K, Drost H-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods. 2021;18(4):366–8. Epub 2021/04/09. pmid:33828273; PubMed Central PMCID: PMC8026399.
View Article
PubMed/NCBI
Google Scholar

[175] View Article

[176] PubMed/NCBI

[177] Google Scholar

[ref47] 47. Schriml LM, Mitraka E, Munro J, Tauber B, Schor M, Nickle L, et al. Human Disease Ontology 2018 update: classification, content and workflow expansion. Nucleic Acids Res. 2019;47(D1):D955–D62. Epub 2018/11/09. pmid:30407550; PubMed Central PMCID: PMC6323977.
View Article
PubMed/NCBI
Google Scholar

[179] View Article

[180] PubMed/NCBI

[181] Google Scholar

[ref48] 48. Chen X, Yan G-Y. Semi-supervised learning for potential human microRNA-disease associations inference. Scientific reports. 2014;4:5501. Epub 2014/07/01. pmid:24975600; PubMed Central PMCID: PMC4074792.
View Article
PubMed/NCBI
Google Scholar

[183] View Article

[184] PubMed/NCBI

[185] Google Scholar

[ref49] 49. Chen X, Wang L, Qu J, Guan N-N, Li J-Q. Predicting miRNA-disease association based on inductive matrix completion. Bioinformatics. 2018;34(24):4256–65. Epub 2018/06/26. pmid:29939227.
View Article
PubMed/NCBI
Google Scholar

[187] View Article

[188] PubMed/NCBI

[189] Google Scholar

[ref50] 50. Wang L, Xuan Z, Zhou S, Kuang L, Pei T. A Novel Model for Predicting LncRNA-disease Associations Based on the LncRNA-MiRNA-disease Interactive Network. Current Bioinformatics. 2019;14(3):269–78. WOS:000460522300011.
View Article
Google Scholar

[191] View Article

[192] Google Scholar

[ref51] 51. Luo J, Xiao Q. A novel approach for predicting microRNA-disease associations by unbalanced bi-random walk on heterogeneous network. Journal of Biomedical Informatics. 2017;66:194–203. WOS:000409293100018. pmid:28104458
View Article
PubMed/NCBI
Google Scholar

[194] View Article

[195] PubMed/NCBI

[196] Google Scholar

[ref52] 52. Yan C, Wang J, Ni P, Lan W, Wu F-X, Pan Y. DNRLMF-MDA:Predicting microRNA-Disease Associations Based on Similarities of microRNAs and Diseases. IEEE/ACM transactions on computational biology and bioinformatics. 2019;16(1):233–43. pmid:29990253
View Article
PubMed/NCBI
Google Scholar

[198] View Article

[199] PubMed/NCBI

[200] Google Scholar

[ref53] 53. Zhu Q, Fan Y, Pan X. Fusing Multiple Biological Networks to Effectively Predict miRNA-disease Associations. Current Bioinformatics. 2021;16(3):371–84. WOS:000636235900003.
View Article
Google Scholar

[202] View Article

[203] Google Scholar

[ref54] 54. Wang JZ, Du Z, Payattakool R, Yu PS, Chen C-F. A new method to measure the semantic similarity of GO terms. Bioinformatics. 2007;23(10):1274–81. WOS:000247348300013. pmid:17344234
View Article
PubMed/NCBI
Google Scholar

[205] View Article

[206] PubMed/NCBI

[207] Google Scholar

[ref55] 55. Liu Y, Zeng X, He Z, Zou Q. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM transactions on computational biology and bioinformatics. 2016;14(4):905–15. pmid:27076459
View Article
PubMed/NCBI
Google Scholar

[209] View Article

[210] PubMed/NCBI

[211] Google Scholar

[ref56] 56. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.
View Article
Google Scholar

[213] View Article

[214] Google Scholar

[ref57] 57. Wu C, Lin B, Shi K, Zhang Q, Gao R, Yu Z, et al. PEPRF: Identification of Essential Proteins by Integrating Topological Features of PPI Network and Sequence-Based Features via Random Forest. Current Bioinformatics. 2021;16(9):1161–8. WOS:000711656200006.
View Article
Google Scholar

[216] View Article

[217] Google Scholar

[ref58] 58. Ao C, Zou Q, Yu L. NmRF: identification of multispecies RNA 2’-O-methylation modification sites from RNA sequences. Briefings in bioinformatics. 2021;23(1):bbab480. MEDLINE: pmid:34850821.
View Article
PubMed/NCBI
Google Scholar

[219] View Article

[220] PubMed/NCBI

[221] Google Scholar

[ref59] 59. Zeng X, Zhong Y, Lin W, Zou Q. Predicting disease-associated circular RNAs using deep forests combined with positive-unlabeled learning methods. Briefings in bioinformatics. 2020;21(4):1425–36. pmid:31612203
View Article
PubMed/NCBI
Google Scholar

[223] View Article

[224] PubMed/NCBI

[225] Google Scholar

[ref60] 60. Zeng X, Zhu S, Hou Y, Zhang P, Li L, Li J, et al. Network-based prediction of drug–target interactions using an arbitrary-order proximity embedded deep forest. Bioinformatics. 2020;36(9):2805–12. pmid:31971579

[ref61] 61. Landwehr N, Hall M, Frank E. Logistic Model Trees. Machine Learning. 2005;59(1):161–205.

[ref62] 62. Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intelligent Systems and their Applications. 1998;13(4):18–28.

[ref63] 63. Muflikhah L, Widodo N, Mahmudy WF, Solimun. Detection of Hepatitis B Virus-associated Hepatocellular Carcinoma Disease Using Hybrid Hierarchical k-Means Clustering and SVM Algorithm. Current Bioinformatics. 2021;16(7):1004–12. WOS:000726375800013.

[ref64] 64. Basith S, Hasan MM, Lee G, Wei L, Manavalan B. Integrative machine learning framework for the identification of cell-specific enhancers from the human genome. Briefings in Bioinformatics. 2021;22(6). Epub 2021/07/07. pmid:34226917.

[ref65] 65. Yue W, Wang Z, Zhang J, Liu X. An Overview of Recommendation Techniques and Their Applications in Healthcare. IEEE/CAA Journal of Automatica Sinica. 2021;8(4):701–17. WOS:000628913100001.

[ref66] 66. Bayrak T, Ogul H. A New Approach for Predicting the Value of Gene Expression: Two-way Collaborative Filtering. Current Bioinformatics. 2019;14(6):480–90. WOS:000475702400002.

[ref67] 67. Yuan Q, Gao J, Wu D, Zhang S, Mamitsuka H, Zhu S. DrugE-Rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank. Bioinformatics. 2016;32(12):i18–i27. Epub 2016/06/17. pmid:27307615; PubMed Central PMCID: PMC4908328.

[ref68] 68. Li H. A Short Introduction to Learning to Rank. IEICE Transactions on Information and Systems. 2011;94(10):1854–62.

[ref69] 69. Zhu L, Duan G, Yan C, Wang J. Prediction of Microbe-drug Associations Based on Chemical Structures and the KATZ Measure. Current Bioinformatics. 2021;16(6):807–19. WOS:000684207300006.

[ref70] 70. Dao F-Y, Lv H, Zhang D, Zhang Z-M, Liu L, Lin H. DeepYY1: a deep learning approach to identify YY1-mediated chromatin loops. Briefings in Bioinformatics. 2021;22(4):bbaa356. pmid:33279983.

[ref71] 71. Zhang D, Chen H-D, Zulfiqar H, Yuan S-S, Huang Q-L, Zhang Z-Y, et al. iBLP: An XGBoost-Based Predictor for Identifying Bioluminescent Proteins. Computational and Mathematical Methods in Medicine. 2021;2021:6664362. pmid:33505515

[ref72] 72. Li J, Liu L, Cui Q, Zhou Y. Comparisons of MicroRNA Set Enrichment Analysis Tools on Cancer De-regulated miRNAs from TCGA Expression Datasets. Current Bioinformatics. 2020;15(10):1104–12. WOS:000617680800002.

[ref73] 73. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 2011;12:2825–30.

[ref74] 74. Verver DE, Langedijk NSM, Jordan PW, Repping S, Hamer G. The SMC5/6 complex is involved in crucial processes during human spermatogenesis. Biology of Reproduction. 2014;91(1):22. Epub 2014/05/24. pmid:24855106; PubMed Central PMCID: PMC6058740.

[ref75] 75. Busch J, Ralla B, Jung M, Wotschofsky Z, Trujillo-Arribas E, Schwabe P, et al. Piwi-interacting RNAs as novel prognostic markers in clear cell renal cell carcinomas. Journal of Experimental & Clinical Cancer Research. 2015;34(1):61. Epub 2015/06/14. pmid:26071182; PubMed Central PMCID: PMC4467205.

[ref76] 76. Martinez VD, Enfield KSS, Rowbotham DA, Lam WL. An atlas of gastric PIWI-interacting RNA transcriptomes and their utility for identifying signatures of gastric cancer recurrence. Gastric Cancer. 2016;19(2):660–5. Epub 2015/03/18. pmid:25779424; PubMed Central PMCID: PMC4573768.
