Identification of Multi-Functional Enzyme with Multi-Label Classifier

Yuxin Che; Ying Ju; Ping Xuan; Ren Long; Fei Xing

doi:10.1371/journal.pone.0153503

Abstract

Enzymes are important and effective biological catalyst proteins participating in almost all active cell processes. Identification of multi-functional enzymes is essential in understanding the function of enzymes. Machine learning methods perform better in protein structure and function prediction than traditional biological wet experiments. Thus, in this study, we explore an efficient and effective machine learning method to categorize enzymes according to their function. Multi-functional enzymes are predicted with a special machine learning strategy, namely, a multi-label classifier. Sequence features are extracted from a position-specific scoring matrix with autocross-covariance transformation. Experimental results show that the proposed method obtains an accuracy of 94.1% in classifying the six main functional classes under five-fold cross-validation and outperforms state-of-the-art methods. In addition, 91.25% accuracy is achieved in multi-functional enzyme prediction, which is often ignored in other enzyme function prediction studies. The online prediction server and datasets can be accessed at http://server.malab.cn/MEC/.

Citation: Che Y, Ju Y, Xuan P, Long R, Xing F (2016) Identification of Multi-Functional Enzyme with Multi-Label Classifier. PLoS ONE 11(4): e0153503. https://doi.org/10.1371/journal.pone.0153503

Editor: Bin Liu, Harbin Institute of Technology Shenzhen Graduate School, CHINA

Received: February 22, 2016; Accepted: March 30, 2016; Published: April 14, 2016

Copyright: © 2016 Che et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Data are available for download from Figshare (https://figshare.com/s/7c8fe7f9bb4176015bfa).

Funding: Natural Science Foundation of China (No. 61302139), http://www.nsfc.gov.cn/.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Enzymes play a crucial role in the catalysis of biological and chemical reactions. As effective catalysts, they are not consumed by the reactions they accelerate, and more than 400 types of reactions can be accelerated in this way. The Enzyme Commission (EC) number, a numerical classification scheme based on the chemical reactions that enzymes catalyze, is used to characterize different enzymes[1]. Enzymes are divided into six main classes, namely, oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases, and each class is further subdivided into three hierarchical levels. Most studies on enzyme classification have focused on monofunctional enzyme prediction. However, identification of multifunctional enzymes, which are enzymes that can catalyze two or more chemical reactions, has received little attention.

Various approaches have been utilized to achieve high accuracy in monofunctional enzyme prediction. Bioinformatics approaches have attained considerable achievements by using information on protein sequence and structure[2]. Huang et al.[3] proposed an adaptive fuzzy k-nearest neighbor method with the Am-Pse-AAC feature extraction method, which was first developed by Kuo-Chen Chou for enzyme subfamily class prediction, and attained an accuracy of 92.1% for the six main families. EzyPred[4] is a three-layer predictor based on PSSM; it exploits the evolutionary information contained in the profiles. Its second layer, which is responsible for predicting the main functional class, achieves 93.7% accuracy. EFICAz[5] has a high accuracy of 92% in predicting four EC digit levels in a jackknife test on test sequences that are <40% identical to any sequence in the training dataset.

With regard to multifunctional enzyme prediction, De Ferrari et al.[6] and Zou et al.[7] achieved good results. De Ferrari et al. presented EnzML, a multi-label classification method that employs InterPro signatures. This method can efficiently provide an explanation for proteins with multiple enzymatic functions and achieves over 98% subset accuracy without utilizing any feature extraction algorithms. Zou et al. proposed two feature algorithms to make predictions and obtained 99.54% and 98.73% accuracy by using 20-D and 188-D features, respectively; however, dataset redundancy was not mentioned in the paper.

The enzyme sequences in the present study were obtained from the Swiss-Prot database (release 2014.9), an authoritative resource that provides high-quality annotated protein sequences. After redundancy removal with Cluster Database at High Identity with Tolerance (CD-HIT)[8], the pairwise sequence similarity is kept below 65% to ensure the effectiveness of the experiments. The autocross-covariance (ACC) transformation is then applied[9, 10] for feature extraction. This method was first proposed by Dong et al. as a taxonomy-based protein fold recognition approach and has not yet been utilized in enzyme classification. An accuracy of 94.1% in monofunctional enzyme classification is obtained by using the K-nearest neighbor classifier. With regard to multifunctional enzymes, average precisions of 95.54% and 91.25% are obtained under five-fold cross-validation on all enzymes and on multifunctional enzymes only, respectively.

Method

Data preprocessing

The original downloaded dataset consists of 214,375 sequences. However, each enzyme class contains duplicate sequences; 207,430 sequences remained after duplicate elimination. To eliminate the negative effect of sequence similarity, CD-HIT, a fast and widely used clustering program for reducing sequence redundancy and improving the performance of other sequence analyses, was applied to perform redundancy removal in the experiments. A total of 59,763 sequences with similarity below 65% were obtained. The CD-HIT algorithm proceeds as follows. First, the sequences are sorted in descending order of length. Second, the first cluster is formed from the longest sequence, and each subsequent sequence is compared with the representative sequences of the existing clusters. If the similarity is above the preset threshold, the sequence is added to that cluster; otherwise, a new cluster is formed. Third, the longest sequence is extracted from each cluster to form the final dataset. In the experiments, the threshold is set to 0.65, and the word length used for comparison is 5. Table 1 shows the distribution before and after redundancy removal.
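The three steps above can be sketched as a greedy incremental clustering. This is only an illustration: the `similarity` function is a toy stand-in built on Python's `difflib`, not the identity measure or short-word filter that CD-HIT actually uses, and the function names and toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Toy stand-in for sequence identity (not CD-HIT's own measure)."""
    return SequenceMatcher(None, a, b).ratio()


def greedy_cluster(sequences, threshold=0.65):
    """Return one representative per cluster, following the three steps."""
    # Step 1: sort by length, longest first.
    ordered = sorted(sequences, key=len, reverse=True)
    clusters = []  # each cluster is a list; its first item is the representative
    for seq in ordered:
        # Step 2: compare with the representative of each existing cluster.
        for cluster in clusters:
            if similarity(seq, cluster[0]) > threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])  # no match: seq founds a new cluster
    # Step 3: the representative is the longest member of its cluster.
    return [cluster[0] for cluster in clusters]


toy = ["MKTAYIAKQR", "MKTAYIAKQ", "GGGGGLLLLL", "MKTAYIAKQRQ"]
representatives = greedy_cluster(toy, threshold=0.65)
print(representatives)
```

Because sequences are visited longest first, the first member of every cluster is automatically its longest sequence, so step 3 needs no extra search.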


Table 1. Distribution of six enzyme classes before and after CD-HIT(0.65).

https://doi.org/10.1371/journal.pone.0153503.t001

Notably, the multifunctional enzymes in the six classes have not been removed yet. Table 2 shows the distribution of multifunctional enzymes in the six classes.


Table 2. Distribution of multifunctional enzymes before and after CD-HIT(0.65).

https://doi.org/10.1371/journal.pone.0153503.t002

Feature extraction algorithm

Position-specific scoring matrix.

For convenience of discussion, we denote a protein sequence as S, which is expressed as

$$S = s_1 s_2 \cdots s_L \tag{1}$$

where L represents the length of S and s_i (1 ≤ i ≤ L) represents one item of the amino acid alphabet {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y}[11]. For sequence S, the position-specific scoring matrix (PSSM) was generated with the PSI-BLAST program[12]. The PSSM is an L × 20 matrix[13] and can be expressed as follows:

$$\mathrm{PSSM} = \begin{bmatrix} p_{1,1} & p_{1,2} & \cdots & p_{1,20} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,20} \\ \vdots & \vdots & \ddots & \vdots \\ p_{L,1} & p_{L,2} & \cdots & p_{L,20} \end{bmatrix} \tag{2}$$

where each row represents the corresponding position of S (e.g., the 1st row refers to s₁, the 2nd row refers to s₂, and so forth). Each column represents the corresponding residue type of the amino acid alphabet (e.g., the 1st column refers to “A,” the 2nd column refers to “C,” and so forth). p_i,j (1 ≤ i ≤ L, j = 1,2,…, 20) is a score that represents the odds of s_i being mutated to residue type j during evolutionary processes; for example, p_1,1 represents the odds of s₁ being mutated to residue type “A”. A high score for p_i,j usually indicates that the mutation occurs frequently and that the corresponding residue in that position may be functional.

ACC feature representation algorithm.

The framework consists of two feature models denoted as AC and CC. By using the PSSM of Eq (2), the enzyme sequence is formulated into a 20-D feature vector. The 20-D feature vector is calculated as

$$\bar{p}_j = \frac{1}{L}\sum_{i=1}^{L} p_{i,j}, \quad j = 1, 2, \ldots, 20 \tag{3}$$

where $\bar{p}_j$ represents the average score of the amino acids in the enzyme sequence for residue type j, which indicates the general odds of the sequence being mutated to residue type j during the evolutionary process.

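In the model of AC, the enzyme sequence is computed as

$$F_{AC}(j, \lambda) = \frac{1}{L-\lambda}\sum_{i=1}^{L-\lambda} \left(p_{i,j} - \bar{p}_j\right)\left(p_{i+\lambda,j} - \bar{p}_j\right) \tag{4}$$

where $\bar{p}_j$ is the average score of column j of the PSSM.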

As shown in Eq (4), F_AC measures the average correlation between two amino acids separated by a distance of λ in the enzyme sequence. The dimension of the feature vector F_AC is λ * 20.

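In the model of CC, the enzyme sequence is computed as

$$F_{CC}(j_1, j_2, \lambda) = \frac{1}{L-\lambda}\sum_{i=1}^{L-\lambda} \left(p_{i,j_1} - \bar{p}_{j_1}\right)\left(p_{i+\lambda,j_2} - \bar{p}_{j_2}\right), \quad j_1 \neq j_2 \tag{5}$$

where $\bar{p}_{j_1}$ and $\bar{p}_{j_2}$ are the average scores of columns j₁ and j₂ of the PSSM.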

As shown in Eq (5), F_CC measures the average correlation between two amino acids separated by a distance of λ in the enzyme sequence among 20 types of standard amino acids. The dimension of the feature vector F_CC is λ * 380.

Combining F_AC and F_CC generates a (400 * λ)−D feature vector to represent the enzyme sequence, as represented by

$$F_{ACC} = \left[F_{AC},\; F_{CC}\right] \tag{6}$$

The ACC feature representation algorithm fully exploits the influence of the position correlation among sequence amino acids on protein homology detection. Secondary structure features[14, 15] were considered in other protein classification works. However, computing them is too time consuming for a web server.
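As a minimal sketch of the AC and CC transformations, the following NumPy function turns an L × 20 PSSM into a feature vector. It assumes that features are computed for every lag from 1 up to λ, which is the reading consistent with the stated dimensions of 20λ and 380λ; the function name, the `max_lag` parameter, and the random stand-in PSSM are hypothetical.

```python
import numpy as np


def acc_features(pssm: np.ndarray, max_lag: int = 1) -> np.ndarray:
    """Concatenate AC (20 per lag) and CC (380 per lag) features of a PSSM."""
    length, n_types = pssm.shape
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must be at least 1 and below the sequence length")
    centered = pssm - pssm.mean(axis=0)  # subtract the column averages
    off_diagonal = ~np.eye(n_types, dtype=bool)
    ac, cc = [], []
    for lag in range(1, max_lag + 1):
        # cov[j1, j2] = mean over i of centered[i, j1] * centered[i + lag, j2]
        cov = centered[:-lag].T @ centered[lag:] / (length - lag)
        ac.extend(np.diag(cov))        # same residue type (AC)
        cc.extend(cov[off_diagonal])   # different residue types (CC)
    return np.concatenate([ac, cc])


# Random integers stand in for a real PSI-BLAST profile (illustration only).
rng = np.random.default_rng(0)
pssm = rng.integers(-5, 10, size=(50, 20)).astype(float)
features = acc_features(pssm, max_lag=2)
print(features.shape)
```

With `max_lag=1` the vector has 400 entries, matching the setting λ = 1; each additional lag adds another 400.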

Classifier selection and tools

KNN algorithm.

The K-nearest neighbors (KNN) algorithm is a mature method and is one of the simplest machine learning algorithms in theory. It is widely used for classification and regression. The key idea in this algorithm is that an object can be assigned to a class if the majority of its k nearest neighbors belong to this class. If k equals 1, then the object is simply assigned to the class of that single nearest neighbor.

For instance, in Fig 1, the objective is to classify the test sample (star) into either the first class (triangles) or the second class (squares). If k equals three, we assign it to the second class according to the dashed-line circle because two squares and only one triangle lie inside the circle. If k equals five, we assign it to the first class according to the solid-line circle because three triangles and only two squares lie inside the circle.


Fig 1. KNN algorithm diagram.

https://doi.org/10.1371/journal.pone.0153503.g001

The choice of parameter k in this algorithm is important and depends mostly on the data. Generally, a large value of k dilutes the effect of noise on the classification but renders the boundaries between the categories less distinct. In our experiments, large k values did not perform well.
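The effect of k described for Fig 1 can be reproduced with a few lines of Python. The coordinates below are hypothetical and were chosen only to mimic the figure: two squares are closest to the star, followed by three triangles.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical layout: (coordinates, class label).
train = [
    ((0.5, 0.0), "square"), ((0.0, 0.6), "square"),
    ((0.7, 0.7), "triangle"),
    ((1.5, 0.0), "triangle"), ((0.0, 1.6), "triangle"),
    ((3.0, 3.0), "square"),
]
star = (0.0, 0.0)
label_k3 = knn_predict(train, star, k=3)
label_k5 = knn_predict(train, star, k=5)
print(label_k3, label_k5)
```

The same query point changes class when k grows from 3 to 5, which is why k has to be tuned on the data at hand.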

KNN has been extensively utilized for classification tasks in bioinformatics, and many recent studies have demonstrated its effectiveness. In our experiments, we tested a number of underlying classification algorithms and found that KNN is about 20% more accurate than the others.

WEKA and MULAN.

Two of the main tools we utilized are the Waikato Environment for Knowledge Analysis (WEKA) and Multi-Label Learning (MULAN). WEKA is an integrated Java package with numerous machine learning algorithms and a graphical user interface. Several standard data mining tasks, including data preprocessing, feature selection, clustering, classification, regression, and visualization, are supported. MULAN is a Java library for learning from multi-label data. WEKA and MULAN contain an evaluation framework that calculates a rich variety of performance measures. They provide a convenient means to compare performance on different data using different classifiers.

Measurement

Single-label measurement.

Given a multi-label test dataset S = {(x_i,y_i)|1≤ i ≤ n}, for class y_j where 1≤ j ≤ m, the binary classification performance of a predictor is represented by the four variables below.

TP_j indicates the number of true positive instances, FP_j indicates the number of false positive instances, TN_j indicates the number of true negative instances, and FN_j indicates the number of false negative instances. h(x_i) indicates the classification results of sample x_i predicted by classifier h.

We obtained four evaluation performance indicators according to these four variables as shown below[1, 16–22].

(7)

(8)

(9)

(10)

Multi-label measurement.

We employed two evaluation indicators[23], namely, example-based and label-based metrics. For example-based metrics, we calculated the classification results for each sample first and then obtained the average value for the entire dataset.

We considered multi-label classifier h and multi-label dataset S = {(x_i,Y_i)|1≤ i ≤ n}, where Y_i is the label collection of sample x_i. For example, Y_i = {0,1,1,0,1,0} denotes that sample x_i belongs simultaneously to the classes at positions 1, 2, and 4 of the label vector, counting positions from 0.

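$$\mathrm{AvgPrec}(h) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{|Y_i|}\sum_{y \in Y_i}\frac{\left|\{y' \in Y_i : \mathrm{rank}_h(x_i, y') \le \mathrm{rank}_h(x_i, y)\}\right|}{\mathrm{rank}_h(x_i, y)} \tag{11}$$

Here y ranges over the labels relevant to x_i, and rank_h(x_i, y) is the position of label y when the class labels are sorted by the score that h assigns to them for x_i, in descending order. This index measures, for each relevant label of a sample, the fraction of relevant labels ranked at or above it in the sorted class label sequence. The higher the average precision, the better the performance; the best value is 1.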
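As an illustration of how this example-based index behaves, the sketch below computes average precision as it is commonly defined in the multi-label literature. The function name, the score lists, and the label sets are hypothetical, and every sample is assumed to have at least one relevant label.

```python
def average_precision(scores, label_sets):
    """Example-based average precision.

    scores[i][c] is the score of class c for sample i;
    label_sets[i] is the non-empty set of true class indices of sample i.
    """
    total = 0.0
    for sample_scores, relevant in zip(scores, label_sets):
        # rank[c] is 1 for the highest-scoring class, 2 for the next, and so on.
        order = sorted(range(len(sample_scores)), key=lambda c: -sample_scores[c])
        rank = {c: position + 1 for position, c in enumerate(order)}
        per_label = [
            sum(1 for other in relevant if rank[other] <= rank[c]) / rank[c]
            for c in relevant
        ]
        total += sum(per_label) / len(relevant)
    return total / len(scores)


scores = [
    [0.9, 0.1, 0.8, 0.2, 0.0, 0.3],  # true classes 0 and 2 hold ranks 1 and 2
    [0.2, 0.9, 0.1, 0.8, 0.0, 0.3],  # true classes 3 and 0 hold ranks 2 and 4
]
label_sets = [{0, 2}, {3, 0}]
value = average_precision(scores, label_sets)
print(value)  # 0.75
```

The first sample contributes 1.0 because its relevant labels occupy the top ranks; the second contributes 0.5 because irrelevant labels are ranked above both relevant ones.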

For label-based metrics, we calculated the binary classification results for each class first and then obtained the average value for all classes.

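Based on the single-label measurement, we suppose that B(TP_j, FP_j, TN_j, FN_j) represents a binary classification indicator computed for class j (1 ≤ j ≤ m). The following are defined.

$$B_{macro} = \frac{1}{m}\sum_{j=1}^{m} B\left(TP_j, FP_j, TN_j, FN_j\right) \tag{12}$$

$$B_{micro} = B\left(\sum_{j=1}^{m} TP_j,\; \sum_{j=1}^{m} FP_j,\; \sum_{j=1}^{m} TN_j,\; \sum_{j=1}^{m} FN_j\right) \tag{13}$$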

B_macro measures the classification capability in each class and obtains the average of all classes as the final result. Its main idea is that each class shares the same weight. However, B_micro endows each sample the same weight. It calculates the sum of values in all classes and then utilizes the value to obtain classification capability as the final result. Such is the difference between these two indicators.
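The difference between the two averages can be seen in a small numerical sketch. Precision is used here as one possible choice of the indicator B, which is an assumption made for illustration; the per-class counts are hypothetical.

```python
def precision(tp, fp, tn, fn):
    """One possible binary indicator B; 0.0 if nothing is predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def macro_average(indicator, counts):
    """Apply the indicator per class, then average: classes weigh equally."""
    return sum(indicator(*c) for c in counts) / len(counts)


def micro_average(indicator, counts):
    """Pool the counts of all classes, then apply the indicator once."""
    pooled = [sum(column) for column in zip(*counts)]
    return indicator(*pooled)


# Hypothetical (TP, FP, TN, FN) counts for a frequent class and a rare class.
counts = [(90, 10, 890, 10), (1, 4, 990, 5)]
macro = macro_average(precision, counts)
micro = micro_average(precision, counts)
print(macro, micro)
```

The rare class has a precision of only 0.2, which pulls the macro average down to 0.55, whereas the micro average stays near 0.87 because the frequent class dominates the pooled counts.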

Multi-label classification ensemble algorithm.

Suppose that m classifiers solve an n-class classification problem. We define score matrix scoreVectors, and scoreVectors(i,j) indicates the possibility of the sample being classified into class j by classifier i, where 0≤scoreVectors(i,j)≤1, 1≤i≤m, 1≤j≤n.

Similarly, we define binary matrix bipartitionVectors, and bipartitionVectors(i,j) represents whether the sample is classified into class j by classifier i, where bipartitionVectors(i,j)∈{0,1}, 1≤i≤m, 1≤j≤n.

Below are three ensemble methods. (14) (15) (16) where Sort(scoreVectors(i,j)) represents the scores sorted in descending order.

Result and Discussion

Monofunctional enzyme classification

First, we evaluated the importance of distance parameter λ in the ACC feature representation algorithm; 94.1% accuracy is attained for the dataset with similarity below 65% when λ is set to 1. As λ increases, the improvement is not evident (only a 0.1% increase), but the time consumption is multiplied. This result suggests that the correlation between adjacent amino acids is strong. Second, we compared the performance of the ACC method with different classifiers. IB1, the KNN implementation in WEKA with the number of neighbors k set to 1, yielded the best results. The comparison results are shown in Fig 2.


Fig 2. Results of ACC method on different classifiers.

https://doi.org/10.1371/journal.pone.0153503.g002

We also compared ACC with other popular protein prediction methods, such as 188D[24] (which considers the constitution, physicochemical properties[25], and distribution of amino acids), liu_feature (820D)[26] (which combines evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection), n-gram (20D)[27] proposed by Brown et al. (which builds the feature vectors by probability calculation), and Pse-AAC (420D) originally proposed by Chou[28, 29] (which has been comprehensively applied to diverse biological sequence analyses as an effective protein descriptor[30–38] and DNA descriptor[39–42]). As shown in Fig 3, the advantage of the ACC algorithm is obvious.


Fig 3. Results of five feature representation methods on IB1 classifier.

https://doi.org/10.1371/journal.pone.0153503.g003

Aside from these five feature representation methods, we also tested two other enzyme-oriented online platforms. The first one is EzyPred, which is freely available to the public at http://www.csbio.sjtu.edu.cn/bioinf/EzyPred/EzyPred. We randomly extracted 10 enzyme sequences from each class within one multifunctional enzyme as the test dataset and obtained 80% accuracy, which is lower than the 93.7% accuracy reported in the original paper. The second platform is EFICAz2.5[11, 43]. We obtained 86.4% accuracy with the code obtained from http://cssb.biology.gatech.edu/skolnick/webservice/EFICAz2/index.html. This accuracy is lower than the 92% accuracy reported in the original paper.

Multifunctional enzyme classification

We applied the ACC method to multifunctional enzyme classification on the basis of the results of monofunctional enzyme prediction. Given that KNN works well in monofunctional enzyme classification, we focused on the KNN-based multi-label classifiers provided by MULAN (IBLR_ML[44]/MLkNN[45]/BRkNN[46]). Two other classifiers (RakEL[47]/HOMER) were also tested. Table 3 shows that the classifier IBLR_ML obtained the best average precision of 95.54%. Classifiers MLkNN and BRkNN also produced good results.


Table 3. Cross-validation results of Multi-Label classifiers.

https://doi.org/10.1371/journal.pone.0153503.t003

To test the classification performance of the multifunctional enzyme further, we performed cross validation on the multifunctional enzyme only. To ensure data reliability and experimental accuracy, the threshold of data redundancy was set to 0.9. Then, we obtained the dataset in Table 4. Table 5 shows that 89.4% average precision was obtained.


Table 4. Distribution of multifunctional enzyme after de-redundance (0.9).

https://doi.org/10.1371/journal.pone.0153503.t004


Table 5. Cross-validation results of Multi-Label classification on multifunctional enzymes only.

https://doi.org/10.1371/journal.pone.0153503.t005

To improve the results further, the five classifiers shown in Table 5 were combined into one ensemble. Precision increased to 91.25% with the TOP3 combination rule.

In statistical prediction, the independent dataset test, the subsampling (K-fold cross-validation) test, and the jackknife test are the three cross-validation methods often used to check the accuracy of a predictor[48]. Among the three, the jackknife test is deemed the least arbitrary because it always yields a unique result for a given benchmark dataset[49]. Accordingly, the jackknife test has been increasingly used and widely recognized by investigators to examine the quality of various predictors[31, 32, 34, 39, 40, 50–54]. However, to save computational time, 5-fold cross-validation was used in this study.
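The computational saving can be illustrated by counting the training rounds each scheme requires. The helper below is a hypothetical sketch that splits sample indices into k contiguous folds; in practice the samples would be shuffled first, and the jackknife test corresponds to setting k to the number of samples.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k near-equal contiguous folds."""
    indices = list(range(n_samples))
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


five_fold = list(k_fold_indices(12, k=5))    # 5 training rounds
jackknife = list(k_fold_indices(12, k=12))   # 12 rounds, one sample held out each
print(len(five_fold), len(jackknife))
```

For a dataset of n sequences the jackknife test needs n training rounds, whereas 5-fold cross-validation always needs five, at the cost of a result that depends on how the folds are drawn.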

Conclusion

We have explored a new method of multifunctional enzyme prediction. Considering the position relation and homology among amino acids[55], we extracted sequence features by using the ACC method and performed prediction by using the KNN algorithm. The cross-validation test results indicate that our method outperforms other existing algorithms on datasets with similarity below 65%. An accuracy of 94.1% in monofunctional enzyme classification and an average precision of 95.54% in multifunctional enzyme classification were achieved. Compared with other existing prediction methods in the field of multifunctional enzyme class prediction, our method demonstrates better versatility and effectiveness. A public prediction and recognition platform is provided at http://server.malab.cn/MEC/. Our work is expected to be helpful for enzyme prediction in the future.

Our work focused only on the features and the multi-label classifier. Other machine learning techniques, such as feature selection[56], training sample selection[57, 58], ensemble learning[59–61], network features[62–64], and imbalanced classification[65, 66], ought to be considered in the next step. It is worth noting that there are many other potential tools for enzyme prediction, such as evolutionary computation[67, 68] and spiking neural models[69–76]. Furthermore, parallel techniques, such as MapReduce[77, 78], should also be considered for large test datasets in the future.

Author Contributions

Conceived and designed the experiments: YXC FX. Performed the experiments: YXC PX. Analyzed the data: YXC YJ. Contributed reagents/materials/analysis tools: YJ PX RL. Wrote the paper: YXC YJ PX FX RL.

References

1. Cheng X-Y, Huang W-J, Hu S-C, Zhang H-L, Wang H, Zhang J-X, et al., A global characterization and identification of multifunctional enzymes. PLoS One, 2012. 7(6): p. e38979. pmid:22723914
2. Liu B, Liu F, Wang X, Chen J, Fang L and Chou K-C, Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Research, 2015. 43(W1): p. W65–W71. pmid:25958395
3. Huang WL, Chen HM, Hwang SF and Ho SY, Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method. Biosystems, 2007. 90(2): p. 405–13. pmid:17140725
4. Shen HB and Chou KC, EzyPred: a top-down approach for predicting enzyme functional classes and subclasses. Biochem Biophys Res Commun, 2007. 364(1): p. 53–9. pmid:17931599
5. Tian W, Arakaki AK and Skolnick J, EFICAz: a comprehensive approach for accurate genome-scale enzyme function inference. Nucleic Acids Res, 2004. 32(21): p. 6226–39. pmid:15576349
6. De Ferrari L, Aitken S, van Hemert J and Goryanin I, EnzML: multi-label prediction of enzyme classes using InterPro signatures. BMC Bioinformatics, 2012. 13(1).
7. Zou Q, Chen W, Huang Y, Liu X and Jiang Y, Identifying Multi-Functional Enzyme by Hierarchical Multi-Label Classifier. Journal of Computational and Theoretical Nanoscience, 2013. 10(4): p. 1038–1043.
8. Fu L, Niu B, Zhu Z, Wu S and Li W, CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics, 2012. 28(23): p. 3150–2. pmid:23060610
9. Dong Q, Zhou S and Guan J, A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation. Bioinformatics, 2009. 25(20): p. 2655–62. pmid:19706744
10. Liu B, Liu F, Fang L, Wang X and Chou K-C, repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects. Bioinformatics, 2015. 31(8): p. 1307–1309. pmid:25504848
11. Arakaki AK, Huang Y and Skolnick J, EFICAz2: enzyme function inference by a combined approach enhanced by machine learning. BMC Bioinformatics, 2009. 10: p. 107. pmid:19361344
12. Altschul Stephen F., Madden Thomas L., Schäffer Alejandro A., Zhang Jinghui, Zhang Zheng, Miller Webb, et al., Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 1997. 25(17): p. 3389–3402. pmid:9254694
13. Liu B, Wang S and Wang X, DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation. Scientific Reports, 2015. 5: p. 15479. pmid:26482832
14. Wei L, Liao M, Gao X and Zou Q, Enhanced Protein Fold Prediction Method through a Novel Feature Extraction Technique. IEEE Transactions on Nanobioscience, 2015. 14(6): p. 649–659. pmid:26335556
15. Wei L, Liao M, Gao X and Zou Q, An Improved Protein Structural Prediction Method by Incorporating Both Sequence and Structure Information. IEEE Transactions on Nanobioscience, 2015. 14(4): p. 339–349.
16. Liu B, Fang L, Long R, Lan X and Chou K-C, iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition. Bioinformatics, 2016. 32(3): p. 362–369.
17. Chen J, Wang X and Liu B, iMiRNA-SSF: Improving the Identification of MicroRNA Precursors by Combining Negative Sets with Different Distributions. Scientific Reports, 2016. 6: p. 19062. pmid:26753561
18. Liu B, Fang L, Liu F, Wang X and Chou K-C, iMiRNA-PseDPC: microRNA precursor identification with a pseudo distance-pair composition approach. Journal of Biomolecular Structure and Dynamics, 2016. 34(1): p. 220–232.
19. Cai S, Yang S, Zheng F, Lu M, Wu Y and Krishnan S, Knee joint vibration signal analysis with matching pursuit decomposition and dynamic weighted classifier fusion. Computational and Mathematical Methods in Medicine, 2013. 2013: p. 904267. pmid:23573175
20. Chen W, Feng PM, Deng EZ, Lin H and Chou KC, iTIS-PseTNC: a sequence-based predictor for identifying translation initiation site in human genes using pseudo trinucleotide composition. Anal Biochem, 2014. 462: p. 76–83. pmid:25016190
21. Chen W, Feng PM, Lin H and Chou KC, iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition. Nucleic Acids Res, 2013. 41(6): p. e68. pmid:23303794
22. Chen W, Feng PM, Lin H and Chou KC, iSS-PseDNC: identifying splicing sites using pseudo dinucleotide composition. Biomed Res Int, 2014. 2014: p. 623149. pmid:24967386
23. Zhang Min-Ling and Zhou Z-H, A Review on Multi-Label Learning Algorithms. IEEE Transactions on Knowledge & Data Engineering, 2014. 26(8): p. 1.
24. Lin C, Zou Y, Qin J, Liu X, Jiang Y, Ke C, et al., Hierarchical classification of protein folds using a novel ensemble classifier. PLoS One, 2013. 8(2): p. e56499. pmid:23437146
25. Liu B, Wang X, Chen Q, Dong Q and Lan X, Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection. PLoS ONE, 2012. 7(9): p. e46633. pmid:23029559
26. Liu B, Zhang D, Xu R, Xu J, Wang X, Chen Q, et al., Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection. Bioinformatics, 2013. 30(4): p. 472–479. pmid:24318998
27. Brown Peter F., Della Pietra Vincent J., deSouza Peter V., Lai Jenifer C. and Mercer Robert L., Class-based n-gram models of natural language. Computational Linguistics, 1992. 18(4): p. 467–479.
28. Chou KC, Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics, 2005. 21(1): p. 10–9. pmid:15308540
29. Liu B, Xu J, Fan S, Xu R, Zhou J and Wang X, PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou's PseAAC and Physicochemical Distance Transformation. Molecular Informatics, 2015. 34(1): p. 8–17.
30. Liu B, Xu J, Lan X, Xu R, Zhou J, Wang X, et al., iDNA-Prot|dis: Identifying DNA-Binding Proteins by Incorporating Amino Acid Distance-Pairs and Reduced Alphabet Profile into the General Pseudo Amino Acid Composition. PLoS ONE, 2014. 9(9): p. e106691. pmid:25184541
31. Ding H, Liu L, Guo FB, Huang JA and Lin H, Identify Golgi Protein Types with Modified Mahalanobis Discriminant Algorithm and Pseudo Amino Acid Composition. Protein And Peptide Letters, 2011. 18(1): p. 58–63. pmid:20955168
32. Ding H, Luo LF and Lin H, Prediction of Cell Wall Lytic Enzymes Using Chou's Amphiphilic Pseudo Amino Acid Composition. Protein And Peptide Letters, 2009. 16(4): p. 351–355. pmid:19356130
33. Lin H, Ding H, Guo FB, Zhang AY and Huang J, Predicting subcellular localization of mycobacterial proteins by using Chou's pseudo amino acid composition. Protein And Peptide Letters, 2008. 15(7): p. 739–744. pmid:18782071
34. Zhu PP, Li WC, Zhong ZJ, Deng EZ, Ding H, Chen W, et al., Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition. Molecular Biosystems, 2015. 11(2): p. 558–563. pmid:25437899
35. Ding H, Deng EZ, Yuan LF, Liu L, Lin H, Chen W, et al., iCTX-Type: A Sequence-Based Predictor for Identifying the Types of Conotoxins in Targeting Ion Channels. Biomed Research International, 2014.
36. Feng P, Jiang N and Liu N, Prediction of DNase I hypersensitive sites by using pseudo nucleotide compositions. ScientificWorldJournal, 2014. 2014: p. 740506. pmid:25215331
37. Feng P, Lin H, Chen W and Zuo Y, Predicting the types of J-proteins using clustered amino acids. Biomed Res Int, 2014. 2014: p. 935719. pmid:24804260
38. Feng PM, Chen W, Lin H and Chou KC, iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. Anal Biochem, 2013. 442(1): p. 118–25. pmid:23756733
39. Guo SH, Deng EZ, Xu LQ, Ding H, Lin H, Chen W, et al., iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition. Bioinformatics, 2014. 30(11): p. 1522–1529. pmid:24504871
40. Lin H, Deng EZ, Ding H, Chen W and Chou KC, iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition. Nucleic Acids Research, 2014. 42(21): p. 12961–12972. pmid:25361964
41. Chen W, Feng P, Ding H, Lin H and Chou KC, iRNA-Methyl: Identifying N(6)-methyladenosine sites using pseudo nucleotide composition. Anal Biochem, 2015. 490: p. 26–33. pmid:26314792
42. Feng P, Chen W and Lin H, Prediction of CpG island methylation status by integrating DNA physicochemical properties. Genomics, 2014. 104(4): p. 229–33. pmid:25172426
43. Kumar N and Skolnick J, EFICAz2.5: application of a high-precision enzyme function predictor to 396 proteomes. Bioinformatics, 2012. 28(20): p. 2687–8. pmid:22923291
44. Cheng Wei-Wei and Hullermeier E, Combining instance-based learning and logistic regression for multilabel classification. Machine Learning, 2009. 76(2–3): p. 211–225.
45. Zhang Min-ling and Zhou Z, ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 2007. 40: p. 2038–2048.
46. Spyromitros E, Tsoumakas G and Vlahavas I, An empirical study of lazy multilabel classification algorithms. Proc. 5th Hellenic Conference on Artificial Intelligence, 2008.
47. Tsoumakas G and Vlahavas I, Random k-Labelsets: An Ensemble Method for Multilabel Classification. Machine Learning: ECML 2007, Lecture Notes in Computer Science, 2007. 4701: p. 406–417.
48. Chou KC and Zhang CT, Prediction of Protein Structural Classes. Critical Reviews in Biochemistry and Molecular Biology, 1995. 30(4): p. 275–349. pmid:7587280
49. Chou KC, Some remarks on protein attribute prediction and pseudo amino acid composition. Journal of Theoretical Biology, 2011. 273(1): p. 236–247. pmid:21168420
50. Yuan LF, Ding C, Guo SH, Ding H, Chen W and Lin H, Prediction of the types of ion channel-targeted conotoxins based on radial basis function network. Toxicology in Vitro, 2013. 27(2): p. 852–856. pmid:23280100
51. Lin H, The modified Mahalanobis Discriminant for predicting outer membrane proteins by using Chou's pseudo amino acid composition. Journal of Theoretical Biology, 2008. 252(2): p. 350–356. pmid:18355838
52. Lin H, Ding C, Song Q, Yang P, Ding H, Deng KJ, et al., The prediction of protein structural class using averaged chemical shifts. Journal of Biomolecular Structure & Dynamics, 2012. 29(6): p. 643–649.
53. Feng P, Chen W and Lin H, Identifying Antioxidant Proteins by Using Optimal Dipeptide Compositions. Interdiscip Sci, 2015.
54. Tang H, Chen W and Lin H, Identification of immunoglobulins using Chou's pseudo amino acid composition with feature selection technique. Molecular BioSystems, 2016.
55. Liu B, Chen J and Wang X, Application of Learning to Rank to protein remote homology detection. Bioinformatics, 2015. 31(21): p. 3492–3498. pmid:26163693
56. Zou Q, Zeng J, Cao L and Ji R, A Novel Features Ranking Metric with Application to Scalable Visual and Bioinformatics Data Classification. Neurocomputing, 2016. 173: p. 346–354.
57. Wei L, Liao M, Gao Y, Ji R, He Z and Zou Q, Improved and Promising Identification of Human MicroRNAs by Incorporating a High-quality Negative Set. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2014. 11(1): p. 192–201. pmid:26355518
58. Zeng X, Yuan S, Huang X and Zou Q, Identification of cytokine via an improved genetic algorithm. Frontiers of Computer Science, 2015. 9(4): p. 643–651.
59. Wang C, Hu L, Guo M, Liu X and Zou Q, imDC: an ensemble learning method for imbalanced classification with miRNA data. Genetics and Molecular Research, 2015. 14(1): p. 123–133. pmid:25729943
60. Zou Q, Wang Z, Guan X, Liu B, Wu Y and Lin Z, An approach for identifying cytokines based on a novel ensemble classifier. BioMed Research International, 2013. 2013: p. 686090.
61. Lin C, Chen W, Qiu C, Wu Y, Krishnan S and Zou Q, LibD3C: Ensemble Classifiers with a Clustering and Dynamic Selection Strategy. Neurocomputing, 2014. 123: p. 424–435.
62. Zou Q, Li J, Song L, Zeng X and Wang G, Similarity computation strategies in the microRNA-disease network: A Survey. Briefings in Functional Genomics, 2016. 15(1): p. 55–64. pmid:26134276
63. Zeng X, Zhang X and Zou Q, Integrative approaches for predicting microRNA function and prioritizing disease-related microRNA using biological interaction networks. Briefings in Bioinformatics, 2016. 17(2): p. 193–203. pmid:26059461
64. Zeng X, Liao Y, Liu Y and Zou Q, Prediction and validation of disease genes using HeteSim Scores. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2016.
65. Zou Q, Xie S, Lin Z, Wu M and Ju Y, Finding the best classification threshold in imbalanced classification. Big Data Research, 2016.
66. Song L, Li D, Zeng X, Wu Y, Guo L and Zou Q, nDNA-prot: Identification of DNA-binding Proteins Based on Unbalanced Classification. BMC Bioinformatics, 2014. 15: p. 298. pmid:25196432
67. Zhang Xingyi, T Y, Cheng Ran, Jin Yaochu, An efficient approach to non-dominated sorting for evolutionary multi-objective optimization. IEEE Transactions on Evolutionary Computation, 2015. 19(2): p. 201–213.
68. Zhang Xingyi, T Y, Jin Yaochu, A knee point driven evolutionary algorithm for many-objective optimization. IEEE Transactions on Evolutionary Computation, 2015. 19(6): p. 761–776.
69. Zeng Xiangxiang, P L, Pérez-Jiménez Mario J., Small Universal Simple Spiking Neural P System with Weights. SCIENCE CHINA: Information Science, 2014. 57(9): p. 92–102.
70. Zeng Xiangxiang, Z X, Song Tao, Pan Linqiang, Spiking Neural P Systems with Thresholds. Neural Computation, 2014. 26(7): p. 1340–1361. pmid:24708366
71. Zhang Xingyi, P L, Pǎun Andrei, On universality of axon P systems. IEEE Transactions on Neural Networks and Learning Systems, 2015. 26(11): p. 2816–2829.
72. Zhang Xingyi, Z X, Luo Bin, Pan Linqiang, On some classes of sequential spiking neural P systems. Neural Computation, 2014. 26(5): p. 974–997. pmid:24555456
73. Zhang Xingyi, L Y, Luo Bin, Pan Linqiang, Computational power of tissue P systems for generating control languages. Information Sciences, 2014. 278(10): p. 285–297.
74. Chen Xu, P-J MJ, Valencia-Cabrera Luis, Wang Beizhan, Zeng Xiangxiang, Computing with viruses. Theoretical Computer Science, 2015.
75. Song Tao, P L, On the Universality and Non-universality of Spiking Neural P Systems with Rules on Synapses. IEEE Trans on Nanobioscience, 2015.
76. Song Tao, X J, Pan Linqiang, Spiking Neural P Systems with Request Rules. Neurocomputing, 2016.
77. Zou Q, Li X, Jiang W, Lin Z, Li G and Chen K, Survey of MapReduce Frame Operation in Bioinformatics. Briefings in Bioinformatics, 2014. 15(4): p. 637–647. pmid:23396756
78. Zou Q, Hu Q, Guo M and Wang G, HAlign: Fast Multiple Similar DNA/RNA Sequence Alignment Based on the Centre Star Strategy. Bioinformatics, 2015. 31(15): p. 2475–2481. pmid:25812743

