Conceived and designed the experiments: YDC LLH TH. Performed the experiments: LLH CC. Analyzed the data: CC TH YDC KC. Contributed reagents/materials/analysis tools: LLH YDC. Wrote the paper: LLH YDC TH CC KC.
The authors have declared that no competing interests exist.
Given a compound, how can we effectively predict its biological function? This is a fundamentally important problem because the information thus obtained may benefit the understanding of many basic biological processes and provide useful clues for drug design. In this study, a novel method based on chemical-chemical interaction information was developed to identify which of the following eleven metabolic pathway classes a query compound may be involved in: (1) Carbohydrate Metabolism, (2) Energy Metabolism, (3) Lipid Metabolism, (4) Nucleotide Metabolism, (5) Amino Acid Metabolism, (6) Metabolism of Other Amino Acids, (7) Glycan Biosynthesis and Metabolism, (8) Metabolism of Cofactors and Vitamins, (9) Metabolism of Terpenoids and Polyketides, (10) Biosynthesis of Other Secondary Metabolites, (11) Xenobiotics Biodegradation and Metabolism. In a 5-fold cross-validation test on a benchmark dataset of 3,137 compounds, the method achieved an overall success rate of 77.97%, which is much higher than the 10.45% obtained by random guessing. Moreover, because some compounds may be involved in more than one metabolic pathway class, the method provides, for each query compound, a series of potential metabolic pathway classes ranked in descending order of likelihood. The method was also applied to 5,549 compounds whose metabolic pathway classes are unknown, and the results thus obtained are quite consistent with deductions from the reports of other investigators.
It is anticipated that, with the continuous increase of the chemical-chemical interaction data, the current method will be further enhanced in its power and accuracy, so as to become a useful complementary vehicle in annotating uncharacterized compounds for their biological functions.
Metabolism refers to a collection of chemical reactions in vivo, which keep an unceasing supply of matter and energy for living organisms to maintain life (e.g., growth and reproduction)
Besides the conventional biochemical experiments, computational methods are alternative ways to annotate the biological functions of compounds. In recent years, various bioinformatics and structural bioinformatics
Recently, the systems biology methods based on protein-protein interactions have been widely applied for predicting protein attributes
In this study, we propose a multi-target model based on chemical-chemical interactions for predicting the metabolic pathways in which compounds participate. Our method ranks the possible metabolic pathways associated with the query chemical, providing a more comprehensive view of the biological effects of the compound.
According to a recent comprehensive review
The compounds were retrieved from the publicly available database KEGG
Class code | Metabolic pathway class | Number of compounds in Group-I | Number of compounds in Group-II |
Number of different compounds | | 4,366 | 3,137 |
1 | Carbohydrate Metabolism | 444 | 394 |
2 | Energy Metabolism | 129 | 120 |
3 | Lipid Metabolism | 610 | 383 |
4 | Nucleotide Metabolism | 145 | 132 |
5 | Amino Acid Metabolism | 563 | 483 |
6 | Metabolism of Other Amino Acids | 212 | 154 |
7 | Glycan Biosynthesis and Metabolism | 68 | 43 |
8 | Metabolism of Cofactors and Vitamins | 396 | 309 |
9 | Metabolism of Terpenoids and Polyketides | 713 | 499 |
10 | Biosynthesis of Other Secondary Metabolites | 722 | 519 |
11 | Xenobiotics Biodegradation and Metabolism | 858 | 570 |
Overall | | 4,860 | 3,606 |
The 4,366 compounds in Group-I were screened from KEGG by selecting the compounds with metabolic pathway information. The 3,137 compounds in Group-II are those among the 4,366 compounds that interact with at least one other compound as annotated in the STITCH database. Note that since a compound may occur in more than one pathway class, the sum of the compounds over the 11 pathway classes is greater than the number of different compounds for both Group-I and Group-II.
Of the 4,366 compounds of Group-I, 4,027 are involved in only one metabolic pathway class, 246 in two metabolic pathway classes, 54 in three metabolic pathway classes, 24 in four metabolic pathway classes, 9 in five metabolic pathway classes, 4 in six metabolic pathway classes, 2 in seven metabolic pathway classes, and none in eight or more metabolic pathway classes. Of the 3,137 compounds of Group-II, 2,820 are involved in only one metabolic pathway class, 226 in two metabolic pathway classes, 53 in three metabolic pathway classes, 23 in four metabolic pathway classes, 9 in five metabolic pathway classes, 4 in six metabolic pathway classes, 2 in seven metabolic pathway classes, and none in eight or more metabolic pathway classes.
Note that since one compound may occur in more than one pathway class, the sum of the compounds over the 11 pathway classes in Group-I turns out to be 4,860, which is greater than 4,366. Likewise, the sum of the compounds over the 11 pathway classes in Group-II is 3,606, which is greater than 3,137. This is quite similar to the case of proteins with multiple location sites, as elaborated in
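The relation between the multiplicity counts and the per-class totals can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the names `GROUP_I`, `GROUP_II`, and `distinct_and_total` are illustrative and not part of the original study.

```python
# Multiplicity distributions quoted in the text:
# number of pathway classes per compound -> number of such compounds.
GROUP_I = {1: 4027, 2: 246, 3: 54, 4: 24, 5: 9, 6: 4, 7: 2}
GROUP_II = {1: 2820, 2: 226, 3: 53, 4: 23, 5: 9, 6: 4, 7: 2}


def distinct_and_total(distribution):
    """Return (distinct compounds, class memberships summed over all compounds)."""
    distinct = sum(distribution.values())
    memberships = sum(k * n for k, n in distribution.items())
    return distinct, memberships


print(distinct_and_total(GROUP_I))   # (4366, 4860)
print(distinct_and_total(GROUP_II))  # (3137, 3606)
```

A compound involved in k classes is counted k times in the per-class totals, which is why the sums 4,860 and 3,606 exceed the numbers of different compounds.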
The chemical-chemical interactions were retrieved from STITCH
Besides the 4,366 compounds (cf.
As mentioned in
Suppose the training dataset contains
In statistical prediction, the following three cross-validation methods are often used to examine a predictor for its effectiveness in practical application: independent dataset test, subsampling (such as 5-fold, 7-fold, or 10-fold cross-validation) test, and jackknife test
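As an illustration of the subsampling test, the following is a minimal sketch of a 5-fold partition of 3,137 items. The shuffling seed, the fold assignment rule, and the function name `k_fold_splits` are assumptions made for this example; the text does not specify how the folds were drawn in the study.

```python
import random


def k_fold_splits(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(3137, k=5))
print([len(test) for _, test in splits])  # [628, 628, 627, 627, 627]
```

Each item is used for testing exactly once and for training in the remaining four rounds.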
For the
In the dataset, the average number of metabolic pathway classes that each compound is involved in is calculated as
Given a query compound, according to the information of its interactions with the 4,366 compounds in Group-I (
In this study, our method was evaluated by the 5-fold cross-validation on the benchmark dataset that contains 3,137 compounds in Group-II of
It can be seen from the figure that, from the first order to the last, the 11 accuracies form a downward-sloping curve.
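The accuracy at each order can be sketched as follows. The definition used here, the fraction of compounds whose j-th ranked class is one of their true classes, is an assumption: the defining equation is not reproduced in this text, although this reading is consistent with the overall accuracies summing to about 114.95%, i.e. 100 × 3,606/3,137. The function name `order_accuracies` and the toy data are hypothetical.

```python
def order_accuracies(rankings, true_classes, n_classes=11):
    """Fraction of compounds whose j-th ranked class is a true class, for each j."""
    n = len(rankings)
    hits = [0] * n_classes
    for ranked, truth in zip(rankings, true_classes):
        for j, cls in enumerate(ranked[:n_classes]):
            if cls in truth:
                hits[j] += 1
    return [h / n for h in hits]


# Toy data (hypothetical): three compounds, rankings truncated to three orders.
rankings = [[4, 6, 8], [1, 5, 4], [11, 3, 2]]
true_classes = [{4, 6}, {5}, {11}]

acc = order_accuracies(rankings, true_classes, n_classes=3)
print(acc)
```

In the toy data the accuracies sum to 4/3, the average number of true classes per compound, because every true class appears somewhere in the ranking.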
The average number of metabolic pathway classes with which each compound is involved is 1.15 (cf. Eq. 8), meaning that the average success rate by a random guess would be 1.15/11 = 10.45%, which is much lower than that by our method.
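The arithmetic behind the random-guess baseline can be reproduced from the Group-II totals. This sketch assumes that the average in Eq. 8 is the sum of compounds over the 11 classes divided by the number of different compounds; the equation itself is not reproduced here, and the variable names are illustrative.

```python
N_CLASSES = 11
TOTAL_MEMBERSHIPS = 3606  # sum of compounds over the 11 classes, Group-II
N_COMPOUNDS = 3137        # number of different compounds, Group-II

avg_classes = TOTAL_MEMBERSHIPS / N_COMPOUNDS
random_rate = avg_classes / N_CLASSES

print(f"{avg_classes:.2f}")          # 1.15
print(f"{random_rate * 100:.2f}%")   # 10.45%
```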
Accordingly, the parameter
Listed in
Accuracy (%) predicted by each order
Class code | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | 10th | 11th |
1 | 80.96 | 8.38 | 5.08 | 1.78 | 1.02 | 1.27 | 0.25 | 1.02 | 0.00 | 0.25 | 0.00 |
2 | 31.67 | 30.00 | 18.33 | 7.50 | 4.17 | 3.33 | 0.00 | 3.33 | 0.00 | 0.83 | 0.83 |
3 | 73.89 | 6.27 | 6.27 | 4.44 | 1.57 | 2.61 | 2.35 | 0.52 | 0.52 | 1.31 | 0.26 |
4 | 65.15 | 11.36 | 6.82 | 5.30 | 3.03 | 1.52 | 2.27 | 0.00 | 4.55 | 0.00 | 0.00 |
5 | 61.70 | 19.88 | 10.97 | 5.38 | 1.04 | 0.83 | 0.21 | 0.00 | 0.00 | 0.00 | 0.00 |
6 | 29.87 | 27.27 | 11.69 | 11.69 | 5.19 | 5.19 | 3.90 | 3.25 | 1.95 | 0.00 | 0.00 |
7 | 20.93 | 25.58 | 11.63 | 9.30 | 6.98 | 4.65 | 2.33 | 4.65 | 0.00 | 2.33 | 11.63 |
8 | 61.17 | 17.15 | 9.39 | 4.21 | 3.24 | 2.91 | 0.97 | 0.32 | 0.32 | 0.32 | 0.00 |
9 | 74.35 | 8.42 | 4.01 | 3.41 | 1.20 | 1.20 | 2.20 | 2.40 | 1.00 | 1.20 | 0.60 |
10 | 68.98 | 8.67 | 4.62 | 3.66 | 5.01 | 4.24 | 1.54 | 1.54 | 0.77 | 0.58 | 0.39 |
11 | 78.77 | 8.42 | 5.09 | 2.81 | 2.63 | 1.40 | 0.35 | 0.35 | 0.18 | 0.00 | 0.00 |
Overall | 77.97 | 14.19 | 8.07 | 4.88 | 2.93 | 2.55 | 1.43 | 1.28 | 0.70 | 0.57 | 0.38 |
See
As stated in the Method section, the interactive compounds derived from STITCH tend to participate in the same metabolic pathways. For example,
Query compound (KEGG ligand) | Name | Code of metabolic pathway class | Interactive compound (KEGG ligand) | Name | Code of metabolic pathway class | Interaction confidence |
C00429 | Dihydrouracil | 4, 6, 8 | C00106 | Uracil | 4, 6, 8 | 0.981 |
C00429 | Dihydrouracil | 4, 6, 8 | C02642 | N-carbamoyl-be. | 4, 6, 8 | 0.945 |
C00429 | Dihydrouracil | 4, 6, 8 | C00006 | NADP | 2, 6, 8 | 0.921 |
C00429 | Dihydrouracil | 4, 6, 8 | C00005 | NADP(H) | 2, 6 | 0.902 |
C00429 | Dihydrouracil | 4, 6, 8 | C00001 | Hydroxyl radic. | 2, 8 | 0.899 |
C00429 | Dihydrouracil | 4, 6, 8 | C00013 | Pyrophosphate | 2 | 0.899 |
C00429 | Dihydrouracil | 4, 6, 8 | C00119 | Phosphoribosyl. | 1, 4, 5 | 0.899 |
C00429 | Dihydrouracil | 4, 6, 8 | C00906 | Dihydrothymine | 4 | 0.855 |
C00429 | Dihydrouracil | 4, 6, 8 | C00178 | Thymine | 4 | 0.814 |
C00429 | Dihydrouracil | 4, 6, 8 | C07649 | 5-fluorouracil | 11 | 0.744 |
C00429 | Dihydrouracil | 4, 6, 8 | C00099 | Beta-alanine | 1, 4, 6, 8 | 0.650 |
C00429 | Dihydrouracil | 4, 6, 8 | C00380 | Cytosine | 4 | 0.551 |
C00429 | Dihydrouracil | 4, 6, 8 | C00262 | Hypoxanthine | 4 | 0.436 |
C00429 | Dihydrouracil | 4, 6, 8 | C00299 | Uridine | 4 | 0.433 |
C00429 | Dihydrouracil | 4, 6, 8 | C00295 | Orotic acid | 4 | 0.386 |
C00429 | Dihydrouracil | 4, 6, 8 | C05145 | Beta-aminoisob. | 4 | 0.362 |
C00429 | Dihydrouracil | 4, 6, 8 | C02067 | Pseudouridine | 4 | 0.353 |
C00429 | Dihydrouracil | 4, 6, 8 | C00881 | Deoxycytidine | 4 | 0.350 |
C00429 | Dihydrouracil | 4, 6, 8 | C00147 | Adenine | 4, 9 | 0.308 |
C00429 | Dihydrouracil | 4, 6, 8 | C05100 | Beta-ureidoiso. | 4 | 0.286 |
C00429 | Dihydrouracil | 4, 6, 8 | C03056 | 2,6-dihydroxyp. | 8 | 0.274 |
C00429 | Dihydrouracil | 4, 6, 8 | C02565 | N-methylhydant. | 5 | 0.272 |
C00429 | Dihydrouracil | 4, 6, 8 | C00337 | Dihydroorotate | 4 | 0.262 |
C00429 | Dihydrouracil | 4, 6, 8 | C00757 | Berberine | 10 | 0.252 |
C00429 | Dihydrouracil | 4, 6, 8 | C00222 | Malonate semia. | 1, 11, 6 | 0.218 |
C00429 | Dihydrouracil | 4, 6, 8 | C12650 | Capecitabine | 11 | 0.214 |
C00429 | Dihydrouracil | 4, 6, 8 | C12673 | Tegafur | 11 | 0.210 |
C00429 | Dihydrouracil | 4, 6, 8 | C00522 | Pantoate | 8 | 0.207 |
C00429 | Dihydrouracil | 4, 6, 8 | C00864 | Pantothenic ac. | 6, 8 | 0.205 |
C00429 | Dihydrouracil | 4, 6, 8 | C11736 | FUdR | 11 | 0.199 |
C00429 | Dihydrouracil | 4, 6, 8 | C00366 | Uric acid | 4 | 0.167 |
C00429 | Dihydrouracil | 4, 6, 8 | C00219 | Arachidonic ac. | 3 | 0.154 |
See
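To illustrate how interaction confidences could be turned into a ranked list of classes, the sketch below scores each class by the highest confidence among the interactive compounds annotated with that class. This max-confidence rule is an assumption made for illustration, since the scoring formula is not reproduced in this text. Only a subset of the Dihydrouracil rows is used, ties are broken by class code as an arbitrary choice, and the names `rank_classes` and `NEIGHBOURS` are hypothetical.

```python
# Subset of the interactive compounds of Dihydrouracil (C00429) from the table:
# (class codes of the interactive compound, interaction confidence).
NEIGHBOURS = [
    ({4, 6, 8}, 0.981),  # C00106 Uracil
    ({4, 6, 8}, 0.945),  # C02642
    ({2, 6, 8}, 0.921),  # C00006
    ({2, 6}, 0.902),     # C00005
    ({2}, 0.899),        # C00013 Pyrophosphate
    ({1, 4, 5}, 0.899),  # C00119
    ({11}, 0.744),       # C07649 5-fluorouracil
    ({4, 9}, 0.308),     # C00147 Adenine
    ({10}, 0.252),       # C00757 Berberine
    ({3}, 0.154),        # C00219
]


def rank_classes(neighbours, n_classes=11):
    """Rank class codes by the highest confidence of any neighbour in that class."""
    scores = {c: 0.0 for c in range(1, n_classes + 1)}
    for classes, confidence in neighbours:
        for c in classes:
            scores[c] = max(scores[c], confidence)
    # Assumed tie-break: lower class code first.
    return sorted(scores, key=lambda c: (-scores[c], c))


print(rank_classes(NEIGHBOURS))  # [4, 6, 8, 2, 1, 5, 11, 9, 10, 3, 7]
```

Under these assumptions the first three ranked classes are 4, 6, and 8, which are the classes listed for Dihydrouracil in the table.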
Encouraged by the quite promising results obtained by the 5-fold cross-validation test on the benchmark dataset of the 3,137 compounds, we applied the method to the 5,549 compounds whose metabolic pathways are unknown as mentioned in the
Query compound (KEGG ligand) | Name | Interactive compound (KEGG ligand) | Name | Code of metabolic pathway class of the interactive compound | Interaction confidence |
C16265 | N-acetylgalactosamine 4-sulfate | C00059 | sulfate | 2, 4, 5 | 0.956 |
C16265 | N-acetylgalactosamine 4-sulfate | C00333 | glucuronic acid | 1 | 0.931 |
C16265 | N-acetylgalactosamine 4-sulfate | C15923 | galactose | 1 | 0.904 |
C16265 | N-acetylgalactosamine 4-sulfate | C01508 | xylose | 1 | 0.9 |
C16265 | N-acetylgalactosamine 4-sulfate | C01721 | fucose | 1 | 0.899 |
C16265 | N-acetylgalactosamine 4-sulfate | C01330 | Na(+) | 2 | 0.899 |
C16265 | N-acetylgalactosamine 4-sulfate | C00116 | glycerol | 1, 3 | 0.899 |
C16265 | N-acetylgalactosamine 4-sulfate | C00009 | phosphate | 2, 7 | 0.899 |
C16265 | N-acetylgalactosamine 4-sulfate | C00053 | 3′-phospho.pho. | 2, 4, 7 | 0.47 |
C16265 | N-acetylgalactosamine 4-sulfate | C02591 | sugar-1-phosph. | 1 | 0.312 |
C16265 | N-acetylgalactosamine 4-sulfate | C01170 | UDP-GlcNAc | 1 | 0.27 |
C16265 | N-acetylgalactosamine 4-sulfate | C03506 | indole-3-glyce. | 10, 5 | 0.256 |
C16265 | N-acetylgalactosamine 4-sulfate | C01132 | N-acetyl-D-glucosamine | 1 | 0.235 |
C16265 | N-acetylgalactosamine 4-sulfate | C00096 | GDP-mannose | 1, 7 | 0.183 |
C14150 | cyclopropylamine | C06554 | cyanuric acid | 11 | 0.918 |
C14150 | cyclopropylamine | C00014 | ammonia | 2, 4, 5, 6 | 0.907 |
C14150 | cyclopropylamine | C14149 | N-cyclopropylammelide | 11 | 0.899 |
C14150 | cyclopropylamine | C14148 | c0761 | 11 | 0.899 |
C14150 | cyclopropylamine | C00001 | hydroxyl radicals | 2, 8 | 0.899 |
C14150 | cyclopropylamine | C00969 | reuterin | 3 | 0.378 |
C14150 | cyclopropylamine | C06547 | polyethylene | 11, 5 | 0.347 |
C14150 | cyclopropylamine | C01234 | 1-aminocyclopropane-1-carboxylic acid | 1, 5 | 0.29 |
C14150 | cyclopropylamine | C00218 | methylamine | 2 | 0.29 |
C14150 | cyclopropylamine | C16267 | cyclopropanecarboxylic acid | 11 | 0.286 |
C14150 | cyclopropylamine | C16318 | methyl jasmonate | 3 | 0.273 |
C14150 | cyclopropylamine | C11512 | methyl jasmonate | 3 | 0.273 |
C14150 | cyclopropylamine | C05593 | 3-hydroxyphenylacetic acid | 11, 5 | 0.267 |
C14150 | cyclopropylamine | C00261 | benzaldehyde | 11 | 0.238 |
C14150 | cyclopropylamine | C01054 | 2,3-oxidosqualene | 3 | 0.236 |
C14150 | cyclopropylamine | C01013 | 3-hydroxypropionate | 1, 6 | 0.224 |
C14150 | cyclopropylamine | C01471 | acrolein | 11 | 0.208 |
C14150 | cyclopropylamine | C01746 | calcium channel blocker | 10 | 0.205 |
C14150 | cyclopropylamine | C00571 | cyclohexylamine | 11 | 0.205 |
C14150 | cyclopropylamine | C00144 | guanosine monophosphate | 4 | 0.205 |
C14150 | cyclopropylamine | C00903 | cinnamaldehyde | 10 | 0.171 |
C14150 | cyclopropylamine | C07113 | acetophenone | 11 | 0.168 |
C14150 | cyclopropylamine | C01724 | lanosterol | 3 | 0.152 |
See
As indicated by the above discussion and analysis, the results derived from the 1st and 2nd order predictions should be considered as the candidates for the metabolic pathway classes with which the query compound may be involved. In view of this, biochemical experiments should be conducted by mainly focusing on the targets predicted by the 1st and 2nd order predictions. The results obtained by the last five order predictions can be ignored due to their very low likelihood (<2%). Consequently, the current prediction method can provide useful clues for further validation by experiments and expedite the research progress by prioritizing the targets concerned.
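The thresholds mentioned above can be checked against the overall accuracies reported for the 5-fold cross-validation. The sketch below uses only the "Overall" row of the accuracy table. Reading the share of the first two orders as a fraction of all true class memberships assumes that each order's accuracy is the number of correct predictions at that order divided by the number of compounds, which the text does not state explicitly.

```python
# "Overall" row of the accuracy table: accuracy (%) at orders 1 to 11.
OVERALL = [77.97, 14.19, 8.07, 4.88, 2.93, 2.55, 1.43, 1.28, 0.70, 0.57, 0.38]

# Orders whose overall accuracy falls below 2%.
low_orders = [j + 1 for j, a in enumerate(OVERALL) if a < 2.0]
print(low_orders)  # [7, 8, 9, 10, 11]

# Share of the summed accuracy contributed by the 1st and 2nd orders.
total = sum(OVERALL)
top2_share = (OVERALL[0] + OVERALL[1]) / total
print(f"{total:.2f}")       # 114.95
print(f"{top2_share:.1%}")  # 80.2%
```

The last five orders are exactly those below 2%, and the first two orders together account for about four fifths of the summed accuracy.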
It is instructive to note that for the 4,366 compounds in Group-I of
Based on chemical-chemical interaction information, a multi-target model was proposed for identifying the metabolic pathway classes with which a query compound is involved. Since some compounds may be involved with more than one metabolic pathway class, our method provides a series of potential metabolic pathway classes for each query compound investigated, instead of only one. It is anticipated that our method may become a useful tool for annotating compounds with their biological functions.
Metabolic pathway classes predicted at each order for the 5,549 collected compounds without known metabolic pathway classes. The predicted metabolic pathway class code corresponds to the code in
(PDF)
The authors are very much indebted to the two anonymous reviewers for their constructive comments, which were very helpful in strengthening the presentation of this paper. Many thanks are also due to KEGG and STITCH for providing the data that support the current study.