Natural formulas and the nature of formulas: Exploring potential therapeutic targets based on traditional Chinese herbal formulas

By comparing the target proteins (TPs) of classic traditional Chinese medicine (TCM) herbal formulas and modern drugs used for treating coronary artery disease (CAD), this study aimed to identify potential therapeutic TPs for treating CAD. Based on the theory of TCM, the Xuefu-Zhuyu decoction (XZD) and Gualou-Xiebai-Banxia decoction (GXBD), both of which are classic herbal formulas, were selected for treating CAD. Data on the chemical ingredients and corresponding TPs of the herbs in these two formulas and data on modern drugs approved for treating CAD and related TPs were retrieved from professional TCM and bioinformatics databases. Based on the associations between the drugs or ingredients and their TPs, the TP networks of XZD, GXBD, and modern drugs approved for treating CAD were constructed separately and then integrated to create a complex master network in which the vertices represent the TPs and the edges, the ingredients or drugs that are linked to the TPs. The reliability of this master network was validated through statistical tests. The common TPs of the two herbal formulas have a higher possibility of being targeted by modern drugs in comparison with the formula-specific TPs. A total of 114 common XZD and GXBD TPs that are not yet the target of modern drugs used for treating CAD should be experimentally investigated as potential therapeutic targets for treating CAD. Among these TPs, the top 10 are NOS3, PTPN1, GABRA1, PRKACA, CDK2, MAOB, ESR1, ADH1C, ADH1B, and AKR1B1. The results of this study provide a valuable reference for further experimental investigations of therapeutic targets for CAD. The established method shows promise for searching for potential therapeutic TPs based on herbal formulas. It is crucial for this work to select beneficial therapeutic targets of TCM, typical TCM syndromes, and corresponding classic formulas.


Introduction
Traditional Chinese medicine (TCM) is rooted in thousands of years of history and is one of the forms of alternative medicine endorsed by the World Health Organization [1]. In the last two decades, an increasing number of people worldwide have used TCM, especially for managing chronic diseases [2][3][4]. Unlike Western medicine, which tends to see disease in terms of the body part that presents symptoms [5][6][7], TCM emphasizes a holistic and systemic approach that treats the whole organism. It views disease not as an independent pathological process but as an imbalance of the body, induced by multiple external or internal factors, between opposite physiological functions symbolically described as yin/yang, deficiency/excess, cold/heat, etc. [8,9].
Chinese herbal medicine (CM) is another important modality of TCM for restoring the body's balance and preventing or treating illness. In practice, multiple-herb formulas (TCM formulas), instead of a single herb, are more commonly used to achieve optimal therapeutic efficacy [10]. The interactions between the components of different CMs are thought to produce certain competitive and/or synergistic effects on multiple target proteins (TPs), thus improving the pharmacological activities and/or reducing the adverse clinical reactions caused by some individual herbs [11,12]. As the therapeutic TPs of TCM formulas are abundant but have been investigated on a limited basis, identifying potential therapeutic TPs based on CM formulas and further elucidating their beneficial effect on the treatment of diseases is important [13].
However, studies on TCM are always controversial in terms of their abstract theory, unclear basis, complex interactions between various ingredients and complex interactive biological systems, and inadequate quality control. With the limited rigorous scientific evidence of its effectiveness, TCM can be difficult for researchers to study because its treatments are often complex and are based on ideas very different from those of modern Western medicine.
Network pharmacology integrates information from bioinformatics, systems biology, and polypharmacology and provides a platform for integrating multiple components and interactions underlying cell, organ, and organism processes in health and disease [14]. This novel method challenges the traditional paradigm of single target drug discovery and explores multiple interactions among "genes, drugs, [and] diseases" from a global perspective [6]. A holistic system mapped by network pharmacology highlights the synthetic effects of multiple drugs on biological networks and relevant diseases [15,16]. Moreover, network pharmacology offers a new strategy for promoting a significant transformation in strategies for therapies and novel drug discovery [13,17].
Coinciding with the holistic and systemic characteristics of TCM, network pharmacology is expected to bridge the gap between TCM and modern medicine [18,19]. The established network will not only help improve our understanding of the chemical-activity relationship of CMs but also facilitate the identification of their main active ingredients and therapeutic TPs [20]. A network-based approach has been widely used to discover drug candidates from herbal products [21], predict potential therapeutic TPs of CMs or herbal prescriptions [22,23], understand the biological significance of TCM syndromes in the differentiation of disease [24], and illuminate the mechanisms of the global functional regulation of CMs [25,26].
To explore the use of the network-based approach to identify potential therapeutic TPs from TCM formulas, this study focuses on coronary artery disease (CAD), as CAD is now the leading cause of mortality worldwide [27], and the biological pathways and therapeutic TPs for treating CAD have been widely investigated and partially identified [28]. Based on TCM diagnostic methods, CAD can be clinically differentiated as various syndromes, such as blood stasis, phlegm turbidity, qi stagnation, cold coagulation, yin deficiency, yang deficiency, and qi deficiency [29]. Among these syndromes, blood stasis and phlegm turbidity are the two major CAD syndromes [30,31]. From a practical standpoint, numerous TCM formulas with different compositions have been demonstrated as being safe and effective for treating CAD [24]. Therefore, we hypothesized that the chemicals in these formulas might focus on similar and crucial TPs that play important roles at the molecular level.
In TCM clinical practice, physicians usually follow the "one classic formula for one typical syndrome" principle. In this case, the Xuefu-Zhuyu decoction (XZD) and Gualou-Xiebai-Banxia decoction (GXBD) are two classic formulas used for treating blood stasis and phlegm turbidity, respectively [21,32]. Based on a nationwide expert survey on the application of TCM in different clinical classifications of CAD, XZD and GXBD were listed as the two most widely used formulas in TCM clinical therapy for treating angina pectoris and acute myocardial infarction [33]. Meanwhile, experimental evidence of their safety and efficacy has been widely reported in the existing literature [34][35][36][37][38][39][40][41]. For instance, a nuclear magnetic resonance (NMR)-based metabolomics study demonstrated that XZD could effectively ameliorate the symptoms of hyperlipidemia on a global scale and regulate the metabolic state to a near-normal level in a time-dependent pattern [36]. A pharmacological study showed that GXBD could effectively prevent ST-segment elevation and myocardial damage in rats with acute myocardial ischemia at a confidence level of 95% [40]. A meta-analysis further validated the clinical efficacy of GXBD for treating unstable angina [41]. Thus, using the two herbal formulas as an example, this study constructs biologically meaningful networks to elucidate the target associations based on the components and further identifies potential TPs for treating CAD.

Materials and methods
Data
XZD and GXBD are two typical herbal formulas used in the TCM treatment of CAD. XZD comprises 11 CMs: Radix Bupleuri, Radix Angelicae Sinensis, Radix Rehmanniae, Radix Paeoniae Rubra, Flos Carthami, Semen Persicae, Fructus Aurantii, Radix et Rhizoma Glycyrrhizae, Rhizoma Chuanxiong, Radix Achyranthis Bidentatae, and Radix Platycodonis [32,35]. GXBD comprises three CMs: Fructus Trichosanthis, Bulbus Allii Macrostemonis, and Rhizoma Pinelliae [32]. The chemical constituents of these 14 CMs were retrieved from the traditional Chinese medicine systems pharmacology database (TCMSP) and the traditional Chinese medicine integrated database (TCMID) [42,43]; the CMs' corresponding TPs were retrieved from TCMSP. In addition, 240 TPs of drugs approved by the US Food and Drug Administration (FDA) for treating CAD [44,45] were collected from DrugBank and the Kyoto Encyclopedia of Genes and Genomes (KEGG), two professional bioinformatics databases [46,47]. The similarity of XZD and GXBD was assessed at the molecular level using the Jaccard index (JI), which measures the similarity of finite sample sets and is calculated as the size of the intersection divided by the size of the union of the sample sets. The value of JI is between 0 and 1, and a higher value of JI implies greater similarity [48].
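As a minimal sketch of the JI calculation described above, the following Python snippet computes the index for two sets. The function name `jaccard_index` and the toy target sets are illustrative assumptions, not part of the original analysis. The last line applies the same ratio to TP counts reported in this paper (114 + 50 = 164 TPs shared by the two formulas, among 228 formula TPs in total).

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention assumed here for two empty sets
    return len(a & b) / len(union)


# Toy target sets (hypothetical contents, for illustration only).
formula_a = {"NOS3", "PTPN1", "ESR1", "CDK2"}
formula_b = {"NOS3", "ESR1", "MAOB"}
toy_ji = jaccard_index(formula_a, formula_b)  # 2 shared out of 5 distinct

# Same ratio applied to counts reported in the paper:
# 164 TPs common to XZD and GXBD, 228 TPs in the union of both formulas.
tp_ji = 164 / 228
```

With these inputs, `toy_ji` is 0.4, and `tp_ji` expressed as a percentage rounds to 71.93%.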

Target network construction
A TCM formula comprises different CMs and contains various chemical ingredients. Some of these ingredients can act with different functional proteins in vivo, thus triggering a synergistic response [49,50]. With the information obtained from the databases, a bipartite network (drug-target [DT] network) can be constructed based on the interactions between the ingredients and the relevant protein targets. Subsequently, the DT network can be transformed into two biologically relevant network projections (drug network and TP network) through matrix algebra [51]. The nodes in the drug network represent drugs or ingredients, and two nodes are connected to each other if they share at least one TP; the nodes in the TP network are proteins, and two proteins are connected if they are both targeted by at least one common drug or ingredient [15].
The TP network graph G = (V, E) is a combination of V and E, where V is a set of vertices that represent the TPs and E is a set of edges; an edge between v_i and v_j means that at least one chemical ingredient is linked to both TPs [15]. The TP networks of XZD and GXBD were visualized and analyzed using the software package Gephi [52].
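The projection of a bipartite DT network onto its TP and drug networks can be sketched as follows. This is only an illustration under stated assumptions: the ingredient names and ingredient-target associations are hypothetical (not taken from TCMSP), and plain Python lists stand in for the matrix algebra. With B as the ingredient-by-target biadjacency matrix, the TP projection is B^T B and the drug projection is B B^T; a positive off-diagonal entry corresponds to an edge.

```python
from itertools import combinations

# Hypothetical ingredient -> target associations (illustrative only).
drug_targets = {
    "ingredient_1": {"NOS3", "ESR1"},
    "ingredient_2": {"ESR1", "CDK2", "MAOB"},
    "ingredient_3": {"PTPN1"},
}

drugs = sorted(drug_targets)
targets = sorted(set().union(*drug_targets.values()))

# Biadjacency matrix B: rows are ingredients, columns are targets.
B = [[1 if t in drug_targets[d] else 0 for t in targets] for d in drugs]

n_d, n_t = len(drugs), len(targets)

# TP projection P = B^T B: P[i][j] counts ingredients shared by targets i, j.
P = [[sum(B[k][i] * B[k][j] for k in range(n_d)) for j in range(n_t)]
     for i in range(n_t)]

# Drug projection D = B B^T: D[a][b] counts targets shared by ingredients a, b.
D = [[sum(B[a][k] * B[b][k] for k in range(n_t)) for b in range(n_d)]
     for a in range(n_d)]

# Edge sets: pairs of distinct nodes with at least one shared partner.
tp_edges = {(targets[i], targets[j])
            for i, j in combinations(range(n_t), 2) if P[i][j] > 0}
drug_edges = {(drugs[a], drugs[b])
              for a, b in combinations(range(n_d), 2) if D[a][b] > 0}
```

In this toy example, PTPN1 remains an isolated vertex in the TP network because its only ingredient has no other target, and the drug network contains a single edge because only `ingredient_1` and `ingredient_2` share a target (ESR1).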

Centrality analysis
According to the principles of graph theory, the significance of vertices can be measured and expressed using centrality. Centrality indicators identify the most important vertices within the graph [53][54][55]. In this study, three centrality measurements (degree, betweenness, and closeness) were adopted to assess different aspects of the positions of the TPs in the TP network, where vertices represent proteins and edges represent drugs/ingredients. Degree centrality shows the number of drugs/ingredients associated with a TP. Betweenness centrality measures how often a protein appears as an intermediary on the shortest path between two other proteins. An intermediary with high betweenness functions as a "gatekeeper" to control the flow of interactions in the network [56]; in other words, this protein plays a critical role in intermediating other proteins in terms of the investigated drugs/ingredients in this study. However, a high-betweenness protein need not necessarily be one with a high degree centrality. The closeness centrality of a protein is the total geodesic distance between that protein and all other proteins; it can be understood as how close a protein is to all others. A lower closeness value indicates a more central protein [55,57]. The three centrality indicators were calculated using the software package Gephi [52,58].
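The three indicators can be illustrated on a small graph with the pure-Python sketch below. It is not the Gephi computation used in this study: the toy edges are hypothetical, degree is taken as the number of neighbouring TPs, closeness is taken as the total geodesic distance to all reachable vertices (so lower means more central, as in the text), and betweenness uses Brandes' algorithm for unweighted graphs, counting each unordered pair once.

```python
from collections import deque

# Hypothetical TP network (edges are illustrative only).
edges = [("NOS3", "ESR1"), ("ESR1", "CDK2"), ("ESR1", "MAOB"),
         ("CDK2", "MAOB"), ("MAOB", "PTPN1")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def bfs_distances(adj, source):
    """Geodesic distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while order:  # accumulate dependencies, farthest vertices first
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


degree = {v: len(nbrs) for v, nbrs in adj.items()}
closeness_total = {v: sum(bfs_distances(adj, v).values()) for v in adj}
betweenness_scores = betweenness(adj)
```

In this toy graph, ESR1 and MAOB have the highest degree (3), the lowest total distance (5), and the highest betweenness (3.0), whereas CDK2 has degree 2 but betweenness 0, which illustrates that the indicators capture different aspects of a vertex's position.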

Statistical analysis
On the basis of the above data, target network, and centrality analysis, a network-based approach can be employed to elucidate complex associations between targets and to estimate potential targets after passing statistical validation. For quality control, a random simulation should be performed to see whether the results are significant. In the constructed network of targets, various statistical tests will be conducted to examine the associations between variables, especially variables generated from independent data sources, for instance, target modules from different formulae, centrality indicators, and the novelty relative to existing drug targets. According to the types of variables, this study will adopt various appropriate statistical testing approaches, for example, chi-squared test, t-test, one-way analysis of variance (ANOVA), and Pearson correlation test [59,60].

Results
As summarized in Table 1, 787 ingredients in XZD and 179 ingredients in GXBD were found to be associated with 214 and 178 therapeutic TPs, respectively (S4 and S5 Tables). In terms of the JI, the two formulas shared only a few ingredients (JI of 6.62%), whereas their therapeutic TPs overlapped substantially, with a JI of 71.93% across all TPs and 80.65% among the TPs of drugs used for treating CAD. The results suggest that XZD and GXBD may act on similar protein targets at the biomolecular level to generate common therapeutic effects, including treating CAD, even though the formulas' chemical compositions are substantially different.

Target protein network
The XZD and GXBD TP networks were constructed separately and then integrated. As illustrated in Fig 1A, the TP networks of XZD, GXBD, and drugs used for treating CAD (S6 Table) were constructed separately with 7664 edges and 214 vertices, 6374 edges and 178 vertices, and 1418 edges and 240 vertices, respectively, and then integrated to create a complex master network. The blue vertices indicate formula-based TPs that have not been targeted by drugs used for treating CAD, whereas the red vertices are the drug targets and are labeled when they overlap with herbal formula TPs. The blue, orange, yellow, and purple edges denote XZD-specific ingredients, GXBD-specific ingredients, ingredients with common XZD and GXBD TPs, and drugs used for treating CAD, respectively. The distribution of the TPs in the network is summarized and illustrated in Fig 1B. The colored areas in Fig 1B represent the 114 common XZD and GXBD TPs that have not been targeted by modern drugs used for treating CAD, 50 common XZD and GXBD TPs targeted by modern drugs used for treating CAD, 14 GXBD-specific TPs, 38 XZD-specific TPs that have not been targeted by drugs used for treating CAD, 12 XZD-specific TPs targeted by drugs used for treating CAD, and 178 TPs specific to drugs used for treating CAD. To uncover the potential CAD-related therapeutic TPs of XZD and GXBD, 228 TPs of the herbal formulas were analyzed. Fig 1 shows the ingredients (edges) and corresponding TPs (vertices) of the two herbal formulas and the modern drugs approved for treating CAD. By comparing the similarities and differences of the TPs in this master network between the two representative herbal formulas and modern drugs used for treating CAD, this study expects to find potential TPs for treating CAD. To ensure the reliability of the in silico screening of the therapeutic TPs, the master network must first be validated statistically.

Network analysis
As shown in Fig 1B, a total of 228 therapeutic TPs of XZD and GXBD are divided into five modules (yellow, white, orange, blue, and dark purple parts) according to their interactions with the TCM formulas and drugs. The centrality indicators identify the important vertices within the network. A dummy variable, the TP of a drug used for treating CAD (0: no; 1: yes), is used to reflect whether a protein is targeted by modern drugs approved for treating CAD.
According to the types of variables, various statistical tests were conducted to examine the associations between the variables (i.e., network modules, three centralities, and drug target for treating CAD). The chi-square test analyzed the association between the dummy variable "drug target proteins" and the categorical variable "modules"; the independent samples t-test, that between "drug target proteins" and "centralities"; one-way ANOVA, that between "modules" and "centralities"; and Pearson correlation tests, those between the three "centralities" indicators. All of the above variables showed significant associations at the 95% confidence level. These statistical results further imply that the master network is not random but reflects, to some extent, chemical or biological significance. The details are discussed below.
First, in terms of the aim of this study, whether the GXBD-specific target proteins, XZD-specific target proteins, and common XZD and GXBD target proteins are significantly different from the TPs of modern drugs used for treating CAD should be validated. In other words, the percentages of the TPs of modern drugs used for treating CAD should be tested among the three samples generated from the combinations of the five modules. The differences in the percentages of the TPs of modern drugs used for treating CAD among specific and common parts of two formulas were statistically significant at the 0.05 level with a chi-square value of 6.386 and p value of 0.041. A contingency table for the chi-squared test is shown in Table 2.

Fig 1. (A) The blue vertices indicate formula-based target proteins that have not been targeted by drugs used for treating CAD, and the red vertices indicate the drug targets and are labeled when they overlap with herbal formula target proteins. A colored edge indicates a drug or compound linked to two target proteins (blue: XZD-specific edges; orange: GXBD-specific edges; yellow: overlapped edges between XZD and GXBD; and purple: edges associated with drugs approved for use for treating CAD). (B) The distribution of different target proteins in the network (yellow: common XZD and GXBD target proteins that have not been targeted by modern drugs used for treating CAD; white: common XZD and GXBD target proteins targeted by modern drugs used for treating CAD; orange: GXBD-specific target proteins; blue: XZD-specific target proteins that have not been targeted by drugs used for treating CAD; dark purple: XZD-specific proteins targeted by drugs used for treating CAD; and light purple: target proteins specific to drugs used for treating CAD). The numbers in parentheses represent the number of target proteins in each specific set.
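The reported chi-square statistic can be checked from the module counts given for Fig 1B. The sketch below is an independent recalculation under stated assumptions, not the authors' procedure: the 3 x 2 layout (common, XZD-specific, and GXBD-specific TPs by targeted versus not targeted) is inferred from those counts, and the p value uses the closed form exp(-x/2) of the chi-squared survival function, which holds only for 2 degrees of freedom.

```python
import math

# Rows: common XZD/GXBD TPs, XZD-specific TPs, GXBD-specific TPs.
# Columns: targeted by approved CAD drugs, not targeted (counts from Fig 1B).
observed = [[50, 114], [12, 38], [0, 14]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence of module and drug-target status.
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed form valid only when dof == 2 (assumption checked below).
assert dof == 2
p_value = math.exp(-chi2 / 2)
```

Under these assumptions, `chi2` is approximately 6.386 and `p_value` approximately 0.041, in agreement with the values reported in the text. The smallest expected count (about 3.8, for GXBD-specific TPs targeted by drugs) is below 5, so the result may be worth confirming with an exact test.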
Furthermore, the TPs of the FDA-approved drugs for treating CAD in the master network played more crucial roles than the others, with t values (p values) of independent samples tests of degree, closeness, and betweenness between drug targets and non-drug targets being -2.783 (0.007), 3.501 (0.001), and -5.514 (<0.001), respectively. A one-way ANOVA also indicated that the degree was significantly different among the five target modules with an F value (p value) of 24.889 (<0.001), and another two one-way ANOVA tests for the closeness and betweenness among modules had values of 29.679 (<0.001) and 23.742 (<0.001), respectively. Finally, the associations between the three "centralities," that is, degree-closeness, degree-betweenness, and closeness-betweenness, are statistically significant with Pearson correlation coefficients (p value) of -0.937 (<0.001), 0.533 (<0.001), and -0.486 (<0.001), respectively.
After the above quality control of the master network, the network analysis results are discussed below. In the complex master network shown in Fig 1B, the common XZD and GXBD TPs (S7 Table) are more likely to be targeted by modern drugs (50 of 164, 30.5%) than XZD-specific TPs (12 of 50, 24%) or GXBD-specific TPs (0 of 14, 0%). Details of the observed and expected counts are shown in Table 2. The 114 common XZD and GXBD TPs that have not been targeted by modern drugs used for treating CAD should be further investigated as potential therapeutic targets for CAD. Furthermore, centrality measures can identify the importance of specific nodes in the whole network. As defined in the Materials and methods section, the three centrality indicators (degree, betweenness, and closeness) reflect the importance of a node from different angles. In particular, because the correlations among the three centralities in this study differ in strength, it is necessary to combine multiple centrality indicators to identify important target proteins.
With this approach, the top 10 mutual targets worthy of further investigation in the context of new drug discovery are generated and summarized in Table 3. Although different types of centrality indicators are adopted, interestingly, the top 10 targets are basically consistent (i.e., NOS3, PTPN1, GABRA1, PRKACA, CDK2, MAOB, ESR1, ADH1C, ADH1B and AKR1B1). They are discussed below.
Meanwhile, the common proteins which have been validated as targets of approved anti-CAD drugs, i.e., the 50 mutual targets covered by the two formulas and anti-CAD drugs in Fig 1B, are also useful to add to the current understanding of the disease mechanism and to develop new therapeutic agents based on existing targets. They include many well-known targets of CAD, for example, PTGS2, NOS2, and F2. All common targets in Fig 1B are

Discussion
The drug network can be seen as a means of exploring the synergy between drugs because drugs that share a target are connected in the network. However, the TP network is more difficult to interpret. Targets affected by the same compound (rather than targets with intrinsic connections in biological function) are linked together. On the one hand, this may imply that herbal formulas generate a multicomponent and multitarget therapeutic mechanism, provided that the formulas are safe and effective. Thus, the TP network can be applied to explore potential "new" therapeutic targets, as discussed in the Results. On the other hand, if this precondition about the TCM formulas is not sufficiently met, there is another possibility, namely, that the TP network may imply side effects, in that targets involved in different biological functions are always perturbed together by the formulas. Thus, it is still necessary to examine the biological meaning of the targets found in the network analysis thoroughly, although we comprehensively reviewed XZD and GXBD in terms of clinical utilization in TCM therapy for treating CAD and established an experimental basis before conducting this study. Generally, the association of top mutual TPs with CAD and relevant biological significance have been widely discussed in existing literature. For instance, NOS3 (endothelial nitric oxide synthase) is the nitric oxide synthase isoform responsible for maintaining systemic blood pressure, vascular remodeling and angiogenesis, and vascular smooth muscle relaxation through directly regulating NO production [69][70][71]. PTPN1 (tyrosine-protein phosphatase non-receptor type 1) is implicated as contributing to the negative regulation of insulin signaling and as a key regulator of cardiovascular effects by reducing vascular adrenergic reactivity [72,73].
PRKACA (cAMP-dependent kinase) can maintain circulating platelets in a resting state by phosphorylating proteins in numerous platelet-inhibitory pathways [74]. CDK2 (cyclin-dependent kinase 2) plays an important role in altering the phosphorylation profile of retinoblastoma tumor suppressor protein (Rb) in coronary artery smooth muscle cells (SMCs) as well as the proliferative response of these cells to mitogenic stimulation [75]. Moreover, MAO-B (amine oxidase B) inhibitors have been shown to be potentially beneficial for treating cardiovascular pathologies [76]. ESR1 (estrogen receptor) could directly affect cardiovascular tissues via regulating the expression of inducible nitric oxide synthase (NOS2A) in vascular smooth muscle cells (VSMC) [77]. ADH1C and ADH1B (alcohol dehydrogenase 1C and 1B) play important roles in modulating fibrinogen and increasing insulin sensitivity to alter the risk of CAD in persons with a history of long-term alcohol consumption [78]. AKR1B1 (aldose reductase) protects against heart ischemic injury by preventing ER stress induced by excessive accumulation of aldehyde-modified proteins [79]. All of these indicate that the top mutual targets perform various beneficial functions for treating cardiovascular diseases at the molecular level. The associations and biological significance of these targets with CAD provide potential opportunities for the further discovery of new drugs for treating CAD.
As stated in the introduction, TCM follows the principle of "one classic formula for one typical syndrome," and XZD and GXBD are intended for two different syndromes in the TCM domain. TCM is well known and considered attractive for its synergistic effects that are observable at the physiological level. On the other hand, modern drugs follow the reductionism approach, which identifies the single most potent compound for one objective. In TCM and modern medicine theory, the definition of syndromes is quite different. The modern CAD concept covers the TCM blood stasis and phlegm turbidity concepts. Such differences might explain the high frequency of modern CAD drugs targeting overlapping targets.
However, there is no reason to consider the formulae-specific targets unimportant. Actually, this is an area not yet well explored by the modern reductionism drug discovery research. Although XZD and GXBD are two classic formulas for treating CAD, they are used differently in TCM clinical practice for two different syndromes, i.e., blood stasis and phlegm turbidity, respectively. The scientific mechanism of TCM syndromes is not yet clear; however, blood stasis and phlegm turbidity provide a valuable basis for studying CAD subtypes, especially under the background of the emerging medical model of precision medicine for customizing healthcare. Thus, formula-specific targets are still worthy of further experimental investigation based on formula-specific clinical applications of TCM and precision medicine, with the aim of exploring therapeutic targets for CAD.
As illustrated in Fig 1B, 50 TPs shared by the two formulas and 12 XZD-specific TPs are already TPs of modern drugs used for treating CAD. The 38 XZD-specific TPs that are not targeted by drugs used for treating CAD might be used for identifying potential therapeutic targets for CAD in the context of the specific clinical applications of XZD. Meanwhile, none of the 14 GXBD-specific TPs is a current TP of drugs used for treating CAD, and these TPs could potentially be used for identifying promising therapeutic targets for CAD based on GXBD-specific clinical applications. With these studies in mind, the different bioactivities of each formula could be enumerated, and the unique therapeutic targets for CAD in each formula could be identified.
Several limitations of this study should be noted. First, as an in silico study, this work is limited by the lack of experimental validation. In particular, the top 10 mutual target proteins are only "potential" therapeutic targets for CAD and need to be validated by specific pharmacological investigations. The results of this study only provide a reference for further experimental investigations of therapeutic targets for CAD. Second, more powerful herbal and bioinformatics databases, for example, the Herbal Ingredients Targets Database (HIT) and the Therapeutic Target Database (TTD), should be included in future studies for better-quality data. In addition, the TCM ingredients/targets/functions provided in currently accessible repositories may not be comprehensive, and the limited source data may introduce bias. Sensitivity analysis and negative controls should be performed in future studies to assess the robustness of the results and conclusions against source data quality.

Conclusions
The therapeutic effects of herbal formulas in disease management have been demonstrated by clinical practice over thousands of years. Numerous TPs of chemical ingredients combined in herbal formulas have been identified by modern pharmacological studies on TCM, although the overall mechanism of TCM has not been elucidated. By comparing the similarities and differences in TPs between herbal formulas and modern pharmaceutical agents, potential TPs for further experimental investigation can be identified. This study examined two herbal formulas used for treating CAD as an example for exploring a new methodology based on finding therapeutic TPs.
The beneficial therapeutic areas of TCM should be clarified in the context of modern medicine. Classic formulas corresponding to typical syndromes in these therapeutic areas should be selected. Based on the approaches used in this study, a series of potential TPs of the herbal formulas are promising for future experimental study.