Netcombin: An algorithm for constructing an optimal phylogenetic network from rooted triplets

Phylogenetic network construction is one of the most important challenges in phylogenetics. These networks can represent complex non-treelike events such as gene flow, horizontal gene transfer, recombination, or hybridization. Among phylogenetic networks, rooted structures are commonly used to represent the evolutionary history of a set of species explicitly. Triplets are a well-known input for constructing rooted networks. Obtaining an optimal rooted network that contains all given triplets is the main problem in network construction, where the optimality criteria include minimizing the level or the number of reticulation nodes. This problem is known to be NP-hard. In this research, a new algorithm called Netcombin is introduced to construct an approximately optimal network that is consistent with the input triplets. The innovation of this algorithm is based on its binarization and expanding processes. The binarization process uses a novel measure to construct a binary rooted tree T consistent with approximately the maximum number of input triplets. Then T is expanded using a heuristic function by adding an approximately minimum number of edges to obtain the final network with an approximately minimum number of reticulation nodes. In order to evaluate the proposed algorithm, Netcombin is compared with four state-of-the-art algorithms: RPNCH, NCHB, TripNet, and SIMPLISTIC. The experimental results on simulated data obtained from biologically generated sequence data indicate that, considering the trade-off between speed and precision, Netcombin outperforms the others.


Introduction
Phylogenetics is a branch of bioinformatics that studies and models the evolutionary relations between a set of species or organisms (formally called taxa) [1,2]. The tree structure is the basic model, which can appropriately show the history of tree-like events such as mutation, insertion, and deletion [1-5]. The main disadvantage of the tree model is its inability to show non-treelike events (more abstractly, reticulate events) such as recombination, hybridization, and horizontal gene transfer [2,6]. To overcome this weakness, phylogenetic networks were introduced to generalize phylogenetic trees and represent reticulate events [1,2,7-13]. In order to build a rooted network from a set of triplets, several algorithms have been introduced recently [2,6-8,12,13,16]. The well-known algorithms are TripNet [7], SIMPLISTIC [6], NCHB [16], and RPNCH [8]. These algorithms find a semi-optimal rooted phylogenetic network that is consistent with a given set of triplets. Because they use heuristics, the result is not necessarily optimal; the resulting network is only near-optimal, which is called semi-optimal. Formally, a rooted phylogenetic network N (network for short) is a connected directed acyclic graph (DAG) (Fig 3) in which each vertex satisfies one of the following four categories: (i) a unique root node with in-degree 0 and out-degree 2; (ii) tree nodes with in-degree 1 and out-degree 2; (iii) reticulation nodes with in-degree 2 and out-degree 1; (iv) leaves with in-degree 1 and out-degree 0 (Fig 4a). A network is called a network on X if the set of its leaves is X. For example, the network of Fig 4a is a network on X = {l1, l2, . . ., l7}.
Generally, the problem of constructing an optimal rooted phylogenetic network consistent with a given set of triplets is known to be NP-hard [17,18]. When the input set of triplets is dense, this problem can be solved in polynomial time [18]. A set of triplets τ is called dense if for each subset of three taxa there is at least one triplet in the set of input triplets [7,18]. More precisely, a set of triplets τ on a given set of taxa X is called dense if for each subset of three taxa {i, j, k} at least one of the triplets ij|k, ik|j, or jk|i belongs to τ [7,18]. For example, for the set of taxa X = {a, b, c, d, e}, the set of triplets τ = {ab|c, ad|b, be|a, ac|d, ae|c, de|a, bd|c, bc|e, be|d, de|c} is dense.
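The density condition above is easy to check mechanically. The following sketch (the function name and the triplet encoding are our own illustrative choices, not from the paper) tests whether a triplet set on a taxa set is dense:

```python
from itertools import combinations

def is_dense(taxa, triplets):
    """Return True iff for every 3-subset {i, j, k} of taxa at least one of
    ij|k, ik|j, jk|i is present.  A triplet ij|k is encoded as
    (frozenset({i, j}), k)."""
    covered = set(triplets)
    for i, j, k in combinations(sorted(taxa), 3):
        candidates = {(frozenset({i, j}), k),
                      (frozenset({i, k}), j),
                      (frozenset({j, k}), i)}
        if not (candidates & covered):   # no information on {i, j, k}
            return False
    return True

# The dense example from the text on X = {a, b, c, d, e}
tau = [("ab", "c"), ("ad", "b"), ("be", "a"), ("ac", "d"), ("ae", "c"),
       ("de", "a"), ("bd", "c"), ("bc", "e"), ("be", "d"), ("de", "c")]
triplets = [(frozenset(pair), k) for pair, k in tau]
print(is_dense("abcde", triplets))   # prints: True
```

Each of the ten triplets in the example covers a distinct one of the C(5,3) = 10 three-element subsets, which is why this set is dense.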
As mentioned above, density is a critical constraint in constructing a rooted phylogenetic network that contains all given triplets. However, usually there is no constraint on the input triplets, and in most cases the input triplets might not be dense. So, introducing efficient heuristic methods to solve this problem is necessary. The most desirable outcome is a rooted network with no reticulate events, i.e., a rooted tree structure. BUILD is the algorithm that was introduced for obtaining a tree structure from a given set of triplets if such a tree exists [19]. In fact, the BUILD algorithm decides in polynomial time whether there is a rooted phylogenetic tree that contains all given triplets, and produces an output if such a tree exists. Fig 5 indicates an example of the BUILD algorithm steps for the given τ = {cd|b, cd|a, cd|e, cd|f, ef|a, ef|b, ef|c, ef|d, db|a, db|e, db|f, da|e, da|f, cb|a, cb|e, cb|f, ab|e, ab|f, ac|e, ac|f}.
In the tree construction process for a given τ, if BUILD stops, there is no tree structure for the given set of triplets. Fig 6 shows an example, with the set of triplets τ = {bc|a, bd|a, cd|a, bc|d, cd|b}, in which the BUILD algorithm stops. In this case, the main goal is to construct a network structure that is as similar to a tree as possible. In other words, constructing a rooted phylogenetic network with the minimum number of reticulate events is the main challenge.
The simplest possible non-treelike structure is the level-1 rooted phylogenetic network, which is also known as a galled tree [20]. Fig 7 shows an example of a galled tree. If level-1 networks cannot represent all input triplets, more complex (higher level) networks are considered to achieve consistency. LEV1ATHAN is a well-known algorithm for constructing level-1 networks [21]. In [6] an algorithm is introduced that produces at most a level-2 network (Fig 4).
When more complex networks are needed (e.g. Fig 8), unrestricted algorithms such as NCHB, TripNet, RPNCH, and SIMPLISTIC are applicable; they try to construct a consistent network with respect to the optimality criteria (the level or the number of reticulation nodes) [6-8,16]. Among these four algorithms, SIMPLISTIC is not exact and only works for dense sets of triplets, while for the other three methods there is no constraint on the input triplets. This is one of SIMPLISTIC's disadvantages. Moreover, for complex networks SIMPLISTIC is very time consuming and is unable to return an output in an appropriate time [7].
TripNet has three speed options: slow, normal, and fast. The slow option returns a network near to an optimal network. The normal option works faster than the slow option, but its network is more complex. Note that the slow and normal options return an output in an appropriate time for input triplets consistent with simple, low-level networks. However, these two options are not appropriate for large data, because as the number of taxa increases, the corresponding sets of triplets are consistent with high-level networks. The fast option usually outputs a network in an appropriate time, but its network is more complex compared to the two other options. This option is used when the slow and normal options are unable to return a network in an appropriate time; it just tries to output a network and does not consider the optimality criteria. In summary, TripNet is unable to return an optimal network in an appropriate time when the input data is large [7]. NCHB is an improvement of TripNet which tries to reduce the complexity of the TripNet networks, but like TripNet it is unable to return an optimal network in an appropriate time for large data [16]. [Figure caption: AG(τ) is the graph whose node set is X = {a, b, c, d, e, f}, and two nodes i, j ∈ X are adjacent iff there is a node x ∈ X such that ij|x ∈ τ. Also AG(τ|A), in which A ⊆ X, is defined in a similar way, i.e., as the induced graph of AG(τ) on the node set A ⊆ X: the set of nodes is A and i, j ∈ A are adjacent iff there is a node x ∈ A such that ij|x ∈ τ.] RPNCH is a fast method for constructing a network consistent with a given set of triplets, but its output is usually more complex with respect to the two optimality criteria compared to the SIMPLISTIC and TripNet networks. In other words, although RPNCH is fast, on average the RPNCH networks are far from the optimality criteria [8].
Generally, none of the above four methods is able to return a near-optimal network consistent with a given set of input triplets in an appropriate time. So the focus of this paper is to introduce a new method called Netcombin (Network construction method based on binarization) for constructing a semi-optimal (near-optimal) network in an appropriate time without any constraint on the input triplets. Our innovation is based on the binarization and expanding processes. In the binarization process, nine measures are used to construct binary rooted trees consistent with approximately the maximum number of input triplets. These measures are computed based on the structure of the tree and the relations between the input triplets. The expansion process, which converts the obtained binary tree into a consistent network, heuristically adds an approximately minimum number of edges to obtain the final network with an approximately minimum number of reticulation nodes.
The structure of this paper is as follows. Section 2 presents the basic notations and definitions. In section 3, our proposed algorithm (Netcombin) is introduced and its time complexity is investigated. In section 4, the newly introduced algorithm is compared with four state-of-the-art algorithms.

Definitions and notations
In this section the basic definitions that are used in the proposed algorithm are presented formally. From here on, a set of triplets and a network are denoted by τ and N, respectively.
A rooted phylogenetic tree (tree for short) on a given set of taxa X is a rooted directed tree that contains a unique node r (the root) with in-degree zero and out-degree at least 2. In a tree, leaves have in-degree 1 and out-degree 0 and are distinctly labeled by X. Also, inner nodes, i.e., nodes other than the root and the leaves, have in-degree 1 and out-degree at least 2 [2,7]. Fig 1 indicates an example of a tree on X = {a, b, c, d, e, f}.
The symbol L_N denotes the set of all leaf labels of N. N is a network on X if L_N = X. A triplet ij|k is consistent with N (or equivalently, N is consistent with ij|k) if {i, j, k} ⊆ L_N and N contains two distinct nodes u and v and pairwise internally node-disjoint paths u → i, u → j, v → u, and v → k. For example, Fig 9 shows that the triplets ij|k and jk|i are consistent with the given network, but ik|j is not. A set of triplets τ is consistent with a network N (or equivalently, N is consistent with τ) if all the triplets in τ are consistent with N. τ(N) denotes the set of all triplets that are consistent with N.
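For rooted trees, consistency has a simple characterization that is used implicitly throughout the paper: ij|k is consistent with a tree iff the lowest common ancestor of i and j lies strictly below the lowest common ancestor of i and k. A minimal sketch (the parent-map representation and the function names are illustrative assumptions, not the paper's code):

```python
def leaf_depths(parent, root):
    """Depth of every node of a rooted tree given a child -> parent map."""
    depth = {root: 0}
    def d(v):
        if v not in depth:
            depth[v] = d(parent[v]) + 1
        return depth[v]
    for v in parent:
        d(v)
    return depth

def lca(parent, depth, a, b):
    """Lowest common ancestor: walk the deeper node upward until they meet."""
    while a != b:
        if depth[a] < depth[b]:
            b = parent[b]
        else:
            a = parent[a]
    return a

def is_consistent(parent, depth, i, j, k):
    """ij|k is consistent with the tree iff LCA(i, j) is strictly deeper
    than LCA(i, k) (which equals the LCA of all three leaves)."""
    return depth[lca(parent, depth, i, j)] > depth[lca(parent, depth, i, k)]

# A small tree: r -> {u, v}, u -> {a, b}, v -> {c, d}
parent = {"a": "u", "b": "u", "c": "v", "d": "v", "u": "r", "v": "r"}
depth = leaf_depths(parent, "r")
print(is_consistent(parent, depth, "a", "b", "c"))   # prints: True  (ab|c holds)
print(is_consistent(parent, depth, "a", "c", "b"))   # prints: False (ac|b fails)
```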
Binarization is a basic concept, defined as follows. Let T be a rooted tree and x be a node with children x_1, x_2, . . ., x_k, k ≥ 3. These k children are partitioned into two disjoint subsets X_l and X_r. Let X_l = {x′_1, x′_2, . . ., x′_i} and X_r = {x′_{i+1}, . . ., x′_k}, in which x′_1, x′_2, . . ., x′_k is an arbitrary relabeling of x_1, x_2, . . ., x_k. If |X_l| > 1 then create a new node x_l, remove the edges (x, x′_1), . . ., (x, x′_i), add the edge (x, x_l), and add the edges (x_l, x′_1), . . ., (x_l, x′_i). Perform the

PLOS ONE
same process if |X_r| > 1. Continue the process until the out-degree of every node except the leaves is 2. The new tree is called a binarization of T. Fig 10 shows an example of a non-binary tree and two samples of its binarizations; note that the tree also has other binarizations, two of which are illustrated. If T_2 is a binarization of T_1, then τ(T_1) ⊆ τ(T_2) [7]. A function h: X × X → ℕ is called a height function on X [7]. Let T be a tree on X with root r, let c_ij be the lowest common ancestor of i, j ∈ X, and let l_T denote the length of the longest directed path in T. For two arbitrary nodes x, y of T, d_T(x, y) is the edge path length between x and y. For any two i, j ∈ X, the height function related to T is h_T(i, j) = l_T − d_T(r, c_ij). Let G_τ be the directed graph whose nodes are the pairs ab, a, b ∈ L(τ), with a directed edge from ij to ik and from ij to jk for each triplet ij|k ∈ τ. Suppose G_τ is a DAG and let l_Gτ be the length of the longest directed path in G_τ. Assign l_Gτ + 1 to the nodes with out-degree 0 and remove them. Assign l_Gτ to the nodes with out-degree 0 in the resulting graph, and continue this procedure until all nodes are removed. Define h_Gτ(a, b), for a, b ∈ L(τ) and a ≠ b, as the value that is assigned to the node ab ∈ V(G_τ), and call it the height function related to G_τ [7]. For an example see Fig 13a to 13d. If τ is consistent with a tree, then G_τ is a DAG and h_Gτ is well defined [7].
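The layer-peeling definition of h_Gτ above translates directly into code. The sketch below (the representation and names are our own assumptions) strips out-degree-0 nodes layer by layer, gives the first layer the value L + 1, where L is the length of the longest directed path, and raises an error when G_τ is not a DAG:

```python
def height_function(nodes, edges):
    """Compute h_Gτ by repeatedly stripping out-degree-0 nodes: the first
    stripped layer receives L + 1 (L = longest directed path length), the
    next layer L, and so on.  Raises ValueError if the graph is not a DAG."""
    out = {v: 0 for v in nodes}
    preds = {v: [] for v in nodes}
    for u, v in edges:
        out[u] += 1
        preds[v].append(u)
    layers, remaining = [], set(nodes)
    while remaining:
        sinks = [v for v in remaining if out[v] == 0]
        if not sinks:                       # a directed cycle survives
            raise ValueError("G_tau is not a DAG")
        layers.append(sinks)
        for v in sinks:
            remaining.discard(v)
            for u in preds[v]:
                out[u] -= 1
    L = len(layers) - 1                     # longest directed path length
    return {v: L + 1 - d for d, layer in enumerate(layers) for v in layer}

# From the single triplet ab|c: edges ab -> ac and ab -> bc
h = height_function(["ab", "ac", "bc"], [("ab", "ac"), ("ab", "bc")])
print(h["ab"], h["ac"], h["bc"])   # prints: 1 2 2
```

The resulting values satisfy h(a, b) < h(a, c) and h(a, b) < h(b, c), which is exactly the constraint the triplet ab|c imposes.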
Let r be the root of a given network N and l_N be the length of the longest directed path in N. For each node a, let d(r, a) be the length of the longest directed path from r to a; a height function h_N related to N is defined from these quantities analogously to the tree case [7]. A quartet is an unrooted binary tree with four leaves. The symbol ij|kl is used to show a quartet in which i, j and k, l are its two pairs. Each quartet contains a unique edge whose two endpoints are not leaves. This edge is called the inner edge of the quartet (see Fig 12) [7].

Method
In order to build a network N consistent with a given set of triplets τ, the height function h_N related to τ is defined [7]. The height function is a measure that is used to obtain a basic structure of the final network N [7]. This basic structure is in the form of a rooted tree. The height function enforces that the obtained rooted tree is consistent with approximately the maximum number of triplets of τ. In this research, for a given τ, Netcombin first assigns a height function h on L(τ). Then three not necessarily binary trees are constructed based on h. Next, nine binarizations of each constructed tree are obtained (i.e., 27 binary trees in total).

Finally, 27 networks consistent with the given τ are obtained by adding some edges to each of the 27 binary trees, and the optimal network among them is reported as the output [7].

Assigning height function
Let T be a tree with its unique height function h_T and i, j, k ∈ L_T. The triplet ij|k is consistent with T iff h_T(i, j) < h_T(i, k) [7]. Moreover, a similar property holds for a given network N, its height function h_N, and i, j, k ∈ L_N [7]. These two items imply that the following integer program IP(τ, s) can be established for a given set of triplets τ with |L(τ)| = n [7].
Maximize Σ_{1 ≤ i, j ≤ n} h(i, j)
subject to
  h(i, k) − h(i, j) > 0 for each ij|k ∈ τ,
  h(j, k) − h(i, j) > 0 for each ij|k ∈ τ,
  h(i, j) ∈ {1, . . ., s} for all 1 ≤ i, j ≤ n.

The solution of the above IP provides a criterion to obtain the basic tree structure. Ideally, it is expected that the above IP has a feasible solution, i.e., a solution that satisfies all of its constraints. If there is a tree consistent with a given τ, then the above IP has a feasible solution, and the solution that maximizes the above IP is the height function of a tree that is consistent with τ. More precisely, in this case h_Tτ is the unique optimal solution to IP(τ, l_Gτ + 1), in which T_τ is the unique tree that is constructed by BUILD [7]. If the set of triplets τ is consistent with a tree, HBUILD can also give the same tree. So in this case, by using HBUILD, the desired tree consistent with τ can be constructed in polynomial time based on the optimal solution [7]. Fig 13 indicates an example of the HBUILD process for the given τ = {cd|b, cd|a, bd|a, bc|a}.
Generally, the above IP has a feasible solution iff the graph G_τ is a DAG, and in this case the minimum s that gives a feasible solution for IP(τ, s) is l_Gτ + 1 [7]. So for a given τ the IP might have a feasible solution even though there is no tree consistent with τ. In the worst case, there is no tree consistent with a given τ and no feasible solution for the above IP, i.e., equivalently, the graph G_τ is not a DAG. To overcome this flaw, the goal is to remove the minimum number of edges from G_τ (the minimum number of constraints from the IP) in order to lose the minimum amount of information. The problem of removing the minimum number of edges from a directed graph to obtain a DAG is known as the Minimum Feedback Arc Set problem, MFAS for short, which is NP-hard [22]. The heuristic method introduced in [16] is used to obtain a DAG from G_τ as follows. The nodes with in-degree zero cannot participate in any directed cycle, so these nodes are removed, and this process is continued in the remaining graph until there is no node with in-degree zero. Similarly, this process is performed for the nodes with out-degree zero in the remaining graph [16].
In the resulting graph, which contains no node with in-degree zero or out-degree zero, first the color white is assigned to each node. Then for each node v ∈ V(G) the following is done. Suppose that the out-degree of v is m and vv_1, vv_2, . . ., vv_m are the m directed edges with v as their tail. These m edges are removed from the resulting graph, and the colors of v and v_1, v_2, . . ., v_m are converted to black. For each v_i, 1 ≤ i ≤ m, every edge v_i u for which the color of u is white is removed, and then the color of each node u that is the head of some such v_i, 1 ≤ i ≤ m, is converted to black. The process of removing edges is continued in this way until the color of all nodes becomes black. The value of v is defined as the number of edges remaining in the resulting graph. The remaining edges related to the node with the minimum value are removed from G, and the resulting graph is a DAG. Fig 14 shows an example of this process [16].
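The first stage of this heuristic, peeling off nodes of in-degree or out-degree zero, can be sketched as follows (the function name and the edge-list representation are assumptions for illustration; the output is the set of nodes that may still lie on directed cycles):

```python
def trim_acyclic_part(nodes, edges):
    """Repeatedly delete nodes of in-degree 0, then nodes of out-degree 0;
    such nodes cannot lie on any directed cycle.  Returns the surviving
    node set (the cyclic core of the graph)."""
    alive = set(nodes)
    indeg  = lambda v: sum(1 for u, w in edges if w == v and u in alive)
    outdeg = lambda v: sum(1 for u, w in edges if u == v and w in alive)
    changed = True
    while changed:
        changed = False
        for deg in (indeg, outdeg):
            while True:
                removable = [v for v in alive if deg(v) == 0]
                if not removable:
                    break
                alive -= set(removable)
                changed = True
    return alive

# a -> b -> c -> a is a cycle; d is a dead-end tail, e only feeds in
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "a")]
print(sorted(trim_acyclic_part("abcde", edges)))   # prints: ['a', 'b', 'c']
```

Only the cyclic core then needs the more expensive node-valuation step described above.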

For simplicity, the new graph, which is a DAG, is called G_τ again. Now the height function h_Gτ related to G_τ is the desired solution.

Obtaining tree
In the following, the goal is to obtain a tree structure from the obtained h_Gτ. In the initial step, HBUILD is applied to h_Gτ. The ideal situation is when HBUILD continues until a tree structure is obtained. However, HBUILD may stop in one of its subsequent steps. More precisely, let (G, h) be the weighted complete graph related to h_Gτ. The HBUILD algorithm removes the edges with maximum weight from (G, h). If, by removing the edges with maximum weight from each connected component, the resulting graph becomes disconnected, then this process continues iteratively until each connected component contains only one node. The basic tree structure is obtained by reversing the above disconnecting process in HBUILD (see Fig 13).
If, by removing the edges with maximum weight from a connected component C, the resulting graph C′ remains connected, then HBUILD halts. Hence, the goal is to disconnect the component C′, for which the following three methods are used.

I. The process of removing the edges with maximum weight from C′ is continued until C′ becomes disconnected.
II. The Min-Cut method is applied to C′. Min-Cut removes a set of edges with minimum total weight such that the resulting graph is split into two connected components [23].
III. Let w be the maximum weight over all edges in C′. New weights are computed based on the current weights and w; for each edge with weight m, a new weight is assigned as a function of m and w. Then the Min-Cut method is applied to the updated graph. In this research, for each connected component the above three processes are applied, and then by using HBUILD, three possible tree structures are obtained. From here on, without loss of generality, the symbol T_int is used to denote the tree structure obtained from HBUILD or any of the three tree structures gained from the above processes.
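Method I can be sketched as a loop that deletes all edges of the currently maximum weight and re-tests connectivity until the component falls apart. The representation (a dict from undirected edges to weights) and the function name are illustrative assumptions:

```python
from collections import defaultdict, deque

def disconnect_by_max_weight(nodes, wedges):
    """Method I sketch: repeatedly delete every edge of maximum weight from
    a connected weighted graph until it becomes disconnected; return the
    resulting connected components.  wedges is {(u, v): weight}."""
    wedges = dict(wedges)

    def components():
        adj = defaultdict(set)
        for u, v in wedges:
            adj[u].add(v); adj[v].add(u)
        seen, comps = set(), []
        for s in nodes:
            if s in seen:
                continue
            comp, q = {s}, deque([s])
            seen.add(s)
            while q:
                x = q.popleft()
                for y in adj[x]:
                    if y not in seen:
                        seen.add(y); comp.add(y); q.append(y)
            comps.append(comp)
        return comps

    while len(components()) == 1 and wedges:
        top = max(wedges.values())
        wedges = {e: m for e, m in wedges.items() if m != top}
    return components()

comps = disconnect_by_max_weight("abcd", {("a", "b"): 1, ("b", "c"): 3,
                                          ("c", "d"): 1, ("a", "c"): 2})
print(sorted(sorted(c) for c in comps))   # prints: [['a', 'b'], ['c', 'd']]
```

Here the weight-3 edge is removed first, the graph stays connected, and removing the weight-2 edge then splits it into two components.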

Binarization
Let T be a rooted tree and τ(T) be the set of triplets consistent with T. Also let T_binary be a binarization of T and τ(T_binary) be the set of triplets consistent with T_binary. Then τ(T) ⊆ τ(T_binary). This means that binarization is an effective tool for making the tree structure more consistent with the given triplets. To perform binarization on T_int, the following heuristic algorithm is proposed.
For a given set of triplets τ and T_int, a binary tree structure T_intBin is demanded. Binarization can be performed simply with a random approach [7,8]. In order to make binarization more efficient, a new heuristic algorithm is introduced in this research. This algorithm is based on three parameters, w, t, and p [16,24].
Let τ be a set of triplets and let V_i and the three parameters w, t, and p be as defined in [16,24]. Based on these three parameters, nine different measures are defined [5]. By using the measures M = {m_1, m_2, . . ., m_9}, nine binary tree structures (T_intBin) are built from T_int. The binarization process is performed as follows.

Binarization pseudocode
1: Input: T_int
2: if T_int is binary then
3:   do nothing
4: else
5:   for each vertex v of T_int with children c_1, c_2, . . ., c_n, n > 2 do
6:     initialize a set C with {c_1, c_2, . . ., c_n}
7:     while |C| > 1 do
8:       find and remove two vertices c_i, c_j ∈ C with maximum measure values
9:       merge c_i and c_j under a new vertex c_new
10:      apply the six SPR replacements to obtain six alternative structures
11:      among the 6+1 structures, select the most consistent structure and add its root to C
12: Output: T_intBin
The binarization process is performed using the nine defined measures and Subtree-Pruning-Regrafting (SPR) [25,26]. SPR is a method used in tree topology search [25]. In the binarization process, SPR helps to obtain a tree from T_int that is more consistent with the input triplets. If T_int is binary, there is nothing to do; else there is at least one vertex v in T_int with children c_1, c_2, . . ., c_n, n > 2. In this case the goal is to replace this part of the tree with a binary structure (a binary subtree). For this purpose, in the first step there are n sets, each containing one c_i, 1 ≤ i ≤ n. Then, iteratively in each step, two sets with the maximum measure value (according to one of the nine defined measures) are selected. Let c_i and c_j, 1 ≤ i, j ≤ n, be the two nodes with the maximum measure value. By merging c_i and c_j, a new vertex c_new is created (see Fig 16). Here, SPR is used to improve the merging consistency.
Suppose that c_lk and c_rk are the roots of the left and right subtrees of c_k, k ∈ {i, j}. The idea behind SPR is replacing subtrees to achieve a new binary tree structure with higher consistency. In this work, the potential replacements are introduced in six different ways (see Fig 17).
By using these SPRs, six new structures are obtained. Among these six structures and the structure without replacement, the tree structure consistent with the most input triplets is selected.
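The greedy pairing loop of the pseudocode can be sketched for a single multifurcating node. Since the paper's nine measures are defined in [16,24] and not reproduced here, the sketch substitutes a simple illustrative measure (the number of input triplets whose cherry is split across the two candidate children); the function and node names are our own:

```python
from itertools import combinations

def binarize_node(children_leaves, triplets):
    """Greedily binarize one multifurcating node (sketch of the pseudocode).
    children_leaves maps each child subtree name to the set of leaves below
    it.  Stand-in measure: a pair of children scores one point for every
    triplet ab|x whose cherry {a, b} is split across the two children.
    Triplets are encoded as (frozenset({a, b}), x)."""
    clusters = dict(children_leaves)            # subtree name -> leaf set
    counter = 0

    def score(u, v):
        return sum(1 for pair, _ in triplets
                   if any(a in clusters[u] for a in pair)
                   and any(b in clusters[v] for b in pair))

    while len(clusters) > 1:
        # pick the pair of current clusters with maximum measure value
        ci, cj = max(combinations(clusters, 2), key=lambda p: score(*p))
        counter += 1
        new = f"node{counter}"                  # merge under a new vertex
        clusters[new] = clusters.pop(ci) | clusters.pop(cj)
    return next(iter(clusters))                 # root of the binary subtree

# ab|c pulls children A and B together before C is attached
root = binarize_node({"A": {"a"}, "B": {"b"}, "C": {"c"}},
                     [(frozenset({"a", "b"}), "c")])
print(root)   # prints: node2
```

In a full implementation, each merge would additionally be followed by the six SPR rearrangements, keeping whichever of the seven structures is consistent with the most input triplets.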

Network construction
Let τ′ ⊆ τ be the set of triplets that are not consistent with T_intBin. Here, the goal is to add some edges to T_intBin in order to construct a network consistent with τ′. In the network construction process, edges are added incrementally to obtain the final network consistent with τ. In order to add edges, a heuristic criterion is used to select edges rather than random selection. The heuristic criterion depends on the current non-consistent triplets in τ′ and the current network structure. For this purpose, a value is assigned to each pair of edges of the current network structure. To compute the value of a pair {e, f} of edges, a new edge is added by connecting e and f via two new nodes n_e and n_f (see Fig 18). The value is the number of triplets in τ′ that are consistent with the new network structure. In each step of adding an edge, the set of triplets τ′ is updated by removing the newly consistent triplets.
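The elementary operation of this step, connecting two edges e and f of the current network through two new nodes n_e and n_f, can be sketched on a child-list representation (the names and the representation are illustrative assumptions; the consistency scoring itself is omitted):

```python
def add_reticulation_edge(children, e, f):
    """Subdivide edge e = (u, v) with a new node n_e and edge f = (x, y)
    with a new node n_f, then add the edge n_e -> n_f (cf. Fig 18); n_f
    becomes a reticulation node with in-degree 2.  children maps each
    node to the list of its child nodes."""
    (u, v), (x, y) = e, f
    n_e, n_f = f"n{u}{v}", f"n{x}{y}"   # hypothetical names for the new nodes
    children[u].remove(v)               # u -> n_e -> v replaces u -> v
    children[u].append(n_e)
    children[n_e] = [v]
    children[x].remove(y)               # x -> n_f -> y replaces x -> y
    children[x].append(n_f)
    children[n_f] = [y]
    children[n_e].append(n_f)           # the new edge n_e -> n_f
    return children

# Add a reticulation between the two root edges of the cherry (a, b)
net = add_reticulation_edge({"r": ["a", "b"], "a": [], "b": []},
                            ("r", "a"), ("r", "b"))
print(net["r"], net["nra"], net["nrb"])   # prints: ['nra', 'nrb'] ['a', 'nrb'] ['b']
```

To score a candidate pair {e, f}, this operation would be applied on a copy of the network and the triplets of τ′ consistent with the result would be counted.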

Time complexity
In this section, we investigate the time complexity of Netcombin. For the input triplets τ, let |L(τ)| = n and |τ| = m. For method II, in each step it takes O(mn) to remove the edges with maximum weight, and then Min-Cut is performed in O(mn + n²log n), so one step runs in O(mn + n²log n). Since there are n nodes, the total runtime is O(mn² + n³log n).
The runtime of method III is the same as that of method II. So obtaining the tree T_int is performed in O(mn² + n³log n).

Experiments
RPNCH, NCHB, SIMPLISTIC, and TripNet are well-known algorithms for constructing phylogenetic networks from given triplets. The SIMPLISTIC algorithm only works for dense triplet sets [6], while there are no constraints on the NCHB, TripNet, and RPNCH inputs [7,8,16]. In order to evaluate the performance of Netcombin, the following scenario is designed.

Data generation
There are two standard approaches to generating triplet data. First, triplets can be generated randomly, which is the simplest way. Second, triplets can be obtained from sequence data. Sequence data are usually in the form of biological sequences, which can be obtained from species or from simulation software that generates these kinds of sequences under biological assumptions. In this research we used the second approach, using simulation software. There are standard methods for converting sequences into triplets; Maximum Likelihood (ML) is the well-known method for constructing a tree from sequence data [5,6]. For this purpose, TREEVOLVE, a software for generating biological sequences, is used [27]. TREEVOLVE has different parameters that can be adjusted manually. In this research we set the number of samples, the number of sequences, and the length of sequences; for the other parameters, the default values are used. The number of sequences (number of leaf labels) is set to 10, 20, 30, and 40, and the length of sequences is set to 100, 200, 300, and 400. For each case, the number of samples is 10, so in total 160 different sets of sequences are generated. Then the PhyML software, which works based on the Maximum Likelihood (ML) criterion, is used. For each set of sequences, all subsets of three sequences are considered and an outgroup is assigned to each of them. Each subset of three sequences plus the assigned outgroup is given as input to PhyML, and for these data the output of PhyML is a quartet. Finally, by removing the outgroup from each quartet, the set of triplets is obtained. In this research, each triplet related to a quartet in which the weight of its unique inner edge is zero is removed, because these types of triplets contain no information and are stars. This way of generating triplets may give non-dense sets of triplets.
SIMPLISTIC is used as a method for comparison and its input must be dense. So, by adding a random triplet corresponding to each star, each non-dense set is converted to a dense set and used as the input.

Experimental results
In order to show the performance of Netcombin, we compare it with TripNet, SIMPLISTIC, NCHB, and RPNCH on the data generated in the previous subsection. Since for large data SIMPLISTIC is unable to return a network in an appropriate time, a time restriction of 6 hours is imposed. Let N_finite be the set of networks for which the running time of the method is at most 6 hours. Let S_sequence denote the number of sequences, where S_sequence ∈ {10, 20, 30, 40}. The output of TripNet, SIMPLISTIC, NCHB, and RPNCH is a unique network, but Netcombin outputs 27 networks and the best network is reported. Since the processes of constructing these 27 Netcombin networks are independent, we apply Netcombin in a parallel way to obtain the 27 networks simultaneously. In the implementation we used a PC with a Core i7 CPU and ran our algorithm on its cores in parallel.
The results of comparing these methods on the two optimality criteria and running time are available in Tables 1 to 4. Tables 1 and 2 show the number of networks that belong to N_finite and the average running time for the networks in N_finite. These results show that when the number of taxa is 10, in all cases, all methods on average give an output in at most 2 seconds. When the number of taxa is 20, in 5% of the cases SIMPLISTIC is unable to return a network in less than 6 hours; for the remaining 95% of the cases, SIMPLISTIC on average gives an output in 310 seconds. For these data, the other four methods on average construct a network in less than 4 seconds. When the number of taxa is 30, in 32.5% of the cases SIMPLISTIC outputs a network, in 2600 seconds on average; for the remaining 67.5% of the cases, SIMPLISTIC is unable to return a network in less than 6 hours. For these data, on average Netcombin and RPNCH output a network in at most 15 seconds, while NCHB and TripNet on average output a network in 203 and 210 seconds, respectively. When the number of input taxa is 40, in all cases SIMPLISTIC does not return an output within the 6-hour time restriction. In this case, Netcombin and RPNCH on average output a network in at most 44 seconds, while NCHB and TripNet return a network in at least 740 seconds. Tables 3 and 4 indicate the results for the two optimality criteria, i.e., the number of reticulation nodes and the level, for the networks that belong to N_finite. The results show that when the number of taxa is 10, on average the number of reticulation nodes for the TripNet and NCHB networks is at most 0.9, while for these data the Netcombin, RPNCH, and SIMPLISTIC numbers of reticulation nodes are on average at least 2 and at most 3.
Also for these data, on average the level of the NCHB and TripNet networks is not more than 0.9, while the level of the Netcombin, SIMPLISTIC, and RPNCH networks is on average at least 2 and at most 2.8. When the number of input taxa is 20, on average the TripNet and NCHB numbers of reticulation nodes are 2.6 and 1.8, respectively. For these data, the Netcombin number of reticulation nodes is on average 4.

Discussion
In this paper we investigated the problem of constructing an optimal network consistent with a given set of triplets. Minimizing the level and minimizing the number of reticulation nodes are the two optimality criteria. This problem is known to be NP-hard [17,18]. By analyzing existing research, we can divide the solutions for constructing networks from triplets into two approaches. In the first approach, the reticulation nodes are recognized and then removed from the set of taxa, and a tree structure is obtained for the remaining taxa. Finally, the network consistent with all given triplets is obtained by adding the reticulation nodes to the tree structure.
In the second approach, a tree structure is obtained and then, by adding new edges to the tree structure, the final network consistent with all triplets is obtained. SIMPLISTIC [6], TripNet [7], and NCHB [16] belong to the first approach, and RPNCH [8] belongs to the second approach. To the best of our knowledge, all the research on this problem falls into one of these approaches. Therefore, in recent papers researchers try to improve these approaches gradually; each improvement is valuable because it can reduce time and costs effectively. In this paper we introduced Netcombin, a method for producing a semi-optimal network consistent with a given set of triplets. In order to show the performance of Netcombin, we compared it with NCHB, TripNet, SIMPLISTIC, and RPNCH on the 160 different sets of triplets generated by the process introduced in subsection 4-1.
The results show that although RPNCH is on average the fastest method, the level and the number of reticulation nodes of its networks are the highest. Moreover, on average, the differences between the Netcombin, NCHB, and TripNet results and the RPNCH results on the two optimality criteria are significant.
The results show that on average, for small data, SIMPLISTIC is appropriate. But as the number of taxa increases, for large data it is unable to return a network in an appropriate time, and its running time is the highest. Also, in all cases, on average the SIMPLISTIC numbers of reticulation nodes and levels are better only than those of RPNCH. Note that SIMPLISTIC only works for dense sets of input triplets. The results show that as the number of taxa increases, the running time of SIMPLISTIC grows exponentially. In more detail, when the number of taxa is 40, it does not return any network in less than 6 hours, while the other four methods output a network in at most 745 seconds.
Also, the results show that on average the NCHB and TripNet running times are nearly the same, but on average the two optimality criteria for the NCHB results are better than for TripNet. Note that the differences between the TripNet and NCHB results on the optimality criteria are not significant.
The results show that for small data, TripNet and NCHB are appropriate, and their results for the optimality criteria and running time are on average the best. But as the number of taxa increases, the running time of these methods grows significantly compared to Netcombin, while the two optimality criteria for their networks are nearly the same as for the Netcombin networks.
The results show that, generally, by considering the running time, the level, and the number of reticulation nodes of the final networks, Netcombin is on average a valuable method that returns a reasonable network in an appropriate time.