Exact and heuristic algorithms for Space Information Flow

Space Information Flow (SIF) is a new promising research area that studies network coding in geometric space, such as Euclidean space. The design of algorithms that compute the optimal SIF solutions remains one of the key open problems in SIF. This work proposes the first exact SIF algorithm and a heuristic SIF algorithm that compute min-cost multicast network coding for N (N ≥ 3) given terminal nodes in 2-D Euclidean space. Furthermore, we find that the Butterfly network in Euclidean space is the second example besides the Pentagram network where SIF is strictly better than Euclidean Steiner minimal tree. The exact algorithm design is based on two key techniques: Delaunay triangulation and linear programming. Delaunay triangulation technique helps to find practically good candidate relay nodes, after which a min-cost multicast linear programming model is solved over the terminal nodes and the candidate relay nodes, to compute the optimal multicast network topology, including the optimal relay nodes selected by linear programming from all the candidate relay nodes and the flow rates on the connection links. The heuristic algorithm design is also based on Delaunay triangulation and linear programming techniques. The exact algorithm can achieve the optimal SIF solution with an exponential computational complexity, while the heuristic algorithm can achieve the sub-optimal SIF solution with a polynomial computational complexity. We prove the correctness of the exact SIF algorithm. The simulation results show the effectiveness of the heuristic SIF algorithm.


Introduction
Network Information Flow (NIF) [1], proposed in 2000, studies network coding in graphs. Comparatively to NIF, Space Information Flow (SIF) [2,3] studies network coding in geometric space (See Fig 1) and it was proposed in 2011. Network coding has been proved to be an effective technology to solve both NIF and SIF problems, and makes it possible to achieve the maximum throughput of a network and to reduce the complexity of computing the optimal transmission scheme [4]. Although both SIF and NIF use network coding as their fundamental on Delaunay triangulation that leads to the sub-optimal solution for SIF, with significantly lower computational complexity and faster convergence.
The rest of this paper is organized as follows: Section 2 reviews the related work. The problem formulation and definitions are described in Section 3. An exact and a heuristic algorithms for SIF are presented in Section 4 and Section 5, respectively. The simulation results are provided in Section 6. Finally, Section 7 concludes the paper.

Related work
In a Euclidean space, network coding and routing can each construct its own network and then select its own network orientation. It is known that both cost advantage and coding advantage of the network satisfy a primal-dual relation, and their maximum values are equal in undirected networks [9,17]. The coding advantage here refers to the ratio of the maximum achievable throughput with network coding over that without network coding [2].
With regard to routing in space, the properties of optimal ESMT were studied by Gilbert et al. [6]. Arora [18] presented a Polynomial Time Approximation Schemes for Euclidean Traveling Salesman Problem in fixed dimensions, as well as for some other NP-hard Euclidean problems, such as Minimum Steiner Tree. Arora's breakthrough method could also be used to approximate solutions to SIF problems. Winter et al. [11] proposed an improved exact algorithm to improve the generation and the concatenation of full Steiner trees. In a subsequent work, Van [19] proposed a heuristic algorithm and enhancements to an exact algorithm for solving the ESMT problem. The proposed algorithm utilizes the Delaunay triangulation to generate the candidate Steiner points for insertion, the minimum spanning tree to create a network on the inserted points, and second order cone programming to optimize the locations of Steiner points. Unlike other ESMT heuristics relying on the Delaunay triangulation, the proposed algorithm inserts the Steiner points probabilistically into the Delaunay triangles to achieve different subtrees on subsets of the terminal points. Moreover, the computational complexity of ESMT is known to be NP-Hard [19].
For NIF, Barekatain et al. [20] proposed a random network coding based framework for efficient peer-to-peer video streaming. Using the proposed framework, each peer encapsulates one instead of n coefficients entries into the generated encoded packet which results in very low transmission overhead. Zhang et al. [21] proposed a minimal increase network coding algorithm for providing reduced computational complexity of encoding and an increased probability of delivery in a dynamic network. Unlike these works which are based on network coding in graphs, our work is based on network coding in space and aims at minimizing the cost of constructing the network.
If a live traffic such as live video streaming is considered, random network coding could be used to complement SIF because random network coding is a suitable technique in multicasting video streams, as denoted by Barekatain et al. [20]. After our SIF algorithms finish determining the positions of the required relay nodes, a network is set up. Then random network coding can be used for transmitting live video streams.
In line with SIF, Li and Wu [3] studied the problem of space information flow, with a focus on the case of multiple unicast sessions. They proved that for multiple unicast in a Euclidean space, network coding is equivalent to routing. Xiahou et al. [5] proposed a geometric framework to analyze the multiple-unicast conjecture. The proposed framework consists of four major steps. In Step 1, linear programming duality is applied for translating the conjecture from its throughput version to an equivalent cost version. In Step 2, graph embedding is performed, for translating the cost version from the network domain to the space domain.
Step 3 aims at dimension reduction that brings the problem from a high dimension space to 1-D.
Step 4 contains a direct proof in 1-D, where the cut condition on information flow transmission is readily applicable. As opposed to these works, we study a multicast version of SIF and propose two algorithms to compute the optimal SIF solutions. Yin et al. [22] discussed the properties of optimal multicast network embedding and proved that network coding does not make a difference in the basic case of 1-to-2 multicast, and further proved upper-bounds on the number of relay nodes required in an optimal acyclic multicast network. Unlike this work, besides proving some new properties of SIF, we also simulated the proposed heuristic SIF algorithm. Huang et al. [8] proved a number of properties of the optimal SIF solutions, and proposed the first heuristic algorithm to compute the optimal SIF solution, for the case of a single multicast. The proposed heuristic algorithm adopts the uniform partitioning and the equilibrium methods as its two sequential phases. The first phase is to compute an optimal topology, and the second phase is to find the optimal positions of the relay nodes in the topology. In another subsequent work, Huang and Li [15] proposed a heuristic SIF algorithm based on non-uniform partitioning that can solve the non-uniform distribution of the terminal nodes in 2-D Euclidean space. Their simulation results showed that the proposed algorithm had a low computational complexity and promptly converged to the optimal solution. In a recent subsequent work, Huang and Li [16] proposed a polynomial-time heuristic algorithm based on non-uniform partitioning and Delaunay triangulation techniques for solving the problem of min-cost multicast network coding in 2-D Euclidean space. Unlike [8,15,16] which proposed heuristic algorithms based on partitioning technique, we propose the first exact SIF algorithm based on Delaunay triangulation technique. Furthermore, we find that the Butterfly network in space is the second example besides the Pentagram network where SIF is strictly better than ESMT, i.e., the cost advantage of the Butterfly network in space is greater than 1. In contrast to Butterfly network in graph [1] where additional relay nodes are not allowed, Butterfly network in space allows to introduce additional relay nodes. Uwitonze et al. [23] studied the problem of number constrained SIF that takes into account the constraint on the number of relay nodes and proposed a heuristic algorithm with a polynomial-time complexity to solve it. In a subsequent work, Uwitonze et al. [24] studied the problem of area constrained SIF which considers the constraint on the area where the candidate relay nodes should be placed, and proposed a heuristic algorithm with a polynomial-time complexity to solve it. As opposed to these works, minimizing the cost is the main objective of this work.
However, despite what have already been achieved in previous SIF research works, the design of optimal multicast SIF algorithms currently remains an open problem of prime importance in SIF, with a particular challenge in a way that it requires to compute not only the number of relay nodes and the exact geometric location of each relay node, but also the best way to interconnect them, as well as the best multicast flow over such topology.

Formulation and definitions
This paper focuses on the single multicast SIF problems, which studies min-cost multicast network coding in 2-D Euclidean space. The network cost is generally determined by two types of variables, one being the positions of the relay nodes, and the other being the flow rate assignments on the links. We call these two factors positions and flow rate assignments. The latter also determines the connection topology of all nodes, since a link with a zero rate indicates that the link does not exist. The problem can be formulated as follows: Given N (N ! 3) terminal nodes T 1 , T 2 , . . .T N with coordinates in 2-D Euclidean space and a multicast session from one source node S to the N terminals as sinks. The goal is to find the optimal min-cost transmission scheme using SIF, which allows the introduction of a set of relay nodes. The network cost is defined as cost = ∑ uv w(uv)f(uv), where f(uv) is the information flow rate of a link uv in a Euclidean space, and w(uv) is the weight of a link uv, which equals to the Euclidean distance kuvk of uv [2,3].
This problem is difficult in a way that it requires to compute not only the number of relay nodes, but also their exact positions in 2-D Euclidean space to achieve the optimal solutions. Therefore, an optimal solution to an SIF problem should contain two aspects: topology and positions. The former further includes a flow routing scheme that specifies the flow rate assignment on each link. An optimal SIF topology specifies which pairs of nodes (including terminal nodes T 1 , T 2 , . . .T N and relay nodes R 1 , R 2 , . . .) are directly connected through a link, and further specifies the flow rate along each link. The positions of the terminal nodes are fixed and the positions of the relay nodes are not.
In this paper, we adopt the discrete model proposed by Yin et al. [22] to briefly explain SIF. Given a source node that multicasts h messages to a number of sink nodes, where h is the minimum among the maximum number of link disjoint paths from the source node to each sink node. When h = 1, f(uv) is always 1 and SIF problem degenerates into the ESMT problem [22]. The Pentagram example of SIF corresponds to h = 2 [8]. The Butterfly network corresponds to h = 2 as well.
The network volume models the total cost of the network to be constructed, under the natural assumption that the cost of a link is proportional to its length as well as its information flow rate. Suppose that a node u has coordinate (x 1 , y 1 ) and a node v has coordinate (x 2 , y 2 ). The Euclidean distance between u and v is Given an information flow vector f embedded in a Euclidean space, a network G can be induced, over the same nodes and links as in f, by viewing f(uv) as the information flow rate of a link uv in 2-D Euclidean space. The cost of f is then ∑ uv w(uv)f(uv) = ∑ uv kuvkf(uv). This reflects the natural rule that the longer and the wider a communication link is, the more expensive it is. An SIF solution is an embedding of a multicast network G into 2-D Euclidean space [15]. Theorem 1 Each pair of links connected to a relay node in an optimal SIF topology meet at angle of 120˚and each relay node of SIF has exactly 3 links.
Proof. In an optimal SIF topology, each pair of links incident to a relay node meet at 120˚. Suppose, on the contrary, that the links AR and RB meet with ∠ARB = θ < 120˚, the links AR and RC meet with ∠ARC = α > 120˚, while the links BR and RC meet with ∠BRC = β > 120˚, as shown in Fig 4(A). From the mechanical interpretation [6], the three pairs of links (AR, BR), (AR, RC) and (BR, RC) pull on relay node R with three forces of magnitudes: The three pairs of links exert at R three forces of different magnitudes. Therefore, the resultant force exerted by the three pairs of links at R can not be zero and R is at unbalanced state. Now, we add a relay node R 0 and replace the links AR, RB and RC by AR 0 , R 0 B and R 0 C, respectively, without modifying the network throughput, such that ∠AR 0 B = ∠AR 0 C = ∠BR 0 C = 120˚. We obtain a balanced configuration shown in Fig 4(B), with shortest length, a contradiction. From the mechanical interpretation, the three unit force vectors acting at R can be zero only if they are 120˚apart. Therefore, every relay node of an optimal SIF topology has exactly three links that meet at 120˚. It follows that no relay node of an optimal SIF topology can have less than or more than three links. Consequently, from now on, we forbid relay nodes with one or two links, as well as relay nodes with four or more links. Thus, we assume that three links go to a relay node, as it is clear that no reduction in topology length is possible otherwise. This completes the proof.

Theorem 2 SIF consists of minimum superposition of ESMTs.
Proof. We prove the theorem by way of contradiction. In light of [25], a network graph can be partitioned into sub-trees through which the same information flows. Hence, the cost of SIF is the sum of the cost of these sub-trees. Assume that the number of the sub-trees is N s , and that SIF ¼ where T(N s ) is not the ESMT for that certain sub-tree. However, given a certain tree, its corresponding ESMT exists, namely, ESMT(N s ), and which means that ESMTðiÞ þ TðN s Þ: This result contradicts the fact that SIF is the optimal solution. The ESMTs that make SIF are selected from all possible ESMTs of the given terminal nodes by linear programming (LP). Therefore, SIF should consist of minimum superposition of ESMTs. This completes the proof.

The main idea of the proposed exact algorithm
We design an algorithm to solve the problem of min-cost multicast SIF. The algorithm is exact in that its output always returns the optimal SIF solution, i.e., the optimal positions of the relay nodes (optimal topology of the network), and the flow rate assignments on the connection links for N (N ! 3) given terminal nodes in 2-D Euclidean space, but it terminates in exponential time. The algorithm selects i (i 2 [3, N]) nodes from N and then uses a Delaunay triangulation technique to generate at most (2i − 5) Delaunay triangles [14] from the selected nodes, whereafter the polygons whose edges vary from 3 to N are constructed by concatenating 1 to (i − 2) adjacent Delaunay triangles. Delaunay triangulation then helps to enumerate all the concatenations of one adjacent Delaunay triangles, all the concatenations of two adjacent Delaunay triangles, etc. All the possible candidate Steiner nodes are then computed inside each polygon of each Delaunay triangles concatenation by ESMT exact algorithm, after which they are considered by our exact algorithm as possible candidate relay nodes. Since the candidate relay nodes are computed inside each polygon, then enough candidate relay nodes, including optimal relay nodes can be obtained. Thus, a Delaunay triangulation technique helps to find practically good candidate relay nodes. Furthermore, as denoted by Smith et al. [14], a Delaunay triangulation technique has two properties that are useful to reduce the overall length of the tree. Firstly, since MST of N is contained in the DT(N), the number of edges in ESMT is the same as the number of edges in MST. Secondly, since each Delaunay triangle tends to be equilateral, we achieve the maximum possible reduction of length in using the ESMT, as compared with using the MST. Once all the possible candidate relay nodes are obtained, a min-cost multicast LP model is solved over the terminal nodes and all the candidate relay nodes, for computing the optimal min-cost multicast network topology, including the optimal relay nodes selected by LP from all candidate relay nodes and the flow rate assignments on the connection links.

Detailed description of the proposed exact algorithm
Our exact algorithm design is based on two key techniques: Delaunay triangulation and LP. Given N (N ! 3), Delaunay triangulation helps to obtain all the possible candidate relay nodes. LP is used to compute the minimum cost as well as the flow rates on the connection links. Our exact algorithm adopts the following LP model: The LP model (Eq (1)) is based on an undirected network G = (V, E), where V is the set of the given terminal nodes (N) and additional relay nodes (R), while E is the set of undirected links. Due to the bi-directed possibilities of transmission in space, we make links bi-directed and denote a set of directed links as A ¼ fũv;ṽujuv 2 Eg. In the LP objective function, the decision variable f ð ! uvÞ represents the combined effective flow rate on a link ! uv. The coefficient (i.e. weight) wð ! uvÞ represents the Euclidean distance j ! uvjð¼ j ! vuj ¼ juvjÞ. In the LP constraints, uvÞ represents the rate of information flow S ! T i on a link ! uv. For every network information flow S ! T i , there is a conceptual flow f i (uv) [26]. We call it conceptual because different conceptual flows share instead of competing for available bandwidth on the same link [26].
The final flow rate f ð ! uvÞ of a link uv equals to the maximum among all f i ð ! uvÞ and should be not less than the maximum conceptual rate, which will directly affect the total cost. V " (u) and V # (u) denote upstream and downstream adjacent set of u in V, respectively. r is a multicast rate from the source S to each sink T i . We assume that there is a conceptual link from each sink T i back to the source S with the rate r, for concise representation of flow conservation constraints [26]. For every pair of nodes, we have both f i ð ! uvÞ and f i ð ! vuÞ to indicate the flows in two directions.
While solving this problem of SIF, we need to consider both variables (positions of the relay nodes and flow rate assignment on all links). Any feasible flow rate assignment satisfying all the constraints in Eq (1) can be achieved with linear network coding in a single multicast session [1].
Our exact algorithm is shown in Algorithm 1, adopting Delaunay triangulation and LP (Eq (1)) techniques for computing the optimal min-cost multicast SIF. First, it initializes an empty set of the total candidate relay nodes R total and set the value of MINCOST to infinity at line 1. Then a for loop in lines 2-16 relies on two nested for loops to compute the subset of the candidate relay nodes R(i). The loop first initializes an empty subset of the candidate relay nodes R(i) at line 3 and then generates all the subsets of the given N multicast terminal nodes by choosing i (i 2 [3, N]) nodes from them at line 4. The first nested for loop in lines 5-14 computes the subset of the candidate relay nodes R(j). Each subset obtained from line 4 includes ð N i Þ cases in total. Then for each case j, with j 2 ½1; ð N i Þ, the loop first initializes an empty subset of the candidate relay nodes R(j) at line 6. Next, it constructs all the Delaunay triangles for each case j by using Delaunay triangulation at line 7. The second nested for loop in lines 8-12 computes the subset of the candidate relay nodes R(x). This loop first initializes an empty subset of the candidate relay nodes R(x) at line 9. Then the loop constructs the polygons whose edges vary from 3 to i by concatenating x (x 2 [1, (i − 2)]) adjacent Delaunay triangles at line 10. Next, the loop computes the candidate relay nodes R(x) inside each polygon by using ESMT exact algorithm at line 11. R(j) is then computed at line 13 and it includes all the candidate relay nodes obtained from all the polygons of each case j. R(i) is computed at line 15 and it includes all the candidate relay nodes obtained from all ð N i Þ cases of each subset. After running all the loops, line 17 computes R total that includes all the candidate relay nodes obtained from all subsets computed at line 4. Then the algorithm constructs a complete graph that has (N + |R total |) nodes in total at line 18. At this step, the problem of network coding in a space is transformed into network coding in a graph. Hence, it can be solved by LP. Line 19 applies LP to solve the problem of min-cost multicast network coding over the complete graph and outputs the optimal SIF topology and MINCOST.
Algorithm 1 An Exact SIF Algorithm RðiÞ; 18: Construct a complete graph with (N + |R total |) nodes; 19: Solve the LP model to obtain the optimal solution for SIF (i.e., optimal topology of the network and the corresponding MINCOST) based on the complete graph.

Correctness of the proposed exact algorithm
According to corollary 1, it is clear that the optimal SIF topology is composed of a number of ESMTs of shortest possible lengths. Theorem 3 The optimal relay nodes in optimal multicast SIF network topology can be obtained by enumerating all ESMTs and apply LP on all ESMTs to get their minimum superposition.
Proof. Consider an optimal multicast SIF network topology that is composed of N terminal nodes and the optimal relay nodes R opt . The optimal relay nodes include forwarding and encoding relay nodes. Let h be the multicast network throughput. According to Theorem 2, SIF can be composed of ESMTs. Hence, we assume that the relay nodes are located at two possible positions in the optimal multicast SIF topology: (1) the relay nodes are located inside ESMT and (2) the relay nodes are located at the boundary of ESMT (for instance in case a relay node is the root of ESMT). When a relay node is located inside ESMT, its position is determined by the positions of its adjacent terminal nodes that make the ESMT. When a relay node is located at the boundary of ESMT, we can prove by contradiction that its position is also determined by the positions of its adjacent terminal nodes. Suppose that the position of a relay node R is not determined by the positions of its adjacent terminal nodes in a certain ESMT. We can replace R with a new relay node R 0 that is determined by its adjacent terminal nodes, assuming that the addition of R 0 will decrease the network cost, without modifying the network throughput. This contradicts the fact that a relay node R is not determined by its adjacent terminal nodes. Thus, we can enumerate all ESMTs of N to obtain all the possible candidate relay nodes. Then, a multicast LP model (Eq (1) for instance) can be solved over the terminal nodes and all the posible candidate relay nodes in order to obtain the optimal relay nodes. The number of relay nodes required for an optimal multicast SIF network topology is upper-bounded by (N − 2) when h = 1 and (2N − 3) (2N − 2) when h = 2 [22]. This completes the proof.  [6]. Therefore, at most (i − 2), i 2 [3, N] candidate Steiner nodes can be computed inside each polygon of each Delaunay triangle concatenation by ESMT exact algorithm, where i designates the number of the terminal nodes in each polygon of each Delaunay triangle concatenation. For example, at most (3 − 2 = 1) candidate Steiner node can be computed inside each polygon of 1 adjacent Delaunay triangle concatenation, at most (4 − 2 = 2) candidate Steiner nodes can be computed inside each polygon of 2 adjacent Delaunay triangles concatenations, . . .(N − 2) candidate Steiner nodes can be computed inside a polygon of (N − 2) adjacent Delaunay triangles concatenations. Now, an ESMT can be constructed in each polygon of each Delaunay triangles concatenation. Thus, we can enumerate all ESMTs by using a Delaunay triangulation technique. According to Theorem 2, SIF can be composed of ESMTs. Therefore, all the Steiner nodes from all ESMTs enumeration are considered as possible candidate relay nodes. This completes the proof.
According to Theorem 1, each pair of links connected to a relay node in an optimal SIF topology meet at angle of 120˚. Our exact algorithm can obtain all the possible candidate relay nodes, including optimal relay nodes by a Delaunay triangulation technique as follows:  i Þ cases in total. Then, at most (2i − 5) Delaunay triangles are generated by Delaunay triangulation for each case of each subset. Next, the algorithm concatenates at most (i − 2) adjacent Delaunay triangles and then uses ESMT exact algorithm to obtain all possible candidate Steiner nodes, denoted as DT(Steiner) set, inside all Delaunay triangles concatenations and considers them as possible candidate relay nodes, i.e., the Steiner nodes correspond to the candidate relay nodes. Once all the possible candidate relay nodes are obtained, the algorithm solves a min-cost multicast LP model (Eq (1)) over the terminal nodes and all possible candidate relay nodes to choose the optimal relay nodes and to construct the optimal multicast SIF topology. It then computes the optimal minimal cost of the optimal multicast topology. Thus, the output of our exact algorithm is a min-cost multicast SIF topology of shortest length. An optimal min-cost multicast SIF topology obtained consists of ESMTs. Then the algorithm can enumerate all ESMTs to get all the optimal Steiner nodes, denoted as ESMT(Steiner). Hence, ESMT(Steiner) set DT(Steiner) set.

Complexity analysis of the proposed exact algorithm
The complexity of steps 2 and 4 is Oð P N i¼3 ð N i ÞÞ ¼ Oð2 N Þ and it is exponential. The complexity of step 5 is Oð N 3 Þ ¼ OðN 3 Þ and it is polynomial. The complexity of Delaunay triangulation in step 7 is O(N log N) [14,27] and it is polynomial. Our exact algorithm considers the Steiner nodes computed inside each polygon of each Delaunay triangle concatenation by ESMT exact algorithm as possible candidate relay nodes. Given a set of N (N ! 3) terminal nodes, the algorithm always chooses i (i 2 [3, N]) nodes from them. Therefore, we can obtain ð N 3 Þ Delaunay triangles in total. Then we can concatenate at most (i − 2) adjacent Delaunay triangles by Delaunay triangulation. Each Delaunay triangle has at most 1 candidate relay node. Thus, the number of candidate relay nodes in all Delaunay triangles is ð N 3 Þ. Each quadrilateral obtained by concatenating 2 adjacent Delaunay triangles has at most 2 candidate relay nodes and we can obtain ð N 4 Þ quadrilaterals in total. Hence, the number of candidate relay nodes in all quadrilaterals is 2 Á ð N 4 Þ. We can use the same method to compute the number of candidate relay nodes for all the remaining Delaunay triangles concatenations. The last Delaunay triangles concatenation consists of concatenating N − 2 adjacent Delaunay triangles to get ð N N Þ ¼ 1 polygon of N edges that has N − 2 candidate relay nodes. Thus, the number of candidate relay nodes in N − 2 adjacent Delaunay triangles concatenations is N − 2. Therefore, the total number of candidate relay nodes |R| in all Delaunay triangles concatenations is Although each term of |R| is polynomial, the number of terms of |R| altogether increases exponentially. Thus, |R| is exponential.
The number of decision variables in the objective function of our LP model is ð jVj 2 Þ, where |V| is the set of given terminal nodes N and additional candidate relay nodes |R|, i.e., whose order is exponential because |R| is exponential. Then, the number of decision variables in the objective function of our LP model is computed and the result is þ . . . þ ðN À 2Þ À Á , whose order is exponential, since |V| is exponential. The number of constraints in and it is polynomial. The number of constraints in , whose order is O (N) and it is polynomial. The number of constraints in f i ð ! uvÞ f ð ! uvÞ, 8i; 8 ! uv is also (N − 1), whose order is O(N) and it is polynomial. Thus, the total number of constraints in our LP model is N 2 + N − 2, whose order is O(N 2 ) and it is polynomial. Therefore, the computational complexity of our LP model is exponential, because |V| = N + |R| is exponential. Notice that the complexity of our LP model could have been polynomial if |R| was polynomial. The complexity of our LP model depends on whether the complexity of its objective function and the complexity of all the constraints is polynomial or not. If the complexity of the objective function and the complexity of all the constraints of our LP model is polynomial, then the complexity of our LP model will be polynomial, else, the complexity of our LP model will be exponential. Since the complexity of steps 2 and 4 is exponential and the complexity of our LP model is also exponential, then we conclude that the total computational complexity of Algorithm 1 is exponential. Although the computational complexity of our exact algorithm is exponential, our exact algorithm achieves the optimal SIF solution.
Theorem 5 Pentagram is the optimal SIF when the number of terminal nodes is 6. Proof. We prove the theorem by using our exact algorithm. For Pentagram, i = 3, N = 6 and there are ð 6 3 Þ ¼ 20 subsets in total. Next, the algorithm generates 5 Delaunay triangles from all subsets, as shown in Fig 5 and concatenates (N − 2) = 4 adjacent Delaunay triangles. Next, the algorithm uses ESMT exact algorithm to compute all possible candidate Steiner nodes inside all Delaunay triangles concatenations, as shown in Fig 6, where 25 candidate Steiner nodes are obtained in total. The algorithm considers them as possible candidate relay nodes and then uses LP to choose 5 optimal Steiner nodes among 25 candidate Steiner nodes. The algorithm then constructs 5 ESMTs made by 6 terminal nodes and 5 optimal Steiner nodes. The optimal Steiner nodes are then considered as optimal relay nodes and the algorithm uses LP to construct the optimal SIF topology shown in Fig 7, with a min-cost of 4.56/bit. The obtained min-cost is smaller than that obtained by using ESMT exact algorithm (min-cost = 4.64/bit), i.e., 4.56/bit < 4.64/bit and the cost advantage = 1.01. Therefore, we can conclude that Pentagram is the optimal SIF when the number of terminal nodes is 6. This completes the proof.

The main idea of the proposed heuristic algorithm
The computational complexity of the exact algorithm is exponential, which makes it difficult to get the optimal SIF solutions when the number of given terminal nodes is very large. Therefore, we present a heuristic algorithm with a polynomial computational complexity that leads to a faster sub-optimal SIF solution.

Detailed description of the proposed heuristic algorithm
The aim of our heuristic algorithm is to compute the network topology, including the number of relay nodes and the way they are connected with the terminal nodes, as well as the flow rates on the connection links. The proposed heuristic algorithm uses two key techniques: Delaunay triangulation and LP. The former helps to generate at most (2N − 5) Delaunay triangles from N (N ! 3) given terminal nodes [14]. It then helps to obtain all the possible candidate relay nodes from all generated Delaunay triangles and quadrilaterals that are obtained by concatenating 2 adjacent Delaunay triangles. The latter computes the minimum cost as well as the flow rates. Our heuristic algorithm is shown in Algorithm 2, adopting Delaunay triangulation and LP (Eq (1)) techniques for computing min-cost multicast SIF. First, it initializes an empty set of the total candidate relay nodes R total at line 1. Then it constructs all the possible Delaunay triangles for the given N multicast terminal nodes by using Delaunay triangulation at line 2. Next, it initializes an empty subset of the candidate relay nodes R(m), R 0 (m) and set the value of MINCOST to infinity at line 3. Then a for loop in lines 4-9 computes R(m), R 0 (m) and MINCOST. The loop first constructs the polygons whose edges vary from 3 to 4 by concatenating m (m 2 [1, 2]) adjacent Delaunay triangles at line 5. Then the loop computes the candidate relay nodes R(m) inside each polygon by using ESMT exact algorithm at line 6. Next, the loop constructs a complete graph for each value of m at line 7. The complete graph has ðN þ P m m¼1 jRðmÞjÞ nodes in total. At this step, the problem of network coding in a space is transformed into network coding in a graph. Thus, it can be solved by LP. Line 8 applies LP to solve the problem of min-cost multicast network coding over the complete graph to get MINCOST and the corresponding relay nodes R 0 (m). After running the loop, the algorithm computes R total at line 10 and it includes all the relay nodes R 0 (m) obtained for all the values of m. Then the algorithm constructs the second complete graph that has (N + R total ) nodes at line 11. At line 12, the algorithm applies LP to solve the problem of min-cost multicast network coding over the second complete graph and outputs the cost, as well as the SIF topology.

Algorithm 2 A Heuristic SIF Algorithm
Require: Input: N (N ! 3) terminal nodes, a multicast session Ensure: Output: a SIF solution 1: Initialize the total set of candidate relay nodes R total = ;; 2: Obtain all the Delaunay triangles by Delaunay triangulation; 3: Initialize the subset of candidate relay nodes R(m) = ;, R 0 (m) = ;, MINCOST = +1; 4: for m = 1 to 2, do 5: Concatenate m adjacent Delaunay triangles; 6: Obtain the candidate relay nodes R(m); 7: Construct a complete graph with ðN þ P m m¼1 jRðmÞjÞ nodes; 8: Solve the LP model to obtain R 0 (m) and MINCOST based on the complete graph; 9: end for R 0 ðmÞ; 11: Construct the second complete graph with N terminal nodes and |R total | candidate relay nodes; 12: Solve the LP model based on the second complete graph and output cost and the corresponding network topology, including the resulting relay nodes.

Complexity analysis of the proposed heuristic algorithm
The method [28] to get the Steiner nodes from a triangle or a quadrilateral is illustrated in Figs 8 and 9, respectively. Triangle: As shown in Fig 8, assume ∠BAC is the biggest angle in 4ABC. If ∠BAC ! 120˚, then the Steiner node S is the vertex A. If ∠BAC < 120˚, construct two equilateral triangles based on two edges of 4ABC, e.g., 4ABD, 4ACE, then the Steiner node S is the intersection of the lines CD and BE. Therefore, the computational complexity is polynomial.
Quadrilateral: According to [28], the method to get the Steiner nodes in a convex quadrilateral ABCD includes three steps, as shown in Fig 9. First construct two equilateral triangles 4ABE and 4CDF. Next, construct two circles that pass at the vertices of the two equilateral triangles 4ABE and 4CDF. Last, draw the line EF and the two steiner nodes S 1 and S 2 are the intersection of the line EF with two circles, as shown in Fig 9. Thus, the computational complexity is also polynomial.
The computational complexity of Delaunay triangulation is O (N log N) [14,27] and it is polynomial. The computational complexity of generating candidate relay nodes from Delaunay triangles and quadrilaterals constructed by concatenating 2 adjacent Delaunay triangles is polynomial. Our heuristic algorithm considers the Steiner nodes computed inside each Delaunay triangle and quadrilateral by ESMT exact algorithm as possible candidate relay nodes. Given a set N (N ! 3) of terminal nodes, we can get at most (2N − 5) Delaunay triangles by Delaunay triangulation [14]. Each Delaunay triangle has at most 1 candidate relay node. Thus, the total number of candidate relay nodes in all Delaunay triangles is (2N − 5). Our heuristic algorithm constructs all the possible quadrilaterals by concatenating all 2 adjacent Delaunay triangles. It follows that we can get ð N 4 Þ quadrilaterals in total. Each quadrilateral has at most 2 candidate relay nodes. Therefore, the total number of candidate relay nodes in all quadrilaterals is 2 Á ð N 4 Þ. Therefore, the total number of candidate relay nodes |R| in all Delaunay triangles and quadrilaterals is whose order is O (N 4 ). Thus, |R| is polynomial.
The number of decision variables in the objective function of our LP model is ð jVj 2 Þ, where |V| is the set of given terminal nodes N and additional candidate relay nodes |R|, i.e., , whose order is O (N) and it is polynomial. The number of constraints in f i ð ! uvÞ f ð ! uvÞ, 8i; 8 ! uv is also (N − 1), whose order is O(N) and it is polynomial. Thus, the total number of constraints in our LP model is N 2 + N − 2, whose order is O(N 2 ) and it is polynomial. Therefore, the computational complexity of our LP model is polynomial. Then, we conclude that the computational complexity of Algorithm 2 is polynomial.

Simulation results
We have simulated our heuristic SIF algorithm in a 2-D Euclidean space. Our simulations used MATLAB to solve LPs. The positions of the terminal nodes are fixed while the positions of the relay nodes are not fixed. The optimal ESMT is computed by GeoSteiner 3.1 that implements an ESMT exact algorithm [11]. The MST is computed by implementing Prim's algorithm [7] in MATLAB. Given N (N ! 3) terminal nodes, we can concatenate at most (N − 2) adjacent Delaunay triangles by Delaunay triangulation. However, our heuristic algorithm considers only the concatenations of one and two adjacent Delaunay triangles for the sake of polynomial computational complexity. The following parameters are used for evaluating the performance: The cost: This parameter reflects the total length of information transmission required for achieving a 1 bit end-to-end multicast throughput. Minimizing the cost is the primary objective of the proposed heuristic SIF algorithm. to ESMT for these six Cases. This shows that our new heuristic SIF algorithm can achieve the optimal ESMT faster. After concatenating 2 adjacent Delaunay triangles, the optimal ESMT is achieved and SIF degenerates to ESMT for all the 15 Cases, except for the following three special Cases: Case 9, Case 14 and Case 15, where the difference between SIF cost for 2 DT and ESMT cost is very small (0.008 for Case 9, 0.001 for Case 14 and 0.004 for Case 15). Fig 11 shows the relative error percentage of the cost of concatenation of m adjacent Delaunay triangles over that of SIF, i.e. costðmDTÞÀ SIF SIF Á 100%. As shown in Fig 11, when m increases, the relative error percentage becomes negligibly small, such that after concatenating 2 adjacent Delaunay triangles (m = 2), the relative errors of all the 15 cases become zeros, except for the three special cases that we mentioned above: Case 9 (0.43%), Case 14 (0.05%) and Case 15 (0.26%). After accomplishing the concatenation of 3 adjacent Delaunay triangles, the relative errors of all the 15 cases become zeros, except for Case 14. The relative error of Case 14 becomes zero after concatenating 4 adjacent Delaunay triangles. Table 1 shows the cost advantage of all the 15 cases. After concatenating 2 adjacent Delaunay triangles, the cost advantage = 1 for almost all the cases.
The results of Case 9 are not optimal after concatenating 1 and 2 adjacent Delaunay triangles. The optimal results for this Case are obtained after concatenating 3 adjacent Delaunay triangles and after that, the min-cost does not change (See Fig 12) and the cost advantage = 1.   From Fig 13(C), which shows the optimal SIF network topology of Case 9, there are four relay nodes that are directly connected to each other, among which one relay node is close to the terminal node. Figs 14 and 15 show the ESMT and MST topologies for Case 9, respectively.
In addition, the results of Case 14 and Case 15 are also not optimal after concatenating 2 adjacent Delaunay triangles. Their optimal results are obtained after concatenating 4 and 3 adjacent Delaunay triangles, respectively and the cost advantage = 1. After that, the min-cost for Case 14 and Case 15 do not change (See Fig 12). Fig 16 shows Fig 16(D)), while Case 15 doesn't have this kind of relay node (See Fig 19(C)). The gap between the cost of concatenation of 2 adjacent Delaunay triangles and that of SIF will be bigger if there are more relay nodes which are directly connected to each other. The proposed heuristic SIF algorithm only considers the concatenation of one and two adjacent Delaunay triangles for the sake of polynomial complexity. For the Cases discussed above, the concatenation of more than two adjacent Delaunay triangles are considered to achieve the optimal ESMT, and we observed that the complexity increases.
Notice that the optimal topologies for Case 9, Case 14 and Case 15 have one thing in common: There are more relay nodes that are directly connected to each other (See Figs 13(C), 16(D) and 19(C)).

The Pentagram network
We applied our heuristic algorithm to the Pentagram network, where N = 6. After concatenating 1 adjacent Delaunay triangle, the optimal SIF is achieved (SIF cost for 1 DT = optimal SIF cost = 4.56/bit), and is less than that of ESMT (cost = 4.64/bit) and MST (cost = 5/bit). The relative error percentage is 0% and the cost advantage = 1.01. After applying our heuristic algorithm, the optimal topologies with regard to SIF and ESMT are the same as the ones we discussed about in Section 1 (See Fig 2(B) and 2(C)). Fig 22 shows the MST topology. Hence, the proposed heuristic algorithm achieves the optimal solutions for the Pentagram network. In addition, contrary to optimal ESMT, where the number of necessary Steiner nodes is upperbounded by N − 2, the number of necessary relay nodes in SIF can be greater than N − 2. The reason is that the SIF scheme is not always a tree, while the optimal ESMT is always a tree [8].

Butterfly network
We tested our heuristic algorithm with the Butterfly network in space, where N = 7. Delaunay triangle, where SIF cost (cost = 4.594/bit) is less than ESMT cost (cost = 4.64/bit) and MST cost (cost = 4.93/bit), while the relative error percentage is 0.09%. Fig 23(B) shows the topology of the Butterfly network after concatenating 2 adjacent Delaunay triangles (cost = 4.590/bit) and the relative error percentage is 0%. Fig 24 shows the MST topology and Fig 25 shows the ESMT topology. The cost advantage = 1.009 for 1 DT and the cost advantage = 1.01 for 2 DT. Therefore, SIF is strictly better than ESMT for Butterfly network in Euclidean space that allows to add additional relay nodes. The network cost is defined as ∑ uv w(uv)f(uv).

Random networks
We applied the heuristic algorithm to random networks, which are generated by the Waxman model [30]. Throughout our simulations, we observed that in most of the cases for such random networks, the relative error percentage is 0%, while the cost advantage = 1. Furthermore, SIF outperforms MST for all the tested cases. Fig 26 illustrates the SIF result after concatenating 1 and 2 adjacent Delaunay triangles for one example of such cases when N = 9. Fig 26(A) shows the SIF result after concatenating 1 adjacent Delaunay triangle (cost = 1.77/bit). The relative error percentage for 1 DT is 0.58% and the cost advantage = 0.99. Fig 26(B) shows the optimal SIF result after concatenating 2 adjacent Delaunay triangles (cost = 1.76/bit), the relative error percentage is 0% and the cost advantage = 1. SIF cost for 2 DT achieves the optimal ESMT cost.

Conclusion
In this work, we combined Delaunay triangulation and linear programming techniques to come up with the first exact SIF algorithm and a heuristic SIF algorithm to solve the min-cost multicast problem. Our exact algorithm has an exponential computational complexity and achieves the optimal SIF solution. Our heuristic algorithm has a polynomial computational complexity and the simulation results show that it is effective, as it achieves the sub-optimal SIF solution. The multicast case of Space Information Flow paradigm remains much more interesting, where key open problems such as exact algorithms with polynomial-time complexity and more properties related to SIF are yet to be investigated. We intend to consider nodes mobility in SIF for our future works.