Network switching strategy for energy conservation in heterogeneous networks

In heterogeneous networks (HetNets), the large-scale deployment of small base stations (BSs) together with traditional macro BSs is an economical and efficient solution that is employed to address the exponential growth in mobile data traffic. In dense HetNets, network switching, i.e., handovers, plays a critical role in connecting a mobile terminal (MT) to the best of all accessible networks. In the existing literature, a handover decision is made using various handover metrics such as the signal-to-noise ratio, data rate, and movement speed. However, there are few studies on handovers that focus on energy efficiency in HetNets. In this paper, we propose a handover strategy that helps to minimize energy consumption at BSs in HetNets without compromising the quality of service (QoS) of each MT. The proposed handover strategy aims to capture the effect of the stochastic behavior of handover parameters and the expected energy consumption due to handover execution when making a handover decision. To identify the validity of the proposed handover strategy, we formulate a handover problem as a constrained Markov decision process (CMDP), by which the effects of the stochastic behaviors of handover parameters and consequential handover energy consumption can be accurately reflected when making a handover decision. In the CMDP, the aim is to minimize the energy consumption to service an MT over the lifetime of its connection, and the constraint is to guarantee the QoS requirements of the MT given in terms of the transmission delay and call-dropping probability. We find an optimal policy for the CMDP using a combination of the Lagrangian method and value iteration. Simulation results verify the validity of the proposed handover strategy.


Introduction
Recently, there has been exponential growth in mobile traffic globally [1]. To cope with the rapid increase in traffic, various issues and solutions have been discussed in the field of wireless communication. Heterogeneous networks (HetNets), which involve the co-deployment of multiple small base stations (BSs) such as pico and femto BSs together with traditional macro BSs, appear to be one of the economical and efficient solutions being considered [2]. The utilization of densely deployed small BSs not only offers a rich dimension for realizing increases in system capacity, but also fills coverage holes inside the initial deployment of [...] using a constrained MDP (CMDP). Using this, the effects of stochastic handover parameters (i.e., the link qualities and the MT's mobility) and the consequential handover energy consumption may be captured when calculating the expected total energy consumption, which is the handover decision metric in this work. Moreover, in our CMDP formulation, we considered both real-time and non-real-time calls by introducing different levels of sensitivity to delay. The CMDP cannot be solved using traditional dynamic programming alone, such as policy iteration or value iteration. Thus, we solve the CMDP by using a combination of the Lagrangian method and value iteration.

System model

Network model
We consider HetNets in which multiple small BSs are located within the coverage of a macro BS, as illustrated in Fig 1. BSs belonging to different network types have different transmission powers and cell coverage, whereas BSs that belong to the same network type have the same transmission power and cell coverage. The frequency reuse factor between BSs is assumed to be unity, which implies that all BSs operate within the same spectrum, regardless of the network type. In this work, we model the two-dimensional (2D) location of MTs by using a continuous random variable with a uniform distribution, and we study the handover performance of a randomly selected MT. Let $\mathcal{N}$ be the set of BSs to which the MT can connect, such that $\mathcal{N} = \{1, 2, \ldots, |\mathcal{N}|\}$, where $|\mathcal{N}|$ is the cardinality of the set. Of the accessible BSs, the BS to which the MT is currently connected is referred to as the serving BS, and a BS that the MT considers for handover is referred to as a target BS. Handover decisions are made at time instances called handover decision epochs.

Link model
Let link n be the downlink wireless channel between the MT and BS $n \in \mathcal{N}$. Then, the signal-to-interference-plus-noise ratio (SINR) of the MT connected to BS n is defined as $\mathrm{SINR}_n(\mathbf{Q}) = P_n Q_n / \left( \sum_{j \in \mathcal{N}, j \neq n} P_j Q_j + \sigma^2 \right)$, where $P_n$ is the transmission power of BS n, $\mathbf{Q} = \langle Q_1, \ldots, Q_n, \ldots, Q_{|\mathcal{N}|} \rangle$ is the set of link qualities between the accessible BSs and the MT, and $\sigma^2$ is the power of the additive white Gaussian noise. In the SINR definition, we assume that the interference from BSs that the MT cannot access is negligible because it is much smaller than that from the main interferers (i.e., $\mathcal{N} \setminus \{n\}$). In this work, the channel quality of link n, which is denoted by $Q_n$, is assumed to be constant during the service time of a frame, and it is modeled as a finite-state Markov chain (FSMC) [12]. The FSMC partitions the link quality into a finite number M of non-overlapping intervals, i.e., $0 = \gamma_0 < \gamma_1 < \ldots < \gamma_{M-1} < \gamma_M = \infty$, and $Q_n \in \mathcal{Q}_n = \{\gamma_0, \gamma_1, \ldots, \gamma_{M-1}\}$ is set to $\gamma_m$ if the channel quality of link n is within the range $[\gamma_m, \gamma_{m+1})$. In the FSMC, because the link quality is modeled as a random variable with an exponential distribution, the probability density function of the channel quality of link n is $f_{Q_n}(\gamma) = \frac{1}{\nu_n} \exp\{-\gamma / \nu_n\}$, where $\nu_n$ is the mean channel quality of link n. Based on $f_{Q_n}(\gamma)$, the steady-state probability that $Q_n$ equals a pre-determined link quality $\gamma_m$ is given as $\int_{\gamma_m}^{\gamma_{m+1}} f_{Q_n}(\gamma)\, d\gamma = \exp\{-\gamma_m / \nu_n\} - \exp\{-\gamma_{m+1} / \nu_n\}$. Then, we denote the transition probability from the link quality $q_n$ at decision epoch t to the link quality $q_n'$ at the next decision epoch t + 1 as $P_{Q_n}[q_n' \mid q_n] = P[Q_n(t+1) = q_n' \mid Q_n(t) = q_n]$. It is expressed in terms of two quantities: $R_n(q_n)$, the data rate of link n when its link quality is $q_n$, and $D_m$, the expected number of times per second that the link quality crosses the given level $\gamma_m$ in the downward direction.
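The SINR definition and the FSMC steady-state probabilities above can be illustrated with a minimal Python sketch. The function names, the use of linear-scale (not dB) thresholds, and the numerical values are assumptions made for illustration only; they are not taken from the paper.

```python
import math


def sinr(n, powers, qualities, noise_power):
    """SINR of BS n: P_n Q_n / (sum_{j != n} P_j Q_j + sigma^2)."""
    signal = powers[n] * qualities[n]
    interference = sum(
        p * q for j, (p, q) in enumerate(zip(powers, qualities)) if j != n
    )
    return signal / (interference + noise_power)


def fsmc_steady_state(thresholds, mean_quality):
    """Steady-state probability of each FSMC state.

    thresholds holds gamma_0 < ... < gamma_{M-1}; gamma_M is taken as infinity.
    Each state m gets exp(-gamma_m / nu) - exp(-gamma_{m+1} / nu).
    """
    edges = list(thresholds) + [math.inf]
    return [
        math.exp(-lo / mean_quality) - math.exp(-hi / mean_quality)
        for lo, hi in zip(edges, edges[1:])
    ]


# Hypothetical two-BS example with linear-scale link qualities.
example_sinr = sinr(0, [10.0, 6.0], [1.0, 0.5], 1.0)
example_probs = fsmc_steady_state([0.0, 1.0, 2.0, 4.0], 2.0)
```

Because the intervals cover the whole support of the exponential distribution, the returned probabilities sum to one, which is a quick sanity check on any chosen set of thresholds.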
We define the data rate $R_n(q_n)$ later in this paper. Let V be the discrete random variable representing the MT's velocity, which we also define as a Markov chain later in this paper. Then, we [...]

Traffic & velocity model
An active MT can be represented by a traffic flow consisting of K + 1 frames. After finishing the transmission of the K + 1 frames, the active MT leaves the system until it becomes active again. The handover decision and the corresponding handover execution are performed before the start of each frame after the first frame, such that there are K handover decision epochs in total for each active MT. In addition, a traffic flow can be classified as non-real-time or real-time traffic. Compared to non-real-time traffic, real-time traffic is sensitive to delays because of the interactive services being provided. Thus, we set different transmission delay requirements to reflect the characteristics of non-real-time and real-time traffic. Because the time interval between two handover decision epochs is very short, there is little change in the MT's velocity between epochs. Thus, the MT's current velocity is likely to be correlated with its past velocity. To reflect this characteristic, we represent the MT's velocity V by adopting a discrete Gauss-Markov mobility model [13], which is widely used to model MT movement in a cellular network because it captures the essence of the correlation of the MT's velocity in time. In this model, we assume that the MT's velocity is correlated in time and that it can be modeled by a discrete Gauss-Markov random process. Let V(t) be the MT's velocity at handover decision epoch t. Based on the Gauss-Markov mobility model, the velocity can be determined by the recursive realization as follows:
$$V(t) = \alpha V(t-1) + (1 - \alpha)\mu + \sigma \sqrt{1 - \alpha^2}\, \phi(t-1), \quad (2)$$
where $\alpha \in [0, 1]$ is the memory level, μ and σ are the mean and standard deviation of V, and ϕ is an uncorrelated Gaussian process, which is independent of V, with zero mean and unit variance. We define $P_V[v' \mid v]$ to represent the transition probability from speed v at decision epoch t to speed v' at the next decision epoch t'.
By varying v and counting the number of different outcomes of v' from Eq (2), we can calculate the transition probability $P_V[v' \mid v]$ by performing simulations. In addition, we can also obtain the steady-state probability (or probability mass function) of the MT's velocity by using the balance equation [14].
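The counting procedure described above can be sketched in Python as follows. This is a minimal illustration, assuming the standard first-order Gauss-Markov recursion V(t) = αV(t−1) + (1−α)μ + σ√(1−α²)ϕ(t−1) and assuming that each simulated velocity is quantized to the nearest level; the quantization rule, the function names, and the sample count are assumptions, not details given in the paper.

```python
import math
import random


def estimate_velocity_transitions(levels, alpha, mu, sigma, samples=2000, seed=0):
    """Estimate P_V[v'|v] by simulating one Gauss-Markov step from each level."""
    rng = random.Random(seed)
    scale = sigma * math.sqrt(1.0 - alpha ** 2)
    matrix = []
    for v in levels:
        counts = [0] * len(levels)
        for _ in range(samples):
            v_next = alpha * v + (1.0 - alpha) * mu + scale * rng.gauss(0.0, 1.0)
            # Assumed quantization rule: map to the nearest velocity level.
            idx = min(range(len(levels)), key=lambda i: abs(levels[i] - v_next))
            counts[idx] += 1
        matrix.append([c / samples for c in counts])
    return matrix


def stationary_distribution(matrix, iterations=500):
    """Approximate the solution of the balance equations pi = pi P by iteration."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


# Velocity levels 6, 14, 22 km/h with memory level 0.5, mean 6, std 1.
P_V = estimate_velocity_transitions([6.0, 14.0, 22.0], 0.5, 6.0, 1.0)
pi_V = stationary_distribution(P_V)
```

Each row of the estimated matrix is an empirical distribution over the next velocity level, so it sums to one by construction.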

Problem formulation
We determined the validity of the proposed handover strategy through an analytic exploration of the potential for saving energy at BSs in HetNets. To do this, we formulated the handover decision problem as a CMDP. In the CMDP, the objective is to minimize the energy consumption generated at BSs during the service time of a traffic flow, and the constraint is to meet the QoS requirements of an MT given in terms of the transmission delay and call-dropping probability.
In general, a CMDP is defined as a tuple $(\mathcal{S}, \mathcal{A}, T, e, c, d)$, where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, T is the state-transition probability matrix, e is the function that reflects the energy consumption, c is the function that reflects the call-dropping probability, and d is the function that reflects the transmission delay.
State $s \in \mathcal{S}$ of an MT includes information on its serving BS, velocity, and link qualities to all available BSs, such that the state space is presented as
$$\mathcal{S} = \mathcal{N} \times \mathcal{V} \times \mathcal{Q}_1 \times \cdots \times \mathcal{Q}_{|\mathcal{N}|},$$
where × is the Cartesian product operator and $\mathcal{V}$ denotes the set of quantized velocities. Action space $\mathcal{A}$ includes the BSs that the MT can access at each decision epoch. Action a = n indicates that the MT connects to BS n to transmit a frame at the next decision epoch. If the target BS is different from the serving BS, a handover occurs; otherwise, there is no handover, which means that the MT remains connected to the current serving BS.
After taking action a(t) in state s(t) at decision epoch t, the transition probability to the new state s', denoted by $T(s' \mid s, a)$, is computed such that the action only affects the set of reachable states from the current state.
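One way to build such a state space and transition probability is sketched below in Python. The sketch assumes that the velocity and the link qualities evolve independently of each other and of the action, and that the action deterministically fixes the next serving BS; these assumptions, the tuple layout of a state, and all names are illustrative rather than taken from the paper.

```python
from itertools import product


def build_states(num_bs, velocities, quality_levels):
    """Enumerate states as tuples (serving_bs, velocity, q_1, ..., q_N)."""
    return [
        (i, v) + q
        for i in range(num_bs)
        for v in velocities
        for q in product(*quality_levels)
    ]


def transition_prob(s, a, s_next, p_vel, p_link):
    """T(s'|s,a): zero unless the next serving BS equals the chosen action."""
    i_next, v_next, q_next = s_next[0], s_next[1], s_next[2:]
    if i_next != a:
        return 0.0
    prob = p_vel[s[1]][v_next]  # velocity chain P_V[v'|v]
    for n, (q, qn) in enumerate(zip(s[2:], q_next)):
        prob *= p_link[n][q][qn]  # per-link FSMC P_{Q_n}[q_n'|q_n]
    return prob


# Hypothetical toy instance: 2 BSs, 2 velocity levels, 2 quality levels per link.
states = build_states(2, [6, 14], [[0, 1], [0, 1]])
p_vel = {6: {6: 0.9, 14: 0.1}, 14: {6: 0.2, 14: 0.8}}
chain = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}
p_link = [chain, chain]
row_sum = sum(transition_prob(states[0], 1, s2, p_vel, p_link) for s2 in states)
```

For any state and action, summing `transition_prob` over all next states should give one, which is a convenient check that the component chains are proper distributions.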
To measure the energy consumed by BSs to transmit a frame, we define $e_f(s, a)$, which quantifies the amount of energy required to transmit a frame when taking action a in state s. It is the sum of the transmission cost $e_{TX}(s, a)$ and the handover cost $e_{HO}(s, a)$ as follows:
$$e_f(s, a) = e_{TX}(s, a) + e_{HO}(s, a). \quad (5)$$
The transmission cost $e_{TX}(s, a)$ reflects not only the energy consumed during the transmission of a frame, but also the energy consumed by the electrical circuit, which is independent of the transmission power, as given by Eq (6), where $P_a$ is the transmission power from BS a to the MT, $P_{a,c}$ is the signal processing and electrical circuit power of BS a, and FL is the number of bits in a frame. In addition, $R_a(\mathbf{q})$ is the data rate achieved between BS a and the MT when the link qualities from the accessible BSs to the MT are given as the set $\mathbf{q} = \langle q_1, \ldots, q_a, \ldots, q_{|\mathcal{N}|} \rangle$, and it is expressed in Eq (7), where $U_n$ is the number of MTs connected to BS n, and BW is the available bandwidth. In Eq (7), BW/(m + 1) refers to the bandwidth per connection, which is based on the assumption that the available bandwidth at each BS is equally shared among all connected MTs. Note that, whereas a link quality is commonly modeled as a stochastic process (e.g., an FSMC), few studies model the variation over time in the number of MTs connected to each BS as a stochastic process; therefore, we treat $U_n$ as a random variable. Let us consider HetNets in which there is a large number of MTs, each of which has a small probability of being active.
Under such a condition, as the number of attached MTs per BS increases, its distribution is best approximated by the Poisson distribution [15]. Therefore, we consider $U_n$ to be Poisson distributed with parameter $\lambda_n$, where $\lambda_n$ is the MT density within the coverage area of BS n. Subsequently, for established link n, its unconditional achievable data rate $R_n(\mathbf{q})$ can be determined as in Eq (8). By using Eq (8), $e_{TX}(s, a)$ can be rewritten as in Eq (9).
For the serving and target BSs, the handover cost $e_{HO}(s, a)$ captures the energy consumption that is incurred by signaling exchanges and processing loads during the handover-execution phase. This value can vary according to the network types of the serving and target BSs [16]. In this work, it is assumed to be static because exact measurements over the various handover-execution phases are not available, as given in Eq (10), where $e^{i,a}_{HO}$ is the energy consumed when switching the MT from serving BS i to target BS a.
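The per-frame energy model can be sketched in Python as follows. The sketch assumes a Shannon-type rate BW/(m + 1) · log2(1 + SINR) averaged over a Poisson-distributed number m of other connected MTs, a transmission cost of the form (P_a + P_{a,c}) · FL / R, and a static handover cost charged only when the action differs from the serving BS. These functional forms and all names are assumptions chosen to be consistent with the definitions in the text; the paper's Eqs (6) to (10) may differ in detail.

```python
import math


def expected_rate(bandwidth_hz, sinr, lam, terms=60):
    """Average BW/(m+1) * log2(1 + SINR) over m ~ Poisson(lam), truncated."""
    spectral_eff = math.log2(1.0 + sinr)
    total = 0.0
    pmf = math.exp(-lam)  # P[U = 0]
    for m in range(terms):
        total += pmf * bandwidth_hz / (m + 1) * spectral_eff
        pmf *= lam / (m + 1)  # P[U = m + 1] from P[U = m]
    return total


def frame_energy(serving, action, tx_power, circuit_power, frame_bits, rate,
                 handover_cost):
    """e_f = e_TX + e_HO under the assumed forms described above."""
    e_tx = (tx_power + circuit_power) * frame_bits / rate
    e_ho = handover_cost if action != serving else 0.0
    return e_tx + e_ho


# Hypothetical values: 10 MHz bandwidth, SINR of 3 (linear), lambda = 1.
rate = expected_rate(10e6, 3.0, 1.0)
energy_stay = frame_energy(0, 0, 10.0, 2.0, 500000, rate, 5.0)
energy_switch = frame_energy(0, 1, 6.0, 2.0, 500000, rate, 5.0)
```

Under these assumptions the Poisson average has the closed form BW · log2(1 + SINR) · (1 − e^{−λ})/λ, which the truncated sum reproduces and which can replace the loop when λ > 0.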
Given policy π and the number of frames in a traffic flow K + 1, the expected total energy consumption when serving the MT over the lifetime of its connection from initial state s, denoted by $E^{\pi}(s)$, is expressed in Eq (11), where $\mathbb{E}^{\pi}_{s}[\cdot]$ denotes the expectation under policy π and initial state s, $\delta \in [0, 1)$ denotes the discount factor, which determines the importance of the future value at the current decision epoch, and $e_f(s(t), a(t))$ is the energy consumption at decision epoch t. In addition, $\mathbb{E}_K[\cdot]$ denotes the expectation with respect to the random variable K. Because different MTs may have connections with different numbers of frames to be transmitted, the number of handover decision epochs K is assumed to be a discrete random variable with a geometric distribution having mean 1/(1 − λ), such that $E^{\pi}(s)$ in Eq (11) can be rewritten as in Eq (12), where $B = \delta\lambda \in [0, 1)$ is the discount factor for the random number of decision epochs per traffic flow.
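The effect of a geometrically distributed horizon can be checked numerically. The Python sketch below assumes P[K = k] = (1 − λ)λ^{k−1} for k ≥ 1, which has mean 1/(1 − λ). Under that assumption P[K ≥ t] = λ^{t−1}, so the expected δ-discounted sum over a random horizon equals an infinite-horizon sum discounted by δλ. The cost sequence and function names are hypothetical.

```python
def expected_cost_random_horizon(costs, delta, lam):
    """E_K[ sum_{t=1}^{K} delta^(t-1) x_t ] with P[K = k] = (1 - lam) lam^(k-1).

    The horizon is truncated at len(costs); the neglected tail mass is lam^len(costs).
    """
    total = 0.0
    for k in range(1, len(costs) + 1):
        p_k = (1.0 - lam) * lam ** (k - 1)
        total += p_k * sum(delta ** (t - 1) * costs[t - 1] for t in range(1, k + 1))
    return total


def discounted_cost(costs, discount):
    """sum_{t=1}^{T} discount^(t-1) x_t for a fixed (long) horizon T."""
    return sum(discount ** t * x for t, x in enumerate(costs))


# Hypothetical per-epoch energy costs over a long horizon.
costs = [1.0 + (t % 3) for t in range(400)]
lhs = expected_cost_random_horizon(costs, 0.9, 0.9)
rhs = discounted_cost(costs, 0.9 * 0.9)
```

The two values agree up to the truncation error, which is of the order of λ^400 here.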
To complete the description of the CMDP, we require details regarding the remaining two functions that track the QoS of the MT. First, we define d(s, a), which captures the delay to transmit a frame, in Eq (13). Based on Eq (13), the expected total transmission delay over the lifetime of a connection, denoted by $D^{\pi}(s)$, is expressed in Eq (14). Note that real-time traffic is more sensitive to delay than non-real-time traffic, such that different delay constraints are needed to differentiate between them. For non-real-time traffic, the expected total transmission delay in Eq (14) should not exceed the total delay threshold $D^{th}_{total}$, as follows:
$$D^{\pi}(s) \le D^{th}_{total}. \quad (15)$$
For non-real-time traffic, it is acceptable to consider only the total transmission delay because the main concern when transmitting non-real-time traffic is the length of time required to transmit all of the data. For real-time traffic, we consider an additional constraint at each handover decision epoch. That is, the transmission delay to service the current frame, i.e., d(s(t), a(t)), should not exceed the frame delay threshold $D^{th}_{frame}$, as follows:
$$d(s(t), a(t)) \le D^{th}_{frame}. \quad (16)$$
A call may be dropped for various reasons, such as the MT's velocity, insufficient radio resources, and the presence of dead zones [17]. Of these reasons, we focus on the MT's velocity. When an MT's speed increases, the call-dropping probability during the handover execution increases. Thus, we define c(s, a) in order to capture the call-dropping probability when action a is taken in state s, as expressed in Eq (17), where $V_{min}$ and $V_{max}$ are the minimum and maximum velocity thresholds, respectively. Some MTs may be handed over to a target BS in order to achieve a better data rate even though there is a risk that the connection may be dropped during the handover execution. On the other hand, others may avoid switching to the target BS because of this risk.
Based on Eq (17), the expected average call-dropping probability over the lifetime of a connection can be determined as $C^{\pi}(s)$, which is the expected total call-dropping probability multiplied by a normalizing constant (1 − δ). From [16], we note that by using the normalizing constant, the expected total value converges to the expected average value when we use stationary policies. Because we consider only stationary policies, $C^{\pi}(s)$ is the expected average call-dropping probability, and its value must not exceed the call-dropping threshold $C^{th}_{total}$, i.e., $C^{\pi}(s) \le C^{th}_{total}$.
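To make the role of the velocity thresholds concrete, the following Python sketch shows one possible shape for c(s, a). It assumes that no handover-induced drop occurs when the MT stays with its serving BS, and that the drop probability rises linearly from 0 at V_min to 1 at V_max when a handover is executed. This piecewise-linear form is an assumption for illustration only; the exact expression of Eq (17) is not reproduced here.

```python
def call_drop_probability(serving, action, velocity, v_min, v_max):
    """Assumed piecewise-linear call-dropping probability for a handover."""
    if action == serving:
        return 0.0  # no handover, so no handover-induced drop
    if velocity <= v_min:
        return 0.0
    if velocity >= v_max:
        return 1.0
    return (velocity - v_min) / (v_max - v_min)


# Thresholds of 6 and 30 km/h with velocity levels 6, 14, and 22 km/h.
drop_by_velocity = [call_drop_probability(0, 1, v, 6.0, 30.0) for v in (6.0, 14.0, 22.0)]
```

With such a shape, a fast MT pays a higher expected call-dropping penalty for each handover, which is the mechanism that discourages handovers when the call-dropping constraint is tight.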

Optimal policy for CMDP
To obtain optimal policies for the constrained control Problems (21) and (22), we adopt the ideas of dynamic programming. Dynamic programming is a method, used in mathematics, computer science, economics, and other fields, that solves a complex problem by breaking it down into simpler subproblems. To obtain the optimal solution of an unconstrained control problem, dynamic programming algorithms such as value iteration, policy iteration, and modified policy iteration can be applied, whereby a minimization problem over all policies is transformed into a set of minimization problems over a much smaller set of actions.
To utilize dynamic programming techniques for the constrained problems, we used a standard Lagrangian approach, which transforms a constrained minimization problem into an inf-sup problem of the Lagrangian [18] as follows:
$$G(s) = \inf_{\pi} \sup_{\mu \ge 0} J^{\pi}_{\mu}(s), \quad (22)$$
where G(s) represents the value of the constrained problems, and $J^{\pi}_{\mu}(s)$ denotes the Lagrangian of the constrained problems. The Lagrangian is defined as the sum of the function to be minimized and the constraint functions weighted by constants called Lagrange multipliers, as follows:
$$J^{\pi}_{\mu}(s) = E^{\pi}(s) + \mu_1 \left( C^{\pi}(s) - C^{th}_{total} \right) + \mu_2 \left( D^{\pi}(s) - D^{th}_{total} \right), \quad (23)$$
where $\mu_1$ and $\mu_2$ are nonnegative Lagrange multipliers corresponding to the constraints. Note that the sup-inf problem is more familiar than the inf-sup problem, as it involves first minimizing with respect to the policies and then maximizing with respect to μ. Thus, we change the order of the inf and the sup in Eq (22) by invoking a saddle-point theorem, such that $G(s) = \sup_{\mu \ge 0} \inf_{\pi} J^{\pi}_{\mu}(s)$. In summary, solving the constrained optimization problem is equivalent to solving an unconstrained sup-inf problem. For a given μ, our first challenge is to identify the optimal policy $\pi^{*}$ from among all feasible policies Π such that the expected total energy consumption is minimized, as given by $G_{\mu}(s) = \inf_{\pi \in \Pi} J^{\pi}_{\mu}(s)$, where $G_{\mu}(s)$ denotes the minimum expected energy consumption under a given μ. To obtain $G_{\mu}(s)$, we apply dynamic programming for an unconstrained control problem, whose optimality equation follows from Eq (23). In order to obtain the solutions of the optimality equation, we used the value iteration algorithm, described in Algorithm 1, which is a method that can be used to find ε-optimal policies for a discounted MDP. The detailed convergence proof of the value iteration algorithm is described in [19]. After finding the stationary policy $\theta_{\mu}(s)$ from Algorithm 1, we updated the Lagrange multipliers used to obtain the value G(s) by solving the equation below:
$$G(s) = \sup_{\mu \ge 0} \left\{ E^{\theta_{\mu}}(s) + \mu_1 \left( C^{\theta_{\mu}}(s) - C^{th}_{total} \right) + \mu_2 \left( D^{\theta_{\mu}}(s) - D^{th}_{total} \right) \right\},$$

Algorithm 1 Value iteration algorithm
where $C^{\theta_{\mu}}(s)$ and $D^{\theta_{\mu}}(s)$ are computed recursively in Eqs (29) and (30), in which the expected successor values are $\sum_{s'} T(s' \mid s, \theta_{\mu}(s))\, C^{\theta_{\mu}}(s')$ and $\sum_{s'} T(s' \mid s, \theta_{\mu}(s))\, D^{\theta_{\mu}}(s')$, respectively. To update $\mu_1^{z+1}$ and $\mu_2^{z+1}$, we can apply a gradient-descent method, where z is the iteration number. In addition, $\tau_1$ and $\tau_2$ are constant step sizes that must be sufficiently small to ensure convergence to optimal solutions. Algorithm 2 describes the procedure employed to achieve an optimal solution of the CMDP.
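The combination of value iteration and multiplier updates can be sketched in Python as follows. This is a minimal illustration and not a reproduction of Algorithms 1 and 2: it assumes a generic finite model with nested lists T[s][a][s'], uses unnormalized discounted totals for both constraints, applies a projected step max(0, μ + τ · violation) with a single step size, and returns the cheapest feasible deterministic policy encountered. All of these choices and all names are assumptions.

```python
def value_iteration(T, cost, discount, eps=1e-9):
    """Minimize expected discounted cost; returns (values, greedy policy)."""
    n_s, n_a = len(T), len(T[0])
    values = [0.0] * n_s
    while True:
        q = [[cost[s][a] + discount * sum(T[s][a][s2] * values[s2] for s2 in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        new_values = [min(row) for row in q]
        gap = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if gap < eps:
            return values, [row.index(min(row)) for row in q]


def evaluate(T, cost, policy, discount, sweeps=500):
    """Expected discounted total of a per-step quantity under a fixed policy."""
    n_s = len(T)
    values = [0.0] * n_s
    for _ in range(sweeps):
        values = [cost[s][policy[s]] + discount * sum(
            T[s][policy[s]][s2] * values[s2] for s2 in range(n_s)) for s in range(n_s)]
    return values


def solve_cmdp(T, energy, drop, delay, c_th, d_th, discount, s0=0, step=0.5, iters=100):
    """Lagrangian relaxation: value iteration inside, multiplier updates outside."""
    n_s, n_a = len(T), len(T[0])
    mu1 = mu2 = 0.0
    best = None  # (energy, policy) of the cheapest feasible policy seen so far
    for _ in range(iters):
        # Constant terms -mu * threshold do not change the minimizing policy.
        lagrangian = [[energy[s][a] + mu1 * drop[s][a] + mu2 * delay[s][a]
                       for a in range(n_a)] for s in range(n_s)]
        _, policy = value_iteration(T, lagrangian, discount)
        e_val = evaluate(T, energy, policy, discount)[s0]
        c_val = evaluate(T, drop, policy, discount)[s0]
        d_val = evaluate(T, delay, policy, discount)[s0]
        if c_val <= c_th and d_val <= d_th and (best is None or e_val < best[0]):
            best = (e_val, policy)
        # Raise a multiplier while its constraint is violated, lower it otherwise.
        mu1 = max(0.0, mu1 + step * (c_val - c_th))
        mu2 = max(0.0, mu2 + step * (d_val - d_th))
    return best, (mu1, mu2)


# Hypothetical one-state, two-action example: action 0 is cheap but slow.
T = [[[1.0], [1.0]]]
energy, drop, delay = [[1.0, 2.0]], [[0.0, 0.0]], [[2.0, 1.0]]
best, multipliers = solve_cmdp(T, energy, drop, delay, 1.0, 3.0, 0.5)
```

In this toy example, the cheaper action violates the delay threshold, so the delay multiplier grows until the slower action becomes unattractive. With deterministic policies and a constant step size the multipliers can oscillate around the saddle point, which is why the sketch keeps track of the best feasible policy rather than returning the last iterate.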

Discussion on practical implementation of proposed handover strategy
When implementing the proposed handover strategy in a practical wireless system, a computational complexity issue may arise with respect to obtaining an optimal policy of the CMDP. To reduce the computational complexity, we can consider several alternatives. One is to use a scheduler at the macro BS to build, in advance, a solution table containing the optimal policies of the CMDP for all possible states of an MT, an approach that is adopted in many existing works [10] [20]. Another is to utilize a finite amount of information on the stochastic behavior of handover parameters for a handover decision. To do so, we should model the optimization problem as a finite-horizon CMDP. However, this may decrease the amount of energy savings realized compared to the opposite case (i.e., when an infinite amount of information on the stochastic behavior of link qualities is utilized for a handover decision).

Simulation results
To obtain the simulation results, we considered HetNets in which an MT is located within the overlapping area of two BSs belonging to different network types, and where the MT can access both BSs during the lifetime of a traffic flow. This setting follows the commonly adopted macro-femto framework, and it is easily extended to a more complex one. We set the nominal parameter values as follows: FL = 500000 symbols, BW = 10 MHz, $P_1$ = 10 W, $P_2$ = 6 W, $P_{1,c} = P_{2,c}$ = 2 W, $q_1 \in \{0, 6, 10, 14, \infty\}$ dB, $q_2 \in \{0, 6, 8, 12, \infty\}$ dB, $\nu_1$ = 6 dB, $\nu_2$ = 8 dB, $\lambda_1$ = 1, $\lambda_2$ = 0.6. In addition, the MT's velocity is quantized to 6, 14, and 22 km/h, and the minimum and maximum velocity thresholds are 6 and 30 km/h, respectively. For the Gauss-Markov model, the memory level is 0.5, and the mean and standard deviation of the MT's velocity are 6 and 1 km/h, respectively. To find the optimal solution of the CMDP, we use the Markov Decision Processes Toolbox in MATLAB. Because the toolbox is designed for MDPs, we modified its m-file to enable its use for the CMDP.
We compared the performance of the proposed vertical handover (VHO) strategy to that of the SINR-based VHO strategy [21], the rate-based VHO strategy [22], and the NO-VHO strategy [10]. In each decision epoch, the SINR-based VHO strategy uses the SINR values of the accessible BSs for a handover decision. In contrast, the rate-based VHO strategy uses the achievable data rates of the BSs. In the NO-VHO strategy, there is no VHO during the service time of a traffic flow, which implies that the MT is connected to only one BS during the service time of a traffic flow. The performance metrics considered are the expected total energy consumption and the total number of handovers. The expected total energy consumption has been defined in Eq (11), while the total number of handovers refers to the total handover count over the entire service time of a traffic flow for all possible initial states.
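The decision rules of the three baseline strategies, as described above, can be sketched in Python as follows. The function names and the tie-breaking rule (the lowest BS index wins) are assumptions for illustration; the cited works may define the rules with additional details such as hysteresis margins.

```python
def sinr_based_choice(sinrs):
    """Pick the accessible BS with the highest SINR."""
    return max(range(len(sinrs)), key=sinrs.__getitem__)


def rate_based_choice(rates):
    """Pick the accessible BS with the highest achievable data rate."""
    return max(range(len(rates)), key=rates.__getitem__)


def no_vho_choice(serving):
    """Stay with the current serving BS for the whole traffic flow."""
    return serving


def count_handovers(initial_bs, decisions):
    """Count the epochs at which the chosen BS differs from the serving BS."""
    handovers, serving = 0, initial_bs
    for target in decisions:
        if target != serving:
            handovers += 1
        serving = target
    return handovers


# Hypothetical per-epoch SINR values for two BSs over four decision epochs.
sinr_trace = [[2.0, 3.5], [4.0, 3.0], [4.2, 2.5], [1.0, 1.5]]
decisions = [sinr_based_choice(s) for s in sinr_trace]
total_handovers = count_handovers(0, decisions)
```

None of these rules depends on the handover cost, which is consistent with the observation that their handover counts stay constant as the handover cost changes.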
The case of non-real-time traffic: Figs 2 and 3 show the expected total energy consumption and the total number of handovers during the lifetime of a traffic flow with respect to the change in handover costs. The total delay threshold, the average call-dropping probability threshold, and the discount factor of the CMDP framework are set to 1.2 s, 0.08%, and 0.9, respectively. Fig 2 shows that the optimal policy from the proposed strategy offers the lowest expected total energy consumption among all strategies regardless of the handover cost. Fig 3 shows that in the case of the proposed strategy, as the handover cost increases, the total number of handovers decreases. From an energy-saving perspective, this implies that even when the achievable data rate from the current BS is lower than that from other candidate BSs, remaining with the current BS, provided that the QoS constraints are satisfied, may be a better strategy than performing a handover. This is because each handover incurs some energy consumption, which may be greater than the small energy savings realized from the handover. That is, the proposed strategy can accurately identify the diminishing potential for energy saving at increasing handover costs. On the other hand, the SINR-based and rate-based strategies select a BS depending only on the SINR and the achievable rate, respectively, and no handover takes place in the NO-VHO strategy. As a result, the total number of handovers for those strategies is constant regardless of the handover cost. Because the objective of our work is to minimize the energy consumption while preserving the QoS experienced by MTs, we conclude that the proposed strategy outperforms the other strategies from the perspective of energy efficiency.
Figs 4 and 5 illustrate the expected total energy consumption and the total number of handovers during the lifetime of a traffic flow with respect to the change in the required total delay threshold. In this case, the average call-dropping probability threshold is fixed at 0.08%. From Figs 4 and 5, we see that when the required total delay threshold increases, both the expected total energy consumption and the total number of handovers decrease regardless of the value of the discount factor. As previously mentioned, the main advantage of the proposed strategy is that it saves energy by ensuring that the MT is not handed over from the current BS to other BSs frequently, provided that the total delay constraint is preserved. However, as the total delay constraint becomes tighter (i.e., the required total delay threshold decreases), the number of cases in which a handover must inevitably occur in order to guarantee the total delay constraint increases, which results in additional energy consumption.
Figs 6 and 7 present the expected total energy consumption and the total number of handovers over the lifetime of a traffic flow with respect to changes in the required average call-dropping probability threshold. In this case, the total delay threshold is fixed at 1.2 s. From Figs 6 and 7, we also see that as the required average call-dropping probability threshold increases, the expected total energy consumption decreases, whereas the total number of handovers increases regardless of the value of the discount factor. To analyze these results, we focused on the case where an MT moves from the coverage area of BS 1 to that of BS 2. In this case, although the MT can obtain a higher data rate from BS 2 than from BS 1, which would favor a handover from the perspective of energy saving, the handover may not be performed. This is because of the risk of call dropping, which is caused by the high velocity of the MT, and which also results in additional energy consumption. This tendency becomes more frequent as the expected average call-dropping constraint becomes tighter.
The case of real-time traffic: Figs 8 and 9 illustrate the expected total energy consumption and the total number of handovers over the lifetime of a traffic flow with respect to the change in handover costs. We set the total delay threshold, frame delay threshold, average call-dropping probability threshold, and discount factor of the CMDP framework to 1.2 s, 0.14 s, 0.08%, and 0.9, respectively. For all strategies, we assumed that when an MT can access only one BS, it connects to that BS regardless of the frame delay threshold in order to prevent the disconnection of a call. Fig 9 shows that, similar to the case of non-real-time traffic, the proposed strategy outperforms other strategies in terms of energy saving. In addition, at high handover costs, the NO-VHO strategy may also be a good alternative to the rate-based and SINR-based strategies. From Fig 9, we also observe that in the proposed strategy, the total number of handovers does not change between the handover costs of 20 and 25, which implies that at the handover cost of 25, handovers are forced by the frame delay threshold, even though those handovers do not contribute to energy savings.
Fig 10 shows the expected total energy consumption over the lifetime of a traffic flow with respect to the change in handover costs and frame delay thresholds. As shown in the figure, as the frame delay threshold decreases, the expected total energy consumption increases. Note that for the same handover cost, the action set may differ according to the frame delay threshold. As the frame delay threshold decreases, the number of available actions decreases, which can also lead to inefficient network operation from the perspective of energy saving.
Figs 11 and 12 illustrate the convergence of the expected total energy consumption and the Lagrange multipliers, given that the handover cost, the total delay threshold, the average call-dropping probability threshold, and the discount factor of the CMDP framework are set to 5, 1.2 s, 0.08%, and 0.9, respectively.

Conclusion
To improve the energy efficiency in HetNets, we proposed a handover strategy. The objective of this study was to realize energy savings at BSs while serving a traffic flow, without compromising QoS requirements in terms of the transmission delay and call-dropping probability. The proposed strategy was based on a CMDP that captures not only the current system state, but also the stochastic behavior of handover parameters for a handover decision, and it can be applied to both non-real-time and real-time calls by differentiating between the delay constraints. Simulation results showed that the proposed handover strategy can reduce the energy consumption by at least 12% for non-real-time traffic and 3% for real-time traffic compared with the existing handover strategies. In addition, the performance of the proposed strategy varies with the QoS requirements of the traffic flow, according to the characteristics of each QoS factor.