LEO satellite assisted UAV distribution using combinatorial bandit with fairness and budget constraints

In this paper, an integration between a low earth orbit satellite (LEO-Sat) and unmanned aerial vehicles (UAVs) is proposed to assist users in post-disaster areas. In this scenario, multiple UAVs are distributed to fully cover the victims and provide rescue services, while the LEO-Sat provides backhaul links from the UAVs to the ground base station (GBS). In this regard, we consider the problem of efficient UAV distribution to maximize the total sum rate of the victims while assuring fairness in their coverage within the limited resources of UAV batteries and LEO-Sat bandwidth. The UAV distribution problem is formulated as a combinatorial multi-armed bandit (MAB) with arms' fairness and limited UAV battery budget (CMAB-FB) constraints. Additionally, the utilization of LEO-Sat bandwidth resources is optimized based on the average traffic demands of the LEO-UAV links by means of a gradient descent algorithm. The results of numerical analysis indicate that the proposed approach outperforms other naïve benchmarks.


I. Introduction
In recent years, unmanned aerial vehicles (UAVs) have been increasingly utilized in post-disaster relief efforts, due to their ability to reach remote and hard-to-access areas quickly [1,2]. In post-disaster scenarios, UAVs can provide wireless platforms for user equipments (UEs) belonging to victims and rescue workers. However, effective deployment of UAVs in such scenarios is a complex task that requires balancing multiple conflicting objectives, including coverage, limited UAV battery life, and communication capabilities [1,2]. The limited transmission range of UAVs may hinder them from communicating directly with the nearest ground base station (GBS) or with each other, impacting their deployment and data gathering capabilities in post-disaster areas. On the other hand, low earth orbit satellites (LEO-Sats) have revolutionized wireless communications, offering a myriad of applications that have transformed the way we connect and communicate [3]. LEO-Sats, positioned at an altitude ranging from 500 to 2,000 kilometers above the earth's surface, provide several advantages in wireless communications [3]. Firstly, their proximity to the earth enables lower latency, reducing signal delays and enhancing real-time communication. Additionally, LEO-Sats offer high bandwidth [...]
[...] this network, the problem of UAV distribution among users in this post-disaster area is considered. The distribution of UAVs aims to maximize the total achievable data rate of the victims, while maintaining fairness in their coverage. Also, the UAVs' limited battery budget should be considered along with the limited bandwidth resources of the LEO-Sat. To the best of our knowledge, this is the first research effort to propose LEO-Sat assisted UAV distribution in post-disaster scenarios.
• To address the UAV distribution problem within the UAVs' energy constraints, it is formulated as a MAB game, and the "CMAB-FB" algorithm is proposed to solve it. The proposed "CMAB-FB" algorithm distributes the UAVs among the post-disaster zones to maximize the users' achievable data rate while maintaining coverage fairness among them within the limited battery budget of the UAVs. Then, based on the achieved UAV rates, the bandwidth resources of the LEO-UAV links are optimized subject to the limited bandwidth resources of the LEO-Sat by means of a gradient descent algorithm. In this way, the aforementioned challenges can be addressed as follows:
1) UE coverage fairness can be assured based on the UE densities in the post-disaster grids, where these densities can be pre-estimated using GPS localization and refined through UAV exploration during the MAB game.
2) The budget-constrained nature of the "CMAB-FB" algorithm saves UAV energy resources.
3) The bandwidth resources of the LEO-Sat are optimized based on the selected "super-arm", i.e., grids, at each time slot.
• Numerical analyses are conducted to demonstrate the effectiveness of the proposed scheme under different scenarios. In this regard, the proposed scheme is compared against benchmarks such as random and nearest UAV distribution, and it demonstrates superior performance over these benchmark schemes.
The rest of this paper is organized as follows: Section II reviews the related work. Section III presents the proposed system model, including the optimization problem formulation. Section IV presents the proposed "CMAB-FB" and gradient descent algorithms. Section V presents the numerical simulations, followed by the concluding remarks in Section VI.

II. Literature review
Recently, extensive research has been undertaken to explore the utilization of UAVs for supporting post-disaster areas. The synergistic employment of UAVs with cellular networks and wireless sensor networks (WSNs) to facilitate disaster management applications was investigated in [8]. In [9], a genetic algorithm was employed to optimize the placement of UAVs, with the objective of enhancing both the overall coverage and data rate of the wireless network. In [10], an effective methodology to aid rescue operations in locating victims affected by a natural disaster was proposed. This approach involved the utilization of UAVs equipped with LiDAR and infrared depth cameras to construct an independent detection system that was not reliant on illumination intensity. Additionally, [11] involved the deployment of UAVs integrated with a video recorder and geolocation module, enabling the search for survivors within a post-disaster area. Furthermore, [12] explored the concept of flying communication services by equipping UAVs with Wi-Fi, video cameras, and web servers. The objective was to empower individuals affected by a disaster to utilize their smartphones for real-time text and video communication. Building upon this research, the authors of [13] proposed a mobility model based on the self-deployment of an aerial ad hoc network, utilizing the Jaccard dissimilarity metric to facilitate post-disaster scenarios. This mobility model incorporated the movement of victims and generated corresponding UAV mobility patterns to track these individuals. In the realm of disaster management systems, [14] presented a novel approach for energy-efficient task scheduling for the data collected by UAVs from ground IoT networks. As for [15], UAVs were leveraged as on-demand airborne relays to establish connectivity between remote users and the GBS, particularly when they were geographically isolated by substantial obstacles.
In [16], millimeter wave (mmWave) UAV gateway selection was proposed for post-disaster scenarios. In [17], enhanced dynamic spectrum access empowered by MAB schemes was proposed for UAV networks in post-disaster scenarios. This work was then extended in [18] to optimize the UAV 3D trajectory as well. Almost all existing work on UAV applications in post-disaster scenarios assumed that UAVs fly back and forth between the post-disaster zones and the ground fusion center in the nearest surviving GBS. This consumes considerable UAV energy and delays the rescue service operations. Moreover, all the above research works assumed full awareness of the network parameters, which cannot be easily obtained given the completely destroyed infrastructure in post-disaster scenarios. In this paper, we leverage the LEO-Sat as a backhaul for the UAV network, which greatly relaxes the need for UAVs to fly between the GBS and the post-disaster zones. Moreover, the proposed scheme is based on online learning, which does not need any prior information about the environment except users' pre-locations obtained using GPS services.
Regarding the integration between LEO-Sat and UAVs for enhancing or extending aerial coverage, the authors of [19] used a combination of LEO-Sat and UAVs for beyond-5G communication, utilizing millimeter-wave (mmWave) and free-space optical (FSO) links. A multiagent deep reinforcement learning (MARL) approach was used to optimize communication and energy efficiencies, leading to improved peak and worst-case throughput compared to using only one of the links. In [20], the authors proposed the use of UAVs and LEO-Sat for data collection from internet-of-remote-things (IoRT) sensors. In [21], the authors proposed using LEO-Sat and caching by UAVs for content delivery in terrestrial networks to improve connectivity and capacity. The problem of optimizing cache placement, resource allocation, and trajectory was solved using an alternating algorithm. In [22], LEO-Sats were used for UAV tracking using a Gauss-Hermite filter based on hybrid TDOA/FDOA geolocation measurements. In [23], an integration between LEO-Sat and UAVs was considered for an integrated mobile edge caching IoT system. In this model, the LEO-Sat broadcasts data, and UAVs collect it from decentralized ground sensors. UAV deployment and power allocation for secure space-air-ground communications was considered in [24]. The aim of the formulated optimization problem was to maximize the secrecy rate subject to the UAV's power and deployment area. Despite the existing research on LEO-UAV integration, none of these works considered the problem of LEO-Sat assisted UAV distribution in post-disaster areas as presented in this work.

III. System model and optimization problem formulation
The proposed system model for LEO-UAV integration to cover the post-disaster area is shown in Fig 1. The area is divided into a set $\mathcal{M}$ of $M$ non-overlapping grids. Each grid $i \in \mathcal{M}$ contains $K_i$ UEs. Dividing the post-disaster area into non-overlapping zones follows from the nature of the victims' distribution: victims typically form sparse groups, either because of the destruction in the area or because they are gathered in rescue shelters. This assumption was also considered in [25,26] when using UAVs for rescue services in post-disaster scenarios. In the proposed system model, the LEO-Sat relays both control and traffic data between the GBS and the UAVs. In this study, it is assumed that a set $\mathcal{N}$ of $N$ UAVs ($N = 5$ in Fig 1), indexed by $j \in \mathcal{N}$ with $N < M$, is always within the coverage area of the LEO-Sat. At each time slot $t$, the GBS allocates the $N$ UAVs to a selected group of the post-disaster grids, and it relays the information of the selected grids to the UAVs via the LEO-Sat. The criterion by which the GBS selects the grids is to maximize the achievable data rates of their users, while assuring fairness in the grid coverage based on the users' densities. Then, the UAVs fly towards the selected grids and hover above them to relay their users' traffic data to the GBS via the LEO-Sat backhaul links. This operation is repeated over the time horizon, constrained by the UAVs' battery capacities and the LEO-Sat bandwidth resources. In the following, the utilized UAV-UE and LEO-UAV link models are explained in detail, and then the optimization problem of UAV distribution is formulated.

A. UAV-UE link model
For the UAV-UE link, we use the simple link model given in [27]. In this model, $L_{j,k_i}(x_{j,k_i})$, $L^{\mathrm{LoS}}_{j,k_i}(x_{j,k_i})$ and $L^{\mathrm{NLoS}}_{j,k_i}(x_{j,k_i})$ are the total path loss, its line-of-sight (LoS) component, and its non-LoS (NLoS) component in dB between UAV $j$ and UE $k$ in grid $i$ as a function of their separation distance $x_{j,k_i}$. Herein, $\lambda_U$ is the wavelength of the UAV-UE link, and $\rho_{\mathrm{LoS}}$ and $\rho_{\mathrm{NLoS}}$ indicate the system loss in dB for LoS and NLoS, respectively. $P_{\mathrm{LoS}}$ and $P_{\mathrm{NLoS}}$, where $P_{\mathrm{NLoS}} = 1 - P_{\mathrm{LoS}}$, are the probabilities of LoS and NLoS links. $P_{\mathrm{LoS}}$ is defined as in [27], where $a$ and $b$ are constants based on the environment, while $\theta_{j,k_i}$ indicates the elevation angle between UAV $j$ and UE $k$ in grid $i$. Herein, $\theta_{j,k_i}$ is determined by $h_j$, the hovering height of UAV $j$, and $d^{H}_{j,k_i}$, the horizontal distance between UAV $j$ and UE $k$ in grid $i$. Without loss of generality, uplink transmission is assumed between the UEs in grid $i$ and UAV $j$. Thus, the average data rate between UAV $j$ and grid $i$ at time slot $t$, $C^t_{j,i}$, is given in (4). In (4), frequency division multiple access (FDMA) is considered among the UEs in grid $i$, where the total bandwidth $B$ is equally divided among the $K_i$ UEs in grid $i$. $P^{\mathrm{Rx},t}_{j,k_i}$ is the received power at UAV $j$ from UE $k$ in grid $i$ at time slot $t$, and $\sigma^2$ is the noise power of the UAV-UE link. This model assumes uplink transmissions, but the same principle can be applied to the downlink. Moreover, the grids are assumed to be sparse, as considered in [25,26], which prevents mutual interference among grids. This is a reasonable assumption in post-disaster areas, as previously explained.
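The UAV-UE link quantities described above can be illustrated with a minimal Python sketch. The functional forms are assumptions, since the equations of [27] are not reproduced here: a sigmoid LoS probability in the elevation angle, free-space loss plus the excess losses $\rho_{\mathrm{LoS}}$ and $\rho_{\mathrm{NLoS}}$, and a grid rate obtained by summing per-UE Shannon rates over equal FDMA sub-bands. All function names and the numeric values in the usage note are hypothetical.

```python
import math

def los_probability(elev_deg, a, b):
    # Assumed sigmoid form of the LoS probability in the elevation angle (degrees).
    return 1.0 / (1.0 + a * math.exp(-b * (elev_deg - a)))

def mean_path_loss_db(h, d_h, wavelength, rho_los, rho_nlos, a, b):
    # h: UAV hovering height, d_h: horizontal UAV-UE distance (both in meters).
    x = math.hypot(h, d_h)                          # separation distance
    elev_deg = math.degrees(math.atan2(h, d_h))     # elevation angle
    fspl = 20.0 * math.log10(4.0 * math.pi * x / wavelength)
    p_los = los_probability(elev_deg, a, b)
    # Assumed: total loss is the probability-weighted mix of LoS and NLoS losses.
    return p_los * (fspl + rho_los) + (1.0 - p_los) * (fspl + rho_nlos)

def grid_rate_bps(p_tx_dbm, losses_db, bandwidth_hz, noise_dbm):
    # FDMA: the bandwidth B is split equally among the K_i UEs of the grid.
    k = len(losses_db)
    rate = 0.0
    for loss in losses_db:
        snr = 10.0 ** ((p_tx_dbm - loss - noise_dbm) / 10.0)
        rate += (bandwidth_hz / k) * math.log2(1.0 + snr)
    return rate
```

For example, `grid_rate_bps(30.0, [130.0, 130.0], 40e6, -100.0)` describes two UEs at 0 dB SNR sharing 40 MHz, which yields 40 Mbps in total under these assumptions.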

B. LEO-UAV link model
For the LEO-UAV communication link, we use the link model given in [28], where the received power, $P^{\mathrm{Rx},t}_{S,j}$, and the achievable uplink data rate in bps at the LEO-Sat from UAV $j$, $\eta^t_{S,j}$, at time slot $t$ are given in (5) and (6), respectively. Herein, $P^{\mathrm{Tx}}_{j,S}$ is the Tx power from the UAV to the LEO-Sat, and $G^{\mathrm{Tx}}_j$ and $G^{\mathrm{Rx}}_S$ are the Tx and Rx antenna gains from the UAV (LEO-Sat) towards the LEO-Sat (UAV), respectively. $x^t_{S,j}$ is the separation distance between the LEO-Sat and the UAV, and $\lambda_S$ is the wavelength of the LEO-UAV link. In (6), $B^t_{S,j}$, $\tau$ and $\varepsilon$ are the allocated bandwidth of the LEO-Sat link at time slot $t$, the noise temperature, and the Boltzmann constant, respectively.
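As a rough illustration of the LEO-UAV link quantities, the sketch below assumes a Friis-type received power and a Shannon rate with thermal noise equal to the Boltzmann constant times the noise temperature times the bandwidth. These forms are assumptions consistent with the symbols defined above, not a reproduction of the model in [28]; the function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K

def leo_rx_power_w(p_tx_w, g_tx, g_rx, wavelength_m, distance_m):
    # Assumed Friis free-space relation with linear (not dB) antenna gains.
    return p_tx_w * g_tx * g_rx * (wavelength_m / (4.0 * math.pi * distance_m)) ** 2

def leo_rate_bps(p_rx_w, bandwidth_hz, noise_temp_k):
    # Assumed Shannon rate over the allocated bandwidth with thermal noise k*T*B.
    noise_w = BOLTZMANN * noise_temp_k * bandwidth_hz
    return bandwidth_hz * math.log2(1.0 + p_rx_w / noise_w)
```

Under these assumptions, doubling the UAV to LEO-Sat distance reduces the received power by a factor of four, and the rate is not linear in the allocated bandwidth because the noise power grows with it.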

C. Optimization problem formulation
At each time slot $t$, the GBS decides which grids from $\mathcal{M}$ the UAVs should cover, and then assigns UAVs to these selected grids. This information is communicated to the UAVs via the LEO-Sat. The objective is to assign UAVs to grids in a manner that maximizes the long-term achievable data rates of the UAVs and assures fairness among the grids based on their UE densities, while also considering the limited battery capacities of the UAVs and the limited bandwidth of the LEO-Sat. Mathematically, this optimization problem is formulated in (7), where $T$ indicates the total time horizon and $d^t_{j,i} \in \{0, 1\}$ is a selection indicator that is equal to one if grid $i$ is selected to be covered by UAV $j$ at time slot $t$, and zero otherwise. $\eta^t_{S,j}$ is the assigned capacity of the LEO-UAV link of UAV $j$ at time slot $t$. The 2nd constraint in (7) means that the total number of selected grids should be less than or equal to the total number of UAVs $N$, as the batteries of some UAVs may be completely depleted during the coverage process and need recharging. The 3rd constraint means that each UAV $j$ should cover only one grid $i$ at a time slot $t$. The 4th constraint means that the energy consumption of UAV $j$ required to serve grid $i$ at time slot $t$, $\Gamma^t_{j,i}$, should not exceed its total battery capacity $\Gamma_{U\max}$. $\Gamma^t_{j,i}$ is defined in (8) and considers both the flying and hovering powers ($P_f$ and $P_h$) and times ($T^{f,t}_{j,i}$ and $T^{h,t}_{j,i}$) of the UAV. $T^{f,t}_{j,i}$ is the distance between the UAV's current location and its target location in grid $i$ at time slot $t$ divided by the UAV speed, while $T^{h,t}_{j,i}$ is determined by the traffic demand of grid $i$ relative to $C^t_{j,i}$. There are in fact eight sources of UAV power consumption, as detailed in [29]. However, the flying and hovering power consumptions are the dominant ones, with flying consuming more energy than hovering [29].
Both $P_f$ and $P_h$ are related to the mass of the UAV, the gravitational force, the radius of the propeller, and the air density. In addition, $P_f$ depends on the deviation angle between the UAV vertical axis and the Z axis, as shown in [29]. For more details about the various sources of UAV power consumption and their mathematical expressions, interested readers are referred to [29]. The 5th constraint means that once a link is established between UAV $j$ and grid $i$ at time slot $t$, i.e., $d^t_{j,i} = 1$, it should be assigned a bandwidth resource from the LEO-Sat with a specific data rate $\eta^t_{S,j}$, where the sum of the data rates of all UAV-UE links corresponding to $d^t_{j,i} = 1$ must not exceed the total capacity of the LEO-Sat, i.e., $\eta_{S\max}$. The 6th constraint is used to ensure fairness among the grids based on their UE densities.
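The energy term of the 4th constraint can be sketched as follows, assuming it is the sum of flying power times flying time and hovering power times hovering time, with the hovering time taken as the grid's traffic demand divided by the achievable rate. This is one reading of the description above rather than a reproduction of (8); the function names and units are hypothetical.

```python
import math

def uav_energy_j(uav_xy, grid_xy, speed_mps, p_fly_w, p_hover_w, demand_bits, rate_bps):
    # Flying time: distance from the UAV's current position to the grid, over speed.
    t_fly = math.dist(uav_xy, grid_xy) / speed_mps
    # Hovering time (assumed): traffic demand of the grid divided by the UAV-grid rate.
    t_hover = demand_bits / rate_bps
    return p_fly_w * t_fly + p_hover_w * t_hover

def is_feasible(energy_j, battery_budget_j):
    # 4th constraint: the energy needed must not exceed the battery budget.
    return energy_j <= battery_budget_j
```

For instance, a UAV at (0, 0) serving a grid at (300, 400) m at 10 m/s flies for 50 s. With 200 W flying power, 150 W hovering power and a 10 s hover, the cost is 11,500 J under these assumptions.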

IV. Proposed CMAB-FB and gradient descent algorithms
The problem given in (7) is a dynamic non-linear programming problem without a closed-form optimal solution. However, it can be simplified by splitting it into two stages, because the LEO-Sat capacity resources $\eta^t_{S,j}$ are typically optimized based on the required UAV traffic rates $C^t_{j,i}$, which in turn depend on the optimized $d^t_{j,i}$ values. Thus, in the first stage, the UAV-grid association parameters $d^t_{j,i}$ are optimized, while in the second stage, the $\eta^t_{S,j}$ values are adjusted based on the optimized $d^t_{j,i}$ and their corresponding $C^t_{j,i}$. To optimize the values of $d^t_{j,i}$, the problem is treated as a combinatorial MAB problem with arms' fairness constrained by the UAV battery budget. In this section, we first explain the MAB model in general and then introduce the proposed "CMAB-FB" algorithm for adjusting the values of $d^t_{j,i}$. Finally, the values of $\eta^t_{S,j}$ are adjusted using a gradient descent algorithm.

A. MAB model
MAB is an efficient online learning methodology, where a player plays over bandit arms and observes their rewards [7]. The player aims to maximize the long-term profit by learning to always play the arm with the maximum achievable reward. The player has no prior knowledge about the game except the played arms and their corresponding rewards. At each time slot during the game, the player must trade off between exploiting the arm with the highest reward so far and exploring new, unknown ones. This is called the exploitation-exploration dilemma of MAB games [8]. Several algorithms can efficiently implement the MAB hypothesis, such as upper confidence bound (UCB), ε-greedy, and Thompson sampling (TS) [30]. In some MAB games, selecting an arm incurs a cost, which is constrained by the player's limited budget. This type of MAB game is called a budget-constrained game, where budget-constrained UCB and budget-constrained TS are two well-known MAB algorithm variants that can efficiently implement it [31]. Also, in some cases, the player should select a group of arms from the arm space, which is called a "super-arm"; this type of MAB game is called a combinatorial bandit because a combination of arms is selected at each time slot [32][33][34]. Sometimes fairness is required among the selected arms while selecting the super-arms, which is called a combinatorial bandit with fairness constraints [4].
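To make the exploitation-exploration trade-off concrete, the following is a minimal sketch of the classical UCB1 rule on Bernoulli arms. It is a generic textbook illustration, not the algorithm proposed in this paper; the function name and arm means are hypothetical.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    # Simulates UCB1 on Bernoulli arms; returns how often each arm was played.
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play every arm once
        else:
            # Exploit the empirical mean, explore via the confidence bonus.
            arm = max(
                range(n_arms),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts
```

The bonus term shrinks as an arm is played more often, so rarely played arms are periodically revisited while the empirically best arm is played most of the time.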

B. Optimization of $d^t_{j,i}$
In this stage, we optimize the values of $d^t_{j,i}$ subject to constraints 1, 2, 3, 4 and 6. This is a time-sequential combinatorial non-linear optimization problem, which can be viewed as a combinatorial MAB game with fairness considerations and UAV battery limitations. In this scenario, the player (i.e., the GBS) chooses a "super-arm" by combining several "arms" (i.e., grids) at each time slot $t$, balancing the selection among the available arms based on their UE densities and minimizing the energy costs of the UAVs covering the chosen grids. Algorithm 1 gives the proposed "CMAB-FB" algorithm, which is influenced by the learning-with-fairness algorithm given in [4]. This algorithm and the LEO-Sat resource optimization are run by the GBS, as it is the player of the MAB game and the most powerful entity in the proposed LEO-UAV network. The inputs to the algorithm are the sets $\mathcal{M}$ and $\mathcal{N}$ and the design parameter $O$. The UE densities of the grids, $\delta_i = K_i / \sum_{i=1}^{M} K_i$, are also input to the algorithm, where $K_i$ can be pre-estimated using GPS localization and then refined by UAV exploration during the MAB game. For initialization, at $t = 0$, the selection vector $w^t_i$, the number of times grid $i$ was selected up to time slot $t$, $h^t_i$, the average rate of grid $i$ up to time slot $t$, $\hat{g}^t_i$, and the queue of grid $i$, $q^t_i$, are all set to zero $\forall i \in \mathcal{M}$. The queue $q^t_i$ is used to assure fairness among the grids, as explained shortly. For $t = 1$ to $T$, the upper confidence bound (UCB) value $\bar{g}^t_i$ of each grid $i \in \mathcal{M}$ is set to $\hat{g}^{t-1}_i$ plus a square-root exploration term, as given in Algorithm 1. Then $q^t_i$ is evaluated $\forall i \in \mathcal{M}$ as given in the algorithm. At every time slot, the value of $q^t_i$ is increased by $\delta_i$ and decreased by 1 if grid $i$ was selected in time slot $t-1$.
Thus, if grid $i$ is not selected for many time slots, its $q^t_i$ grows by multiples of its UE density value, which gives it a high priority for being selected in the next time slot, and vice versa. After evaluating $q^t_i$ and $\bar{g}^t_i$ $\forall i \in \mathcal{M}$, a super-arm $A_t \subseteq \mathcal{M}$ is selected according to (9), where $O$ is a design parameter used to balance between selecting the grids that maximize the achievable average data rate and those that maximize fairness based on the $q^t_i$ values. $A_t$ can be easily evaluated by enumerating the $|A_t|$ grids having the highest values of $(q^t_i + O\,\bar{g}^t_i)$. After obtaining $A_t$, the UAVs should be distributed among its grids, obtaining $d^{*,t}_{j,i}$ in the way that minimizes their energy consumption, as given in Algorithm 1. As a final step, the average data rates corresponding to the selected $A_t$ are observed, and its related parameters are updated as given in Algorithm 1, including the actual observed $\delta_i$ values corresponding to $A_t$. As the "super-arm" selection in the proposed "CMAB-FB" algorithm is done in the same way as in [4], the time-average cumulative regret of the proposed algorithm is the same as in [4] and is given in (10). For the detailed derivation of (10), readers are referred to [4].

Algorithm 1. Proposed "CMAB-FB" algorithm (steps within each time slot)
• Select the super-arm $A_t$.
• Select $d^{*,t}_{j,i}$ that minimizes the UAVs' energy consumption:
1. Calculate the $\Gamma^t_{j,i}$ matrix, $\forall i \in A_t$ and $\forall j \in \mathcal{N}$, using (8).
2. Connect UAV $j$ to grid $i$ as follows: for itr = 1:N, a) $\{i^*, j^*\} = \arg\min \Gamma^t_{j,i}$.
• Observe the average rates of the selected super-arm $A_t$ and update its related parameters; update $\delta_i$ $\forall i \in A_t$.
End for
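The per-slot steps described above can be sketched in Python as follows. Several details are assumptions, because the corresponding expressions are not fully available in the text: the UCB bonus uses the classical UCB1 form, the fairness queue is clipped at zero, and the energy-aware assignment removes the chosen UAV and grid after each argmin step. All function names are hypothetical.

```python
import math

def ucb_index(mean_rate, pulls, t):
    # Assumed UCB1-style exploration bonus; unexplored grids get top priority.
    if pulls == 0:
        return float("inf")
    return mean_rate + math.sqrt(2.0 * math.log(t) / pulls)

def update_queues(queues, densities, selected_prev):
    # q_i grows by delta_i each slot and drops by 1 if grid i was just selected.
    # Clipping at zero is an assumption.
    return [
        max(q + d - (1.0 if i in selected_prev else 0.0), 0.0)
        for i, (q, d) in enumerate(zip(queues, densities))
    ]

def select_super_arm(queues, ucbs, weight, n_uavs):
    # Keep the n_uavs grids with the largest q_i + weight * UCB_i.
    scores = [q + weight * u for q, u in zip(queues, ucbs)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_uavs]

def greedy_assign(cost):
    # cost maps (uav, grid) -> energy. Repeatedly take the cheapest pair,
    # then drop that UAV and that grid so each is used at most once.
    pairs = {}
    remaining = dict(cost)
    while remaining:
        (j, i), _ = min(remaining.items(), key=lambda kv: kv[1])
        pairs[j] = i
        remaining = {
            (jj, ii): c for (jj, ii), c in remaining.items() if jj != j and ii != i
        }
    return pairs
```

The greedy pairing is a simple heuristic and does not guarantee the minimum total energy. An exact alternative would be an assignment solver, at a higher computational cost.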

C. Optimization of $\eta^t_{S,j}$
In the second stage, after obtaining $d^{*,t}_{j,i}$, the term $C^t_{j,i}$ becomes a constant in (7), and we can optimize the values of $\eta^t_{S,j}$ under constraint 5, which gives the problem in (11). Typically, the goal of optimizing the LEO-Sat resources $\eta^t_{S,j}$ is to accommodate the traffic demands of the UAVs, $C^t_{j,i}$, within the LEO-Sat maximum capacity constraint $\eta_{S\max}$. This allows us to simplify the optimization problem given in (11) to minimizing the absolute difference between the LEO-Sat offered rate $\eta^t_{S,j}$ and $C^t_{j,i}$. This problem is a non-linear programming problem due to the absolute value. However, it can be linearized by simplifying the absolute function and then solved by any well-known iterative method, such as the gradient descent method given in Algorithm 2. In this algorithm, we omit $t$ to simplify the notation. The inputs to the algorithm are $C_{j,i}$ obtained after the UAV distribution done by Algorithm 1, $\eta_{S\max}$, the learning rate $\alpha$, the maximum number of iterations MaxIter, the tolerance value Tolernc, and the number of UAVs $N$. The output is the optimized values $\eta^{*}_{S,j}$. For initialization, the values of $\eta_{S,j}$ are randomly selected. Then, for iter = 1 to MaxIter, the gradient vector, Gradient, of size 1×N is set to 0. Then, for $j = 1$ to $N$, the difference between $\eta_{S,j}$ and $C_{j,i}$ is calculated; if it is greater than or equal to 0, Gradient{j} is set to +1, otherwise it is set to -1. Then, the values of $\eta_{S,j}$ are updated by a gradient descent step with learning rate $\alpha$. After updating $\eta_{S,j}$, the conditions of the maximum capacity constraint, the gradient tolerance, and the maximum number of iterations are tested successively, and if any one of them is satisfied, the value of $\eta^{*}_{S,j}$ is returned and the algorithm terminates. After adjusting the values of $\eta^{*t}_{S,j}$, the values of $B^{*t}_{S,j}$ can be easily obtained by solving (6).
From Algorithms 1 and 2, we can conclude that the fairness, UAV energy conservation, and LEO-Sat resource optimization provided by the proposed scheme come at the expense of a slight increase in computational complexity compared to the naïve benchmarks.
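The second-stage rate allocation can be sketched as a sign-gradient descent on the absolute rate mismatch. This is a simplified reading of the Algorithm 2 description with several assumptions: a deterministic zero initialization instead of a random one, a stopping test on the largest remaining mismatch, and a proportional scale-down when the capacity limit is exceeded. The function name and default values are hypothetical.

```python
def allocate_leo_rates(demands, capacity, lr=0.01, max_iter=10_000, tol=0.02):
    # demands: UAV traffic rates C_j; capacity: LEO-Sat limit on the sum of rates.
    # tol should be at least lr, since sign steps oscillate within lr of the target.
    rates = [0.0] * len(demands)  # assumed deterministic start
    for _ in range(max_iter):
        # Sub-gradient of |rate - demand|: +1 if at or above the demand, else -1.
        grad = [1.0 if r - c >= 0.0 else -1.0 for r, c in zip(rates, demands)]
        rates = [max(r - lr * g, 0.0) for r, g in zip(rates, grad)]
        total = sum(rates)
        if total > capacity:
            # Capacity condition met: scale back onto the limit and stop (assumed).
            rates = [r * capacity / total for r in rates]
            break
        if max(abs(r - c) for r, c in zip(rates, demands)) <= tol:
            break  # every rate is within tolerance of its demand
    return rates
```

When the demands fit within the capacity, each rate settles within `tol` of its demand. Otherwise the loop stops at the capacity limit, and with identical demands the capacity is shared equally.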

V. Numerical analysis
In this section, the effectiveness of the proposed "CMAB-FB" algorithm is evaluated through comprehensive Monte Carlo simulations. A post-disaster area of 1 km² is considered, which is divided into 36 grids, each with a varying number of users. The altitude of the LEO-Sat is set to 550 kilometers, and the UAVs fly at a height of 100 meters. It is assumed that the LEO-Sat has full coverage of the post-disaster area. The Tx power of the LEO-Sat is set to 10 watts, while the Tx powers of the UAV and UE are set to 1 watt. The total bandwidth available to the LEO-Sat is 100 MHz, and the bandwidth assigned to the UAV is 40 MHz. Additional simulation parameters are listed in Table 1.
As no comparable schemes exist in the literature that address the same problem, the performance of the proposed "CMAB-FB" algorithm is compared with two naïve benchmarks: random ("Rand") selection and nearest ("Nearest") selection. In the random selection method, grids are randomly chosen by the GBS with the constraint that each UAV can only serve one grid at a time. The LEO-Sat bandwidth $B^{*t}_{S,j}$ is also randomly drawn from the range [1, total Sat bandwidth/N]. In the nearest selection method, the GBS chooses the closest grid for each UAV to serve, with the same constraint that each UAV can only serve one grid at a time. Moreover, the LEO-Sat bandwidth $B^t_{S,j}$ is equally divided among the UAVs. Fig 2 shows the average total system rate in Gbps against the number of UAVs. The total system rates of all compared schemes increase with the number of UAVs, as more grids are covered at a time. The proposed "CMAB-FB" scheme has the best performance among the compared schemes due to its objective of maximizing the sum of average UAV rates at each time step. Additionally, it is noteworthy that the "Nearest" scheme performs better than the "Rand" scheme, because in the nearest selection the LEO-Sat bandwidth is shared equally among the UAVs, whereas in the random selection it is randomly assigned to each LEO-UAV link. At N = 2, the proposed "CMAB-FB" scheme outperforms the "Nearest" and "Rand" schemes by 1.327 and 1.5 times, respectively. These values become 1.32 and 2 times at N = 14, respectively. Fig 3 illustrates the relationship between the number of UAVs and the average total energy consumption of the UAVs. According to this figure, the "Rand" method has the highest energy consumption due to its random grid selection. The "Nearest" method, on the other hand, consistently chooses the closest grids to the UAVs and thus has the lowest energy consumption.
The proposed "CMAB-FB" method, with its focus on minimizing UAV energy consumption, performs similarly to the "Nearest" method. It is worth mentioning that all the methods have comparable energy efficiency due to the constraint that only one UAV can cover one grid at a time. This means that a UAV may have to choose its second-nearest or an even farther grid if its closest grid is already being covered by another UAV. Despite this constraint, the proposed "CMAB-FB" method still has performance comparable to the "Nearest" method. At N = 14, the "CMAB-FB" and "Nearest" schemes show better energy efficiency than the "Rand" method by 5% and 7%, respectively. Fig 4 displays the fairness index $\chi$ in grid selection. Its definition involves a term that indicates the number of times grid $i$ is selected relative to the total number of grid selections. A value of $\chi$ close to 0 indicates that grids are selected based on their UE densities, as determined by the value of $\delta_i$. In Fig 4, the proposed "CMAB-FB" scheme has the lowest values of $\chi$, which demonstrates its ability to distribute UAVs over the grids fairly based on their UE densities. The "Nearest" scheme has the worst fairness performance, as it always selects the nearest grids, while the "Rand" scheme has better fairness performance than the "Nearest" scheme, as it selects grids uniformly. This is the reason why $\chi$ remains constant in the "Rand" scheme, regardless of the number of UAVs tested. With the "Nearest" scheme, however, $\chi$ tends to decrease as the number of UAVs increases, because the UAVs become better distributed among their nearest available grids. Also, as the number of UAVs grows, the $\chi$ values of all schemes tend to converge due to the reduced number of grid groups available for selection at each iteration. At N = 2, the proposed "CMAB-FB" scheme has lower $\chi$ values than the "Nearest" and "Rand" schemes by 70 and 24 times, respectively.
These values become 4.2 and 2.14 at N = 14, respectively.
The utilization ratio of the LEO-Sat bandwidth in the compared schemes is displayed in Fig 5. This figure demonstrates that the proposed "CMAB-FB" scheme optimizes the LEO-Sat bandwidth utilization. With a low number of UAVs, only a small portion of the LEO-Sat bandwidth is used relative to their traffic needs, but with a high number of UAVs, a larger portion of the LEO-Sat bandwidth is utilized. The "Nearest" scheme has a fixed utilization ratio of one, as the LEO-Sat bandwidth is equally divided among the UAVs regardless of their number or traffic needs. The "Rand" scheme has a fixed utilization of 0.5, as the bandwidth is uniformly distributed in the range [1, total Sat bandwidth/N]. As the number of UAVs grows large, the proposed "CMAB-FB" scheme approaches a utilization ratio of one, as the UAVs' traffic needs require the full utilization of the LEO-Sat bandwidth.
The computational complexity of the proposed "CMAB-FB" scheme is composed of three components. The first comes from minimizing the energy consumption, with a computational complexity of $O(N^2)$; the second comes from the sorting operation, with a computational complexity of $O(N)$; and the third comes from the gradient descent algorithm used to optimize the LEO-Sat bandwidth resources, with a computational complexity of $O(N)$. Therefore, the computational complexity of the proposed algorithm mainly depends on the square of the number of UAVs. By comparison, the computational complexity of the random selection is of order $O(N)$, as it generates $N$ random numbers in the range [1, M]. Likewise, the computational complexity of the nearest selection is of order $O(N)$, as each UAV selects its nearest grid.

VI. Conclusion
In conclusion, this study has focused on the distribution of UAVs in a post-disaster area with the help of a LEO-Sat. The goal was to achieve a balance between maximizing the total sum rate of the UAVs and ensuring fairness in the coverage of post-disaster grids based on their UE densities, subject to the limited energy and bandwidth resources of the UAVs and the LEO-Sat. To address this challenge, we have modeled it as a combinatorial MAB with fairness and budget constraints, and we have proposed the "CMAB-FB" algorithm to solve it efficiently. The study has also proposed a way to optimize the LEO-Sat bandwidth based on the UAVs' traffic needs. The results of the numerical analysis have demonstrated that the proposed "CMAB-FB" scheme outperforms the naïve benchmark approaches.