Distributed uplink cache for improved energy and spectral efficiency in B5G small cell network

The advent of content-centric networks and Small Cell Networks (SCNs) has resulted in exponential growth of data for both uplink and downlink transmission. Data caching is considered a popular solution to the resulting challenges of network congestion and backhaul bottlenecks in B5G networks. Caching for uplink transmission in distributed B5G scenarios faces several challenges, such as duplicate matching of contents, a mobile station's unawareness of the cached contents, and the storage of large contents. This paper proposes a cache framework for uplink transmission in distributed B5G SCNs. Our proposed framework generates comprehensive lists of cache contents from all the Small Base Stations (SBSs) in the network to remove duplicate contents and assist uplink transmission. In addition, our framework performs content matching at the Mobile Station (MS) rather than at an SBS, which effectively improves energy and spectrum efficiency. Furthermore, large contents are segmented and their fractions are stored in the distributed cache to improve the cache hit ratio. Our analysis shows that the proposed framework outperforms existing schemes by improving the energy and spectrum efficiency of both access and core networks. Compared to the existing state of the art, our proposed framework improves the energy and spectrum efficiency of the access network by 41.28% and 15.58%, respectively. Furthermore, the cache hit ratio and throughput are improved by 9% and 40%, respectively.


Introduction
In recent years, content-based services such as video streaming have grown exponentially. In addition, the number of internet users was expected to reach 5.3 billion by 2023 [1]. This will tremendously increase traffic volume for both uplink and downlink in Beyond 5th Generation (B5G) networks, resulting in challenges such as traffic load, congestion, and latency, in addition to significant consumption of energy and spectrum [2][3][4]. The aforementioned research papers proposed novel uplink cache architectures and schemes to improve network performance. However, these papers have not quantified the cache benefits on Energy Efficiency (EE) and Spectral Efficiency (SE). In addition, the existing literature proposes to use an SBS for performing content matching [28,29], which means that the MSs remain unaware of the cached contents. This leads to unnecessary uploads of content, which is not desirable. Furthermore, the existing literature has identified that large content sizes significantly degrade the performance of a distributed cache. Moreover, the existing works in [28,29] have not considered content duplication and its effect on energy and spectral efficiency in a distributed scenario. The authors in [33] considered content matching at an MS to mitigate duplicate uploads; however, they did not consider a distributed scenario. Likewise, content segmentation and its effective placement in the context of a distributed cache are not considered in the existing literature. This motivates us to propose a novel cache-enabled uplink transmission framework that addresses all these challenges. The main objective of this work is to improve energy and spectral efficiency along with improving cache performance in a distributed scenario.
Firstly, our proposed framework generates an unduplicated list of cache contents in the distributed scenario, to be used as a map by an MS to decide whether or not to upload a content. Secondly, it performs content matching at the MS, and lastly, it segments large contents into smaller pieces for effective storage across the distributed network. The main contributions of this paper can be summarized as follows:
1. We propose a novel scheme to effectively generate a disparate list of the distributed cache for uplink transmission. The proposed scheme leverages cooperative communication to efficiently generate a list of contents based on popularity and content validity. This way the list is updated with the most relevant contents, which improves the cache hit ratio.
2. An efficient content matching scheme is proposed to enable an MS to check against the cached contents. This significantly limits the amount of uplink transmission, which improves the EE and SE.
3. We propose a content segmentation with distributed placement scheme, which improves content placement in the distributed cache network by splitting larger contents into smaller segments. The proposed scheme significantly improves the cache hit ratio, which intrinsically improves the EE and SE.
4. We compare our proposed framework with state-of-the-art architectures and algorithms to analyze its effectiveness. We mainly focus on energy consumption and spectrum efficiency, which are the main limiting factors of the existing literature.
The rest of this paper is organized as follows: related work is discussed in Section 2. The system model is described in Section 3. Section 4 presents our proposed framework. The experimental design with performance evaluation and the conclusion are discussed in Sections 5 and 6, respectively.

Related work
The use of caching for downlink transmission is widely researched, and its effectiveness, especially for energy and spectral efficiency, is broadly accepted; this work is summarized in Table 1, with work related to distributed caching in Table 2. However, cache-enabled uplink transmission has been researched to a lesser extent. The state of the art and our reference points are [28,29]. In [28], the authors presented a framework to relieve the burden on wireless SCNs by considering cache-enabled uplink transmission in a delay-tolerant network. The paper also proposed duplicate content matching at an SBS by comparing the hash keys of chunks of a file after the real content has been uploaded. The authors also used First-Input-First-Output (FIFO), random, and probabilistic content scheduling strategies for cache management. This enabled an SBS to eliminate redundancy among users' uploaded contents and improve network transmission efficiency. Similarly, in [29], the authors presented a cache-uplink framework for a HetNet with stochastically distributed BSs for temporary caching of user-generated contents. The authors also presented the relationship between cache storage space and outage probability. The authors in [33] presented a Broadcast Cache-Assisted Uplink (BCAU) scheme that matches the attributes of new content against cached contents at an MS, discarding the upload of content already available in the SBS cache before the actual transmission takes place, to improve the EE and SE of uplink transmission over B5G SCNs; however, they did not consider cooperation among distributed caches. Overall, these papers have not considered the EE and SE of a distributed cache. In addition, content matching at an SBS, as proposed in [28,29], entails unnecessary uploads from the MSs. Lastly, the challenges of large contents are not addressed in these papers.

Table 1. Summary of existing works about the impact of cache on energy, spectral and cache efficiencies of SCN.

Ref. | Year | Caching at | Contributions | Components | Modeling tools | Performance measurements
[27] | 2017 | SBS, MBS | Proposed a cooperative coded caching scheme that improves EE by reducing the Energy Consumption (EC) of content caching, delivery, and cooperative/backhaul transport; also proposed a greedy caching placement algorithm that improves content delivery efficiency and the Quality of Experience (QoE) for end users. | Het. SCN consisting of SBS, MBS, users | Maximum-Distance Separable (MDS), Zipf | EC
[34] | 2017 | SCBS | Investigated system throughput by analyzing the EE of cache-enabled cooperative Dense SCNs (D-SCNs), using an affinity propagation algorithm to divide the SCBSs into clusters and deriving the closed form of the EE of Coordinated Multi-Point Joint Transmission (CoMP-JT) cache-enabled cooperative D-SCNs to reduce the EC of transmission, circuit power, and caching. | | |

System model
This section gives a detailed description of the system model. We consider a cooperative distributed network with cache-enabled SBSs and an MBS along with MSs, as shown in Fig 1.

Network model
We consider a cellular network that consists of a cloud, a TDD MBS, M TDD SBSs, and N MSs, as shown in Fig 1. The MBS is the base station of the cellular network and is denoted by G. The MBS collects the relevant information from all the SBSs in addition to controlling them. The SBSs and MSs are spatially distributed according to two independent homogeneous Poisson Point Processes (hPPPs) Φ_B and Φ_U, with SBS and MS densities λ_B and λ_U, respectively. The set of SBSs is denoted by B = {B_j : j = 1, 2, ..., M}, serving a set of MSs U = {U_i : i = 1, 2, ..., N}. All SBSs are connected to the MBS and a cloud through a backhaul link, while each SBS is bidirectionally connected with its neighboring SBSs via the X2 interface (a wireless sidehaul link). All SBSs are in active mode and associated with at least one MS to serve. Each MS selects its local SBS based on its propagation distance. Therefore, the set of MSs served by an SBS B_j is denoted by U_{j,i} : 1 ≤ i ≤ n, 1 < j < M, n < N, where i represents an MS served by an SBS B_j. In this work, we consider the uplink transmission scenario.

Table 2. Summary of related works on distributed caching.

Ref. | Caching at | Contribution
[49] | Wireless net | Rolling-horizon collaborative cache optimization scheme.
[51] | MNC | A reinforcement learning (RL)-based online learning algorithm to search for the optimal caching policy.
[52] | MEC | A Group Behavior and Popularity Prediction based Collaborative Caching (GPCC) strategy based on User Detail Records (UDRs).
[48] | SBS | Cooperative caching at the edge in SBSs based on a Bayes-based learning algorithm.
[57] | HetNet | Spatially cooperative caching strategy for a two-tier HetNet consisting of edge servers and caching helpers.
[58] | SBS | A backhaul-based cooperative caching scheme that groups several SBSs.

Cache model
Each cache stores a set of popular contents denoted by F = {F_l : l = 1, 2, ..., w}, where w is the total number of cached contents. The SBS's cache C_{B_j} stores a set of popular contents, where l indexes the cached contents of B_j. The popularity of a content C_{B_j,f_l} is denoted by %_{C_{B_j},f_l} and is modeled by a Zipf distribution according to [38]:

%_{C_{B_j},f_l} = r_{C_{B_j},f_l}^{-δ} / Σ_{m=1}^{w} m^{-δ},    (1)

where r_{C_{B_j},f_l} represents the popularity rank of the content C_{B_j,f_l}, and δ is the skewness of the popularity distribution. Each cached content has attributes such as name, size, hash key, length, etc. The set of attributes of each cached content is denoted by P(C_{B_j,f_l,k}) = {P(C_{B_j,f_l,1}), P(C_{B_j,f_l,2}), ..., P(C_{B_j,f_l,κ})}, where l is the serial number of a content in a cache and κ is its maximum number of attributes. The attributes of each cached content are used for matching to determine duplicate contents and eliminate them (more details in Section 4). The total storage capacity W of all M + 1 caches in the network can be represented as

W = S(G_C) + Σ_{j=1}^{M} S(C_{B_j}),    (2)

where S(G_C) is the storage capacity of the MBS's cache and S(C_{B_j}) is the storage capacity of an SBS's cache.

Distributed caching model. We consider a cooperative distributed cache, where the caches located at the SBSs and the MBS appear to work as a single cache. The contents are stored in a distributed manner across the network. The rationale for content segmentation is based on [35,43], where the authors argued that larger contents significantly reduce cache effectiveness due to the scarcity of storage space. Therefore, we propose that small contents may be stored locally at an SBS, while large contents can be split into Q segments [27,38,56].

The set of segments of a content can be represented as Seg = {Seg_1, Seg_2, ..., Seg_Q}, where the size of each segment is determined based on the content's size S(C_{B_j,f_l}) and the free space of the local SBS and its neighbors. Each segment stored in a separate cache is encoded into packets {e^l_1, ..., e^l_Q} using an MDS code [59], where e^l_1 represents the first encoded packet of C_{B_j,f_l}, and so on. Furthermore, if a content is too large, it is cached at the MBS's cache as a whole. The placement of the segments in each cache is determined using the hash key H(C_{B_j,f_l}) of the whole content.
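As a rough illustration, the Zipf popularity model described above can be sketched as follows. The catalog size and skewness value below are illustrative assumptions, not parameters from the paper.

```python
# Sketch: Zipf-modeled content popularity used to rank cached contents.
# The skewness delta and the catalog size are illustrative assumptions.
def zipf_popularity(num_contents, delta):
    """Return the popularity of each rank r (1-based) under a Zipf law."""
    norm = sum(r ** -delta for r in range(1, num_contents + 1))
    return [(r ** -delta) / norm for r in range(1, num_contents + 1)]

# Popularity of a 5-content catalog with skewness 0.8 (assumed values).
pops = zipf_popularity(5, delta=0.8)
```

The popularities sum to one, and lower-ranked (more popular) contents receive a larger share, which is what drives the cache placement decisions.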

Cache availability and efficiency.
Cache availability and efficiency can be evaluated using the cache hit ratio and cache miss ratio. A cache hit occurs when the content is available in the cache; otherwise, it is a cache miss. According to [60,61], the cache hit and miss ratios of each SBS B_j are given as:

Hit^{C_{B_j}}_r = Hit^{C_{B_j}}_n / (Hit^{C_{B_j}}_n + Miss^{C_{B_j}}_n),    (3)

Miss^{C_{B_j}}_r = Miss^{C_{B_j}}_n / (Hit^{C_{B_j}}_n + Miss^{C_{B_j}}_n),    (4)

where Hit^{C_{B_j}}_n and Miss^{C_{B_j}}_n are the counters of cache hits and cache misses of SBS B_j, respectively.

In the scenario of a cache-enabled uplink, the availability of data in a cache is categorized as follows:
1. Hit^{C_{B_j}}_n = 1: the new content is available in the distributed cache; therefore, an MS sends a Message of Target Destination (MoTD) (more details on the MoTD in Section 4.2) rather than the actual content.
2. Hit^{C_{B_j}}_n = 0: the new content is unavailable in the distributed cache; therefore, the new content will be uploaded.
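The hit/miss bookkeeping above can be sketched as a small cache model. The class name, the use of hash keys as lookup identifiers, and the sample contents are illustrative assumptions.

```python
# Minimal sketch of per-SBS cache hit/miss accounting (assumed structure).
class SBSCache:
    def __init__(self, contents):
        self.contents = set(contents)  # hash keys of cached contents
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        if key in self.contents:
            self.hits += 1
            return True    # hit: the MS sends only an MoTD
        self.misses += 1
        return False       # miss: the MS uploads the full content

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

cache = SBSCache({"a", "b"})
results = [cache.lookup(k) for k in ["a", "c", "b", "d"]]
```

After the four lookups above, two hits and two misses yield a hit ratio of 0.5, matching the ratio definitions in (3) and (4).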

Communication model
The uplink capacity of an MS U_{j,i}, denoted by ℜ^{UL}_{U_{j,i}}, can be represented as [62,63]

ℜ^{UL}_{U_{j,i}} = B log2(1 + SINR_{U_{j,i}}),    (5)

where B represents the channel bandwidth and SINR_{U_{j,i}} is the signal-to-interference-plus-noise ratio (SINR) of the signal received from MS U_{j,i} at its serving SBS B_j, which can be represented as

SINR_{U_{j,i}} = TP^{ul}_{U_{j,i}} ||H^{ul}_{j,i}||^2 d(U_{j,i}, B_j)^{-α} / ( Σ_{i'∈I_i} TP^{ul}_{U_{j,i'}} ||H^{ul}_{j,i'}||^2 d(U_{j,i'}, B_j)^{-α} + σ^2 ),    (6)

where TP^{ul}_{U_{j,i}} is the uplink transmit power of MS U_{j,i}, H^{ul}_{j,i} is the corresponding uplink channel gain, ||.|| stands for the Euclidean norm, d(U_{j,i}, B_j) is the separation distance between U_{j,i} and B_j, α is the path loss exponent, I_i is the set of interfering MSs, and σ^2 is the noise power spectral density.

Using (6), Eq (5) gives the achievable uplink rate of each MS. The uplink network capacity is then the sum of the cache-hit and cache-miss contributions of both the access link (MS) and the backhaul (SBS), and is given by

ℜ^{UL}_{Net} = Σ_{j=1}^{M} Σ_{i=1}^{n} [ Hit^{C_{B_j}}_r ℜ^{UL}_{U_{j,i}} + Miss^{C_{B_j}}_r ( ℜ^{UL}_{U_{j,i}} + ℜ^{UL}_{B_j} ) ],    (7)

where ℜ^{UL}_{B_j} is the uplink capacity of the SBS towards the MBS, TP^{ul}_{B_j} is the uplink transmission power of the SBS, H^{ul}_{B_j} is the corresponding uplink channel gain at the MBS/C-RAN, and d(B_j, G) is the separation distance between B_j and G.
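The SINR and Shannon-rate relations above can be sketched numerically. All the numeric values (transmit power, channel gain, distances, path-loss exponent, noise) are assumed placeholders, not simulation parameters from the paper.

```python
import math

# Sketch of the uplink SINR and Shannon rate; all numbers are illustrative.
def sinr(tx_power, gain, distance, alpha, interference, noise):
    # Received power follows a power-law path loss d^(-alpha).
    return tx_power * gain * distance ** (-alpha) / (interference + noise)

def uplink_rate(bandwidth_hz, snr):
    # Shannon capacity: B * log2(1 + SINR).
    return bandwidth_hz * math.log2(1 + snr)

s = sinr(tx_power=0.2, gain=1.0, distance=50.0, alpha=3.5,
         interference=1e-12, noise=1e-13)
rate = uplink_rate(20e6, s)  # assumed 20 MHz channel
```

The same two functions can be reused for the SBS-to-MBS backhaul term by substituting the SBS transmit power, channel gain, and distance d(B_j, G).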

Energy consumption model
In this subsection, we describe the Energy Consumption (EC) of an MS, an SBS, and a cache as follows.

Mobile station energy consumption (EC U j;i ).
The MS energy consumption EC_{U_{j,i}} is given according to [33,64] as

EC_{U_{j,i}} = EC_m + EC_op + EC_r + ET^{ul}_{U_{j,i}},    (9)

where EC_m and EC_op are the energy consumption of performing matching and other operations, respectively, EC_r is the energy consumed for receiving the packet(s) of an SBS's reply, and ET^{ul}_{U_{j,i}} is the energy consumption for transmitting data by the MS. In the case of a cache hit, ET^{ul}_{U_{j,i}} accounts only for the communication cost of sending the MoTD message to an SBS; otherwise, it accounts for the communication cost of sending the whole content.
According to (9), the average EC of the MSs, EC^{avg}_U, is given as

EC^{avg}_U = (1/N) Σ_{j=1}^{M} Σ_i EC_{U_{j,i}}.    (10)

Caching energy consumption (C_EC_{B_j}). The caching energy consumption C_EC_{B_j} is the energy spent by the cache to perform different operations such as cache hits/misses, cache management, and cooperation among the distributed caches, and is given according to [65] as

C_EC_{B_j} = EC_Hit + EC_Miss + EC_bus + EC_cell + EC_pad + EC_chip,    (11)

where EC_Hit and EC_Miss are the energy consumption of a cache hit and miss, respectively, and EC_bus, EC_cell, EC_pad, and EC_chip are the energy consumed by the address and data bus, the cell arrays, the address and data pads of the processor, and the off-chip cache, respectively. More details about the calculation of EC_bus, EC_cell, EC_pad, and EC_chip are given in [65].
In the case of cooperative distributed caching, an SBS's cache consumes energy not only for data exchange but also for content partitioning. In addition, according to the proposed framework in Section 4, each SBS generates and broadcasts the list of its cache contents to facilitate content matching. The energy consumption of all these operations is accounted for in EC_CDC.
Therefore, (11) can be rewritten as

C_EC_{B_j} = Hit^{C_{B_j}}_r EC_Hit + Miss^{C_{B_j}}_r EC_Miss + EC_bus + EC_cell + EC_pad + EC_chip + EC_CDC,    (12)

where Miss^{C_{B_j}}_r and Hit^{C_{B_j}}_r of an SBS B_j are given according to (3) and (4).

SBS energy consumption (EC B j ).
According to [66], the EC of the SBS is given as

EC_{B_j} = P_comp + Σ_{i=1}^{n} (EC^{op}_j + ET^{ul}_{B_j} + EC^{r}_j)_{U_{j,i}} + EC^{e}_{B_j},    (13)

where P_comp is the total energy consumption of P_RFC, P_BCK, and P^{op}_c, which are the energy consumed by the radio frequency chain, the backhaul link, and other operations related to communication and contents, respectively. The term (EC^{op}_j + ET^{ul}_{B_j} + EC^{r}_j)_{U_{j,i}} denotes the energy consumption of an SBS B_j for executing operations on, transmitting, and receiving the contents uploaded by the MS U_{j,i}, respectively, where ET^{ul}_{B_j} is the energy consumption for transmitting data by SBS B_j. EC^{e}_{B_j} is the EC of an SBS B_j for executing the cache replacement policy when the cache is full.
In the case of a cache hit, EC^{r}_j is calculated only for the cost of receiving the MoTD at SBS B_j; otherwise, it is calculated for the cost of receiving the whole content.
Using Eqs (11) and (13), the total EC of an SBS, including its cache, can be written as

EC^{tot}_{B_j} = EC_{B_j} + C_EC_{B_j}.    (14)

Therefore, the average EC of the SBSs, EC^{avg}_B, is given as

EC^{avg}_B = (1/M) Σ_{j=1}^{M} EC^{tot}_{B_j}.    (15)

Finally, according to (10) and (15), the EC of the overall cellular network is given by

EC_N = N · EC^{avg}_U + M · EC^{avg}_B.    (16)
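The additive structure of the energy model, from per-MS terms up to the network total, can be sketched as follows. All component values are placeholder numbers; only the summation structure follows the text.

```python
# Hedged sketch of the energy bookkeeping: per-MS EC is a sum of matching,
# operation, receive, and transmit terms; the network EC aggregates all MSs
# and all SBSs (including their caches). Values are illustrative.
def ms_energy(ec_match, ec_op, ec_recv, et_uplink):
    return ec_match + ec_op + ec_recv + et_uplink

def network_energy(ms_energies, sbs_energies):
    # Overall network EC: sum over all MSs plus sum over all SBSs.
    return sum(ms_energies) + sum(sbs_energies)

# Three identical MSs (assumed Joule values) and two SBSs with cache included.
total = network_energy([ms_energy(0.1, 0.2, 0.05, 0.5)] * 3, [2.0, 2.5])
```

In a cache-hit case, the `et_uplink` argument would shrink to the cost of the MoTD message only, which is the mechanism by which caching reduces EC_N.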

Spectrum Efficiency (SE) model
By definition, the uplink SE is determined by the relation between the uplink capacity ℜ^{UL} and the total available bandwidth B, as given by [67]:

SE = ℜ^{UL} / B.    (17)

According to the Shannon channel capacity, the maximum bit rate is given by

ℜ^{UL} = B log2(1 + SINR).    (18)

Then, according to (17) and (18), the SE is given as

SE = log2(1 + SINR).    (19)

Proposed framework for distributed uplink cache
In a distributed cache, there are three challenges: firstly, content duplication among the distributed caches; secondly, an MS's inability to know about the cached contents; and thirdly, the segmentation and distributed caching of new, large contents. In order to address these challenges and improve the user's quality of experience, a framework for distributed uplink caching is proposed. The proposed framework has three main components, namely generating a list of cache contents, performing matching at an MS, and content segmentation for distributed cache placement, as shown in Fig 2.

Generating disparate list of distributed cache for uplink transmission
It is of paramount importance that an MS has a list of the cached contents so as to avoid unnecessary uploads. In this subsection, the algorithm for generating the content list is explained in detail. The MBS is designated as the marker of the final content list (MFCL). The goal of the MFCL is to create two content lists, namely the Un-duplicated Content List (UCL) and the Duplicated Content List (DCL). As its name implies, the UCL consists of un-duplicated cache contents that can be used by an MS for content matching, whereas the DCL contains duplicated cache contents, which are used for evicting identical contents. The rest of this section explains the process of creating these content lists. The information about the SBSs connected to the MBS can be expressed as a matrix B^{M×4}_{Loc}, whose M rows correspond to the SBSs and whose columns carry the characteristics of each SBS: B^{ID}_j and C^{ID}_{B_j} represent the ID of an SBS B_j and of its cache, respectively, B^{L}_j is the SBS location in the coverage area of the MBS, and C_fS_{B_j} represents the free space of each SBS, computed as

C_fS_{B_j} = S(C_{B_j}) − Σ_{l=1}^{w} S(C_{B_j,f_l}).    (20)

The list of contents of the MBS's cache can be expressed as a matrix G^{w×κ}_C, where the w rows correspond to the cached contents and κ is the maximum number of attributes of a content.

SBSs caches contents.
The list of an SBS's cached contents is denoted by C^{w×κ}_{B_j} and can be expressed as a matrix whose w rows are the cached contents of the SBS and whose κ columns are their attributes. The total number of contents in all the SBSs' caches is then T_f. Each SBS sends its cache content list to the MFCL, which is the MBS acting as a marker. After receiving the cache content lists from all the SBSs, the MFCL applies a row-wise combination function to generate a consolidated list:

C^{T_f×κ}_{D_f} = [C^{w×κ}_{B_1}; C^{w×κ}_{B_2}; ...; C^{w×κ}_{B_M}],    (24)

where T_f is the total number of cached contents across the distributed cache.

Filtering similar contents. The consolidated list generated by the MFCL may contain similar contents received from the various SBSs' caches, with attributes of different data types such as nominal, ordinal, interval, and ratio. In order to remove duplication, the MFCL performs matching among the attributes of the contents of C^{T_f×κ}_{D_f} as shown in (25).
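The row-wise combination step can be sketched as a simple list concatenation across the per-SBS caches. The attribute tuples (name, size, hash key) and content names are illustrative assumptions.

```python
# Sketch of the MFCL's row-wise combination: per-SBS content lists (rows of
# attribute tuples) are stacked into one consolidated list. Attribute layout
# (name, size, hash key) is an illustrative assumption.
def consolidate(sbs_lists):
    combined = []
    for rows in sbs_lists:
        combined.extend(rows)  # row-wise combination across caches
    return combined

sbs1 = [("video_a", 700, "h1"), ("doc_b", 2, "h2")]
sbs2 = [("video_a", 700, "h1"), ("img_c", 5, "h3")]
consolidated = consolidate([sbs1, sbs2])
```

Note that the consolidated list still contains the duplicate `video_a` row from both caches; removing it is precisely the job of the filtering step that follows.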
In order to compute the dissimilarity in (25), we consider a typical and a target content denoted by y and y•, respectively. The dissimilarity can be calculated as [68]

dissim(y, y•) = (1/κ) Σ_{f=1}^{κ} d_{P_f}(y, y•),    (26)

where dissim(y, y•) represents the difference between the typical content y and the target content y•, P_f represents an attribute of a content, and κ is the maximum number of attributes of a content. The per-attribute distance d_{P_f} is computed as follows:
1. If the attribute P_f is of numerical, interval, or ratio type, then

d_{P_f}(y, y•) = |y_{P_f} − y•_{P_f}| / (max_h P_f − min_h P_f),    (27)

where h ranges over all non-missing values of the attribute P_f.
2. If the attribute P_f is of nominal or binary type, then

d_{P_f}(y, y•) = 0 if y_{P_f} = y•_{P_f}, and 1 otherwise.    (28)

3. If the attribute P_f is ordinal, then compute the ranking r_{y,P_f} and set

z_{y,P_f} = (r_{y,P_f} − 1) / (m_{P_f} − 1),    (29)

where r_{y,P_f} represents the ranking of the state in attribute P_f and m_{P_f} is the number of ordered states of P_f; z_{y,P_f} is then treated as a numerical type.
The similarity between y and y• is computed by

sim(y, y•) = 1 − dissim(y, y•),    (30)

where sim(y, y•) shows the similarity of the contents y and y•.
The closest pair of contents corresponds to the entry with the maximum similarity value (value 1) among the values across the contents. Alternatively, similarity can also be determined by setting a similarity threshold. In this way, duplicate contents residing in different SBS caches can be identified and subsequently evicted to make space for new contents.
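The mixed-attribute dissimilarity of (26)-(30) can be sketched in the spirit of Gower's measure: numeric attributes are range-normalized, nominal attributes contribute 0/1. The attribute ranges and content tuples below are illustrative assumptions.

```python
# Sketch of the mixed-type dissimilarity/similarity of (26)-(30).
# numeric_ranges maps an attribute index to its observed (min, max) range;
# all other attributes are treated as nominal. Values are illustrative.
def dissim(y, y_star, numeric_ranges):
    total = 0.0
    for f, (a, b) in enumerate(zip(y, y_star)):
        if f in numeric_ranges:                       # numeric/interval/ratio
            lo, hi = numeric_ranges[f]
            total += abs(a - b) / (hi - lo) if hi > lo else 0.0
        else:                                         # nominal/binary
            total += 0.0 if a == b else 1.0
    return total / len(y)

def sim(y, y_star, numeric_ranges):
    return 1.0 - dissim(y, y_star, numeric_ranges)

# Attribute 0 is a nominal name; attribute 1 (size) is numeric with an
# assumed observed range of [0, 1000].
s_same = sim(("video_a", 700), ("video_a", 700), {1: (0, 1000)})
s_diff = sim(("video_a", 700), ("video_b", 200), {1: (0, 1000)})
```

Identical contents reach the maximum similarity of 1, which is the criterion the MFCL uses to flag duplicates; ordinal attributes would first be mapped to [0, 1] via (29) and then handled as numeric.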

Checking validity of cache contents.
After identifying similar contents, it is also important for the MFCL to identify the validity of the existing cache contents. The same content in different caches should be evicted and kept at only one cache. To do this, we propose a Dynamic Validity Period (DVP) with the aim of computing the probability of either keeping or evicting a content from any of the distributed caches. It can be expressed as a function

DVP_{C_{B_j},f_l} = f(RE_{B_j}, C_fS_{B_j}, U_{j,i}, %_{C_{B_j},f_l}, TimEC^{r}_{C_{B_j},f_l}),    (33)

where RE_{B_j} is the remaining energy of an SBS B_j, C_fS_{B_j} is the cache free space of an SBS as given in (20), U_{j,i} is the total number of MSs served by the SBS B_j, and %_{C_{B_j},f_l} is the popularity of each cached content according to (1). TimEC^{r}_{C_{B_j},f_l} is the remaining time of a content in the cache and is given by

TimEC^{r}_{C_{B_j},f_l} = Tim^{Expire}_{C_{B_j},f_l} − Tim^{current},    (34)

where Tim^{Expire}_{C_{B_j},f_l} is the expiry time of the content. The DVP is executed after performing the content similarity check and determining the duplicate contents of the distributed cache. Among the duplicate copies held by the SBSs B_m, 1 ≤ m ≤ M, let

K = H(arg max_{1≤m≤M} DVP_{C_{B_m},f_l}),    (35)

where K represents the hash key of the copy with the highest DVP, i.e., the copy, with all its attributes, that has the highest validity. That copy should be placed in the UCL; all the remaining copies will be placed in the DCL. This is formalized in (38) and (39).

Final list generation.
Now that the similarity and validity of the contents are determined, the final lists of duplicated and un-duplicated contents can be generated. The UCL is the list of un-duplicated contents of the distributed cache, which can be used by an MS as a map to determine whether to upload a content or to send an MoTD instead of the actual content. The UCL can be formed using (25) and (35). The value of K in (35) identifies the row with the maximum DVP,

C^{T_f×κ}_{D_f}(l, ·), l ∈ [1, T_f],    (36)

where l is the serial number of the row with the maximum DVP.

Eq (36) identifies the row of the matrix in (25) that has the maximum validity, as selected in (35), i.e., the maximum DVP value of a single content. The same process is performed for all the contents of the distributed cache, providing the maximum DVP values of all the duplicated contents. This can be seen as clustering the duplicated contents by their DVP values: the copy with the maximum DVP value is placed in the UCL, while the remaining copies are placed in the DCL. When all the rows with maximum DVP values, l ∈ [1, T_f], are added, the final UCL can be expressed as

UCL = C^{T×κ}_{D_f},    (38)

where T ≤ T_f is the total number of un-duplicated contents among all distributed caches. The DCL is the list of the duplicated contents of the distributed cache, which determines the contents to be evicted from the SBSs' caches. As previously mentioned, the copies with the maximum DVP values are placed in the UCL; the remaining duplicated contents are added to the DCL, which can be expressed as

DCL = C^{T•×κ}_{D_f},    (39)

where T• is the total number of duplicated contents across the distributed cache to be eliminated from their caches.
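The UCL/DCL split can be sketched as follows: duplicates are grouped by hash key, the copy with the maximum DVP is kept, and the rest are marked for eviction. The DVP values below are placeholders; the paper computes them from remaining energy, free space, served MSs, popularity, and remaining cache time.

```python
# Hedged sketch of the UCL/DCL generation: within each group of duplicate
# copies (same hash key), keep the maximum-DVP copy in the UCL and send the
# rest to the DCL for eviction. DVP values are illustrative placeholders.
def split_ucl_dcl(entries):
    """entries: list of (hash_key, sbs_id, dvp)."""
    groups = {}
    for key, sbs, dvp in entries:
        groups.setdefault(key, []).append((sbs, dvp))
    ucl, dcl = [], []
    for key, copies in groups.items():
        copies.sort(key=lambda c: c[1], reverse=True)
        ucl.append((key, copies[0][0]))                   # highest validity kept
        dcl.extend((key, sbs) for sbs, _ in copies[1:])   # duplicates to evict
    return ucl, dcl

ucl, dcl = split_ucl_dcl([("h1", "B1", 0.9), ("h1", "B2", 0.4),
                          ("h2", "B3", 0.7)])
```

Content "h1" is kept only at B1 (its highest-DVP holder), while the B2 copy lands in the DCL, mirroring the eviction rule of (38) and (39).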

Duplication elimination at sbs and broadcast to the MSs.
After removing duplicate contents from the MBS's cache, the UCL and DCL are sent to all the SBSs. The elimination of the duplicate contents, as instructed by the MFCL, follows the DVP values in the DCL, as shown in (39): the copies with low DVP are evicted from their caches, which frees cache space according to (20). Additionally, each SBS B_j broadcasts the UCL to all its serving MSs U_{j,i}, which subsequently use it for matching. The contents of the distributed cache are updated continuously, and their popularity changes for reasons such as the number of uploads, downloads, shares, views, etc. Therefore, the UCL is updated and broadcast to all the MSs periodically to ensure content consistency between the distributed cache and the MSs.

Proposed (DC)² scheme. Based on the above discussion, an algorithm called Disparate Content of Distributed Cache, (DC)², is proposed, as shown in Algorithm 1. Each of the previous subsections (4.1.1) to (4.1.5) corresponds to a step of the proposed algorithm. The proposed (DC)² algorithm consists of 5 major steps. In step 1, all the caches at the SBSs and MBS are processed, followed by a similarity check in step 2. The algorithm then checks the validity of the contents and generates the final lists in steps 3 and 4, respectively. Finally, in step 5, duplicate contents are removed from all the target SBSs and the un-duplicated list is sent to all the MSs to perform content matching.

Cache assist uplink with an MS enabled matching
This section describes the mechanism of performing content matching at an MS after it receives the UCL. The matching is performed in three steps, as described below. In the first step, an MS intending to upload new content to its serving SBS B_j builds a consolidated list comprising the attributes of the new content and the UCL. In the second step, the MS performs matching to check whether the content is available in the cache. Lastly, the content is only uploaded in the case of a cache miss, i.e., the unavailability of the content in the cache. If the content is listed in the UCL, the MS does not upload the content; instead, an MoTD is sent to its serving SBS. It is worth mentioning that performing the matching at an MS significantly improves the spectrum and energy efficiency. The aforementioned steps are described in detail as follows.

Consolidated list of attributes of new content and UCL.
This subsection provides details of constructing a consolidated list of the attributes of the new content and the existing UCL. The main steps are as follows.
1. Firstly, an MS creates a list comprising the new content and its attributes,

R^{1×κ}_{j,i} = [P(R^{f_1}_{j,i}, 1), ..., P(R^{f_1}_{j,i}, κ)],    (40)

where P(R^{f_1}_{j,i}, k) represents the attributes of the new content.
2. The un-duplicated contents of the distributed cache in (38) are combined with (40) by using a row-wise combination function to generate a consolidated list. The consolidated list is denoted by O^{(T+1)×κ} and is given by

O^{(T+1)×κ} = [UCL; R^{1×κ}_{j,i}].    (41)

The last row in (41) contains the attribute values of the new content.

Matching the attributes of new content and UCL.
After constructing the consolidated list, the MS starts matching to check whether the new content is available in the cache. Using (41), the matching between the attributes of the new content and the contents of the distributed cache is performed by the same process as explained in Section 4.1.2, in particular Eqs (26)-(30).
According to [69], the matching is performed between a control item and a treatment item. In this work, the control items are the UCL contents, while the new content is the target content. The matching results in a dissimilarity matrix as shown in (42).
The last row of (42) shows the dissimilarity between the new content and the cached contents, which is used to decide whether to upload or discard the new content.

Uploading dissimilar content and ignore similar content.
After the matching is performed, the content is either uploaded or an MoTD is sent. If a match is found, the content is not uploaded and an MoTD is sent instead of the actual content. The content is uploaded if no match is found; in other words, contents are only uploaded in the case of a cache miss.
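The MS-side decision can be sketched as follows. The message shapes and field names are illustrative assumptions; here the match is reduced to a hash-key lookup, whereas the paper matches over all attributes.

```python
# Sketch of the UCMM decision at an MS: if the new content appears in the
# broadcast UCL, send only an MoTD; otherwise upload the content.
# Message field names are illustrative assumptions.
def uplink_decision(new_content_key, ucl_keys):
    if new_content_key in ucl_keys:
        return {"type": "MoTD", "target": new_content_key}    # cache hit
    return {"type": "UPLOAD", "content": new_content_key}     # cache miss

msg_hit = uplink_decision("h2", {"h1", "h2", "h3"})
msg_miss = uplink_decision("h9", {"h1", "h2", "h3"})
```

The hit branch replaces a potentially large upload with a tiny MoTD message, which is the source of the EE and SE gains claimed for MS-side matching.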

Proposed UCMM scheme.
Based on the above discussion, an algorithm called Uplink Caching with Mobile Matching (UCMM) is proposed, as shown in Algorithm 2 and Fig 2 (Part 2). The proposed UCMM performs matching between the attributes of new contents and the UCL at the MS level to determine whether new content is already in a cache, which eliminates the need to upload duplicate contents. Each of the previous subsections (4.2.1) to (4.2.3) corresponds to a step of the proposed algorithm. The steps can be summarized as follows: in step 1, the attributes of the new content and the UCL contents are consolidated into one list, followed by a similarity check in step 2. Finally, in step 3, the MS decides whether to upload dissimilar content or to ignore similar content by sending an MoTD to the serving SBS.

Content segmentation and distributed placement
In the case of distributed caching, the size of the contents is a major limiting factor for cache effectiveness [27,38,56]. A new content may be too large to be accommodated efficiently in a single cache. Therefore, the new content should be split into smaller segments of the same size to be stored distributively. In this vein, we propose a scheme that divides the new content into Q segments as a function of the new content size S(R^{f_1}_{j,i}) and the available cache space of the corresponding SBS and its neighbors, as shown in Fig 2 (Part 3) and Fig 4. The rationale for this content segmentation is effective placement across multiple SBSs due to the smaller segment size. According to [56], Q ≤ 1 + count(B_j*), where 1 accounts for the local SBS B_j and count(B_j*) is the number of neighboring SBSs that have a non-void intersection with B_j and enough free space to cache at least a segment, i.e., B_j* ≜ {B_j* ∈ B^{M×4}_{Loc} : C_fS_{B_j*} > S(R^{f_1}_{j,i})}. In order to facilitate distributed storage, the segments are cached in the target distributed caches based on their free space according to (20) and (21). An MDS code is used to encode the new content into packets, as mentioned in Section 3.2.2.
The above discussion is consolidated into a new proposed scheme called Content Segmentation with Distributed Placement (CSDP), as shown in Algorithm 3 and Fig 2-(Part-3). The CSDP consists of two major steps. Step 1 prepares the required information based on the size and hash key of the new content, together with the free space of the local SBS and its neighboring SBSs.
Step 2 splits the new content into Q segments, encodes them, and places the segmented contents in the cache in a distributed manner.
The core of Algorithm 3 can be excerpted as follows:

  B_j* ≜ {B_j* ∈ B^{M×4}_{Loc} : C^{fS}_{B_j*} > S(R^{f1}_{j,i})}  // set the target SBSs
  Step 2: split the new content and cache it distributively
    Q = 1 + count(B_j*)  // total number of segments
    for q = 1 to Q do
      check C^{fS}_{B_j} according to (20) and (21)
      store Seg_q (or a set of Seg_q's) in B_j
      encode e^{l_q}_j using the MDS code
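Under the stated assumptions, the segmentation and placement logic of CSDP can be sketched as follows. The names mirror the notation above, but two points are illustrative assumptions: a neighbor qualifies when its free cache space exceeds the full content size (following the set definition of B_j*), and the placement rules of (20) and (21) are abstracted to a simple largest-free-space-first assignment; MDS encoding is omitted.

```python
def csdp_place(content_size: int, neighbor_free: dict) -> tuple:
    """Sketch of CSDP: split a large content into Q equal-size segments
    and place them on the local SBS plus the qualifying neighbors B_j*.
    Returns (placement map, segment size)."""
    # B_j* : neighbors whose free space exceeds the content size
    qualified = {b: f for b, f in neighbor_free.items() if f > content_size}
    q = 1 + len(qualified)                    # Q = 1 + count(B_j*)
    seg_size = -(-content_size // q)          # equal segments (ceiling division)
    # Assumed placement order: local SBS first, then neighbors by free space.
    targets = ["local"] + sorted(qualified, key=qualified.get, reverse=True)
    placement = {f"seg{i + 1}": targets[i] for i in range(q)}
    return placement, seg_size

# B3 is too small to qualify, so Q = 3 and each segment is 300 units.
placement, seg = csdp_place(900, {"B2": 1200, "B3": 500, "B4": 950})
```

The greedy order is only one possible realization of the free-space-based targeting in (20) and (21).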

Complexity analysis of the proposed framework
As discussed in the previous sections, M is the total number of distributed caches. Each cache stores w contents, with a total of T^f contents in the whole distributed network, and each content has κ attributes. The numbers of unduplicated and duplicated contents among the distributed caches are T and T•, respectively. The proposed framework consists of three parts, whose complexities are analyzed as follows:

Experimental design and evaluation
We have compared our proposed framework against different scenarios: uplink with no caching (No-cache), cache-assisted uplink (Each-cache) [33], and uplink with collaborative distributed caching (SBS-CoDc) [38]. As its name implies, the no-caching scenario assumes the absence of caching at the SBS, while the cache-assisted uplink scenario considers uplink transmission with the support of a cache. The simulation parameters are shown in Table 3.

Performance evaluation metrics
In our system model, EC^{avg}_U, EC^{avg}_B, and EC_N denote the average energy consumption of the MSs, the SBSs, and the overall network, respectively. In addition, ℜ^{UL}_{U_{j,i}}, ℜ^{UL}_{B_j}, and ℜ^{UL}_{Net} denote the uplink data rate of an MS, an SBS, and the overall network, respectively. For a comprehensive evaluation, the proposed framework is compared with the existing schemes in [33,38] using the following metrics, which are computed as in (43)-(55).

Cache availability and efficiency measurements.
According to (3) and (4), the cache hit ratio of the overall network is given in terms of Hit^n_{B_j}, the cache hit ratio of an SBS B_j, and Miss^n_{B_j}, the cache miss ratio of the same SBS. The cache miss ratio of the overall network is given analogously.

Throughput measurements.
Throughput (TH) is the average amount of successfully transmitted data in GB per second [70]. The throughput of an MS (TH_i) is computed from T^{Num}_{Tp}, the total amount of successfully transmitted data, and TT, the transmit time. The average throughput of the MSs (TH^{avg}_U) and of the SBSs (TH^{avg}_B) follow accordingly.
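A minimal numerical sketch of these two metrics, assuming hits and misses are simply counted per SBS and aggregated over the network:

```python
def network_hit_miss(per_sbs: list) -> tuple:
    """Overall cache hit and miss ratios across M SBSs, where per_sbs
    holds (hits, misses) counts for each SBS B_j (illustrative aggregation)."""
    hits = sum(h for h, _ in per_sbs)
    total = sum(h + m for h, m in per_sbs)
    return hits / total, 1 - hits / total    # (hit ratio, miss ratio)

def throughput(data_gb: float, transmit_time_s: float) -> float:
    """TH: successfully transmitted data (GB) per transmit time (s)."""
    return data_gb / transmit_time_s

# Two SBSs with (hits, misses) = (8, 2) and (6, 4).
hit, miss = network_hit_miss([(8, 2), (6, 4)])
th = throughput(12.0, 4.0)
```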

Energy efficiency measurements.
The Energy Efficiency (EE) of the uplink is the ratio of the uplink data rate to the energy consumption, measured in GB/J/s. The Area Energy Efficiency (AEE) is measured from both the energy consumption and the size of the coverage area, in GB/J per km^2 [66,67,71].
In this regard, the average EE of the MSs (EE_U) and the average EE of the SBSs (EE_B) are calculated from the corresponding uplink data rates and energy consumption. Furthermore, the network AEE is computed in two ways: first, based on the uplink data rate and the energy consumption, denoted by AEE^{ℜ}_N; and second, based on the average EE of the MSs and SBSs in addition to the coverage size, denoted by AEE^{Size}_N.
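These definitions can be sketched numerically as follows. The exact combination of MS and SBS EE in the size-based AEE of (51) is not reproduced here; summing the two averages is an illustrative assumption.

```python
def energy_efficiency(rate_gbps: float, energy_j: float) -> float:
    """EE: uplink data rate divided by energy consumption (GB/J/s)."""
    return rate_gbps / energy_j

def aee_rate(rate_gbps: float, energy_j: float, area_km2: float) -> float:
    """AEE^R_N: uplink data rate over energy consumption per unit area."""
    return rate_gbps / energy_j / area_km2

def aee_size(avg_ee_ms: float, avg_ee_sbs: float, area_km2: float) -> float:
    """AEE^Size_N: average EE of MSs and SBSs normalized by coverage size
    (combining the two averages by a sum is an assumption)."""
    return (avg_ee_ms + avg_ee_sbs) / area_km2
```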

Spectral efficiency measurements.
Spectral Efficiency (SE) is defined as the uplink data rate per unit bandwidth, measured in GB/s/Hz, and is an important metric of the radio resource utilization of the network [67,72].
According to (19), the average SE of the MSs (SE^{avg}_U) and, similarly, the average SE of the SBSs (SE^{avg}_B) can be calculated accordingly. In addition, the Area Spectral Efficiency (ASE), in GB/s/Hz per km^2, can be calculated by normalizing the SE by the coverage area.
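As with the energy metrics, a minimal sketch of the SE definitions, assuming a straightforward rate-over-bandwidth computation:

```python
def spectral_efficiency(rate_gbps: float, bandwidth_hz: float) -> float:
    """SE: uplink data rate per unit bandwidth (GB/s/Hz)."""
    return rate_gbps / bandwidth_hz

def area_spectral_efficiency(rate_gbps: float, bandwidth_hz: float,
                             area_km2: float) -> float:
    """ASE: spectral efficiency per unit coverage area (GB/s/Hz/km^2)."""
    return rate_gbps / bandwidth_hz / area_km2
```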

Overall Cache Efficiency (OCE) measurements.
The OCE is the ratio of the cumulative cache hits to the cumulative demands, reflecting the overall number of cache hits up to a specific time slot t. The Overall Cache Efficiency (OCE), denoted by CO, is given according to [73] in terms of Σ^{M}_{j=1} Hit^{C}_{B_j}, the cumulative distributed cache hits, and S^D, the cumulative demands.
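The accumulation up to slot t can be sketched as follows; representing the per-slot statistics as plain lists is an illustrative choice.

```python
def overall_cache_efficiency(hits_per_slot: list,
                             demands_per_slot: list, t: int) -> float:
    """OCE at time slot t: cumulative distributed cache hits up to t
    divided by cumulative demands up to t."""
    return sum(hits_per_slot[:t]) / sum(demands_per_slot[:t])

# Three slots with 3, 5, and 4 hits against 10 demands each: 12/30.
oce = overall_cache_efficiency([3, 5, 4], [10, 10, 10], t=3)
```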

Numerical results
The simulation is performed considering the proposed framework and the existing models, namely No-Cache, Each-cache [33], and SBS-CoDc [38], along with their sub-models in different scenarios under the same simulation parameters listed in Table 3. The final results are validated with the performance metrics presented in (5.1) and shown in the following figures. Fig 5 shows the average cache hit and miss ratios of Each-cache and SBS-CoDc along with our proposed framework, according to (43) and (44).

Cache hit and miss ratio
We can see a slight increase in the average cache hit ratio of SBS-CoDc compared to Each-cache, because Each-cache lacks cooperation among SBSs: its cache hit ratio is computed separately per SBS and summed subsequently. Our proposed framework outperforms SBS-CoDc by improving the cache hit ratio by 9%, the main reason being the effective use of the CSDP scheme. We can also see that our proposed framework significantly reduces the cache miss ratio.
The rationale is the distribution of contents among different SBSs along with the UCL, which acts as a map for MSs to efficiently locate the cached contents. Due to these improvements, the traffic load at the access network, as well as backhaul link, is significantly reduced, which subsequently improves EE and SE.

Average Energy Consumption (EC)
The average energy consumption of MSs and SBSs is shown in Figs 6 and 7, respectively. Our proposed framework is compared with No-Cache, Each-cache, and SBS-CoDc. The percentage improvements of average energy consumption of MSs and SBSs are shown in Table 4.
We can see that our proposed framework performs significantly better than the No-Cache and Each-Cache scenarios. Moreover, improvements of about 17% and 13% are observed compared to SBS-CoDc, which already has the inherent advantages of a distributed scenario, because our proposed scheme uses the UCL for matching at an MS. As the cache hit ratio of our proposed scheme is improved, it also positively affects the average energy consumption by limiting the amount of unnecessary data upload.

Improvement of uplink throughput
The average Uplink Throughput (TH) of our proposed framework is compared with the existing schemes in Figs 8 and 9 for the MSs and SBSs, respectively, and the percentage improvements are shown in Table 5. The THs of our proposed framework for both MSs and SBSs are significantly better than those of the No-Cache scheme for obvious reasons. Similarly, compared to Each-Cache, our proposed framework performs better due to its distributed nature. The results also show that our framework outperforms SBS-CoDc because of the effective use of the MoTD rather than uploading new content. Furthermore, in contrast to the existing schemes, matching is done at an MS rather than at an SBS, so duplicate content is never uploaded, which subsequently improves TH.

Improvement in energy efficiency
Energy efficiency (EE) is an effective way to show performance improvement; as one can predict, the lower energy consumption shown previously also improves the EE. Our proposed framework is evaluated based on EE and compared with the existing schemes. The EE of the MSs and SBSs for an increasing number of MSs is shown in Figs 10 and 11, and all the percentage improvements over the existing schemes are shown in Table 6. We can see in Fig 10 that the average EE of our proposed framework is better than that of the existing schemes; compared to SBS-CoDc, it is improved by 41%. The rationale is the use of the UCL for matching at the MS, which avoids content upload in case of a cache hit. Similarly, Fig 11 shows the improved average EE of the SBSs of our proposed framework with an increasing number of MSs. Our proposed framework outperforms the existing schemes, improving the average EE by 46% compared to SBS-CoDc. The main reason is the improved hit ratio of our proposed framework, which implicitly improves the average EE through the use of the MoTD that represents the new contents.
We have also used the metric of Area Energy Efficiency (AEE) to show the performance improvement of our proposed scheme. We computed the AEE in two ways: first as a function of the uplink data rate, and second as a function of the EE. Figs 12 and 13 show the AEE of No-cache, Each-cache, SBS-CoDc, and the proposed framework for different numbers of MSs based on these two ways, respectively. Fig 12 shows the AEE as a function of the uplink data rate, computed by dividing the total uplink data rate by the total energy consumption per unit area based on (50); our proposed framework increases the AEE by 2.34 GB/J/km^2 compared to SBS-CoDc. A summary of the comparison with the other schemes is shown in Table 7.
Furthermore, Fig 13 shows the AEE as a function of the cell size, computed by dividing the total energy efficiency by the total size of the network based on (51), in order to assess the EE of the overall network relative to its size. In Fig 13, we can see that our proposed framework increases the average AEE by 43% compared to SBS-CoDc. These improvements are credited to the improved hit ratio, higher TH, and better EE.

Spectral efficiency
The average spectral efficiency (SE) of the MSs under No-cache, Each-cache, SBS-CoDc, and our proposed framework for different numbers of MSs is shown in Fig 14. Our proposed framework improves the SE by almost 16% compared to SBS-CoDc, and by almost 37% compared to Each-Cache. Similarly, Fig 15 shows the average SE of the SBSs of the existing schemes compared to our proposed framework, and the summary of improvements is shown in Table 8. The rationale for the improved SE is a significant reduction in the number of uplink contents, since matching is done at an MS, which facilitates the decision of whether or not to upload the content. In case of a cache hit, the content is not uploaded and the bandwidth is saved for the requests of the remaining MSs. In this way, a significant amount of spectrum is saved and more requests can be entertained, which ultimately improves the SE. Fig 16 shows the ASE of No-cache, Each-cache, SBS-CoDc, and the proposed framework for different numbers of MSs; our proposed framework improves the ASE by 24% compared to SBS-CoDc.

Improvement of overall distributed cache efficiency
The overall cache efficiency (OCE) of the distributed cache under Each-cache, SBS-CoDc, and our proposed framework for different numbers of MSs is shown in Fig 17. We can see that our proposed framework improves the OCE by almost 28% and 7.41% compared to Each-cache and SBS-CoDc, respectively. This is because the contents of the distributed cache are available to all the MSs, irrespective of their serving SBSs. In addition, the lower distributed cache miss ratio improves the cache efficiency and reduces the cache access time.

Conclusion
This paper proposed an efficient uplink cache framework for a distributed scenario. The proposed framework leverages content matching at an MS, in contrast to the existing schemes, which perform it at an SBS. Content matching at an MS significantly improves the energy and spectral efficiency, since local matching reduces the number of uplink contents. Furthermore, the proposed framework is based on the effective distribution of cache contents over cooperative SBSs, which improves the cache hit ratio and entails subsequent improvements in throughput, energy consumption, and spectrum usage for the MSs as well as the SBSs. Our analysis shows that our proposed framework improves the EE and SE of the access network by 41.28% and 15.58%, respectively. Furthermore, increases of 46.18% and 28.00% in EE and SE, respectively, are calculated for the core network.