Abstract
Large language models have revolutionized the field of natural language processing and are now becoming a one-stop solution to various tasks. In the field of networking, LLMs can also play a major role in resource optimization and sharing. While sumrate maximization has been a crucial factor for resource optimization in the networking domain, the optimal or sub-optimal algorithms it requires can be cumbersome to comprehend and implement. An effective solution is to leverage the generative power of LLMs for such tasks, since no prior algorithmic or programming knowledge is necessary. A zero-shot analysis of these models is necessary to establish the feasibility of using them for such tasks. Using different combinations of total cellular users and total D2D pairs, our empirical results suggest that the maximum average efficiency of these models for sumrate maximization, compared with state-of-the-art approaches, is around 58%, obtained using GPT. The experiment also concludes that some variants of the large language models currently in use are not suitable for numerical and structural data without fine-tuning of their parameters.
Citation: Shuvro AA, Bhuiyan MSI, Hussain F, Hossen MS (2025) Zero-shot performance analysis of large language models in sumrate maximization. PLoS One 20(8): e0329674. https://doi.org/10.1371/journal.pone.0329674
Editor: Divya Chaudhary, Northeastern University, UNITED STATES OF AMERICA
Received: December 25, 2023; Accepted: July 18, 2025; Published: August 4, 2025
Copyright: © 2025 Shuvro et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: https://github.com/ShojebBhuiyan/Dataset-PLOS_Zero_shot_Performance_Analysis_of_LLM-Version-4. All the data are publicly available in this repository.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
In the fields of artificial intelligence and Natural Language Processing (NLP), large language models (LLMs) are crucial for their ability to comprehend tasks and instantly generate desirable outputs. These models, like GPT-3 and its offspring, have proven remarkably adept at comprehending and producing text that resembles human speech [1]. They are essential to a variety of applications, including enhancing sentiment analysis and machine translation, as well as enabling chatbots and virtual assistants [2, 3]. LLMs make it possible to automate processes that traditionally required substantial human interaction, which boosts productivity and efficiency in a variety of sectors. By facilitating more natural and pertinent communication between machines and humans, they also support improvements in human-computer interaction.
LLMs like GPT-3, GPT-4, LLaMA, Falcon, and PaLM 2 have greatly rejuvenated the domain of NLP and significantly contributed to opening newer avenues of research [1, 4, 5]. These language models have their own strengths and weaknesses, making some of them more useful than others for specific tasks [6, 7]. Zero-shot analysis is essential for assessing the strengths and weaknesses of LLMs because it focuses on a model's generalization skills, evaluating how well it performs on tasks it was not explicitly trained for. By putting LLMs through different activities, researchers learn more about their innate comprehension of language, logic, and context. Zero-shot analysis reveals remarkable adaptability as well as any biases or flaws these models may have, helping us better understand how to use them in different applications and underscoring the need for responsible development and deployment in an artificial intelligence environment [8, 9].
The zero-shot capabilities of an LLM can be tested on various tasks. While LLMs have proved their capabilities on many other tasks, sumrate maximization is yet to be analyzed. Sumrate maximization is a special task in which the settings of a network are optimized to obtain maximum throughput, and it is the task considered in this manuscript. Sumrate maximization is a crucial task in the networking domain, and many algorithms have been introduced for its effective calculation and optimization [10, 11]. By effectively distributing resources like bandwidth, power, or time slots, sumrate maximization improves the overall data transmission rate of a communication system. It is pivotal for enhancing network performance and overall throughput, particularly when several users share a single channel. It is unclear which algorithm suits a specific scenario, and with the advent of smaller networks courtesy of IoT devices, the selection gets more difficult. LLMs have solved many such problems thanks to their simplicity of usage: they do not require prior algorithmic knowledge; knowing the problem itself suffices [12].
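As a concrete reference point, the sumrate is commonly taken as the sum of the Shannon rates of all active links; a minimal sketch (bandwidth-normalized rates, with illustrative SINR values) is:

```python
import math

def sumrate(sinrs):
    """Sum of Shannon rates (bits/s/Hz) over all links, given each link's SINR."""
    return sum(math.log2(1 + s) for s in sinrs)

# Three links with SINRs of 1, 3 and 7 -> per-link rates 1, 2 and 3 bits/s/Hz
print(sumrate([1, 3, 7]))  # -> 6.0
```

Maximizing this quantity subject to SINR and resource-block constraints is what the allocation algorithms below, and the LLMs, are asked to do.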
The key contributions of this research are:
- Performing a sumrate maximization task leveraging LLMs, using multiple iterations to obtain an optimized prompt.
- Comparative analysis of the sumrate generated from LLMs with other simulated algorithms.
The rest of the paper is organized as follows. Sect 1: Related work reviews the related literature and recent advances in the domain. Sect 2: Methodology gives an overview of the workflow of the experiment. Sect 3: Results and discussion provides the details of the experiment and its results. Sect 4: Result analysis analyzes the results from the experiment. Sect 5: Limitations provides insight into the drawbacks of using LLMs for such tasks. Sect 6: Conclusion and future work gives concluding remarks and discusses the possibilities and scope of future research.
1 Related work
NLP has taken a large step forward with the advent of large language models. Transformers were introduced with a self-attention mechanism for sequence-to-sequence tasks, and their encoder-decoder architecture has revolutionized the NLP domain [13]. NLP models like BERT are powerful, with huge upside potential for improved performance on different NLP tasks [14, 19]. BERT uses a specific implementation of pretrained Transformers to provide contextual understanding [15]. Integrating these tasks into large language models has created many prospects for future research [16], from complex reasoning abilities through chain of thought to text-to-image diffusion. Large language models have an inherent capability to find patterns from their training on a vast corpus of conversational datasets. With very little or no data on the current problem, LLMs can figure out solutions through their reasoning capabilities [17]. Newer LLMs have developed immensely in different fields and have greatly advanced in reasoning through chain-of-thought prompting and scratchpad reasoning [18]. While there are many avenues where large language models have proved helpful, there are limitations and challenges as well. For example, multiple instances have shown that LLMs, including GPT models, can produce inaccurate and sometimes even impossible outputs due to their non-deterministic nature, which exposes weaknesses in their reasoning [20–23].
Prompt engineering is key to optimizing performance on tasks like sentiment analysis or language translation. It entails creating purposeful input queries to refine large language models like GPT or BERT. Recently, large language models have been rapidly adopted as tools for zero-shot outputs. While zero-shot chain-of-thought prompting can outperform vanilla zero-shot prompting, as it has some idea of the facts, plain zero-shot results are also quite extraordinary due to the reasoning capabilities of LLMs [24]. Modifying instructions to support a particular need massively boosts the performance of large language models [25]. Even though some authors have argued that LLMs are not zero-shot communicators because they fail to understand context, LLMs have significantly improved in understanding contexts and providing outputs that are near-perfect with respect to human interpretation [26]. This makes the zero-shot approach suitable for solving many tasks, including resource allocation.
GPT is one of the leading LLMs in recent times. Its realistic outputs have been termed as a new era for NLP. Zero-shot analysis of the performance of GPT has been done in various scenarios [27]. Information extraction using GPT has been done for unannotated texts [28]. Medical text de-identification has been done using GPT, which shows promising results [29]. Similarly, a comparison of the zero-shot results of GPT with other Transformer-based models trained on biomedical tasks has been evaluated to show its prowess on unforeseen data [30]. Lyrics transcription using GPT has also been tested with its zero-shot capabilities [31]. Again, it has been shown that zero-shot performances of GPT, along with other LLMs, have room for development while analyzing financial tasks [32]. These motivate testing the zero-shot capabilities of LLMs in resource allocation problems.
As LLMs have shown promising results in their zero-shot performance, it stands to reason that they would be appropriate for resource allocation problems. Specifically, D2D communication requires constant allocation strategies, and many resource allocation techniques are in place for it [33]. Game-theoretic and non-game-theoretic resource allocation strategies have been discussed, especially in D2D communications [34]. Machine learning approaches have also been used for resource allocation: Reinforcement Learning approaches have created a new pathway for research in resource allocation for heterogeneous cellular networks [35]. Unmanned aerial vehicles have been used to study energy allocation for D2D communication [36]. Also, LLM-based RAG for network optimization shows the versatility of LLMs across various platforms and domains [37]. D2D-enabled 6G resource allocation has been performed using Federated Reinforcement Learning [38]. A modification of the Hungarian Algorithm produced near-optimal results for sumrate maximization [39].
2 Methodology
The experiment is conducted on the foundation of [39], where the optimal resource allocation approach for sumrate maximization is compared with the zero-shot performance of LLMs. It involves zero-shot prompt engineering, LLM output generation, simulation of an optimal resource allocation algorithm, and comparison of the LLM output with the algorithm results. The following subsections describe, in order, the steps taken in this experiment to achieve the required results.
2.1 Prompt creation
First, a prompt was crafted which describes the properties of a base station, cellular user equipment (CUE) and device-to-device (D2D) pairs in a 2D grid. The properties which are necessary for sumrate and interference calculations are given in Table 1.
The goal was to input CUE and D2D pair property values in a JSON object and return an assignment matrix that allocates cellular user equipment and D2D pairs in sharing resource blocks (RB) that resulted in the maximum sumrate for the network. We adopted a zero-shot approach where the LLM was free to choose any algorithm.
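The exact field names of the JSON schema are fixed by the prompt (Fig 1) and the properties in Table 1; purely for illustration, an input object of the kind described above might look like the following, where every key is hypothetical:

```python
import json

# Hypothetical shape of the user-role input; the real schema is the one in the prompt (Fig 1)
network = {
    "base_station": {"position": [0, 0], "carrier_frequency_ghz": 1.7},
    "cues": [{"id": 0, "position": [120, 45], "target_snr_db": 10}],
    "d2d_pairs": [{"id": 0, "tx_position": [60, 30], "rx_position": [65, 33],
                   "target_snr_db": 8}],
}

payload = json.dumps(network)          # serialized and sent as the User Role message
assert json.loads(payload) == network  # round-trips losslessly
```

Keeping the input strictly in JSON lets the model be instructed to answer with an equally strict JSON assignment matrix.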
To produce the final prompt, several iterations of improvement were made so that the LLM could comprehend the task. In the first iteration, the problem statement and the input and output JSON formats were provided with the help of TypeScript interfaces. In the second iteration, the TypeScript interfaces were dropped and the parameters were stated explicitly with default base station values. Further conditions were provided to help with allocation, such as that one cellular UE can be paired with only one D2D pair. The second iteration also contained a special instruction that the LLM may choose not to assign some CUEs and D2D pairs to an RB if doing so resulted in a greater network sumrate. The third iteration contained more information about the network: the communication carrier frequency (1.7 GHz) and a special instruction that the number of CUEs is far greater than the number of D2D pairs. In the final iteration, a last condition was added that all the cell objects and the receiver devices in the D2D pairs must attain their target SNR. In all iterations, the LLM was explicitly instructed to respond only with a JSON object containing the assignment matrix. The final prompt used for the experiment is shown in Fig 1.
2.2 Response generation
The open-source Python library of OpenAI is used in this experiment to communicate with GPT-3.5 and generate suitable outputs. The previously crafted prompts were used as the LLM’s System Role message to specify its behaviour and input-output structure. As the User Role message, a JSON object was input containing the base station, CUE and D2D pair info. The desired assignment matrix was returned as the Assistant Role message.
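As a sketch of this flow (the prompt text and the reply's field name are placeholders, and the legacy `openai.ChatCompletion` interface is assumed, matching the library generation contemporary with the gpt-3.5-turbo-16k-0613 snapshot):

```python
import json

def build_messages(system_prompt: str, network_json: str):
    # System role pins down behaviour and the input-output structure;
    # user role carries the JSON description of the base station, CUEs and D2D pairs
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": network_json},
    ]

def parse_assignment(reply: str):
    # The model is instructed to answer with a bare JSON object
    return json.loads(reply)["assignment_matrix"]  # hypothetical field name

# The actual call would look roughly like:
# response = openai.ChatCompletion.create(
#     model="gpt-3.5-turbo-16k-0613", temperature=0,
#     messages=build_messages(prompt, json.dumps(network)))
# matrix = parse_assignment(response.choices[0].message.content)
```

Setting temperature to 0 (as in Sect 3.1.2) makes the sampling as deterministic as the API allows, which matters when comparing runs against a deterministic algorithm.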
2.3 Algorithm simulation
The performance of the LLM is compared with the existing optimal algorithm [39]. For the optimal algorithm, the problem is first transformed into a bipartite matching problem with two sets of nodes, one being the CUEs and the other the D2D pairs. The weight of the edge between a D2D pair and a CUE is their data rate contribution to the total system sumrate while sharing resources. After the transformation, the bipartite graph is solved by assigning D2D pairs to the appropriate CUEs (assignment of a D2D pair to a CUE means they will share the same resource). Before assigning an edge weight, the system makes sure that sharing does not lower the overall data rate and that the SINR (Signal-to-Interference-plus-Noise Ratio) objectives of both the CUE and the D2D pair are met. An algorithm following the Hungarian algorithm approach is used to find the optimal assignment of resources between CUEs and D2D pairs. This algorithm considers only those assignments that satisfy the constraints and whose calculated data rate is better than a baseline (indicating sharing is feasible without reducing the overall rate).
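A minimal sketch of this matching step, using SciPy's Hungarian-method solver and assuming a precomputed edge-weight matrix of rate gains (the SINR/feasibility checks of [39] are folded into the matrix by zeroing out infeasible pairings):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_d2d(rate_gain):
    """rate_gain[i][j]: sumrate contribution of D2D pair i sharing CUE j's resource block.
    Entries <= 0 mark pairings that are infeasible or would reduce the overall rate."""
    gain = np.maximum(np.asarray(rate_gain, float), 0.0)
    rows, cols = linear_sum_assignment(-gain)  # the Hungarian solver minimizes, so negate
    # Keep only assignments that actually improve the sumrate
    return [(i, j) for i, j in zip(rows, cols) if gain[i, j] > 0]

print(assign_d2d([[5, 1], [-2, 3]]))  # -> [(0, 0), (1, 1)]
```

In the example, pairing D2D 1 with CUE 0 would reduce the rate (gain −2), so the solver matches each D2D pair with its feasible, gain-maximizing CUE instead.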
As shown in Fig 2, the unshared sumrate represents the total sumrate of the system if no D2D pair is sharing resources of any CUE. In essence, it indicates the total system sumrate of CUEs. On the other hand, the Hungarian Graph (not the optimal algorithm) is the solution to the problem following the Hungarian algorithm without considering the situation where an assignment may reduce the overall sumrate.
3 Results and discussion
3.1 Experimental setup
3.1.1 Hardware and software environments.
The hardware environments used for the experiment are:
- CPU: i7-1165G7 @ 2.80 GHz
- RAM: 8 GB DDR4 @ 3200 MHz
The software environments used for the experiment are:
- Environment: Jupyter Notebook
- Python Version: 3.11
- C++ Version: 17
3.1.2 Parametric values of the LLMs.
The OpenAI chat completion parameters are:
- Model: gpt-3.5-turbo-16k-0613
- Temperature: 0
- Context Window: 16,385 tokens
- Max Output Tokens: 4,096
There are some parametric values necessary for the implementation of the algorithms. To make the outcomes comparable, the LLMs are fed those values in the prompt and also in a JSON file attached to the prompt. The values are as follows (Table 2):
3.2 Empirical results
For the experiment, the total Cellular UEs are varied from 1 to 20, and the total D2D pairs are likewise varied from 1 to 20. To ensure proper visualization of the sumrates calculated by GPT, we present the outputs for Cellular UE and D2D pair counts from 5 to 16, rounded to the nearest integer.
Table 3 shows the sumrate calculated for different combinations of Total D2D pairs and Total CUEs. It is evident that LLMs are able to pick up patterns based on the characteristics of the features. To be more precise, the LLM could understand that increasing the Total CUEs increases the sumrate while increasing the Total D2D pairs decreases it.
3.3 Evaluation metrics
3.3.1 MSE, RMSE and MAE.
MSE is used to quantify how close the outputs given by the LLMs lie to the values of the optimized algorithm. To express the error in the same unit as the response variable and obtain the average divergence between the optimized algorithm and the LLM output, RMSE is used. Along with these metrics, MAE is used to find the overall absolute disparity between the paired observations, which helps gauge the accuracy of continuous variables.
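These three metrics can be computed directly from the paired sumrate observations; a minimal sketch with illustrative numbers:

```python
import numpy as np

def error_metrics(optimal, predicted):
    """MSE, RMSE and MAE of the LLM sumrates against the optimal-algorithm sumrates."""
    err = np.asarray(predicted, float) - np.asarray(optimal, float)
    mse = float(np.mean(err ** 2))
    return {"MSE": mse, "RMSE": mse ** 0.5, "MAE": float(np.mean(np.abs(err)))}

m = error_metrics([10.0, 12.0, 14.0], [9.0, 12.0, 12.0])
# err = [-1, 0, -2], so MSE = 5/3, RMSE = sqrt(5/3), MAE = 1
```

RMSE being the square root of MSE is what puts the error back in the same unit as the sumrate itself.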
3.3.2 Performance ratio.
The ratio of a calculated value to the corresponding optimal value is known as the Performance Ratio or Efficiency Ratio. It measures the efficiency of an approach with respect to an optimal approach. In this experiment, the optimal algorithm is the one described in Sect 2.3: Algorithm simulation, and the calculated sumrate is the output generated from the LLMs. The Performance Ratio (PR) is then:

PR = Sumrate_LLM / Sumrate_optimal
The Average PR is calculated for each value of D2D pairs, keeping the Total Cellular UEs constant, and vice versa. Finally, the Mean Average Performance Ratio (MAPR) is the mean of all the calculated Average PRs, which can be written as:

MAPR = (1 / (m · n)) · Σ_{i=1..m} Σ_{j=1..n} PR(i, j)

where m is the number of Total Cells considered for that instance, n is the number of Total D2D pairs considered for that instance, and PR(i, j) is the Performance Ratio of the instance where the Total Cells and Total D2D pairs are i and j respectively.
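Since every row of the PR grid has the same number of entries, the mean of the Average PRs reduces to the grand mean over the m × n grid; a minimal sketch:

```python
import numpy as np

def mapr(pr_grid):
    """pr_grid[i][j]: Performance Ratio with i+1 Total Cells and j+1 Total D2D pairs.
    MAPR = (1 / (m * n)) * sum of PR(i, j), i.e. the grand mean of the grid."""
    return float(np.mean(np.asarray(pr_grid, float)))

print(mapr([[0.5, 0.75], [0.25, 0.5]]))  # -> 0.5
```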
3.4 Comparison between GPT and simulated algorithms
Some of the widely used algorithms for this task are the optimal algorithm from [39], Bipartite Matching, and Unshared (the sumrate of Cellular UEs without any D2D pairs). The optimal algorithm produces the best results, so it is taken as the ground truth when calculating the errors. Fig 2 shows that most of the green dots sit higher than most of the blue dots, suggesting that the Hungarian Graph algorithm produces a higher sumrate than GPT. The optimal algorithm's sumrate remains steady as the variables change. The main takeaway from the figure is that LLMs capture the essence of the task, as they produce results similar to established algorithms like the Hungarian Algorithm.
3.4.1 Error calculation.
It is evident that the unshared algorithm produces very similar results to the optimized algorithm from [39]. This is because D2D pairs are not considered in that case; when they are considered, large errors can arise. GPT yields an MSE that is 27% greater than that of the Bipartite Graph Matching Algorithm, one of the algorithms presently in use. It is expected that GPT will not surpass the results produced by these algorithms, as it is not fine-tuned for this specialized field, and that is exactly what is observed. The purpose of using LLMs like GPT for such a task is to find a sub-optimal result quickly and make fast decisions without the hassle of implementing complex algorithms (Table 4).
3.4.2 Efficiency calculation.
For calculating the efficiency of the output from GPT with respect to the optimal results, the Performance Ratio is taken as one of the evaluation metrics. The value of the total number of Cellular UEs and the total number of D2D pairs are both taken from 1 to 20 in all possible combinations. For visualization purposes, a portion of the Performance Ratio is shown, along with the Average Performance Ratio and the Mean Average Performance Ratio (MAPR):
In Table 5, the value highlighted in red is the calculated MAPR of GPT with respect to the optimal algorithm. Looking at the continuous progression of the Average PR across each assignment of total CUE, we get: [0.267, 0.432, 0.535, 0.607, 0.592, 0.58, 0.609, 0.638, 0.624, 0.633, 0.644, 0.653, 0.653, 0.647, 0.656, 0.652, 0.649, 0.655, 0.662, 0.656]. This means the LLM performs better as the total CUE increases. Again, looking at the continuous progression of the Average PR across each assignment of total D2D pairs, we get: [0.874, 0.802, 0.762, 0.719, 0.685, 0.652, 0.614, 0.591, 0.578, 0.541, 0.531, 0.502, 0.496, 0.455, 0.441, 0.417, 0.422, 0.393, 0.341, 0.341]. This means the LLM performs better when there are fewer total D2D pairs.
4 Result analysis
4.1 Performance comparison of the state-of-the-art LLMs
The Bipartite Graph Matching Algorithm shows outstanding sumrate maximization but is not fully optimized when the number of D2D pairs grows for a fixed total number of Cellular UEs, which leads to inferior throughput. A corresponding remark can be made about the sumrate results produced by GPT. This demonstrates how well the GPT error calculations match those of the Bipartite Graph Matching Algorithm when rounded to the nearest integer. The result suggests that GPT naturally operates in a manner that produces outcomes similar to those of the Bipartite Matching Algorithm.
LLMs such as LLaMA-2, PaLM, etc., are observed to hallucinate heavily on the same prompts used for the GPT experiments. Assignment operations rely heavily on structured data, i.e., JSON. The other LLMs failed to provide structured responses, and thus the shared assignment matrix could not provide meaningful outcomes.
4.2 Efficiency of LLM in sumrate maximization
While maximization can be quite a straightforward task for any AI system, sumrate maximization with several networking constraints is difficult to comprehend. GPT, the frontrunner in this task, also struggles to produce praiseworthy results. Compared with the optimized algorithm, GPT has a MAPR of only 0.58. This essentially means that if the total number of cellular users and D2D pairs are between 1 and 20, we can expect the current GPT model to provide assignments that yield a sumrate around 58% of that of the efficient algorithm currently in use.
4.3 Improving LLM for network optimization
The underlying architecture of LLMs can be modified for specific tasks. Fine-tuned LLMs have proved more useful than generic LLMs when not dealing with diversified tasks [25, 30]. For sumrate maximization, a modified LLM trained on a large corpus of data specific to sumrate calculation and maximization would likely yield better results. However, the scope of this manuscript is the zero-shot capabilities of generic LLMs on sumrate maximization.
Aside from this, instead of using LLMs to produce resource allocation outputs directly, they can assist the user by producing the output of a subtask. The subtask can pin down specific protocols and assignments, which would reduce the search domain for state-of-the-art algorithms like the Bipartite Matching algorithm.
5 Limitations
LLM performance depends mostly on the data used to train it. This is evident in this experiment as GPT, trained with significantly more data, performed much better than other LLMs. Another performance consideration is the context size of the LLMs. GPT is optimized for large contexts, so it hallucinates rarely compared to other LLMs with shorter context sizes. The experiment input prompts had a fairly large number of tokens, which caused other LLMs to hallucinate often. GPT is relatively optimized for structured responses with function calling and JSON mode options, while other models require fine-tuning to produce meaningful structured outputs [40]. This made GPT ideal for this experiment as it could process structured data more reliably than other LLMs which lack JSON data optimizations.
The LLMs are language models by design, and even though they have reasoning capabilities that make them suitable for many unforeseen tasks, these models are still not capable of providing outputs that can compete with existing algorithmic approaches for a couple of reasons. Firstly, LLMs are not designed to run modified algorithmic codes in the background. Secondly, LLMs are non-deterministic by nature, which is not expected while performing such tasks.
6 Conclusion and future work
The reasoning capability of LLMs on many tasks has been evaluated and shows satisfactory performance. However, their reasoning ability on complex tasks like sumrate maximization was unexplored. Though they can produce satisfactory outputs, it is observed that they still cannot compete with state-of-the-art algorithms or approaches. In this experiment, the efficiency of LLMs, GPT specifically, is seen to be only around 58%. In cases where numerical and structural information does not play a major role, LLMs can produce usable outputs, and the fact that no prior knowledge is necessary makes them a very useful tool. For sumrate maximization, when time, knowledge, and resource constraints hinder implementation, LLMs can be an alternative approach for sub-optimal assignments.
One of the major drawbacks of LLMs is that they need to be modified to support various tasks. This is one of the improvements that can be integrated into this experiment in the future. With proper fine-tuning, many LLMs that are unusable now can become useful for this specific task [40]. Along with that, changing the desired output structures can sometimes yield a better output.
References
- 1. Floridi L, Chiriatti M. GPT-3: its nature, scope, limits, and consequences. Minds Mach. 2020;30:681–94.
- 2. Feine J, Morana S, Gnewuch U. Measuring service encounter satisfaction with customer service chatbots using sentiment analysis. 2019.
- 3. Araci D. FinBERT: financial sentiment analysis with pre-trained language models. arXiv preprint 2019. https://arxiv.org/abs/1908.10063
- 4. Touvron H, Lavril T, Izacard G, Martinet X, Lachaux MA, Lacroix T. LLaMA: open and efficient foundation language models. arXiv preprint 2023. https://arxiv.org/abs/2302.13971
- 5. Chowdhery A, Narang S, Devlin J, Bosma M, Mishra G, Roberts A. PaLM: scaling language modeling with pathways. J Mach Learn Res. 2023;24(240):1–113.
- 6. Miao Y, Bai Y, Chen L, Li D, Sun H, Wang X. An empirical study of NetOps capability of pre-trained large language models. arXiv preprint 2023.
- 7. Wu S, Koo M, Blum L, Black A, Kao L, Scalzo F. A comparative study of open-source large language models, GPT-4 and Claude 2: multiple-choice test taking in nephrology. arXiv preprint 2023.
- 8. Lou X, Guo J, Zhang J, Wang J, Huang K, Du Y. PECAN: leveraging policy ensemble for context-aware zero-shot human-AI coordination. arXiv preprint 2023. https://arxiv.org/abs/2301.06387
- 9. Bommarito J, Bommarito M, Katz DM, Katz J. GPT as knowledge worker: a zero-shot evaluation of (AI) CPA capabilities. arXiv preprint 2023.
- 10. Weeraddana PC, Codreanu M, Latva-aho M, Ephremides A, Fischione C. Weighted sum-rate maximization in wireless networks: a review. Found Trends Netw. 2012;6(1–2):1–163.
- 11. Tan CW, Chiang M, Srikant R. Fast algorithms and performance bounds for sum rate maximization in wireless networks. IEEE/ACM Trans Netw. 2012;21(3):706–19.
- 12. Petrović N, Koničanin S, Suljović S. ChatGPT in IoT systems: Arduino case studies. In: MIEL. 2023.
- 13. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Adv Neural Inf Process Syst. 2017;30.
- 14. Saharia C, Chan W, Saxena S, Li L, Whang J, Denton EL, et al. Photorealistic text-to-image diffusion models with deep language understanding. Adv Neural Inf Process Syst. 2022;35:36479–94.
- 15. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint 2018. https://arxiv.org/abs/1810.04805
- 16. Zhou C, Li Q, Li C, Yu J, Liu Y, Wang G. A comprehensive survey on pretrained foundation models: a history from BERT to ChatGPT. arXiv preprint 2023. https://arxiv.org/abs/2302.09419
- 17. Deb A, Oza N, Singla S, Khandelwal D, Garg D, Singla P. Fill in the blank: exploring and enhancing LLM capabilities for backward reasoning in math word problems. arXiv preprint 2023. https://arxiv.org/abs/2310.01991
- 18. Zhang J, Wang X, Ren W, Jiang L, Wang D, et al. A thought structure for coherent and correct LLM reasoning. Proc AAAI Conf Artif Intell. 2025;39(25):26733–41.
- 19. Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, et al. Chain-of-thought prompting elicits reasoning in large language models. Adv Neural Inf Process Syst. 2022;35:24824–37.
- 20. Kasneci E, Seßler K, Küchemann S, Bannert M, Dementieva D, Fischer F. ChatGPT for good? On opportunities and challenges of large language models for education. Learn Individ Differ. 2023;103:102274.
- 21. Wei J, Tay Y, Bommasani R, Raffel C, Zoph B, Borgeaud S. Emergent abilities of large language models. arXiv preprint 2022. https://arxiv.org/abs/2206.07682
- 22. Kabir M, Islam MS, Laskar MTR, Nayeem MT, Bari MS, Hoque E. BenLLMEval: a comprehensive evaluation into the potentials and pitfalls of large language models on Bengali NLP. arXiv preprint 2023. https://arxiv.org/abs/2309.13173
- 23. Valmeekam K, Olmo A, Sreedharan S, Kambhampati S. Large language models still can't plan (a benchmark for LLMs on planning and reasoning about change). arXiv preprint 2022. https://arxiv.org/abs/2206.10498
- 24. Kojima T, Gu SS, Reid M, Matsuo Y, Iwasawa Y. Large language models are zero-shot reasoners. Adv Neural Inf Process Syst. 2022;35:22199–213.
- 25. Wei J, Bosma M, Zhao VY, Guu K, Yu AW, Lester B. Finetuned language models are zero-shot learners. arXiv preprint 2021.
- 26. Ruis L, Khan A, Biderman S, Hooker S, Rocktäschel T, Grefenstette E. Large language models are not zero-shot communicators. arXiv preprint 2022. https://arxiv.org/abs/2210.14986
- 27. Qin C, Zhang A, Zhang Z, Chen J, Yasunaga M, Yang D. Is ChatGPT a general-purpose natural language processing task solver? arXiv preprint 2023.
- 28. Wei X, Cui X, Cheng N, Wang X, Zhang X, Huang S, et al. Zero-shot information extraction via chatting with ChatGPT. arXiv preprint 2023. https://arxiv.org/abs/2302.10205
- 29. Liu Z, Yu X, Zhang L, Wu Z, Cao C, Dai H, et al. DeID-GPT: zero-shot medical text de-identification by GPT-4. arXiv preprint 2023.
- 30. Jahan I, Laskar MTR, Peng C, Huang J. Evaluation of ChatGPT on biomedical tasks: a zero-shot comparison with fine-tuned generative transformers. arXiv preprint 2023. https://arxiv.org/abs/2306.04504
- 31. Zhuo L, Yuan R, Pan J, Ma Y, Li Y, Zhang G. LyricWhiz: robust multilingual zero-shot lyrics transcription by whispering to ChatGPT. arXiv preprint 2023.
- 32. Shah A, Chava S. Zero is not hero yet: benchmarking zero-shot performance of LLMs for financial tasks. arXiv preprint 2023. https://arxiv.org/abs/2305.16633
- 33. Jayakumar S. A review on resource allocation techniques in D2D communication for 5G and B5G technology. Peer-to-Peer Netw Appl. 2021;14:243–69.
- 34. Rathi R, Gupta N. Game theoretic and non-game theoretic resource allocation approaches for D2D communication. Ain Shams Eng J. 2021;12(2):2385–93.
- 35. Zhi Y, Tian J, Deng X, Qiao J, Lu D. Deep reinforcement learning-based resource allocation for D2D communications in heterogeneous cellular networks. Digit Commun Netw. 2022;8(5):834–42.
- 36. Xu YH, Sun QM, Zhou W, Yu G. Resource allocation for UAV-aided energy harvesting-powered D2D communications: a reinforcement learning-based scheme. Ad Hoc Netw. 2022;136:102973.
- 37. Zeeshan HMA, Umer M, Akbar M, Kaushik A, Jamshed MA, Jung H, et al. LLM-based retrieval-augmented generation: a novel framework for resource optimization in 6G and beyond wireless networks. IEEE Commun Mag. 2025.
- 38. Guo Q, Tang F, Kato N. Federated reinforcement learning-based resource allocation in D2D-enabled 6G. IEEE Network. 2022.
- 39. Hussain F, Hassan MY, Hossen MS, Choudhury S. An optimal resource allocation algorithm for D2D communication underlaying cellular networks. In: 2017 14th IEEE Annual Consumer Communications & Networking Conference (CCNC). 2017. p. 867–72.
- 40. Tang X, Zong Y, Zhao Y, Cohan A, Gerstein M. Struc-Bench: are large language models really good at generating complex structured data? arXiv preprint 2023. https://arxiv.org/abs/2309.08963