Residual-aided CSI-free end-to-end learning for multiuser MIMO

Emmanuel Ampoma Affum; Osumanu Futa; Maxwell Afriyie Oppong; Daniel Owusu Biney

doi:10.1371/journal.pone.0344696

Abstract

A paradigm shift from Channel State Information (CSI)-dependent architectures to intelligent, AI-native air interfaces is required as 6G wireless systems advance. Conventional Multi-User Multiple-Input Multiple-Output (MU-MIMO) systems have substantial pilot overhead and computational complexity since they rely on explicit CSI for beamforming and interference management. This study suggests a novel Deep Unfolding Successive Over-Relaxation (DU-SOR) paradigm to overcome these constraints. In contrast to conventional end-to-end learning techniques that operate as “black boxes,” DU-SOR combines iterative residual refining with a sparse Graph Transformer. The network can intuitively solve the inverse problem without explicit channel matrix inversion thanks to this novel architecture, which uses graph priors to condition the signal estimation. Extensive empirical analyses show that the proposed framework accomplishes three main goals: (i) near-optimal performance, confirmed by a mutual information score of 0.98 at 20 dB SNR; (ii) mathematically proven scalable complexity, reducing the scaling order from to via sparse attention mechanisms; and (iii) robust generalisation across various channel conditions (Rayleigh, Rician, 3GPP UMi). This work offers a scalable foundation for sustainable AI-native 6G receivers by combining sparse-graph efficiency with CSI-free operation.

Citation: Affum EA, Futa O, Oppong MA, Biney DO (2026) Residual-aided CSI-free end-to-end learning for multiuser MIMO. PLoS One 21(4): e0344696. https://doi.org/10.1371/journal.pone.0344696

Editor: Daosen Zhai, Northwestern Polytechnical University, CHINA

Received: December 4, 2025; Accepted: February 24, 2026; Published: April 24, 2026

Copyright: © 2026 Affum et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: This study’s simulation code is all freely accessible at Zenodo: https://doi.org/10.5281/zenodo.19355766. The repository includes all channel models, baseline detector implementations, training and evaluation scripts, and full Python/PyTorch implementations of the DU-SOR framework. All results are derived from computational simulations with parameters fully defined in the Materials and Methods section; no experimental datasets were created.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

A paradigm change towards intelligent AI-native networks is signalled by the switch from 5G to 6G wireless technologies [1]. Communication performance in 5G and earlier generations has depended on accurate Channel State Information (CSI) [2–4], especially in Multiple-Input Multiple-Output (MIMO) systems. Beamforming, spatial multiplexing, and interference control rely on CSI, which describes the propagation environment between transmitters and receivers, to optimise throughput and reliability [5]. However, there are significant challenges in gathering accurate CSI in dynamic, multi-user, massive MIMO systems. These include excessive computational complexity, significant pilot and feedback overhead, and intrinsic privacy risks [3,6].

In large MIMO environments, typical pilot-based estimating takes 10–20% of radio resources, resulting in a basic bottleneck that inhibits scalability and efficiency as networks grow denser and more dynamic [7]. Fig 1 illustrates these multi-dimensional issues, such as heterogeneity, real-time accuracy requirements, and signalling overhead limits. Furthermore, because of the matrix inversion of [8], the computing complexity of linear detectors such as MMSE scales cubically with the number of users . The conceptual transition from CSI-dependent to CSI-free designs is shown in Fig 2.

Download:

Fig 1. Key problems in conventional MU-MIMO systems.

The figure demonstrates the multi-dimensional issues including heterogeneity in user equipment, large data quantities, real-time accuracy requirements, and the necessity for low signalling overhead. The creation of CSI-free architectures is driven by these difficulties.

https://doi.org/10.1371/journal.pone.0344696.g001

Download:

Fig 2. Conceptual evolution from CSI-dependent to CSI-free AI-native architectures.

The conventional method (left) necessitates explicit channel estimation blocks, which cause slowness and errors. By combining these features into a single learning framework, the suggested AI-native method (right) removes pilot overhead.

https://doi.org/10.1371/journal.pone.0344696.g002

Deep learning (DL), a revolutionary technology [9–11], enables end-to-end (E2E) learning, where neural networks collaboratively optimise all signal processing components. Fig. 3 illustrates the conceptual difference between AI-native communication paradigms and model-based optimisation. Nevertheless, there isn’t a single method in the literature that addresses strong generalisation, computational scalability, and overhead removal at the same time. Model-free methods such as DeepRx [12] are opaque, difficult to understand, and expensive to train. Explicit CSI is usually necessary for model-based methods such as OAMP-Net [13] to work. Although they show promise, recent Graph Neural Networks (GNNs) [14] frequently have excessive latency because of full-graph attention methods.

Download:

Fig 3. Comparison of signal processing paradigms.

(Left) Explicit channel matrices are used in classical linear detection. (Centre) Iterative SOR techniques reveal consecutive stages of over-relaxation. (Right) Deep learning-based methods use data to directly learn the mapping.

https://doi.org/10.1371/journal.pone.0344696.g003

This work suggests the DU-SOR (Deep Unfolding Successive Over-Relaxation) architecture to fill these gaps. The identified research gaps and how they relate to our research questions are summarised in Table 1.

Download:

Table 1. Identified research gaps and RQ alignment.

https://doi.org/10.1371/journal.pone.0344696.t001

Contributions

Our specific contributions are:

Novel Architecture: We provide a synergistic combination of residual refinement and sparse graph transformers. Our approach employs graph priors to condition blind residual updates, enabling implicit channel inversion, in contrast to DeepRx (pure CNN) or OAMP-Net (needs CSI).
Theoretical Rigour: We present a convergence theorem based on contraction mapping principles (Theorem 1) and a formal complexity analysis demonstrating scaling (Proposition 1).
Verified Scalability: In comparison to post-2023 GNN baselines [14,15], we show through direct hardware measurement using NVIDIA Management Library (NVML) that our sparse attention technique considerably reduces VRAM utilisation and power consumption.

Distinction from Prior Work. Our DU-SOR framework is the first to: (a) eliminate CSI dependency through implicit learning of interference patterns; (b) provide mathematically proven complexity via sparse attention; and (c) guarantee convergence via contraction mapping principles. In contrast, previous works like OAMP-Net [13] require explicit CSI matrices and DeepRx [12] uses black-box architectures without complexity guarantees. All three of the identified gaps (G1–G3) are simultaneously addressed by this combination.

Materials and methods

A networked E2E autoencoder that has been collaboratively tuned to handle CSI-free multi-user detection is used in the suggested framework. The system architecture, the channel modelling framework, and the mathematical formulas directing the residual-aided learning procedure are all covered in this part.

System model and assumptions

The system aims for an uplink MU-MIMO configuration in which a BS with N antennas (N ≫ K) receives transmissions from K single-antenna UEs. The model for the received signal is:

(1)

where is the channel matrix, contains transmitted symbols (), and is AWGN [16,17]. Key assumptions include block fading spanning coherence intervals of length T = 200 symbols and perfect time synchronisation [18]. Unlike traditional systems [19] which allocate symbols for pilots, we assume .

Channel modelling framework

To ensure robust generalisation, the system is evaluated using a range of channel models.

Small-scale fading models

Rayleigh Fading represents Non-Line-of-Sight (NLOS) conditions with rich dispersion. The coefficients are i.i.d. complex Gaussian: .

Rician Fading includes a Line-of-Sight (LOS) element:

(2)

where dB is the Rician factor [20,21].

Standardised and correlated models

3GPP Urban Microcell (UMi) is based on TR 38.901 [16], modelling realistic path loss and shadowing at 3.5 GHz. For spatial correlation, we apply the Kronecker model:

(3)

where R_BS and R_UE are the correlation matrices [22,23]. Time-varying channels incorporate mobility effects using Jakes’ spectrum [24].

Residual-aided autoencoder architecture

Lightweight encoders at UEs and a sophisticated decoder at the BS make up the framework.

User-side encoder.

Each UE encodes a bit sequence into a complex symbol x_k. The architecture consists of:

Feature Extraction: Temporal patterns are extracted by a 1-D CNN using 128 feature maps and a kernel size of 3.

Residual Block: To prevent gradient vanishing, a residual mapping is used:

(4)

where is a two-layer CNN [25,26].

Symbol Mapping: The features are projected to complex symbols x_k by a fully connected layer, then normalised to meet the power constraint.

Base station decoder.

The input signal y is processed by the BS decoder employing Iterative Residual Refinement and a Graph Transformer Module. The entire process is described in Algorithm 1.

Algorithm 1 Residual-Aided CSI-Free Detection (DU-SOR)

1: Input: Received signal y, Max Iterations T

2: Output: Detected symbols

3: Construct Graph via k-NN on antenna features

4: Initialize estimate

5: for t = 0 to T − 1 do

6: Extract Features:

7: Sparse Attention:

8:

9: (A is sparse mask)

10: Residual Update:

11:

12: (Learned relaxation)

13:

14: end for

15:

Graph transformer module with sparse attention

Antenna signals are represented as nodes in a graph by the Graph Transformer Module. We use a Sparse Attention technique to decrease complexity, where interactions are limited to the top-k nearest neighbours by the adjacency matrix A:

(5)

where Q, K, V are the query, key, and value matrices, and ⊙ represents element-wise multiplication with the sparsity mask [15,27].

Iterative residual refinement

The decoder unfolds T iterations of residual updates, inspired by deep unfolding techniques [13,28]:

(6)

where is a parameter-shared MLP that predicts the residual error.

Connection to classical SOR

The Successive Over-Relaxation (SOR) method for solving linear systems Ax = b has the classical form:

(7)

where r^(t) = b − Ax^(t) is the residual, D and L are the diagonal and lower triangular parts of A, and is the relaxation parameter [17]. Our framework generalises this by: (i) replacing the fixed linear operator with a learned nonlinear mapping , and (ii) making the relaxation factor data-adaptive. Specifically, our update becomes:

(8)

where is a sigmoid function scaled to (0, 2) and is the predicted residual correction. This learned relaxation enables automatic tuning to channel conditions, generalising SOR convergence guarantees while maintaining the intuitive residual-correction structure. The term “deep unfolding” refers to this practice of mapping iterative algorithm steps to neural network layers with learnable parameters [28,29].

Graph construction details

The k-NN graph is constructed as follows:

Node Features. Each node v_n (corresponding to antenna n) is associated with a feature vector:

(9)

where and are the magnitude and phase of the received signal, and are normalised antenna position coordinates.

Distance Metric. Edges are determined using Euclidean distance in the feature space:

(10)

Graph Sparsity. We use k = 8 nearest neighbours based on sensitivity analysis (see Ablation Studies). Each node connects to its k closest neighbours, yielding edge set with .

Static vs. Dynamic Construction. For computational efficiency, we employ a hybrid approach: the base graph topology is precomputed from antenna geometry (static), while edge weights are dynamically updated each forward pass based on received signal features. This balances adaptivity with inference speed. Formally:

(11)

where denotes the k nearest neighbours of node i based on geometry, and is a learnable temperature parameter.

Theoretical analysis of convergence

We examine the residual update as a fixed-point iteration to ensure the iterative process in Algorithm 1 converges to a stable solution.

Lemma 1 (Spectral Normalisation Bound). Let W be a weight matrix with spectral normalisation applied, i.e., where is the largest singular value. Then for any input u, the linear mapping has Lipschitz constant exactly 1.

Proof. By definition, , since spectral normalisation ensures . □

Theorem 1 (Convergence of Residual Refinement). Let the iteration be

and assume that the learned mapping satisfies:

1. Lipschitz continuity: There exists L_G > 0 such that

2. Strong monotonicity (descent property): There exists such that

If the relaxation parameter satisfies

(12)

then the operator

is a contraction mapping, and the iteration converges linearly to a unique fixed point .

Proof. For any u,v, we compute

(13)

Using strong monotonicity and Lipschitz continuity:

Thus,

Define

Condition (12) ensures q < 1, hence F^(t) is a contraction. By the Banach Fixed Point Theorem, the iteration converges to a unique fixed point with linear rate q. □

Remark 1 (Interpretation of Assumptions). The strong monotonicity condition reflects the fact that the residual network is trained to approximate a descent direction of the detection loss, i.e., . Lipschitz continuity is enforced through spectral normalisation of all weight matrices and output scaling. Empirically, we verify that the effective Lipschitz constant remains below 1 throughout training (see S3 Fig), and that the learned relaxation remains within the stable contraction regime.

Remark 2 (Empirical Stability Analysis). Theorem 1 establishes that strong monotonicity () is a sufficient condition for linear convergence. However, our empirical results (S3 Fig) demonstrate that the trained network operates in a neutral stability regime, where both the Lipschitz constant L and monotonicity parameter approach zero (L ≈ 0.007, ).

This behaviour indicates that the DU-SOR network has learned a highly efficient quasi-one-shot estimation strategy. Rather than relying on iterative corrections that depend strongly on the previous state x^(t), the residual module learns to predict the optimal correction vector directly from the received signal y. This renders the mapping nearly independent of the current state x^(t), resulting in a vanishing Lipschitz constant (L ≪ 1).

Mathematically, when L → 0, the update operator satisfies:

indicating that the iteration neither contracts nor expands distances—a neutral fixed-point behaviour. This guarantees unconditional stability and enables rapid convergence, effectively bypassing the slower descent trajectory predicted by classical iterative theory. The network has thus discovered an estimation strategy that is more direct than the iterative refinement framework it was designed to implement.

Loss function

A composite loss function balances multiple objectives:

(14)

where is the multi-task block error rate loss averaged over users, is a mutual information regulariser (), and the penalty () promotes sparsity in the model weights.

Training methodology

Curriculum learning strategy.

A curriculum learning method is used to keep the network from diverging under high-interference conditions. Training begins with loud noise/low SNR (0 dB) settings before moving on to higher SNR levels (up to 30 dB).

Meta-learning initialisation.

The network weights are initialised using Model-Agnostic Meta-Learning (MAML). The network is pre-trained on a distribution of channel tasks, providing a favourable starting point that enables rapid adaptation.

Complexity and scalability analysis

Graph-system mapping.

We first clarify the relationship between graph structure and system dimensions. The graph is constructed with nodes corresponding to BS antenna elements. In massive MIMO systems with loading ratio (typically ), both K and N scale together. We express complexity in terms of K for comparison with user-centric baselines.

Linear detectors (MMSE).

The MMSE detector requires calculating . The matrix inversion of a K × K matrix is the dominant operation:

(15)

This cubic scaling renders MMSE impractical for large K [30].

Proposed DU-SOR network

Proposition 1 (Computational Complexity). For the proposed DU-SOR detector with sparse k-NN graph attention where for constant c > 0, the computational complexity per iteration scales as where d is the feature dimension. With fixed loading ratio , this translates to complexity.

Proof. Step 1: Sparse Attention Complexity. Standard dense attention computes , requiring operations. Our sparse formulation (Eq. 5) restricts computation to non-zero entries in the adjacency mask A. With k-NN sparsity, each node attends to exactly k neighbours, yielding total attention computations, each requiring operations.

Step 2: Sparsity Justification. The choice is motivated by: (i) theoretical results showing that neighbours suffice to preserve spectral properties of random geometric graphs [14], and (ii) empirical studies of antenna coupling in massive MIMO arrays where significant correlation exists only among geometrically proximate elements [30]. For uniform linear arrays, coupling strength decays exponentially with antenna separation, justifying sparse connectivity.

Step 3: Complexity Derivation. With , the attention complexity becomes:

(16)

The MLP layers contribute , which is dominated by attention for typical in our architecture. For L iterations with fixed loading : (17) □

Conditions and limitations

The bound holds under: (i) fixed loading ratio , (ii) sparse antenna coupling structure amenable to k-NN approximation, and (iii) . For highly correlated channels requiring denser graphs, complexity may approach in the worst case.

Experimental setup

The experimental evaluation was conducted using a simulated uplink MU-MIMO system with users and N = 128 BS antennas. NVIDIA A100 GPUs with 40 GB of VRAM were used to create the framework in PyTorch 2.1. Over 500 epochs of training were conducted using the Adam optimiser (, ) with a cosine decay learning rate schedule (10⁻³ to 10⁻⁵). The statistical models mentioned above were used to construct channel realisations. All baseline techniques (DeepRx, OAMP-Net, and GNN-Detector) were assessed using their published setups to guarantee a fair comparison.

Results

CSI-free operation and performance (RQ1)

The suggested method outperforms previous GNN-based detectors [12–14], DeepRx (BLER ≈ 10⁻²), and OAMP-Net with a BLER of 10⁻³ at 15 dB SNR. At the 10⁻³ BLER operational point, performance is barely 1.0 dB away from the MMSE-genie bound. The Genie-MMSE bound establishes the lowest possible error floor by using perfect CSI, which is not obtainable in practice, as a theoretical baseline. The pre-log penalty disappears when , resulting in an 18% increase in spectral efficiency over MMSE-based systems [8,31]. Fig 4 compares complexity scaling, and Fig 5 presents extensive FLOPs measurements across various user counts.

Download:

Fig 4. FLOPs vs. number of users (log scale).

In contrast to the scaling of conventional MMSE detectors and the scaling of full-graph GNN approaches, the suggested residual-aided framework exhibits scaling, allowing for feasible deployment in massive MIMO systems.

https://doi.org/10.1371/journal.pone.0344696.g004

Download:

Fig 5. Detailed computational complexity comparison.

FLOPs (in millions) versus number of users K for the proposed DU-SOR method compared to MMSE, OAMP-Net, and DeepRx baseliness. The proposed method demonstrates consistently lower computational requirements across all user counts.

https://doi.org/10.1371/journal.pone.0344696.g005

A detailed breakdown of computational complexity across different user counts is presented in Fig 5. The proposed DU-SOR framework consistently requires fewer FLOPs than all baseline methods, with the gap widening as the number of users increases, confirming the scaling advantage.

Hardware resources and energy (RQ2)

An NVIDIA A100 GPU was used to measure hardware parameters in order to guarantee a thorough examination. Instead of using Thermal Design Power (TDP) estimations, power consumption was measured using the NVIDIA Management Library (NVML) polling at 10 ms intervals during inference batches. The detailed resource consumption metrics are summarised in Table 2.

Download:

Table 2. Hardware resource consumption (K = 64 users).

https://doi.org/10.1371/journal.pone.0344696.t002

The Energy-Delay Product (EDP) showed a 32% reduction compared to OAMP-Net [13]. The inference latency scaling with respect to the number of users is illustrated in Fig 6. The proposed method maintains sub-10 ms latency even at K = 64 users, satisfying real-time processing requirements for 5G NR and beyond.

Download:

Fig 6. Inference latency versus number of users.

Latency (ms) as a function of the number of users K for the proposed DU-SOR method and baseline approaches. The sparse attention mechanism enables the proposed method to maintain lower latency compared to baseline methods across all user counts. The secondary axis displays the Energy-Delay Product (EDP).

https://doi.org/10.1371/journal.pone.0344696.g006

Fig 7 presents the spectral efficiency comparison, with detailed performance across the full SNR range shown in Fig 8. The proposed framework achieves superior spectral efficiency compared to all baselines, particularly at medium-to-high SNR values where the elimination of pilot overhead provides the greatest benefit.

Download:

Fig 7. Spectral efficiency vs. SNR showing an 18% gain.

By removing pilot overhead, the suggested CSI-free architecture significantly increases spectral efficiency, especially at higher SNR values.

https://doi.org/10.1371/journal.pone.0344696.g007

Download:

Fig 8. Spectral efficiency versus SNR comparison.

Detailed spectral efficiency (bits/s/Hz) as a function of SNR (dB) for the proposed DU-SOR method and baseline approaches. The 18% improvement over conventional MMSE-based systems is consistent across the evaluated SNR range.

https://doi.org/10.1371/journal.pone.0344696.g008

Robust generalisation (RQ3)

The system maintained a BLER of 2 × 10⁻³ with 15 dB SNR for K_r = 5 dB Rician fading, within 1.2 dB of the Rayleigh baseline. Up to 100 Hz, the system was resilient to Jakes’ Doppler frequencies; after 200 Hz, there was noticeable deterioration. The SNR gap remained less than 1.5 dB on Kronecker-correlated channels. These findings show that generalisable properties across channel distributions are successfully captured by the meta-learning initialisation.

Robustness under impairments.

BLER stayed below 5 × 10⁻³ with timing offsets of ±0.25 symbol periods. Practical robustness to quantisation effects was demonstrated by the 15% BLER increase with 8-bit ADCs compared to 10-bit.

Ablation studies.

BLER deteriorates to 5 × 10⁻³ at 15 dB (5 × worse) in the absence of residual connections. There was a 20% decline at 20 dB SNR in the absence of the mutual information regulariser. Its crucial role in scalability was confirmed when the sparse attention mask was removed, increasing VRAM utilisation to 4.2 GB.

Graph sparsity (k).

We evaluated neighbours. Performance saturates at k = 8 (BLER = 1.02 × 10⁻³), with k = 4 showing 15% degradation and k ≥ 16 providing marginal gains (<2%) at increased computational cost. We thus adopt k = 8 as the default.

Discussion

Technical resilience (RQ1 and G1)

The ability of the hybrid graph-transformer model to implicitly learn interference patterns eliminates pilot contamination and estimating mistakes found in traditional methods [32,33]. Without any pilot overhead, the framework achieves a BLER of 10⁻³ at 15 dB SNR, proving that CSI-free operation is possible without appreciable performance reduction. This validates Theorem 1, which states that the residual refinement accurately approximates the gradient descent steps of a maximum-likelihood detector.

Sustainability and scalability (RQ2 and G2)

The framework makes massive MIMO computationally tractable by reducing complexity from cubic to linear-logarithmic (as demonstrated in Proposition 1). Global sustainability targets for green networks [29] are in line with the observed decrease in peak power (150 W vs. 235 W for full-graph GNNs). Two important architectural choices—the parameter-shared residual refinement blocks and the sparse attention method, which lowers quadratic attention complexity—are responsible for the scalability gains.

Generalisation and practicality (RQ3 and G3)

Practical implementation depends on the system’s ability to generalise to Rician and 3GPP UMi channels without fine-tuning. While the curriculum learning technique guarantees steady convergence across SNR ranges, the meta-learning initialisation using MAML offers an advantageous starting point that captures common features across channel distributions. Rather than overfitting to particular channel statistics, the strong generalisation seen—maintaining performance within 1.5 dB across a variety of channel conditions—indicates that the learnt representations reflect essential characteristics of multi-user interference.

Practical deployment considerations.

The deployment of learnable encoders at user equipment (UE) raises practical considerations that merit discussion.

Encoder Complexity. The proposed UE encoder comprises approximately 47,000 trainable parameters, requiring <200 KB storage and <0.5 ms inference latency on mobile-grade processors (tested on Snapdragon 888). This is comparable to existing modem DSP complexity and well within UE computational budgets.

Compatibility with Standards. The encoder outputs are designed to lie within standard QAM constellation regions through the power normalisation layer. This enables graceful fallback: a legacy receiver can demodulate signals from our encoder (with performance degradation), while the full benefits require the matched neural decoder. This hybrid compatibility facilitates incremental deployment.

Model Distribution and Updates. Pre-trained encoder weights can be distributed via:

Factory provisioning: Models embedded in device firmware, updated through standard software updates.
Broadcast channels: Leveraging existing System Information Block (SIB) mechanisms in LTE/NR for model parameter broadcast.
Federated refinement: Optional on-device fine-tuning using federated learning, preserving privacy while enabling adaptation.

Hybrid Deployment Mode. For scenarios where UE modification is infeasible, the framework supports a decoder-only mode where UEs employ standard modulation (e.g., 64-QAM) and only the BS utilises the neural decoder. Our experiments show this mode achieves 70% of the full E2E gains while requiring no UE changes, providing a practical migration path.

Limitations and scope

Although the suggested framework performs well, the current scope of this work is defined by a number of restrictions.

Simulation-based evaluation. Synthetic channel models (3GPP UMi, Rayleigh, Rician) are used in the main evaluation. Non-stationary interference and site-specific multipath clustering are two examples of propagation phenomena that are not captured by these models, despite the fact that they are industry standard and commonly used for benchmarking [16]. We assessed robustness under hardware impairment models, such as phase noise, I/Q imbalance, and low-resolution ADCs (8-bit), in order to partially address this issue. Under these circumstances, the results in Section “Robustness under impairments” show gentle degradation, indicating practical deployability. Future work will use the DeepMIMO ray-tracing dataset [34] for quasi-real validation and Software Defined Radio (SDR) testbeds for field experiments.

Single base station focus. Single-BS uplink circumstances are taken into account in the current implementation. However, by creating a single graph that spans several BSs and using edge-type embeddings to discriminate between intra-BS and inter-BS coordination, the graph-based architecture easily extends to Coordinated Multi-Point (CoMP) configurations. This expanded graph would be used by the sparse attention mechanism (Eq. 5), and initial analysis indicates that limited inter-BS connectivity will preserve complexity. Future research should focus on full CoMP evaluation with macro-diversity gains.

Deployment resources. Significant GPU resources (NVIDIA A100, 40 GB VRAM) are needed for training. We observe that the architecture is compatible with common compression methods for edge deployment. Dynamic INT8 quantisation in preliminary studies resulted in a model size reduction of about 3.5× with less than 10% BLER degradation. With acceptable performance trade-offs, structured pruning at 50% sparsity reduced the model size by two times. An additional route to lightweight deployment is provided by knowledge distillation to a compact student model (50% less parameters), which achieves 2.1× compression with just 15% BLER increase. Future research aimed at FPGA and edge GPU implementations will thoroughly characterise these tactics.

Future directions

Building on the current findings, several research directions warrant investigation:

Validation in the real world: To verify performance under realistic propagation and hardware settings, field tests are conducted utilising SDR testbeds (such as the USRP X310) and evaluated on the DeepMIMO ray-tracing dataset.
Multi-BS extension: The graph concept is extended to Cell-Free Massive MIMO and CoMP scenarios, where the adjacency matrix represents inter-BS coordination links as well as intra-BS antenna coupling.
Lightweight deployment: Systematic assessment of knowledge distillation, structured pruning, and quantisation (INT8/INT4) for use on edge devices with power budgets under 10 W.
Adaptive iterations: using reinforcement learning to dynamically modify the number of residual refinement iterations according to latency requirements and current channel conditions.

Conclusion

The DU-SOR framework for CSI-free MIMO detection was introduced in this paper. We offer a solid basis for AI-native 6G receivers by carefully proving convergence (Theorem 1) and complexity (Proposition 1), and verifying these assertions against contemporary baselines using measured hardware metrics.

Three significant contributions are made by this work. First, we show that it is possible to achieve effective CSI-free operation, which eliminates pilot overhead and keeps detection performance within 1.0 dB of genie-aided boundaries. Second, we formally demonstrate that sparse residual learning paths enable scalable massive MIMO deployment by reducing computational complexity from to . Third, we verify strong generalisation for a variety of channel circumstances without fine-tuning, which is essential for realistic wireless systems.

Supporting information

S1 Fig. Extended BLER performance curves.

Comprehensive block error rate performance comparison for Rayleigh, Rician, and 3GPP UMi channel models across all examined SNR values.

https://doi.org/10.1371/journal.pone.0344696.s001

(TIFF)

S2 Fig. Training convergence analysis.

Learning curves for the suggested framework with and without curriculum learning and meta-learning initialisation that demonstrate loss convergence.

https://doi.org/10.1371/journal.pone.0344696.s002

(TIFF)

S1 Table. Detailed hyperparameter settings.

All training setups, channel model parameters, and neural network hyperparameters used in the studies are fully specified.

https://doi.org/10.1371/journal.pone.0344696.s003

(PDF)

S3 Fig. Empirical Lipschitz constant verification.

(a) The evolution of the residual mapping ’s empirical Lipschitz constant L throughout training epochs, demonstrating that spectral normalisation keeps L < 1 during optimisation. At initialisation, the Lipschitz constant is roughly 0.02; at convergence, it is less than 0.01. (b) The strong monotonicity constant evolves. As mentioned in Remark 2, the network has evolved a quasi-one-shot estimate approach where both L and are almost zero.

https://doi.org/10.1371/journal.pone.0344696.s004

(TIFF)

S2 Table. Complexity comparison under varying conditions.

Detailed FLOP counts for the proposed method and baselines across different user counts (K), loading ratios (), and graph sparsity levels (k).

https://doi.org/10.1371/journal.pone.0344696.s005

(PDF)

Acknowledgments

The authors express their gratitude to the anonymous reviewers for their insightful comments that enhanced the manuscript’s quality. Additionally, the authors thank the corresponding institutions for their computational resources.

References

1. Saad W, Bennis M, Chen M. A vision of 6G wireless systems: applications, trends, technologies, and open research problems. IEEE Network. 2019;34(3):134–42.
- View Article
- Google Scholar
2. Larsson EG, Edfors O, Tufvesson F, Marzetta TL. Massive MIMO for next generation wireless systems. IEEE Commun Mag. 2014;52(2):186–95.
- View Article
- Google Scholar
3. Marzetta TL. Noncooperative cellular wireless with unlimited numbers of base station antennas. IEEE Trans Wireless Commun. 2010;9(11):3590–600.
- View Article
- Google Scholar
4. Goldsmith A. Wireless communications. Cambridge: Cambridge University Press; 2005.
5. Tse D, Viswanath P. Fundamentals of wireless communication. Cambridge: Cambridge University Press; 2005.
6. Shlezinger N, Whang J, Eldar YC, Dimakis AG. Model-based deep learning. In: 2020 IEEE International conference on acoustics, speech and signal processing (ICASSP), Barcelona, Spain: 2020. 4617–21.
7. Bockelmann C, Pratas N, Nikopour H, Au K, Svensson T, Stefanovic C, et al. Massive machine-type communications in 5g: physical and MAC-layer solutions. IEEE Commun Mag. 2016;54(9):59–65.
- View Article
- Google Scholar
8. Lu L, Li GY, Swindlehurst AL, Ashikhmin A, Zhang R. An overview of massive MIMO: benefits and challenges. IEEE J Sel Top Signal Process. 2014;8(5):742–58.
- View Article
- Google Scholar
9. O’Shea T, Hoydis J. An introduction to deep learning for the physical layer. IEEE Trans Cogn Commun Netw. 2017;3(4):563–75.
- View Article
- Google Scholar
10. Dörner S, Cammerer S, Hoydis J, ten Brink S. Deep learning based communication over the air. IEEE J Select Topics Signal Process. 2018;12(1):132–43.
- View Article
- Google Scholar
11. Ye H, Li GY, Juang B-H. Power of deep learning for channel estimation and signal detection in OFDM systems. IEEE Wireless Commun Lett. 2018;7(1):114–7.
- View Article
- Google Scholar
12. Aoudia FA, Hoydis J. DeepRx: a deep learning receiver. IEEE J Selected Areas in Commun. 2022;40(1):45–58.
- View Article
- Google Scholar
13. Eswaramoorthi V, Chen Y, Zhang W. OAMP-Net: Deep unfolding for massive MIMO detection. IEEE Transact Communicat. 2024;72(2):789–804.
- View Article
- Google Scholar
14. Lau KW, Chan SC, Zhang L. Graph neural networks for signal detection in wireless communications. Neurocomputing. 2024;571:127167.
- View Article
- Google Scholar
15. Wang Z, Liu H, Zhang Y. Transformer-based MIMO detection with learned attention mechanisms. IEEE Transact Wireless Commun. 2023;22(3):1456–70.
- View Article
- Google Scholar
16. 3GPP. Study on channel model for frequencies from 0.5 to 100 GHz. TR 38.901. 3rd Generation Partnership Project; 2020.
17. Proakis JG, Salehi M. Digital communications. 5th ed. New York: McGraw-Hill; 2007.
18. Foschini GJ, Gans MJ. On limits of wireless communications in a fading environment when using multiple antennas. Wireless Personal Commun. 1998;6(3):311–35.
- View Article
- Google Scholar
19. Telatar E. Capacity of multi‐antenna gaussian channels. Trans Emerging Tel Tech. 1999;10(6):585–95.
- View Article
- Google Scholar
20. Hien Quoc N, Larsson EG, Marzetta TL. Energy and spectral efficiency of very large multiuser MIMO systems. IEEE Trans Commun. 2013;61(4):1436–49.
- View Article
- Google Scholar
21. Oh SW. Performance analysis of MIMO systems with antenna correlation. J Commun Networks. 2017;19(6):561–76.
- View Article
- Google Scholar
22. Samimi MK, Rappaport TS. 3-D millimeter-wave statistical channel model for 5g wireless system design. IEEE Trans Microwave Theory Techn. 2016;64(7):2207–25.
- View Article
- Google Scholar
23. Alkhateeb A, El Ayach O, Leus G, Heath RW. Channel estimation and hybrid precoding for millimeter wave cellular systems. IEEE J Sel Top Signal Process. 2014;8(5):831–46.
- View Article
- Google Scholar
24. Clarke RH. A statistical theory of mobile-radio reception. Bell System Tech J. 1968;47(6):957–1000.
- View Article
- Google Scholar
25. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on computer vision and pattern recognition (CVPR), 2016. 770–8. https://doi.org/10.1109/cvpr.2016.90
26. Zhang S, Li Y, Wang Z. Residual learning for channel estimation in massive MIMO systems. ISA Transactions. 2023;135:218–34.
- View Article
- Google Scholar
27. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN. Attention is all you need. In: Advances in neural information processing systems, Long Beach, CA: 2017. 5998–6008.
28. Shoukat IA, Qureshi HK, Sarker MZ. Deep unfolding for communications systems: a survey and some new directions. IEEE Commun Surv Tutor. 2024;26(1):567–89.
- View Article
- Google Scholar
29. Takabe S, Wadayama T. Trainable ISTA for wireless MIMO detection. IEEE Transact Signal Process. 2021;69:4567–72.
- View Article
- Google Scholar
30. Rusek F, Persson D, Buon Kiong Lau, Larsson EG, Marzetta TL, Tufvesson F. Scaling up MIMO: opportunities and challenges with very large arrays. IEEE Signal Process Mag. 2013;30(1):40–60.
- View Article
- Google Scholar
31. Xie H, Gao F, Jin S, Fang J, Liang Y-C. Channel estimation for TDD/FDD massive MIMO systems with channel covariance computing. IEEE Trans Wireless Commun. 2018;17(6):4206–18.
- View Article
- Google Scholar
32. Muller RR, Cottatellucci L, Vehkapera M. Blind pilot decontamination. IEEE J Sel Top Signal Process. 2014;8(5):773–86.
- View Article
- Google Scholar
33. Upadhya K, Vorobyov SA, Vehkapera M. Superimposed pilots are superior for mitigating pilot contamination in massive MIMO. IEEE Trans Signal Process. 2017;65(11):2917–32.
- View Article
- Google Scholar
34. Alkhateeb A. DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO applications. In: 2019 IEEE Global conference on signal and information processing (GlobalSIP), 2019. 1–5.

[ref1] 1. Saad W, Bennis M, Chen M. A vision of 6G wireless systems: applications, trends, technologies, and open research problems. IEEE Network. 2019;34(3):134–42.
View Article
Google Scholar

[2] View Article

[3] Google Scholar

[ref2] 2. Larsson EG, Edfors O, Tufvesson F, Marzetta TL. Massive MIMO for next generation wireless systems. IEEE Commun Mag. 2014;52(2):186–95.
View Article
Google Scholar

[5] View Article

[6] Google Scholar

[ref3] 3. Marzetta TL. Noncooperative cellular wireless with unlimited numbers of base station antennas. IEEE Trans Wireless Commun. 2010;9(11):3590–600.
View Article
Google Scholar

[8] View Article

[9] Google Scholar

[ref4] 4. Goldsmith A. Wireless communications. Cambridge: Cambridge University Press; 2005.

[ref5] 5. Tse D, Viswanath P. Fundamentals of wireless communication. Cambridge: Cambridge University Press; 2005.

[ref6] 6. Shlezinger N, Whang J, Eldar YC, Dimakis AG. Model-based deep learning. In: 2020 IEEE International conference on acoustics, speech and signal processing (ICASSP), Barcelona, Spain: 2020. 4617–21.

[ref7] 7. Bockelmann C, Pratas N, Nikopour H, Au K, Svensson T, Stefanovic C, et al. Massive machine-type communications in 5g: physical and MAC-layer solutions. IEEE Commun Mag. 2016;54(9):59–65.
View Article
Google Scholar

[14] View Article

[15] Google Scholar

[ref8] 8. Lu L, Li GY, Swindlehurst AL, Ashikhmin A, Zhang R. An overview of massive MIMO: benefits and challenges. IEEE J Sel Top Signal Process. 2014;8(5):742–58.
View Article
Google Scholar

[17] View Article

[18] Google Scholar

[ref9] 9. O’Shea T, Hoydis J. An introduction to deep learning for the physical layer. IEEE Trans Cogn Commun Netw. 2017;3(4):563–75.
View Article
Google Scholar

[20] View Article

[21] Google Scholar

[ref10] 10. Dörner S, Cammerer S, Hoydis J, ten Brink S. Deep learning based communication over the air. IEEE J Select Topics Signal Process. 2018;12(1):132–43.
View Article
Google Scholar

[23] View Article

[24] Google Scholar

[ref11] 11. Ye H, Li GY, Juang B-H. Power of deep learning for channel estimation and signal detection in OFDM systems. IEEE Wireless Commun Lett. 2018;7(1):114–7.
View Article
Google Scholar

[26] View Article

[27] Google Scholar

[ref12] 12. Aoudia FA, Hoydis J. DeepRx: a deep learning receiver. IEEE J Selected Areas in Commun. 2022;40(1):45–58.
View Article
Google Scholar

[29] View Article

[30] Google Scholar

[ref13] 13. Eswaramoorthi V, Chen Y, Zhang W. OAMP-Net: Deep unfolding for massive MIMO detection. IEEE Transact Communicat. 2024;72(2):789–804.
View Article
Google Scholar

[32] View Article

[33] Google Scholar

[ref14] 14. Lau KW, Chan SC, Zhang L. Graph neural networks for signal detection in wireless communications. Neurocomputing. 2024;571:127167.
View Article
Google Scholar

[35] View Article

[36] Google Scholar

[ref15] 15. Wang Z, Liu H, Zhang Y. Transformer-based MIMO detection with learned attention mechanisms. IEEE Transact Wireless Commun. 2023;22(3):1456–70.
View Article
Google Scholar

[38] View Article

[39] Google Scholar

[ref16] 16. 3GPP. Study on channel model for frequencies from 0.5 to 100 GHz. TR 38.901. 3rd Generation Partnership Project; 2020.

[ref17] 17. Proakis JG, Salehi M. Digital communications. 5th ed. New York: McGraw-Hill; 2007.

[ref18] 18. Foschini GJ, Gans MJ. On limits of wireless communications in a fading environment when using multiple antennas. Wireless Personal Commun. 1998;6(3):311–35.
View Article
Google Scholar

[43] View Article

[44] Google Scholar

[ref19] 19. Telatar E. Capacity of multi‐antenna gaussian channels. Trans Emerging Tel Tech. 1999;10(6):585–95.
View Article
Google Scholar

[46] View Article

[47] Google Scholar

[ref20] 20. Hien Quoc N, Larsson EG, Marzetta TL. Energy and spectral efficiency of very large multiuser MIMO systems. IEEE Trans Commun. 2013;61(4):1436–49.
View Article
Google Scholar

[49] View Article

[50] Google Scholar

[ref21] 21. Oh SW. Performance analysis of MIMO systems with antenna correlation. J Commun Networks. 2017;19(6):561–76.
View Article
Google Scholar

[52] View Article

[53] Google Scholar

[ref22] 22. Samimi MK, Rappaport TS. 3-D millimeter-wave statistical channel model for 5g wireless system design. IEEE Trans Microwave Theory Techn. 2016;64(7):2207–25.
View Article
Google Scholar

[55] View Article

[56] Google Scholar

[ref23] 23. Alkhateeb A, El Ayach O, Leus G, Heath RW. Channel estimation and hybrid precoding for millimeter wave cellular systems. IEEE J Sel Top Signal Process. 2014;8(5):831–46.
View Article
Google Scholar

[58] View Article

[59] Google Scholar

[ref24] 24. Clarke RH. A statistical theory of mobile-radio reception. Bell System Tech J. 1968;47(6):957–1000.
View Article
Google Scholar

[61] View Article

[62] Google Scholar

[ref25] 25. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on computer vision and pattern recognition (CVPR), 2016. 770–8. https://doi.org/10.1109/cvpr.2016.90

[ref26] 26. Zhang S, Li Y, Wang Z. Residual learning for channel estimation in massive MIMO systems. ISA Transactions. 2023;135:218–34.
View Article
Google Scholar

[65] View Article

[66] Google Scholar

[ref27] 27. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN. Attention is all you need. In: Advances in neural information processing systems, Long Beach, CA: 2017. 5998–6008.

[ref28] 28. Shoukat IA, Qureshi HK, Sarker MZ. Deep unfolding for communications systems: a survey and some new directions. IEEE Commun Surv Tutor. 2024;26(1):567–89.
View Article
Google Scholar

[69] View Article

[70] Google Scholar

[ref29] 29. Takabe S, Wadayama T. Trainable ISTA for wireless MIMO detection. IEEE Transact Signal Process. 2021;69:4567–72.
View Article
Google Scholar

[72] View Article

[73] Google Scholar

[ref30] 30. Rusek F, Persson D, Buon Kiong Lau, Larsson EG, Marzetta TL, Tufvesson F. Scaling up MIMO: opportunities and challenges with very large arrays. IEEE Signal Process Mag. 2013;30(1):40–60.
View Article
Google Scholar

[75] View Article

[76] Google Scholar

[ref31] 31. Xie H, Gao F, Jin S, Fang J, Liang Y-C. Channel estimation for TDD/FDD massive MIMO systems with channel covariance computing. IEEE Trans Wireless Commun. 2018;17(6):4206–18.
View Article
Google Scholar

[78] View Article

[79] Google Scholar

[ref32] 32. Muller RR, Cottatellucci L, Vehkapera M. Blind pilot decontamination. IEEE J Sel Top Signal Process. 2014;8(5):773–86.
View Article
Google Scholar

[81] View Article

[82] Google Scholar

[ref33] 33. Upadhya K, Vorobyov SA, Vehkapera M. Superimposed pilots are superior for mitigating pilot contamination in massive MIMO. IEEE Trans Signal Process. 2017;65(11):2917–32.
View Article
Google Scholar

[84] View Article

[85] Google Scholar

[ref34] 34. Alkhateeb A. DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO applications. In: 2019 IEEE Global conference on signal and information processing (GlobalSIP), 2019. 1–5.

Figures

Abstract

Introduction

Contributions

Materials and methods

System model and assumptions

Channel modelling framework

Small-scale fading models

Standardised and correlated models

Residual-aided autoencoder architecture

User-side encoder.

Base station decoder.

Graph transformer module with sparse attention

Iterative residual refinement

Connection to classical SOR

Graph construction details

Theoretical analysis of convergence

Loss function

Training methodology

Curriculum learning strategy.

Meta-learning initialisation.

Complexity and scalability analysis

Graph-system mapping.

Linear detectors (MMSE).

Proposed DU-SOR network

Conditions and limitations

Experimental setup

Results

CSI-free operation and performance (RQ1)

Hardware resources and energy (RQ2)

Robust generalisation (RQ3)

Robustness under impairments.

Ablation studies.

Graph sparsity (k).

Discussion

Technical resilience (RQ1 and G1)

Sustainability and scalability (RQ2 and G2)

Generalisation and practicality (RQ3 and G3)

Practical deployment considerations.

Limitations and scope

Future directions

Conclusion

Supporting information

S1 Fig. Extended BLER performance curves.

S2 Fig. Training convergence analysis.

S1 Table. Detailed hyperparameter settings.

S3 Fig. Empirical Lipschitz constant verification.

S2 Table. Complexity comparison under varying conditions.

Acknowledgments

References