Empowering precise advertising with Fed-GANCC: A novel federated learning approach leveraging Generative Adversarial Networks and group clustering

In the realm of targeted advertising, the demand for precision is paramount, and the traditional centralized machine learning paradigm fails to address this necessity effectively. Two critical challenges persist in the current advertising ecosystem: the data privacy concerns leading to isolated data islands and the complexity in handling non-Independent and Identically Distributed (non-IID) data and concept drift due to the specificity and diversity in user behavior data. Current federated learning frameworks struggle to overcome these hurdles satisfactorily. This paper introduces Fed-GANCC, an innovative federated learning framework that synergizes Generative Adversarial Networks (GANs) and Group Clustering. The framework incorporates a user data augmentation algorithm predicated on adversarial generative networks to enrich user behavior data, curtail the impact of non-uniform data distribution, and enhance the applicability of the global machine learning model. Unlike traditional approaches, our framework offers user data augmentation algorithms based on adversarial generative networks, which not only enriches user behavior data but also reduces the challenges posed by non-uniform data distribution, thereby enhancing the applicability of the global machine learning (ML) model. The effectiveness of Fed-GANCC is distinctly showcased through experimental results, outperforming contemporary methods like FED-AVG and FED-SGD in terms of accuracy, loss value, and receiver operating characteristic (ROC) indicators within the same computing time. Experimental results vindicate the effectiveness of Fed-GANCC, revealing substantial enhancements in accuracy, loss value, and receiver operating characteristic (ROC) metrics compared to FED-AVG and FED-SGD given the same computational time. These outcomes underline Fed-GANCC’s exceptional prowess in mitigating issues such as isolated data islands, non-IID data, and concept drift. With its novel approach to addressing the prevailing challenges in targeted advertising such as isolated data islands, non-IID data, and concept drift, the Fed-GANCC framework stands as a benchmark, paving the way for future advancements in federated learning solutions tailored for the advertising domain. The Fed-GANCC framework promises to offer pivotal insights for the future development of efficient and advanced federated learning solutions for targeted advertising.


Introduction
This paper proposes and investigates a new framework-the Federated Generative Adversarial Network and Clustering Aggregation (Fed-GANCC)-to address the challenges encountered during the rapid growth of the global internet advertising market.The growth rate of the advertising market has increased by approximately 20%, from $250 billion in 2017 to $430 billion in 2021 [1][2][3][4].However, challenges such as privacy leaks, high communication costs, significant data discrepancies, and the formation of isolated data islands, non-identically independently distributed (non-IID) data, and concept drift persist during this growth.
The growth of the global internet advertising market is challenged by the formation of isolated data islands.An isolated data island refers to the situation where data is trapped in silos and is not easily shared or integrated with other data.This phenomenon occurs primarily due to three reasons: user reluctance to share data, strict data sharing policies, and high communication costs.In particular, amidst the growing emphasis on data privacy, a solution that complies with the EU's General Data Protection Regulation and China's Internet Personal Information Security Protection Guidelines is necessary [5,6].Moreover, even with the widespread implementation of 5G, coping with the substantial user behavior data generated by the 4.9 billion internet users in 2021 [7] and an average weekly online time of 28.5 hours [8] remains a challenge.
To address these challenges, we employ federated learning, Generative Adversarial Networks (GANs), and multi-center federated learning.Federated learning provides a secure, decentralized solution for exchanging user behavior data [9,10].However, due to the unique behavior patterns of each user and the unique features of the data stored on their devices, the data distributions on individual user devices may differ from the overall data distribution, resulting in non-IID data distribution in targeted advertising scenarios [11][12][13].Hence, we introduce GANs to address non-IID data.However, to better cope with the issue of concept drift, multi-center federated learning is also required.
Therefore, how to speed up the training of the global ML model and improve its accuracy has become a new focal point, which also brings challenges to non-IID data.Another significant challenge in targeted advertising is concept drift.Concept drift occurs when the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways.This can cause previously accurate predictive models to become less accurate as time goes on.Specifically, federated learning aims to accomplish exact ad distribution using the ML model parameters of numerous users.In theory, the parameters of different users should correspond to different global models.However, the shifting behaviors and preferences of users can lead to this drift, making it difficult to maintain a consistent and accurate global ML model.To comprehensively address the challenges of isolated data islands, non-IID data, and concept drift in targeted advertising, this paper proposes the Federated Generative Adversarial Network and Clustering Aggregation (Fed-GANCC) framework.Our approach not only facilitates efficient distributed model computation but also places paramount importance on preserving data privacy.

Literature review
This section sets out to establish a comprehensive understanding of the existing body of knowledge surrounding the application of machine learning and federated learning in the context of targeted advertising.The ensuing discussion evaluates the methodological approaches, theoretical underpinnings, and practical implications of previous works, illuminating their merits and limitations.

Machine learning in targeted advertising
Machine learning has surfaced as an instrumental tool in managing multi-party data sharing, an integral part of targeted advertising.Shi and Gu's work highlighted how using balanced sampling and ensemble learning can improve accuracy while minimizing communication overhead [14].However, Seker contended that such an approach may not be scalable given the exponential growth of data, proposing instead a predictive model that utilizes feature vectors from advertisement-related data [15].Kuppusamy extended this line of inquiry by adding a mechanism to adaptively update advertisement recommendations according to user preferences [16].Furthermore, Malhi et al. leveraged machine learning to tune advertisement deployment parameters, enhancing prediction model accuracy [17].
Despite these advances, challenges persist in improving the accuracy and efficiency of model iterations.Furthermore, the control of user data and the training process by a central server presents potential privacy risks.

Federated learning in targeted advertising
Federated learning emerges as a viable solution to some of these challenges, providing a decentralized architecture that ensures user autonomy in the training process [18].Several scholars have devised federated learning frameworks tailored for targeted advertising.For instance, Kumar et al. employed a multi-signal joint approach to create an advertisement data market, improving accuracy while reducing loss [19].However, its stability is yet to be evaluated under diverse datasets.Jiang et al. utilized a federated sponsorship search auction mechanism, attracting more data providers and facilitating the federated learning ecosystem [20].Still, Wu et al. pointed out the limited applicability of this approach in predicting native advertisements and proposed a privacy-preserving framework incorporating distributed model and user behavior [21].
Nevertheless, the problem of non-Identically Distributed (non-IID) data remains underaddressed in most federated learning frameworks.The personal biases embedded in user behavior data contribute to a significant disparity in data distribution, resulting in divergences in parameter updates.Furthermore, these frameworks lack mechanisms to mitigate concept drift.
The review of these literatures elucidates the need for a more robust solution to address the pressing challenges in targeted advertising, including data privacy, non-IID data distribution, and concept drift, which this study aims to address.

The role of federated learning and gans in addressing data heterogeneity and user model bias
In order to address data heterogeneity and reduce bias in user models, many innovative solutions have been proposed in the literature.For instance, the work of Li et al. highlights the use of the Fed-Prox algorithm.This approach extends the Fed-Avg methodology by introducing a regularization term in the user's local loss function, with the aim to minimize the discrepancy between local and global models [22].
Despite its merits, the improvements offered by Fed-Prox over Fed-Avg are relatively modest.In a response to this, Wang et al. argued that factors such as varying numbers of user's local updates, different optimization algorithms, and diverse data distributions, could lead to optimization target bias issues, subsequently decreasing the convergence rate of distributed optimization algorithms [23].To tackle this, they proposed Fed-Nova, an approach that normalizes gradients of each user's local model before performing a global update, thereby rectifying the optimization target bias.
Turning to specific use cases, Zhu et al. employed Fed-OVA, a method designed for the CIFAR-10 dataset.Their approach involved replacing ten-category classifiers with ten binary classifiers, each dedicated to determining the probability that an image belongs to a certain category [24].While Fed-OVA effectively mitigates the effects of label distribution bias, it lacks flexibility in cases where a user's local data contains only one label.
The special issue of concept bias remains largely unaddressed by existing methods.To fill this gap, the Fed-MA algorithm, proposed by Wang et al., leverages model aggregation and aligns the neural structure layer-by-layer [25].Another novel approach, developed by Yu et al., uses grouped convolution layers to separate features and aggregate them according to the features' belonging groups, achieved via intra-group weighted averaging [26].However, it's worth noting that the Fed-MA algorithm is complex to implement and alters the structure of the original neural network.
To address these complex challenges, Xie et al. introduced a concept known as multi-center federated learning.They implemented the Federated Stochastic Expectation Maximization (Fe-SEM) algorithm in their study [27].
Fig 1 visually represents search results from the Web of Science Core Collection for "Federated Learning," "Advertising," and "GAN" spanning the last five years.The number of publications for these three topics during this time were 2957, 4595, and 8516, respectively.These numbers suggest an increasing interest in these research areas.However, there appears to be a significant lack of research at the intersection of these areas, indicating a promising direction for future study.• The Central Server: Serving as the key processing unit, the Central Server is furnished only with the parameters of the user's ML model.It is accountable for strenuous computations such as the generation of adversarial models and execution of personalized group aggregation computations.In our model, we assume that it possesses unlimited computational power and storage capacity.

System model
• The Application Provider (AP): The APs, forming connections with the Central Server, facilitate communication and share the user's ML model parameters.Their primary responsibilities encompass collecting, uploading model parameters and executing moderately complex computations like creating classification layers based on user's ML model features.We presuppose that APs possess ample computational power and storage capacity to support these tasks.
• The User Nodes: The user nodes, responsible for data augmentation using the adversarial model, model training and parameter uploading, connect to one AP at a time.
The Fed-GANCC framework excels in enabling personalized advertising without compromising user data privacy.E-commerce platforms, for instance, can use Fed-GANCC to provide tailored product recommendations based on user history, while content providers like streaming services can deliver personalized content suggestions, thereby enhancing user engagement.From a commercial standpoint, Fed-GANCC presents a unique value proposition: by decentralizing data processing and emphasizing model parameter sharing, businesses can improve their advertising strategies while ensuring compliance with data protection regulations.Such an approach can bolster user trust, elevate brand reputation, and potentially offer a competitive advantage.Consider a real-world deployment scenario where an international e-commerce platform aims to offer regional promotions based on diverse user preferences across countries.With varying data protection laws, centralizing user data becomes challenging.Here, Fed-GANCC can shineregional nodes process user data locally, sharing only model parameters with the central server.This decentralization enables the server to develop a global model that caters to varied user preferences, ensuring effective and regulatory-compliant targeted advertising.
The user-generated data, evolving from web browsing, online services or application usage, resides only at the user node.This gives rise to the user dataset D user .Given the independent generation of data by each user, the data distribution and volume across devices may not be identical and independent, thereby leading to distinct statistical characteristics that might affect the convergence of algorithms.
To mitigate this, the Central Server initially generates and distributes an adversarial model based on the training dataset.Consequently, users augment local data using this adversarial model.Following this, the enhanced local data D E aids in training the user ML model parameters θ user , which are then stored locally.After the successful upload of θ user , the AP receives these parameters, conducts feature extraction to generate a classification layer and subsequently uploads it to the Central Server.
The Central Server uses the user ML model's classification layer as a clustering object and applies a clustering algorithm to produce user groups, which are then allocated to their respective user groups.The set of users is defined as U, and a specific user is denoted as U k , where k = {1, 2, 3. ..}.The user U k 's sample data is represented as D k ¼ fX ðkÞ 1 ; X ðkÞ 2 ; . . .; X ðkÞ i g.The sample data volume of this user U k is denoted as S, and the total training sample volume is S sum ¼ P n i¼1 S. The symbols used later and their explanations are shown in Table 1.

Framework overview
The Fed-GANCC framework comprises three components: (1) the generative adversarial model at the central server; (2) the AP node's generative user model feature classification layer; (3) the user node, responsible for user behavior data collection, data enhancement, and personalized grouping aggregation.
• The generative adversarial model creates data similar to the training set by generating samples z * p z (x) from training data g * p g (x).The adversarial model generates a public dataset, evaluated by the DMM distance metric between the generated and training datasets.
• A conditional generative adversarial network can be used to balance local user data category distribution by augmenting missing data, thereby improving data quality.
• The central server performs personalized grouping aggregation.However, in targeted advertising, the FedAvg algorithm may suffer from bias and reduced accuracy due to the nonindependent and non-identically distributed user behavior data.

Generative Adversarial Network scheme
Motivation for the Generative Adversarial Network.
• Non-IID data causes inefficiencies in federated learning algorithms.This divergence in the FedAvg algorithm results from biased data distribution estimation by user nodes [28].
• When using GANs, we opted for a third-party dataset scheme for better privacy, despite potential local ML parameter inconsistencies.We enhanced this scheme by managing GAN models centrally and performing uniform adjustments for stable, high-quality data generation [29].
• Traditional GANs use the Inception Score (IS) as an evaluation index, which has certain limitations.Instead, we use Maximum Mean Discrepancy (MMD) to measure training and generated dataset distances.Lower MMD values indicate better GAN performance.
To tackle these, data can be augmented to alleviate user nodes' data distribution discrepancies.This method relies on data augmentation principles and is implemented by adding extra data to the server through GANs, bypassing user node access limitations.The validity of this approach will be further explored in the experimental section.
The GAN-based procedure.Fig 3 portrays the user behavior data augmentation process using Generative Adversarial Networks (GANs).The central server first dispatches a GAN model to the Clients.This model, grounded on a third-party public dataset, is designed to mitigate the non-IID nature of the user behavior data stored in the clients.Using this dispatched GAN model, Clients then create synthetic data, which enriches their local user behavior data.
The optimal equilibrium for GAN can be represented as: Rewriting the above equation in terms of variables DN, G z , and the newly introduced This max-min problem is then recast into: Our aim is to construct a model that closely matches the actual data distribution.We quantify this similarity using the Maximum Mean Discrepancy (MMD) distance, computed using a Gaussian kernel function K xy (x, y): The MMD distance is then: The true expectation is computationally intractable, thus we resort to sample estimation for the MMD value.Given n samples x 1 , . .., x n from the training set and n samples y 1 , . .., y n generated by G z , the estimated MMD is: Given the unavailability of the true data distribution Pdata(x), we approximate it using the empirical distribution of the training data.This allows the generated samples to exhibit a feature distribution similar to the training samples.

Clustering-based classification layer scheme
Motivation for classification layer clustering.Federated Learning (FL) embodies an innovative distributed learning method protecting user data privacy by leveraging local Machine Learning (ML) parameters without direct user data access.Nonetheless, FL's effectiveness diminishes in real-world scenarios such as targeted advertisement recommendations, where non-IID data presents significant challenges.The inherent limitation of FL approaches lies in their inability to account for disparities in user data distributions while updating a single global model by aggregating user-specific gradients.
We address this issue through the implementation of a cluster-based federated learning approach, drawing on the success of such methods in combating concept drift.This strategy entails the allocation of users to different global models (or centers), enabling the amalgamation of multiple global models to capture non-IID data across diverse users.
Yet, the conventional clustering methods that anchor their strategies on the test accuracy of user models fall short in non-IID scenarios like targeted advertisement recommendation.These methods often group high accuracy models with extreme models, specifically those only equipped to perform certain types of operations, which clearly diverges from the optimal targeted ad delivery objective [30,31].
To address these shortcomings, we propose a novel approach that employs classification layer similarities as the clustering basis.This methodology facilitates the effective grouping of users with similar interests, consequently reducing model training time.The validity and effectiveness of this approach will be demonstrated in subsequent experimental sections.
Classification layer clustering-based procedure.This paper introduces a two-stage personalized aggregation and grouping strategy that leverages classification layer similarities, eschewing the conventional practice of using the model's test accuracy as the clustering foundation.The method initiates with users training their models utilizing improved user behavior data, which is subsequently uploaded to the Access Point (AP) node via the central server.The server executes the Federated Averaging (FedAvg) algorithm first, followed by feature extraction layer aggregation.
Here, θ t+1 represents the global model parameters for the (t+1) round, and p i symbolizes the i-th user node contributing to the computation.y i tþ1 are the user model parameters included in the calculation, while P I i¼1 denotes the total user count.The local model's classification layer serves as the clustering subject, with the server running the clustering algorithm to yield user grouping outcomes.Here, w * i is the i-th user group's optimal cluster model, and r ðiÞ k denotes user k's cluster membership (if n2S i , then r ðiÞ k = 1; otherwise, r ðiÞ k = 0).The User symbolizes the entire user set, argmin f k (x) indicates the variable value corresponding to the minimum of the objective function f k (x), and C represents the number of groups.
The classification layer undergoes grouping and aggregation to produce various cluster models, which are subsequently disseminated to the corresponding user groups.Here, argmin f n (x) signifies the variable value where the objective function f n (x) for user n is minimal, and S i denotes the i-th user group.The optimal cluster model for the i-th user group is represented by w Pseudocode of Fed-GANCC framework This section provides the pseudo-code of the Fed-GANCC framework, including the user behavior data enhancement module (see Algorithm 1) and the personalized grouping aggregation module node update part (see Algorithm 2).
Algorithm 1: where jY y i j represents the number of data labeled with y in the i-th user.The user behavior data enhancement part takes the adversarial network model G generated by Sever as input.Line 4 represents the calculation of the maximum number of label classes; line 5 represents the user traversing all the data label classes in the local; line 7-9 represent data enhancement for the data classes that need to be expanded respectively.
Algorithm 2 Personalized grouping aggregation module update part takes T, P, t = 0, and θ 0 are the inputs for the algorithm 2's personalized grouping aggregation module update component, where T denotes the number of global iterations, P, the total number of users, t, and θ 0 , the initial global model.Set B to reflect the data batch size for the vehicle node.Specifically, line 5-8 represents the local model training of each user and the upload of the local model; line 10 represents the running of the FedAvg algorithm and the aggregation of the feature extraction layer; line 9 represents the obtaining of the latest dataset; line 11-15 represents the taking of the classification layer of the local model as the clustering object, the running of the clustering algorithm to produce the user grouping results; the grouping and aggregation of the classification layer to generate a number of cluster models, which are then distributed to the corresponding user groups; The computation of the global model parameters and the associated performance metrics ACC, Loss, FPR, and TPR are shown on lines 16 through 25, and their return to the AP is shown on line 27.

Formal definition and property analysis of Generative Adversarial Networks (GANs)
Strict definition and basic properties of Generative Adversarial Networks.Definition 1 First, let us formally define Generative Adversarial Networks (GANs) from the perspective of deep learning.In a GAN, we have two key components: a generator G, whose objective is to generate data samples that are as realistic as possible, and a discriminator D, whose objective is to accurately distinguish between real data and data generated by the generator.Together, they form a dynamic "zero-sum game" framework.We can describe this process using the following min-max problem: Here, x represents real data, z represents noise sampled from a prior distribution, p data is the distribution of real data, and p z is the distribution of noise.In this process, the generator G and the discriminator D are optimized alternately until a Nash equilibrium is reached.
Enhancing user behavior data with Generative Adversarial Networks.In this section, we provide a detailed description of how to apply generative adversarial networks to enhance user behavior data.We take user behavior data as real data and optimize the GAN framework so that the generator G can generate data with similar statistical properties to real user behavior data, thereby achieving data augmentation.

Definition 2 When enhancing user behavior data, we can define a loss function L user based on the distribution difference between the generated data and the real data, aiming to minimize this difference:
Here, D KL denotes the Kullback-Leibler divergence, which measures the difference between two probability distributions.By continuously optimizing this loss function, we can make the generated data increasingly close to real user behavior data.

Formal definition and property analysis of group clustering algorithm
Strict definition and basic properties of group clustering algorithm.In this section, we formally define the Group Clustering algorithm and discuss its basic properties.Group Clustering is a specific type of clustering algorithm that aims to partition data points into different groups (or clusters) under certain constraints.These constraints are often determined by our understanding of the data and the requirements of the task.
Definition 3 Given a dataset D = x 1 , x 2 , . .., x n , Group Clustering can be seen as an optimization problem: Here, G = G 1 , G 2 , . .., k represents the partition of clusters, c i is the centroid of the i-th cluster, and d(x j , c i ) is a distance metric (e.g., Euclidean distance) that measures the distance between data point x j and cluster center c i .Our goal is to find a partition G that minimizes this objective function.
Group clustering with data generated by Generative Adversarial Networks (GANs).In this section, we will elaborate on how to perform Group Clustering using data generated by Generative Adversarial Networks (GANs).By generating data through GANs, we can obtain richer and more diverse user behavior patterns, which helps us classify users more accurately.
Definition 4 Assuming we have generated a set of data D 0 ¼ x 0 1 ; x 0 2 ; . . .; x 0 m using GANs, our Group Clustering problem can be extended to the following optimization problem:

Properties of parameter solution set satisfying Generative Adversarial Networks (GANs) and group clustering conditions
Theorem: Existence of parameter solution set.Theorem 1 For the parameter space Θ that satisfies the defined conditions of Generative Adversarial Networks (GANs) and Group Clustering, there exists a non-empty open set whose closure is a convex set satisfying the constraints, denoted as Θ 0 .
[Proof of Theorem 1] In order to prove the theorem, we start by building a mathematical model for the parameter solutions of GANs and Group Clustering algorithms.First, we define the parameter vector θ = (θ 1 , θ 2 , . .., θ p ) and construct the following optimization problem: Here, L(D, G, θ) denotes the loss function, D is the original dataset, G refers to the dataset generated by GANs, and C(θ)�0 represents the constraints for GANs and the Group Clustering algorithm.
Assumption 1: The loss function L(D, G, θ) is continuous and bounded within the domain of the parameter solutions.Let's further break down this assumption: Assumption 2: The constraint C(θ)�0 defines a closed feasible region.In mathematical terms: Assumption 3: The parameter space Θ is a complete metric space: According to the Baire category theorem, the parameter space Θ must contain a non-empty open set, whose closure is a convex set.Let's denote this set as Θ 0 .
We then need to prove that within Θ 0 , there exists θ* 2 Θ 0 such that L(D, G, θ) � L(D, G, θ) for all θ 2 Θ 0 satisfying C(θ)�0.Therefore, θ* is the solution set that satisfies the defined conditions of GANs and Group Clustering.
This completes the proof of the existence of a parameter solution set that satisfies the conditions defined for GANs and Group Clustering.Further analysis on uniqueness and stability of solutions will require more in-depth research.
Theorem: Boundedness of parameter solution set.Theorem 2 For the parameter space Θ 0 that satisfies the defined conditions of Generative Adversarial Networks (GANs) and Group Clustering, we can find a finite constant M > 0 such that for any θ 2 Θ 0 , we have |θ| � M.
[ Next, we proceed by contradiction.Suppose Θ 0 is unbounded, then there exists a sequence Based on Assumption 5, we know that F(θ n ) is bounded, which means there exists a constant F > 0 such that |F(θ n )| � F holds for all n.According to the Heine-Borel Theorem, if a function F is bounded, then its domain must be bounded.
Therefore, by Assumption 5, we can find a sequence and F(x n ) is bounded.This contradicts Assumption 4 because if |x n |!1, then according to Assumption 4, we cannot find a constant L > 0 such that L(D, G, x n )�L holds for all n.
Thus, we can conclude that the parameter space Θ 0 that satisfies the defined conditions of GANs and Group Clustering must be bounded.This completes the proof of the boundedness of the parameter solution set.
Theorem: Convergence of parameter solution set.Theorem 3 In the parameter space Θ 0 which fulfills the conditions of the Generative Adversarial Networks (GANs) and Group Clustering, a sequence ðy n Þ  In the first part of the proof, we aim to demonstrate the continuity of F over Θ 0 .For any � > 0, there exists δ > 0 such that when |θ 0 − θ| < δ, it leads to |F(θ 0 ) − F(θ)| < �.This completes the proof of F's continuity on Θ 0 .
Next, we demonstrate the convergence of the parameter solution set.As per the definition of function continuity, there exists a sequence Lastly, we need to show that the sequence ðFðy n ÞÞ 1 n¼1 also converges to F(θ).For any � > 0, as F is continuous on Θ 0 , there exists N 2 N such that for all n > N, we have |F(θ n ) − F(θ)| < �.Hence, the sequence ðFðy n ÞÞ 1 n¼1 converges to F(θ).
This concludes the proof.Corollary: Closure property of parameter solution set satisfying GANs and group clustering conditions.Corollary 1 Let Θ 0 be a parameter solution set that satisfies the existence (P e ), boundedness (P b ), and convergence (P c ) properties as proven under the conditions of GANs and Group Clustering.Then, the closure � Y 0 of the parameter solution set Θ 0 will also possess these properties.The importance of this corollary in the paper lies in the fact that the existence, boundedness, and convergence properties of the parameter solution set are crucial in the parameter optimization process when using the federated learning framework for targeted advertising.This corollary further ensures that these properties are maintained in the closure of the parameter solution set, providing theoretical guarantees for finding global optimal solutions in a distributed setting.
[Proof of Corollary 1] Given that we have already proven the properties of the parameter solution set Θ 0 under conditions P e , P b , and P c , we now aim to show that these properties also hold for the closure � Y 0 of Θ 0 .
Existence: Since Θ 0 is non-empty, by the definition of closure, we have � Y 0 6 ¼ ;.Boundedness: As Θ 0 is bounded, there exists R > 0 such that for all θ 2 Θ 0 , we have |θ|�R.According to the properties of closure, for any � y 2 � Y 0 , there exists a sequence of θ converging to � y within Θ 0 .Therefore, we have j � yj � R, implying that � Y 0 is bounded.Convergence: Considering any convergent sequence of Θ 0 , its limit point must belong to � Y 0 .Thus, any limit point of a convergent sequence of Θ 0 also belongs to � Y 0 , proving the convergence property of � Y 0 .Thus, we have shown that if the parameter solution set Θ 0 satisfies P e , P b , and P c , then the closure � Y 0 of Θ 0 will also possess these properties.

Dataset description
We

Experimental parameter settings
The specific configuration of the experimental platform is: 12th Gen Intel(R) Core(TM) i7-12700H CPU@ 2.30GHz, DDR4 2933MHz 16GB memory, Windows 11 Home Edition 64-bit operating system, The experimental software uses python 3.6, pytorch 1.4.0+cpu and Keras 2.4.3.This experiment uses LSTM to verify the advertising click-through rate dataset.The LSTM framework structure is an input layer, a hidden layer, and an output layer.The specific experimental parameter settings are shown in Table 2.We compared the Acc, Loss, and ROC curves of the Fed-AVG, Fed-SGD, Fed-prox, and Fed-GANCC.As evidenced in Fig 4, only Fed-Sgd and Fed-GANCC have their precision steadily increasing with the number of global ML model rounds across all frameworks, with the enhancement of Fed-sdg being negligible.Moreover, the Fed-GANCC model exhibits an accuracy that is approximately 30% higher than the Fed-avg model on the ADAC dataset.When using the CTRD dataset, this difference increased to 40%.Furthermore, the Fed-prox framework was observed to be highly unstable during the experiment, and the Fed-sgd framework performed poorly.We believe this is mainly due to the strong tendency that the user uploaded local models have, resulting in both models failing to adapt.

• Accuracy Results
• Loss Results

• ROC Results
We compared the ROC of the above six frameworks as shown in Fig 6 .Fed-sgd and Fedprox were both below the dashed line in both datasets, indicating that their performance was lower than random prediction and had low learning efficiency.Based on the comparison of these frameworks, only FED-AVG performed well, but their performance was still much weaker than the Fed-GANCC framework.
Regardless of whether the ADAC or CTRD dataset is used, the Fed-GANCC framework is able to maintain a high level of user prediction, outperforming several common federated learning frameworks such as FED-AVG, FED-SGD, and FED-prox.

Discussion
Despite the promising results demonstrated by Fed-GANCC, it's essential to address its limitations.For instance, while the Generative Adversarial Network scheme is proficient at addressing data distribution disparities, it may require substantial computational resources, potentially limiting its application in resource-constrained environments.Additionally, the Classification Layer Clustering might not be universally applicable for all types of data drift scenarios, and further studies would be required to tailor it for specific applications.Regarding the applicability of Fed-GANCC, while it excels in environments where data distribution is heterogeneous and subject to drift, it might not be the most suitable for homogeneous data distribution settings or scenarios where computational resources are sparse.However, in domains like targeted advertising where user behavior is dynamic and evolves over time, Fed-GANCC stands as a potent tool.

Conclusion
The challenges posed non-IID and concept drift in targeted advertising have resulted in poor performance of traditional federated learning frameworks.To mitigate these challenges, this paper introduces a novel federated learning framework, named Fed-GANCC, which combines the principles of Generative Adversarial Networks and Classification Layer Clustering.To address the data distribution gap between user nodes, the framework employs a Generative Adversarial Network scheme.Additionally, the framework leverages a Classification Layer Clustering scheme to handle the adversarial effects of concept drift in the targeted advertising domain.An evaluation of the Fed-GANCC framework was conducted using a benchmark dataset.The comparison of the performance of Fed-GANCC with existing baseline frameworks revealed that the proposed framework achieved better results than the baseline approaches, as evidenced by metrics such as Accuracy, Loss, and ROC curve.Our findings demonstrate the potential of Fed-GANCC in addressing the issues of isolated data islands, non-IID, and concept drift in targeted advertising and make a substantial contribution to the area of federated learning.
Our study has paved the way for further research in enhancing federated learning frameworks for advertising.Possible future research directions include: 1)Exploring variations of the Generative Adversarial Network scheme tailored for different types of non-IID data.2) Investigating the scalability of Fed-GANCC for larger and more diverse datasets.3) Designing robustness mechanisms against adversarial attacks in the context of federated learning.In conclusion, our findings not only highlight the potential of Fed-GANCC in addressing the challenges in targeted advertising but also provide a balanced view of its strengths and areas for future enhancements, thereby solidifying its position as a noteworthy contribution to federated learning.

Fig 2
Fig 2 represents the proposed Fed-GANCC framework.It consists of three core entities: the Central Server, the Application Provider (AP), and the User Node, with the sole transmission of user ML model parameters while ensuring data retention at the user nodes.

Fig 2 .
Fig 2. System model.https://doi.org/10.1371/journal.pone.0298261.g002 In this process, the mathematical model of adversarial learning takes on the form of a zerosum game.The players of this game are the Discriminator function DN(X): R n ![0, 1] and the generator's Gz function: R d !R n .The generator creates sample Gz from random sample z following the Rn distribution o, intending to mimic the distribution of the training samples.In contrast, the discriminator DN aspires to separate these synthetic samples from the training samples drawn from distribution c.This adversarial dynamic is captured by the following loss

Proof of Theorem 2 ] 0 ! R. Assumption 4 :Assumption 5 :
We set a new set of assumptions, shifting the focus of the proof to the loss function L(D, G, θ) and the function F : Y The loss function L(D, G, θ) is continuous and bounded, meaning that there exists a finite constant L > 0 such that L(D, G, θ)�L holds for all θ 2 Θ 0 .There exists a function F : Y 0 !R such that F(θ) = L(D, G, θ), and F is continuous and bounded on Θ 0 .

1
n¼1 � Y 0 can be constructed such that |θ n − θ| !0 as n ! 1.Here, θ is a parameter solution complying with our conditions.[Proof of Theorem 3] Continuing with our earlier defined loss function L(D, G, θ) and function F : Y 0 !R, we introduce new mathematical tools for the proof.

Fig 5
Fig 5 provides insights into the effect of global ML model training cycles on the stability of different federated learning frameworks.Specifically, as shown in the figure, the loss value of Fed-AVG tends to become increasingly unstable as the number of cycles increases across both datasets.Out of the four frameworks considered, only Fed-GANCC and Fed-prox exhibit relative stability, albeit with a noticeable gap between the two.

Table 1 .
Symbols. https://doi.org/10.1371/journal.pone.0298261.t001 User Behavior Data Enhancement Based on Generative Adversarial Network Algorithm 1 [33]oyed two click-through rate prediction datasets, Ali_Display_Ad_Click (ADAC) and Click-Through Rate Data (CTRD), respectively sourced from open data platform Ali Tianchi1 and Kaggle Online Advertising Click-Through Rate Prediction Competition2.ADAC dataset contains 20 fields and 704,319 data points, of which 80% was used as the training dataset and the remaining 20% as the test dataset.Kun Gai et al. proposed a Large Scale Piece-wise Linear Model (LS-PLM) for click-through rate (CTR) prediction in real-world business, and verified it with the ADAC dataset[32].Guorui Zhou et al. proposed a Deep Interest Network (DIN) for online advertising click-through rate prediction, and used the same dataset to validate the model[33].CTRD dataset consists of 17 fields and 80,000 training data points and 20,000 test data points.Both datasets are used for online advertising click-through rate prediction, and can effectively reflect whether the advertisement is effectively delivered to the specified people, thereby verifying the ability of Fed-GANCC model.Therefore, we selected these two datasets as the experiment dataset.