Federated learning based futuristic biomedical big-data analysis and standardization

Medical data processing and analytics exert significant influence in furnishing dependable decision support for prospective biomedical applications. Given the sensitive nature of medical data, specialized techniques and frameworks tailored for application-centric processing are imperative. This article presents a conceptualization for the analysis and standardization of datasets through the implementation of Federated Learning (FL). Medical big data stems from diverse origins, necessitating the delineation of data provenance and attribute paradigms to facilitate feature extraction and dependency assessment. The architecture governing the data collection framework is intricately linked to remote data transmission, thereby enabling efficient customization oversight. The operational methodology unfolds across four strata: the data origin layer, data acquisition layer, data classification layer, and data optimization layer. Central to this endeavor are Multi-Objective Optimal Medical (MooM) datasets, characterized by attribute-driven feature cartography and cluster categorization through federated learning models. Feature synchronization and parameter extraction transpire across multiple tiers of neural networking, culminating in a steadfast remedy through dataset standardization and labeling. The empirical findings reflect the efficacy of the proposed technique, with a 97.34% accuracy rate in the disentanglement and clustering of telemedicine data, facilitated by the operational servers within the ambit of the federated model.


I. Introduction
The biomedical data and the relative sources for the creation of interdependent data, termed metadata, have increased exponentially. The digitalization of biomedical data has improved the quality of service (QoS) and the decision capabilities of medical treatment and handling approaches. The data internally holds enormous information with classified and categorized parameters. Hence, optimizing heaps of biomedical data is a challenging task. With technological enhancements, improvised approaches such as analytics provide reliable support to understand, classify, and categorize the data into various sub-heaps of information-indexing databases. Typically, the trend of data optimization and classification is backed by storage servers and accessing terminologies. These approaches can be traced back to Storage Area Networks (SAN) for storing and providing a reliable backup of uploaded information. With further enhancements, SAN-based systems were developed into cloud-based systems via interconnected servers.
The interconnected server nodes within the cloud ecosystem have established a pseudo networking link, resulting in the accumulation of a substantial volume of unclassified data. This phenomenon can be characterized as a consequential outcome of extensive data computational systems. The strategy of centralizing data storage and processing has garnered substantial support from a plethora of technological tools and methodologies. A significant challenge that arises pertains to the regulation of data volume and the management of data indexing flow originating from the centralized servers intended for communication purposes. A prospective solution involves a shift towards decentralizing data storage and confining processing activities primarily to edge devices. This shift is underpinned by the forthcoming paradigm of comprehensive data categorization and classification on a large scale. Furthermore, the augmentation of machine learning-based processing accentuates the necessity to critically assess and validate data stabilization and training models, all the while circumventing the need for centralized data storage.
The terminology of Federated Learning (FL) is based on the ideal concept of "NOT to store the data"; hence it follows the approach of decentralizing data storage and processing via the participating devices. These devices are edge or terminating devices which either generate the data or participate in training the data models for reliable decision support. Within the confines of this article, an innovative approach to the field of biomedical big-data analytics is brought to the forefront. The primary objective of this approach revolves around the establishment of a dependable and high-performance model or framework. This model is meticulously designed to address the intricate challenges associated with the optimization and standardization of vast and intricate biomedical datasets across a spectrum of heterogeneous networking devices and infrastructural components.
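The "NOT to store the data" principle can be illustrated with a minimal federated-averaging sketch. This is an illustrative NumPy-only example, not the authors' implementation: each hypothetical edge device fits a local linear model on its private data, and only the model parameters, never the raw records, reach the aggregation step.

```python
import numpy as np

def local_fit(X, y, w, lr=0.1, epochs=50):
    """Train a linear model on one edge device; raw data never leaves it."""
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # least-squares gradient
        w = w - lr * grad
    return w

def federated_average(weights, sizes):
    """Server step: average parameter vectors, weighted by local dataset size."""
    return np.average(np.stack(weights), axis=0, weights=np.asarray(sizes, float))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
# Three edge devices, each holding a private local dataset
devices = []
for n in (40, 60, 80):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    devices.append((X, y))

w_global = np.zeros(2)
for _ in range(20):                         # communication rounds
    local_ws = [local_fit(X, y, w_global) for X, y in devices]
    w_global = federated_average(local_ws, [len(y) for _, y in devices])

print(np.round(w_global, 2))                # ≈ [ 2. -1.]
```

Only the two-element parameter vectors cross the network in each round, which is the de-centralization property the article builds on.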
The significance of this proposed approach lies in its potential to revolutionize the manner in which biomedical data is harnessed, processed, and ultimately utilized for informed decision-making. By meticulously optimizing and standardizing data, the proposed model seeks to enhance data coherence, consistency, and reliability, thereby facilitating more accurate and meaningful analyses. In the context of modern healthcare, where a deluge of complex and diverse data is generated from various sources, ranging from medical imaging equipment to wearable devices, the need for a robust approach to harmonize and streamline these datasets is paramount.
The objective of the proposed model is to bridge this gap by offering a systematic and coherent solution that caters to the intricacies of different data sources, networking protocols, and infrastructure topologies for biomedical big-data analysis and processing. Through a meticulous integration of cutting-edge techniques from fields such as data science, machine learning, and network engineering, this approach endeavors to unlock the true potential of biomedical big data. By providing a solid foundation for data optimization and standardization, this model stands poised to propel the field of biomedical analytics into a new era of enhanced insights and discoveries. The model is bound to develop the federated learning-based technique for classifying and customizing the datasets. The article is organized with an introduction and literature survey in Sections I and II, followed by the proposed model design in Section III and a mathematical proof of concept in Section IV. The results and discussion, and the conclusion, are presented in Sections V and VI respectively.

II. Literature survey
Federated Learning (FL) is an improved version of technological development built from machine learning and advanced networking tools. The concept of FL is to process and train the model without storing the data on a designated server. The approach is to develop a sustainable model via remote/edge devices connected to form a networking setup, training and customizing the overall model with upgraded information and training sets. The various challenges, approaches, and ecosystem for federated learning are discussed and reported by [1] using a heterogeneous and massive networking ecosystem. The concept of implementing distributed learning via FL in heterogeneous networks is complicated in the scenario of biomedical data processing, since biomedical data is sensitive and classified under the mode of storing and accessing. Hence a defined and standard operational protocol is needed. The process is supported by customizing Electronic Health Records (EHR). [2] discussed the provision for designing and developing EHR based on federated learning approaches. This approach is based on the prediction of models under heterogeneous networks.
Data sharing and training the model through a contributive approach in federated learning models is based on edge devices. These devices are bound to operate under given security conditions; hence the security of the FL-based networking model is a concern. [3] has discussed various aspects and realities of security concerns under data privacy. Researchers have developed and validated techniques [4][5][6] for FL ecosystem development and for processing the data in a secure and reliable manner in a distributed ecosystem. The FL applications [7] are wider and have a larger scope of operation under the given technological limitations. Biomedical data processing is one of the most influential paradigms of FL-based model learning, as discussed in [8]. The technique discusses the scope and future of digital health under a federated learning ecosystem to support the fundamental arguments of information transmission and processing via remote/edge devices.
The dataset used in this experimental setup is the Multi-Objective Optimal Medical (MooM) dataset [9]; the datasets are processed and streamed in a standardized manner for faster, more reliable, and more secure transmission under the networking ecosystem. The MooM datasets are handled by a dedicated transfer protocol, TelMED [10], to assure the data under dynamic user classification and clustering. The TelMED protocol is a reliable technique for distributed computation [10,11]. These approaches internally support and validate the purpose of building a reliable and self-learning ecosystem for decision making. In this article, the proposed framework assures a reliable and sustainable model design using federated learning towards optimizing the datasets and channel data. The recommendation support is extended from a learning approach over big data collected from the application front [12,13]. These datasets occupy a larger portion of server space as they are independent of medical data standards.
In recent times, many efforts are in progress to assure and propose medical standards for data transmission and representation on a server. The approach [14] of personalization-based data generation in EHR and recommendation has improved analysis and standardization. The approach assures the data is categorized into recommended categories and further segmented on the server (the proposed storage space). Further, standardization is discussed for fair data storage and utilization [15] with respect to biomedical datasets; [15] discusses the reliability fragments of data storage and access. A dedicated standard recommendation [16], termed Biomedical and Health Informatics (BHMI), is proposed by the International Medical Informatics Association (IMIA) for educating on biomedical data standardization across multiple platforms.
In general, the existing studies depend on foundational factors such as the architecture of the storage system, the configuration of servers, the alignment of interdependency topologies, and administrative privileges. These components provide an unrealistic approach to biomedical data transfer and standardization. Hence, the proposed approach is developed with the objective of assuring a reliable data standardization and analysis platform for biomedical big datasets.

III. System design
The federated learning (FL) modes are based on remote/edge node training and contributive learning from the connected and inter-connected devices. The data fragmentation and supporting approaches are bound to perform a regional approach of data categorization and classification within a given time interval. The proposed architecture is categorized into four layers, as shown in Fig 1. The primary and first layer is the 'Data Origin Layer (DOL)'; it is responsible for data collection and processing via the various edge/remote devices connected to or participating in the computational process. The device data is categorized into internal sources such as origin, instrumental, or interdependent via multiple hopping. Each datum is loaded into the repository via an ordered arrangement of warehouse and dataset collections. This provides the ecosystem the stability to read, analyze, validate, and record the information tracks and origin points. In the first layer, the overall contribution of data collection and processing is designed and developed.
The architectural composition advances with the incorporation of the 'Data Acquisition Layer (DAL)' as the second tier, closely followed by the 'Data Classification Layer (DCL)' in the third tier. These two layers synergistically intertwine, yielding a cohesive and synchronized framework essential for orchestrating the coordination and processing of data. Their cardinal objective is to effectuate the metamorphosis of raw, unstructured data residing within the repository into an organized and structured format. Within the realm of the 'Data Acquisition Layer (DAL),' the primary thrust lies in the utilization of elementary yet indispensable machine learning techniques, such as data filtering and alignment. These techniques play a pivotal role in imposing a coherent structure upon the incoming data influx. The data, post this preliminary transformation, is rendered amenable to subsequent layers, characterized by enhanced consistency and applicability.
Progressing to the subsequent stratum, the 'Data Classification Layer (DCL)' comes to the fore. Its raison d'être revolves around the classification and categorization of the preprocessed data into discrete classes or categories. This classification serves as a precursor to more intricate analyses and processing steps, as it empowers subsequent layers with a refined and intelligible input. The culmination of this sequential progression manifests in the 'Data Optimization Layer (DPL)', positioned as the fourth tier. This layer is dedicated to a multifaceted role involving intricate data manipulation and selective extraction. It operates with precision to transform the preprocessed data streams into a structured assemblage of distinct clusters. Each of these clusters, serving as an encapsulation of data points, embodies a coherent set of features, thereby facilitating a seamless avenue for data standardization and optimization. To gain a profound visual insight into this stratified architecture, Figs 2 and 3 are presented as a definitive elucidation. They visually underscore the hierarchical alignment of each layer in precise consonance with the prescribed operational standards and device specifications, thereby underscoring the meticulous orchestration of the overall data processing framework.
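The four-layer flow described above can be sketched as a simple pipeline. All function names, thresholds, and the `Record` type below are hypothetical illustrations of the layer responsibilities, not the paper's implementation: the origin layer tags provenance, the acquisition layer filters and aligns, the classification layer assigns categories, and the optimization layer groups records into clusters.

```python
from dataclasses import dataclass

@dataclass
class Record:
    source: str        # origin device identifier (Data Origin Layer)
    value: float
    label: str = ""    # assigned later by the classification layer

def data_origin_layer(raw):
    """DOL: collect readings from edge devices and tag their origin."""
    return [Record(source=s, value=v) for s, v in raw]

def data_acquisition_layer(records):
    """DAL: elementary filtering/alignment -- drop out-of-range readings."""
    return [r for r in records if 0.0 <= r.value <= 200.0]

def data_classification_layer(records):
    """DCL: categorize each record into a discrete class."""
    for r in records:
        r.label = "high" if r.value > 100.0 else "normal"
    return records

def data_optimization_layer(records):
    """Optimization layer: group classified records into per-label clusters."""
    clusters = {}
    for r in records:
        clusters.setdefault(r.label, []).append(r)
    return clusters

raw = [("ecg-01", 72.0), ("ecg-02", 130.0), ("ecg-03", -5.0), ("ecg-04", 88.0)]
clusters = data_optimization_layer(
    data_classification_layer(
        data_acquisition_layer(
            data_origin_layer(raw))))
print({k: len(v) for k, v in clusters.items()})   # {'normal': 2, 'high': 1}
```

The out-of-range reading from `ecg-03` is discarded at the acquisition layer, so only structured, classified data reaches the clustering step.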

IV. Mathematical model and hypothesis
The biomedical data is generated via multiple origin sources and servers. These datasets are independent or dependent based on the scenario and instrumental setup. The dependency is difficult to evaluate since the origin factors of the datasets are complex and untracked. Using the proposed technique, the framework for optimizing medical datasets in telemedicine is designed and developed. The technique extracts an informative attribute set (A) with an origin evaluation parameter (P) such that (A ⇒ Σp, ∀p ∈ f), where (f) is the feature set associated with the attribute mapping (M). Consider the incoming dataset (D) from multiple origin fronts as D = {D_1, D_2, D_3, …, D_n}, where each (D)_i ⇒ (DA) and (D)_i ⇒ (ΔO_X), with (ΔO_X) the origin source of all computational dataset creation. The origin O_X of the source databases (D_i) is represented with each origin point reflecting the paradigm operation as shown in Eq 1, where (ω) is the factor of feature association.
The learning sequence of the origin (O_X) is then directly dependent on the source of the resourcing index and the path of dataset generation.
Where the attribute vector (DA) of each incoming dataset is validated and indexed. The summarization of parameters results in a cumulative database termed the centralized edge database (E_X); (E_X) is internally combined with supporting parameters such that (E_X)_U ⇒ (min over o ∈ f(i) of (DA)), resulting in an ecosystem of large (E_X) as shown in Eq 3. The customization of the source parameters with reference to the feature set follows. According to the simulative agreement, the factorial values of the dataset origin and other resources need a tracked follow-up path. Considering the path (℘) with a tracking ratio as shown in Eq 6, the follow-up path (℘) with an extraction variable (ξ) is evaluated with an independent matrix; each matrix is a supporting parameter of the previously path-charted values, where (D) is the distributed factor of the evaluated path.

Path evaluation and federated learning
Each path (℘) is a resultant of multiple data origins O_X and their associated paradigms. Using the terminology of federated learning, the structural function f(S) is extracted from each path (℘) as (℘ ∈ O_X) at the instance of each path directed towards the distributed factor (D) based datacenters. Consider that the path (℘) is justified with normal operation strategies. The procedure is dependent on the path vector (℘_X) and the routing matrix (R); in the federated setting, the interdependency of each variable associated with a path's operation is evaluated as shown in Eq 7, where (DA) is a fractional value formed by attribute extraction while traversing the path and route. Technically, the dependency matrix of each path and route is represented as M(℘, R)_X such that (℘ ∈ O_X ∈ R), and the user (U) in operation is validated with supportive learning. Users directly supporting the learning objective by consenting to participate are represented as (U_C), and non-consenting users as (Ū_C), such that the inter-linear associations of users are extracted to study the pattern of data analysis as shown in Eq 9, with Γ_P as the evaluation pattern.
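The split between consenting users (U_C) and non-consenting users (Ū_C) can be sketched as a filter on the aggregation step. This is a hypothetical NumPy illustration of the idea, not the paper's protocol: updates from users who withheld consent are simply excluded before the server averages anything.

```python
import numpy as np

def aggregate_consenting(updates, consent):
    """Average parameter updates only from consenting users (U_C);
    non-consenting users' contributions never enter the aggregation."""
    selected = [u for u, c in zip(updates, consent) if c]
    if not selected:
        raise ValueError("no consenting participants in this round")
    return np.mean(np.stack(selected), axis=0)

updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([100.0, 100.0])]
consent = [True, True, False]       # third user withheld consent
agg = aggregate_consenting(updates, consent)
print(agg)                          # [2. 3.]
```

The outlier update from the non-consenting user has no effect on the aggregate, mirroring the consent-gated participation described above.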
Each path (℘) is dependent on Γ_P to assure a reliable matrix in path selection. Thus Γ_P extracts the path values from the origin source via the interdependent features and attributes.

Federated data-categorization
The extracted data is typically aligned and formatted based on the path, the series of resource contributions, and the feature-dependency evaluation as (℘ ∈ O_X ∈ R) ⇒ D, with a path of user participation finalized. The datasets from multiple devices are pooled, and the interdependency features are extracted and represented as (f O_X) with a dependency matrix (℘_X ⇒ R_X); each user's data is labeled as an individual entity (f O_X)_i with (i ⇒ 1) under the operational principles. Consider the data stream (D_S) stored in (D), extracted as shown in Eq 11.
Where each (O_X)_j ⇒ (dP | r(O_X)) is represented with a supporting variable matrix for the evaluated data indexing. Typically the evaluation data in (D_S)_min ⇒ (D_S)_n resides with a customization database. The orientation of the dataset (D_S)_min is a resultant of the attributes processed in a medical edge device and the origin devices. Thus, the categorization of datasets is typically based on the feature set (f) with reference to the origin source (O_X) as (f(x) ⇒ O_X), such that ∀(f(x)) ⇒ (D_S)_min, with a supporting matrix (D) and a threshold feature-evaluation time (ΔT) for each associated feature set.
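The categorization of pooled device data into feature-based clusters can be illustrated with a plain k-means sketch. This is a minimal, assumption-laden example (synthetic two-dimensional "device" streams, deterministic initialization), not the federated categorization procedure itself.

```python
import numpy as np

def kmeans(X, init_centers, iters=50):
    """Plain k-means: assign pooled data points to the nearest center,
    then move each center to the mean of its assigned points."""
    centers = init_centers.astype(float).copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(1)
# Two well-separated synthetic "device" data streams
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
# Initialize with one point from each stream for a deterministic outcome
labels, centers = kmeans(X, X[[0, 50]])
# Points from the same stream end up in the same cluster
print(len(set(labels[:50].tolist())), len(set(labels[50:].tolist())))
```

In the federated setting, only cluster statistics (centers and counts), not raw points, would need to be exchanged between devices.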

Feature-set extraction and evaluation
The process of feature-set extraction is relevant to the applied feature data categorization. The relevance of information assures that the processed medical dataset is authenticated and has generated an (O_X)_i-based origin matrix for evaluation. The feature extraction (f_e) is represented from (∀f_e ∈ f), with a feature-dependency matrix to be considered before evaluation. Consider the feature selection set (f_e) as (f_e) = [f_e y_1, f_e y_2, f_e y_3, f_e y_4, …, f_e y_n] such that (∀e ∈ f_e ∈ f) to achieve a feature-independent ratio. The feature set of the given (medical) data using the data stream (D_S) is a by-product of (D_S)_min ∪ (O_X)_i | (O_X)_j under additional feature/attribute mapping as shown in Eq 14.
With instances repeated over multiple occurrences, the rational values of each feature set (f_e) are monitored and extracted as shown in Eq 16. The feature set extracted from each relevance of origin ΔO_X is subjected to (i, j) such that [∀ΔO_X(i, j) ∈ f_e] and has a relatively higher feature-dependency matrix. The evaluation is further computed using a regional matrix of feature dependencies and reverse origin tracking.
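The "feature-independent ratio" idea — keeping features only when they are not strongly dependent on features already selected — can be sketched with a greedy correlation filter. The threshold and the synthetic data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def select_independent_features(X, max_corr=0.9):
    """Greedy selection: keep a feature column only if its absolute
    correlation with every already-kept feature stays below max_corr."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < max_corr for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(2)
a = rng.normal(size=200)
b = rng.normal(size=200)
# Column 1 is an almost exact linear copy of column 0 (high dependency)
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b])
print(select_independent_features(X))   # [0, 2]
```

The near-duplicate column is rejected, leaving a feature set with low mutual dependency for the downstream evaluation matrix.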

Federated learning based data optimization and standardization
The major goal of data analytics on biomedical datasets is to achieve a higher reliability factor and access for supportive decision making. Typically, the features extracted via Eq 16 are validated and processed within a single cloud/server unit. The terminology of federated learning is to achieve remote computation by decentralizing the data servers and accumulators. The aim is to support the extracted feature mapping and customization with the other computational devices associated in the layer, as shown in Fig 2. The variable customization is supported by multiple edge devices (ξ_d) connected via an agreement portal as (∀ξ_d ∈ ℘ | R), and each ξ_d is represented with a follow-up server (ξ_d)_S such that (ξ_d)_S ∈ S, where S is a server of multiple occurrences, S = (S_1, S_2, S_3, …, S_n). These edge devices are relatively complex with respect to privacy and information access. Each edge device (ξ_d)_S ⇒ A(D_S)_min is appended to assure the flow of the information stream. Typically, the node on each server can optimize and be feature-centric. It additionally monitors the configuration (C_f) bound on (ξ_d)_S such that ∀C_f ∈ (ξ_d)_S at a given interval of time (t), as shown in the hypothesis below.
Case 1: Optimization of datasets.
On customization, the secondary paradigm used by the edge devices (ξ_d) is reflected and validated as below. On secondary optimization, the generalization process for each extracted and optimized dataset is filtered and processed with dual linking and origin parameters, (ΔO_X)_max and ΔO_X respectively. These coordinates are responsible for achieving an in-order sequence of the extracted features with respect to the origin, path (℘), and route (R), in relevance to decision making. The resulting matrix of information is represented below.
Where (R_X)_D^O is the representative vertex of the multiple parameters extracted via the various routes (R), paths (℘), and origins (O_X) of the optimized dataset as Σ(R_X)_D^O, with each data item isolated under the operating standards and principles.
Case 2: Standardization of datasets. The dataset under consideration with respect to Eq 16 is related to the series of mapped features. In this case, the datasets can be standardized with reference to the operational standards. These standards are typically bound to a minimal span operation (MSO) approach. Consider the standardization (S_D), with each data item dependent on multiple parameters such as the attributes (A), the source of origin (O_X), and the feature categorization class (f_C). These parameters are further evaluated and reconstructed to attain the standardization process as demonstrated below, where ℘(x, E_X) is the communicative path value of the extracted datasets used to achieve a higher-order standardization via multiple occurrences, i.e., (Γ_P) on (ΔO_X), with a relative mapping of the feature set (f_e) on indirect indexing; the S_D representation then follows. On reduction, the process is optimized such that each standardization value (S_D) is supportive of, and functional to, the validated path ℘(x, E_X) with reference to (Γ_P) and the origin (O_X), such that ∀(O_X) ∈ (Φ), where (Φ) denotes the universal origin standards and operating principles under which the operations are performed.
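The standardization case can be illustrated with a minimal sketch: feature vectors pooled from several hypothetical origins are rescaled to a common zero-mean, unit-variance scale so that values from different devices and units become comparable. This is an illustrative z-score example, not the paper's MSO procedure.

```python
import numpy as np

def standardize(datasets):
    """Pool feature vectors from several origin devices and rescale each
    feature to zero mean / unit variance for comparable downstream use."""
    X = np.vstack(datasets)
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma[sigma == 0] = 1.0          # guard against constant features
    return (X - mu) / sigma

# Hypothetical readings from two origin devices: [blood pressure, temp(C)]
d1 = np.array([[120.0, 36.6], [140.0, 37.1]])
d2 = np.array([[110.0, 36.9], [150.0, 38.0]])
Z = standardize([d1, d2])
print(Z.shape)   # (4, 2)
```

After this step every feature column shares one scale, which is the precondition for the cluster-based optimization in Case 1 to treat origins uniformly.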

V. Results and discussions
The computational approach of the proposed technique is to assure that the (medical) data is standardized, and hence it reflects the process of dependency tracking and process evaluation for larger datasets. The technique compares paradigms such as the origin of indexed information (O_X), the upload path (℘), route sourcing (R), the extracted attribute database (E_X), and secondary influencing (Γ_P) as evaluation parameters to provide reliable decision support. These paradigms are compared under a regional scaling of data transfer and dependency with reference to data modeling, as shown in Table 1.

Implementation setup
The proposed approach is developed on a federated cloud environment using the schematic outlines provided by AWS edge servers. The categorization and customization of the servers and server configuration is managed using Kubernetes (K8s), an open-source platform for the orchestration and scaling of cloud applications. The federated server images are deployed in containers for microservice creation. The setup is simulated on an Ubuntu 20.20 server OS with 128 GB RAM and a 100 GB soft limit of server space under AWS.

VI. Conclusion
Big-data analytics of biomedical datasets is a challenging research front. In this paper, a dedicated and novel framework for biomedical data standardization, analysis, and optimization is developed using federated learning models. The proposed framework has achieved data optimization and standardization in three layers of federated modeling, as shown in Fig 2. The framework has demonstrated a mathematical proof to interconnect and extract various interdependent features based on paradigms such as the origin of indexed information (O_X), the upload path (℘), route sourcing (R), the extracted attribute database (E_X), and secondary influencing (Γ_P) as evaluation parameters to provide reliable decision support. Overall, the interconnected and aligned processes are aggregated using a federated learning approach under data standardization and optimization. The proposed technique has outperformed the existing neural network and machine learning models over the federated approach. The technique has demonstrated an accuracy of 97.34% with a cluster size of 542. In the near future, the technique and framework can be improved with multiple-layer neural network-based computational models for better decision-making support.


Figs 4 and 5 present the comparative analysis and validation. Fig 4 demonstrates the performance matrix of the proposed technique across multiple bandwidths, whereas Fig 5 deals with the comparison of neural network (NN) and machine learning (ML) models against the proposed technique. The proposed federated learning (FL) model has outperformed both, with stabilized resource allocation as the cluster size increases. The path parameter is further validated in Figs 6 and 7 respectively, based on the routing and evaluation ratio. In general, the proposed federated approach has improved the stabilization ratio of the resources as the cluster size is enhanced.