Table 1.
Comparison between Federated Learning (FL), traditional machine learning (TML), and distributed machine learning (DML) algorithms. DML methods are commonly data driven (DMLd) or computing driven (DMLc).
Data driven methods (DMLd) mainly try to learn from large volumes of distributed data, whereas computing driven methods (DMLc) aim to parallelize computation when learning from centralized data. Computing framework refers to the whole ecosystem for learning, and model switch refers to the ease of switching to a new learning model.
Fig 1.
A conceptual view of the FL framework.
The local update (downstream) and global update (upstream) are carried out iteratively so that models trained on local data are aggregated at the central server and then dispatched to the distributed sites.
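The iterative loop in Fig 1 can be sketched as follows. This is a minimal illustration rather than the framework used in the paper: it assumes a toy one-feature linear model, plain SGD for the local update, and sample-size-weighted averaging at the server (as in FedAvg). The names `local_update`, `aggregate`, and `federated_round` are hypothetical.

```python
from typing import List, Tuple

Weights = List[float]             # [bias, slope] of a toy linear model (assumption)
Sample = Tuple[float, float]      # (x, y)


def local_update(weights: Weights, data: List[Sample], lr: float = 0.1) -> Weights:
    """One SGD pass over a site's local data for the model y = w0 + w1 * x."""
    w0, w1 = weights
    for x, y in data:
        err = (w0 + w1 * x) - y
        w0 -= lr * err
        w1 -= lr * err * x
    return [w0, w1]


def aggregate(site_weights: List[Weights], site_sizes: List[int]) -> Weights:
    """Server step: average the local models, weighted by local sample count."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]


def federated_round(global_weights: Weights, sites: List[List[Sample]]) -> Weights:
    """Dispatch the global model, train locally at each site, then aggregate."""
    local_models = [local_update(list(global_weights), data) for data in sites]
    return aggregate(local_models, [len(data) for data in sites])
```

Calling `federated_round` repeatedly, each time passing in the weights it returned, reproduces the alternation of local training, aggregation, and dispatch shown in the figure.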
Fig 2.
A conceptual view of node weight variance calculation.
Five neural networks with the same architecture are trained using the same training samples. The first hidden layer nodes are trained with the same input features, and the first node is chosen to calculate the node variance.
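A small sketch of the calculation in Fig 2, under stated assumptions: each training run is represented as a list of first-hidden-layer nodes, each node as its vector of incoming weights, and the node variance is taken as the population variance of each weight across runs, averaged over the input features. The exact variance definition in the paper may differ, and the function name `node_weight_variance` is hypothetical.

```python
from statistics import pvariance
from typing import List

# runs[r][k] is the incoming weight vector of hidden node k in training run r.
Run = List[List[float]]


def node_weight_variance(runs: List[Run], node: int) -> float:
    """Variance of one node's weights across runs, averaged over input features."""
    vectors = [run[node] for run in runs]
    # zip(*vectors) groups the values of the same input feature across all runs.
    per_feature = [pvariance(values) for values in zip(*vectors)]
    return sum(per_feature) / len(per_feature)
```

A value of zero means the node learned identical weights in every run; larger values indicate that the node at that position plays a different role from one run to the next.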
Fig 3.
Comparisons of weight variance between two weight matching methods, Static node matching vs. Dynamic node matching (proposed).
The x-axis denotes the neuron node ID in the first hidden layer of a neural network, which consists of 10 neurons. The network was trained five times until convergence, using the same training data. The y-axis denotes the variance of the weight values of each hidden node (a larger variance means the neuron weights are more unstable across training runs, even for the same feature dimension of the same neuron).
Fig 4.
Node a is the starting point because it has the smallest distance, 0.11, to node B; therefore, B is matched to a. Node α is then matched with {a, B} using the MST. This MST matching process continues for nodes b and c.
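The matching in Fig 4 can be illustrated with a short sketch. It is one possible reading of the caption, not the authors' algorithm: the closest cross-site pair of unmatched nodes seeds a group, the group then grows Prim-style by adding, from each site not yet covered, the unmatched node nearest to any group member, and the process repeats on the remaining nodes. Manhattan distance is used, as in the experiments. The weight vectors are hypothetical and chosen only so that the distance between a and B is about 0.11; `manhattan` and `match_nodes` are illustrative names.

```python
from itertools import combinations


def manhattan(u, v):
    """Manhattan (L1) distance between two weight vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))


def match_nodes(sites):
    """sites: {site: {node: weight_vector}} -> list of groups, one node per site.

    Requires at least two sites.
    """
    unmatched = {s: dict(nodes) for s, nodes in sites.items()}
    groups = []
    while all(unmatched.values()):
        # Seed: the closest pair of unmatched nodes lying on two different sites.
        _, s1, n1, s2, n2 = min(
            (manhattan(w1, w2), s1, n1, s2, n2)
            for s1, s2 in combinations(unmatched, 2)
            for n1, w1 in unmatched[s1].items()
            for n2, w2 in unmatched[s2].items()
        )
        group = {s1: n1, s2: n2}
        # Prim-style growth: attach the nearest unmatched node of an uncovered site.
        while len(group) < len(sites):
            _, s, n = min(
                (manhattan(sites[gs][gn], w), s, n)
                for gs, gn in group.items()
                for s in unmatched if s not in group
                for n, w in unmatched[s].items()
            )
            group[s] = n
        for s, n in group.items():
            del unmatched[s][n]
        groups.append(set(group.values()))
    return groups


# Hypothetical first-hidden-layer weight vectors for three sites.
example_sites = {
    "Site1": {"A": (0.9, 0.1), "B": (0.2, 0.5), "C": (0.5, 0.9)},
    "Site2": {"a": (0.25, 0.56), "b": (0.55, 1.1), "c": (1.0, 0.3)},
    "Site3": {"α": (0.3, 0.35), "β": (0.8, 0.0), "γ": (0.4, 1.2)},
}
groups = match_nodes(example_sites)
```

With these illustrative weights the first group found is {a, B, α}, in line with the caption; the remaining groups depend entirely on the made-up values.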
Table 2.
An example of pairwise distance tables between three sites where each site has three nodes: Site1={A, B, C}, Site2={a, b, c}, and Site3={α, β, γ}.
Each value in the table denotes the distance between two nodes from two different sites.
Table 3.
Definition for symbols used in node matching.
Fig 5.
After each client finishes training its local model, a client cα is randomly chosen, and a node is randomly selected from among all the nodes in the first hidden layer of client cα's local model.
Fig 6.
Nodes {a, B, α} are matching nodes.
Table 4.
Summary of the benchmark datasets used in the experiments, including the number of samples, the number of attributes, data characteristics, and class distribution.
Table 5.
The pseudo code of the experiment settings and comparisons (all methods are compared based on the same training/test data).
The initial network weights of each site are the same for different methods to avoid discrepancy due to random weight initialization.
Table 6.
Experimental results on the Diabetes dataset using Manhattan distance based matching.
For FedDNA, the matching freezes after the first two rounds of dynamic node alignment.
Table 7.
Experimental results on the Spam dataset using Manhattan distance based matching.
For FedDNA, the matching freezes after the first two rounds of dynamic node alignment.
Table 8.
Experimental results on the Occupancy detection dataset using Manhattan distance based matching.
For FedDNA, the matching freezes after the first two rounds of dynamic node alignment.
Table 9.
Experimental results on the Patient survival prediction dataset using Manhattan distance based matching.
For FedDNA, the matching freezes after the first two rounds of dynamic node alignment.
Fig 7.
Overall performance comparisons of FedDNA against FedAvg and FedDyn.
Outliers can be observed for both methods; overall, FedDNA outperforms FedAvg, and FedAvg performs similarly to FedDyn.
Fig 8.
Performance comparisons between FedDNA and FedAvg with respect to different class distributions for the Diabetes dataset.
Fig 9.
Performance comparisons between FedDNA and FedAvg with respect to different class distributions for the Spam dataset.
Fig 10.
Performance comparisons between FedDNA and FedAvg with respect to different class distributions for the Patient survival prediction dataset.
Fig 11.
Performance comparisons between FedDNA and FedAvg with respect to different class distributions for the Occupancy detection dataset.