A distributed computing model for big data anonymization in the networks

doi:10.1371/journal.pone.0285212

Fig 1.

Spark ecosystem [22].

Fig 2.

Spark cluster architecture [19,21].

Table 1.

A sample medical dataset.

Table 2.

Medical dataset in 4-anonymous model.

Table 3.

Medical dataset in 3-diversity model.

Fig 3.

Generalization tree for age and job attributes.

Fig 4.

Schema of hierarchical data clustering in the Spark framework: (a) Reading the dataset from HDFS into worker nodes. (b) Assigning a unique key to all records. (c) The first round of data clustering and forming sub-clusters. (d) The second round of data clustering and forming smaller sub-clusters.
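
The steps in the Fig 4 caption can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation and does not use Spark: the dataset is an in-memory list standing in for data read from HDFS, and the median split on one attribute per round is an assumed clustering rule chosen only for illustration. The function names, the attributes `age` and `income`, and the sample records are all hypothetical.

```python
from statistics import median


def assign_keys(records):
    """Step (b): attach a unique key to every record (here, its position)."""
    return [(key, rec) for key, rec in enumerate(records)]


def split_cluster(cluster, attribute):
    """One clustering round on a single cluster.

    Assumption: the cluster is split in two around the median of `attribute`.
    The caption does not specify the actual clustering criterion.
    """
    pivot = median(rec[attribute] for _, rec in cluster)
    low = [(k, r) for k, r in cluster if r[attribute] <= pivot]
    high = [(k, r) for k, r in cluster if r[attribute] > pivot]
    # Drop empty halves so every returned sub-cluster holds at least one record.
    return [part for part in (low, high) if part]


def hierarchical_cluster(records, attributes):
    """Steps (c) and (d): one splitting round per attribute, each round
    refining the sub-clusters produced by the previous one."""
    clusters = [assign_keys(records)]
    for attribute in attributes:
        clusters = [sub for c in clusters for sub in split_cluster(c, attribute)]
    return clusters


# Step (a) stand-in: a small hypothetical dataset instead of a read from HDFS.
records = [
    {"age": 25, "income": 30},
    {"age": 31, "income": 80},
    {"age": 47, "income": 45},
    {"age": 52, "income": 90},
]

clusters = hierarchical_cluster(records, ["age", "income"])
for cluster in clusters:
    print([key for key, _ in cluster])
```

The unique keys are what allow each record to be traced back to its original row after two rounds of splitting, which is the role step (b) appears to play in the schema.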
