CANDIDATE: A tool for generating anonymous participant-linking IDs in multi-session studies

doi:10.1371/journal.pone.0260569

Fig 1.

Flowcharts of the CANDIDATE procedure: (a) Add, (b) Encode, (c) Lookup, and (d) Hash.

Fig 2.

Examples of how different hash functions code the name “Christian”.
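
As a rough illustration of what such a comparison involves, the sketch below computes djb2 and CRC-32 for the same name and reduces each value to a fixed-width numeric ID. The helper names (`djb2`, `crc32`, `to_id`) and the modulo reduction to a coding space are assumptions made for this example; they are not taken from the CANDIDATE implementation, whose exact mapping may differ.

```python
import zlib


def djb2(text: str) -> int:
    """Classic djb2 string hash (h = h * 33 + byte), kept to 32 bits."""
    h = 5381
    for byte in text.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def crc32(text: str) -> int:
    """CRC-32 checksum of the UTF-8 bytes, via the standard library."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def to_id(hash_value: int, digits: int) -> str:
    """Assumed reduction: take the hash modulo 10**digits and zero-pad."""
    return str(hash_value % 10**digits).zfill(digits)


if __name__ == "__main__":
    name = "Christian"
    for digits in (2, 3, 4, 5):
        print(digits, to_id(djb2(name), digits), to_id(crc32(name), digits))
```

The two functions generally yield unrelated IDs for the same name, which is what makes one usable as a fallback when the other collides.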

Table 1.

Collision probability with djb2, CRC-32 and double hashes (half djb2/half CRC-32) for 100 randomly selected names with different coding space sizes.

Based on simulation with 10,000 iterations.
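
A simulation of this kind could be sketched as follows. This is not the authors' code: the names are random lowercase strings rather than real names, "collision probability" is interpreted here as the share of trials in which at least two of the 100 names land in the same slot, and the double-hash variant is omitted. All function names and defaults are assumptions for illustration.

```python
import random
import string
import zlib


def djb2(text: str) -> int:
    """Classic djb2 string hash, kept to 32 bits."""
    h = 5381
    for byte in text.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def crc32(text: str) -> int:
    """CRC-32 checksum of the UTF-8 bytes."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def random_name(rng: random.Random, length: int = 8) -> str:
    """Stand-in for a real name: a random lowercase string (assumption)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def collision_probability(hash_fn, space, n_names=100, iterations=1000, seed=0):
    """Share of trials in which at least two distinct names share a slot."""
    rng = random.Random(seed)
    trials_with_collision = 0
    for _ in range(iterations):
        names = set()
        while len(names) < n_names:  # draw distinct names only
            names.add(random_name(rng))
        slots = {hash_fn(name) % space for name in names}
        if len(slots) < n_names:  # fewer slots than names means a collision
            trials_with_collision += 1
    return trials_with_collision / iterations


if __name__ == "__main__":
    for space in (1_000, 10_000, 100_000):
        print(space,
              collision_probability(djb2, space),
              collision_probability(crc32, space))
```

Raising `iterations` to 10,000 matches the iteration count stated above, at the cost of a longer run time.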

Fig 3.

Example of encoding and collision handling: (a) The first participant is encoded with hash function 0. (b) The second participant is encoded with hash function 0. (c) Seven further participants are encoded without any collisions. (d) The eighth participant causes a collision. (e) A free slot is found by using hash function 1 (CRC-32) instead. The hash function used (hash function 1) is attached to the collision entry together with a check code obtained using hash function 11 (salted). (f) A collision also occurs for participant nine. (g) A free slot is obtained using hash function 1, and the hash function used and the validation code are attached to the colliding item. (h) The tenth participant is encoded without collision.
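
The collision handling described in this caption can be approximated by the minimal sketch below. It is an assumption-laden illustration, not the CANDIDATE implementation: the coding space size, the hash family in `hash_n` (0 = djb2, 1 = CRC-32, higher indices = salted CRC-32), the two-digit check code, and the table layout (slot number mapped to a list of `(hash index, check code)` notes) are all hypothetical. No names are stored in the table.

```python
import zlib

SPACE = 1000  # assumed coding space (three-digit IDs)


def djb2(text: str) -> int:
    """Classic djb2 string hash, kept to 32 bits."""
    h = 5381
    for byte in text.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def hash_n(index: int, name: str) -> int:
    """Hypothetical hash family: 0 = djb2, 1 = CRC-32, others = salted CRC-32."""
    if index == 0:
        return djb2(name)
    if index == 1:
        return zlib.crc32(name.encode("utf-8"))
    return zlib.crc32(f"{index}:{name}".encode("utf-8"))


def check_code(name: str) -> int:
    """Assumed two-digit check code from the salted hash function 11."""
    return hash_n(11, name) % 100


def encode(table: dict, name: str, max_functions: int = 10) -> int:
    """Assign a free slot; on collision, leave a note on the colliding entry."""
    home = hash_n(0, name) % SPACE
    if home not in table:
        table[home] = []  # occupied, no collision notes yet
        return home
    for index in range(1, max_functions):
        slot = hash_n(index, name) % SPACE
        if slot not in table:
            table[slot] = []
            # Record which hash function resolved the collision, plus a check code.
            table[home].append((index, check_code(name)))
            return slot
    raise RuntimeError("no free slot found")


def lookup(table: dict, name: str) -> int:
    """Recompute the ID: follow a matching collision note, else use the home slot."""
    home = hash_n(0, name) % SPACE
    for index, code in table.get(home, []):
        if code == check_code(name):
            return hash_n(index, name) % SPACE
    return home
```

In this sketch a lookup can be misdirected when two different names produce the same check code, and encoding the same name twice is treated as a collision, so a caller would need to run `lookup` before `encode` for returning participants.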

Table 2.

Distribution of hash functions used by CANDIDATE for encoding 100 participants with coding spaces of 1,000, 10,000, and 100,000.

Fig 4.

A screenshot of the CANDIDATE tool implementation.

Fig 5.

Encoding success rates for small samples (N ≤ 100).

Fig 6.

Encoding success rates for larger samples (100 ≤ N ≤ 1000).

Note that the y-axis starts at 99.6% to show the small variations.

Fig 7.

Collision rates for small samples (N ≤ 100).

Fig 8.

Collision rates for larger samples (100 ≤ N ≤ 1000).

Fig 9.

Log-log plot of mean anonymity with coding spaces of 100 (two-digit IDs), 1,000 (three-digit IDs), 10,000 (four-digit IDs), and 100,000 (five-digit IDs) slots, using a phonebook of 103,472 names.

Error bars indicate the minimum and maximum anonymity.

Fig 10.

Percentage of unused ID slots with small sample sizes.

This indicates the portion of phonebook entries that can be discarded as non-participants during an attack.

Fig 11.

Percentage of unused ID slots with large sample sizes.

This indicates the portion of phonebook entries that can be discarded as non-participants during an attack.
