SIG-DB: Leveraging homomorphic encryption to securely interrogate privately held genomic databases

doi:10.1371/journal.pcbi.1006454

Fig 1.

Example of a locality sensitive hash (LSH), derived from a DNA sequence.

In SIG-DB, the k-mers are created with a sliding window that advances one character at a time, as illustrated, so a sequence of length n yields n − k + 1 k-mers.
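
The sliding-window step can be sketched in a few lines of Python. This is a minimal illustration, not the SIG-DB implementation: the function name `kmers` is hypothetical, and the later step of turning the k-mers into an LSH is deliberately left out because the caption does not specify it.

```python
def kmers(sequence: str, k: int) -> list[str]:
    """Return every length-k substring, advancing the window one character at a time."""
    if k <= 0:
        raise ValueError("k must be positive")
    # The last valid window starts at index len(sequence) - k,
    # so there are len(sequence) - k + 1 windows in total.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    # Toy sequence, chosen only for illustration.
    for kmer in kmers("ACGTAC", 3):
        print(kmer)
```

A sequence shorter than k produces an empty list here; a real pipeline might prefer to reject such input.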


Fig 2.

SIG-DB protocol: (1) hashing the query sequence into a locality sensitive hash (LSH), (2) encrypting and passing the LSH, (3) hashing database elements into LSHs, (4) comparing the encrypted query LSH to the unencrypted database LSHs, (5) passing the scores back and decrypting them, and (6) calculating IoU, IoD, and IoQ scores.

Unencrypted space = blue boxes and encrypted space = green boxes.
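
Step (6) can be illustrated on plaintext sets. The sketch below assumes that IoD and IoQ stand for intersection over the database element and intersection over the query, by analogy with intersection over union (IoU). The captions do not spell these out, so that reading, the function name `overlap_scores`, and the use of Python sets in place of the LSH representation are all assumptions. The encrypted comparison in step (4) is not modeled.

```python
def overlap_scores(query: set[str], entry: set[str]) -> dict[str, float]:
    """Score the overlap between a query k-mer set and one database k-mer set."""
    shared = len(query & entry)   # k-mers present in both sets
    union = len(query | entry)    # k-mers present in either set
    return {
        "IoU": shared / union if union else 0.0,
        # Assumed meaning: intersection over the database element.
        "IoD": shared / len(entry) if entry else 0.0,
        # Assumed meaning: intersection over the query.
        "IoQ": shared / len(query) if query else 0.0,
    }


if __name__ == "__main__":
    # Toy k-mer sets, chosen only for illustration.
    q = {"ACG", "CGT", "GTA"}
    d = {"CGT", "GTA", "TAC", "ACC"}
    print(overlap_scores(q, d))
```

Under this reading, IoQ stays high when a short query is fully contained in a much longer database element, whereas IoU drops because the union grows with the longer sequence.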


Fig 3.

Intersection over union (IoU) scores for k-mer sizes k = {8, 16, 32}, with random query sequence mutation rates from 0–100%.

A k-mer size of 8 performed best, correctly identifying the sequence of interest at mutation rates up to 45%. The red circles mark, for each k-mer size, the largest mutation rate at which the correct result still had the highest IoU value. The horizontal red line represents IoU = 0.1.


Table 1.

SIG-DB algorithm performance based on the proportion of sequence randomly mutated in E. coli and S. aureus (DB = 50 seqs, LSH = 100K bases, Q = 20K bases, E = 20K bases).


Table 2.

SIG-DB algorithm performance based on mutations localized to one half of the total sequence (DB = 50 seqs, LSH = 100K bases, Q = 20K bases, E = 20K bases).


Table 3.

SIG-DB runtime for increasing database sizes (Mut = 0, IoU = 1.0, K = 8).


Table 4.

SIG-DB algorithm performance based on the relative sizes of query and database elements (K = 8, DB = 50 seqs, LSH = 100K bases, Mut = 0).
