Efficient discovery of frequently co-occurring mutations in a sequence database with matrix factorization

doi:10.1371/journal.pcbi.1012391


Fig 2

Example computation of the cost (distance) function for a point mutation, for the matrix V.

Here, an asparagine (N) has mutated to a glutamic acid (E). Only the highlighted (diagonal) cells of the matrix are considered and summed, giving a final score of 2 (1 + 0 + 1) for the N-to-E positional comparison. This computation is repeated for each position in the protein, across all protein sequences in the database.
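The diagonal sum described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The three-feature encodings in `ENCODING` are hypothetical placeholders, chosen only so that the diagonal reproduces the 1 + 0 + 1 arithmetic above. Building the comparison matrix from absolute feature differences is also an assumption, since the caption does not say how the cells are filled. The function names are illustrative.

```python
# Hypothetical 3-feature encodings (NOT taken from the paper); the values are
# placeholders picked only to reproduce the caption's 1 + 0 + 1 example.
ENCODING = {
    "N": (0, 1, 0),
    "E": (1, 1, 1),
}


def comparison_matrix(a, b):
    """Assumed construction: cell (i, j) holds |a[i] - b[j]|."""
    return [[abs(x - y) for y in b] for x in a]


def diagonal_score(matrix):
    """Sum only the diagonal cells, ignoring all off-diagonal cells."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def positional_scores(reference, sequence):
    """Repeat the diagonal-sum comparison at every aligned position."""
    return [
        diagonal_score(comparison_matrix(ENCODING[r], ENCODING[s]))
        for r, s in zip(reference, sequence)
    ]


# N-to-E comparison at a single position.
n_to_e = diagonal_score(comparison_matrix(ENCODING["N"], ENCODING["E"]))

# Two aligned positions: an unchanged N, then N mutated to E.
scores = positional_scores("NN", "NE")
```

With these placeholder encodings, `n_to_e` evaluates to 2, and `scores` holds 0 for the unchanged position and 2 for the mutated one. Applying `positional_scores` to every sequence in a database would correspond to the repetition "across all protein sequences" mentioned in the caption.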

doi: https://doi.org/10.1371/journal.pcbi.1012391.g002