Fig 1.
a) 1D cross-correlation, b) 2D cross-correlation.
Fig 2.
Optical setup of the 2D cross-correlator for the reference sequence “ATTGCCCA” and the query sequence “TGCC”.
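As a purely numerical illustration of what the correlator computes for this reference/query pair, the following minimal sketch slides an encoded query over an encoded reference. The one-hot column encoding (`encode`) and the helper `cross_correlate_2d` are assumptions made for illustration only; they are not the coding sets or the optical implementation used in the paper.

```python
ALPHABET = "ACGT"

def encode(seq):
    """Hypothetical encoding: a 4 x len(seq) binary grid, one one-hot column per nucleotide."""
    return [[1 if base == letter else 0 for base in seq] for letter in ALPHABET]

def cross_correlate_2d(scene, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over the scene without padding."""
    sh, sw = len(scene), len(scene[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(sh - kh + 1):
        row = []
        for c in range(sw - kw + 1):
            # Sum of element-wise products at this relative position.
            row.append(sum(scene[r + a][c + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

reference = "ATTGCCCA"
query = "TGCC"
output = cross_correlate_2d(encode(reference), encode(query))

# The highest value marks the shift at which the query aligns best.
peak_shift = max(range(len(output[0])), key=output[0].__getitem__)
```

With this encoding each output value counts the matching nucleotides at that shift, so a full match gives a peak equal to the query length.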
Fig 3.
Three types of output noise: a) Overlap noise: code overlap resulting from the DV-Curve encoding method produces output peaks that mislead sequence matching; b) System noise: the output peak value decreases when realistic conditions are simulated; c) Neighbor noise: adjacent peaks (arising from either valid peaks or high-amplitude noise) prevent proper indel localization.
Fig 4.
a) Example of coding patterns, b) 16 sample states out of all 1024 possible states of problem 1 for $C_{3,4}$, assuming i = 2 and j = 2, where $C_{3,4}$ is a coding set containing 4 characters, each of which is a 3 × 3 matrix, and i and j are the width and height, respectively, of a grid of codes that contains all combinations of $C_{3,4}$. Multiplying the $4^4$ possible states required for creating this grid by the 4 states of each code gives $4^5$ = 1024 states. Exactly matched patterns of single-code and multiple-code overlaps are shown with a green stroke rectangle.
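The state count in this caption can be checked by brute-force enumeration. The sketch below assumes that a state is a pair consisting of an i × j grid of codes and one single code, which is what the caption's arithmetic suggests; the variable names are illustrative only.

```python
from itertools import product

codes = range(4)   # the 4 characters of the coding set
i, j = 2, 2        # grid width and height

# Every way to fill the i x j grid with codes: 4**(i*j) grids.
grids = list(product(codes, repeat=i * j))

# Pair each grid with each single code (assumed definition of a state).
states = [(grid, code) for grid in grids for code in codes]
```

Here `len(grids)` is 4**4 = 256 and `len(states)` is 4**5 = 1024.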
Fig 5.
a) Example of a coding set, b) problem 1 with i = 3, j = 2, and K = 3 for one step of cross-correlation: on the left side, the sum of the inner products of the two grids at the shown relative spatial position (= 5) is equal to the sum of the inner products of the right-side elements (= 5), which are sub-grids of the left-side grids.
Fig 6.
(a) Set of four codes with N = 4, d = 3. (b) A c-grid (2 × 2 grid) with an overlap noise of length four. If E = 1 is chosen, this coding set is unacceptable, because the peak size in this example is also 4 and 4 − E (with E = 1) is the maximum acceptable noise; the overlap noise location is marked with a green border.
Fig 7.
Evolution and generation runtimes during search for various sizes of zero-scored 1D code.
Fig 8.
Evolution and generation runtimes during search for various sizes of zero-scored 2D code.
Table 1.
Effectiveness of optimizing triple parameters (i.e. relative threshold, E, and N) and coding metrics.
Table 2.
Related methods’ features summary.
Fig 9.
Evaluation steps.
Table 3.
Quadruple evaluation metrics for different mutation rates (%) – 3 × 3 coding set.
Table 4.
Quadruple metrics under different mutation rates (%) – 9 × 9 coding set.
Table 5.
Quadruple metrics under different mutation rates for integer coding [33] (%).
Fig 10.
Sensitivity and mutation rates for three methods: BLAST, cross-correlator based on the integer coding set (CPO), and cross-correlator based on the GAC coding set (XC-GAC).
Table 6.
Sensitivity (%) of three methods.
Table 7.
Cross-correlation peak to L ratio for the 3 × 3 coding set.
Table 8.
Average run time for loading and encoding 303 query sequences.
Table 9.
Run time taken to process 303 query sequences in 100 reference scenes.
Table 10.
Assumptions of speed comparison assessment.
Table 11.
Runtime (seconds) taken to search for long and short query sequences in the human genome.
Table 12.
Assumptions of k-mer counting assessment.
Fig 11.
Average relative errors of cross-correlating all encoded sequences of length 1 to 4 with the first 1260 bp of the first 12 chromosomes of the Homo sapiens GRCh38.p12 data.
Fig 12.
Effect of lens choice on the FFT noise of the sequence "ATCG" coded with a coding set with d = 3, N = 2, E = 0 and score equal to 0.
a) Input and output patterns; note the halo created around each code in the output pattern, b) Different peak values for various codings.
Fig 13.
Example of cutoff error; a) coding set, b) "TAGGAATCGGACAATCCC" as the reference sequence is split into 3 lines of 6 codes, while "AATC" is the query sequence. The end of line 1 and the beginning of line 2 contain an occurrence of the query sequence that is broken in the middle, so it cannot be detected by the cross-correlation process.
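The cutoff effect in this example can be reproduced with plain string matching, used here only as a stand-in for the cross-correlation process. The helpers `find_all` and `split_lines` are hypothetical names introduced for this sketch.

```python
def find_all(text, pattern):
    """Start indices of every (possibly overlapping) occurrence of pattern in text."""
    return [k for k in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, k)]

def split_lines(seq, width):
    """Split the sequence into consecutive lines of `width` codes."""
    return [seq[k:k + width] for k in range(0, len(seq), width)]

reference = "TAGGAATCGGACAATCCC"
query = "AATC"
width = 6

lines = split_lines(reference, width)

# Occurrences in the unbroken reference sequence.
full_hits = find_all(reference, query)

# Occurrences that are still visible when each line is searched on its own.
line_hits = [(row, col) for row, line in enumerate(lines)
             for col in find_all(line, query)]

# Occurrences whose first and last codes fall on different lines are cut off.
missed = [h for h in full_hits
          if h // width != (h + len(query) - 1) // width]
```

The occurrence starting at position 4 straddles lines 1 and 2 and is lost, while the one at the start of line 3 is still found.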
Fig 14.
Average relative errors of cross-correlating all motifs of length 1 to 4 with the first 1260 bp of the first 12 chromosomes of the Homo sapiens GRCh38.p12 data (sequences are encoded using a coding set with d = 3, N = 2, E = 0 and score equal to 0, while each line of the SLM consists of 42 columns).
On the left, a part of the coded reference sequence is shown, while on the right, the average relative errors resulting from the optical simulation are compared with those of the behavioral simulation, considering a free boundary of width a) 2 pixels and b) 10 pixels around each nucleotide code.
Fig 15.
Average relative errors of cross-correlating all motifs of length 1 to 4 with the first 1260 bp of the first 12 chromosomes of the Homo sapiens GRCh38.p12 data (sequences are encoded by a coding set with d = 10, N = 30, E = 12, and zero score; each line of the SLM consists of 42 columns and 2 extra pixels are added around each code).
The result of the optical simulation is compared with that of the ideal simulation.