Optical pattern generator for efficient bio-data encoding in a photonic sequence comparison architecture

doi:10.1371/journal.pone.0245095

Fig 1.

a) 1D cross-correlation, b) 2D cross-correlation.

Fig 2.

Optical setup of 2D cross-correlator for reference sequence “ATTGCCCA” and query sequence “TGCC”.
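
As a rough illustration of what the correlator computes for this reference and query, the following minimal sketch slides an encoded query along an encoded reference and sums the element-wise products at each shift. The one-hot coding, `ONE_HOT`, `encode`, and `cross_correlate` are assumptions made for this example only; they are not the article's coding set or optical implementation.

```python
# Hypothetical one-hot coding (an assumption, not the article's coding set):
# each nucleotide becomes a 4-element column, so a sequence is a 4 x L pattern.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def encode(seq):
    """Return a list of one-hot columns, one per nucleotide."""
    return [ONE_HOT[base] for base in seq]


def cross_correlate(reference, query):
    """Slide the encoded query along the encoded reference.

    At each horizontal shift, sum the element-wise products of the
    overlapping columns. A score equal to len(query) is an exact match.
    """
    ref, qry = encode(reference), encode(query)
    scores = []
    for shift in range(len(ref) - len(qry) + 1):
        total = 0
        for k, q_col in enumerate(qry):
            r_col = ref[shift + k]
            total += sum(r * q for r, q in zip(r_col, q_col))
        scores.append(total)
    return scores


scores = cross_correlate("ATTGCCCA", "TGCC")
# Shifts whose score reaches the query length are exact matches.
match_positions = [i for i, s in enumerate(scores) if s == len("TGCC")]
print(scores, match_positions)
```

Under this assumed coding, the only full-height peak is at shift 2, where "TGCC" aligns with the reference; the lower values at other shifts are partial matches.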

Fig 3.

Three types of output noise: a) Overlap noise: code overlap caused by the DV-Curve encoding method produces output peaks that mislead sequence matching; b) System noise: the output peak value decreases when realistic conditions are simulated; c) Neighbor noise: adjacent peaks (arising from either valid peaks or high-amplitude noise) prevent proper indel localization.

Fig 4.

a) Example of coding patterns, b) 16 sample states out of all 1024 possible states of problem 1 for C_{3, 4}, assuming i = 2 and j = 2, where C_{3, 4} is a coding set containing 4 characters, each of which is a 3 × 3 matrix, and i and j are the width and height, respectively, of a grid of codes that contains all combinations of C_{3, 4}. Multiplying the 4⁴ possible states required to create this grid by the 4 states for each code gives 4⁵ = 1024 states. Exactly matched patterns of single-code and multiple-code overlaps are shown with a green-stroke rectangle.
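
The state count in this caption can be checked by brute-force enumeration. The sketch below is only a counting aid under stated assumptions: the four codes are represented by placeholder labels (`CODES`), and `count_states` is a hypothetical helper that pairs every i × j grid with every single code.

```python
from itertools import product

# Placeholder labels standing in for the 4 codes of C_{3, 4} (assumption:
# only the number of codes matters for counting, not their 3 x 3 contents).
CODES = ("c0", "c1", "c2", "c3")


def count_states(width, height, codes=CODES):
    """Count (grid, code) pairs: every width x height grid of codes,
    combined with every single code compared against it."""
    grids = product(codes, repeat=width * height)
    return sum(1 for _grid in grids for _code in codes)


n_states = count_states(2, 2)
print(n_states)  # 4**4 grids times 4 codes
```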

Fig 5.

a) Example of a coding set, b) problem 1 with i = 3, j = 2, and K = 3 for one step of cross-correlation: on the left side, the sum of the inner products of the two grids at the shown relative spatial position (= 5) equals the sum of the inner products of the right-side elements (= 5), which are sub-grids of the left-side grids.

Fig 6.

(a) Set of four codes with N = 4, d = 3. (b) A c-grid (2 × 2 grid) with an overlap noise of length four; if E = 1 is chosen, this coding set is unacceptable, because the peak size in this example is also 4 and 4 − E (with E = 1) is the maximum acceptable noise. The overlap noise location is marked with a green border.

Fig 7.

Evolution and generation runtimes during search for various sizes of zero-scored 1D code.

Fig 8.

Evolution and generation runtimes during search for various sizes of zero-scored 2D code.

Table 1.

Effectiveness of optimizing triple parameters (i.e. relative threshold, E, and N) and coding metrics.

Table 2.

Related methods’ features summary.

Fig 9.

Evaluation steps.

Table 3.

Quadruple evaluation metrics under different mutation rates (%) – 3 × 3 coding set.

Table 4.

Quadruple metrics under different mutation rates (%) – 9 × 9 coding set.

Table 5.

Quadruple metrics under different mutation rates for integer coding [33] (%).

Fig 10.

Sensitivity and mutation rates for three methods: BLAST, cross-correlator based on integer coding set (CPO), and cross-correlator based on GAC coding set (XC-GAC).

Table 6.

Sensitivity (%) of three methods.

Table 7.

Cross-correlation peak to L ratio for the 3 × 3 coding set.

Table 8.

Average run time for loading and encoding 303 query sequences.

Table 9.

Run time taken to process 303 query sequences in 100 reference scenes.

Table 10.

Assumptions of speed comparison assessment.

Table 11.

Runtime (seconds) taken to search long and short query sequences in the human genome.

Table 12.

Assumptions of k-mer counting assessment.

Fig 11.

Average relative errors of cross-correlating all encoded sequences of length 1 to 4 with the first 1260 bp of the first 12 chromosomes of Homo sapiens GRCh38.p12 data.

Fig 12.

Effect of lens choice on FFT noise for the sequence "ATCG" encoded with a coding set with d = 3, N = 2, E = 0, and a score of 0.

a) Input and output patterns; note the halo created around each code in the output pattern, b) Different peak values for various codings.

Fig 13.

Example of cutoff error: a) coding set, b) the reference sequence "TAGGAATCGGACAATCCC" is split into 3 lines of 6 codes each, while "AATC" is the query sequence. The end of line 1 and the beginning of line 2 together contain the query sequence, which is broken in the middle and therefore cannot be detected by the cross-correlation process.
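
The cutoff effect described in this caption can be reproduced with a small sketch. Exact substring matching is used here as a stand-in for a full-height cross-correlation peak, which is an assumption of this example; `find_all` and `split_lines` are hypothetical helpers, not part of the article's method.

```python
def find_all(text, pattern):
    """Return every start index at which pattern occurs in text."""
    n = len(pattern)
    return [i for i in range(len(text) - n + 1) if text[i:i + n] == pattern]


def split_lines(seq, codes_per_line):
    """Split a sequence into consecutive lines of codes_per_line codes."""
    return [seq[i:i + codes_per_line] for i in range(0, len(seq), codes_per_line)]


reference = "TAGGAATCGGACAATCCC"
query = "AATC"
codes_per_line = 6

lines = split_lines(reference, codes_per_line)

# Occurrences in the unbroken reference.
true_hits = find_all(reference, query)

# Occurrences found when each line is searched on its own,
# converted back to positions in the full reference.
line_hits = [
    row * codes_per_line + col
    for row, line in enumerate(lines)
    for col in find_all(line, query)
]

# Occurrences lost because they straddle a line break.
missed = [pos for pos in true_hits if pos not in line_hits]
print(lines, true_hits, line_hits, missed)
```

In this example the occurrence starting at position 4 spans the break between line 1 and line 2 and is missed, while the occurrence inside line 3 is still found.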

Fig 14.

Average relative errors of cross-correlating all motifs of length 1 to 4 with the first 1260 bp of the first 12 chromosomes of Homo sapiens GRCh38.p12 data (sequences are encoded using a coding set with d = 3, N = 2, E = 0, and a score of 0, while each line of the SLM consists of 42 columns).

On the left, part of the coded reference sequence is shown; on the right, the average relative errors from the optical simulation are compared with those of the behavioral simulation, considering a free boundary of width a) 2 pixels and b) 10 pixels around each nucleotide code.

Fig 15.

Average relative errors of cross-correlating all motifs of length 1 to 4 with the first 1260 bp of the first 12 chromosomes of Homo sapiens GRCh38.p12 data (sequences are encoded with a coding set with d = 10, N = 30, E = 12, and zero score; each line of the SLM consists of 42 columns, and 2 extra pixels are added around each code).

Result of optical simulation is compared with ideal simulation.
