
Fig 1.

Block diagram illustration of the (A) HELIOS method and (B) HELIOS optical architecture.

(A) The HELIOS method aligns two input sequences through coding and alignment procedures that exactly locate character matches and variations; (B) the HELIOS optical architecture executes the HELIOS method through three units (the optical beam provision unit, the optical modulation and mechanism unit, and the output capturing unit), exploiting the inherent parallelism and high processing speed of optics.

Fig 2.

An example of the proposed coding scheme for DNA, RNA, and protein sequences.

In this example, short DNA, RNA, and protein sequences are coded with the self-label and nearby-label coding schemes, using the preset values Offset_self = 450, Step_self = 10, Offset_nearby = 0, Step_nearby = 9, k = 2, and R = 1. In the self-label coding scheme, the parameter Ch_i denotes the current character at position i; in the nearby-label coding scheme, it denotes the kth previous character. The parameter V is a preset value from 0 to 19 for the amino acids of a protein sequence and from 0 to 3 for the nucleotides of DNA and RNA sequences. Every character is coded with two values, one from each scheme, shown in its corresponding white block. For the nearby-label coding of the characters at the beginning of a sequence, the coding wraps around and uses the required nearby character at the end of the sequence.
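
The caption gives the preset values but not the mapping itself. The following Python sketch assumes each label is computed as Offset + Step × V. With the presets above, this yields self-labels from 450 to 640 and nearby-labels from 0 to 171, which agrees with the 450–650 nm wavelength range and the 9-degree polarization steps described for Fig 7. The character-to-V tables (DNA_V, PROTEIN_V) are hypothetical alphabetical assignments, not the ones used in the article.

```python
# Hypothetical character-to-V tables: the caption only states that V lies in
# 0..3 for nucleotides and 0..19 for amino acids, not the actual assignment.
DNA_V = {"A": 0, "C": 1, "G": 2, "T": 3}
PROTEIN_V = {aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}


def encode(seq, v_table, k=2,
           offset_self=450, step_self=10,
           offset_nearby=0, step_nearby=9):
    """Return one (self_label, nearby_label) pair per character.

    Assumes label = Offset + Step * V. The nearby label of position i uses
    the character k positions earlier; (i - k) % n wraps around to the end
    of the sequence for the first k characters.
    """
    n = len(seq)
    codes = []
    for i, ch in enumerate(seq):
        self_label = offset_self + step_self * v_table[ch]
        prev = seq[(i - k) % n]
        nearby_label = offset_nearby + step_nearby * v_table[prev]
        codes.append((self_label, nearby_label))
    return codes
```

For instance, encode("ACGT", DNA_V) gives [(450, 18), (460, 27), (470, 0), (480, 9)]. The first character's nearby label comes from 'G', found two positions back after wrapping around.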

Fig 3.

Step-by-step progress of the S1-align operation of the HELIOS method for optical sequence alignment.

(A) Two input sequences, S1 and S2, are given to the HELIOS method, assuming that the third character (i.e., ‘A’ in S1 and ‘E’ in S2) is mutated. (B) S1 and S2 are coded with the proposed coding procedure, assuming k = 2 and R = 1. The self-label codes of the third character differ between S1 and S2. Because k = 2, the mutated character also makes the nearby-label codes of the fifth character (i.e., ‘S’) differ. Afterward, since R = 1, the coded S2 is shifted by one position to the left and by one position to the right. Then, S1 is compared with the unshifted S2 and with each of its shifts, and (C) the result of each comparison is presented. (D) Next, the comparison outcome vector is formed by aggregating all the comparison results, so that matched characters yield nonzero entries. As the zero entry shows, the mutated character at position 3 is correctly located because of its different self-label codes. However, the 5th character is falsely mismatched because of its different nearby-label codes. (E) To compensate for this false mismatch, the ith entry of the output is obtained by aggregating the ith and (i + k)th entries of the comparison outcome vector. Hence, the 5th entry is recovered through the nonzero value at the 7th entry, and the detection of the mutation at the 3rd entry is not affected.
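
The S1-align steps in panels (B) through (E) can be sketched in a few lines of Python. This is a minimal software analogue built on three assumptions, not the optical implementation. First, both labels are one-to-one functions of a character, so the sketch compares (Ch_i, Ch_{i−k}) pairs directly instead of numeric codes. Second, "aggregating" is taken to mean summing match counts. Third, the (i + k) index wraps around the sequence to mirror the wrap-around nearby coding.

```python
def labels(seq, k):
    """Pair each character with the character k positions earlier.

    The index (i - k) % n wraps around to the end of the sequence, like the
    nearby-label coding of Fig 2.
    """
    n = len(seq)
    return [(seq[i], seq[(i - k) % n]) for i in range(n)]


def s1_align(s1, s2, R=1, k=2):
    """Sketch of the S1-align operation; returns the 1D output vector."""
    c1, c2 = labels(s1, k), labels(s2, k)
    n = len(c1)
    # Comparison outcome vector: count matches of S1[i] against S2 shifted by -R..R.
    outcome = [0] * n
    for i in range(n):
        for shift in range(-R, R + 1):
            j = i + shift
            # A match requires both the self label and the nearby label to agree.
            if 0 <= j < len(c2) and c1[i] == c2[j]:
                outcome[i] += 1
    # False-mismatch compensation: aggregate entries i and i + k (wrapping).
    return [outcome[i] + outcome[(i + k) % n] for i in range(n)]
```

The sequences "MKAQSTLV" and "MKEQSTLV" are hypothetical and mimic the Fig 3 setup. For them, s1_align returns [1, 2, 0, 2, 1, 2, 2, 2]. The only zero is at index 2 (the third character), and the falsely mismatched fifth character is recovered from the seventh entry.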

Fig 4.

Side-by-side representation of the S1-align and S2-align operations of the HELIOS method.

(A) Overall, the S1-align operation locates character substitutions as well as character insertions in S1 (i.e., character deletions from S2). For this purpose, it compares the unshifted and all shifted S2 vectors with the S1 vector. Afterward, to produce the 1D output vector, the ith entry of the output is obtained by aggregating the ith and (i + k)th entries of the comparison outcome vector. (B) Similarly, the S2-align operation compares the unshifted and all shifted S1 vectors with the S2 vector to locate character substitutions as well as character insertions in S2 (i.e., character deletions from S1).

Fig 5.

Output explanation of the HELIOS method.

The output of the HELIOS method is a two-row matrix, whose first and second rows are produced by the S1-align and S2-align operations, respectively. Each entry represents the alignment output for the characters at the corresponding position of the input sequences. Traversing the output from left to right, nonzero entries in both rows indicate identical characters (a match), while zero entries in both rows indicate a character mutation. Finally, a zero entry in one row together with a nonzero entry in the other indicates an indel (a character insertion or deletion). The indel is represented by a gap at the corresponding position of the sequence whose row contains the nonzero entry.
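
The decoding rule can be written as a small Python function. This sketch assumes the output arrives as two equal-length lists of numbers, with the first row associated with S1 and the second with S2. The label strings are illustrative only.

```python
def interpret(output):
    """Label each column of a two-row output (row 0: S1-align, row 1: S2-align)."""
    row1, row2 = output
    result = []
    for a, b in zip(row1, row2):
        if a and b:
            result.append("match")
        elif not a and not b:
            result.append("mutation")
        elif a:
            result.append("gap in S1")  # nonzero entry lies in the S1-associated row
        else:
            result.append("gap in S2")  # nonzero entry lies in the S2-associated row
    return result
```

For example, interpret([[3, 0, 2, 0], [1, 0, 0, 4]]) labels the four columns as a match, a mutation, a gap in S1, and a gap in S2.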

Fig 6.

Schematic illustration of the HELIOS optical architecture.

(A) The optical beam provision unit provides a collimated beam to feed the whole system. A laser source produces a wideband laser beam, which is cleaned by passing through a laser line bandpass filter and a pinhole. The cleaned beam is then diverged and collimated by passing through the objective and imaging lenses, respectively, and the collimated beam is directed to the optical modulation and mechanism unit. (B) In the optical modulation and mechanism unit, the collimated beam passes through WSF #1. WSF #1 modulates the wavelength of the beam according to the self-label codes of S2 and S1 on the first and second rows of a 2 × N pixel image, respectively, while PSF #1 selects the polarization according to their nearby-label codes. Afterward, the objective and imaging lens arrays diverge and recollimate the optical beam along the horizontal direction to perform the shifting step of the alignment procedure. WSF #2 and PSF #2 code S1 and S2 on the first and second rows of a 2 × N pixel image, respectively. By passing the expanded beams through WSF #2 and PSF #2, the architecture compares the shifted coded S2 with S1 in the first row, implementing the S1-align operation. It also compares the shifted coded S1 with S2 in the second row, implementing the S2-align operation. Then, a chiral medium directs each pixel to two distinct pixels to compensate for false mismatches. (C) Finally, in the output capturing unit, an optical thresholder eliminates wavelength cross-talk and speckle noise from the output. The output is then captured by a bi-convex lens and a charge-coupled device (CCD) camera.

Fig 7.

Modulation approach of the HELIOS optical architecture, utilizing the wavelength and polarization of the optical beams.

To implement the self-label coding through wavelength modulation, every character of the input sequence is modulated with a distinct wavelength within the spectral range of 450–650 nanometers, each with a bandwidth of 10 nanometers. To implement the nearby-label coding through polarization selection, every character of the input sequence is assigned a specific polarization azimuth, in 9-degree steps over the range of 0 to 180 degrees. Each approach provides twenty distinct codes for protein sequences, of which only four are used for coding DNA and RNA sequences.

Fig 8.

Simulation outputs of the HELIOS method and its optical architecture.

In this case study, the “Severe acute respiratory syndrome coronavirus 2” sequences [49] are aligned in the form of (A) protein, (B) RNA, and (C) DNA. Single and multiple mutations and indels are manually imposed on the sequences with varying distributions. For clarity, only a 60-character portion of each full-length alignment is shown, beginning at (A) character 240, (B) character 721, and (C) character 1921. In the coding and alignment procedures, the parameters are set to R = 4 and k = 5. As the outputs show, the two input sequences are successfully aligned into a two-row output. This holds both when performing the two consecutive procedures of the HELIOS method and when passing optical beams through the two units of the HELIOS optical architecture. As a result, the matches, mutations, and indels are detected and located accurately.

Table 1.

The parameter Identity of the HELIOS method in the quantitative measurement of homology.

Identity reports the percentage of exactly matched characters between two compared sequences aligned by the HELIOS method, using the “Nine ND5 protein sequences dataset” [53].
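
The caption does not state the denominator of the percentage. This minimal sketch assumes that Identity is the number of identical, non-gap columns divided by the full alignment length. Other conventions divide by the length of the shorter sequence instead.

```python
def identity(aligned1, aligned2, gap="-"):
    """Percentage of alignment columns with the same (non-gap) residue in both rows."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    if not aligned1:
        return 0.0
    same = sum(1 for a, b in zip(aligned1, aligned2) if a == b and a != gap)
    return 100.0 * same / len(aligned1)
```

With the hypothetical alignment "MKA-QS" / "MKEWQS", four of the six columns are identical, so identity returns about 66.7.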

Table 2.

The parameter Similarity of the HELIOS method in the quantitative measurement of homology.

Similarity measures the resemblance (in percentage) of two compared sequences aligned by the HELIOS method. The amino acids are categorized into six groups based on their physicochemical properties: GAVLI, FYW, STCM, KRH, DENQ, and P. The “Nine ND5 protein sequences dataset” [53] is used in this study.
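
This minimal sketch computes Similarity from the six physicochemical groups listed above. It assumes that a column counts as similar when both residues fall in the same group, which includes identical residues. It also assumes the percentage is taken over the full alignment length. Gap columns never count as similar.

```python
# Six physicochemical groups listed in the caption.
GROUPS = ["GAVLI", "FYW", "STCM", "KRH", "DENQ", "P"]
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def similarity(aligned1, aligned2):
    """Percentage of alignment columns whose two residues share a group.

    Gap characters are not in GROUP_OF, so gap columns never count.
    """
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    if not aligned1:
        return 0.0
    similar = sum(
        1 for a, b in zip(aligned1, aligned2)
        if a in GROUP_OF and b in GROUP_OF and GROUP_OF[a] == GROUP_OF[b]
    )
    return 100.0 * similar / len(aligned1)
```

For example, "GFSKDP" and "AYTRNP" share no identical residue except P, yet every column pairs residues from the same group, so similarity returns 100.0.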

Table 3.

The parameter Alignment Score of the HELIOS method in the quantitative measurement of homology.

To calculate the Alignment Score of two compared sequences aligned by the HELIOS method, the BLOSUM62 substitution scoring matrix is adopted with gap opening and gap extension penalties of −10 and −0.5, respectively. The “Nine ND5 protein sequences dataset” [53] is used in this study.
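
The score combines substitution scores with affine gap penalties. This sketch assumes that the first position of a gap receives the opening penalty (−10) and each further position the extension penalty (−0.5). Some tools use a slightly different convention. The TOY dictionary holds placeholder scores for the example only. In practice, the full BLOSUM62 matrix would be used; Biopython, for example, provides it through Bio.Align.substitution_matrices.load("BLOSUM62").

```python
# Placeholder substitution scores for the example only; use the full
# BLOSUM62 matrix in practice.
TOY = {("M", "M"): 5, ("K", "K"): 5, ("A", "A"): 4, ("A", "E"): -1}


def alignment_score(aligned1, aligned2, subst, gap_open=-10.0,
                    gap_extend=-0.5, gap="-"):
    """Sum substitution scores and affine gap penalties over the columns.

    Assumed convention: the first column of a gap costs gap_open, and each
    following column of the same gap costs gap_extend.
    """
    score = 0.0
    in_gap1 = in_gap2 = False  # currently inside a gap in row 1 / row 2?
    for a, b in zip(aligned1, aligned2):
        if a == gap:
            score += gap_extend if in_gap1 else gap_open
            in_gap1, in_gap2 = True, False
        elif b == gap:
            score += gap_extend if in_gap2 else gap_open
            in_gap1, in_gap2 = False, True
        else:
            # Substitution matrices are symmetric, so look up either order.
            score += subst[(a, b)] if (a, b) in subst else subst[(b, a)]
            in_gap1 = in_gap2 = False
    return score
```

For example, alignment_score("MK--A", "MKEEA", TOY) returns 3.5, that is, 5 + 5 − 10 − 0.5 + 4. The two-column gap pays the opening penalty once and the extension penalty once.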

Table 4.

The parameter Identity of the Smith-Waterman algorithm in the quantitative measurement of homology.

Identity reports the percentage of exactly matched characters between two compared sequences aligned by the Smith-Waterman algorithm, using the “Nine ND5 protein sequences dataset” [53].

Table 5.

The parameter Similarity of the Smith-Waterman algorithm in the quantitative measurement of homology.

Similarity measures the resemblance (in percentage) of two compared sequences aligned by the Smith-Waterman algorithm. The amino acids are categorized into six groups based on their physicochemical properties: GAVLI, FYW, STCM, KRH, DENQ, and P. The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 6.

The parameter Alignment Score of the Smith-Waterman algorithm in the quantitative measurement of homology.

To calculate the Alignment Score of two compared sequences aligned by the Smith-Waterman algorithm, the BLOSUM62 substitution scoring matrix is adopted with gap opening and gap extension penalties of −10 and −0.5, respectively. The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 7.

A brief report of the quantitative measurement of homology for the HELIOS method, compared to nine well-known algorithms: SW, NW, BLAST, ClustalW, ClustalΩ, Muscle, T-Coffee, Kalign, and MAFFT. The parameters Identity, Similarity, and Alignment Score are reported.

Identity reports the percentage of exactly matched characters between two compared sequences aligned by each algorithm. Similarity measures the resemblance (in percentage) of two compared sequences aligned by each of the aforementioned algorithms, with the amino acids categorized into six groups based on their physicochemical properties: GAVLI, FYW, STCM, KRH, DENQ, and P. Finally, the Alignment Score of two compared sequences aligned by each algorithm is calculated with the BLOSUM62 substitution scoring matrix, using gap opening and gap extension penalties of −10 and −0.5, respectively. Twelve different datasets are considered in this study [53, 55–61].

Table 8.

The parameter Sensitivity (SEN) of the HELIOS method, using the Smith-Waterman algorithm as the reference, for the accuracy measurement of the classification output.

The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 9.

The parameter Specificity (Spec) of the HELIOS method, using the Smith-Waterman algorithm as the reference, for the accuracy measurement of the classification output.

The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 10.

The parameter Accuracy (Acc) of the HELIOS method, using the Smith-Waterman algorithm as the reference, for the accuracy measurement of the classification output.

The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 11.

The parameter Positive Predictive Value (PPV) of the HELIOS method, using the Smith-Waterman algorithm as the reference, for the accuracy measurement of the classification output.

The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 12.

The parameter Negative Predictive Value (NPV) of the HELIOS method, using the Smith-Waterman algorithm as the reference, for the accuracy measurement of the classification output.

The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 13.

The parameter Matthews Correlation Coefficient (MCC) of the HELIOS method, using the Smith-Waterman algorithm as the reference, for the accuracy measurement of the classification output.

The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 14.

The parameter F-Score (a measure of the test’s accuracy) of the HELIOS method, using the Smith-Waterman algorithm as the reference, for the accuracy measurement of the classification output.

The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 15.

A brief report of the accuracy measurement of the classification output of the HELIOS method, using well-known algorithms as references: Smith-Waterman, Needleman-Wunsch, ClustalW, ClustalΩ, BLAST, Muscle, T-Coffee, Kalign, and MAFFT.

The parameters SEN, Spec, Acc, PPV, NPV, MCC, and F-Score are averaged and reported. The twelve datasets [53, 55–61] are considered.
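
The seven averaged parameters follow the standard confusion-matrix definitions. This minimal sketch assumes that each alignment position has already been classified against the reference alignment and counted as a true or false positive or negative. The captions do not state the exact counting rule. The sketch also assumes that F-Score means the F1 score.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics.

    Assumes every row and column total of the confusion matrix is nonzero.
    """
    sen = tp / (tp + fn)                    # sensitivity (recall)
    spec = tn / (tn + fp)                   # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    ppv = tp / (tp + fp)                    # positive predictive value (precision)
    npv = tn / (tn + fn)                    # negative predictive value
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # F1 score: harmonic mean of precision and recall (0 when both are 0)
    f_score = 2 * ppv * sen / (ppv + sen) if (ppv + sen) else 0.0
    return {"SEN": sen, "Spec": spec, "Acc": acc, "PPV": ppv,
            "NPV": npv, "MCC": mcc, "F-Score": f_score}
```

With hypothetical counts tp = 8, tn = 9, fp = 1, fn = 2, this gives SEN = 0.8, Spec = 0.9, and Acc = 0.85.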

Table 16.

Time and space complexities of HELIOS compared to well-known algorithms for pairwise sequence alignment.

In this table, N and M are the lengths of the first and second input sequences, respectively, assuming N > M, and L × W is the aperture size of the modulators used in the HELIOS optical architecture. As the table shows, HELIOS offers a time complexity of MN divided by L² × W, which leads to O(1) for sufficiently large modulators.

Table 17.

Detailed description of the datasets employed for the genome-to-genome alignment and for aligning reads to a reference sequence.

In these comparisons, the Illumina and PacBio data for the Arabidopsis thaliana Ler-0 genome [68] are employed. The human Illumina and PacBio reads are taken from the Ashkenazi child data set, which is available from the Genome in a Bottle project [69] under NCBI SRA accession SRX847862. The reference genomes are the Arabidopsis thaliana Col-0 reference genome [62], the human reference genome version GRCh38.p7 [63], and the chimpanzee (Pan troglodytes) genome [70], release PanTro4, GenBank accession GCF0001515.6.

Table 18.

Processing time and memory requirements of genome-to-genome alignment for the HELIOS optical architecture, compared to Nucmer4, Mauve, LASTZ default, LASTZ match, OptCAM, HAWPOD, and the Moiré Technique.

In these comparison scenarios, all reported timings for Nucmer4, Mauve, LASTZ default, and LASTZ match are measured on a dual-CPU, 32-core AMD Opteron 6276 computer with 256 GB of DDR3 PC3-12800 RAM, using 32 parallel threads. In contrast, the processing times of the HELIOS optical architecture, OptCAM, HAWPOD, and the Moiré Technique are estimated analytically for a typical graphene-based modulator with an aperture size of 1024 × 1024 pixels and a 100 MHz switching rate. The processing times of the HELIOS optical architecture are also reported for more recently developed modulators with other aperture sizes, such as 1920 × 1080 and 4096 × 4096 pixels, and other switching rates, including 35 GHz [72] and 4.5 THz [67]. Note that the processing times reported for Nucmer4, Mauve, LASTZ default, and LASTZ match include both wall time and CPU time. The HELIOS method, by contrast, is implemented within its optical architecture, and hence its corresponding wall time is assumed to be zero.

Table 19.

Processing time and memory usage for aligning PacBio and Illumina reads to the Arabidopsis and human reference genomes by the HELIOS optical architecture, compared to BLASR, BWA-MEM, Bowtie2, Nucmer4, the Moiré Technique, HAWPOD, and OptCAM.

The processing times of the HELIOS optical architecture, OptCAM, HAWPOD, and the Moiré Technique are estimated analytically for a typical graphene-based modulator with an aperture size of 1024 × 1024 pixels and a 100 MHz switching rate. In contrast, the processing times of Nucmer4, Bowtie2, BWA-MEM, and BLASR are measured on a dual-CPU, 32-core AMD Opteron 6276 computer with 256 GB of DDR3 PC3-12800 RAM, using 32 parallel threads, as reported in [14]. These times include both building the genome index and aligning the sequences. The processing times of the HELIOS optical architecture are also reported for more recently developed modulators with other aperture sizes, such as 1920 × 1080 and 4096 × 4096 pixels, and other switching rates, including 35 GHz [72] and 4.5 THz [67].

Table 20.

Processing time of the HELIOS optical architecture, compared to Smith-Waterman, using the SWISS-PROT database [64].

Query sequences of various lengths, from 144 to 5478 residues, are aligned to the SWISS-PROT database [64], which contains 392,768 sequences with a total of 141,218,456 characters. The processing times of Smith-Waterman are reported for execution on an NVIDIA Tesla K40 GPU at 44.3 giga cell updates per second (GCUPS) and on a GTX 275 at 21 GCUPS [65]. By default, the processing times of the HELIOS optical architecture are reported for a modulator with a 1024 × 1024 pixel aperture and a 100 MHz switching rate [66]. They are also reported for more recently developed modulators with other aperture sizes, such as 1920 × 1080 and 4096 × 4096 pixels, and other switching rates, including 35 GHz [72] and 4.5 THz [67].
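
GCUPS counts the dynamic-programming cells updated per second, with one cell per query residue per database character. An idealized Smith-Waterman search time therefore follows from dividing the cell count by the throughput. This sketch uses the database size and throughputs stated above. It ignores data transfer and other overheads, so it is a rough estimate rather than a reproduction of the reported figures.

```python
def sw_time_seconds(query_len, db_chars, gcups):
    """Idealized Smith-Waterman search time in seconds.

    The DP matrix has query_len * db_chars cells, and GCUPS is billions of
    cell updates per second; transfer and setup overheads are ignored.
    """
    return query_len * db_chars / (gcups * 1e9)


SWISS_PROT_CHARS = 141_218_456  # total characters stated in the caption

for query_len in (144, 5478):
    for gpu, gcups in (("Tesla K40", 44.3), ("GTX 275", 21.0)):
        t = sw_time_seconds(query_len, SWISS_PROT_CHARS, gcups)
        print(f"query of {query_len} residues on {gpu}: {t:.2f} s")
```

The estimate scales linearly with query length and inversely with GCUPS.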
