PLOS Computational Biology

Fig 1.

Block diagram illustration of (A) the HELIOS method and (B) the HELIOS optical architecture.
(A) The HELIOS method aligns two input sequences by performing the coding and alignment procedures, which exactly locate character matches and variations; (B) the HELIOS optical architecture executes the HELIOS method through the optical beam provision unit, the optical modulation and mechanism unit, and the output capturing unit, exploiting the inherent parallelism and high processing speed of optics.

Fig 2.

An example of the proposed coding scheme for DNA, RNA, and protein sequences.
In this example, short DNA, RNA, and protein sequences are coded with the self-label and nearby-label coding schemes using the following preset values: Offset_self = 450, Step_self = 10, Offset_nearby = 0, Step_nearby = 9, k = 2, and R = 1. In the self-label coding scheme, the parameter Ch_i denotes the current character, located at position i; in the nearby-label coding scheme, it denotes the k^th previous character. The parameter V is a preset value from 0 to 19 for the amino acids in protein sequences and from 0 to 3 for the nucleotides in DNA and RNA sequences. Every character is coded with two values, determined by the self-label and nearby-label coding schemes, as shown in its corresponding white block. For characters at the beginning of the sequence, the nearby-label coding wraps around and uses the desired nearby character at the end of the sequence.
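
A minimal Python sketch of the coding step follows. It assumes that each code is linear in the character value, code = Offset + Step × V, with the self label computed from Ch_i and the nearby label from the character k positions earlier (wrapping around to the end of the sequence). This linear form is inferred from the parameter names and agrees with the wavelength and polarization ranges of Fig 7, but the caption does not state it. The character-to-V orderings `PROTEIN` and `NUCLEOTIDE`, and the function name `encode`, are hypothetical; the caption gives only the value ranges.

```python
from typing import List, Tuple

# Hypothetical character-to-V orderings (the caption gives only the ranges).
PROTEIN = "ACDEFGHIKLMNPQRSTVWY"   # V = 0..19
NUCLEOTIDE = "ACGT"                # V = 0..3 (use "ACGU" for RNA)


def encode(seq: str, alphabet: str, k: int = 2,
           offset_self: int = 450, step_self: int = 10,
           offset_nearby: int = 0, step_nearby: int = 9) -> List[Tuple[int, int]]:
    """Return (self_label, nearby_label) for every character of seq."""
    v = [alphabet.index(ch) for ch in seq]
    n = len(seq)
    codes = []
    for i in range(n):
        self_label = offset_self + step_self * v[i]
        # k-th previous character; the modulo wraps the first k positions
        # around to the end of the sequence.
        nearby_label = offset_nearby + step_nearby * v[(i - k) % n]
        codes.append((self_label, nearby_label))
    return codes


print(encode("ACGT", NUCLEOTIDE))  # [(450, 18), (460, 27), (470, 0), (480, 9)]
```

With these defaults the self labels fall between 450 and 640 and the nearby labels between 0 and 171, which would correspond to nanometers and degrees in Fig 7 if the linear form is correct.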

Fig 3.

Step-by-step progress of the S₁-align operation of the HELIOS method for optical sequence alignment.
(A) Two input sequences, S₁ and S₂, are given to the HELIOS method, assuming the third character (i.e. ‘A’ in S₁ and ‘E’ in S₂) is mutated. (B) S₁ and S₂ are coded with the proposed coding procedure, assuming k = 2 and R = 1. The self-label codes of the third characters differ between S₁ and S₂; because of the mutated character, the nearby-label codes of the fifth characters (i.e. ‘S’) differ as well, since k = 2. Afterward, the coded S₂ is shifted horizontally once to the left and once to the right, since R = 1. Then, the main S₁ is compared with the main S₂ and with each of its shifts, and (C) the result of each comparison is presented. (D) Next, the comparison outcome vector is formed by aggregating all the comparison results, where matched characters yield nonzero entries. As the zero entries show, the mutated character at position 3 is successfully located because of its different self-label codes, while the 5^th character is falsely reported as a mismatch because of its different nearby-label codes. (E) To compensate for this false mismatch, the i^th entry of the output is determined by aggregating the i^th and (i + k)^th entries of the comparison outcome vector. Hence, the 5^th entry is recovered by the corresponding nonzero value at the 7^th entry, while the detection of the character mutation at the 3^rd entry is not affected.
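
Steps (B) to (E) of the S₁-align operation can be sketched in Python as below, under several assumptions not stated in the caption: because each code is a one-to-one function of its character, comparing codes is modeled as comparing characters; a comparison at position i with shift s succeeds only when both the self labels (S₁[i] vs S₂[i + s]) and the nearby labels (the characters k positions earlier, with wrap-around) agree; positions that fall outside S₂ after shifting never match; and "aggregating" is taken to be a logical OR. The function name `s1_align` is hypothetical.

```python
from typing import List


def s1_align(s1: str, s2: str, k: int = 2, R: int = 1) -> List[int]:
    """Hypothetical sketch of the S1-align operation (1 = match, 0 = mismatch)."""
    n, m = len(s1), len(s2)

    def codes_match(i: int, j: int) -> bool:
        # Out-of-range positions of the shifted S2 are assumed never to match.
        if not 0 <= j < m:
            return False
        same_self = s1[i] == s2[j]
        # Nearby label: the k-th previous character, wrapping around the start.
        same_nearby = s1[(i - k) % n] == s2[(j - k) % m]
        return same_self and same_nearby

    # Steps (B)-(D): compare S1 with S2 shifted by s = -R..R and OR the results.
    outcome = [
        int(any(codes_match(i, i + s) for s in range(-R, R + 1)))
        for i in range(n)
    ]
    # Step (E): entry i aggregates outcome entries i and i + k (wrapping at the end).
    return [outcome[i] | outcome[(i + k) % n] for i in range(n)]


# Mutation at the third character, as in panel (A).
print(s1_align("MKAQSTW", "MKEQSTW"))  # [1, 1, 0, 1, 1, 1, 1]
```

Before step (E), the outcome vector for this example is [1, 1, 0, 1, 0, 1, 1]: the fifth entry is the false mismatch, and OR-ing it with the seventh entry restores it. Swapping the arguments, `s1_align(s2, s1)`, gives the corresponding sketch of the S₂-align operation described in Fig 4.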

Fig 4.

Side-by-side representation of the S₁-align and S₂-align operations of the HELIOS method.
(A) As an overall view, the S₁-align operation locates character substitutions, as well as character insertions in S₁ (i.e. character deletions from S₂). For this purpose, it compares the main and all shifted S₂ vectors with the S₁ vector. Afterward, to produce the 1D output vector, the i^th entry of the output is determined from the i^th and (i + k)^th entries of the comparison outcome vector. (B) Similarly, the S₂-align operation compares the main and all shifted S₁ vectors with the S₂ vector to locate character substitutions, as well as character insertions in S₂ (i.e. character deletions from S₁).

Fig 5.

Output explanation of the HELIOS method.
The output of the HELIOS method is represented as a two-row matrix, in which the first and second rows are produced by the S₁-align and S₂-align operations, respectively. Each entry represents the alignment output of the characters at the corresponding position in the input sequences. Traversing the output from left to right, nonzero entries in both rows indicate identical characters, i.e. a character match, while zero entries in both rows indicate a character mutation. Finally, a zero entry in one row together with a nonzero entry in the other indicates an indel, i.e. a character insertion or deletion, which can be represented by a gap at the corresponding position of the sequence containing the nonzero entry.
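
One possible way to turn the two-row output into a gapped alignment is a left-to-right traversal with one pointer per row, sketched below in Python. The pointer-based traversal, the treatment of leftover characters at the end, and the function name `read_output` are assumptions for illustration; the caption states only the three reading rules. Rows are lists of 0/1 values such as those produced by the S₁-align and S₂-align operations.

```python
from typing import List, Tuple


def read_output(s1: str, s2: str, row1: List[int],
                row2: List[int]) -> Tuple[str, str, str]:
    """Hypothetical reading of the two-row output.

    Returns (gapped S1, gapped S2, marks), where "|" marks a match,
    "x" a mutation and " " an indel column.
    """
    a1: List[str] = []
    a2: List[str] = []
    marks: List[str] = []

    def emit(c1: str, c2: str, mark: str) -> None:
        a1.append(c1)
        a2.append(c2)
        marks.append(mark)

    i = j = 0
    while i < len(s1) and j < len(s2):
        r1, r2 = row1[i] != 0, row2[j] != 0
        if r1 == r2:
            # Both nonzero: match; both zero: mutation. Advance both rows.
            emit(s1[i], s2[j], "|" if r1 else "x")
            i += 1
            j += 1
        elif r2:
            # Zero only in row 1: S1 has an extra character, so S2 gets a gap.
            emit(s1[i], "-", " ")
            i += 1
        else:
            # Zero only in row 2: S2 has an extra character, so S1 gets a gap.
            emit("-", s2[j], " ")
            j += 1
    # Assumption: characters left over at the end are trailing indels.
    for ch in s1[i:]:
        emit(ch, "-", " ")
    for ch in s2[j:]:
        emit("-", ch, " ")
    return "".join(a1), "".join(a2), "".join(marks)


print(read_output("ABCDEFG", "ABDEFG", [1, 1, 0, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]))
# ('ABCDEFG', 'AB-DEFG', '|| ||||')
```

In the example, the single zero in the S₁-align row (with a nonzero S₂-align entry) places a gap in S₂ opposite the extra ‘C’ of S₁.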

Fig 6.

Schematic illustration of the HELIOS optical architecture.
(A) The optical beam provision unit provides a collimated beam to feed the whole system. The wideband laser beam produced by a laser source is cleaned by passing through a laser line bandpass filter and a pinhole. The clean beam is then diverged and collimated by passing through the objective and imaging lenses, respectively. Finally, the collimated beam is directed to the optical modulation and mechanism unit. (B) In the optical modulation and mechanism unit, passing the collimated beam through WSF #1 modulates its wavelength according to the self-label coding of S₂ and S₁ on the first and second rows of a 2 × N pixel image, respectively, while PSF #1 performs the corresponding polarization selection according to the nearby-label coding scheme. Afterward, the objective and imaging lens arrays diverge and recollimate the optical beam along the horizontal direction to perform the shifting process of the alignment procedure. WSF #2 and PSF #2 code S₁ and S₂ on the first and second rows of a 2 × N pixel image, respectively. By passing the expanded beams through WSF #2 and PSF #2, the proposed architecture compares the shifted coded S₂ with S₁ in the first row, implementing the S₁-align operation, and the shifted coded S₁ with S₂ in the second row, implementing the S₂-align operation. Finally, each pixel is directed to two distinct pixels via a chiral medium to compensate for false mismatches. (C) In the output capturing unit, an optical thresholder eliminates wavelength cross-talk and speckle noise from the output before capture. The output is then captured by a bi-convex lens and a charge-coupled device (CCD) camera.

Fig 7.

Modulation approach of the HELIOS optical architecture, utilizing the wavelength and polarization of the optical beams.
To implement the self-label coding through wavelength modulation, every character of the input sequence is modulated with a distinct wavelength within the spectral range of 450–650 nanometers, with a bandwidth of 10 nanometers. To implement the nearby-label coding through polarization selection, every character of the input sequence is assigned a specific polarization azimuth angle, in steps of 9 degrees over the range of 0 to 180 degrees. Each approach provides twenty distinct codes for protein sequences, of which only four are employed for coding DNA and RNA sequences.
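
Assuming the codes of Fig 2 are linear in the character value V (code = Offset + Step × V), the wavelength and polarization channels follow directly from the Fig 2 parameters. The worked ranges below check that twenty channels fit the stated spectral and angular ranges; the linear form, the choice of each 10 nm band starting at λ(V), and the use of V = 0 to 3 as the four nucleotide channels are inferences rather than statements from the caption.

```latex
\begin{aligned}
\lambda(V) &= \mathrm{Offset}_{\mathrm{self}} + \mathrm{Step}_{\mathrm{self}}\,V = (450 + 10V)\ \mathrm{nm}, \\
\theta(V)  &= \mathrm{Offset}_{\mathrm{nearby}} + \mathrm{Step}_{\mathrm{nearby}}\,V = 9V^{\circ},
  \qquad V \in \{0, 1, \dots, 19\}, \\
\lambda &\in \{450, 460, \dots, 640\}\ \mathrm{nm}, \qquad
  [\lambda, \lambda + 10)\ \mathrm{nm} \subseteq [450, 650)\ \mathrm{nm}, \\
\theta &\in \{0^{\circ}, 9^{\circ}, \dots, 171^{\circ}\} \subset [0^{\circ}, 180^{\circ}), \\
V &\in \{0, 1, 2, 3\}\ (\text{DNA/RNA}) \;\Rightarrow\;
  \lambda \in \{450, \dots, 480\}\ \mathrm{nm},\quad \theta \in \{0^{\circ}, \dots, 27^{\circ}\}.
\end{aligned}
```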

Fig 8.

Simulation outputs of the HELIOS method and its optical architecture.
In this case study, the “Severe acute respiratory syndrome coronavirus 2” sequences [49] are aligned in the form of (A) protein, (B) RNA, and (C) DNA sequences, with single and multiple mutations/indels manually imposed on the sequence at varying distributions. For clarity, only a 60-character portion of each full-length alignment is shown, beginning at (A) character 240, (B) character 721, and (C) character 1921. In both the coding and the alignment procedures, the parameters are set to R = 4 and k = 5. As the outputs show, the two input sequences are successfully aligned into a two-line output, both by performing the two consecutive procedures of the HELIOS method and by passing optical beams through two units of the HELIOS optical architecture. As a result, the matches, mutations, and indels are detected and located accurately.

Table 1.

The parameter Identity of the HELIOS method in the quantitative measurement of homology.
Identity reports the percentage of exactly matched characters between two compared sequences aligned by the HELIOS method, using the “Nine ND5 protein sequences dataset” [53].

Table 2.

The parameter Similarity of the HELIOS method in the quantitative measurement of homology.
Similarity measures the resemblance (in percentage) between two compared sequences aligned by the HELIOS method. The amino acids are categorized into six groups based on their physicochemical properties: GAVLI, FYW, STCM, KRH, DENQ, and P. The “Nine ND5 protein sequences dataset” [53] is used in this study.
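
The Identity and Similarity measures can be sketched in Python for a pair of aligned sequences, using the six physicochemical groups listed above. Two choices here are assumptions, not taken from the source: percentages are computed over the full alignment length (gap columns included in the denominator), and a column counts toward Similarity when the residues are identical or fall in the same group. The function name `identity_similarity` is hypothetical.

```python
from typing import Dict, Tuple

# The six physicochemical groups from the Table 2 caption.
GROUPS = ["GAVLI", "FYW", "STCM", "KRH", "DENQ", "P"]
GROUP_OF: Dict[str, int] = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def identity_similarity(a1: str, a2: str) -> Tuple[float, float]:
    """Return (Identity %, Similarity %) for two aligned, equal-length strings."""
    if len(a1) != len(a2) or not a1:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    identical = similar = 0
    for x, y in zip(a1, a2):
        if x == "-" or y == "-":
            continue  # gap columns count only in the denominator
        if x == y:
            identical += 1
            similar += 1
        elif GROUP_OF.get(x) is not None and GROUP_OF.get(x) == GROUP_OF.get(y):
            similar += 1
    length = len(a1)
    return 100.0 * identical / length, 100.0 * similar / length


print(identity_similarity("MKVQ", "MKIQ"))  # (75.0, 100.0)
```

In the example, V and I differ (lowering Identity) but share the GAVLI group, so Similarity stays at 100%.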

Table 3.

The parameter Alignment Score of the HELIOS method in the quantitative measurement of homology.
To calculate the Alignment Score of two compared sequences aligned by the HELIOS method, the BLOSUM62 substitution scoring matrix is adopted with gap opening and extension penalties of -10 and -0.5, respectively. The “Nine ND5 protein sequences dataset” [53] is used in this study.
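
The Alignment Score can be sketched as the sum of substitution scores over aligned residue pairs plus affine gap penalties. This Python sketch assumes the convention used by EMBOSS tools, in which the first position of a gap receives the opening penalty (-10) and each further position the extension penalty (-0.5). The substitution function is passed in, so BLOSUM62 values must come from a real source, for example Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")`; the `toy` scorer below is a placeholder, not BLOSUM62, and `alignment_score` is a hypothetical name.

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def alignment_score(a1: str, a2: str, sub: Scorer,
                    gap_open: float = -10.0, gap_extend: float = -0.5) -> float:
    """Score two aligned, equal-length strings with affine gap penalties."""
    if len(a1) != len(a2):
        raise ValueError("aligned sequences must have equal length")
    score = 0.0
    gap_in = None  # which row ("1" or "2") currently holds an open gap
    for x, y in zip(a1, a2):
        if x == "-" and y == "-":
            continue  # an all-gap column carries no score
        if x == "-" or y == "-":
            row = "1" if x == "-" else "2"
            # The first gap position opens the gap; later positions extend it.
            score += gap_extend if gap_in == row else gap_open
            gap_in = row
        else:
            score += sub(x, y)
            gap_in = None
    return score


def toy(x: str, y: str) -> float:
    # Placeholder scorer (+1 / -1); substitute real BLOSUM62 values in practice.
    return 1.0 if x == y else -1.0


print(alignment_score("ABCDEFG", "AB--EFG", toy))  # 5 - 10 - 0.5 = -5.5
```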

Table 4.

The parameter Identity of the Smith-Waterman algorithm in the quantitative measurement of homology.
Identity reports the percentage of exactly matched characters between two compared sequences aligned by the Smith-Waterman algorithm, using the “Nine ND5 protein sequences dataset” [53].

Table 5.

The parameter Similarity of the Smith-Waterman algorithm in the quantitative measurement of homology.
Similarity measures the resemblance (in percentage) between two compared sequences aligned by the Smith-Waterman algorithm. The amino acids are categorized into six groups based on their physicochemical properties: GAVLI, FYW, STCM, KRH, DENQ, and P. The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 6.

The parameter Alignment Score of the Smith-Waterman algorithm in the quantitative measurement of homology.
To calculate the Alignment Score of two compared sequences aligned by the Smith-Waterman algorithm, the BLOSUM62 substitution scoring matrix is adopted with gap opening and extension penalties of -10 and -0.5, respectively. The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 7.

A brief report of the quantitative measurement of homology for the HELIOS method, compared to nine well-known algorithms: SW, NW, BLAST, ClustalW, ClustalΩ, Muscle, T-Coffee, Kalign, and MAFFT. The parameters Identity, Similarity, and Alignment Score are reported.
Identity reports the percentage of exactly matched characters between two compared sequences aligned by each algorithm. Similarity measures the resemblance (in percentage) between two compared sequences aligned by each algorithm, with the amino acids categorized into six groups based on their physicochemical properties: GAVLI, FYW, STCM, KRH, DENQ, and P. Finally, to calculate the Alignment Score of two compared sequences aligned by each algorithm, the BLOSUM62 substitution scoring matrix is adopted with gap opening and extension penalties of -10 and -0.5, respectively. Twelve different datasets are considered in this study [53, 55–61].

Table 8.

The parameter Sensitivity (SEN) of the HELIOS method, using the Smith-Waterman algorithm as the reference, in the accuracy measurement of the classification output.
The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 9.

The parameter Specificity (Spec) of the HELIOS method, using the Smith-Waterman algorithm as the reference, in the accuracy measurement of the classification output.
The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 10.

The parameter Accuracy (Acc) of the HELIOS method, using the Smith-Waterman algorithm as the reference, in the accuracy measurement of the classification output.
The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 11.

The parameter Positive Predictive Value (PPV) of the HELIOS method, using the Smith-Waterman algorithm as the reference, in the accuracy measurement of the classification output.
The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 12.

The parameter Negative Predictive Value (NPV) of the HELIOS method, using the Smith-Waterman algorithm as the reference, in the accuracy measurement of the classification output.
The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 13.

The parameter Matthews Correlation Coefficient (MCC) of the HELIOS method, using the Smith-Waterman algorithm as the reference, in the accuracy measurement of the classification output.
The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 14.

The parameter F-Score (a measure of a test's accuracy) of the HELIOS method, using the Smith-Waterman algorithm as the reference, in the accuracy measurement of the classification output.
The “Nine ND5 protein sequences dataset” [53] is used in this study.

Table 15.

A brief report of the accuracy measurement of the classification output of the HELIOS method, using well-known algorithms as references: Smith-Waterman, Needleman-Wunsch, ClustalW, ClustalΩ, BLAST, Muscle, T-Coffee, Kalign, and MAFFT.
The parameters SEN, Spec, Acc, PPV, NPV, MCC, and F-Score are averaged and reported. Twelve datasets [53, 55–61] are considered.
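
The seven classification measures follow their standard confusion-matrix definitions, sketched in Python below. How each aligned position is labeled as a true or false positive or negative against the reference alignment is not stated in these captions, so the counts in the example are illustrative inputs only, and `classification_metrics` is a hypothetical helper.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard confusion-matrix measures (all denominators assumed nonzero)."""
    sen = tp / (tp + fn)                     # sensitivity (recall)
    spec = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)    # accuracy
    ppv = tp / (tp + fp)                     # positive predictive value (precision)
    npv = tn / (tn + fn)                     # negative predictive value
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    f_score = 2 * ppv * sen / (ppv + sen)    # F1: harmonic mean of PPV and SEN
    return {"SEN": sen, "Spec": spec, "Acc": acc, "PPV": ppv,
            "NPV": npv, "MCC": mcc, "F-Score": f_score}


print(classification_metrics(tp=8, fp=2, tn=5, fn=1))
```

Unlike Acc, MCC uses all four cells of the confusion matrix symmetrically, which is why it is often reported alongside accuracy when positives and negatives are imbalanced.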

Table 16.

Time and space complexities of HELIOS compared to well-known algorithms, performing a pairwise sequence alignment.
In this table, N and M are the lengths of the first and second input sequences, respectively, assuming N > M. The parameter L × W is the aperture size of the modulators used in the HELIOS optical architecture. As shown in this table, HELIOS reduces the time complexity to MN divided by L² × W, i.e. O(MN / (L² × W)), which approaches O(1) for sufficiently large modulators.

Table 17.

Detailed description of the datasets employed for the genome-to-genome alignment and for the alignment of reads to a reference sequence.
In these comparisons, the Illumina and PacBio data for the Arabidopsis thaliana Ler-0 genome [68] are employed. The human Illumina and PacBio reads are taken from the Ashkenazi child data set, available from the Genome in a Bottle project [69] (NCBI SRA accession SRX847862). The reference genomes are the Arabidopsis thaliana Col-0 reference genome [62], the human reference genome version GRCh38.p7 [63], and the chimpanzee (Pan troglodytes) genome [70], release PanTro4, GenBank accession GCF0001515.6.

Table 18.

Processing time and memory requirement of genome-to-genome alignment for the HELIOS optical architecture, compared to Nucmer4, Mauve, LASTZ default, LASTZ match, OptCAM, HAWPOD, and the Moiré Technique.
For these comparison scenarios, all reported timings for Nucmer4, Mauve, LASTZ default, and LASTZ match are measured on a dual-CPU, 32-core AMD Opteron 6276 computer with 256 GB of DDR3 PC3-12800 RAM, using 32 parallel threads. For the analytical estimation of the processing times of the HELIOS optical architecture, OptCAM, HAWPOD, and the Moiré Technique, a typical graphene-based modulator is considered with an aperture size of 1024 × 1024 pixels and a 100 MHz switching rate. The processing times of the HELIOS optical architecture are also reported for more recently developed modulators with other aperture sizes, such as 1920 × 1080 and 4096 × 4096 pixels, and other switching rates, including 35 GHz [72] and 4.5 THz [67]. Note that the processing times reported for Nucmer4, Mauve, LASTZ default, and LASTZ match include both wall time and CPU time, whereas the HELIOS method is implemented within its optical architecture, and hence its corresponding wall time is assumed to be zero.
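
The modulator parameters quoted above can be put into perspective with simple arithmetic: the frame period is the reciprocal of the switching rate, and the raw pixel-update rate is the aperture size times the switching rate. This Python sketch only reproduces that arithmetic for the apertures and rates named in the caption; pairing every aperture with every rate is illustrative, and it is not the analytical timing model used for HELIOS, which these captions do not give.

```python
from typing import List, Tuple

# Apertures and switching rates named in the caption; every pairing is illustrative.
APERTURES = [("1024x1024", 1024, 1024), ("1920x1080", 1920, 1080), ("4096x4096", 4096, 4096)]
RATES_HZ = [100e6, 35e9, 4.5e12]


def frame_period_s(rate_hz: float) -> float:
    """Duration of one modulator frame (one switching event), in seconds."""
    return 1.0 / rate_hz


def pixel_rate(width: int, height: int, rate_hz: float) -> float:
    """Upper bound on pixel updates per second for a fully used aperture."""
    return width * height * rate_hz


rows: List[Tuple[str, float, float, float]] = []
for label, w, h in APERTURES:
    for rate in RATES_HZ:
        rows.append((label, rate, frame_period_s(rate), pixel_rate(w, h, rate)))

for label, rate, period, px in rows:
    print(f"{label:>10}  {rate:9.3g} Hz  period {period:.3g} s  {px:.3g} pixel updates/s")
```

For the default 1024 × 1024 modulator at 100 MHz, each frame lasts 10 ns, so any estimate built on this hardware scales with the number of frames a given alignment needs.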

Table 19.

Processing time and memory usage to align PacBio and Illumina reads to the Arabidopsis and human reference genomes by the HELIOS optical architecture, compared to BLASR, BWA-MEM, Bowtie2, Nucmer4, the Moiré Technique, HAWPOD, and OptCAM.
The processing times of the HELIOS optical architecture, OptCAM, HAWPOD, and the Moiré Technique are analytically estimated for a typical graphene-based modulator with an aperture size of 1024 × 1024 pixels and a 100 MHz switching rate. In contrast, the processing times of Nucmer4, Bowtie2, BWA-MEM, and BLASR are measured on a dual-CPU, 32-core AMD Opteron 6276 computer with 256 GB of DDR3 PC3-12800 RAM, using 32 parallel threads, as reported in [14]. The report includes the times to build the genome index and to align the sequences. The processing times of the HELIOS optical architecture are also reported for more recently developed modulators with other aperture sizes, such as 1920 × 1080 and 4096 × 4096 pixels, and other switching rates, including 35 GHz [72] and 4.5 THz [67].

Table 20.

Processing time of the HELIOS optical architecture, compared to Smith-Waterman, on the SWISS-PROT database [64].
Query sequences of various lengths, from 144 to 5478 residues, are aligned to the SWISS-PROT database [64], which contains 392,768 sequences and a total of 141,218,456 characters. The processing times of Smith-Waterman are reported for execution on an NVIDIA Tesla K40 GPU at 44.3 giga cell updates per second (GCUPS) and on a GTX 275 at 21 GCUPS [65]. The processing times of the HELIOS optical architecture are reported for a modulator with a 1024 × 1024 pixel aperture and a 100 MHz switching rate [66] as the default, as well as for more recently developed modulators with other aperture sizes, such as 1920 × 1080 and 4096 × 4096 pixels, and other switching rates, including 35 GHz [72] and 4.5 THz [67].
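
The Smith-Waterman GPU timings can be related to the quoted throughputs through the definition of GCUPS: scanning a query of length m against a database of n residues requires m × n dynamic-programming cell updates, so the time is roughly m × n / (GCUPS × 10⁹) seconds. This Python sketch applies that formula to the database size and throughput figures quoted above; it ignores I/O and other overheads, so measured times may differ, and `sw_seconds` is a hypothetical helper.

```python
DB_RESIDUES = 141_218_456  # total characters in the SWISS-PROT release quoted above


def sw_seconds(query_len: int, db_len: int, gcups: float) -> float:
    """Estimated Smith-Waterman time: cell updates divided by cell-update rate."""
    cells = query_len * db_len
    return cells / (gcups * 1e9)


for q in (144, 5478):
    for device, gcups in (("Tesla K40", 44.3), ("GTX 275", 21.0)):
        print(f"query {q:5d} on {device}: ~{sw_seconds(q, DB_RESIDUES, gcups):.3f} s")
```

Because the cell count grows linearly with query length, the 5478-residue query needs about 38 times as many cell updates as the 144-residue one on the same device.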

PLOS is a nonprofit 501(c)(3) corporation, #C2354500, based in California, US