
HELIOS: High-speed sequence alignment in optics

Abstract

To address the shortcomings of current sequence alignment methods, which originate from the inherent serialism of their underlying electrical systems, a few optical approaches for biological data comparison have been proposed recently. However, these approaches perform poorly because of their inefficient coding schemes. This paper therefore presents HELIOS, a novel all-optical high-throughput method for aligning DNA, RNA, and protein sequences. The HELIOS method employs highly sophisticated operations to locate character matches, single or multiple mutations, and single or multiple indels within various biological sequences. The HELIOS optical architecture, in turn, exploits the high-speed processing and operational parallelism of optics by adopting the wavelength and polarization of optical beams. The functionality and accuracy of the HELIOS method are verified through behavioral and optical simulation studies, while its complexity and performance are estimated through analytical computation. The accuracy evaluations indicate that the HELIOS method achieves a precise pairwise alignment of two sequences, highly similar to those of Smith-Waterman, Needleman-Wunsch, BLAST, MUSCLE, ClustalW, ClustalΩ, T-Coffee, Kalign, and MAFFT. According to our performance evaluations, the HELIOS optical architecture outperforms all alternative electrical and optical algorithms in terms of processing time and memory requirement, relying on its highly sophisticated method and optical architecture. Moreover, the employed compact coding scheme greatly increases the number of input characters, and hence offers reduced time and space complexities compared to the electrical and optical alternatives. This makes the HELIOS method and optical architecture highly applicable to biomedical applications.

Author summary

The character-by-character alignment of two long biological sequences, i.e. DNA, RNA, and protein, is a tedious task, but essential for recognizing homologies, relationships, and variations. In this case, every alteration, including mutations (substitution) and indels (insertion or deletion), is vital and required for many biological developments like diagnosis, medicine, and vaccination. However, the applicability of current sequence alignment methods is limited, specifically in processing time and memory usage, due to the inherent serialism and imperfections of electrical systems, as well as the inefficient coding schemes of optical approaches. This leads to approximately quadratic run-time and space requirements in terms of input sequence lengths, making the real-time alignment of large datasets an expensive and laborious process. Hence, proposing a superior alignment method in terms of accuracy, performance, and applicability can promote biological research and developments. Here, we show that we can overcome the long-lasting and challenging problems in the sequence alignment procedure by exploiting optics as a novel computing technology. In this manner, we propose a novel method and its optical architecture for the alignment of DNA, RNA, and protein sequences by exploiting high-speed processing and operational parallelism in optics. As our simulation studies confirm, it provides an accurate sequence alignment while outperforming the most widely used electrical and optical alternatives in terms of processing time and memory requirements.

This is a PLOS Computational Biology Methods paper.

Introduction

Bioinformatics develops computation-intensive techniques to enhance theoretical and practical biological studies [1]. Pairwise sequence alignment as one of the key operations of bioinformatics compares two DNA, RNA, or protein sequences to recognize homology, similarity, and variation [2]. The character-by-character alignment of two long biological sequences is a tedious task, but essential to locate character matches, mutations (i.e., substitution), and indels (i.e., insertion or deletion) in favor of many biological developments [3].

In this manner, many existing sequence alignment methods consume considerable resources to perform an accurate sequence alignment [4]. For instance, Smith-Waterman [5] and Needleman-Wunsch [6] are based on dynamic programming (DP); BLAST [7], ClustalW [8], ClustalΩ [9], T-Coffee [10], and Kalign [11] utilize a heuristic search; MUSCLE [12] and MAFFT [13] are iterative methods which perform an FFT-based cross-correlation; MUMmer [14] relies on suffix trees; and HMM-based methods [15] are built upon a probabilistic model. Despite their accurate sequence alignment, their resource demands, specifically in time and space, originate from their sequential nature [16]. Moreover, these methods suffer from various problems due to the imperfections of electrical systems, such as high computational time and space, high power consumption, heat generation, slow response, etc. [17]. Furthermore, the rapid enlargement of biological datasets and the advancements of bioscience challenge them more than ever [18]. Although various parallel and distributed optimization methods [19] could moderate some of these problems, the electrical implementation of these algorithms enforces inherently serial computation and high memory requirements [20]. Specifically, these methods lead to high time and space requirements in terms of the input sequence lengths [21, 22], which severely limits their applicability. Hence, proposing a superior method in terms of speed, accuracy, and applicability is crucial for the real-time processing of large biological data.

Fortunately, the inherent benefit of optics and photonics [23], as a novel computing technology, provides high-speed operational parallelism and avoids the imperfections of electrical systems [24]. Accordingly, biophotonics develops optical techniques for biological developments [25]. In this manner, some methods have been accomplished recently, such as correlation-based methods [26, 27], Fourier-Transform-based algorithms [28, 29], HAWPOD [30], Moiré Technique [31–33], OptCAM [34], GAC [35], and SPOMF [36]. Some of them [26–29] address optical similarity measurement algorithms for sequence alignment by taking advantage of optical correlation and Fourier Transform. Despite their high-speed processing, these methods only measure the similarities and differences between the input sequences within specific zones, regardless of the exact location of the variations and their importance to biological developments [1]. On the other hand, some studies [30–35] have achieved high-speed optical approaches for pairwise DNA alignment, which are capable of locating the variations. However, their sequence coding assumptions limit the number of input characters to that of the DNA sequences, which makes them incapable of aligning RNA and protein sequences; and misses their specific outcomes in diagnosis, medicine, and vaccination [3]. Finally, it should be noted that these methods should adopt an efficient biological data encoding by an optical modulator to avoid utilizing a large number of pixels per character. Accordingly, proposing a comprehensive pairwise alignment method can promote biological research by character-by-character alignment of various kinds of biological sequences in a fast accurate process.

In this manner, we are motivated to propose an advanced ultra-fast all-optical method to accurately align any pair of biological sequences. The proposed method is named HELIOS, abbreviating High-speed sEquence aLIgnment in OpticS. By exploiting high-speed processing and operational parallelism in optics [24], the highly sophisticated HELIOS method avoids the problems of current sequence alignment methods, as well as the imperfections of their electrical implementations. On the other hand, adopting an efficient optical encoding of biological data, the HELIOS method outperforms the alternative optical methods in the case of time and space requirements. While the proposed method is discussed in two separate sections for more clarity, i.e., HELIOS method and HELIOS optical architecture, each one is manipulated to enhance the other one, and both form a single coherent system. The basic block illustration of the HELIOS method and the HELIOS optical architecture are presented in Fig 1A and 1B, respectively. Given the interdisciplinary nature of the HELIOS method, it can outperform many well-known alignment algorithms in terms of processing time with comparable accuracy, as verified by our comprehensive simulation studies and analytical computations. Finally, the main innovative and exclusive contributions of this paper are described as follows:

  • Proposing an accurate pairwise sequence alignment method for DNAs, RNAs, and proteins.
  • Designing an ultra-fast all-optical high-throughput architecture for the proposed method.
  • Proposing an optical coding scheme utilizing wavelength and polarization of optical beams.

Fig 1. Block diagram illustration of the (A) HELIOS method and (B) HELIOS optical architecture.

(A) the HELIOS method aligns two input sequences by performing the coding and alignment procedures to exactly locate character matches and variations; and (B) the HELIOS optical architecture executes the HELIOS method by performing the optical beam provision unit, the optical modulation and mechanism unit, and output capturing unit, utilizing inherent parallelism and high-speed processing in optics.

https://doi.org/10.1371/journal.pcbi.1010665.g001

The organization of this manuscript is as follows. The Method section establishes the general concept of the HELIOS method, while its optical architecture is elaborated in the Optical Architecture section. Afterward, the Discussion and Results section discusses the functionality, accuracy, performance, and applicability of the HELIOS method and its optical architecture. Finally, the paper is concluded with future directions in the Conclusion and Future Perspective section.

Method

Principally, the HELIOS method is composed of two main procedures to perform a parallel accurate pairwise sequence alignment, as illustrated in Fig 1A. It includes a) Coding procedure to code each character of input sequences with two parameters, and b) Alignment procedure to align two coded input sequences by performing two distinct operations in parallel. The detailed descriptions of the procedures are presented as follows.

Coding procedure

Generally, the adopted coding scheme considerably affects the efficiency of sequence alignment methods [2]. Thus, the coding procedure of the HELIOS method adopts a highly compact, distinct coding pattern to maximize parallelism and to achieve noise reduction. For this purpose, it codes every character of the input sequence according to two coding strategies: a) Self-label coding to code every character based on the character itself, and b) Nearby-label coding to provide a unique code for every character based on its nearby character.

First, the self-label coding provides a distinct code for every character within the input sequence based on the character itself. In this manner, the required number of distinct codes equals the number of nucleotides (four) in DNA and RNA or the number of amino acids (twenty) in protein sequences. Moreover, the nearby-label coding provides a distinct code for every character based on its nearby character. Specifically, it provides a unique code for each character based on its kth previous character. So, by traversing the sequence, as the kth previous character changes, the code assigned to the current character varies as well. By preserving locality information, it prevents data interference through the alignment procedure, as discussed in the Alignment procedure subsection. As with the self-label coding, the required number of distinct codes is four, four, and twenty for coding DNA, RNA, and protein sequences, respectively.

Finally, to code the input sequence, it is traversed character-by-character, and every character is coded to an entry with two parameters according to both the self-label and nearby-label coding schemes described above. By putting together all coded entries, every sequence is represented as a one-dimensional (1D) vector with a size of 1 × length of the sequence (N). Here, we summarize the proposed coding procedure using Eq 1, formulating the coded pattern, and Eq 2, representing the adopted self-label and nearby-label coding schemes, as follows:

$$\mathrm{Code} = \left[ (C_{self,1},\, C_{nearby,1-k}),\ (C_{self,2},\, C_{nearby,2-k}),\ \ldots,\ (C_{self,N},\, C_{nearby,N-k}) \right] \tag{1}$$

$$C_{Scheme,j} = Offset_{Scheme} + Step_{Scheme} \times V_{Ch_j}, \qquad Scheme \in \{self,\ nearby\} \tag{2}$$

where the vector Code represents the coded pattern for each sequence, and the parameters $C_{self,i}$ and $C_{nearby,i-k}$ stand for the self-label and nearby-label coded values of the ith character, based on the character itself and its kth previous character, respectively. Moreover, the variable N represents the length of the sequence. Furthermore, the variable $C_{Scheme,j}$ calculates the coded value of the character in position j within the sequence, according to the coding strategy Scheme, which is either the self-label or the nearby-label coding. In addition, the parameter $Offset_{Scheme}$ defines the smallest value assumed for character coding, and $Step_{Scheme}$ is the difference between two consecutive code values. It is worth noting that the values of Offset and Step are chosen independently for the self-label and nearby-label coding schemes, and they differ between the two. Finally, the variable $Ch_j$ stands for the character in position j within the sequence, while the parameter $V_{Ch_j}$ represents a preset value for every character within the range of 0 to the number of bases minus 1. For instance, it equals 0, 1, 2, and 3 for A, T, G, and C in DNA sequences, respectively.

According to the above discussions, the size of the coded pattern represents the length of the sequence. On the other hand, the value of each entry of the code vector indicates the corresponding character within the input sequence. These features enable random access to each character within the code vector, and hence, prevent information loss with no restriction on the length of the input sequence. Moreover, proposing a one-dimensional (1D) coding scheme enables a two-dimensional (2D) arrangement of various coded patterns for further parallel processing. Furthermore, by considering the kth previous character instead of the adjacent one, the proposed nearby-label coding scheme preserves the uniqueness of the code vector in the case of identical consecutive characters or pairs, like “AAAAAA” or “ACACAC”. Specifically, as the kth previous character changes in traversing the input sequence, the code assigned to the identical consecutive characters varies as well. This prevents false character matches through the alignment procedure. As the parameter k can be set from 1 to the length of the input sequence (N), the input sequence is assumed to be circular. Hence, the nearby-label coding scheme can wrap around from one end to the other, in the case of large k, to code each character based on any desired nearby character. In this case study and without loss of generality, we assume k equals R + 1, in which R is the number of the sequence shifts, discussed in the Alignment procedure subsection. Some examples of the proposed procedure for coding protein, DNA, and RNA sequences are presented in Fig 2. It should be noted that the presented values in Fig 2 are chosen according to the optical features and implementation choices, as discussed in the Optical architecture section.
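The coding procedure can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `code_sequence` and the dictionary `DNA_VALUES` are hypothetical, the preset values for A, T, G, and C follow the text, and the default offsets and steps are the ones quoted in the Fig 2 caption. The sequence is treated as circular, so the kth previous character of the first k characters is taken from the end of the sequence.

```python
# Hypothetical helper names; preset values follow the text (A=0, T=1, G=2, C=3).
DNA_VALUES = {"A": 0, "T": 1, "G": 2, "C": 3}


def code_sequence(seq, values, k, offset_self=450, step_self=10,
                  offset_nearby=0, step_nearby=9):
    """Return one (self_code, nearby_code) pair per character of seq."""
    n = len(seq)
    coded = []
    for i, ch in enumerate(seq):
        # Self-label code: depends only on the character itself (Eq 2).
        c_self = offset_self + step_self * values[ch]
        # Nearby-label code: depends on the k-th previous character;
        # the modulo implements the circular wrap-around.
        previous = seq[(i - k) % n]
        c_nearby = offset_nearby + step_nearby * values[previous]
        coded.append((c_self, c_nearby))
    return coded


print(code_sequence("ATGC", DNA_VALUES, k=2))
# [(450, 18), (460, 27), (470, 0), (480, 9)]
```

With these assumed defaults, the self-label codes of a twenty-letter protein alphabet would span 450 to 640, and the nearby-label codes 0 to 171.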

Fig 2. An example of the proposed coding scheme for DNA, RNA, and protein sequences.

In this example, short DNA, RNA, and protein sequences are coded based on self-label and nearby-label coding schemes with preset values as follows: Offsetself = 450, Stepself = 10, Offsetnearby = 0, Stepnearby = 9, k = 2, and R = 1. The parameter Chi stands for the character positioned in location i as the current character in the self-label coding, and the kth previous character in the nearby-label coding scheme. The parameter V represents a preset value between 0 to 19 for amino acids in the protein sequence and 0 to 3 for nucleotides in the DNA and the RNA sequences. Every character is coded with two values determined by the self-label and nearby-label coding schemes, as represented in its corresponding white block. For nearby-label coding of those characters positioned at the beginning of the sequence, the nearby-label coding wraps around the sequence and considers the desired nearby character at the end of the sequence.

https://doi.org/10.1371/journal.pcbi.1010665.g002

Alignment procedure

Once the input sequences are coded, the alignment procedure aligns two coded input sequences to determine their similarities and differences by locating character matches, mutations, and single or multiple indels. For this purpose, it performs two operations in parallel: a) S1-align operation to determine the state of characters (i.e., character matching, substitution, insertion, or deletion) within the first sequence, and b) S2-align operation to determine the state of characters within the second sequence. For simplicity, the first and second sequences are called S1 and S2 in the following, respectively.

S1-align operation.

To specify the state of characters within S1, the S1-align operation determines whether every character in S1 corresponds to an identical character in S2; in the case of character mutations (i.e., substitution) or indels (i.e., insertion or deletion), this correspondence does not exist. Moreover, while mutations only substitute the character itself, indels also cause right-shifting or left-shifting of the rest of the sequence [2]. So, both character substitution and character-shifting should be addressed in the case of mutations and indels, respectively. In this manner, the S1-align operation shifts the coded S2 vector one to R times in the horizontal direction towards the left and right of the main S2 vector, as depicted in Fig 3B. Afterward, the main S2 vector and all its shifts are compared to the non-shifted S1 vector correspondingly. In this comparison, as shown in Fig 3C, a nonzero entry appears in the comparison results if the corresponding self-label and nearby-label codes of the input sequences are identical. Otherwise, the corresponding entry of the comparison results remains zero in the case of non-identical characters. It is worth noting that while the comparison of S1 and S2 enables the detection of matched characters and mutations, single or multiple indels are detected by comparing the S1 vector to the shifted S2 vectors. Finally, each entry of the comparison outcome vector is formed by aggregating the corresponding entries of all vectors of the comparison results, which result from comparing the non-shifted S1 vector with the main S2 vector and all its shifts. So, the comparison outcome vector is represented in a row, as depicted in Fig 3D. As a key advantage, distinct code assignment to similar characters within the input sequences by the nearby-label coding scheme prevents data interference through the horizontal shifting and comparing processes.
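The shift-and-compare step can be illustrated with a small sequential sketch. It is a software stand-in for what the text describes as a parallel optical operation, and it rests on assumptions that are not in the source: the function name `compare_with_shifts` is hypothetical, a match is recorded as 1 rather than as an optical intensity, and positions that fall outside the shifted vector are treated as having nothing to compare against.

```python
def compare_with_shifts(fixed, moving, r):
    """Comparison outcome vector for the non-shifted `fixed` sequence.

    Entry j is 1 if fixed[j] equals moving[x] for some x in [j - r, j + r],
    i.e. if it matches the main `moving` vector or one of its shifts.
    """
    out = [0] * len(fixed)
    for j, code in enumerate(fixed):
        for x in range(j - r, j + r + 1):
            # Assumption: shifts do not wrap, so out-of-range x never matches.
            if 0 <= x < len(moving) and code == moving[x]:
                out[j] = 1
    return out


# Hypothetical code pairs; the second entry of s1 has no counterpart in s2.
s1 = [(450, 0), (460, 9), (470, 18), (480, 27)]
s2 = [(450, 0), (470, 18), (480, 27)]
print(compare_with_shifts(s1, s2, r=1))
# [1, 0, 1, 1]
```

The zero at the second entry marks the character inserted in S1 (deleted from S2), while the entries after it still match through the one-position shift. For simplicity, the hypothetical code pairs ignore how real nearby-label codes would change around the indel.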

Fig 3. Step-by-step progress of the S1-align operation of the HELIOS method for optical sequence alignment.

(A) Two input sequences, i.e. S1 and S2, are given to the HELIOS method, assuming the third character (i.e. ‘A’ in S1 and ‘E’ in S2) is mutated. (B) S1 and S2 are coded based on the proposed coding procedure, assuming k = 2 and R = 1. While the self-label codes of the third character are different for S1 and S2, the nearby-label codes of the fifth characters (i.e. ‘S’) are different as well, due to the mutated character, assuming k = 2. Afterward, the coded S2 is shifted one time horizontally towards the left and right of the coded S2, assuming R = 1. Then, the main S1 is compared with the main S2 and all its shifts, and hence, (C) the comparison results are presented for each comparison. (D) Next, the comparison outcome vector is formed by aggregating all the comparison results, where the matched characters result in nonzero entries. As represented by the zero entry, the mutated character in position 3 within the input sequences is successfully located due to the different self-label codes, while the 5th character is falsely mismatched due to the different nearby-label codes. (E) To compensate for this false mismatch, the ith entry of the output is determined by aggregating the ith and the (i + k)th entries of the comparison outcome vector. Hence, the 5th entry is recovered by the corresponding nonzero value at the 7th entry, while proper detection of the character mutation at the 3rd entry is not affected.

https://doi.org/10.1371/journal.pcbi.1010665.g003

Summarizing the above discussion, we can conclude that the S1-align operation successfully locates character substitutions (both single and multiple) and character insertions (both single and multiple) in S1 (i.e. character deletions from S2). However, locating character deletions from S1 requires a further comparative operation, named the S2-align operation, as follows.

S2-align operation.

As a complementary comparative operation, the S2-align operation determines the state of every character in S2 by finding its corresponding character in S1. For this purpose, the S2-align operation repeats the comparative S1-align operation, except that it shifts the S1 pattern (instead of S2) one to R times in the horizontal direction towards the left and right of the main S1 vector, as shown in Fig 4. Afterward, the main S1 vector and all its shifts are compared to the non-shifted S2 vector correspondingly. Thus, this operation successfully locates character mutations (both single and multiple), as well as character insertions (both single and multiple) in S2 (i.e. character deletions (both single and multiple) from S1). Specifically, character mutations and insertions are represented with zero entries in the comparison outcome vector.

Fig 4. Side-by-side representation of the S1-align and S2-align operations of the HELIOS method.

(A) As an overall view, the S1-align operation locates character substitutions, as well as character insertions in S1 (or character deletions from S2). For this purpose, it compares the main and all shifted S2 vectors with the S1 vector. Afterward, to produce the 1D output vector, the ith entry of the output is determined according to the ith and (i + k)th entries of the comparison outcome vector. (B) Similarly, the S2-align operation compares the main and all shifted S1 vectors with the S2 vector to locate character substitutions, as well as character insertions in S2 (i.e. character deletions from S1).

https://doi.org/10.1371/journal.pcbi.1010665.g004

It is worth noting that the number of consecutive indels (i.e. consecutive insertions or consecutive deletions) is assumed to not be larger than R, and hence, the value of R should be large enough to support all probable variations between two sequences. However, small values of R can be chosen in the case of aligning two similar sequences. Moreover, for aligning two input sequences with different lengths, the shorter one should slide all over the longer one to determine every probable variation, which results in a large value of R. As each sequence shifts in the horizontal direction towards the left and right of the other sequence, the parameter R varies in the range of [1, ]. However, the various choices of R (from 1 to ) do not affect the processing time and the speed, as discussed in more detail in the Optical architecture section.

Output vector production.

Once the S1-align and S2-align operations are performed, every entry of the comparison outcome vector can be determined accordingly, as depicted in Fig 3. Specifically, as shown in Fig 3A and 3D, a mutation or an indel within the input sequence results in a zero value at the corresponding entry of the comparison outcome vector. For example, the 3rd character (i.e. character ‘A’) within S1 is mutated against the character ‘E’ in S2. However, regarding the nearby-label coding scheme, the code of the kth next character is also affected. Hence, assuming k = 2, the 5th characters of the input sequences (i.e. characters ‘S’ of S1 and S2) are nearby-label coded differently as shown in Fig 3B. Consequently, this variation causes a false mismatch, as well as a false zero value at the 5th entry of the comparison outcome vector as shown in Fig 3C and 3D, respectively.

To compensate for the falsely mismatched characters, the character whose nearby-label code is determined by the falsely mismatched character is involved, namely the (i + k)th character (i.e. character ‘L’ at the 7th entry of S1 and S2 in Fig 3A and 3D). Accordingly, to produce the final output, the ith entry of the output is determined by aggregating the ith and (i + k)th entries of the comparison outcome vector. For example, as depicted in Fig 3D and 3E, the false mismatch at the 5th entry is recovered by the corresponding nonzero value at the 7th entry, while proper detection of the character mutation at the 3rd entry is not affected.
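The compensation step can be checked on a small example. The sketch below is illustrative only: the function name `compensate` is hypothetical, aggregation is modeled as a logical OR of 0/1 entries, and entries whose (i + k)th partner lies beyond the end of the vector are assumed to keep their own value. The seven-entry input mirrors the situation narrated for Fig 3 (mutation at the 3rd entry, false mismatch at the 5th, k = 2); its values are made up for illustration and are not read from the figure.

```python
def compensate(out, k):
    """Combine entry i with entry i + k of the comparison outcome vector."""
    n = len(out)
    result = []
    for i in range(n):
        # Assumption: no partner exists past the end, so it contributes 0.
        partner = out[i + k] if i + k < n else 0
        result.append(1 if (out[i] or partner) else 0)
    return result


# 1-based positions: 3 is the true mutation, 5 is the false mismatch.
outcome = [1, 1, 0, 1, 0, 1, 1]
print(compensate(outcome, k=2))
# [1, 1, 0, 1, 1, 1, 1]
```

The 5th entry is recovered through the 7th, whereas the 3rd entry stays zero because its partner, the 5th entry, is zero as well.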

As a final word, it should be noted that all aforementioned steps to produce the final output, i.e. shifting, comparing, aggregating, etc., are done in parallel with no hardware complexity, taking advantage of the inherent parallelism in optics.

Analysis and review of the output.

We summarize the proposed alignment procedure as follows: a) the S1-align operation compares the main and all shifted S2 vectors with the S1 vector to locate character substitutions, as well as character insertions in S1 (or deletions from S2), as depicted in Fig 4A; and b) the S2-align operation compares the main and all shifted S1 vectors with the S2 vector to locate character substitutions, as well as character insertions in S2 (or deletions from S1), as depicted in Fig 4B. To produce the output, the ith entry of the output is determined according to the ith and (i + k)th entries of the comparison outcome vector. As each comparison, performed by the S1-align or S2-align operation, produces a 1D vector, the output can be arranged as a two-row matrix, as formulated in Eq 3, while Eq 4 calculates every entry of the output, as follows:

$$\mathrm{Output} = \left[ Out_{row,i} \lor Out_{row,i+k} \right], \qquad row = 1, 2; \quad i = 1, \ldots, N \tag{3}$$

$$Out_{row,j} = \bigvee_{x=j-R}^{j+R} A_{row,j,x} \tag{4}$$

where the matrix Output represents the output of the sequence alignment in two rows, and the parameter $Out_{row,i}$ represents its ith entry as a result of comparing the ith character of the input sequences, while $Out_{row,i+k}$ compensates for a probable false mismatch. The parameter row stands for the index of the output row, resulting from the S1-align or S2-align operation. Moreover, the variable $Out_{row,j}$ represents the comparison outcome for the jth character by aggregating all corresponding entries, comparing the non-shifted vector of one sequence with the main and all shifted vectors of the other one, as discussed in the S1-align and S2-align operation subsections. For this purpose, the variable $A_{row,j,x}$ represents the comparison result of two coded characters: $Code_{row,j}$ as the jth coded character within the non-shifted sequence, and $Code_{2/row,x}$ as the xth coded character within the main vector or the shifts of the other sequence. For instance, for calculating Output[1, 3], as the state of the 3rd entry of S1, the S1-align operation aggregates $Out_{1,3}$ and $Out_{1,5}$ based on Eq 3, assuming k = 2 and R = 1, while Eq 4 calculates $Out_{1,3}$, assuming row = 1 and i = 3, by accumulating the results of comparing the code of the 3rd entry of S1 (i.e. $Code_{1,3}$) with the 2nd, 3rd, and 4th entries of S2 (i.e. $Code_{2,2}$, $Code_{2,3}$, and $Code_{2,4}$, respectively). A similar operation calculates $Out_{1,5}$. To finalize, the pseudo-code of the coding and alignment procedures is depicted in Algorithm 1.

Algorithm 1 Pseudo-code of the HELIOS method, including the coding and alignment procedures.

Require: S1, S2, R ≥ 0 ∧ k > 0

for each input sequence called Sinput (input = 1 → 2) do

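  for character i = 1 → N do

   Cself ⇐ Offsetself + Stepself × V[Sinput[i]]

   Cnearby ⇐ Offsetnearby + Stepnearby × V[Sinput[i − k]]

   Sinput.Code[i] ⇐ (Cself, Cnearby)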

  end for

end for

for each operation called row = 1 → 2 do

  for entry i = 1 → N do

   for entry j = (i − R) → (i + R) do

    if Srow.Code[i] = S2/row.Code[j] then

     Out[row, i] ⇐ 1

    end if

   end for

  end for

  for entry i = 1 → N do

   Output[row, i] ⇐ Out[row, i] ∨ Out[row, i + k]

  end for

end for

Performing the HELIOS method, the output appears as a two-row matrix, while each entry represents the alignment output of two characters at the corresponding position within the input sequences, as shown in Fig 5. By traversing the output from left to right, nonzero entries in both rows depict identical characters, i.e. character matching, at the corresponding position of the input sequences. On the other hand, zero entries in both rows depict character mutation at the corresponding position of the input sequences. Finally, a zero entry of a row along with a nonzero entry of the other one indicates indel, i.e. character insertion or deletion, and can be represented by a gap at the corresponding position of the sequence, containing the nonzero entry, as shown in Fig 5.
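The reading rule for the two-row output can be written down directly. The following is a minimal sketch, assuming the output is available as two equal-length lists of numbers in which any nonzero value counts as a match; the function name `classify` and the label strings are hypothetical.

```python
def classify(output):
    """Label every column of the two-row output matrix."""
    row1, row2 = output  # row1 from S1-align, row2 from S2-align
    labels = []
    for a, b in zip(row1, row2):
        if a and b:
            labels.append("match")        # nonzero in both rows
        elif not a and not b:
            labels.append("mutation")     # zero in both rows
        elif a:
            labels.append("indel, gap in S1")  # gap goes to the nonzero row
        else:
            labels.append("indel, gap in S2")
    return labels


print(classify([[1, 0, 1, 0],
                [1, 0, 0, 1]]))
# ['match', 'mutation', 'indel, gap in S1', 'indel, gap in S2']
```

Following the text, the gap is assigned to the sequence whose row holds the nonzero entry.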

Fig 5. Output explanation of the HELIOS method.

The output of the HELIOS method is represented with a two-row matrix, while the first and second rows are produced by the S1-align and S2-align operations, respectively. Moreover, each entry represents the alignment output of the characters at the corresponding position in the input sequences. By traversing the output from left to right, nonzero entries in both rows depict identical characters, i.e. character matching; while zero entries in both rows depict character mutation. Finally, a zero entry of a row along with a nonzero entry of the other one indicates indel, i.e. character insertion or deletion, and can be represented by a gap at the corresponding position of the sequence, containing the nonzero entry.

https://doi.org/10.1371/journal.pcbi.1010665.g005

Summarizing the HELIOS method, we would like to emphasize that it exactly locates character matches, mutations, and single/multiple indels through the alignment procedure, while the coding procedure presents distinct coding patterns for the input sequences and reduces noise in the output vector, which is represented in a convenient form.

Optical architecture

Equivalent to the HELIOS method, the HELIOS optical architecture is developed to exploit the inherent parallelism and ultra-fast processing capabilities of optics, as depicted in Figs 1B and 6. The HELIOS optical architecture consists of three main units: a) Optical beam provision unit to prepare a collimated beam to feed the proposed optical architecture, b) Optical modulation and mechanism unit to accomplish the coding and alignment procedures of the HELIOS method, and c) Output capturing unit to capture the final output of the HELIOS optical architecture. The units are explained in more detail as follows.

Fig 6. Schematic illustration of the HELIOS optical architecture.

(A) The optical beam provision unit provides a collimated beam to feed the whole system. In this manner, the wideband laser beam, produced by a laser source, passes through the laser line bandpass filter and the pinhole to be cleaned. Afterward, the clean beam is diverged and collimated by passing through the objective and imaging lenses, respectively. Finally, the collimated beam is directed to the optical modulation and mechanism unit. (B) In the optical modulation and mechanism unit, passing the collimated beam through WSF #1 modulates the wavelength of the optical beam based on the self-label coding of S2 and S1 on the first and second rows of a 2 × N pixels image, respectively, while PSF #1 performs their polarization selection based on their nearby-label coding scheme. Afterward, the objective and imaging lens arrays diverge and recollimate the optical beam in the horizontal direction to perform the shifting process of the alignment procedure. Moreover, WSF #2 and PSF #2 code S1 and S2 on the first and second rows of a 2 × N pixels image, respectively. By passing the expanded beams through WSF #2 and PSF #2, the proposed architecture compares the shifted coded S2 with S1 at the first row, implementing the S1-align operation, and compares the shifted coded S1 with S2 at the second row, implementing the S2-align operation. Finally, each pixel is directed to two distinct pixels via a chiral medium to compensate for false mismatches. (C) In the output capturing unit, an optical thresholder eliminates wavelength cross-talk and speckle noise from the output before capturing. Afterward, the output is captured by a bi-convex lens and a charge-coupled device (CCD) camera.

https://doi.org/10.1371/journal.pcbi.1010665.g006

Optical beam provision unit

The optical beam provision unit provides a collimated optical beam to feed the whole optical system. In this manner, it employs a wideband unpolarized laser source [37], a laser line bandpass filter [38], a pinhole, and two lenses, as depicted in Fig 6A. For this purpose, the wideband laser generates an intense coherent monochromatic light beam in a wide spectral range; while the laser line bandpass filter transmits laser light with suppressing ambient light as well as lower intensity secondary laser lines. It improves contrast by only transmitting light within a specific wavelength range e.g. 450 to 650 nanometers in this case of study. Moreover, the thermal load is minimized on the blocking glass and the epoxy by facing the highly reflective side of the filter to the laser source. Afterward, the Galilean beam expander model [39] is employed to provide the collimated beam. It utilizes a pinhole and two lenses, including a) an objective lens, which is a bi-concave lens with a negative focal length (−f1) to diverge the beam, and b) an imaging lens, which is a plano-convex lens with a positive focal length (+ f2) to collimate the diverged beam; while c) a pinhole, placed at the focal point of the lenses, spatially filters the beams to reduce its high pulse energy density and to prevent arcing the air. The absence of the focal point between the lenses because of different signs of the focal lengths (−f1 + f2), avoids high energy density between the lenses, as well as results in a compact design, erect output, and elimination of the correction lens.

Summarizing the above discussion, the wideband laser beam passes through the laser line bandpass filter and the pinhole to be cleaned. Afterward, the clean beam is diverged and collimated by passing through the objective and imaging lenses, respectively. Finally, the collimated beam is directed to the optical modulation and mechanism unit to fill the aperture of the modulator cells with a proper amplitude, wavelength, and polarization of the optical beams.

Optical modulation and mechanism unit

In the HELIOS optical architecture, the coding and alignment procedures of the HELIOS method are performed by transmitting collimated beams through the optical modulation and mechanism unit, as depicted in Fig 6B. In this unit, the self-label and nearby-label coding schemes are performed by modulating the wavelength and polarization of the optical beams, respectively. Specifically, it is performed by passing the collimated beam through electrically controlled spatial filters [40, 41]; while the S1-align and S2-align operations of the alignment procedure are simultaneously performed by expanding and overlapping the modulated beams.

Modulation approach.

To perform the self-label coding, a wavelength selection approach is employed to modulate every character of the input sequences at a distinct wavelength. In this manner, a recently developed electrically controllable wavelength selective filter (WSF) is adopted [40], which is built upon a liquid crystal. The filter covers the spectral band in the range of [450–1000] nanometers with a bandwidth of less than 10 nanometers and a throughput of more than 80 percent. Employing an electronically controlled liquid crystal, the filter transmits only a selected wavelength of light and excludes others at each pixel.

Besides, the nearby-label coding is implemented utilizing the polarization of the optical beams. In this manner, a proposed polarization-based spatial filter (PSF) is employed [41], which is built upon an S-waveplate. It modulates every character of the input sequence with a unique linear polarization. This filter operates by transmitting a specific polarization along an azimuth angle θ in the range of [0, 180] degrees; while rejecting other polarizations. As the S-waveplate is a polarization-sensitive element, the transmittance of the S-waveplate at each pixel can be controlled by adjusting a bias voltage on the waveplate. Hereupon, it passes the incident beam at a specific polarization at each pixel. So, this property enables us to electrically modulate the polarization of the optical beams.

Therefore, the proposed architecture modulates specific wavelengths and polarizations of the unpolarized light beams, transmitted by the optical beam provision unit. Moreover, where a modulated beam crosses wavelength-selective and polarization-based spatial filters, it can only pass through the filters in the case of identical wavelengths and polarizations. Hence, it enables the comparison of two coded patterns in the optical architecture. Although the filters can provide many more distinct codes, in this study we assume modulation wavelengths in the range of [450, 650] nanometers with a channel spacing of 10 nanometers, and linear polarization selection in the range of [0, 180] degrees with an angle variation of 9 degrees. This assumption provides the required orthogonal code sets for the self-label and nearby-label codings of DNA, RNA, and protein sequences, as depicted in Fig 7.

Fig 7. Modulation approach of the HELIOS Optical architecture, utilizing the wavelength and polarization of the optical beams.

To implement the self-label coding through the wavelength modulation approach, every character of the input sequence is modulated with a distinct wavelength, within the spectral range of [450–650] nanometers with bandwidth of 10 nanometers. On the other hand, to implement the nearby-label coding through the polarization selection approach, every character of the input sequence is assigned to a specific polarization along a 9-degree azimuth angle in the range of 0 to 180 degrees. Each approach provides twenty distinct codes for protein sequences, while only four of them are employed for coding DNA and RNA sequences.

https://doi.org/10.1371/journal.pcbi.1010665.g007
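As a rough illustration of the modulation approach, the following sketch builds one possible code table with a 10-nanometer wavelength spacing starting at 450 nanometers and a 9-degree polarization spacing starting at 0 degrees. The function name `build_code_table`, the ordering of the alphabet, and the choice of which character receives which channel are assumptions made only for this example; the source does not specify the actual assignment.

```python
# Illustrative code tables for the two modulation channels.
# The alphabet order and the character-to-channel assignment are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, arbitrary order
NUCLEOTIDES = "ACGT"                  # DNA alphabet (use "ACGU" for RNA)


def build_code_table(alphabet, start_nm=450, step_nm=10, step_deg=9):
    """Map each character to a (wavelength in nm, polarization angle in degrees) pair."""
    return {
        ch: (start_nm + i * step_nm, i * step_deg)
        for i, ch in enumerate(alphabet)
    }


protein_codes = build_code_table(AMINO_ACIDS)
dna_codes = build_code_table(NUCLEOTIDES)

print(protein_codes["A"], protein_codes["Y"])
print(dna_codes)
```

With twenty characters, the wavelengths span 450 to 640 nanometers and the angles span 0 to 171 degrees, so every code stays inside the ranges assumed in this study, while DNA and RNA use only the first four entries.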

Mechanism of the unit.

In the mechanism of the unit, at first, a space of 2 × N pixels is reserved on the WSF #1 and PSF #1, as depicted in Fig 6D and 6E, respectively. While the first row modulates all characters within S2 for performing the S1-align operation, the second row modulates all characters within S1 for performing the S2-align operation. Hence, transmitting collimated beam through WSF #1 and PSF #1 modulates wavelength and polarization of optical beams based on the self-label and nearby-label coding of the input sequences, respectively.

To realize the shifting process of the modulated beams along the horizontal direction, as discussed in the alignment procedure subsection, every modulated beam is expanded to multiple horizontal beams by two microlens arrays, as depicted in Fig 6B and 6F: a) an objective microlens array, composed of 2 × N bi-concave lenses with a negative focal length to diverge the modulated beams in the horizontal direction, and b) an imaging microlens array, composed of 2 × N bi-convex lenses with a positive focal length to recollimate the diverged beams. It is worth noting that to implement various numbers of sequence shifts, objective and imaging lens arrays with focal lengths of different signs can be adopted to expand every modulated beam to the required range of horizontal pixels. In this process, the number of shifts, i.e. the value of parameter R, does not affect the performance of the optical system, since expanding and recollimating the optical beams are performed in parallel.

Afterward, the recollimated modulated S2 and S1 beams (produced by WSF #1, PSF #1, and microlens arrays) are fed to WSF #2 and PSF #2. The WSF #2 and PSF #2 modulate the self-label and nearby-label codes of S1 and S2 on the first and the second rows of reserved 2 × N pixels, respectively, as depicted in Fig 6G and 6H. Hence, by passing the recollimated modulated beams through WSF #2 and PSF #2, the proposed architecture compares the shifted coded S2 with S1 at the first row, implementing the S1-align operation. Concurrently, it compares the shifted coded S1 with S2 at the second row, implementing the S2-align operation. Specifically, in the case that the crossed beam (modulated by PSF #1 and WSF #1) and the pixel on PSF #2 and WSF #2 have identical wavelength and polarization, respectively, the beam passes through the filters, indicating a non-blocking state, and so, a nonzero amplitude pixel appears at the comparison outcome. Otherwise, the optical beam fails to pass through WSF #2 or PSF #2, indicating a blocking state, and so, a zero amplitude pixel appears at the comparison outcome vector. In this manner, the presence of a pixel with nonzero amplitude at the output clarifies a matching state through the alignment procedure; while a zero amplitude pixel clarifies a mismatch (i.e. mutation or indel) correspondingly.
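The blocking and non-blocking behavior described above can be mimicked in software with a deliberately simplified model. In the sketch below, each pixel is represented by a (wavelength, polarization) tuple, a beam passes the second filter pair only when both values are identical, and the horizontal expansion is modeled as a window of R pixels on either side. The function names and this window model are assumptions for illustration only; the sketch does not reproduce the nearby-label coding or the compensation of false mismatches.

```python
def align_row(moving, fixed, R):
    """Return a 0/1 row: pixel i is 1 if fixed[i] equals some moving[j] with |i - j| <= R."""
    out = []
    for i, target in enumerate(fixed):
        lo = max(0, i - R)
        hi = min(len(moving), i + R + 1)
        # A beam passes only if wavelength AND polarization are identical.
        passed = any(moving[j] == target for j in range(lo, hi))
        out.append(1 if passed else 0)
    return out


def compare_rows(s1_codes, s2_codes, R):
    """Model the two rows of the output: S1-align (row 1) and S2-align (row 2)."""
    row1 = align_row(s2_codes, s1_codes, R)  # shifted S2 compared with S1
    row2 = align_row(s1_codes, s2_codes, R)  # shifted S1 compared with S2
    return row1, row2


# Hypothetical coded pixels: (wavelength in nm, polarization angle in degrees).
a, b, c, d = (450, 0), (460, 9), (470, 18), (480, 27)
print(compare_rows([a, b, c], [b, c, d], R=1))
```

A nonzero entry plays the role of a nonzero-amplitude pixel (a match), and a zero entry plays the role of a blocked beam (a mutation or indel).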

As discussed in the Output vector production subsection, false mismatches on the kth next pixels to the right of mutated characters lead to false zero amplitude pixels at the output. Hence, the value of these pixels should be recovered to produce the proper output. In this manner, the property of double refraction of optical beams in optically active media is employed [42]. When a linearly-polarized beam of light enters an optically active medium, like a chiral liquid crystal, it is split into two separate beams of opposite circular polarizations, traveling at different speeds through the medium. Hence, the beams are refracted and diverged by an angle according to the different propagation speeds of the right-circularly and left-circularly polarized light beams [42]. So, every beam entering the optically active medium leaves it from two different locations a specific distance apart. While the light speed determines the angle of refraction, the desired distance between the two exit points can be set by the properties and geometry of the optically active medium. Considering the double refraction property, every beam representing the ith pixel is split into two beams at the ith and the (i − k)th pixels, as depicted in Fig 6I.

Summarizing the above discussion, the S1-align and S2-align operations of the alignment procedure are completely and concurrently performed by passing the collimated beams through the spaces of 2 × N pixels of the WSFs, PSFs, microlens arrays, and optically active media. It is worth noting that the inherent parallelism of optics enables multiple input sequences to be arranged on the aperture of the modulator cells and to be aligned through the HELIOS optical architecture simultaneously. Therefore, efficient use of coding space, as well as considerable speed-up can be achieved by the proposed optical architecture.

Output capturing unit

Finally, optical thresholding is performed by an optical thresholder, shown in Fig 6C, to provide a clean output. In this manner, it eliminates wavelength cross-talks and speckle noises of the output before capturing. At last, the output is converged to a proper aperture by a bi-convex lens to be captured by a charge-coupled device (CCD) camera [43], as depicted in Fig 6C. The resultant output, represented as a 2 × N pixels image, includes nonzero and zero amplitudes, while each pixel represents the state of the corresponding character within the input sequences.

Discussion and results

Various simulation approaches and numerical analyses are investigated to assess the HELIOS method and its optical architecture. At first, the functionality of the HELIOS method and its optical architecture is validated through investigating various simulation outputs. Next, the accuracy of the HELIOS method is inspected with comprehensive simulation studies and statistical analyses for various datasets. Afterward, the time and space complexities and the performance of the HELIOS optical architecture are estimated by analytical computation. For a comparative study, we consider various well-known algorithms, including BLAST [7], ClustalW [8], ClustalΩ [9], MUSCLE [12], T-Coffee [10], Kalign [11], MAFFT [13], Smith-Waterman [5], Needleman-Wunsch [6], Nucmer4 [14], BLASR [44], BWA-MEM [45], Bowtie2 [46], Mauve [47], LASTZ [48], Moiré Technique [31], HAWPOD [30], and OptCAM [34]. Finally, some well-known applications are presented that can potentially benefit from the HELIOS method.

Functional validation

In order to evaluate the functionality of the HELIOS method and its optical architecture, the alignment outputs of numerous DNA, RNA, and protein sequences are investigated. In this manner, the HELIOS method and its optical architecture are simulated in MATLAB simulation tool and COMSOL Multiphysics software, respectively. As a case study, the “Severe acute respiratory syndrome coronavirus 2” sequences [49] are aligned and represented in the form of protein, RNA, and DNA, in Fig 8A–8C, respectively; while some single and multiple mutations/indels are manually imposed to the sequence with varying distributions. For more clarity, only a small portion of each full-length alignment, including 60 characters is represented in Fig 8. As shown in Fig 8, two input sequences are successfully aligned in a two-line output by performing two consecutive procedures of the HELIOS method, as well as, passing optical beams through two units of the HELIOS optical architecture. Moreover, the wavelength modulation within the range of 450 to 650 nanometers with 10 nanometers spacing, accompanied with polarization selection in the range of 0 to 180 degrees with 9-degree angle variation are performed for modulating optical beams in the HELIOS optical architecture. It is noteworthy that the crosstalk between neighboring pixels due to an electric field leakage, produced by filtering a specific wavelength and polarization, is negligible and does not affect our results [50].

Fig 8. Simulation outputs of the HELIOS method and its optical architecture.

In this case of study, the “Severe acute respiratory syndrome coronavirus 2” sequences [49] are aligned in the form of (A) Protein, (B) RNA, and (C) DNA; while some single and multiple mutations/indels are manually imposed to the sequence with varying distributions. For more clarity, only a small portion of each full-length alignment including 60 characters is shown in this figure, with the beginning at (A) character 240, (B) character 721, and (C) character 1921. In the coding and the alignment procedure, the parameters are set to R = 4 and k = 5. By investigating the outputs, two input sequences are successfully aligned in a two-line output by performing two consecutive procedures of the HELIOS method, as well as, passing optical beams through two units of the HELIOS optical architecture. As a result, the matches, mutations, and indels are detected and located accurately.

https://doi.org/10.1371/journal.pcbi.1010665.g008

Analyzing the simulation outputs, all the character matches, mutations, and indels are accurately detected and located within the protein, RNA, and DNA sequences. As depicted in Fig 8, the character matches are presented with the nonzero entries and high amplitude pixels within the outputs of the HELIOS method and its optical architecture, respectively; while the zero entries and zero-amplitude pixels represent mutations or indels. Eventually, investigating the simulation outputs verifies the accurate functionality of the HELIOS method in both levels of method and optical architecture.

Accuracy evaluation

In order to comprehensively assess the accuracy of the HELIOS method, two statistical analyses are performed through simulations of various datasets: 1) Quantitative measurement of homology [51], and 2) Accuracy measurement of classification output [52], compared to the well-known algorithms.

Quantitative measurement of homology.

To perform quantitative measurement of homology [51], the parameters Identity, Similarity, and Alignment Score of the HELIOS outputs are calculated through simulation studies, as reported in Tables 1–3, respectively, assuming the “Nine ND5 protein sequences dataset” [53]. While the Identity reports the number of exactly matched characters of two sequences (in percentage), the Similarity measures the resemblance of two compared sequences. Specifically, regarding the physicochemical properties, the amino acids are categorized into six groups with different similarity values; including GAVLI, FYW, STCM, KRH, DENQ, and P. As the third metric, the BLOSUM62 substitution scoring matrix [54] is adopted to calculate the Alignment Score, with gap opening and extension penalties equal to -10 and -0.5, respectively.
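A minimal sketch of how the Identity and Similarity percentages could be computed from a gapped pairwise alignment is given below. It uses the six physicochemical groups named above, but makes two simplifying assumptions that are not taken from the source: any two different residues of the same group count equally as similar (the per-group similarity values are not reproduced), and both percentages are normalized by the alignment length. The function name `identity_and_similarity` is hypothetical.

```python
# Six physicochemical groups named in the text.
GROUPS = ["GAVLI", "FYW", "STCM", "KRH", "DENQ", "P"]
GROUP_OF = {aa: idx for idx, group in enumerate(GROUPS) for aa in group}


def identity_and_similarity(a, b, gap="-"):
    """Return (Identity %, Similarity %) for two aligned, equal-length strings.

    Assumptions: binary same-group similarity and normalization by the
    alignment length (including gapped columns).
    """
    if len(a) != len(b) or not a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = similar = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # a gapped column is neither identical nor similar
        if x == y:
            identical += 1
            similar += 1
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            similar += 1
    n = len(a)
    return 100.0 * identical / n, 100.0 * similar / n


ident, simil = identity_and_similarity("GAVLK-", "GLVIRP")
print(f"Identity = {ident:.2f}%, Similarity = {simil:.2f}%")
```

In this toy alignment, two of the six columns are identical and three more pair residues of the same group, so the Similarity is higher than the Identity.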

Table 1. The parameter Identity of the HELIOS method in the quantitative measurement of homology.

The Identity reports the number of exactly matched characters of two compared sequences (in percentage) aligned by the HELIOS method, assuming the “Nine ND5 protein sequences dataset” [53].

https://doi.org/10.1371/journal.pcbi.1010665.t001

Table 2. The parameter Similarity of the HELIOS method in the quantitative measurement of homology.

The Similarity measures the resemblance of two compared sequences (in percentage) aligned by the HELIOS method. Various amino acids are categorized into six groups based on their physicochemical properties; including GAVLI, FYW, STCM, KRH, DENQ, and P. Moreover, the “Nine ND5 protein sequences dataset” [53] is assumed in this study.

https://doi.org/10.1371/journal.pcbi.1010665.t002

Table 3. The parameter Alignment Score of the HELIOS method in the quantitative measurement of homology.

To calculate the Alignment Score of two compared sequences aligned by the HELIOS method, the BLOSUM62 substitution scoring matrix is adopted with gap opening and extension penalties equal to -10 and -0.5, respectively. Moreover, the “Nine ND5 protein sequences dataset” [53] is assumed in this study.

https://doi.org/10.1371/journal.pcbi.1010665.t003

For a comparative study, the quantitative measurement of homology is performed by the HELIOS method and is compared to various well-known algorithms, including BLAST [7], ClustalW [8], ClustalΩ [9], MUSCLE [12], MAFFT [13], Kalign [11], T-Coffee [10], Smith-Waterman (SW) [5], and Needleman-Wunsch (NW) [6]. As an instance, the values of Identity, Similarity, and Alignment Score of the Smith-Waterman algorithm are reported in detail in Tables 4–6, respectively, to be compared to those of the HELIOS method. Additionally, we consider twelve different datasets for this evaluation, represented in S1 Text–S12 Text; while the input sequences of each dataset are represented in Table A3 in its corresponding file. Moreover, the quantitative measurements of homology of all aforementioned algorithms are reported in Tables A4-A33 in S1 Text–S12 Text for the twelve datasets [53, 55–61]. In brief, the average value of each parameter, achieved by the aforementioned algorithms, is reported in Table 7 for the twelve datasets.

Table 4. The parameter Identity of the Smith-Waterman algorithm in the quantitative measurement of homology.

The Identity reports the number of exactly matched characters of two compared sequences (in percentage) aligned by the Smith-Waterman algorithm, assuming the “Nine ND5 protein sequences dataset” [53].

https://doi.org/10.1371/journal.pcbi.1010665.t004

Table 5. The parameter Similarity of the Smith-Waterman algorithm in the quantitative measurement of homology.

The Similarity measures the resemblance of two compared sequences (in percentage) aligned by the Smith-Waterman algorithm. Various amino acids are categorized into six groups based on their physicochemical properties; including GAVLI, FYW, STCM, KRH, DENQ, and P. Moreover, the “Nine ND5 protein sequences dataset” [53] is assumed in this study.

https://doi.org/10.1371/journal.pcbi.1010665.t005

Table 6. The parameter Alignment Score of the Smith-Waterman algorithm in the quantitative measurement of homology.

To calculate the Alignment Score of two compared sequences aligned by the Smith-Waterman algorithm, the BLOSUM62 substitution scoring matrix is adopted with gap opening and extension penalties equal to -10 and -0.5, respectively. Moreover, the “Nine ND5 protein sequences dataset” [53] is assumed in this study.

https://doi.org/10.1371/journal.pcbi.1010665.t006

Table 7. A brief report of the quantitative measurement of homology of the HELIOS method, compared to nine well-known algorithms, including SW, NW, BLAST, ClustalW, ClustalΩ, MUSCLE, T-Coffee, Kalign, and MAFFT. In this manner, the parameters Identity, Similarity, and Alignment Score are reported.

The Identity reports the number of exactly matched characters of two compared sequences (in percentage), aligned by each algorithm. Moreover, the Similarity measures the resemblance of two compared sequences (in percentage), aligned by every aforementioned algorithm. Various amino acids are categorized into six groups based on their physicochemical properties; including GAVLI, FYW, STCM, KRH, DENQ, and P. Finally, to calculate the Alignment Score of two compared sequences, aligned by every aforementioned algorithm, the BLOSUM62 substitution scoring matrix is adopted with gap opening and extension penalties equal to -10 and -0.5, respectively. Twelve different datasets are considered for this study [53, 55–61].

https://doi.org/10.1371/journal.pcbi.1010665.t007

Analyzing all data reported in Tables 1, 4, and 7, we can confirm that the HELIOS method detects and locates slightly more identical characters among the input sequences in most of the given datasets, and hence, it leads to higher values of the Identity compared to the SW and other well-known algorithms (as reported in S1 Text–S12 Text). The main reason is that, since the HELIOS method inserts a limited number of consecutive gaps freely (at most R gaps), it detects and locates more identical characters between two input sequences. Moreover, comparing Table 2 with Table 5 and considering Table 7, we can conclude that the Similarity values achieved by the HELIOS method are approximately equal to those of the alternative algorithms for most of the given datasets, while in the worst case the value is smaller by only 2.28%, compared to T-Coffee for the ND6 (NADH dehydrogenase subunit 6) protein of eight species dataset [55]. On the other hand, as reported in Tables 3, 6, and 7, the values of the Alignment Score calculated by the HELIOS method closely approach those of the alternative algorithms in our case studies. Therefore, we can conclude that the HELIOS method performs an alignment of comparable accuracy to the alternative algorithms.

Accuracy measurement of classification output.

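The accuracy measurement of the classification output [52] of the HELIOS method is addressed by calculating the values of Sensitivity (SEN), Specificity (Spec), Accuracy (ACC), Positive Predictive Value (PPV), Negative Predictive Value (NPV), Matthews Correlation Coefficient (MCC), and Test’s Accuracy (F-Score) in the simulation studies, according to Eq 5 to Eq 11, respectively.

$$\mathrm{SEN} = \frac{TP}{TP + FN} \tag{5}$$

$$\mathrm{Spec} = \frac{TN}{TN + FP} \tag{6}$$

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

$$\mathrm{PPV} = \frac{TP}{TP + FP} \tag{8}$$

$$\mathrm{NPV} = \frac{TN}{TN + FN} \tag{9}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{10}$$

$$\text{F-Score} = \frac{2\,TP}{2\,TP + FP + FN} \tag{11}$$

where the parameters TP, TN, FP, and FN stand for True Positive, True Negative, False Positive, and False Negative, respectively.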
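For readers who want to recompute these metrics, the sketch below evaluates all seven of them from the four confusion-matrix counts using their standard textbook definitions. The function name `classification_metrics` and the example counts are hypothetical, and how aligned positions are assigned to TP, TN, FP, and FN against a reference algorithm is not specified here.

```python
from math import sqrt


def classification_metrics(tp, tn, fp, fn):
    """Compute SEN, Spec, ACC, PPV, NPV, MCC, and F-Score from confusion counts."""

    def ratio(num, den):
        # Return NaN rather than raising when a denominator is zero.
        return num / den if den else float("nan")

    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SEN": ratio(tp, tp + fn),
        "Spec": ratio(tn, tn + fp),
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
        "F-Score": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical counts, for illustration only.
for name, value in classification_metrics(tp=8, tn=6, fp=2, fn=4).items():
    print(f"{name}: {value:.3f}")
```

When the two compared alignments agree on every position, FP and FN are zero and every metric equals one, which is the limit the reported values are compared against.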

As a comparative study, the accuracy measurement of the classification output of the HELIOS method is accomplished, and the corresponding metrics (as formulated by Eqs 5 to 11) are calculated by considering Smith-Waterman [5], Needleman-Wunsch [6], ClustalW [8], ClustalΩ [9], BLAST [7], MUSCLE [12], T-Coffee [10], Kalign [11], and MAFFT [13] as the reference algorithms. As an instance, the values of SEN, Spec, ACC, PPV, NPV, MCC, and F-Score of the HELIOS method are reported with SW as the reference algorithm in Tables 8–14, respectively. It should be noted that the values of these metrics with the other mentioned algorithms as the reference algorithm are reported for various datasets in Tables A34-A96 in S1 Text–S12 Text. In brief, the average value of each parameter considering the mentioned reference algorithms is reported in Table 15 for the same twelve datasets addressed in Table 7.

Table 8. The parameter Sensitivity (SEN) of the HELIOS method with referencing the Smith-Waterman algorithm in the accuracy measurement of classification output.

The assumed dataset is the “Nine ND5 protein sequences dataset” [53] in this study.

https://doi.org/10.1371/journal.pcbi.1010665.t008

Table 9. The parameter Specificity (Spec) of the HELIOS method with referencing the Smith-Waterman algorithm in the accuracy measurement of classification output.

The assumed dataset is the “Nine ND5 protein sequences dataset” [53] in this study.

https://doi.org/10.1371/journal.pcbi.1010665.t009

Table 10. The parameter Accuracy (Acc) of the HELIOS method with referencing the Smith-Waterman algorithm in the accuracy measurement of classification output.

The assumed dataset is the “Nine ND5 protein sequences dataset” [53] in this study.

https://doi.org/10.1371/journal.pcbi.1010665.t010

Table 11. The parameter Positive Predictive Value (PPV) of the HELIOS method with referencing the Smith-Waterman algorithm in the accuracy measurement of classification output.

The assumed dataset is the “Nine ND5 protein sequences dataset” [53] in this study.

https://doi.org/10.1371/journal.pcbi.1010665.t011

Table 12. The parameter Negative Predictive Value (NPV) of the HELIOS method with referencing the Smith-Waterman algorithm in the accuracy measurement of classification output.

The assumed dataset is the “Nine ND5 protein sequences dataset” [53] in this study.

https://doi.org/10.1371/journal.pcbi.1010665.t012

Table 13. The parameter Matthews Correlation Coefficient (MCC) of the HELIOS method with referencing the Smith-Waterman algorithm in the accuracy measurement of classification output.

The assumed dataset is the “Nine ND5 protein sequences dataset” [53] in this study.

https://doi.org/10.1371/journal.pcbi.1010665.t013

Table 14. The parameter Test’s Accuracy (F-Score) of the HELIOS method with referencing the Smith-Waterman algorithm in the accuracy measurement of classification output.

The assumed dataset is the “Nine ND5 protein sequences dataset” [53] in this study.

https://doi.org/10.1371/journal.pcbi.1010665.t014

Table 15. A brief report of the accuracy measurement of classification output of the HELIOS method with referencing well-known algorithms, including Smith-Waterman, Needleman-Wunsch, ClustalW, ClustalΩ, BLAST, MUSCLE, T-Coffee, Kalign, and MAFFT algorithms.

The parameters SEN, Spec, ACC, PPV, NPV, MCC, and F-Score are averaged and reported. The twelve datasets [53, 55–61] are considered.

https://doi.org/10.1371/journal.pcbi.1010665.t015

As reported in Tables 8–14 with SW as the reference algorithm, and then in Table 15, the values of SEN, Spec, ACC, PPV, NPV, MCC, and F-Score of the HELIOS method approximately reach the value of one for most of the comparative studies. This observation confirms that the simulation outputs of the HELIOS method are highly similar to those of SW, BLAST, MUSCLE, ClustalW, ClustalΩ, Kalign, and MAFFT; while there are some differences with the outputs of NW and T-Coffee. It is noteworthy that the detection of more identical characters by the HELIOS method, and hence, the higher value of the Identity, as represented in Tables 1 and 7, results in lower values of the aforementioned parameters, as presented in Tables 8–15.

Performance evaluation

For performance evaluation, we first analyze the time and space complexities of the HELIOS optical architecture by analytical computations, and afterward, the processing time and required memory of the HELIOS optical architecture are analytically estimated. Both analyses are compared to the well-known algorithms with various implementation assumptions.

First of all, it is worth noting that the ability to arrange numerous input sequences on the 2D aperture of the employed modulators empowers the HELIOS optical architecture to simultaneously align numerous sequences through each alignment process, as a result of the HELIOS optical architecture along with the inherent parallel processing capability of optics. Moreover, relying on parallelism, it eliminates the need for large storage components, such as RAMs. Consequently, its required memory only depends on the aperture size of the employed modulators. We consider the space utilized on the modulators to modulate the whole input sequences as the space complexity of the HELIOS optical architecture. Here, the aperture size of the modulators directly impacts the time and space complexities. Specifically, the modulators can be large enough to modulate a whole genome, or thousands of short sequences to be aligned concurrently. Moreover, given the negligible light propagation delay through an optical architecture, the time complexity and processing time of numerous pairwise sequence alignments are also determined by the switching time of the modulators. Therefore, taking advantage of the HELIOS optical architecture, the input sequences are aligned at the speed of light, regardless of the number of sequence shifts (i.e. R) in the alignment procedure. In all, while these two key factors, i.e. the aperture size and switching time of the modulators, determine the time complexity and processing time of the HELIOS optical architecture, the space utilized on the modulators is considered as the space complexity of the HELIOS optical architecture. In the following case studies, we consider two modulators of size L × W pixels, where L represents the length and W the width, and assume the first and the second input sequences to be of lengths N and M, respectively, with N > M.

Time and space complexities.

As the HELIOS method employs a simple coding scheme to align two sequences, which codes every character of the input sequence by only two entries, it achieves the time and space complexities reported in Table 16, where the time complexity scales as the fraction of MN over L² × W. In particular, employing large modulators for aligning short sequences leads to O(1) time complexity. To emphasize the superiority of optical computing, it should be noted that common electrical algorithms necessitate the storage of large matrices, and hence, are inefficient for aligning a substantial number of long sequences. For instance, the complexity reaches O(MN) for SW, NW, ClustalW, MUSCLE, and T-Coffee [21, 22]. As shown in Table 16, the time and space complexities of the HELIOS optical architecture are considerably less than those of the alternative electrical algorithms.

Table 16. Time and space complexities of the HELIOS compared to well-known algorithms, performing a pairwise sequence alignment.

In this table, N and M are the lengths of the first and the second input sequences, respectively, assuming N > M. Moreover, the parameter L × W is the size of the aperture of the modulators used in the HELIOS optical architecture. As represented in this table, HELIOS offers a fraction of MN by L² × W for time complexity, leading to O(1) in the case of large modulators.

https://doi.org/10.1371/journal.pcbi.1010665.t016

On the other hand, a few recent proposed optical approaches, such as Moiré Technique [31], HAWPOD [30], and OptCAM [34] benefit from the parallelism and high-speed processing of optics. However, due to their exclusive coding assumptions, they can be adopted for neither RNAs nor proteins alignment. Moreover, these methods occupy a large number of pixels to code a DNA character which leads to inefficient resource utilization. While the OptCAM needs 8 pixels to code a single character, the HAWPOD and the Moiré Technique use two and four columns on a modulator (2 × W pixels), respectively. However, it is only two pixels for the HELIOS optical architecture. Moreover, while the output of the Moiré Technique is too noisy and practically useless for large input sequences, the outputs of the HELIOS method and its optical architecture are highly manipulated to be easy to understand. Finally, estimating the order of time complexity of Moiré Technique, HAWPOD, and OptCAM leads to , , and , respectively. Additionally, the estimations of the order of their space complexities are , , and , for Moiré Technique, HAWPOD, and OptCAM, respectively. According to these estimations, the HELIOS optical architecture outperforms the alternative optical approaches in terms of time and space complexities, alongside its wider applicability.

Considering the electrical implementation, the HELIOS method is simple enough to be executed on a typical computer. Specifically, comparing each character of an input sequence with 2R + 1 characters of the other one leads to a time complexity in the order of O((2R + 1)(M + N)). Since the variable R is in the range of [1, MN/(M + N)], the time complexity varies in the range of [O(3(M + N)), O(2MN + N + M)]. Additionally, the space required for the coding and alignment procedures can be estimated as follows:

  • O(2(M + N)) for coding procedure,
  • O((2R + 1)(M + N)) for shifting and aligning two sequences; varying in [O(3(M + N)), O(2MN + N + M)] for R in the range of [1, MN/(M + N)],
  • O(M + N) for compensation of false mismatching and storing the final output.

Therefore, the space complexity of the HELIOS method, when executed on an electrical computer, is in the range of [O(6(M + N)), O(2MN + 4(M + N))]. In summary, the HELIOS method achieves linear to quadratic time and space complexities when running on an electrical computer.
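
The bounds above follow from substituting the two extremes of R into the per-procedure costs. A minimal Python sketch of that arithmetic is given below. It assumes unit cost per character comparison and per stored entry, and it takes R = MN/(M + N) as the upper extreme because that value reproduces the stated upper bounds; the function name and the use of `Fraction` are illustrative choices, not part of the original method.

```python
from fractions import Fraction


def helios_electrical_costs(m, n, r):
    """Return (time, space) counts for sequence lengths m, n and shift range r."""
    total = m + n
    time_cost = (2 * r + 1) * total   # each character compared with 2R+1 others
    coding = 2 * total                # two entries per coded character
    shifting = (2 * r + 1) * total    # shifted copies kept for the alignment
    output = total                    # false-mismatch compensation and final output
    return time_cost, coding + shifting + output


m, n = 3, 5
# Lower extreme, R = 1: expect 3(M+N) time and 6(M+N) space.
print(helios_electrical_costs(m, n, 1))
# Upper extreme, R = MN/(M+N): expect 2MN+M+N time and 2MN+4(M+N) space.
print(helios_electrical_costs(m, n, Fraction(m * n, m + n)))
```

Using exact fractions keeps the upper extreme free of rounding, so the two results can be compared directly with the closed-form bounds.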

Processing time comparison.

For a comprehensive comparison of the processing time and the memory requirement, three scenarios are considered. The processing time and memory of the HELIOS optical architecture are analytically estimated and compared to A) Nucmer4 [14], Mauve [47], LASTZ match [48], LASTZ default [48], Moiré Technique [31], HAWPOD [30], and OptCAM [34] in the case of genome-to-genome alignment, as reported in [14]; B) Nucmer4 [14], BLASR [44], BWA-MEM [45], Bowtie2 [46], Moiré Technique [31], HAWPOD [30], and OptCAM [34] in the case of reads alignment with a reference sequence, considering Illumina and PacBio datasets for the Arabidopsis [62] and Human [63] reference genomes, as reported in [14]; and C) Smith-Waterman [5] in the case of aligning sequences of various lengths with the SWISS-PROT dataset [64], as reported in [65]. The processing time of the HELIOS optical architecture is estimated from the aperture size and switching time of the modulators, according to Eq 12, while the memory requirement equals the aperture size of the modulators, according to Eq 13.

$$P = \frac{2MN}{L^{2} \times W} \times T \qquad (12)$$

$$S = L \times W \qquad (13)$$

where P stands for the processing time, and S represents the memory required to align two input sequences of lengths N and M. Moreover, L and W are the length and width of the modulators in pixels, respectively, and T stands for the switching time of the modulators.

For the first and the second comparison scenarios, all reported timings are measured on a dual-CPU, 32-core AMD Opteron 6276 computer with 256 GB of DDR3 PC3–12800 RAM, using 32 parallel threads. Detailed information about the datasets assumed in both scenarios is shown in Table 17 [14]. For the third comparison, an NVIDIA TESLA K40 GPU with 44.3 giga cell updates per second (GCUPS) and a GTX 275 with 21 GCUPS [65] are employed. For the analytical estimation of the processing times of the HELIOS optical architecture, we consider a typical graphene-based modulator with an aperture size of 1024 × 1024 pixels and a 100 MHz switching rate [66], which can modulate 1,048,576 bases per 10 nanoseconds. It should be noted that the switching rate can be increased up to 4.5 THz, corresponding to a switching time of 0.22 picoseconds, by employing a recently developed metamaterial-based modulator [67]. Additionally, the aperture size of the modulator can be enlarged, for example to 4096 × 4096 pixels, to increase the number of modulated bases per switch, which speeds up the process by 64 times.
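
The modulator figures quoted above can be checked with a few lines of arithmetic. The Python sketch below assumes that one switching event updates every pixel of the aperture and that the processing time has the form P = 2MN × T / (L² × W). This form is an assumption, inferred from the time complexity given with Table 16 and from the timings reported in the following tables, and the function names are illustrative.

```python
def switching_time(rate_hz):
    """Switching time T in seconds for a modulator switching rate in Hz."""
    return 1.0 / rate_hz


def processing_time(m, n, length, width, rate_hz):
    """Assumed form of the processing time: P = 2*M*N*T / (L^2 * W)."""
    return 2 * m * n * switching_time(rate_hz) / (length ** 2 * width)


def memory_pixels(length, width):
    """Memory requirement taken as the aperture size, S = L * W (in pixels)."""
    return length * width


print(memory_pixels(1024, 1024))   # pixels modulated per switching event
print(switching_time(100e6))       # seconds per switch at 100 MHz
print(switching_time(4.5e12))      # seconds per switch at 4.5 THz

# Effect of enlarging the aperture from 1024 x 1024 to 4096 x 4096 pixels
# for two hypothetical sequences of one million characters each.
base = processing_time(10**6, 10**6, 1024, 1024, 100e6)
large = processing_time(10**6, 10**6, 4096, 4096, 100e6)
print(base / large)                # speed-up factor from the larger aperture
```

Under this assumed form the speed-up depends only on the ratio of L² × W between the two apertures, which is why quadrupling both dimensions gives a factor of 4³ = 64.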

Table 17. Detailed description of the employed datasets for the genome-to-genome alignment, and the reads alignment with a reference sequence.

In these comparisons, the Illumina and PacBio data for the Arabidopsis thaliana Ler-0 genome [68] are employed. Moreover, the human Illumina and PacBio reads are taken from the Ashkenazi child data set, which is available from the Genome in a Bottle project [69], NCBI SRA accession SRX847862. Furthermore, the reference genomes are the Arabidopsis thaliana Col-0 reference genome [62], the human reference genome version GRCh38.p7 [63], and the chimpanzee (Pan troglodytes) genome [70], release PanTro4, GenBank accession GCF_000001515.6.

https://doi.org/10.1371/journal.pcbi.1010665.t017

A) For the first comparison scenario, the processing time of the HELIOS optical architecture is compared to that of Nucmer4, Mauve, LASTZ default, LASTZ match, Moiré Technique, HAWPOD, and OptCAM in the case of genome-to-genome alignment, as represented in Table 18.

First, the reference assemblies of the human genome, version GRCh38.p7 [63] (3.088 Gb), and the chimpanzee genome, release PanTro4, GenBank accession GCF_000001515.6 [70] (3.31 Gb), are aligned to one another, as represented in Table 17. This takes 3104 minutes with 66 GB of memory for Nucmer4, and more than 2 days for Mauve, LASTZ default, and LASTZ match, as represented in Table 18. In contrast, the HELIOS optical architecture performs this alignment in 190.5 seconds with 1 MB memory usage. On the other hand, it takes more than 4 days for Moiré Technique and HAWPOD, and 25.40 minutes for OptCAM, with a 1 MB memory requirement for all of them, assuming the same modulators.

Table 18. Processing time and memory requirement of genome-to-genome alignment for the HELIOS optical architecture, compared to Nucmer4, Mauve, LASTZ default, LASTZ match, OptCAM, HAWPOD, and Moiré Techniques.

For these comparison scenarios, all reported timings for Nucmer4, Mauve, LASTZ default, and LASTZ match are measured on a dual-CPU, 32-core AMD Opteron 6276 computer with 256 GB of DDR3 PC3–12800 RAM, using 32 parallel threads. On the other hand, for the analytical estimation of the processing times of the HELIOS optical architecture, OptCAM, HAWPOD, and Moiré Technique, a typical graphene-based modulator is considered with an aperture size of 1024 × 1024 pixels and a 100 MHz switching rate. Moreover, the processing times of the HELIOS optical architecture are reported for more recently developed modulators with various aperture sizes, such as 1920 × 1080 and 4096 × 4096 pixels, as well as various switching rates, including 35 GHz [72] and 4.5 THz [67]. It should be noted that the processing times reported for Nucmer4, Mauve, LASTZ default, and LASTZ match include Wall time and CPU time, while the HELIOS method is implemented within its optical architecture, and hence, the corresponding wall time is assumed zero.

https://doi.org/10.1371/journal.pcbi.1010665.t018

Secondly, for aligning two Arabidopsis species, the Arabidopsis lyrata assembly 1.0 [71] (207 Mb) is aligned to the Arabidopsis thaliana Col-0 reference genome [62] (120 Mb), as represented in Table 17. This process takes 25.7 minutes with 4.6 GB, 79.6 minutes with 3.3 GB, 2135 minutes with 1.3 GB, and 132 minutes with 0.6 GB for Nucmer4, Mauve, LASTZ default, and LASTZ match, respectively, as represented in Table 18. It takes 63.17 minutes, 15.79 minutes, and 3.70 seconds, each with 1 MB memory usage, for Moiré Technique, HAWPOD, and OptCAM, respectively, whereas the HELIOS optical architecture performs it in 0.4627 seconds with 1 MB memory usage. These comparisons show how strongly the employed coding assumption affects the performance of an optical system, independently of the high-speed processing and inherent parallelism of optics. Hence, relying on its compact coding scheme, the HELIOS optical architecture outperforms the other optical alternatives, specifically in terms of processing time.

Finally, the alignment of two assemblies of a microscopic animal, the tardigrade (Hypsibius dujardini), reported in two different studies [73, 74], is considered. The assembly of Hd-Boothby [73] (212 Mb) is aligned with the assembly of Hd-Blaxter [74] (135 Mb). As represented in Table 18, this takes 30 minutes with 4.9 GB, 541 minutes with 4.0 GB, 153 minutes with 0.4 GB, and more than two days for Nucmer4, Mauve, LASTZ match, and LASTZ default, respectively. Moreover, it takes 72.78 minutes, 18.19 minutes, and 4.2647 seconds with 1 MB memory usage for Moiré Technique, HAWPOD, and OptCAM, respectively. Finally, the HELIOS optical architecture aligns them in 0.5331 seconds with 1 MB memory usage.

It is noteworthy that the processing times reported for Nucmer4, Mauve, LASTZ default, and LASTZ match include wall time and CPU time. The HELIOS optical architecture, in contrast, performs the HELIOS method by light beam propagation through photonic components, and hence its wall time is taken as zero. Taking advantage of the operational parallelism of optics, the processing time of the HELIOS optical architecture can be considerably reduced by employing larger and faster modulators, such as aperture sizes of 1920 × 1080 and 4096 × 4096 pixels and switching rates of 35 GHz [72] and 4.5 THz [67], as reported in Table 18.
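
As a consistency check on the genome-to-genome estimates, the following Python sketch recomputes the HELIOS processing times from the genome sizes quoted above. It assumes the processing time has the form P = 2MN × T / (L² × W), with a 1024 × 1024 aperture and a 100 MHz switching rate as defaults. That form and the helper name are assumptions, chosen because they reproduce the reported values.

```python
def helios_time(m, n, length=1024, width=1024, rate_hz=100e6):
    """Assumed processing time in seconds: 2*M*N / (L^2 * W * switching rate)."""
    return 2 * m * n / (length ** 2 * width * rate_hz)


# Genome sizes in bases, as quoted in the text.
pairs = {
    "human vs chimpanzee": (3.088e9, 3.31e9),
    "A. lyrata vs A. thaliana": (207e6, 120e6),
    "tardigrade assemblies": (212e6, 135e6),
}

for name, (m, n) in pairs.items():
    print(f"{name}: {helios_time(m, n):.4f} s")

# The Arabidopsis alignment on a larger, faster modulator
# (4096 x 4096 pixels at 35 GHz).
print(helios_time(207e6, 120e6, 4096, 4096, 35e9))
```

With the default modulator, the Arabidopsis and tardigrade alignments come out at about 0.4627 s and 0.5331 s, in line with the reported figures, and the human-chimpanzee alignment at about 190.4 s, close to the reported 190.5 s; the small gap is presumably due to rounding of the quoted genome sizes.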

B) For the second comparison scenario, the processing time of the HELIOS optical architecture is estimated and compared to that of Nucmer4 [14], Bowtie2 [46], BWA-MEM [45], and BLASR [44], as reported in Table 19. For this comparative study, we adopt PacBio SMRT and Illumina reads from Arabidopsis thaliana Ler-0 [68] for aligning to the Arabidopsis thaliana Col-0 reference genome [62]. Moreover, a subset of Illumina and PacBio reads from the publicly available Ashkenazi dataset, which is available from the Genome in a Bottle project [69], NCBI SRA accession SRX847862, is aligned to the human genome reference GRCh38.p7 [63]. Detailed information about these datasets is shown in Table 17 [14]. Finally, it should be noted that in these time comparisons, the reported runtimes of Nucmer4, Bowtie2, BWA-MEM, and BLASR cover both building the alignment index and the aligning process.

Table 19. Processing time and memory usage to align PacBio and Illumina reads to the Arabidopsis and Human reference genomes by HELIOS optical architecture, compared to BLASR, BWA-MEM, Bowtie2, Nucmer4, Moiré Technique, HAWPOD, and OptCAM.

In this regard, the analytical estimations of the processing times of the HELIOS optical architecture, OptCAM, HAWPOD, and Moiré Technique are reported for a typical graphene-based modulator with an aperture size of 1024 × 1024 pixels and a 100 MHz switching rate. In contrast, the processing times of Nucmer4, Bowtie2, BWA-MEM, and BLASR are measured on a dual-CPU, 32-core AMD Opteron 6276 computer with 256 GB of DDR3 PC3–12800 RAM, using 32 parallel threads, as reported in [14]. These measurements include the times to build the genome index and to align the sequences. Moreover, the processing times of the HELIOS optical architecture are also reported for more recently developed modulators with various aperture sizes, such as 1920 × 1080 and 4096 × 4096 pixels, as well as various switching rates, including 35 GHz [72] and 4.5 THz [67].

https://doi.org/10.1371/journal.pcbi.1010665.t019

To align 481,000 PacBio reads (2748 Mbp) to the Arabidopsis reference genome (120 Mbp), reported in Table 17, the HELIOS optical architecture needs 6.1423 seconds, compared to 95 minutes, 49 minutes, and 24 minutes for BLASR, BWA-MEM, and Nucmer4, respectively, as reported in Table 19. On the other hand, the required times to perform this alignment are 838.5 minutes, 209 minutes, and 49.1 seconds for Moiré Technique, HAWPOD, and OptCAM, respectively. Finally, the HELIOS optical architecture needs only 1 MB of memory, similar to Moiré Technique, HAWPOD, and OptCAM, while BLASR, BWA-MEM, and Nucmer4 require 4065 MB, 2162 MB, and 5743 MB of memory, respectively.

Since the Arabidopsis genome is a small reference genome (120 Mbp), we also consider aligning 3.9M PacBio reads (30.5 Gbp) with the Human reference genome (3.09 Gbp), as shown in Table 17. As represented in Table 19, the HELIOS optical architecture achieves a processing time of 29.25 minutes, compared to 1720 minutes, 1569 minutes, 886 minutes, 239680 minutes, 59919 minutes, and 234.06 minutes for BLASR, BWA-MEM, Nucmer4, Moiré Technique, HAWPOD, and OptCAM, respectively. Moreover, BLASR, BWA-MEM, and Nucmer4 require 77.3 GB, 12.2 GB, and 95.2 GB of memory, respectively, against 1 MB of memory for Moiré Technique, HAWPOD, OptCAM, and the HELIOS optical architecture.

As a similar comparative study, the processing times and memory requirements are calculated for the Illumina reads with the Arabidopsis and Human reference genomes. For aligning 23 million Illumina reads (6919 Mbp) with the Arabidopsis reference genome (120 Mbp), the HELIOS optical architecture requires 15.46 seconds, compared to 30 minutes, 24 minutes, 29 minutes, 2111.5 minutes, 527.87 minutes, and 123.72 seconds for BWA-MEM, Bowtie2, Nucmer4, Moiré Technique, HAWPOD, and OptCAM, respectively. Moreover, BWA-MEM, Bowtie2, and Nucmer4 require 3360 MB, 686 MB, and 1283 MB of memory, respectively, against 1 MB for the mentioned optical approaches. On the other hand, to align 264 million Illumina reads (39.1 Gbp) with the Human reference genome (3.09 Gbp), the HELIOS optical architecture requires 37.50 minutes, compared to 293 minutes, 214 minutes, 182 minutes, 307260 minutes, 76815 minutes, and 300 minutes for BWA-MEM, Bowtie2, Nucmer4, Moiré Technique, HAWPOD, and OptCAM, respectively. Moreover, BWA-MEM, Bowtie2, and Nucmer4 require 15.7 GB, 22.6 GB, and 90.6 GB of memory, respectively, against 1 MB for the mentioned optical approaches, as reported in Table 19.

C) For the third comparison scenario, the processing time of the HELIOS optical architecture is analytically estimated and compared with that of a GPU implementation of Smith-Waterman. Query sequences of various lengths are aligned to the SWISS-PROT database [64], which contains 392,768 sequences and a total of 141,218,456 characters. Table 20 reports the processing time of Smith-Waterman executed on an NVIDIA TESLA K40 GPU with 44.3 giga cell updates per second (GCUPS) and on a GTX 275 with 21 GCUPS [65]. Moreover, this table reports the analytical estimation of the processing time of the HELIOS optical architecture, assuming various switching rates and aperture sizes of the modulators. For instance, the HELIOS optical architecture aligns the query P27895, with a length of 1000 characters, to the SWISS-PROT database in 1.4914 microseconds, utilizing a 1024 × 1024-pixel, 100 MHz modulator, while the same alignment requires 4.54 seconds and 8.6 seconds on the NVIDIA TESLA K40 GPU and the GTX 275, respectively. For this query, the HELIOS optical architecture is therefore faster than the GPU-based implementation of Smith-Waterman by approximately 3.04 million and 5.76 million times on the NVIDIA TESLA K40 GPU and the GTX 275, respectively.
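
The speed-up factors quoted above follow directly from the reported times. A short Python check, using only the figures given in this paragraph (the variable names are illustrative):

```python
# Reported processing times for query P27895 against SWISS-PROT, in seconds.
helios_s = 1.4914e-6  # HELIOS, 1024 x 1024 pixels at 100 MHz
gpu_times_s = {"NVIDIA TESLA K40": 4.54, "GTX 275": 8.6}

# Speed-up is the ratio of the GPU time to the HELIOS time.
speedups = {gpu: t / helios_s for gpu, t in gpu_times_s.items()}

for gpu, factor in speedups.items():
    print(f"{gpu}: about {factor / 1e6:.2f} million times faster")
```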

Table 20. Processing time of the HELIOS optical architecture, compared to Smith-Waterman, assuming the SWISS-PROT database [64].

Query sequences of various lengths, from 144 to 5478 characters, are aligned to the SWISS-PROT database [64], which contains 392,768 sequences and a total of 141,218,456 characters. The processing times of Smith-Waterman are reported for execution on an NVIDIA TESLA K40 GPU with 44.3 giga cell updates per second (GCUPS) and on a GTX 275 with 21 GCUPS [65]. On the other hand, the processing times of the HELIOS optical architecture are reported for a modulator with a 1024 × 1024-pixel aperture and a 100 MHz switching rate [66] as the default, as well as for more recently developed modulators with various aperture sizes, such as 1920 × 1080 and 4096 × 4096 pixels, and various switching rates, including 35 GHz [72] and 4.5 THz [67].

https://doi.org/10.1371/journal.pcbi.1010665.t020

As verified by all the comparative studies represented in Tables 18–20, the HELIOS method outperforms all alternative electrical algorithms in processing time and memory requirement for all of the implementation scenarios, as well as for various lengths of query sequences. This advantage relies on the HELIOS method and its optical architecture, which take advantage of high-speed processing and operational parallelism in optics. Furthermore, the employed compact coding assumption considerably enhances the performance of the HELIOS optical architecture compared to other optical alternatives, specifically in terms of processing time.

A quick review of optical implementation challenges

While this research mainly focuses on the design of the HELIOS method and its architecture, we would like to provide a brief review of the common challenges associated with the successful integration of an optical system into a product. Five challenges are considered: a) alignment tolerance, b) imprecision in optical components, c) speckle noise, d) thermal tolerance, and e) dust, humidity, and contaminants. These challenges mostly belong to mechanical engineering and the physical sciences of light waves and materials. Addressing each challenge in detail requires standalone research that discusses various solutions and the issues they may raise. While we provide a brief review of each challenge, researchers and specialists in those areas can solve them expertly. Numerous research studies have discussed the aforementioned challenges [75–84] by employing a wide range of knowledge, including optics, physics, mechanics, electronics, and software programming.

Alignment tolerances.

In optical systems, although a great deal has been invested in the near-perfect design and fabrication of optical components, imperfect alignment of the components still causes performance issues. By perfect alignment, we mean that the optical elements are placed exactly where the theoretical optical and mechanical design specifies. Any small deviation from the designed shape of optical components makes a big difference in optical systems. As components might be slightly misaligned in an optical system, the quality of light transmission is directly impacted, and consequently, the accuracy of the output decays [85]. Advancements in optical technologies have reduced the alignment tolerances to the nanometer level [75, 76]. Therefore, even a few micrometers of misalignment can result in the loss of the final output. As out-of-tolerance components do not cause complete functional failure, the progressive degradation of optical systems makes failure points more difficult to determine. For instance, deviations in lens position that affect the performance of the optical system include spacing, tilt, and decentering, defined as follows: a) spacing, the distance between optical elements; b) tilt, the angle of the lens's optical axis with respect to the axis of the device system; and c) decentering, the misalignment of the optical axis with respect to the other devices.

As a key feature, the HELIOS method and its optical architecture are designed to be as simple as possible to reduce the probable challenges in the implementation phase. Notably, this simplicity comes with producing precise sequence alignment, compared to other well-known counterparts. As described in the Section “Optical architecture”, the functional part of the optical architecture which implements the HELIOS method (subsection “Optical modulation and mechanism unit”), only consists of eight components, including two lenses, five filters, and a chiral medium which is simple in comparison to its optical counterparts (for example 22 components in HAWPOD). Meanwhile, in the other parts of the architecture, the optical beam provision unit provides a clean optical beam to feed the system, and the output capturing unit captures the results respectively (which existed in all optical systems). Additionally, the HELIOS optical architecture prevents rotating and reflecting the beam to avoid the alignment complexities that they cause (HAWPOD and Moiré Technique do rotation and reflection). It places all components in a straight line and makes the structure simple. Therefore, passing optical beams through a straight path highly increases the alignment tolerance of the system.

Fortunately, photonic device manufacturers have been developing new alignment techniques, with a wide range of mechanical, electrical, and software systems, to control the alignment tolerance between optical components. One resulting approach to automated alignment is a positioning system architecture based on serial kinematics [86]. Serial kinematics uses a single actuator to position an optical component in one direction in space. By moving each axis independently from the others and only when needed, serial kinematics: a) aligns optical components with less motion-induced error, b) eliminates additional joints, c) is capable of individual motions, or step sizes, of less than 10 nanometers, d) provides travel of up to hundreds of millimeters for each component, e) increases design modularity, f) simplifies programming, g) increases the flexibility of the implemented system, and h) makes finding the home reference of each axis very simple.

It should be noted that in the employed filters, the size of each pixel is far larger than 10 nm, which makes the alignment error negligible. For instance, it is 20 × 20 μm for the 4.5 THz modulator [67], 450 × 250 nanometers for the 100 MHz modulator [66], and 4.24 × 4.24 millimeters for the 35 GHz modulator [72]. By employing serial kinematics and benefiting from the simple straight design of the HELIOS optical architecture, the alignment tolerance of the proposed system can be addressed with high precision (less than 10 nanometers), without complicated challenges.

Imprecision in the optical components.

All optical components built with advanced technology require a high degree of precision to meet their expected properties. Sometimes optical components are required to operate reliably under increasingly tough conditions. While near-perfect design and fabrication of optical components have been achieved in recent years, any tiny imprecision in an optical component degrades its properties and functions. Manufacturing high-precision optical designs at micron-level tolerances requires seamless integration between their fundamental materials of glass or plastic (for the optical design) and metal (for the optomechanics). In recent decades, with the increasing availability of low-price, high-power lasers, innovative industries have provided more efficient materials processing, smaller components, increasingly detailed inspection, and greater accuracy [77, 78]. As an example, in our case study, an imaging lens with less than one arcminute of decenter can achieve a resolution of around 128 lp/mm at 20% contrast. However, when two of the elements were relaxed to allow up to six arcminutes of decenter, the imaging lens could only achieve a resolution of around 86 lp/mm at 20% contrast.

Speckle noises.

Speckle is a granular interference that appears in images and diffraction patterns, produced by objects that are rough on the scale of an optical wavelength [87]. It is caused by multiple forward and backward scattering of light waves. It is a ubiquitous phenomenon that inherently exists in laser-based optical systems and directly degrades the quality of the optical system. Speckle noise in the optical systems impairs both the visual quality of the output and the performance of automatic analyses. The presence of speckle noise often obscures subtle but important details, and thus, is detrimental to high-resolution imaging systems. It also affects the performance of automatic analysis methods intended for objective and accurate quantifications. Although the resolution, speed, and depth of optical imaging systems have been greatly enhanced recently [79, 80], their intrinsic problem (i.e. speckle noise) has not been completely solved.

As the first attempt to reduce the speckle noise in the HELIOS optical architecture, the optical beam provision unit employs a wideband unpolarized laser source [37], a laser line bandpass filter [38], and a pinhole, as depicted in Fig 6A. While the wideband laser generates an intense coherent light beam in a wide spectral range, the laser line bandpass filter transmits the laser light while suppressing ambient light as well as lower-intensity secondary laser lines. It improves contrast by only transmitting light within a specific wavelength range (e.g. 450 to 650 nanometers). Afterward, the pinhole, placed at the focal point of the lenses, spatially filters the beams to reduce the high pulse energy density of the beam and to prevent arcing in the air. Removing ambient beams reduces the occurrence of speckle noise in the rest of the architecture. In addition, a spatial filter acting as an optical thresholder is placed at the end of the optical modulation and mechanism unit. It eliminates wavelength cross-talk and speckle noise from the output before capturing, and increases the quality of the results.

As an additional approach to speckle noise, since rough surfaces act differently at different wavelengths, we can use various sets of wavelengths (with slightly different offsets and steps, as explained in Section “Coding procedure”). In this manner, the HELIOS optical architecture first performs the sequence alignment with the example coding scheme shown in Figs 2 and 7. Afterward, a new set of wavelengths and polarizations is chosen (with different offsets and steps), and the alignment process is performed again. By performing the sequence alignment procedure at least three times and then comparing the outputs, the noise that may have occurred due to cross-talk, speckle, etc. can be identified and removed. It is worth noting that executing the sequence alignment three times multiplies the processing time by three and causes a performance overhead. However, this overhead is negligible given the considerably low processing time of the HELIOS optical architecture, represented in Tables 18–20. Additionally, more pixels of the modulators can be adopted to modulate each code, for example, 2 × 2 pixels instead of one pixel. This makes the results more resistant to various kinds of noise, with negligible performance overhead.
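
The comparison of repeated runs can be pictured as a per-position majority vote. The Python sketch below is only an illustration of that idea: the one-character state symbols ('M' for match, 'X' for mutation, '-' for indel) and the rule for unresolved positions are assumptions, not the encoding produced by the optical output.

```python
from collections import Counter


def majority_vote(runs, unresolved="?"):
    """Combine equally long alignment outputs position by position.

    A state is kept when more than half of the runs agree on it;
    otherwise the position is marked as unresolved.
    """
    if len({len(run) for run in runs}) != 1:
        raise ValueError("all runs must have the same length")
    consensus = []
    for states in zip(*runs):
        state, count = Counter(states).most_common(1)[0]
        consensus.append(state if count > len(runs) / 2 else unresolved)
    return "".join(consensus)


runs = [
    "MMXM-MM",  # run with the first set of wavelengths
    "MMXMXMM",  # second run, with one noisy position
    "MMXM-MM",  # third run
]
print(majority_vote(runs))
```

With three runs, a position corrupted in a single run is outvoted by the other two, which is why at least three executions are needed for this kind of comparison.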

Thermal tolerance.

Although an on-desk optical architecture is placed on a desk at a temperature that remains between 20 and 25°C for its entire useful life, any considerable change in temperature causes materials to expand and contract, which results in damage to optical devices [81, 82]. Thermal expansion can: a) change the optical distances, b) shift the lens spacing, and c) change the focal lengths of lenses as the lens glass expands or contracts. So, temperature changes may cause an optic to lose focus. More importantly, as commonly used metal housings and optical glasses are rigid materials, the stresses generated by a small change in component dimensions can be very high. These thermally induced stresses can lead to the failure of the lens or housing, shearing of the bond lines, bonded lenses falling out, and frustration of closely fit lenses. Additionally, when the temperature changes rapidly, some components or certain areas of a component experience temperature variation faster than others. This can lead to thermally induced malformation of components. Even if thermally induced stresses are low enough not to cause damage, they can cause stress-induced birefringence in lenses, tending to blur randomly polarized light.

Fortunately, cracked lenses and sheared bonded interfaces can be avoided entirely by using lens mounts. Basically, lens mounts: a) accommodate the differences in thermal expansion and contraction between lenses and housing, and b) provide enough clearance between components for them to expand or contract as required. It is worth noting that optical structures also include a temperature management system to keep the structure at a constant temperature. Moreover, material selection is another way to avoid problems with thermal expansion. For instance, optical glasses and metals can be chosen to have similar coefficients of thermal expansion.

Each component employed in the HELIOS architecture is mounted in its customized mount by the manufacturer. As depicted in Fig 6, the lenses employed in the HELIOS optical architecture are made of glass and mounted in metal mounts, and they work accurately in the temperature range of -20 to 60 °C. On the other hand, the employed 4.5 THz modulators [67] offer speed stability over a wide range of temperatures, i.e. 25 − 145 °C. So, it can be concluded that the HELIOS optical architecture (with the employed components) works precisely in the temperature range of 25 − 60 °C.
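
The combined working range is simply the intersection of the component ranges. A small Python sketch of that step, using the two ranges quoted above (the function name is illustrative):

```python
def common_range(ranges):
    """Intersect (low, high) temperature ranges given in degrees Celsius."""
    ranges = list(ranges)
    low = max(lo for lo, _ in ranges)
    high = min(hi for _, hi in ranges)
    if low > high:
        raise ValueError("the ranges do not overlap")
    return low, high


components = {
    "glass lenses in metal mounts": (-20, 60),
    "4.5 THz modulator": (25, 145),
}
print(common_range(components.values()))
```

Adding a further component to the dictionary narrows the result only if its range is tighter than the current intersection.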

Dust, humidity, contaminants.

Unlike many electrical and mechanical systems, optical systems are rapidly degraded by even tiny amounts of condensation, dirt, or dust [88]. Dust and dirt reduce the clarity of the light, while condensation can block the light completely. While the outer surfaces of the outermost components are easily cleaned, the surfaces inside an optical system are mostly impossible to clean. Moreover, an optical system can be contaminated from the inside, since some materials, such as adhesives, rubbers, and plastics, can outgas. The outgassed materials can permanently condense on optical surfaces and blur the light.

By sealing the optics, most contaminants like dust and dirt can be managed [83, 84]. Elastomeric seals retain adequate compression under all circumstances and at all positions. Dispensed sealants and adhesives undergo careful process development to ensure every assembly is adequately sealed. Additionally, most optical components are made waterproof, similar to other waterproof devices, and waterproofing the system also makes it air-tight. Moreover, to prevent internal condensation, the optical assembly is built in a dry environment, like a cold space or a de-humidified space, which greatly reduces the likelihood of condensation appearing on the inside optical surfaces. To prevent outgassing materials from condensing on the inside surfaces, the working temperature limits of each component and the side-effects of operating outside those limits should be examined, which is directly related to its thermal tolerance. In applications with high-power lasers, the components reach fairly high service temperatures. In these applications, intolerant materials, including plastics, adhesives, and rubbers, should be eliminated from the design completely.

By assembling the structure in a dry environment, sealing it, and making it waterproof, the HELIOS optical architecture avoids most contaminants and internal condensation. Additionally, the HELIOS optical architecture uses moderate laser power [37], and the employed components are made of glass and metal, which are resistant to the applied laser power. This prevents the structure from outgassing.

Target applications

Sequence alignment algorithms play a significant role in bioinformatics, and so far in this paper they have been discussed as the best candidate application to benefit from the parallel processing capability of the HELIOS method. As discussed above, this method facilitates three major data alignment scenarios: a) pair-wise genome alignment, b) alignment of two large sets of sequences, and c) reads alignment with a reference genome. Besides, the HELIOS method can be adopted to speed up comparative studies, with comparable accuracy, for various types of biological sequences. Some major examples of HELIOS applications are listed as follows:

  • Identification of homologous, orthologous, and paralogous sequences.
  • Comparing a gene and its products.
  • Function prediction and locating key features (e.g. catalytic domains, disulphide bridges).
  • Identifying shared domains, and duplicated regions.
  • Accelerating research studies that identify genetic impairments and their detrimental effects (for example, in genetic diseases).
  • Advancing research projects that identify genetic improvements and their beneficial effects (for instance, in crop improvement).
  • Speeding up clinical tests for detecting genetic diseases (such as sickle-cell disease).

Summarizing the above discussions, the key advantages of the HELIOS method originate from its methodology, such as high accuracy, simplicity, convenient output, and no need for further processing, while its practical advantages arise from the HELIOS optical architecture, such as high-speed processing, parallelism, optimal power consumption, and no need for storage components.

Conclusion and future perspective

In this paper, a novel all-optical high-throughput sequence alignment method is proposed for aligning DNA, RNA, and protein sequences. The proposed method and its optical architecture are named the HELIOS method and the HELIOS optical architecture, respectively. The HELIOS method consists of two procedures that together perform an accurate sequence alignment: a) a coding procedure, and b) an alignment procedure. It determines the exact state of every character within the input sequences, i.e., character match, mutation, or indel. Moreover, the HELIOS optical architecture exploits both high-speed processing and operational parallelism in the optical domain, and avoids various problems of electrical systems. It is built upon two units, a) an optical beam provision unit, and b) an optical modulation and mechanism unit, which adopt the polarization and wavelength of the optical beams.
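To make the three character states concrete, the following is a minimal software sketch of how the columns of an already gapped pairwise alignment can be labeled as match, mutation, or indel. It is an illustration only and does not reproduce the HELIOS coding or alignment procedures; the function name `classify_columns`, the gap symbol `-`, and the example sequences are assumptions introduced here.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol; not taken from the paper


def classify_columns(reference, query):
    """Label every column of a gapped pairwise alignment.

    Both strings must already be aligned (equal length, gaps as '-').
    An indel is reported from the query's point of view: a gap in the
    reference is an insertion, a gap in the query is a deletion.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")

    labels = []
    for ref_char, query_char in zip(reference, query):
        if ref_char == GAP and query_char == GAP:
            labels.append("gap")        # column carries no information
        elif ref_char == GAP:
            labels.append("insertion")  # extra character in the query
        elif query_char == GAP:
            labels.append("deletion")   # character missing from the query
        elif ref_char == query_char:
            labels.append("match")
        else:
            labels.append("mutation")   # substitution
    return labels


# Hypothetical aligned pair, used only to exercise the function.
labels = classify_columns("ACG-TTA", "ACCGT-A")
summary = Counter(labels)
print(labels)
print(dict(summary))
```

The per-column labels are independent of one another, which is the property that a parallel architecture can exploit: each column can in principle be evaluated at the same time.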

For evaluation, the functionality and accuracy of the HELIOS method and its optical architecture are verified through behavioral and optical simulation studies, respectively. Furthermore, the complexity and performance of the HELIOS method and its optical architecture are estimated analytically. The results of the accuracy evaluation indicate that the HELIOS method achieves a homologous alignment between two sequences that is comparable to those of the well-known algorithms. As our simulation results confirm, the alignment outputs of the HELIOS method are highly similar to those of SW, BLAST, MUSCLE, ClustalW, ClustalΩ, Kalign, and MAFFT, while there are some differences from the outputs of NW and T-Coffee. Moreover, according to our performance evaluation, the employed compact coding scheme substantially increases the number of input characters, and hence offers reduced time and space complexities compared to the electrical and optical alternatives. Specifically, relying on its highly sophisticated method and optical architecture, as well as on high-speed processing and operational parallelism in optics, the HELIOS optical architecture outperforms all alternative electrical and optical algorithms in terms of processing time and memory requirement.
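For reference, the quadratic time and space cost of the electrical baselines can be seen in a minimal sketch of the Smith-Waterman (SW) local alignment score. This is the textbook dynamic-programming recurrence with a linear gap penalty, not the HELIOS method; the scoring values (match +2, mismatch -1, gap -1) and the function name are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of sequences a and b.

    Fills a (len(a)+1) x (len(b)+1) score matrix, so both run time
    and memory grow with the product of the two sequence lengths.
    """
    rows, cols = len(a) + 1, len(b) + 1
    scores = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Extend the diagonal with a match or a mismatch.
            diagonal = scores[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            # Open or extend a gap in either sequence.
            up = scores[i - 1][j] + gap
            left = scores[i][j - 1] + gap
            # Local alignment never drops below zero.
            scores[i][j] = max(0, diagonal, up, left)
            best = max(best, scores[i][j])
    return best


print(smith_waterman_score("ACGT", "ACGT"))
```

Each cell depends on its upper, left, and diagonal neighbors, which is the data dependency that limits parallelism in conventional implementations.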

As future work, we plan to enrich the method and the corresponding optical architecture as follows. Regarding the HELIOS method, the detection of further biological variations beyond mutations and indels will be addressed, including duplication, inversion, and translocation. Regarding the optical architecture, the efficiency of the HELIOS method will be improved by increasing the resolution of the coded patterns, employing amplitude and phase modulation of the optical beams. The latter improvement can boost the parallel processing capability of the proposed architecture.

Supporting information

S1 Text. Including Tables A1-A96, assuming dataset 1: “Nine ND5 protein sequences dataset” [53].

https://doi.org/10.1371/journal.pcbi.1010665.s001

(PDF)

S2 Text. Including Tables A1-A96, assuming dataset 2: Nine beta globin protein sequences dataset [53].

https://doi.org/10.1371/journal.pcbi.1010665.s002

(PDF)

S3 Text. Including Tables A1-A96, assuming dataset 3: ND6 (NADH dehydrogenase subunit 6) proteins for eight different species [55].

https://doi.org/10.1371/journal.pcbi.1010665.s003

(PDF)

S4 Text. Including Tables A1-A96, assuming dataset 4: ENCODE Transcription Factor Targets Dataset, TFAP2A Gene Set [56].

https://doi.org/10.1371/journal.pcbi.1010665.s004

(PDF)

S5 Text. Including Tables A1-A96, assuming dataset 5: Hub Proteins Protein-Protein Interactions Dataset, AARR1 Gene Set [57].

https://doi.org/10.1371/journal.pcbi.1010665.s005

(PDF)

S6 Text. Including Tables A1-A96, assuming dataset 6: ChIP-X Enrichment Analysis Resource, CHEA Transcription Factor Targets Dataset, AP1M2 Gene set [58].

https://doi.org/10.1371/journal.pcbi.1010665.s006

(PDF)

S7 Text. Including Tables A1-A96, assuming dataset 7: ChIP-X Enrichment Analysis Resource, CHEA Transcription Factor Targets Dataset, AP1S2 Gene Set [58].

https://doi.org/10.1371/journal.pcbi.1010665.s007

(PDF)

S8 Text. Including Tables A1-A96, assuming dataset 8: Encyclopedia of DNA Elements Resource, ENCODE Transcription Factor Targets Dataset, SP1 Gene Set [59].

https://doi.org/10.1371/journal.pcbi.1010665.s008

(PDF)

S9 Text. Including Tables A1-A96, assuming dataset 9: Hub Proteins Resource, Protein-Protein Interactions Dataset, PTPN6 Gene Set [57].

https://doi.org/10.1371/journal.pcbi.1010665.s009

(PDF)

S10 Text. Including Tables A1-A96, assuming dataset 10: Kinase Enrichment Analysis Resource, KEA Substrates of Kinases Dataset, ULK Gene Set [61].

https://doi.org/10.1371/journal.pcbi.1010665.s010

(PDF)

S11 Text. Including Tables A1-A96, assuming dataset 11: Jaspar PWMs Resource, JASPAR Predicted Transcription Factor Targets Dataset, MAX Gene Set [60].

https://doi.org/10.1371/journal.pcbi.1010665.s011

(PDF)

S12 Text. Including Tables A1-A96, assuming dataset 12: Kinase Enrichment Analysis Resource, KEA Substrates of Kinases Dataset, YES1 Gene Set [61].

https://doi.org/10.1371/journal.pcbi.1010665.s012

(PDF)

References

  1. Lesk A. Introduction to bioinformatics. Oxford university press; 2019.
  2. Haque W, Aravind A, Reddy B. Pairwise sequence alignment algorithms: a survey. In: Proceedings of the 2009 conference on Information Science, Technology and Applications; 2009. p. 96–103.
  3. Kulkarni S, Roy S. Clinical genomics. Academic Press; 2014.
  4. Blake JD, Cohen FE. Pairwise sequence alignment below the twilight zone. Journal of molecular biology. 2001;307(2):721–735. pmid:11254392
  5. Zou H, Tang S, Yu C, Fu H, Li Y, Tang W. asw: accelerating Smith–Waterman algorithm on coupled CPU–GPU architecture. International Journal of Parallel Programming. 2019;47(3):388–402.
  6. Jararweh Y, Al-Ayyoub M, Fakirah M, Alawneh L, Gupta BB. Improving the performance of the needleman-wunsch algorithm using parallelization and vectorization techniques. Multimedia Tools and Applications. 2019;78(4):3961–3977.
  7. Boratyn GM, Thierry-Mieg J, Thierry-Mieg D, Busby B, Madden TL. Magic-BLAST, an accurate RNA-seq aligner for long and short reads. BMC bioinformatics. 2019;20(1):1–19. pmid:31345161
  8. Hung CL, Lin YS, Lin CY, Chung YC, Chung YF. CUDA ClustalW: An efficient parallel algorithm for progressive multiple sequence alignment on Multi-GPUs. Computational biology and chemistry. 2015;58:62–68. pmid:26052076
  9. Sievers F, Higgins DG. Clustal Omega for making accurate alignments of many protein sequences. Protein Science. 2018;27(1):135–145. pmid:28884485
  10. Notredame C, Higgins DG, Heringa J. T-coffee: a novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology. 2000;302(1):205–217. pmid:10964570
  11. Lassmann T. Kalign 3: multiple sequence alignment of large datasets; 2020.
  12. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research. 2004;32(5):1792–1797. pmid:15034147
  13. Rozewicki J, Li S, Amada KM, Standley DM, Katoh K. MAFFT-DASH: integrated protein sequence and structural alignment. Nucleic acids research. 2019;47(W1):W5–W10. pmid:31062021
  14. Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4: A fast and versatile genome alignment system. PLOS Computational Biology. 2018;14(1):1–14. pmid:29373581
  15. Eddy SR. A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation. PLOS Computational Biology. 2008;4(5):1–14. pmid:18516236
  16. Chowdhury B, Garai G. A review on multiple sequence alignment from the perspective of genetic algorithm. Genomics. 2017;109(5):419–431. pmid:28669847
  17. Cathey JJ. Theory and problems of electronic devices and circuits; 2002.
  18. Pal S, Mondal S, Das G, Khatua S, Ghosh Z. Big data in biology: The hope and present-day challenges in it. Gene Reports. 2020; p. 100869.
  19. Díaz D, Esteban FJ, Hernández P, Caballero JA, Dorado G, Gálvez S. Parallelizing and optimizing a bioinformatics pairwise sequence alignment algorithm for many-core architecture. Parallel Computing. 2011;37(4-5):244–259.
  20. Chatterjee K, Joshi S. An Overview on High Performance Issues of Parallel Architectures. Internet Technologies and Application Research. 2013;V.1:11–17.
  21. Baichoo S, Ouzounis CA. Computational complexity of algorithms for sequence comparison, short-read assembly and genome alignment. Biosystems. 2017;156:72–85. pmid:28392341
  22. Zhang Y, Chan JWT, Chin FYL, Ting HF, Ye D, Zhang F, et al. On the Complexity of Constrained Sequences Alignment Problems. In: 8th International Frontiers of Algorithmics Workshop, FAW 2014; 2014. p. 309–319.
  23. Saleh BE, Teich MC. Fundamentals of photonics. John Wiley & Sons; 2019.
  24. Javidi B, Horner JL. Real-time optical information processing. Academic Press; 2012.
  25. Keiser G. Biophotonics. Springer; 2016.
  26. Curilem Saldías M, Villarroel Sassarini F, Muñoz Poblete C, Vargas Vásquez A, Maureira Butler I. Image correlation method for DNA sequence alignment. PloS one. 2012;7(6):e39221. pmid:22761742
  27. Notredame C. Recent Evolutions of Multiple Sequence Alignment Algorithms. PLOS Computational Biology. 2007;3(8):1–4. pmid:17784778
  28. Wen Z, He J, Huang SY. Topology-independent and global protein structure alignment through an FFT-based algorithm. Bioinformatics. 2020;36(2):478–486. pmid:31384919
  29. Ping P, Zhu X, Wang L. Similarities/dissimilarities analysis of protein sequences based on PCA-FFT. Journal of biological systems. 2017;25(01):29–45.
  30. Maleki E, Babashah H, Koohi S, Kavehvash Z. All-optical DNA variant discovery utilizing extended DV-curve-based wavelength modulation. J Opt Soc Am A. 2018;35(11):1929–1940. pmid:30461853
  31. Tanida J, Nitta K, Yahata A. Spatially coded moire matching technique for genome information visualization. In: Optical Information Processing Technology. vol. 4929. International Society for Optics and Photonics; 2002. p. 26–33. https://doi.org/10.1117/12.483210
  32. Niita K, Togo H, Yahata A, Tanida J. Genome information analysis using spatial coded moiré technique. In: Technical Digest. CLEO/Pacific Rim 2001. 4th Pacific Rim Conference on Lasers and Electro-Optics (Cat. No. 01TH8557). vol. 2. IEEE; 2001. p. II–II.
  33. Maleki E, Babashah H, Koohi S, Kavehvash Z. High-speed all-optical DNA local sequence alignment based on a three-dimensional artificial neural network. J Opt Soc Am A. 2017;34(7):1173–1186. pmid:29036127
  34. Maleki E, Koohi S, Kavehvash Z, Mashaghi A. OptCAM: An ultra-fast all-optical architecture for DNA variant discovery. Journal of Biophotonics. 2020;13(1):e201900227. pmid:31397961
  35. Akbari Rokn Abadi S, Hashemi Dijujin N, Koohi S. Optical pattern generator for efficient bio-data encoding in a photonic sequence comparison architecture. PLOS ONE. 2021;16(1):1–27.
  36. Brodzik AK. Phase-only filtering for the masses (of DNA data): A new approach to sequence alignment. IEEE transactions on signal processing. 2006;54(6):2456–2466.
  37. Silfvast WT. Laser fundamentals. Cambridge university press; 2004.
  38. Niraula M, Yoon JW, Magnusson R. Single-layer optical bandpass filter technology. Opt Lett. 2015;40(21):5062–5065. pmid:26512519
  39. Sze JR, Wei AC. Compact beam expander based on planar structure to avoid inner focus. Optical Review. 2016;23(5):842–847.
  40. Abuleil M, Abdulhalim I. Narrowband multispectral liquid crystal tunable filter. Opt Lett. 2016;41(9):1957–1960. pmid:27128048
  41. Ram BSB, Senthilkumaran P, Sharma A. Polarization-based spatial filtering for directional and nondirectional edge enhancement using an S-waveplate. Appl Opt. 2017;56(11):3171–3178.
  42. Hess AJ, Poy G, Tai JSB, Žumer S, Smalyukh II. Control of light by topological solitons in soft chiral birefringent media. Physical Review X. 2020;10(3):031042.
  43. Feng Y, Zhang HF, Wang J, Xu YL, Chen JT, Yang DX, et al. Design of a Nonvacuum-Cooling Compact CCD Camera for Scientific Detection. IEEE Transactions on Nuclear Science. 2019;66(10):2286–2292.
  44. Chaisson MJ, Tesler G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC bioinformatics. 2012;13(1):1–18. pmid:22988817
  45. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754–1760. pmid:19451168
  46. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature methods. 2012;9(4):357–359. pmid:22388286
  47. Darling AC, Mau B, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome research. 2004;14(7):1394–1403. pmid:15231754
  48. Harris RS. Improved pairwise alignment of genomic DNA. The Pennsylvania State University; 2007.
  49. Lai CC, Shih TP, Ko WC, Tang HJ, Hsueh PR. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): The epidemic and the challenges. International journal of antimicrobial agents. 2020;55(3):105924. pmid:32081636
  50. Anzagira L, Fossum ER. Color filter array patterns for small-pixel image sensors with substantial cross talk. J Opt Soc Am A. 2015;32(1):28–34. pmid:26366487
  51. Moss DS, Jelaska S, Pongor S. Essays in bioinformatics. vol. 368. IOS Press; 2005.
  52. Hamada M, Kiryu H, Iwasaki W, Asai K. Generalized Centroid Estimators in Bioinformatics. PloS one. 2011;6:e16450. pmid:21365017
  53. Abo-Elkhier MM, Abd Elwahaab MA, Abo El Maaty MI. Measuring Similarity among Protein Sequences Using a New Descriptor. BioMed research international. 2019;2019. pmid:31886192
  54. Mount DW. Using BLOSUM in Sequence Alignments. Cold Spring Harbor Protocols. 2008;2008(6):pdb.top39. pmid:21356855
  55. Xie Xl, Zheng Lf, Yu Y, Liang Lp, Guo Mc, Song J, et al. Protein sequence analysis based on hydropathy profile of amino acids. Journal of Zhejiang University Science B. 2012;13(2):152–158.
  56. Consortium EP, et al. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS biol. 2011;9(4):e1001046.
  57. Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC bioinformatics. 2013;14(1):1–14. pmid:23586463
  58. Lachmann A, Xu H, Krishnan J, Berger SI, Mazloom AR, Ma’ayan A. ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics. 2010;26(19):2438–2444. pmid:20709693
  59. Consortium EP, et al. The ENCODE (ENCyclopedia of DNA elements) project. Science. 2004;306(5696):636–640.
  60. Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic acids research. 2014;42(D1):D142–D147. pmid:24194598
  61. Lachmann A, Ma’ayan A. KEA: kinase enrichment analysis. Bioinformatics. 2009;25(5):684–686. pmid:19176546
  62. Kaul S, Koo HL, Jenkins J, Rizzo M, Rooney T, Tallon LJ, et al. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408(6814):796–815.
  63. Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen HC, Kitts PA, et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome research. 2017;27(5):849–864. pmid:28396521
  64. Bairoch A, Apweiler R. The SWISS-PROT Protein Sequence Data Bank and Its New Supplement TREMBL. Nucleic Acids Research. 1996;24(1):21–25. pmid:8594581
  65. Yahya M, Hasan L, Ali SA. High-throughput Protein Sequence Alignment on Multi-core Systems. International Journal of Integrated Engineering. 2020;12(7):62–71.
  66. Soref RA, De Leonardis F, Passaro VM. Tunable optical-microwave filters optimized for 100 MHz resolution. Optics express. 2018;26(14):18399–18411. pmid:30114020
  67. Shrekenhamer D, Montoya J, Krishna S, Padilla WJ. Four-color Metamaterial absorber THz spatial light modulator. Advanced Optical Materials. 2013;1(12):905–909.
  68. Lee H, Gurtowski J, Yoo S, Marcus S, McCombie WR, Schatz M. Error correction and assembly complexity of single molecule sequencing reads. BioRxiv. 2014; p. 006395.
  69. Zook JM, Catoe D, McDaniel J, Vang L, Spies N, Sidow A, et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Scientific data. 2016;3(1):1–26. pmid:27271295
  70. Mikkelsen T, Hillier L, Eichler E, Zody M, Jaffe D, Yang SP, et al. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005;437(7055):69–87.
  71. Hu TT, Pattyn P, Bakker EG, Cao J, Cheng JF, Clark RM, et al. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nature genetics. 2011;43(5):476–481. pmid:21478890
  72. Dalir H, Xia Y, Wang Y, Zhang X. Athermal broadband graphene optical modulator with 35 GHz speed. ACS photonics. 2016;3(9):1564–1568.
  73. Boothby TC, Tenlen JR, Smith FW, Wang JR, Patanella KA, Nishimura EO, et al. Evidence for extensive horizontal gene transfer from the draft genome of a tardigrade. Proceedings of the National Academy of Sciences. 2015;112(52):15976–15981. pmid:26598659
  74. Koutsovoulos G, Kumar S, Laetsch DR, Stevens L, Daub J, Conlon C, et al. No evidence for extensive horizontal gene transfer in the genome of the tardigrade Hypsibius dujardini. Proceedings of the National Academy of Sciences. 2016;113(18):5053–5058. pmid:27035985
  75. Boudreau RA, Boudreau SM. Passive micro-optical alignment methods. CRC Press; 2018.
  76. Hinrichs KM, Piotrowski JJ. Neural networks for faster optical alignment. Optical Engineering. 2020;59(7):074107.
  77. Zolfaghari A. Fabrication of Precise Optical Components Using Electroforming Process and Precision Molding. The Ohio State University. 2021.
  78. Mimura H, Takei Y, Kume T, Takeo Y, Motoyama H, Egawa S, et al. Fabrication of a precise ellipsoidal mirror for soft X-ray nanofocusing. Review of Scientific Instruments. 2018;89(9):093104. pmid:30278763
  79. Mohan E, Rajesh A, Sunitha G, Konduru RM, Avanija J, Ganesh Babu L. A deep neural network learning-based speckle noise removal technique for enhancing the quality of synthetic-aperture radar images. Concurrency and Computation: Practice and Experience. 2021;33(13):e6239.
  80. Jeon W, Jeong W, Son K, Yang H. Speckle noise reduction for digital holographic images using multi-scale convolutional neural networks. Optics letters. 2018;43(17):4240–4243. pmid:30160761
  81. Shaaban KS, Alomairy S, Al-Buriahi M. Optical, thermal and radiation shielding properties of B2O3–NaF–PbO–BaO–La2O3 glasses. Journal of Materials Science: Materials in Electronics. 2021;32(21):26034–26048.
  82. Nakamura F, Suda S, Kurosu T, Ibusuki Y, Noriki A, Tamai I, et al. Analyzing Thermal Tolerance of Mirror-based Optical Redistribution for Co-packaged Optics. In: CLEO: Science and Innovations. Optica Publishing Group; 2022. p. SF3O–8.
  83. Watts B, Pilet N, Sarafimov B, Witte K, Raabe J. Controlling optics contamination at the PolLux STXM. Journal of Instrumentation. 2018;13(04):C04001.
  84. Wang Z, Chen C, Liu Q, Lou Y, Suo Z. Extrusion, slide, and rupture of an elastomeric seal. Journal of the Mechanics and Physics of Solids. 2017;99:289–303.
  85. Wang C, Xing S, Xu M, Shi H, Wu X, Fu Q, et al. The Influence of Optical Alignment Error on Compression Coding Superresolution Imaging. Sensors. 2022;22(7):2717. pmid:35408330
  86. Xuan JQ, Xu SH, et al. Review on kinematics calibration technology of serial robots. International journal of precision engineering and manufacturing. 2014;15(8):1759–1774.
  87. Goodman JW. Speckle phenomena in optics: theory and applications. Roberts and Company Publishers; 2007.
  88. Brown A, Bernot D, Ogloza A, Olson K, Thomas J, Talghader J. Physical origin of early failure for contaminated optics. Scientific reports. 2019;9(1):1–9. pmid:30679675