A DNA algorithm for the job shop scheduling problem based on the Adleman-Lipton model

A DNA (deoxyribonucleic acid) algorithm is proposed to solve the job shop scheduling problem (JSSP). An encoding scheme for the problem is developed and DNA computing operations are proposed for the algorithm. After an initial solution space is constructed, all possible solutions are generated. DNA computing operations are then used to find an optimal schedule. The DNA algorithm is proved to have an O(n^2) complexity, and the length of the final strand representing the optimal schedule is within an appropriate range. Experiments on 58 benchmark instances show that the proposed DNA algorithm outperforms other comparative heuristics.


Introduction
It is well known that traditional silicon-based computers use serial algorithms, so their computing speed cannot make a qualitative leap. It is also well known that optimal solutions of most of the celebrated computationally intractable problems can only be found by an exhaustive search through all possible solutions. However, the insurmountable difficulty lies in the fact that such an exhaustive search is too vast to carry out using currently available computing technology, so numerous intractable problems cannot be solved effectively. Some visionary remarks were made about new ways of solving such problems through possible miniaturization. Feynman's view [1], stating the possibility of establishing "submicroscopic" computers, was widely accepted. Since then, although significant progress has been made in computer miniaturization, the goal of sub-microscopic computers has not yet been achieved.
As a new interdisciplinary area, DNA computing has received increasing attention. Massive parallelism and huge storage capacity are two significant advantages of DNA computing. Parallelism means that DNA computing can perform billions of operations simultaneously. Furthermore, DNA computers can solve intractable problems, such as non-deterministic polynomial-time (NP)-hard problems, in linear time, whereas conventional electronic computers require exponential time. In addition, the high density of data stored in DNA strands and the ease of duplicating them can make such exhaustive searches possible. Adleman's experiment [2] solved the Hamiltonian path problem for a given directed graph and demonstrated the strong parallel computing power of DNA computing. Lipton's DNA-based solution of the satisfiability problem [3] further demonstrated this power. Because classical evolutionary algorithms rely on random natural selection and recombination, the optimization results they obtain are still limited [24]. Also, due to the stubborn nature of the JSSP, a single meta-heuristic method can no longer solve this problem well [18]. In addition, these heuristic approaches do not traverse all possible solutions, and can only find relatively good solutions through operations such as crossover, mutation and iteration. Even if a heuristic finds the optimal solution, the heuristic itself cannot prove that the solution it found is the actual optimal solution. By contrast, DNA computing may be used to solve the JSSP: as long as appropriate encoding and manipulation are used, all possible solutions to the problem can be produced in one step. Deaton et al. [25] summarized three basic steps in using DNA computing to solve a problem: encoding, interaction and extraction. The first step is the basis of the other two, so the key and difficult part of DNA computing is to transform the problem into an equivalent DNA computing model by mapping.
To date, there is not much reported research on solving JSSPs using DNA computing. Yin et al. [26] solved a flow shop scheduling problem (FSSP) using DNA computing by transforming the FSSP into a directed graph. Wang et al. [27] proposed a new parallel DNA algorithm to solve the task scheduling problem based on the Adleman-Lipton model, which provides an enlightening idea for this work.
In this work, an appropriate encoding strategy is first developed to generate all possible solutions in parallel using DNA computing. The advantage of this encoding is that, once a scheduling sequence is determined, the makespan corresponding to the schedule is also determined. Theoretically efficient and parallel DNA algorithms based on the Adleman-Lipton model are then proposed, with which the JSSP can be solved with an O(n^2) complexity.
In the experiments, the DNA computing algorithm proposed in this work is simulated using two Python tool libraries. Simulation experiments on 58 benchmark instances show that the proposed DNA algorithm outperforms other comparative heuristics.
The remainder of this paper is organized as follows. Section 2 describes the Adleman-Lipton model and describes the JSSP. Section 3 proposes a DNA algorithm for the JSSP and provides a performance analysis of the proposed DNA algorithm. Section 4 reports the experimental results of the proposed DNA algorithm and the comparison with several heuristic algorithms on 58 benchmark instances. Section 5 concludes this work with a summary and future research directions.

Preliminaries
This section is composed of two parts. The first part explains the Adleman-Lipton model, and the second part gives a formal description of the JSSP.

The Adleman-Lipton model
DNA is a polymer strung together from monomers called deoxyribonucleotides [28]. A single-strand DNA molecule consists of a sequence of nucleotides with four different bases, i.e., adenine, guanine, cytosine and thymine, abbreviated as A, G, C and T, respectively. Every strand, according to its chemical structure, has a 5'-3' direction or a 3'-5' direction. In a double-strand molecule, the two single strands have opposite directions, with the 5'-end of one strand pairing with the 3'-end of the other. Using the Watson-Crick complementarity, i.e., the A-T pairing and the G-C pairing, without other possible pairings, a double-strand DNA molecule can be formed under appropriate conditions. For instance, the single strand 5'-ACGTTA-3' and its complement 3'-TGCAAT-5' can form a double strand, also referred to as a duplex. Assume the upper strand runs from left to right in the 5'-3' direction, so that the lower strand runs from left to right in the 3'-5' direction. The complement 3'-TGCAAT-5' of the strand 5'-ACGTTA-3' is denoted by ACGTTA with an overbar. The length of a single-strand DNA molecule is the number of nucleotides in the molecule. Thus a single strand consisting of 12 nucleotides is said to be a 12 mer, i.e., a polymer consisting of 12 monomers.
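The Watson-Crick pairing rule is easy to check in software. The following minimal sketch (plain Python; Biopython's Seq class offers equivalent functionality, but no library is needed here) computes the complement of a strand:

```python
# Watson-Crick complementarity: A pairs with T, G pairs with C.
_PAIRING = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the Watson-Crick complement of a 5'-3' strand.

    Read left to right, the result is the 3'-5' partner strand of the duplex.
    """
    return strand.translate(_PAIRING)

print(complement("ACGTTA"))  # TGCAAT, i.e., the duplex partner 3'-TGCAAT-5'
```

Applying the function twice returns the original strand, reflecting the symmetry of the pairing.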
The Adleman-Lipton model. A test tube is a set of molecules of DNA, i.e., a multi-set of finite strings over the alphabet {A, G, C, T}. The following operations can be performed:
(1) Merge(N1, N2, ..., Nk): given k test tubes N1, N2, ..., Nk, this operation pours the DNA solution in each of the test tubes N2, ..., Nk into test tube N1. The uniformly mixed solution is referred to as N1.
(2) Amplify(N1, N2): given a test tube N1, this operation produces a test tube N2 containing a copy of every strand in N1, leaving N1 unchanged.
(3) Separation(N1, X, N2): given a test tube N1 and a string X, this operation transfers all the single strands containing the string X in test tube N1 to test tube N2. The single DNA strands removed from test tube N1 are no longer contained in N1. If N1 does not contain X, this operation does nothing.
(4) Selection(N1, L, N2): given a test tube N1 and an integer L, this operation filters all DNA strands of length L in N1 and puts them into test tube N2. Consequently, N1 no longer contains these filtered DNA strands.
(5) Append-head(N, Z): given a test tube N and a short strand Z, this operation appends Z onto the head of every strand in N.
(6) Append-tail(N, Z): given a test tube N and a short strand Z, this operation appends Z onto the tail of every strand in N.
(7) Discard(N): given a test tube N, this operation discards the tube N together with its contents.
(8) Sort(N1, N2, N3): given a test tube N1, this operation sorts the strands in N1 by length, putting the longest strands into test tube N2 and the shortest strands into test tube N3.
(9) Cutting(N, ω1ω2): given a test tube N and a string ω1ω2, this operation cuts every strand containing ω1ω2 in N into two strands at that position, where ω1ω2 corresponds to the recognition site of the cutting operation.
In actual biological experiments, the above operations are feasible and achievable. Take the Sort(N1, N2, N3) operation as an example. In gel electrophoresis, the migration speed of a DNA strand is related to its length: the longer the strand, the slower the migration. Therefore, through gel electrophoresis experiments, the longest and shortest DNA strands in the test tube can be obtained. Since all the operations mentioned above can be performed in the laboratory within a constant number of biological steps, it is reasonable to assume that the complexity of each operation is O(1). In previous works [10-12, 14, 15, 27], many researchers have used this same approach to analyze the complexity of DNA computing algorithms, and the same approach is used in this study.
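For intuition, these tube operations can be mimicked in software with multisets of strings. The sketch below is our own illustration of Merge, Separation, Selection and Sort on Python Counters, not the paper's simulator:

```python
from collections import Counter  # a Counter models a test tube (multiset of strands)

def merge(*tubes):
    """Merge: pour every tube into one uniform solution."""
    out = Counter()
    for tube in tubes:
        out += tube
    return out

def separation(n1, x):
    """Separation: move all strands containing substring x from n1 to a new tube."""
    n2 = Counter({s: c for s, c in n1.items() if x in s})
    for s in n2:
        del n1[s]
    return n2

def selection(n1, length):
    """Selection: move all strands of the given length from n1 to a new tube."""
    n2 = Counter({s: c for s, c in n1.items() if len(s) == length})
    for s in n2:
        del n1[s]
    return n2

def sort(n1):
    """Sort: return (longest strands, shortest strands), as gel electrophoresis would."""
    sizes = [len(s) for s in n1]
    longest = Counter({s: c for s, c in n1.items() if len(s) == max(sizes)})
    shortest = Counter({s: c for s, c in n1.items() if len(s) == min(sizes)})
    return longest, shortest

tube = merge(Counter({"ACGT": 2, "AC": 1}), Counter({"ACGTTA": 1}))
print(separation(tube, "GTT"))  # Counter({'ACGTTA': 1}); tube keeps the rest
```

Each helper runs in one pass over the tube, mirroring the O(1)-biological-step assumption in spirit, though of course a silicon simulation loses the molecular parallelism.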

The job shop scheduling problem
The JSSP is a well-known NP-hard problem [29]. An n × m JSSP can be formally described as follows [30]. There are n jobs and m machines, denoted as J = (J1, J2, ..., Jn) and M = (M1, M2, ..., Mm), respectively. Each job must be processed through all m machines to fulfil its processing tasks. The processing of a job on a machine is called an operation, so each job requires m operations. Each operation requires exactly one machine, and each machine can handle only one operation at a time. Once started on a specified machine, an operation is not allowed to be interrupted until it is completed, i.e., preemption is not allowed, and each operation of a job can begin only when all of the job's previous operations are finished. The processing times and the sequences of operations, i.e., the machining paths, are given in advance. The goal of the JSSP is to find an optimal schedule that minimizes the makespan, i.e., the maximum completion time. A JSSP with n jobs and m machines has (n!)^m possible solutions.
The notations used for the mathematical description of the JSSP are given below.
(1) n and m denote the numbers of jobs and machines, respectively.
(2) O_{i,j} represents operation i of job j.
(3) t_{i,j} represents the processing time of O_{i,j}.
(4) TJ_{i,j} represents the completion time of O_{i,j}, i.e., the cumulative completion time of job j up to operation i.

The mathematical programming model of the JSSP is given in the following.

Min ( Max_{1 ≤ j ≤ n} TJ_{m,j} )     (1)
The objective function minimizes the makespan, i.e., the maximum completion time. Constraint (2) represents precedence relationship between the operations. Constraint (3) means that preemption is not allowed. Constraint (4) gives the domains of the variables.
Example 1. Table 1 shows an n × m = 4 × 2 FSSP example with 4 jobs J1, J2, J3 and J4 and 2 machines M1 and M2. The jobs have the same operation sequence, i.e., the same machining path: they first pass through machine 1 (M1) and then pass through machine 2 (M2). The time needed by each job on each machine is shown in the table.
Example 2. Table 2 shows an n × m = 3 × 3 JSSP example with 3 jobs J1, J2 and J3, each with a different machining path, processed on 3 machines M1, M2 and M3. The machines required are shown in the column Mi and the time needed by each job on each machine is shown in the column t_{i,j}. For instance, the 1st operation of job 1 (J1), i.e., O_{1,1}, is processed on machine 3 (M3) and the corresponding processing time is t_{1,1} = 7 units. The 2nd operation of job 1 (J1), i.e., O_{2,1}, is processed on machine 1 (M1) and the corresponding processing time is t_{2,1} = 4 units, and so on.
From the two examples above, it is intuitive that the JSSP is an extension of the FSSP. The biggest difference between a JSSP and a FSSP lies in the machining paths of the jobs, as shown in Tables 1 and 2. As compared with the FSSP as shown in Table 1, the machining paths of the jobs are different from each other in a JSSP as shown in Table 2. If all the jobs have the same machining path, i.e., each job needs the same operations, without distinction between the operations and the machines, the JSSP becomes a FSSP.
As compared with FSSP, JSSP is much more complicated and closer to the practical problems in production. In a FSSP, only one time matrix is needed. However, in a JSSP, two matrices are required, one represents the processing time and the other represents the machines needed by the jobs. Therefore, this work focuses on the more practical and representative JSSP for an in-depth study.

A DNA algorithm for the job shop scheduling problem
Encoding is the key and difficult part of solving a combinatorial optimization problem with DNA computing. Therefore, this section starts with the encoding scheme, then gives an overview of the proposed algorithm, and finally presents the detailed algorithm.

Encoding
A schedule, or scheduling sequence, of an n × m JSSP can be denoted by OP1-OP2-...-OP_{n×m}, where OPi ∈ [1, n] indicates a job number. In this schedule, the i-th appearance of job j indicates operation i of job j, i.e., O_{i,j}. Each job number appears exactly m times.
Take a scheduling sequence 1-3-2-2-1-3-3-1-2 of a 3 × 3 JSSP as an example. The first number '1' indicates operation 1 of job 1; the second number '3' indicates operation 1 of job 3. The fourth number '2' (the 2nd appearance of job 2) indicates operation 2 of job 2. Similarly, the seventh number '3' (the 3rd appearance of job 3) indicates operation 3 of job 3. Obviously, once a scheduling sequence is determined, the makespan corresponding to this schedule is uniquely determined. For example, referring to the data in Table 2 in Example 2, the completion times of the 3 jobs can be easily calculated: the completion times of jobs 1, 2 and 3 are 13, 18 and 18, respectively. Consequently, the makespan (maximum completion time) for this schedule is 18. Fig 1 shows this schedule as a Gantt chart.
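This decoding rule, together with the completion-time recurrence used later in Algorithm 3 (an operation finishes at max(job-ready time, machine-ready time) plus its processing time), can be sketched in a few lines of Python. The instance below is a made-up 2 × 2 example for illustration, not the data of Table 2:

```python
from collections import defaultdict

def makespan(sequence, machines, times):
    """Makespan of a JSSP scheduling sequence.

    The i-th appearance of job j in `sequence` denotes operation i of job j.
    machines[j][i] and times[j][i] give the machine and processing time of
    operation i of job j.
    """
    seen = defaultdict(int)        # how many times job j has appeared so far
    job_ready = defaultdict(int)   # completion time of job j's latest operation
    mach_ready = defaultdict(int)  # time at which each machine becomes free
    for j in sequence:
        i = seen[j]
        seen[j] += 1
        m, t = machines[j][i], times[j][i]
        finish = max(job_ready[j], mach_ready[m]) + t
        job_ready[j] = finish
        mach_ready[m] = finish
    return max(job_ready.values())

# Hypothetical 2x2 instance: job 0 visits machines 0 then 1,
# job 1 visits machines 1 then 0.
machines = {0: [0, 1], 1: [1, 0]}
times = {0: [3, 2], 1: [2, 4]}
print(makespan([0, 1, 0, 1], machines, times))  # 7
```

Running the function on any valid sequence of the same instance returns its uniquely determined makespan, which is exactly the property the encoding exploits.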
Effective encoding is needed to reasonably map real problems to DNA molecular computing models and to generate all possible solutions in parallel in one step. In the following, p, E_i, q and F_j are used to represent different DNA single strands of the same length, e.g., u mer, with u a positive integer. The strands p and q are used for DNA ligation, and E_i and F_j represent the single strands for operation i and job j, respectively. The single strand pE_iqF_j can then be used to indicate operation i of job j, i.e., O_{i,j}. In this way, all (n!)^m possible solutions can be easily generated.
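As a software illustration of this encoding, one can assign a distinct u-mer to p, q, each E_i and each F_j and concatenate them to form pE_iqF_j. The particular 4-mers below are arbitrary placeholders, not sequences from the paper:

```python
U = 4  # u-mer length of every elementary strand (illustrative choice)

# Arbitrary, pairwise-distinct 4-mers standing in for p, q, E_i and F_j.
P, Q = "ACCA", "TGGT"
E = {1: "AAAA", 2: "CCCC", 3: "GGGG"}   # operation strands E_i
F = {1: "ATAT", 2: "CGCG", 3: "TATA"}   # job strands F_j

def encode_operation(i: int, j: int) -> str:
    """Strand pE_i qF_j representing operation i of job j, i.e., O_{i,j}."""
    return P + E[i] + Q + F[j]

def encode_round(i: int, job_order) -> str:
    """Strand for one operation round with jobs processed in `job_order`."""
    return "".join(encode_operation(i, j) for j in job_order)

s = encode_round(1, [3, 1, 2])
print(len(s))  # 3 operations x 4 u-mers each x U = 48 nucleotides
```

In a wet-lab setting the concatenation would be realized by ligation at the p and q segments rather than by string joining; the sketch only mirrors the bookkeeping.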
Example 3. Table 3 shows an n × m = 5 × 6 JSSP example with 5 jobs, each with a different machining path, processed on 6 machines. Data for each job are shown in two columns in the table: the first column shows the corresponding machine number and the second column shows the time needed by the job on each machine. Fig 2 shows an optimal scheduling sequence of this 5 × 6 JSSP. It means that the jobs are processed in the order of 5-4-2-1-3 for the 1st operation, in the order of 5-2-1-4-3 for the 2nd operation, and so on, where the number '0' in the middle represents a separator. The same job processed by different machines is identified with the same color. A DNA encoding method based on scheduling sequences is proposed below.
In accordance with Algorithm 1, the DNA strands {pE1qF5 pE1qF4 pE1qF2 pE1qF1 pE1qF3} will be generated to denote that the jobs are processed in the order of 5-4-2-1-3 in the 1st operation.

PLOS ONE

In this way, DNA strands can be obtained to denote all n jobs in every operation. By encoding the jobs in this manner, all (n!)^m possible schedules will be obtained. Fig 3 presents, as a Gantt chart, the optimal schedule corresponding to the optimal scheduling sequence given above in Fig 2, where the maximum completion time, i.e., the makespan, of this schedule is 45. The result 45 is calculated using the data in Table 3.
In a JSSP, different jobs may require the same machine in some operations, so some jobs may have to wait for others to finish before being processed, and machines may become idle while waiting for jobs to arrive. Different schedules have different job and machine waiting times and may have different makespans. The advantage of this encoding is that, once a scheduling sequence, i.e., a schedule, is determined, the makespan corresponding to this schedule is also uniquely determined. It should be noted, however, that while the makespan of an optimal schedule is unique, the scheduling sequence achieving it may not be.

An outline of the algorithm
The basic idea of this DNA algorithm for solving the JSSP is to find an optimal solution by checking all possible solution candidates. This brute force approach is realized through DNA computing. Specifically, this proposed algorithm consists of four steps.
(1) Generate the initial solution space in test tube N0 for the JSSP;
(2) Screen the DNA strands representing the feasible schedules and discard the ones representing infeasible schedules;
(3) Append time-information strands at the end of the strands representing feasible schedules and calculate the completion time of each feasible schedule;
(4) Select the strands corresponding to the optimal schedule that minimizes the maximum completion time, i.e., the makespan.
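In a conventional simulation, these four steps collapse into an exhaustive search: enumerate one job permutation per operation round (so (n!)^m candidates in total, matching the encoding above), evaluate each candidate's completion time, and keep the minimum. A toy sketch on a made-up 2 × 2 instance:

```python
from itertools import permutations, product

def makespan(seq, machines, times):
    """Completion time of a sequence (i-th appearance of j denotes O_{i,j})."""
    n = len(machines)
    seen, job_ready, mach_ready = [0] * n, [0] * n, {}
    for j in seq:
        i = seen[j]
        seen[j] += 1
        m, t = machines[j][i], times[j][i]
        finish = max(job_ready[j], mach_ready.get(m, 0)) + t
        job_ready[j] = finish
        mach_ready[m] = finish
    return max(job_ready)

def best_schedule(machines, times):
    """Steps (1)-(4) as a brute-force search over all (n!)^m candidates."""
    n, m = len(machines), len(machines[0])
    best_seq, best_val, checked = None, float("inf"), 0
    for rounds in product(permutations(range(n)), repeat=m):
        checked += 1                              # step (1): one candidate schedule
        seq = [j for rnd in rounds for j in rnd]  # step (2): built this way, it is feasible
        val = makespan(seq, machines, times)      # step (3): compute its completion time
        if val < best_val:                        # step (4): keep the minimum makespan
            best_seq, best_val = seq, val
    return best_seq, best_val, checked

# Made-up 2x2 instance: job 0 visits machines 0 then 1, job 1 visits 1 then 0.
machines = [[0, 1], [1, 0]]
times = [[3, 2], [2, 4]]
seq, val, checked = best_schedule(machines, times)
print(val, checked)  # optimal makespan 7, (2!)^2 = 4 candidates examined
```

The difference, of course, is that the DNA algorithm performs step (1) in one parallel biochemical step, whereas this sequential sketch pays for every candidate explicitly.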

9: Amplify(N1, N0);
10: Discard(N1);
11: end for
End

After executing Algorithm 2, all possible DNA strands representing all possible solutions of the JSSP can be obtained, of the form

{a11 pE1qF_{j1} ... pE1qF_{jk} ... pE1qF_{jn} a12 S a21 pE2qF_{j1} ... pE2qF_{jk} ... pE2qF_{jn} a22 S ...},

where the subscript jk of F_{jk} is uniquely determined by the value of k for 1 ≤ k ≤ n, and F_{jk} ∈ {F1, F2, ..., Fn}, i.e., the sequence j1, ..., jk, ..., jn is an arbitrary permutation of the sequence 1, ..., n.

Computation of the final completion time of each job for every strand.
In Algorithm 3, as explained in Section 2.2, TJ_{i,jk} denotes the completion time of job jk in operation i, TM_{i,jk} is the cumulative time (not including t_{i,jk}) of the required machine corresponding to job jk in operation i, and t_{i,jk} is the corresponding processing time of job jk in operation i. The value of jk is also uniquely determined by the value of k for 1 ≤ k ≤ n, where jk ∈ {1, 2, ..., n}. The final completion times of the n jobs are stored separately in n test tubes. The single strand C, also with a length of u mer, in Algorithm 3 denotes one unit of time.

Algorithm 3. Computation of the final completion time of each job for every strand
4: for k = 1 to n do
5:   if i > 1 then
6:     Separation(N_{jk}, Sω, U_{jk});
7:     Discard(N_{jk});
8:     Cutting(U_{jk}, Sω);
9:     N_{jk} := B(U_{jk}, a11pE1q);
10:    Discard(U_{jk});
11:  else
12:    Continue
13:  if TJ_{i-1,jk} > TM_{i,jk} (when i = 1, both initial values are 0) then
14:    Append-tail(N_{jk}, C...C);

Proof. Summing the complexities of the four algorithms shows that the optimal schedule of a JSSP can be found with an O(n^2) complexity.

Summary. The solution of the JSSP can be represented by a strand with a polynomial length.
Explanation. Suppose each of the different elementary strands has a length of u mer, with i ∈ [1, m] and j ∈ [1, n]. Let l = Σ_i Σ_j t_{i,j}, and also assume m ≤ n. The length of the DNA strand corresponding to the optimal schedule in Algorithm 4 is then bounded by a polynomial in n, m, u and l. The final solution strand in Algorithm 4 is therefore within an appropriate length, and the optimal solution can then be found and determined.

Experiment and comparison
The algorithm proposed in this study is simulated in Python. Two important tool libraries, Biopython and DOcplex, are used to simulate and implement the four algorithms that form the components of the proposed algorithm. Biopython, a Python tool for computational molecular biology, is used to encode problems and construct solution spaces. DOcplex, a Python tool library for solving constraint programming problems, is used to simulate the constraints in Algorithm 3 and the objective function in Algorithm 4. The computer used for the experiments has an Intel Core i5-4210H processor with a 2.90 GHz clock speed and 12 GB of RAM.
The algorithm is first compared with four state-of-the-art heuristics on 43 JSSP benchmark instances (see Table 4). The results show that, except for instance LA29, the proposed algorithm found the best known solutions for the remaining 42 instances, and performs the same as or better than the four comparative heuristics.
The 43 JSSP benchmark instances are selected from the OR-Library [31], which contains 3 instances (FT06, FT10, FT20) designed by Fisher and Thompson [32] and 40 instances (LA01~LA40) designed by Lawrence [33]. The four heuristics used for comparison are MAGATS [21], NIMGA [34], aLSGA [35] and WW [36]. Table 4 shows the results obtained by the proposed algorithm and the four comparative heuristics for the 43 instances. These results include the names of the instances, the sizes of the instances represented by n × m, the best known solutions (BKS) and the best solutions obtained by the proposed algorithm and the four comparative heuristics. Results of the comparative heuristics are from the respective original publications [21, 34-36].
In order to visualize the scheduling results of the proposed algorithm, the Gantt charts of the optimal schedules of the instances FT20, LA20 and LA36 are presented in Figs 5-7, respectively. These charts show that the optimal makespans of instances FT20, LA20 and LA36 are 1165, 902 and 1268 units of time, respectively. Not all four comparative heuristics could find the best known solution for instance FT20, and none of these heuristics could find the best known solutions for instances LA20 and LA36.

Table 5 shows a comparative analysis with four more heuristics on four instances of different sizes. The four heuristics are PSO [37], IGA [38], DE [39] and SSO-DM [18]. The table gives the statistical results, i.e., the best, worst, mean and standard deviation (Std.), of 20 independent runs. Except for instance YN4, the proposed algorithm found the best known solutions for the remaining three instances. The results of the comparative heuristics are from Zhou et al. [18]. Box plots of these five algorithms on the four instances are also presented.

Table 6 shows the experimental results of the proposed algorithm on the instances of Yamada and Nakano [40] (YN1~YN4) and Storer et al. [41] (SWV01~SWV10). In the experiment, the running time limit of the algorithm is set to 2000 seconds (Sec.), and the relative error (RE), i.e., the gap between the obtained and the best known solutions expressed as a percentage of the best known solution, is introduced as a criterion. The third column BKS/UB in the table represents the best known solutions (BKS) or the known upper bounds (UB) when the BKS is unknown. The last column t shows the running time taken by the algorithm in seconds. The results show that the proposed algorithm can find the best known solutions for the three instances SWV01~SWV03 in a short time, but cannot find the best known solutions for the remaining 11 instances within the running time limit of 2000 seconds. The maximum RE value for these instances does not exceed 6%.
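The RE criterion reduces to a one-line formula. The sketch below states it; the numbers in the example are illustrative only, not values from Table 6:

```python
def relative_error(obtained: float, bks: float) -> float:
    """Relative error: gap to the best known solution as a percentage of it."""
    return 100.0 * (obtained - bks) / bks

print(relative_error(1060, 1000))  # 6.0, i.e., 6% above the best known solution
```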

Conclusions
Based on the DNA operations of the Adleman-Lipton model, an appropriate encoding strategy is first developed to generate all possible solutions in parallel using DNA computing in one step. Four highly efficient and parallel DNA algorithms are then proposed as components of the overall algorithm for the JSSP. The proposed algorithm is simulated and compared with several heuristics on 58 JSSP benchmark instances from the literature, and it found the best known solutions for 46 instances. The results show that the proposed algorithm performs better than the comparative heuristics. One direction of future work is to explore the possibility of solving the flexible job shop scheduling problem (FJSP) using viable biological computing models, including the sticker model among others. In addition, for larger-scale benchmarks such as the SWV and YN instances, multi-threaded computing will be considered for the simulation implementation in the future.