One-time-pad cipher algorithm based on confusion mapping and DNA storage technology

To address the low computational security of encoding mappings and the practical difficulty of biological experiments in DNA-based one-time-pad cryptography, we propose a one-time-pad cipher algorithm based on confusion mapping and DNA storage technology. The algorithm combines a chaos map, an encoding mapping, a confusion encoding table and a simulated biological operation process to enlarge the key space. Among them, the encoding mapping and the confusion encoding table make the conversion between data and biological information possible. By selecting security parameters and confounding parameters, the algorithm realizes a more random, dynamic encryption and decryption process than similar algorithms. In addition, the use of DNA storage technologies, including DNA synthesis and high-throughput sequencing, ensures a viable biological encryption process. Theoretical analysis and simulation experiments show that the algorithm provides both mathematical and biological security: it benefits from the difficulty of cracking DNA biological experiments and also provides relatively high computational security.


Introduction
Since Adleman [1] discovered the computational ability of DNA molecules in 1994, scientists have constructed and assembled DNA molecules in different ways according to biological operations, and have implemented logical computational models based on molecular computation using DNA strand displacement, DNA polymerase and nanoparticles [2][3][4]. In addition, DNA molecules have begun to be used as information carriers for DNA storage [5][6][7][8] because of their prominent advantages, such as large storage capacity, high computational parallelism and low energy consumption. The rapid development and maturing of DNA synthesis and sequencing technology have made DNA storage technology realizable. High-throughput DNA sequencing technology, also known as second-generation sequencing technology, can sequence hundreds of thousands or even millions of DNA molecules at a time. It has been widely used in genomics research since its introduction, and is currently an important implementation tool for the construction of DNA storage schemes [9,10]. The application of DNA molecules in information science not only challenges traditional cryptography, which relies on mathematically hard problems for security [11,12], but also provides conditions for the design of more efficient and reliable DNA cryptography when combined with modern cryptography.
DNA cryptography usually takes DNA as the information carrier and biological technology as the implementation tool to realize encryption based on DNA technology [13]. The security of an encryption scheme based on DNA technology depends on biologically hard problems, but such schemes also suffer from problems such as the unpredictability of experimental results caused by non-specific DNA hybridization and the complexity of the manual operation process. Alternatively, based on some biological characteristics of DNA computation, a DNA encryption scheme can be implemented through pseudo-DNA computation operations, such as encoding mapping, base calculation and a confusion encoding table, that simulate DNA biological operations [14]. The chaos map is mostly used in image encryption [15][16][17][18][19][20]. Because of its good pseudo-randomness and unpredictability, and its sensitivity to the initial state and control parameters, the use of a chaos map in a pseudo-DNA encryption scheme can improve computational security [14][21][22][23]. In pseudo-DNA cryptography, the biological nature of DNA makes the algorithm more random, but does not provide biological security because no biological process is involved.
In 1917, Gilbert Vernam first proposed the one-time-pad. The security of the one-time-pad encryption system depends on the randomness of the key. If the attacker cannot obtain the code book used to encrypt messages, the algorithm is completely confidential. However, it is difficult to generate, store and distribute the key of a one-time-pad. Because of the huge storage capacity and high computational parallelism of DNA, DNA molecules began to be combined with the one-time-pad algorithm to solve these problems. The problems we need to consider are therefore how to simplify the complicated and uncontrollable process of biological encryption, and how to better combine pseudo-DNA computing methods with the one-time-pad. In this paper, a one-time-pad cipher algorithm based on confusion mapping and DNA storage technology is proposed to satisfy both mathematical and biological security.

Related work
DNA-based one-time-pad cipher algorithms initially used DNA biotechnology to solve key generation, storage and distribution problems. In 2000, Gehani et al. [13] designed two one-time-pad DNA encryption schemes using DNA sequence substitution mapping and chip-based DNA microarray technology. These schemes ensure data security from both mathematical and biological aspects, and show the ultra-high storage density of DNA, but they are difficult to operate in practice. In 2003, Chen et al. [24] also used the massively parallel processing capability of biomolecular computing and proposed a cipher design based on DNA polymerase chain reaction technology. In this scheme, the one-time-pad code book is assembled in the form of DNA strands, and modulo 2 addition is performed by a primer amplification reaction, which realizes the one-time-pad cipher algorithm. However, this algorithm also has some practical difficulties. In 2014, Wang et al. [25] also adopted the DNA polymerase chain reaction and proposed a one-time-pad encryption algorithm based on DNA cryptography. In the algorithm, the one-time-pad code book is composed of single strands of synthetic DNA and the key distribution problem is solved by polymerase chain reaction. Compared with the aforementioned cryptographic schemes, which depend only on biological experiments, this algorithm proposes a triplet DNA encoding method. The three-base combinations correspond to letters, numbers and symbols, but there are only $C_{64}^{40}$ mapping relationships, so the encoding security is relatively low.
The combination of DNA self-assembly technology and the one-time-pad also provides new ideas for DNA cryptography schemes. One-time-pad encryption algorithms based on DNA tile self-assembly structures are proposed in [26][27][28]. True key randomness is achieved by the natural process of DNA self-assembly, and each part of the operation is calculated in parallel without human intervention. However, these cryptography schemes only provide biological experimental security, and the security of the scheme is greatly reduced if an attacker obtains the operation flow of the algorithm. In 2014, Yang et al. [29] proposed a one-time-pad cipher scheme based on the self-assembling structure of DNA, which uses toehold recognition and strand displacement to carry out the XOR operation and finally obtains the fluorescence intensity spectrum of the ciphertext. The algorithm has the one-time-pad property, but there is no corresponding encryption measure for the code book. If the attacker obtains the code book and the structure construction method, it is easy to crack the ciphertext and recover the plaintext information. In 2018, Peng et al. [30] proposed a one-time-pad scheme that integrates DNA information hiding and a three-dimensional DNA self-assembly structure. The self-assembled structure of the scheme generates four random numbers and transmits them in DNA sequences by information hiding. Compared with the scheme proposed in [29], this DNA self-assembly cryptography scheme further ensures the security of the self-assembly structure. In addition, DNA self-assembly can be combined with other biotechnologies to form new schemes. In 2017, Li et al. [31] proposed a molecular encryption system based on DNAzyme. The ciphertext and key are encoded into DNA sequences and loaded on a fixed structure. During decryption, the correct DNA ciphertext strand and fluorescence signal are obtained by adding trigger and DNAzyme structures. The DNAzyme-based encryption scheme can better ensure the security of information from the perspective of biotechnology, but because of the limitations of the DNAzyme structure, large-scale encryption and decryption cannot be completed.
The one-time-pad algorithm can also use the biological characteristics of DNA to design pseudo-DNA cryptography. In 2014, Wan et al. [23] proposed a one-time-pad encryption algorithm based on hyperchaotic DNA computational optimization, which was applied to image encryption. The scheme is based on Logistic hyperchaotic mapping and on DNA base addition and subtraction. Experimental analysis shows that the correlation coefficient of the resulting ciphertext image is close to 0, and the average execution time of encryption and decryption is short. However, the security of the algorithm mainly depends on the key sequence generated from the parameters. If the parameters are leaked, with no complex encoding mapping or biological security to provide support, the algorithm is easy to crack. In 2018, Peng et al. [14] proposed a one-time-pad algorithm based on multi-base combination mapping coding and DNA computing, which increased the number of DNA encoding rule designs compared with [23]. In this algorithm, the plaintext is segmented during encryption, and a different key is selected for each plaintext segment according to different primer pairs. The security of the algorithm is enhanced by simulating the biological experiment process. All the above one-time-pad pseudo-DNA schemes use DNA biological characteristics and experimental process simulation to increase the randomness of the algorithm. If DNA biotechnology can be combined with them, a more secure and reliable DNA-based one-time-pad cryptography will be formed.

Our contribution
In most DNA-based one-time-pad encryption schemes the encoding is relatively simple, and the security of the conversion of data into DNA information needs to be further enhanced. In addition, the DNA biotechnology involved has some practical operation difficulties or experimental condition limitations. Therefore, feasible biological experiments and high computational security are critical to the design of a DNA-based one-time-pad cipher algorithm. Inspired by the DNA cryptography schemes [9,30,32,33], we propose a one-time-pad cipher algorithm based on confusion mapping and DNA storage technology. The primary contributions of our algorithm are as follows:
1. In the algorithm, the combination of chaos map, encoding mapping, confusion encoding table and the simulation of the biological operation process provides a large enough key space, so that the algorithm can effectively resist exhaustive attack. Among them, the parameter selection of the encoding mapping and the confusion encoding table ensures the security of the conversion between plaintext and DNA, RNA and protein sequences. In addition, simulating the flow of biological information from DNA to proteins makes the algorithm more stochastic.
2. In the encryption and decryption algorithm, security parameters and confounding parameters can be randomly selected in each step. Multiple random steps generate different ciphertexts for the same plaintext, which ensures a dynamic encryption process and provides higher encryption randomness than the DNA one-time-pad encryption algorithms above. Among them, the security parameters include two sets of initial Logistic map parameters for generating the key, and a different key can be generated for every session according to the key generation algorithm.
3. The application of DNA synthesis and high-throughput sequencing technology ensures the biological security of the algorithm through the physical isolation of plaintext information and biological information. In addition, the DNA library construction method of ciphertext segmentation and reassembly makes it impossible for the attacker to obtain DNA sequence information by sequencing without the adapter sequences. Even if all DNA fragments were sequenced with random primers, the DNA ciphertext could not be correctly spliced without the index sequences.

Organization
We introduce the preliminaries of the cipher algorithm in the next section. In section 3, the data and parameter cipher algorithms are formally introduced. In sections 4 and 5, the performance and security of the algorithm are analyzed in detail. Finally, we summarize the proposed scheme.

Encoding mapping
Encoding rules: The binary digits 0, 1 are mapped to the bases A, T/U, C, G. There are $2^a$ combinations of $a$ binary digits and $4^b$ combinations of $b$ bases. If binary combinations and base combinations are in one-to-one correspondence, then $2^a = 4^b$. In the case of $a = 2$, there are $4!$ correspondences between two binary digits and a single base, as shown in Table 1. Each correspondence is numbered, and the encoding mapping parameters are defined as $key_i \in [1, 24]$, $i = 0, 1, 2$.
In this paper, the encoding rules are used to convert binary sequences to DNA or RNA base sequences.
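As a minimal sketch of the encoding rules above, the 24 one-to-one mappings between 2-bit pairs and bases can be enumerated and applied as follows. The numbering of the mappings is defined by Table 1 in the paper; the lexicographic numbering used here, and the names `MAPPINGS`, `binary_to_dna` and `dna_to_binary`, are assumptions made only for illustration.

```python
from itertools import permutations

PAIRS = ["00", "01", "10", "11"]
BASES = "ATCG"

# All 4! = 24 one-to-one mappings from 2-bit pairs to bases.
# ASSUMPTION: key = 1..24 follows itertools' permutation order, not Table 1.
MAPPINGS = [dict(zip(PAIRS, perm)) for perm in permutations(BASES)]


def binary_to_dna(bits: str, key: int) -> str:
    """Convert a bit string of even length to a DNA sequence under mapping `key`."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    table = MAPPINGS[key - 1]
    return "".join(table[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_binary(seq: str, key: int) -> str:
    """Invert binary_to_dna for the same mapping `key`."""
    inverse = {base: pair for pair, base in MAPPINGS[key - 1].items()}
    return "".join(inverse[base] for base in seq)
```

For an RNA sequence, the same mapping can be used with T replaced by U after encoding.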

Confusion encoding table
In constructing the confusion encoding table and in data encryption, the common Arnold scrambling algorithm is used to transform a point $(x, y)$ of an $N \times N$ matrix to $(x', y')$. The scrambling transformation matrix of the Arnold map is given in Eq (6), where $x, y \in \{0, 1, \ldots, N - 1\}$ and the scrambling parameters $m$ and $n$ of the Arnold map are both positive integers. Under the Arnold map transformation, the values of the matrix become uniformly distributed over the scrambled matrix after a certain number of iterations.
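The sketch below illustrates Arnold scrambling using a widely used generalized form, $x' = (x + m y) \bmod N$, $y' = (n x + (mn + 1) y) \bmod N$. The placement of $m$ and $n$ in this form is an assumption and may differ from Eq (6); for $m = n = 1$ the choice makes no difference. The function names are hypothetical.

```python
def arnold_step(x, y, N, m=1, n=1):
    """Map one point (x, y) of an N x N grid (assumed generalized Arnold form)."""
    return (x + m * y) % N, (n * x + (m * n + 1) * y) % N


def arnold_scramble(matrix, iterations, m=1, n=1):
    """Move every entry of a square matrix along the Arnold map `iterations` times."""
    N = len(matrix)
    current = [row[:] for row in matrix]
    for _ in range(iterations):
        nxt = [[None] * N for _ in range(N)]
        for x in range(N):
            for y in range(N):
                xn, yn = arnold_step(x, y, N, m, n)
                nxt[xn][yn] = current[x][y]
        current = nxt
    return current


def arnold_period(N, m=1, n=1):
    """Smallest k > 0 such that k iterations restore the original matrix."""
    identity = [[(x, y) for y in range(N)] for x in range(N)]
    current, k = identity, 0
    while True:
        current = arnold_scramble(current, 1, m, n)
        k += 1
        if current == identity:
            return k
```

Because the map is periodic, a matrix scrambled with `k` iterations can be restored by applying a further `arnold_period(N) - k` iterations.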
Because the number of encoding mapping relationships is limited, the confusion encoding table is used to enhance the security of the algorithm. The values of $m$ and $n$ are both 1, and the confusion encoding table is constructed as follows:
Step1: 16 letters are selected from the 20 amino acid letters, and the remaining 4 letters T, V, W, Y are randomly added to the amino acid sequence as redundant letters.
Step2: The 16 amino acid letters are used as both rows and columns to construct a 16 × 16 matrix, as shown in Table 2.
Step3: The rows and columns of the 16 × 16 matrix are numbered from 0 to 15, and each value of the matrix is represented by a unique coordinate to generate the initial confusion encoding table, as shown in Table 3.
Step4: According to the number of iterations, the Arnold map transformation is performed on the initial confusion encoding table. The Arnold map iteration value is denoted $Pk$; because the Arnold map iteration period of a 16 × 16 matrix is 12, $Pk \in [0, 12]$.
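The construction steps above can be sketched as follows. The letter order `ARNDCQEGHILKMFPS` is an assumption inferred from the worked example later in the paper (for instance, coordinate (3, 7) yields DG), and moving each entry from $(x, y)$ to $(x + y, x + 2y) \bmod 16$ is the $m = n = 1$ Arnold step in an assumed orientation. All function names are hypothetical.

```python
LETTERS = "ARNDCQEGHILKMFPS"   # ASSUMED row/column order of the 16 letters
REDUNDANT = "TVWY"             # the 4 remaining amino acid letters


def initial_table():
    """16 x 16 table whose entry at (row, col) is the letter pair row+col."""
    return [[r + c for c in LETTERS] for r in LETTERS]


def confusion_table(pk):
    """Apply pk Arnold iterations (m = n = 1) to the initial table."""
    N = len(LETTERS)
    table = initial_table()
    for _ in range(pk):
        nxt = [[None] * N for _ in range(N)]
        for x in range(N):
            for y in range(N):
                nxt[(x + y) % N][(x + 2 * y) % N] = table[x][y]
        table = nxt
    return table


def coords_to_amino(coords, table):
    """Look up each (row, col) coordinate and concatenate the letter pairs."""
    return "".join(table[x][y] for x, y in coords)
```

With the initial table (`pk = 0`), the coordinates (3, 7), (4, 2), (2, 12), (12, 3), (3, 1) give the pairs DG, CN, NM, MD, DR.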

DNA structure design for sequencing
The DNA structure for high-throughput sequencing contains reassembly regions and an information region. The reassembly region is denoted as parameter $r$. Each $r$ consists of an adapter and an index, where the adapter region is used to connect the sequencing primer and the index region is used to distinguish different samples after sequencing. The information region is the DNA ciphertext sequence obtained by data encryption. The design of each part of the DNA sequencing structure in this scheme is shown in part (a) of Fig 1. The sender cuts the DNA ciphertext sequence obtained by encrypting the plaintext data into fragments and adds the adapter and index shared with the receiver at both ends of each DNA ciphertext fragment to generate the DNA library. All the DNA strands are synthesized and mixed, and finally the DNA mixtures are publicly transmitted to the receiver, as shown in part (b) of Fig 1.
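The library construction described above can be simulated in software roughly as follows. This is only a sketch: the adapter sequences, the 6-base index that encodes the fragment position, and the placement of a single index after the 5' adapter are hypothetical simplifications, not the sequences or layout used in the paper.

```python
import random

ADAPTER_5 = "ACACGACG"   # hypothetical placeholder adapter
ADAPTER_3 = "AGATCGGA"   # hypothetical placeholder adapter
INDEX_LEN = 6            # hypothetical index length (up to 4**6 fragments)


def index_for(i):
    """Encode fragment position i as a fixed-length base-4 DNA index."""
    bases, out = "ACGT", ""
    for _ in range(INDEX_LEN):
        out = bases[i % 4] + out
        i //= 4
    return out


def build_library(c_dna, fragment_len, rng):
    """Cut the ciphertext, tag every fragment, and mix the strands."""
    frags = [c_dna[i:i + fragment_len] for i in range(0, len(c_dna), fragment_len)]
    strands = [ADAPTER_5 + index_for(i) + f + ADAPTER_3 for i, f in enumerate(frags)]
    rng.shuffle(strands)
    return strands


def reassemble(strands):
    """Receiver side: keep strands with the shared adapters, order by index."""
    records = []
    for s in strands:
        if s.startswith(ADAPTER_5) and s.endswith(ADAPTER_3):
            body = s[len(ADAPTER_5):-len(ADAPTER_3)]
            records.append((body[:INDEX_LEN], body[INDEX_LEN:]))
    records.sort()  # fixed-length A<C<G<T strings sort in numeric order
    return "".join(frag for _, frag in records)
```

A party that does not know the adapters cannot select the strands, and one that does not know the index convention cannot restore their order, which mirrors the role of the reassembly parameter $r$.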

Algorithm initialization
Alice randomly selects the parameters used in the encryption process and generates the confusion encoding table. The procedures are as follows:
Step1: Two sets of initial Logistic map parameters, $(\mu_0, g_0)$ and $(\mu_1, g_1)$, are randomly selected to obtain the key $K$.
Step2: The encoding mapping parameters $key_i$, $i = 0, 1, 2$, are randomly selected.
Step3: The confusion encoding table parameter $Pk$ is randomly selected and the confusion encoding table is generated from the initial confusion encoding table.
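To make Step1 concrete, the following sketch derives a bit sequence from two Logistic maps of the standard form $g_{k+1} = \mu g_k (1 - g_k)$. The way the two chaotic sequences are turned into the key here (a burn-in of 100 iterations, thresholding at 0.5, and XOR of the two bit streams) is purely an assumption for illustration and is not the key generation algorithm of this paper.

```python
def logistic_sequence(mu, g0, length, burn_in=100):
    """Iterate g <- mu * g * (1 - g) and return `length` values after burn-in."""
    g = g0
    for _ in range(burn_in):
        g = mu * g * (1.0 - g)
    values = []
    for _ in range(length):
        g = mu * g * (1.0 - g)
        values.append(g)
    return values


def generate_key(mu0, g0, mu1, g1, n_bits):
    """ASSUMED combination rule: threshold both sequences at 0.5 and XOR them."""
    a = logistic_sequence(mu0, g0, n_bits)
    b = logistic_sequence(mu1, g1, n_bits)
    return [int(x >= 0.5) ^ int(y >= 0.5) for x, y in zip(a, b)]
```

Both parties obtain the same key from the same four parameters, so only $\mu_0, g_0, \mu_1, g_1$ need to be shared rather than the key itself.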

Parameters encryption
Security parameters confusion. Alice confuses the security parameters, namely the encoding mapping parameter $key_0$ and the key generation parameters $\mu_0, g_0, \mu_1, g_1$.
The procedures are as follows:
Step1: The security parameter sequence $key_0, \mu_0, g_0, \mu_1, g_1$ is converted into a binary sequence according to the ASCII table.
Step2: The binary sequence from Step1 is converted to a DNA sequence according to the mapping relationship corresponding to $key_1$ chosen in initialization.
Step3: The DNA sequence generated in Step2 is cut in the middle. The first half is transcribed into an mRNA sequence (T→U), and the mRNA sequence is converted into a tRNA sequence (A↔U, C↔G). The second half is transcribed into an rRNA sequence (T→U).
Step4: The tRNA sequence and the rRNA sequence are converted into binary sequences according to the mapping relationship corresponding to the initialization value $key_2$.
Step5: Every 4 binary digits correspond to 1 decimal number, so the tRNA and rRNA binary sequences respectively correspond to decimal sequences.
Step7: The redundant amino acid letters T, V, W, Y are randomly added to the amino acid sequence $A_0$ to generate the amino acid ciphertext sequence $C_0$.
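Steps 1 to 3 above can be sketched as follows. The mapping 00→A, 01→G, 10→T, 11→C and the comma-separated parameter string are assumptions inferred from the tRNA and rRNA sequences listed in the detailed example of the parameters frequency analysis; the function names are hypothetical.

```python
# ASSUMED 2-bit-to-base mapping, inferred from the paper's worked example.
MAPPING = {"00": "A", "01": "G", "10": "T", "11": "C"}


def text_to_bits(text):
    """Step1: 8-bit ASCII code of every character, concatenated."""
    return "".join(format(ord(ch), "08b") for ch in text)


def bits_to_dna(bits, mapping):
    """Step2: map every 2-bit pair to one base."""
    return "".join(mapping[bits[i:i + 2]] for i in range(0, len(bits), 2))


def transcribe(dna):
    """DNA -> RNA transcription as simulated in the scheme (T -> U)."""
    return dna.replace("T", "U")


def mrna_to_trna(mrna):
    """Complement each base (A <-> U, C <-> G)."""
    return mrna.translate(str.maketrans("AUCG", "UAGC"))


def confuse(params):
    """Step3: cut the DNA in the middle; first half -> tRNA, second half -> rRNA."""
    dna = bits_to_dna(text_to_bits(params), MAPPING)
    half = len(dna) // 2
    return mrna_to_trna(transcribe(dna[:half])), transcribe(dna[half:])
```

Under these assumptions, `confuse("12,3.78,0.51,3.92,0.44")` returns a tRNA sequence beginning UGUC UGUA UAGU and an rRNA sequence beginning ACAG AUCA ACAC, which agrees with the sequences in the detailed example.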
The security parameters confusion process above simulates the flow of genetic information from DNA to protein through RNA, that is, the transcription of DNA into mRNA and the translation of mRNA into a protein with a specific amino acid sequence. The encryption processes are shown in Fig 2.
Confounding parameters encryption. Alice encrypts the confounding parameters $key_1$, $key_2$ and $Pk$, which are used for the security parameters confusion.
The procedures are as follows:
Step8: According to the ASCII code, the confounding parameter sequence $key_1, key_2, Pk$ is transformed into a binary sequence.
Step9: Every 4 binary digits correspond to 1 decimal number. The first and second halves of the decimal sequence generated from the binary sequence are $M_1, M_2, \ldots, M_j$ and $N_1, N_2, \ldots, N_j$.
Step10: The decimal sequences $M_1, M_2, \ldots, M_j$ and $N_1, N_2, \ldots, N_j$ are used as the horizontal and vertical coordinates respectively, generating $j$ coordinates, and the $j$ coordinates $(M_1, N_1), (M_2, N_2), \ldots, (M_j, N_j)$ generate the amino acid sequence $A_1$ according to the confusion encoding table.
Step11: The redundant amino acid letters T, V, W, Y are randomly added to the amino acid sequence $A_1$ to generate the amino acid ciphertext sequence $C_1$.
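A minimal sketch of Steps 8 to 11 follows. It assumes the letter order `ARNDCQEGHILKMFPS`, the initial (unscrambled) confusion encoding table, and a comma-separated parameter string such as `"4,7,1"`; these are inferred from the decimal sequences and letter pairs of the detailed example rather than stated explicitly, and the function names are hypothetical.

```python
import random

LETTERS = "ARNDCQEGHILKMFPS"   # ASSUMED order of the 16 table letters
REDUNDANT = "TVWY"


def text_to_nibbles(text):
    """Steps 8-9: ASCII code of each character split into two 4-bit numbers."""
    out = []
    for ch in text:
        code = ord(ch)
        out += [code >> 4, code & 0x0F]
    return out


def encrypt_confounding(params, rng):
    """Steps 9-11 with the initial table: entry (M, N) is LETTERS[M] + LETTERS[N]."""
    nibbles = text_to_nibbles(params)
    j = len(nibbles) // 2
    M, N = nibbles[:j], nibbles[j:]
    a1 = "".join(LETTERS[m] + LETTERS[n] for m, n in zip(M, N))
    c1 = list(a1)
    # Insert a random number of redundant letters at random positions.
    for _ in range(rng.randint(1, len(a1) // 2)):
        c1.insert(rng.randrange(len(c1) + 1), rng.choice(REDUNDANT))
    return M, N, a1, "".join(c1)
```

For `"4,7,1"` the two decimal sequences are 3 4 2 12 3 and 7 2 12 3 1, and $A_1$ is DG CN NM MD DR, matching the detailed example; the positions of the redundant letters differ from run to run.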
The confounding parameters encryption processes are shown in Fig 3.
Parameters ciphertext generation. Alice transmits the amino acid parameter ciphertext, which includes the amino acid ciphertext $C_0$ generated by the security parameters confusion and the amino acid ciphertext $C_1$ generated by the confounding parameters encryption, to Bob through the public channel.

Data encryption
Alice encrypts the plaintext with the key $K$, the encoding mapping parameter $key_0$, and the adapter and index sequences of the DNA sequencing structure, and the resulting DNA mixtures are transmitted to Bob.
The procedures are as follows:
Step1: According to the ASCII code, the plaintext sequence $P$ is converted into the binary plaintext sequence $P_b$.
Step2: $P_b$ is XORed with $K$ to obtain the binary ciphertext sequence $C_b$.
Step3: $C_b$ is divided into $x$ groups. Each group is $2^{18}$ bits long and is converted into a 512 × 512 binary matrix. These binary matrices are denoted $C_{b1}, C_{b2}, \ldots, C_{bx}$.
Step5: $C_{B1}, C_{B2}, \ldots, C_{Bx}$ are transformed into binary sequences in turn, and the final binary ciphertext sequence $C_B$ is obtained.
Step6: $C_B$ is converted to the DNA ciphertext sequence $C_{DNA}$ according to $key_0$.
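The XOR and grouping steps above can be sketched as follows. Zero-padding the last group up to $2^{18}$ bits is an assumption, since the handling of a partial group is not described, and all function names are hypothetical.

```python
SIDE = 512
BLOCK_BITS = SIDE * SIDE   # 2**18 bits per 512 x 512 matrix


def text_to_bits(text):
    """ASCII text -> list of bits (8 per character)."""
    return [int(b) for ch in text for b in format(ord(ch), "08b")]


def bits_to_text(bits):
    """Inverse of text_to_bits."""
    chars = []
    for i in range(0, len(bits), 8):
        chars.append(chr(int("".join(str(b) for b in bits[i:i + 8]), 2)))
    return "".join(chars)


def xor_bits(bits, key):
    """One-time-pad step: the key must cover the whole message."""
    if len(key) < len(bits):
        raise ValueError("key must be at least as long as the message")
    return [b ^ k for b, k in zip(bits, key)]


def split_into_matrices(bits):
    """Group bits into 512 x 512 matrices (ASSUMED zero padding of the tail)."""
    padded = bits + [0] * (-len(bits) % BLOCK_BITS)
    blocks = [padded[i:i + BLOCK_BITS] for i in range(0, len(padded), BLOCK_BITS)]
    return [[blk[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)] for blk in blocks]
```

Because XOR is its own inverse, applying `xor_bits` to the ciphertext bits with the same key recovers the plaintext bits, which is the operation used on the decryption side.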

Parameters decryption
Confounding parameters decryption. Bob first decrypts the confounding parameters $key_1$, $key_2$ and $Pk$.
The procedures are as follows:
Step1: The redundant amino acid letters T, V, W, Y are removed from the amino acid ciphertext sequence $C_1$ to obtain the amino acid sequence $A_1$.
Step2: The horizontal and vertical coordinate sequences $M_1, M_2, \ldots, M_j$ and $N_1, N_2, \ldots, N_j$ are restored from $A_1$ according to the initial confusion encoding table.
Step4: According to the ASCII codes, the parameters $key_1$, $key_2$ and $Pk$ are restored.
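The decryption steps above can be sketched as the inverse lookup below. As before, the letter order `ARNDCQEGHILKMFPS`, the use of the initial table, and the comma-separated output format are assumptions inferred from the detailed example; the function name is hypothetical.

```python
LETTERS = "ARNDCQEGHILKMFPS"   # ASSUMED order of the 16 table letters
REDUNDANT = set("TVWY")


def decrypt_confounding(c1):
    """Strip redundant letters, recover coordinates, rebuild the ASCII text."""
    a1 = [ch for ch in c1 if ch not in REDUNDANT]
    if len(a1) % 2:
        raise ValueError("amino acid sequence must consist of letter pairs")
    # Each letter pair encodes one coordinate (M_i, N_i) of the initial table.
    M = [LETTERS.index(a1[i]) for i in range(0, len(a1), 2)]
    N = [LETTERS.index(a1[i + 1]) for i in range(0, len(a1), 2)]
    nibbles = M + N   # first half followed by second half of the decimal sequence
    return "".join(chr((nibbles[i] << 4) | nibbles[i + 1])
                   for i in range(0, len(nibbles), 2))
```

Applied to the parameter ciphertext DTGCNNMYMDWDR from the detailed example, this returns the string `4,7,1` under the stated assumptions.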
Step8: The tRNA and rRNA binary sequences are transformed into tRNA and rRNA sequences respectively according to the mapping relationship corresponding to $key_2$.
Step9: The tRNA sequence is converted into the mRNA sequence (A↔U, C↔G); the mRNA sequence and the rRNA sequence are merged and then reverse transcribed into the DNA sequence (U→T).
Step10: The DNA sequence is converted into a binary sequence according to the mapping relationship corresponding to $key_1$.
The security parameters decryption processes are shown in Fig 5.

Data decryption
After sequencing the DNA mixtures, Bob splices the fragments according to the index sequences to obtain the DNA ciphertext sequence $C_{DNA}$, and then decrypts the plaintext data according to the security parameters $key_0, \mu_0, g_0, \mu_1, g_1$.
The procedures are as follows:
Step1: The binary key sequence $K$ is obtained from the two sets of initial parameters $(\mu_0, g_0)$ and $(\mu_1, g_1)$.
Step3: The DNA ciphertext sequence $C_{DNA}$ is correctly spliced using the known index sequences.
Step4: $C_{DNA}$ is converted into the final binary ciphertext sequence $C_B$ according to $key_0$.
Step5: $C_B$ is divided into $x$ groups. Each group is $2^{18}$ bits long and is converted into a 512 × 512 binary matrix. These binary matrices are denoted $C_{B1}, C_{B2}, \ldots, C_{Bx}$.
Step7: $C_{b1}, C_{b2}, \ldots, C_{bx}$ are transformed into binary sequences in turn, and the binary ciphertext sequence $C_b$ is obtained.
Step8: The binary plaintext sequence $P_b$ is obtained by XORing $C_b$ with $K$.
Step9: According to the ASCII code, $P_b$ is converted to the plaintext sequence $P$.
The pseudocode for the data decryption process is shown in Algorithm 2. And the processes of cipher algorithm are shown in

Parameters frequency analysis
Detailed example: The security parameters selected randomly include the encoding mapping parameter $key_0 = 12$ and two different sets of Logistic map initial parameters
Step3: The DNA sequence generated in Step2 is cut in the middle. The first half is transcribed into the mRNA sequence (T→U), and the mRNA sequence is then converted into the tRNA sequence (A↔U, C↔G): UGUC UGUA UAGU UGUG UAGA UGCG UGAU UAGU UGUU UAGA UGCC. The second half is transcribed into the rRNA sequence: ACAG AUCA ACAC AUCU ACUG ACAU AUCA ACAA AUCU ACGA ACGA.
Step5: Every 4 binary digits correspond to 1 decimal number, so the tRNA and rRNA binary sequences respectively correspond to decimal sequences.
Step7: The redundant amino acid letters T, V, W, Y are randomly added to the amino acid sequence to generate the amino acid ciphertext sequence $C_0$: ADFVQNRIHYVMQWTNLNRPQMQGEVADNINRSYQMWQDIVNRTHHMQFAADWGTD.
Step9: Every 4 binary digits correspond to 1 decimal number. The first and second halves of the decimal sequence generated from the binary sequence are 3 4 2 12 3 and 7 2 12 3 1.
Step10: The two decimal sequences are used as abscissa and ordinate respectively, generating 5 coordinates. The coordinates generate the amino acid sequence $A_1$ according to the initial confusion encoding table: DG CN NM MD DR.
Step11: The redundant amino acid letters T, V, W, Y are randomly added to the amino acid sequence $A_1$ to generate the amino acid ciphertext sequence $C_1$: DTGCNNMYMDWDR.
The frequency counts of the parameter plaintext and ciphertext characters are shown in Figs 7 and 8 respectively. After encryption, the character frequencies of the parameter ciphertext are very different from those of the plaintext, indicating no evident correlation between them.
Assuming that key 1 = 5, and other confounding parameters remain unchanged, the frequency of ciphertext characters is shown in Fig 9. Assuming that key 2 = 6, and other confounding parameters remain unchanged, the frequency of ciphertext characters is shown in Fig 10. Assuming that Pk = 2, and other confounding parameters remain unchanged, the frequency of ciphertext characters is shown in Fig 11.

PLOS ONE
When each of the three confounding parameters $key_1$, $key_2$ and $Pk$ is changed slightly, the character frequencies of the parameter ciphertext differ greatly from those obtained before and show no correlation with them, so the scheme can resist statistical analysis.

Data frequency analysis
In data encryption, the binary plaintext is converted to binary ciphertext with the one-time-pad key $K$ and Arnold map encryption. In order to test the statistical relationship between the binary plaintext and ciphertext, gray images were selected as the source data for the encryption simulation experiment. The computer was configured with a 2.50 GHz processor, 8 GB of memory and the Windows 7 operating system, and the simulation software was Matlab R2018b. Two sets of Logistic map initial parameters, $(\mu_0, g_0) = (3.78, 0.51)$ and $(\mu_1, g_1) = (3.92, 0.44)$, were selected, and $K$ was obtained by substituting them into the key generation algorithm. As examples, we select the gray images Lena and Baboon with a size of 160 × 160 pixels. The pixel frequency distribution of the images is analyzed statistically before and after encryption, as shown in Fig 12. Comparing the histograms before and after encryption shows that the features of the original image are effectively concealed after encryption and that there is no statistical similarity between the pixel frequency histograms of the original and encrypted images.
We calculate the variances of the histograms to evaluate the uniformity of the ciphered images. A lower variance indicates a higher uniformity of the encrypted image. We employ the variance formula in reference [34], as shown in Eq (7).
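As an illustration of this metric, the sketch below computes a histogram variance in a form that is commonly used in the image encryption literature, $var(Z) = \frac{1}{n^2}\sum_{i}\sum_{j}\frac{1}{2}(z_i - z_j)^2$, where $z_i$ is the number of pixels with gray value $i$ and $n$ is the number of gray levels. That Eq (7) has exactly this form is an assumption; the function names are hypothetical.

```python
def histogram(pixels, levels=256):
    """Count how many pixels take each gray value 0..levels-1."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def histogram_variance(counts):
    """(1 / n^2) * sum_i sum_j (z_i - z_j)^2 / 2 over all pairs of bins."""
    n = len(counts)
    total = sum((zi - zj) ** 2 / 2 for zi in counts for zj in counts)
    return total / (n * n)
```

This pairwise form equals the ordinary population variance of the bin counts, so a perfectly flat histogram gives 0.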
For a quantitative analysis of each Logistic map initial parameter, we calculated the variances of images encrypted with different parameters from the same plaintext image. Starting from the Logistic map initial parameters used to generate the key $K$ above, only one parameter value is changed at a time and the variance values of the ciphertext images are compared. The closer the two variance values are, the more stable the uniformity of the encrypted image is when the parameters change. The variances of the ciphered images for all parameters are compared in Table 4. The variance values of the ciphertext images are about 5000, and encrypting the same plaintext image with different parameters results in only a small difference in the variance of the encrypted image. Therefore, the data encryption algorithm is effective.
The information entropy reflects the pixel distribution of a gray image. The entropy of the encrypted image is calculated as follows:
$$H(x) = -\sum_{i} p(x_i) \log_2 p(x_i) \qquad (8)$$
where $p(x_i)$ is the probability of gray value $x_i$. The closer the entropy of the encrypted image is to 8, the more uniform the distribution of the gray histogram. Different initial parameters are used to encrypt the images 5 times, and the average entropy of the ciphered images is shown in Table 5. The encrypted image entropy $H(x)$ calculated according to Eq (8) is close to 8. Therefore, the data encryption can effectively resist statistical attacks.
Lena gray images of different sizes are selected to test the encryption time. Different initial values are used to encrypt 5 times, and the average encryption time is compared with that of recent image encryption algorithms [18][19][20], as shown in Table 6. The comparison results indicate that the encryption time of our algorithm is reasonable.
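The entropy measure can be computed with the standard Shannon formula $H(x) = -\sum_i p(x_i)\log_2 p(x_i)$, as in the short sketch below; the function name and the flat list-of-pixels input are illustrative assumptions.

```python
import math


def shannon_entropy(pixels, levels=256):
    """Entropy in bits of the gray-value distribution of an image."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    total = len(pixels)
    # Empty bins contribute nothing (the limit of p * log2(p) as p -> 0 is 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c)
```

An 8-bit image in which all 256 gray values are equally frequent reaches the maximum of 8 bits, while a single-valued image has entropy 0.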

Algorithm time complexity analysis
In order to analyze the time complexity of the algorithm, DNA one-time-pad encryption algorithms similar to ours were selected and their total encryption times compared. The total encryption time consists of the algorithm initialization, parameter encryption, self-assembly structure generation and data encryption times. It is assumed that the key generation time of the algorithm is $T_1$, the encryption time for each set of parameters is $T_2$, the generation time of each self-assembled structure is $T_3$, and the time to generate ciphertext from each bit of plaintext is $T_4$. If the plaintext has $n$ bits, the total encryption times of the algorithms compare as shown in Table 7.
According to the four components of encryption time in Table 7, references [26] and [29] do not include algorithm initialization and parameter encryption time, and our algorithm does not include self-assembly structure generation time.
The algorithm initialization time in reference [30] is the key generation time $T_1$. The initialization time of our algorithm is 3 times that of the algorithm in reference [30], because it includes not only the key generation time but also the selection time of the encoding mapping parameters and the generation time of the confusion encoding table. The parameter encryption time of our algorithm is a fixed value $T_2$, whereas the parameter encryption time in reference [30] depends on the number of plaintext bits $n$ and is $nT_2$. References [26], [29] and [30] use self-assembled structures to construct logical operations. The generation time $T_3$ of each self-assembled structure is uncontrollable and uncertain because of the influence of manual operation and experimental conditions, and the self-assembled structure generation time in these three algorithms is proportional to $n$; our algorithm does not include this process. In addition, the data encryption time $nT_4$ of each algorithm is also proportional to the number of plaintext bits $n$. Although the initialization time of our algorithm is three times the key generation time, it does not include the uncertain self-assembled structure generation time, which is proportional to $n$. Comparing the total encryption times of these DNA one-time-pad encryption algorithms shows that our algorithm requires relatively less encryption time and has a reasonable time complexity.
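Read together, the component times described in the text suggest the following totals. This is only a summary sketch of the prose, taking every proportionality constant as 1; the exact entries of Table 7 may differ.

```latex
\begin{aligned}
T_{\text{ours}}      &= 3T_1 + T_2 + nT_4,\\
T_{[30]}             &= T_1 + nT_2 + nT_3 + nT_4,\\
T_{[26]},\ T_{[29]}  &= nT_3 + nT_4.
\end{aligned}
```

Under this reading, only the $nT_4$ term of our algorithm grows with the plaintext length, whereas the other schemes carry additional terms proportional to $n$.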

Algorithm performance features analysis
Ubaidurrahman et al. [35] proposed six properties of efficient DNA cryptography in 2015, while Peng et al. [30] defined five performance parameters in 2018. In order to compare the performance of algorithms qualitatively, this paper defines the performance features of efficient DNA-based one-time-pad cryptography algorithms, as shown in Table 8. In addition, the performance features of DNA-based one-time-pad algorithms are compared in Table 9.

Performance features | Definition
Complete character set encoding | All plaintext elements, including letters (uppercase and lowercase), numbers and special characters, can be converted into DNA sequence character sets.
Unique sequence for character encoding | The encoding of plaintext into a DNA sequence is unique for every element of the character set in every session.
Dynamic encoding process | Different DNA sequence character sets can be randomly encoded for plaintext elements in each interaction session between sender and receiver.
Dynamic encryption process | The encryption and decryption algorithm should provide a highly random encryption process, and the operations should include several random steps to generate different ciphertexts for the same plaintext.
Biological encryption process | The DNA encryption and decryption algorithm provides the physical isolation security of biological encryption by using DNA biotechnology.
Biological process simulation | DNA encryption and decryption algorithms should be based on the simulation of biological processes or experimental techniques to adapt to the digital computing environment.
https://doi.org/10.1371/journal.pone.0245506.t008

This paper uses the ASCII code combined with the encoding mapping parameters $key_i$, $i = 0, 1, 2$, to convert plaintext elements into binary random numbers and binary random numbers into bases, which ensures the completeness and uniqueness of the encoding. In addition, the randomness of the encoding mapping parameters $key_i$, $i = 0, 1, 2$, and of the confusion encoding table parameter $Pk$ ensures that every session can randomly encode different DNA ciphertexts and amino acid ciphertexts for the plaintext and the parameters, thus realizing the dynamic encoding process. The dynamic encryption process is provided by the random selection of security and confounding parameters. In addition to the encoding randomness provided by the encoding mapping parameters $key_i$, $i = 0, 1, 2$, and the confusion encoding table parameter $Pk$, it also includes the randomness of the Logistic map parameters $\mu_0, g_0, \mu_1, g_1$ and the reassembly parameter $r$. After the DNA ciphertext sequence is cut and reassembled, biological security is achieved through DNA synthesis and sequencing biotechnologies. In this scheme, the parameter encryption and decryption process simulates the flow of genetic information from DNA to protein, which satisfies the biological process simulation feature and can resist mathematical analysis attacks that rely on the mathematical basis and cryptographic characteristics of the algorithm.

Key space analysis
Table 9. The features comparison of DNA-based one-time-pad schemes.
Columns: Literature | Complete character set encoding | Unique sequence for character encoding | Dynamic encoding process | Dynamic encryption process | Biological encryption process
A secure and effective one-time-pad algorithm should have a key space large enough to resist brute-force attacks. Most existing DNA-based one-time-pad algorithms rely on the biological experiment process, but their encoding is too simple: once the experimental process is known, the algorithm is easily cracked. This paper proposes a cryptographic scheme based on security parameters and sequencing technology. The encryption process depends on the selection of the security parameters, and the security parameters are themselves confused. In addition, high-throughput sequencing technology provides biological security. In fact, the algorithm can provide high computational security even if the key is leaked. Assuming that the computational accuracy of the computer is 10^−16, the actual key of the algorithm includes the following parameters and their corresponding key spaces:
1. Encoding mapping parameters key_i (i = 0, 1, 2).
In summary, the total key space of the algorithm is:
Therefore, the key space of the algorithm is large enough to resist exhaustive attack. We compared the key space with those of some DNA image encryption algorithms that rely on mathematical security, as shown in Table 10. Even without considering biological security, our algorithm provides a key space equal to or larger than those of the DNA image encryption algorithms.
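As a rough illustration of how the stated precision of 10^−16 translates into key-space size, the following sketch counts only real-valued parameters such as the four Logistic map parameters μ_0, g_0, μ_1, g_1. It assumes that each parameter ranges over an interval of about unit length, so it is an order-of-magnitude estimate and not the total key space of the algorithm; the function name `key_space_bits` is hypothetical.

```python
import math

PRECISION = 1e-16  # computational accuracy assumed in the text

def key_space_bits(num_real_params: int, precision: float = PRECISION) -> float:
    """Approximate key-space size in bits for independent real-valued parameters.

    Assumes each parameter can take about 1/precision distinguishable values.
    """
    values_per_param = 1.0 / precision
    return num_real_params * math.log2(values_per_param)

# Four Logistic map parameters: about 10^64 combinations, i.e. roughly 212.6 bits.
logistic_bits = key_space_bits(4)
```

Any further independent parameters (encoding mapping, confusion encoding table, reassembly) multiply the number of combinations, which adds to the bit count.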

Sensitivity analysis
In order to evaluate the key sensitivity of the proposed data encryption algorithm, the key sensitivity analysis method of [36] was adopted. One set of Logistic map initial parameters was changed slightly while the other parameters were kept unchanged. The Lena gray image was selected as the source data, and the original image was converted into a ciphertext image with the one-time-pad key K and Arnold map encryption. The original image in (a) of the …
Resists chosen plaintext attack. Besides obtaining some plaintext and the corresponding ciphertext, the attacker has analyzed and obtained more information related to the key. By comparing the DNA ciphertext sequence with the plaintext data, the attacker can recover the operation processes of the data encryption and decryption algorithm. However, the attacker cannot obtain the encoding mapping parameter key_0 or the key without cracking the security parameters, so it remains difficult to decipher the DNA ciphertext sequence in the next decryption.
Resists chosen ciphertext attack. The attacker can select some ciphertext and obtain the corresponding plaintext. It is assumed that the attacker has mastered the data encryption algorithm, the selected DNA ciphertext sequence, the decrypted plaintext, and the key of the selected part. This one-time-pad algorithm uses different Logistic map parameters to generate the key for each encryption process, so even if the key of one session is cracked, it cannot be used for the next decryption. Besides the key, the mapping parameter key_0 is also different in every session and cannot be reused in the next decryption. In addition, the DNA ciphertext sequences are segmented and primers are added at both ends. Without knowing the index order, an attacker cannot splice the information correctly, which is a major obstacle to deciphering the DNA ciphertext.
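The dependence of the per-session key on the Logistic map parameters can be sketched as follows. This is a minimal illustration and not the key generation procedure of this paper: the control parameter `mu` and initial value `x0` stand in for the Logistic map parameters, and the threshold at 0.5, the burn-in length and the size of the perturbation are assumptions.

```python
def logistic_keystream(mu: float, x0: float, n_bits: int, burn_in: int = 1000) -> list:
    """Iterate x <- mu * x * (1 - x) and threshold each state at 0.5 to get one bit."""
    x = x0
    for _ in range(burn_in):  # discard the transient so tiny differences can grow
        x = mu * x * (1.0 - x)
    bits = []
    for _ in range(n_bits):
        x = mu * x * (1.0 - x)
        bits.append(1 if x >= 0.5 else 0)
    return bits

def differing_fraction(a: list, b: list) -> float:
    """Fraction of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Two sessions whose initial values differ only slightly.
k1 = logistic_keystream(3.99, 0.3, 4096)
k2 = logistic_keystream(3.99, 0.3 + 1e-12, 4096)
fraction = differing_fraction(k1, k2)
```

Because the map is chaotic for this choice of `mu`, the two keystreams become uncorrelated after the burn-in, which is why a key recovered in one session gives no help with a session that uses even slightly different parameters.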

Biological security analysis
High-throughput sequencing technology has undergone more than ten years of rapid development. At present, second-generation sequencing equipment has greatly improved in throughput and accuracy. Meanwhile, the sequencing cost has been greatly reduced, making it the mainstream of commercial sequencing. This is a great opportunity for the development of DNA storage and DNA cryptography. Among the existing DNA-based one-time-pad algorithm designs, some encryption algorithms are purely based on biological experiments, and their operation process is complex and uncontrollable, which is not suitable for the computing environment of digital applications. Other encryption algorithms only simulate DNA biological characteristics and do not provide biological security. In this algorithm, the DNA ciphertext sequence obtained by data encryption is segmented, and a random adapter and an index are attached to each segment. After artificial synthesis, the segments are transmitted publicly. During decryption, the DNA ciphertext sequence is obtained by high-throughput sequencing and index splicing. The biological encryption processes for the data are shown in Fig 15. This scheme combines frontier DNA technology with a cryptographic algorithm, which not only guarantees computational security but also provides controllable biological security:
In case ➀, even if the attacker obtains the entire DNA mixture, the sequence information cannot be obtained by sequencing without knowing the adapter sequence. In addition, if the attacker uses random primers for PCR and sequencing regardless of cost, the repeated amplification can contaminate the DNA mixture.
In case ➁, even if the sequencing is successful, the correct order of the DNA sequences cannot be obtained because the index sequence is unknown.
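The role of the index in case ➁ can be illustrated in software. The sketch below is a simplification under stated assumptions: fragments carry a plain integer index instead of a synthesized index sequence, adapters and primers are omitted, and the fragment length `r` only loosely corresponds to the reassembly parameter; the function names are hypothetical.

```python
import random

def segment(dna: str, r: int) -> list:
    """Cut the sequence into fragments of length r and tag each with its position."""
    return [(i // r, dna[i:i + r]) for i in range(0, len(dna), r)]

def reassemble(fragments: list) -> str:
    """Splice the fragments back together in index order."""
    return "".join(frag for _, frag in sorted(fragments))

def splice_without_index(fragments: list) -> str:
    """What a reader gets when the fragments are joined in the order they are read."""
    return "".join(frag for _, frag in fragments)

dna = "ATGCGTACCTGAAGTC"
pool = segment(dna, 4)
random.Random(7).shuffle(pool)  # a mixed pool has no inherent fragment order
recovered = reassemble(pool)    # correct splicing needs the index
```

With the index, `reassemble` restores the original sequence from any ordering of the pool. Without it, a pool of n fragments admits n! candidate orderings.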

Conclusion
The one-time-pad cipher algorithm based on confusion mapping and DNA storage technology is divided into two parts: a data encryption algorithm and a parameter encryption algorithm. Each step of the data encryption process is determined by security parameters. Among them, the key generation parameters realize the one-time-pad encryption characteristic and also facilitate key management, the encoding mapping parameters ensure the randomness of the transformation between binary sequences and DNA sequences, and the reassembly parameter is the premise of DNA sequence segmentation and splicing. In addition, the combination of DNA synthesis and high-throughput sequencing technologies with cryptographic algorithms greatly improves the security of the scheme. In the process of parameter encryption, amino acid ciphertext is generated according to confounding parameters, which simulates the flow of biological genetic information from DNA to protein and increases the randomness of the algorithm. In this paper, the parameter preparation, the plaintext encryption and decryption process, and the ciphertext transmission of the algorithm are described in detail. The performance and security analysis shows that the algorithm provides both computational security and biological security. The security of the cipher algorithm is thereby greatly improved, and it can resist attacks such as chosen-plaintext and chosen-ciphertext attacks.