Multiplexed Sequence Encoding: A Framework for DNA Communication

Synthetic DNA is well suited to storing non-biological information efficiently and stably. With DNA writing and reading technologies rapidly advancing, new applications for synthetic DNA are emerging in data storage and communication. Traditionally, DNA communication has focused on the encoding and transfer of complete sets of information. Here, we explore the use of DNA for the communication of short messages that are fragmented across multiple distinct DNA molecules. We identified three pivotal points in a communication (data encoding, data transfer, and data extraction) and developed novel tools to enable communication via molecules of DNA. To address data encoding, we designed DNA-based individualized keyboards (iKeys) to convert plaintext into DNA, while reducing the occurrence of DNA homopolymers to improve synthesis and sequencing processes. To address data transfer, we implemented a secret-sharing system, Multiplexed Sequence Encoding (MuSE), that conceals messages across multiple distinct DNA molecules, requiring a combination key to reveal messages. To address data extraction, we achieved the first instance of chromatogram patterning through multiplexed sequencing, thereby enabling a new method for data extraction. We envision these approaches will enable more widespread communication of information via DNA.


Sanger Sequencing
Constructs were purified using Qiagen kits and stored in cell culture grade water (Cellgro). Constructs were diluted to 30 ng/μL and sent for sequencing at the indicated ratios. Primer ExternalFw (GACATTAACCTATAAAAATAGGC), Primer ExternalRv (GCATCTTCCAGGAAATCTC), [...]

Fig 1. (a) For Alice to send a message (m) to Bob, she must first write the data into DNA and then physically send the DNA to Bob, who can read the DNA and extract the data. Eve, who is eavesdropping, can physically intercept and read m. Here we have identified three areas to explore within the communication channel between Alice and Bob: data encoding, data transfer, and data extraction. (b) Fragmented DNA communication. Data encoding: m can be mixed with decoy (d) data and fragmented, then written into DNA, where the key (k) is used to encode the data and can itself be written in DNA. Data transfer: DNA-encoded k and fragmented m+d components can be transmitted between Alice and Bob using multiple different channels based on a secret-sharing system. Data extraction: chromatogram patterning can be used by Bob to extract data via multiplexed sequencing reactions.

Next-Generation Sequencing
An outside party (MIT BioMicro Center, Cambridge, MA) performed next-generation sequencing (NGS) and analysis on a mixture of n1+n2+n3+n4+n5+n6. Plasmids were purified using Qiagen kits and stored in cell culture grade water (Cellgro). To confirm purity, plasmids (300 ng) were run on a 1% agarose gel. Plasmids were then mixed at equal concentrations of 30 ng/μL and 900 ng of the mixture was submitted to the MIT BioMicro Center. Blind experimental conditions were used throughout the sequencing and annotation process. Briefly, a Nextera kit (Epicentre) followed by 1.5% agarose BluePippin (Sage Science) isolation of 450-800 bp inserts was used to generate a library. A MiSeq (Illumina) run on a 600 nt v3 kit was used for paired-end sequencing. Sequence assemblies were then performed using various programs, including SOAP Denovo, Trinity, Mira, Velvet, and RAST annotation.

Results and Discussion
To date, several elegant methods have been proposed for encoding digital information in DNA, each taking a unique approach to convert digital data into bases while at the same time reducing the occurrence of homopolymeric stretches [16]. However, in these early days of the field, different encoding methods need to be investigated, and the pros and cons of each approach evaluated, until the field converges on a single standardized, DNA-centric encoding method.
To convert plaintext to bases for DNA encoding, we took inspiration from written text. We combined the familiarity of text-based communication (the QWERTY keyboard) with the genetic code to develop individualized keyboards (iKeys) that serve as a facile method for DNA communication. The natural genetic code employs three-letter DNA words (codons) to represent the 20 common amino acids used to build proteins. The four-letter DNA alphabet of adenine (A), cytosine (C), guanine (G) and thymine (T) thus yields 4^3 = 64 distinct codons. Accordingly, codons are units of nucleotides that encode information that is then translated into function. Here, we abstract the concept of a codon to encode information by mapping the 64 distinct codons onto a modified QWERTY keyboard to produce a personalized code, iKey-64, for translating text into DNA (Fig 2a). This serves as an encoding key (k) for converting a message (m) into a DNA-encodable language (Fig 1b), akin to a substitution cipher. Furthermore, any specific version of iKey-64 can itself be encoded in DNA and provided as an additional component of a communication, serving as a unique dictionary for each message (Fig 2b and 2c).
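The substitution-cipher idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the full iKey-64 layout is given only in Fig 2a, so the hypothetical `PARTIAL_IKEY` dictionary below contains just the four character-to-codon assignments that are quoted elsewhere in this section (V, K, 1 and 0), and the function names are assumptions made for the example.

```python
from itertools import product

BASES = "ACGT"

# All three-letter words over the four-base alphabet: 4**3 = 64 codons.
CODONS = ["".join(p) for p in product(BASES, repeat=3)]

# Hypothetical partial key: only these four assignments are quoted in the
# text; a complete iKey-64 would assign all 64 codons (Fig 2a).
PARTIAL_IKEY = {"V": "GTT", "K": "TTC", "1": "AGC", "0": "CTG"}
DECODE_TABLE = {codon: char for char, codon in PARTIAL_IKEY.items()}


def encode(text, key=PARTIAL_IKEY):
    """Substitute each character with its codon (raises KeyError if unmapped)."""
    return "".join(key[char] for char in text)


def decode(dna, table=DECODE_TABLE):
    """Read the DNA three bases at a time and map each codon back to a character."""
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(len(CODONS))               # 64
print(encode("VK"))              # GTTTTC
print(decode(encode("110011")))  # 110011
```

Because the key is an ordinary one-to-one mapping, a personalized variant is simply a different assignment of the same 64 codons to the keyboard characters.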
It is known that stretches of homopolymers in DNA often lead to sequencing inaccuracies [16]. To mitigate this problem, we rationally designed iKey-64 to reduce the incidence of homopolymers in DNA messages by basing codon assignment on the frequency of use of letters in the English language [17] (Table 2 and Fig 3). Higher frequency characters were assigned to codons containing 3 different nucleotides, lower frequency characters to codons with the same nucleotide in the first and last position, and the least frequent characters to codons with 2 or more homopolymeric stretches. Here we use English as an example, but a similar approach can be used for other languages. Since the codons AAA, CCC, GGG, and TTT are assigned to function keys, which can encode any user-defined function, no homopolymeric stretches longer than 4 bases are possible when encoding regular English text (Fig 2a). For example, the letters VK would be encoded with bases GTTTTC, where the maximum homopolymeric stretch of 4 Ts would be reached. Additionally, since all numerals (0-9) were assigned to codons containing 3 different nucleotides, no homopolymeric stretches longer than 2 bases are possible when encoding numbers, including instances where digital data stored in bits, trits, etc. is converted to bases (Fig 2a). For example, the numerals 110011 would be encoded with bases AGCAGCCTGCTGAGCAGC, where the maximum homopolymeric stretch of 2 Cs would be reached. In the event that multiple consecutive function keys are used, spaces can be used to reduce homopolymeric stretches. In subsequent experiments we investigate new methods for fragmented DNA communication. Therefore, we encode text as an example since the contents of the communication are not our focus, but similar approaches should be applicable to other data formats.
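The homopolymer bounds stated above can be checked by brute force. The sketch below makes two assumptions that are not spelled out in the text: the tier labels ("high", "medium", "low", "function") are hypothetical names, and the least-frequent tier is interpreted as codons with two adjacent identical bases (for example AAT or ATT). It classifies all 64 codons, then measures the longest homopolymer that can arise when any two codons of a tier sit next to each other.

```python
from collections import Counter
from itertools import groupby, product


def codon_tier(codon):
    """Classify a codon by its repeat structure (tier names are hypothetical)."""
    first, middle, last = codon
    if first == middle == last:
        return "function"   # AAA, CCC, GGG, TTT: reserved for function keys
    if len(set(codon)) == 3:
        return "high"       # three different nucleotides
    if first == last:
        return "medium"     # same nucleotide in first and last position
    return "low"            # assumed: two adjacent identical bases (AAB or ABB)


def max_homopolymer(dna):
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(dna)), default=0)


CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
TIER_SIZES = Counter(codon_tier(c) for c in CODONS)

# A run cannot span a whole non-function codon, so checking every adjacent
# pair of codons is enough to bound runs in messages of any length.
regular = [c for c in CODONS if codon_tier(c) != "function"]
high = [c for c in CODONS if codon_tier(c) == "high"]
WORST_TEXT_RUN = max(max_homopolymer(a + b) for a in regular for b in regular)
WORST_NUMERAL_RUN = max(max_homopolymer(a + b) for a in high for b in high)

print(dict(TIER_SIZES))
print(WORST_TEXT_RUN, WORST_NUMERAL_RUN)       # 4 2
print(max_homopolymer("GTTTTC"))               # 4, the VK example
print(max_homopolymer("AGCAGCCTGCTGAGCAGC"))   # 2, the 110011 example
```

Under this interpretation the tiers contain 24, 12, 24 and 4 codons respectively, and the worst cases agree with the limits of 4 bases for regular text and 2 bases for numerals.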
To investigate this DNA platform for information transfer, we sought to disseminate texts across multiple different DNA strands so that the desired message would be revealed only if the correct strand combinations were analyzed. A single communication channel between Alice and Bob can be intercepted by Eve at a single point of contact, thereby compromising the message m (Fig 1a). However, a fragmented communication channel would require multiple points of contact for interception by Eve (Fig 1b). This approach can add an additional layer of protection for a communication and also provide opportunities to explore introducing tiers of complexity within a communication, which are afforded by the unique makeup of DNA as a chemical polymer for information storage. Therefore, we created a fragmented communication platform that we call Multiplexed Sequence Encoding (MuSE), a secret-sharing system [18] that allows for communication of a message m across multiple distinct DNA molecules.
To extract information that is fragmented by MuSE across multiple distinct DNA molecules, one would have to sequence the DNA molecules individually then compare the sequences to look for regions of sequence identity to locate encoded messages. However, the distinct nature of DNA as a data storage medium provided us with an opportunity to explore alternative methods of data extraction. Accordingly, we sought to develop a platform that allows for multiple distinct DNA molecules to be sequenced within a single reaction, whereby the encoded data shared among DNA molecules could be easily located via patterns formed in sequencing chromatograms.
In designing MuSE, we expected that when multiple DNA strands are analyzed together by Sanger sequencing using a common primer, at chromatogram positions where two bases are identical a large homogeneous peak would be observed, and where two bases differ a small heterogeneous peak would be observed, thereby producing a pattern (Fig 4a). Not surprisingly, the naïve sequencing of multiple DNA strands with a common primer is unable to achieve chromatogram patterning, and instead it produces poor readouts (Fig 5). However, the codons in iKey-64 were rationally assigned to characters based on the frequency of use of individual characters, thereby serving to reduce the incidence of homopolymers in DNA messages that reduce the accuracy of sequencing reactions. Therefore, we expected the design of iKey-64 to mitigate the problem of base calls moving out of phase when multiple DNA molecules were sequenced simultaneously with a common primer as observed in [...]

To test whether chromatogram patterning could be achieved with MuSE, we used iKey-64 to encode the message 'Massachusetts Institute Technology' on two DNA strands, where space1 (AGT) was used with the first DNA strand (DNA-1) and space2 (CTA) with the second DNA strand (DNA-2) to demarcate individual words in the sequences (Fig 4b and 4c). In this design, co-sequencing both DNA strands together should introduce troughs around words in the resulting chromatogram, thereby providing a simple method to locate the message from a single sequencing reaction. As expected, individual sequencing of DNA-1 and DNA-2 produced high-quality reads, but gave no indication of the presence or location of a message (Fig 4d). However, in a DNA-1+2 mixture, forward sequencing with a common primer did not reveal a message through chromatogram patterning, but rather camouflaged the message (Fig 4d). This was due to variable DNA sequences placed upstream of the messages, where stretches of C and A homopolymers at the 5' ends interfered with base determination during Sanger sequencing, thus causing intentional misalignment of the recognized bases in the chromatogram (Fig 6a and 6b). Only reverse sequencing of DNA-1+2 with a common primer produced a distinct pattern in the chromatogram, readily identifying the location of the message to be decoded with iKey-64 (Fig 4d). Since there were no interfering stretches of homopolymers in the variable DNA regions, there were no shifts in the base calls during sequencing, thus leading to predictable chromatogram patterning from a multiplexed sequencing reaction (Fig 6c and 6d). Therefore, as a proof-of-concept we demonstrated that information from multiple DNA molecules can be extracted in a single reaction.
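The expected patterning can be illustrated with an idealized position-by-position comparison of two strands. This is a toy model, not a chromatogram simulation: it assumes both strands are the same length and stay perfectly in phase, and the two-word message used here ("VK" and "10", built from the codons quoted earlier) is a hypothetical stand-in for the real message. Only the space codons, space1 (AGT) and space2 (CTA), are taken from the design described above.

```python
def cosequence_pattern(strand_a, strand_b):
    """Idealized co-sequencing readout of two in-phase strands.

    Positions where both strands carry the same base keep that base (a strong
    homogeneous peak); positions where they differ become '.' (a trough).
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must be the same length and in phase")
    return "".join(a if a == b else "." for a, b in zip(strand_a, strand_b))


WORD_1 = "GTTTTC"  # hypothetical word: "VK"
WORD_2 = "AGCCTG"  # hypothetical word: "10"

dna_1 = WORD_1 + "AGT" + WORD_2  # DNA-1 separates words with space1 (AGT)
dna_2 = WORD_1 + "CTA" + WORD_2  # DNA-2 separates words with space2 (CTA)

pattern = cosequence_pattern(dna_1, dna_2)
words = [segment for segment in pattern.split(".") if segment]

print(pattern)  # GTTTTC...AGCCTG
print(words)    # ['GTTTTC', 'AGCCTG']
```

The two space codons differ at all three positions, so each word boundary appears as a three-base trough. The model also hints at why upstream homopolymers camouflage the message: if one strand's base calls shift by even one position, the position-wise agreement that produces the pattern is lost.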
While individual sequencing of each strand followed by sequence alignments can be used to extract information from multiple DNA molecules, chromatogram patterning provides opportunities to explore new methods for data extraction and for incorporating information in DNA mixtures. To illustrate, the degree of contrast achieved in the chromatogram patterns can be tuned in a MuSE communication by adjusting the ratio of DNA-1/DNA-2 (Figs 7 and 8). This serves as a method to embed information in chromatograms discreetly so that alignments of DNA sequencing data to known templates cannot be used to identify embedded information (Fig 9). Such an approach provides new opportunities for exploring ways to store information in DNA, where data extraction is dependent on multiplexed DNA sequencing.
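The effect of the mixing ratio can be sketched with a deliberately simple model in which each strand contributes signal in proportion to its share of the mixture. That proportionality is an assumption made for illustration; real Sanger peak heights depend on many other factors. The function names and the short example strands are hypothetical.

```python
from collections import defaultdict


def mixed_signal(strands, amounts):
    """Per-position share of signal for each base, assuming signal ~ amount."""
    total = sum(amounts)
    signal = []
    for column in zip(*strands):
        shares = defaultdict(float)
        for base, amount in zip(column, amounts):
            shares[base] += amount / total
        signal.append(dict(shares))
    return signal


def primary_call(signal):
    """Base with the largest share at each position (the called sequence)."""
    return "".join(max(shares, key=shares.get) for shares in signal)


def primary_height(signal):
    """Share of signal held by the called base at each position."""
    return [round(max(shares.values()), 2) for shares in signal]


dna_1 = "GTTAGTAGC"  # hypothetical strand using space1 (AGT)
dna_2 = "GTTCTAAGC"  # hypothetical strand using space2 (CTA)

signal = mixed_signal([dna_1, dna_2], amounts=[3, 1])  # 3:1 mixture
print(primary_call(signal))    # GTTAGTAGC
print(primary_height(signal))  # [1.0, 1.0, 1.0, 0.75, 0.75, 0.75, 1.0, 1.0, 1.0]
```

In this model the called sequence at a 3:1 ratio is identical to the more concentrated strand, so an alignment against that template shows no mismatch, while the reduced primary share at the differing positions still marks where the two strands diverge.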
Next we wanted to determine whether we could use the MuSE method to disseminate information encoded with iKey-64 across more than 2 DNA molecules. This would enable us to introduce more complexity into a fragmented communication channel (Fig 1b). To demonstrate, we sought to fragment a communication that contained an intended message and a decoy message across 6 distinct DNA molecules. Such a communication would include three components (Fig 10): (1) secret-sharing system: the intended message and the decoy message along with instructions on how to differentiate between the two would be disseminated across 6 DNA molecules, (2) encoding key: the information would be converted from plaintext into bases using iKey-64, and (3) combination key: a puzzle would enable the end-user to identify the strand combinations that need to be analyzed in order to extract the desired message.
Accordingly, iKey-64 was used to encode watermarks, a combination key, a desired message, and a decoy message within 525 bp regions across six synthetically produced DNA strands, recreating a World War II communication made during the establishment of Bletchley Park [19] (Figs 10 and 11a), a significant point in cryptography history. The functions of the elements are: (i) watermarks: an identification tag for each DNA strand that allows the end-user to categorize each strand according to the combination key, (ii) combination key: a riddle whose solution provides the correct combinations of DNA strands that must be analyzed to unlock the desired message, (iii) message: the desired information to be communicated, and (iv) decoy: a false message to be revealed if improper strand combinations are analyzed, for example as a result of an incorrect solution to the combination key.
A workflow of the process for the WWII communication encoded with iKey-64 is shown in Fig 11a to demonstrate how an end-user would extract information from our DNA communication. The first step would be to pool a partial sample of the available 6 DNA molecules (n1+n2+n3+n4+n5+n6) obtained from the fragmented DNA communication within the secret-sharing system. The next step is to identify the combination key in order to know which strand combinations need to be analyzed to reveal the desired message. Co-sequencing of the pooled DNA molecules with Primer Key, which is common to all 6 DNA molecules, followed by decoding with iKey-64 should reveal the information: "Pascal's triangle: d2r6-reverse" (Fig 10). Here a simple combination key was chosen to demonstrate the concept, and this riddle means that the desired message is revealed from sequencing DNA pairs in the reverse direction as ordered in Pascal's triangle from diagonal 2 down until row 6. Next, the desired message can be extracted by co-sequencing the correct DNA pairs using the sequencing primer Primer Message, which is common to all 6 DNA molecules. Thus, if strand pairs n1+n2, n3+n4, and n5+n6 were to be co-sequenced using Primer Message and decoded with iKey-64, then the embedded message "Bletchley Park: GC&CS Codebreakers" would be revealed. If, for example, one were to misinterpret the key, then a decoy message, "Captain Ridley's Shooting Party", would be revealed as a result of co-sequencing DNA pairs n2+n3, n4+n5, and n6+n1, a circular permutation of the correct combination key.

Fig 9. Discreetly embedded messages cannot be identified by sequence alignments. By varying the ratios of DNA-1 (orange) and DNA-2 (purple), the degree of chromatogram patterning can be tuned (Fig 7). When one partner is present at a lower concentration, chromatogram patterning is still achieved; however, the resulting chromatogram aligns perfectly with the more concentrated partner. Therefore, messages may be discreetly encoded between two DNA strands and revealed in chromatograms, but not identified by sequence alignments. Left: alignment of chromatograms from [...]

In the event that the end-user does not have access to Primer Key and Primer Message (an unauthorized user such as Eve, Fig 1b), then random sequencing primers may be used. For example, the sequencing primers Primer ExternalFw or Primer ExternalRv (Fig 12) may be used instead of Primer Key and Primer Message to extract messages embedded in DNA fragments. As a way to obfuscate random sequencing attempts of pooled DNA samples, we flipped the information-containing regions of our WWII communication between the forward and reverse strands. We hypothesized that this would create a camouflage effect, where co-sequencing reactions would not produce chromatogram patterning and instead produce sequencing reads of poor quality that did not provide reliable sequence information (Fig 12). As intended, co-sequencing with Primer ExternalFw or Primer ExternalRv did not produce chromatogram patterning, regardless of whether message or decoy pairs (Fig 13), or all six strands were co-sequenced (Fig 14).
Fig 12. The 525 bp information-encoding regions of the WWII communication were flipped between the forward and reverse strands to provide a camouflage effect against sequencing with random primers (Primer ExternalFw and Primer ExternalRv). While the external DNA regions surrounding the information-containing regions were identical, strands n1/n3/n5 were placed in the forward direction and strands n2/n4/n6 in the reverse direction, with watermarks used to determine the orientation.

On the other hand, if the appropriate sequencing primers are used as per the data extraction workflow (Fig 11a), then the information from the fragmented DNA communication can be efficiently extracted. To demonstrate, when Primer Key is used to co-sequence a pooled sample of all six DNA molecules from the WWII communication (Fig 10), then the combination key "Pascal's triangle: d2r6-reverse" is revealed via chromatogram patterning while the other data encoding regions (watermark, message, and decoy data) do not lead to chromatogram patterning (Fig 14). Similarly, as expected, chromatogram patterning is not observed when Primer Message is used for co-sequencing all six strands, since the proper strand combinations are not being co-sequenced as per the combination key. However, co-sequencing of DNA pairs with Primer Message as per the order in Pascal's triangle (n1+n2, n3+n4, and n5+n6) reveals the message "Bletchley Park: GC&CS Codebreakers" via chromatogram patterning (Fig 15). Alternatively, the co-sequencing of the incorrect pairs (n2+n3, n4+n5, and n6+n1) reveals the decoy message "Captain Ridley's Shooting Party" (Fig 15). As expected, co-sequencing of other pair combinations did not lead to any patterning (Fig 11b).
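The outcome of every possible pairwise co-sequencing reaction can be tabulated directly from the pairings reported above. The sketch below does not attempt to solve the Pascal's triangle riddle; it takes the message pairs as stated in the text and derives the decoy pairs as the circular shift the text describes. The helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

STRANDS = ["n1", "n2", "n3", "n4", "n5", "n6"]


def consecutive_pairs(order):
    """Group an ordering of strands into unordered consecutive pairs."""
    return {frozenset(order[i:i + 2]) for i in range(0, len(order), 2)}


# Message pairs as reported: n1+n2, n3+n4, n5+n6.
MESSAGE_PAIRS = consecutive_pairs(STRANDS)
# Decoy pairs are the circular shift by one strand: n2+n3, n4+n5, n6+n1.
DECOY_PAIRS = consecutive_pairs(STRANDS[1:] + STRANDS[:1])


def classify_pair(a, b):
    """Expected result of co-sequencing two strands with Primer Message."""
    pair = frozenset((a, b))
    if pair in MESSAGE_PAIRS:
        return "message"
    if pair in DECOY_PAIRS:
        return "decoy"
    return "no pattern"


outcomes = {(a, b): classify_pair(a, b) for a, b in combinations(STRANDS, 2)}
summary = Counter(outcomes.values())

for (a, b), result in outcomes.items():
    print(f"{a}+{b}: {result}")
print(dict(summary))
```

Of the 15 possible pairs, 3 reveal the message, 3 reveal the decoy, and the remaining 9 are expected to show no patterning.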
While co-sequencing of a pooled DNA communication with primers that are not specific to the messages results in poor-quality sequencing reads and a camouflage effect with Sanger sequencing (Fig 13), an unauthorized end-user may use other sequencing platforms such as next-generation sequencing (NGS) to gain access to encoded information. To recreate such a scenario, we tested the difficulty associated with NGS analysis of DNA samples, where the end-user has no prior knowledge of the DNA sequences or what is encoded within them. Accordingly, we prepared a pooled and purified DNA sample containing the DNA molecules n1+n2+n3+n4+n5+n6 from the WWII communication (Fig 10). We then submitted the sample for NGS analysis to an outside party under blind experimental conditions, asking them to provide us with the assembled contents of the sample (Fig 16a and 16b). While sequencing of the mixture produced ~2 million reads (Table 3), the blind assembly of the reads to reconstruct the contents proved difficult and inconclusive. However, after the initial analysis we informed the outside party that there were 6 plasmids in the sample, each containing 525 bp messages as inserts. We further provided the vector sequence and asked for the exact sequences of the messages in the sample. A second round of analysis identified 6 assembled sequences that represented our encoded information (Table 4). Alignment of the 6 identified sequences with n1, n2, n3, n4, n5, and n6 templates provided most of the information in the six DNA molecules, with n1, n2, n3, and n5 providing almost perfect sequence alignments (Fig 16c). Therefore, an end-user should be able to extract data from a fragmented communication using NGS with prior knowledge of the DNA contents and the encoding method.

Conclusions
Rapid advances in DNA synthesis and sequencing technologies are enabling the use of DNA for non-biological applications. One promising application that has emerged is the use of synthetic DNA for data storage, both for communication and long-term data archiving [1]. However, in these early days of the field we advocate exploring different methods of putting information into, transporting information as, and taking information out of DNA molecules, with the long-range goal of attaining a standard that can be accepted by industry and implemented for future applications.
Here we developed a method of encoding information in DNA that reduces the formation of homopolymers by taking into account the frequency of usage of different characters in English text. Our iKey-64 method is designed to convert both plaintext and numerals into a DNA language, while allowing for personalization. Users can shuffle codons assigned to the keyboard or alter the characters within the keyboard to develop a unique layout. With chromatogram patterning and homopolymer reduction, codon shuffling will enable 9.1 x 10^61 iKey-64 variants out of a maximum of 64! = 1.3 x 10^89 iKey-64 variants (Fig 3). Furthermore, we developed a secret-sharing system in MuSE that explores fragmentation of messages across multiple distinct DNA molecules, and enables a new method, chromatogram patterning, to locate messages within DNA molecules. Our encoding, transfer, and data extraction methods are proof-of-concept experiments that are designed to explore new approaches of [...]

Some limitations of our approach are that the iKey-64 method of encoding has not been tested for encoding numerals, and other methods [7][8][9] may prove more efficient. The iKey-64 method does not incorporate data compression and it would be interesting to explore ways to adapt data compression methods for encoding using this method. Also, the iKey-64 character-codon assignment was based on the frequency of usage of characters in the Oxford English Dictionary, with numerals categorized as high frequency characters. While this method was tested for English-based communications, in theory character assignment can be modified for other languages using a similar approach. Additionally, encoding with iKey-64 needs to be combined with other cryptography such as AES, RSA, Twofish and other methods to ensure data security. This will in turn require customized versions of iKey to be designed to allow for the encoding of encrypted data. Moreover, our chromatogram patterning method has been designed for Sanger sequencing. Since we were exploring the fragmented communication of short messages that could be read with a single Sanger read, we chose to use Sanger sequencing for initial proof-of-concept experiments. However, future experiments will need to focus on using NGS methods as they are more efficient, cost effective, and allow for the investigation of more complex fragmented communications. Additionally, the concept of storing information within overlapping sequencing reads in multiplexed sequencing reactions may also be adaptable for nanopore sequencing and NGS methods.

Fig 16. (a) Plasmids containing n1, n2, n3, n4, n5, and n6 sequences (Fig 10) were grown and purified in dH2O, mixed at equal concentrations of 30 ng/μL, and submitted to an outside party (MIT BioMicro Center) for NGS and assembly under blind experimental conditions. (b) 300 ng of plasmids containing n1, n2, n3, n4, n5, and n6 sequences were run on a 1% agarose gel to demonstrate purity. (c) The outside party (MIT BioMicro Center) was provided with the number of plasmids, vector sequences, and the size of messages inserted into the vectors and asked to assemble the messages encoded in the plasmids. They assembled 6 sequences (Table 4) that represent the messages n1, n2, n3, n4, n5, and n6. Here the alignment of the 6 assembled sequences with n1, n2, n3, n4, n5, and n6 templates is shown. Shown below is a legend for the color-coding of the templates. Boxes highlight assembled sequences with near perfect alignment to corresponding templates.
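The upper bound on key variants quoted in the Conclusions, 64!, is the number of ways to assign 64 distinct codons to 64 keyboard positions, and it can be checked with exact integer arithmetic. The smaller figure of 9.1 x 10^61 depends on the tier constraints shown in Fig 3, which are not reproduced in the text, so it is not recomputed in this sketch. The variable names are hypothetical.

```python
import math

# Every one-to-one assignment of 64 codons to 64 keys is a permutation.
MAX_LAYOUTS = math.factorial(64)

# Render the exact integer in scientific notation with one decimal place.
MAX_LAYOUTS_TEXT = f"{float(MAX_LAYOUTS):.1e}"

print(MAX_LAYOUTS_TEXT)       # 1.3e+89
print(len(str(MAX_LAYOUTS)))  # 90 digits
```

The result rounds to 1.3 x 10^89, in agreement with the value stated for unconstrained shuffling.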
Thus far our experiments exclusively utilized DNA maintained in vitro in the form of plasmid DNA. Maintaining a full set of MuSE plasmids inside the same cell is problematic due to likely segregation loss, as the plasmids share common replication origins and resistance markers [20]. However, DNA maintained in vivo can also be used for the communication of digital information, for example for genome watermarking applications [1]. In future experiments, we aim to utilize programmable post-translational protein assembly [21][22][23][24] to develop an addiction module that will allow for the intracellular dual maintenance of two plasmids with a common origin and selection marker, which would enable in vivo MuSE. We intend for these early explorations to stimulate the development of future DNA communication tools that in turn may further broaden adoption of DNA for communication, an increasing possibility with the development of portable sequencing devices [25,26].

Table 4. Assembled sequences.

1
TAATACGACTCACTATAGGGACAGTCTAGTGCAGCAGTCAGTACGAGTCTCATGAGTGTAGGATGCATGAGATCAACGCT
AGCATCGCACTGTCGTCATGCAGCTGACTCCGATCTGACTATCGTCTGAGATCAGAGCGTAACGTAGTCAGTGCTAGCAT
GCGAACTCGATGATCGAGTCGTATCCACTGTTGCCATATATGCAGACGGCATAGTATGCGTGTATGCGTCGAGAGATCAT
CCCTATCTTGACGTTAGTTACAAGATCCCACCAATACTGCCAATAGACGGTCCTCCTTTCCCGTTGCTGTAAAACAGTCA
TGATCGTCATCAGATCATGCCGGCGTGATCTAGATACACGGTGGATTCAGCTACTAGTCGAATCATGACGTGAGAAGCAT
GAACGATATGAAGAAGTTATGTGGATAGCTGTCGACGTGATCGTATCGATGCAGTCCTCAGGTCATATTACTCGACAGTT
GCTAAGTCAGTCATCGTCATACGATGCCGCTGAGCAATAACTAGC

2
TAATACGACTCACTATAGGGACAGTCTAGTGCAGCAGTCAGTACGAGTCTCATGAGTGTAGGATGCATGATCATGATTCT
GATCTAGTCCAGCAGTAGAGTCGTCTCGATCGATCTGTGCATCGTCAGCGATATTCGACGTAGTCGCTCGACCTGACTCG
TGAGTGCAGCTACGTGTCAGTCATCCACTGTTGCCATATATGCAGACGGCATAGTATGCGTGTATGCGTCGAGAGATCAT
CCAGTTCTTGACGTTAGTTACAAGATTGGCCACGATCCATGCTAACGTCTCTTCCACCTTTCCCAAAAAGTAACACACCA
TGACGTATCGACTACGCACATACAGCATATGTGGATGATCACTGACTGACTGAACTACGATCATGGTGTATGTGAGCGTG
TATGTGCTCGTGACTGGAGAAACGGCAACAGTGGATGATTGACGTACGACTGCTAGCTCAGGTCATATTACTCGACAGTT
GCTAAGTCAGTCATCGTCATACGATGCCGCTGAGCAATAACTAGC

3
TAATACGACTCACTATAGGGACAGTCTAGTGCAGCAGTCAGTACGAGTCTCATGAGTGTAGGATGCATGATCATGATTCT
GATCTAGTCCAGCAGTAGAGTCGTCTCGATCGATCTGTGCATCGTCAGCGATATTCGACGTAGTCGCTCGACCTGACTCG
TGAGTGCAGCTACGTGTCAGTCATCCACTGTTGCCATATATGCAGACGGCATAGTATGCGTGTATGCGTCGAGAGATCAT
CCAGTTCTTGACGTTAGTTACAAGATTGGCCACGATCCATGCTAACGTCTCTTCCACCTTTCCCAAAAAGTAACACCGAC
TGATCGCGCATACGGCAACAGTGACTCTCGACTACCATAGTAGTGAGATGGTGGATTACGATCGCGTGATCTGAGTATCA
TTGATCTATAGTGGATTGACTGATGATCGTACTGTCGTACTGACTCTGACGTCGATCTCAGGTCATATTACTCGACAGTT
GCTAAGTCAGTCATCGTCATACGATGCCGCTGAGCAATAACTAGC

4
TAATACGACTCACTATAGGGACAGTCTAGTGCAGCAGTCAGTACGAGTCTCATGAGTGTAGGATGCATGATCATGATTCT
GATCTAGTCCAGCAGTAGAGTCGTCTCGATCGATCTGTGCATCGTCAGCGATATTCGACGTAGTCGCTCGACCTGACTCG
TGAGTGCAGCTACGTGTCAGTCATCCACTGTTGCCATATATGCAGACGGCATAGTATGCGTGTATGCGTCGAGAGATCAT
CCAGTTCTTGACGTTAGTTACAAGATTGGCCACGATCCATGCTAACGTCTCTTCCACCTTTCCCAAAAAGTAACACTGAC
TGCATTCGTGATCATCATGCCGGCGTGATCTAGATACACGGTGGATTCAGCTACTAGTCGAATCATGACGTGAGAAGCAT
GAACGATATGAAGAAGTTATGTGGATAGCTGTCGACGTGATCGTATCGATGCAGTCCTCAGGTCATATTACTCGACAGTT
GCTAAGTCAGTCATCGTCATACGATGCCGCTGAGCAATAACTAGC

5
TAATACGACTCACTATAGGGACAGTCTAGTGCAGCAGTCAGTACGAGTCTCATGAGTGTAGGATGCATGAGATCAACGCT
AGCATCGCACTGTCGTCATGCAGCTGACTCCGATCTGACTATCGTCTGAGATCAGAGCGTAACGTAGTCAGTGCTAGCAT
GCGAACTCGATGATCGAGTCGTATCCACTGTTGCCATATATGCAGACGGCATAGTATGCGTGTATGCGTCGAGAGATCAT
CCCTATCTTGACGTTAGTTACAAGATCCCACCAATACTGCCAATAGACGGTCCTCCTTTCCCGTTGCTGTAAAACATAGT
CATGACATCGACTACGCACATACAGCATATGTGGATCTAGCTTGACTAGTCAACGTCGATATCGCGTGATCTGAGTATCA
TTGATCTATAGTGGATTGACTGATGATCGTACTGTCGTACTGACTCTGACGTCGATCTCAGGTCATATTACTCGACAGTT
GCTAAGTCAGTCATCGTCATACGATGCCGCTGAGCAATAACTAGC

6
TAATACGACTCACTATAGGGACAGTCTAGTGCAGCAGTCAGTACGAGTCTCATGAGTGTAGGATGCATGAGATCAACGCT
AGCATCGCACTGTCGTCATGCAGCTGACTCCGATCTGACTATCGTCTGAGATCAGAGCGTAACGTAGTCAGTGCTAGCAT
GCGAACTCGATGATCGAGTCGTATCCACTGTTGCCATATATGCAGACGGCATAGTATGCGTGTATGCGTCGAGAGATCAT
CCCTATCTTGACGTTAGTTACAAGATCCCACCAATACTGCCAATAGACGGTCCTCCTTTCCCGTTGCTGTAAAACATAGT
CATGACATCGACTACGCACATACAGCATATGTGGATCTAGCTTGACTAGTCAACGTCGATATCGCGTGATCTGAGTATCA
TTGATCTATAGTGGATCATGACGTGCATGCAAGCTTAGCTAGTCAGATCAGTAGCTCTCAGGTCATATTACTCGACAGTT
GCTAAGTCAGTCATCGTCATACGATGCCGCTGAGCAATAACTAGC

doi:10.1371/journal.pone.0152774.t004