A Biobrick Library for Cloning Custom Eukaryotic Plasmids

Researchers often require customised variations of plasmids that are not commercially available. Here we demonstrate the applicability and versatility of standard synthetic biological parts (biobricks) to build custom plasmids. For this purpose we have built a collection of 52 parts that include multiple cloning sites (MCS) and common protein tags, protein reporters and selection markers, amongst others. Importantly, most of the parts are designed in a format to allow fusions that maintain the reading frame. We illustrate the collection by building several model contructs, including concatemers of protein binding-site motifs, and a variety of plasmids for eukaryotic stable cloning and chromosomal insertion. For example, in 3 biobrick iterations, we make a cerulean-reporter plasmid for cloning fluorescent protein fusions. Furthermore, we use the collection to implement a recombinase-mediated DNA insertion (RMDI), allowing chromosomal site-directed exchange of genes. By making one recipient stable cell line, many standardised cell lines can subsequently be generated, by fluorescent fusion-gene exchange. We propose that this biobrick collection may be distributed peer-to-peer as a stand-alone library, in addition to its distribution through the Registry of Standard Biological Parts (http://partsregistry.org/).


Introduction
The construction of specific plasmid DNA sequences is a routine technique in molecular biology laboratories [1]. The most widely used DNA sequences (for instance, a promoter followed by a fluorescent protein, a multiple cloning site and a polyadenylation signal) are easily found in commercially-available plasmids [2,3]. However, when requirements start to become more stringent, multiple plasmid modifications are required and, although many changes may be relatively simple to perform, multiple modifications may become time-consuming cloning challenges.
Complex, multi-factor plasmids have to be built frequently for the applications of synthetic biology [4,5,6], often with the combinatorial use of different DNA parts [7]. To simplify such types of constructions an idempotent cloning system has recently been developed by Tom Knight [8] (described in the BioBrick Foundation Request for Comments #10, BBF RFC 10; http:// biobricks.org/). Briefly, this system uses a specific set of restriction enzyme sites at the 39 and 59 ends of each DNA cassette ('biobrick'), such that a biobrick 'A' may be fused with a biobrick 'B' to produce 'AB'. 'AB' contains an uncleavable 'scar' sequence between 'A' and 'B' and, importantly, the exact same set of restriction enzyme sites as the initial biobricks, at the 39 and 59 ends. In other words, every biobrick fusion product is itself a new biobrick and may even be used iteratively for the assembly of concatemers (See Figure 1). Given the physical idempotent characteristics of the system, biobricks may be fused together in any combination of parts, with few restrictions in the number of the biobricks, and no restrictions on the order ('BA' would be as simple to construct as 'AB').
The RFC 10 Biobrick format [8] is itself very useful and has formed the core of engineering challenges such as the annual International Genetically Engineered Machine (iGEM) competitions [9], where students are asked to engineer systems using biobricks. It is a well-documented system with a large and growing collection of parts that use the prefix GAATTCGCGGCC-GCTTCTAGAG (or GAATTCGCGGCCGCTTCTAG for protein coding parts starting with ATG) and the suffix TAC-TAGTAGCGGCCGCTGCAG. However, the original scheme has certain limitations, such as the difficulty of fusing parts coding for protein domains, since the fusion 'scar' changes the codon reading frame from biobrick 'A' to biobrick 'B'. Phillips and Silver [10] resolved this issue by presenting a slight modification to the initial design (removing the last G of the prefix and the first T of the suffix thereby conserving the reading frame in the scar). Special attention is required when preparing new parts in this format (BBF RFC 23), since the removal of the last G of the prefix means that a Dam methylation site may be formed if the part starts with TC, which blocks restriction with Xba I. Since the scar resulting from a RFC 23 fusion (ACTAGA) codes for threoninearginine, a chimeric protein 'AB' may be assembled from two RFC 23 biobricks 'A' and B'. When preparing such RFC 23 protein fusions, users should be aware that the arginine from the scar may be problematic since it is positively charged.
It is of interest to note that biobricks in the RFC 23 format may still be assembled with biobricks in the RFC 10 format without requiring new restriction enzymes. Although such a RFC 23-RFC 10 cross-fusion does not maintain the 6-bp scar required for the maintenance of a codon frame, N-terminal fusions of RFC 23 protein coding parts may still be performed as long as the frameshift is corrected by an adaptor part. In addition, the RFC 23-RFC 10 cross-fusion may be useful for assemblies that do not require the maintenance of a reading frame or a specific scar size, such as linking non-coding biobricks (e.g. transcription factor binding sites). We refer to this compatibility between RFC 23 and RFC 10 as 'assembly compatibility', that should not be confused with 'RFC 23 compatibility' that is reserved for fusion between biobricks that maintain the reading frame. Further biobrick formats (with only limited cross-compability) were later proposed and are documented as BBF RFCs.
Engineering with the Biobrick system is usually accompanied with an abstraction hierarchy perspective. With this perspective in mind, one uses 'DNA' to design basic 'parts' that may then be assembled into a single functional 'device' with a specific function. Finally, several of these 'devices' may be put together to design a synthetic 'system' with a high-level functionality. This hierarchy facilitates synthetic engineering since, ideally, a researcher working, for instance, at the 'system' level does not need to know the specific details on how to engineer and build the 'devices' used in that 'system'.
Synthetic biologists are now in the process of developing and characterizing basic 'parts' or part sets ('devices' [11]) that may be used to engineer biological machines [12,13,14] and to model them [15]. These parts are available from the Registry of Standard Biological Parts (http://partsregistry.org/) which is supported by the BioBricks Foundation (http://bbf.openwetware.org/).
Although the registry is available to any researcher, whether or not they work in the field of synthetic biology, we believe that the size of the registry, the existence of different biobrick formats (not always compatible with each other), and the limited communication of this resource to non-synthetic biologists, may hamper the evolution and distribution of biobricks. We therefore developed a cloning tool that should be of interest outside the field of synthetic biology; a biobrick toolkit for the construction of custom eukaryotic expression plasmids, using frequently-used components. We originally noted that many biobrick parts that one would require for eukaryotic projects were apparently missing from the registry. These parts included: several multiple cloning sites with different reading frames; an extensive range of reporter proteins; eukaryotic selection markers; eukaryotic internal ribosomal entry sites; and protein epitope tags. Importantly, most of these parts should be compatible with each other [10,16]. In addition, the novel use of MCS biobricks facilitates the cloning of DNA fragments of interest (e.g. open reading frames, promoters), without having to first remove any internal cleavage sites of the biobrick enzymes in order to render them 'biobrick compatible'.
This work is intended to highlight the potential of the biobrick system to construct custom plasmids for eukaryotic cell lines, for both synthetic and non-synthetic biologists. Since many parts that we consider important did not exist in the registry, we have built a distribution of 9 previously existing and 43 new biobricks. We demonstrate the utility of the new biobricks with some examples of assemblies. In addition, we use the created library to establish a gene-switching system, from tdTomato to EGFP expression, using recombinase-mediated DNA insertion (RMDI). Finally, we propose that this collection be distributed peer-to-peer as an independent library, as well as being available from the registry, and discuss some advantages in having this type of collection.

Preparation of parts
New parts were designed following the recommendations in BBF RFC 23. Briefly, parts were formatted to contain the prefix 59-GAATTCGCGGCCGCTTCTAGA-39 and the suffix 59-ACTAGTAGCGGCCGCTGCAG-39. Construction of the parts was done using 1) oligonucleotide inserts, 2) PCR with oligonucleotide-directed mutagenesis when required [1] or 3) synthesized directly by GenScript Corporation (Piscataway, NJ). Importantly, the part sequences do not start with a TC since this forms a Dam methylase site (GATC) and digestion with XbaI can be inhibited. The Biobrick sequences are available at the Registry of Standard Biological Parts website (http://partsregistry.org/), at GenBank (http://www.ncbi.nlm.nih.gov/genbank/) and in the Supporting Information S1.

Biobrick assemblies
Assemblies were performed similarly to the method presented in the Biobrick assembly kit from New England Biolabs (NEB; Ref. E0546S) and the Gingko Bioworks manual (http://ginkgobioworks.com/support/). Alternatively, one may use the adapted streamlined protocol we have previously reported [14].
Briefly, 500 ng of the upstream part are digested with 20000 U of EcoRI (NEB) and 10000 U of SpeI (NEB) in NEB EcoRI buffer. At the same time, 500 ng of the downstream part are digested with 20000 U of XbaI (NEB) and 20000 U of PstI (NEB) in NEB Buffer 3. Finally and concomitantly, 500 ng of the destination plasmid are digested with 20000 U of EcoRI and 20000 U of PstI in NEB EcoRI Buffer. Digestions are performed at 37uC for 1 hour followed by heat inactivation at 80uC for 20 minutes. Note that the recipient pSB1A* plasmids in the library contain a ccdB death cassette that is removed in the digestion. Furthermore, each assembly should contain an antibiotic resistance gene that is not present in either the upstream or the downstream part (e.g. if the upstream part is in the pSB1AK3 plasmid, containing ampicillin and kanamycin resistance, and the downstream part is in the pSB1AC3 plasmid, containing ampicillin and chloramphenicol resistance, the destination plasmid should be the pSB1AT3 plasmid since it contains the tetracycline resistance not present in the other plasmids).
Since the destination plasmid contains the ccdB death cassette, a two-way ligation of the fragments can be done using the Roche rapid DNA ligation kit, according to the manufacturer's instructions, without need for gel isolation or other purifications. Briefly, 4 ml of the upstream part digestion, 4 ml of the downstream part digestion and 2 ml of the destination plasmid digestion are used. 2 ml of the ligation product are transformed in one shot competent Top10 E. coli (Invitrogen), according to the manufacturer's instructions, and are subsequently plated onto LB agar plates. The plates contain antibiotic for the resistance provided by the destination plasmid (in the example above one should use LB agar plates containing tetracycline) and are incubated overnight.
PCRs of Biobricks (from plasmid DNA or bacterial colony) were performed using NEB Taq polymerase, using the manufacturer's instructions, and using the standard sequencing primers BBa_G00100 (59-TGCCACCTGACGTCTAAGAA-39) and BBa_G00101 (59-ATTACCGCCTTTGAGTGAGC-39) with 20 cycles at 55uC annealing temperature, with appropriate denaturation and elongation steps. PCR products were analyzed on agarose gels and positive colonies were grown overnight for DNA isolation using the Qiagen spin miniprep kit according to the manufacturer's instructions. All clones were verified by DNA sequencing.

Cell culture and transfection
HEK293 cells were purchased from ATCC. Cells were propagated in Dulbecco's Modified Eagle Medium (Gibco) supplemented with 10% fetal bovine serum (Gibco) and 1% penicillin-streptomycin (Gibco). Transfection of the cells was performed with lipofectamine 2000 (Invitrogen), according to the manufacturer's instructions, in 6-well plates: cells were incubated for 4 hours with 4 mg of DNA and 10 ml of lipofectamine reagent. Stable clones were prepared by selection with 0.5 mg/ml G418 (Sigma).

Cloning small Espin into the MCS of a custom-made plasmid
Rat small Espin (sEspin) [17] was amplified by PCR, from a plasmid kindly provided by the group of Hernán López-Schier, using the primers sEspinF (59-AAGAGGGATCCATGAACT-CCC-39) and sEspinR (59-CTTCTTACCGGTTTACTTAGG-GATCTCCCCCTTC-39) at 55uC annealing temperature. The PCR product was cloned into the Biobrick custom plasmid using the restriction enzymes BamHI and AgeI.

Microscopy
After transfection, cells were allowed to recover overnight and were dispensed into glass bottom culture dishes (MatTek). 24 h later, the medium was replaced with PBS for observation under the microscope.

Recombinase-mediated DNA inclusion plasmid preparation
The plasmid used for the creation of the stable clone expressing tdTomato was prepared using an assembly of the biobricks between the CMV promoter and polyadenylation (pA) signal. The assembled plasmid was digested with XbaI and PstI and cloned into pEGFP-C1 digested with NheI and PstI, thereby removing the EGFP sequence and providing the CMV upstream and pA downstream of the biobrick insert. Neomycin resistance used for selection is provided by the pEGFP-C1 plasmid backbone under the expression of the SV40 early promoter. The EGFP plasmid for inclusion was prepared using the biobrick collection.

Flow cytometry
Flow-assisted cell sorting (FACS) analysis was performed with FACSCanto (BD) and cell separation with FACSARIA II (BD).

Library
To create a distribution for the construction of custom plasmids, we have put together a set of 9 existing and 43 new biobricks and plasmid backbones. We believe that these should be useful for the creation of custom-made plasmids for use in eukaryotic cell lines (see Table 1). Importantly, a set of multiple cloning site (MCS) biobricks, covering all three different frames has been built, essential for the combination of the biobrick assembly strategy with classical cloning strategies (Fig. 2).
To facilitate flexible construction of custom-made plasmids, all biobricks are in the same format, containing the same prefix and suffix, as described in the Methods section. Biobricks may be divided into: (i) Non-coding (not intended for containing coding sequences; lengths are often not in multiples of three, or have a stop codons in the sequence).  Table 2 indicates whether a given biobrick maintains the codon frame, whether it contains a Kozak sequence or a stop codon, and shows the restriction sites present in each biobrick or plasmid backbone. In this table we explicitly present those elements that may not be used for biofusion assembly (highlighted in bold).
Whereas the biobrick assembly workflow will work for all parts in the collection, for classical cloning it is up to the user to check which biobricks will be used to assemble a plasmid. For example, the user must check whether a given restriction enzyme site is present only once (e.g. in the MCS site region) or whether it is not unique and is also found in another biobrick or plasmid backbone.

Constructing simple assemblies
The biobrick assembly principle is inherently iterative, allowing longer and longer poly-biobrick constructs to be built in a stepwise fashion (Fig. 1). This is particularly useful for making related constructs with variable copies of a motif, such as a transcription factor binding sites. To illustrate this, we show how the binding sites of the bacterial MS2 phage coat protein, which can be used to repress mRNA translation in synthetic biology applications [18], can be conveniently concatenated with biobrick assembly (Fig. 1D).
One of the simplest expression plasmids that one might construct using the distribution is a plasmid such as that presented in Figure 3A; a simple EGFP expression plasmid, used to label cells, as may be observed in Figure 3B. A user will typically want to create a translation unit and insert it under a promoter with a polyadenylation sequence in the end. The translation unit will   typically be composed of a head domain (in this case the Kozak sequence alone) followed by one or more internal domains (in this example the EGFP reporter coding sequence) and finished with a tail domain (here the simple stop codon biobrick).

Constructing a MCS plasmid for classical cloning
Although many commercially-available plasmids contain MCSs to allow fusing proteins of interest to fluorescent proteins, many combinations of colours, N-C orientations, stable cell-line selection genes or half-life modifications are simply not available. To illustrate this we have created a more complex plasmid, containing two cistrons, that is not commercially-available. In the first cistron, we assembled the blue fluorescent protein cerulean [19] before a complex MCS (assembled using MCS1 and MCS2). We subsequently cloned an actin-binding protein, rat sEspin [17], using the BamHI and AgeI restriction sites ( Figure 3C). By fluorescent microscopy, we observed Cerulean-sEspin localization in actin bundle stress fiber-like structures [17] (Fig. 3D). In the second cistron we included the neomycin resistance gene for positive selection. The example shown is just one of many customized cloning vectors that can be generated with this platform, and requires just three biobrick cloning iterations (first round: CMV-Kozak, Cerulean-MCS1a, MCS2a-IRES, Neomycin-pA; second round: CMV-Kozak-Cerulean-MCS1a, MCS2a-IRES-Neomycin-pA; third round: CMV-Kozak-Cerulean-MCS1a-MCS2a-IRES-Neomycin-pA). The use of MCSs links the biobrick format to classical cloning strategies, which should increase their appeal to researchers who simply wish to 'cut and paste' their DNA cassettes with standard restriction enzymes.

Recombinase-Mediated DNA Insertion
The collection presented here may be used to obtain stable cloning systems, for chromosomal integration in eukaryotic cells, such as Recombinase-Mediated DNA Insertion (RMDI) [20]. We implemented RMDI by establishing a stable cell line expressing tdTomato, and containing an heteromeric lox site in its coding sequence (lox66 [21]; see scheme in Figure 4A). Through Cremediated recombination we inserted DNA from a donor plasmid, containing EGFP preceded by a compatible heteromeric site (lox71). Thus, red expression was converted to green expression.  Table 2. Cont.
Since the EGFP plasmid did not contain a promoter, cells should express EGFP alone only if inserted at the correct RMDI site. Expression from random insertions next to genomic promoters (expected to be rare events) would be associated with both green and red expression. However, in the early stages after transfection, even correctly recombined cells would be expected to have some residual red expression, until the levels of tdTomato were degraded or diluted by cell division.
Indeed, after three days we observed a cell population expressing both EGFP and tdTomato by flow cytometry (0.23% for RMDI when using both the EGFP plasmid and Cre versus 0.08% for random insertion in the control with the EGFP plasmid without Cre; Fig. 4B and C). Thus, there appeared to be some Cre-induced increase in GFP fluorescence, but it was not clear whether the effects were site-specific or due to background fluorescence. Therefore, to investigate whether the tdTomato levels would subsequently fall, indicating correct RMDI, we sorted the EGFP-positive population in Figure 4C, grew them to confluency and re-analysed the final population. We thus obtained a population of cells where RMDI had indeed occurred, with high levels of EGFP and low levels of tdTomato, similar to nonfluorescent controls (20% green-only cells; Fig. 4D).
When considering only the EGFP-positive population (Q2+Q4) we note a marked shift in the Q4 population (8% of Q2+Q4 before sorting to 83% of Q2+Q4 after sorting and regrowth). This suggests that the insertion of the EGFP plasmid occurred at the intended lox site, thereby disrupting tdTomato, but that extra time was required for tdTomato levels to fall, via dilution and degradation, during regrowth of the sorted cells.
The 17% of Q2+Q4 found in Q2 is an EGFP-tdTomato double-positive population (4% of the total cell population; Fig. 4D), likely arising from cells containing multiple chromosomal copies of the initial tdTomato construct: here recombination occurred in one or more instances, but not in every copy. Random integrations of GFP near promoter regions are likely to be rare but may also contribute to this population.
We also observed tdTomato-positive EGFP-negative (32%) and tdTomato-EGFP double-negative (45%) populations. These are likely arising from cells that were unmodified and yet were carriedthrough the EGFP cell sorting step (false positives) or from cells that downregulated the CMV promoter [22].

Discussion
In this work, we have created a distribution of DNA fragments compatible with the Standard Registry of Biological Parts [8]. This distribution is intended for the specific use of creating custommade plasmids, with a focus on use in eukaryotic cell lines, particularly because many previous parts have focused on prokaryotic components. We have therefore constructed biobricks for elements that are routinely used in plasmids for transfection in eukaryotic cell lines. Using these elements, we provide three examples of applications of this system: the assembly of sequence motif concatemers; the assembly of vectors for transient and stable Although the present work makes use of existing concepts, such as the standardization of DNA parts [23] and classical cloning for expression in eukaryotic cell lines [1], the construction of the required elements into one collection and the combination of both cloning concepts makes this method a distinct new resource for preparing eukaryotic contructs. To provide this tool to users, we propose that the collection presented here be available as a standalone biobrick distribution exchanged in a peer-to-peer fashion, in addition to its availability through the Registry of Standard Biological Parts (http://partsregistry.org/).
A researcher may request access to the Registry to obtain a large collection of biobricks which are a valuable resource, especially to synthetic biologists. However, the size of the Registry database may be detrimental in some instances, especially for sporadic users. One example of this is the impossibility -at the moment -to search the registry database for biobricks in a specific format. In fact, special care must be taken when using biobricks from the registry since it is not always clear whether two internal protein domain biobricks may be fused in frame, because of the different biobrick formats [10,16]. This is in part due to the definition of the registry database of biobricks as being 'compatible' with a given format when meaning they may be assembled together even if the coding frame is not maintained, whereas other users may consider 'compatibility' as an indication that the fusion conserves the codon frame.
In contrast, RFC 23 compatibility in the small collection presented here -in the sense of codon reading frame conservation -is ensured except where otherwise explicitly stated. Another advantage of having an independent small distribution for a specific function is the possibility to exchange it in 96-well plates and to be sure that one is using sequence-verified components. For example, when sequence-verifying the registry, certain discrepancies have been found [24]. The bacteria containing the plasmids can be grown and shipped in wells filled with agar medium, allowing direct sharing between peers, without need for advanced robotics or pipetting systems.
We believe that other similar biobrick libraries should be created to be self-contained. That is, all parts in such libraries should be compatible and sufficient to be used for a specific function. For instance one could envisage kinase-phosphatase or signalling cascade libraries; transcription factor libraries; metabolic enzyme libraries. Such a library-based form of distribution may be one of several solutions for the growing size of the Registry that may rapidly reach a limit in the cost-efficiency of its own distribution (the registry is .13 000 parts and growing [25]). Ultimately, we hope to stimulate debate in this growing open standard for biological engineering.

Supporting Information
Supporting Information S1 FASTA DNA sequences of the Biobrick Collection. Prefix and Suffix are presented in capitals. (DOC)